Tensors Explained

What tensors are, how they generalise scalars, vectors, and matrices, tensor shapes in deep learning, and common PyTorch tensor operations.

Asma Hafeez Khan · May 22, 2026 · 5 min read
Tags: Deep Learning, Tensors, PyTorch, Linear Algebra, Interview

What a Tensor Is

A tensor is a generalisation of scalars, vectors, and matrices to arbitrary dimensions:

Rank 0 (scalar):   a single number         shape: ()
  loss = 0.42

Rank 1 (vector):   a 1D array              shape: (n,)
  embedding = [0.12, -0.34, 0.89, ...]     shape: (768,)

Rank 2 (matrix):   a 2D array              shape: (m, n)
  weight matrix W                           shape: (512, 768)

Rank 3 (tensor):   a 3D array              shape: (d1, d2, d3)
  batch of embeddings                       shape: (32, 512, 768)
  (batch_size, seq_len, d_model)

Rank 4 (tensor):   a 4D array              shape: (d1, d2, d3, d4)
  batch of images                           shape: (32, 3, 224, 224)
  (batch_size, channels, height, width)
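
The ranks listed above can be checked directly in PyTorch. This is a minimal sketch: the variable names are illustrative, and the rank-3 and rank-4 tensors use smaller sizes than the shapes quoted above so that it runs quickly.

```python
import torch

scalar = torch.tensor(0.42)            # rank 0, shape ()
vector = torch.randn(768)              # rank 1, shape (768,)
matrix = torch.randn(512, 768)         # rank 2, shape (512, 768)
embeddings = torch.randn(2, 16, 768)   # rank 3: (batch_size, seq_len, d_model)
images = torch.randn(2, 3, 32, 32)     # rank 4: (batch_size, channels, height, width)

named = {
    "scalar": scalar,
    "vector": vector,
    "matrix": matrix,
    "embeddings": embeddings,
    "images": images,
}
for name, t in named.items():
    # ndim is the rank; shape lists the size of each dimension
    print(f"{name:<10} rank={t.ndim} shape={tuple(t.shape)}")
```

In PyTorch, `ndim` is the rank and `len(t.shape)` always equals it.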

PyTorch Tensor Creation

Python
import torch
import numpy as np

# Creation
a = torch.tensor([1.0, 2.0, 3.0])          # from list, infers dtype
b = torch.zeros(3, 4)                        # (3, 4) of zeros
c = torch.ones(2, 5, dtype=torch.float16)   # float16
d = torch.randn(8, 512)                      # N(0,1) random
e = torch.arange(0, 10, step=2)             # [0, 2, 4, 6, 8]
f = torch.linspace(0, 1, steps=5)           # [0.0, 0.25, 0.5, 0.75, 1.0]

# From NumPy (shares memory, no copy; dtype follows the array, float64 here)
arr = np.array([[1.0, 2.0], [3.0, 4.0]])
t = torch.from_numpy(arr)

# Move to GPU
if torch.cuda.is_available():
    device = torch.device("cuda")
    d_gpu = d.to(device)
    d_gpu = d.cuda()   # equivalent

# Dtype control
x_float32 = torch.randn(3, dtype=torch.float32)
x_float16  = x_float32.half()    # float16
x_bfloat16 = x_float32.bfloat16()  # bfloat16: float32's exponent range with fewer mantissa bits; often more stable than float16 for training
x_int32    = x_float32.int()
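
The "shares memory" note on `torch.from_numpy` is easy to miss, so here is a small sketch of what it means in practice. The names `shared` and `copied` are illustrative; `torch.tensor(arr)` is used as the copying alternative.

```python
import numpy as np
import torch

arr = np.array([[1.0, 2.0], [3.0, 4.0]])   # NumPy defaults to float64

shared = torch.from_numpy(arr)   # reuses the NumPy buffer, no copy
copied = torch.tensor(arr)       # allocates new memory and copies the values

arr[0, 0] = 100.0                # mutate the NumPy array in place

print(shared[0, 0].item())       # 100.0: the tensor sees the change
print(copied[0, 0].item())       # 1.0: the copy is unaffected
print(shared.dtype)              # torch.float64, inherited from the array
```

If a model expects float32 input, a call such as `shared.float()` produces a converted copy, which no longer shares memory with `arr`.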

Shape Operations

Python
x = torch.randn(32, 512, 768)   # (batch, seq_len, d_model)

# Shape inspection
print(x.shape)     # torch.Size([32, 512, 768])
print(x.ndim)      # 3
print(x.dtype)     # torch.float32
print(x.device)    # device(type='cpu') or device(type='cuda', index=0)
print(x.numel())   # 32 * 512 * 768 = 12,582,912

# Reshape
y = x.view(32, -1)           # (32, 512*768) = (32, 393216); never copies, but needs a compatible (e.g. contiguous) layout
y = x.reshape(32, -1)        # (32, 393216); returns a view when possible, otherwise copies

# Transpose / permute
x_T = x.transpose(1, 2)      # swap dims 1 and 2 -> (32, 768, 512)
x_P = x.permute(0, 2, 1)     # same result as the transpose above

# Squeeze and unsqueeze
a = torch.randn(32, 1, 768)
b = a.squeeze(1)              # remove dim 1 -> (32, 768)
c = b.unsqueeze(0)            # add dim 0 -> (1, 32, 768)

# Stack and concatenate
a = torch.randn(32, 256)
b = torch.randn(32, 256)

cat_col = torch.cat([a, b], dim=1)    # (32, 512): concat along features
cat_row = torch.cat([a, b], dim=0)    # (64, 256): concat along batch
stacked = torch.stack([a, b], dim=0)  # (2, 32, 256): adds a new dimension
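
The difference between `view` and `reshape` only shows up on non-contiguous tensors, for example after a transpose. The sketch below uses a small illustrative tensor to show `view` failing where `reshape` succeeds by copying.

```python
import torch

x = torch.randn(4, 6, 8)

# On a contiguous tensor, view returns a new shape over the same storage.
shares_storage = x.view(4, -1).data_ptr() == x.data_ptr()
print(shares_storage)            # True

x_t = x.transpose(1, 2)          # (4, 8, 6); same storage, different strides
print(x_t.is_contiguous())       # False

try:
    x_t.view(4, -1)              # cannot express this shape without a copy
    view_failed = False
except RuntimeError:
    view_failed = True
print(view_failed)               # True

flat = x_t.reshape(4, -1)              # falls back to copying the data
flat2 = x_t.contiguous().view(4, -1)   # explicit copy, then a view
print(torch.equal(flat, flat2))        # True
```

A reasonable default is `reshape`, keeping `view` for places where an accidental copy should surface as an error.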

Broadcasting

Python
# Broadcasting: operations between tensors with compatible shapes
# Shapes are aligned from the right, size-1 dims expand automatically

a = torch.randn(32, 512)   # (32, 512)
b = torch.randn(512)       # (512,) is treated as (1, 512), then expanded to (32, 512)

c = a + b   # works!  (32, 512)

# Common in neural networks: add bias to batched output
batch_output = torch.randn(8, 256)   # (batch, d)
bias = torch.zeros(256)              # (d,)
biased = batch_output + bias          # (8, 256)  bias added to each row

# Attention mask broadcasting (additive causal mask: 0 keeps a position, -inf blocks it)
scores = torch.randn(8, 12, 100, 100)   # (batch, heads, seq, seq)
mask = torch.full((1, 1, 100, 100), float("-inf")).triu(diagonal=1)   # (1, 1, seq, seq)
masked = scores + mask   # broadcasts across batch and heads
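
To check whether two shapes broadcast without allocating any data, `torch.broadcast_shapes` applies the same right-aligned rule. The sketch below also shows a typical failure, a per-row vector of shape `(32,)` added to a `(32, 512)` tensor, and the `unsqueeze` that fixes it. The variable names are illustrative.

```python
import torch

# Shape-only checks: no tensors are created here.
bias_like = torch.broadcast_shapes((32, 512), (512,))
mask_like = torch.broadcast_shapes((8, 12, 100, 100), (1, 1, 100, 100))
print(bias_like)   # torch.Size([32, 512])
print(mask_like)   # torch.Size([8, 12, 100, 100])

a = torch.randn(32, 512)
per_row = torch.randn(32)        # one value per row, shape (32,)

try:
    a + per_row                  # right-aligned: 512 vs 32 do not match
    failed = False
except RuntimeError:
    failed = True
print(failed)                    # True

# Adding a trailing size-1 dim makes the shapes compatible.
fixed = a + per_row.unsqueeze(1)   # (32, 1) expands to (32, 512)
print(fixed.shape)                 # torch.Size([32, 512])
```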

Common Tensor Operations in DL

Python
# Reduction operations
x = torch.randn(32, 512)
x.mean()              # scalar: mean of all elements
x.mean(dim=0)         # (512,): mean across batch dimension
x.mean(dim=1)         # (32,): mean across feature dimension
x.mean(dim=1, keepdim=True)   # (32, 1): keeps dimension

x.sum(dim=-1)         # sum along last dimension
x.max(dim=1).values   # max along dim 1
x.argmax(dim=1)       # index of max along dim 1

# Softmax
logits = torch.randn(8, 10)    # (batch, n_classes)
probs = torch.softmax(logits, dim=-1)   # (batch, n_classes), sums to 1 per row

# Matrix multiply
A = torch.randn(32, 128)
B = torch.randn(128, 64)
C = A @ B              # (32, 64)
C = torch.matmul(A, B) # same

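# Element-wise
y = torch.randn(32, 512)   # same shape as x (or broadcastable to it)
x * y          # element-wise product (Hadamard)
x + y          # element-wise add
x.pow(2)       # element-wise square
x.abs().sqrt() # element-wise sqrt; the sqrt of a negative entry would be NaN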

# Norm
l2_norm = x.norm(p=2, dim=-1)        # L2 norm along last dim
x_normalised = x / (l2_norm.unsqueeze(-1) + 1e-8)
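
Reductions with `keepdim=True` and broadcasting combine naturally. As an illustration, softmax can be written by hand from `max`, `exp`, and `sum`. This is a sketch for understanding rather than a replacement for `torch.softmax`, and the names `shifted` and `probs_manual` are illustrative.

```python
import torch

torch.manual_seed(0)
logits = torch.randn(8, 10)   # (batch, n_classes)

# Subtract the row-wise max for numerical stability.
# keepdim=True gives shape (8, 1), which broadcasts against (8, 10).
shifted = logits - logits.max(dim=-1, keepdim=True).values
exp = shifted.exp()
probs_manual = exp / exp.sum(dim=-1, keepdim=True)

probs_builtin = torch.softmax(logits, dim=-1)

print(torch.allclose(probs_manual, probs_builtin, atol=1e-6))
print(probs_manual.sum(dim=-1))   # each row sums to approximately 1
```

Without `keepdim=True` the reductions would have shape `(8,)`, which does not broadcast against `(8, 10)`.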

GPU Tensor Operations

Python
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

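# Everything must be on the same device
# (model, X, and y are small stand-ins so the snippet runs on its own)
model = torch.nn.Linear(16, 1).to(device)
X = torch.randn(8, 16).to(device)
y = torch.randn(8, 1).to(device)

# Check device
print(X.device)  # cuda:0 on a GPU machine, otherwise cpu

# Memory management (CUDA only)
if torch.cuda.is_available():
    torch.cuda.empty_cache()   # release unused cached blocks; memory held by live tensors stays allocated
    print(f"GPU memory: {torch.cuda.memory_allocated() / 1e9:.2f} GB")

# Inference without gradient tracking, then conversion to NumPy
with torch.no_grad():
    pred = model(X)             # no computation graph is recorded
arr = pred.detach().cpu().numpy()  # detach() is redundant under no_grad but required if pred tracks gradients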
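
The gradient tracking that `torch.no_grad()` and `detach()` switch off can be seen in a few lines. This is a minimal sketch with a hand-picked tensor, chosen so the expected gradient (the derivative of a sum of squares, `2 * x`) is easy to verify.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# Forward pass: autograd records each operation as it runs.
y = (x ** 2).sum()
print(y.requires_grad)        # True: y is attached to the graph

# Backward pass: d(sum(x^2))/dx = 2 * x
y.backward()
print(x.grad)                 # tensor([2., 4., 6.])

# Inside no_grad, nothing is recorded.
with torch.no_grad():
    z = x * 2
print(z.requires_grad)        # False

# detach() returns a tensor cut off from the graph, safe for NumPy.
arr = (x * 2).detach().numpy()
print(arr)
```

Calling `.numpy()` on a tensor that still requires gradients raises an error, which is why `detach()` appears before it.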

Interview Answer

"A tensor is a multi-dimensional array that generalises scalars (rank 0), vectors (rank 1), and matrices (rank 2) to arbitrary rank. In deep learning, tensors represent batches of data: a batch of images is rank-4 (batch, channels, height, width); a batch of token embeddings is rank-3 (batch, seq_len, d_model). The critical operations are: view/reshape (view changes the shape without copying data; reshape copies only when the memory layout requires it), permute (reorder dimensions, essential for attention), broadcasting (implicit expansion of size-1 dimensions for element-wise ops), and reductions (mean, sum, max across dimensions). In PyTorch, autograd records tensor operations in a computation graph that is built dynamically during the forward pass, and backpropagation follows that graph to compute gradients."
