- Tensor Data Types
- Creating Tensors
- Indexing and Slicing
- Dimensional Transformations: Decoupling Computational Complexity
- Broadcasting: Intelligent Tensor Operations
- Conclusion
Tensor Data Types
In deep learning, traditional data structures such as native Python lists or NumPy arrays fall short of what modern computation demands. PyTorch Tensors were designed to fill this gap, offering several key advantages:
Computational Efficiency
# Comparing NumPy and Tensor computation
import time
import numpy as np
import torch
# NumPy computation
np_start = time.time()
np_array = np.random.rand(10000, 10000)  # float64 by default
np_result = np_array * 2
np_end = time.time()
# Tensor computation
torch_start = time.time()
torch_tensor = torch.rand(10000, 10000)  # float32 by default
torch_result = torch_tensor * 2
torch_end = time.time()
print(f"NumPy time: {np_end - np_start}")
print(f"Tensor time: {torch_end - torch_start}")
NumPy time: 0.9036829471588135
Tensor time: 0.49999094009399414
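The gap widens further on a GPU, where the same tensor code runs unchanged. A minimal sketch, assuming a CUDA-enabled PyTorch build (it falls back to CPU otherwise):
# Run the same computation on a GPU when one is available
device = "cuda" if torch.cuda.is_available() else "cpu"
gpu_tensor = torch.rand(10000, 10000, device=device)
gpu_result = gpu_tensor * 2
print(f"Computed on: {gpu_result.device}")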
Automatic Differentiation Support
The biggest advantage of a Tensor is its built-in automatic differentiation mechanism, the core feature of every deep learning framework:
x = torch.tensor([1.0], requires_grad=True)
y = x ** 2
z = y * 3
z.backward() # Automatically computes gradients
print(x.grad) # Print gradient value
tensor([6.])
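Because gradients are tracked automatically, a full optimization step reduces to a few lines. A minimal sketch of one gradient-descent update (the toy loss and the learning rate 0.1 are arbitrary choices for illustration):
w = torch.tensor([1.0], requires_grad=True)
loss = (w * 3 - 6) ** 2          # Toy squared-error loss, minimized at w = 2
loss.backward()                  # d(loss)/dw = 2 * (3w - 6) * 3 = -18 at w = 1
with torch.no_grad():            # Update the parameter without tracking the update itself
    w -= 0.1 * w.grad            # One gradient-descent step: 1.0 - 0.1 * (-18) = 2.8
print(w)                         # tensor([2.8000], requires_grad=True)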
Flexible Data Representation
- One-Hot Encoding: Converts discrete categories into vectors for classification tasks
- Embedding: Converts text into dense vectors that capture semantic information
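Both representations map directly onto built-in PyTorch operations. A minimal sketch (the category count, vocabulary size, and embedding dimension below are arbitrary):
import torch.nn.functional as F
labels = torch.tensor([0, 2, 1])                    # Three samples from 3 categories
one_hot = F.one_hot(labels, num_classes=3)          # Shape (3, 3), a single 1 per row
embedding = torch.nn.Embedding(num_embeddings=10, embedding_dim=4)
token_ids = torch.tensor([1, 5, 7])                 # e.g. word indices in a vocabulary
dense_vectors = embedding(token_ids)                # Shape (3, 4), learned dense vectors
print(one_hot)
print(dense_vectors.shape)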
Creating Tensors
# Random initialization strategy
torch.manual_seed(42) # Fix random seed
# Random initialization with different distributions
uniform_tensor = torch.rand(3, 3) # Uniform distribution
normal_tensor = torch.randn(3, 3) # Normal distribution
# Initialization for deep learning
# Kaiming initialization
weight = torch.nn.init.kaiming_uniform_(torch.empty(3, 3))
print(f"uniform_tensor: {uniform_tensor}")
print(f"normal_tensor: {normal_tensor}")
print(f"weight: {weight}")
uniform_tensor: tensor([[0.8823, 0.9150, 0.3829],
[0.9593, 0.3904, 0.6009],
[0.2566, 0.7936, 0.9408]])
normal_tensor: tensor([[ 1.5231, 0.6647, -1.0324],
[-0.2770, -0.1671, -0.1079],
[-1.4285, -0.2810, 0.7489]])
weight: tensor([[-1.3968, 1.2772, -1.2013],
[ 1.0918, 0.2354, -0.4592],
[ 0.8739, 0.2204, 1.1426]])
Meaning of Initialization Strategies
- Uniform distribution: a simple default for generic random initialization
- Normal distribution: concentrates values around zero, mirroring how many natural quantities are distributed
- Specialized initialization methods (e.g., Kaiming): scale weights to counter vanishing/exploding gradients in deep networks
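Random initialization is only one entry point; tensors are also created directly from Python data, from NumPy arrays, or as filled constants. A minimal sketch of the most common constructors:
import numpy as np
from_list = torch.tensor([[1, 2], [3, 4]])          # From a nested Python list
from_numpy = torch.from_numpy(np.eye(2))            # Shares memory with the NumPy array
zeros = torch.zeros(2, 3)                           # All-zero tensor
ones_like = torch.ones_like(zeros)                  # Same shape and dtype, filled with 1
steps = torch.arange(0, 10, 2)                      # 0, 2, 4, 6, 8
print(from_list.dtype, zeros.shape, steps)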
Indexing and Slicing
Why Do We Need Complex Indexing?
# Complex data selection
data = torch.randn(3, 3)
# Flexible selection in high-dimensional data
selected_data = data[torch.randperm(data.size(0))[:1]] # Randomly pick 1 sample
# Conditional selection
mask = data > 0
positive_data = data.masked_select(mask)
print(f"data: {data}")
print(f"selected data shape: {selected_data}")
print(f"positive data shape: {positive_data}")
data: tensor([[-0.5881, 1.7358, 0.6639],
[ 0.6067, 0.9153, -2.4359],
[ 1.4119, -0.4828, -2.3674]])
selected data: tensor([[ 0.6067, 0.9153, -2.4359]])
positive data: tensor([1.7358, 0.6639, 0.6067, 0.9153, 1.4119])
Advantages of Indexing
- Precise operations on high-dimensional data
- Complex conditional selections
- Memory efficiency: basic slicing and indexing return views, avoiding unnecessary data copies
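Beyond boolean masks, index tensors let you gather arbitrary rows or elements in a single call. A minimal sketch using index_select and gather (the shapes and indices are arbitrary):
scores = torch.randn(4, 5)                          # e.g. 4 samples, 5 class scores each
rows = torch.index_select(scores, dim=0, index=torch.tensor([0, 2]))  # Rows 0 and 2
labels = torch.tensor([1, 0, 3, 2])                 # One class index per sample
picked = scores.gather(1, labels.unsqueeze(1))      # Shape (4, 1): the score of each label
print(rows.shape, picked.shape)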
Dimensional Transformations: Decoupling Computational Complexity
Why Are Dimensional Transformations Needed?
# Typical dimension manipulation
batch_data = torch.randn(32, 3, 224, 224) # Image batch
# Flatten before fully-connected layers
flatten_data = batch_data.view(32, -1)
# Permuting dimensions in CNNs
transposed_data = batch_data.permute(0, 2, 3, 1)
print(f"batch_data shape: {batch_data.shape}")
print(f"flatten_data shape: {flatten_data.shape}")
print(f"transposed_data shape: {transposed_data.shape}")
batch_data shape: torch.Size([32, 3, 224, 224])
flatten_data shape: torch.Size([32, 150528])
transposed_data shape: torch.Size([32, 224, 224, 3])
Philosophy Behind Dimensional Transformations
- Decoupling: Convert complex multi-dimensional data into forms easier to process
- Flexibility: Supports diverse network architectures
- Memory Efficiency: view and permute return views over the same storage rather than copies (see the sketch below)
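One caveat behind the memory-efficiency point: permute returns a non-contiguous view, so view cannot reshape it directly. A minimal sketch of the usual workaround:
images = torch.randn(32, 3, 224, 224)
channels_last = images.permute(0, 2, 3, 1)          # A view: no data is copied
print(channels_last.is_contiguous())                # False
# channels_last.view(32, -1) would raise an error here
flat = channels_last.contiguous().view(32, -1)      # Explicit copy, then reshape
flat2 = channels_last.reshape(32, -1)               # reshape copies only when it has to
print(flat.shape, flat2.shape)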
Broadcasting: Intelligent Tensor Operations
# Practical use of broadcasting
batch_size, channels, height, width = 4, 32, 14, 14
feature_maps = torch.randn(batch_size, channels, height, width)
bias = torch.randn(channels) # One bias per channel
# Automatic broadcasting
output = feature_maps + bias.view(1, channels, 1, 1)
print(f"feature_maps shape: {feature_maps.shape}")
print(f"bias shape: {bias.shape}")
print(f"output shape: {output.shape}")
feature_maps shape: torch.Size([4, 32, 14, 14])
bias shape: torch.Size([32])
output shape: torch.Size([4, 32, 14, 14])
Principles Behind Broadcasting
- Memory Efficiency: Avoids explicit data replication
- Code Simplicity: Reduces tedious dimension-matching code
- Computation Optimization: Enables efficient parallel computation under the hood
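The same mechanism covers many everyday patterns, such as per-channel normalization, where size-1 dimensions are stretched automatically to match. A minimal sketch (keepdim=True keeps the statistics in a broadcastable shape):
features = torch.randn(4, 32, 14, 14)
mean = features.mean(dim=(0, 2, 3), keepdim=True)   # Shape (1, 32, 1, 1)
std = features.std(dim=(0, 2, 3), keepdim=True)     # Shape (1, 32, 1, 1)
normalized = (features - mean) / std                # Broadcast over batch and spatial dims
print(mean.shape, normalized.shape)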
Conclusion
A Tensor is not merely a data structure—it’s the bridge connecting mathematics, algorithms, and computation in deep learning. Every aspect of its design reflects thoughtful consideration of computational efficiency, flexibility, and developer experience.
