Basic Tensor Operations

Contents
  1. Tensor Data Types
    1. Computational Efficiency
    2. Automatic Differentiation Support
    3. Flexible Data Representation
  2. Creating Tensors
    1. Why Initialization Strategies Matter
  3. Indexing and Slicing
    1. Why Do We Need Complex Indexing?
    2. Advantages of Indexing
  4. Dimensional Transformations: Decoupling Computational Complexity
    1. Why Are Dimensional Transformations Needed?
    2. Philosophy Behind Dimensional Transformations
  5. Broadcasting: Intelligent Tensor Operations
    1. Principles Behind Broadcasting
  6. Conclusion

Tensor Data Types

In deep learning, traditional data structures such as native Python lists or NumPy arrays are no longer sufficient: they offer neither hardware acceleration nor gradient tracking at the scale modern models require. PyTorch Tensors were created to address this need, offering several key advantages:

Computational Efficiency

import torch

# Comparing NumPy and Tensor computation
import numpy as np
import time

# NumPy computation
np_start = time.time()
np_array = np.random.rand(10000, 10000)
np_result = np_array * 2
np_end = time.time()

# Tensor computation
torch_start = time.time()
torch_tensor = torch.rand(10000, 10000)
torch_result = torch_tensor * 2
torch_end = time.time()

print(f"NumPy time: {np_end - np_start}")
print(f"Tensor time: {torch_end - torch_start}")
NumPy time: 0.9036829471588135
Tensor time: 0.49999094009399414
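
Tensors can also be moved to an accelerator, which is where the efficiency gap becomes most visible. A minimal sketch (assuming a CUDA-capable build of PyTorch; otherwise it falls back to the CPU):

# Run the same computation on a GPU when one is available
device = "cuda" if torch.cuda.is_available() else "cpu"

gpu_tensor = torch.rand(10000, 10000, device=device)
gpu_result = gpu_tensor * 2  # executed on the selected device

if device == "cuda":
    torch.cuda.synchronize()  # wait for the GPU kernel to finish (needed for fair timing)

print(f"device: {device}, result shape: {gpu_result.shape}")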

Automatic Differentiation Support

The biggest advantage of Tensors is their built-in automatic differentiation mechanism, the core feature of any deep learning framework:

x = torch.tensor([1.0], requires_grad=True)
y = x ** 2
z = y * 3
z.backward()  # Automatically computes gradients
print(x.grad)  # Print gradient value
tensor([6.])
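
One practical detail is worth a short sketch: gradients accumulate in .grad across backward() calls, so training code normally resets them between steps (the loop below is only illustrative):

x = torch.tensor([1.0], requires_grad=True)

for _ in range(2):
    z = 3 * x ** 2
    z.backward()

print(x.grad)   # tensor([12.]) -- two backward passes accumulated 6 + 6
x.grad.zero_()  # reset before the next step, as optimizers typically do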

Flexible Data Representation

  • One-Hot Encoding: Converts discrete categories into vectors for classification tasks
  • Embedding: Converts text into dense vectors that capture semantic information
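
A minimal sketch of both representations (the class count, vocabulary size, and embedding dimension below are arbitrary illustrative values):

import torch.nn.functional as F

labels = torch.tensor([0, 2, 1])            # three samples, three classes
one_hot = F.one_hot(labels, num_classes=3)  # shape (3, 3), a single 1 per row

embedding = torch.nn.Embedding(num_embeddings=10, embedding_dim=4)  # vocabulary of 10 tokens
token_ids = torch.tensor([1, 5, 7])
dense_vectors = embedding(token_ids)        # shape (3, 4), learned dense vectors

print(one_hot)
print(dense_vectors.shape)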

Creating Tensors

# Random initialization strategy
torch.manual_seed(42)  # Fix random seed

# Random initialization with different distributions
uniform_tensor = torch.rand(3, 3)   # Uniform distribution
normal_tensor = torch.randn(3, 3)   # Normal distribution

# Initialization for deep learning
# Kaiming initialization
weight = torch.nn.init.kaiming_uniform_(torch.empty(3, 3))

print(f"uniform_tensor: {uniform_tensor}")
print(f"normal_tensor: {normal_tensor}")
print(f"weight: {weight}")
uniform_tensor: tensor([[0.8823, 0.9150, 0.3829],
        [0.9593, 0.3904, 0.6009],
        [0.2566, 0.7936, 0.9408]])
normal_tensor: tensor([[ 1.5231,  0.6647, -1.0324],
        [-0.2770, -0.1671, -0.1079],
        [-1.4285, -0.2810,  0.7489]])
weight: tensor([[-1.3968,  1.2772, -1.2013],
        [ 1.0918,  0.2354, -0.4592],
        [ 0.8739,  0.2204,  1.1426]])

Why Initialization Strategies Matter

  • Uniform distribution: Suitable for simple random initialization
  • Normal distribution: Mimics natural data distribution
  • Specialized initialization methods: Address gradient vanishing/exploding issues in deep networks
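
As a rough illustration of the last point (a sketch only; the layer size is arbitrary), specialized initializers scale the weights by the layer's fan-in so that activations keep a stable magnitude as networks get deeper:

fan_in = 512
w_naive = torch.rand(fan_in, fan_in)  # uniform in [0, 1), no scaling
w_kaiming = torch.nn.init.kaiming_uniform_(torch.empty(fan_in, fan_in))
w_xavier = torch.nn.init.xavier_uniform_(torch.empty(fan_in, fan_in))

x = torch.randn(fan_in)
print(f"naive output std:   {(w_naive @ x).std().item():.2f}")    # grows with layer width
print(f"kaiming output std: {(w_kaiming @ x).std().item():.2f}")  # stays close to a constant scale
print(f"xavier output std:  {(w_xavier @ x).std().item():.2f}")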

Indexing and Slicing

Why Do We Need Complex Indexing?

# Complex data selection
data = torch.randn(3, 3)

# Flexible selection in high-dimensional data
selected_data = data[torch.randperm(data.size(0))[:1]]  # Randomly pick 1 sample

# Conditional selection
mask = data > 0
positive_data = data.masked_select(mask)

print(f"data: {data}")
print(f"selected data shape: {selected_data}")
print(f"positive data shape: {positive_data}")
data: tensor([[-0.5881,  1.7358,  0.6639],
        [ 0.6067,  0.9153, -2.4359],
        [ 1.4119, -0.4828, -2.3674]])
selected data: tensor([[ 0.6067,  0.9153, -2.4359]])
positive data: tensor([1.7358, 0.6639, 0.6067, 0.9153, 1.4119])

Advantages of Indexing

  • Precise operations on high-dimensional data
  • Complex conditional selections
  • High memory efficiency by avoiding unnecessary data copies
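
The memory point can be checked directly (a small sketch): basic slicing returns a view that shares storage with the original tensor, while masked_select has to materialize a new tensor:

data = torch.randn(3, 3)

row_view = data[0]  # basic slicing: a view, no copy
print(row_view.data_ptr() == data.data_ptr())  # True -- same underlying storage

selected = data.masked_select(data > 0)        # conditional selection: a new tensor
print(selected.data_ptr() == data.data_ptr())  # False -- values were copied out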

Dimensional Transformations: Decoupling Computational Complexity

Why Are Dimensional Transformations Needed?

# Typical dimension manipulation
batch_data = torch.randn(32, 3, 224, 224)  # Image batch

# Flatten before fully-connected layers
flatten_data = batch_data.view(32, -1)

# Permuting dimensions in CNNs
transposed_data = batch_data.permute(0, 2, 3, 1)

print(f"batch_data shape: {batch_data.shape}")
print(f"flatten_data shape: {flatten_data.shape}")
print(f"transposed_data shape: {transposed_data.shape}")
batch_data shape: torch.Size([32, 3, 224, 224])
flatten_data shape: torch.Size([32, 150528])
transposed_data shape: torch.Size([32, 224, 224, 3])

Philosophy Behind Dimensional Transformations

  • Decoupling: Convert complex multi-dimensional data into forms easier to process
  • Flexibility: Supports diverse network architectures
  • Memory Efficiency: Avoids unnecessary copies
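
The memory-efficiency claim can be verified with a brief sketch: view and permute only reinterpret the existing storage, and a real copy happens only when contiguous() is called:

batch_data = torch.randn(32, 3, 224, 224)
flat = batch_data.view(32, -1)         # reinterprets the same storage
perm = batch_data.permute(0, 2, 3, 1)  # also a view, just with different strides

print(flat.data_ptr() == batch_data.data_ptr())  # True -- no copy
print(perm.data_ptr() == batch_data.data_ptr())  # True -- no copy
print(perm.is_contiguous())                      # False -- strides are out of order
print(perm.contiguous().data_ptr() == batch_data.data_ptr())  # False -- copy made here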

Broadcasting: Intelligent Tensor Operations

# Practical use of broadcasting
batch_size, channels, height, width = 4, 32, 14, 14
feature_maps = torch.randn(batch_size, channels, height, width)
bias = torch.randn(channels)  # One bias per channel

# Automatic broadcasting
output = feature_maps + bias.view(1, channels, 1, 1)

print(f"feature_maps shape: {feature_maps.shape}")
print(f"bias shape: {bias.shape}")
print(f"output shape: {output.shape}")
feature_maps shape: torch.Size([4, 32, 14, 14])
bias shape: torch.Size([32])
output shape: torch.Size([4, 32, 14, 14])

Principles Behind Broadcasting

  • Memory Efficiency: Avoids explicit data replication
  • Code Simplicity: Reduces tedious dimension-matching code
  • Computation Optimization: Enables efficient parallel computation under the hood
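
The memory claim is easy to see with expand, which creates the kind of zero-stride view that broadcasting relies on, whereas an explicit repeat materializes every value (a short sketch):

channels = 32
feature_maps = torch.randn(4, channels, 14, 14)
bias = torch.randn(channels)

expanded = bias.view(1, channels, 1, 1).expand(4, channels, 14, 14)
print(expanded.stride())  # (0, 1, 0, 0) -- stride 0 along broadcast dims, nothing copied

repeated = bias.view(1, channels, 1, 1).repeat(4, 1, 14, 14)
print(repeated.shape, repeated.is_contiguous())  # torch.Size([4, 32, 14, 14]) True -- fully materialized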

Conclusion

A Tensor is not merely a data structure—it’s the bridge connecting mathematics, algorithms, and computation in deep learning. Every aspect of its design reflects thoughtful consideration of computational efficiency, flexibility, and developer experience.