So you’ve got some data and you want to get it into PyTorch for training a model or doing some deep learning magic. Great! PyTorch tensors are basically the building blocks of everything you’ll do, so let’s walk through the most common ways to get your data in there. If you’re brand new to PyTorch, you might want to start with my Getting Started with PyTorch guide first.
Starting Simple: From Python Lists and Arrays
The easiest way is probably starting with data you already have in Python:
import torch
import numpy as np
# From a Python list
my_list = [1, 2, 3, 4, 5]
tensor_from_list = torch.tensor(my_list)
print(tensor_from_list) # tensor([1, 2, 3, 4, 5])
# From a NumPy array
my_array = np.array([[1, 2], [3, 4]])
tensor_from_numpy = torch.from_numpy(my_array)
print(tensor_from_numpy)
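Quick note: torch.tensor() always copies your data, while torch.from_numpy() shares memory with the original NumPy array, so modifying the array in place also changes the tensor (and vice versa). torch.from_numpy() also keeps the array’s dtype, so NumPy’s default float64 arrays become float64 tensors. Most of the time you won’t notice the difference, but it’s good to know.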
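Here’s a quick sketch of what that memory sharing looks like in practice. The array values are made up for illustration; the point is that only the from_numpy() tensor sees the in-place change.

```python
import numpy as np
import torch

arr = np.array([1.0, 2.0, 3.0])  # NumPy defaults to float64

copied = torch.tensor(arr)       # independent copy of the data
shared = torch.from_numpy(arr)   # shares memory with arr

arr[0] = 100.0                   # modify the NumPy array in place

print(copied[0].item())  # 1.0: the copy is unaffected
print(shared[0].item())  # 100.0: the shared tensor sees the change
print(shared.dtype)      # torch.float64, inherited from the array
```

If you want to avoid surprises, either use torch.tensor() or call .float() on the result, which gives you a separate float32 tensor.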
Working with Different Data Types
PyTorch infers a data type from your values (Python ints become torch.int64 and Python floats become torch.float32), but sometimes you need to be explicit:
# Specifying the data type
float_tensor = torch.tensor([1, 2, 3], dtype=torch.float32)
int_tensor = torch.tensor([1.5, 2.7, 3.9], dtype=torch.int64)  # Decimals are truncated: tensor([1, 2, 3])
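To see what PyTorch picks when you don’t pass dtype=, you can just check the .dtype attribute. This is a small sketch using throwaway values:

```python
import torch

# Without dtype=, PyTorch infers one from the Python values
ints = torch.tensor([1, 2, 3])
floats = torch.tensor([1.0, 2.0, 3.0])
bools = torch.tensor([True, False])
print(ints.dtype)    # torch.int64
print(floats.dtype)  # torch.float32 (the default float dtype)
print(bools.dtype)   # torch.bool

# Casting floats to an integer dtype drops the fractional part
truncated = torch.tensor([1.5, 2.7, 3.9], dtype=torch.int64)
print(truncated)     # tensor([1, 2, 3])
```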
Loading Data from Files
For real projects, you’ll probably load data from files. Here’s how to handle common formats:
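# Load from CSV using pandas
import pandas as pd
df = pd.read_csv('your_data.csv')
# Only works if every column is numeric; string columns produce an object array that torch.tensor() rejects
tensor_from_df = torch.tensor(df.to_numpy())
# Or if you want specific columns (float32 is what most models expect)
features = torch.tensor(df[['feature1', 'feature2', 'feature3']].to_numpy(), dtype=torch.float32)
labels = torch.tensor(df['target'].to_numpy())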
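Since you might not have a CSV handy, here’s a self-contained version that reads a tiny made-up CSV from a string with io.StringIO. The column names are the same placeholders as above, and the assumption that target holds integer class labels is just for illustration.

```python
import io

import pandas as pd
import torch

# Hypothetical CSV contents standing in for 'your_data.csv'
csv_text = """feature1,feature2,feature3,target
0.5,1.2,3.0,0
1.5,0.3,2.2,1
2.5,0.8,1.1,0
"""
df = pd.read_csv(io.StringIO(csv_text))

# pandas reads decimals as float64; most models expect float32
features = torch.tensor(
    df[['feature1', 'feature2', 'feature3']].to_numpy(), dtype=torch.float32
)
# Integer class indices as int64, which is what nn.CrossEntropyLoss expects
labels = torch.tensor(df['target'].to_numpy(), dtype=torch.int64)

print(features.shape, features.dtype)  # torch.Size([3, 3]) torch.float32
print(labels.shape, labels.dtype)      # torch.Size([3]) torch.int64
```

For a regression target you’d use dtype=torch.float32 for the labels instead.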
Loading Images
For images, you’ll probably want to use something like PIL or OpenCV first:
from PIL import Image
import torchvision.transforms as transforms
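image = Image.open('photo.jpg').convert('RGB')  # Force 3 channels (drops alpha, expands grayscale)
# Convert to a float32 tensor of shape (C, H, W); 0-255 pixel values are scaled to [0, 1]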
transform = transforms.ToTensor()
image_tensor = transform(image)
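If you’re curious what ToTensor() is doing for a regular 8-bit image, here’s a rough sketch of the idea using a fake NumPy "image" (not torchvision’s actual implementation). Images loaded with PIL or OpenCV come in height × width × channels order, while PyTorch models expect channels first. One OpenCV gotcha: cv2.imread() returns channels in BGR order, so convert to RGB before doing this if your model was trained on RGB images.

```python
import numpy as np
import torch

# A fake 2x4 RGB image in the H x W x C uint8 layout that PIL/OpenCV arrays use
hwc = np.zeros((2, 4, 3), dtype=np.uint8)
hwc[..., 0] = 255  # make every pixel pure red

# Roughly what ToTensor() does for uint8 images:
# move channels first (C x H x W) and scale 0..255 to 0.0..1.0
chw = torch.from_numpy(hwc).permute(2, 0, 1).float() / 255.0

print(chw.shape)  # torch.Size([3, 2, 4])
```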
Creating Tensors from Scratch
Sometimes you just need to make tensors with specific properties:
# All zeros
zeros = torch.zeros(3, 4) # 3 rows, 4 columns
# All ones
ones = torch.ones(2, 3)
# Random values
random_tensor = torch.rand(5, 5)  # Uniform values in [0, 1)
random_normal = torch.randn(3, 3)  # Standard normal distribution (mean 0, std 1)
# Same shape, dtype, and device as another tensor
same_shape = torch.zeros_like(random_tensor)
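A couple of related tricks that come up a lot: seeding the random number generator so "random" tensors are reproducible between runs, and building ranges or constant-filled tensors. A quick sketch:

```python
import torch

torch.manual_seed(0)          # fix the random number generator
a = torch.rand(2, 2)
torch.manual_seed(0)          # reset to the same seed...
b = torch.rand(2, 2)          # ...and get the same "random" values
print(torch.equal(a, b))      # True

evens = torch.arange(0, 10, 2)        # tensor([0, 2, 4, 6, 8])
sevens = torch.full((2, 3), 7.0)      # 2x3 tensor filled with 7.0
steps = torch.linspace(0.0, 1.0, 5)   # 5 evenly spaced points from 0 to 1
```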
GPU and Device Management
If you have a GPU, you’ll want to move your tensors there for faster computation:
# Move to GPU if available
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
tensor = torch.tensor([1, 2, 3]).to(device)
# Or create directly on GPU
gpu_tensor = torch.tensor([1, 2, 3], device=device)
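One thing that trips people up: calling .to() on a tensor doesn’t move it in place; it returns a tensor on the target device, so you have to reassign the result. (nn.Module.to() is different and does modify the model in place.) Tensors that interact also need to be on the same device. Here’s a small sketch that runs on either CPU or GPU:

```python
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

x = torch.ones(3)
x.to(device)      # does not move x in place; the returned tensor is discarded here
x = x.to(device)  # reassign so x actually refers to the tensor on `device`

y = torch.zeros(3, device=device)
z = x + y         # works because x and y live on the same device
print(z.device)
```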
Working with Gradients
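If you’re training a model, PyTorch can track the operations on a tensor and compute gradients for you. Only floating-point (or complex) tensors can require gradients: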
# Enable gradient tracking
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
# Do some operations
y = x * 2
z = y.mean()
# Compute gradients
z.backward()
print(x.grad)  # tensor([0.6667, 0.6667, 0.6667]), since z = mean(2x) gives dz/dx_i = 2/3
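A few gradient behaviors worth knowing before you write a training loop: gradients accumulate across backward() calls until you reset them, torch.no_grad() turns tracking off for inference, and a tensor that requires gradients has to be detached before you can convert it to NumPy. A minimal sketch reusing the x from above:

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# Gradients accumulate: two backward passes add up in x.grad
for _ in range(2):
    z = (x * 2).mean()   # build a fresh graph each time
    z.backward()
accumulated = x.grad.clone()
print(accumulated)       # 2/3 + 2/3 = 4/3 per element

x.grad = None            # reset before the next step (optimizer.zero_grad() handles this in a training loop)

# Skip gradient tracking for inference or evaluation
with torch.no_grad():
    preds = x * 2
print(preds.requires_grad)  # False

# Tensors that require grad must be detached before converting to NumPy
as_numpy = x.detach().numpy()
```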
Common Data Type and Shape Issues
You’ll run into these eventually, so here’s how to fix them:
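# Convert data types
float_tensor = int_tensor.float()  # float32
int_tensor = float_tensor.long()  # int64 (truncates any decimals)
# Reshape tensors
tensor = torch.arange(6)  # tensor([0, 1, 2, 3, 4, 5])
reshaped = tensor.view(2, 3)  # Total element count must match (2 * 3 = 6)
flattened = reshaped.flatten()  # Make it 1D again
# Handle non-contiguous tensors
# Operations like .t() and .permute() return non-contiguous views, and .view() can fail on those
some_tensor = reshaped.t()
contiguous_tensor = some_tensor.contiguous()  # Copies into contiguous memory so .view() works again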
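Here’s what the contiguity problem looks like when you actually hit it. Transposing a tensor just swaps its strides without moving any data, so .view() can’t flatten it; either call .contiguous() first or use .reshape(), which copies only when it has to:

```python
import torch

matrix = torch.arange(6).view(2, 3)  # contiguous 2x3
transposed = matrix.t()              # 3x2 view with swapped strides
print(transposed.is_contiguous())    # False

view_failed = False
try:
    transposed.view(6)               # view() needs memory laid out compatibly
except RuntimeError:
    view_failed = True
print(view_failed)                   # True

flat_a = transposed.contiguous().view(6)  # copy first, then view
flat_b = transposed.reshape(6)            # reshape() handles the copy for you
print(flat_a.tolist())                    # [0, 3, 1, 4, 2, 5]
print(torch.equal(flat_a, flat_b))        # True
```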
Real-World Example
Here’s how you might load and preprocess some tabular data for a neural network:
import numpy as np
import pandas as pd
import torch
from sklearn.preprocessing import StandardScaler
# Load your data
df = pd.read_csv('house_prices.csv')
# Separate features and target
features = df.drop('price', axis=1).select_dtypes(include=[np.number])
target = df['price']
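# Standardize features to zero mean and unit variance
# (in a real project, fit the scaler on your training split only, then reuse it for validation and test data)
scaler = StandardScaler()
features_scaled = scaler.fit_transform(features)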
# Convert to tensors
X = torch.tensor(features_scaled, dtype=torch.float32)
y = torch.tensor(target.to_numpy(), dtype=torch.float32).unsqueeze(1)  # Shape (N, 1) to match a single-output model
print(f"Features shape: {X.shape}")
print(f"Target shape: {y.shape}")
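Once you have X and y, the usual next step is wrapping them in a TensorDataset and a DataLoader from torch.utils.data so you can iterate over shuffled mini-batches during training. This sketch uses random stand-in tensors (hypothetical, not the house price data) so it runs on its own:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-ins for the X and y built above: 10 samples, 3 features
X = torch.randn(10, 3)
y = torch.randn(10, 1)

# TensorDataset pairs the i-th row of X with the i-th row of y
dataset = TensorDataset(X, y)
# DataLoader yields shuffled mini-batches of up to 4 samples
loader = DataLoader(dataset, batch_size=4, shuffle=True)

for batch_X, batch_y in loader:
    print(batch_X.shape, batch_y.shape)
```

With 10 samples and batch_size=4, you get batches of 4, 4, and 2; pass drop_last=True if you’d rather skip the smaller final batch.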
Wrapping Up
Getting data into PyTorch tensors is usually pretty straightforward once you know the basic patterns. The key is understanding your data format and choosing the right conversion method. Whether you’re starting with lists, NumPy arrays, pandas DataFrames, or files, there’s usually a clean way to get everything into tensor form.
The most important thing is making sure your data types, shapes, and device placement are all correct before you start training: float32 inputs for most models, shapes that match what your model expects, and every tensor on the same device. PyTorch will usually give you helpful error messages if something’s wrong, so don’t worry too much about getting everything perfect on the first try.
