So you’ve got some data and you want to get it into PyTorch for training a model or doing some deep learning magic. Great! PyTorch tensors are basically the building blocks of everything you’ll do, so let’s walk through the most common ways to get your data in there.
Starting Simple: From Python Lists and Arrays
The easiest way is probably starting with data you already have in Python:
```python
import torch
import numpy as np

# From a Python list
my_list = [1, 2, 3, 4, 5]
tensor_from_list = torch.tensor(my_list)
print(tensor_from_list)  # tensor([1, 2, 3, 4, 5])

# From a NumPy array
my_array = np.array([[1, 2], [3, 4]])
tensor_from_numpy = torch.from_numpy(my_array)
print(tensor_from_numpy)
```
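Quick note: torch.tensor() always copies your data. torch.from_numpy() instead shares memory with the original NumPy array, so an in-place change to the array also shows up in the tensor (and vice versa). Most of the time you won’t notice the difference, but it matters if you keep modifying the array after converting it.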
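To see the copy-versus-share difference concretely, here’s a minimal sketch that converts one NumPy array both ways and then edits the array in place. The array values and variable names are arbitrary. Only `torch.tensor()` and `torch.from_numpy()` come from the text above.

```python
import numpy as np
import torch

arr = np.array([1.0, 2.0, 3.0])

copied = torch.tensor(arr)      # independent copy of the data
shared = torch.from_numpy(arr)  # uses the same memory as arr

arr[0] = 100.0  # modify the NumPy array in place

print(copied)  # first element is still 1.0
print(shared)  # first element is now 100.0
```

If you started with `torch.from_numpy()` but need an independent tensor, calling `.clone()` on the result gives you a separate copy.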
Working with Different Data Types
PyTorch is pretty smart about figuring out data types, but sometimes you need to be explicit:
```python
# Specifying the data type
float_tensor = torch.tensor([1, 2, 3], dtype=torch.float32)
int_tensor = torch.tensor([1.5, 2.7, 3.9], dtype=torch.int64)  # fractional parts are truncated
```
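It also helps to know what PyTorch picks when you don’t pass `dtype`. The sketch below assumes the default float type hasn’t been changed with `torch.set_default_dtype()`. The values are arbitrary.

```python
import numpy as np
import torch

from_ints = torch.tensor([1, 2, 3])          # Python ints -> torch.int64
from_floats = torch.tensor([1.0, 2.0, 3.0])  # Python floats -> torch.float32 (the default float dtype)
from_np = torch.from_numpy(np.array([1.0, 2.0, 3.0]))  # NumPy's float64 is kept -> torch.float64

# Most layers expect float32, so convert float64 data explicitly
as_float32 = from_np.float()  # same as from_np.to(torch.float32)

print(from_ints.dtype, from_floats.dtype, from_np.dtype, as_float32.dtype)
# torch.int64 torch.float32 torch.float64 torch.float32
```

The NumPy case is the one that tends to bite. `np.array` of floats defaults to float64, and an `nn.Linear` layer’s weights are float32 by default, so passing it float64 input raises a dtype mismatch error.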
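Loading from a CSV file usually goes through pandas (of course):

```python
import pandas as pd

# Load from CSV using pandas
df = pd.read_csv('your_data.csv')
tensor_from_df = torch.tensor(df.values)  # only works if every column is numeric

# Or if you want specific columns
features = torch.tensor(df[['feature1', 'feature2', 'feature3']].values)
labels = torch.tensor(df['target'].values)
```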
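Real CSVs often mix numeric and text columns, and `torch.tensor(df.values)` fails when any column holds strings. Here’s a self-contained sketch that uses `io.StringIO` in place of a file on disk. The column names (`feature1`, `feature2`, `city`, `target`) and values are made up for illustration.

```python
import io

import pandas as pd
import torch

# A tiny in-memory CSV standing in for a real file (column names are made up)
csv_text = """feature1,feature2,city,target
1.0,2.0,Paris,0
3.0,4.0,Oslo,1
5.0,6.0,Rome,0
"""
df = pd.read_csv(io.StringIO(csv_text))

# Keep only numeric feature columns; the text column 'city' can't go into a tensor directly
feature_cols = df.drop(columns='target').select_dtypes(include='number')

features = torch.tensor(feature_cols.to_numpy(), dtype=torch.float32)
labels = torch.tensor(df['target'].to_numpy(), dtype=torch.int64)

print(features.shape, labels.shape)  # torch.Size([3, 2]) torch.Size([3])
```

The label dtype depends on the task. Class indices for `nn.CrossEntropyLoss` should be `torch.int64`, while regression targets are usually `torch.float32`.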
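For images, you’ll probably want to load the file with something like PIL or OpenCV first:

```python
from PIL import Image
import torchvision.transforms as transforms

# Load the image (the path is a placeholder for your own file)
image = Image.open('your_image.jpg').convert('RGB')

# Convert to a tensor: shape (C, H, W), pixel values scaled from 0-255 to [0, 1]
transform = transforms.ToTensor()
image_tensor = transform(image)
```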
Creating Tensors from Scratch
Sometimes you just need to make tensors with specific properties:
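For tensors built from scratch, the usual starting points are PyTorch’s factory functions. Here’s a quick tour, with arbitrary shapes and fill values:

```python
import torch

torch.manual_seed(0)  # makes the random tensors below reproducible

zeros = torch.zeros(2, 3)              # 2x3 tensor filled with 0.0
ones = torch.ones(2, 3)                # 2x3 tensor filled with 1.0
filled = torch.full((2, 3), 7.0)       # 2x3 tensor filled with 7.0
steps = torch.arange(0, 10, 2)         # tensor([0, 2, 4, 6, 8])
uniform = torch.rand(2, 3)             # uniform random values in [0, 1)
normal = torch.randn(2, 3)             # standard normal random values
like_zeros = torch.zeros_like(normal)  # zeros with the same shape, dtype and device as `normal`
```

The `*_like` variants are handy when a new tensor should match an existing one’s shape, dtype and device without spelling them out.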
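Device placement: move tensors to the GPU if one is available:

```python
# Pick the GPU if available, otherwise fall back to the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
tensor = torch.tensor([1, 2, 3]).to(device)

# Or create the tensor directly on the chosen device
gpu_tensor = torch.tensor([1, 2, 3], device=device)
```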
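One detail is worth a sketch: calling `.to()` on a tensor gives you back a result rather than moving the original, so you need to reassign it. This differs from `nn.Module.to()`, which moves a model’s parameters in place. The example runs on either CPU or GPU, and the tensor values are arbitrary.

```python
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

t = torch.tensor([1.0, 2.0, 3.0])
t.to(device)      # result is discarded: t itself stays where it was
t = t.to(device)  # reassign to actually use the moved tensor

weights = torch.ones(3, device=device)
result = t * weights  # fine: both operands live on the same device

print(t.device, result.device)
```

Tensors used in the same operation generally need to be on the same device. Mixing a CPU tensor and a CUDA tensor typically raises a `RuntimeError`.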
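Gradients: if you’re training, the parameters of an nn.Module already track gradients for you. For a tensor you create yourself, pass requires_grad=True so autograd records the operations performed on it.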
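Here’s a minimal autograd sketch. It assumes a toy setup where `w` plays the role of a trainable weight and `x` is plain input data; both names are just for illustration.

```python
import torch

# requires_grad=True tells autograd to record operations on this tensor
w = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
x = torch.tensor([4.0, 5.0, 6.0])  # plain input data; no gradient needed

loss = (w * x).sum()  # a toy scalar "loss"
loss.backward()       # computes d(loss)/dw and stores it in w.grad

print(w.grad)  # tensor([4., 5., 6.]), since d(sum(w * x))/dw = x
```

Only floating-point (and complex) tensors can require gradients, so `torch.tensor([1, 2, 3], requires_grad=True)` raises an error.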
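Memory layout: most of the time PyTorch handles this for you. However, operations like transpose() and permute() return non-contiguous views, and some operations (view(), for example) need contiguous memory. When that happens, make a contiguous copy:

```python
contiguous_tensor = some_tensor.contiguous()  # some_tensor: your non-contiguous tensor
```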
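Here’s a small sketch of when contiguity actually matters. It uses a transpose to create a non-contiguous view, and the values are arbitrary.

```python
import torch

base = torch.arange(6).reshape(2, 3)  # tensor([[0, 1, 2], [3, 4, 5]])
transposed = base.t()                 # a 3x2 view sharing base's memory

print(base.is_contiguous())        # True
print(transposed.is_contiguous())  # False

# transposed.view(6) would raise a RuntimeError here; copy to contiguous memory first
flat = transposed.contiguous().view(6)
print(flat)  # tensor([0, 3, 1, 4, 2, 5])
```

`reshape()` is the more forgiving alternative. It returns a view when it can and copies when it must, so `transposed.reshape(6)` works without the explicit `.contiguous()` call.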
Real-World Example
Here’s how you might load and preprocess some tabular data for a neural network:
```python
import numpy as np
import pandas as pd
import torch
from sklearn.preprocessing import StandardScaler

# Load your data
df = pd.read_csv('house_prices.csv')

# Separate features and target (keeping only numeric feature columns)
features = df.drop('price', axis=1).select_dtypes(include=[np.number])
target = df['price']

# Normalize features to zero mean and unit variance
scaler = StandardScaler()
features_scaled = scaler.fit_transform(features)

# Convert to tensors (float32 is what most PyTorch layers expect)
X = torch.tensor(features_scaled, dtype=torch.float32)
y = torch.tensor(target.values, dtype=torch.float32)  # shape (N,); use y.unsqueeze(1) if your model outputs (N, 1)

print(f"Features shape: {X.shape}")
print(f"Target shape: {y.shape}")
```
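A common next step for training is to batch `X` and `y` with `TensorDataset` and `DataLoader` from `torch.utils.data`. This sketch uses random stand-in tensors instead of the house-price data. The 100 rows, 5 features and batch size of 32 are arbitrary choices.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-ins for the X and y tensors built above (made-up sizes)
X = torch.randn(100, 5)
y = torch.randn(100)

dataset = TensorDataset(X, y)  # pairs row i of X with element i of y
loader = DataLoader(dataset, batch_size=32, shuffle=True)

for batch_X, batch_y in loader:
    print(batch_X.shape, batch_y.shape)
    # a training step would go here
```

With 100 rows and `batch_size=32`, you get three full batches and a final batch of 4. Pass `drop_last=True` if you’d rather skip the partial one.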
Wrapping Up
Getting data into PyTorch tensors is usually pretty straightforward once you know the basic patterns. The key is understanding your data format and choosing the right conversion method. Whether you’re starting with lists, NumPy arrays, pandas DataFrames, or files, there’s usually a clean way to get everything into tensor form.
The most important thing is making sure your data types, shapes, and device placement are all correct before you start training. PyTorch will usually give you helpful error messages if something’s wrong, so don’t worry too much about getting everything perfect on the first try.
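You can also make those checks explicit, so problems surface up front instead of as an error mid-training. `check_tensors` below is a hypothetical helper, not part of PyTorch. It assumes float32 features with one target per row, and the random tensors exist only to exercise it.

```python
import torch


def check_tensors(X: torch.Tensor, y: torch.Tensor, device: torch.device) -> None:
    """Hypothetical helper: fail fast on common dtype, shape and device mistakes."""
    if X.dtype != torch.float32:
        raise TypeError(f"expected float32 features, got {X.dtype}")
    if X.shape[0] != y.shape[0]:
        raise ValueError(f"row mismatch: {X.shape[0]} feature rows vs {y.shape[0]} targets")
    if X.device.type != device.type or y.device.type != device.type:
        raise ValueError(f"expected tensors on {device}, got {X.device} and {y.device}")


device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
X = torch.randn(8, 3, device=device)
y = torch.randn(8, device=device)
check_tensors(X, y, device)  # passes silently
```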