Hello PyTorch

Published on Mar 9, 2025 in AI Fundamentals, PyTorch

In this short article, I’ll show you how to use PyTorch to create a simple neural network. If you haven’t already read them, I advise you to start with my two previous articles on neural networks, part 1 and part 2. Here, we’ll simply use PyTorch to implement the network from the second article.

Here’s the code:

python
import torch
import torch.nn as nn
import torch.optim as optim

class SimpleMLP(nn.Module):
    def __init__(self):
        super(SimpleMLP, self).__init__()
        # Define the layers
        self.hidden_layer = nn.Linear(2, 2)  # Hidden layer with 2 inputs and 2 neurons
        self.output_layer = nn.Linear(2, 1)  # Output layer with 2 inputs and 1 neuron
        self.sigmoid = nn.Sigmoid()          # Sigmoid activation function

    def forward(self, x):
        # Forward pass through the network
        hidden_output = self.sigmoid(self.hidden_layer(x))
        final_output = self.sigmoid(self.output_layer(hidden_output))
        return final_output

input_vectors = torch.tensor([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]], dtype=torch.float32)
output_vectors = torch.tensor([[0.0], [1.0], [1.0], [0.0]], dtype=torch.float32)
network = SimpleMLP()

# Define the loss function and optimizer
criterion = nn.MSELoss()  # Mean Squared Error Loss
optimizer = optim.SGD(network.parameters(), lr=0.1)  # Stochastic Gradient Descent

# Train the network
max_epochs = 1000000
for epoch in range(max_epochs):
    optimizer.zero_grad()  # Reset all gradients to zero
    outputs = network(input_vectors)  # Forward pass
    loss = criterion(outputs, output_vectors)  # Calculate the loss
    loss.backward()  # Backward pass
    optimizer.step()  # Update the weights
    if (epoch + 1) % 1000 == 0:
        print(f"Epoch {epoch + 1}: Loss = {loss.item()}")
    if loss.item() < 0.0001:
        break

# Test the network
with torch.no_grad():  # Disable gradient tracking (no need to calculate gradients for inference)
    for inputs in input_vectors:
        output = network(inputs)
        print(f"Input: {inputs.numpy()}, Output: {round(output.item())}")

If you have understood the previous articles correctly, there is not much to explain.

When you create a neural network in PyTorch, your class has to inherit from nn.Module.

Then we define our network layer by layer rather than neuron by neuron.

nn.Linear applies a linear transformation to the input vector. It is the simplest kind of layer.
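To make that concrete, here is a minimal sketch (the layer sizes are arbitrary, chosen only for illustration):

python
import torch
import torch.nn as nn

layer = nn.Linear(2, 3)          # a linear layer with 2 inputs and 3 outputs
x = torch.tensor([[1.0, 2.0]])   # a batch containing one 2-dimensional input

# nn.Linear computes y = x @ W.T + b, where W and b are learnable parameters
manual = x @ layer.weight.T + layer.bias
print(torch.allclose(layer(x), manual))  # True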

There are also convolution and pooling layers to build CNNs, recurrent layers to build RNNs, normalisation layers, dropout layers that help avoid overfitting during training (when the network memorises the data instead of learning the pattern), and even ready-to-use transformer layers…
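Just to show that they are all created the same way, here are a few of them (the sizes and rates are arbitrary example values):

python
import torch.nn as nn

conv = nn.Conv2d(3, 16, kernel_size=3)        # convolution layer (e.g. for 3-channel images)
pool = nn.MaxPool2d(kernel_size=2)            # pooling layer that halves the spatial size
rnn = nn.LSTM(input_size=10, hidden_size=20)  # recurrent layer
norm = nn.BatchNorm1d(num_features=20)        # normalisation layer
drop = nn.Dropout(p=0.5)                      # dropout: randomly zeroes 50% of the values during training
attn = nn.TransformerEncoderLayer(d_model=32, nhead=4)  # ready-to-use transformer layer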

The forward method is used to specify how to calculate the output of our network based on the input.
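Note that you never call forward yourself: calling the module, as in the training loop above, runs forward under the hood.

python
x = torch.tensor([[0.0, 1.0]])
y = network(x)   # calling the module invokes forward() (plus PyTorch's internal hooks)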

The training is quite simple, since everything is already implemented. Of course, it can be more complex: you can decide to freeze the weights of certain layers or add dropout during training. But generally speaking, the magic of OOP is put to good use in PyTorch, and training a complex network is as simple as training our “Hello World” network. During training, PyTorch automatically and recursively updates all the submodules of a module.
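For example, freezing the weights of a layer is just a matter of disabling its gradients, and dropout is switched on and off with the train()/eval() modes. A minimal sketch, reusing the SimpleMLP network defined above:

python
network = SimpleMLP()

# Freeze the hidden layer: the optimizer will no longer update its weights
for param in network.hidden_layer.parameters():
    param.requires_grad = False

# Only give the remaining trainable parameters to the optimizer
optimizer = optim.SGD((p for p in network.parameters() if p.requires_grad), lr=0.1)

# Dropout (if the network had any) is active in training mode and disabled in evaluation mode
network.train()   # training mode
network.eval()    # evaluation/inference mode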

In our example, using PyTorch has no purpose other than learning how to use PyTorch. In fact, this code is much slower than the from-scratch code of the previous article. The real benefit of PyTorch is that it can use a GPU to speed up tensor calculations. Tensors are a generalisation of vectors and matrices: vectors are tensors of order 1, matrices are tensors of order 2, and so on. But for a network with three neurons, a GPU isn’t very useful.
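For the record, moving the computation to a GPU only takes a couple of lines. A sketch, assuming a CUDA-capable GPU is available:

python
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

network = SimpleMLP().to(device)      # move the model's parameters to the GPU
inputs = input_vectors.to(device)     # move the data to the same device
targets = output_vectors.to(device)

outputs = network(inputs)             # the forward pass now runs on the GPU (if available)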

But you have to start somewhere.