PyTorch


Hello there! If you have just arrived at this blog and would like to learn something about PyTorch, I believe you can find something here to start playing around with. In this post we will see how to write a simple code to solve a simple regression problem: predicting a quantity $y$, given values of its predictor (the input) $x$, for a noisy linear relation $y = a x + b$.

First, a note: you can follow this post on Google Colab: Open this notebook in Colab. If you prefer to download it as a Jupyter Notebook and run it on your own machine, please go to the repo: pytorch_and_GNNs.

The first thing to learn is that code runs faster on GPUs. That is why we define the processing unit first, so we can transfer our data and our model to it:

import torch

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
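As a minimal sanity check (with an illustrative tensor), anything created on the CPU can be moved to the chosen device with .to(device):

t = torch.ones(3)  #A tensor created on the CPU
t = t.to(device)   #Moved to the GPU, if one is available
print(t.device)    #cuda:0 or cpu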

After defining the processing unit, it is time to create the data set:

import numpy as np
import pandas as pd

#Create Dataset 
def generate_data(a, b, num_examples):  
    '''
    Generate y = a x + b + noise
    '''

    x = np.random.uniform(0, 1, num_examples)
    y = a*x + b
    y += np.random.normal(0, 0.05, num_examples)

    df = pd.DataFrame(data = np.array([x, y]).T, columns = ['x', 'y'])
    
    return df
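For example, with illustrative values $a = 2$, $b = 1$ and $1000$ examples (these numbers are arbitrary choices, not fixed by the post):

df = generate_data(a = 2.0, b = 1.0, num_examples = 1000)
print(df.head())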

Then, the generated data will look like this:

Figure 1: Generated data set.

Then, in this tutorial we are going to learn the mapping $x \Rightarrow y$ using a simple neural network in PyTorch.

Transforming the dataset

First, we need to split the data set into three fractions:

  • Train
  • Validation
  • Test

Second, we need to transform the data set using $x \Rightarrow \frac{(x - x_{mean})}{x_{std}}$ and $y \Rightarrow \frac{(y - y_{min})}{(y_{max} - y_{min})}$. This is done to constrain the range of all the variables ($x$, $y$) to small and similar values, which speeds up the convergence of the training. Note that the statistics are computed on the training set only, to avoid leaking information from the validation and test sets.

Third, we need to convert the data into something PyTorch can understand. We convert the data into tensors, then wrap them in a TensorDataset and, finally, a DataLoader. This last step splits the sets one more time into batches (the number of samples to work through before updating the internal model parameters).

from torch.utils.data import TensorDataset, DataLoader

def transform_dataset(df, frac_train, frac_val, batch_size):
    '''
    frac_train: fraction of the dataset to use in the training
    frac_val:   fraction of the dataset to use in the validation
    batch_size: number of samples to work through before updating the internal model parameters
    '''

    #Randomizing the dataset
    df = df.sample(frac = 1)

    #Getting the training, validation, and testing sets
    x_train = np.array( [ df['x'].iloc[:int(df.shape[0]*frac_train)] ] ).T
    y_train = np.array( [ df['y'].iloc[:int(df.shape[0]*frac_train)] ] ).T

    x_val = np.array( [ df['x'].iloc[int(df.shape[0]*frac_train):int(df.shape[0]*(frac_train + frac_val))] ] ).T
    y_val = np.array( [ df['y'].iloc[int(df.shape[0]*frac_train):int(df.shape[0]*(frac_train + frac_val))] ] ).T

    x_test = np.array( [ df['x'].iloc[int(df.shape[0]*(frac_train + frac_val)):] ] ).T
    y_test = np.array( [ df['y'].iloc[int(df.shape[0]*(frac_train + frac_val)):] ] ).T

    #Getting the statistics from the training set only
    mean = np.mean(x_train, axis = 0)
    std = np.std(x_train, axis = 0)
    min = np.min(y_train, axis = 0)
    max = np.max(y_train, axis = 0)

    #Transforming the sets
    x_train = (x_train - mean)/std
    x_val = (x_val - mean)/std
    x_test = (x_test - mean)/std

    y_train = (y_train - min)/(max - min)
    y_val = (y_val - min)/(max - min)
    y_test = (y_test - min)/(max - min)

    #Converting to torch reshaping
    x_train = torch.from_numpy(x_train.reshape(-1, 1)).float()
    y_train = torch.from_numpy(y_train.reshape(-1, 1)).float()

    x_val = torch.from_numpy(x_val.reshape(-1, 1)).float()
    y_val = torch.from_numpy(y_val.reshape(-1, 1)).float()

    x_test = torch.from_numpy(x_test.reshape(-1, 1)).float()
    y_test = torch.from_numpy(y_test.reshape(-1, 1)).float()

    #Converting to tensor dataset
    dataset_train = TensorDataset(x_train, y_train)
    dataset_val = TensorDataset(x_val, y_val)
    dataset_test = TensorDataset(x_test, y_test)

    #Organizing in batches
    train_loader = DataLoader(dataset = dataset_train, batch_size = batch_size)
    val_loader = DataLoader(dataset = dataset_val, batch_size = batch_size)
    test_loader = DataLoader(dataset = dataset_test, batch_size = batch_size)

    return train_loader, val_loader, test_loader, mean, std, max, min
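As a usage sketch, assuming illustrative fractions of $70\%$ for training and $15\%$ for validation (leaving $15\%$ for testing), and a batch size of $32$:

train_loader, val_loader, test_loader, mean, std, max, min = transform_dataset(
    df, frac_train = 0.7, frac_val = 0.15, batch_size = 32)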

Defining the Neural Network

A neural network is a collection of nodes (neurons) that are arranged in a series of layers:

Figure 2: Schematic representation of a NN. Image source: https://www.tibco.com/sites/tibco/files/media_entity/2021-05/neutral-network-diagram.svg

Each connection has a weight, $W_{\mu \nu}$, and each neuron receives a weighted sum of the activations $a_{\nu}$ of the previous layer, plus a bias $b_{\mu}$: \begin{equation} b_{\mu} + \sum_{\nu} W_{\mu \nu} a_{\nu} . \end{equation}

A numeric transformation called the activation function is then responsible for performing a non-linear transformation of the values of each layer, \begin{equation} y_{\mu} = f \left( b_{\mu} + \sum_{\nu} W_{\mu \nu} a_{\nu} \right) . \end{equation} This activation function can assume different forms:
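We can verify this formula directly in PyTorch: a minimal sketch, where nn.Linear stores $W_{\mu \nu}$ and $b_{\mu}$, and nn.ReLU plays the role of $f$ (the layer sizes here are arbitrary):

import torch
import torch.nn as nn

layer = nn.Linear(3, 2) #W has shape (2, 3) and b has shape (2,)
f = nn.ReLU()

a = torch.randn(3)
y = f(layer(a))                                      #f(b + W a), computed by PyTorch
y_manual = torch.relu(layer.bias + layer.weight @ a) #the same sum, written out
print(torch.allclose(y, y_manual))                   #True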

Figure 3: Examples of activation functions. Image source: https://miro.medium.com/max/1200/1ZafDv3VUm60Eh10OeJu1vw.png

During the training process these weights are adjusted by the optimizer, over epochs, in order to minimize the difference between the network predictions, $y_{pred}$, and the target values, $y_{target}$, through the minimization of a loss function (which can also assume different forms). In this tutorial we are going to use the Mean Squared Error (MSE), \begin{equation} MSE = \frac{1}{m} \sum_{i = 1}^m (y_{pred} - y_{target})^2 . \end{equation}
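As a quick check that torch.nn.MSELoss implements exactly this formula (the numbers below are illustrative):

import torch

y_pred = torch.tensor([0.5, 1.0, 1.5])
y_target = torch.tensor([0.0, 1.0, 2.0])

mse_manual = ((y_pred - y_target)**2).mean()     #(0.25 + 0.0 + 0.25)/3
mse_torch = torch.nn.MSELoss()(y_pred, y_target)
print(mse_manual.item(), mse_torch.item())       #both ~0.1667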

Then, we can define the model as:

import torch.nn as nn

class Model(nn.Module):
  def __init__(self):

    super(Model, self).__init__() #Initiating the model

    #Defining the linear layers and activation function
    self.layer1 = nn.Linear(1, 8) 
    self.layer2 = nn.Linear(8, 16)
    self.layer3 = nn.Linear(16, 1)
    self.ReLU = nn.ReLU()

  #Forward pass
  def forward(self, x):
    '''
    Define the architecture at the same time as compute y
    '''
    x = self.ReLU(self.layer1(x))
    x = self.ReLU(self.layer2(x))
    y = self.layer3(x)
      
    return y

And then we instantiate it (moving it to the GPU, when available), together with the optimizer and the loss function:

model = Model().to(device)

optimizer = torch.optim.Adam(model.parameters())
loss_func = torch.nn.MSELoss()

Training and validating the model

Now, it is time to do the magic:

epochs = 200 #Number of steps to train the model

lloss_train = []
lloss_val = []
for i in range(epochs):
  aux_train = 0
  model.train()  #Telling the model we are training :p
  for x, y in train_loader:
    x = x.to(device)
    y = y.to(device)
    prediction = model(x)
    tloss = loss_func(prediction, y) 
    optimizer.zero_grad() #set the gradients to zero before starting to do backpropagation (i.e., updating the weights and biases)
    tloss.backward() #calculate the gradient during the backward pass in the neural network       
    optimizer.step() #performs a single optimization step (parameter update)
    aux_train += tloss.item()
  lloss_train.append( aux_train/len(train_loader) )     
  if i % 10 == 0:
    print(f"Epoch {i} | Train Loss {aux_train/len(train_loader)}")

  aux_val = 0
  model.eval() #Telling the model we are testing :p
  with torch.no_grad(): #No gradients are needed for validation
    for x, y in val_loader:
      x = x.to(device)
      y = y.to(device)
      prediction = model(x)
      vloss = loss_func(prediction, y)
      aux_val += vloss.item()
  lloss_val.append( aux_val/len(val_loader) )
  if i % 10 == 0:
    print(f"Epoch {i} | Val Loss {aux_val/len(val_loader)}")

Notice that we always move the tensors (and, earlier, the model) to device, so that everything is computed on the GPU when one is available.

One important practice after training the model is to analyse whether it is ok, i.e., whether we have a biased model that is:

  • Overfitting: when the model gives accurate predictions for the training data but not for new data (validation or test data). This can be seen when the validation loss is increasing, or is higher compared to the training loss.
  • Underfitting: when the model is unable to capture the relationship between the input and output variables accurately. This can be seen when the validation loss is still decreasing, or is lower than the training loss.

Figure 4: Representation of a model that is underfitting, ok, and overfitting. Image source: https://www.datarobot.com/wp-content/uploads/2018/03/Screen-Shot-2018-03-22-at-11.22.15-AM-e1526498075543.png.

With the model we have just designed, the training and validation losses behave well, which can be confirmed by a plot:

Figure 5: Train and validation losses for the model in question.
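A plot like this can be produced with a minimal matplotlib sketch, using the lloss_train and lloss_val lists filled during training:

import matplotlib.pyplot as plt

plt.plot(lloss_train, label = 'Train')
plt.plot(lloss_val, label = 'Validation')
plt.xlabel('Epoch')
plt.ylabel('MSE loss')
plt.legend()
plt.show()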

Testing the model

After all this work it is time to see how the model is performing on the test set:

lloss_test = []
preds = []
xss = []
yss = []
model.eval()
with torch.no_grad(): #No gradients are needed for testing
  for x, y in test_loader:
    out = model( x.to(device) )
    preds.append( out.cpu().numpy() )
    xss.append( x.numpy() )
    yss.append( y.numpy() )
    loss = loss_func(out, y.to(device))
    lloss_test.append( loss.item() ) #Storing the loss of each batch

new_preds = np.concatenate( preds )
new_yss = np.concatenate( yss )
new_xss = np.concatenate( xss )

df = pd.DataFrame()

df["x_truth"] = new_xss[:, 0]*std + mean
df["y_truth"] = new_yss[:, 0]*(max - min) + min
df["y_pred"] = new_preds[:, 0]*(max - min) + min

And the analysis comes with the loss for the test set:

loss_test = np.mean(lloss_test)

A histogram:

Figure 6: A histogram of the truth values and the test set predictions.

And a truth $\times$ predictions plot:

Figure 7: Predictions $\times$ Truth values for the test set.
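A plot like Figure 7 can be sketched from the df built above (again with matplotlib; the dashed line marks perfect predictions):

import matplotlib.pyplot as plt

plt.scatter(df['y_truth'], df['y_pred'], s = 10)
lims = [df['y_truth'].min(), df['y_truth'].max()]
plt.plot(lims, lims, 'k--', label = 'Perfect prediction')
plt.xlabel('Truth')
plt.ylabel('Prediction')
plt.legend()
plt.show()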

As we can see from the plots above, the model achieved good predictions, with a small dispersion in the prediction $\times$ truth analysis. Anyhow, this is a really simple regression example, which I hope can help you start having fun with PyTorch and machine learning.

Stay tuned for new posts!