Regression in Tensorflow v1 & v2
Continuing from the previous article, this one approaches Linear & Logistic Regression with Tensorflow and sheds some light on the core differences between versions 1 and 2. Before we begin, it would be nice to discuss the framework a little.
Tensorflow originated from researchers at Google as an open-source software library for Machine Intelligence, aimed at both production and research. Nowadays, in its more mature version, it is referred to as an end-to-end open-source ML platform. Tensorflow is based on graph computation, a concept for representing mathematical calculations. Until v2 it was really hard to digest and to find your way through its numerous sub-APIs, due to the lack of documentation and out-of-the-box tutorials (and to some extent it still is). Alternatively, other frameworks such as the popular Keras came to the rescue as wrapper libraries, offering a layer (or more) of abstraction over Tensorflow, and Keras finally became the default high-level API in TF v2.
Linear Regression v1
Now that we are already familiar with the notion of linear regression, we can take it a step further and train a very simple model on the linear relation between X and Y.
The dataset of our case study is Birth Rate - Life Expectancy. It consists of the birth rate (X) and the corresponding life expectancy (Y) for a number of countries. Assuming that the relation between them is linear,
Y = w*X + b
we can train a model that calculates w and b.
TF v1 enables us to implement the computational graph from scratch, using placeholders to represent our dataset and scalar variables for the weight and bias, which will be calculated with backpropagation.
Next, we define the prediction formula with Tensorflow operations and use its built-in functions for the MSE loss and the gradient descent optimizer.
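A minimal sketch of what this graph definition might look like in TF v1 (variable names and the learning rate are illustrative, not taken from the article's source code):

import tensorflow as tf  # TF 1.x (or tensorflow.compat.v1 with v2 behaviour disabled)

# Placeholders stand in for the data: one birth rate (X) and one life expectancy (Y) at a time
X = tf.placeholder(tf.float32, name='X')
Y = tf.placeholder(tf.float32, name='Y')

# Trainable scalars, learned through backpropagation
w = tf.get_variable('weight', initializer=tf.constant(0.0))
b = tf.get_variable('bias', initializer=tf.constant(0.0))

# Prediction, squared-error loss and gradient descent optimizer, all defined as graph operations
Y_pred = w * X + b
loss = tf.square(Y - Y_pred, name='loss')
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.001).minimize(loss)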
Now that we have defined the components of the computational graph, we need to let Tensorflow know that we are going to feed our data through it and train the variables. In TF v1 this is accomplished with a Session: after the graph is initialised, we feed the dataset through the computational graph. Dataset preparation techniques could be a standalone chapter, and they are already simplified in v2; in this case the feed_dict API is used.
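Continuing the sketch above, the training loop runs inside a Session and feeds samples through feed_dict roughly like this (the data variable is an assumed list of (x, y) pairs, not the article's exact loading code):

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for epoch in range(1, 51):
        total_loss = 0.0
        for x, y in data:  # `data` is assumed to hold (birth_rate, life_expectancy) pairs
            _, l = sess.run([optimizer, loss], feed_dict={X: x, Y: y})
            total_loss += l
        if epoch % 10 == 0:
            print(f'Epoch {epoch}: | Loss: {total_loss / len(data):.2f}')
    w_out, b_out = sess.run([w, b])  # read the trained values out of the graph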
Epoch 10: | Loss: 375.46, w: 3.51, b: 41.13
Epoch 20: | Loss: 130.94, w: -0.88, b: 61.19
Epoch 30: | Loss: 59.20, w: -3.26, b: 72.10
Epoch 40: | Loss: 38.30, w: -4.56, b: 78.04
Epoch 50: | Loss: 32.29, w: -5.27, b: 81.27
Train Time: 6.204496 seconds
You may also have noticed that there is a FileWriter object instantiated before training; it records the computational graph so that it can be visualised with Tensorboard.
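For reference, a minimal FileWriter setup in TF v1 looks roughly like the following (the log directory './graphs' is an assumption):

writer = tf.summary.FileWriter('./graphs', tf.get_default_graph())  # records the graph for Tensorboard
# ... training ...
writer.close()
# Then inspect it with: tensorboard --logdir=./graphs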
By default in v1, you are not allowed to view the content of variables outside of a session, which is an issue when you have to debug your model. The only way to do that is to enable eager execution, but it does not act as a magic wand, as you have to refactor your program. In the latest Tensorflow version, eager execution is the default: it evaluates operations immediately, without building computational graphs, and thus makes it much easier to get started, offering a more natural flow of programming in contrast to the cumbersome style of v1.
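A tiny illustration of eager execution in TF 2.x, where tensors can be inspected immediately:

import tensorflow as tf  # TF 2.x

a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
b = a * 2           # evaluated immediately, no graph building or Session required
print(b.numpy())    # [[2. 4.] [6. 8.]]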
Linear Regression v2
Reimplementing the linear regression model as a simple neural network in Tensorflow v2 makes it much easier to monitor computations and calculate the gradients. To put it naively, the new tf.GradientTape API replaces the functionality of tf.Session: operations executed inside its context are recorded on the tape, their results are returned as Tensor objects that can be converted to np.array, and the tape can later be asked for gradients.
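A minimal GradientTape sketch for the linear model (the data values here are illustrative, not the Birth Rate-Life expectancy dataset):

import tensorflow as tf  # TF 2.x

w = tf.Variable(0.0)
b = tf.Variable(0.0)
x = tf.constant([1.8, 2.0, 5.2])     # illustrative birth rates
y = tf.constant([74.0, 72.0, 55.0])  # illustrative life expectancies

with tf.GradientTape() as tape:
    y_pred = w * x + b                             # recorded on the tape
    loss = tf.reduce_mean(tf.square(y - y_pred))   # MSE, also recorded

dw, db = tape.gradient(loss, [w, b])          # gradients w.r.t. the variables
print(loss.numpy(), dw.numpy(), db.numpy())   # eager tensors convert to numpy values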
Weights and bias are now tf.Variable objects, which, by the way, was considered the old way of creating variables in Tensorflow v1 compared to tf.get_variable.
Prediction, loss function and optimizer can now be defined as simply as writing a line of plain Python.
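A sketch of those one-liners in v2 (names and the learning rate are illustrative):

w = tf.Variable(0.0, name='weight')
b = tf.Variable(0.0, name='bias')

def predict(x):
    return w * x + b                                   # prediction

def mse(y_true, y_pred):
    return tf.reduce_mean(tf.square(y_true - y_pred))  # MSE loss

optimizer = tf.keras.optimizers.SGD(learning_rate=0.001)  # gradient descent optimizer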
The training process is easier to digest and comprehend, and it takes less time to converge than the feed_dict API does.
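Putting the pieces above together, a v2 training loop could look roughly like this (x and y are assumed to hold the dataset columns as tensors):

import time

start = time.time()
for epoch in range(1, 1001):
    with tf.GradientTape() as tape:
        loss = mse(y, predict(x))
    grads = tape.gradient(loss, [w, b])
    optimizer.apply_gradients(zip(grads, [w, b]))   # update weight and bias
    if epoch % 100 == 0:
        print(f'Epoch: {epoch} | Loss: {loss.numpy():.2f}, '
              f'w: {w.numpy():.2f}, b: {b.numpy():.2f}')
print(f'Train Time: {time.time() - start:.6f} seconds')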
Epoch: 100 | Loss: 652.59, w: 10.31, b: 30.24
Epoch: 200 | Loss: 324.81, w: 5.31, b: 47.55
Epoch: 300 | Loss: 169.67, w: 1.87, b: 59.46
Epoch: 400 | Loss: 96.24, w: -0.50, b: 67.65
Epoch: 500 | Loss: 61.48, w: -2.13, b: 73.29
Epoch: 600 | Loss: 45.03, w: -3.25, b: 77.16
Epoch: 700 | Loss: 37.24, w: -4.02, b: 79.83
Epoch: 800 | Loss: 33.56, w: -4.55, b: 81.67
Epoch: 900 | Loss: 31.81, w: -4.91, b: 82.93
Epoch: 1000 | Loss: 30.99, w: -5.17, b: 83.80
Train Time: 2.776154 seconds
Logistic Regression
Moving forward to a more advanced implementation of Logistic Regression with a single-layer neural network, we will approach a multi-class classification problem using the MNIST dataset, a collection of hand-written digits from 0 to 9.
Dataset preparation
This example can also be considered a Tensorflow v1 tutorial, as it makes use of its special terminology, built-in routines and the tf.data API, a faster method than placeholders and feed_dict for loading a dataset.
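A sketch of such a loading step; the article's 55K training samples suggest the classic TF MNIST split, whereas tf.keras.datasets.mnist used here (purely as an assumption) ships 60K training images:

import tensorflow as tf  # TF 1.x

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype('float32') / 255.0   # flatten the 28x28 images
x_test = x_test.reshape(-1, 784).astype('float32') / 255.0
y_train = tf.one_hot(y_train, depth=10)                        # one-hot encoded labels
y_test = tf.one_hot(y_test, depth=10)

train_data = tf.data.Dataset.from_tensor_slices((x_train, y_train))
test_data = tf.data.Dataset.from_tensor_slices((x_test, y_test))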
In the next step we must define a process to iterate through the samples/digits of the dataset, let's say an iterator. Each new batch/sample is fetched with a call to get_next(). Constructing the classifier, we will not process the data samples one by one, as this would slow down training due to the dataset size. Thus we are going to process the data in batches to accelerate the process. With drop_remainder=False we also explicitly choose not to discard the samples left over in the last batch when iterating (train & test), even if they do not form a complete batch.
After every epoch, Tensorflow needs to rewind the dataset for the next epoch and continue training. This initialisation operation is defined with iterator.make_initializer(), fed with the corresponding part of the dataset.
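In code, the batching and the reinitialisable iterator might be set up as follows (a sketch building on the datasets above):

batch_size = 128
train_data = train_data.batch(batch_size, drop_remainder=False)  # keep the last, smaller batch
test_data = test_data.batch(batch_size, drop_remainder=False)

iterator = tf.data.Iterator.from_structure(train_data.output_types, train_data.output_shapes)
img, label = iterator.get_next()                      # graph tensors holding the next batch

train_init = iterator.make_initializer(train_data)    # rewinds the iterator over the train set
test_init = iterator.make_initializer(test_data)      # rewinds the iterator over the test set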
We will elaborate on how the batches are consumed during the training process in a later section.
In this case, weights and bias have to follow the dimensions of the dataset as well: 784 input pixels mapped to 10 output classes.
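A sketch of the corresponding parameter shapes (the initialisers are assumptions):

w = tf.get_variable('weights', shape=(784, 10),
                    initializer=tf.random_normal_initializer(stddev=0.01))
b = tf.get_variable('bias', shape=(1, 10), initializer=tf.zeros_initializer())

logits = tf.matmul(img, w) + b   # (batch_size, 10) unscaled scores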
Calculate Loss
Tensorflow uses the term logits, or unscaled log probabilities, for the output of the model during forward propagation, before any other operation is applied.
How well these scores lead to correct predictions is measured with cross-entropy, which a) applies a softmax activation on the logits, converting the output into normalised probabilities that sum to 1, a.k.a. predictions, and b) computes the distance from the ground truth. The batch loss is then calculated as the mean value over the training instances in the batch.
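A sketch of that computation with Tensorflow's built-in op (building on the logits and one-hot labels above):

entropy = tf.nn.softmax_cross_entropy_with_logits_v2(labels=label, logits=logits)  # (batch_size,)
loss = tf.reduce_mean(entropy)  # scalar batch loss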
# Sample output from a single batch of size 128

# Logits (128, 10)
[[-0.02467263 0.0407119 0.03357347 ... 0.07849845 -0.04018284
0.14606732]
...
[-0.03187682 0.03064402 0.02814235 ... 0.12632789 -0.07327773
0.16343306]]

# Softmax + Cross Entropy (128,)
[2.280719 2.3213248 ... 2.2659633 2.3112588]

# Batch Loss
2.315973
Optimizer
In Tensorflow's dialect, the optimizer is an operation used to minimise the loss. It is executed in a session.run() call, passed in a list along with the loss computation. This works because Tensorflow's computation graph executes every part the optimizer depends on: the loss, which in turn depends on the input data, the weights and the bias. That can be seen in the following graph, produced with Tensorboard.
optimizer = tf.train.GradientDescentOptimizer(0.001).minimize(loss)

# Training process
_, batch_loss, batch_acc = sess.run([optimizer, loss, accuracy])
Accuracy & Confusion Matrix
The model's performance is not assessed with the loss alone, but also with a range of statistical metrics. In this example, the model is evaluated on the accuracy of the classifier in producing correct predictions, along with a confusion matrix (precision, recall) of actual vs. predicted classes.
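A sketch of how these metrics can be derived from the logits (tf.math.confusion_matrix is one way to build the actual vs. predicted table; the article's exact approach may differ):

preds = tf.nn.softmax(logits)                                  # (batch_size, 10) probabilities
correct = tf.equal(tf.argmax(preds, 1), tf.argmax(label, 1))   # (batch_size,) booleans
accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))        # fraction of correct predictions

confusion = tf.math.confusion_matrix(labels=tf.argmax(label, 1),
                                     predictions=tf.argmax(preds, 1),
                                     num_classes=10)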
# Sample output from a single batch of size 128

# Predictions (128, 10)
[[0.099 0.10 0.11 ... 0.08 0.09 0.09]
...
[0.11 0.10 0.09 ... 0.08 0.10 0.10]]

# Correct Preds (128,)
[False True ... False False]

# Batch Accuracy
0.078
------------------------------------------------------------------
Training...
Epoch 10 - Train loss: 0.875 - Train Accuracy: 83.16%
Epoch 20 - Train loss: 0.653 - Train Accuracy: 85.49%
Epoch 30 - Train loss: 0.564 - Train Accuracy: 86.53%
Epoch 40 - Train loss: 0.515 - Train Accuracy: 87.21%
Epoch 50 - Train loss: 0.483 - Train Accuracy: 87.73%
Epoch 60 - Train loss: 0.460 - Train Accuracy: 88.10%
Evaluating...
Test Validation loss: 0.067, Validation Accuracy: 89.09%
Training and batch processing
Feeding the computational graph with data from an iterator is a process that has to be implemented in a semi-manual way. In every epoch, the iterator serves the dataset batch by batch, and for each batch the loss, the accuracy and the number of batch cycles are accumulated in order to calculate the total loss and accuracy of the epoch.
For example, in a training dataset of 55K samples with batch_size=128, the number of batch cycles will be 430 when drop_remainder=False and 429 when it is True. The latter means that when the last batch of an epoch is incomplete, it is simply discarded and the iterator is depleted immediately, at 128x429 = 54,912 samples.
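A sketch of one such epoch, where the iterator signals exhaustion with tf.errors.OutOfRangeError (variable names follow the snippets above):

sess.run(train_init)   # rewind the iterator to the start of the train set
total_loss, total_acc, batch_cycles = 0.0, 0.0, 0
try:
    while True:
        _, batch_loss, batch_acc = sess.run([optimizer, loss, accuracy])
        total_loss += batch_loss
        total_acc += batch_acc
        batch_cycles += 1   # reaches 430 for 55K samples with batch_size=128 and drop_remainder=False
except tf.errors.OutOfRangeError:
    pass   # the epoch is complete once the iterator is depleted
print(f'Train loss: {total_loss / batch_cycles:.3f} - Train Accuracy: {100 * total_acc / batch_cycles:.2f}%')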
Conclusions
This article completes an extended exploration of regression. If you are a beginner and haven't invested much time in understanding how a computational graph works and does all the magic under the hood, Tensorflow version 1.x gives you the chance!
In the next article we will make a multilayer perceptron (MLP) learn Tensorflow's playground datasets.
Source code: https://github.com/sniafas/ML-Projects/tree/master/Regression