What are Stochastic Gradient Descent and Mini-Batch Gradient Descent in ML?

Here we discuss the second and third types of gradient descent, known as Stochastic Gradient Descent and Mini-Batch Gradient Descent.

Stochastic gradient descent

The word ‘stochastic’ refers to a system or process that involves random probability. Hence, in Stochastic Gradient Descent, a few samples (often just one) are selected at random for each iteration instead of the whole dataset. In gradient descent there is a term called ‘batch’, which denotes the number of samples from the dataset used to calculate the gradient in each iteration. In typical gradient descent optimization, such as Batch Gradient Descent, the batch is the entire dataset. Using the whole dataset helps reach the minimum in a less noisy, less random manner, but it becomes a problem when the dataset gets really large.
Stochastic gradient descent (often abbreviated SGD) is an iterative method for optimizing an objective function with suitable smoothness properties (e.g., differentiable or subdifferentiable).
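To make this concrete, here is a minimal sketch of the idea in Python/NumPy (not from the original article): a linear model trained with a squared-error loss, where the weights are updated after each randomly chosen sample. The function name, learning rate, and epoch count are illustrative assumptions.

```python
import numpy as np

def sgd(X, y, learning_rate=0.01, n_epochs=50):
    """Minimal sketch of stochastic gradient descent for linear regression."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for epoch in range(n_epochs):
        # visit the samples in a random order, one at a time
        for i in np.random.permutation(n_samples):
            error = X[i] @ w + b - y[i]
            w -= learning_rate * error * X[i]   # gradient of 0.5 * error^2 w.r.t. w
            b -= learning_rate * error          # gradient of 0.5 * error^2 w.r.t. b
    return w, b
```

Because each update is based on a single sample, the updates are noisy, which is exactly the "random" behaviour the name refers to.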
[Figures: Convex loss function. Figure 1: a single global minimum. Figure 2: more than one minimum.]

Figure 1 shows a loss function with a single global minimum, while Figure 2 shows a loss function with more than one minimum. As discussed, batch gradient descent uses the whole dataset at the same time, whereas stochastic gradient descent processes one row (sample) at a time. We therefore prefer stochastic gradient descent when the loss function is not convex, that is, when the loss surface has more than one minimum rather than a single global minimum.

Mini-Batch Gradient Descent

Mini-batch gradient descent is the third type of gradient descent. It is a variation of the gradient descent algorithm that splits the training dataset into small batches, which are used to calculate the model error and update the model coefficients.
Implementations may choose to sum (or average) the gradient over the mini-batch, which further reduces the variance of the gradient. Mini-batch gradient descent seeks to find a balance between the robustness of stochastic gradient descent and the efficiency of batch gradient descent. It is the most common implementation of gradient descent used in the field of deep learning.
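As a rough sketch of how this might look in practice (the function name, batch size, and hyperparameters below are illustrative assumptions, not from the article), the gradient is averaged over each mini-batch before the coefficients are updated:

```python
import numpy as np

def mini_batch_gd(X, y, learning_rate=0.01, n_epochs=50, batch_size=32):
    """Minimal sketch of mini-batch gradient descent for linear regression."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for epoch in range(n_epochs):
        indices = np.random.permutation(n_samples)   # shuffle before forming batches
        for start in range(0, n_samples, batch_size):
            batch = indices[start:start + batch_size]
            X_b, y_b = X[batch], y[batch]
            errors = X_b @ w + b - y_b
            # average the gradient over the mini-batch (less noisy than pure SGD)
            w -= learning_rate * (X_b.T @ errors) / len(batch)
            b -= learning_rate * errors.mean()
    return w, b
```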
If we have a huge amount of data, in the millions or billions of samples, and we use batch gradient descent, the whole dataset must be processed at the same time for every update. This puts a heavy load on the training stage, so driving the error down takes more time and consumes more computational power.
If we try to solve such a huge dataset with stochastic gradient descent, we first check whether the problem is convex or not, that is, whether the loss has a single minimum or more than one. If it has more than one minimum, we use the stochastic gradient descent method; otherwise, we use the batch gradient descent method.
If we use mini-batch gradient descent to solve a huge dataset, we first divide the dataset into small chunks, which makes it easier to compute the error and update the coefficient values; then, within each chunk, we follow the same working procedure used for batch gradient descent.
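For example, a hypothetical sketch of the chunking step, where the array shape and batch size are made-up values for illustration:

```python
import numpy as np

# 1,000 samples with 5 features, split into mini-batches of 100 samples each
X = np.random.rand(1000, 5)
batch_size = 100
chunks = [X[i:i + batch_size] for i in range(0, len(X), batch_size)]
print(len(chunks), chunks[0].shape)   # 10 chunks, each of shape (100, 5)
```

Each chunk is then processed exactly like a (small) batch in batch gradient descent: compute the error over the chunk, take the gradient, and update the coefficients.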
