## Epochs
An epoch is one complete pass through the entire training dataset. A single epoch can be broken down into the following steps:
1. *Batching:* The training dataset is divided into smaller batches.
2. *Predicting:* Each batch is passed through the network and predictions are made based on the current weights.
3. *Computing gradients:* The batch predictions are compared against the actual target values. The gradients of the loss function with respect to the network's weights are computed using [[Backpropagation]].
4. *Updating weights:* The weights are updated after each batch using an optimization algorithm (e.g. Stochastic [[Gradient Descent]]). Performing an SGD update on one batch after another is also called mini-batch gradient descent.
After the neural network has processed the entire dataset once and updated its weights accordingly, one epoch is complete. Training usually runs for multiple epochs until a stopping criterion is met, such as reaching a desired level of accuracy, completing a fixed number of epochs, or the model's performance on a validation dataset ceasing to improve. The loop below sketches how the steps above fit together.
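A minimal PyTorch sketch of this loop, assuming a hypothetical tiny classifier and random stand-in data (the model shape, sizes, and learning rate here are illustrative, not prescribed by the text):

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical setup: a tiny classifier and random data standing in for a real dataset.
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 10))
dataset = TensorDataset(torch.randn(256, 20), torch.randint(0, 10, (256,)))
train_loader = DataLoader(dataset, batch_size=32, shuffle=True)  # step 1: split into batches
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

num_epochs = 5
for epoch in range(num_epochs):           # each iteration = one full pass over the data
    for inputs, targets in train_loader:
        predictions = model(inputs)       # step 2: predict with the current weights
        loss = loss_fn(predictions, targets)
        optimizer.zero_grad()
        loss.backward()                   # step 3: gradients via backpropagation
        optimizer.step()                  # step 4: update the weights
```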
## Momentum
In `PyTorch`, the `momentum` parameter is commonly used with optimization algorithms like Stochastic Gradient Descent (SGD). It works by accumulating an exponentially decaying moving average of past gradients and using this average to update the weights. This helps prevent the optimizer from getting stuck in shallow local minima or oscillating around the optimum.
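A hand-rolled sketch of that update, following the rule documented for `torch.optim.SGD` with `dampening=0` and `nesterov=False` (the tensor values below are made up for illustration):

```python
import torch

# Momentum update as in torch.optim.SGD (dampening=0, nesterov=False):
#   v <- mu * v + grad        (exponentially decaying average of gradients)
#   w <- w - lr * v           (step along the accumulated velocity)
lr, mu = 0.1, 0.9
w = torch.tensor([1.0, -2.0])   # parameters (illustrative values)
v = torch.zeros_like(w)         # velocity buffer, starts at zero

for grad in [torch.tensor([0.5, -0.5]), torch.tensor([0.4, -0.6])]:
    v = mu * v + grad           # accumulate past gradients
    w = w - lr * v              # update the weights using the average
```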
The momentum parameter takes a value between 0 and 1; a higher value means the moving average of past gradients contributes more to each weight update. Commonly used values are in the range of 0.5 to 0.9.
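In practice this is just a constructor argument; for example, reusing the `model` from the earlier sketch:

```python
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
```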