This repository contains PyTorch code for the Sparse Label Smoothing Regularization (SparseLSR) loss function proposed in the paper "Learning Symbolic Model-Agnostic Loss Functions via Meta-Learning" by Christian Raymond, Qi Chen, Bing Xue, and Mengjie Zhang. The SparseLSR loss function is significantly faster and more memory-efficient to compute than traditional (non-sparse) Label Smoothing Regularization (LSR).
A PyTorch implementation of the proposed Sparse Label Smoothing Regularization (SparseLSR) loss function. This repository contains the following useful scripts:
- loss_functions.py: PyTorch code containing an implementation of SparseLSR and conventional LSR.
- visualizations.py: Script for visualizing the different classification loss functions.
- train.py: Code for testing the different loss functions and visualizing the penultimate layer representations.
- Clone this repository to your local machine:
git clone https://github.com/Decadz/Sparse-Label-Smoothing-Regularization.git
cd Sparse-Label-Smoothing-Regularization
- Install the necessary libraries and dependencies:
pip install -r requirements.txt
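Once installed, the loss can be dropped in wherever a standard classification loss would be used. Below is a minimal usage sketch, assuming loss_functions.py exposes the SparseLSRLoss class shown later in this README; the tensor shapes are purely illustrative:

import torch
from loss_functions import SparseLSRLoss  # assumed import path for the class defined in this repository

# Hypothetical batch: 4 samples, 10 classes.
y_pred = torch.randn(4, 10, requires_grad=True)
y_target = torch.randint(0, 10, (4,))

criterion = SparseLSRLoss(smoothing=0.1, reduction="mean")
loss = criterion(y_pred, y_target)
loss.backward()  # gradients flow back into the logits as with any other loss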
The key idea behind sparse label smoothing regularization is to use the redistributed loss trick, which takes the expected non-target loss and redistributes it into the target loss, obviating the need to compute the loss over the non-target outputs. The trick retains near-identical behavior because the softmax at the output redistributes the gradients back into the non-target outputs during backpropagation. The sparse label smoothing regularization loss is defined as follows:
$$\mathcal{L}_{\text{SparseLSR}} = -\left(1 - \epsilon + \frac{\epsilon}{C}\right)\log(p_y) - \frac{\epsilon(C - 1)}{C}\log\big(\mathbb{E}_{j \neq y}[p_j]\big),$$

where $p_j$ is the softmax probability the model assigns to class $j$, $y$ is the target class, $C$ is the number of classes, $\epsilon$ is the smoothing parameter, and the expectation of the model's non-target output is

$$\mathbb{E}_{j \neq y}[p_j] = \frac{1}{C - 1}\sum_{j \neq y} p_j = \frac{1 - p_y}{C - 1}.$$

By definition of the softmax activation function, the summation of the model's output predictions is

$$\sum_{j=1}^{C} p_j = \sum_{j \neq y} p_j + p_y = 1,$$

where the first conditional summation can be removed to make explicit that

$$\sum_{j \neq y} p_j = 1 - p_y.$$
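As a quick numerical illustration (ours, not from the paper), the identity above can be checked directly on a random softmax output; all variable names below are purely illustrative:

import torch

torch.manual_seed(0)
logits = torch.randn(10)             # C = 10 classes
probs = torch.softmax(logits, dim=0)
y = 3                                # arbitrary target index

# Expected non-target probability computed two ways.
non_target_mean = probs[torch.arange(10) != y].mean()
redistributed = (1 - probs[y]) / (10 - 1)

print(torch.isclose(non_target_mean, redistributed))  # tensor(True)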
The sparse label smoothing regularization loss is prone to numerical stability issues, analogous to the cross-entropy loss, when computing logarithms and exponentials (exponentials are taken in the softmax when converting logits into probabilities), causing underflow and overflow. In particular, the following expressions are prone to causing numerical stability issues:

$$\log(p_y) \quad \text{and} \quad \log\left(\frac{1 - p_y}{C - 1}\right).$$
In order to attain numerical stability when computing $\log(p_y)$, the log probabilities are computed directly from the logits $z$ using the well-known log-sum-exp trick,

$$\log(p_y) = (z_y - m) - \log\sum_{j=1}^{C} e^{z_j - m}, \quad m = \max_j z_j,$$

which is what torch.nn.functional.log_softmax computes under the hood.
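A small sketch (ours, not part of the repository) showing the effect of the trick on large logits:

import torch

z = torch.tensor([1000.0, 990.0, 980.0])  # large logits: exp() overflows to inf

naive = torch.log(torch.exp(z) / torch.exp(z).sum())                   # inf/inf -> nan
stable = (z - z.max()) - torch.log(torch.exp(z - z.max()).sum())       # log-sum-exp trick

print(naive)    # tensor([nan, nan, nan])
print(stable)   # finite log probabilities
print(torch.allclose(stable, torch.nn.functional.log_softmax(z, dim=0)))  # True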
Regarding the remaining numerically unstable term, this could also be computed stably via the log-sum-exp trick; however, it would require performing the log-sum-exp operation an additional time, which would negate the time and space complexity savings over the non-sparse implementation of label smoothing regularization. Therefore, we propose to instead simply take the exponential of the target log probability to recover the raw probability and then bound $1 - p_y$ from below with a small constant (min=1e-7 in the code below) so that the logarithm of zero is never taken when $p_y \approx 1$:
import torch


class SparseLSRLoss(torch.nn.Module):

    def __init__(self, smoothing=0.0, reduction="mean"):
        super(SparseLSRLoss, self).__init__()
        self.smoothing = smoothing
        self.reduction = reduction

    def forward(self, y_pred, y_target):

        # Retrieving the total number of classes.
        num_classes = torch.tensor(y_pred.size(1))

        # Computing the log probabilities using the numerically stable log-sum-exp.
        log_prob = torch.nn.functional.log_softmax(y_pred, dim=1)

        # Extracting the target indexes from the log probabilities.
        log_prob = torch.gather(log_prob, 1, y_target.unsqueeze(1))

        # Calculating the sparse label smoothing regularization loss.
        loss = - (1 - self.smoothing + (self.smoothing / num_classes)) * log_prob - \
               ((self.smoothing * (num_classes - 1)) / num_classes) * \
               torch.log(torch.clamp(1 - torch.exp(log_prob), min=1e-7) / (num_classes - 1))

        # Applying the reduction and returning.
        return loss.mean() if self.reduction == "mean" else loss
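As a quick sanity check (ours, not from the paper): with smoothing=0 the loss above reduces exactly to the standard cross-entropy, and with non-zero smoothing its values are similar to, but generally not identical to, PyTorch's built-in label-smoothed cross-entropy, since the non-target term is replaced by its expectation. A sketch using the class as defined above:

import torch

torch.manual_seed(0)
y_pred = torch.randn(8, 100)                # hypothetical logits: batch of 8, 100 classes
y_target = torch.randint(0, 100, (8,))

# With smoothing=0 the sparse loss is identical to cross-entropy.
sparse_ce = SparseLSRLoss(smoothing=0.0)(y_pred, y_target)
torch_ce = torch.nn.functional.cross_entropy(y_pred, y_target)
print(torch.allclose(sparse_ce, torch_ce))  # True

# With smoothing>0 the two losses are close but not equal.
sparse_lsr = SparseLSRLoss(smoothing=0.1)(y_pred, y_target)
torch_lsr = torch.nn.functional.cross_entropy(y_pred, y_target, label_smoothing=0.1)
print(sparse_lsr.item(), torch_lsr.item())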
The train.py script allows you to recreate the penultimate layer representation visualizations from the paper's appendix. In this script, AlexNet is trained on the CIFAR-10 dataset using the cross-entropy loss, label smoothing regularization, and sparse label smoothing regularization. After training, the penultimate layer representations on the testing set are visualized using t-distributed Stochastic Neighbor Embedding (t-SNE).
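For reference, the visualization step amounts to something like the following sketch (ours, not the exact code in train.py); here random tensors stand in for the penultimate-layer features and labels that would come from the trained network on the CIFAR-10 test set:

import torch
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

# Stand-in for penultimate-layer features: 1000 samples, 256 dimensions, 10 classes.
features = torch.randn(1000, 256)
labels = torch.randint(0, 10, (1000,))

embedded = TSNE(n_components=2).fit_transform(features.numpy())
plt.scatter(embedded[:, 0], embedded[:, 1], c=labels.numpy(), s=2, cmap="tab10")
plt.title("t-SNE of penultimate-layer representations")
plt.show()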
The code has not been comprehensively checked and re-run since refactoring. If you have any issues, find a problem or bug, or cannot reproduce results similar to those reported in the paper, please open an issue or email me.
If you use our library or find our research of value, please consider citing our papers with the following BibTeX entries:
@article{raymond2023learning,
title={Learning Symbolic Model-Agnostic Loss Functions via Meta-Learning},
author={Raymond, Christian and Chen, Qi and Xue, Bing and Zhang, Mengjie},
journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
year={2023},
publisher={IEEE}
}
@article{raymond2024thesis,
title={Meta-Learning Loss Functions for Deep Neural Networks},
author={Raymond, Christian},
journal={arXiv preprint arXiv:2406.09713},
year={2024}
}