Absolute Logarithmic Error

ALE (Absolute Logarithmic Error) is an improved alternative to BCE (Binary Cross Entropy) that can be used for both classification and regression problems

ALE produces almost the same gradient that BCE is designed to produce, with fewer operations

And it completely solves the problem BCE has with regression targets

I think ALE can replace BCE perfectly

First, let's look at the BCE formula below
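For reference, the standard per-sample BCE for a label $y$ and a prediction $\hat{y}$ is

$$\mathrm{BCE}(y, \hat{y}) = -\bigl(y \log \hat{y} + (1 - y)\log(1 - \hat{y})\bigr)$$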

The two terms, one for the label-1 case and one for the label-0 case, are computed and then summed

The formula implicitly assumes that the label is exactly 1 or 0

This has a fundamental problem

The problem is that the loss value for a label that is neither 0 nor 1 is not meaningful

These are the BCE loss values when the label is 1 or 0:

>>> import tensorflow as tf
>>> loss_fn = tf.keras.losses.BinaryCrossentropy()
>>>
>>> y_true, y_pred = [[1.0]], [[0.0]]
>>> print(loss_fn(y_true, y_pred).numpy())
15.424949
>>>
>>> y_true, y_pred = [[1.0]], [[0.2]]
>>> print(loss_fn(y_true, y_pred).numpy())
1.6094373
>>>
>>> y_true, y_pred = [[1.0]], [[0.9]]
>>> print(loss_fn(y_true, y_pred).numpy())
0.10536041
>>>
>>> y_true, y_pred = [[1.0]], [[1.0]]
>>> print(loss_fn(y_true, y_pred).numpy())
0.0

The closer the prediction gets to the target value, the lower the returned loss

This does not seem to be a problem. However, if the label is a value strictly between 0 and 1:

>>> loss_fn = tf.keras.losses.BinaryCrossentropy()
>>>
>>> y_true, y_pred = [[0.1]], [[0.1]]
>>> print(loss_fn(y_true, y_pred).numpy())
0.32508278
>>>
>>> y_true, y_pred = [[0.2]], [[0.2]]
>>> print(loss_fn(y_true, y_pred).numpy())
0.50040215
>>>
>>> y_true, y_pred = [[0.5]], [[0.5]]
>>> print(loss_fn(y_true, y_pred).numpy())
0.69314694
>>>
>>> y_true, y_pred = [[0.6]], [[0.6]]
>>> print(loss_fn(y_true, y_pred).numpy())
0.6730114

Even though the model predicts exactly the same value as the target, each case returns a different loss

This can cause instability when training a regression model

It also means that the loss value cannot be used as a meaningful metric when solving a regression problem with BCE

For the same prediction error, a loss function should return the same value regardless of the target value; in particular, the loss should be zero whenever the prediction equals the target

ALE provides a correct loss value and gradient even when training with continuous labels between 0 and 1

It also requires fewer operations, has a more intuitive formula, and is easier to optimize than BCE

And unlike BCE, ALE does not return different losses for the same error depending on the target value
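The exact ALE formula is not reproduced in this excerpt, but the values printed below are consistent with a per-element loss of the form

$$\mathrm{ALE}(y, \hat{y}) = -\log\bigl(1 - |y - \hat{y}|\bigr)$$

with epsilon clipping for numerical stability; take this as an inference from the printed numbers rather than the repository's official definition. Note that for a label of 1 this reduces to $-\log \hat{y}$, the same term BCE uses, which is consistent with the claim that ALE produces almost the same gradient as BCE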

>>> loss_fn = AbsoluteLogarithmicError()
>>>
>>> y_true, y_pred = [[1.0]], [[0.0]]
>>> print(loss_fn(y_true, y_pred).numpy())
15.249238
>>>
>>> y_true, y_pred = [[1.0]], [[0.2]]
>>> print(loss_fn(y_true, y_pred).numpy())
1.6094373
>>>
>>> y_true, y_pred = [[1.0]], [[0.9]]
>>> print(loss_fn(y_true, y_pred).numpy())
0.10536041
>>>
>>> y_true, y_pred = [[1.0]], [[1.0]]
>>> print(loss_fn(y_true, y_pred).numpy())
0.0
>>>
>>> y_true, y_pred = [[0.0]], [[0.0]]
>>> print(loss_fn(y_true, y_pred).numpy())
0.0
>>>
>>> y_true, y_pred = [[0.1]], [[0.1]]
>>> print(loss_fn(y_true, y_pred).numpy())
0.0
>>>
>>> y_true, y_pred = [[0.2]], [[0.2]]
>>> print(loss_fn(y_true, y_pred).numpy())
0.0
>>>
>>> y_true, y_pred = [[0.5]], [[0.5]]
>>> print(loss_fn(y_true, y_pred).numpy())
0.0
>>>
>>> y_true, y_pred = [[0.6]], [[0.6]]
>>> print(loss_fn(y_true, y_pred).numpy())
0.0
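
The AbsoluteLogarithmicError class itself is not shown in this excerpt. As an illustration only, here is a minimal Keras-style sketch that reproduces the behavior above under the assumed formula -log(1 - |y - ŷ|); the actual implementation in inzapp/absolute-logarithmic-error may differ

import tensorflow as tf

class AbsoluteLogarithmicError(tf.keras.losses.Loss):
    """Sketch: per-element loss of -log(1 - |y_true - y_pred|), clipped for stability.

    Illustrative reconstruction only; the repository's implementation may differ.
    """

    def __init__(self, epsilon=1e-7, **kwargs):
        super().__init__(**kwargs)
        self.epsilon = epsilon  # keeps the log away from log(0)

    def call(self, y_true, y_pred):
        y_pred = tf.convert_to_tensor(y_pred)
        y_true = tf.cast(y_true, y_pred.dtype)
        error = tf.abs(y_true - y_pred)                        # same error -> same loss
        error = tf.clip_by_value(error, 0.0, 1.0 - self.epsilon)
        return -tf.math.log(1.0 - error)                       # 0 whenever y_pred == y_true

With this sketch, loss_fn = AbsoluteLogarithmicError() returns zero whenever y_true equals y_pred, matching the values shown above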

Comparison with Focal Loss

ALE can also be compared to Focal loss

Focal loss helps the model focus on hard samples by down-weighting the easy samples that BCE already classifies well

ALE can do the same thing in a very simple and intuitive way, as sketched below
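
One plausible way to get this focal-style behavior, under the formula sketched above, is to scale the log term by the absolute error raised to the power gamma, so easy samples (small error) contribute little to the loss. This weighting is an assumption for illustration; the repository's actual gamma handling may differ

import tensorflow as tf

def ale_focal_sketch(y_true, y_pred, gamma=2.0, epsilon=1e-7):
    """Hypothetical focal-style ALE: |error| ** gamma down-weights easy samples."""
    y_pred = tf.convert_to_tensor(y_pred)
    y_true = tf.cast(y_true, y_pred.dtype)
    error = tf.clip_by_value(tf.abs(y_true - y_pred), 0.0, 1.0 - epsilon)
    return tf.pow(error, gamma) * -tf.math.log(1.0 - error)

With gamma = 0.0 this reduces to plain ALE, which corresponds to the Gamma = 0.0 row in the table below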

The table below compares BCE and ALE using results from training on the CIFAR-10 dataset

The model was trained with the same hyperparameters; only the loss function was changed

Model params: 4,698,186

Batch size: 128

Learning rate: 0.003

Momentum: 0.9

Epochs: 10 (Adam for 7 epochs, Nesterov SGD for 3 epochs)

Gamma   BCE      ALE
0.0     0.8631   0.8628
1.0     0.8473   0.8584
2.0     0.8488   0.8574
avg     0.8531   0.8595
