Source Code Analysis of Four Classic Losses in PyTorch

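I have recently been implementing PyTorch-aligned loss functions in the OneFlow framework. The work involved reading parts of the PyTorch source and handling a few special mathematical operations, so I decided to write a short summary.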

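About PyTorch's Loss source code

Anyone who has looked into PyTorch knows that it carries a fair amount of historical baggage: it absorbed Caffe2's low-level code, builds the logic of its operators on top of that code, and finally exposes a Python interface for users.

As a result, the PyTorch source can feel unfamiliar at first. Most operator implementations live under ATen/native (aten/src/ATen/native in the repository), and this article walks through Loss.cpp in that directory.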

MarginRankingLoss

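The RankingLoss family computes a loss from the relative distance between input samples, rather than regressing directly onto a target value the way MSELoss does. Its core idea splits into two parts: Margin and Ranking.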

MarginRankingLoss formula:

loss(x1, x2, target) = max(0, -target * (x1 - x2) + margin)

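The word Margin literally means the blank border of a page: when we print a document, the blank area around the text is the margin.

In a loss it plays a similar role, acting as a fixed range: once the gap between the samples exceeds this range, they are considered different enough and no further loss is computed.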

Ranking means ordering: when target = 1, x1 should rank higher than x2; when target = -1, x2 should rank higher than x1.
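To make the roles of target and margin concrete, here is a minimal scalar sketch of the per-pair term max(0, -target * (x1 - x2) + margin). The helper name margin_ranking_term and the sample scores are made up for illustration and are not part of PyTorch.

```python
def margin_ranking_term(x1, x2, target, margin=0.0):
    """Per-pair loss max(0, -target * (x1 - x2) + margin); target is +1 or -1."""
    return max(0.0, -target * (x1 - x2) + margin)

# target = +1 asks for x1 to rank above x2 by at least `margin`.
well_ranked = margin_ranking_term(2.0, 0.5, target=1, margin=1.0)  # gap 1.5 >= margin
too_close = margin_ranking_term(1.0, 0.5, target=1, margin=1.0)    # gap 0.5 < margin

# target = -1 asks for the opposite order, so the same scores are penalized.
wrong_order = margin_ranking_term(2.0, 0.5, target=-1, margin=1.0)

print(well_ranked, too_close, wrong_order)
```

A pair that is already separated by more than the margin contributes nothing, a correctly ordered but too-close pair contributes the shortfall, and a wrongly ordered pair contributes the gap plus the margin.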

The source logic is simple: compute the loss according to the formula, then apply a mean or sum reduction depending on the reduction type.
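The reduction step is shared by every loss in this article, so it may help to see it in isolation. The sketch below uses a hypothetical helper, reduce_loss, written in plain Python; it is not PyTorch's internal function, and unlike the NumPy code in this article it raises on an unknown reduction string instead of treating it as "none".

```python
def reduce_loss(values, reduction="mean"):
    """Hypothetical helper: collapse per-element losses as `reduction` asks."""
    if reduction == "mean":
        return sum(values) / len(values)
    if reduction == "sum":
        return sum(values)
    if reduction == "none":
        return list(values)
    raise ValueError(f"unknown reduction: {reduction!r}")

per_element = [0.0, 0.5, 2.5]
mean_loss = reduce_loss(per_element, "mean")  # average over all elements
sum_loss = reduce_loss(per_element, "sum")    # total over all elements
raw_loss = reduce_loss(per_element, "none")   # per-element values, unchanged
```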

PyTorch's MarginRankingLoss code

Below is the corresponding NumPy implementation:

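import numpy as np

def np_margin_ranking_loss(input1, input2, target, margin, reduction):
    # Element-wise hinge: max(0, -target * (input1 - input2) + margin)
    output = np.maximum(0, -target * (input1 - input2) + margin)
    if reduction == "mean":
        return np.mean(output)
    elif reduction == "sum":
        return np.sum(output)
    else:
        # Any other value (e.g. "none") returns the per-element losses
        return output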

TripletMarginLoss

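TripletLoss is best known from FaceNet, where it measures the distance between the features of different faces in order to perform face recognition and clustering.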

TripletLoss

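TripletMarginLoss combines the ideas of TripletLoss and MarginRankingLoss; for details, see "Learning local feature descriptors with triplets and shallow convolutional neural networks". Its formula is:

L(a, p, n) = max(d(a_i, p_i) - d(a_i, n_i) + margin, 0)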

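For different sample pairings, this loss distinguishes three cases:

Easy triplets: d(a_i, n_i) > d(a_i, p_i) + margin. The negative is already farther from the anchor than the positive by more than the margin, so the loss is 0 and nothing needs to be optimized.

Hard triplets: d(a_i, n_i) < d(a_i, p_i). The negative is closer to the anchor than the positive, so the loss is larger than the margin and needs to be optimized.

Semi-hard triplets: d(a_i, p_i) < d(a_i, n_i) < d(a_i, p_i) + margin. Here the distance from the negative to the anchor, d(a_i, n_i), is larger than the distance from the positive to the anchor, d(a_i, p_i), but not by enough: the gap does not exceed the margin, so it still needs to be optimized.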
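The cases can be told apart from the two distances and the margin alone. The sketch below is illustrative: the function name triplet_case and the labels "easy", "semi-hard", and "hard" are conventional names chosen here, not identifiers from the PyTorch source.

```python
def triplet_case(d_ap, d_an, margin):
    """Classify a triplet from its anchor-positive and anchor-negative distances."""
    loss = max(d_ap - d_an + margin, 0.0)
    if d_an >= d_ap + margin:
        case = "easy"        # negative is far enough away, loss is 0
    elif d_an > d_ap:
        case = "semi-hard"   # correct order, but the gap is smaller than the margin
    else:
        case = "hard"        # negative is at least as close as the positive
    return case, loss

easy = triplet_case(d_ap=1.0, d_an=3.0, margin=1.0)
semi_hard = triplet_case(d_ap=1.0, d_an=1.5, margin=1.0)
hard = triplet_case(d_ap=2.0, d_an=1.0, margin=1.0)
```

Only the last two cases produce a non-zero loss, and therefore a gradient.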

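The paper's authors also introduce the notion of swap. The formula only considers the distances from the anchor to the positive and to the negative, and ignores the distance between the positive and the negative. Consider the following situation:

The anchor may be equally far from the positive and the negative, while the negative and the positive are very close to each other, which makes them hard for the model to tell apart. A swap handles this case; in code it shows up as taking a minimum.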

# Pseudocode
if swap:
    D(a, n) = min(D(a, n), D(p, n))

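Taking the minimum shrinks the negative distance that is subtracted in the loss formula, so the loss can only grow (or stay the same), which further helps push the negative away.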
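A small worked example may make the effect of swap easier to see. The sketch below uses made-up 2-D points, plain Euclidean distance, and no eps term, so it illustrates the idea rather than reproducing PyTorch's numerics; the helper names euclidean and triplet_term are hypothetical.

```python
import math

def euclidean(u, v):
    """Plain Euclidean distance between two 2-D points (no eps term)."""
    return math.hypot(u[0] - v[0], u[1] - v[1])

def triplet_term(anchor, positive, negative, margin=1.0, swap=False):
    d_ap = euclidean(anchor, positive)
    d_an = euclidean(anchor, negative)
    if swap:
        # Use whichever is smaller: anchor-negative or positive-negative distance.
        d_an = min(d_an, euclidean(positive, negative))
    return max(margin + d_ap - d_an, 0.0)

# The anchor is equally far (distance 5) from both samples,
# but the positive and the negative sit close to each other.
anchor, positive, negative = (0.0, 0.0), (3.0, 4.0), (4.0, 3.0)
loss_plain = triplet_term(anchor, positive, negative, swap=False)
loss_swap = triplet_term(anchor, positive, negative, swap=True)
print(loss_plain, loss_swap)
```

Without swap the loss equals the margin alone; with swap the small positive-negative distance replaces the anchor-negative distance and the loss becomes noticeably larger.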

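With this background, PyTorch's TripletMarginLoss source is also easy to follow.

TripletMarginLoss source code

at::pairwise_distance is the distance function. It is first used to compute the distances from the anchor to the positive and to the negative. The swap parameter then decides whether the distance between the positive and the negative is taken into account. Finally, output is computed according to the formula. Below is the corresponding NumPy code: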

import numpy as np

def np_triplet_margin_loss(anchor, positive, negative, margin, swap, reduction="mean", p=2, eps=1e-6):
    def _np_distance(input1, input2, p, eps):
        # p-norm distance along the last axis; eps is added to the difference before abs
        np_pnorm = np.power(np.abs(input1 - input2 + eps), p)
        np_pnorm = np.power(np.sum(np_pnorm, axis=-1), 1.0 / p)
        return np_pnorm

    dist_pos = _np_distance(anchor, positive, p, eps)
    dist_neg = _np_distance(anchor, negative, p, eps)
    if swap:
        # Distance swap: use the smaller of d(a, n) and d(p, n) as the negative distance
        dist_swap = _np_distance(positive, negative, p, eps)
        dist_neg = np.minimum(dist_neg, dist_swap)
    output = np.maximum(margin + dist_pos - dist_neg, 0)
    if reduction == "mean":
        return np.mean(output)
    elif reduction == "sum":
        return np.sum(output)
    else:
        return output

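An easy pitfall here is the p-norm computation. The norm formula is defined on absolute values, and raising a negative number to a fractional power is not valid in real arithmetic (in NumPy, np.power(-8.0, 1.0 / 3) evaluates to nan). This is why the difference is passed through np.abs before the powers are taken.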
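The pitfall can be reproduced without any array library. The sketch below uses a hypothetical helper p_norm and p = 3 as an assumed example, because with an odd p the sum of powers can go negative when the absolute value is skipped. Python's math.pow raises ValueError in that situation, whereas NumPy's np.power returns nan.

```python
import math

def p_norm(diffs, p, use_abs=True):
    """(sum |d|^p)^(1/p); with use_abs=False the absolute value is skipped."""
    total = sum((abs(d) if use_abs else d) ** p for d in diffs)
    return math.pow(total, 1.0 / p)

diffs = [-2.0, 1.0]
safe = p_norm(diffs, p=3)  # (8 + 1) ** (1/3)

try:
    unsafe = p_norm(diffs, p=3, use_abs=False)  # (-8 + 1) ** (1/3)
except ValueError:
    # math.pow rejects a negative base with a fractional exponent
    unsafe = None
```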

KLDivLoss

Now let's look at the corresponding PyTorch source code.

KLDivLoss source code

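First, notice that besides the usual input, target, and reduction, there is an extra parameter, log_target, which indicates whether target has already been passed through log. Depending on this parameter, KLDivLoss is split into two functions: _kl_div_log_target and _kl_div_non_log_target.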

_kl_div_log_target is straightforward and simply follows the formula: output = exp(target) * (target - input).

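_kl_div_non_log_target differs slightly. The range of target is not guaranteed, and log is undefined for negative values (and diverges at 0). PyTorch therefore creates an all-zero tensor and, in the final loss computation, fills in 0 wherever target is not greater than 0, which keeps nan values out of the result.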
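The relationship between the two branches can be checked on single elements. The following is a minimal scalar sketch under the assumption that input is a log-probability; the function names kl_term_log_target and kl_term_non_log_target are illustrative and only mirror the structure described above, not PyTorch's actual code.

```python
import math

def kl_term_log_target(log_q, log_p):
    """Element term when target is given as a log-probability log_p."""
    return math.exp(log_p) * (log_p - log_q)

def kl_term_non_log_target(log_q, p):
    """Element term when target is a plain probability p; 0 where p <= 0."""
    if p > 0:
        return p * (math.log(p) - log_q)
    return 0.0

log_q = math.log(0.5)  # input: a log-probability from the model
a = kl_term_non_log_target(log_q, 0.25)
b = kl_term_log_target(log_q, math.log(0.25))
masked = kl_term_non_log_target(log_q, 0.0)  # log(0) is never evaluated
```

For a positive target the two branches agree up to rounding, and a zero target contributes exactly 0 instead of an invalid value.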

Below is the corresponding NumPy implementation:

import numpy as np

def np_kldivloss(input, target, log_target, reduction="mean"):
    if log_target:
        # target already holds log-probabilities
        output = np.exp(target) * (target - input)
    else:
        output_pos = target * (np.log(target) - input)
        zeros = np.zeros_like(input)
        # Keep the term only where target > 0; use 0 elsewhere to avoid nan
        output = np.where(target > 0, output_pos, zeros)
    if reduction == "mean":
        return np.mean(output)