Write $y_i = \text{softmax}(\textbf{x})_i = \frac{e^{x_i}}{\sum e^{x_d}}$.
That is, $\textbf{y}$ is the softmax of $\textbf{x}$. Softmax computes a normalized exponential of its input vector.
Next write $L = -\sum t_i \ln(y_i)$. This is the softmax cross entropy loss. $t_i$ is a 0/1 target representing whether the correct class is class $i$. We will compute the derivative of $L$ with respect to the inputs to the softmax function $\textbf{x}$.
We have $\frac{dL}{dx_j} = -\sum t_i \frac{1}{y_i} \frac{dy_i}{d{x_j}}$ from the chain rule.
We compute $\frac{dy_i}{dx_j}$ using the quotient rule.
If $i = j$, this gives:
$\frac{dy_i}{dx_j} = \frac{\sum e^{x_d} \cdot e^{x_i} - e^{x_i} \cdot e^{x_i}}{(\sum e^{x_d})^2}$
$\frac{dy_i}{dx_j} = \frac{e^{x_i}}{\sum e^{x_d}} \cdot \left(\frac{\sum e^{x_d} - e^{x_i}}{\sum e^{x_d}}\right)$
$\frac{dy_i}{dx_j} = y_i \cdot (1 - y_i)$
If $i \ne j$, this gives:
$\frac{dy_i}{dx_j} = \frac{\sum e^{x_d} \cdot 0 - e^{x_i} \cdot e^{x_j}}{(\sum e^{x_d})^2}$
$\frac{dy_i}{dx_j} = -\frac{e^{x_i}}{\sum e^{x_d}} \cdot \frac{e^{x_j}}{\sum e^{x_d}} $
$\frac{dy_i}{dx_j} = -y_i y_j$
Together these equations give us the derivative of the softmax function:
$\frac{dy_i}{dx_j} = \begin{cases} y_i \cdot (1 - y_i) & i=j \\\ -y_i y_j & i \ne j \end{cases}$
Using this result, we can finish computing the derivative of $L$. This gives:
$\frac{dL}{dx_j} = -\sum t_i \frac{1}{y_i} \frac{dy_i}{d{x_j}} = \sum\limits_i \begin{cases} t_i (y_i - 1) & i=j \\\ t_i y_j & i \ne j \end{cases}$
Since exactly one of the $t_i$s is 1 and the rest are zeros this further simplifies to:
$\frac{dL}{dx_j} = y_j - t_j$
We have computed the derivative of the softmax cross-entropy loss $L$ with respect to the inputs to the softmax function.
This page is an experiment in publishing directly from Roam Research. It is incomplete, and the formatting is probably all wonky. Bear with me while I get this sorted. Update (December 13th, 2020): The formatting looks good now!