What is the derivative of this formula?

Hi, I am trying to find the correct derivative of this formula with respect to x. How do you read the part in the red frame? It doesn't look to me like pure math I can put in a calculator. Also, do you have any idea what N really is?
It is the cross-entropy loss for machine learning: https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html

[Attachment 34720: screenshot of the cross-entropy loss formula from the PyTorch docs, with one part framed in red]
1) Is there a reason why you want the derivative?
2) The notation is slightly confusing ([imath]N[/imath] is just the batch size, i.e. the number of samples, [imath]n = 1, \dots, N[/imath]). Maybe this will make more sense:
[math]l_n = -w_{y_n}\log \frac{\exp(x_{n,y_n})}{\sum_{c=1}^{C} \exp(x_{n,c})}\cdot \mathbb{1}\{y_n \neq \text{ignore\_index}\}[/math]
e.g. if you specified ignore_index = 10, then samples whose target is index 10 are skipped.

3) [imath]w[/imath] is an optional per-class weight (it defaults to 1 for every class). One common choice is the proportion of each class in your data: for example, if you have 100 rows with classes A, B, and C, with 70, 10, and 20 rows respectively, the weights would be 0.7, 0.1, and 0.2.
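As for the derivative itself, here is a sketch, assuming no sample hits the ignored index. Write [imath]p_{n,c} = \exp(x_{n,c}) / \sum_{k=1}^{C}\exp(x_{n,k})[/imath] for the softmax probabilities, so that [imath]l_n = -w_{y_n}\log p_{n,y_n}[/imath]. The chain rule then gives, for each logit [imath]x_{n,c}[/imath],
[math]\frac{\partial l_n}{\partial x_{n,c}} = w_{y_n}\left(p_{n,c} - \mathbb{1}\{c = y_n\}\right)[/math]
i.e. the gradient is just the softmax output minus a one-hot encoding of the target, scaled by the class weight. If you average the [imath]l_n[/imath] over the batch (PyTorch's default 'mean' reduction divides by the sum of the weights [imath]\sum_n w_{y_n}[/imath]), divide the per-sample gradients accordingly.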

PS:
I find this YouTube video very helpful.
 
Hi, thank you. The reason I want the derivative is that I am creating my own machine learning library, and I need the gradients of all loss functions, activation functions, and layers.
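If it helps, here is a minimal NumPy sketch of that loss and its gradient, assuming integer class targets, uniform class weights, and mean reduction (the function names are just for illustration):
[code]
import numpy as np

def softmax(logits):
    """Row-wise softmax, shifted for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(logits, targets):
    """Mean cross-entropy loss.
    logits: (N, C) raw scores; targets: (N,) integer class indices."""
    n = logits.shape[0]
    probs = softmax(logits)
    return -np.log(probs[np.arange(n), targets]).mean()

def cross_entropy_grad(logits, targets):
    """Gradient of the mean loss w.r.t. the logits:
    (softmax(x) - one_hot(y)) / N, row by row."""
    n = logits.shape[0]
    grad = softmax(logits)
    grad[np.arange(n), targets] -= 1.0
    return grad / n

# Quick finite-difference check of a single gradient entry:
x = np.random.randn(4, 3)
y = np.array([0, 2, 1, 2])
eps = 1e-6
xp = x.copy()
xp[0, 1] += eps
numeric = (cross_entropy(xp, y) - cross_entropy(x, y)) / eps
print(numeric, cross_entropy_grad(x, y)[0, 1])  # should agree closely
[/code]
A finite-difference check like the one at the end is a cheap way to validate every hand-written gradient in your library.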
 
In case you haven't already, you might want to read up on automatic differentiation, a technique heavily used for computing derivatives of composite functions, especially in machine learning.
 
Hi, what is that? Do you mean the chain rule, or something really automatic? I ask because I sometimes struggle to get the derivatives of some functions.
 
In a nutshell, it applies all the rules of differentiation, including the chain rule, mechanically to the sequence of operations your program performs. It's much simpler than symbolic differentiation (i.e., finding an explicit formula and then plugging in the variables' values), but has the same precision. If you google the term you'll get lots of hits. Personally, I learned about it from a section in this paper.
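To make "automatic" concrete, here is a minimal sketch of forward-mode automatic differentiation using dual numbers; everything here is illustrative, not from any particular library:
[code]
import math

class Dual:
    """A value paired with its derivative; arithmetic on Duals
    propagates derivatives using the ordinary rules of calculus."""
    def __init__(self, value, deriv=0.0):
        self.value, self.deriv = value, deriv

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Sum rule: (u + v)' = u' + v'
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__

def exp(x):
    # Chain rule: (e^u)' = e^u * u'
    return Dual(math.exp(x.value), math.exp(x.value) * x.deriv)

# Differentiate f(x) = x * exp(x) + 3x at x = 2, with no symbolic formula:
x = Dual(2.0, 1.0)       # seed dx/dx = 1
y = x * exp(x) + 3 * x
print(y.value, y.deriv)  # y.deriv == 3 * exp(2) + 3
[/code]
Note the derivative comes out exact (up to floating point), unlike finite differences, because each operation applies its differentiation rule exactly.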
 