Hi all,
I haven't touched calculus since my maths degree many years ago.
I need to calculate a gradient, to use in back-propagation in a neural network, but it's making my head hurt!
I have the formula
Code:
δ(t) = tanh[ <w, f(t)> + b + uδ(t−1) ]
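For concreteness, here's a minimal Python sketch of how I'm computing this forward pass (all the names here are just my placeholders, nothing standard):
Code:
import numpy as np

def forward(w, b, u, f_seq, delta0=0.0):
    # delta(t) = tanh(<w, f(t)> + b + u * delta(t-1))
    delta = delta0
    deltas = []
    for f_t in f_seq:                  # f_t: input vector at time step t
        delta = np.tanh(np.dot(w, f_t) + b + u * delta)
        deltas.append(delta)
    return deltas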
where t and t−1 denote successive time steps
and need to calculate
Code:
dδ(t)/dθ = ∂δ(t)/∂θ + ∂δ(t)/∂δ(t−1) * dδ(t−1)/dθ
where θ comprises the parameter set [w, b, u]; b and u are scalars, but w is a vector with multiple elements w1, w2, etc.
The derivative of tanh(x) is 1 - tanh^2(x), so I assume the partial derivative is
Code:
∂δ(t)/∂δ(t−1) = u (1 - tanh^2[ <w, f(t) > + b + uδ(t−1) ])
= u (1 - δ(t)^2)
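As a quick sanity check on that, a central finite difference agrees with u(1 - δ(t)^2) (throwaway script; the numbers are arbitrary):
Code:
import numpy as np

# finite-difference check of ∂δ(t)/∂δ(t−1) = u * (1 - δ(t)^2)
rng = np.random.default_rng(0)
w, f_t = rng.normal(size=3), rng.normal(size=3)
b, u = 0.3, 0.7
d_prev, eps = 0.2, 1e-6

step = lambda d: np.tanh(np.dot(w, f_t) + b + u * d)
fd = (step(d_prev + eps) - step(d_prev - eps)) / (2 * eps)
analytic = u * (1 - step(d_prev)**2)
print(fd, analytic)    # these should match to ~1e-9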
The final term, dδ(t−1)/dθ, comes from recursively calculating the gradient, so I'm happy with that.
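Concretely, I picture the recursion accumulating like this, with the direct term left as a stub, since that stub is exactly the piece I can't pin down (direct_partial is a hypothetical helper, not real code):
Code:
import numpy as np

def total_grad(w, b, u, f_seq, direct_partial, delta0=0.0):
    # accumulates dδ(t)/dθ = ∂δ(t)/∂θ + ∂δ(t)/∂δ(t−1) * dδ(t−1)/dθ
    # with θ = [w1, ..., wn, b, u], the gradient has len(w) + 2 entries
    grad = np.zeros(len(w) + 2)            # dδ(0)/dθ taken as zero
    delta = delta0
    for f_t in f_seq:
        delta_prev = delta
        delta = np.tanh(np.dot(w, f_t) + b + u * delta_prev)
        # direct_partial(...) should return ∂δ(t)/∂θ -- the open question
        grad = (direct_partial(w, b, u, f_t, delta_prev, delta)
                + u * (1 - delta**2) * grad)
    return grad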
I'm less sure about the first term, ∂δ(t)/∂θ, given that θ decomposes into its constituent elements.
Any ideas?!