backprop-2.md
Picking up where I left off: deriving Deep Learning without touching a single index
Eight months after the last post on deriving backpropagation, I finally found some time to write the sequel. A lot has happened in those eight months, but what matters is that the post is finally here.
In the last post, I defined derivatives on normed vector spaces and used that machinery to find the derivative of one of the most important functions: the Frobenius norm. The heart of our approach is exploiting linearity to pass the derivative operator through functions like the trace, ending up with a clean expression for the derivative of complicated formulas.
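As a quick refresher, here is a minimal sketch of that style of computation, using the squared Frobenius norm for brevity (the exact form in the last post may differ slightly). Writing the norm as a trace and letting the differential slide through by linearity:

$$
\|A\|_F^2 = \operatorname{tr}(A^\top A),
\qquad
d\,\|A\|_F^2 = \operatorname{tr}(dA^\top\, A) + \operatorname{tr}(A^\top\, dA) = 2\,\operatorname{tr}(A^\top\, dA),
$$

so the derivative of $\|\cdot\|_F^2$ at $A$ is the linear map $H \mapsto 2\operatorname{tr}(A^\top H)$, i.e. the gradient is $2A$, and not a single index appears along the way.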