Gradient Descent, Demystified

2:45pm - 3:15pm on Friday, October 5 in PennTop South

Michael (Stu) Stewart

Audience Level:
All
Slides:
https://github.com/mstewart141/GradientDescentTalk_ODSC_Conference/blob/master/GradientDescentPresentationCOMPLETE_ODSC.ipynb
Watch:
https://youtu.be/Q_dd-Chslt0

Overview

I walk through a live coding practicum (in a RISE Jupyter Notebook slideshow) in which I implement an initial gradient descent algorithm for logistic & linear regression, demonstrating the flexibility of the optimization technique & the decidedly un-scary code required to get our prototype up & running.

Description

Gradient descent (GD) is a fundamental optimization algorithm that sounds much scarier than it is. Many users of Scikit-learn and similar libraries can apply GD through these tools, but do not grok what GD is really doing. Other, more engineering-oriented practitioners are put off entirely by the seeming complexity. I walk through a live coding practicum (in a RISE Jupyter Notebook slideshow) in which I implement an initial gradient descent algorithm for logistic and linear regression, demonstrating the flexibility of the optimization technique and the decidedly un-scary code required to get our prototype up and running. I compare the results of our hand-coded algorithm to those generated by Scikit-learn (and by the closed-form normal equation, for linear regression) and show that they match.
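To give a flavor of the kind of code the practicum covers, here is a minimal sketch of batch gradient descent for linear regression, checked against the closed-form normal equation. This is not the talk's actual notebook code; the synthetic data, learning rate, and iteration count are assumptions chosen for the example.

    # Minimal sketch: batch gradient descent for linear regression vs. the normal equation.
    import numpy as np

    rng = np.random.default_rng(0)

    # Synthetic data: y = 3 + 2*x + noise, with a bias column prepended to X.
    n = 200
    X = np.column_stack([np.ones(n), rng.uniform(-1, 1, n)])
    y = X @ np.array([3.0, 2.0]) + rng.normal(scale=0.1, size=n)

    def gradient_descent(X, y, lr=0.1, n_iter=5000):
        """Minimize the mean squared error loss by repeated gradient steps."""
        theta = np.zeros(X.shape[1])
        for _ in range(n_iter):
            grad = X.T @ (X @ theta - y) / len(y)  # gradient of (1/2n) * ||X theta - y||^2
            theta -= lr * grad                     # step opposite the gradient
        return theta

    theta_gd = gradient_descent(X, y)

    # Closed-form normal equation: theta = (X^T X)^{-1} X^T y
    theta_ne = np.linalg.solve(X.T @ X, X.T @ y)

    print(theta_gd, theta_ne)  # both should be close to [3.0, 2.0]

On a convex problem like this, a modest learning rate and enough iterations bring the hand-coded result into agreement with the closed-form solution to several decimal places.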

The focus of this talk is the practicum of implementing one's own GD algorithm, though I review the most important mathematical and theoretical components of GD to ground the practicum for attendees. The mathematical review touches on the nature of gradients: what they are, how they relate to derivatives, and how they enable iterative optimization over a parameter space. This talk does not include a formal derivation of, e.g., the loss functions used in linear and logistic regression, nor does it require mastery of calculus. Attendees will leave the talk with a better understanding of iterative optimization and a template of their own for implementing GD in Python, should they feel this would enrich their understanding.
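The iterative update the review builds toward can be stated in one line, using generic notation (parameters \theta, loss J, learning rate \alpha) rather than the talk's exact slides:

    \theta_{t+1} = \theta_t - \alpha \, \nabla_{\theta} J(\theta_t)

Each iteration moves the parameters a small step opposite the gradient, i.e. in the direction of steepest local decrease of the loss.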

To underscore that gradient descent isn't scary (and also for fun!), I tweet a 257-character working gradient descent method at the end of the talk.
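For a sense of how compact such a method can be (this is illustrative only, not the actual tweeted code), a bare-bones version fits in a few lines:

    # Compact batch gradient descent for linear regression (illustrative, not the tweet).
    # Assumes numpy arrays X (with a bias column) and y.
    import numpy as np
    def gd(X, y, lr=0.1, n=5000):
        t = np.zeros(X.shape[1])
        for _ in range(n):
            t -= lr * X.T @ (X @ t - y) / len(y)
        return t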
