The Sum of Nothing

1:00pm - 1:30pm on Saturday, October 6 in Madison

Christine Zhang

Audience Level:
All
Slides:
https://github.com/underthecurve/sum-of-nothing
Watch:
https://youtu.be/35cdJR4hjxE

Overview

1 + 1 = 2. How about 1 + NaN? Or NaN + NaN? Or NaN x NaN?

The answers, when evaluated in pandas, have changed over time. I’ll take you on a journey through the fun-filled history of pandas’ development, the wacky world of math, and how the two work together (or don’t) when it comes to null values.

Description

The release of pandas version 0.22.0 in December 2017 introduced several major changes. As someone who works with missing data quite a lot, I was particularly confused and somewhat dismayed by its “new” treatment of NaNs (“null values”). Specifically, the sum of an empty or all-NaN Series is now 0, and the product is now 1.
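
A minimal sketch of the behavior in question, assuming pandas 0.22.0 or later. The variable names are illustrative and not taken from the talk; the `min_count` argument is the option pandas provides for getting NaN back.

```python
import numpy as np
import pandas as pd

# Illustrative data: one Series with only missing values, one with no values at all.
all_nan = pd.Series([np.nan, np.nan])
empty = pd.Series([], dtype="float64")

# Since pandas 0.22.0, NaNs are skipped and the "sum of nothing" is 0,
# while the "product of nothing" is 1.
print(all_nan.sum())   # 0.0
print(all_nan.prod())  # 1.0
print(empty.sum())     # 0.0
print(empty.prod())    # 1.0

# min_count=1 requires at least one non-NaN value, otherwise the result is NaN.
print(all_nan.sum(min_count=1))   # nan
print(all_nan.prod(min_count=1))  # nan
```

Passing `min_count=1` is one way to recover the earlier NaN result when an all-missing input should stay missing.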

In the previous version, these values were NaN, which I thought was the “right” way to do things. After all, how can the sum (or product) of nothing turn into something? I went on a journey (or maybe the proper term is “rabbit hole”) to explore this question, going through historical GitHub issue logs and pandas-dev mailing list messages, and even contacting a core pandas developer and looking up how other programming languages like R handle the same issue.

I learned that really, it all just comes down to math.

In this talk, I’ll make the case that while the current behavior is mathematically consistent, it is often counterintuitive. Because who says math is supposed to make sense at first glance?
