The Sum of Nothing
1:00pm - 1:30pm on Saturday, October 6 in Madison
1 + 1 = 2. How about 1 + NaN? Or NaN + NaN? Or NaN x NaN?
The answers, when evaluated in pandas, have changed over time. I’ll take you on a journey through the fun-filled history of pandas’ development, the wacky world of math, and how the two work together (or don’t) when it comes to null values.
The release of pandas version 0.22.0 in December 2017 introduced several major changes. As someone who works with missing data quite a lot, I was particularly confused and somewhat dismayed by its “new” treatment of NaNs (“null values”). Specifically:
- the sum of a series of NaNs was now 0
- the product of a series of NaNs was now 1
In the previous version, these values were NaN, which I thought was the “right” way to do things. After all, how can the sum (or product) of nothing turn into something? I went on a journey (or maybe the proper term is “rabbit hole”) to explore this question: going through historical GitHub issue logs and pandas-dev mailing list messages, contacting a core pandas developer, and looking up how other programming languages like R handle the same issue.
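The behavior described above can be reproduced with a minimal sketch like the following. It assumes pandas 0.22.0 or later is installed; the variable names are illustrative only. The `min_count` argument shown at the end is the option pandas provides for getting NaN back when there are too few non-null values.

```python
import pandas as pd

# A series containing nothing but missing values.
all_nan = pd.Series([float("nan"), float("nan")])

# By default NaNs are skipped, so these reduce over "nothing".
nan_sum = all_nan.sum()      # 0.0 in pandas 0.22.0 and later
nan_product = all_nan.prod() # 1.0 in pandas 0.22.0 and later

# Requiring at least one non-null value brings back the NaN result.
strict_sum = all_nan.sum(min_count=1)
strict_product = all_nan.prod(min_count=1)

print(nan_sum, nan_product, strict_sum, strict_product)
```

Passing `min_count=1` is one way to keep the older "nothing in, NaN out" result in code that depends on it.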
I learned that really, it all just comes down to math.
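One way to see the math is through plain Python, with no pandas involved. The sketch below (assuming Python 3.8 or later for `math.prod`; the sample values are made up) checks that splitting a list at any point and combining the partial results matches reducing the whole list. That only holds at the edges, where one side is empty, if the empty sum is 0 and the empty product is 1.

```python
import math

values = [2.0, 3.0, 4.0]

# Splitting a list anywhere and combining the partial results should
# give the same answer as reducing the whole list at once. When split
# is 0 or len(values), one of the two pieces is empty.
for split in range(len(values) + 1):
    left, right = values[:split], values[split:]
    assert sum(left) + sum(right) == sum(values)
    assert math.prod(left) * math.prod(right) == math.prod(values)

# The identity elements: adding 0 or multiplying by 1 changes nothing.
empty_sum = sum([])
empty_product = math.prod([])
print(empty_sum, empty_product)  # 0 1
```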
In this talk, I’ll make the case that while the current behavior is mathematically consistent, it is often counterintuitive. Because who says math is supposed to make sense at first glance?