With this post we commence our reading of Cathy O’Neil’s Weapons of Math Destruction. (If you’d like to catch up with the reading schedule, click here.)
Here I’ll summarize this week’s chapters, then offer some discussion questions.
But first, some book club business. It’s great to see that a number of people have expressed a desire to read along, on Twitter and on their blogs.
If you’d like further resources about this book, EconTalk has a fine interview with O’Neil (thanks to Bob Calder). The excellent librarian (and crime novel scholar) Barbara Fister published a thoughtful review at Inside Higher Ed. Chris Newfield (a Future Trends Forum guest) co-authored a review essay that includes WMD.
Introduction

Here O’Neil introduces herself and the book’s major themes. On the autobiographical side, she describes her childhood love of numbers, an academic career leading to a tenure-track position, a big leap to a Wall Street hedge fund, work at an ecommerce startup, blogging, and joining Occupy Wall Street.
The book’s major themes concern “the dark side of Big Data” (13). A story introduces them: Washington, D.C. schoolteachers fired on the basis of an algorithm’s findings during Michelle Rhee’s tenure as chancellor. O’Neil uses this as a cautionary tale about how bad data analytics, the titular Weapons of Math Destruction, can backfire and cause human suffering. The case displays several hallmarks of a WMD:
- A WMD enters a feedback loop whereby its results are trusted because the software itself appears to confirm them (7);
- A WMD “punishes the poor,” while the wealthiest tend to receive personal, human attention;
- The algorithm cannot be examined publicly, since it is kept secret: “the model itself is a black box” (8);
- Pushing back against an algorithm’s verdict is far more difficult than being condemned by one (10).
The proliferation of WMDs poses a major ethical challenge for data scientists, who “all too often lose sight of the folks on the receiving end of the [data analytical] transaction” (12).
Chapter 1, “Bomb Parts: What is a Model?”
This chapter explores data models (defined as “nothing more than an abstract representation of some process”, 18; also “opinions embedded in mathematics”, 21) through three examples. First, Moneyball (Michael Lewis, 2003), a data analytical approach that “represents a healthy case study” of applied data science (15-19). Second, the author’s mental model for preparing meals for her family, which is workable but not scalable (19-22). Third, the story of predictive sentencing software, which embodies racism (23-30). All three rely on mathematics and its aura of objectivity.
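O’Neil’s point that a model is “opinions embedded in mathematics” can be made concrete with a small sketch. The code below is my own illustration, not anything from the book: the features and weights are invented, and the meal-scoring function stands in for O’Neil’s informal family-dinner model. Two modelers scoring the same meal with different weights reach different verdicts, because the weights encode their opinions.

```python
def meal_score(meal, weights=None):
    """A 'model' in O'Neil's sense: an abstract representation of a process,
    with opinions embedded as weights. Features and default weights here are
    hypothetical, chosen purely for illustration."""
    if weights is None:
        weights = {"nutrition": 0.5, "cost": 0.3, "prep_time": 0.2}
    # Each feature is scored 0..1 (higher = better, e.g. 'cost' means
    # affordability); the weights state how much the modeler cares.
    return sum(weights[k] * meal.get(k, 0.0) for k in weights)

# Two different modelers rank the same meal differently:
meal = {"nutrition": 0.9, "cost": 0.4, "prep_time": 0.2}
frugal_parent = meal_score(meal, {"nutrition": 0.2, "cost": 0.7, "prep_time": 0.1})
health_parent = meal_score(meal, {"nutrition": 0.8, "cost": 0.1, "prep_time": 0.1})
```

The math itself is trivial; the judgment lives entirely in the weights, which is exactly why O’Neil insists that models are opinions rather than neutral facts.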
The chapter uses these cases to restate the introduction’s criteria for identifying a WMD:
- Transparency. The Moneyball model is based on publicly accessible data, while the prison sentencing data is hard to get and the analytics closely guarded.
- Statistical rigor. The data set must be sufficiently large and relevant.
- A learning curve. New data gets fed into the system, which adjusts itself accordingly.
- Damage. How many people are hurt by the system?
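The feedback-loop and damage criteria are easier to see in a toy simulation. The sketch below is my own illustration, not a model from the book: a scoring system sees only a noisy proxy of each person’s real situation, flags the lowest scorers, and the flag itself worsens their outcomes, so later rounds appear to confirm the original verdict.

```python
import random

def simulate_feedback_loop(rounds=5, n=100, flag_frac=0.1, penalty=0.5, seed=42):
    """Toy sketch of a WMD-style feedback loop (an invented illustration,
    not O'Neil's model): each round a scoring model observes a noisy proxy
    of true quality, flags the bottom scorers, and the flag itself degrades
    their future outcomes -- making the model's verdicts self-confirming."""
    rng = random.Random(seed)
    quality = [rng.gauss(0.0, 1.0) for _ in range(n)]  # true, unobserved
    flags = [0] * n  # how many times each person has been flagged
    for _ in range(rounds):
        scores = [q + rng.gauss(0.0, 1.0) for q in quality]  # noisy proxy
        bottom = sorted(range(n), key=lambda i: scores[i])[: int(n * flag_frac)]
        for i in bottom:
            flags[i] += 1
            quality[i] -= penalty  # the punishment worsens real outcomes
    return flags

if __name__ == "__main__":
    flags = simulate_feedback_loop()
    flagged = sum(1 for f in flags if f >= 1)
    repeats = sum(1 for f in flags if f >= 2)
    print(f"{flagged} people flagged at least once; {repeats} flagged repeatedly")
```

Running this shows the same people being flagged round after round: once penalized, their real outcomes decline, which the model then reads as evidence it was right all along. That is the dynamic O’Neil describes with the fired D.C. teachers.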
Discussion questions

1. What does the model of algorithms presented so far tell us about social media?
2. Have you had experiences with big data that O’Neil’s account illuminates?
3. What would it take for an education algorithm to meet all of O’Neil’s criteria for not doing damage?
4. What are the best ways to address the problem of “false positives”: exceptionally bad results and anomalies?