Descriptive Statistics and Deceptive Description

Teachers are getting a bad rap these days. To put it in perspective, my own mother, to whom I am and shall always remain eternally grateful, expressed her annoyance at the fact that Ontario teachers were, yet again, going on strike. (I will add that she is also very supportive of her own daughter's choice of teaching as a profession.) Like others, she felt that the strikes were an unnecessary waste of time, making teachers appear selfish and lazy. When I asked what information she had to support her claim, the figure “100K” came up in conversation. WHAT? How much are teachers making a year?

$100 000. One hundred THOUSAND Canadian dollars. The supposed “average” salary Ontario teachers make a year.

Reported source? “The government.”

Had it not been for the fact that a) my mother has a tendency to exaggerate the truth and b) I am a teacher myself, I might have been inclined to accept her claim. To add a bit of context: I have been teaching internationally for five years and making nowhere near that figure. To earn 100K a year, I would need a master's degree and an additional fifteen years of full-time experience.

Let’s Talk About Averages

“Average” is a misleading term; it can refer to the mean, median or mode. In statistics, we call these “measures of central tendency.” Let me borrow an example from Wheelan’s book (Naked Statistics) to make a point.

Suppose five people are at a bar, each earning a salary of $35k a year. Undisputedly, the average salary of the group (by any measure) would be $35k. Typically, when we hear the word average, we equate it with the mean, which is the sum of all the values in a data set divided by the number of values in the set.

Suppose Bill Gates walks into the bar with a salary of $1 billion a year, bringing the average (mean) salary to roughly $167 million. The reported figure, while still accurate, is not a fair representation of what most people in the group earn.

In this case, knowing the median (the middle value when all values are arranged from smallest to greatest), which remains $35k, provides a bit of context. After all, the difference between 35 thousand and 167 million is no small sum.
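
To make the arithmetic concrete, here is a minimal sketch in Python using the figures from the bar example. The variable and function names are my own, and the salaries are the illustrative ones above, not real data.

```python
from statistics import mean, median

# Illustrative figures from the bar example (not real data).
patrons = [35_000] * 5          # five people earning $35k each
bill_gates = 1_000_000_000      # assumed $1 billion salary


def summarize(salaries):
    """Return the mean and median of a list of salaries."""
    return {"mean": mean(salaries), "median": median(salaries)}


before = summarize(patrons)
after = summarize(patrons + [bill_gates])

print("Before Gates:", before)
print("After Gates: ", after)
```

The mean jumps to roughly $167 million once the sixth salary is added, while the median does not move from $35k. That is why the median is often the fairer "average" when a data set contains an extreme value.
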
This is a classic example of how precision can mask accuracy. Think about any time you’ve heard a number or figure reported in the news. Consider the following statements, for instance:

Statement 1:  “99% of statistics are made up” (Ha!)

Statement 2: “I have here in my hand a list of 205, a list of names that were made known to the Secretary of State as being members of the Communist Party and who nevertheless are still working and shaping policy in the State Department” – Joseph McCarthy, former US senator (1950)

Don’t these numbers seem to lend credibility to whatever claim the person or organization is trying to assert? The first statement is, of course, made up. As for the second statement, it turns out that the paper had no names on it at all. Statistics is a tool that helps us bring meaning to data, but it can be abused for nefarious purposes if wielded irresponsibly.

We should be cautious

While math may be infallible, we are not. No matter how convincing the data may be, there is always more than one way to interpret it. It’s a little like telling your friends and family that the guy you just met “has a great personality,” which almost always implies that there is some other flaw or red flag that has gone unmentioned (Wheelan 37).

So, back to this 100k salary I’m supposed to be making… How did they get this data? What are the demographics of the teachers being surveyed? (It makes a difference if the majority of teachers working full-time in Ontario have at least 15 years of experience under their belt.) Are they including retired teachers? Teachers who have recently been laid off?

I tried to trace the origins of this 100k figure. After a bit of digging, I think it’s likely that my mother misreported a figure she heard from sources that were themselves giving out misinformation.


[NOTE FROM THE AUTHOR: I purchased Naked Statistics by Charles Wheelan many years ago, thinking it’s an important book to add to any Math Teacher’s arsenal (and it is!) but had only gotten through the first three chapters before setting it aside for another read. It is not a boring book – quite the opposite in fact – but I felt that mere passive reading was not enough for me to really retain the important ideas and intuition that Wheelan is trying to impart to his readers. This time, I’m giving it another chance and plan to summarize the material I am learning, relate it to my own experiences, and share that learning here on my blog.]

Ch 1. What’s the Point?

What is the point? The point is not to do math, or to dazzle friends and colleagues with advanced statistical techniques. The point is to learn things that inform our lives.

– Charles Wheelan

I wrote about why statistics matters in a previous post. Here, I continue to elaborate on the point as I summarize my biggest takeaways from the first chapter. This chapter provided an overview of big ideas in statistics that we’ll be learning about throughout the book. 

Description and Comparison 
Descriptive statistics is like creating a zip file: it takes a large amount of information and compresses it into a single figure. This figure can be informative and yet completely stripped of nuance. As with any statistical tool, we must be careful about how and when we employ such figures and about the impression they might leave on the audience.

So a descriptive statistic is a summary statistic. Let’s start with one that many of you may already be familiar with – GPA. Let’s say a student graduates from university with a GPA of 3.9. What can we make of this? Well, we might be able to discern that on a scale from 0 – 4.0, a GPA of 3.9 is pretty darn high. But some universities grade on a scale of 0 – 4.3, accounting for a grade of A+. What this simple statistic doesn’t tell us is which program the student graduated from. Which school did they attend? Did they take courses that were relatively easy or difficult? How does this grade compare with others in the same program? Wheelan writes, “Descriptive statistics exist to simplify, which implies some loss of nuance or detail” (6).
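
As a rough illustration of how much the scale alone matters, the sketch below expresses the same 3.9 as a fraction of the maximum on each scale. Treating GPA as a simple fraction of the maximum is my own simplifying assumption, not how any particular university converts grades, and `gpa_fraction` is a hypothetical helper.

```python
def gpa_fraction(gpa, scale_max):
    """Return a GPA as a fraction of the maximum on its scale.

    Assumption: a plain linear ratio, used only for illustration.
    """
    if not 0 <= gpa <= scale_max:
        raise ValueError("GPA must lie between 0 and the scale maximum")
    return gpa / scale_max


on_4_0 = gpa_fraction(3.9, 4.0)  # 3.9 on a 0 to 4.0 scale
on_4_3 = gpa_fraction(3.9, 4.3)  # 3.9 on a 0 to 4.3 scale

print(f"On a 4.0 scale: {on_4_0:.1%}")
print(f"On a 4.3 scale: {on_4_3:.1%}")
```

The same number sits at about 97.5% of the maximum on one scale and about 90.7% on the other, and that is before asking anything about the program, the school, or the difficulty of the courses.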

Inference 
We can use statistics to draw conclusions about the “unknown world”  from the “known world.” More on that later. 

Assessing Risk and Other Probability Related Events
Examples here include using probability to predict stock market changes, car crashes or house fires (think insurance companies), or to catch cheating on standardized tests.

Identifying Important Relationships
Wheelan describes the work of identifying important relationships as “Statistical Detective Work,” which is as much an art as it is a science. That is, two statisticians may look at the same data set and draw different conclusions from it. Let’s say you were asked to determine whether or not smoking causes cancer. How would you do it? Ethically speaking, running controlled experiments on people may prove unfeasible, for obvious reasons.

An example Wheelan offers here goes something like this:
Let’s say you decide to take a few shortcuts and, rather than spending time and energy looking for a random sample, you survey the people at your 20th high school reunion and look at cancer rates among those who have smoked since graduation. The problem is that factors other than smoking behaviour may distinguish smokers from nonsmokers. For example, smokers may tend to have other habits, like drinking or eating poorly, that affect their health. Smokers who are ill from cancer are also less likely to show up at high school reunions. Thus, the conclusions you draw from such a data set may not be adequate to properly answer your question.

In short, statistics offers a way to bring meaning to raw data (or information). More specifically, it can also help with the following:

  • To summarize huge quantities of data
  • To make better decisions
  • To recognize patterns that can refine how we do everything from selling diapers to catching criminals
  • To catch cheaters and prosecute criminals 
  • To evaluate the effectiveness of policies, programs, drugs, medical procedures, and other innovations
  • To spot the scoundrels who use these very same powerful tools for nefarious ends 

(Wheelan 14)

Lies, damned lies, and statistics.

 – Mark Twain

Why We Should Care About Statistics

It’s easy to lie with statistics, but it’s hard to tell the truth without them.

A couple of days ago, my younger brother, who started his first year of university in the fall, was complaining to me about the woes of student life; in particular, the obsession with grades and the paradoxical lack of willpower to work for them. A friend who had taken an accounting class with him described his mark as “the sketchiest 90 I ever received.” Let’s break that down for a moment. Humble brag? Yes, but what the friend really meant was that he had been blindly memorizing formulas, plugging and chugging without any idea how they were derived or why they are meaningful.

Does that sound familiar? How many of you have had similar experiences in math class? I know I have. Not just in math, but in science, language arts, history… sometimes it feels like we are just memorizing facts in isolation without an understanding of their greater purpose. To be fair, I’ve taken statistics classes that felt no different: a series of formulas to be applied to raw data. What makes statistics inherently different, however, is that unlike calculus or algebra courses, which often teach skills in isolation from their applications (skills I would argue have intrinsic value of their own, a topic for another post perhaps), statistics IS applied mathematics. Every formula, number, distribution, and test is meant to clarify and add meaning to everyday phenomena (though, when wielded improperly, it can have the opposite effect).

Statistics are everywhere, from rankings of the most influential YouTubers to presidential polling to free throw percentages. What I love about this book is that it focuses on building intuition and making statistics accessible to the everyday reader. The author shares a quote from Andrejs Dunkels: “It’s easy to lie with statistics, but it’s hard to tell the truth without them.”