
Exercise: Chart Selection

Lesson 24 from: Data Storytelling: Deliver Insights via Compelling Stories

Bill Shander



It's time for another exercise. But before we begin, let me just say a little bit about charts. There are all kinds of charts, from basic charts we see and use all the time, like bars, lines and pies, to slightly more complex charts like scatter plots, stacked bars and cumulative area charts, all the way to much more advanced charts like violin plots, box plots and more. I'm not here to teach you about different chart types today, but it's important that you learn more about those, too. For now, there are good resources and tools to help you see different chart types, with descriptions of what they're for and even filters to help you figure out what the different chart types do best. One great example of this is the Data Viz Project, which you should definitely check out.

Okay, now in this exercise I really want you to think about what chart is the best one to communicate the most important information in the data you're sharing. This isn't a quiz exactly, but the format is sort of quiz-like. I'm going to show you charts side by side, and I'll explain the data a bit and then explain what I want to emphasize for my audience. Then I just want you to decide for yourself which chart does a better job communicating the point to be made. Got it?

Okay, let's do the first one. We'll do a couple of these, and we're going to use the same data for both. I'm using the flying etiquette dataset mentioned earlier in the course. It's on GitHub, so I downloaded it, took a look, and created the following charts. Here we have three charts looking at the height of survey respondents and their opinions about how rude it is when someone reclines their airplane seat. You can imagine my hypothesis: that there is a correlation between the height of the passenger and their likelihood to think it's rude when someone reclines their seat. Which of these charts does a better job communicating that correlation, or lack thereof? Pause the video and think about it for a second, then start me up again when you have a guess in mind.

Okay, what do you think? I would argue that while you can see what correlation there is between height and reclining seat etiquette opinions in the first and third charts, the 100% stacked bar, the third one, is probably a better choice in this case. This actually flies in the face of an important research finding that says scatter plots do the best job showing correlation. And the top chart here is pretty much a scatter plot with only three values on the y axis, which is why all the dots are in perfect rows. But here's why that chart doesn't really show correlation very well. As it turns out, what we're really seeing here is that we don't have a lot of people at the high and low ends of the height spectrum in our dataset, which is also evident in the second chart, which is a standard stacked bar chart. The dots are all small for the tallest people in all the categories because there are so few of them. So it's hard to tell if there is a correlation.
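
The difference between a standard stacked bar (raw counts) and a 100% stacked bar (within-group percentages) can be sketched in a few lines of Python. This is only an illustration: the group labels, answer labels, and every number below are made up and are not taken from the flying etiquette dataset.

```python
from collections import Counter, defaultdict

# Made-up rows for illustration only (NOT the real survey data).
# Each row is (height_group, answer); labels are hypothetical.
responses = (
    [("average", "no")] * 60
    + [("average", "somewhat")] * 30
    + [("average", "very")] * 10
    + [("tall", "no")] * 2
    + [("tall", "somewhat")] * 1
    + [("tall", "very")] * 1
)


def counts_by_group(rows):
    """Raw counts per group: what a standard stacked bar plots."""
    table = defaultdict(Counter)
    for group, answer in rows:
        table[group][answer] += 1
    return table


def shares_by_group(rows):
    """Within-group percentages: what a 100% stacked bar plots."""
    shares = {}
    for group, counter in counts_by_group(rows).items():
        total = sum(counter.values())
        shares[group] = {answer: 100 * n / total for answer, n in counter.items()}
    return shares


counts = counts_by_group(responses)
shares = shares_by_group(responses)
for group in counts:
    n = sum(counts[group].values())
    rude = shares[group].get("somewhat", 0) + shares[group].get("very", 0)
    print(f"{group}: n={n}, {rude:.0f}% find reclining at least somewhat rude")
```

In this toy data the "tall" group shows a higher percentage than the "average" group, but it rests on only four respondents. The percentage view hides that, and the count view makes it obvious.
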
Now, the 100% stacked bars, the bottom ones, do show the percentage of people of each height who feel one way or the other, and clearly about half of people over six foot five think it's rude to recline an airplane seat. Unfortunately, if you look at the second chart, you can see that we have very few people who are that tall. So while the third chart does a better job showing the correlation, the second one reveals a major sample size problem. And in fact, the correlation is even fuzzier, because while the tallest two groups certainly do have a bigger issue with reclining than almost everyone else, you can see that some of the shortest people are also at least mildly irritated by reclining seats. So I'd say you'd be better off showing the bubble chart and not committing to the hypothesis about height and reclining seat rudeness originally proposed. So people generally don't seem that annoyed by reclining seats overall, though a decent chunk do find it at least somewhat rude.

Let's do another one. What if I just wanted to see which respondent profile attribute was most closely tied to crankiness? In other words, which group is most likely to think other people are being rude? First, let me explain what I've done with the data here. I've literally just converted all of the rudeness scores for a variety of questions into zeros, ones or twos: zero for when they answered no, it's not rude at all; one for when they said yes, it's somewhat rude; and two for yes, very rude. Then I created an overall crankiness score, which you can see highlighted here, which is just the sum of each person's answers. So a high number means that person is pretty cranky. Then I charted those average crankiness scores against gender, age and height. Which of these three approaches does the best job of telling us which group is the crankiest? Pause me and think about it for a bit.

This one was kind of a trick question. I don't know that you can say that one is better than the other necessarily.
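
The crankiness score described above (0, 1 or 2 per question, summed per person, then averaged per group) can be sketched roughly like this. The respondents, question names, and answer labels are hypothetical stand-ins; the real dataset's column names and wording may differ.

```python
# Score mapping following the 0/1/2 scheme described in the lesson.
# The answer labels themselves are assumed for illustration.
SCORES = {"no": 0, "somewhat": 1, "very": 2}

# Hypothetical respondents with made-up question names and answers.
respondents = [
    {"gender": "male", "answers": {"recline": "very", "baby": "somewhat", "wake_up": "no"}},
    {"gender": "male", "answers": {"recline": "no", "baby": "no", "wake_up": "somewhat"}},
    {"gender": "female", "answers": {"recline": "somewhat", "baby": "somewhat", "wake_up": "no"}},
    {"gender": "female", "answers": {"recline": "no", "baby": "no", "wake_up": "no"}},
]


def crankiness(answers):
    """Sum of a single person's rudeness scores across all questions."""
    return sum(SCORES[answer] for answer in answers.values())


def average_crankiness(people, attribute):
    """Mean crankiness score for each value of a profile attribute."""
    totals, sizes = {}, {}
    for person in people:
        key = person[attribute]
        totals[key] = totals.get(key, 0) + crankiness(person["answers"])
        sizes[key] = sizes.get(key, 0) + 1
    return {key: totals[key] / sizes[key] for key in totals}


print(average_crankiness(respondents, "gender"))
```

The same `average_crankiness` helper could be called with an age or height attribute, assuming each respondent record carries one, to produce the values plotted in each chart.
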
It really depends on a more nuanced question. If I was asking something like, are men crankier than high school graduates, you would need to make sure the scales of all of the charts are the same. So the blue charts would not be a good choice, not to mention that the scales on those blue charts don't start at zero. Look at how drastic the difference appears between men and women. But the numbers really aren't that far apart. Given those two problems, we can eliminate the blue charts as the worst of the three.

But what about the other two? Which is better? It depends on the question. If you wanted to compare all the groups to really say which group is the crankiest, then the green set is best because they all share the same scale. So you can clearly see that the taller people are the crankiest. However, if you wanted to look individually at gender or age or height, one group at a time, then the purple set is better because you can more easily distinguish the differences within each group.

All right, one last question. What's the most surprising finding in this dataset, based on this second set of visuals that I've created? Take a look. I think everyone can come up with a different answer here. But my quick answer is that I'm surprised to see how cranky the youngest age groups are. So I'd be curious to dig a little bit deeper into that data and try to analyze it a bit more. Which is why, as you've seen throughout the course, I've used that as my sort of solution option to some of the other exercises that we've been doing.
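
To see why an axis that doesn't start at zero exaggerates a difference, here is a small sketch that compares the drawn bar lengths under two baselines. The two averages and the truncated axis minimum are invented for illustration and are not the values from the lesson's charts.

```python
def drawn_ratio(a, b, axis_min=0.0):
    """Ratio of drawn bar lengths for values a and b when the axis starts at axis_min."""
    if min(a, b) <= axis_min:
        raise ValueError("axis_min must be below both values")
    return (a - axis_min) / (b - axis_min)


# Hypothetical average crankiness scores for two groups.
men, women = 4.2, 3.8

# With a zero baseline the bars differ by roughly 10 percent.
print(drawn_ratio(men, women))

# With the axis starting at 3.6, one bar is drawn about three times as long as the other.
print(drawn_ratio(men, women, axis_min=3.6))
```

The underlying numbers are identical in both cases; only the baseline changes how large the gap looks.
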
