I was working on my presentation at JMP one month later (and answering 5 interesting questions for them) and this tweet felt like a great starting point because, as I said to Lindsey, “becoming a data viz thinker” is not a common formulation. I ended up structuring my presentation around 12 ideas that could be relevant for this goal.
The presentation was yesterday, 26 October, and it was recorded, so I’ll add a link here as soon as it becomes available. Meanwhile, let me summarize those 12 ideas. Many of them can be found in my book, but not all:
I couldn’t care less about data visualization. I’m starting with a bang, but I really mean it: not everything needs to be visualized. There are often other methods of data exploration and communication, and they complement each other. That’s why in the Anscombe Quartet you need both the charts and the statistical metrics. If you have to make a chart, make it count. Don’t replace information overload with chart overload.
Data matters. The expression “data visualization” was carefully designed to make you think (counting the letters, 13 of its 17 belong to “visualization”) that you’ll spend more than 70% of your time designing cool “visualizations”, while in reality the opposite is true: you’ll spend most of your time minimizing errors, structuring the data, making sure the concepts are the right ones, and much more. Managers and clients often fail to understand how resource-intensive the task is. They think it happens by magic.
Perception and society matter. Being aware of internal mechanisms (the eye-brain system) and external mechanisms (social rules, corporate culture, peer pressure, audience profile) should impact how we communicate visually.
Data mapping and design. Creating new chart types is easy because we basically map data points to a 2D plane, and after that everything is design. Thinking at that level of abstraction is interesting not only because your communication can become more flexible but also because it helps when moving between tools.
Data is interpretation. From the moment you collect the data to the moment you read someone else’s chart, interpretation is always present. Torture the data to come up with multiple interpretations and points of view. Even Minard’s Napoleon March, in spite of all its variables, is an interpretation (one the Russians would probably disagree with). What makes a good chart is how good it is at saying what it wants to say. Among other things, this means it should be a good data pre-processing system that allows the brain to focus on higher-level tasks. But data visualization is not enough: you need the contextual knowledge to detect and interpret patterns.
Data visualization is a process. Not a linear one. Be aware of the questions you ask. They often reveal not only what you want to know but also what you actually know. Better questions mean better understanding. It’s interesting to have a classification of questions and see how they can be paired to chart types (better: chart designs). A pie chart with 50 slices is not necessarily bad: usually a visualization fails not because there are too many data points but because the author doesn’t understand the data or doesn’t care about the message.
Rules of engagement. Attracting people’s attention with decoration is lazy. There are other effective methods that should be considered first (the data itself, chart titles, avoiding defaults, self-interest…).
Aesthetics and emotions. Stephen Few and David McCandless. Nuff said.
Emotional tone. Define a subdued emotional framework for multiple charts, never The Crying Boy style. Match tone and data (fun with the Titanic data set?). Be aware of the addiction to sugary data visualization.
Complex simplicity. Simplicity is not minimalism or removing junk. Remove the irrelevant, minimize the accessory, adjust the necessary and add the useful.
Using color. Avoid clichés like the plague, and don’t use color just to prettify. Think of it as a stimulus that should be managed (intensity, function, symbolic meaning). For non-designers, the aesthetic dimension of color is an afterthought. Use a professionally designed color palette, never the default one.
Go beyond the single graph. Structured, matrix-style visualizations: small multiples, trellis displays. Animation as stacked small multiples. For free-form visualizations (dashboards, infographics), find a coherent narrative or visual landscape. Use Ben Shneiderman’s Visual Information-Seeking Mantra (overview first, zoom and filter, then details on demand). For the overview, use gateway charts (simple, perhaps playful charts like pies or gauges that can lead to more addictive and complex charts). Never use gateway charts by themselves. When exploring, focus + context is often better than filtering.
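Back to the first idea and the Anscombe Quartet: a quick way to see why you need both the charts and the statistics is to compute the statistics. This is a minimal Python sketch using only the standard library, with the four datasets as published by Francis Anscombe in 1973 (the function name summarize is my own). All four sets return practically the same means, variance, correlation and regression line.

```python
"""Summary statistics for Anscombe's quartet (standard library only)."""
from math import sqrt

# The four datasets from Anscombe (1973); sets I-III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Return mean x, mean y, sample variance of x, Pearson r, slope and intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx  # least-squares slope of y on x
    return {
        "mean_x": mx,
        "mean_y": my,
        "var_x": sxx / (n - 1),
        "r": sxy / sqrt(sxx * syy),
        "slope": slope,
        "intercept": my - slope * mx,
    }


if __name__ == "__main__":
    for name, (xs, ys) in QUARTET.items():
        s = summarize(xs, ys)
        print(f"{name:>3}: " + ", ".join(f"{k}={v:.3f}" for k, v in s.items()))
```

Every row prints nearly the same numbers (mean of x 9, mean of y about 7.50, r about 0.816, a fitted line close to y = 3 + 0.5x), yet a scatter plot of each set tells a very different story. That is the case for using both.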
So, this is a summary of my presentation on 26 October at SAS/JMP in London. I had a great time there and people were very nice. I had no previous contact with JMP or the people behind it, except Xan Gregg, with whom I chat from time to time on Twitter.
Full disclosure: I was paid for this presentation. At no time was I asked to talk about the product, and I have no financial motivation to do so. I will probably write about it in the future, just like I write about Excel, Tableau or PowerBI. If anything changes, I’ll disclose it as well.
My presentation at NTTS 2017 is titled An evaluation of data visualization practices of statistical institutes. I’m writing this post to share a few ideas with people not familiar with my work. Some of these ideas require context, and I won’t be able to provide it in my 15-minute slot. I’ll update the post if there are questions I’m unable to answer during the session.
When we have a small table, it’s OK to use hard numbers to communicate, and perhaps we can use a chart or two to illustrate them. When our table grows, we have to shift our analysis and communication from the individual data points to their relationships. For that to happen, charts need to move to center stage, and their design must change, along with their nature. Tables and charts switch roles: now we communicate with charts, and use a few hard numbers to illustrate.
The problem with most charts in publications from Eurostat and national statistical institutes is that, at their heart, they remain illustrations. More data was added, but their nature, purpose and design didn’t change much, so they became both less effective and less efficient. I present several examples of this chart-as-illustration perspective and possible alternatives.
Effectiveness and efficiency
How good a chart is at making invisible relationships visible defines its effectiveness (more on that later). How well it manages finite resources (page/screen real estate, color constraints) defines its efficiency. While we discuss effectiveness all the time, efficiency is often overlooked (because most of the time we have enough space to display a single chart?). But now we have to take small screens into account, and designing “graphic landscapes” (infographics, dashboards) requires better management of the available resources. Effectiveness and efficiency are closely connected, and in many cases when you improve one you’ll notice a positive impact on the other.
Making a chart easy to read, making it relevant to me, or displaying unexpected patterns: grabbing the audience’s attention doesn’t always have to be about aesthetics. The problem is that adding makeup (by way of canned visual effects) is a simpler path. Vendors take advantage of that to add a few bells and whistles to “make your chart look professional and memorable”, code for “silly effects not found in Excel or PowerPoint”. They may grab your attention once, or even twice, but a third time will put you off for good.
If you can’t use canned effects, if most defaults are ugly or ineffective, and if a statistician is not required to possess artistic talent or graphic design skills, how do you make charts that are both effective and pleasing to the eye?
I too had to find a way to create more pleasing charts without this apparently basic talent/skill (you can’t imagine how painful it is for me to draw a recognizable stick figure). Much of my book is devoted to this.
… for mere mortals
Here is what works for me. All design choices when making a chart have an aesthetic and a functional dimension (form and function). Understanding and managing the functional dimension is much easier than managing the aesthetic one: if you want to emphasize a series in a line chart, you can use a saturated color for it, pale colors for the remaining series, and gray for axis and grid lines. You can read this as “managing stimuli intensity”, no aesthetics involved. Functional choices impact the aesthetic result (and the other way around), but my own experience tells me that, when I put aesthetics first, the end result will be ugly.
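“Managing stimuli intensity” can be written down as a rule rather than a taste judgment. The sketch below is a hypothetical helper (emphasis_styles and tint are my own names, not part of any charting library): given the series names and the one to emphasize, it returns a saturated color for that series, a pale tint of the same hue for the others, gray for the axis and a lighter gray for the gridlines. The hex values are arbitrary placeholders; any tool that accepts hex colors, Excel included, could apply them.

```python
"""Emphasis by stimulus intensity: one saturated series, pale others, gray scaffolding."""


def tint(hex_color: str, amount: float) -> str:
    """Mix a '#RRGGBB' color with white; amount=0 keeps it, amount=1 gives white."""
    if not 0.0 <= amount <= 1.0:
        raise ValueError("amount must be between 0 and 1")
    r, g, b = (int(hex_color[i:i + 2], 16) for i in (1, 3, 5))
    mixed = (round(c + (255 - c) * amount) for c in (r, g, b))
    return "#" + "".join(f"{c:02X}" for c in mixed)


def emphasis_styles(series, highlight, accent="#1F5FA8", gray="#999999"):
    """Return a color per chart element; accent and gray are placeholder values."""
    if highlight not in series:
        raise KeyError(f"unknown series: {highlight}")
    styles = {name: tint(accent, 0.75) for name in series}  # pale tint: context
    styles[highlight] = accent                              # saturated: focus
    styles["axis"] = gray                                   # neutral scaffolding
    styles["gridlines"] = tint(gray, 0.5)                   # even weaker stimulus
    return styles


if __name__ == "__main__":
    for element, color in emphasis_styles(["North", "South", "East"], "South").items():
        print(f"{element:>9}: {color}")
```

Note that no choice in the sketch depends on taste: each element gets an intensity that matches its role, and the pleasing part, if any, comes as a side effect.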
When you put function first you can play with ideas and concepts without feeling you are losing control to vague and contradictory sensations of beauty and aesthetics. The chart below is an example of a hobby of mine: trying to salvage apparently hopeless chart types, like the gauge / speedometer. It displays three pointers instead of one, and each pointer is actually a time series. The jury is still out on this chart, but it could be used in very specific cases. Except for the chart type itself, all design choices can be justified rationally.
Left brain, right brain. Really?
Later this month I’ll be in Pamplona, Spain, for Malofiej, the infographics summit and awards. On the surface, NTTS and Malofiej could hardly be more distant from each other. Most people at NTTS come from statistical institutes or similar organizations, while at Malofiej most people are graphic designers, artists, journalists. Kind of left brain vs. right brain.
I know several people attending both conferences, so maybe this is not about brain hemispheres. Maybe at a not-so-fundamental level they are more similar than expected. We can easily see this in a recent article, where Stephen Few proposes seven criteria to evaluate a data visualization’s effectiveness profile, grouped into two categories, informative (usefulness, completeness, perceptibility, truthfulness, intuitiveness) and emotive (aesthetics, engagement).
These criteria can be applied to a beautiful infographic or to a terribly distorted 3D pie chart. Both are instances of visual communication, and their effectiveness profile can be compared. That said, some criteria are valued differently from field to field, aesthetics being the obvious example. A graphic designer is supposed to be able to create a visualization that is pleasing to the eye and, in some cases, unique. At a statistical office these skills are not required or expected.
If you use data visualization to communicate, you should keep testing the effectiveness of your visualizations, and that applies to everyone.
Color is a difficult subject for everyone, with or without the right skills. Apparently, when left unattended, people tend to cram as many saturated colors into a chart as possible. I would need to take a closer look, but my feeling is that national publications where no color constraints seem to be in place have more color issues than the ones following Eurostat guidelines or similar.
The real issue in both cases (with or without guidelines) is that color is not used effectively from a data visualization point of view. Again, if you identify the functional tasks of color, using it becomes much easier (or less difficult). I identify six tasks: categorize (using colors/hues), group (using colors and tints), emphasize (using color and saturation), sequence (using tints), diverge (colors and tints) and alert (color). You also need to manage gray.
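Two of these tasks, sequence and diverge, are easy to express as tints of one or two hues. This is a minimal sketch, assuming tints are made by simply mixing a base color with white (interpolating in a perceptual color space is often a better choice); the function names and the base hues are placeholders for illustration, not a recommended palette.

```python
"""Sequential and diverging palettes built from tints (mixing with white)."""


def tint(hex_color, amount):
    """Mix '#RRGGBB' with white: amount=0 keeps the color, amount=1 gives white."""
    r, g, b = (int(hex_color[i:i + 2], 16) for i in (1, 3, 5))
    return "#" + "".join(f"{round(c + (255 - c) * amount):02X}" for c in (r, g, b))


def sequential(base, steps):
    """Light-to-dark tints of one hue, for ordered data (the sequence task)."""
    if steps < 2:
        raise ValueError("need at least two steps")
    # Lightest step is 80% white, darkest step is the base color itself.
    return [tint(base, 0.8 * (1 - i / (steps - 1))) for i in range(steps)]


def diverging(low_hue, high_hue, steps_per_side, midpoint="#EEEEEE"):
    """Two hues that darken away from a neutral midpoint (the diverge task)."""
    low = sequential(low_hue, steps_per_side)[::-1]  # dark -> light
    high = sequential(high_hue, steps_per_side)      # light -> dark
    return low + [midpoint] + high


if __name__ == "__main__":
    print(sequential("#1F5FA8", 5))
    print(diverging("#B2182B", "#2166AC", 3))
```

The diverging palette is just two sequential palettes glued around a neutral gray, which is one way of “managing gray” as a color in its own right.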
If you try to use color effectively, you’ll probably discover two interesting things: first, we often use color more often (and more colors) than we need; second, if you remove color you’ll have to change other design options that will probably improve your chart.
There is no shortage of data visualization tools, from the so-called self-service BI tools (PowerBI, Qlik, Tableau) to a vast array of programming languages and libraries (R, Python, D3). And then you have Excel.
Of all the charts published by Eurostat and the national statistical offices, I’m not aware of a single one that couldn’t be made in Excel, and then some. There are several reasons why Excel is the right tool to make charts for these publications, and also a tool to experiment with and go beyond its poor chart library. Excel charts don’t have to look the same; here is one that looks a bit different:
If you think Excel has no place in a conference titled New Techniques and Technologies for Statistics here is a quick reply before you fall into your fake Excel-induced narcoleptic state: you’re wrong. You can use Excel to support new data visualization practices and explore new ways of doing so. And don’t worry, I believe there should be a place for you to explore new tools and cool data visualization gadgets.
You can download the presentation here and the extended abstract here. [I updated the presentation and exported it to PDF. You can find it here.]
Comments, suggestions? Leave them below. Don’t forget to follow me on Twitter (@camoesjo) and the NTTS hashtag (#NTTS2017)
So, was this much ado about nothing? I don’t think so.
Putting a few percentages near all things circular is a dangerous thing to do on social media. It invites scrutiny by the pie-charts-are-bad mob, so do it at your own peril. The fine line between humor and trollish behavior is often blurred. If it is round like a pie chart, percentages are displayed as in a pie chart, has slices as a pie chart, then it can only be a pie chart. So let’s have some fun!
People tend to follow the path of least effort. That means using defaults and following the rules. There is nothing wrong with defaults and rules, but some are stupid, and some need context to be understood. I’m often worried that “no pies” is the only part of the message heard beyond the choir.
Now, I don’t have the time to check all the replies to the original tweet, but I couldn’t find a mention of the elephant in the room. If you look closely, there is a note below that says, among other things, “Other items not depicted include: onions (62%), chicken (56%)….”. So, because @yougov couldn’t find a stock photo depicting onions and chicken, these very large items were relegated to a footnote.
This is the real issue: molding the data to fit your clever (?) design, and it happens too often. This pizza-not-pie incident is just silly and light. But think about it a bit: replace “molding the data” with “alternative facts” and “design” with “ideology” and things suddenly become very serious. There are some red lines, don’t let l’air du temps blur them too.
This post on how to make Excel dashboards was one of the first I published here, and it is still one of the most popular posts. But I actually prefer broader data visualization discussions to Excel tips & tricks. Excel dashboards, although they interest me, were never a priority. Until now.
I just launched an ebook on Effective Excel Dashboard design (warning: in Portuguese), so you can assume my return to dashboards is commercially motivated. And yes, if an ebook can help pay the bills I will be more than happy, because I will be able to keep thinking and writing about these things. But I’m a terrible salesman, and the true reason lies elsewhere and runs much deeper.
When I wrote my data visualization book I wanted it to be a reference for businesses, statistical offices and other organizations that share a common perspective when it comes to datavis. I wanted it to be justifiable, rational, functional and with a bit of fun. Perhaps 80% Stephen Few and 20% David McCandless, if you want a formula. Obviously, some people think that 20% of fun is unacceptable, and that really saddens me.
Anyway, most of the book was about making individual charts. But I believe there are two key moments when you improve your visual literacy: first, when you understand that a chart is a communication tool, not something to illustrate a few numbers (with Excel defaults); second, when you start thinking beyond the individual chart or a succession of slides in a PowerPoint presentation. You need a more complex and consistent message that only multiple charts and other visual objects can provide. Writing about this is the next natural step.
Designing a dashboard in Excel is a great learning experience. Seriously. And it goes much beyond Excel. So, here is my list of reasons why you should design a dashboard in Excel:
Excel is a universal learning tool
Excel is everywhere, and most people are familiar with its metaphor. So, unless you prefer pen & paper (I do), Excel is the lowest common denominator when you need a tool for a training session.
Improved Excel skills
If you do want to improve your Excel skills, designing a dashboard in Excel is a perfect project. You have to make several charts and techniques work together, manage the data and the user interface, and manage screen real estate (which means finding chart types with a smaller visual footprint), etc.
You can use Excel to learn how to make better charts
I have no idea how many charts are made in Excel every single day. I think “a lot” is a good estimate. The world would be richer if Microsoft implemented better defaults, but it has always sided with us, data visualization enthusiasts, by providing an endless stream of ugly and ineffective defaults for our before/after exercises.
Exploring new chart types
You can guess from my ExcelCharts Shop that I like to have fun trying to find new ways of communicating with charts. I even try to salvage notoriously bad charts (like speedometers/gauges). Excel allows you to do that, unlike other tools that force you to take a more structured approach and are less flexible when it comes to chart formatting and design.
Excel is our Illustrator
There are almost no constraints when it comes to adding objects to a worksheet. You can let your imagination run wild for the basic dashboard layout. If you are not sure about it, just add a new sheet and start again.
A dashboard is a process of self-discovery
You can see a dataset from multiple perspectives, and the more charts and datasets we add, the more personal that perspective becomes. Two people using the same datasets are unlikely to come up with very similar dashboards: they will have different data visualization styles, and their interpretations will differ because of different priorities. When you design a dashboard with your own data, the data you use daily, you’ll have a better understanding of the data itself but also of how you see it. When I wrote my book, that was one of my goals: to understand what data visualization means to me, something I couldn’t do with a blog post. Going from a chart to a dashboard is a similar process: you dig deeper, and in some cases you’ll be surprised.
A dashboard helps you find a line of thought for your communication
Storytelling is everywhere. All brands have a story to share with you. I think narrative is a much better word when used in data visualization. Just like graphical landscape is a more generic term for dashboard. They all share the need to create a consistent communication using a set of visual and non-visual objects.
Prototyping dashboards in Excel
It’s easy to disdain Excel as a proper dashboard tool (BI vendors do it all the time). But you need to separate Excel as a designing tool from Excel as a production tool. There are 1001 cases where Excel should never be considered as a serious production tool. Of those cases, Excel can play a relevant role in 999 of them as a design tool. Think about it. You know your data better than anyone else. When you design a dashboard you know what is relevant to you and you have an idea of how you want it to be shared, or monitored. This is a far better starting point than endless requirement meetings where a consultant without subject-matter expertise tries to understand what you are saying you need.
Designing a dashboard for benchmarking and tool evaluation
If I know what I need and how I need it, and can translate it into a dashboard, it’s easier for me to understand what each vendor has to offer, and how close they can get to my original design. Note that other tools can offer better alternatives, so prepare to be surprised.
A functional dashboard provides better feedback
A look & feel closer to the real thing will help users to understand what they should expect and provide better and more detailed feedback. If possible, try to get real data and spend a few hours or days working on the dashboard to make it as real as possible.
Am I forgetting something? Do you disagree? Add your comments below.
I have a soft spot for bad and seemingly hopeless charts and graphs. They all deserve a second chance, so I try to help them out of the gutter. The other day, I was roaming the dirty alleys of El Dorado Hills (don’t ask) when I saw it. The speedometer. Few charts are strong enough to endure so much rejection, and the speedometer, with its single leg, is weaker than most. I was going to help it, show it how much Marketing people love it. But I had to lecture it about its drinking problem first. I helped it stand up and, lo and behold, it started morphing right before my eyes.
This is an adult fairy tale. The speedometer looks better now, but it is not a beauty queen by any stretch of the imagination. And it returns to its useless nature when sober. So, don’t let anyone see you with it unless it has already had a few drinks.
The Drunken Speedometer uses the needle to display a time series, gets rid of the red-yellow-green color scheme and displays an alert. If you can’t get your boss/clients to come to their senses and remove the speedometers from the dashboards, at least try to make them a little richer.
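One way to read “uses the needle to display a time series” is to let distance from the hub stand for time and angle stand for value, so the needle bends as the series moves. The sketch below is my assumption about the construction, not the actual workbook: it converts a series into x/y points that an XY scatter chart with straight lines (in Excel or any other tool) could draw over a gauge background. The 180° arc, the scale limits, the alert threshold and the function names are all hypothetical.

```python
"""Hypothetical 'drunken needle': a time series drawn as a bent gauge needle."""
from math import cos, radians, sin


def needle_points(values, vmin, vmax, radius=1.0):
    """Map each value to an angle on a 180-degree gauge and each time step to a radius.

    The first observation sits next to the hub and the latest one at the tip,
    so the tip of the needle shows the current value, as a normal gauge would.
    """
    if vmax <= vmin:
        raise ValueError("vmax must be greater than vmin")
    n = len(values)
    points = [(0.0, 0.0)]  # the hub of the gauge
    for i, v in enumerate(values, start=1):
        share = (min(max(v, vmin), vmax) - vmin) / (vmax - vmin)  # clamp to the scale
        angle = radians(180 - 180 * share)  # vmin on the left, vmax on the right
        r = radius * i / n                  # later observations are farther out
        points.append((r * cos(angle), r * sin(angle)))
    return points


def alert(values, threshold):
    """Replace the red-yellow-green bands with a single alert on the latest value."""
    return values[-1] < threshold


if __name__ == "__main__":
    sales = [52, 58, 61, 55, 49, 63, 70]  # made-up monthly values
    for x, y in needle_points(sales, vmin=0, vmax=100):
        print(f"{x:6.3f} {y:6.3f}")
    print("Alert:", alert(sales, threshold=60))
```

Putting the latest value at the tip keeps the gauge readable the usual way, while the wobble along the needle adds the history a single pointer can’t show.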
I’d say the time series adds one star to a speedometer (1/5 to 2/5). Do you think the speedometer can/should be saved? Let me know in the comments below (if you still do that kind of thing).
I wanted to have something special for my students. And I don’t like wasting training time explaining how to make a chart. So I decided to record several dozen videos showing how to make charts in Excel and called them The 3 min datavis. What is so special about it?
Most of the videos will be no more than three minutes long.
Vertical video for comfortable viewing on a smartphone.
Consistent look & feel.
I’ll share a list of available videos soon.
(Yes, there are a few pie chart variations.)
You are a brilliant scientist and you just made an amazing discovery. You want to announce it to the world. So you prepare a few slides and decide to use that cute font, Comic Sans.
After the presentation you realize that, although people praise you for your discovery, a very vocal minority mocks you for using Comic Sans. After a quick search, you realize using this font was a mistake, blown out of proportion by a few idiots, focusing on stupid details because they don’t understand your scientific achievement. (I’m not implying this is an accurate description of CERN’s Higgs boson presentation.)
Over the last three days I was at a science conference in Lisbon, Portugal. There were several parallel sessions, so I can’t guarantee Comic Sans was not used. But it was there in spirit: bad pie charts, 3D bar charts, you name it.
I said where the conference took place to tempt you into believing these were third-rate scientists (they weren’t) in a peripheral country (OK, it is). My working hypothesis is that these scientists, the ones at CERN, and everyone else, do not significantly differ when it comes to communicating (visually) their science: they suck.
(Let me say that I’m generalizing here: not all scientists suck, and when they are good communicators it’s a pleasure to watch them, because they have amazing things to say. And let me point out that we can’t infer anything about the quality of their work just because they make bad slides.)
What makes them suck? Glad you asked. Well, apparently they have a “science mode” and a “comic mode” (from Comic Sans). When they switch to science mode, we get outputs directly from their work: no explanations, no annotations, often no title at all (or a very descriptive one). These charts are not there to be read, they are there to silently legitimize what the presenter is saying.
Scientist’s obscure mode
From time to time they switch to “comic mode”. They think people will understand better if they use a few colorful 3D charts. These charts come out of nowhere, they don’t really say much, descriptive titles again.
Scientist’s comic mode
They also have the “cram mode”, not as a third mode but as a background mode. Do they have a few interesting images? Cram them into a single slide and make them so small the audience can’t see them. A single image only? No worries: make it small and place it to the right of a long list of bullet points.
And tables. Too many tables.
Scientists, like everyone else, suffer from the curse of knowledge and fail to build bridges between their knowledge and the audience’s. Often it is as simple as adding the right title or a note explaining how to read a visual. And they, like everyone else, often fail to recognize the rules they live by when those rules are used in a different context.
No one wants to turn a scientist into a visualization expert or a graphic designer. When people notice that you are using Comic Sans or 3D pie charts it’s because you show you are not aware of basic practices, like basic table manners: don’t lick your fingers, use the knife and the fork. You don’t have to be an etiquette expert to know this, you just need to be aware these rules exist.
Now, I said we can’t infer anything about the quality of your research from your slides. That’s true. But please ask yourself if adding an animated clipart fish (true story) to your slide really improves your message.
So, what can you do? From a data visualization point of view, a few good books have been published over the last ten years, some lite, some less so. You could start with my book, if you don’t mind the shameless plug. Nancy Duarte and Garr Reynolds are great references for presentations. I trust Jon Schwabish’s not-yet-released book Better Presentations: A Guide for Scholars, Researchers, and Wonks will also be an excellent reference.
You don’t really need to spend much time to become aware of the basics, stop licking your fingers and prevent others from doing so. If possible, try to go beyond that, because it will reflect positively on your work and on how people perceive it.
Here, take my napkin.
(Please note that I only use this etiquette analogy to make a point: basic visualization rules are simple, but not obvious. You’ll probably miss them, or not even be aware of them, if you are not exposed to them on a regular basis. Learn a few of them and then check your next graph’s compliance. Over time these things will become obvious, I promise.)
I’m writing this assuming that my book Data at Work was one of the targets of your post “Data Visualization Lite”. If that is the case, thank you for spending some of your time reading the book. When I started my humble blog, never in my wildest dreams did I think that would happen.
And now you say I wrote a lite book. At first, I couldn’t disagree with you, you know? I don’t think I have the right talent to write a book of substance like, say, Bertin’s Semiologie Graphique, or even your Show me the Numbers. Then you said books like mine “introduce errors and provide bad advice”. According to you, they are basically polluting an otherwise clean and bright day. They only add noise and, as we all know, only signal matters. My heart sank.
First, let’s put the errors aside. If you mean factual errors, I know they exist, I actually have a page on the book’s companion site for each chapter where readers can point them out. If you are kind enough to send me a list of such errors I will certainly correct them in a future edition, just like you did with Show me the Numbers. We are humans, we make mistakes, we don’t like them. We’ll get to “bad advice” in a minute. Let’s first discuss your post and your comments.
Your lite post
I often misread and misrepresent the richness of your thought. You told me so several times (a few in the comment section of older posts). So, chances are I’ll do it again. Sorry, it’s not my intention, it’s my limitation. If, by any chance, I’m reading your post correctly, I can only conclude that core data visualization principles are well established, and one just needs to read your book Show me the Numbers to become familiar with them. You wrote them clearly and better than anyone else.
You accept that much remains to be done. People in this field should apply sound scientific methodologies to study certain details or areas you haven’t mapped already, or make themselves useful and apply your principles to a specific tool. The layperson in the office shouldn’t worry about different perspectives, at least for the time being. Your book is basically the only source this person needs. A large majority of the books that in some way overlap yours are filled with errors and will confuse the reader, and they should never be published because, among other reasons, they harm our productivity. If, by any chance, someone can come up with a few interesting insights that you happen to agree with, but you haven’t mentioned in your book, this person should write a short blog post about them, and refer to your book for the core concepts.
You have been an independent voice. Many of us admire you for your assertive positions against vendor marketing and data visualization fads. But I think we all miss the Stephen Few who wrote “the information visualization research community produces many innovations each year, which I’m always excited to discover in the research literature or during visits to research labs.” Perhaps data visualization is now a bare and sterile place. Perhaps you’ve changed.
Models and voices
If you truly believe that “Those books written since 2004 that aren’t filled with errors and poor guidance, with few exceptions, merely repeat what has been written previously,” I don’t think there is much to talk about. But you already confessed you derive some pleasure from these discussions, so I’ll assume you wanted to be provocative. You’ve succeeded.
Let me characterize the data visualization community as containing an ecosystem of models and voices. By “model” I simply mean a set of principles, ideas, design options and other objects put together in a consistent fashion. If you prefer 3D effects and garish colors and use them consistently, that’s your model. If you have a more minimalist approach, that’s your model. And your model has consequences for the way your audience uses and reacts to your outputs.
There are several models, but three can be identified easily: yours, Microsoft’s and Tufte’s. We all know that Microsoft’s model, as implemented in Excel, sucks. Microsoft should have listened to you when it changed the graph engine for Excel 2007. Tufte brought minimalism to data visualization and tried to convince us that this aesthetic was not an aesthetic at all, but the only acceptable design in data visualization.
Obviously, minimalist aesthetics require skills that the average office user lacks. As we’ll see below, much of your model accepts Tufte’s principles, but with a twist: everything that even vaguely resembles the non-rational must be removed, aesthetics above all.
Let me exemplify with three little words: aesthetics, beauty and elegance. Unlike Tufte, who writes a chapter with “aesthetics” in the title, you never mention the a- word. You do mention “beauty”, as something artists create. According to you, design in data visualization serves uniquely to improve communication. I like the word “elegance”, and so do you. For quite some time, I thought you were using it in its usual meaning (simple but sophisticated aesthetics, or something similar). Then I read this in your book: “The word elegance comes originally from the Latin eligere, which means to choose out or to select carefully.” I thought this was wrong, because the word apparently comes from elegantia, but then I found this on the Online Etymology Dictionary for elegant:
late 15c., “tastefully ornate,” from Middle French élégant (15c.), from Latin elegantem (nominative elegans) “choice, fine, tasteful,” collateral form of present participle of eligere “select with care, choose.” Meaning “characterized by refined grace” is from 1520s. Latin elegans originally was a term of reproach, “dainty, fastidious;” the notion of “tastefully refined” emerged in classical Latin.
Surely you jest! While your note is not completely wrong (but “collateral form” doesn’t mean “comes from”), it’s hard not to see it as cherry-picking. And you do it all the time in your model: you go to great lengths to make it bulletproof rational, even if you have to sacrifice a lot along the way.
Voices are unavoidable. The nature of data visualization leads to the proliferation of multiple perspectives. Some of them cluster together, and some will overlap to a certain degree: for example, if people recognize the need to take visual perception into account, the overlap will always happen. I welcome the emergence of competing data visualization voices. I see it as a sign of strength, not weakness. They act as a model amplifier. They should be cherished, not banned.
Everyone in the visualization community should go beyond their social media presence and come up with their own voice. They must spell it out, and make it more than the sum of all their blog posts. They have to start from scratch and try to make sense of all they think they know. Finding a consistent narrative will require hard work. Some pieces will not fit together, and they’ll find holes in unexpected places. The final voice will overlap many other voices, but it will not be less personal because of it. It can even become a full model. Whether this turns into a book or not is irrelevant, because that’s a different logic. At the minimum, each person should make an explicit stylebook available. I’m sure some people would gladly volunteer to critically curate these stylebooks.
If data visualization is “progressing at snail’s pace”, perhaps one of the reasons is not that too many lite books are published, but too few. Short-term thinking is encouraged by social media, and some people will turn into talking heads for their employers. If a person doesn’t sit down and think about the ramifications of his/her ideas, discussing other people’s voices and models will be less fruitful. I never reviewed books on my blog (well, just one) because I was unsure about my own thinking. I had to discover it by writing the book. Some people suggested I should focus more on Excel. You can’t imagine how I hated that idea. That’s why I strongly disagree with you when you advise people to write articles instead of books. That’s terrible advice. You say “We don’t need voices to reflect the spirit of our time; we need voices to challenge that spirit—voices of transformation. Demand depth.” Demand depth, but please stick to blog posts (or maybe tweets)? Hmm…
Also, how can you ask for “voices of transformation” and, at the same time, offer your book as a single, prêt-à-porter reference for everything data visualization? You should encourage diverging approaches, since they will certainly make yours stand out. A bit of noise is actually useful to recognize signal.
All models are wrong, some models are useful. This fits like a glove. No model can capture the richness of data visualization. You didn’t like it when I called you a positivist a few years ago. I changed my mind about you and Tufte, but I still think you’re a positivist. You believe every single chart must be a virtuous cocktail of rational decisions that make it effective and thus pave the way to enlightenment. Everything else is verboten. Everything else opens Pandora’s box. If you don’t have everything under control, entropy creeps in. Essentially, your data visualization model is a cooking robot. And, I have to admit, you are often right: I saw this recently, when the graphs in a publication were updated by someone without the necessary skills and with no understanding of the rationale behind the original design. It’s sad, really.
When people think you are not flexible enough, I admire you more, not less. We need solid and consistent models that don’t change with the latest fad. The model must be changed as a whole, and I believe you would change it if you saw reasons for it to be changed. I truly hope so.
Revisiting Show Me the Numbers
Telling people that they shouldn’t explore because you already provide the best of all possible paths is indefensible. And if they are willing to blindly accept your advice, do you suggest they should follow the first or the second edition of your book? Because they are different, and there are errors in the first one that you corrected in the second one. You are human, and you don’t want us to be.
I only felt the need to reply to your post when I read this amazing answer to Alberto Cairo:
“… feel free to provide examples of content [in books published during the last decade] that doesn’t appear in ‘Show Me the Numbers’ or that does appear but is presented in a way that extends our understanding of that content. I’ve observed that when these books depart from the content that exists in ‘Show Me the Numbers,’ they often introduce errors and provide bad advice.”
So, basically there is a canon that shouldn’t be challenged, and if people deviate from the righteous path they commit some kind of apostasy. That sounded like a fun challenge.
I can’t really go through all the books I bought during the last decade, but I wrote one. I’m less interested in comparing the books than the models, but here are a few differences between my book and yours:
You obviously write better English;
A large format is great for a data visualization book, but I’m not sure if you took full advantage of it; also, while business visualization should use large screens, mobile should at least be taken into account. You don’t need a large page size for this.
I used real data, not only because it is more relatable than a few dummy data points but because it can potentially add a small level of uncertainty that made-up datasets lack.
You use generic examples (not specific to data visualization) when discussing gestalt laws. I try to give a data visualization example for each law.
You didn’t feel the need to illustrate many of your ideas with a corresponding graph. I did.
You write extensively about table design. I preferred to focus exclusively on graphs.
Since you use your book as a reference and mention the last 10 years, I’m assuming that you want us to use its first edition. That’s fine with me. Let’s go through a few topics.
Value-encoding methods in graphs
You say graphs contain quantitative and categorical components, and that the “structural variations of graphs are defined primarily by differences in the components that encode quantitative values (e.g., lines versus bars).” Also, quantitative values can be encoded using points, lines, bars and shapes, but shapes are not effective and should be crossed out. This is where your famous quote comes from: “I don’t use pie charts, and I strongly recommend that you abandon them as well.” You dismiss other objects, like bubbles.
I’ll try to stick to your book, but I couldn’t help noticing that you downplay bubble charts because they are rarely needed in business communication. In your article “Leave pies for dessert”, you say that the only advantage of pie graphs is that they are able to display cumulative values, but that’s rarely used. So, you want to change people’s minds but current practices are OK if something doesn’t fit your model. I believe bubble charts are fine, provided we see them like scatterplots that add extra, but not critical, information when encoding the circle size (muting the bubbles and placing a dot at the center helps). Interestingly, you accept this “secondary information” when using stacked bar charts but not when using bubbles.
I don’t share your view. I see no reason to remove areas or use objects like bars, when the traditional primitives (point, line, area and volume) make more sense. We encode values using points, so that we can evaluate their distances; then we use dots, lines and areas to make these points visible, but those are design choices. This is far from new, but it helps us become aware that all graphs share the same roots and, because of that, creating new ones or improving existing ones becomes easier. Consistent with this view, in a sense there is no categorical axis: if you have a single quantitative variable, you can only measure distances along a single axis. The categorical axis is nothing more than an offset from the opposite axis that makes it easier to display and label categories. Being aware of this can be useful when sorting data points, a common issue with bar graphs. You talk about lines, vertical bars and horizontal bars, but that’s confusing, because they are variations of a single entity, the line. It does serve your purpose, though: it avoids using geometric primitives in a more creative way.
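To make the data-mapping argument concrete, here is a minimal, hypothetical Python sketch: each category/value pair becomes a point on a single quantitative axis, and the “categorical axis” is only a slot index assigned after sorting by value. A dot and a bar are then two ways of drawing the same mapped point. The names `MappedPoint`, `map_to_points`, `as_dot` and `as_bar` are illustrative assumptions, not part of any tool.

```python
from typing import NamedTuple


class MappedPoint(NamedTuple):
    category: str
    value: float  # position on the single quantitative axis
    offset: int   # the "categorical axis": just a slot index


def map_to_points(data: dict[str, float]) -> list[MappedPoint]:
    """Map category/value pairs to points, using a data-based sorting key."""
    ranked = sorted(data.items(), key=lambda kv: kv[1], reverse=True)
    return [MappedPoint(cat, val, i) for i, (cat, val) in enumerate(ranked)]


def as_dot(p: MappedPoint) -> list[tuple[float, int]]:
    # A dot plot draws only the point itself.
    return [(p.value, p.offset)]


def as_bar(p: MappedPoint, baseline: float = 0.0) -> list[tuple[float, int]]:
    # A horizontal bar is a line from the baseline to the same point.
    return [(baseline, p.offset), (p.value, p.offset)]


points = map_to_points({"A": 3.0, "B": 7.5, "C": 5.0})
```

With this made-up data the slots are B, C and A; switching from dots to bars changes only the drawing function, not the mapping.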
Because it separates data mapping from graph design, my perspective represents a fundamental departure from your model. I strongly believe that, far from being an error or bad advice, this is a better starting point for understanding graphs, and I find it hard to associate this with your notion of data visualization lite.
Relationships in graphs
You say that there are seven types of relationships in graphs: nominal comparisons, time series, ranking, part-to-whole, deviation, distribution, correlation.
I defined a different set of relationships. I don’t see the usefulness of “nominal comparison” because I believe there should be a data-based sorting key in all charts. “Deviation” is more interesting, but not enough to be in its own category (it’s more data-specific). I used an “Order and Ranking” category that includes all these point comparisons. I also used “Profiling” where I include small multiples and similar constructions, because we should see these constructions as a single graph and not as multiple graphs.
Graph design solutions
This is an interesting section in your book. You take each of the previously defined relationships and select the value-encoding methods that best suit them. You came up with this:
Nominal comparison: bars and points
Time-series: lines, points and bars
Ranking: bars and points
Deviation: bars, lines and points
Distribution: bars, lines and points
Correlation: points and bars
First conclusion: you do believe that every single relationship can be represented using a bar graph. Maybe Amanda Cox is not totally wrong, after all (teasing). Since you crossed out areas, you don’t really have much choice.
This is one of the problems in your model: instead of a generic concept of line, you specify lines as connectors and talk about vertical and horizontal bars. Also, there isn’t much to be creative about with points, and you remove areas from the options. You end up with a very limited number of choices to make your graphs. For example, there is no support in this model to praise horizon charts.
I think I now understand why you disliked Abela’s classification (and mine, I presume) so much. Instead of actual graph types, it makes more sense to identify the best encoding objects for each type of relationship. This means that people new to data visualization will not automatically assume that there is a type of graph for each type of relationship.
I like this idea, I really do. The problem is that, if my definition of components (point, line, area) is too open, because people can basically design a graph for each relationship using variations of any component, your definition (line, vertical bar, horizontal bar, point) is too closed, and using a bar graph for each relationship becomes a serious possibility.
My own preference goes to suggesting a few graph types for each relationship but making it clear that each graph type can be used in more than one relationship and that small changes in a graph can impact its nature, making it more useful in one category and less useful in another. I know you disagree.
The big issue in this section is how to represent part-to-whole relationships. This was already discussed ad nauseam, so there is no point in doing it again extensively. You think bar graphs should be used to display these relationships. I believe percentages are not enough to define part-to-whole relationships and the whole must be visible. I played with the idea of having a non-stacked pie graph, but the right solution is not to use graphs that display something else, but to encourage people to go beyond the uninteresting part-to-whole analysis.
If I had any doubts regarding your positivist attitude towards data visualization, the chapter on visual perception would remove them once and for all.
Most data visualization books recognize the need for at least a basic understanding of how human visual perception works. Yours and mine are no exception. There are two major differences, though. I believe personality and social/cultural context are relevant enough to be discussed in an autonomous chapter. The second difference is that you overly emphasize the bottom-up dimension (how visual stimuli are acquired by the eye-brain system) and hardly mention the top-down process (the brain is not a passive receiver of visual stimuli, it engages actively in its selection).
Both the top-down process and context add a little sand to the beautiful mechanics of visual perception, where things are complex but ultimately we can explain and model them. To see, like you do, the eye as a camera and the eye-brain system as a computer is reassuring. We are in control, and making rational decisions is easy. Positivism, again.
As I said above, graphs are scarce in this chapter. I think readers would benefit more if each of the sections were illustrated with a data visualization example and not only verbalized or illustrated with a generic example.
General design for communication
If I was puzzled by the lack of graphs to illustrate perception-related concepts, I find their total absence in this chapter even more bizarre.
The chapter begins with a paragraph that summarizes your approach to data visualization design: creating beauty is the work of artists, but we are here to communicate. Just like you don’t want culture messing with the mechanics of perception, you don’t want beauty to wreak havoc on your rational design.
General and component-level graph design
In the final chapters of your book you go through the details of graph design. Most of them are consistent with your model and reinforce its principles. I don’t think you have a compelling answer to the problem of scale breaking in line graphs, and I have to mention the broad consensus among “experts” and experts who oppose using dual-axis graphs. And I’m sure you’ll agree now that the “Correlation Bar Graph” is really, really a bad idea.
It’s also interesting to note that this notion of when people “depart from the content that exists in ‘Show Me the Numbers,’ they often introduce errors and provide bad advice” actually began with you, when you thought that using a log scale with a bar chart was a smart idea.
I followed Tufte’s Envisioning Information and dedicated a full chapter to color, where I tried to classify all its major uses in data visualization and discuss color palettes and color versus shades of gray. My goal was to find ways of “avoiding catastrophe” (Tufte’s words).
Since color is such an elusive but fundamental component of graph design, you couldn’t possibly ignore it. But, except for a passing reference to the use of color in exotic countries and the need to maintain two versions of the same hue to manage attention, there is nothing interesting to learn about color in your book. I can’t say this surprises me.
To wrap up: Here be dragons
Let me reiterate: this is not a review of your book. If it were, I would certainly emphasize its historical significance and how you managed to create the best data visualization model for a business context. I don’t forget that. But, like any other model, it has its flaws. By defining it as the gold standard and challenging your readers to provide examples of things that were not in the book or were said in a better way, you inevitably asked us to reread your book with a more critical eye.
What I found was an attempt to remove all risk, all irrationality, all the little things that could jeopardize this positivist notion of data visualization as mostly objective. No wonder you describe my book as lite. I tried to push readers into unknown territories. I warned them about the dragons, I told them to proceed at their own risk, but I provided a map of my own exploration as detailed as possible. I do regret any errors, but I’m not sure if avoiding them at all costs would be the right strategy.
I’m sure you’ll reply and once again dismiss everything as error and poor advice. At most there will be your “yes, but rarely used”, but I’m not counting on it.
You wrote your book for farmers, I wrote mine for explorers. We need both, but after reading your book again I feel even more reassured about my choices. Today, I would go further.
PS: I have two things to add. I like to keep private communications private. I don’t see social media as an extension of those private communications. I hope you agree that everything I wrote above is fully independent of them. Also, it was suggested to me that, for full disclosure, I should say here that you read a few draft chapters of my book. You not only read them but also sent me very useful feedback, for which I’m grateful. I already said it in the book, but I agree that I should repeat it here.
PS2: I don’t know if this post is mere coincidence or a half-response to this letter. Either way, good post on dealing with errors in..
(All the Excel charts in my book are available for download, but I promised to write tutorials for a few of them. This is the first one.)
Name: Bullet charts
What it is used for: to display key performance indicators. Use them to replace speedometers if you want a more compact visual that can be stacked to better compare KPIs. Also, speedometers suck.
Excel implementation: There are a few different ways to implement this chart in Excel. Jon Peltier suggests using bar charts. In this post, we will use a scatterplot.
So, let’s use a KPI that can assume values between zero and 150. The cut-off points are 50 and 100 (you can have as many ranges as you like, but three is the standard number). Since I’m designing a horizontal bullet chart, all data points will have the same y value.
As you can see in the image above, the series High gets its data from three named ranges:
the series name comes from cell E2 (range “ranHighT”),
the series X value comes from cells E3:E4 (range “ranHigh”),
and the series Y value comes from cells B3:B4 (range “ranY”).
All series share the same y values.
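The background layout can be sketched in Python, assuming (hypothetically) that each range is stored as a two-point series holding its start and end x values, with both points at one shared y value. The `Series` type, the `background_series` function and the default y of 1.0 are illustrative choices, not the workbook’s actual cells.

```python
from typing import NamedTuple


class Series(NamedTuple):
    name: str
    x: tuple[float, float]  # start and end of the range on the KPI scale
    y: tuple[float, float]  # same y for both points: a horizontal segment


def background_series(
    minimum: float,
    cutoffs: tuple[float, ...],
    maximum: float,
    y: float = 1.0,  # assumed shared y value
    names: tuple[str, ...] = ("Low", "Average", "High"),
) -> list[Series]:
    """Build one two-point series per range, all sharing the same y."""
    bounds = [minimum, *cutoffs, maximum]
    if len(bounds) - 1 != len(names):
        raise ValueError("need exactly one name per range")
    if any(a >= b for a, b in zip(bounds, bounds[1:])):
        raise ValueError("bounds must be strictly increasing")
    return [Series(n, (lo, hi), (y, y)) for n, lo, hi in zip(names, bounds, bounds[1:])]


ranges = background_series(0, (50, 100), 150)
```

For the KPI above (zero to 150, cut-offs at 50 and 100) this yields Low (0 to 50), Average (50 to 100) and High (100 to 150), each of which the tutorial then draws as a thick, flat-capped line.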
Those are the background series. Now we have to add the target value and the actual value; add them exactly as you did the previous series. Then comes the alarm. The alarm is not in the original specifications but, since we need to call the reader’s attention to abnormal values, adding an alarm to the bullet chart is an elegant way to do it. In this example, there is a rule in cell H4 that returns an error (#N/A) unless the Actual value is below the upper limit of the Low range: “=IF(G4<C4,-10,NA())”. This means that the alarm is not visible if the formula returns an error. Otherwise, the alarm is displayed to the left of the chart, at value -10 (we can change this).
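The alarm rule can be mirrored outside Excel with a small Python function, where NaN stands in for #N/A; plotting libraries such as matplotlib skip NaN points, much as the chart hides the alarm. The -10 position and the comparison follow the formula above; the function and parameter names are hypothetical.

```python
import math

ALARM_X = -10  # where the alarm marker is drawn, to the left of the ranges


def alarm_value(actual: float, low_upper_limit: float, alarm_x: float = ALARM_X) -> float:
    """Mirror of =IF(G4<C4,-10,NA()).

    Returns the alarm position when the actual value falls below the upper
    limit of the Low range; otherwise NaN, playing the role of #N/A.
    """
    return alarm_x if actual < low_upper_limit else math.nan
```

Note that a value exactly at the limit (50 here) does not trigger the alarm, because the formula uses a strict comparison.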
Now we have to change the Low, Average and High series. For each one, set Marker to “None”, Line width to 15 pt and Cap type to Flat.
Remove the marker from series Actual, and change its line width to 5 pt, its color to black and its Cap type to Flat. Now your chart should look something like this:
The built-in marker types don’t include a vertical line, but you can make one to use in the Target series:
if you use Windows, open the Paint application;
create an image sized 15 x 2 pixels;
fill it with the color you want;
from the built-in types, select Image;
choose the one you just created;
set Marker border to No line;
keep the Alarm marker;
change its color to red;
change its size to 8.
The remaining changes are more or less cosmetic:
add the series KPI;
remove markers and lines from the series;
select the KPI series and, under Design, choose Add Chart Element / Data Labels / More Data Label Options…;
set Label Position to Left;
check Choose Value from Cells;
set the data range to “Bullet!ranKPINames”, the named range in column J (don’t forget the “Bullet!” part);
uncheck “Y Value” and “Show Leader Lines”;
remove the chart border.
You’ll have to adjust chart or plot area sizes so that the label doesn’t overlap the alarm. After these changes, the bullet chart should look like this:
I removed all gridlines and axis lines and the labels on the vertical axis, and used a custom number format to hide negative values (add “##;;0” as a format code). The horizontal axis is set to a minimum of -50 and a maximum of 150.
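The effect of the custom format on the horizontal axis labels can be approximated in Python. This is a rough emulation for whole-number ticks, not Excel’s formatting engine: the format has three sections (positive;negative;zero), and the empty negative section is what hides the -50 tick. The `format_tick` name and the tick step of 50 are assumptions for illustration.

```python
def format_tick(value: float) -> str:
    """Approximate the custom format "##;;0" for whole-number tick values.

    Section 1 ("##") formats positives, section 2 (empty) hides
    negatives, and section 3 ("0") formats zero.
    """
    if value > 0:
        return str(round(value))
    if value < 0:
        return ""
    return "0"


# Axis from -50 to 150; the step of 50 is an assumption.
ticks = list(range(-50, 151, 50))
labels = [format_tick(t) for t in ticks]
```

The resulting labels are empty for -50 and then 0, 50, 100 and 150, so the extra room on the left holds the alarm without showing a negative number.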
Have you noticed in the first image that some of the rows are hidden? I actually defined the named ranges to go down to row 19. When you unhide these rows you’ll see five more bullet charts:
That’s it: a very simple way of making bullet charts. Was it useful? How would you improve the tutorial? Your suggestions are welcome, and let me know which chart from the book you’d like me to write about.
You see, there is nothing wrong with using data for the sole purpose of creating aesthetically pleasing visual objects. On the other hand, if you want to make sense of the data, and communicate your findings, it’s easy to argue that effectiveness should be your primary goal. This is the general model. Too black & white.
When you remove the fog of gratuitous effects (3D is the usual suspect) and you start actually thinking about the data and seeing it more clearly, something bizarre happens. Suddenly the client, who was so enthusiastic about applying more effective data visualization practices, is now not so sure the charts should be published at all. Suddenly, the data turns dangerous, and it must be managed politically at a higher level.
Suddenly, the colorful lollipop turns into a potentially dangerous match.
Depending on the rules and the context, data should be managed politically. We do it all the time at the personal level. What is interesting (and surprised me when I experienced this for the first time) is to see the client’s reaction, now becoming aware that he was caught off guard. It’s easy to get careless when you use your data to make lollipops.
Then comes the damage control stage. What should we do, what should we do? The knee-jerk reaction is to replace the indicator or to lie (bad idea). The alternatives are to change the way it is presented or calculated (still bad), keep the indicator and add context (better), or keep the indicator, add context and explain it with annotations (much better).
The politics of data management in an organization says a lot about its culture. If something doesn’t make sense to you, an outsider, you’re probably not aware of its political dimension.
Do you find many lollipops-turned matches in your work? Share them in the comments below.
Image credits: I merged two images from Wikipedia, this one, by Matthias Kabel, and this one, by Bbxxayay.