A Visual Business Intelligence blog by Stephen Few. This blog is written by Stephen Few, a leading expert in data visualisation and business intelligence. You won’t find many glittering charts or rich infographics here, but you will find a decade’s worth of well-written and thought-provoking articles about statistics, analytics, and ‘data viz’ tools.
Data, in and of itself, is not valuable. It only becomes valuable when we make sense of it. Unfortunately, most of us who are responsible for making sense of data have never been trained in two of the job’s most essential thinking skillsets: critical thinking and scientific thinking. The Data Loom does something that no other book does—it covers the basic concepts and practices of both critical thinking and scientific thinking and does so in a way that is tailored to the needs of data sensemakers. If you’ve never been trained in these essential thinking skills, you owe it to yourself and your organization to read this book. This simple book will bring clarity and direction to your thinking.
To thoroughly, accurately, and clearly inform, we must identify the intended signal and then boost it while eliminating as much noise as possible. This certainly applies to data visualization, which unfortunately lends itself to a great deal of noise if we’re not careful and skilled. The signal in a stream of content is the intended message, the information we want people to understand. Noise is everything that isn’t signal, with one exception: non-signal content that somehow manages to boost the signal without compromising it in any way is not noise. For example, if we add nonessential elements or attributes to a data visualization to draw the reader’s attention to the message, thus boosting it, without reducing or altering the message in any way, we haven’t introduced noise. No accurate item of data, in and of itself, always qualifies as either signal or noise. It always depends on the circumstances.
In physics, the signal-to-noise ratio, which is where the concept originated, is an expression of odds: the ratio of one possible outcome to another. When comparing signal to noise, we want the odds to dramatically favor the signal. Which odds qualify as favorable varies depending on the situation. When communicating information to someone, a signal-to-noise ratio of 99 to 1 would usually be considered favorable. When hoping to get into a particular college, however, 3-to-1 odds might be considered favorable, but those odds would be dreadful in communication, for it would mean that 25% of the content was noise. Another ratio that is common in data communication, a probability ratio, is related to an odds ratio. Rather than comparing one outcome to another as we do with odds, however, a probability ratio compares a particular outcome to the total of all outcomes. For example, a probability ratio of 85 out of 100 (i.e., the outcome of interest will occur 85% of the time on average) is the mathematical equivalent of 85-to-15 odds. When Edward Tufte introduced the concept of the data-ink ratio back in the 1980s, he proposed a probability ratio rather than an odds ratio. He argued that the percentage of ink in a chart that displays data, when compared to the total ink, should be as close to 100% as possible.
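The arithmetic that relates the two kinds of ratio is simple enough to sketch in a few lines of code. This is purely illustrative; the function name is my own:

```python
def odds_to_probability(favorable: float, unfavorable: float) -> float:
    """Convert odds (favorable : unfavorable) into the probability of the favorable outcome."""
    return favorable / (favorable + unfavorable)

# A probability ratio of 85 out of 100 is the mathematical equivalent of 85-to-15 odds.
assert odds_to_probability(85, 15) == 0.85

# 3-to-1 odds in favor of the signal mean 1 part noise out of 4 parts total: 25% noise.
noise_share = 1 - odds_to_probability(3, 1)
print(noise_share)  # 0.25
```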
Every choice that we make when creating a data visualization seeks to optimize the signal-to-noise ratio. We could argue that the signal-to-noise ratio is the most essential consideration in data visualization—the fundamental guide for all design decisions while creating a data visualization and the fundamental measure of success once it’s out there in the world.
It’s worth noting that particular content doesn’t qualify as noise simply because it’s inconvenient. Earlier, I said that a signal is the intended message, but let me qualify this further by pointing out that this assumes the message is truthful. In fact, the message itself is noise to the degree that it communicates misinformation, even if that misinformation is intentional. I’ve seen many examples of data visualizations that left out or misrepresented vital information because a clear understanding of the truth wasn’t the designer’s objective. I’ve also witnessed occasions when highly manipulated data replaced the actual data because it told a more convenient story—one that better supported an agenda. For example, a research paper that claims a strong relationship between two variables might refrain from revealing the actual data on which those claims were supposedly based in favor of a statistical model that replaced a great deal of volatility and uncertainty in the relationship, which could be seen in the actual data, with a perfectly smooth and seemingly certain portrayal of that relationship. On occasions when I’ve questioned researchers about this, I’ve been told that the volatility in the actual data was “just noise,” so they removed it. While they might argue that their smooth model illustrates the relationship in a simpler manner, I would argue that it over-simplifies the relationship if they only report the model without also revealing the actual data on which it was based. Seeing the actual data as well helps us keep in mind that statistical models are estimates, built on assumptions, which are never entirely true.
So, to recap, noise in communication, including data visualization, is content that isn’t part of and doesn’t support the intended message or content that isn’t truthful. Turn up the signal; turn off the noise.
I spend a great deal of time reading books. Many of them cover topics that are relevant to my work in data sensemaking and data visualization, and most of them are quite good, but only a few are extraordinary. The new book, How Attention Works: Finding Your Way in a World Full of Distraction, by Stefan van der Stigchel, definitely qualifies as extraordinary.
Stigchel is a professor in the Department of Experimental Psychology at Utrecht University in the Netherlands. For several years, until recently, I taught annual data visualization workshops in Utrecht. Had I known about Stigchel at the time, I would have definitely invited him out for a beer during one of my visits. His work is fascinating. This book focuses on a specific aspect of visual perception: visual attention—what it is, how it works, how it is limited, and how it has allowed the human species to progress beyond other species. It does so in a practical manner by explaining how an understanding of visual attention can improve all forms of information design.
I only know of one other author who has written practical works about visual perception with such clarity and insight: Colin Ware, Director of the Data Visualization Research Lab at the University of New Hampshire. It was from Ware’s two books—Visual Thinking for Design and Information Visualization: Perception for Design—that I learned much that I know about visual perception and its application to data visualization. Although Stigchel doesn’t address data visualization in particular, what he reveals about visual attention complements and, in some respects, extends what Ware covers in his books. Here’s an excerpt from the preface that will give you an idea of the book’s contents and intentions:
If you dig deeper into the subject of visual perception, you will quickly discover that we actually register very little of the visual world around us. We think that we see a detailed and stable world, but this is just an illusion created by the way in which our brains process visual information. This has important consequences for how we present information to others—especially attention architects.
Everyone whose job involves guiding people’s attention, like website designers, teachers, traffic engineers, and, of course, advertising agents, could be given the title of “attention architect.” Such individuals know that simply presenting a visual message is never enough. Attention architects need to be able to guide our attention to get the message across…Whoever can influence our attention has the power to allow information to reach us or, conversely, to ensure that we do not receive that information at all.
Everyone who visualizes data and presents the results to others is an attention architect…or should be. To visualize data effectively, you must learn how to draw people’s attention to those parts of the display that matter and to prevent the inclusion of anything that potentially distracts attention from the message. You can only do this to the degree that you understand how our brains manage visual attention, both consciously and unconsciously. Reading this book is a good start.
I received an email a few days ago from the founder and CEO of a new analytics software company that led to an interesting revelation. In his email, this fellow thanked me for sharing my insights regarding data visualization and shared that he has acquired several of my books, which are “nearing the top” of his queue. He went on to provide a link to his website where I could see his attempts to incorporate visual analytics into his product. After taking a quick look at his website and noting its poor data visualization practices, I wrote him back and suggested that he make time to read my books soon. It was in his subsequent response that he revealed what I found most interesting. In response to my concern about the poor data visualization practices that I observed on his website, he wrote, “The site content has been delivered with a minimally viable product mindset.” My jaw hit the floor.
This fellow apparently misunderstands the concept of a minimum viable product (MVP). According to Wikipedia, “a minimum viable product is a product with just enough features to satisfy early customers, and to provide feedback for future product development.” When you initially introduce a new product, it doesn’t make sense to address every possible feature. Instead, it usually makes sense to provide enough features to make the product useful and put it on a trajectory, through feedback from customers, to become, in time, a fully viable product.
This misunderstanding reminds me of the way that product companies have sometimes misapplied the Pareto Principle (a.k.a., the 80/20 rule). Years ago, when I worked for a business intelligence software company, it was common practice for managers in that company to encourage designers and developers to create products that only satisfied 80% of the customers’ needs, which they justified as the 80/20 rule. This has nothing to do with Vilfredo Pareto’s observation that 80% of the property in Italy was owned by 20% of the people, a ratio that he went on to observe in the relative distribution of several other things as well. Pareto never promoted this ratio as a goal. It’s amazing how concepts and principles can be perverted in silly and harmful ways.
The concern that I expressed to this fellow about his fledgling product was not a lack in the number of features but a lack in the quality of the features that he included. Shooting for minimally viable quality is not a rational, ethical, or productive goal.
My exchange with this fellow continued. I pointed out that “the analytics space is filled with minimally viable products.” This was not a compliment. To this, however, he enthusiastically responded:
Certainly, agreed – which is one reason we believe we can be successful. I’m using MVP in the context of product development; the quicker we deliver functional capabilities the more quickly we receive feedback and iterate through enhancements. In terms of mature client solutions we stand for, and strive to deliver, an exceptional standard of quality – rare in the analytics space.
The notion that quick iterations can make up for sloppy and inexpert development is nonsense, but this philosophy has nevertheless become enshrined in many software companies. Is it any wonder that most analytics products function so poorly?
There is absolutely no justification for producing an analytics application that at any stage during the development process chooses inappropriate data visualizations and designs them poorly. Best practices can be incorporated into each stage of the development process without undue or wasted effort. Not only are ineffective data visualization practices at any stage in the process inexcusable, they do harm, for they expose and thereby promote those bad practices.
This fellow used the “minimally viable product mindset” as a justification for the fact that his team doesn’t understand data visualization. This is all too familiar. To complete the story, here is my final response to this fellow’s mindset:
You are not exhibiting the “exceptional standard of quality” that you claim as your goal. Every single player in the analytics space claims to strive for “exceptional quality,” but none exhibit a genuine commitment to this goal. To seriously strive for this goal, you must develop the required expertise before beginning to develop solutions. Slow down and take time to get it right. The world doesn’t need any more “minimally viable” products.
What are the chances that he will accept and follow my advice? My experience suggests that the odds aren’t good, but I’d be happy for this fellow to become an outlier. We don’t need more bad analytics products. A few that are well designed are all that we need.
Few data technologies are subject to more hype these days than VR-enabled data visualization. I have never seen a single example that adds value and therefore makes sense. Those who promote it don’t base their claims on actual evidence that it works. Instead, they tend to spout a lot of misinformation about visual perception and cognition. Those who have actually taken the time to study visual perception and cognition could take each of these claims apart with ease. VR has the cool factor going for it and vendors are capitalizing on this fact.
VR certainly has its applications. Data visualization just doesn’t seem to be one of them and it’s unlikely that this will change. If it does at some point in the future, I’ll gladly embrace it. Navigating physical reality in a virtual, computer-generated manner can indeed be useful. I recently visited the beautiful medieval town of Cesky Krumlov in the Czech Republic near the Austrian border. I could have relied solely on photographs and descriptions in a guide book, but walking in the midst of that old city, experiencing it directly with my own senses, enhanced the experience. Had I not been able to visit it personally, a VR tour of Cesky Krumlov could have provided a richer experience than photographs and words alone. Data visualizations, however, display abstract data, not physical reality, such as a city. There is no advantage that we have discovered so far, either perceptual or cognitive, to flying around inside a VR version of the kind of abstract data that we display in data visualizations. We can see and make sense of the data more effectively using 2-D or, on rare occasions, 3-D displays projected onto a flat plane (e.g., a screen) without donning a VR headset.
I was prompted to write this blog post by a recent article titled “Data visualization in mixed reality can unlock big data’s potential,” by Amir Bozorgzahed. This fellow is the cofounder and CEO of Virtuleap and host of the Global WebXR Hackathon, which puts his interest in perspective. The article quotes several software executives who have VR products to sell, and the claims that they make are misleading. They take advantage of the gullibility of people who are already susceptible to the allure of technological hyperbole that goes by such names as VR, Big Data, AI, and self-service analytics. They market their VR-enabled data visualization tools as techno-magical—capable of turning anyone into a skilled data analyst without an ounce of training, except in the use of their VR tools.
Let’s examine a few of the claims made in the article, beginning with the following:
The tech enables not only enterprises and organizations, but anyone, to use their spatial intelligence to spot patterns and make connections that breakthrough the tangled clutter of big data in a way that has been out of reach even with traditional 2D analytics.
“Anyone” can use their “spatial intelligence to spot patterns and make connections.” Wow, this is truly magical and downright absurd. While it is true that spatial perception is built into our brains, it is not true that we can use this ability to make sense of abstract data without having developed an array of data sensemaking skills.
The self-service claims of VR data visualization can get even more outlandish. Consider the following excerpt from the article, which describes WebVR’s “forthcoming seismic-upgrade:”
In fact, their platform wasn’t designed to cater to just highly-trained data scientists, but for anyone with a stake in the game. In the not so distant future, I picture the average Joe or Jane regularly making use of their spatial intelligence to slice and dice big data of any kind, because everyone has the basic skill-sets required to play Sherlock Holmes in mixed reality. All they need to get started is access to big data sets, which I also foresee as being more prevalent not too long from now.
Amazing! I suppose it’s true that everyone can “play” Sherlock Holmes, but playing at it is quite different from sleuthing with skill.
Here’s an example of a VR data visualization that was included in the article:
First of all, you don’t need VR to view data in this manner. At this moment you’re viewing this example on a screen or printed page. You do need VR hardware and software, however, to virtually place yourself in the middle of a 3-D scatter plot and fly around in it, but this wouldn’t make the data more accessible, perceptible, or understandable. Viewing the data laid out in front of us makes it easier to find and make sense of the meaningful patterns that exist within.
The spatial perception that is built into the human brain can indeed be leveraged, using data visualization, to make sense of data. It is not true, however, that it can do so independent of a host of other hard-won skills. Here’s another similar excerpt from the article:
Pattern recognition is an inherent talent that we all possess; the evolutionary edge that sets us apart from the animal kingdom. So, it’s not so much that immersive data visualization unlocks big data but, rather, that it allows us to interact with big data in a way that is natural for us.
This is quite misleading. Other animals also have tremendously good pattern recognition abilities built into their brains, in many cases much better than ours. What sets humans apart in regards to pattern recognition is our ability to reason about patterns in abstract ways, sometimes called patternicity. This is both a blessing and a curse, however, for we can and often do see patterns that are entirely meaningless. We are prolific meaning generators, but separating valid from illusory meanings requires a rich set of data sensemaking skills. No tool, including and perhaps especially VR, will replace the need for these skills.
Here’s another visualization that’s featured in the article:
The caption describes this as “a volatile blockchain market.” What is the claim?
The Bitcoin blockchain in particular pushes the limits of traditional data visualization technology, as its support for transactions involving multiple payers and multiple payees and high transactional volume would create an incomprehensible jumble of overlapping points on any two-dimensional viewer.
Let’s think about this for a moment. If we view a forest from the outside, it appears as a “jumble” of trees. Due to occlusion, we can’t see each of the trees. If we walk into that forest, we can examine individual trees, but we lose sight of the forest. This is a fundamental problem that we often face when trying to visualize a large and complex data set. We typically attempt to resolve this challenge by finding ways to visualize subsets of data while simultaneously viewing how those subsets fit into the larger context of the whole. A traditional data visualization approach to this problem involves the use of concurrent “focus+context” displays to keep from getting lost in the forest while focusing on the trees. Nothing about VR helps us resolve this challenge. In fact, compared to a screen-based display, VR just makes it easier to get lost in the forest.
Here’s the ultimate expression of nonsense that I encountered at the end of Bozorgzahed’s article:
We have reached a point in time where much of the vast digital landscape of data can be now rendered into visual expressions that, paired up with artificial intelligence, can be readily deciphered and understood by anyone with simply the interest to mine big data. And all this because the underlying tech has become advanced enough to finally align with how we visually process the world.
Notice the abundant sprinkling of buzzwords in this final bit of marketing. When you combine data visualization with VR, AI, and Big Data you have a magic trick as impressive as anything that David Copperfield could pull off on a Las Vegas stage, but one that is just as much an illusion.
I will continue saying what I have said before too many times to count: data sensemaking requires skills that must be learned. No tool will replace the need for these skills. It’s time that we accept the unpopular truth that data sensemaking requires a great deal of training and effort. There are no magic bullets, including VR.
We typically think of quantitative scales as linear, with equal quantities from one labeled value to the next. For example, a quantitative scale ranging from 0 to 1000 might be subdivided into equal intervals of 100 each. Linear scales seem natural to us. If we took a car trip of 1000 miles, we might imagine that distance as subdivided into ten 100-mile segments. It isn’t likely that we would imagine it subdivided into four logarithmic segments consisting of 1-, 9-, 90-, and 900-mile intervals. Similarly, we think of time’s passage—also quantitative—in terms of days, weeks, months, years, decades, centuries, or millennia; intervals that are equal (or in the case of months, roughly equal) in duration.
Logarithms and their scales are quite useful in mathematics and at times in data analysis, but for presenting data they are only useful in those relatively rare cases when the audience has been trained to think in logarithms. With training, we can learn to think in logarithms, although I doubt that it would ever come as easily or naturally as thinking in linear units.
For my own analytical purposes, I use logarithmic scales primarily for a single task: to compare rates of change. When two time series are displayed in a line graph, using a logarithmic scale allows us to easily compare the rates of change along the two lines by comparing their slopes, for equal slopes represent equal rates of change. This works because units along a logarithmic scale increase by rate (e.g., ten times the previous value for a log base 10 scale or two times the previous value for a log base 2 scale), not by amount. Even in this case, however, I would not ordinarily report to others what I’d discovered about rates of change using a graph with a logarithmic scale, for all but a few people would misunderstand it.
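The slope argument is easy to verify with a few lines of code. The numbers below are made up for illustration: two series growing at the same 10% rate per period, from very different starting levels:

```python
import math

# Two hypothetical time series, both growing 10% per period, from very different levels.
series_a = [100 * 1.10 ** t for t in range(10)]
series_b = [1_000_000 * 1.10 ** t for t in range(10)]

# On a linear scale, the absolute change per period differs enormously
# (roughly 10 versus 100,000 in the first period)...
print(series_a[1] - series_a[0], series_b[1] - series_b[0])

# ...but on a log scale, the per-period increment (the slope) is identical,
# because log(v * r) - log(v) = log(r) no matter what v is.
slope_a = math.log10(series_a[1]) - math.log10(series_a[0])
slope_b = math.log10(series_b[1]) - math.log10(series_b[0])
print(abs(slope_a - slope_b) < 1e-9)  # True: equal rates of change, equal slopes
```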
I decided to write this blog piece when I ran across the following graph in Steven Pinker’s new book Enlightenment Now:
The darkest line, which represents the worldwide distribution of per capita income in 2015, is highlighted as the star of this graph. It has the appearance of a normal, bell-shaped distribution. This shape suggests an equitable distribution of income, but look more closely. In particular, notice the income scale along the X axis. Although the labels along the scale do not consistently represent logarithmic increments (odd, but never explained), the scale is indeed logarithmic. Had a linear scale been used, the income distribution would appear significantly skewed, with a peak near the lower end and a long declining tail extending to the right. I can think of no valid reason for using a logarithmic scale in this case. A linear scale ranging from $0 per day at the low end to $250 per day or so at the high end would work fine. Ordinarily, $25 intervals would work well for a range of $250, breaking the scale into ten intervals, but this wouldn’t allow the extreme poverty threshold of just under $2.00 to be delineated because it would be buried within the initial interval of $0 to $25. To accommodate this particular need, tiny intervals of $2.00 each could be used throughout the scale, placing extreme poverty approximately within the first interval. As an alternative, larger intervals could be used and the percentage of people below the extreme poverty threshold could be noted as a number.
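This effect, a heavily skewed distribution that takes on a bell shape once the axis is logarithmic, is easy to reproduce. The sketch below uses randomly generated lognormal values as a stand-in for real income data, so the numbers are illustrative only:

```python
import math
import random

random.seed(1)
# Hypothetical skewed "income" data: lognormal values standing in for per capita incomes.
incomes = [math.exp(random.gauss(2.0, 1.0)) for _ in range(10_000)]

def histogram(values, edges):
    """Count values into the half-open bins defined by consecutive edges."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            if edges[i] <= v < edges[i + 1]:
                counts[i] += 1
                break
    return counts

lo, hi = min(incomes) * 0.999, max(incomes) * 1.001

# Ten equal intervals on a linear scale, as a linear axis would show them.
linear_counts = histogram(incomes, [hi * i / 10 for i in range(11)])

# Ten intervals that are equal in log space, as a logarithmic axis would show them.
log_edges = [math.exp(math.log(lo) + (math.log(hi) - math.log(lo)) * i / 10)
             for i in range(11)]
log_counts = histogram(incomes, log_edges)

# On the linear scale, the peak sits in the very first bin (heavily skewed);
# on the log scale, it drifts toward the middle, producing the cozy bell shape.
print("linear peak bin:", linear_counts.index(max(linear_counts)))
print("log peak bin:", log_counts.index(max(log_counts)))
```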
After examining Pinker’s graph closely, you might be tempted to argue that its logarithmic scale provides the advantage of showing a clearer picture of how income is distributed in the tiny $0 to $2.00 range. This, however, is not its purpose. Even if this level of detail were relevant, the information that appears in this range isn’t real. The source data on which this graph is based is not precise enough to represent how income is distributed between $0 and $2.00. If reliable data existed and we really did need to clearly show how income is distributed from $0 to $2.00, we would create a separate graph to feature that range only, and that graph would use a linear scale.
Why didn’t Pinker use a linear scale? Perhaps it is because the message of the graph would reveal a dark side that would somewhat undermine the message of his book that the world is getting better. Although income has increased overall, the distribution of income has become less equitable and this pattern persists today.
When I noticed that Pinker derived the graph from Gapminder and attributed it to Ola Rosling, I decided to see if Pinker introduced the logarithmic scale or inherited it in that form from Gapminder. Upon checking, I found that Gapminder’s graphs of wealth distribution indeed feature logarithmic scales. If you go to the part of Gapminder’s website that allows you to use their data visualization tools, you’ll find that you can only view the distribution of wealth logarithmically. Even though some of Gapminder’s graphs provide the option of switching between linear and logarithmic scales, those that display distributions of wealth do not. Here’s the default wealth-related graph that can be viewed using Gapminder’s tool:
This provides a cozy sense of bell-shaped equity, which isn’t truthful.
To present data clearly and truthfully, we must understand what works for the human brain and design our displays accordingly. People don’t think in logarithms. For this reason, it is usually best to avoid logarithmic scales, especially when presenting data to the general public. Surely Pinker and Rosling know this.
Let me depart from logarithms to reveal another problem with these graphs. There is no practical explanation for the smooth curves that they exhibit if they’re based on actual income data. The only time we see smooth distribution curves like this is when they result from mathematical calculations, never when they’re based on actual data. Looking at the graph above, you might speculate that when distribution data from each country was aggregated to represent the world as a whole, the aggregation somehow smoothed the data. Perhaps that’s possible, but that isn’t what happened here. If you look closely at the graph above, you’ll see that, in addition to the curves at the top of each of the four colored sections (one for each world region), there are many light lines within each colored section. Each of these light lines represents a particular country’s distribution data. With this in mind, look at any one of those light lines. Every single line is smooth beyond the practical possibility of being based on actual income data. Some jaggedness along the lines would always exist. This tells us that these graphs are not displaying unaltered income data for any of the countries. What we’re seeing has been manipulated in some manner. The presence of such manipulation always makes me wary. The data may be a far cry from the actual distribution of wealth in most countries.
My wariness is magnified when I examine wealth data of this type from long ago. Here’s Gapminder’s income distribution graph for the year 1800:
To Gapminder’s credit, they provide a link above the graph labeled “Data Doubts,” which leads to the following disclaimer:
Income data has large uncertainty!
There are many different ways to estimate and compare income. Different methods are used in different countries and years. Unfortunately no data source exists that would enable comparisons across all countries, not even for one single year. Gapminder has managed to adjust the picture for some differences in the data, but there are still large issues in comparing individual countries. The precise shape of a country should be taken with a large grain of salt.
I would add to this disclaimer that “The precise shape of the world as a whole should be taken with an even larger grain of salt.” This data is not reliable. If the data isn’t reliable today, data for the year 1800 is utterly unreliable. As a man of science, Pinker should have made this disclaimer in his book. The claim that 85.9% of the world’s population lived in extreme poverty in 1800 compared to only 11.4% today makes a good story of human progress, but it isn’t a reliable claim. Besides, it’s hard to reconcile my reading of history with the notion that, in 1800, all but 14% of humans were just barely surviving from one day to the next. People certainly didn’t live as long back then, but I doubt that the average person was living well below the threshold of extreme poverty as this graph suggests.
I’ve grown concerned that the recent emphasis on data storytelling has led to a reduction in clear and accurate truth telling. When I was young, to say that someone “told stories” meant that they made stuff up. This negative connotation of storytelling describes a great deal of data storytelling today. Encouraging people to develop skills in data sensemaking and communication should focus their efforts on learning how to discover, understand, and tell the truth. This is seldom how instruction in data storytelling goes. The emphasis is more often on persuasion than truth, more on art (and artifice) than science.