Eagereyes - Visualization and Visual Communication
EagerEyes is Robert Kosara's place to reflect on the world of information visualization and visual communication of data. The goal is to help digest things that are happening in the field and discuss developments that may be tangential or early, but that are likely to have an impact.
Why do pie charts look the way they do? What makes this particular way of slicing up a circle the preferred way of showing part-to-whole relationships? In two short papers that I’m presenting this week at EuroVis, I looked at the design space of circular part-to-whole charts, including pie charts.
So I designed a study to test seven different ways of slicing up a circle to represent a fraction by area. Out of these, the pie chart did the best, but there was another chart, which I’m going to call the moon pie, that did equally well for accuracy (and better for speed): sliding a circle of the same size over the base circle to create the part that shows the fraction (similar to how the earth’s shadow creates the phases of the moon).
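For the curious, the geometry behind the moon pie is just the lens-shaped overlap of two equal circles. Here is a quick Python sketch (my own, not the actual stimulus code from the study) of how one might compute how far to slide the covering circle to show a given fraction:

```python
import math

def moon_pie_fraction(d, r=1.0):
    """Fraction of the base circle left visible (the crescent) when an
    equal circle is slid a distance d across it (d=0: fully covered,
    d=2r: fully visible)."""
    # Area of the lens-shaped overlap of two equal circles at distance d.
    lens = 2 * r**2 * math.acos(d / (2 * r)) - (d / 2) * math.sqrt(4 * r**2 - d**2)
    return 1 - lens / (math.pi * r**2)

def offset_for_fraction(f, r=1.0, tol=1e-9):
    """Invert moon_pie_fraction by bisection: how far to slide the
    covering circle so the crescent shows fraction f of the area."""
    lo, hi = 0.0, 2 * r
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if moon_pie_fraction(mid, r) < f:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

Note that the visible fraction is not linear in the offset, which is exactly why it's interesting that people read these charts as accurately as pie charts.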
The second study, The Impact of Distribution and Chart Type on Part-to-Whole Comparisons, builds on the first one and addresses a criticism I keep getting about the earlier work: why only two slices? Why not test more? So I took the pie chart, the moon pie, the straight-slice pie (the area-only condition from earlier), as well as a stacked bar chart and a treemap, and sliced them up into five slices. I also varied the distribution of the numbers to have either a fat-tail or a long-tail shape.
The moon pie did well in this one too, and much better than the treemap. Stacked bars held their own in this part-to-whole task, even though they're terrible for most other uses. And the pie chart, which served as the baseline, again did better than, or at least as well as, any of the other techniques.
The visualization community may not like pie charts, but in the real world they’re hugely popular and very common. Rather than sneering at them (and the people who use them), why don’t we try to understand them better? In particular, the design space of part-to-whole charts is almost entirely unexplored. The only other chart that’s used for this purpose out in the world, the treemap, hasn’t been studied for this purpose much (if at all). And it seems to actually do worse than the pie chart (and the moon pie).
The two papers very much belong together, you could almost call them two parts of a single paper. I can’t confirm or deny that they were a single paper at some point and may have gone through several rounds of review (and rejection) over several years.
Criticizing visualizations is a cottage industry of sorts, and an activity I have indulged in in the past as well. Redesigning those charts is also not uncommon, though it's usually other people's charts, and that isn't always welcome. Sarah Leo of The Economist has redesigned some of the charts made by that publication, and not only do her redesigns work better, her thoughts around some of the design issues are also very insightful.
This being The Economist, the charts Leo picked for redesigns aren't all that egregious to begin with, but her redesigns are clearly an improvement. Two stood out to me in particular. The first one seems to show a correlation that's a little bit too perfect.
I don't know much about dogs, but I figure weight and neck size are highly correlated. It may also just be a coincidence of how the vertical scales fall that these lines end up on top of each other, so it's easy to just go with that. At the same time, we're comparing apples and oranges here, and neither of the axes starts at zero.
Leo makes an interesting point here about how to pick axis ranges in such a case: pick the same range as a percentage of the full scale. That at least makes the rate of change somewhat comparable and we see that perhaps the neck sizes haven't been dropping quite as drastically as the weights. I think this is an interesting idea and a useful guideline for cases where including zero would be impractical. Another approach, though possibly not one that fits the Economist's editorial style, would be to try a connected scatterplot.
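As a back-of-the-envelope version of that guideline (the span percentage, the centering, and the use of the series maximum as the reference scale are my assumptions, not Leo's exact rule):

```python
def axis_limits(values, span_pct=0.25):
    """Pick axis limits whose span is a fixed percentage of the series'
    own scale (here: its maximum value), so the visual rate of change is
    comparable across panels that use different units."""
    data_span = max(values) - min(values)
    span = max(span_pct * max(values), data_span)  # never clip the data
    center = (max(values) + min(values)) / 2
    return center - span / 2, center + span / 2

# Dog weights (kg) and neck sizes (cm) each get a range spanning 25%
# of their own maximum, rather than two arbitrary, unrelated ranges.
weight_lims = axis_limits([9.1, 8.8, 8.5, 8.2], span_pct=0.25)
neck_lims = axis_limits([43.0, 42.5, 42.0, 41.8], span_pct=0.25)
```

With both panels devoting the same share of their scale to the change, a steeper line actually means a faster relative decline, instead of an accident of axis choice.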
The other example I thought was particularly interesting is also a dual-axis line chart, this one about the U.S. trade deficit and employment.
What's extremely confusing here, and frankly not very well done in the original, is that the trade deficit is all negative numbers while unemployment numbers are all positive. The half-length red line at the top is supposed to indicate this, but I at least did not understand that and thought it was a mistake.
The redesign is clearly much better, and I particularly like that Leo chose bars here to connect the points to the axis. This way, there's no way to miss that the numbers are negative. I'm partial to downwards-pointing bars and think they should be used a lot more; but they need to be done in a way that's clear and easy to understand, or risk being misread.
This kind of reflection and redesign of your own charts (even if Leo didn't make every one of those original charts herself, they were made by the people around her) is a great exercise. It allows you to flex your design muscles and think in terms of more general design rules and guidelines that will inform and improve future work. And doing this with your own work means you're not stepping on people's toes, even when you're doing it in public.
I'm one of the organizers of the new TrustVis Workshop at EuroVis this year. We haven't done a good job publicizing its existence, so here is a reminder and a deadline extension: submit your papers on trust in visualization by April 5!
It's not entirely new, though: you may remember the EuroVis Workshop on Replicability, Validation, and Verification in Visualization, better known under its highly euphonic acronym, EuroRVVV.
TrustVis is not just a new name but the attempt to broaden the scope and frame the workshop in a slightly more practice-oriented way (it's still plenty academic, don't worry). After all, replication, verification, and validation are all matters of trust in the end. Can we trust this method, this study, this technique, etc.? And how do we know?
This is also a question when it comes to users exploring a new tool. Does this show me my numbers correctly? Can I read and understand what it produces? What kinds of biases does it create? Do I know how to use it to properly show my data? These are all common and real hurdles in trying to adopt a new tool.
This even goes into matters of policy. For example, for a long time, doctors had to look at every single slice in a CT scan and could not base their diagnosis on 3D renderings. I don't know if that's still the case, but trust can clearly have enormous consequences.
The full call and details on how to submit are available on the TrustVis website. But the gist of it is: deadline is April 5, we're looking for short/position papers, the workshop will be held at EuroVis in Porto in early June.
My co-chairs Kai Lawonn, Lars Linsen, Noeska Smit, and I are looking forward to your submissions!
I'm not generally a fan of year-end lists, but they do provide a great way to see many fantastic pieces of work in one place. And if a simple list already does that, what might a list of lists do? Check out Maarten Lambrecht's List of 2018 Visualization Lists to find out!
Maarten's list is largely focused on data visualization, including data journalism pieces from Reuters, The Guardian, the South China Morning Post, the New York Times, the Washington Post, and many more; but also postings about maps, satellite imagery, science photography, and illustration.
A lot of amazing work was created last year, it's worth spending some time to look over and get a sense of both the level and amount of good visualization work that's being produced today.
While activity on this site has been a bit slow this year, I’ve helped start a new group blog focused on visualization research, called Multiple Views.
Visualization research is difficult to access for most people who aren't academics: you have to find the papers, you have to have access to the digital libraries or know where to find free copies, etc. Many people are curious about what is happening in visualization research, though, and manage to follow the field despite our best efforts.
The goal of Multiple Views is to provide this access: paper explainers, reports from conferences, background on a variety of visualization topics, etc. Jessica Hullman has written a nice introduction of what we think visualization research is and should be, and the current list of postings includes thoughts on visualization literacy, some background on perceptual studies in visualization, a couple of VIS reports, etc.
The name was chosen deliberately and isn't just a pun on a visualization term: it is run by four people, and the goal is to get people in the field to contribute. Are you doing research in visualization and want to write about your work in a way that is approachable and interesting for non-academics? Talk to us! Our goal is to solicit postings from others rather than write ourselves.
To make this as easy as possible, we created the blog on Medium. Anybody can sign up for an account easily there (if you’re on Twitter, logging in via Twitter is the most natural), and the writing interface is clean and simple. And who knows, maybe a Multiple Views posting is the starting point for your own blog!
The blog is set up as a Medium publication, which means it aggregates postings. Authors can submit articles either as drafts (which the editors can then comment on) or already-published stories. The stories retain their original authors no matter how they have been submitted and can be included in multiple publications. That means we can republish stories written for other blogs, and others can do the same with the stories we publish.
Blog personnel consists of Enrico Bertini (NYU), Jessica Hullman (Northwestern University), Danielle Albers Szafir (University of Colorado), and yours truly.
Subscribe either on Medium or through the RSS feed!
The Tapestry 2018 program is complete, including the three keynotes and eight newly-added short stories. We are now looking for proposals for demos and will send out a call for PechaKucha-style talks to attendees soon, too.
We're also excited to have a great lineup of speakers and topics:
Aritra Dasgupta, As you show, so shall you reap: causes and consequences of bad design in (scientific) visual communication
Jason Forrest, The Data Visualizations of W.E.B. Du Bois
Kenneth Field, The Cartography of Elections
Jonni Walker, Bringing the Genome Home – Bridging the Gap between Conservation and Data Storytelling
Kristin Henry, Storytelling with Color
Bill Shander, WORDS > VISUALS
Alex Wein, Charts as Utterances
Nadja Popovich, Personalizing Climate Change
The conference takes place November 29 and 30 at the University of Miami – so if you've been on the fence about going, now is the time to register and book your travel!
To submit a demo proposal, use this demo submission form. We're still working out logistics and aren't sure how many demos we'll be able to accommodate. We should know early next week, so get your proposal in by Friday (November 9)!
The final report from VIS 2018 (see previously here and here) again covers papers, papers, and more papers. There are new ways to specify visualizations, a panel, perception research, as well as new work on how to deal with uncertainty in data.
New Ways to Make Charts
How to best specify visualizations is still an open question, and one that hasn't gotten a lot of attention in recent years. This year saw quite a bit of activity in this area though, with Draco (mentioned earlier) and the following two approaches.
Yet another approach to creating visualizations is notebooks (as used in R or Jupyter). Design Exposition with Literate Visualization by Jo Wood, Alexander Kachkaev, and Jason Dykes looks at the importance of capturing the process that leads to a visualization in order to understand the reasoning, trace the data, and also to show the labor that went into its creation.
Panel: Meet the Founders
I had organized a panel that I called Meet the Founders: How to Start and Sustain a Business in the Visualization Space with Lisa Avila (Kitware), Jeff Heer (Trifacta), and Anders Ynnerman (SCISS). The goal was to talk about what it takes to start your own business and what actual founders had learned doing so. We had a pretty full room, which was a pleasant surprise especially because the room we were in was a bit out of the way and this was the session right after lunch (always dangerous).
Lisa, Jeff, Anders, and I (speaking for Tableau, even if I’m not a founder) covered a lot of ground, and we got a number of good questions from people clearly interested in hearing from the people who had set out on their own.
Perception & Cognition
Creating charts is all well and good, but we need to understand how they are being read. These papers can be a bit theoretical, but they help us figure out how to build better systems. In addition to a good number of interesting papers, this year also included a meetup of the VisxVision group that is aiming to make more connections between visualization and the vision sciences.
Mitigating the Attraction Effect with Visualizations by Evanthia Dimara, Gilles Bailly, Anastasia Bezerianos, and Steven Franconeri reports on a fascinating effect known from psychology: when somebody has to choose between two alternatives, adding a "decoy" that is similar to, but slightly inferior to, one option makes that option more attractive. They show that this effect is present even in scatterplots and that it takes considerable effort to mitigate.
Which configuration is best for comparing values?
Face to Face: Evaluating Visual Comparison by Brian David Ondov, Nicole Jardine, Niklas Elmqvist, and Steven Franconeri compares different configurations of different charts, like bars, donuts, and a few others, to see which made it easiest to compare between the values. Interestingly, what wins out is dependent on whether one looks for similarity or biggest change. For the latter, animation between the charts turns out to work well (at least for the small number of items they tested). In addition to some interesting results, this was also an incredibly well structured talk with some amazing overview and summary slides that gave a great graphical overview of the study, findings, etc.
Task-Based Effectiveness of Basic Visualizations by Bahador Saket, Alex Endert, and Çagatay Demiralp extends some of the classic studies on chart effectiveness by looking at more fine-grained tasks that people might try to accomplish with them. They find that the task makes a difference for some of the rankings between charts, and in particular point out that the oft-maligned pie chart wins out for a handful of them.
Showing uncertainty in the data is still an issue with many visualization types and systems. It's too easy to believe that the data we see is precise and certain, when it often is not.
Visualizing Uncertain Tropical Cyclone Predictions using Representative Samples from Ensembles of Forecast Tracks by Le Liu, Lace M. K. Padilla, Sarah Creem-Regehr, and Donald House presents a design for a particularly important type of communication: showing the forecast for a hurricane. The existing cone of uncertainty is confusing (most people think the way it expands means the storm is growing) and leads to a false sense of security for people outside the cone (though the cone only covers 66% of the likely paths). The redesign presented in this paper vastly improves this by resampling the paths and presenting them with snapshots of the size of the storm at regular intervals. There are still some issues that it doesn’t solve, but it’s a huge step forward compared to the state of the art.
In Pursuit of Error: A Survey of Uncertainty Visualization Evaluation by Jessica Hullman, Xiaoli Qiao, Michael Correll, Alex Kale, and Matthew Kay looks at a large number of evaluations of uncertainty visualizations to see what kinds of tasks they consider important, etc. The webpage has a nice overview visualization that shows the categories they found.
Where’s my data? Evaluating Visualizations with Missing Data by Hayeong Song and Danielle Albers Szafir compares different ways of imputing and showing missing data in time series plots. Should the fact that the values are missing be highlighted or not? How should the missing data be imputed, and should it be done at all?
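As a tiny illustration of the design question (this is a generic toy, not the method from the paper): impute the gaps and keep a mask of which points were filled in, so the chart can render them differently, say dashed or desaturated:

```python
def impute_linear(series):
    """Linearly interpolate gaps (None) in a time series. Returns the
    filled values plus a mask marking which points were imputed, so a
    chart can highlight them. Assumes at least one observed value;
    leading/trailing gaps are filled with the nearest observed value."""
    filled = list(series)
    imputed = [v is None for v in series]
    n = len(filled)
    i = 0
    while i < n:
        if filled[i] is None:
            j = i
            while j < n and filled[j] is None:
                j += 1                       # j: first observed value after the gap
            left = filled[i - 1] if i > 0 else filled[j]
            right = filled[j] if j < n else filled[i - 1]
            for k in range(i, j):
                t = (k - i + 1) / (j - i + 1)
                filled[k] = left + t * (right - left)
            i = j
        else:
            i += 1
    return filled, imputed
```

The interesting part isn't the interpolation itself but what the mask lets you do: the study's questions only arise once the viewer can (or can't) tell imputed points from real ones.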
Next Year: Vancouver
Next year’s VIS will be outside the U.S. once again: Vancouver, BC (Canada), October 20-25, 2019.
While the first part of this report covered mostly workshops and other events, it's all papers from now on. Plus a session on the future of the VIS conference.
Temporal aspects of data are extremely important, in particular when the data is more complex than just a simple time series. A number of papers addressed time this year.
Comparing Similarity Perception in Time Series Visualizations by Anna Gogolou, Theophanis Tsandilas, Themis Palpanas, and Anastasia Bezerianos looked at how well different chart types (like horizon graphs, paired line graphs, heatmap-style charts) work to find similar patterns in time series. In short: it depends. Different charts work better for different tasks.
A Multiresolution Streamgraph Approach to Explore Hierarchical Time Series by Erick Cuenca, Arnaud Sallaberry, Florence Ying Wang, and Pascal Poncelet is a new take on stream graphs (which show categorical data over time). Stream graphs work well with few categories and smooth data, but with many categories or spiky data, they quickly become useless. Their Multistream technique combines three different views to allow the user to focus on groups of categories, zoom into the graph, etc.
Line Graph or Scatter Plot? Automatic Selection of Methods for Visualizing Trends in Time Series by Yunhai Wang, Fubo Han, Lifeng Zhu, Oliver Deussen, and Baoquan Chen asks a pretty basic question: given a noisy time series, is a line chart or scatterplot (really more of a dot plot) the better choice? Noise can make it harder to read line charts because the connecting lines amplify the problem. Their technique is pretty simple, but they find that it matches what people would have chosen in a study.
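To make the trade-off concrete, here's a toy heuristic in the spirit of the question the paper asks (the actual selection method is the authors', not this, and the threshold below is an arbitrary choice): if residuals around a moving average dominate the overall trend, the connecting lines mostly draw noise.

```python
def prefer_scatter(values, window=5):
    """Crude line-vs-scatter heuristic: compare the average deviation
    from a moving average (noise) against the range of the smoothed
    series (trend). If noise dominates, a dot plot is likely clearer."""
    n = len(values)
    half = window // 2
    smooth = [sum(values[max(0, i - half): i + half + 1]) /
              len(values[max(0, i - half): i + half + 1])
              for i in range(n)]
    noise = sum(abs(v - s) for v, s in zip(values, smooth)) / n
    trend = max(smooth) - min(smooth)
    return noise > 0.5 * trend  # threshold invented for illustration
```

A clean upward ramp keeps its line; a series that jumps wildly around a flat mean gets dots.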
A Vector Field Design Approach to Animated Transitions by Yong Wang, Daniel Archambault, Carlos E. Scheidegger, and Huamin Qu looks at how to animate points between time steps in a scatterplot so that they are the easiest to follow. While techniques for this have been proposed in the past, they have not taken the issue of crowding into account, which has recently been shown to be a major factor in being able to follow what is going on. Their new algorithm does, which makes for better transitions between time points.
Temporal Treemaps: Static Visualization of Evolving Trees by Wiebke Köpp and Tino Weinkauf looks at hierarchical data whose structure changes over time. They create a fairly simple treemap-like visualization that shows nodes merging, splitting, and being added and deleted. It’s also an interesting algorithmic approach because they use constraint programming to specify criteria and create the best solution.
Figuring out which visualizations and charts work for what data and purpose is another important question.
Evaluating ‘Graphical Perception’ with CNNs by Daniel Haehn, James Tompkin, and Hanspeter Pfister reports on the results of training a number of neural networks to solve the classic Cleveland & McGill perception tasks. The results are mixed, with some tasks being done better by the neural networks than actual humans, others worse. I’m a bit confused by this paper, since it uses neural networks that aren’t specifically built to mimic human perception.
What Do We Talk About When We Talk About Dashboards? by Alper Sarikaya, Michael Correll, Lyn Bartram, Melanie Tory, and Danyel A Fisher argues for the importance of dashboards as a separate and distinct way of using visualization from single views, not just collections of charts. Despite their ubiquity in the real world, dashboards are still largely ignored by visualization researchers. The authors, who call themselves the dashboard conspiracy, argue that dashboards are more than the sum of their parts and serve distinct purposes, such as decision-making, awareness building, motivation and learning, and persuasion.
Restructuring VIS for the future
The current structure of the VIS conference has developed over time, and many of the distinctions and ways of doing things are based on "we’ve always done it like this" rather than real reasons. Over the last year, a number of people have worked on identifying issues and looking at ways to structure the conference in ways that make more sense for the way things work today.
This is an ongoing process, but the first changes are being made, and the members of the committee in charge of this held a well-attended and -structured meeting to solicit feedback at the conference. A few documents summarizing their work and thinking are available on the VIS website for those interested.
The IEEE VIS conference is the most important outlet for academic visualization research. This year's conference took place in Berlin, Germany. Here is a report on some of the most interesting (to me, anyway) papers, events, and developments, in three parts.
As usual, I link to paper websites and materials where possible. Luckily, many papers this year had paper or project webpages and code or materials available. While still not entirely a given, the majority of papers have at least something available (even if it’s only the paper manuscript itself).
Sunday started with the Workshop on Visualization for Communication, organized by Ben Watson and me. We had a good turnout, quite a bit better than I expected in fact. We had not gotten as many submissions as I had hoped, and this being a workshop on Sunday in parallel with other compelling and related sessions (like BELIV at VIS and the second day of Information+) had me worried we’d be talking to a mostly empty room.
But not so! We had 50-70 attendees there and a pretty good program. You can read all the papers and poster write-ups on the workshop website. We are planning on running the workshop again next year, this time hopefully doing a better job explaining what we’re after and possibly expanding the scope of the workshop a bit.
BELIV’s topic for the keynote (which I missed) and papers this year was replication in visualization. The afternoon sessions were organized as breakouts, which makes a lot of sense for workshops, but is actually fairly unusual (most are run as mini-conferences). I think it worked well though, and we discussed a wide range of topics, like how data exploration and statistically sound reasoning can co-exist, replication for quantitative studies, etc.
Maarten Lambrechts talking about xenographics and associated phenomena
The Vis In Practice program aims to make the VIS conference more interesting and appealing for practitioners and people from industry, not just the academics it primarily serves. This year was the first time they organized a full day’s worth of talks, I believe, and they brought in a good number of interesting speakers.
Among them was Maarten Lambrechts, who talked about xenographics, or unusual data visualizations. He had many interesting examples of how journalists and others create unusual ways to show data for specific purposes, like fitting them onto screens, to get attention, etc. He has collected a whole zoo of them on the xenographics website.
Another session looked at data visualization tools. Lisa Charlotte Rost gave a great talk about using a large number of visualization tools to create the same chart (if this sounds familiar, it was an update of her wildly popular article from last year). She had an interesting distinction between apps (which tend to be easy to learn but less flexible), libraries (hard to learn but flexible), and a new generation of data drawing apps, like Charticulator and Data Illustrator, that are easier and yet more flexible.
Michael Behrisch presented a very in-depth state-of-the-art survey of commercial visual analytics systems he had conducted with Siemens. They’ve released their findings on a browsable website for you to explore.
How many episodes have you listened to?
Data Stories Meetup
Enrico Bertini and Moritz Stefaner organized a meetup for listeners of (and guests on) their Data Stories podcast. It was a fun event where they had us create visualizations by standing along a line in response to different questions, and gave us a little look into their process (including their secret Trello board!).
Opening, Best Papers
VIS had record attendance this year, with 1265 attendees. This is up quite a bit from the last year or two, though those were down slightly. This was only the second time VIS had taken place outside the U.S., and both times had gotten very good attendance numbers (despite a lot of worries that they’d have trouble attracting people).
Draco in a nutshell
I have to say that I was not terribly impressed with this year’s VAST or SciVis best papers, but I did like the choice of best paper for InfoVis: Formalizing Visualization Design Knowledge as Constraints: Actionable and Extensible Models in Draco by Dominik Moritz, Chenglong Wang, Greg L. Nelson, Halden Lin, Adam M. Smith, Bill Howe, and Jeffrey Heer. It describes a system named Draco that uses constraint programming to allow the user to specify only the minimal amount of information to create charts, yet pick good defaults that are likely to yield useful and informative visualizations. They have written up the idea in a nice blog posting, the whole thing is available as open source, and there is an online editor where you can play with Draco yourself.
What makes Draco interesting is not only the novel approach, but also the fact that this is the first really new way of creating visualizations (at least in terms of being usable for normal end users) in a while. Another one is Charticulator, which I’ll cover in one of the other VIS postings.
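To give a flavor of the constraint idea (this is an invented toy, not Draco's actual Answer Set Programming encoding, and the rules and weights below are made up for the example; Draco learns its weights from data):

```python
# Rank candidate chart specs by the summed weights of the soft
# constraints they violate, then pick the cheapest one. This is the
# essence of soft-constraint-based design recommendation.
SOFT_CONSTRAINTS = [
    # (weight, predicate that is True when the spec violates the rule)
    (10, lambda s: s["mark"] == "bar" and not s["zero_baseline"]),
    (5,  lambda s: s["field_type"] == "quantitative" and s["channel"] == "color"),
    (1,  lambda s: s["channel"] == "size"),
]

def cost(spec):
    """Total penalty for all soft constraints this spec violates."""
    return sum(w for w, violated in SOFT_CONSTRAINTS if violated(spec))

def best_spec(candidates):
    return min(candidates, key=cost)

candidates = [
    {"mark": "bar",   "channel": "y",     "field_type": "quantitative", "zero_baseline": True},
    {"mark": "point", "channel": "color", "field_type": "quantitative", "zero_baseline": True},
    {"mark": "point", "channel": "size",  "field_type": "quantitative", "zero_baseline": True},
]
```

The user only has to pin down what they care about; everything left open is filled in by whichever complete spec violates the fewest (weighted) pieces of design knowledge.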
The opening session also included a number of test-of-time awards, which I will cover in a separate posting.
Visualization doesn't have the replication issues that some other fields are struggling with right now, but is that because our science is so strong or because nobody actually bothers with replications? And what can we do to get ahead of potential problems before we run into a full-on crisis? In a paper to be presented at BELIV, Steve Haroz and I list potential pitfalls and present possible solutions.
Replication is a part of the scientific process, and in some of the more established sciences like physics, it's a given that any single finding needs to be repeated by others before it is accepted. Over the last few years, researchers in fields like psychology have found that they were unable to produce similar results when repeating experiments, leading to what is called the replication crisis: if many studies only produced an effect once, how much of what we think is true is just due to chance or mistakes (to say nothing of data manipulation or fraud)?
Visualization isn't all that different from psychology. There are very few replications (and the few that are done are very difficult to get published), and the way we like to work with data easily leads to cherry-picking and other problematic practices. Are we on the verge of a replication crisis – or would we be if anybody bothered to replicate experiments in visualization?
Steve Haroz and I look at six different sources of problems, from bad study design to misinterpreted results, describe why and how they happen, and what can be done about them. We also discuss a number of ways replications can work, from direct replication (same experiment) to conceptual replications (same phenomenon, but different experiment) and registered reports (which get reviewed before the experiment is run to minimize p-hacking).
One reason why replications are hard to publish in our literature is that they are not considered novel. We therefore also propose a few ways of working replications into papers to make them more palatable for reviewers, though we also argue for a more scientific publication landscape in visualization.
The paper will be presented at BELIV, which is part of IEEE VIS, on Sunday, October 21. We're in the first paper session after the keynote, and Steve also has a related paper in the session after that one. Even if you're not coming to VIS, you can click the link below to read the paper.