Back in 2010, when I started storytelling with data while working at Google, I would happily travel and talk to any group that was open to learning how to communicate more effectively with data. Doing that over and over (and over and over) helped me hone my presentation skills, presence and style. It also demonstrated to me that the audience for this content is wide, diverse, and located in many different types of organizations all around the globe.
Over the years, as the SWD side project became my sole (perhaps soul?) project and an actual business, demand outpaced supply and budget size became—fortunately and unfortunately—a means to keep pace. While this helped us build a sustainable business, it didn’t necessarily allow us to serve the widest audience. Much of SWD content is free but our in-person workshops are where incredible progress is made.
I want what we teach at SWD to be within reach for all types of organizations. As you may have heard in podcast Episode 16, we recently conducted a search to expand our team. The response was amazing and we conducted a thorough hiring process through which we met a fantastic group of people. At the end of it, the SWD team was positively unanimous in the decision to offer roles to two outstanding and passionate data storytellers: Mike Cisneros and Alex Velez. I couldn’t be happier to welcome both to the team!
Mike Cisneros joins us from the federal contracting world, where he spent the better part of two decades analyzing and communicating complex data for public and private sector clients. In looking for other ways to express his interests, he discovered, and became deeply involved in, the worldwide community of public data visualization practitioners, gaining a reputation for combining a unique design aesthetic with insightful, thought-provoking analysis. The quality of his creations and commitment to the greater data visualization community led to his selection for the Zen program at Tableau. Mike believes everyone has the ability to communicate more clearly and we are incredibly excited for him to join the team and help others hone their data storytelling skills.
Alex Velez has a background in statistics, which she has put to use in a number of data engineering and analysis roles across the finance, insurance and pharmaceutical industries. Her broad exposure to various stages of the analytical world led over time to a profound realization: many everyday challenges can be overcome through better communication. Alex has developed this important skill through a lot of practice—both on the job in her analytical work and through motivating people by teaching energizing Zumba classes! While she no longer spends her time in a gym studio, we’re ecstatic that she’ll continue helping others in her new role with storytelling with data, sharing her learnings and passion for data visualization and effective, thoughtful communication.
With this welcome addition of two new talented data storytellers, we are now in a position to serve a greater audience, including those who may not have been able to access workshops due to limited budgets. Today, I’m pleased to announce a new program aimed at non-profits, educational institutions and other organizations where learning and development budget may be limited: SWD reach.
SWD reach is an application-based program that brings the in-person half-day workshop to organizations for a fraction of the cost. The application process for SWD reach is open now and we will accept applications through the end of July: learn more and complete the online application form.
Mike and Alex are joining a small but mighty team. Elizabeth has been busy in recent months delivering short presentations and custom workshops to organizations around the US (and soon the world!). Behind the scenes, Jody plays point with clients, keeps us on schedule, plans events like the upcoming Chicago public workshop, and manages our (growing) programs, including SWD reach. Randy wears many hats—in addition to being my incredible and supportive husband—he makes sure we have the right technology, systems, and processes; he pushes us to innovate and try new things; you may recognize his voice from the SWD podcast or his face and friendly banter from the recent live stream event. We also have a number of partners who have been playing critical roles to support our growth.
Honestly, I couldn’t be more pleased that today, we have the right team in place to truly inspire positive change through the stories we tell with data. With more people and new programs, I’m thrilled that we’ll be able to spread effective data storytelling more broadly.
Please join me in welcoming Mike and Alex to the SWD team!
SWD team at our Milwaukee offsite this week. From left: Randy, Alex, Elizabeth, Mike, Jody, and Cole.
Let’s talk about brand. First, a caveat: I am definitely still learning when it comes to this aspect of design. This will be fun, because I’m personally going to gain a lot through this month’s challenge. I hope you will, too.
In our workshops, conversations about brand often come up when we talk about color or in response to questions related to required slide templates. Discussion points around brand are often brought up as complaints. We try to shift the discussion to one of creativity-inspiring constraints: “How am I going to communicate effectively if I’m forced to use three shades of green?” …sounds like a fun puzzle to solve! Branding can be an awesome component of visualizing data when we apply it thoughtfully.
Companies invest a great deal of time and expense in creating their brand elements: logos, colors, fonts, templates, and related style guidelines. Beyond being required to use these, there can be value in rolling branding into how you visualize data: it helps create a cohesive look and feel and can add personality to your data communications. Branding done well goes beyond the specifics of font and color into creating feelings, associations, and perhaps even an emotional response. On a related note, many companies have style guides and increasingly data visualization style guides (Jon Schwabish has been collecting the latter and shares them here) that can lend insight into designing communications in the style of a particular brand.
There’s also something interesting about rebranding a graph: taking a visual originally done in a certain style and changing it entirely. I did this a couple of times for an exercise in my forthcoming book and the process highlighted unexpected learnings for me. While the book won’t be available for a few more months, in the meantime I’d like to share related insight—and give you an opportunity to experience considering brand generally and practice branding a graph specifically—via this #SWDchallenge.
I have an easy time making a graph in the storytelling with data style. This should be no surprise. I’ve cultivated this style as I’ve graphed many thousands of graphs over the years. But the process of taking one of these graphs and remaking it through the lens of another brand made me hyper-aware of the many small decisions I make that impact the overall look and feel of SWD graphs. We can learn nuances about our design choices, preferences, and proclivities through this process of rebranding a graph. Let’s look at an example.
Take the following graph. The storytelling with data typical look and feel has been applied. The font is Arial (it’s basic and I love it!). Both y-axis title and graph title are justified at upper left, with the former in all caps to create a clean frame for my graph. Most elements are grey except sparing use of color to direct attention (orange for a negative callout and associated data point, brand blue for positive data point and corresponding comment). Things are generally titled, labeled, and annotated. There is a good amount of detail, but much of it is pushed to the background for easy scanning and so we can easily focus on the data.
I’m going to rebrand this graph in a totally different style. For inspiration, I’ll use San Pellegrino. I have some immediate associations with this brand, but want to do a little targeted research to build a slightly more robust picture for myself. First, I look at their website. Then I do some Google image searching and browse products, logos, and advertising. I seek out a style guide and find this (“itineraries of taste”—a different sort of guide than I was imagining, though this does give me more insight into the brand!). Here are a few images I gathered over the course of this process:
Next, I list a dozen words I associate with the brand: bright, crisp, citrus, bubbly, fine, Italian, worldly, refined, expensive, international, stylish, luxury. I do a Google search for “San Pellegrino font” and learn it’s probably a custom font, but find Modesto, which people say is reasonably close. I download a free version. I grab colors from the logo and labels for my palette (related resources: Coolors.co has a tool that grabs colors from photos and Canva has a pretty good article on brand colors; I went the easy route and used Excel’s eyedropper to pull colors from the pics I’d gathered). I’m set to begin rebranding. An hour or so later, here’s what I created:
The bottle label ended up being my primary resource in the redesign. I realized quickly that Modesto, a heavy serif font, would need to be used sparingly. I paired it with Century Gothic, which had a round look similar to some of the less prominent bottle label text. I played with a light blue background inspired by the label for the entire graph, but ended up giving up on that (too much blue, and when there was that much solid it actually didn’t feel on-brand) in favor of using it for the footer text only. I moved things around and played with line weights, styles, and borders. I noted the website and ads were quite minimalistic in use of written words, so I adjusted annotations to include less detail and pithier text. The logo star might be overkill (and potentially get into the area of trademark concerns, though I’m hoping since this is simply an illustrative exercise that it isn’t an issue), but I couldn’t help but have a little fun!
Check out the side-by-side—the same graph but VERY different look and feel:
Ready for your fun? You’re up next!
Rebrand this graph in the style of your choice (the link will download an Excel file with the data and graphs above; you are welcome to use any tool). Feel free to take liberties as you’d like with the specifics of the data for the purposes of the exercise. In your commentary, in addition to naming the brand, which can be anything (your company, a sports team, a university; use whatever you’d like as inspiration for your design!), please outline the steps you took to get familiar with it and how that played into your redesign. Share any specific resources used and learnings from your process, too!
DEADLINE: Monday, June 10th by midnight PST
SUBMISSION INSTRUCTIONS: upload your visual and related commentary at storytellingwithdata.com/SWDchallengeSUBMIT. (DO NOT EMAIL: we are no longer monitoring the old alias!) Feel free to also share on social media at any point using #SWDchallenge. For inclusion in the summary recap post, submissions must be officially submitted to us (this is still a time-intensive process and we aren’t able to scrape Twitter and other social media sites).
I’m excited to see what you come up with! I’m already imagining with glee the visual we’ll create with the many iterations of this graph, branded in different styles. Stay tuned for the recap post later this month, where we’ll share back with you all of the visuals created and shared via form as part of this challenge. Until then, check out the #SWDchallenge page for past challenge details and recaps. Have fun!
One of my goals with storytelling with data is that we continue to be accessible. The beauty of what can happen when we improve the way we communicate with data is positively endless. You’ll notice we continue to experiment with things like real-time captioning, embedded website transcription, podcast formats and this week, we’ll be trying something really new with a 1-hour (free!) live stream event that viewers can enjoy from anywhere in the world.
I also mentioned at the end of our latest podcast that we have (once again) expanded the SWD team (more on that to come) and I will travel to Europe in a few weeks to offer workshops in new cities. There is a lot going on and we will have even more exciting things to share with you soon.
And so, I want to do two things with this post. First, to make you aware of the opportunities that are available and coming soon. Second, to thank you for coming on this journey with us. We’re going to try things; some will be successful, others perhaps not so much, but we will learn and iterate. None of this works without your support, feedback and desire to continue your own growth and development.
To that end, be sure to check out these upcoming events:
The first ever SWD live event will take place this Thursday, May 30th at 10am EST and will be broadcast around the world. This will be an interactive event where together, we’ll learn data visualization through critique. The event is free but you must register to attend.
The upcoming European workshops in Dublin (June 18), Copenhagen (June 25) and Zurich (June 27) offer in-person opportunities to learn, practice, and network with others interested in enhancing their data storytelling skills. Space is limited and you can register here.
Thanks for your continued support for all we do here at SWD!
This month, we explored the concept of artisanal data: a dataset collected entirely on your own—electronically, manually, via surveys, or by observation. Guest author Mike Cisneros challenged us to analyze and find the conclusions that we can have full confidence in because we are the true stewards of the data.
Fifty-three readers submitted their bespoke visualizations. Accountability was a recurring theme: many commented that their intent was to hold themselves responsible for achieving a goal, or that this exercise prompted them to begin doing so. Not surprisingly, hot topics included exercise, weight loss, and spending. It was neat to see the range, both in approaches and tools, that readers applied when creating their visualizations, and we encourage you to scroll through the entire post to be inspired by how your peers collected, analyzed and were influenced by their unique datasets.
When reflecting on the submissions, Mike observed the thoughtful considerations participants applied:
“There are many lessons to be gleaned here; not only from the data you chose to collect and the way in which you presented it, but also from the issues and considerations raised in the process. In setting out the challenge to collect your own data this month, I had worried that it might be too burdensome for people; thankfully, dozens of you proved me wrong, and I thank all of you for your efforts.”
Scroll through the entire post to see further commentary from Mike with examples of these underlying themes: data can be humanizing (Andre and Alyssa), visual metaphors can be evocative (Hilje, Kate and Tiffany) and individual experiences can be universal (Julia and Penola).
Other standout entries include Colin and Lotte, who applied an effective takeaway title to their exercise trends, while Liz selected clean, uncluttered charts to highlight trends in her personal hobby, reading. Jared took a trip down memory lane with a cool visual timeline of his evolving professional development and soft skills while Lance employed annotations with categorization to visualize the length of names in his family. Several readers discovered something perhaps previously unknown: Haley realized the effect of Amazon Prime on her spending habits, Rebekah validated how she’s been occupying her time post-grad school, Tania discovered the impact of clicking on Gmail ads, and Pris evaluated consistency in implementing her New Year’s resolution.
As an added bonus, Cole highlighted the benefit of having a guest author this month: she was able to participate! Her submission provided a peek into the labor-intensive process of writing her second book (storytelling with data: let’s practice!), which will be published soon.
To everyone who submitted examples: THANK YOU for taking the time to create and share your work! The submissions are posted below in alphabetical order. If you tweeted or thought you submitted one but don't see it here, upload your submission as a .png here and we'll work to include any late entries this week (just a reminder that tweeting on its own isn't enough—we don't have time to scrape Twitter for entries.)
The next monthly challenge will launch on June 1st. Until then, check out the archives of previous months’ challenges on our #SWDchallenge page. Happy reading!
For this month's #SWDChallenge I have pulled together my Work Train travel for 2019 up to the end of April. I travel quite a bit for my role as Snr Data & Viz officer and we generally are required to travel for product development meetings, team planning meetings, current project face to face meetings and team working days. Our core team is disparately located around the UK, meaning meet ups are not always quick commutes. Interactive viz | Blog
Interactive visualization of organic home garden productivity over time, including soil amendment and environmental effects. Total production by growing season & vegetable category is on top. Selected by filter, the lower graph displays individual vegetable yields with environmental input data. Future year iterations with sufficient historical data may include a yield forecast. Interactive viz
Alyssa’s submission exemplifies how data can be humanizing. Mike notes “Alyssa’s submission hit on several key topics in data visualization: trust, both between collector and visualizer and trust in the accuracy of the observations; careful handling of personally-identifiable information (PII); the importance of not depicting subjective categories as having absolute values; and in being transparent about what data is not shown, as well as what is shown.”
I created this dataset from the narrative sleep log of an anonymous patient at the Counseling Center. They kept incredibly detailed notes, and any uncertainty is reflected in the design (missing data, not putting precise labels on hours slept, etc). I feel incredibly lucky to have been trusted with this dataset and hope that the final product reflects one part of the way PTSD impacted the patient's life. A note on the design: the amounts of sleep designated as incapacitated/impaired/operational/rested are intentionally not labelled for three reasons. First, everyone's sleep needs are different, so the number of hours of sleep isn't as informative as the impact that sleep has on a person's life. Second, I want to avoid pissing contests over sleep deprivation ("wow, you consider THAT impaired? I haven't slept more than that in fifteen years," etc). Third, the number of hours of sleep in the log was rounded to the half hour, and according to the patient may have been off by 15-20 minutes since it's difficult to know the precise moment they fell asleep. Therefore, I would prefer to give a vaguer impression (low end of operational, high end of impaired, etc) rather than facilitate falsely precise estimates of the number of hours the patient slept.
The humanizing factor carries over to Andre’s submission. Mike notes, “Andre turned the challenge into an opportunity to combine his interests with his young son’s interests, allowing them to work together on a visualization they could both enjoy. While we hope, at times, to make human connections through the outputs of our data visualization process, Andre showed us that we can also strengthen human connections through its creation.”
My 5 year old son, Túlio, absolutely loves pokemon cards. He doesn't yet know all the rules but is fascinated by the cards' types, numbers, colors and powers. Since he spends A LOT OF TIME looking at and analyzing each one of them, I thought it would be interesting to build a visualization showing all of his cards. After cataloging all 153 of them, we made a histogram and a bar plot with pencil and paper. Then I thought it would be a good idea to show the correlation between the cards' "attack damage" and "health points". I also used color to distinguish the cards' "evolution stages". Túlio is really getting into data visualization and I am really getting into pokemon cards!
This entry has the most personal data: my own blood pressure and heart rate. The motivation was to obtain a view over time of these data so that I could track trends and also to share it with my physician. The data was collected from the OMRON BP786N blood pressure monitor, and then recorded in a simple CSV file containing date, systolic pressure, diastolic pressure and heart rate. A script called "bpadd" records the data and then calls a decksh script to visualize and show the data.
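A log like the one described here takes only a few lines of code to maintain. The sketch below is not the author's "bpadd" script; it is a hypothetical Python illustration, and the column names and the `append_reading` and `read_readings` helpers are assumptions made for the example.

```python
import csv
from datetime import date

# Assumed column order, following the description above:
# date, systolic pressure, diastolic pressure, heart rate.
FIELDS = ["date", "systolic", "diastolic", "heart_rate"]


def append_reading(stream, day, systolic, diastolic, heart_rate):
    """Write one reading as a CSV row to an open text stream."""
    writer = csv.writer(stream, lineterminator="\n")
    writer.writerow([day.isoformat(), systolic, diastolic, heart_rate])


def read_readings(stream):
    """Read every row back as a dict with a real date and integer values."""
    reader = csv.DictReader(stream, fieldnames=FIELDS)
    return [
        {
            "date": date.fromisoformat(row["date"]),
            "systolic": int(row["systolic"]),
            "diastolic": int(row["diastolic"]),
            "heart_rate": int(row["heart_rate"]),
        }
        for row in reader
    ]
```

Opening the file in append mode (`open(path, "a", newline="")`) and passing it to `append_reading` keeps the log growing one row per measurement, which is the shape most charting tools expect for a trend over time.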
Starting in March 2019, I really got into data visualization (viz) and started practicing with data viz community projects, like the #SWDchallenge. Practice fosters improvement and learning! This month’s #SWDchallenge involved collecting our own data, and I have been keeping track of my weight. Here, we look at my 2019 weight with a focus on the time (before and) after I really got into data viz. It looks like some of my weight loss is correlated with my newfound data viz enthusiasm! LinkedIn
My family participates in an annual Thanksgiving Day 5K run, the Trinity Turkey Trot in Princeton NJ. I have an informal approach to getting in shape for this and other 5Ks, with mixed results. Peter Drucker says that “you can't manage what you can't measure.” With this adage in mind, I started tracking my workouts in late 2015. The free MapMyRun web site, sponsored by Under Armour, provides the tools I need. I draw each course I run on a road map of the area and store the course for future reference and comparison. By course and date, I record information on individual workouts: length, time, and number of steps. I also track time spent doing other types of exercise. MapMyRun calculates Average Pace per Mile. I was able to download my MapMyRun workout records to a .csv file. I then imported it into Excel, where my first assumption, “the data is clean”, was dashed. Date of workout was either in DD-MMM-YY format or a string “MMM. DD, YYYY”. Rather than exercising Excel functions or manually correcting rows, I imported the dataset to Tableau Public, which decoded the formatting, and set me up to use Tableau Public to build the chart. I kept Workout Types that were relevant to the analysis of my running, and discarded others. Average Pace per Mile values were generally reasonable. I discarded rows with zero or extremely large numbers. I anticipated that my running pace leading up to a race would impact my 5K race time. After looking at the data, I think that the number of workouts in the weeks before a race is also important. The chart itself took several iterations to get to where it adequately showed what I envisioned. Not perfect, but close enough.
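Mixed date formats like the two mentioned in this submission can also be reconciled in code. Here is a minimal Python sketch, assuming the values look like `05-Nov-15` and `Nov. 05, 2015`; the sample formats and the `parse_workout_date` helper are illustrative assumptions, not taken from the actual export.

```python
from datetime import date, datetime

# Assumed formats, based on the description above:
# "DD-MMM-YY" (e.g. 05-Nov-15) and "MMM. DD, YYYY" (e.g. Nov. 05, 2015).
KNOWN_FORMATS = ("%d-%b-%y", "%b. %d, %Y")


def parse_workout_date(text: str) -> date:
    """Try each known format in turn and return the first successful parse."""
    cleaned = text.strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date()
        except ValueError:
            continue  # not this format, try the next one
    raise ValueError(f"Unrecognized date format: {text!r}")
```

Raising an error for anything unrecognized, rather than guessing, makes surprises visible early. Note that `%b` matches month abbreviations for the current locale, so this sketch assumes English abbreviations such as "Nov".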
This submission tracked all the expenses that I had for my cars from when I purchased my first car in July 2012 up to April 2019. This was the 3rd of 4 visualizations; starting with total overall expenses, then stripping out major expenses before getting to minor expense breakdown. It was eye opening to see how much I used to spend on gasoline and on maintenance!
If you are often tired and then track your sleep but don't change any of your sleep habits, then mostly what you'll learn is that you aren't getting enough sleep (which you already knew because you are tired). I did find out that I tend to get a little more sleep on Thursday and Friday evenings, when I would have expected to get the most on the weekends. The missing days are when I went to bed but didn't put my fitness tracker on. (I only wear it at night to track my sleep.) I used the FitBit API and an R script to get the raw data. Tableau was what I used to create the visualization. You can hover over each day and see the waterfall of that night's sleep. Interactive viz
I wanted to evaluate my grocery expenses before and after signing up for the $5 Meal Plan service. It also occurred to me that other food expenses such as restaurants, fast food, coffee shops, etc. should be included since groceries could be substituted for eating out and vice versa. I pulled my dataset from Mint.com, after re-categorizing many transactions, and used Power BI to create the visualization.
Fans of Netflix’s Tidying Up with Marie Kondo have been inspired by guru Kondo’s Japanese-based method of clearing out the clutter in their homes. The benefits are huge. Devotees report living more peacefully and co-existing better with their partners. The key element? Actively working to identify and eliminate anything that doesn’t “spark joy.”
We can apply this same thought process to our data visualizations.
When it comes to clutter in our visuals, we challenge you to regularly examine what specific elements aren’t adding information. What’s making it harder for our audience to get at the data? When we identify and remove clutter from our visuals, the data stands out more.
We’ve discussed this topic frequently. In this video, Cole provides five tips for how to avoid clutter in visuals; the SWD book and workshops each have an entire section focused on decluttering. We don’t intend to create cluttered visuals; rather, they often materialize when we don’t take a step back and question our tools’ default settings. Today’s post illustrates one such example and the benefit we can reap from decluttering.
I recently encountered a visualization similar to the following graph. This shows the percentage of babies born within a 24-hour period, broken down by day of the week (having welcomed a baby several months ago, all things maternity still linger in my various news feeds). I recognize this graph: it’s what happens when I put data into Excel and create a stacked bar chart with default settings.
This caught my eye not because of the topic but because of how much time it took me to figure out what information it was trying to convey. What should I do with this? There’s a lot competing for my attention in this chart and distracting me from the data.
Spend a moment examining this graph and take note of which specific elements are challenging. Make a list: what might we eliminate or change to reduce cognitive burden?
I came up with eight specific design changes I would make. How does my list compare with yours?
Remove the chart border as it isn’t adding informative value. Often, we use a border to differentiate parts of our slide/visual. In most cases, we can better set them apart with white space.
Delete the gridlines. Will the audience be physically dragging their fingers across the y-axis to identify an exact value? If that level of specificity is important, label the data point(s) directly.
Be sparing in use of data labels. Use them in cases where the exact values are important to the audience. Otherwise, remove and use the axis instead.
Thicken the bars. While there are no hard and fast rules, the bars should be wider than the white space between them so we can more easily compare. In this case, the superfluous white space can be reduced.
Title the axes appropriately. Exceptions are rare for omitting an axis or chart title. Don’t make the audience do work to figure out what they’re looking at, and instead make a habit of titling appropriately to enable the audience’s understanding before they get to the data. Let’s take two related steps here:
Use a more descriptive y-axis title: Instead of the vague %, we can eliminate the guesswork and be more specific: % of total births. While we’re at it, let’s drop the unnecessary trailing zeroes from our y-axis labels.
Clean up x-axis: Diagonally rotated text is slower to read. We can abbreviate the days of the week so they render horizontally. A super-category (such as Weekday or Weekend) could also simplify the process of taking in the information.
Move the legend directly next to the data it describes. This alleviates the work of referring back and forth between the legend and the data.
Use color sparingly. There are so many colors in this graph that our attention is scattered and it’s hard to focus on any one thing. Depending on what we want our audience to take from the graph, we can use color more effectively to focus attention on those pieces only.
Add a takeaway title. Don’t assume that two different people looking at this same graph will walk away with the same conclusion. If there is a conclusion the audience should reach, we should state it in words with an effective takeaway title.
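Two of the smaller steps above, dropping trailing zeroes from axis labels and abbreviating the days of the week, are easy to script when a tool's defaults get in the way. Here is a minimal Python sketch, assuming the labels arrive as plain numbers and full English day names; the function names are hypothetical and not part of any charting library.

```python
def format_percent_label(value: float) -> str:
    """Format an axis value as a percentage without trailing zeroes."""
    # The "g" format drops insignificant trailing zeroes and a dangling
    # decimal point, which is fine for typical percentage values (0 to 100).
    return f"{value:g}%"


def abbreviate_day(day_name: str, length: int = 3) -> str:
    """Shorten a day name so x-axis labels can render horizontally."""
    return day_name.strip()[:length].title()


def day_group(day_name: str) -> str:
    """Assign a day to a Weekday or Weekend super-category."""
    return "Weekend" if abbreviate_day(day_name) in {"Sat", "Sun"} else "Weekday"
```

Most charting tools accept custom tick labels, so helpers like these can be applied to the labels before plotting instead of editing each one by hand.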
Each step seems relatively minor on its own, but check out the impact when I apply all eight steps simultaneously:
Now we can more easily see that babies delivered on a weekend are more likely to arrive during the early hours of the day (midnight - 6am), compared to weekday deliveries. Related note: this dataset didn’t include the absolute number of babies born each day. Ideally, we’d want that information for context, but for the purposes of this illustrative example, we’ll assume the numbers are large enough to accurately compare across days of the week.
By reducing clutter, the audience can use their precious brainpower to decide what potential actions might be warranted, rather than trying to figure out how to read the graph. Taking time to modify the default settings means we can focus on the data and the message.
In my case, I might have wanted to get some extra rest on the weekends as my due date approached! As it turned out, baby Henry arrived safe and sound among the 17% of Thursday babies born in the 12am-5:59am window.
Elizabeth Ricks is a Data Visualization Designer on the Storytelling with Data team. She has a passion for helping her audience understand the ’so-what?’ as concisely as possible. Connect with Elizabeth on LinkedIn or Twitter.
This month's #SWDchallenge comes from guest author Mike Cisneros. Mike is an active participant in the online data visualization community, a member of the Data Visualization Society, and is in his second year as part of the Tableau Zen Master program. Recently, we engaged in a discussion about where the chart designer's responsibilities begin and end, in terms of validating the data being graphically depicted. It led down some interesting paths about ethics, truth, and ownership, which in turn motivated him to submit this challenge to the SWD community. To see some of Mike's more creative work, check out his Tableau Public profile or his Instagram feed, or connect with him on LinkedIn or Twitter. Thanks, Mike, for this galvanizing challenge!
If you work in data visualization, then it’s a safe bet that most of the data you are asked to visualize comes from other people. That is to say, you are not the owner of that data—it belongs to your customer, or comes from a vendor, or it’s open source.
And because of this, we’ve all probably had the experience of looking at a new dataset for the first time, and scratching our heads as we try to make sense of what we’re seeing. We come to expect that. It’s normal to expect a bit of idiosyncrasy and messiness from any data.
We’ve also learned to take certain datasets with a grain of salt, based on our first impressions of them. When you open up a file to find what was clearly manually recorded data (survey data, free-form text data, things of that nature), your experience and your instinct tell you that you’re not getting perfect information, and you likely treat it accordingly. Which is to say, you are careful with the conclusions you draw and the analyses you perform, because you know that you might be missing critical context or observations. Those missing pieces could be essential to getting at the truth of the situation.
Sometimes you find yourself working on more objective-looking data: maybe it’s auto-generated (like log files), or electronic records (purchases, transactions, other financial data), or maybe some kind of sensor-recorded data (from weather stations or other environmental monitors).
Even though it looks objective, it’s still not perfect information; there’s always the chance for recording error, for anomalies in the systems, or other confounding factors. Our mental alarm bells don’t always ring quite as loudly and insistently with this data, but that’s why it behooves us to listen closely, just in case.
And even if we get an absolutely PERFECT dataset—clean, complete, validated, the works—we STILL can’t be 100% assured in our analyses. Because we don’t know what choices the dataset’s owner made before we saw that data. For example:
We can see what measures are IN the data, but we don’t know what was chosen to be excluded, or why.
We don’t know why the sensors began recording when they did, and why they stopped when they stopped.
We don’t know who made any of these editorial decisions surrounding that data.
We don’t know why the data was being collected in the first place.
When, then, can you, as the data visualizer, be perfectly confident in EVERY SINGLE FACET of a dataset?
Only when this is true: when YOU are the collector.
If you collected the dataset, and you made those choices, and you cleaned that data, and you know every reason behind every decision, then you are PERFECTLY positioned to analyze THAT data with full confidence. The analyses you perform, the conclusions that you draw—you’ll know just how far you can take them, and where you begin to overstep the bounds of what analyses the data supports. You will know the context in which your analyses can safely and justifiably be performed.
So, for this month’s #SWDchallenge, we’re not focusing on a specific chart type, but rather on a data type: bespoke data. A dataset that you collected yourself—electronically, manually, via surveys, by observation…whatever you like, as long as it is yours and yours alone.
Now, it doesn’t have to be about you, although it certainly can be. That’s the choice I made for my own entry this month. I decided to look back at my own history of purchasing books on Amazon, and to see how my purchasing habits changed once I bought my first Kindle. I expected to see my “books purchased” go down in roughly the same volume as my “ebooks purchased” number increased.
This was not what I found.
First of all, I made an invalid assumption: I assumed my ebook purchases would begin after my first Kindle purchase, which was in 2012. In fact, I started buying ebooks in 2011, when I first downloaded the Kindle app to my tablet.
And second of all, my purchases of physical books didn’t decrease in line with the increase in ebook purchases. In fact, while my Kindle book purchases went up WAY past any prior annual purchase of physical books, my paper-and-glue book purchasing stayed the same, and even increased in some measure.
Because I knew this data so well, I could find some other interesting tidbits.
I found that I bought three Edward Tufte books back in 2005, so I can pinpoint that year as the time I first thought of data visualization as an actual, distinct discipline that I could study and focus on as a career.
I found that 2 of the 3 books I bought in 2007 were the same book, on the same day. (Can you imagine what book it was, and why?)
And, I found that since 2015, there’s been a bright line dividing the different genres of book I choose to buy in physical copies and which I buy electronically.
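For anyone who wants to try a similar comparison on their own purchase history, here is a minimal sketch in Python. It assumes you have logged each purchase as a (year, format) pair. The rows below are made up for illustration and are not the actual data behind the findings above, and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical hand-collected purchase log: one (year, format) row per book.
# These rows are invented for illustration only.
purchases = [
    (2010, "physical"), (2010, "physical"),
    (2011, "physical"), (2011, "ebook"),
    (2012, "physical"), (2012, "physical"),
    (2012, "ebook"), (2012, "ebook"), (2012, "ebook"),
]


def count_by_year_and_format(rows):
    """Return {year: {format: count}} for a list of (year, format) rows."""
    table = {}
    for (year, fmt), n in Counter(rows).items():
        # Start every year with both formats at zero so gaps show up as 0.
        table.setdefault(year, {"physical": 0, "ebook": 0})[fmt] = n
    return table


def first_year_of(rows, fmt):
    """Earliest year in which a given format appears, or None if it never does."""
    years = [year for year, f in rows if f == fmt]
    return min(years) if years else None


table = count_by_year_and_format(purchases)
for year in sorted(table):
    print(year, table[year])
print("first ebook year:", first_year_of(purchases, "ebook"))
```

Checking the first year a format appears is a quick way to test an assumption like "my ebook purchases began when I bought my Kindle" before building a chart on top of it.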
What will you create, and what conclusions will you be able to state with confidence, when you’ve collected your very own, hand-crafted, artisanal dataset? Let’s find out.
Go out and collect a dataset of your own, analyze it and create a graph visualizing your findings. (Remember: sometimes the smallest, most specific stories can tell the most universal truths.)
We look forward to seeing what data you collect and visualize! Stay tuned for the recap post later this month, where we’ll share back with you all of the visuals created and shared via form as part of this challenge. Until then, check out the #SWDchallenge page for past challenge details and recaps.
This month, we pulled inspiration from Austin Kleon’s book Steal Like An Artist and challenged you to emulate an inspiring visual of your choosing. Fifty-four people rose to the occasion! It was neat to see a common theme of “honorably copying” applied insightfully and many readers elaborated on the honorable piece with detailed insight into their thought processes of color choice, annotations and chart type.
Popular sources of inspiration were Charles Joseph Minard (including RJ’s internal migration patterns and Frans and Lisa’s views of Napoleon's Russian campaign), FiveThirtyEight (Seonaid used Python to recreate non-fatal gunshot injuries, Liz added context to English Premier League teams’ makeup and Hamza chose a trend of The Daily Show guests over time) and even storytelling with data! Rolando and Ash chose visuals from the book and added their own flavor with additional context. Nadieh Bremer’s work was emulated multiple times: Moyocoyani used an approach originally depicting homeless relocations to visualize NBA player distributions, Kate recreated the Baby Spike viz, adding some amusing annotations (the original was created together with Zan Armstrong) and Jonas utilized a similar view for cycling data. Jared beautifully emulated David McCandless with a visual on lifespan influences and Pris prepared a nice boxplot of Trump’s approval rating inspired by Geoffrey Skelley.
To everyone who submitted examples: THANK YOU for taking the time to create and share your work! The submissions are posted below in reverse alphabetical order by first name (+ last initial when needed; we omitted full last names in respect of those who would rather remain anonymous). If you tweeted or thought you submitted one but don't see it here, email your submission (including your graph attached as .png) to email@example.com and we'll work to include any late entries this week (just a reminder that tweeting on its own isn't enough—we unfortunately don't have time to scrape Twitter for entries, so emailing is the sure way to get your creations included).
We encourage you to scroll through the entire post to be inspired by how others in the community approached this challenge (please click “like” when you get to the bottom—this helps us assess the time spent pulling together this recap summary).
Stay tuned for next month’s challenge which launches on May 1st. Until then, happy reading and check out the archives of previous months’ challenges on our #SWDchallenge page.
I have combined the revenue generated by airports in Pakistan with the elevation in feet at which their airplanes fly.
This chart is from the New Jersey page of the United Way 2018 report. I used Excel to prepare the emulated report, and learned some new techniques. Although the numbers are the same, I added color and moved the color legend above the chart for readability.
First, I tried to copy the same visualizations, and added some changes to further improve it. I inverted the charts for profit and sales. I used a line chart to represent avg yearly profit and a bar graph to represent avg yearly sale. Next, I shaded the area below the line chart, which makes it easier to grasp the profit percentage. The more the shaded area, the greater the profit. Also, the bars above the shaded area give an idea of the cost of products. Hence, the entire bar represents avg sale; shaded bar represents profit and non-shaded bar gives the estimation of the cost. Furthermore, to narrow down and refine the grain, I’ve also added a map that shows countries in a particular region. Users can further click on one of the blue circles on the map to see country-wise sales and profits. Also, to give a better idea of which product performed better, I’ve displayed the total sale price for each product.
Took inspiration from: www.twooctobers.com housing visualisation. Tools: PowerBI and Microsoft Excel. Dataset: Cricket Rankings ODI from kaggle.com. Idea: As the World Cup 2019 draws near, we ponder the ever-so-elusive question: what constitutes a good batsman? This is a comparative analysis of some of the top performers in cricket and their performance traits.
For the "Emulate" challenge, I also chose to reproduce a recent FiveThirtyEight graph, this one looking at the uncertainty in the CDC's estimates for non-fatal gunshot injuries. The uncertainty has been increasing over the last few years, and "why?" is an outstanding question. I found the emphasis on the uncertainty in the data (rather than the estimates themselves) interesting, and discovered that the design paralleled that emphasis in its choices of colour and line weight. I used Python for my visualization, using the original data which is available from the CDC website.
I emulated a NASA graph of temperature anomalies over land and ocean. I used Power BI to build the dashboard. Interactive viz | Twitter
I have tried to emulate a global superstore dashboard using Power BI.
I have always loved Kirk Goldsberry's visualizations on basketball, particularly his shot charts, so I attempted to recreate one of his designs. I chose this type of shot chart because of the neon look and because the black background makes the colors stand out well. It wasn't particularly hard to recreate in Tableau; however, the finer details, such as the halos around the marks, were more difficult to get exactly right. Blog | Twitter | Tableau Public
My inspiration is Cole Knaflic's chart from her book. I used Excel for recreating the DataViz. One of the key differences is the addition of a 3rd series, which is the 'Backlog' bar chart. I decided to add this so that the audience can quickly see the increase in backlog, without having to calculate in their minds how large the backlogs were.
My favorite emulation is a piece for SIGNIFICANCE magazine that I worked on with Howard Wainer. It emulates the style of Charles Joseph Minard generally, and particularly his series on the transport of mineral fuels. Blog | Twitter: @infowetrust
In response to the April SWD challenge, I took inspiration from Geoffrey Skelley's Trump Approval Rating chart and replicated it on Tableau Desktop using the DESI dataset used in an earlier MakeoverMonday challenge. I chose the chart because I loved how it simplified a box and whiskers chart to make it easy to understand for everyday audiences.
Tool: Tableau Public Inspiration: Probability you will live x additional years from flowingdata.com Nathan Yau. The original visualization simulated random death events and real time histogram accumulation. Animation capability out of my reach using Tableau. What I liked about the inspiration was the whimsy & simulation.
Found the original visual on PowerBI community, thought I could try it out in my own way, and so I did.
I have recreated the football player attributes visualization from the game FIFA 18. In the image attached, on the left is my visualization and on the right is the source. I used the Microsoft Power BI tool to achieve this task. The source data was provided by Tableau Public sample data. The overall attributes were missing so I devised my own formula to calculate those statistics. This is the reason for some inconsistencies with the source image.
The plot was created with Python, specifically the pandas, numpy and holoviews libraries with the matplotlib backend. Inkscape was used for the final edits. The dataset was not the same one the original authors used; instead I used the "NBA Players by season" dataset. I decided to choose this work because I am a big fan of Nadieh Bremer's work; she's an inspiration. You can follow me on Twitter @MoiYo
For the SWD Challenge, I worked on Houston Crime Data. The city of Houston provides high level details on crime statistics via Excel files on the police website. These statistics are stored in monthly files available via Excel or Access. The data set was used along with PowerBI as the visualization tool, in order to better understand the data. Source | Twitter
I have emulated a Tableau dashboard's interactivity in a Power BI dashboard. Along with the emulation, I've tried to improve the dashboard by adding some additional features to enhance its effectiveness. It was a great learning process!
For the emulate challenge, I am sharing a dashboard I created based on Adam McCann's Dynamic KPI Metric Design. I loved the simple yet effective KPI card design, so I decided to apply it to a viz of my own with separate data. I was curious to see if Fortune Magazine's stock picks were actually effective - using Tableau + Google Sheets/Finance I created a tracker to see how each pick is performing relative to the stock price mentioned in the original article. An added bonus was the viz was selected as a Viz of the Day on Tableau Public.
I’ve had a number of career conversations lately. People reach out, wanting to pick my brain on various topics: they are interested in “getting into” data visualization or want to write a book. These chats tend to prompt some self-reflection. What advice would I have given myself ten years ago? What would I do differently today?
I did not set out with a master plan to build a business. That is what has happened, though, as I’ve followed my passion, focused on doing good work, capitalized on opportunities, and tried to share as much as I can with others. I articulated my mission when I first started blogging back in 2010: to rid the world of ineffective graphs. This was the point where a class I was teaching at Google sparked interest for a conference talk, which led to workshops, where I somehow worked my way into the dream job of helping organizations around the world achieve their goals by improving the way they communicate with graphs and presentations. (For more on the evolution of storytelling with data, check out Episode 3 of the SWD podcast, “How I’m Building This.”)
At one point, I had to make the decision: am I good with things as they are, or do I want to scale? In business school, when we talked about scaling, it was mainly conversations around diversifying or acquiring or generally doing things to make more money. But that’s never been my goal. The reason to scale—from my perspective in this space—is to reach more people. I am a strong believer that there is a huge amount of value out there to be obtained by work that is already being done that simply isn’t being communicated as well as it could be. And I think I—we—can change that.
In the past few years, I’ve scaled in a couple of ways. First, through a book. In 2015, storytelling with data: a data visualization guide for business professionals was published. In 2017, Elizabeth and I found each other; she shares my love of data-driven communication and is doing an awesome job bringing related strategies to organizations large and small. Randy and Jody do a ton behind the scenes to ensure we are thinking big and the business runs smoothly and efficiently. There are also a number of other lovely people who help support us in different ways. We have officially become a team! Our overarching goal today remains the same—beyond banning bad graphs, we aim to help others be better data storytellers and drive real change through data. We are fortunate that this aligns with something that individuals and organizations recognize as important and want to develop.
Still, there is so much more to do. In many ways, we’ve only scratched the surface. As a result, we’ve been evaluating additional means to scale. In the past year, I have been working diligently on my second book (it’s nearly done!) as well as new projects for you (more on this soon!). But we’ve been running at capacity. We can reach more people if we search for others who share our passion. I am excited to announce that SWD is hiring new team members to help with this goal.
So, what would I do differently today? Looking back, there have been some difficult points (here’s one, and another, more recently this). But I am very happy with where things are at currently and extremely excited about the future. Though things have been challenging at times, those bumps have, in part, led us to where we are today and I consider myself very fortunate to be able to do what I do and call it work. Because of that, I wouldn’t change a thing. What advice would I give myself ten years ago: look forward to April 2019, when you’ll kick off the process to expand your team and spread the lessons for effective data storytelling to even more people!
Visit our careers page to learn more about our current openings.
Think back to your imaginative play as a child. I’ll bet that you spent time dressing up and pretending to be someone else—a superhero, a firefighter, fictional character or maybe even a scary dinosaur! At our most basic level, we learn by copying others. This idea of emulating your sources of inspiration is the basis of this month’s #SWDchallenge.
When learning data visualization, we consider imitation to be a good thing. Practicing by copying others can help you further develop your skills. In Episode 14 of the SWD podcast, twelve dataviz experts shared how they honed their data storytelling skills and several cited Steal Like An Artist as influencing their creative processes. In this book, Kleon writes that no one is completely original; rather, we are a blend of our sources of inspiration plus our own style. He gives many examples of artists who developed their style by studying and emulating the masters in their field. Even Paul McCartney employed this principle, giving credit to Buddy Holly, Little Richard, Jerry Lee Lewis and Elvis as heavily influencing the Beatles’ evolution from cover band to the greatest rock group in history. Kleon’s principle is that by honorably copying others, we take on their style and refine our own at the same time.
Note that this is distinctly different from plagiarizing. Kleon illustrates via example:
“We're talking about practice here, not plagiarism—plagiarism is trying to pass someone else's work off as your own. Copying is about reverse-engineering. It's like a mechanic taking apart a car to see how it works.”
Let me share an example illustrating how we can honorably copy in data visualization. I’m a fan of Nate Silver’s work. He’s the founder and Editor-in-Chief of FiveThirtyEight and has written an excellent book The Signal and the Noise on the value of thinking probabilistically. I read FiveThirtyEight for my daily dose of current events and to get inspiration for application of good data visualization principles.
I set out to emulate the chart below. This visual was referenced in this article, which examines the optimal intensity of exercise to reduce your chance of death.
There are many things I like about this chart. The title sets my expectations for what I’ll see in the graph (the data suggests an inconclusive answer, which is why it’s posed as a question). I also like the choice of a connected scatterplot, which allows me to examine the relationship between the independent variable (intensity of exercise or MET hours per day) on the x-axis and the dependent variable (hazard rate of dying) on the y-axis.
In my process of emulating, my first step was to copy the visual as close as possible to the original. Even though I made some changes in my final version (I’ll address those momentarily), I was surprised how much value I found in copying theirs exactly as is! I learned a few things in my tool (Excel) that I hadn’t tried before (namely, changing the gridline interval and axis ranges of the connected scatterplot).
For reference, here’s my first iteration in Excel. I’m visually estimating the numbers, so they won’t exactly match the original.
Step 1: I learned something new in my tool by copying the FiveThirtyEight original as closely as possible.
In moving from copying to emulating, I wanted to blend in my own style with a few design changes. Mostly, I wanted a visual that could stand on its own if it were removed from the article, which provided the relevant context that you’d need to interpret the data. I changed the subtitle to include the main takeaway and to elaborate on the question posed in the chart title. I added annotations about the two studies next to the data they describe. Finally, I wanted to emphasize Study 2, which backs up the main takeaway. I used the same color (purple) to create a visual connection between the subtitle and the line depicting Study 2.
Below is my final “honorable copy.” I’ve emulated, using the original FiveThirtyEight graph as a source of inspiration and adding my own style.
My final emulated copy (note: data visually estimated from FiveThirtyEight original).
Now it’s your turn.
My challenge to you: find a visual you like and emulate it. This can be any visual from any source (please give credit to the source). We encourage you to peruse examples from the media, blogs, or dataviz experts to find sources of inspiration (over time, start a visual library that you can refer to and be inspired!). Don’t be hindered if you can’t find the underlying data—it’s fine to visually estimate for the purpose of this exercise (note clearly if you’ve done so).
DEADLINE: Wednesday, April 10th by midnight PST. We are back to normal protocol this month: submissions must be emailed to us for inclusion in our summary recap post—full details below.
Make it. Create your emulated visual with the tool of your choice. If you need help finding data, check out this list of publicly available data sources. You're also welcome to use a real work example if you'd like, just please don't share anything confidential.
Share it. Email your entry to SWDchallenge@storytellingwithdata.com by the deadline. Attach an image of your submission as a .PNG. Feel free to include both the original and your honorable copy but please send as a single image. Put any commentary you’d like included in the follow up post in the body of the email (e.g. what tool you used, any notes on your methods or thought process you’d like to share); if there’s a social media profile or blog/site you’d like mentioned, please embed the links directly in your commentary (e.g. Blog | Twitter). If you’re going to write more than a paragraph or so, we encourage you to post it externally and provide a link or summary for inclusion. Feel free to also share on social media at any point using #SWDchallenge but for inclusion in the summary recap post, submissions must be emailed to us—we don’t have the time to scrape Twitter and other social media sites.
The fine print. We reserve the right to post and potentially reuse examples shared.
We look forward to seeing how you’re influenced by your sources of inspiration! Stay tuned for the recap post later this month, where we’ll share back with you all of the visuals created and shared as part of this challenge. Until then, check out the #SWDchallenge page for past challenge details and recaps.
The March #SWDchallenge took on a slightly different flavor. Rather than summoning you to try a certain graph type or approach, this month’s goal was effectiveness. We sought to see how people varied in answering specific questions about the same dataset. Data was sourced from AidData in partnership with Enrico Bertini, Associate Professor at NYU, who will be undertaking some data visualization research based on this challenge.
Sixty-seven people submitted a response to answer the primary question posed (“Who donates?”) and related sub-questions on the interesting patterns in the distribution across countries and recipients. With so many ways to visualize the same dataset, you’ll see evidence that there isn’t a single “right” answer when it comes to how we show and communicate with data. Data can be visualized in countless different ways and by varying views of the same data, we enable our audience to see different things.
To everyone who submitted examples: THANK YOU for taking the time to create and share your work! We aren’t going to call out specific entries this month, so as not to introduce bias. By participating in this month’s challenge, you’ve helped Enrico push forward some important research. We’ll be sure to share more on that front once it’s available. The submissions are posted below in alphabetical order and include the link to the original Tweet or interactive visual.
We encourage you to scroll through the entire post and be inspired by your peers’ approaches to this challenge! Spoiler alert: inspiration will be a central theme in the next challenge, which will be announced on April 1. Until then, check out the #SWDchallenge page for the archives of previous months' challenges and submissions. Happy browsing!
My approach to this challenge was to answer the question based on 2013 data, looking at who the top 10 donors are, who they have made donations to and for what causes. One thing that I have done differently was to group the purposes into broad categories to give us a rough idea of what efforts the top 10 donors are focused on contributing to. (I am looking to write a Medium post about my approach soon.)
For this month’s challenge I decided to use Power BI as an opportunity to get more familiar with DAX. I focused on all donations by The Netherlands to other countries. Because there are a lot of recipients and a lot of purposes in this dataset, I decided to show only the top 5 recipients and purposes. But because this shows only a part of the picture, I also wanted to visualize the share of the total amount that this top 5 represents. I did this with the stacked bar chart and the accompanying text. The interactive visual can be seen here.
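The "top 5 plus share of the total" idea in the entry above can be roughed out outside of Power BI as well. The following is only a sketch in Python, not the DAX used in the submission. The rows, recipient labels, and the `top_n_share` function are hypothetical, and the real AidData columns and values will differ.

```python
from collections import defaultdict

# Hypothetical (donor, recipient, amount) rows, invented for illustration.
donations = [
    ("Netherlands", "A", 40.0),
    ("Netherlands", "B", 30.0),
    ("Netherlands", "A", 10.0),
    ("Netherlands", "C", 10.0),
    ("Netherlands", "D", 5.0),
    ("Netherlands", "E", 3.0),
    ("Netherlands", "F", 2.0),
    ("Other donor", "A", 100.0),
]


def top_n_share(rows, donor, n=5):
    """Return (top-n recipients with totals, their share of the donor's total)."""
    totals = defaultdict(float)
    for d, recipient, amount in rows:
        if d == donor:
            totals[recipient] += amount
    # Rank recipients by total amount received, largest first.
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    top = ranked[:n]
    grand_total = sum(totals.values())
    share = sum(a for _, a in top) / grand_total if grand_total else 0.0
    return top, share


top5, share = top_n_share(donations, "Netherlands", n=5)
print(top5)
print(f"Top 5 share of total: {share:.0%}")
```

Reporting the share alongside the top-N list tells the audience how much of the picture the truncated view covers.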