Author’s note: Thank you to Naomi Robbins, Eva Murray, Chris Love, Luke Stanke, & Steve Wexler for reviewing and providing helpful input to this blog post. That doesn’t necessarily mean they agree with everything in it. Just that they shared thoughts that helped shape what is shared here.
Tl;dr – If someone hasn’t asked you for feedback about something they’ve made, reach out in private first and ask them if they’re interested in hearing your thoughts. If they say ‘yes’, ask whether they’d prefer to hear your thoughts in public or in private. Then proceed with tact. If they say ‘no’, just keep it to yourself and move on. The world will keep turning.
I’ve been involved with the broader data visualization community for almost a decade now, and I feel that it has gotten stronger over the years. A few times every year a contentious debate surfaces. Often it revolves around a particular visualization or a technique that someone has put out there. That’s okay, and we need to have healthy debates and be able to disagree.
But it’s also important to be civil and considerate of one another. That doesn’t always happen in other online communities, and I think a big part of the reason why things get ugly is that people don’t go about giving constructive criticism with care. They just sit behind their keyboards and fire off their thoughts before taking time to think about how it will be received. I know I’ve been guilty of that on more than one occasion.
I have some thoughts on this, thoughts that have shifted recently, and I’d like to pose a question that I’ve seen discussed recently related to the topic of giving critique:
When, if ever, is it okay to give unsolicited, constructive feedback in public to someone who shares something they’ve made broadly online?
Before reading any further, think about how you’d answer this question. Try to put yourself in the shoes of both the giver and the receiver of feedback.
When Feedback Lands Well…
Now, I’d like to share a time this was modeled well for me.
About seven years ago, when I was just getting started in the data visualization community, I created a dashboard in Excel for a contest. It was an Excel contest run by Chandoo about visualizing salary data. I wanted to challenge myself to see if I could build something similar to a Tableau dashboard in Excel. It took me a long time to build my final submission, and it was like swimming in molasses for me. People like Chandoo or my friend Jorge Camoes could crank out something like this – and ten times better – in a fraction of the time it took me. Interactivity in Excel just wasn’t (still isn’t) my strong suit.
Here’s what I built:
My Excel salary survey dashboard from 2012
After blogging about my submission, I got an email in my inbox. It wasn’t from just anyone. It was from Naomi Robbins, author of ‘Creating More Effective Graphs’, a book I had read and loved. She has a Ph.D. in mathematical statistics from Columbia University. She has served as chair of the Statistical Graphics Section of the American Statistical Association (ASA).
So to me, this was Naomi “effing” Robbins – a person I had never met, but someone who I admired greatly from afar. Just seeing her name in the From field was a thrill to me. It was exciting that someone of her experience and stature would be reaching out to me. That alone totally made my day.
She had seen my contest submission and blog post, and she was reaching out to give me feedback. Here’s the essence of her feedback to me:
She started the email by saying that she was reaching out privately because she didn’t want to affect the judging process of the contest, which at that point was still ongoing. A highly considerate start…
She then mentioned that she had no issue with how I was showing the data, but that she had an issue with what was being shown – averages (means) of salaries, and she explained why she had a problem with that – “incomes are not usually distributed symmetrically…”
She complimented me for using error bars, and she mentioned that she has noticed most data designers don’t.
She shared, however, that she didn’t feel that using ±1 standard deviation to create the error bars was the best approach.
She closed by stating that she hoped I was a person who’s open to receiving feedback.
What was my response to this? I felt honored that she would take the time, I felt respected that she cared enough not to affect the contest, and I felt educated by someone who knew more than me. I was elated.
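As an aside, Naomi’s point about averages is easy to demonstrate with a toy example (these are hypothetical salaries, not the actual contest data): when incomes are skewed to the right, a few high earners pull the mean up, while the median stays closer to what a typical person actually earns.

```python
import statistics

# Hypothetical, right-skewed salaries: one high earner pulls the mean up
salaries = [42_000, 45_000, 47_000, 50_000, 52_000, 55_000, 250_000]

mean = statistics.mean(salaries)      # sensitive to the outlier
median = statistics.median(salaries)  # robust to it

print(round(mean), median)  # prints 77286 50000
```

The mean suggests a "typical" salary of about $77k, even though six of the seven people earn $55k or less – which is exactly why the median is usually preferred for salary data.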
Now, returning to the question I posed at the top of the blog post:
When, if ever, is it okay to give unsolicited, constructive feedback in public to someone who shares something they’ve made broadly online?
I have to admit, when I first thought about this question a few months ago, my immediate gut feeling was that a community where you can’t proactively give feedback would quickly become a “Love Fest”, which I wrote about a couple years ago. I feel as strongly now as I did then that this is something to avoid as much as the opposite extreme (the “Shark Tank”).
A 2016 diagram showing how communal growth happens when feedback is balanced
Notice that in my 2016 diagram above, what prevents the marble from falling from the top of the curve down into the “Love Fest” on the right is the barrier called “give constructive feedback.” We need to be able to give & receive feedback to help each other get better. How do we do that well, though?
There are two separate parts to it – the first part is figuring out whether or not a person is open to receiving constructive feedback from me. The second part, which is even more important, is to consider HOW to give the feedback. I want the feedback to be perceived as constructive rather than negative. The barrier on the left that prevents the marble from falling in the shark tank is titled “don’t be a bully.” How do I avoid being a bully when even well-intentioned advice or feedback can seem harsh to people? Getting the person’s green light upfront is a big part of it. Then it all comes down to choosing the right words.
A Shift in Thinking
The shift in my thinking happened when I recalled that first encounter with Naomi, who I’ve been lucky enough to become friends with since then. I asked myself how I would’ve felt if she tweeted her critique instead, or blogged about it without first reaching out to me. Or if she had approached me, even in private, with a measure of snark or arrogance. It would’ve been harder to receive that feedback well. I hate to think about it, but perhaps I would’ve become defensive or, heaven forbid, childishly critical in return. If you know Naomi and how simply wonderful she is, this is a scenario I just cringe to think about.
So I’m beyond grateful that she was as considerate as she was in that moment. And I’m glad that she modeled empathetic feedback for me, early in my time with the community. It’s a very effective way to interact with people who put their work in the public eye. It’s a nerve-wracking experience for people who make and share stuff, and each and every visualization has a very large number of opportunities for error, so it would be easy for the whole deal to become a shooting gallery of criticism.
A Possible Approach…
At the risk of over-thinking this, I’d like to put forward an approach that I’m going to try following in the coming year. I’m actually taking it one step further than Naomi did with me: when the person hasn’t explicitly asked me for feedback, I plan on first asking them whether they’re interested in hearing my thoughts. Then I’ll inquire as to how they’d prefer to receive it.
I’d be happy to hear whether you agree or disagree with all or parts of this, and whether you’d change it or add to it.
The scenario: I see something someone has created and published to the general public, and I either see an error, or an opportunity for improvement. What do I do?
A simple flow chart to determine whether and how to give feedback
I added the note about context at the top, “Context: a community of individuals sharing their own work”, because I believe the guidelines of engagement should be slightly different for news sites, governmental bodies, companies, non-profits, etc., which speak in a more official capacity than you or I do as hobbyists or individuals with just our own voice. The rules might actually be more similar than different, by the way, and should still involve basic human decency and civility. But surely we should be able to call out grossly inaccurate reporting, outright propaganda, and heavily biased information put out by organizations without getting their permission upfront.
Here are a few additional thoughts:
Be extremely careful about “subtweeting” or “sub-blogging” your critique – speaking in general terms about the issue you saw, but not referencing the specific work. This can work out, but it can also backfire. People have a way of sniffing out your true intention, and it creates a climate of hidden agendas and such.
If you’re an “influencer” or a “leader” within a community (if you think you might be but you’re not sure, then you are), then approach these exchanges with great care and always err on the side of gentleness. Our words carry weight, and we can do a lot of harm or a lot of good with them.
If you make something and you ARE open to feedback in public, ask for it!! Just say “Hey, I made a thing, what do you think about it? Any ideas how to make it better?”
My friend Eva Murray, who provides feedback to lots of people through the Makeover Monday project she helps to lead, gave me a further suggestion – be specific in requesting feedback. What is it that you’d like people to give feedback on? Your use of color? The flow of the story? etc.
Open conversations about what works well and what works less well are amazing, and many people benefit from them. So I encourage you, if you’re comfortable, to spark a debate about your own work from time to time. Know that some people will have ideas on how to improve it. You might agree with them, and you might not. That’s okay. You’ll learn a lot from the conversation, because I know there are many smart people out there willing to share what they know and think. If you’re still new and not quite up for that sort of thing, that’s okay, too. You don’t need to feel obligated to expose yourself to it.
So ultimately, after thinking all this through, I changed my original point of view on the lead-in question. At first, I would’ve said, “absolutely, we should be open to constructive criticism in public, and we should all be empowered to give it, with or without an invitation to do so.”
Now I’d say, be careful with constructive feedback, and only give it in public if the person has signaled that they want to receive it from you that way. Otherwise, reach out in private and ask whether they’re open to hearing your thoughts and having a conversation. Another great suggestion from Eva – “make sure to be very clear in your choice of words that you are critiquing their work/output/viz and not them as people or their choices to be part of a particular group.”
This comment really gets at HOW to give feedback well. People in general are shockingly willing to be harsh and unkind in their comments online, and to attack someone they’ve never met. We’d never treat a dinner guest that way. Why do we feel like we can talk to someone like that on the internet?
In a nutshell, let’s try to be a little more like Naomi by reaching out in private first, unless someone has made it clear that they’re inviting public feedback. We’ll continue to foster a safe and mutually beneficial space, and we’ll make some pretty cool friends in the process.
Well, it has been one month since I left Tableau and started my first business, Data Literacy, LLC. What a fun month for me! I’ve really appreciated all of the positive encouragement and support I have received since making the announcement. I was surprised to see the news covered by GeekWire, and my kids definitely got a kick out of that. It was also refreshing that so many people who have been working on data literacy for some time now went out of their way to welcome me to the space – people like Jordan Morrow, Valerie Logan, Jane Crofts, Matthew Jesser, Dave Mathias and others. It’s a very supportive group, and I’m glad to be a part of it. Closing the data education gap that exists right now is an all-hands kind of thing.
I’ve been hard at work on my website and programs, the Data Literacy Advocates group I started with Anna Foard on LinkedIn has grown to over 1,100 people, and I’ll be setting up my downtown Bellevue office starting tomorrow, so things are coming along nicely. I was excited to be able to run a free program last month for veterans covering a set of data literacy training materials for the first time. It was great to get to know them a bit over four 3-hr courses in two weeks, and I was grateful for the opportunity to spin the wheels on some new content.
I’m also thrilled to announce today that an ebook I’ve been working on for some time is now available for download! The title of the ebook is “17 Key Traits of Data Literacy”, and anyone who subscribes on my site can get a free copy in their inbox. Yep, I’m making you sign up for it, dammit. At least it’s free, though, and don’t tell anybody else, but you can always just unsubscribe on the very first email you get with the ebook. Shhhh.
So what’s this ebook about, and why am I so happy to put it out there? Going back about a year or so, I started to ask myself what it meant to be highly “data literate”. I had heard that phrase popping up here or there, but I couldn’t find a thorough explanation of what it meant. So I decided to make a list of the traits that came to mind when I thought about all of the talented data professionals I’ve had a chance to work with over the years at Tableau, at Medtronic, and elsewhere. It’s not very long – it’s just 22 pages, and a bunch of those pages are basically graphics and art. So for once I wasn’t long-winded.
I divided the list into knowledge (things they know), skills (things they can do), attitudes (ways they think or feel) and behaviors (how they act). I wanted the list to go beyond just tools and techniques. It doesn’t mention a single tool, actually. Don’t get me wrong, tools are great. But they’re a means to an end, they’re constantly in flux, and they should be. Concepts like data ethics, inclusiveness, confidence and continuous improvement are included, though. I don’t see them as just nice warm-and-fuzzies to add to the list – they’re critical and without them any data-driven initiative will fall apart.
I then was able to get some incredibly helpful feedback from some people I greatly admire in the data space. People like Tamara Munzner, Andy Cotgreave, Anna Foard and others all gave it a once over and made some very helpful suggestions. It was delightful to me that four absolute data heroes – Alberto Cairo, Giorgia Lupi, RJ Andrews and Cheryl Phillips – all agreed to not only read it over, but to take time to provide exclusive quotes for the book – one for each of the four sections mentioned above. I’m extremely grateful for their contributions.
I’d also be remiss not to mention how much I appreciate the amazing design and layout work of Kelsey O’Donnell. The polish is literally all Kelsey – the icons, the layout, the fonts, the graphical treatment. I gave her the photos, the data art and the copy, but the rest was her. She even gave me editorial feedback and steered me away from a primary-color-only palette, not just for this piece, but for my entire business. I was beyond lucky to find her and secure her assistance. You can reach out to her if you want her help, too, but take it easy – I plan to work with her a LOT more going forward. I remember when I first saw her online portfolio – she had me at “What the Frack?”:
Notice that I don’t claim this to be “THE” 17 traits. I feel strongly that these elements all matter quite a lot, but I don’t believe they’re necessarily a comprehensive set. The data space itself is moving fast, and new methods are being created all the time. So you may know an 18th or 19th key trait, a 20th may emerge next year, and I’d be happy to hear your thoughts about all that.
There’s a checklist at the end of the ebook for people to go through and evaluate themselves, but it’s not really like these traits are binary, just as data literacy itself isn’t binary. There are definitely degrees and levels to each of these traits, and I love Michael Correll’s recent article about this on the new Multiple Views blog in which he points this out. The purpose isn’t to use the list to say “I’m data literate and that person is not.” The purpose is to think deeply about what it means to grow in data literacy, and to aspire to even higher levels of fluency, just as a poet, scientist or artist would push themselves to increasing levels of proficiency in a lifelong effort to hone their craft.
I invite you to subscribe and download the ebook, and I hope you find it helpful! I definitely benefitted from writing it. I’ll be hosting a webinar on this topic on January 16th from 12pm to 1pm PST, so feel free to register for that here. It’s only open to the first 100 people who register, but don’t worry, I’ll put the whole thing on the Data Literacy YouTube channel when it’s over and done with.
Thanks again for all the support and encouragement. I wish you a Happy New Year, and I’m glad to be on this data literacy journey with each of you.
Today’s my last day as an employee of Tableau. It has been an unforgettable 6 years for me, bookended by the Tapestry Conference, and I’m really thankful for the opportunity I’ve had to work at this incredible company. I came to Tableau as a member of the broader data community, I stayed one while I was here, and I’ll continue being one going forward.
Why am I leaving? I’m leaving my dream job to start my dream company – Data Literacy, LLC, a data training and education outfit with the mission to help people learn the language of data. I couldn’t pass up the chance to start building a company to help close the gap between what people feel confident doing with data and what’s needed by the organizations and communities to which we belong.
I couldn’t be more thankful to Tableau and to all who have taught me and guided me, and I couldn’t be more thrilled about what’s next. I’ll take the rest of the year to spend with my family and get things prepared, but look for the official launch in early 2019.
As excited as I am about what’s next, I’d like to take a minute to reflect back on the journey that has brought me to this point, and share a few thoughts with you. I’m a writer, too, so this is what you get…
Looking back to October 2012, I remember sitting down for breakfast with the person who would become my first Tableau boss and mentor Ellie Fields at the Tableau Conference in San Diego. If you’ve met Ellie, you’ll know why I realized right then and there that the people at Tableau are even more special than the product they make. That’s what it took to get me to leave my hometown of Thousand Oaks, California – bless that place. Turns out Washington is okay, too.
There are a few things that really stand out for me about my time at Tableau. I spent almost the whole time – up until last June – heading up the Tableau Public platform for Elissa Fink’s marketing team. What a huge honor that was for me. It was more than a job. I hope you get the chance, if you haven’t already, to be entrusted with something you really believe in, and to have the feeling that you’re doing the exact thing you were meant to do in a particular time of your life. Just a moment of that is gold. Half a decade is something I feel very lucky to have had.
I’m really proud of how well it went. Tableau Public is the epitome of team effort, and the community gets all of the credit. But at least I can say for sure that I didn’t get in the way of this kind of growth. What a fun ride:
Being involved with Tableau Public, and then also the wonderful Academic Programs team, I got to train many people at conferences and in universities around the world – people who were literally afraid of data, and who had been for much of their lives. It was so rewarding to see the light bulb go on for them, and to see them having fun exploring data and feeling like they finally get it. You can’t put a price tag on that, and I’m hooked.
I also got to witness super data-savvy people in the Tableau Public community break free from corporate constraints and get creative with their own passion projects – projects involving data about our world that really matters to them – serious topics and sometimes frivolous ones, too. I feel like when I started back in 2013, the BI world wasn’t yet ready to fully embrace artistic and fun approaches to working with data, but that is changing. We’re coming to realize that clarity and aesthetics aren’t mutually exclusive elements at all. Instead of thinking of this endeavor as a pendulum swinging back and forth between them, we can instead imagine charting a course into a region on a map that accomplishes both at the same time. So many are doing that now, and I love it.
One of the greatest thrills for me was being able to shine the spotlight on some of these projects. My job would’ve been pretty boring without the efforts of thousands of Tableau Public authors around the world – so thank you all for that. You made coming into work each day an absolute joy for me.
One side note – working in the Fremont neighborhood of Seattle made me really appreciate that surroundings matter a lot when it comes to creativity. It has been a huge blessing to work in such a visually striking place. There are boats, bridges, a 1950s cold-war rocket fuselage, a statue of Lenin with hands continually painted red, a park with the remnants of the sole remaining coal gasification plant in the U.S., a statue of an Emmy-award winning clown, a statue of people standing by a bus-stop that passers-by dress in beanies, scarves and gloves when it gets cold, and even a massive troll sculpture under a bridge holding a VW beetle. And that’s all within a 3 block radius of the Tableau headquarters. I know correlation doesn’t imply causation, but do you think it’s a pure coincidence that a company that has contributed such creative flair to the world of data is to be found in surroundings like this? I don’t. There were many times I walked down to the water to clear my head, and came back refreshed and inspired.
If I have any regrets, it’s that I didn’t manage to launch as many Tableau Public features as I wanted to. I’ve always seen myself as just another author who happens to work at the company, so I lobbied hard (sometimes too hard) for features and enhancements that I thought would improve the platform. Things like a browser-based app to create vizzes, a mobile app for your feed & favorites, author stats to tell you how you’re doing, animated views to let your readers see the data in motion, comments on viz home pages (yes, I think we should open that Pandora’s box – but also we should let authors turn it on or off for each particular viz), a most-viewed-of-the-day/week/month gallery to see what’s hot around the world, a trending authors list to find new talent, forking, or at least a breadcrumb feature for downloaded workbooks to tell you if what you’re seeing is original work or not. These are just a handful. The reality is that the Dev team was so hard at work over the past few years overhauling the platform to keep up with all that traffic, that I didn’t get to give you the other awesome features that I wanted to. But I know for sure that the current team under Ellie, Katie Maertens, Taha Ebrahimi and Mark Jewett will get to these and many more. And I’ll cheer them on when they do.
I can’t really list all the people that I’ve been blessed to work with, but I’d be sorely remiss not to mention and thank my friend and fellow technical evangelist Andy Cotgreave. I learned so much from Andy over the years – about data, about presenting, about music and games and beer. He has seen me go through a wild journey in my career and in my personal life, and I always felt like he was there for me. Thanks, Andy. You’re one of a kind, and I’m glad to know you.
I also have to call out Jewel Loree. Jewel and I joined within a few months of each other. She puts both the “data” and the “rockstar” into data rockstar (I’m serious, look at the photo). She’s a badass chick with a mind for data, an eye for design, an ear for tunes, and a heart of gold. These people exist. They’re rare, but they’re out there. Every now and then you meet one, like Jewel. And you stop and say, Wow, how is one person so amazing at all that? Thanks for everything, Jewel. Rock on, girl!
Last thing – a quick thanks and hat tip to Daniel Hom, Mike Klaczynski, Dash Davidson, Scott Teal, Florian Ramseger, Thierry Driver, Sophie Sparkes, Jenny Richards, Steve Schwartz, Jade Le Van, Jonni Walker, KJ Kim, Cynthia Andrews, Andrew Cheung, Emma Trifari, Meagan Corbett, Dean David, Midori Ng, Maxime Marboeuf and Courtney Totten.
You all are the best! If any of you ever need anything, just let me know. I’ll be around.
Do you need a dedicated “data curator”, or is that a task that should be shared by existing roles within the business and IT?
This question was prompted by an article I read at CIO entitled “6 data analytics trends that will dominate 2018“, written by Thor Olavsrud and published back in March of this year. I first stumbled on this article last week, and the 3rd trend caught my eye: “Rise of the data curator?” It was worded as a question, and I didn’t have a firm answer, so I tossed it out there to the Twitterverse:
It was an honest question, but you can see from my wording that I was a little skeptical at first. Back in February, a separate conversation on Twitter involving Adam McCann, Ken Flerlage, Josh Jackson and me centered on whether there’s a need for a dedicated “analytics translator” – a discussion triggered by an article in the Harvard Business Review that same month. Sentiments were mixed – should a company ask (or train) analysts and data scientists to learn the needs of the business, or should they hire specialists who act as a go-between? The discussion about data curation poses a similar question, only farther upstream in the process that converts data into a decision.
What is a Data Curator?
Before we go any further, let’s stop to define what data curation is. Wikipedia provides the following:
Data curation is the organization and integration of data collected from various sources. It involves annotation, publication and presentation of the data such that the value of the data is maintained over time, and the data remains available for reuse and preservation.
In the CIO article, the data curator is described as someone who…
…sits between data consumers (analysts and data scientists who use tools like Tableau and Python to answer important questions with data) and data engineers (the people who move and transform data between systems using scripting languages, Spark, Hive, and MapReduce). To be successful, data curators must understand the meaning of the data as well as the technologies that are applied to the data.
Sort of like this – my attempt at a (simplistic) diagram:
Drawing from my own experience
I haven’t had the chance to lead a business analytics team since my days at Medtronic half a decade ago, but while I was there, a critical component of my team’s success was having very close relationships with the IT specialists who created the data sets themselves. Often an analytics project would start with a question to IT about whether a particular type of data was available or not, or how close we could get with existing data sources to approximate an answer to a relevant business question.
Often, there’s a need to “push” information to analysts about new or interesting data sets instead of waiting for such a “pull” to come based on a business need. Here at Tableau, our Marketing department holds periodic training sessions to help people within the department understand what high-value data sources are available, how to access them, what exactly they contain, and examples of insights gleaned from them. These are highly attended and appreciated.
It occurred to me that an analogy exists in the world of open data in which I have been immersed over the past 5 years as marketing head of the Tableau Public platform. Curation is critical in that space, too. We love regularly updated repositories of data like Data is Plural by BuzzFeed data editor Jeremy Singer-Vine and Awesome Public DataSets on GitHub because they clue us in to fascinating and relevant (and sometimes quirky) data sets on a weekly basis. While I was overseeing the Tableau Public website, the Sample Data page was one of the highest generators of organic traffic to the site. If you think about it, part of the reason the Makeover Monday project is so wildly popular is because Andy Kriebel and Eva Murray work very hard each week to provide a steady stream of data sets for participants to use to practice and hone their skills. That’s a huge shortcut for thousands of people. Imagine the human-hours that are saved because participants don’t have to hunt for data sets themselves with which to practice.
So I believe there’s no doubt that data curation – finding, surfacing, annotating, even sometimes cleaning and blending fascinating data sets and serving them up for broad consumption – is a critical task for private as well as public data discovery. But the question remains – does it warrant a dedicated role?
Actual data curators weigh in
It was really interesting to hear responses from actual data curators on the thread I started with that original question. Kelly Gilbert, Wendy Brotherton, and Hayley (who owns the OG @datacurator handle) all replied that their current role basically meets the job description of a data curator.
Kelly had this to say about it:
This is really my title at this stage of my career. I spend most of my time wrangling and combining data and QA’ing with business owners to create “authoritative” datasources for the enterprise. #dataninja
She also related that she actually works with an entire team of data curators. The reason for such a team? According to Kelly, “We want LOB analysts to be able to focus on generating insights rather than assembling, maintaining, and finding data.”
Adam McCann also weighed in on “Team Dedicated Role”:
I actually like the data curator concept way more than a data translator. I was just discussing this with some engineers. Do u centralize data engineers or embed them w/ analytics. I think centralization is better BUT w/ a curator as intermediary who treats analytics as customer
The other side of the debate – “Team Just Part of the Job” – was represented by Jim VanSisteen and Jason Forrest, among others:
I can see it as a niche role in the short term as companies work to raise the overall data literacy across their org. It's hard to build competency in everything at once. But I put it more in the "it's just part of the job" category.
If you follow the entire thread, there are a number of interesting opinions – some speak about how this role was transformative for their business, others ask where to find one so they can make a hire, and others raise concerns about how this role could actually pose problems.
Daniel Zvinca raised a particularly interesting challenge to the role by expressing his preference to be as close to the raw data as possible when conducting analysis:
Just trying to understand the role, not sure if is needed or not. I can tell you this. As a direct responsible for the analysis output, I would very much prefer to have raw data than, maybe, adjusted data.
Seeking to wrap up the conversation, I ran an informal and not-at-all-scientific social media poll to find out what people think about this role from a more quantitative perspective. Here are the results – it seems like those in my social network who took time to respond mostly don’t have this role on their team, but they think it would be a helpful thing:
Do you have a "data curator" where you work? Someone who sits between data engineers and analysts, whose full-time job is to source, organize & accelerate high value datasets?
Like many things, whether or not a company decides to hire dedicated data curators might depend on a number of factors – how large the teams are, how complicated the data sets are, how critical a role data plays in the business, how well-versed existing team members are, and how mature the analytics processes are. And it might not be a yes-or-no answer – perhaps there’s a need for a dedicated curator or curators at certain stages of the maturity model, but not at others. It’ll be interesting to see what the future holds for this role, and whether or not we end up seeing “The Rise of the Data Curator”.
What’s fairly certain to me after the conversation is that the tasks associated with this role need to be done well by someone. Thanks to all who chimed in and gave me a more complete perspective.
I’m currently working on my second book, Avoiding Data Pitfalls, which I hope to complete soon so that the amazing people at my publisher, Wiley & Sons, can do their thing and get it out to the world. My editor has the patience of a saint.
An example in the book features the commonly-referenced FAA wildlife strikes data set, which is regularly updated and downloadable in state-by-state files (or as a full MS Access file) on the FAA website, and which I unioned together for all 50 states and D.C. (thank you Tableau Prep!) from January 1, 2000 through December 31, 2017 and uploaded here in case you want to play with the data yourself.
I’m interested in understanding and writing about something I call the Data-Reality Gap, and so I focused on one particular aspect of this voluntarily-reported data. If you take a look at the time of day that pilots report striking a poor creature (or creatures) on the runway or in flight, and you focus on the number of minutes after the hour that pilots provide in their report, you see a strikingly regular geometric pattern, almost like something generated by a mathematical formula as opposed to over 85,000 incidents individually reported over the course of 18 years:
I put this out there on Twitter earlier today, and Jay Lewis replied with this column chart of the first 1,976 diaper changings of his 6-month-old baby (so, dirty data, basically). The pattern is the same:
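Both charts show the same human tendency to round to “nice” minutes. If you’d like to check your own data for this pattern, here’s a minimal Python sketch; the sample minutes below are made up for illustration, not drawn from the FAA or diaper data:

```python
from collections import Counter

# Hypothetical sample of reported times (minutes past the hour);
# with the real FAA data these would be parsed from the
# "Incident Date and Time" field.
reported_minutes = [0, 0, 5, 15, 15, 15, 30, 30, 30, 30, 45, 52, 58]

counts = Counter(reported_minutes)

# If minutes that are multiples of 5 dominate, that hints at human
# rounding rather than precise observation.
round_share = sum(n for m, n in counts.items() if m % 5 == 0) / len(reported_minutes)
print(f"Share of reports on 5-minute marks: {round_share:.0%}")
```

Truly random minutes would put roughly 12 of every 60 reports (20%) on a 5-minute mark; anything far above that is the rounding pattern at work.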
Showing time data in a clock chart
It occurred to me that this is a great opportunity to plot data using a polar rather than a linear arrangement, because we’re very accustomed to reading minutes past the hour as 1/60th of a revolution of a circle thanks to the development of the sexagesimal system by the Sumerians around 2,000 B.C.E. We don’t count using this system anymore, but it turns out 60 is a really handy number: it’s the smallest number divisible by each of the first six counting numbers (1, 2, 3, 4, 5 and 6), and it’s also divisible by 10, 12, 15, 20 and 30.
So, in honor of the Sumerians as well as the Babylonians and the Greeks who carried this system forward and gave us our base-60 clocks, here is the FAA wildlife strike data in polar form, with each minute represented by a circle located at, well, its minute location:
How did I make it?
This isn’t a super sophisticated example, and many have done much more challenging things than this, but I’m certainly not the king of circular plots, so it was something of an accomplishment for me, personally. This wasn’t rocket science; I just used some good old 11th grade trigonometry and a steno pad to get it done:
I converted the chicken scratch on that steno pad into the following formulas to convert 0 through 59 minutes into 60 points on a circle, each located 6° away from its nearest neighbors, starting at (0, 1):
UPDATE: Let’s replace my nice Rube Goldberg style calcs with far simpler ones suggested by my friend Chris Love, who evidently has a much better recall of high school mathematics:
Here are the calculations in text form in case you want to copy and paste them (I won’t even bother putting my original ones here):
SIN(( 6*DATEPART('minute',[Incident Date and Time])) * (PI()/180))
COS(( 6*DATEPART('minute',[Incident Date and Time])) * (PI()/180))
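For anyone working outside Tableau, the same conversion can be sketched in a few lines of Python. The function below mirrors the SIN/COS calculations above, placing minute 0 at (0, 1) and stepping 6° clockwise per minute:

```python
import math

def minute_to_point(minute):
    """Place a minute (0-59) on the unit circle, starting at (0, 1)
    for minute 0 and rotating 6 degrees per minute, mirroring the
    Tableau SIN/COS calculations above."""
    theta = math.radians(6 * minute)
    return (math.sin(theta), math.cos(theta))

print(minute_to_point(0))   # minute 0 sits at twelve o'clock: (0.0, 1.0)
print(minute_to_point(15))  # minute 15 sits at three o'clock: (1.0, ~0.0)
```

Using sine for x and cosine for y (rather than the usual reverse) is what rotates the starting point to the top of the circle and makes the minutes run clockwise, just like a clock face.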
So Which is “Better”?
“Better” is such a loaded word. “Best” even more so, which is why I tend to avoid it. These two versions each have their relative strengths. Clearly the circular version is easy to understand, since it mirrors reality in its clock shape. But the distinct triangular ramp pattern – where the frequency of occurrence climbs from 5 to 10 to 15 and then falls back down from 15 to 20 to 25 (with an identical pattern on the other side of the bottom of the hour) – is really only on display in the linear version. So even though creating the clock effect was nice, I feel that it takes away a bit more than it adds.
I tend not to create round things for the sake of them being round, even though I know they can be more visually appealing and even “irresistible”. It’s clearly not my strong suit from a technical perspective, but obviously one’s ego requires a better reason than that. So I came up with another one – round conveys notions of cycle, closure and return that may or may not apply to our data. In this case, it definitely applies – after the 59th minute of every hour, we go back to the top of the hour, an inherent trait of the data that isn’t conveyed by the column chart with minute on the x-axis. The cyclic nature of the way we think about time – hours, months, seasons – means that there are plenty of chances to work with circles. It’s not just time data either – there are other types of data that fit into ellipses and circles. Geographic data can definitely do that, too.
So what do you think? Which version would you go with, and why? Or do you think there’s an even better solution that would capture both the cyclic nature of the data as well as the pattern in pilots choosing minutes to report? Perhaps a coxcomb or a sunburst?
Yesterday evening I started teaching a new data visualization class at the University of Washington’s Continuum College. I’ve been teaching this class for a few years now, but this time I decided to add a Data Ethics segment to the first 3-hour session, along with the obligatory course overview, grading criteria, distribution of software licenses, etc, etc. I based the segment on the Manifesto for Data Practices that I was fortunate enough to be able to co-author with a few dozen other professionals at the Open Data Science Leadership Summit led by the good folks at data.world last year (videos of the summit are available here).
Data Ethics is a topic many people are talking about these days, following the long parade of news headlines over the past few years of user privacy debacles, major data breaches, and evidence of machine learning algorithm biases. The data world is in dire need of an ethics overhaul, and we all know that.
But what, exactly, is needed? Not another code or set of guidelines or manifesto, surely. Like our group last year, others have published their own (ACM, ASA, IEEE, The UK, Google). Not more hand-wringing or virtue-signaling that even the most egregious offenders can participate in and even hide behind. So what, then?
Today I took an hour to read through “Ethics and Data Science” – a free ebook written by Hilary Mason, DJ Patil, and Mike Loukides and recently published by O’Reilly Media. I applaud these authors and O’Reilly for putting this book out there for free. We need more of that.
What I really like about the book is its focus on the practical side of implementing sound ethical principles. The authors point out that the ethical guidelines related to using data are all there, and some of them have been there for decades. Oftentimes the offending software or algorithm has been created by very well-intentioned developers who are working on insane timelines, who don’t have a mechanism to stop the train from leaving the station if they notice a problem, and who are working in corporate cultures that don’t put a premium on requirements that deliver on ethics.
One proposed remedy? A checklist and the 5 C’s (Consent, Clarity, Consistency, Control and Consequences). They talk about the ways checklists have helped in other fields – medicine, manufacturing – and that adding such a tangible process step can help us increase the chances that we don’t do harm with our data work. I took the liberty of putting their checklist items into a Google Sheets file that you can save to your own Google Drive and modify and use as you see fit:
Another proposed remedy? Talk about it. All the time and everywhere. Even at the dinner table:
It’s a critical topic that we need to get right in the next few years. The algorithms and technology that surround us are about to get much more sophisticated and exert even more control over our world in the next decade. As the authors of the book point out, we can either work to make sure that what we end up with is a technology utopia, or we can let it run away from us and end up with a technology dystopia on our hands.
For those of you who have been anxious to hear whether 2018 would include the 6th consecutive Tapestry Data Storytelling Conference, wait no longer! The dates (November 29-30) and location (Newman Alumni Center, University of Miami) have been finalized and announced, and registration is now open to anyone who would like to attend! I’d recommend booking early, though: registration is very limited, and when it’s full, it’s full.
Huge thanks to Alberto Cairo, the University of Miami, and the leadership team at Tableau for making this special event happen again, and to Robert Kosara, my colleague at Tableau, for his unyielding efforts all year to make sure this event is on the calendar yet again. Robert broke the news about Tapestry 2018 on his fabulous site, Eager Eyes, this past Wednesday.
If you’ve gone to Tapestry before, you know why this is an exciting announcement. We’ve all been to massive tech conferences, and – as great as those can be – this isn’t one of them. It has been referred to by past attendees as a “retreat”, a “reunion”, and a “getaway”. It’s very intimate – just 100 – 150 attendees are there from a blend of academia, journalism, design and business. And it’s typically at a much smaller venue that has a rich history to it. But it’s no boondoggle; we dive deep into fascinating topics and some of the world’s foremost data storytellers present, discuss and debate.
The speakers are a huge highlight for me each year, and the list of 44 talented women and men who have presented at past Tapestry Conferences reads like a Who’s Who of data storytelling: Cole Knaflic, Giorgia Lupi, Scott Klein, Hannah Fairfield, Viegas & Wattenberg, Enrico Bertini, Jon Schwabish, Kim Rees, Matt Daniels and Michelle Borkin to name a few. Their presentations have been recorded and uploaded to the Tapestry YouTube page, so you can watch them any time you like.
The lineup for Tapestry 2018 is shaping up to be pretty amazing as well: Mona Chalabi, Matthew Kay and Elijah Meeks have all agreed to headline the event with our three keynotes, and we’re accepting applications for the 6 “Short Story” presentations (each 15 minutes in length) now – just with your proposal.
I hope to see you in Miami at the end of November. I’ve attended all 5 Tapestry Conferences to date, and I can’t wait for the 6th!
When ancient humans first started coming up with systems to aid with counting and arithmetic prior to the invention of the numeral system, they chose to move wooden beads, beans or stones along a wire or a bamboo rod, or within sand channels. Archaeologists have found evidence of a form of abacus in Mesopotamia as early as 2,700 BC. It’s a testament to the effectiveness of this simple yet ingenious device that we all learned counting with a modern form of the abacus in our earliest school experiences, almost 5,000 years later.
Here’s a photo of one, called the Chinese abacus, or Suanpan, in which the 5 beads on each rod below the horizontal dividing beam are 1s, 10s, 100s, etc (from the right-most rod to the left), and the pairs of beads above the beam are 5s, 50s, 500s, etc, from right to left:
Fig.1 – a Chinese abacus, or Suanpan
The abacus allows us to capture, count, add, subtract and multiply quantities by virtue of the fact that the position of each bead along its respective rod conveys an amount based on its position in the grid. Move one of the beads on the right-most rod toward the horizontal dividing beam, and you have just counted to 1. Move the next bead toward the beam, and you have just counted to 2. Move one of the two beads above the beam on the right-most rod toward the center and you’ve added 5 to your previous total of 2, for a new total of 7. We all know basically how it works, but here’s a helpful video playlist for a refresher if you need one like I did.
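The positional logic described above is simple enough to express in a few lines of Python. This is a toy sketch of how bead positions encode a number, not a full suanpan simulator, and the column representation is my own assumption for the example:

```python
def suanpan_value(columns):
    """Read the number shown on a suanpan. Each column is a
    (lower_beads_moved, upper_beads_moved) pair, ordered from the
    most significant rod to the least: lower beads count 1 each,
    upper beads count 5 each."""
    total = 0
    for lower, upper in columns:
        total = total * 10 + lower + 5 * upper
    return total

# The sequence from the text: 2 lower beads plus 1 upper bead moved
# on the right-most rod gives 7.
print(suanpan_value([(2, 1)]))          # 7
print(suanpan_value([(1, 0), (2, 1)]))  # 17: one ten-bead, then seven
```

Each rod is just a digit position, which is the whole point: the amount a bead represents comes entirely from where it sits in the grid.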
Our ancient ancestors found that position along an axis is a good way to keep track of quantitative information. Is it any wonder, then, that if you zoom forward to more recent times, Cleveland and McGill found in their 1985 study ‘Graphical Perception and Graphical Methods for Analyzing Scientific Data’ that subjects were able to judge values using position along a common scale with the least amount of error? They evidently learned from their experiment, as they chose to present their experimental results using a dot chart, with error bars:
Fig.2 – A measure of subject error from Cleveland & McGill’s analysis
Cleveland & McGill’s results have been replicated, including in 2010 by Jeff Heer & Mike Bostock, and we now have a pretty good idea how accurately people can judge proportions when they are presented with different encoding types. The following image is from Tamara Munzner’s book Visualization Analysis and Design, which is an absolute classic; I use this image as a starting point whenever I’m asked to present on the fundamentals of data visualization:
Fig.3 – Figure 5.1 from Munzner’s book
As you can see, position is at the very top (‘most effective’) of both the ordered and the categorical attribute lists. If you want to win in data visualization, separate the distinct categories in your data into marks at different spatial positions, and then place each mark so that its distance from a common baseline is proportional to the value you’d like to compare. In other words, make a dot plot.
Fig.4 – a dot plot and accompanying commentary from Robbins’ book
When I wrote my book Communicating Data With Tableau in 2014, I encouraged my readers to employ this visual technique, even though adding horizontal on-center lines isn’t currently a default in Tableau (if you’d like it to be, you can vote it up in the Tableau Ideas Forum here). It’s definitely doable in Tableau, though, as Andy Kriebel demonstrates in a helpful video tutorial here. Back in 2014 I employed a similar, albeit more simplistic, approach by placing a static value of 1 on a dual axis, fixing that secondary axis to go from 0 to 1, and then hiding it. The bars that go from 0 to 1 in each row can be made very thin using the Size shelf, effectively letting you mock up an on-center gridline through the middle of the marks in each row, rather than the default “swim lane” configuration of horizontal lines separating rows between each dot. It’s a stylistic choice, but I think it can prove to be an important one.
All of this brings me to my submission for Cole’s challenge. I’ve enjoyed playing acoustic guitar since my father bought me one for Christmas when I was 16, and it occurred to me as I was considering Cole’s latest challenge that the chord diagrams I’ve been looking at and memorizing since I was a teenager are a form of dot plot – dot plots where the position of the dot along each line isn’t abstract, but a quite physical representation of where to place your fingers along the six guitar strings to form a chord. You can find these chord diagrams on the internet and in learner booklets everywhere, most commonly in a small multiples form, which I love, but I decided to create an interactive app that lets users choose a key and a chord to display one diagram at a time. I’d also like to believe that I added one small visual innovation – I used the viridis color palette to color each dot by the musical note its string makes when plucked – most chord diagrams are in black and white:
Data is used to measure and compare human beings in many ways in the world we live in. We get accustomed at a very young age through the school system to being tracked, scored, assessed and ultimately judged by numbers and figures. This typically continues well into our adult lives – sales reps get ranked based on performance to quota, employees get their annual performance review, authors and professors get rated online, etc.
These numbers and figures can be related to different kinds of things:
They can be based on our levels of activity – how much did we do something?
They can be the subjective opinions of others – what did someone or some group of people think of us?
or they can be some objective measure of results, performance, or output – what was the result of our efforts?
High achievers and competitive people can react pretty strongly to news about poor performance scores, no matter what the metric. That fact was on display this week, when NBA star LeBron James of the Cleveland Cavaliers was told by Jason Lloyd of The Athletic that he’s recording the slowest average speed of anyone on the floor so far in the Eastern Conference finals series being played against the Boston Celtics. This metric is based on the NBA’s new player tracking system, and the updated stats tables for all players can be found here.
Is the Best Player Really the Slowest?
Technically, Lloyd was right, at least as much as we trust the accuracy of the player tracking system. It’s actually worse than just the Eastern Conference series, too. As amazing as he is, LeBron is, in fact, tied with one other player for dead last out of the 60 players who have played 8 or more games with player tracking activated in this year’s NBA playoffs.
So what was James’s reaction to this information?
“That’s the dumbest shit I’ve ever heard. That tracking bullshit can kiss my ass. The slowest guy? Get out of here.”
So, basically he didn’t like it. He didn’t stop there:
“Tell them to track how tired I am after the game, track that shit. I’m No. 1 in the NBA on how tired I am after the game.”
Thou Dost Protest Too Much
What I find most interesting is that he didn’t object along the lines that I thought would be most obvious – to point to his league-leading scoring statistics, his freakishly high efficiency and game impact metrics, or his team’s incredible play of late. Those would be objections about the use of an activity metric (how fast was he running up and down the court) instead of an output metric (how much was he actually contributing and helping his team to win).
He could have just laughed and said “imagine what I could do if I actually ran hard.” But no – he took exception to a metric that seemed to indicate he wasn’t trying hard. He appealed to something else entirely – how tired he felt after the game – to counteract that implication.
Is Average Speed a Bogus Metric?
So is it the “dumbest shit” to use this particular metric to track basketball player performance in the first place? Is average speed over the course of a game a good performance indicator of that player’s contribution to the outcome of the game? Some of my Twitter followers don’t think so:
This is pretty terrible use of tracking data, but also hen did average speed become an advanced stat? https://t.co/hEkawXiEfw
That and timed races seem like the only types of games where avg speed matters much. Basketball is more about bursts of acceleration. "Quick" players are the ones who can rapidly change direction to create (or prevent) separation between ball handler and defender. /3
So is there a better way to measure a player’s impact on a game? It turns out there are a bunch of different ways to measure this. An interesting way to measure player contribution is known as PIE – Player Impact Estimate – and it seeks to measure “a player’s overall statistical contribution against the total statistics in games they play in.” Or, “in its simplest terms, PIE shows what % of game events did that player or team achieve.” You can find the formula on the NBA stats page here.
Of course no one would be surprised to find out that LeBron has the highest PIE of any player in the playoffs, and it’s not even close. LeBron is involved in 23.4 percent of game events thus far in the 2018 playoffs. The next closest player is Victor Oladipo of the Indiana Pacers with a PIE of 19.3. When it comes to impacting the game come playoff time, there’s no doubt that LeBron is king.
So how does average speed relate to PIE? If LeBron is last in the former and first in the latter, we’d guess that there’s not a strong positive correlation. And we’d guess right. If we correlate average speed with PIE, we see that there’s a very weak correlation (the coefficient of determination, R^2, is only 0.056):
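A check like this takes only a couple of lines with NumPy. The (speed, PIE) pairs below are made up for illustration rather than pulled from the actual stats tables:

```python
import numpy as np

# Hypothetical (average speed, PIE) pairs for a handful of players;
# the real values live in the NBA player-tracking stats tables.
avg_speed = np.array([3.7, 3.9, 4.1, 4.2, 4.4, 4.5])       # miles per hour
pie       = np.array([23.4, 15.0, 12.1, 14.8, 10.2, 11.5])  # percent

# With a single predictor, the coefficient of determination R^2 is
# just the squared Pearson correlation coefficient.
r = np.corrcoef(avg_speed, pie)[0, 1]
print(f"R^2 = {r ** 2:.3f}")
```

An R² near zero, as with the real data’s 0.056, means average speed explains almost none of the variation in a player’s impact on the game.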
What’s interesting is that this view shows that LeBron is way up in the top left corner of this chart – he has a low average speed and a high player impact estimate compared to other players. Turns out he’s in really good company in this top-left quadrant, with 10 of the 12 remaining All-Stars also in this space. You can see that the player with whom he’s tied for the slowest average speed is James Harden, and no one is challenging Harden’s performance. Especially not Draymond Green.
I’ll wrap it up by sharing what someone said to me about this situation – we need to get buy-in from stakeholders before sharing performance metrics with them. I think there’s a lot of wisdom in that. People tend to take measurements of their effort and performance very personally. I know I do. We’d do well to relax a little about that, but it’s human nature.
We should also take care to put the emphasis on the metrics that actually matter. If a metric doesn’t matter, we shouldn’t use it to gauge performance. And activity and opinion metrics are one thing, but they should always be secondary in importance to output or performance scores. Just measuring how much people do something will simply prompt them to increase the volume on that particular activity. Just measuring how much someone else approves of them will lead them to suck up to that person. We all want to contribute to a winning team, and our personal performance metrics should reflect that.
At the same time, though, data is data, and tracking things can help in interesting ways. Perhaps the training staff could use the average speed data to track a player’s recovery from an injury. Or perhaps a certain, ahem, all-star player later in his career could benefit from keeping average speed down to conserve energy for the final round. Or perhaps a coaching staff could evaluate their team’s performance when they play an “up-tempo” style versus running the game at a slower pace. Who knows?
In other words, data is only “the dumbest shit you’ve ever heard” when it’s used for the wrong things.
Those of us who work with data call ourselves many things these days. From the typical and perhaps mundane “data analyst” to the controversial and supposedly sexy “data scientist”, titles abound. Titles aside, though, let’s just call us all “data workers” for a moment – we’re all people who use quantitative information found in spreadsheets and databases to steer our businesses, our communities, and ourselves into the future. Some may use code and sophisticated algorithms to do that, others may use off-the-shelf software and fairly basic analytics techniques. Others use all of the above, and more. We’re living in exciting times of growth and development for many people.
But this post isn’t about titles, or anything like that. You may or may not like me using the term “data worker”, and I’m okay with that. I’m not married to it, and as long as you understand what’s intended by that description, then let’s go with it for now. Because this post is about the biggest challenges that data workers face today. What are these big challenges, and what are we doing about them? I got thinking about this after I saw a tweet from someone named Shane Morris this morning:
I think the biggest problem facing #DataViz today is that so few can think past the capabilities of their tool of choice. Back then their creativity was the only constraint.
Now I don’t believe I’ve met Shane in person before, and I don’t disagree that he has identified a major issue in tool-based limitations to our creativity. But I did get to wondering whether or not this is the “biggest” challenge, and what other “big” challenges there would be if I were to try to list them. So here’s what came to mind. I’d love to know your thoughts on these, and which other big challenges you have on your list.
What are the biggest challenges?
Challenge #1: Tool-based Limits and Silos
I’ll agree with Shane on the identification of a major challenge, and raise him one sub-challenge: it isn’t just that people are limited by what a particular tool lets them do, it’s that oftentimes their network is limited to others who use that same tool. So tools don’t just limit what we can create; they often limit with whom we connect. I’ve been working for Tableau Software for over half a decade now, and Tableau has a very active and passionate community, both online and in person. These enthusiasts share effective practices, support each other in very practical ways, and push each other to improve. All of that is a very good thing, and I’ve benefitted from it in many ways over the years. We should keep doing that.
But there’s no need to make it exclusive. It’s human nature to gravitate to groups where we feel we belong, and in this sense the landscape of data workers is no different from OS users who cluster around Mac, PC or Linux, or sports fans who go to the same bars where people wearing the same jerseys will high five after certain plays but not others. We can’t get rid of that part of who we are, and I’m not saying we need to.
But I do believe that we’d be better off as a whole if software providers, conference planners and meetup group organizers did more to encourage and even facilitate connection to other similar groups. I don’t mean this in some sort of mushy “why don’t we all just get along” kind of way. I mean that much could be learned from conversations between people who solve similar problems with different tools. That has been my experience as co-chair of the Tapestry Data Storytelling Conference (of which more, soon). I’ve been able to meet very talented people who have spent time learning tools and methods that have different strengths and weaknesses than the ones I’ve learned.
There are other such connections happening more and more. For example, I admire what Wes McKinney and Hadley Wickham are doing to join the R and Python worlds with the new venture Ursa Labs, and feel that more initiatives and groups could be formed along these lines.
After all, what science fiction author envisioned a future where we are divided based on the languages we speak to computers?
Challenge #2: Lack of Widespread Data Literacy
If Challenge #1 deals with people in the data worker space, Challenge #2 deals with those who aren’t yet in that space. I think there are a whole lot of them. Those of us over, say, 30 grew up in a world that didn’t really have distinct “data” programs in colleges, and high schools taught us much more calculus than statistics or analytics. Add that to the fact that numerical competencies can be challenging to develop for many, and even tricky for experts to consistently get right, and you have a situation where the majority of people just don’t speak the language of data very well yet.
The “data illiterati” can be divided into those who aren’t aware of their illiteracy, and those who are aware of it. Those who are aware of it can be further divided into those who want to change and those who don’t. Those who want to change either feel that they can or feel that they can’t. Those who don’t want to change feel that it’s just not necessary. Based on my anecdotal experience, this last group is shrinking.
The good news is that the group that wants to learn data and feels that they can do it has a ton of alternatives available these days. Universities are coming out with data programs left and right, online sites like Udemy and Coursera let you learn via DIY, and tool companies have tutorials that are incredible compared to only a few years ago.
But the thing that concerns me is the group that feels like they need to learn data, but can’t for some reason. I think this is a huge group of people. Either they feel blocked by some perceived innate deficiency (“I’m just not good at math”), or they don’t know where to turn. I’m not sure the alternatives in the previous paragraph do the trick for them. So something more is needed to address this challenge. We’ll have to see what comes next.
Challenge #3: Poor Adherence to Standards of Data Ethics
This is a major issue in today’s data space, because it gets to the “why” behind what we’re doing. What good is it to have high competency and skill working with data if what we’re doing with those capabilities isn’t even ethical? Can we agree on what is and what isn’t ethical in this space? What’s stopping companies from doing what they want with our data in order to achieve their goals, unchecked?
We don’t just need a bunch of fancy words that everyone agrees to but that in effect are meaningless. It’s easy for ethics to become that. We need something substantial that stands in the way of companies and governments using data in inappropriate ways. We need a “Three Laws of Data Ethics”, or something similar to that (Evernote took a stab back in 2011).
One other group I was a part of recently took a crack at this. In December 2017, the first Open Data Science Leadership Summit organized by the team at data.world brought together individuals from a variety of specialties and backgrounds, and we spent half of the event talking about this exact issue. What came out of that discussion was the Manifesto for Data Practices, and I recommend you take a look at it. Over 1,400 people have signed their name to this document to date – that’s a small but not insignificant number of people who feel that the values and principles laid out by this group would be helpful to adopt. Here are the principles:
Will this manifesto do the trick? Not all by itself. Acknowledging something as important isn’t the same as adhering to it in practice. So more is needed still. That’s why this remains a big challenge.
Challenge #4: Preservation and Conservation of Records and Insights
I’ll call this one the silent challenge, because I don’t think many are thinking about it. It’s highly important, but isn’t quite perceived as urgent by most. Important but not urgent challenges are the ones that can be the most difficult to solve, because there are so many other challenges that are screaming in our face to fix right now.
So what’s this challenge all about and why is it so big? Within our societies and our businesses, we’re amassing incredible amounts of data, and we’re leveraging that data for very powerful insights and publishing it in various ways and on various platforms. But will all of that survive for subsequent generations?
Everything we build eventually crumbles or gets replaced by something different. Will our data be around for others to see what life was like for us, or will our great-great grandchildren look back on the early 21st century with a whole lot of questions because the files got corrupted, the servers stopped working, the software was no longer supported, the media storage devices couldn’t be plugged in to anything, and there were no backups, no hard copies, no instructions left when the company got acquired or the standards changed or the lights went out?
Does that sound like a crazy thing to worry about, to you? If so, I really hope you’re right. Just don’t think about other technologies that have gone the way of the dinosaur, and how much information got lost when the transition left them behind. We can still buy floppy disk drives on Ebay, Atari 2600 consoles on OfferUp, and a whole host of dongles to convert from this type of connector to that, and there’s always the Wayback Machine to remind us about all those ugly GeoCities sites we made.
But even in the near term, data preservation can sometimes be a struggle. My father passed away a few years ago, and I was delighted to find some old cell phones in a recent move that I was sure contained voicemails from him. I had to dig deep to find some old-style iPhone connectors, and when I did I listened to each recording a few times. That was really nice for me.
But zoom ahead another 30 years, or 40, or 100. What are the odds I’ll be able to listen to those recordings? It’s the long term horizon that’s the concern. It’s more than a little ironic to think that our generation – the selfie generation, the one that posts pictures of every meal for the whole world to see, but prints out none of it – could be the one that future generations know the least about. If they know about us, it’s because people out there – the librarians and archivists – will have come up with redundant solutions to preserve and conserve.
So that’s what I have! 4 “Big Challenges” that I believe we face as data workers. Which of those four is the biggest? I don’t really know, and I don’t have a good way to rank them. Maybe you’re aware of a 5th or 6th that’s even bigger. What do you think?