I've noticed that newcomers to Customer Experience sometimes want a quick reference for the best practices we've developed for customer feedback programs, and an easy way to get smart about some of the jargon we use.
We've added a new section to our website, Customer Feedback 101. Here we're collecting short articles (most of them are just a few paragraphs) to explain key concepts in customer feedback programs.
People who are new to this space can use this as an easy way to come up to speed; whether you're completely unfamiliar with customer feedback, or coming from a market research background (we do some things just a little differently here), I hope you'll find this a helpful resource.
This past weekend I went to the movies on a date night with my wife. We went to the AMC megaplex to see Crazy Rich Asians. Normally we go to a smaller (and cheaper) theater that's a lot closer to home, but we had been given some AMC passes so we made the drive.
It's been several years since I last went to an AMC theater, and the first thing I noticed when we went in the door was two different lines to buy tickets: one for ordinary people, and a second for premium members of AMC's loyalty program. A similar two-line setup was visible at the concession stand.
"Just like the airport," was my gut reaction.
Fortunately there were no lines at the theater, so it didn't really matter that we weren't part of the exclusive club.
When we got to the ticket booth, we found that the showtime we wanted was in a premium "Dolby" theater, which required a surcharge in addition to our passes. It wasn't clear what the difference was between the regular theater and the super-fancy one, but faced with the choice of paying extra or waiting an hour for a cheaper theater, we decided to pay the random, unexpected surcharge.
"Also just like the airport," I thought.
But before we could buy the tickets, we had to go through an overly complicated process of checking in, because the theater only offered reserved seating despite the fact that it was two-thirds empty. (It didn't help that the screen for choosing seats was in tiny type that was difficult for my middle-aged eyes to read.)
"Someone at AMC really has a thing for air travel," I concluded.
The premium Dolby theater was definitely nicer than your standard movie theater, though I suspect a fluffy rom-com was probably not the best vehicle for showing off whatever fancy sound and projection gear the theater was equipped with. And we definitely enjoyed the movie.
But my overall impression was that next time we should stick with our local theater. Because while the AMC theater was definitely bigger and fancier, it just wasn't as pleasant. I suspect I'm not the only AMC customer to think "airport" when faced with AMC's premium lines, unexpected upcharges, and unnecessary hoops.
I suspect that I'm also not the only AMC customer for whom "airport" is not a positive association.
P-Hacking is a big problem. It can lead to bad decisions, wasted effort, and misplaced confidence in how your business works.
P-Hacking sounds like something you do to pass a drug test. Actually, it's something you do to pass a statistical test. "P" refers to the "P" value, the probability that you would see a result at least as strong as the one you observed from random chance alone, if there were nothing real going on. "Hacking" in this case means manipulating, so P-Hacking is manipulating an experiment in order to make the P value look more significant than it really is, so that it looks like you discovered a real effect when in fact there may be nothing there.
It's the equivalent of smoke and mirrors for statistics nerds. And it's really, really common. So common that some of the foundational research in the social sciences has turned out to not be true. It's led to a "Replication Crisis" in some fields, forcing a fresh look at many important experiments.
And as scientific techniques like A/B testing have become more common in the business world, P-Hacking has followed. A recent analysis of thousands of A/B tests through a commercial platform found convincing evidence of P-Hacking in over half the tests where a little P-Hacking might make the difference between a result that's declared "significant" and one that's just noise.
The problem is that P-Hacking is subtle: it's easy to do without realizing it, hard to detect, and extremely tempting when there's an incentive to produce results.
One common form of P-Hacking, and the one observed in the recent analysis, is stopping an A/B test early when it shows a positive result. This may seem innocuous, but in reality it distorts the P value and gives you a better chance of hitting your threshold for statistical significance.
Think of it this way: If you consider a P value of less than 0.05 to be "significant" (a common threshold), that means that there's supposed to be a 5% chance that you would have gotten the same result by random chance if there was actually no difference between your A and B test cases. It's the equivalent of rolling one of those 20-sided Dungeons and Dragons dice and declaring that "20" means you found something real.
But if you peek at the results of your A/B test early, that's a little like giving yourself extra rolls of the dice. So Monday you roll 8 and keep the experiment running. Tuesday you roll 12 and keep running. Wednesday you roll 20 and declare that you found something significant and stop. Maybe if you had continued the experiment you would have kept rolling 20 on Thursday and Friday, but maybe not. You don't know because you stopped the experiment early.
The point is that by taking an early look at the results and deciding to end the test as soon as the results crossed your significance threshold, you're getting to roll the dice a few more times and increase the odds of showing a "significant" result when in fact there was no effect.
If there is a real effect, we expect the P value to keep dropping (showing more and more significance) as we collect more data. But the P value can bounce around, and even when the experiment is run perfectly with no P-Hacking there's still a one-in-20 chance that you'll see a "significant" result that's completely bogus. If you're P-Hacking, the odds of a bogus result can increase a lot.
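The dice-rolling picture above can be checked with a small simulation. What follows is only a sketch under assumptions of my own choosing, not anything taken from the analysis mentioned earlier: A and B are identical (both convert at a made-up 10%), each arm gets 200 visitors a day for five days, and significance is judged with a standard two-proportion z-test. The function names and every number here are illustrative.

```python
import math
import random
from statistics import NormalDist


def p_value(conv_a, conv_b, n):
    """Two-sided p value from a two-proportion z-test with n visitors per arm."""
    pooled = (conv_a + conv_b) / (2 * n)
    if pooled in (0.0, 1.0):
        return 1.0  # no variation at all, so no evidence of a difference
    std_err = math.sqrt(2 * pooled * (1 - pooled) / n)
    z = (conv_a - conv_b) / n / std_err
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_positive_rates(trials=1000, days=5, visitors_per_day=200,
                         rate=0.10, alpha=0.05, seed=1):
    """Simulate A/B tests where A and B are identical, so every 'win' is bogus.

    Returns (peeking_rate, patient_rate): how often a daily peeker would have
    stopped on a "significant" result, versus someone who only looks once,
    after the final day. All parameter values are assumptions for illustration.
    """
    rng = random.Random(seed)
    peeking_hits = 0
    patient_hits = 0
    for _ in range(trials):
        conv_a = conv_b = n = 0
        crossed_early = False
        for _day in range(days):
            conv_a += sum(rng.random() < rate for _ in range(visitors_per_day))
            conv_b += sum(rng.random() < rate for _ in range(visitors_per_day))
            n += visitors_per_day
            p = p_value(conv_a, conv_b, n)
            if p < alpha:
                crossed_early = True  # a peeker would stop and declare victory here
        peeking_hits += crossed_early
        patient_hits += p < alpha  # p now holds the final-day result
    return peeking_hits / trials, patient_hits / trials


if __name__ == "__main__":
    peeking, patient = false_positive_rates()
    print(f"bogus wins when peeking daily: {peeking:.1%}")
    print(f"bogus wins when waiting:       {patient:.1%}")
```

With these settings the patient approach should land near the 5% you signed up for, while the daily peeker should flag a bogus winner noticeably more often. The exact figures will shift with the seed and the assumptions.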
What makes this so insidious is that we are all wired to want to find something. Null results--finding the things that don't have any effect--are boring. Positive results are much more interesting. We all want to go to our boss or client and talk about what we discovered, not what we didn't discover.
How can you avoid P-Hacking? It's hard. You need to be very aware of what your statistical tests mean and how they relate to the way you designed your study. Here are some tips:
Be aware that every decision you make while an A/B test is underway could be another roll of the dice. Don't change anything about your study design once data collection has started.
Every relationship you analyze is also another roll of the dice. If you look at 20 different metrics that are just random noise, you actually expect that one of them will show a statistically significant trend with p < 0.05.
When in doubt, collect more data. When there's a real effect or trend, the statistical significance should improve as you collect more data. Bogus effects tend to go away.
Don't think of statistical significance as some hard threshold. In reality, this is just a tool for estimating whether or not the results of an analysis are real or bogus, and there's nothing magical about crossing p < 0.05.
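To put numbers on the tip about analyzing 20 metrics, here is a minimal bit of arithmetic. It assumes the metrics are independent of each other and that none of them has a real effect; real metrics are often correlated, so treat this as a rough guide. The function names are my own.

```python
def expected_bogus_hits(metrics, alpha=0.05):
    """Average number of noise-only metrics that cross the threshold."""
    return metrics * alpha


def chance_of_bogus_hit(metrics, alpha=0.05):
    """Chance that at least one of `metrics` independent noise-only tests crosses alpha."""
    return 1 - (1 - alpha) ** metrics


if __name__ == "__main__":
    print(expected_bogus_hits(20))            # 1.0, i.e. one bogus "finding" on average
    print(round(chance_of_bogus_hit(20), 2))  # 0.64
```

In other words, under these assumptions you should expect about one bogus hit from 20 noise-only metrics, and there's roughly a 64% chance of seeing at least one.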
There are two elements to designing a hybrid survey program which combines the depth of actionable feedback from a live-person phone interview with the ability to cost-effectively collect huge sample sizes with an online survey. In this article I'll explore designing the survey questions and how the two feedback channels relate to each other. In a future article I'll write about designing the survey process itself, and some of the considerations which go into sampling and channel selection.
To get the most benefit from a hybrid survey we want to play to the strengths of each feedback channel. Online surveys are cost effective for collecting tens of thousands or millions of survey responses, while phone interviews let you collect a lot of details about individual customers. Online surveys are good for calculating metrics, and phone interviews give you insights into the individual customers' stories.
Keep The Online Survey Short and Sweet
The online survey is where you get to cast a very wide net, including large numbers of customers in the survey process. This is also where most of your tracking metrics will come from. But it's not the place to try to collect lots of detailed feedback from each customer: long survey forms often don't get a good response rate.
I recommend limiting the online survey to a handful of key metrics plus one box for customers to enter any other comments or suggestions they may have. The particular metrics you choose will depend on your survey goals, but I tend to think that one metric is too few, while more than five will just make the survey longer without yielding much (if any) new information.
It's also good practice to give customers a Service Recovery option, usually as a question at the end of the survey along the lines of, "Do you want a representative to contact you to resolve any outstanding issues?" Just make sure that those requests get routed to the right department and promptly handled.
You can ask a surprising number of questions in a typical five-minute phone interview. This is the place to ask follow-up questions, maybe include some metrics that had to be eliminated from the online survey due to length (you did keep it short, right?), and most importantly, give the customer a chance to really tell her story.
I usually start with the questions from the online survey and add to them. We may need to adjust the wording of some of the questions--not every question that looks good written will sound good when read aloud--but we want to cover the same ground. One of the purposes is to compare the results from the online survey to the interview, since we normally expect the interview to give us a truer reading of the survey metrics. If metrics for the interview and online survey diverge, that's an indication that something may be going wrong in the survey process.
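One rough way to judge whether the two channels have really diverged, or whether the gap is just sampling noise from a small number of interviews, is a two-proportion z-test. This is only a sketch: it assumes the metric can be expressed as a share of customers (for example, the fraction giving a top rating), and the function name and the counts in the example are hypothetical.

```python
import math
from statistics import NormalDist


def channel_gap(online_hits, online_n, phone_hits, phone_n):
    """Return (gap, p_value) for a proportion metric measured in both channels.

    gap is the online share minus the phone share; p_value is two-sided.
    """
    p_online = online_hits / online_n
    p_phone = phone_hits / phone_n
    pooled = (online_hits + phone_hits) / (online_n + phone_n)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / online_n + 1 / phone_n))
    if std_err == 0:
        return p_online - p_phone, 1.0  # everyone answered the same way
    z = (p_online - p_phone) / std_err
    return p_online - p_phone, 2 * (1 - NormalDist().cdf(abs(z)))


if __name__ == "__main__":
    # Hypothetical counts: 8,200 of 10,000 online responses and
    # 150 of 200 phone interviews gave a top rating.
    gap, p = channel_gap(8200, 10000, 150, 200)
    print(f"gap = {gap:+.3f}, p = {p:.3f}")
```

A small p value is a prompt to go look at the survey process, not a verdict. The two channels reach different customers in different ways, so some gap is normal.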
It's a good idea to keep the interview questions flexible. Unlike the core metrics in the online survey, which need to stay consistent over time, the interview questions may need to be updated frequently depending on changing business needs or the particular reason a customer was selected for an interview rather than an automated survey.
I also bias heavily towards open-ended questions on the interview. This gives the customer a chance to use their own words and will often surface unexpected feedback. If needed, the interviewer can code the responses (along with providing a written summary) to allow for tracking of the types of feedback you're getting.
The end result is going to be a handful of metrics, with a healthy dollop of open-ended questions to explore the reasons behind the ratings. The metrics should be comparable to the online survey, so it can serve as a check on the validity of the high volume feedback process, but the true value will be in understanding individual customer stories.
Next week our local chapter of CXPA will be hosting a session called "Battle of the Metrics." I'm looking forward to it: it should be an informative and (I hope) entertaining meeting. If there's one thing that can spark a lively discussion among Customer Experience professionals, it's someone who takes a strong stand for or against any particular metric.
But why do we spend so much time and effort worrying about metrics?
Most reasonable CX metrics provide directionally similar results: when Customer Satisfaction or Net Promoter improve, chances are very good that Customer Effort, Customer Loyalty, or any scorecard composed of customer survey responses will also improve. The numbers will be different, but they should all tell a similar story. Viewed in that way, arguing about which metric is best is a little like arguing about whether miles or kilometers are better.
Though come to think of it, when the United States tried unsuccessfully to go metric a half century ago, it turned out that a lot of people suddenly felt very strongly about whether to measure highways in miles or kilometers. So maybe it's not so surprising that we also have strong feelings about which CX metric to use.
When used properly, it shouldn't matter all that much which metric we choose. Most of the real CX action is below the level of the metrics: it's about finding ways to improve individual customer journeys, most often by helping people at all levels of the organization put themselves in those customers' shoes. Metrics, like signposts on the highway, give us some sense of how far we've gone and whether we're moving in the right direction. Miles or kilometers, either one will tell us that we're making progress.
And to the extent that different metrics give us different results, that's a sign that something unexpected is happening and we need to pay attention. Because while different CX metrics usually move together, they do measure somewhat different things. So if Net Promoter (which measures the strength of a customer's overall relationship) improves while Customer Effort (which measures how smoothly a particular transaction went) is getting worse, that could be a sign that something's afoot. It may be that there are some operational problems which your customers are willing to forgive (for now); or it may be that you are benefitting from a competitor's misstep. Whatever the situation, it's worth spending some effort to dig deeper.
In the end, I think metrics appeal to us because they give us a simple view into a complex reality. Boiling down our CX efforts to one number makes it easier to explain the impact of Customer Experience, and it makes it easier to show leadership what exactly it is that we're trying to achieve.
This is fine, but it comes with a steep price. Because in the end, it's not the metric that matters. It's everything that goes into the metric that matters: all those thousands or millions of individual customers and their individual stories. The metric, while it makes it possible to think about the bigger picture, conceals far more than it reveals.
Cancelled flights are not all that unusual in Minnesota during the winter, but Sun Country's strategy for dealing with them was. They simply refunded the passengers' remaining airfare and told them to find their own way home. The (now former) customers were not offered any meals or lodging, nor any assistance in finding another way home.
Nor did Sun Country offer to book those passengers on another Sun Country flight: as the airline patiently explained to passengers and the media, those were the last Sun Country flights from Los Cabos and Mazatlan for the season and the airplanes were needed elsewhere. And because there were no more flights, there were no Sun Country representatives on the ground to help passengers, either.
Naturally this didn't go over well. The couple hundred dollars for the refunded tickets (remember, Sun Country is now an Ultra Low Cost Carrier) wasn't even remotely enough to book a last-minute flight on another airline. It's likely that some passengers had to borrow from credit cards or friends just to get home.
The crazy thing is, Sun Country was probably within its rights. The company's Contract of Carriage (a 30-page document almost no passenger, including myself, has read in detail) only says that the airline will try to provide transportation if they cancel a flight "to the extent reasonably possible." In this situation, someone at the airline made the decision that it would be too difficult to figure something out. So they decided to exercise their option to punt.
This highlights the huge gap between what companies are legally or contractually required to do, and what their customers expect them to do. Short of war or bankruptcy, no passenger expects that their airline will just leave them sitting in an airport in Mexico with no money, no ticket home, and no assistance. But Sun Country wrote its contract so it could do exactly this, and I'm sure that every other airline does exactly the same thing.
It's fair to assume that these contracts go far, far beyond what customers think they're signing up for when they click "agree." Not even the companies think these contracts are reasonable, judging from how quickly they back down from the PR stink when they actually try to exercise the rights they've given themselves: after a few days of bad publicity, Sun Country announced it would reimburse customers for the cost of getting home. Wells Fargo eventually said it wouldn't force customers with fraudulent accounts into binding arbitration. And Facebook admitted that letting people's personal profiles fall into the hands of shady political operatives was a mistake.
The problem is that Terms of Service agreements are generally written by lawyers, and the job of a lawyer writing a contract is to reduce or eliminate the chances that the client will be sued. So when a company is allowed to write its own ToS agreement without any negotiation with an actual customer, the lawyer will cover every base possible and write it so the customer agrees to everything that might possibly happen whether or not the company plans to operate that way.
(I can just imagine a lawyer at a conference table advising his client, "Of course you don't intend to kill your customers' puppies. But hypothetically speaking, some day a customer's puppy might die because of some mistake the company made, and the customer could claim that you meant to do it, or even enjoyed it. Juries love puppies, so we should head this off and just put in the contract that the customer gives us permission to maliciously and gleefully murder his puppy.")
When a crisis comes up it's tempting for a company to think it's OK to actually do the things it wrote into the terms of service. If leaving passengers stuck in Mexico gets your airline out of an expensive pickle and you're legally allowed to do it, it's hard to say no. If you can interpret your customer contract in a way that avoids years of litigation around millions of fraudulent accounts, the temptation will be extreme. But of course this will only make things worse, because at the end of the day meeting your customers' expectations (to say nothing of regulators' and lawmakers' expectations) is far more important than merely staying within the bounds of what's legally required.
The solution is both obvious and very difficult in the real world: Write terms of service agreements according to how you intend to treat customers, not according to what will minimize legal risk in every possible situation. If you don't give yourself the contractual right to abandon customers in Mexico, you're much less likely to actually do it when the crunch happens.
This requires having someone to advocate on behalf of customers, someone to ask the lawyers, "If you don't intend to kill my puppy, why give yourself the right to do it?" With the recent Facebook data leaks and the new GDPR regulations in Europe, there's been more of a spotlight on how long, unreadable, and egregious many of these contracts really are. So maybe we're in a moment when consumer contracts can be rewritten to reflect the customer experience companies intend to provide, rather than giving them legal cover for almost any sort of misbehavior.