In a San Francisco Lyft car, there’s a chart taped to the back of the front passenger seat: “The Rating System Explained.” It details — in exaggerated terms — what Lyft’s one- to five-star rating scale really means to drivers.
Beginning at five stars — “got me where I needed to go” — the explanations quickly descend into parodic paranoia. Four stars: “This driver sucks, fire him slowly … Too many of these and I may end up homeless.” Three stars: “This driver sucks so bad I never want to see him again.” Two stars: “maybe the car had something dangerously wrong with it or he was doing 120 in a 40 mile zone.”
One star? “Threats or acts of violence possibly made, perhaps a callous disregard for his own safety.”
Though tongue-in-cheek, this rating system explainer touches on an essential truth of the gig economy: When companies like Lyft, Uber, and Postmates penalize workers who have low ratings, anything less than five stars feels like a rebuke.
“The rating system works like this: You start off as a five-star driver,” said Don, a San Francisco Lyft driver. “If you drop below a 4.6, then your career becomes a question. Uber or Lyft will reach out to you and let you know that you are on review probation. And if you continue to drop, then you’re going to lose your job. They’ll deactivate you.”
The gig economy has made us comfortable rating the people we pay to do tasks for us. Both data and anecdotes suggest five-star rating systems are subjective, prone to bias, and generally confusing, yet labor marketplaces continue to ask customers to choose from one to five stars to determine who’s good at their job and who isn’t. Last week, Netflix officially replaced its five-star system for rating movies with a simpler thumbs-up, thumbs-down system. Maybe it’s time for other data-driven platforms to consider making a change, too.
Don’s concern about the impact of a low rating is well-established: Workers in the on-demand economy are at the mercy of the customers, whose in-app ratings can jeopardize an individual’s ability to earn bonuses, land gigs, and generally make a living.
Uber says only a very small percentage of drivers have ratings anywhere close to the deactivation threshold, which is a different number depending on where in the world you’re driving.
In a statement, a Lyft spokesperson said that in order to “ensure that drivers are not rated unfairly for circumstances that are out of their control, a number of steps are taken, including: ratings are based on an average of the last 100 rides; the system does not look at drivers in isolation, rather it looks at them in comparison to other drivers in their region; and drivers are able to submit comments after each ride to raise any concerns about the ride or passenger.”
But ratings are nonetheless a stressor for some drivers. Julian, who drives for both Uber and Lyft in San Francisco, said maintaining a good rating can be difficult because customers don’t really understand them. “They think that 3 is okay, and a 4 is like a B, and 5 is exceptional,” he said. “Well, if you got a 4 every time, you’d be terminated. You have to maintain a 4.7, so anything less than a 5 is not okay.”
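The arithmetic behind Julian's point can be sketched in a few lines. This is an illustration, not either company's actual formula: it assumes a plain average over a 100-ride window (the window size Lyft cites) and the 4.7 cutoff Julian mentions. The function names are hypothetical.

```python
# Illustrative only: assumes a simple mean over a fixed window of rides.
# WINDOW comes from Lyft's statement; THRESHOLD is the figure Julian cites.
WINDOW = 100
THRESHOLD = 4.7


def average_rating(ratings):
    """Plain arithmetic mean of a list of star ratings."""
    return sum(ratings) / len(ratings)


def max_low_ratings(low_star, window=WINDOW, threshold=THRESHOLD):
    """Largest number of `low_star` ratings that a window of otherwise
    five-star rides can hold while the mean stays at or above `threshold`."""
    count = 0
    while count < window:
        # Try adding one more low rating and see if the mean still holds.
        ratings = [low_star] * (count + 1) + [5] * (window - count - 1)
        if average_rating(ratings) < threshold:
            break
        count += 1
    return count


print(average_rating([4] * WINDOW))  # all fours: 4.0, well under 4.7
print(max_low_ratings(4))            # 30
print(max_low_ratings(3))            # 15
print(max_low_ratings(1))            # 7
```

Under these assumptions, a driver whose other rides are all five stars could absorb at most 30 four-star ratings in 100 rides, or 15 three-star ratings, before slipping below 4.7.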
A few months ago, Julian was driving a female passenger to her hotel when he realized she had passed out in the back of his car. Julian called the police, who told him to roll her over onto her stomach — but he was worried about what might happen if she woke up while he was trying to help her. “The sad thing is, I was most concerned about my rating, because it was below a 4.7,” he said. (The woman woke up and ran into her hotel, Julian said; he doesn’t remember if she left a rating.)
This sort of rating anxiety extends well beyond Uber and Lyft. “The rating system is terrible,” said Ken Davis, a former Postmates courier, who noted that under the company’s five-star rating system couriers who fall below 4.7 for more than 30 days are suspended. Said Joshua, another Postmates courier, “I really don’t think customers understand the impact their ratings have on us.”
Instacart uses a five-star system, too; shoppers whose rating is in the top 25% of their region earn a $100 bonus. Shoppers say in most regions, just one rating that isn’t a perfect five stars usually disqualifies you for that week’s bonus. “It’s unbelievably annoying to wake up and see that a customer complained about something and you know it’s either not your fault or not true,” said Liz Temkin, who shops on Instacart in Los Angeles. (Temkin is a named plaintiff in the recently settled Instacart class-action lawsuit.)
Instacart had not provided a comment by the time this story was published. Postmates did not respond to multiple requests for comment.
The problem is, for an Instacart shopper to earn a bonus or a Postmates courier to keep their ratings up, they need the vast majority of their ratings to be five stars. Some savvy users (read: millennials) know this, and are sparing with their four- and three-star ratings. “Unless they’re super rude or weird, I tend to give everybody five,” said Kristen, a visitor to San Francisco who had just stepped out of a Lyft in Union Square. “That actually means something on the app. I don’t want to mess up their life, you know?”
But not all customers are so well informed. Wendy and her son Brian, visiting San Francisco from Indiana and using Uber for the first time, were surprised to hear that most drivers consider four stars to be a bad rating. “I would have thought 5 is excellent, and 4 is good,” Wendy said. That revelation was equally shocking to Elnaz, a longtime Uber user visiting San Francisco from LA. “Four stars sucks,” she said, incredulous. “Really?”
“Customers don’t understand the impact ratings have on couriers at all,” said a former Postmates community manager, who requested anonymity while discussing her previous employer. “A customer might rate a delivery three stars, assuming that three stars is fine. Several three-star ratings could bring a courier’s rating down significantly, especially if they’re new. It could even get the courier fired.”
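A rough sketch shows why new couriers are more exposed. It assumes a plain average over all ratings received, which may not match how any particular platform computes scores. The delivery counts (10 and 200) and the function name are hypothetical, chosen only for illustration.

```python
# Illustrative only: assumes the score is a simple mean of every rating.
def average_after(existing_count, new_ratings, existing_star=5):
    """Mean rating after `new_ratings` are added to `existing_count`
    prior ratings that were all `existing_star` stars."""
    ratings = [existing_star] * existing_count + list(new_ratings)
    return sum(ratings) / len(ratings)


three_threes = [3, 3, 3]

# Hypothetical new courier: 10 prior deliveries, all five stars.
print(round(average_after(10, three_threes), 2))   # 4.54

# Hypothetical veteran courier: 200 prior deliveries, all five stars.
print(round(average_after(200, three_threes), 2))  # 4.97
```

With the same three three-star ratings, the hypothetical newcomer lands below the 4.7 figure Postmates couriers cite, while the veteran barely moves.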
Matthew Smith is yet another Uber and Lyft driver who, frustrated with the five-star rating system, took it upon himself to draw up a custom explainer for the back seat of his car. Smith’s is succinct, and reads “5 stars = This ride was acceptable or better, 4 stars = this driver should be fired.”
“I have consistently had riders blown away that giving me a 4 was such a bad thing… they really do feel that a 4 was a good ride,” Smith, who lives in Colorado, wrote via email. “Since having this sign up, I have had about 35 rated trips, all five stars.”
Uber and Lyft both say the vast majority of drivers do get five-star ratings. But while they argue this is evidence that most drivers are doing an excellent job, it might actually be further proof that the five-star rating system doesn’t work.
Some ride-hail passengers say they give drivers five stars because they’re worried about what might happen if they don’t. “I always give five, unless they’re really rude or something,” said Golda, another Uber passenger. “I actually heard that even below a four or five, they can get in trouble. They’re just trying to earn some money, so it has to be pretty bad for me to give a bad rating.”
David Celis, a software engineer at GitHub who used to run a beer-rating website, says it’s not just empathy that causes people to give a lot of five-star ratings. It’s also because five-star rating systems in and of themselves lead to choice paralysis. “The more options are presented within a rating system, the more mental effort it’s going to take to give a rating,” he said.
Back in 2009, YouTube found that “the overwhelming majority of videos on YouTube have a stellar five-star rating.” Shiva Rajaraman, then a product manager, started to wonder if there was something wrong with their feedback system, which was “primarily being used as a seal of approval, not as an editorial indicator of what the community thinks about a video.” Six months later, YouTube replaced its five-star ratings with a thumbs-up, thumbs-down system. If Uber and Lyft were to adopt a simple thumbs-up, thumbs-down rating system, Celis said, “on the consumer end, it would be a much better experience.”
The other problem is that not everyone can agree on what the star ratings mean — not even the companies themselves. Lyft says that five stars means “awesome,” four means “Ok, could be better,” and three means “below average.” But for Uber, five stars is “excellent,” four is “good,” and three is “OK.”
Individuals have different interpretations, too. “For some people, three could mean this is good, while four is great and five is perfect. Some people might say, nowhere is going to be perfect, so I’m going to say five stars is really good, and four is good,” Celis said. “The way you can interpret those stars is infinite, and most people don’t have the exact same system.”
Five years before Uber even existed, Yelp popularized the use of the five-star rating system for reviewing restaurants and other businesses. “On Yelp, anything four stars or above is very good. Three to four stars is, it might be worth your time. Less than three stars, that’s where you start to see businesses actually fail,” said Darius Kazemi, a computer programmer and former elite Yelp user. But because of the artificial cutoff used by Uber and other apps, that system doesn’t map perfectly to the gig economy, which leads to confusion for consumers. “The Yelp cutoff for ‘You’re fired’ is three. That’s the point where you see businesses lose money. That’s a lot lower than Uber’s parallel cutoff.”
If people ascribe different meaning to the five-star ratings, and the ratings functionally mean different things depending on what app or website you’re using, it seems unlikely that the data these rating inputs produce are very meaningful. Some labor marketplaces, recognizing this, have started experimenting with ways to lessen the impact of five-point rating systems on their workers.
Rinse, an on-demand laundry service, used to text customers asking them to rate their delivery person from one to five, but the average score “basically hovered around 4.9 over the entire time period we tested it,” said co-founder Ajay Prakash. Prakash determined that the texts, which very few customers responded to anyway, weren’t producing data of much value, and scrapped them. Another example is Managed by Q, a startup that dispatches field operators to clean and manage office spaces, which stopped asking customers to rate workers partly because it created tension with clients. “Five-star review systems on their own are not good barometers of individual performance,” said Director of Product John Cockrell in an email.
But in the world of online work, the five-star rating system remains pervasive. On sites like Fiverr and Freelancer.com, ratings left by clients affect freelancer search rankings. Feedback systems on sites like these tend to have more components than gig economy apps, but the impact is similar: the lower your rating, the lower your search rank — and the less likely you are to book a lucrative gig. Said Freelancer.com’s CEO Matt Barrie, “It’s kind of like Uber.”
Michael Truong is a senior product manager at Uber, where he’s working on improving the company’s rating system. “We’re really trying to understand what riders’ feedback is for a ride,” he said.
Truong said that Uber once considered switching to a thumbs-up, thumbs-down system, but decided against it. “The emotional burden riders have, where they feel like their driver is going to get deactivated if they give a low rating, pushes people away from a thumbs-down,” he said. “So we would have no opportunity to relay that feedback to drivers.”
For the last few months, the Good Work Code has been compiling research on how to build a better rating system for labor platforms. “The managers of the company want information about how a job or gig was done, and the customer wants to offer feedback. But how do the workers actually get information that allows them to succeed and thrive in these working arrangements?” asked Palak Shah, director of the Good Work Code’s parent organization, Fair Care Labs. “It’s our sense … that there’s a lot of opportunity for growth and improvement.” The report, which recommends transparency, human interaction, processes for disputing ratings, and a system that’s more dynamic than “on a scale of one to five,” is supposed to be published in the next few weeks.
John Gruber, publisher of Daring Fireball, is among those who believe that five-star rating systems don’t produce particularly useful data, and that generally speaking, binary systems are better. “There’s no universal agreement as to what the different stars mean,” Gruber said. “But everybody knows what thumbs-up, thumbs-down means.”
A few years ago, during a trip to Orlando, Gruber had an experience that made him realize how this confusion over what the stars mean can impact individuals in ways customers don’t realize. After taking a ride in an Uber that had an overpoweringly strong smell of air freshener, Gruber gave the driver a four-star rating. The next day, he got a call from an Uber employee asking him to explain what the driver had done wrong.
“I was like, Holy shit!” Gruber said. “The guy was nice, I wish I hadn’t done this.”