The surprising truth about Amazon and Booking.com’s culture of experimentation

Written by:

Dan Croxen-John

Last updated: 23 January, 2023

Please note this article is in the process of being updated

The story of Amazon and Booking.com reminds me of David Beckham’s goal from a 30-yard free kick in England’s qualifier against Greece for the 2002 World Cup. In the game’s 93rd minute his kick sent the ball sailing past ten defenders into the top left-hand corner of the goal. Beckham said of the goal, ‘people focus on the kick, what they don’t see is the thousands of hours I spent on the practice field taking one shot after another’¹. His mother said the same thing: ‘He was always out there, rain or shine, practising, practising, practising’².

For so long Amazon and Booking.com have been held up as the poster children of digital experimentation. They run experiments by the thousand. For an organisation not at this level – and so very few are – this ‘test everything’ mantra might feel overwhelming, daunting and, err, almost impossible.

But if you dig a little deeper the truth is rather surprising.

While Amazon and Booking.com are clearly experimenting at great scale, they weren’t always doing this, and more to the point they took quite some time to even start testing – an average of 7.5 years from their websites going live to the first recorded mention of a disciplined approach to experimentation and testing.

At any given moment, Booking.com is running over 1,000 experiments³. It’s part of their culture: 80% of their product and technology teams have launched experiments³.

It’s tempting to look at the size of Booking.com’s experimentation efforts and be overwhelmed. But Booking wasn’t always the size it is now, and the number of experiments it ran wasn’t always this huge. Its website went live in early January 1997, and on February 8th a customer made the very first booking⁴.

However, from the early days, senior managers believed that experimentation beat opinion.

As Gillian Tans, former CEO, said:

‘Many companies start with a nice product and market it all over the world. Booking did the opposite. We had a basic product and then worked hard to get it right for customers. But figuring out what they like is hard. We got it wrong so many times⁵.’

Booking learned the hard way that getting it wrong is painful and that there is another way of doing things, but the lesson wasn’t immediate.

In 2004, some years after launching, one of their engineers attended a conference where he saw Microsoft’s Ronny Kohavi speak about experimentation. He realised that experimentation was exactly what they needed at Booking.com to settle constant, time-sucking arguments about the right thing to do.

So they started running simple tests, without any big technology behind them, to learn which options customers preferred, and then built the product based on those preferences. In fact, many of their first tests were concerned with avoiding revenue losses, or reducing the risk of them. As Gillian says, ‘We grew like this, without any marketing or PR, just testing what our customers liked’.

In 2005, Booking.com finished developing an experimentation platform that allowed it to scale its testing. Adrienne Enggist, Director of Product Messaging, recalled, ‘I came from small businesses where CEOs launched a big product redesign every six months, and by the time you rolled it out, it was hard to figure out what worked and what did not work. Here, the team was small, fitted on one floor, and it was exciting to see everyone take risks, push small changes very quickly, and use experiments to measure the impact’⁶.

The shift to experimentation took time

So it took some time, several years in fact, for Booking.com to shift to experimentation. But once it did, adoption was universal, complete and fundamental to the company’s success.

Director of Experimentation Lukas Vermeer noted, ‘We call this evidence-based, customer-centric product development. All our product decisions are based on reliable evidence about customer behavior and preferences. We believe that controlled experimentation is the most successful approach to building products that customers want’.

Today Booking.com has 17,000 employees with estimated revenues of $13bn, and the website features over 28 million listings and is available in 43 languages.

To recap some important milestones:

1997 – Launch of Booking.com
2004 – First use of controlled experiments
2005 – Launch of internal experimentation platform

Delay between launching website and first controlled experiments: 7 years

Like Booking.com, Amazon didn’t start testing and experimenting immediately. There was a long gap before it began: eight years, in fact.

Conventional wisdom says that Bezos started his business in July 1994, began testing immediately on his website visitors, and the rest is history. That’s overlooking the facts in order to make a neat story. It’s a narrative fallacy, a term coined by Nassim Nicholas Taleb in his book The Black Swan⁶.

The narrative fallacy describes how humans are biologically inclined to turn complex realities into soothing but oversimplified stories. Reading the story of Amazon, it’s easy to fall for this fallacy.

Bezos made plenty of improvements to Amazon’s website with no apparent mention of testing these new features as controlled experiments. The inclusion of customer reviews, personalised recommendations, top sellers lists, Free Super Saver Shipping, a forerunner of Prime: none were tested⁷.

Unlike Microsoft, which used split-testing to put a clear measure on the financial impact of speed improvements⁸, Amazon in its early days worked on a ‘define-build-release’ basis. Clearly some of these bets paid off. Some didn’t.

In 1998 Amazon introduced a feature, ShopTheWeb, where users were given links to lower-priced items on other websites, and off they went to those other sites⁹. While ShopTheWeb morphed into the highly successful Amazon Marketplace, with visitors staying on the Amazon website, there were other failures. A9 (Amazon’s search engine), the Fire Phone and the Endless.com shoe site are the famous ones.

Only in 2002, eight years after its website launched, is there evidence of testing: Amazon decided, based on split-test results, to use algorithms rather than human-powered editorial teams to make personalised recommendations¹⁰.

Even in 2006, there were examples of senior managers not wishing to test important changes, such as product recommendations in the basket. Greg Linden, an Amazon developer, was nearly blocked by his SVP from running this test¹¹. Fortunately he ran it anyway, and the increase in revenue for Amazon was profound. According to McKinsey, the addition of product recommendations has added 35% to Amazon’s sales¹². It has also been a much-copied feature.

Amazon created its own experimentation platform

It wasn’t until 2011 that Amazon created its own experimentation platform: Weblab. In the first year, the platform was used to run just 546 experiments, against 1,092 in 2012 and 1,976 in 2013. Amazon now runs over 12,000 experiments each year to continuously improve customer experience¹³.

To recap some important milestones:

1994 – Launch of Amazon.com
2002 – First use of controlled experiments
2006 – Greg Linden nearly blocked from running his test of in-basket product recommendations
2011 – Launch of Amazon’s experimentation platform, Weblab

Delay between launching website and first controlled experiments: 8 years
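
The two delays quoted in the recaps, and the 7.5-year average, follow directly from the milestone years. As a minimal sketch of that arithmetic in Python (the dictionary layout and variable names are chosen here purely for illustration):

```python
from statistics import mean

# Milestone years as quoted in the two recaps in this article.
milestones = {
    "Booking.com": {"launch": 1997, "first_experiment": 2004},
    "Amazon": {"launch": 1994, "first_experiment": 2002},
}

# Delay = year of first controlled experiment minus year of launch.
delays = {
    company: years["first_experiment"] - years["launch"]
    for company, years in milestones.items()
}

# Simple average of the two delays.
average_delay = mean(list(delays.values()))

for company, delay in delays.items():
    print(f"{company}: {delay} years from launch to first controlled experiment")
print(f"Average delay: {average_delay} years")
```

The average only covers these two companies, so it illustrates the point rather than serving as an industry benchmark.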

Conclusion

At the time of writing, Amazon and Booking have been trading online for a combined 49 years, and running controlled experiments for 34 of them. Just like David Beckham, they have been ‘practising’ for quite some time and, like Beckham’s fans, we have been distracted by the results rather than the practice.

On a positive note, the technology that Amazon and Booking.com had to build from the ground up is now available to all, and the methodology of build, test, measure, learn is well understood. We have seen experimentation become a dominant force behind innovation and customer-centricity.

Right now, you have access to this technology and approach, and you don’t have to wait an average of 7.5 years to use it, as Amazon and Booking.com did.