Explore vs Exploit: Finding the Balance in CRO
The debate between “Explore” and “Exploit” experiments has become increasingly popular in experimentation circles.
Even in common language, we find plenty of examples of this divide: “who dares wins”, “go big or go home”, “slow and steady wins the race”, “slowly, slowly catch a monkey”. So, while the topic isn’t new, it is important because Experimentation is largely about managing risk and making decisions.
There’s a lot to say, so I decided to expand on my contribution to Convert’s piece in this separate article with the help of my good friend John Ostrowski, Director of Product and Experimentation at Wisepublishing.
The “Exploration-Exploitation dilemma”
The “Exploitation vs. Exploration” discussion is simply a “new” fancy way of referring to a problem that is as old as making decisions about efficient resource allocation. It is, in fact, a name we have borrowed from probability theory, very likely via product management, data science, and financial services circles. In their article “The exploration-exploitation trade-off: intuitions and strategies”, Joseph and Baptiste Rocca provide a comprehensive definition for both:
“Exploitation consists of taking the decision assumed to be optimal with respect to the data observed so far. This safe approach tries to avoid bad decisions as much as possible but also prevents from discovering potential better decisions.
On the other hand, exploration consists of not taking the decision that seems to be optimal, betting on the fact that observed data are not sufficient to truly identify the best option. This more risky approach can sometimes lead to poor decisions but also makes it possible to discover better ones, if there exist any.
The problem of choosing between exploitation and exploration can be encountered in many situations where observations drive decisions and decisions lead to new observations.”
A common exemplification of the dilemma is the “Multi-Armed Bandit problem”, a term popularised by Experimentation platforms a few years back. Wikipedia defines it in this way:
“The multi-armed bandit problem is a problem in which a fixed limited set of resources must be allocated between competing (alternative) choices in a way that maximizes their expected gain, when each choice’s properties are only partially known at the time of allocation, and may become better understood as time passes or by allocating resources to the choice.”
Both of the above articles describe a number of approaches to decision-making with multi-armed bandits, from greedy to ε-greedy, optimistic initialisation, Thompson Sampling, and Bayesian approximation, each with its own advantages and drawbacks.
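For intuition, here is a minimal sketch of one of those strategies, ε-greedy, in Python. The conversion rates and parameters are made-up illustration numbers, not figures from either article:

```python
import random

def epsilon_greedy(true_rates, epsilon=0.1, pulls=10_000, seed=42):
    """Minimal epsilon-greedy bandit: with probability epsilon pull a
    random arm (explore); otherwise pull the arm with the best observed
    conversion rate so far (exploit)."""
    rng = random.Random(seed)
    counts = [0] * len(true_rates)     # pulls allocated to each arm
    rewards = [0.0] * len(true_rates)  # total reward per arm
    for _ in range(pulls):
        if rng.random() < epsilon:
            arm = rng.randrange(len(true_rates))  # explore a random arm
        else:
            means = [r / c if c else 0.0 for r, c in zip(rewards, counts)]
            arm = max(range(len(true_rates)), key=means.__getitem__)  # exploit
        counts[arm] += 1
        if rng.random() < true_rates[arm]:
            rewards[arm] += 1.0
    return counts

# Two hypothetical variants converting at 3% and 12%; over time the
# algorithm should allocate most traffic to the better one.
counts = epsilon_greedy([0.03, 0.12])
```

Even with this crude rule, the occasional exploratory pull is enough to discover the better arm, after which exploitation concentrates traffic on it.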
Why does this matter right now? Hold that thought a bit longer!
John’s take: Exploration vs Exploitation, a challenge to mathematicians and invincible companies
John talks about the explore-exploit conundrum in the following way:
“I first heard about the exploration vs. exploitation dilemma going through numerical methods during my engineering days.
Iterative methods were the core of the discipline, used to find the maxima (or minima) of a function when analytical methods are either too difficult or impossible to use. The basic idea is to start with an initial guess and then repeatedly apply a process that will hopefully converge to the maxima.“
Does that sound familiar?
The major shortcoming of basic iterative methods is their tendency to get trapped in local maxima, failing to find the highest possible value in the entire domain: the global maximum.
Incorporating mechanisms to escape these local peaks is a common workaround.
What does that mean for conversion programs and product roadmaps?
What mechanisms do we have available for escaping the trap of optimising the status quo?
“I found the answers through Strategyzer, and that was my second encounter with the exploration vs. exploitation dilemma, now from an innovation management point of view.
To build an Invincible Company, you need to be able to operate in two very different modes with different levels of uncertainty.“
Exploit — For existing business models, you operate with relatively high certainty, and it is possible to make accurate forecasts about sales, and predictions about growth. These business models can be managed and improved through detailed planning and proper execution.
Explore — Searching for new value propositions and business models in an environment with high uncertainty. And the further an innovation is from your core business, the higher the uncertainty. Forecasts and plans make little sense in this uncertain environment, which is why a different financial approach, skillset and culture are required.
Exploration vs. Exploitation in CRO
Based on what we’ve seen so far, it is easy to understand why the terms are applied to the topic of balancing an experimentation portfolio:
- There is a limited set of resources: limited traffic, limited time, limited people;
- There are two or more competing alternatives in which to invest our resources: some ideas are complex, some are easy; some are groundbreaking, some are conservative;
- We only have partial starting knowledge: we know things that work because we’ve tried them; we don’t know if there are better alternatives because we haven’t tried them;
- Only trying things we haven’t tried will tell us whether these solutions are better, but they may not be, and we would then have wasted our resources.
So, in CRO terms, the most common interpretation of the “Exploit vs Explore” terminology would be the one that differentiates experiments mostly from the point of view of risk vs reward. I argue that this differentiation is more of a Product Management problem than a CRO problem, but that’s a can of worms that we can open some other time.
In a nutshell, by this first interpretation:
- Exploitation is about continuing to do what we do, but better and faster. The common understanding is that we aim to minimise risk by incrementally improving our product using data that we believe to be true given our observations so far. It is about obtaining marginal gains, critical for any business.
- Exploration, on the other hand, is about fundamentally changing what we do in search of a better way we don’t know about yet. It is about taking big leaps to discover the unknown, therefore accepting higher risk to acquire new knowledge which could result in bigger rewards, but also losses, by testing new things.
Exploration is where new, groundbreaking ideas come from. Needle-moving changes that give us an edge over the competition.
I’m a bit of a sucker for semantics, and I find that there is another valid interpretation, similar to the one above yet nuanced, that separates exploit and explore experiments not based on risk/reward but rather on the goal of such experiments.
This second interpretation is somewhat akin to that of “CRO” vs “Experimentation” (the way I differentiate them, anyway). Thus, they could be defined as:
- Exploitation is about optimising, a.k.a. getting more juice out of a product or process (Conversion Rate Optimisation)
- Exploration is more expansive in that experiments can be simply used to learn something we don’t know, without the express need to extract any additional juice.
Put simply then, by this second definition exploitation would be about making the right decisions, whereas exploration would be simply about learning new things (even if this may be a bit of a purist concept).
So, what’s the right balance then? Should companies do only exploitation? Should they do only exploration? Should there be a mix?
Balancing Exploration and Exploitation in Experimentation
In every situation in life, we find people along the entire exploration-exploitation spectrum, not only because each approach presents its own challenges but also because these decisions are made by people, and people have different levels of risk aversion.
In the case of Experimentation, more specifically, and regardless of which of the previous interpretations one prefers, most practitioners will likely argue that a good experimentation program is one that combines both because experimentation is about learning, decisions and managing risk. I think in many ways this is a biased response, though:
- There is an inherent selection bias in the sample: most people are in the experimentation field partly because of common behavioural traits among experimenters (most people in experimentation like to tinker with new stuff!), which pulls their answers towards exploration;
- There is also social desirability bias: coworkers/clients/peers are likely to want new, shiny things, so experimentation practitioners are not likely to want to displease everyone else in a room!
Biased or not, right or wrong, the explanation of why a “balanced” portfolio makes the most sense is relatively simple.
If we think in terms of “risk/reward”, just like managing an investment portfolio, managing an experimentation program is about finding the right balance between “safe bets” with small but steady returns and “big bets” for a chance to win big.
Invest all your resources in exploitation, and bolder competitors will eventually overtake you. Invest all your resources in a handful of exploratory big bets and, given that roughly 8 in 10 business ideas fail, you risk losing it all with a bad hand.
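As a sketch of that intuition, the toy Monte Carlo below compares an exploit-only season of many small safe tests against an explore-only season of a few big bets where roughly 8 in 10 ideas fail. All payoffs, costs and probabilities are assumed purely for illustration:

```python
import random

def simulate(n_bets, p_win, payoff, cost, trials=20_000, seed=0):
    """Monte Carlo of a season's portfolio: each experiment costs `cost`
    and pays `payoff` when it wins (probability `p_win`). Returns the
    probability of ending the season at a net loss, and the mean return."""
    rng = random.Random(seed)
    losses, total = 0, 0.0
    for _ in range(trials):
        net = sum(payoff for _ in range(n_bets) if rng.random() < p_win)
        net -= n_bets * cost
        losses += net < 0
        total += net
    return losses / trials, total / trials

# Made-up numbers: twenty safe tweaks that usually pay off modestly,
# versus three big bets where roughly 8 in 10 ideas fail (p_win = 0.2).
safe_loss_prob, safe_mean = simulate(n_bets=20, p_win=0.8, payoff=2.0, cost=1.0)
bold_loss_prob, bold_mean = simulate(n_bets=3, p_win=0.2, payoff=10.0, cost=1.0)
```

With these assumed numbers, both portfolios have a positive expected return, but the explore-only portfolio ends roughly half its seasons at a net loss: exactly the “bad hand” risk a mixed portfolio is meant to dilute.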
If we think about it from the angle of “optimise vs. learn”, having a healthy balance of wins and learnings is important because a company needs to make money to remain in business, but it also needs to understand its customers to better serve them and stay current in an increasingly competitive world. What got you here won’t necessarily take you further.
And while we in the business would likely argue that the main role of experimentation is to provide learning, very few of us can deny that bill-payers want ROI. If we want to keep learning, we must ensure that ROI keeps us in business.
How to decide on the optimal balance?
Remember I asked you to hold the thought of “Why do I care about data science and multi-armed bandits”?
If data scientists and statisticians, who dedicate thousands of hours of study to these decisions alone, cannot give a simple answer as to which method is best, how can we presume to be able to tell?
The short answer is that there isn’t an answer, or rather… “it depends!”
For example, AWA digital colleague Brendan McNulty has this perspective:
“Conventional wisdom says there’s a call for a balanced approach between leveraging current opportunities (exploitation) and discovering new possibilities (exploration).
While this strategy works well in standard market conditions, there are certain situations that might require a greater emphasis on exploration. Situations such as market saturation, shifts in customer attitudes, and the rise of new markets could all drive your strategy towards a stronger exploration focus.
Your campaign’s maturity, the volume of tests you conduct, and the particularities of your business also play a role in deciding the balance. Short-term victories matter, but they often result in incremental improvements rather than unearthing major challenges or opportunities.
My suggestion? Seize the moment and tip the balance towards exploration. A 2/3 exploration to 1/3 exploitation ratio might be a daring approach, but it’s often through pushing boundaries that we find the most valuable insights and innovations.”
While Brendan is partial to a ratio in favour of Exploration, he recognises the need for a balanced approach.
Dave Mullen, Optimisation Consultant and another colleague at AWA digital, while also recognising the need for both approaches, has a slightly different view on balance and priority:
“You definitely need both big leaps and iterative polishing tests. Being deliberate about when to add polish to an existing feature and when to make a bigger leap is key. Let’s say feedback shows a feature is giving users a hard time. What do you do?
Option 1: Improve it. Concentrate on the areas that are troubling users the most. In most cases this is by far the quickest and cheapest option, so usually a good place to start.
Option 2: Replace it. Big leaps can lead to big improvements, but they can also eat up time and money.
Unless the evidence is overwhelming, I’d start with Option 1. If your adjustments fall flat, that’s not failure. It’s proof that a larger change might be necessary. On the other hand, if your tweaks hit the mark, you’ve got a stronger contender as a new baseline against any major overhaul. You can now review the business case for the big leap from a stronger position.”
Essentially, the reason why we face these choices in the first place is that resources are limited. Explorative changes are often big pieces that need more resources and time, so if all our focus is on these, we will miss many opportunities in the meantime.
But even though incremental changes are often faster and easier to deliver, meaning we can keep our pipeline full, they can also clog the system with so many small changes to make, not leaving space for needle-moving changes.
Using Ruben de Boer’s formula, “success = chance x frequency”: if we are not maximising frequency, we are missing half of the equation!
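The formula can be made concrete with some back-of-the-envelope arithmetic. The win rates and test velocities below are hypothetical illustration numbers, not benchmarks:

```python
def expected_wins(win_rate, tests_per_quarter):
    """Ruben de Boer's framing: success = chance x frequency."""
    return win_rate * tests_per_quarter

# Illustrative, made-up numbers: a cautious programme with a higher
# win rate can still be outpaced by a faster programme testing more.
cautious = expected_wins(win_rate=0.30, tests_per_quarter=10)  # fewer, safer tests
fast = expected_wins(win_rate=0.20, tests_per_quarter=25)      # more tests, lower hit rate
```

Under these assumptions, the faster programme expects five winning tests per quarter against three for the cautious one, despite its lower win rate: frequency carries the other half of the equation.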
One must admit, though, that “It depends”, while true, is unlikely a very helpful answer for anyone trying to find this balance, so we share below a few frameworks that may help you guide your thinking. As I said previously, they are more “product strategy” frameworks than CRO-specific frameworks, and they all tackle this issue (some more directly than others).
John’s wisdom: Managing your innovation portfolio, the 70:20:10 rule
Firms that excel at total innovation management simultaneously invest at three levels of ambition, carefully managing the balance among them.
Core to Transformational is another frame for the explore and exploit spectrum that mature companies use to understand the portfolio split of their business initiatives.
We encourage Product leaders to get inspired and map both their portfolio of ideas as well as product roadmaps using a similar framework.
Now, what is the ideal distribution? Is there a Golden Ratio?
Larry Page, Google’s co-founder, popularised the 70-20-10 rule in a Fortune magazine interview, saying that the company strives for a 70-20-10 balance, and he credited the 10% of resources dedicated to transformational efforts with all the company’s truly new offerings.
Analysis reveals that this allocation of resources correlates with meaningfully higher share price performance. For most companies, the breakdown is a good starting point for discussion.
4D Product Strategies
The first framework I propose, taken from my time as a Reforge Alumnus, is the “4D Product Strategy”, which looks at the product strategy from 4 lenses:
- Strategy: Work that aligns with your product strategy, building features
- Vision: Work that gets your business closer to realising its vision
- Business: Work geared towards improving critical input metrics
- Customer: Work that aims to deliver improvements directly identified by customers
The Strategy and Vision lenses are more intrinsically related to growth. They are about big bets that may not have an immediate impact on core KPIs but are necessary for the evolution of the product. They are therefore more related to Exploration.
The Business and Customer lenses are what most would consider BAU. They are geared towards creating sustainability, and they are therefore more related to Exploitation.
The balance of these 4 is up to Product Leads/Managers, but it needs to ensure that none of these lenses is forgotten or sub-utilised to build strong, long-lasting products.
From this point of view, we could see that Dave Mullen’s take above prioritises the Business and Customer lenses unless strong evidence suggests otherwise, whereas Brendan McNulty advocates for a stronger emphasis on Vision and Strategy work.
Experimentation portfolio evolution
Another Reforge framework is the “Experimentation Portfolio Evolution”, which talks about how to balance experimentation effort based on the 4 core stages of product and experimentation process maturity:
- Companies at pre Product-Market Fit stage should invest all their efforts in Exploration. There isn’t anything to optimise, so it’s all about big bets at this point. A cold start problem (yet another term borrowed from data sciences).
- Companies that have achieved PMF but are still relatively young are likely to have a lot of room for improvement in both their processes and products. A lot of low-hanging fruit justifies programmes heavily biased towards exploitation (75/25), and since processes are not yet mature, teams tend not to be very efficient at delivering experiments (mostly manual processes).
- Companies with maturing processes and products are likely to benefit from an approach that includes 30% to 40% exploitation. The exploration efforts from this and previous phases will likely still require plenty of improvements, but teams become faster at delivering experiments (thanks to more efficient processes and experience), so they can deliver the same volume of exploitation experiments in less time. This frees up the remaining 60% to 70% for Exploration, which at this point starts to include both new ways of doing what they already do (big bets) and finding new things to do (validating new revenue streams, products…).
- Companies with consolidated, mature products and processes will likely dedicate around 50% of their time to big bets (doing the same things in better ways) and around 25% each to validation (growth, diversification) and exploitation. At this point, with very mature and refined processes, automation, etc., teams are much quicker at delivering experiments, so this 25% exploitation is likely to produce even more than the original 75% did in phase 2.
This framework therefore assumes that a certain “absolute” volume of exploitation is always to be maintained, but deals with the problem by reducing the resources needed to deliver the same results through refinement and efficiency gains.
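That efficiency argument can be sanity-checked with simple arithmetic; the 4x velocity multiplier below is an illustrative assumption, not a figure from the framework:

```python
# Phase 2 (young, post-PMF): 75% of capacity spent on exploitation
# at a baseline delivery velocity of 1x.
phase2_exploitation_output = 0.75 * 1.0

# Phase 4 (mature): only 25% of capacity on exploitation, but assume
# mature processes and automation make teams 4x faster at shipping
# experiments (the 4x is an assumed multiplier for illustration).
phase4_exploitation_output = 0.25 * 4.0
```

Under that assumption, the mature programme’s 25% slice delivers more exploitation work than the young programme’s 75%, which is exactly how a shrinking share can maintain an absolute volume of optimisation.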
In a nutshell… it depends!
Balanced portfolios in real life
As we said earlier, there isn’t a one-size-fits-all answer, so the only way my team can deal with these situations is by working very closely with our clients to understand their circumstances.
Since each client is different in how they organise and make decisions, our role as consultants is to closely collaborate with key decision-makers, digging deep to understand their business challenges, figuring out what keeps them awake at night, and supporting them by designing experimentation programmes that help them make the right choices while moving them towards improved operational efficiency.
An example I shared in the Convert.com article is that of a well-known customer of ours that we’ve been working with for years.
Their core, market-leading products continue to be their main revenue drivers. However, consumer needs and preferences are shifting so their strategic focus is to adapt and capture market share, to avoid becoming obsolete.
They spend considerable amounts of resources and effort in designing and developing new services and products, and we support them by testing their value proposition, market acceptance, etc. These are unlikely to bring immediate revenue, but the data gathered is essential to their strategy. We explore.
However, for a client-agency relationship to last as long as ours has lasted, the programme must also bring monetary ROI. While we run these exploratory experiments, we ensure that the programme pays for itself by continually “exploiting” the existing flows, products and services to squeeze the extra £££ juice.
By combining explore and exploit initiatives, we keep all swim lanes full and avoid wasting “untested traffic” as much as possible.
To make this possible, at the beginning of each season, we agree on both velocity and ROI targets. We divide experiments into “Revenue generating” and “Non-revenue generating” and ring-fence resources so that no stream is neglected. When calculating ROI we only include revenue-generating experiments so that we get a fair picture.
Hence, my advice is to agree on the explore/exploit balance upfront with key stakeholders so that resources can be allocated accordingly and results measured fairly. Google may suggest 70-20-10, but this will differ for each company and the stage they’re in.
Start by clearly defining the different “streams” of experimentation so that results can be fairly measured and compared. If everyone in the room has a different interpretation of what the definitions mean it will lead to misalignment.
To this end, before joining AWA as Head of Optimisation, John Ostrowski, Andrea Corvi and I created “The Experimentation Framework” for our company, which made us runners-up at the Experimentation Culture Awards in 2021.
It was an operational framework that divided all experimentation initiatives within the business into three main categories:
- “Product initiatives”: supporting decisions that product teams were making, some big and some small, with experimentation (a mix of explore and exploit)
- “CRO” experiments: fast, agile and mostly small improvements aimed solely at squeezing as much juice as possible from our existing experiences (exploit)
- “Safety net”: experiments aimed at measuring the impact of decisions that had been made without prior testing to understand risk (explore)
With these in place, we were able to improve our reporting and our understanding of the performance of the different streams to refine the allocation of resources over time, improving efficiency.
So yes, “explore” vs “exploit” isn’t a new thing, but it’s definitely an important one: wherever there is a limitation of resources, the right allocation to achieve our goals will always be key.
There is general agreement that achieving a balanced portfolio of experiments is crucial to foster innovation and drive business growth. By combining both exploratory and exploitative experiments, organisations can manage risk, make informed decisions, and optimise their processes.
However, there isn’t agreement on what the “right balance” is because, in reality, there isn’t such a thing: YOUR balance will depend on many factors, from industry dynamics, market conditions, the maturity of YOUR organisation and, let’s not kid ourselves, the risk appetite of decision-makers. You can clearly see this in the quotes and references provided, where each practitioner has a different view.
I may be accused of taking the safe “it depends” stance but, if you think about it, experimentation is about trying things and finding what works in your specific case. Rather than categorical answers, I prefer to provide frameworks that can be used to make decisions, peppered with a few examples and testimonials.
I hope you will find them useful.
Is your CRO programme delivering the impact you hoped for?
Benchmark your CRO now for an immediate, free report packed with ACTIONABLE insights you and your team can implement today to increase conversion.
Takes only two minutes
If your CRO programme is not delivering the highest ROI of all of your marketing spend, then we should talk.