This blog has been produced from one of our CRO and Experimentation webinars earlier this year.
Our experts, Johann van Tonder, COO of AWA digital, and Matt Scaysbrook, Founder of We Teach CRO, answer your burning CRO and experimentation questions.
MS: I started out client side about 8 years ago. In the first job I ever had, the company had put Google Analytics on the site and nobody knew how to use it, so I decided I should be the one to learn it, so that I was no longer the graduate who knew nothing and only took advice from everyone else.
That interest and experience in data and analytics then led me into customer experience roles, client side, with Compare the Market and then BT, plus a bit of time at Betfair as well. Having seen it all from the client side, I thought, ‘do you know what, I really want to know what the other side of the fence is like’. At each of those businesses we had external partners, so I then moved to the agency side, or vendor side: first with Webtrends and then with Maxymiser, which had recently been bought by Oracle at the time.
I spent a couple of years on the vendor side, working specifically with vendor tools, and then set up We Teach CRO in January 2017, which seems like a long time ago now. Since then I have worked with clients from GoDaddy to Nando’s, down to some smaller start-ups, and other businesses across the UK, US and Europe. It is pretty wide ranging, and I think it gives me a broader perspective: not just the agency side, but also how vendors do things and how it works client side with in-house teams.
JVT: I started doing A/B testing 12 years ago, before the tools Matt referred to were around. I was a beta member of Optimizely in the early days, but even before that we were testing by hacking things together. It started when I was working in a corporate with many ecommerce properties and online businesses, chasing growth like anyone else. I had recently done an MBA, but it was only getting me so far.
I needed something more than the models and theories I had learnt at school, and I found the most amazing thing: it is as simple as talking to your customers and understanding the people you are trying to sell to. We started to do that in a very disciplined way, trying to understand the role of the customer, what they need and how we are best placed to serve that. Then, rather than just deploying and hoping for the best, we tested.
I remember being amazed in the early days, and also feeling uncomfortable at first, realising just how often things don’t work the way you thought they would. But the difference in performance from following this strategy was remarkable. Later, when I left corporate, I ran an ecommerce business and found the same principles working really well: understand your customer, understand what is wrong with how you currently serve their needs, figure out how you can do it better, and then test it. Validating your core assumptions is really what it comes down to. Business is full of assumptions, and if an assumption could sink the ship or steer you down a path you shouldn’t be on, you should be validating it. That is how I got into it.
MS: Having worked on both sides of the fence in numerous places, I can say it is different, but it is worth touching on some of the advantages as well. In a way, a lot of the pros are also the cons.
You have total focus. You know your own site inside out, in a way that no outside agency probably ever could.
You spend so much time looking at and working on your website that sometimes you can’t see some of its major issues, because you know how it works so well. When you go through the site, you click through parts of it so fast, because you know where everything is, that you are not reading or analysing it, and you are not looking at it with fresh eyes.
MS: Another thing I found when I was first in in-house roles, and still pretty junior: I would go to my managers and ask where else I should look for ideas, and a lot of the time the answer was “what do our competitors do?” There is a lot of value in knowing what your competitors are doing.
In a lot of businesses, growth comes from taking share from the competition, especially in utilities. Compare the Market and BT are both in utilities, where market penetration is quite high, so the only way you can get more business is to take it from someone else. The downside is that if you only get your ideas from the competition, you will only ever be as good as they are.
I have seen this agency side as well, where teams are divided into sector specialisms. That has some advantages, but the downside tends to be that you only look at what happens in your own sector. What I have found over the years, working across multiple sectors, is that there are some really cool things you can take from one sector into another where the principles still work. If you are in-house, make a habit of looking at things completely outside your sector or industry, and at different geographies as well, because there are some particularly wild differences there. Try to be as broad as possible in what you look at.
JVT: I think one of the biggest challenges any in-house team can face is a lack of buy-in and a lack of supportive culture. When there is no mentality to test, the HiPPO (Highest Paid Person’s Opinion) rules. As an outsider you can crack through that in ways in-house teams can’t, because maybe you don’t want to challenge your boss. I would say that is the biggest blocker.
MS: I have a good story on that one, from my in-house days! I had been working for the company for two weeks (I am not going to name it) and had come up with a different way that I thought we should sell the new product I was working on. I went to my manager, and their response was “it is too complicated”. It was complicated, don’t get me wrong, and it would have taken a lot of effort, but I persisted for about a month and got nowhere. Eventually I walked across town to the big global agency we worked with, told them my idea, and brought up our next team meeting: every Friday we had a big meeting with the whole digital team and the agency.
I said, “can you pitch this as yours? Don’t tell them I came to you and asked you to do this, but pitch it if you believe it is good.” They did believe it was good, they pitched it, and within a couple of weeks it was being done. That ties in exactly with what Johann was saying: sometimes having someone external, outside the chain of command, is a way of getting through things that can be quite difficult internally.
JVT: It is a funny thing. When you start out testing, maybe it is all about the ROI, the money you are making and how you will prove the uplift, but there are other reasons as well, and one of them is to end debates. You don’t have to debate whether something is going to work or not: test it, get the answer and then make the decision. To come back to the initial question, probably the worst thing that can happen to you as someone working in an organisation is not having that support. Even more than the support, it is about the mentality: the underlying way management thinks about this.
JVT: Initially you are going to start with a lone ranger, a champion. Maybe the title is analyst, optimiser, data scientist or UX, but the essence is the same: your mandate is to do user research, look at data, formulate hypotheses and then execute them by running tests. You start with that one person, and then hopefully the team grows gradually and you can have various specialist roles within it, maybe two or three.
A team of five is a good size for a big organisation, and there are three different ways you can place it: centralised, decentralised or a so-called centre of excellence. I am not going to get into the detail of all of that, but it usually starts as a centralised function, one team serving all of the business units within an organisation. Then, as the organisation matures, it moves towards a centre of excellence.
What I mean by that is you have pockets of talent in the various business units, with a layer on top: one big team where all the specialists sit, supporting all those individual teams. In my experience that is a good way of organising it.
MS: Yes, I have seen pretty big organisations that almost have an internal agency: a centralised team, with each business unit getting a certain proportion of that team’s time and resource, split each quarter based on the priorities of the wider business.
One of the core changes I have seen with in-house teams is when they expand beyond the ecommerce manager to include dedicated technical resources who can build beyond what you could do in a WYSIWYG editor. That is when I have seen in-house teams go from a maturity level of 1-2 up to 3-4, because they have both sides of the coin they need to execute tests.
JVT: It is worth saying, Dan, that in the big organisations with massive experimentation cultures (we are thinking of LinkedIn, Google, Facebook, Booking.com, the names you regularly hear), experimentation is entirely democratised: 70-80% of the organisation is able to launch a test.
That is where a centre of excellence, a central team that offers support, is very important in making that happen.
Just to pick up on the point you made there, Matt, about coding and starting with the WYSIWYG editor: most testing platforms will let you make basic changes without getting a developer involved. That is a good way to start, but it is not something you want to rely on for long. We could go into the dangers of that, but it is not the way you want to do this in the medium term. Use it to get some quick wins, find your feet or get buy-in. Then, as soon as you can make your business case based on those WYSIWYG wins, get some developer resource to get more meaningful tests out.
MS: Our philosophy on that is that there should never be a technical reason why you can’t run the right test. The right test is the right test, regardless of what technology or involvement you need, and 99% of the time it can be done. It might need project management and planning around it, but don’t ever let technology, whether the platform you use or the technical resources available to you, be a barrier. If it is, that needs addressing.
JVT: How do you decide which tests to run? There are always dozens of ideas, if not more, and you can’t do everything. You don’t have the capacity in terms of team, and you don’t have unlimited testing slots on your site either. There are a number of well-established prioritisation models; I will mention just two, PIE and ICE. We tend to build our own models depending on the situation, because you often need something very specific, but I will run through some of the principles.
The first, at the most basic level, is value versus effort or cost.
Let’s start with the second part first, effort. In fact this comes back to the comment you made, Matt, about building a big test that takes a long time and is full of complexity. If you have two different ideas with the same expected outcome, and one is complex while the other is super easy, there is no debate about which one you start with. So you take complexity into account, especially when you are just starting out and don’t want to begin with super complex tests. That dimension is present in most prioritisation models; it is a constant.
Another one, because I see things go wrong here quite often: when you run a test you need a large enough sample to deliver a statistically valid result, which means enough visitors or transactions. So whatever model you use, you have to consider whether this area of the site has enough traffic. If you compare two hypotheses with the same level of complexity, and one targets an area of the site with a lot less traffic, I would favour the other one.
One more, if I may, which I also see falling through the cracks quite often: the impact on the ultimate objective. Say your mandate is to drive revenue or conversion rate. You want to be sure that what you are testing has the ability to impact that, preferably directly but at least indirectly, and alongside that you need to be sure you can measure it. Sometimes you run a test and can’t directly measure that relationship and its impact. Those three or four things are what I would put on the list to look at before you decide.
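To make the traffic point concrete, here is a rough sketch of how many visitors a test needs and how long that takes on a given area of the site. It assumes a standard two-sided two-proportion z-test on conversion rate at 5% significance and 80% power. The speakers don’t name a method, and the function names are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_variant(baseline_rate: float, relative_mde: float,
                            alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate visitors needed in EACH variant to detect a relative lift
    of `relative_mde` on `baseline_rate` with a two-sided two-proportion z-test."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_mde)  # conversion rate we hope the variant reaches
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p2 - p1) ** 2)


def weeks_needed(per_variant: int, variants: int, weekly_visitors: int) -> int:
    """Weeks of traffic to this area of the site needed to fill every variant."""
    return ceil(per_variant * variants / weekly_visitors)


# Hypothetical example: 3% baseline conversion, hoping to detect a 10% relative lift.
n = sample_size_per_variant(0.03, 0.10)
print(n, "visitors per variant")
print(weeks_needed(n, 2, 10_000), "weeks on a page with 10,000 visitors a week")
print(weeks_needed(n, 2, 50_000), "weeks on a page with 50,000 visitors a week")
```

With these assumptions the example needs roughly 53,000 visitors per variant. The same idea placed on a page with a fifth of the traffic takes several times as many weeks to reach a result.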
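As an illustration of a home-grown model built from the criteria above, here is a minimal sketch. It is not PIE or ICE. The 1-10 scales, the six-week traffic cap and the multiplicative score are all hypothetical choices. Value per unit of effort is discounted by distance from the objective, and ideas that can’t be measured or can’t reach a result in time are parked.

```python
from dataclasses import dataclass

MAX_WEEKS = 6  # hypothetical: longest run time we will accept for a test


@dataclass
class Idea:
    name: str
    expected_value: float   # hypothetical 1-10 estimate of value
    effort: float           # hypothetical 1-10 estimate of build complexity/cost
    weeks_to_result: float  # from a sample-size estimate for that area of the site
    steps_from_goal: int    # funnel steps between the change and the objective (0 = on it)
    goal_measurable: bool   # can we measure impact on the ultimate objective?


def score(idea: Idea) -> float:
    # Gates: no measurable link to the objective, or not enough traffic -> park it.
    if not idea.goal_measurable or idea.weeks_to_result > MAX_WEEKS:
        return 0.0
    value_per_effort = idea.expected_value / idea.effort
    proximity = 1 / (1 + idea.steps_from_goal)  # closer to the objective scores higher
    return value_per_effort * proximity


def prioritise(ideas: list[Idea]) -> list[Idea]:
    return sorted(ideas, key=score, reverse=True)


backlog = [
    Idea("homepage hero copy", 6, 2, 4, 4, True),
    Idea("basket delivery message", 6, 2, 3, 1, True),
    Idea("checkout redesign", 9, 9, 2, 0, True),
    Idea("niche landing page", 8, 1, 20, 1, True),
]
for idea in prioritise(backlog):
    print(f"{score(idea):.2f}  {idea.name}")
```

Multiplying rather than adding means one weak dimension drags the whole score down. That is a deliberate design choice here, and you may prefer a weighted sum.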
JVT: There are two ways I look at that. One is the evidence base.
Let’s start there. All the hypotheses you test should be grounded in evidence, qualitative, quantitative or a combination of the two; it should not be a case of throwing mud at the wall and seeing what sticks. As I said earlier, you understand the role of the customer and opportunities jump out of that.
That is one signal I use, but I am going to contradict myself slightly and say that over the years I have learnt there is no one in this world who can tell you which way a test is going to go.
So I don’t obsess over that. The first question I ask myself is what evidence I have, and if I have evidence, I go ahead. The second question is whether I can impact the variable I am interested in, and how close I can get to, say, conversion rate. Let me give you an example to make this a little less abstract. Say I am tasked with increasing revenue per visitor on a typical ecommerce site: you have the landing page, the category page, the basket page and the checkout, and finally you have money in the bank.
The revenue-generating activity sits at the end of that funnel. If I am testing something at the top of it, such as the homepage or category page, the distance to the revenue-generating activity is quite far. My ability to impact that particular metric is a lot weaker than if I move a step closer to the basket, or to a product details page. So, to answer your question, I wouldn’t obsess about that.
MS: I will take this one, as we have discussed it at length before and Matt rants on about how we should do ROI. The most common way is to take what we made over what we spent, whether that spend is on agency or in-house resource, and there is your ratio: 10:1, 100:1, whatever it might be.
There are a couple of elements there. What period are you basing it on? Is it the additional revenue made during the test period between control and variation, or are you projecting forward over a period of time, and if so, is that a month, 3 months or 12 months?
What I have found, working with a whole array of businesses, is that it comes down to how the business chooses to report numbers. I would never project over anything more than a quarter; too much can change if you go beyond that.
However, some of the clients we work with now, and have worked with in the past, choose to project over a year. Whichever way you choose to do it, just make it consistent.
A lot of the time I look at relative scale rather than absolutes: is this test of greater scale and significance than that one over there? As long as you measure them over the same period, that is fine.
The other part, and this is the part that gets overlooked the most, is what happens when you don’t win: when you don’t generate more leads, revenue, subscribers or whatever else it might be. The value there is what you saved compared with not having an experimentation programme.
Without testing, you might simply have rolled that new thing out. Instead you tested it and found out it would generate a lot less revenue. The size of that potential revenue loss should really go into your ROI calculations as well, because ROI is about the value of the programme, or even of the discipline of experimentation within your organisation.
Without the programme you could not have found out that you would lose money, so the loss you avoided is a saving. It is exactly like driving down a CPA in the paid media world: you would count that as ROI. I think in our world it is overlooked far too often.
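A minimal sketch of the ROI calculation described above. It counts losses avoided alongside gains from winners and uses a fixed projection window that defaults to one quarter, in line with the advice not to project beyond that. The function name and all figures are hypothetical.

```python
def programme_return(monthly_gain_from_wins: float,
                     monthly_loss_avoided: float,
                     programme_cost: float,
                     months: int = 3) -> float:
    """Return generated per unit spent over a fixed projection window.

    monthly_gain_from_wins: extra revenue per month from rolled-out winners
    monthly_loss_avoided:   revenue per month that losing variants would have
                            cost had they been rolled out without a test
    months:                 projection window; 3 (one quarter) by default
    """
    value = (monthly_gain_from_wins + monthly_loss_avoided) * months
    return value / programme_cost


# Hypothetical quarter: £20k/month from winners, £10k/month of losses avoided, £30k spent.
print(programme_return(20_000, 10_000, 30_000))  # counting avoided losses
print(programme_return(20_000, 0, 30_000))       # wins only
```

In this made-up quarter, counting avoided losses turns a 2:1 return into 3:1. Whatever window you pick, use the same `months` value for every test so the ratios stay comparable.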
JVT: I think it is an important question. One of the most fascinating conversations I have had recently was with a data scientist at a big company (I am not going to mention names). She spent 9 months looking at millions of data points from A/B testing programmes run over a decade, trying to answer some of these questions, and from that she worked out a default way to approach this. It will be different for every business, but the default time horizon is definitely never 12 months.
She advises 6 months, and in practice I find it varies between 3 and 6 months. Something else interesting she mentioned is accounting for things like interaction effects.
An interaction effect is the extent to which two experiments running at the same time contaminate one another. She says there are two firm camps on this, both inhabited by very clever scientists with equally impressive PhDs who say exactly the opposite.
One camp says there is nothing to worry about; the other says there is a lot of contamination and you have got to account for it. Her advice, having looked at the data, is to apply some sort of mechanism to account for it.
We use a standard adjustment of 30%: if we get an uplift of 100, we only count 70 and write off the rest to this effect, whether it is there or not. The effect could also be positive, but we err on the side of caution.
Another is the novelty effect: the win you see today may not be as strong 6 months from now. Nobody on this planet can tell you whether that is actually true for a given experiment, but we have a model for it, a sliding scale where you start by counting 100 and end up at 50 six months down the line as it tapers off.
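The two adjustments can be combined in a short sketch. The 30% interaction write-off and the taper from 100% to 50% over six months come from the description above. The linear, month-by-month shape of the taper is an assumption, since it is only described as a sliding scale. The constants and function names are hypothetical.

```python
INTERACTION_WRITE_OFF = 0.30  # share of measured uplift written off for test interaction
HORIZON_MONTHS = 6            # projection window
FINAL_FACTOR = 0.50           # share of the uplift still counted in the last month


def adjusted_uplift(measured_uplift: float) -> float:
    """Count only 70% of a measured uplift to allow for interaction effects."""
    return measured_uplift * (1 - INTERACTION_WRITE_OFF)


def novelty_factor(month: int) -> float:
    """Assumed linear taper: 1.0 in month 0, FINAL_FACTOR in the last month."""
    return 1 - (1 - FINAL_FACTOR) * month / (HORIZON_MONTHS - 1)


def projected_value(monthly_uplift: float) -> float:
    """Sum of the adjusted monthly uplift over the horizon, tapered for novelty."""
    base = adjusted_uplift(monthly_uplift)
    return sum(base * novelty_factor(m) for m in range(HORIZON_MONTHS))


print(adjusted_uplift(100))  # measured uplift of 100 after the 30% write-off
print(projected_value(100))  # six tapered months of that adjusted uplift
```

Under these assumptions, a monthly uplift of 100 contributes 315 over six months rather than the naive 600. A stepped or exponential taper would give a different total.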
There is another perspective on this question. The best answer on ROI I have ever heard comes from someone I admire, Professor Stefan Thomke of Harvard Business School, one of the world’s authorities on experimentation. He has spent 20 years researching it, writing papers on it and talking about it with the brightest minds, so he knows a few things. When he was asked about ROI, he said the question is not how to measure ROI or what the ROI is; the question is what the cost of not doing it is. How can you even talk about not testing? Because, the way I see it, you really have 3 options:
ROI I think takes a different approach to that.
MS: This comes down to what you do with your results, and also what you did to set the test up in the first place. All three of us here believe that CRO is not a project; you can’t complete it. Websites, and digital services generally, decline in performance over time if you leave them alone. That is a fact: leave something alone and people will stop coming to it and stop buying things. To offset that you have to keep making small wins along the way, even just to stay level.
How does that tie into the individual test? As I touched on earlier, when you are looking at the tests you are about to run, you have to consider what tracking you have in place that will tell you more than just whether your experience is better or worse. That is great to know, and knowing whether the test will make your company more or less money is a given. But what other data points are you looking for that might also change along the way?
So that is the planning stage: before you run the test, ask what extra information you could capture to fuel something in the future. The other part happens post-test, when you are working through your analysis. I want to draw a pretty clear distinction between reporting and analysis here. Reporting is telling someone the numbers from a test. That is not analysis; it is reading numbers off a screen, and anyone can do that.
It becomes analysis when you follow a flow to an end outcome. We refer to this as DIR: Data, Insight, Recommendation.
Data = What happened (your numbers)
Insight = Why do you think that happened
Recommendation = What do we do with that insight that we have?
If you can’t take a piece of data and follow that flow down, there is no value in that data. Its value lies in what you can do with it in the future.
Every time we conclude a test for a client, we follow that DIR flow, and there will always be at least three recommendations on what to do next, based on data captured directly from that test. That means every test you conclude gives you three tests you could run. You then put those through the prioritisation model Johann covered earlier to decide what to do next. The analysis does have a small backward-looking element, the results of the test, but chances are most of the people who read it already know whether the test won or not.
The real value of the analysis is in looking forward, and it should always look forward. That creates a natural cycle: what people might perceive as the end of the flow, the write-up of results, is actually the first step in restarting it, because it produces the ideas you want to take forward next.
Often we see that play out like this: the results of the first test power the third, and meanwhile you run a second test whose results power the fourth. You end up with two concurrent tracks, tests 1, 3, 5 and tests 2, 4, 6, where you have time to plan, build, run and analyse one test while building another. You can multiply that out for as much traffic and as many tests as you can run.
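The two-track cadence can be sketched as a simple round-robin plan, where each test is informed by the previous test on the same track. The function names are hypothetical, and the model deliberately ignores differences in test duration.

```python
from typing import Optional


def plan_tracks(n_tests: int, n_tracks: int = 2) -> dict[int, list[int]]:
    """Assign tests 1..n_tests round-robin to concurrent tracks."""
    tracks: dict[int, list[int]] = {t: [] for t in range(1, n_tracks + 1)}
    for test in range(1, n_tests + 1):
        tracks[(test - 1) % n_tracks + 1].append(test)
    return tracks


def informed_by(test: int, n_tracks: int = 2) -> Optional[int]:
    """The earlier test (same track) whose analysis feeds this one, if any."""
    previous = test - n_tracks
    return previous if previous >= 1 else None


print(plan_tracks(6))  # two tracks: odd-numbered and even-numbered tests
print(informed_by(3), informed_by(4), informed_by(2))
```

Raising `n_tracks` is the “multiply that out” case. It only makes sense if you have the traffic to run that many tests side by side.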
MS: and that is an environment that doesn’t exist.
JVT: Before I answer that question: there isn’t one answer to it. At a high level, you want to measure more than one thing. You want to measure the key metric you are trying to influence. You are making a change to your website, mobile app or whatever it is because you expect a change in performance on some metric, so that is something you definitely want to measure.
But you don’t want to rely on just one metric; you want secondary metrics too. Secondary metrics won’t tell you whether the test has won, but they help you understand why you observed certain changes. Say you ran a test with revenue per visitor (RPV) as your key metric, measuring to what extent your change affects how much money people spend on the site.
Maybe the change reconfigured the page, moving some things from the bottom to the top. In addition to measuring RPV, the final metric, you also want to measure how that change influenced behaviour; these are usage metrics.
So you measure how many people used the add to basket button and how that compares, how many clicked through to the basket, and so on. It helps you construct a story.
Then there is a third class of metric, the guardrail metric: one you don’t want to see perform negatively. While you are aiming for an uplift in your key, primary metric, you keep an eye on the guardrail. If it goes down, you are probably going to have a debate about whether to make the change live, and most likely it won’t go live.
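A minimal sketch of how the three classes of metric might feed a decision. The primary metric decides, guardrails can veto or trigger a debate, and secondary metrics are carried along only to explain the result. The class, field names and the tolerance parameter are hypothetical. Guardrail changes are assumed to be signed so that negative means bad for the business.

```python
from dataclasses import dataclass, field


@dataclass
class TestResult:
    primary_lift: float        # relative change in the key metric, e.g. RPV
    primary_significant: bool  # did the key metric clear your significance bar?
    # Guardrails, signed so that a negative change is bad for the business.
    guardrails: dict[str, float] = field(default_factory=dict)
    # Usage metrics: used to explain the result, never to decide it.
    secondary: dict[str, float] = field(default_factory=dict)


def decide(result: TestResult, tolerance: float = 0.0) -> str:
    if not result.primary_significant or result.primary_lift <= 0:
        return "do not ship"
    breached = [name for name, change in result.guardrails.items() if change < -tolerance]
    if breached:
        return "debate before shipping: " + ", ".join(breached) + " went down"
    return "ship"


result = TestResult(0.04, True,
                    guardrails={"gross margin": -0.02},
                    secondary={"add to basket rate": 0.06})
print(decide(result))
```

The secondary metrics never enter `decide`. They sit alongside the result so the write-up can explain why the primary metric moved.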
To give you a quick example that jumps to mind: we worked with a big London shoe retailer to test free delivery, a couple of years ago when it was still debated whether that was something to do or not. The hypothesis was that if we offered free delivery, returns would also increase: people would buy more pairs of shoes, send more of them back, and there is a cost associated with that. We kept a very close eye on returns and their cost, and we evaluated the test by weighing the additional revenue from free delivery against the cost of the increased returns. That is the way to do it.
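Here is a sketch of that evaluation: incremental revenue per visitor minus incremental returns cost per visitor, scaled to the variant’s traffic. The figures are invented for illustration and are not the retailer’s results. For simplicity the sketch uses revenue rather than margin and treats the cost per returned item as a single number.

```python
from dataclasses import dataclass


@dataclass
class Arm:
    visitors: int
    revenue: float
    items_returned: int


def net_incremental_value(control: Arm, variant: Arm, cost_per_return: float) -> float:
    """Extra revenue per visitor minus extra returns cost per visitor,
    scaled up to the number of visitors who saw the variant."""
    revenue_gain = variant.revenue / variant.visitors - control.revenue / control.visitors
    returns_gain = (variant.items_returned / variant.visitors
                    - control.items_returned / control.visitors)
    return (revenue_gain - returns_gain * cost_per_return) * variant.visitors


# Hypothetical figures, not the retailer's results.
control = Arm(visitors=10_000, revenue=50_000.0, items_returned=100)
free_delivery = Arm(visitors=10_000, revenue=56_000.0, items_returned=180)
print(net_incremental_value(control, free_delivery, cost_per_return=8.0))
print(net_incremental_value(control, free_delivery, cost_per_return=100.0))
```

With these made-up numbers, the variant is clearly worth it at a low cost per return but loses money at a high one. That is exactly the trade-off the guardrail is there to catch.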
MS: Yes, sure. I would say the biggest mistake in this area is believing that the more functions, bells and whistles a tool has, the better it is, which is untrue in my experience. It is about what you are actually going to do with it. A couple of years ago we started working with a pretty massive multinational that didn’t have a platform in place. They asked for our advice and we gave it: you need something that can deliver an A/B test for you cost-effectively.
They were looking at something 20x the price because it had CRM imports and loads of cool features, and my question was: do you actually have the data to pipe into that? The answer was no.
So choosing a tool is about choosing the right tool for the job. If you are not at the point where you need those bells and whistles, don’t pay for them, because you are effectively paying for something you will leave in the cupboard.
JVT: Exactly, and when you are talking about ROI, that is the conversation you should be having. It is not about the ROI on testing; it is about the ROI on overspending on tech. There is massive overinvestment in tech. As a starting point, something like Google Optimize, which is free, has limited capacity (you can run 5 tests at a time), but it lets you get started out of the box and doesn’t cost you a penny.
MS: The easiest tool to get signed off by a stakeholder is one that doesn’t cost anything, and one of the advantages of the CRO world is that you can build a business case for these things very effectively. Depending on the site you have, Google Optimize could be a perfect option for you.
But to build on one of the points raised, the budget has to cover two things: tech and people, whether those people are in-house or external. It doesn’t matter which.
The tool itself doesn’t matter; it is only as good as the people using it and the ideas put into it. If you are spending 60% of your budget on your tool, chances are you have the split wrong. There are big businesses that run on tools costing hundreds of pounds a month, not tens or hundreds of thousands, because they recognise that at this point in their maturity that is what they need to do what they want to do. Further down the line, if they continue to invest in their programmes, they may need something bigger. But they have recognised that if their tool costs 5% of the budget, they have 95% to spend on the people who will actually have the ideas, experience and knowledge to deliver ROI from the programme.
DCJ: And that is something Avinash Kaushik has been banging on about for quite some time. He is a Google analytics evangelist who wrote Web Analytics 2.0, and his rule of thumb is to spend roughly 90% of your budget on people and 10% on tools.
MS: Yes, definitely. I have worked client side in businesses where a problem was identified and the response was ‘what tool can solve that problem?’ The answer is none of them: the people will solve the problem using the tools you have.
MS: I guess the first thing is: what do you want to know? That is where you have to start. If you want to know why someone didn’t buy, for example, that rules out anyone who has already bought on the site, so you won’t annoy everyone, because you are not going to ask those people. Whatever end outcome you want, the survey needs to go to the right audience, and you need to know how you are going to define and reach that audience. It is all well and good saying ‘we only want to ask women between the ages of 25 and 30’, but can you identify them on site?
The other thing with exit surveys is that you have to bear in mind the state of the person you are asking. Take asking why someone didn’t buy. Most people come to a site with a goal in mind. On an ecommerce site that is at least to find out what something costs, with the secondary option of going on to buy it. If they have just exited, is that because they have already found what they wanted, or because they haven’t found it and left?
That is how you target your audience. Taking ecommerce as the classic example, someone who leaves on a category page may have found out what they wanted to know, such as the price. How you phrase a survey to them would be very different from how you would phrase it to someone who bombs out on delivery or shipping, because that is probably a frustration point.
If you ask them, their response rate is going to be significantly lower. It is the difference between a natural end to a journey, one they feel they have completed whether you agree or not, and a friction end or rage end to that process. So the best strategy starts with three questions: what do you need to know, what sort of person do you want to get that information from, and how are you going to identify them? Where they exit will give you an idea of how to approach the question. And don’t ask anything too long, because that person is expecting to move on to another task; if you ask them six questions you may not get anything out of them.
JVT: If I can follow that quickly with 2 practical tips:
JVT: Experimentation is a mindset, CRO is an application of that mindset.
MS: Frankly, I think they are both pretty poor names for a discipline, and there is a long-running debate in our industry about the best description. With CRO, conversion rate optimisation, conversion rate often isn’t the metric you want to shift, and experimentation sounds a bit like guesswork and is a little woolly. I agree with Johann that one is a mindset and the other is an application of it, but frankly they are pretty bad names. If anyone has a better suggestion, please let us know, because as an industry we have not come up with one yet.
JVT: We could talk for another hour about conversion rate; you have just opened that can of worms!
MS: Let’s just close that one...