Interview with Head of Experimentation: Sarah Buetof at FREE NOW
We’re joined by Sarah Buetof, Data Product Manager for Experimentation at FREE NOW.
Watch the video to view the full 42-minute interview, or read the transcript below.
Sarah Buetof bio:
Sarah is Data Product Manager for Experimentation at FREE NOW and as such is responsible for establishing a company-wide experimentation infrastructure, methodology and mindset. A psychologist with a strong focus on research and statistics, Sarah worked for around 10 years in academic science, mainly neuroscience, where she studied pathological alterations in human brain connectivity using 4D neuroimaging data. After concluding her research career with her doctoral thesis, she decided to look for new challenges. In a first, smaller move she became a full-time lecturer in experimentation and research methodology, before making the great leap into the private sector. Sarah joined a large e-commerce company as an Experiment Analyst and was instantly amazed by the application of experimentation methods in the world of agile product development. Inspired by what she had learned in an already well-established, data-driven experimentation environment, she was eager to take the next step: taking responsibility for creating a framework of her own and bringing experimentation to life in another company.
Dan Croxen-John (CEO of AWA digital): Welcome everybody to another one of our Head of Experimentation interviews. Today I’m talking to Sarah Buetof. Sarah is the Data Product Manager for Experimentation at FREE NOW, the multi-mobility platform.
Just for those people who might not understand what FREE NOW does could you give us a bit of an introduction?
So FREE NOW, as you said correctly, is a multi-mobility platform: you can book a taxi – ride hailing – but you can also book car sharing or e-bikes, different types of mobility on our platform. We have a user side – the passenger side – and we have a driver-side interface.
Currently you are the product manager for experimentation for FREE NOW. Tell us a bit about how you came to be doing the role you are currently doing.
So originally I am a psychologist – not the clinical type, but one with a research focus. Overall I spent about 10 years in academic science, most of the time at the University Medical Centre in Hamburg, Germany, where I did research on connectivity – network analysis in the human brain – so you can see a very academic, very methodical background. After I finished my doctoral thesis on this topic, I felt that it was time to switch fields.
First, I joined a university to become a lecturer in research methodology and statistics, but that was more to take some time off, because in the beginning what was really difficult for me was to find my way in the commercial job market. There were so many different names for positions, and I didn’t really know what to do with my skill set. So in the end it was more of a coincidence that I ended up in a position as a test or experiment analyst for a large e-commerce company in Germany.
I started there, and the transition was relatively easy. Surprisingly, a lot of things are like they are in academia, but much, much faster. And obviously I had to learn a lot of different terminology. I didn’t know what an A/B test was, for example – I only knew the experiment. I didn’t know about agile product development or user experience, and so on. But I was part of a dedicated experimentation team, which made it really easy to learn the ropes, and after some time I decided to take the next step and become a Data Product Manager for Experimentation at FREE NOW. And so here I am!
In your experience, what makes a good product manager in experimentation? What are some of the qualities, skills or attributes you think are important?
For myself, I see four aspects or pillars of my role, and I try to fulfil them in my everyday work. The first aspect is methodology. If you are a product manager for experimentation, you should have profound knowledge of the methodology behind it. You should also have a good understanding of data and research quality.
The second pillar for me is technology. I don’t have a technological background, but I think at least you should have an overview of the available options. You should know the advantages and disadvantages.
The third pillar is certainly mindset. You should have a passion for the topic, because you will probably be the ambassador for experimentation in your company. And the last aspect is processes, because quality and culture really arise from processes. You have to be able to deliver the infrastructure and the artefacts that are needed for those processes.
How do you explain your job to someone who’s perhaps unfamiliar with the term experimentation? If you were to go to a bar for dinner and you met new people, what would you say about the role?
That’s a really good question. Honestly, I don’t know a good answer – I haven’t figured that out yet. I feel that it depends on the audience, and I tend to fall back on the pillars I just described. Just two days ago my 11-year-old niece asked me the question – so what do you do for work? – and I really struggled to give her an answer.
In the end, I told her something like: I’m working on an app that you can put on your mobile phone, and I’m helping to figure out what customers like, what’s difficult for them and how to solve such problems. I’m not sure if that was a good explanation for an 11-year-old, but I think she was okay with it.
We’ll have to ask her if she remembers. What is it about experimentation that draws you – and also the organisation? Because the organisation is clearly committed to this way of making decisions.
So I joined FREE NOW at the end of 2019, so I have been there almost a year. But the whole journey started before that. I was told that a transformation towards more data literacy – and, I would add, data maturity – had already begun at FREE NOW. As part of this transformation, more resources were granted to data, and tracking, analytics, reporting and so on became more relevant, and we hired more colleagues with experience in experimentation.
So it was just a matter of time until the necessity of more advanced validation was raised – but also the importance of efficiency. I would say these two factors, validation and efficiency, were decisive for the initiation of the experimentation programme itself. For me personally, validation in product development is really the best way of moving forward fast and achieving user-centric innovation, because instead of listening to your gut feelings, you start to listen to the customer – you look at how customers react. I provide training here now, and sometimes I say to the participants that what we are doing in our everyday life is much more difficult than many people might think, because it’s pretty difficult to figure out what people want.
It is pretty difficult to build the right product the way the customer wants it, and we as human beings also bring something of our own to the process – it’s pretty difficult to become aware of this. So in the end, being wrong is just part of the game. And experimentation for me is really about learning: identifying when we are wrong and correcting the path. That’s what makes me passionate about experimentation in a business context.
Sarah, did FREE NOW always operate in this way, or have they come to this realisation over time?
Yeah, it was more over time, I would say – I just tried to explain this – but the transformation really started at some point. Obviously we have also always done tracking, for example, but not in such a sophisticated way as we’re doing it now.
We have lots of conversations with companies that haven’t started an experimentation programme like yours. What would you say to somebody thinking about experimentation who hasn’t yet started?
From my very personal perspective, experimentation is just a super exciting method, and I really like running experiments simply because I’m curious. I understand that not everyone shares that passion. But on the other hand, experimentation for me is really a business strategy, and I’m convinced that it can help to save resources, time and effort, reduce risk, and help to create this holistic picture of what the customers want.
Really piece by piece, or experiment by experiment – together, obviously, with other teams such as user research and market research. So I would say that it really increases efficiency and it is sustainable, and I can’t find any reason why you wouldn’t want that.
You put it very well. Can you think of an example where you’ve been surprised by a win, by a positive outcome or maybe even a negative outcome? How did you feel about the experiment after the results had come in?
There are several examples where I was really surprised by the outcome. One time – not at FREE NOW, but at my previous company – we removed something for a more technical reason, but we wanted to see if there was any difficulty or downside related to that. We removed it for a certain time and realised that without this feature we were better off, because we had more orders. That was a huge surprise for everyone.
But just recently at FREE NOW we tested the removal of the skip button that we have on a specific screen, and at the same time the introduction of a shortcut on the same screen, in a two-by-two design. So basically we had two experiments running at the same time, each with two groups. And thanks to this design, we were able to get a much better idea of how those two features were interacting with one another, because they had been conceptualised completely separately from one another.
Although it was the same screen, and it might not sound very spectacular as an experiment, the learning experience was really great, because the more thoughtful experiment design really revealed the power of experimentation. So I really liked it.
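For readers unfamiliar with the design Sarah describes: a two-by-two (factorial) experiment crosses two independent changes, so each user lands in one of four cells, which is what makes the interaction between the two features measurable. A minimal sketch of how such an assignment could work – the hashing scheme and experiment names here are illustrative, not FREE NOW’s actual implementation:

```python
import hashlib

def bucket(user_id: str, experiment: str) -> bool:
    """Deterministically assign a user to treatment (True) or control (False)."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 2 == 1

def assign_cell(user_id: str) -> tuple[bool, bool]:
    # Hashing each factor with a different salt makes the two assignments
    # independent, so roughly 25% of users fall into each of the four cells.
    return (bucket(user_id, "remove_skip_button"),
            bucket(user_id, "add_shortcut"))

counts = {}
for i in range(100_000):
    cell = assign_cell(f"user-{i}")
    counts[cell] = counts.get(cell, 0) + 1
print(counts)  # four cells, each with roughly 25,000 users
```

Because the four cells cover every on/off combination, the same data yields both main effects and the interaction between the two changes – the extra learning Sarah highlights.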
What do you see as the mistakes that people or companies make in experimentation?
Something that I see quite often, unfortunately, is that people don’t see the larger picture. People might have heard of experimentation – probably more of A/B testing. They know it’s some kind of method that you can use for validation in product development. But they don’t see that it’s more a strategy and a mindset, a way of working, so they want to apply the method, but they don’t really want to adopt it. That is something I see quite often.
Another mistake is that some people – even people with more experience with data – underestimate the relevance of the methodological or analytical part of experimentation. Sometimes people seem to believe that things like error probability and sample size estimation are scientific gimmicks: you can use them, but you don’t really have to. They don’t understand the impact these steps can have on your business decision. For me, it’s always about business decisions, because we’re in a business context.
Maybe the last thing – and that’s more on the management level – is a tendency for people at management level to believe that no special skills are needed for experimentation. Everyone can do experimentation, right? Maybe they’re reading about how companies such as Booking.com democratised their experimentation framework. But what they don’t see, of course, is the effort and time those companies put into their systems and frameworks in order to enable everyone and give them the tools they need. So it comes across as if it’s not really necessary to have any special skills.
And what does your team look like? In terms of size, structure and skills, how is it organised?
At the moment we do not have a dedicated experimentation team in the classical sense. Since I joined we have had a kind of project team, and with that team I’m working on enabling the company: we’re providing infrastructure and guidance. Recently we have built our own internal stats engine. And of course we support the product managers with their experiments.
Sometimes we really take them by the hand and walk them through the process step by step. But a large part of the roles or tasks is actually covered by our product business analysts. So we have a kind of hybrid set-up, I would say – something between the centralised and the decentralised approach.
The project team in the beginning was me plus two data scientists, but meanwhile we also have two developers, and they are supporting us, providing advice but also solving technological challenges when they occur. There are different reasons why we went with this kind of set-up. One thing that we’re trying to achieve with it is a broad but also quick distribution of the knowledge and the mindset throughout the entire company, because of course me alone, or even with two more people, it wouldn’t work. So the analysts are really important to us – they are ambassadors for us.
In the end, I think it’s really important that the entire company gets the idea quickly. If, for example, the tech department were set against experimentation, I assume it would be pretty much impossible to establish such a programme – but thankfully at FREE NOW everyone is really, really eager to learn, so that is good for us. Right now we’re also discussing different scenarios for how we can transform this project team into some kind of experimentation enablement team, but so far this set-up is really working for us.
Is it correct that you’re working as a centre of excellence, helping the teams to adapt to this new approach?
And you talked about the visibility of the programme in your business. First of all, how many employees do you think are aware of your experimentation programme?
By now, I would say, a lot of the employees – at least in the product organisation. I’m in contact with most of the product managers almost daily. I belong to the data department, and data is part of product, so all of us are aware of it. Marketing are certainly aware of it, and management are also aware of it.
When you think about experiments, just to give some context to our audience, how many tests would you be running in a typical month?
In a month? That’s very difficult to say. When I joined, experimentation as a programme did not really exist. We had experiments running, but there was no documentation, so it’s really difficult to say how many were started and how many were concluded in 2019. When I joined, I introduced a very simple board, and now every experiment has to be on the board. I know it’s not the most elegant way, but it was easy, quick and feasible. On a per-month basis, though – yeah, really difficult to say.
But what I can say is that since the beginning of 2020 we have run approximately 20 experiments, and currently we have another 25 in progress. Not all of them will be run within this year, but dividing by 12 should give you an indication.
Something like 2 to 3 tests per month?
Yes, around this.
In terms of technology, some of which you’re putting in place yourself – you mentioned that you’re building your own stats engine. Some companies make use of third-party tools and platforms. What’s the reason to build your own stats engine, or even construct your own testing platform?
Sorry, just to clarify: we’re not constructing our own platform. What we have built is really the stats engine, to cover the whole analytical part, and we have a third-party solution in place. Although it’s not really a unified approach – we have different solutions in place – and I spent a large part of the last year evaluating the needs of the organisation and the limitations of the infrastructure we currently have.
We’re currently working on a better solution. But the stats engine, again, only covers the analytical part. We built it because we saw that, with this kind of decentralised hybrid approach, a lot of the analysts were not familiar with experimentation and needed more tools in order to undertake the analytical tasks of experimentation. So we decided that providing code would be a good way to make sure that everyone is doing the right thing – and the same thing – in the end.
Can you think of an example where having your own stats engine helped you to do something?
On one hand – and this is only about the statistics, not the technological part – a lot of the available options do not offer many different KPI types. This is something we faced when we looked into different options. For example, we have a lot of ratio metrics, such as how many of the offers you receive you accept. We believe these require a specific statistical approach, which you can also see in the literature, and this is something that I haven’t found so far in any available online tool, for example.
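Sarah doesn’t name the specific approach, but the standard treatment of such ratio metrics in the experimentation literature is the delta method: because each user contributes to both the numerator and the denominator, a naive per-event variance estimate is wrong, and a first-order Taylor correction is used instead. A rough sketch – illustrative only, not FREE NOW’s stats engine, and the traffic numbers are invented:

```python
import numpy as np

def delta_method_variance(x, y):
    """Approximate Var(mean(x)/mean(y)) for per-user numerator x
    (e.g. offers accepted) and denominator y (e.g. offers received).
    Treating every offer as independent would understate the variance,
    because offers belonging to the same user are correlated."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(ddof=1), y.var(ddof=1)
    cov = np.cov(x, y, ddof=1)[0, 1]
    # First-order Taylor expansion of the ratio of means:
    return (vx / my**2 - 2 * mx * cov / my**3 + mx**2 * vy / my**4) / n

rng = np.random.default_rng(0)
offers = rng.poisson(5, size=10_000) + 1      # offers received per driver
accepted = rng.binomial(offers, 0.7)          # offers accepted per driver
rate = accepted.mean() / offers.mean()
se = np.sqrt(delta_method_variance(accepted, offers))
print(f"acceptance rate {rate:.3f} ± {1.96 * se:.4f}")
```

The resulting standard error can then feed a normal-approximation confidence interval or z-test on the ratio metric, the kind of per-KPI handling Sarah says off-the-shelf tools lacked.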
In addition, as a ride-hailing platform we face certain challenges when it comes to network effects, because obviously our drivers are talking to each other, and if we do something on the passenger side it has some impact on the driver side, and the other way around. Developing and applying certain corrections for that is nothing I’ve ever seen in any statistical tool. And when you conduct a sample size estimation and you have a roll-out phase – because with an app launch, unlike server-side testing on a website, you have to wait until the release has scaled up while you are already collecting data – this is something that you have to take into consideration in the sample size estimation. These are just a few of the many tiny things that we wanted to customise, just to make it a bit easier.
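To illustrate the roll-out point: a textbook two-proportion power calculation gives the users needed per group, but with a staged app release the experiment only sees a growing fraction of traffic, so the duration has to be found by accumulating intake along the ramp. A rough sketch – the ramp schedule, traffic volume and baseline numbers are all invented for illustration:

```python
from math import ceil
from statistics import NormalDist

def users_per_group(p_base, uplift, alpha=0.05, power=0.8):
    """Classic two-proportion sample size (per group) for detecting
    a relative uplift on a baseline conversion rate."""
    p1, p2 = p_base, p_base * (1 + uplift)
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    num = (z_a * (2 * p_bar * (1 - p_bar)) ** 0.5
           + z_b * (p1 * (1 - p1) + p2 * (1 - p2)) ** 0.5) ** 2
    return ceil(num / (p1 - p2) ** 2)

def days_to_reach(n_per_group, daily_users, ramp):
    """Days until both groups are filled, when the experiment is only
    exposed to a growing fraction of traffic (a staged app release)."""
    needed, collected, day = 2 * n_per_group, 0.0, 0
    while collected < needed:
        frac = ramp[day] if day < len(ramp) else 1.0
        collected += daily_users * frac
        day += 1
    return day

n = users_per_group(p_base=0.10, uplift=0.05)       # 5% relative uplift
flat = days_to_reach(n, daily_users=20_000, ramp=[1.0])
staged = days_to_reach(n, daily_users=20_000, ramp=[0.1, 0.25, 0.5, 0.75])
print(n, flat, staged)
```

With these made-up numbers the staged roll-out adds several days compared with full traffic from day one – exactly the effect Sarah says has to be fed back into the sample size plan.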
In terms of the launching of tests do you use server-side or full stack? What is the technology that you make use of for the launch?
The current solution is client side.
Do you think that you’ll see a movement towards server-side at some point?
What’s driving that move to server-side for you?
There are different aspects, and one aspect doesn’t have anything to do with client side or server side: it’s more that we would like to have a more unified approach, with everyone in the company using the same tool. It was clear that the current solutions we have might not cover all our use cases in one tool.
When you look for one tool for all the different use cases, we were thinking: OK, why should we just stick to client side when you could also look into server-side solutions? So there are different reasons, and as I said, we’re currently working on a solution – we’re currently looking into different options.
In terms of the way in which you judge or evaluate the success of your experimentation programme – is there a number that you look at to say ‘we’re doing well’ or ‘not so well’ in terms of delivering to the business? What are your KPIs?
I have to admit that there are no KPIs – unfortunately we don’t measure the success of the initiative at all, and I would really love to introduce that. The number of experiments undertaken is not that meaningful for me. I understand that a lot of people think it’s nice to run a lot of experiments, but for me the quality is much more important. I would really love to introduce quality criteria and see how many of the experiments fulfil them, because I think that is where you create the most business value: when you have high-quality, really reliable experiments.
As I said, I have been with FREE NOW a year, and the first year was really about starting the engine, starting the processes and enabling the analysts, so there was simply no time for a more sophisticated solution – but we would really love to have one.
It sounds like at the moment it’s an act of faith that the experimentation is helping the business: you haven’t got a KPI, and you’re avoiding using the number of experiments as the sole measure. What, in your mind, separates a poor-quality experiment from a high-quality one?
I think this is where my academic background comes in. For me, it’s really about the classical quality criteria for experimentation, or for research: validity, reliability, objectivity. These are important to me no matter what the result is. I don’t want to say I don’t care about the result, but it’s not as important to me – I want the result to be reliable.
So quality, for me, means that you have a good theory behind your hypothesis, that you have a good methodological approach, that the statistics are correct, that you think about how large the sample size has to be, and that you think about the business case – for example, what uplift you would need to see. In the end, those are the things that I think are important.
And, as in the example I told you about earlier, that you also have some idea of more complex experimental designs, to really get the information that you need. Sometimes at conferences you see people present experiments that follow a very strict framework, and this can be fine, but sometimes I feel they’re missing the true power of experimentation. I would love to get to the point where we can use the full variety of experimentation and have really reliable, valuable outcomes.
It seems to me that you’re saying you would prefer to run fewer, but perhaps bolder, more complex experiments, than lots of simple tests. Is that fair?
I think that’s a really tricky question, because first of all I have seen really small copy changes have a huge impact. Sometimes you don’t know the extent of an experiment until you run it – that’s kind of the nature of experimentation, right? And sometimes it’s not possible to make small alterations; you have to make a big step. That’s also something to take into consideration here.
It’s not really about whether it’s one or the other – it’s about the quality criteria. And I’m not sure if by running fewer experiments you mean launching fewer features, or launching as many features as before but just running fewer experiments on them – there are different dimensions to that. But I think quality always beats quantity. If you are asking me whether I prefer lots of small experiments with low quality or a few experiments with high quality, I will always go with the few with high-quality outcomes.
To you, high quality is about the validity of the hypothesis, the reliability of the data and the way in which the statistical methods are used to evaluate it?
Yes, absolutely. When the business managers get the results, I want them to trust the data. I want them to be sure that we did our best to give them the most trustworthy data.
If you were looking at a business from the outside, as if you were not working for it, what indicators would tell you that it had a culture of experimentation?
Good question. I think from the outside it’s not always easy to see, especially when you’re talking about culture – culture is something that is very difficult to spot from the outside. But when you ask me this question I obviously think of companies like Booking.com, and when I ask myself why I think they have a good culture, it’s because I see their people at conferences, I see people giving talks, I see blog articles, I see papers published. That makes me think that they have some kind of experimentation culture.
If you’re looking for a new job and you find a lot of open positions at a certain company, I think this is also an indication that they at least have an experimentation mindset. How their positions are titled also gives a good indication.
Who do you admire in the world of experimentation? And for what reason?
I have read Stefan Thomke’s book. I have often experienced that people don’t quite trust me when I tell them that experimentation is an effective business strategy, because I don’t have a business background – they say, ‘You come from academia, no wonder you like this.’ And for me, of all the people whose presentations and talks I’ve seen, and from all that I have read, Stefan Thomke really represents the business perspective of experimentation.
I recommend his book to everyone, especially management. So yeah, he is really someone that I look up to. Someone else is Ronny Kohavi, simply because he has published so much, including so much on the methodology. Coming from academia, sharing knowledge is in my DNA, and I really admire how he shared all of that and helped so many other people who are trying to build something of their own, or trying to understand the complexity of experimentation and A/B testing.
I think in his recently published book he also explained how easy it is to get it wrong.
Yes, absolutely. That’s also what I was trying to say before: sometimes people who are not so deep into the topic think, well, it’s easy. Maybe you hire one person – maybe that’s not even necessary – you just buy a third-party tool, and that’s it. But it’s really difficult to see when an experiment is off.
Thinking about your younger self, or perhaps a younger person you might meet who is interested in this area. What would you say to them? What advice would you give them if they wanted to get into the area of experimentation?
I would probably advise them to become really good at some key skills. One skill that is really important for me is data handling and data analysis – by that I mean really manual data analysis. I would advise them to learn how to play with data: applying transformations, drawing data from different data sources. Learn how, and also when, sanity checks are necessary or useful. Never rely on reporting tools alone – learn how to do it yourself. Once you know how to do it yourself, then you can also use reporting tools.
Obviously statistics are also important, but if your input data is flawed, your results will be flawed as well, so understanding data is key for me. When I was working in academia I had a lot of time for this, and I probably made every mistake you can make in data analysis, every single day – and I think that helped tremendously in my job.
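One concrete example of the sanity checks Sarah mentions is the sample-ratio-mismatch (SRM) check: before trusting any metric, verify that the observed control/treatment split matches the planned allocation, because a skewed split usually means the assignment or logging pipeline is broken. A minimal standard-library sketch – the alpha threshold here is a common convention, not a FREE NOW value:

```python
from math import erfc, sqrt

def srm_check(n_control, n_treatment, expected_ratio=0.5, alpha=0.001):
    """Chi-square goodness-of-fit test (1 degree of freedom) for whether
    the observed control/treatment split matches the intended allocation.
    A tiny p-value means the split itself is broken, so the experiment's
    metrics should not be trusted until assignment/logging is debugged."""
    total = n_control + n_treatment
    exp_c = total * expected_ratio
    exp_t = total * (1 - expected_ratio)
    stat = (n_control - exp_c) ** 2 / exp_c + (n_treatment - exp_t) ** 2 / exp_t
    p_value = erfc(sqrt(stat / 2))  # survival function of chi-square, df=1
    return p_value, p_value < alpha

# A 50.2/49.8 split on 100k users is plausible noise; 52/48 is almost
# certainly a bug somewhere in the pipeline.
print(srm_check(50_200, 49_800))
print(srm_check(52_000, 48_000))
```

Running this kind of check routinely is exactly the “never rely on the reporting tool” habit described above: the headline metric can look fine while the underlying split is broken.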
Besides that, the other obvious things to do would be to run experiments – even science experiments – but also to read papers and research reports and try to be very critical, asking: is it valid and is it reliable? Because later on in your job you have to be able to spot, for example, a confounding variable.
You have to support others in understanding the results and the limitations of their choices. If they still want to go ahead with a limitation, fine, but you are the one who has to point it out. It’s not easy, and because it’s not easy it is very difficult to see that an experiment is flawed or the data is untrustworthy. You cannot know until you know, because you know!
There is no warning sign around this, so it’s really a skill that you learn with hard work. And that would be my last piece of advice: be prepared for situations where you have to stand your ground, where people doubt you – they will say, ‘But the result looks fine to me,’ and you have to say that the result is not trustworthy.
How would you explain the difference between the business that has a culture of experimentation and one that’s simply running lots of tests?
For me, the difference is definitely whether you run experiments to learn – whether you’re open-minded about all possible outcomes – or whether you test to prove that you were right. And whether you do it this way or that way is really a reflection of the entire company.
Because honestly, as a product manager, for example, I would say it’s impossible to stay open-minded about different outcomes if your supervisor or the product development culture doesn’t allow, for example, for iterations or failed ideas and things like that. So I think that is something that can actually be seen quite easily.
And lastly, what development do you think we might see in the field of experimentation in the next 2 to 3 years?
Difficult question! In the last couple of years we have seen the shift from CRO to experimentation, and also from A/B testing to experimentation, and I really hope that we will continue this transformation and this trend – that more companies stop seeing experimentation as something on top that you can add later, and start conceptualising it as something at the very core of every business model.
The problem I had when I was starting to look for a job in the industry was that there were so many different names out there, and in recent years we have seen so many buzzwords popping up all over, right? I believe it would be helpful for the field of experimentation to have some consolidation around these different terms and to get rid of some of the very misleading buzzwords that we hear here and there, because often people don’t really understand them.
What would be an example of a misleading buzzword for you?
Well, what I hear a lot is ‘growth’, and some people just don’t understand what growth really is, or its potential. It’s just a word that they hear, and now suddenly everything is growth – and then at other companies it suddenly has a different name. It’s really more or less random how people use it, and they don’t always fully embrace the concept behind it.
Lastly, with respect to methodology: I meant what I said earlier about more complex designs. I really hope that we move on from this A-versus-B testing and learn how to use the full potential of the existing research methodology, because there is so much more that we could use in our context. Experimentation, for me, is so much more than A versus B.
Do you mean it should be A/B/C/D? Or do you mean that there should be lots of data and research sources used to form a hypothesis?
Both, actually. I’m a huge fan of more multi-modal product research. That was actually one of the things that struck me when I joined the industry, because when you come from academia it’s a natural thing to have different research methods and to integrate them to get the full picture of whatever you’re trying to find out – it’s just normal.
When I joined the industry, it was more like: okay, I am the user researcher and I am the experiment analyst, and maybe we talk sometimes, but really integrating the results is something very advanced. This was a surprise to me, and I honestly don’t understand it. I’m always trying to get everyone together.
Here at FREE NOW we have a very strong connection with user research, market research and so on. And about what you said about A/B/C/D – I have a very large book here all about experimental and quasi-experimental designs. There’s a whole world of different designs that you could use, with advantages and disadvantages, of course, but there’s really so much more than A versus B, and it sometimes hurts me when I see people stick so strictly to A versus B instead of having a bit more confidence and doing more complex things.
I think what’s very clear from what you’ve said – and you’ve been very generous with your time – is how coming from a research background has really helped you understand how experimentation can be improved in a commercial setting. Much of what we are doing is helping businesses make better decisions, but if the data on which those recommendations are based is flawed – not for any deliberate reason, but poorly captured, poorly analysed or poorly presented – then that doesn’t help the field of experimentation.
Thank you very much for your time – you have been very generous. Let’s catch up soon. Thank you, and see you soon.
If you need help with your CRO or Experimentation programme, get in touch with us today.
People from Facebook, FarFetch and RS Components receive our newsletter. You can too. Subscribe now.
If your CRO programme is not delivering the highest ROI of all of your marketing spend, then we should talk.