What is A/B Testing?

Tram Ho

A single decision can sometimes determine whether a solution succeeds or fails, and no one wants to be held accountable for a bad outcome that came from a decision made on instinct. Fortunately, there are many ways to gather information without relying on gut feeling. One of the most popular, and one commonly used to evaluate models whose effectiveness depends on many factors, is A/B Testing. In this article, we will get familiar with the concept of A/B Testing before learning, in the following article (if it ever gets written :v), how to use the Iter8 library to deploy models to Seldon with the Progressive Rollout strategy.

So what is A/B Testing?

A/B Testing, at its most basic, is a way of comparing two versions of something to find out which one performs better. Although it is now most often associated with websites and applications, the method is almost 100 years old. In the 1920s, statistician and biologist Ronald Fisher worked out the most important principles behind A/B testing and randomized controlled experiments in general. He asked himself, "What happens if I put more fertilizer on this land?" and tested that hypothesis by designing experiments to determine the effects of various external factors on plant and crop growth.

To this day, the idea behind A/B Testing has not changed. The core concepts are basically the same; we are simply doing it online, in real time, and on a different scale in terms of the number of participants and the number of experiments, according to Kaiser Fung in the article A Refresher on A/B Testing in the Harvard Business Review.

image.png

The illustration is taken from the article A/B Testing Guide

How does A/B Testing work?

Imagine that your website has a big blue button in the middle, and the whole team is about to come to blows because half want to change it to blue-gray and half want to keep it as it is. (To be honest, until my best friend scolded me, I couldn't tell these two colors apart :v) After a billion attempts to reach agreement the debate is still not settled, so we need a method to quantify which option gives better results.

Quantifying the effect of a change requires a number that can be added, subtracted, multiplied, and divided, and in this case a useful metric is the number of users who click the button. To run the test, we split the users into two smaller groups, show each group one version of the holy button, and determine which version is more effective; here, that means which color gets more users to click.

image.png

Image from the article A/B testing

However, life is not so simple, because many factors affect whether someone clicks. For example, some screens may display colors inaccurately, so the color shown is not exactly the one we intended, or there may simply be more color-blind people like me =))). This is where randomization becomes very important: by randomly choosing which users belong to which group, we minimize the chance that other factors will skew the outcome of the test. And as with all Randomized Controlled Experiments, we need to estimate the sample size required to reach statistical significance, which helps ensure that the results we see are not just due to background noise.
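To make the two ideas above a bit more concrete, here is a minimal sketch in Python. It is only an illustration under some assumptions of mine: the function names, the experiment name "button-color", the hash-based way of splitting users, and the defaults of 5% significance and 80% power are not from any particular tool. The sample size uses the common normal-approximation formula for comparing two proportions.

```python
import hashlib
import math
from statistics import NormalDist


def assign_group(user_id: str, experiment: str = "button-color") -> str:
    """Place a user in 'control' or 'variation' (hypothetical helper).

    Hashing the user ID gives a split that behaves like a coin flip across
    users but is stable: the same user always sees the same version.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    return "control" if int(digest, 16) % 2 == 0 else "variation"


def sample_size_per_group(p_control: float, p_variation: float,
                          alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate users needed in EACH group to detect the difference.

    Normal-approximation formula for a two-sided test on two proportions.
    alpha and power are assumed defaults, not values from the article.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_beta = NormalDist().inv_cdf(power)           # critical value for power
    variance = p_control * (1 - p_control) + p_variation * (1 - p_variation)
    effect = p_variation - p_control
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


if __name__ == "__main__":
    print(assign_group("user-42"))
    # Users per group needed to tell a 15% click rate from an 18% one
    print(sample_size_per_group(0.15, 0.18))
```

The smaller the difference you want to detect, the more users each group needs, which is why tiny changes are hard to verify on low-traffic pages.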

How should the results of A/B Testing be read?

After performing A/B Testing, we usually get results that include values of the measured metric for Control (the old version; here, keeping the button's old color) and Variation (the new version; here, changing the button to another color), such as Control: 15% (+/- 2.1%), Variation: 18% (+/- 2.3%). This means that 18% of users clicked the Variation version of the button with a margin of error of 2.3%, while the figure for Control is 15% with a margin of error of 2.1%. This percentage is called the conversion rate, and for Variation it is estimated to lie between 15.7% and 20.3%. Even that is not entirely certain: if you ran the A/B test many times, about 95% of the intervals obtained would capture the true conversion rate, while in about 5% of cases the interval would miss it, so a result is more trustworthy when it holds up across repeated runs.
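For the curious, the margin of error in results like these can be approximated with the usual normal-approximation formula for a proportion. The sketch below is only an illustration: the group size of 1,100 users and the click counts are hypothetical numbers I picked because they roughly reproduce the figures in the example, and the function names are my own.

```python
import math
from statistics import NormalDist


def margin_of_error(clicks: int, visitors: int, confidence: float = 0.95) -> float:
    """Half-width of the confidence interval for a conversion rate."""
    rate = clicks / visitors
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return z * math.sqrt(rate * (1 - rate) / visitors)


def confidence_interval(clicks: int, visitors: int, confidence: float = 0.95):
    """Return (low, high) bounds around the observed conversion rate."""
    rate = clicks / visitors
    moe = margin_of_error(clicks, visitors, confidence)
    return rate - moe, rate + moe


if __name__ == "__main__":
    # Hypothetical data: 1,100 users per group
    for name, clicks in (("Control", 165), ("Variation", 198)):
        low, high = confidence_interval(clicks, 1100)
        print(f"{name}: {clicks / 1100:.1%} ({low:.1%} to {high:.1%})")
```

Note that the margin shrinks as the number of visitors grows, so collecting more data gives a narrower interval.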

With the results above, a 3 percentage point increase from 15% to 18% (that is, Variation performs at 120% of Control), we have more reliable, data-backed arguments that the change will yield better results. Of course, it has to be said again that a 3 percentage point increase in click-through rate is sometimes not worth the effort and the risk of problems that come with making the change, but having more information in the form of data still reduces subjectivity when making any change decision.
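Since "3%" and "120%" are easy to mix up, here is the arithmetic written out as a tiny Python sketch. The function names are my own; the rates are the ones from the example.

```python
def absolute_lift(control_rate: float, variation_rate: float) -> float:
    """Difference in percentage points, expressed as a fraction (0.03 = 3 points)."""
    return variation_rate - control_rate


def relative_lift(control_rate: float, variation_rate: float) -> float:
    """Improvement relative to Control (0.20 = 20% better than Control)."""
    return (variation_rate - control_rate) / control_rate


if __name__ == "__main__":
    control, variation = 0.15, 0.18
    print(f"absolute lift: {absolute_lift(control, variation):.1%} points")
    print(f"relative lift: {relative_lift(control, variation):.1%}")
    print(f"variation vs control: {variation / control:.0%}")
```

The absolute lift is 3 percentage points, the relative lift is 20%, and the ratio of the two rates is 120%. All three describe the same result, so it is worth saying which one you mean when reporting.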

When using A/B Testing, what should be avoided?

With a billion blog posts to be found on the Internet, we can summarize their content into a few things to avoid as follows:

  • Because many services offer features with names like real-time optimization, users sometimes decide too early, before enough data has been collected, and the inaccurate conversion rates they see at that point lead to wrong decisions.
  • Another problem is using too few or too many metrics. Too few metrics give a one-sided view of the results, while too many lead to what are known as Spurious Correlations. Both produce misleading results.
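To see why too many metrics is risky, consider how quickly the chance of at least one false alarm grows. The sketch below is a simplification under my own assumptions: it treats the metrics as independent and each test as having a 5% false positive rate, which real metrics rarely satisfy exactly. The Bonferroni correction shown is one common, conservative remedy, not the only one.

```python
def chance_of_false_positive(num_metrics: int, alpha: float = 0.05) -> float:
    """Probability that at least one of several independent metrics looks
    'significant' purely by chance when nothing actually changed."""
    return 1 - (1 - alpha) ** num_metrics


def bonferroni_alpha(num_metrics: int, alpha: float = 0.05) -> float:
    """Stricter per-metric threshold that keeps the overall rate near alpha."""
    return alpha / num_metrics


if __name__ == "__main__":
    for k in (1, 5, 20):
        print(f"{k} metrics: {chance_of_false_positive(k):.1%} chance of a false alarm, "
              f"Bonferroni threshold {bonferroni_alpha(k):.4f}")
```

With 20 metrics under these assumptions, the chance of at least one spurious "win" is roughly 64%, which is why it helps to decide on a small set of metrics before the test starts.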

Besides these, there are quite a few other mistakes that I have not run into or read about, so searching Google for the keyword A/B testing mistakes will give you more complete coverage than this article of mine =))

Summary

Clearly, A/B Testing is not a solution for every case; there are more complex types of tests that are more efficient and give more reliable data. Still, it is a great way to quickly get an understanding of the question at hand and of what we have. This article introduced A/B Testing, and in the next article we will learn how to use the Iter8 library to deploy models to Seldon with the Progressive Rollout strategy. That is the end of the article; thank you for taking the time to read.


Source : Viblo