- Numbers – If you’re working on a well-trafficked site, and on a well-trafficked part of that site, you will probably be getting results in the thousands. Now, compare that to the 7 to 12 participants you might get in a usability test.
- Genuineness – A typical beef with usability testing is that it’s artificial – the users are in a lab, being observed, and typically doing some role playing. In A/B testing, though, they are actually using your site, to do things they want to do, and without ever knowing they are being “tested.”
- Relevance – A/B results are typically tied to specific results. These, in turn, are tied to specific business goals – account openings, purchases, downloads. In other words, this stuff can have a genuine effect on the bottom line.
- Concreteness – There’s a joke among usability engineers that the profession’s motto should be, “It depends.” We sometimes don’t realize how frustrating this can actually be, though, for our clients who are more business- or numbers-oriented. A/B testing, on the other hand, is definitely something that speaks their language.
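Those thousands of data points aren’t a luxury, by the way – they’re a requirement. As a rough illustration (a minimal sketch using the standard two-proportion power formula; the conversion rates here are made up), here’s how many visitors you’d need per variant just to reliably detect a modest lift:

```python
from math import sqrt
from statistics import NormalDist

def sample_size_per_variant(p1, p2, alpha=0.05, power=0.8):
    """Approximate visitors needed *per variant* to detect a conversion
    change from p1 to p2 with a two-sided two-proportion z-test."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)   # ~1.96 for alpha = 0.05
    z_b = NormalDist().inv_cdf(power)           # ~0.84 for 80% power
    p_bar = (p1 + p2) / 2                       # pooled rate under H0
    n = ((z_a * sqrt(2 * p_bar * (1 - p_bar))
          + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2) / (p1 - p2) ** 2
    return int(n) + 1

# Detecting a lift from a 3% to a 3.5% conversion rate takes
# tens of thousands of visitors per variant:
print(sample_size_per_variant(0.03, 0.035))
```

In other words, the “numbers” advantage only exists on pages that get serious traffic in the first place.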
At the same time, however, A/B testing also has plenty of possible issues:
- Comparative only – A/B testing is a comparative method only. In other words, you need 2 things. But what if you only have 1? Does that mean you’ll have to come up with a straw man in that situation? (Usability testing doesn’t have that liability, and can also be used for comparative testing as well.)
- Finished product – Though A/B testing is great for getting feedback in real situations, that also means you can’t try something out before it gets to a polished, ready-for-prime-time state. (In usability testing, you can get feedback simply using pieces of paper.)
- Emphasis on details – A/B testing tends to focus on smaller issues, like button size or color or location. Great stuff to know, but a pretty incomplete picture. Who knows, maybe there were things that weren’t even considered that could bump up conversion even more. How would you ever know? (Usability testing looks at whatever causes users problems, whatever that might happen to be.)
- Cumulative effect – Because A/B testing often means tweak this here, tweak that there, attention isn’t always focused on the overall or cumulative effect. Yes, that marketing tile was effective on the home page in week 1. And, yes, that call-to-action that was added in week 6 worked well too. Does that mean, though, that we can keep adding whatever we want as the weeks go on? I am actually familiar with a site that did just that. And, as it is right now, it’s pretty much ready to sink under its own weight.
- Short-term focus – As illustrated above, A/B testing is often very much about short-term effects. Now, that fits in very well with an Agile methodology (which often relies on A/B), but that approach might also backfire in the long run. How, for example, will that cluttered homepage impact conversion down the road, or overall?
- Scalability – Along the same lines … So, increasing the size of a button bumped conversion up by 2%? That’s great! So, why not just bump it up again? In other words, how can we tell when we’ve passed the line of too much of a good thing? Heck, why should we even really care?
- Neutral results – A lot of the press around A/B testing tends to focus on dramatic results. From what I understand from people who are really doing it, though, typical results tend to be more along the lines of an “eh” and a shrug of the shoulders. Now, was all that effort and expense really worth it for that 0.01% move on the needle? Even worse, what if both designs tested equally badly? What other data would you have in that situation to come up with something new and different?
- Effect on standards – One particular kind of dramatic result seems to be when the findings break conventional thinking, or go against established standards. Now, that’s pretty fascinating, but what exactly do you do with it? Does it invalidate the standard? Is there something about this particular instance that would be a good example of “it depends”? Do we need to test every instance of this particular design (which is what I’m thinking the vendor might suggest)?
- What happens next – A/B testing focuses solely on conversion. As I mentioned above, that can be a good thing. At the same time, though, conversion doesn’t come close to describing the totality of the user’s experience. What if the user successfully signed up for … the wrong account? What if the user successfully purchased the product … but did so offline? What if they downloaded that newsletter … but never read it? What if they signed up for your service ... but it was such a painful process that they really don't want to come back? Unfortunately, siloed marketeers often don’t care. Just getting the user “through the funnel” is typically their only concern. How the whole experience might affect, say, brand is something else entirely. To quote Jared Spool, “conversion ≠ delight.”
- Unmeasurables – Those brand concerns above hint at this one. Note, though, that unmeasurables can be a lot less lofty as well. I, for example, do a lot of work with online banking systems. Now, these people are already customers. What key business metrics might they align with? Actually, they typically have their own, very diverse goals. They might be all about checking their balance, or looking for fraudulent transactions, or paying their bills on time. All we really can do is support them. Indeed, there are tons of sites out there that are more informational or that involve user goals that are vaguer or might not directly align with business KPIs.
- Why – Perhaps the most important drawback with A/B testing, though, is a lack of understanding why users preferred A over B. I mean, wouldn’t it be nice to learn something from this exercise so that you can apply it, say, in other situations? Honestly, isn’t understanding why something happened a necessary precursor to understanding how to improve it? Unfortunately, A/B testing is all a bit of a black box.
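To make that “neutral results” point concrete: a tiny move on the needle often can’t even be distinguished from noise. Here’s a minimal sketch using the standard pooled two-proportion z-test (the visitor and conversion counts are invented for illustration):

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for the difference between two observed
    conversion rates (pooled two-proportion z-test)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# A "0.01% move on the needle": 3.00% vs 3.01% conversion,
# 50,000 visitors per variant. The p-value comes out near 0.93 --
# statistically indistinguishable from no difference at all.
print(two_proportion_p_value(1500, 50000, 1505, 50000))
```

Which is exactly the problem: a result that small isn’t a win, it’s a shrug with a decimal point.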
A/B testing is basically only binary feedback. You essentially get a thumbs-up or thumbs-down. But perhaps there’s more to it than that. Perhaps it does after all depend …
*Image: H.L. Mencken working on some early man-machine issues*