When you create quality content, the headline you use matters. In less than a day and with less than $30, you can reliably split test your alternative blog or article headlines with this simple Facebook Ads setup!
Successful Websites Test Their Headlines
Ringier, a large international media company, boosted page impressions per visitor on the website of Switzerland’s largest tabloid, Blick, by 4.9% in a month using headline A/B tests. Just imagine what that kind of growth does to online advertising revenue!
The virality wizards at BuzzFeed and Upworthy also exploit headline A/B tests. For BuzzFeed, the testing mechanics are simple: they vary the headlines of an article served on their own site and, thanks to their massive traffic, find out very quickly which alternative gets more clicks. Upworthy uses a custom click-testing system they call the magic unicorn box.
If you have a lot of traffic, there are several ready-made solutions out there. The KingSumo Headlines app automates testing: you enter alternative titles for your WordPress post and, as the post gets shared, the app rotates them, gradually converging on the one that performs best. An Optimizely plugin offers similar functionality, well integrated into WordPress.
But what is there to do for a small website?
A smaller site such as a company blog enjoys no such luxury. At typical traffic levels, testing two alternative article headlines would take weeks or even months to reach a meaningful sample size.
The folks at Buffer use a different protocol: they recommend using Twitter to test alternative headlines. They tweet alternative headlines one by one with a delay in between and track the number of clickthroughs each tweet generates. Can you see the statistical pitfalls? How do you account for different Twitter activity levels at different times of day? How do you account for your followers who end up seeing (and perhaps even clicking) both headlines?
A Small Hack Using Facebook Ads
You can use Facebook ads not only as a tool to promote your content but also as a testing platform! We will use targeting by age, which Google AdWords – the main competing online ads platform – does not offer at the same level of detail at the moment.
First you need to generate alternative headlines for your content. Upworthy says it creates as many as 25 different headlines for each piece of content (and you should, too!). You can then winnow down the list to a few alternatives (e.g. through an internal vote) that you will actually test.
For a start, you can pick two (let’s call them A and B) to pit head to head.
Treatment and Control
Every good experiment needs a well designed treatment and control group – two groups statistically indistinguishable from one another in any respect other than which of your two headlines they come across.
In other words, you need to be able to separately target, through Facebook Ads targeting instruments, two groups of people that differ as little as possible on all other characteristics.
We then show groups A and B the same ad (leading to our content) but with the two versions of the headline. Both ads use the same image (though of course the same protocol could be used to split-test two images rather than two alternative headlines).
We use interests for Facebook ads targeting (with an effort to approximate the natural audience for our content). We then split up Facebook Ad Sets by the age bracket we target (loosely inspired by the logic of “regression discontinuity” models) – for instance:
Group A (control)
Group B (treatment)
In this specific setup, we end up with 14 separate Facebook Ad Sets (seven each for our treatment and control) whose results we can add up. In subsequent tests, we can reuse these ad sets (adjusting the targeted Interests if needed), saving us a bit of work.
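The alternating split can be sketched in a few lines of code. The age brackets below are purely illustrative assumptions (the post does not list the exact brackets used); what matters is the pattern: adjacent brackets go to opposite groups, so the two audiences end up with near-identical age profiles.

```python
# Sketch: alternate headline variants A and B across adjacent age brackets,
# yielding seven ad sets per group. Brackets are made up for illustration.

AGE_BRACKETS = [
    (18, 19), (20, 21), (22, 23), (24, 25), (26, 27), (28, 29), (30, 31),
    (32, 33), (34, 35), (36, 37), (38, 39), (40, 41), (42, 43), (44, 45),
]

def build_ad_sets(brackets, variants=("A", "B")):
    """Assign alternating headline variants to consecutive age brackets."""
    return [
        {"age_min": lo, "age_max": hi, "variant": variants[i % len(variants)]}
        for i, (lo, hi) in enumerate(brackets)
    ]

ad_sets = build_ad_sets(AGE_BRACKETS)
group_a = [s for s in ad_sets if s["variant"] == "A"]  # control: 7 ad sets
group_b = [s for s in ad_sets if s["variant"] == "B"]  # treatment: 7 ad sets
```

Because every bracket boundary separates people only days apart in age, each group sees a slice of essentially the same population.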
Why are these two groups similar? Because a person aged 29 years and three days should not be statistically significantly different from a person aged 28 years and 362 days.
Facebook ads targeting options do not allow us to target by age with absolute precision, but this is not a major problem: if the difference in performance between the treatment and control ads is big enough, it will show up even if the targeted groups overlap slightly. (There is a lot going on here statistically, and people with more statistical wherewithal can surely think this through in greater detail.)
If you are targeting a large market such as the United States, you could use geography rather than age to create your groups (coincidentally, Siteber’s Jessie Liu explores just such a setup in a recent blog post; that approach would even be well suited to multivariate testing).
But does this actually work?
This procedure works if there is significant difference in the performance of the two alternative headlines. At Pizza SEO, we have tested this procedure repeatedly on our own content (and now also client content) with astonishing results:
As you can see, the differences in performance between the two versions are massive in some cases. The first three experiments reported here reached a conclusive result in well under 24 hours. Once we had a result, we continued to promote the content with the more effective headline only!
Are the differences statistically significant?
The A/B test statistical significance calculator at http://abtestguide.com/calc helps you figure out whether your results are statistically significant. Unlike some other A/B testing tools and calculators, it avoids the mistake of “peeking”: tools that report statistical significance while a test is still running invite you to stop early on a fluke, which inflates the false-positive rate. You simply enter your sample sizes (in our case, the number of impressions across the Ad Sets for each variation) and the number of conversions (in our case, the number of ad clicks). The calculator then shows whether there is a statistically significant difference (you do not need to change any of the other settings).
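If you would rather check significance yourself, the standard test behind this kind of CTR comparison is a two-proportion z-test. Here is a minimal sketch using only the Python standard library; the impression and click counts in the example are made up, not our actual campaign results.

```python
import math

def two_proportion_z_test(impressions_a, clicks_a, impressions_b, clicks_b):
    """Two-sided z-test for a difference between two click-through rates.

    Returns (z, p_value); a p_value below 0.05 is the conventional
    threshold for statistical significance.
    """
    p_a = clicks_a / impressions_a
    p_b = clicks_b / impressions_b
    # Pooled CTR under the null hypothesis that both headlines perform equally.
    pooled = (clicks_a + clicks_b) / (impressions_a + impressions_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / impressions_a + 1 / impressions_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical example: headline A got 150 clicks on 5,000 impressions (3.0% CTR),
# headline B got 210 clicks on 5,000 impressions (4.2% CTR).
z, p = two_proportion_z_test(5000, 150, 5000, 210)
```

With these made-up numbers the difference comes out significant at the 5% level, matching what the calculator would tell you for the same inputs.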
The first time around, the experiment took maybe an hour or two to set up, but subsequent experiments needed significantly less than an hour.
This means you can test all of your new content in this way and get much more out of your content marketing investment!
What about the cost?
You will not believe how cheap each of our headline experiments was! The most expensive of the four reported here cost €27 (about $29) in Facebook ad spend. The ones with a bigger gap between the two headlines finished sooner and cost just over €10!
More than CTR
To compare the attractiveness of headline variations we simply need to add up ad performance data. With a few extra steps we can gather more useful insights.
During the A/B test, we tagged the two variations with unique utm_campaign parameters. Based on the parameter, we then served the visitor the same headline on our website that they had seen in the Facebook ad they clicked on, whether variation A or B. (This required a minor manual change to our WordPress template, but we plan to build a plugin to make headline management even easier.) After reaching a conclusive test result, we simply changed the headline to the one that performed better.
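Our actual change lived in a WordPress (PHP) template, but the matching logic is simple enough to sketch in Python. The parameter values and headline strings below are hypothetical, invented for illustration.

```python
# Sketch: serve the visitor the same headline variation they saw in the ad,
# keyed off the utm_campaign query parameter. All values here are made up.

HEADLINES = {
    "headline_a": "10 Tips for Better Blog Headlines",
    "headline_b": "Are Your Blog Headlines Costing You Readers?",
}
DEFAULT_HEADLINE = HEADLINES["headline_a"]  # shown to non-campaign traffic

def pick_headline(query_params):
    """Return the headline matching the visitor's utm_campaign tag, if any."""
    variant = query_params.get("utm_campaign", "")
    return HEADLINES.get(variant, DEFAULT_HEADLINE)
```

A visitor arriving from the ad tagged `utm_campaign=headline_b` sees variation B; everyone else (including direct and organic traffic) gets the default.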
This unique parameter lets us compare in Google Analytics how the two groups behaved on the site: time on site, bounce rate, or pageviews per session.
These data help us make sure that the ads with one of the headlines do not bring in inferior-quality traffic. Otherwise, you could end up with a headline that brings the clicks but sets expectations the content fails to meet!
So how about you? Do you test your content headlines? Will you try it now?