Becoming Better SEO Scientists – Whiteboard Friday

Posted by MarkTraphagen

Editor’s note: Today we’re featuring back-to-back episodes of Whiteboard Friday from our friends at Stone Temple Consulting. Make sure to also check out the second episode, “UX, Content Quality, and SEO” from Eric Enge.

Like many other areas of marketing, SEO incorporates elements of science. It becomes problematic for everyone, though, when theories that haven’t been the subject of real scientific rigor are passed off as proven facts. In today’s Whiteboard Friday, Stone Temple Consulting’s Mark Traphagen is here to teach us a thing or two about the scientific method and how it can be applied to our day-to-day work.

For reference, here’s a still of this week’s whiteboard.

Video transcription

Howdy, Mozzers. Mark Traphagen from Stone Temple Consulting here today to share with you how to become a better SEO scientist. We know that SEO is a science in a lot of ways, and everything I’m going to say today applies not only to SEO, but testing things like your AdWords, how does that work, quality scores. There’s a lot of different applications you can make in marketing, but we’ll focus on the SEO world because that’s where we do a lot of testing. What I want to talk to you about today is how that really is a science and how we need to bring better science in it to get better results.

The reason is in astrophysics, things like that we know there’s something that they’re talking about these days called dark matter, and dark matter is something that we know it’s there. It’s pretty much accepted that it’s there. We can’t see it. We can’t measure it directly. We don’t even know what it is. We can’t even imagine what it is yet, and yet we know it’s there because we see its effect on things like gravity and mass. Its effects are everywhere. And that’s a lot like search engines, isn’t it? It’s like Google or Bing. We see the effects, but we don’t see inside the machine. We don’t know exactly what’s happening in there.

An artist’s depiction of how search engines work.

So what do we do? We do experiments. We do tests to try to figure that out, to see the effects, and from the effects outside we can make better guesses about what’s going on inside and do a better job of giving those search engines what they need to connect us with our customers and prospects. That’s the goal in the end.

Now, the problem is there’s a lot of testing going on out there, a lot of experiments that maybe aren’t being run very well. They’re not being run according to scientific principles that have been proven over centuries to get the best possible results.

Basic data science in 10 steps

So today I want to give you just very quickly 10 basic things that a real scientist goes through on their way to trying to give you better data. Let’s see what we can do with those in our SEO testing in the future.

So let’s start with number one. You’ve got to start with a hypothesis. Your hypothesis is the question that you want to solve. You always start with that, a good question in mind, and it’s got to be relatively narrow. You’ve got to narrow it down to something very specific. Something like how does time on page affect rankings, that’s pretty narrow. That’s very specific. That’s a good question. Might be able to test that. But something like how do social signals affect rankings, that’s too broad. You’ve got to narrow it down. Get it down to one simple question.

Then you choose a variable that you’re going to test. Out of all the things that you could do, that you could play with or you could tweak, you should choose one thing or at least a very few things that you’re going to tweak and say, “When we tweak this, when we change this, when we do this one thing, what happens? Does it change anything out there in the world that we are looking at?” That’s the variable.

The next step is to set a sample group. Where are you going to gather the data from? Where is it going to come from? That’s the world that you’re working in here. Out of all the possible data that’s out there, where are you going to gather your data and how much? That’s the small circle within the big circle. Now even though it’s smaller, you’re probably not going to get all the data in the world. You’re not going to scrape every search ranking that’s possible or visit every URL.

You’ve got to ask yourself, “Is it large enough that we’re at least going to get some validity?” If I wanted to find out what is the typical person in Seattle and I might walk through just one part of the Moz offices here, I’d get some kind of view. But is that a typical, average person from Seattle? I’ve been around here at Moz. Probably not. But this was large enough.

Also, it should be randomized as much as possible. Again, going back to that example, if I just stayed here within the walls of Moz and did research about Mozzers, I’d learn a lot about what Mozzers do, what Mozzers think, how they behave. But that may or may not be applicable to the larger world outside, so you randomize.

We want to control. So we’ve got our sample group. If possible, it’s always good to have another sample group that you don’t do anything to. You do not manipulate the variable in that group. Now, why do you have that? You have that so that you can say, to some extent, if we saw a change when we manipulated our variable and we did not see it in the control group, the same thing didn’t happen, more likely it’s not just part of the natural things that happen in the world or in the search engine.

If possible, even better you want to make that what scientists call double blind, which means that even you the experimenter don’t know who that control group is out of all the SERPs that you’re looking at or whatever it is. As careful as you might be and honest as you might be, you can end up manipulating the results if you know who is who within the test group. It’s not going to apply to every test that we do in SEO, but a good thing to have in mind as you work on that.

Next, very quickly, duration. How long does it have to be? Is there sufficient time? If you’re just testing like if I share a URL to Google+, how quickly does it get indexed in the SERPs, you might only need a day on that because typically it takes less than a day in that case. But if you’re looking at seasonality effects, you might need to go over several years to get a good test on that.

Let’s move to the second group here. The sixth thing: keep a clean lab. Now what that means is try as much as possible to keep out anything that might be dirtying your results, any kind of variables creeping in that you didn’t want to have in the test. Hard to do, especially in what we’re testing, but do the best you can to keep out the dirt.

Manipulate only one variable. Out of all the things that you could tweak or change choose one thing or a very small set of things. That will give more accuracy to your test. The more variables that you change, the more other effects and inner effects that are going to happen that you may not be accounting for and are going to muddy your results.

Make sure you have statistical validity when you go to analyze those results. Now that’s beyond the scope of this little talk, but you can read up on that. Or even better, if you are able to, hire somebody or work with somebody who is a trained data scientist or has training in statistics so they can look at your evaluation and say the correlations or whatever you’re seeing, “Does it have a statistical significance?” Very important.

Transparency. As much as possible, share with the world your data set, your full results, your methodology. What did you do? How did you set up the study? That’s going to be important to our last step here, which is replication and falsification, one of the most important parts of any scientific process.

So what you want to invite is, hey we did this study. We did this test. Here’s what we found. Here’s how we did it. Here’s the data. If other people ask the same question again and run the same kind of test, do they get the same results? Somebody runs it again, do they get the same results? Even better, if you have some people out there who say, “I don’t think you’re right about that because I think you missed this, and I’m going to throw this in and see what happens,” aha they falsify. That might make you feel like you failed, but it’s success because in the end what are we after? We’re after the truth about what really works.

Think about your next test, your next experiment that you do. How can you apply these 10 principles to do better testing, get better results, and have better marketing? Thanks.

Video transcription by Speechpad.com

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!


Deconstructing the App Store Rankings Formula with a Little Mad Science

Posted by AlexApptentive

After seeing Rand’s “Mad Science Experiments in SEO” presented at last year’s MozCon, I was inspired to put on the lab coat and goggles and do a few experiments of my own—not in SEO, but in SEO’s up-and-coming younger sister, ASO (app store optimization).

Working with Apptentive to guide enterprise apps and small startup apps alike to increase their discoverability in the app stores, I’ve learned a thing or two about app store optimization and what goes into an app’s ranking. It’s been my personal goal for some time now to pull back the curtains on Google and Apple. Yet, the deeper into the rabbit hole I go, the more untested assumptions I leave in my wake.

Hence, I thought it was due time to put some longstanding hypotheses through the gauntlet.

As SEOs, we know how much of an impact a single ranking can mean on a SERP. One tiny rank up or down can make all the difference when it comes to your website’s traffic—and revenue.

In the world of apps, ranking is just as important when it comes to standing out in a sea of more than 1.3 million apps. Apptentive’s recent mobile consumer survey shed a little more light on this claim, revealing that nearly half of all mobile app users identified browsing the app store charts and search results (the placement on either of which depends on rankings) as a preferred method for finding new apps in the app stores. Simply put, better rankings mean more downloads and easier discovery.

Like Google and Bing, the two leading app stores (the Apple App Store and Google Play) have complex and highly guarded algorithms for determining rankings for both keyword-based app store searches and composite top charts.

Unlike SEO, however, very little research and theory has been conducted around what goes into these rankings.

Until now, that is.

Over the course of five studies analyzing various publicly available data points for a cross-section of the top 500 iOS (U.S. Apple App Store) and the top 500 Android (U.S. Google Play) apps, I’ll attempt to set the record straight with a little myth-busting around ASO. In the process, I hope to assess and quantify any perceived correlations between app store ranks, ranking volatility, and a few of the factors commonly thought of as influential to an app’s ranking.

But first, a little context

Image credit: Josh Tuininga, Apptentive

Both the Apple App Store and Google Play have roughly 1.3 million apps each, and both stores feature a similar breakdown by app category. Apps ranking in the two stores should, theoretically, be on a fairly level playing field in terms of search volume and competition.

Of these apps, nearly two-thirds have not received a single rating and 99% are considered unprofitable. These studies, therefore, single out the rare exceptions to the rule—the top 500 ranked apps in each store.

While neither Apple nor Google have revealed specifics about how they calculate search rankings, it is generally accepted that both app store algorithms factor in:

  • Average app store rating
  • Rating/review volume
  • Download and install counts
  • Uninstalls (what retention and churn look like for the app)
  • App usage statistics (how engaged an app’s users are and how frequently they launch the app)
  • Growth trends weighted toward recency (how daily download counts changed over time and how today’s ratings compare to last week’s)
  • Keyword density of the app’s landing page (Ian did a great job covering this factor in a previous Moz post)

I’ve simplified this formula to a function highlighting the four elements with sufficient data (or at least proxy data) for our analysis:

Ranking = fn(Rating, Rating Count, Installs, Trends)

Of course, right now, this generalized function doesn’t say much. Over the next five studies, however, we’ll revisit this function before ultimately attempting to compare the weights of each of these four variables on app store rankings.

(For the purpose of brevity, I’ll stop here with the assumptions, but I’ve gone into far greater depth into how I’ve reached these conclusions in a 55-page report on app store rankings.)

Now, for the Mad Science.

Study #1: App-les to app-les app store ranking volatility

The first and most straightforward of the five studies involves tracking daily movement in app store rankings across iOS and Android versions of the same apps to determine any trends or differences in ranking volatility between the two stores.

I went with a small sample of five apps for this study, the only criteria for which were that:

  • They were all apps I actively use (a criterion for coming up with the five apps but not one that influences rank in the U.S. app stores)
  • They were ranked in the top 500 (but not the top 25, as I assumed app store rankings would be stickier at the top—an assumption I’ll test in study #2)
  • They had an almost identical version of the app in both Google Play and the App Store, meaning they should (theoretically) rank similarly
  • They covered a spectrum of app categories

The apps I ultimately chose were Lyft, Venmo, Duolingo, Chase Mobile, and LinkedIn. These five apps represent the travel, finance, education, banking, and social networking categories.

Hypothesis

Going into this analysis, I predicted slightly more volatility in Apple App Store rankings, based on two statistics:

Both of these assumptions will be tested in later analysis.

Results

7-Day App Store Ranking Volatility in the App Store and Google Play

Among these five apps, Google Play rankings were, indeed, significantly less volatile than App Store rankings. Among the 35 data points recorded, rankings within Google Play moved by as much as 23 positions/ranks per day while App Store rankings moved up to 89 positions/ranks. The standard deviation of ranking volatility in the App Store was, furthermore, 4.45 times greater than that of Google Play.

Of course, the same apps varied fairly dramatically in their rankings in the two app stores, so I then standardized the ranking volatility in terms of percent change to control for the effect of numeric rank on volatility. When cast in this light, App Store rankings changed by as much as 72% within a 24-hour period while Google Play rankings changed by no more than 9%.
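The standardization described above can be sketched roughly as follows. The rank series are made-up illustrative numbers, not the study's data, and the function names are hypothetical. The snippet assumes "volatility" means the absolute day-over-day change in rank, expressed either in positions or as a percentage of the previous day's rank.

```python
from statistics import pstdev


def daily_moves(ranks):
    """Absolute day-over-day change in rank position."""
    return [abs(today - prev) for prev, today in zip(ranks, ranks[1:])]


def daily_percent_moves(ranks):
    """Day-over-day change as a percentage of the previous day's rank."""
    return [abs(today - prev) / prev * 100 for prev, today in zip(ranks, ranks[1:])]


# Hypothetical daily ranks for one app in each store (illustrative only).
app_store_ranks = [120, 150, 90, 110]
google_play_ranks = [200, 205, 198, 202]

ios_moves = daily_moves(app_store_ranks)
android_moves = daily_moves(google_play_ranks)

# Ratio of the spread of daily moves, one way to compare volatility across stores.
volatility_ratio = pstdev(ios_moves) / pstdev(android_moves)

# Percent moves control for the fact that the same app sits at different ranks in each store.
ios_pct = daily_percent_moves(app_store_ranks)
android_pct = daily_percent_moves(google_play_ranks)
```

Expressing each move relative to the previous rank is what lets an app ranked near No. 100 in one store be compared with the same app ranked near No. 400 in the other.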

Also of note, daily rankings tended to move in the same direction across the two app stores approximately two-thirds of the time, suggesting that the two stores, and their customers, may have more in common than we think.

Study #2: App store ranking volatility across the top charts

Testing the assumption implicit in standardizing the data in study No. 1, this one was designed to see if app store ranking volatility is correlated with an app’s current rank. The sample for this study consisted of the top 500 ranked apps in both Google Play and the App Store, with special attention given to those on both ends of the spectrum (ranks 1–100 and 401–500).

Hypothesis

I anticipated rankings to be more volatile the further down the chart an app sits, meaning an app ranked No. 450 should be able to move more ranks in any given day than an app ranked No. 50. This hypothesis is based on the assumption that higher ranked apps have more installs, active users, and ratings, and that it would take a large margin to produce a noticeable shift in any of these factors.

Results

App Store Ranking Volatility of Top 500 Apps

One look at the chart above shows that apps in both stores have increasingly more volatile rankings (based on how many ranks they moved in the last 24 hours) the lower on the list they’re ranked.

This is particularly true when comparing either end of the spectrum, with a seemingly straight volatility line among Google Play’s Top 100 apps and very few blips within the App Store’s Top 100. Compare this section to the lower end, ranks 401–500, where both stores experience much more turbulence in their rankings. Across the gamut, I found a 24% correlation between rank and ranking volatility in the Play Store and 28% correlation in the App Store.

To put this into perspective, the average app in Google Play’s 401–500 ranks moved 12.1 ranks in the last 24 hours while the average app in the Top 100 moved a mere 1.4 ranks. For the App Store, these numbers were 64.28 and 11.26, making slightly lower-ranked apps more than five times as volatile as the highest ranked apps. (I say slightly as these “lower-ranked” apps are still ranked higher than 99.96% of all apps.)

The relationship between rank and volatility is pretty consistent across the App Store charts, while in Google Play rank has a much greater impact on volatility among the top-ranked apps (ranks 1–100 have a 35% correlation) than it does further down the chart (ranks 401–500 have a 1% correlation).
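For readers who want to run a similar check on their own data, here is a minimal sketch of a correlation between rank and daily movement. It assumes the percentages quoted in this study are Pearson correlation coefficients, which the post does not state explicitly, and the sample numbers are invented for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical data: current rank and ranks moved in the last 24 hours.
ranks = [10, 50, 100, 250, 450]
ranks_moved = [1, 2, 4, 20, 60]

r = pearson(ranks, ranks_moved)  # positive when lower-ranked apps move more
```

A coefficient computed on a slice of the chart (say, ranks 1–100 only) can differ a lot from the coefficient for the full top 500, which is why the per-segment figures above do not match the overall ones.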

Study #3: App store rankings across the stars

The next study looks at the relationship between rank and star ratings to determine any trends that set the top chart apps apart from the rest and explore any ties to app store ranking volatility.

Hypothesis

Ranking = fn(Rating, Rating Count, Installs, Trends)

As discussed in the introduction, this study relates directly to one of the factors commonly accepted as influential to app store rankings: average rating.

Getting started, I hypothesized that higher ranks generally correspond to higher ratings, cementing the role of star ratings in the ranking algorithm.

As far as volatility goes, I did not anticipate average rating to play a role in app store ranking volatility, as I saw no reason for higher rated apps to be less volatile than lower rated apps, or vice versa. Instead, I believed volatility to be tied to rating volume (as we’ll explore in our last study).

Results

Average App Store Ratings of Top Apps

The chart above plots the top 100 ranked apps in either store with their average rating (both historic and current, for App Store apps). If it looks a little chaotic, it’s just one indicator of the complexity of the ranking algorithms in Google Play and the App Store.

If our hypothesis was correct, we’d see a downward trend in ratings. We’d expect to see the No. 1 ranked app with a significantly higher rating than the No. 100 ranked app. Yet, in neither store is this the case. Instead, we get a seemingly random plot with no obvious trends that jump off the chart.

A closer examination, in tandem with what we already know about the app stores, reveals two other interesting points:

  1. The average star rating of the top 100 apps is significantly higher than that of the average app. Across the top charts, the average rating of a top 100 Android app was 4.319 and the average top iOS app was 3.935. These ratings are 0.32 and 0.27 points, respectively, above the average rating of all rated apps in either store. The averages across apps in the 401–500 ranks approximately split the difference between the ratings of the top ranked apps and the ratings of the average app.
  2. The rating distribution of top apps in Google Play was considerably more compact than the distribution of top iOS apps. The standard deviation of ratings in the Apple App Store top chart was over 2.5 times greater than that of the Google Play top chart, likely meaning that ratings are more heavily weighted in Google Play’s algorithm.

App Store Ranking Volatility and Average Rating

Looking next at the relationship between ratings and app store ranking volatility reveals a -15% correlation that is consistent across both app stores, meaning the higher an app is rated, the less its rank is likely to move in a 24-hour period. The exception to this rule is the Apple App Store’s calculation of an app’s current rating, for which I did not find a statistically significant correlation.

Study #4: App store rankings across versions

This next study looks at the relationship between the age of an app’s current version, its rank and its ranking volatility.

Hypothesis

Ranking = fn(Rating, Rating Count, Installs, Trends)

In alteration of the above function, I’m using the age of an app’s current version as a proxy (albeit not a very good one) for trends in app store ratings and app quality over time.

Making the assumptions that (a) apps that are updated more frequently are of higher quality and (b) each new update inspires a new wave of installs and ratings, I’m hypothesizing that the older the age of an app’s current version, the lower it will be ranked and the less volatile its rank will be.

Results

How update frequency correlates with app store rank

The first and possibly most important finding is that apps across the top charts in both Google Play and the App Store are updated remarkably often as compared to the average app.

At the time of conducting the study, the current version of the average iOS app on the top chart was only 28 days old; the current version of the average Android app was 38 days old.

As hypothesized, the age of the current version is negatively correlated with the app’s rank, with a 13% correlation in Google Play and a 10% correlation in the App Store.

How update frequency correlates with app store ranking volatility

The next part of the study maps the age of the current app version to its app store ranking volatility, finding that recently updated Android apps have less volatile rankings (correlation: 8.7%) while recently updated iOS apps have more volatile rankings (correlation: -3%).

Study #5: App store rankings across monthly active users

In the final study, I wanted to examine the role of an app’s popularity on its ranking. In an ideal world, popularity would be measured by an app’s monthly active users (MAUs), but since few mobile app developers have released this information, I’ve settled for two publicly available proxies: Rating Count and Installs.

Hypothesis

Ranking = fn(Rating, Rating Count, Installs, Trends)

For the same reasons indicated in the second study, I anticipated that more popular apps (e.g., apps with more ratings and more installs) would be higher ranked and less volatile in rank. This, again, takes into consideration that it takes more of a shift to produce a noticeable impact in average rating or any of the other commonly accepted influencers of an app’s ranking.

Results

Apps with more ratings and reviews typically rank higher

The first finding leaps straight off of the chart above: Android apps have been rated more times than iOS apps, 15.8x more, in fact.

The average app in Google Play’s Top 100 had a whopping 3.1 million ratings while the average app in the Apple App Store’s Top 100 had 196,000 ratings. In contrast, apps in the 401–500 ranks (still tremendously successful apps in the 99.96th percentile of all apps) tended to have between one-tenth (Android) and one-fifth (iOS) of the ratings count of apps in the top 100 ranks.

Considering that almost two-thirds of apps don’t have a single rating, reaching rating counts this high is a huge feat, and a very strong indicator of the influence of rating count in the app store ranking algorithms.

To even out the playing field a bit and help us visualize any correlation between ratings and rankings (and to give more credit to the still-staggering 196k ratings for the average top ranked iOS app), I’ve applied a logarithmic scale to the chart above:

The relationship between app store ratings and rankings in the top 100 apps

From this chart, we can see a correlation between ratings and rankings, such that apps with more ratings tend to rank higher. This equates to a 29% correlation in the App Store and a 40% correlation in Google Play.

Apps with more ratings typically experience less app store ranking volatility

Next up, I looked at how ratings count influenced app store ranking volatility, finding that apps with more ratings had less volatile rankings in the Apple App Store (correlation: 17%). No conclusive evidence was found within the Top 100 Google Play apps.

Apps with more installs and active users tend to rank higher in the app stores

And last but not least, I looked at install counts as an additional proxy for MAUs. (Sadly, this is a statistic only listed in Google Play, so any resulting conclusions are applicable only to Android apps.)

Among the top 100 Android apps, this last study found that installs were heavily correlated with ranks (correlation: -35.5%), meaning that apps with more installs are likely to rank higher in Google Play. Android apps with more installs also tended to have less volatile app store rankings, with a correlation of -16.5%.

Unfortunately, these numbers are slightly skewed as Google Play only provides install counts in broad ranges (e.g., 500k–1M). For each app, I took the low end of the range, meaning we can likely expect the correlation to be a little stronger since the low end was further away from the midpoint for apps with more installs.
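To make the low-end-versus-midpoint point concrete, here is a small sketch that parses an install range and measures how far the low end sits from the midpoint. The label format (shorthand like "500k–1M" with an en dash) follows the example above and is an assumption about how the data was recorded; the helper names are hypothetical.

```python
# Multipliers for shorthand suffixes (assumed label format, e.g. "500k" or "1M").
SUFFIXES = {"k": 1_000, "M": 1_000_000, "B": 1_000_000_000}


def parse_count(text):
    """Convert a shorthand count such as '500k' or '1M' to an integer."""
    text = text.strip()
    if text and text[-1] in SUFFIXES:
        return int(float(text[:-1]) * SUFFIXES[text[-1]])
    return int(text.replace(",", ""))


def install_bounds(range_label):
    """Split a label like '500k–1M' into (low, high) install counts."""
    low, high = range_label.split("–")
    return parse_count(low), parse_count(high)


def midpoint_gap(range_label):
    """Distance between the low end of the range and its midpoint."""
    low, high = install_bounds(range_label)
    return (low + high) / 2 - low


small_gap = midpoint_gap("500k–1M")
large_gap = midpoint_gap("50M–100M")
```

Because the ranges widen as install counts grow, the gap between the low end and the midpoint grows too, so using the low end understates installs most for the biggest apps.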

Summary

To make a long post ever so slightly shorter, here are the nuts and bolts unearthed in these five mad science studies in app store optimization:

  1. Across the top charts, Apple App Store rankings are 4.45x more volatile than those of Google Play
  2. Rankings become increasingly volatile the lower an app is ranked. This is particularly true across the Apple App Store’s top charts.
  3. In both stores, higher ranked apps tend to have an app store ratings count that far exceeds that of the average app.
  4. Ratings appear to matter more to the Google Play algorithm, especially as the Apple App Store top charts experience a much wider ratings distribution than that of Google Play’s top charts.
  5. The higher an app is rated, the less volatile its rankings are.
  6. The 100 highest ranked apps in either store are updated much more frequently than the average app, and apps with older current versions are correlated with lower rankings.
  7. An app’s update frequency is negatively correlated with Google Play’s ranking volatility but positively correlated with ranking volatility in the App Store. This is likely due to how Apple weighs an app’s most recent ratings and reviews.
  8. The highest ranked Google Play apps receive, on average, 15.8x more ratings than the highest ranked App Store apps.
  9. In both stores, apps that fall under the 401–500 ranks receive, on average, 10–20% of the rating volume seen by apps in the top 100.
  10. Rating volume and, by extension, installs or MAUs, is perhaps the best indicator of ranks, with a 29–40% correlation between the two.

Revisiting our first (albeit oversimplified) guess at the app stores’ ranking algorithm gives us this loosely defined function:

Ranking = fn(Rating, Rating Count, Installs, Trends)

I’d now re-write the function into a formula by weighing each of these four factors, where a, b, c, & d are unknown multipliers, or weights:

Ranking = (Rating * a) + (Rating Count * b) + (Installs * c) + (Trends * d)

These five studies on ASO shed a little more light on these multipliers, showing Rating Count to have the strongest correlation with rank, followed closely by Installs, in either app store.
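As a purely illustrative sketch, the weighted formula above can be written as a small function. The weights a, b, c, and d remain unknown; the values used here are placeholders chosen only to show the mechanics, not findings from these studies. The sketch also assumes each factor has already been normalized to a comparable scale, which the formula itself does not specify.

```python
def ranking_score(rating, rating_count, installs, trends, weights):
    """Weighted sum of the four factors; weights is a tuple (a, b, c, d)."""
    a, b, c, d = weights
    return rating * a + rating_count * b + installs * c + trends * d


# Placeholder weights (hypothetical) ordered so that b > c > d > a,
# loosely mirroring "Rating Count > Installs > Trends > Rating".
example_weights = (1, 4, 3, 2)

# Hypothetical, already-normalized factor values for one app.
score = ranking_score(rating=2, rating_count=5, installs=4, trends=1, weights=example_weights)
```

Swapping the last two weights would mirror an ordering in which Rating outranks Trends instead.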

It’s with the other two factors—rating and trends—that the two stores show the greatest discrepancy. I’d hazard a guess to say that the App Store prioritizes growth trends over ratings, given the importance it places on an app’s current version and the wide distribution of ratings across the top charts. Google Play, on the other hand, seems to favor ratings, with an unwritten rule that apps just about have to have at least four stars to make the top 100 ranks.

Thus, we conclude our mad science with this final glimpse into what it takes to make the top charts in either store:

Weight of factors in the Apple App Store ranking algorithm

Rating Count > Installs > Trends > Rating

Weight of factors in the Google Play ranking algorithm

Rating Count > Installs > Rating > Trends


Again, we’re oversimplifying for the sake of keeping this post to a mere 3,000 words, but additional factors including keyword density and in-app engagement statistics continue to be strong indicators of ranks. They simply lie outside the scope of these studies.

I hope you found this deep-dive both helpful and interesting. Moving forward, I also hope to see ASOs conducting the same experiments that have brought SEO to the center stage, and encourage you to enhance or refute these findings with your own ASO mad science experiments.

Please share your thoughts in the comments below, and let’s deconstruct the ranking formula together, one experiment at a time.


Has Google Gone Too Far with the Bias Toward Its Own Content?

Posted by ajfried

Since the beginning of SEO time, practitioners have been trying to crack the Google algorithm. Every once in a while, the industry gets a glimpse into how the search giant works and we have an opportunity to deconstruct it. We don’t get many of these opportunities, but when we do, assuming we spot them in time, we try to take advantage of them so we can “fix the Internet.”

On Feb. 16, 2015, news started to circulate that NBC would start removing images and references of Brian Williams from its website.

This was it!

A golden opportunity.

This was our chance to learn more about the Knowledge Graph.

Expectation vs. reality

Often it’s difficult to predict what Google is truly going to do. We expect something to happen, but in reality it’s nothing like we imagined.

Expectation

What we expected to see was that Google would change the source of the image. Typically, if you hover over the image in the Knowledge Graph, it reveals the location of the image.

Keanu-Reeves-Image-Location.gif

This would mean that if the image disappeared from its original source, then the image displayed in the Knowledge Graph would likely change or even disappear entirely.

Reality (February 2015)

The only problem was, there was no official source (this changed, as you will soon see) and identifying where the image was coming from proved extremely challenging. In fact, when you clicked on the image, it took you to an image search result that didn’t even include the image.

Could it be? Had Google started its own database of owned or licensed images and was giving it priority over any other sources?

In order to find the source, we tried taking the image from the Knowledge Graph and using “search by image” in images.google.com to find others like it. For the NBC Nightly News image, Google failed to even locate a match to the image it was actually using anywhere on the Internet. For other television programs, it was successful. Here is an example of what happened for Morning Joe:

Morning_Joe_image_search.png

So we found the potential source. In fact, we found three potential sources. Seemed kind of strange, but this seemed to be the discovery we were looking for.

This looks like Google is using someone else’s content and not referencing it. These images have a source, but Google is choosing not to show it.

Then Google pulled the ol’ switcheroo.

New reality (March 2015)

Then things changed, and Google decided to attach a source to its images. Unfortunately, I had mistakenly assumed that hovering over an image showed the same thing as the file path at the bottom, but I was wrong. The URL you see when you hover over an image in the Knowledge Graph is actually nothing more than the title. The source is different.

Morning_Joe_Source.png

Luckily, I still had two screenshots I took when I first saw this saved on my desktop. Success. One screen capture was from NBC Nightly News, and the other from the news show Morning Joe (see above) showing that the source was changed.

NBC-nightly-news-crop.png

(NBC Nightly News screenshot.)

The source is a Google-owned property: gstatic.com. You can clearly see the difference in the source change. What started as a hypothesis is now a fact. Google is certainly creating a database of images.
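If you want to repeat this check across many Knowledge Graph images, a small helper can flag which image URLs are served from gstatic.com. This is only a rough sketch: the function name `isGoogleHosted` and the sample URLs are made up for illustration, and only the gstatic.com domain comes from the observation above.

```javascript
// Sketch: decide whether an image URL is served from Google's gstatic.com.
// isGoogleHosted and the sample URLs are illustrative assumptions.
function isGoogleHosted(imageUrl) {
  const host = new URL(imageUrl).hostname.toLowerCase();
  // Match gstatic.com itself or any subdomain of it.
  return host === "gstatic.com" || host.endsWith(".gstatic.com");
}

console.log(isGoogleHosted("https://t0.gstatic.com/images?q=example")); // true
console.log(isGoogleHosted("https://www.example.com/show/cast.jpg"));   // false
```

Comparing the full hostname, rather than searching the URL for the string "gstatic", avoids false matches on look-alike domains.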

If this is the direction Google is moving, then it is creating all kinds of potential risks for brands and individuals. The implications are a loss of control for any brand that is looking to optimize its Knowledge Graph results. As well, it seems this poses a conflict of interest to Google, whose mission is to organize the world’s information, not license and prioritize it.

How do we think Google is supposed to work?

Google is an information-retrieval system tasked with sourcing information from across the web and supplying the most relevant results to users’ searches. In recent months, the search giant has taken a more direct approach by answering questions and assumed questions in the Answer Box, some of which come from un-credited sources. Google has clearly demonstrated that it is building a knowledge base of facts that it uses as the basis for its Answer Boxes. When it sources information from that knowledge base, it doesn’t necessarily reference or credit any source.

However, I would argue there is a difference between an un-credited Answer Box and an un-credited image. An un-credited Answer Box provides a fact that is indisputable, part of the public domain, and unlikely to change (e.g., what year was Abraham Lincoln shot? How long is the George Washington Bridge?). Answer Boxes that offer more than just a basic fact (an opinion, instructions, etc.) always credit their sources.

There are four possibilities when it comes to Google referencing content:

  • Option 1: It credits the content because someone else owns the rights to it
  • Option 2: It doesn’t credit the content because it’s part of the public domain, as seen in some Answer Box results
  • Option 3: It doesn’t reference it because it owns or has licensed the content. If you search for “Chicken Pox” or other diseases, Google appears to be using images from licensed medical illustrators. The same goes for song lyrics, which Eric Enge discusses here: Google providing credit for content. This adds to the speculation that Google is giving preference to its own content by displaying it over everything else.
  • Option 4: It doesn’t credit the content, but neither does it necessarily own the rights to the content. This is a very gray area, and is where Google seemed to be back in February. If this were the case, it would imply that Google is “stealing” content—which I find hard to believe, but felt was necessary to include in this post for the sake of completeness.

Is this an isolated incident?

At Five Blocks, whenever we see these anomalies in search results, we try to compare the term in question against others like it. This is a categorization concept we use to bucket individuals or companies into similar groups. When we do this, we uncover some incredible trends that help us determine what a search result “should” look like for a given group. For example, when looking at searches for a group of people or companies in an industry, this grouping gives us a sense of how much social media presence the group has on average or how much media coverage it typically gets.

Upon further investigation of terms similar to NBC Nightly News (other news shows), we noticed the un-credited image scenario appeared to be a trend in February, but now all of the images are being hosted on gstatic.com. When we broadened the categories further to TV shows and movies, the trend persisted. Rather than show an image in the Knowledge Graph and reference its actual source, Google tends to show an image and reference its own database of stored images.

And just to ensure this wasn’t a case of tunnel vision, we researched other categories, including sports teams, actors and video games, in addition to spot-checking other genres.

Unlike terms for specific TV shows and movies, terms in each of these other groups all link to the actual source in the Knowledge Graph.

Immediate implications

It’s easy to ignore this and say “Well, it’s Google. They are always doing something.” However, there are some serious implications to these actions:

  1. The TV shows/movies aren’t receiving their due credit because, from within the Knowledge Graph, there is no actual reference to the show’s official site
  2. The more Google moves toward licensing and then retrieving their own information, the more biased they become, preferring their own content over the equivalent—or possibly even superior—content from another source
  3. It feels wrong and misleading to get a Google Image Search result rather than an actual site because:
    • The search doesn’t include the original image
    • Considering how poor Image Search results are normally, it feels like a poor experience
  4. If Google is moving toward licensing as much content as possible, then it could make the Knowledge Graph infinitely more complicated when there is a “mistake” or something unflattering. How could one go about changing what Google shows about them?

Google is objectively becoming subjective

It is clear that Google is attempting to create databases of information, including lyrics stored in Google Play, photos, and, previously, facts in Freebase (which is now Wikidata and not owned by Google).

I am not normally one to point my finger and accuse Google of wrongdoing. But this really strikes me as an odd move, one bordering on a clear bias to direct users to stay within the search engine. The fact is, we trust Google with a heck of a lot of information with our searches. In return, I believe we should expect Google to return an array of relevant information for searchers to decide what they like best. The example cited above seems harmless, but what about determining which is the right religion? Or even who the prettiest girl in the world is?

Religion-and-beauty-queries.png

Questions such as these, which Google is returning credited answers for, could return results that are perceived as facts.

Should we next expect Google to decide who is objectively the best service provider (e.g., pizza chain, painter, or accountant), then feature them in an un-credited answer box? The direction Google is moving right now, it feels like we should be calling into question their objectivity.

But that’s only my (subjective) opinion.

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!


Try Your Hand at A/B Testing for a Chance to Win the Email Subject Line Contest

Posted by danielburstein

This blog post ends with an opportunity for you to win a stay at the ARIA in Vegas and a ticket to Email Summit, but it begins with an essential question for marketers…

How can you improve already successful marketing, advertising, websites and copywriting?

Today’s Moz blog post is unique. Not only are we going to teach you how to address this challenge, we’re going to offer an example that you can dig into to help drive home the lesson.

Give the people what they want

Some copy and design is so bad, the fixes are obvious. Maybe you shouldn’t insult the customer in the headline. Maybe you should update the website that still uses a dot matrix font.

But when you’re already doing well, how can you continue to improve?

I don’t have the answer for you, but I’ll tell you who does – your customers.

There are many tricks, gimmicks and technology you can use in marketing, but when you strip away all the hype and rhetoric, successful marketing is pretty straightforward – clearly communicate the value your offer provides to people who will pay you for that value.

Easier said than done, of course.

So how do you determine what customers want? And the best way to deliver it to them?

Well, there are many ways to learn from customers, such as focus groups, surveys and social listening. While there is value in asking people what they want, there is also a major challenge in it. “People’s ability to understand the factors that affect their behavior is surprisingly poor,” according to research from Dr. Noah J. Goldstein, Associate Professor of Management and Organizations, UCLA Anderson School of Management.

Or, as Malcolm Gladwell more glibly puts it when referring to coffee choices, “The mind knows not what the tongue wants.”

Not to say that opinion-based customer preference research is bad. It can be helpful. However, it should be the beginning and not the end of your quest.

…by seeing what they actually do

You can use what you learn from opinion-based research to create a hypothesis about what customers want, and then run an experiment to see how they actually behave in real-world customer interactions with your product, marketing messages, and website.

The technique that powers this kind of research is often known as A/B testing, split testing, landing page optimization, and/or website optimization. If you are testing more than one thing at a time, it may also be referred to as multi-variate testing.

To offer a simple example, you might assume that customers buy your product because it tastes great. Or because it’s less filling. So you could create two landing pages – one with a headline that promotes that taste (treatment A) and another that mentions the low carbs (treatment B). You then send half the traffic that visits that URL to each version and see which performs better.
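To make the "send half the traffic to each version" step concrete, here is a minimal sketch of one way a split could be assigned. It assumes each visitor has some stable identifier, and the helper names `hashString` and `assignTreatment` are hypothetical. It is not how any particular testing tool does it.

```javascript
// Sketch: assign each visitor to treatment A or B, roughly 50/50,
// in a way that stays stable across repeat visits.
// hashString and assignTreatment are hypothetical helper names.
function hashString(text) {
  // FNV-1a 32-bit hash: simple and deterministic.
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

function assignTreatment(visitorId) {
  // Buckets 0-49 see the "taste" headline (A), 50-99 the "low carbs" one (B).
  return hashString(visitorId) % 100 < 50 ? "A" : "B";
}

// Count how 10,000 made-up visitor IDs are divided.
const counts = { A: 0, B: 0 };
for (let i = 0; i < 10000; i++) {
  counts[assignTreatment("visitor-" + i)]++;
}
console.log(counts);
```

Hashing the visitor ID, rather than flipping a coin on every page view, means a returning visitor keeps seeing the same treatment.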

Here is a simple visual that Joey Taravella, Content Writer, MECLABS created to illustrate the concept…

That’s just one test. To really learn about your customers, you must continue the process and create a testing-optimization cycle in your organization – continue to run A/B tests, record the findings, learn from them, create more hypotheses, and test again based on these hypotheses.

This is true marketing experimentation, and helps you build your theory of the customer.

But you probably know all that already. So here’s your chance to practice while helping us shape an A/B test. You might even win a prize in the process.

The email subject line contest

The Moz Blog and MarketingExperiments Blog have joined forces to run a unique marketing experimentation contest. We’re presenting you with a real challenge from a real organization (VolunteerMatch) and asking you to write a subject line to test (it’s simple, just leave your subject line as a comment in this blog post).

We’re going to pick three subject lines suggested by readers of The Moz Blog and three from the MarketingExperiments Blog and run a test with this organization’s customers. Whoever writes the best performing subject line will win a stay at the ARIA Resort in Las Vegas as well as a two-day ticket to MarketingSherpa Email Summit 2015 to help them gain lessons to further improve their marketing.

Sound good? OK, let’s dive in and tell you more about your “client”…

Craft the best-performing subject line to win the prize

Every year at Email Summit, we run a live A/B test where the audience helps craft the experiment. We then run, validate, close the experiment, and share the results during Summit as a way to teach about marketing experimentation. We have typically run the experiment using MarketingSherpa as the “client” website to test (MarketingExperiments and MarketingSherpa are sister publications, both owned by MECLABS Institute).

However, this year we wanted to try something different and interviewed three national non-profits to find a new “client” for our tests.

We chose VolunteerMatch – a nonprofit organization that uses the power of technology to make it easier for good people and good causes to connect. One of the key reasons we chose VolunteerMatch is because it is an already successful organization looking to further improve. (Here is a case study explaining one of its successful implementations – Lead Management: How a B2B SaaS nonprofit decreased its sales cycle 99%).

Another reason we chose VolunteerMatch for this opportunity is that it has three types of customers, so the lessons from the content we create can help marketers across a wide range of sales models. VolunteerMatch’s customers are:

  • People who want to volunteer (B2C)
  • Non-profit organizations looking for volunteers (non-profit)
  • Businesses looking for corporate volunteering solutions (B2B) to which it offers a Software-as-a-Service product through VolunteerMatch Solutions

Designing the experiment

After we took VolunteerMatch on as the Research Partner “client,” Jon Powell, Senior Executive Research and Development Manager, MECLABS, worked with Shari Tishman, Director of Engagement and Lauren Wagner, Senior Manager of Engagement, VolunteerMatch, to understand their challenges, take a look at their current assets and performance, and craft a design of experiments to determine what further knowledge about its customers would help VolunteerMatch improve performance.

That design of experiments includes a series of split tests – including the live test we’re going to run at Email Summit, as well as the one you have an opportunity to take part in by writing a subject line in the comments section of this blog post. Let’s take a look at that experiment…

The challenge

VolunteerMatch wants to increase the response rate of the corporate email list (B2B) by discovering the best possible messaging to use. In order to find out, MarketingExperiments wants to run an A/B split test to determine the best messaging.

However, the B2B list is smaller than the volunteer/cause list (B2C), which makes it harder to reach statistical significance and determine which messaging is most effective.
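To see why list size matters, here is a rough sketch of a common sample-size approximation for comparing two response rates. It is not the calculation MECLABS used. The 95% confidence level, the 80% power, and the example rates are all assumptions chosen for illustration.

```javascript
// Sketch: approximate recipients needed per variation to detect a
// difference between two response rates.
// Assumes a two-sided 95% confidence level and 80% power; both z-values
// and the example rates are illustrative assumptions.
const Z_ALPHA = 1.96;   // two-sided 95% confidence
const Z_BETA = 0.8416;  // 80% power

function sampleSizePerVariation(baselineRate, expectedRate) {
  const variance =
    baselineRate * (1 - baselineRate) + expectedRate * (1 - expectedRate);
  const effect = expectedRate - baselineRate;
  return Math.ceil(((Z_ALPHA + Z_BETA) ** 2 * variance) / effect ** 2);
}

// A small improvement needs far more recipients than a large one.
console.log(sampleSizePerVariation(0.02, 0.025));
console.log(sampleSizePerVariation(0.02, 0.03));
```

The required list size grows with the inverse square of the difference you hope to detect, which is why a small list struggles to separate similar subject lines.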

So we’re going to run a messaging test to the B2C list. This isn’t without its challenges though, because most individuals on the B2C list are not likely to immediately connect with B2B corporate solutions messaging.

So the question is…

How do we create an email that is relevant (to the B2C list), which doesn’t ask too much, that simultaneously helps us discover the most relevant aspect of the solutions (B2B) product (if any)?

The approach – Here’s where you come in

This is where the Moz and MarketingExperiments community comes in to help.

We would like you to craft subject lines relevant to the B2C list, which highlight various benefits of the corporate solutions tool.

We have broken down the corporate solutions tool into three main categories of benefit for the SaaS product. In the comments section below, include which category you are writing a subject line for along with what you think is an effective subject line.

The crew at Moz and MarketingExperiments will then choose the top subject line in each category to test. Below you will find the emails that will be sent as part of the test. They are identical, except for the subject lines (which you will write) and the bolded line in the third paragraph (that ties into that category of value).

Category #1: Proof, recognition, credibility


Category #2: Better, more opportunities to choose from


Category #3: Ease-of-use

About VolunteerMatch’s brand

Since we’re asking you to try your hand at crafting messaging for this example “client,” here is some more information about the brand to inform your messaging…


VolunteerMatch’s brand identity


VolunteerMatch’s core values

Ten things VolunteerMatch believes:

  1. People want to do good
  2. Every great cause should be able to find the help it needs
  3. People want to improve their lives and communities through volunteering
  4. You can’t make a difference without making a connection
  5. In putting the power of technology to good use
  6. Businesses are serious about making a difference
  7. In building relationships based on trust and excellent service
  8. In partnering with like-minded organizations to create systems that result in even greater impact
  9. The passion of our employees drives the success of our products, services and mission
  10. In being great at what we do

And now, we test…

To participate, you must leave your comment with your idea for a subject line before midnight on Tuesday, January 13, 2015. The contest is open to all residents of the 50 US states, the District of Columbia, and Canada (excluding Quebec), 18 or older. If you want more info, here are the official rules.

When you enter your subject line in the comments section, also include which category you’re entering for (and if you have an idea outside these categories, let us know…we just might drop it in the test).

Next, the Moz marketing team will pick the subject lines they think will perform best in each category from all the comments on The Moz Blog, and the MarketingExperiments team will pick the subject lines we think will perform the best in each category from all the comments on the MarketingExperiments Blog.

We’ll give the VolunteerMatch team a chance to approve the subject lines based on their brand standards, then test all six to eight subject lines and report back to you through the Moz and MarketingExperiments blogs which subject lines won and why they won to help you improve your already successful marketing.

So, what have you got? Write your best subject lines in the comments section below. I look forward to seeing what you come up with.

Related resources

If you’re interested in learning more about marketing experimentation and A/B testing, you might find these links helpful…

And here’s a look at a previous subject line writing contest we’ve run to give you some ideas for your entry…



6 Things I Wish I Knew Before Using Optimizely

Posted by tallen1985

Diving into Conversion Rate Optimization (CRO) for the first time can be a challenge. You are faced with a whole armoury of new tools, each containing a huge variety of features. Optimizely is one of those tools you will quickly encounter and through this post I’m going to cover 6 features I wish I had known from day one that have helped improve test performance/debugging and the ability to track results accurately.

1. You don’t have to use the editor

The editor within Optimizely is a useful tool if you don’t have much experience working with code. The editor should be used for making simple visual changes, such as changing an image, adjusting copy or making minor layout changes.

If you are looking to make changes that change the behaviour of the page rather than just straightforward visual changes, then the editor can become troublesome. In this case you should use the “Edit Code” feature at the foot of the editor.

For any large-scale changes to the site, such as completely redesigning the page, Optimizely should be used for traffic allocation and not editing pages. To do this:

1. Build a new version of the page outside of Optimizely

2. Upload the variation page to your site.
Important: Ensure that the variation page is noindexed.

We now have two variations of our page:

www.myhomepage.com & www.myhomepage.com/variation1

3. Select the variation drop down menu and click Redirect to a new page

4. Enter the variation URL, apply the settings and save the experiment. You can now use Optimizely as an A/B test management tool to allocate traffic, exclude traffic/device types, and gather further test data.
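Since a forgotten noindex on the variation page is easy to miss, a quick check of the page’s HTML can help. The sketch below is one possible way to do it. `hasNoindexMeta` is a hypothetical helper that only inspects meta tags in a string of HTML, so it will not notice a noindex sent via an HTTP header.

```javascript
// Sketch: check whether a page's HTML carries a robots "noindex" directive.
// hasNoindexMeta is a hypothetical helper; it only inspects <meta> tags and
// does not look at X-Robots-Tag HTTP headers or robots.txt.
function hasNoindexMeta(html) {
  const metaTags = html.match(/<meta\b[^>]*>/gi) || [];
  return metaTags.some(
    (tag) =>
      /\bname\s*=\s*["']?robots["']?/i.test(tag) &&
      /\bcontent\s*=\s*["'][^"']*\bnoindex\b/i.test(tag)
  );
}

const variationHtml =
  '<html><head><meta name="robots" content="noindex, follow"></head></html>';
const originalHtml =
  '<html><head><meta name="viewport" content="width=device-width"></head></html>';

console.log(hasNoindexMeta(variationHtml)); // true
console.log(hasNoindexMeta(originalHtml));  // false
```

You would feed it the source of www.myhomepage.com/variation1 before setting the redirect live.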

If you do use the editor be aware of excess code

One problem to be aware of here is that each time you move or change an element Optimizely adds a new line of code. The variation code below actually repositions the h2 title four times.

Instead, when using the editor, we should make sure that we strip out any excess code. If you move and save a page element multiple times, open the “Edit Code” tab at the foot of the page and delete any excess code. For example, the following positions my h2 title in exactly the same position as before with three fewer lines of code. Over the course of multiple changes, this excess code can result in an increase of load time for Optimizely.
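The idea behind the clean-up is that only the last position of an element matters. The sketch below models that with plain objects. The `{selector, styles}` records and the pixel values are a hypothetical stand-in for the generated lines, not Optimizely’s actual output format.

```javascript
// Sketch: collapse repeated style changes so each selector is styled once.
// The {selector, styles} records are a hypothetical stand-in for the lines
// the editor generates; they are not Optimizely's actual output format.
function collapseChanges(changes) {
  const merged = new Map();
  for (const { selector, styles } of changes) {
    // Later changes overwrite earlier values for the same property.
    merged.set(selector, { ...(merged.get(selector) || {}), ...styles });
  }
  return [...merged].map(([selector, styles]) => ({ selector, styles }));
}

// Four moves of the same h2, as in the example described above.
const generated = [
  { selector: "h2", styles: { left: "10px", top: "5px" } },
  { selector: "h2", styles: { left: "40px", top: "12px" } },
  { selector: "h2", styles: { left: "25px", top: "30px" } },
  { selector: "h2", styles: { left: "60px", top: "18px" } },
];

console.log(collapseChanges(generated));
```

Four generated changes collapse into a single one that leaves the h2 in the same final position.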


2. Enabling analytics tracking

Turning on analytics tracking seems obvious, right? In fact, why would we even need to turn it on in the first place? Surely it would be on by default?

Optimizely currently sets analytics tracking to off by default. As a result, if you don’t manually change the setting, nothing will be reported to your analytics platform of choice.

To turn on analytics tracking, simply open the settings in the top right corner from within the editor mode and select Analytics Integration.

Turn on the relevant analytics tracking. If you are using Google Analytics, then at this point you should assign a vacant custom variable slot (for Classic Analytics) or a vacant custom dimension (Universal Analytics) to the experiment.

Once the test is live, wait for a while (up to 24 hours), then check to be sure the data is reporting correctly within the custom segments.


3. Test your variations in a live environment

Before you set your test live, it’s important that you test the new variation to ensure everything works as expected. To do this we need to see the test in a live environment while ensuring no customers see the test versions yet. I’ve suggested a couple of ways to do this below:

Query parameter targeting

Query parameter targeting is available on all accounts and is our preferred method for sharing live versions with clients, mainly because once set up, it is as simple as sharing a URL.

1. Click the audiences icon at the top of the page 

2. Select create a new audience

3. Drag Query Parameters from the possible conditions and enter parameters of your choice.

4. Click Apply and save the experiment.

5. To view the experiment visit the test URL with query parameters added. In the above example the URL would be:
http://www.distilled.net?test=variation
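Conceptually, the audience condition is just a check on the page’s query string. The sketch below shows that logic in isolation. `matchesTestAudience` is a hypothetical helper written for illustration, not part of Optimizely.

```javascript
// Sketch: does a URL carry the query parameter that unlocks the test?
// matchesTestAudience is a hypothetical helper, not an Optimizely API.
function matchesTestAudience(pageUrl, paramName, paramValue) {
  const params = new URL(pageUrl).searchParams;
  return params.get(paramName) === paramValue;
}

console.log(
  matchesTestAudience("http://www.distilled.net?test=variation", "test", "variation")
); // true
console.log(
  matchesTestAudience("http://www.distilled.net/", "test", "variation")
); // false
```

Regular visitors never have the parameter in their URL, so they keep seeing the original page.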

Cookie targeting

1. Open the browser and create a bookmark on any page

2. Edit the bookmark and change both properties to:

a) Name: Set A Test Cookie

b) URL: The following JavaScript code:

javascript:(function(){ var hostname = window.location.hostname; var parts = hostname.split("."); var publicSuffix = hostname; var last = parts[parts.length - 1]; var expireDate = new Date(); expireDate.setDate(expireDate.getDate() + 7); var TOP_LEVEL_DOMAINS = ["com", "local", "net", "org", "xxx", "edu", "es", "gov", "biz", "info", "fr", "gr", "nl", "ca", "de", "kr", "it", "me", "ly", "tv", "mx", "cn", "jp", "il", "in", "iq"]; var SPECIAL_DOMAINS = ["jp", "uk", "au"]; if(parts.length > 2 && SPECIAL_DOMAINS.indexOf(last) != -1){ publicSuffix = parts[parts.length - 3] + "."+ parts[parts.length - 2] + "."+ last} else if(parts.length > 1 && TOP_LEVEL_DOMAINS.indexOf(last) != -1) {publicSuffix = parts[parts.length - 2] + "."+ last} document.cookie = "optly_"+publicSuffix.split(".")[0]+"_test=true; domain=."+publicSuffix+"; path=/; expires="+expireDate.toGMTString()+";"; })();

You should end up with the following:

3. Open the page where you want to place the cookie and click the bookmark

4. The cookie will now be set on the domain you are browsing and will look something like: ‘optly_YOURDOMAINNAME_test=true’

Next we need to target our experiment to only allow visitors who have the cookie set to see test variations.

5. Click the audiences icon at the top of the page

6. Select create a new audience

7. Drag Cookie into the Conditions and change the name to optly_YOURDOMAINNAME_test=true

8. Click Apply and save the experiment.

Source:
https://help.optimizely.com/hc/en-us/articles/200293784-Setting-a-test-cookie-for-your-site
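Because the bookmarklet has to live on a single line, it is hard to read. Below is a readable sketch of the same cookie-name logic, pulled out into a function so you can predict what YOURDOMAINNAME will be for a given site. The function name `buildTestCookie` is hypothetical; the domain lists are copied from the bookmarklet.

```javascript
// Readable sketch of what the bookmarklet computes. buildTestCookie is a
// hypothetical name; the domain lists are copied from the bookmarklet.
const TOP_LEVEL_DOMAINS = ["com", "local", "net", "org", "xxx", "edu", "es",
  "gov", "biz", "info", "fr", "gr", "nl", "ca", "de", "kr", "it", "me", "ly",
  "tv", "mx", "cn", "jp", "il", "in", "iq"];
const SPECIAL_DOMAINS = ["jp", "uk", "au"];

function buildTestCookie(hostname, now = new Date()) {
  const parts = hostname.split(".");
  const last = parts[parts.length - 1];
  let publicSuffix = hostname;
  if (parts.length > 2 && SPECIAL_DOMAINS.includes(last)) {
    // e.g. shop.example.co.uk -> example.co.uk
    publicSuffix = parts.slice(-3).join(".");
  } else if (parts.length > 1 && TOP_LEVEL_DOMAINS.includes(last)) {
    // e.g. www.distilled.net -> distilled.net
    publicSuffix = parts.slice(-2).join(".");
  }
  const expires = new Date(now);
  expires.setDate(expires.getDate() + 7); // cookie lasts seven days
  return {
    name: "optly_" + publicSuffix.split(".")[0] + "_test",
    value: "true",
    domain: "." + publicSuffix,
    expires,
  };
}

console.log(buildTestCookie("www.distilled.net").name);
console.log(buildTestCookie("shop.example.co.uk").domain);
```

Unlike the bookmarklet, this version does not touch `document.cookie`; it only returns the pieces so they can be inspected.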

IP address targeting (only available on Enterprise accounts)

Using IP address targeting is useful when you are looking to test variations in house and on a variety of different devices and browsers.

1. Click the audiences icon at the top of the page

2. Select create a new audience

3. Drag IP Address from the possible conditions and enter the IP address being used. (Not sure of your IP address? Head to http://whatismyipaddress.com/)

4. Click Apply and Save the experiment.


4. Force variations using parameters when debugging pages

There will be times, particularly when testing new variations, when you need to view a specific variation. Obviously this can be an issue if your browser has already been bucketed into an alternative variation. Optimizely overcomes this by allowing you to force the variation you wish to view, simply using query parameters.

The query parameter is structured in the following way: optimizely_xEXPERIMENTID=VARIATIONINDEX

1. The EXPERIMENTID can be found in the browser URL

2. VARIATIONINDEX is the variation you want to run: 0 is for the original, 1 is variation #1, 2 is variation #2, etc.

3. Using the above example to force a variation, we would use the following URL structure to display variation 1 of our experiment:
http://www.yourwebsite.com/?optimizely_x1845540742=1

Source:
https://help.optimizely.com/hc/en-us/articles/200107480-Forcing-a-specific-variation-to-run-and-other-advanced-URL-parameters
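If you find yourself building these URLs often, a tiny helper saves typos. This is a minimal sketch. `forceVariationUrl` is a hypothetical name, and the experiment ID is the one from the example above.

```javascript
// Sketch: build a URL that forces a given variation of an experiment.
// forceVariationUrl is a hypothetical helper, not an Optimizely API.
function forceVariationUrl(pageUrl, experimentId, variationIndex) {
  const url = new URL(pageUrl);
  // Parameter format described above: optimizely_xEXPERIMENTID=VARIATIONINDEX
  url.searchParams.set("optimizely_x" + experimentId, String(variationIndex));
  return url.toString();
}

console.log(forceVariationUrl("http://www.yourwebsite.com/", "1845540742", 1));
// http://www.yourwebsite.com/?optimizely_x1845540742=1
```

Using `searchParams.set` keeps any query parameters already on the page URL intact.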


5. Don’t change the traffic allocation sliders

Once a test is live, it is important not to change the amount of traffic allocated to each variation. Doing so can massively affect test results, as one version would potentially begin to receive more return visitors, who in turn have a much higher chance of converting.

My colleague Tom Capper discussed the do’s and don’ts of statistical significance earlier this year, where he explained:

“At the start of your test, you decide to play it safe and set your traffic allocation to 90/10. After a time, it seems the variation is non-disastrous, and you decide to move the slider to 50/50. But return visitors are still always assigned their original group, so now you have a situation where the original version has a larger proportion of return visitors, who are far more likely to convert.”

To summarize, if you do need to adjust the amount of traffic allocated to each test variation, you should look to restart the test to have complete confidence that the data you receive is accurate.
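A quick back-of-the-envelope calculation shows how large this effect can be. The sketch below walks through the 90/10 to 50/50 scenario from the quote with two pages that perform identically. Every number in it (visitor counts, return share, conversion rates) is a made-up assumption for illustration.

```javascript
// Sketch: how moving the slider mid-test skews the mix of return visitors.
// Every number below is a made-up assumption for illustration.
function blendedRate(newVisitors, returnVisitors, newRate, returnRate) {
  const conversions = newVisitors * newRate + returnVisitors * returnRate;
  return conversions / (newVisitors + returnVisitors);
}

const phase1 = { original: 9000, variation: 1000 };    // 90/10 split
const phase2New = { original: 5000, variation: 5000 }; // 50/50 split
const returnShare = 0.3; // assumed share of phase-1 visitors who come back
const newRate = 0.02;    // assumed conversion rate for new visitors
const returnRate = 0.06; // assumed conversion rate for return visitors

// Return visitors stay in the group they were first assigned to.
const returning = {
  original: phase1.original * returnShare,
  variation: phase1.variation * returnShare,
};

const originalRate = blendedRate(
  phase2New.original, returning.original, newRate, returnRate);
const variationRate = blendedRate(
  phase2New.variation, returning.variation, newRate, returnRate);

console.log(originalRate.toFixed(4), variationRate.toFixed(4));
```

Even though both pages convert new and return visitors at exactly the same rates, the original comes out ahead in the second phase purely because it inherited more return visitors.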


6. Use segmentation to generate better analysis

Okay, I understand this one isn’t strictly about Optimizely, but it is certainly worth keeping in mind, particularly earlier on in the CRO process when producing hypotheses around device type.

Conversion rates can vary greatly, particularly when we start segmenting data by locations, browsers, medium, return visits vs new visits, just to name a few. However, by using segmentation we can unearth opportunities that we may have previously overlooked, allowing us to generate new hypotheses for future experiments.


Example

You have been running a test for a month and unfortunately the results are inconclusive. The test version of the page didn’t perform any better or worse than the original. Overall the test results look like the following:


Page Version    Visitors    Transactions    Conversion Rate
Original        41781       1196            2.86%
Variation       42355       1225            2.89%

In this case the test variation overall has performed only 1% better than the original, with a significance of 60%. With these results this test variation certainly wouldn’t be getting rolled out any time soon.
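For anyone curious where figures like these come from, here is a rough sketch that recomputes them from the table using a pooled two-proportion z-test with a one-sided confidence figure. Whether Optimizely calculated its significance in exactly this way is an assumption on my part, and the function names are hypothetical.

```javascript
// Sketch: conversion rates, relative lift, and a one-sided confidence
// figure from a pooled two-proportion z-test. Whether Optimizely computed
// its "significance" this way is an assumption.
function normalCdf(z) {
  // Polynomial approximation of the standard normal CDF
  // (Abramowitz and Stegun 26.2.17).
  const t = 1 / (1 + 0.2316419 * Math.abs(z));
  const poly = t * (0.319381530 + t * (-0.356563782 +
    t * (1.781477937 + t * (-1.821255978 + t * 1.330274429))));
  const upperTail = (Math.exp((-z * z) / 2) / Math.sqrt(2 * Math.PI)) * poly;
  return z >= 0 ? 1 - upperTail : upperTail;
}

function compare(original, variation) {
  const rateA = original.conversions / original.visitors;
  const rateB = variation.conversions / variation.visitors;
  const pooled = (original.conversions + variation.conversions) /
    (original.visitors + variation.visitors);
  const stdError = Math.sqrt(pooled * (1 - pooled) *
    (1 / original.visitors + 1 / variation.visitors));
  const z = (rateB - rateA) / stdError;
  return { rateA, rateB, lift: rateB / rateA - 1, confidence: normalCdf(z) };
}

// Figures taken from the results table above.
const result = compare(
  { visitors: 41781, conversions: 1196 },
  { visitors: 42355, conversions: 1225 }
);
console.log(result);
```

With the figures from the table, this lands on roughly a 1% relative lift and a confidence of about 60%, in line with the numbers quoted above.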

However, when these results are segmented by device they tell a very different story:

Drilling into the desktop results, we actually find that the test variation saw a 10% increase in conversions over the original with 97% significance. Yet those using a tablet were converting way below the original, thus driving down the overall conversion rates we were seeing in the first table.

Ultimately, with this data we would be able to generate a new hypothesis of “we believe the variation will increase conversion rate for users on a desktop”. We would then re-run the test for desktop-only users to verify the previous data and the new hypothesis.

Using segmented data here could also potentially help the experiment reach significance at a much faster rate, as explained in this video from Opticon 2014.

Should the new test be successful and achieve significance, we would serve users on desktops the new variation, whilst those on mobile and tablets continue to be shown the original site.

Key takeaways

  • Always turn on Google Analytics tracking (and then double check it is turned on).
  • If you plan to make behavioural changes to a page, use the JavaScript editor rather than the drag and drop feature.
  • Use IP address targeting for device testing and query parameters to share a live test with clients.
  • If you need to change the traffic allocation to test variations you should restart the test.
  • Be aware that test performance can vary greatly based on device.

What problems and solutions have you come across when creating CRO experiments with Optimizely? What pieces of information do you wish you had known 6 months ago?
