Stop Ghost Spam in Google Analytics with One Filter

Posted by CarloSeo

The spam in Google Analytics (GA) is becoming a serious issue. Due to a deluge of referral spam from social buttons, adult sites, and many, many other sources, people are starting to become overwhelmed by all the filters they are setting up to manage the useless data they are receiving.

The good news is, there is no need to panic. In this post, I’m going to focus on the most common mistakes people make when fighting spam in GA, and explain an efficient way to prevent it.

But first, let’s make sure we understand how spam works. A couple of months ago, Jared Gardner wrote an excellent article explaining what referral spam is, including its intended purpose. He also pointed out some great examples of referral spam.

Types of spam

The spam in Google Analytics can be categorized by two types: ghosts and crawlers.

Ghosts

The vast majority of spam is this type. They are called ghosts because they never access your site. It is important to keep this in mind, as it’s key to creating a more efficient solution for managing spam.

As unusual as it sounds, this type of spam doesn’t have any interaction with your site at all. You may wonder how that is possible since one of the main purposes of GA is to track visits to our sites.

They do it by using the Measurement Protocol, which allows people to send data directly to Google Analytics’ servers. Using this method, and probably randomly generated tracking codes (UA-XXXXX-1) as well, the spammers leave a “visit” with fake data, without even knowing who they are hitting.

Crawlers

This type of spam, the opposite to ghost spam, does access your site. As the name implies, these spam bots crawl your pages, ignoring rules like those found in robots.txt that are supposed to stop them from reading your site. When they exit your site, they leave a record on your reports that appears similar to a legitimate visit.

Crawlers are harder to identify because they know their targets and use real data. But it is also true that new ones seldom appear. So if you detect a referral in your analytics that looks suspicious, researching it on Google or checking it against this list might help you answer the question of whether or not it is spammy.

Most common mistakes made when dealing with spam in GA

I’ve been following this issue closely for the last few months. According to the comments people have made on my articles and conversations I’ve found in discussion forums, there are primarily three mistakes people make when dealing with spam in Google Analytics.

Mistake #1. Blocking ghost spam from the .htaccess file

One of the biggest mistakes people make is trying to block Ghost Spam from the .htaccess file.

For those who are not familiar with this file, one of its main functions is to allow/block access to your site. Now we know that ghosts never reach your site, so adding them here won’t have any effect and will only add useless lines to your .htaccess file.

Ghost spam usually shows up for a few days and then disappears. As a result, sometimes people think that they successfully blocked it from here when really it’s just a coincidence of timing.

Then when the spammers later return, they get worried because the solution is not working anymore, and they think the spammer somehow bypassed the barriers they set up.

The truth is, the .htaccess file can only effectively block crawlers such as buttons-for-website.com and a few others since these access your site. Most of the spam can’t be blocked using this method, so there is no other option than using filters to exclude them.

Mistake #2. Using the referral exclusion list to stop spam

Another error is trying to use the referral exclusion list to stop the spam. The name may confuse you, but this list is not intended to exclude referrals in the way we want to for the spam. It has other purposes.

For example, when a customer buys something, sometimes they get redirected to a third-party page for payment. After making a payment, they’re redirected back to you website, and GA records that as a new referral. It is appropriate to use referral exclusion list to prevent this from happening.

If you try to use the referral exclusion list to manage spam, however, the referral part will be stripped since there is no preexisting record. As a result, a direct visit will be recorded, and you will have a bigger problem than the one you started with since. You will still have spam, and direct visits are harder to track.

Mistake #3. Worrying that bounce rate changes will affect rankings

When people see that the bounce rate changes drastically because of the spam, they start worrying about the impact that it will have on their rankings in the SERPs.

bounce.png

This is another mistake commonly made. With or without spam, Google doesn’t take into consideration Google Analytics metrics as a ranking factor. Here is an explanation about this from Matt Cutts, the former head of Google’s web spam team.

And if you think about it, Cutts’ explanation makes sense; because although many people have GA, not everyone uses it.

Assuming your site has been hacked

Another common concern when people see strange landing pages coming from spam on their reports is that they have been hacked.

landing page

The page that the spam shows on the reports doesn’t exist, and if you try to open it, you will get a 404 page. Your site hasn’t been compromised.

But you have to make sure the page doesn’t exist. Because there are cases (not spam) where some sites have a security breach and get injected with pages full of bad keywords to defame the website.

What should you worry about?

Now that we’ve discarded security issues and their effects on rankings, the only thing left to worry about is your data. The fake trail that the spam leaves behind pollutes your reports.

It might have greater or lesser impact depending on your site traffic, but everyone is susceptible to the spam.

Small and midsize sites are the most easily impacted – not only because a big part of their traffic can be spam, but also because usually these sites are self-managed and sometimes don’t have the support of an analyst or a webmaster.

Big sites with a lot of traffic can also be impacted by spam, and although the impact can be insignificant, invalid traffic means inaccurate reports no matter the size of the website. As an analyst, you should be able to explain what’s going on in even in the most granular reports.

You only need one filter to deal with ghost spam

Usually it is recommended to add the referral to an exclusion filter after it is spotted. Although this is useful for a quick action against the spam, it has three big disadvantages.

  • Making filters every week for every new spam detected is tedious and time-consuming, especially if you manage many sites. Plus, by the time you apply the filter, and it starts working, you already have some affected data.
  • Some of the spammers use direct visits along with the referrals.
  • These direct hits won’t be stopped by the filter so even if you are excluding the referral you will sill be receiving invalid traffic, which explains why some people have seen an unusual spike in direct traffic.

Luckily, there is a good way to prevent all these problems. Most of the spam (ghost) works by hitting GA’s random tracking-IDs, meaning the offender doesn’t really know who is the target, and for that reason either the hostname is not set or it uses a fake one. (See report below)

Ghost-Spam.png

You can see that they use some weird names or don’t even bother to set one. Although there are some known names in the list, these can be easily added by the spammer.

On the other hand, valid traffic will always use a real hostname. In most of the cases, this will be the domain. But it also can also result from paid services, translation services, or any other place where you’ve inserted GA tracking code.

Valid-Referral.png

Based on this, we can make a filter that will include only hits that use real hostnames. This will automatically exclude all hits from ghost spam, whether it shows up as a referral, keyword, or pageview; or even as a direct visit.

To create this filter, you will need to find the report of hostnames. Here’s how:

  1. Go to the Reporting tab in GA
  2. Click on Audience in the lefthand panel
  3. Expand Technology and select Network
  4. At the top of the report, click on Hostname

Valid-list

You will see a list of all hostnames, including the ones that the spam uses. Make a list of all the valid hostnames you find, as follows:

  • yourmaindomain.com
  • blog.yourmaindomain.com
  • es.yourmaindomain.com
  • payingservice.com
  • translatetool.com
  • anotheruseddomain.com

For small to medium sites, this list of hostnames will likely consist of the main domain and a couple of subdomains. After you are sure you got all of them, create a regular expression similar to this one:

yourmaindomain\.com|anotheruseddomain\.com|payingservice\.com|translatetool\.com

You don’t need to put all of your subdomains in the regular expression. The main domain will match all of them. If you don’t have a view set up without filters, create one now.

Then create a Custom Filter.

Make sure you select INCLUDE, then select “Hostname” on the filter field, and copy your expression into the Filter Pattern box.

filter

You might want to verify the filter before saving to check that everything is okay. Once you’re ready, set it to save, and apply the filter to all the views you want (except the view without filters).

This single filter will get rid of future occurrences of ghost spam that use invalid hostnames, and it doesn’t require much maintenance. But it’s important that every time you add your tracking code to any service, you add it to the end of the filter.

Now you should only need to take care of the crawler spam. Since crawlers access your site, you can block them by adding these lines to the .htaccess file:

## STOP REFERRER SPAM 
RewriteCond %{HTTP_REFERER} semalt\.com [NC,OR] 
RewriteCond %{HTTP_REFERER} buttons-for-website\.com [NC] 
RewriteRule .* - [F]

It is important to note that this file is very sensitive, and misplacing a single character it it can bring down your entire site. Therefore, make sure you create a backup copy of your .htaccess file prior to editing it.

If you don’t feel comfortable messing around with your .htaccess file, you can alternatively make an expression with all the crawlers, then and add it to an exclude filter by Campaign Source.

Implement these combined solutions, and you will worry much less about spam contaminating your analytics data. This will have the added benefit of freeing up more time for you to spend actually analyze your valid data.

After stopping spam, you can also get clean reports from the historical data by using the same expressions in an Advance Segment to exclude all the spam.

Bonus resources to help you manage spam

If you still need more information to help you understand and deal with the spam on your GA reports, you can read my main article on the subject here: http://www.ohow.co/what-is-referrer-spam-how-stop-it-guide/.

Additional information on how to stop spam can be found at these URLs:

In closing, I am eager to hear your ideas on this serious issue. Please share them in the comments below.

(Editor’s Note: All images featured in this post were created by the author.)

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!

Reblogged 4 years ago from tracking.feedpress.it

How to Use Server Log Analysis for Technical SEO

Posted by SamuelScott

It’s ten o’clock. Do you know where your logs are?

I’m introducing this guide with a pun on a common public-service announcement that has run on late-night TV news broadcasts in the United States because log analysis is something that is extremely newsworthy and important.

If your technical and on-page SEO is poor, then nothing else that you do will matter. Technical SEO is the key to helping search engines to crawl, parse, and index websites, and thereby rank them appropriately long before any marketing work begins.

The important thing to remember: Your log files contain the only data that is 100% accurate in terms of how search engines are crawling your website. By helping Google to do its job, you will set the stage for your future SEO work and make your job easier. Log analysis is one facet of technical SEO, and correcting the problems found in your logs will help to lead to higher rankings, more traffic, and more conversions and sales.

Here are just a few reasons why:

  • Too many response code errors may cause Google to reduce its crawling of your website and perhaps even your rankings.
  • You want to make sure that search engines are crawling everything, new and old, that you want to appear and rank in the SERPs (and nothing else).
  • It’s crucial to ensure that all URL redirections will pass along any incoming “link juice.”

However, log analysis is something that is unfortunately discussed all too rarely in SEO circles. So, here, I wanted to give the Moz community an introductory guide to log analytics that I hope will help. If you have any questions, feel free to ask in the comments!

What is a log file?

Computer servers, operating systems, network devices, and computer applications automatically generate something called a log entry whenever they perform an action. In a SEO and digital marketing context, one type of action is whenever a page is requested by a visiting bot or human.

Server log entries are specifically programmed to be output in the Common Log Format of the W3C consortium. Here is one example from Wikipedia with my accompanying explanations:

127.0.0.1 user-identifier frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326
  • 127.0.0.1 — The remote hostname. An IP address is shown, like in this example, whenever the DNS hostname is not available or DNSLookup is turned off.
  • user-identifier — The remote logname / RFC 1413 identity of the user. (It’s not that important.)
  • frank — The user ID of the person requesting the page. Based on what I see in my Moz profile, Moz’s log entries would probably show either “SamuelScott” or “392388” whenever I visit a page after having logged in.
  • [10/Oct/2000:13:55:36 -0700] — The date, time, and timezone of the action in question in strftime format.
  • GET /apache_pb.gif HTTP/1.0 — “GET” is one of the two commands (the other is “POST”) that can be performed. “GET” fetches a URL while “POST” is submitting something (such as a forum comment). The second part is the URL that is being accessed, and the last part is the version of HTTP that is being accessed.
  • 200 — The status code of the document that was returned.
  • 2326 — The size, in bytes, of the document that was returned.

Note: A hyphen is shown in a field when that information is unavailable.

Every single time that you — or the Googlebot — visit a page on a website, a line with this information is output, recorded, and stored by the server.

Log entries are generated continuously and anywhere from several to thousands can be created every second — depending on the level of a given server, network, or application’s activity. A collection of log entries is called a log file (or often in slang, “the log” or “the logs”), and it is displayed with the most-recent log entry at the bottom. Individual log files often contain a calendar day’s worth of log entries.

Accessing your log files

Different types of servers store and manage their log files differently. Here are the general guides to finding and managing log data on three of the most-popular types of servers:

What is log analysis?

Log analysis (or log analytics) is the process of going through log files to learn something from the data. Some common reasons include:

  • Development and quality assurance (QA) — Creating a program or application and checking for problematic bugs to make sure that it functions properly
  • Network troubleshooting — Responding to and fixing system errors in a network
  • Customer service — Determining what happened when a customer had a problem with a technical product
  • Security issues — Investigating incidents of hacking and other intrusions
  • Compliance matters — Gathering information in response to corporate or government policies
  • Technical SEO — This is my favorite! More on that in a bit.

Log analysis is rarely performed regularly. Usually, people go into log files only in response to something — a bug, a hack, a subpoena, an error, or a malfunction. It’s not something that anyone wants to do on an ongoing basis.

Why? This is a screenshot of ours of just a very small part of an original (unstructured) log file:

Ouch. If a website gets 10,000 visitors who each go to ten pages per day, then the server will create a log file every day that will consist of 100,000 log entries. No one has the time to go through all of that manually.

How to do log analysis

There are three general ways to make log analysis easier in SEO or any other context:

  • Do-it-yourself in Excel
  • Proprietary software such as Splunk or Sumo-logic
  • The ELK Stack open-source software

Tim Resnik’s Moz essay from a few years ago walks you through the process of exporting a batch of log files into Excel. This is a (relatively) quick and easy way to do simple log analysis, but the downside is that one will see only a snapshot in time and not any overall trends. To obtain the best data, it’s crucial to use either proprietary tools or the ELK Stack.

Splunk and Sumo-Logic are proprietary log analysis tools that are primarily used by enterprise companies. The ELK Stack is a free and open-source batch of three platforms (Elasticsearch, Logstash, and Kibana) that is owned by Elastic and used more often by smaller businesses. (Disclosure: We at Logz.io use the ELK Stack to monitor our own internal systems as well as for the basis of our own log management software.)

For those who are interested in using this process to do technical SEO analysis, monitor system or application performance, or for any other reason, our CEO, Tomer Levy, has written a guide to deploying the ELK Stack.

Technical SEO insights in log data

However you choose to access and understand your log data, there are many important technical SEO issues to address as needed. I’ve included screenshots of our technical SEO dashboard with our own website’s data to demonstrate what to examine in your logs.

Bot crawl volume

It’s important to know the number of requests made by Baidu, BingBot, GoogleBot, Yahoo, Yandex, and others over a given period time. If, for example, you want to get found in search in Russia but Yandex is not crawling your website, that is a problem. (You’d want to consult Yandex Webmaster and see this article on Search Engine Land.)

Response code errors

Moz has a great primer on the meanings of the different status codes. I have an alert system setup that tells me about 4XX and 5XX errors immediately because those are very significant.

Temporary redirects

Temporary 302 redirects do not pass along the “link juice” of external links from the old URL to the new one. Almost all of the time, they should be changed to permanent 301 redirects.

Crawl budget waste

Google assigns a crawl budget to each website based on numerous factors. If your crawl budget is, say, 100 pages per day (or the equivalent amount of data), then you want to be sure that all 100 are things that you want to appear in the SERPs. No matter what you write in your robots.txt file and meta-robots tags, you might still be wasting your crawl budget on advertising landing pages, internal scripts, and more. The logs will tell you — I’ve outlined two script-based examples in red above.

If you hit your crawl limit but still have new content that should be indexed to appear in search results, Google may abandon your site before finding it.

Duplicate URL crawling

The addition of URL parameters — typically used in tracking for marketing purposes — often results in search engines wasting crawl budgets by crawling different URLs with the same content. To learn how to address this issue, I recommend reading the resources on Google and Search Engine Land here, here, here, and here.

Crawl priority

Google might be ignoring (and not crawling or indexing) a crucial page or section of your website. The logs will reveal what URLs and/or directories are getting the most and least attention. If, for example, you have published an e-book that attempts to rank for targeted search queries but it sits in a directory that Google only visits once every six months, then you won’t get any organic search traffic from the e-book for up to six months.

If a part of your website is not being crawled very often — and it is updated often enough that it should be — then you might need to check your internal-linking structure and the crawl-priority settings in your XML sitemap.

Last crawl date

Have you uploaded something that you hope will be indexed quickly? The log files will tell you when Google has crawled it.

Crawl budget

One thing I personally like to check and see is Googlebot’s real-time activity on our site because the crawl budget that the search engine assigns to a website is a rough indicator — a very rough one — of how much it “likes” your site. Google ideally does not want to waste valuable crawling time on a bad website. Here, I had seen that Googlebot had made 154 requests of our new startup’s website over the prior twenty-four hours. Hopefully, that number will go up!

As I hope you can see, log analysis is critically important in technical SEO. It’s eleven o’clock — do you know where your logs are now?

Additional resources

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!

Reblogged 4 years ago from tracking.feedpress.it

Deconstructing the App Store Rankings Formula with a Little Mad Science

Posted by AlexApptentive

After seeing Rand’s “Mad Science Experiments in SEO” presented at last year’s MozCon, I was inspired to put on the lab coat and goggles and do a few experiments of my own—not in SEO, but in SEO’s up-and-coming younger sister, ASO (app store optimization).

Working with Apptentive to guide enterprise apps and small startup apps alike to increase their discoverability in the app stores, I’ve learned a thing or two about app store optimization and what goes into an app’s ranking. It’s been my personal goal for some time now to pull back the curtains on Google and Apple. Yet, the deeper into the rabbit hole I go, the more untested assumptions I leave in my way.

Hence, I thought it was due time to put some longstanding hypotheses through the gauntlet.

As SEOs, we know how much of an impact a single ranking can mean on a SERP. One tiny rank up or down can make all the difference when it comes to your website’s traffic—and revenue.

In the world of apps, ranking is just as important when it comes to standing out in a sea of more than 1.3 million apps. Apptentive’s recent mobile consumer survey shed a little more light this claim, revealing that nearly half of all mobile app users identified browsing the app store charts and search results (the placement on either of which depends on rankings) as a preferred method for finding new apps in the app stores. Simply put, better rankings mean more downloads and easier discovery.

Like Google and Bing, the two leading app stores (the Apple App Store and Google Play) have a complex and highly guarded algorithms for determining rankings for both keyword-based app store searches and composite top charts.

Unlike SEO, however, very little research and theory has been conducted around what goes into these rankings.

Until now, that is.

Over the course of five studies analyzing various publicly available data points for a cross-section of the top 500 iOS (U.S. Apple App Store) and the top 500 Android (U.S. Google Play) apps, I’ll attempt to set the record straight with a little myth-busting around ASO. In the process, I hope to assess and quantify any perceived correlations between app store ranks, ranking volatility, and a few of the factors commonly thought of as influential to an app’s ranking.

But first, a little context

Image credit: Josh Tuininga, Apptentive

Both the Apple App Store and Google Play have roughly 1.3 million apps each, and both stores feature a similar breakdown by app category. Apps ranking in the two stores should, theoretically, be on a fairly level playing field in terms of search volume and competition.

Of these apps, nearly two-thirds have not received a single rating and 99% are considered unprofitable. These studies, therefore, single out the rare exceptions to the rule—the top 500 ranked apps in each store.

While neither Apple nor Google have revealed specifics about how they calculate search rankings, it is generally accepted that both app store algorithms factor in:

  • Average app store rating
  • Rating/review volume
  • Download and install counts
  • Uninstalls (what retention and churn look like for the app)
  • App usage statistics (how engaged an app’s users are and how frequently they launch the app)
  • Growth trends weighted toward recency (how daily download counts changed over time and how today’s ratings compare to last week’s)
  • Keyword density of the app’s landing page (Ian did a great job covering this factor in a previous Moz post)

I’ve simplified this formula to a function highlighting the four elements with sufficient data (or at least proxy data) for our analysis:

Ranking = fn(Rating, Rating Count, Installs, Trends)

Of course, right now, this generalized function doesn’t say much. Over the next five studies, however, we’ll revisit this function before ultimately attempting to compare the weights of each of these four variables on app store rankings.

(For the purpose of brevity, I’ll stop here with the assumptions, but I’ve gone into far greater depth into how I’ve reached these conclusions in a 55-page report on app store rankings.)

Now, for the Mad Science.

Study #1: App-les to app-les app store ranking volatility

The first, and most straight forward of the five studies involves tracking daily movement in app store rankings across iOS and Android versions of the same apps to determine any trends of differences between ranking volatility in the two stores.

I went with a small sample of five apps for this study, the only criteria for which were that:

  • They were all apps I actively use (a criterion for coming up with the five apps but not one that influences rank in the U.S. app stores)
  • They were ranked in the top 500 (but not the top 25, as I assumed app store rankings would be stickier at the top—an assumption I’ll test in study #2)
  • They had an almost identical version of the app in both Google Play and the App Store, meaning they should (theoretically) rank similarly
  • They covered a spectrum of app categories

The apps I ultimately chose were Lyft, Venmo, Duolingo, Chase Mobile, and LinkedIn. These five apps represent the travel, finance, education banking, and social networking categories.

Hypothesis

Going into this analysis, I predicted slightly more volatility in Apple App Store rankings, based on two statistics:

Both of these assumptions will be tested in later analysis.

Results

7-Day App Store Ranking Volatility in the App Store and Google Play

Among these five apps, Google Play rankings were, indeed, significantly less volatile than App Store rankings. Among the 35 data points recorded, rankings within Google Play moved by as much as 23 positions/ranks per day while App Store rankings moved up to 89 positions/ranks. The standard deviation of ranking volatility in the App Store was, furthermore, 4.45 times greater than that of Google Play.

Of course, the same apps varied fairly dramatically in their rankings in the two app stores, so I then standardized the ranking volatility in terms of percent change to control for the effect of numeric rank on volatility. When cast in this light, App Store rankings changed by as much as 72% within a 24-hour period while Google Play rankings changed by no more than 9%.

Also of note, daily rankings tended to move in the same direction across the two app stores approximately two-thirds of the time, suggesting that the two stores, and their customers, may have more in common than we think.

Study #2: App store ranking volatility across the top charts

Testing the assumption implicit in standardizing the data in study No. 1, this one was designed to see if app store ranking volatility is correlated with an app’s current rank. The sample for this study consisted of the top 500 ranked apps in both Google Play and the App Store, with special attention given to those on both ends of the spectrum (ranks 1–100 and 401–500).

Hypothesis

I anticipated rankings to be more volatile the higher an app is ranked—meaning an app ranked No. 450 should be able to move more ranks in any given day than an app ranked No. 50. This hypothesis is based on the assumption that higher ranked apps have more installs, active users, and ratings, and that it would take a large margin to produce a noticeable shift in any of these factors.

Results

App Store Ranking Volatility of Top 500 Apps

One look at the chart above shows that apps in both stores have increasingly more volatile rankings (based on how many ranks they moved in the last 24 hours) the lower on the list they’re ranked.

This is particularly true when comparing either end of the spectrum—with a seemingly straight volatility line among Google Play’s Top 100 apps and very few blips within the App Store’s Top 100. Compare this section to the lower end, ranks 401–)500, where both stores experience much more turbulence in their rankings. Across the gamut, I found a 24% correlation between rank and ranking volatility in the Play Store and 28% correlation in the App Store.

To put this into perspective, the average app in Google Play’s 401–)500 ranks moved 12.1 ranks in the last 24 hours while the average app in the Top 100 moved a mere 1.4 ranks. For the App Store, these numbers were 64.28 and 11.26, making slightly lower-ranked apps more than five times as volatile as the highest ranked apps. (I say slightly as these “lower-ranked” apps are still ranked higher than 99.96% of all apps.)

The relationship between rank and volatility is pretty consistent across the App Store charts, while rank has a much greater impact on volatility at the lower end of Google Play charts (ranks 1-100 have a 35% correlation) than it does at the upper end (ranks 401-500 have a 1% correlation).

Study #3: App store rankings across the stars

The next study looks at the relationship between rank and star ratings to determine any trends that set the top chart apps apart from the rest and explore any ties to app store ranking volatility.

Hypothesis

Ranking = fn(Rating, Rating Count, Installs, Trends)

As discussed in the introduction, this study relates directly to one of the factors commonly accepted as influential to app store rankings: average rating.

Getting started, I hypothesized that higher ranks generally correspond to higher ratings, cementing the role of star ratings in the ranking algorithm.

As far as volatility goes, I did not anticipate average rating to play a role in app store ranking volatility, as I saw no reason for higher rated apps to be less volatile than lower rated apps, or vice versa. Instead, I believed volatility to be tied to rating volume (as we’ll explore in our last study).

Results

Average App Store Ratings of Top Apps

The chart above plots the top 100 ranked apps in either store with their average rating (both historic and current, for App Store apps). If it looks a little chaotic, it’s just one indicator of the complexity of ranking algorithm in Google Play and the App Store.

If our hypothesis was correct, we’d see a downward trend in ratings. We’d expect to see the No. 1 ranked app with a significantly higher rating than the No. 100 ranked app. Yet, in neither store is this the case. Instead, we get a seemingly random plot with no obvious trends that jump off the chart.

A closer examination, in tandem with what we already know about the app stores, reveals two other interesting points:

  1. The average star rating of the top 100 apps is significantly higher than that of the average app. Across the top charts, the average rating of a top 100 Android app was 4.319 and the average top iOS app was 3.935. These ratings are 0.32 and 0.27 points, respectively, above the average rating of all rated apps in either store. The averages across apps in the 401–)500 ranks approximately split the difference between the ratings of the top ranked apps and the ratings of the average app.
  2. The rating distribution of top apps in Google Play was considerably more compact than the distribution of top iOS apps. The standard deviation of ratings in the Apple App Store top chart was over 2.5 times greater than that of the Google Play top chart, likely meaning that ratings are more heavily weighted in Google Play’s algorithm.

App Store Ranking Volatility and Average Rating

Looking next at the relationship between ratings and app store ranking volatility reveals a -15% correlation that is consistent across both app stores; meaning the higher an app is rated, the less its rank it likely to move in a 24-hour period. The exception to this rule is the Apple App Store’s calculation of an app’s current rating, for which I did not find a statistically significant correlation.

Study #4: App store rankings across versions

This next study looks at the relationship between the age of an app’s current version, its rank and its ranking volatility.

Hypothesis

Ranking = fn(Rating, Rating Count, Installs, Trends)

In alteration of the above function, I’m using the age of a current app’s version as a proxy (albeit not a very good one) for trends in app store ratings and app quality over time.

Making the assumptions that (a) apps that are updated more frequently are of higher quality and (b) each new update inspires a new wave of installs and ratings, I’m hypothesizing that the older the age of an app’s current version, the lower it will be ranked and the less volatile its rank will be.

Results

How update frequency correlates with app store rank

The first and possibly most important finding is that apps across the top charts in both Google Play and the App Store are updated remarkably often as compared to the average app.

At the time of conducting the study, the current version of the average iOS app on the top chart was only 28 days old; the current version of the average Android app was 38 days old.

As hypothesized, the age of the current version is negatively correlated with the app’s rank, with a 13% correlation in Google Play and a 10% correlation in the App Store.

How update frequency correlates with app store ranking volatility

The next part of the study maps the age of the current app version to its app store ranking volatility, finding that recently updated Android apps have less volatile rankings (correlation: 8.7%) while recently updated iOS apps have more volatile rankings (correlation: -3%).

Study #5: App store rankings across monthly active users

In the final study, I wanted to examine the role of an app’s popularity on its ranking. In an ideal world, popularity would be measured by an app’s monthly active users (MAUs), but since few mobile app developers have released this information, I’ve settled for two publicly available proxies: Rating Count and Installs.

Hypothesis

Ranking = fn(Rating, Rating Count, Installs, Trends)

For the same reasons indicated in the second study, I anticipated that more popular apps (e.g., apps with more ratings and more installs) would be higher ranked and less volatile in rank. This, again, takes into consideration that it takes more of a shift to produce a noticeable impact in average rating or any of the other commonly accepted influencers of an app’s ranking.

Results

Apps with more ratings and reviews typically rank higher

The first finding leaps straight off of the chart above: Android apps have been rated more times than iOS apps, 15.8x more, in fact.

The average app in Google Play’s Top 100 had a whopping 3.1 million ratings while the average app in the Apple App Store’s Top 100 had 196,000 ratings. In contrast, apps in the 401–)500 ranks (still tremendously successful apps in the 99.96 percentile of all apps) tended to have between one-tenth (Android) and one-fifth (iOS) of the ratings count as that of those apps in the top 100 ranks.

Considering that almost two-thirds of apps don’t have a single rating, reaching rating counts this high is a huge feat, and a very strong indicator of the influence of rating count in the app store ranking algorithms.

To even out the playing field a bit and help us visualize any correlation between ratings and rankings (and to give more credit to the still-staggering 196k ratings for the average top ranked iOS app), I’ve applied a logarithmic scale to the chart above:

The relationship between app store ratings and rankings in the top 100 apps

From this chart, we can see a correlation between ratings and rankings, such that apps with more ratings tend to rank higher. This equates to a 29% correlation in the App Store and a 40% correlation in Google Play.

Apps with more ratings typically experience less app store ranking volatility

Next up, I looked at how ratings count influenced app store ranking volatility, finding that apps with more ratings had less volatile rankings in the Apple App Store (correlation: 17%). No conclusive evidence was found within the Top 100 Google Play apps.

Apps with more installs and active users tend to rank higher in the app stores

And last but not least, I looked at install counts as an additional proxy for MAUs. (Sadly, this is a statistic only listed in Google Play. so any resulting conclusions are applicable only to Android apps.)

Among the top 100 Android apps, this last study found that installs were heavily correlated with ranks (correlation: -35.5%), meaning that apps with more installs are likely to rank higher in Google Play. Android apps with more installs also tended to have less volatile app store rankings, with a correlation of -16.5%.

Unfortunately, these numbers are slightly skewed as Google Play only provides install counts in broad ranges (e.g., 500k–)1M). For each app, I took the low end of the range, meaning we can likely expect the correlation to be a little stronger since the low end was further away from the midpoint for apps with more installs.

Summary

To make a long post ever so slightly shorter, here are the nuts and bolts unearthed in these five mad science studies in app store optimization:

  1. Across the top charts, Apple App Store rankings are 4.45x more volatile than those of Google Play
  2. Rankings become increasingly volatile the lower an app is ranked. This is particularly true across the Apple App Store’s top charts.
  3. In both stores, higher ranked apps tend to have an app store ratings count that far exceeds that of the average app.
  4. Ratings appear to matter more to the Google Play algorithm, especially as the Apple App Store top charts experience a much wider ratings distribution than that of Google Play’s top charts.
  5. The higher an app is rated, the less volatile its rankings are.
  6. The 100 highest ranked apps in either store are updated much more frequently than the average app, and apps with older current versions are correlated with lower ratings.
  7. An app’s update frequency is negatively correlated with Google Play’s ranking volatility but positively correlated with ranking volatility in the App Store. This likely due to how Apple weighs an app’s most recent ratings and reviews.
  8. The highest ranked Google Play apps receive, on average, 15.8x more ratings than the highest ranked App Store apps.
  9. In both stores, apps that fall under the 401–500 ranks receive, on average, 10–20% of the rating volume seen by apps in the top 100.
  10. Rating volume and, by extension, installs or MAUs, is perhaps the best indicator of ranks, with a 29–40% correlation between the two.

Revisiting our first (albeit oversimplified) guess at the app stores’ ranking algorithm gives us this loosely defined function:

Ranking = fn(Rating, Rating Count, Installs, Trends)

I’d now re-write the function into a formula by weighing each of these four factors, where a, b, c, & d are unknown multipliers, or weights:

Ranking = (Rating * a) + (Rating Count * b) + (Installs * c) + (Trends * d)

These five studies on ASO shed a little more light on these multipliers, showing Rating Count to have the strongest correlation with rank, followed closely by Installs, in either app store.

It’s with the other two factors—rating and trends—that the two stores show the greatest discrepancy. I’d hazard a guess to say that the App Store prioritizes growth trends over ratings, given the importance it places on an app’s current version and the wide distribution of ratings across the top charts. Google Play, on the other hand, seems to favor ratings, with an unwritten rule that apps just about have to have at least four stars to make the top 100 ranks.

Thus, we conclude our mad science with this final glimpse into what it takes to make the top charts in either store:

Weight of factors in the Apple App Store ranking algorithm

Rating Count > Installs > Trends > Rating

Weight of factors in the Google Play ranking algorithm

Rating Count > Installs > Rating > Trends


Again, we’re oversimplifying for the sake of keeping this post to a mere 3,000 words, but additional factors including keyword density and in-app engagement statistics continue to be strong indicators of ranks. They simply lie outside the scope of these studies.

I hope you found this deep-dive both helpful and interesting. Moving forward, I also hope to see ASOs conducting the same experiments that have brought SEO to the center stage, and encourage you to enhance or refute these findings with your own ASO mad science experiments.

Please share your thoughts in the comments below, and let’s deconstruct the ranking formula together, one experiment at a time.

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!

Reblogged 4 years ago from tracking.feedpress.it

How to optimize Joomla Website for search engines

How to optimize Joomla Website for search engines. Recorded by Onifade Azeez of www.olaaz.com.

Reblogged 4 years ago from www.youtube.com

Social Media & SEO Tips, Tricks, and Trends [RECORDED WEBINAR]

Social media is now an essential component of small business marketing, but it does not replace more “traditional” forms of online marketing for businesses, …

Reblogged 5 years ago from www.youtube.com

How to optimize Joomla Website Page Titles for SEO

How to optimize Joomla Website Page Titles for SEO Recorded by Onifade Azeez of www.olaaz.com.

Reblogged 5 years ago from www.youtube.com