Stop Ghost Spam in Google Analytics with One Filter

Posted by CarloSeo

Spam in Google Analytics (GA) is becoming a serious issue. Due to a deluge of referral spam from social buttons, adult sites, and many, many other sources, people are starting to feel overwhelmed by all the filters they are setting up to manage the useless data they are receiving.

The good news is, there is no need to panic. In this post, I’m going to focus on the most common mistakes people make when fighting spam in GA, and explain an efficient way to prevent it.

But first, let’s make sure we understand how spam works. A couple of months ago, Jared Gardner wrote an excellent article explaining what referral spam is, including its intended purpose. He also pointed out some great examples of referral spam.

Types of spam

Spam in Google Analytics can be divided into two types: ghosts and crawlers.

Ghosts

The vast majority of spam is this type. They are called ghosts because they never access your site. It is important to keep this in mind, as it’s key to creating a more efficient solution for managing spam.

As unusual as it sounds, this type of spam doesn’t have any interaction with your site at all. You may wonder how that is possible since one of the main purposes of GA is to track visits to our sites.

They do it by using the Measurement Protocol, which allows people to send data directly to Google Analytics’ servers. Using this method, and probably randomly generated tracking codes (UA-XXXXX-1) as well, the spammers leave a “visit” with fake data, without even knowing who they are hitting.
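To make the mechanism concrete, here is a minimal Python sketch of the query string that a Universal Analytics Measurement Protocol (version 1) pageview hit carries. It only builds the payload and never sends it. The tracking ID, client ID and referrer are hypothetical placeholder values. The parameter names (v, tid, cid, t, dr, dp, dh) follow Google's v1 parameter reference. Nothing in the hit requires loading your site, and the hostname parameter (dh) can simply be left out.

```python
from urllib.parse import parse_qs, urlencode

# Universal Analytics Measurement Protocol (v1) collection endpoint.
# This sketch only builds a payload; it never sends a request.
COLLECT_URL = "https://www.google-analytics.com/collect"


def build_ghost_hit(tracking_id: str, referrer: str) -> str:
    """Build the query string of a pageview hit of the kind ghost spam sends.

    Nothing here touches the target site, and the document hostname ("dh")
    is simply omitted, so the hit carries no real hostname.
    """
    params = {
        "v": "1",            # protocol version
        "tid": tracking_id,  # property ID, guessed rather than known
        "cid": "555",        # arbitrary client ID (hypothetical value)
        "t": "pageview",     # hit type
        "dr": referrer,      # fake referrer the spammer wants you to notice
        "dp": "/",           # page path
    }
    return urlencode(params)


payload = build_ghost_hit("UA-12345-1", "http://spam-example.test/")
fields = parse_qs(payload)
print(COLLECT_URL + "?" + payload)
print("hostname sent:", "dh" in fields)  # False: no hostname in the hit
```

The sender chooses every field. A ghost hit therefore either omits the hostname or fills in whatever the spammer likes, and the hostname filter described later in this post relies on that.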

Crawlers

Unlike ghost spam, this type does access your site. As the name implies, these spam bots crawl your pages, ignoring rules like those found in robots.txt that are supposed to stop them from reading your site. When they exit your site, they leave a record in your reports that looks similar to a legitimate visit.

Crawlers are harder to identify because they know their targets and use real data. On the other hand, new ones seldom appear. If you spot a suspicious referral in your analytics, researching it on Google or checking it against a list of known spam crawlers can help you decide whether it is spam.
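If you keep a list of known crawlers, you can automate the check. The sketch below uses a small hypothetical set containing the two crawlers mentioned later in this post; swap in whatever list you trust. It matches a referral's host against each listed domain and its subdomains. It does not flag unrelated domains that merely end with the same letters.

```python
from urllib.parse import urlparse

# Hypothetical stand-in for a published list of known spam crawlers;
# substitute whichever list you actually trust.
KNOWN_CRAWLERS = {"semalt.com", "buttons-for-website.com"}


def referral_domain(referrer: str) -> str:
    """Return the lowercase host of a referrer URL or bare domain."""
    # urlparse only fills netloc when a scheme is present, so add one if needed.
    if "://" not in referrer:
        referrer = "http://" + referrer
    return (urlparse(referrer).hostname or "").lower()


def is_known_crawler(referrer: str, known=KNOWN_CRAWLERS) -> bool:
    """True if the referrer's host is a listed domain or one of its subdomains."""
    host = referral_domain(referrer)
    return any(host == d or host.endswith("." + d) for d in known)


for ref in ["http://www.semalt.com/page", "buttons-for-website.com", "news.example.org"]:
    print(ref, is_known_crawler(ref))
```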

Most common mistakes made when dealing with spam in GA

I’ve been following this issue closely for the last few months. Based on the comments people have made on my articles and the conversations I’ve found in discussion forums, people make three main mistakes when dealing with spam in Google Analytics.

Mistake #1. Blocking ghost spam from the .htaccess file

One of the biggest mistakes people make is trying to block ghost spam from the .htaccess file.

For those who are not familiar with this file, one of its main functions is to allow or block access to your site. Since ghosts never reach your site, adding them here has no effect and only adds useless lines to your .htaccess file.

Ghost spam usually shows up for a few days and then disappears. As a result, people sometimes think they successfully blocked it through .htaccess when it was really just a coincidence of timing.

Then, when the spammers return, those people get worried because the solution no longer works, and they think the spammers somehow bypassed the barriers they set up.

The truth is, the .htaccess file can only effectively block crawlers such as buttons-for-website.com and a few others, because these access your site. Most spam can’t be blocked this way, so filters are the only option for excluding it.

Mistake #2. Using the referral exclusion list to stop spam

Another error is trying to use the referral exclusion list to stop the spam. The name may confuse you, but this list is not intended to exclude referrals in the way we want to for the spam. It has other purposes.

For example, when a customer buys something, they sometimes get redirected to a third-party page for payment. After making a payment, they’re redirected back to your website, and GA records that as a new referral. The referral exclusion list is the appropriate way to prevent this.

If you try to use the referral exclusion list to manage spam, however, the referral part will be stripped because there is no preexisting record. As a result, a direct visit will be recorded, and you will end up with a bigger problem than the one you started with: you will still have spam, and direct visits are harder to track.

Mistake #3. Worrying that bounce rate changes will affect rankings

When people see that the bounce rate changes drastically because of the spam, they start worrying about the impact that it will have on their rankings in the SERPs.

[Image: Bounce rate affected by spam]

This is another common mistake. With or without spam, Google doesn’t use Google Analytics metrics as a ranking factor. Here is an explanation about this from Matt Cutts, the former head of Google’s web spam team.

If you think about it, Cutts’ explanation makes sense: although many sites have GA, not every site uses it, so GA data would not be available for every site Google ranks.

Assuming your site has been hacked

Another common concern when people see strange landing pages coming from spam on their reports is that they have been hacked.

[Image: Spam landing pages in the report]

The page that the spam shows in the reports doesn’t exist, and if you try to open it, you will get a 404 page. Your site hasn’t been compromised.

Still, make sure the page really doesn’t exist. There are cases (not spam) where a site suffers a security breach and gets injected with pages full of bad keywords to defame it.

What should you worry about?

Now that we’ve ruled out security issues and effects on rankings, the only thing left to worry about is your data. The fake trail that the spam leaves behind pollutes your reports.

It might have greater or lesser impact depending on your site traffic, but everyone is susceptible to the spam.

Small and midsize sites are the most easily impacted – not only because a big part of their traffic can be spam, but also because usually these sites are self-managed and sometimes don’t have the support of an analyst or a webmaster.

Big sites with a lot of traffic can also be impacted by spam. The impact may be insignificant, but invalid traffic means inaccurate reports no matter the size of the website. As an analyst, you should be able to explain what’s going on even in the most granular reports.

You only need one filter to deal with ghost spam

The usual recommendation is to add each referral to an exclusion filter once it is spotted. Although this is useful as a quick response to spam, it has three big disadvantages:

  • Making filters every week for every new spam source is tedious and time-consuming, especially if you manage many sites.
  • By the time you apply the filter and it starts working, some of your data is already affected.
  • Some spammers send direct visits along with the referrals. The filter won’t stop these direct hits, so even if you exclude the referral you will still receive invalid traffic. This explains why some people have seen an unusual spike in direct traffic.
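As the list above points out, an exclude filter keyed on the spam referral does nothing about the matching ghost direct hits. This small Python simulation uses made-up hits and a hypothetical spam domain. It applies such a filter to the Campaign Source field and shows that the ghost direct visit survives.

```python
import re

# Hypothetical sample hits; "(direct)" / "(none)" is how GA labels sessions with no referrer.
hits = [
    {"hostname": "yourmaindomain.com", "source": "google", "medium": "organic"},
    {"hostname": "(not set)", "source": "free-share-buttons.test", "medium": "referral"},
    {"hostname": "(not set)", "source": "(direct)", "medium": "(none)"},
]

# A typical per-referral EXCLUDE filter on Campaign Source (spam domain is hypothetical).
referral_exclude = re.compile(r"free-share-buttons\.test", re.IGNORECASE)

kept = [h for h in hits if not referral_exclude.search(h["source"])]
for h in kept:
    print(h["hostname"], h["source"])  # the ghost "(direct)" hit is still here
```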

Luckily, there is a good way to prevent all these problems. Most spam (the ghost kind) works by hitting random GA tracking IDs, which means the offender doesn’t really know who the target is. For that reason, the hostname is either not set or fake (see the report below).

[Image: Hostname report showing ghost spam]

You can see that they use some weird names or don’t even bother to set one. Although some familiar names appear in the list, a spammer can easily fake these as well.

On the other hand, valid traffic will always use a real hostname. In most cases this will be your domain, but it can also be a paid service, a translation service, or any other place where you’ve inserted your GA tracking code.

[Image: Hostname report showing valid referrals]

Based on this, we can make a filter that includes only hits that use real hostnames. This automatically excludes all hits from ghost spam, whether they show up as a referral, keyword, or pageview, or even as a direct visit.
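Here is the same idea as a minimal Python sketch: an INCLUDE condition on the hostname, applied to hypothetical sample hits. Matching uses re.search. This assumes the filter pattern can match anywhere in the hostname, which is consistent with the article's note that the main domain also covers subdomains. Ghost hits are dropped whether they arrive as referrals or as direct visits.

```python
import re

# Hypothetical sample hits, including ghost referrals and ghost "direct" visits.
hits = [
    {"hostname": "yourmaindomain.com", "source": "google"},
    {"hostname": "blog.yourmaindomain.com", "source": "(direct)"},
    {"hostname": "(not set)", "source": "free-share-buttons.test"},
    {"hostname": "(not set)", "source": "(direct)"},
    {"hostname": "fake-host.test", "source": "(direct)"},
]

# INCLUDE filter on Hostname: keep a hit only if its hostname matches a valid one.
valid_hostnames = re.compile(r"yourmaindomain\.com", re.IGNORECASE)

kept = [h for h in hits if valid_hostnames.search(h["hostname"])]
for h in kept:
    print(h["hostname"], h["source"])
```

The filter never looks at the source. That is why it catches the ghost direct visits that a referral exclude filter misses.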

To create this filter, you will need to find the report of hostnames. Here’s how:

  1. Go to the Reporting tab in GA
  2. Click on Audience in the left-hand panel
  3. Expand Technology and select Network
  4. At the top of the report, click on Hostname

[Image: Hostname report listing valid hostnames]

You will see a list of all hostnames, including the ones that the spam uses. Make a list of all the valid hostnames you find, for example:

  • yourmaindomain.com
  • blog.yourmaindomain.com
  • es.yourmaindomain.com
  • payingservice.com
  • translatetool.com
  • anotheruseddomain.com

For small to medium sites, this list of hostnames will likely consist of the main domain and a couple of subdomains. Once you are sure you have all of them, create a regular expression similar to this one:

yourmaindomain\.com|anotheruseddomain\.com|payingservice\.com|translatetool\.com
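If you manage several properties, you can assemble this expression from your hostname list instead of typing it by hand. The Python sketch below takes the example hostnames from the list above and drops subdomains whose parent domain is already listed. It also escapes each dot so it only matches a literal dot. The helper name build_include_pattern is hypothetical.

```python
import re

# Valid hostnames collected from the Hostname report (example values from the list above).
valid = [
    "yourmaindomain.com",
    "blog.yourmaindomain.com",
    "es.yourmaindomain.com",
    "payingservice.com",
    "translatetool.com",
    "anotheruseddomain.com",
]


def build_include_pattern(hostnames):
    """Join hostnames into one alternation, skipping subdomains of listed domains."""
    # A hostname is redundant if it is a subdomain of another listed hostname.
    roots = [h for h in hostnames
             if not any(h != other and h.endswith("." + other) for other in hostnames)]
    # re.escape turns "." into "\." so each dot matches only a literal dot.
    return "|".join(re.escape(h) for h in roots)


pattern = build_include_pattern(valid)
print(pattern)
```

The alternatives come out in a different order from the hand-written version. That does not change what the expression matches.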

You don’t need to put all of your subdomains in the regular expression. The main domain will match all of them. If you don’t have a view set up without filters, create one now.
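One quick way to check that claim is to run the example expression against a few hostnames. This sketch assumes, as the article does, that the filter matches the pattern anywhere in the hostname. Python's re.search behaves the same way.

```python
import re

# The example expression from the article, used as an INCLUDE filter on Hostname.
PATTERN = r"yourmaindomain\.com|anotheruseddomain\.com|payingservice\.com|translatetool\.com"
include = re.compile(PATTERN, re.IGNORECASE)

for host in ["yourmaindomain.com", "blog.yourmaindomain.com", "es.yourmaindomain.com",
             "(not set)", "spam-host.test"]:
    # search() matches anywhere in the string, which is why subdomains pass
    print(f"{host:28} {'kept' if include.search(host) else 'dropped'}")
```

Because the pattern is not anchored, it would also accept a hostname that merely contains one of your domains, such as yourmaindomain.com.example.net.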

Then create a Custom Filter.

Make sure you select INCLUDE, then select “Hostname” on the filter field, and copy your expression into the Filter Pattern box.

[Image: Custom filter settings]

You might want to verify the filter before saving to check that everything is okay. Once you’re ready, save it and apply the filter to all the views you want (except the view without filters).

This single filter will get rid of future occurrences of ghost spam that use invalid hostnames, and it doesn’t require much maintenance. But every time you add your tracking code to a new service, remember to add that service’s hostname to the end of the filter expression.
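Appending a hostname by hand is easy to get wrong, for example by leaving out a pipe or an unescaped dot. Below is a hypothetical helper that appends one hostname to an existing expression, skips duplicates, and validates the result. The 255-character ceiling is an assumption about GA's filter pattern field; adjust it if your interface reports a different limit.

```python
import re

# Assumption: GA caps a filter pattern at 255 characters, so check before saving.
MAX_PATTERN_LENGTH = 255


def add_hostname(pattern: str, hostname: str) -> str:
    """Append a newly tracked hostname to the end of an existing include expression."""
    piece = re.escape(hostname)
    if piece in pattern.split("|"):
        return pattern  # already listed, nothing to do
    updated = f"{pattern}|{piece}" if pattern else piece
    if len(updated) > MAX_PATTERN_LENGTH:
        raise ValueError(
            f"pattern would be {len(updated)} characters, over the assumed {MAX_PATTERN_LENGTH}"
        )
    re.compile(updated)  # fail fast on a malformed expression
    return updated


current = r"yourmaindomain\.com|payingservice\.com"
current = add_hostname(current, "newservice.com")  # hypothetical new service
print(current)
```

Keep all valid hostnames in one include filter. If you add a second INCLUDE filter, GA applies view filters one after another, so a hit would have to match both of them.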

Now you should only need to take care of the crawler spam. Since crawlers access your site, you can block them by adding these lines to the .htaccess file:

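## STOP REFERRER SPAM
# mod_rewrite must be enabled; skip the next line if RewriteEngine On is already in the file
RewriteEngine On
# [NC] = case-insensitive match; [OR] chains this condition to the next one
RewriteCond %{HTTP_REFERER} semalt\.com [NC,OR]
RewriteCond %{HTTP_REFERER} buttons-for-website\.com [NC]
# [F] returns 403 Forbidden for any request whose referrer matched
RewriteRule .* - [F]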
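To see what those rewrite lines do, here is a small Python simulation. It mimics only the two RewriteCond lines and the RewriteRule with [F]. The conditions are joined by [OR] and are case-insensitive because of [NC]. This is not a general mod_rewrite emulator. It also illustrates the article's earlier point: the check runs only on requests that reach your server, so ghost hits never pass through it.

```python
import re

# Mimics only the rules above: two RewriteCond lines joined by [OR],
# each case-insensitive because of [NC], followed by RewriteRule .* - [F].
CONDITIONS = [r"semalt\.com", r"buttons-for-website\.com"]


def status_for(referer: str) -> int:
    """Return 403 if any condition matches the Referer header, else 200."""
    if any(re.search(p, referer, re.IGNORECASE) for p in CONDITIONS):
        return 403  # [F] answers Forbidden
    return 200      # request passes through untouched


print(status_for("http://SEMALT.com/crawler"))
print(status_for("https://www.google.com/"))
```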

It is important to note that this file is very sensitive, and misplacing a single character in it can bring down your entire site. Therefore, make sure you create a backup copy of your .htaccess file before editing it.

If you don’t feel comfortable messing around with your .htaccess file, you can instead make an expression with all the crawlers and add it to an exclude filter by Campaign Source.
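Here is a Python sketch of that alternative. It joins the crawler domains into one expression and uses it to exclude matching Campaign Source values. The crawler list is a hypothetical starting point. Case-insensitive matching is an assumption made here.

```python
import re

# Hypothetical crawler list; use the crawlers actually showing up in your reports.
crawlers = ["semalt.com", "buttons-for-website.com"]

# One expression for an EXCLUDE filter on the Campaign Source field.
crawler_pattern = "|".join(re.escape(c) for c in crawlers)
exclude = re.compile(crawler_pattern, re.IGNORECASE)

sources = ["google", "semalt.com", "Buttons-For-Website.com", "(direct)"]
kept = [s for s in sources if not exclude.search(s)]
print(crawler_pattern)
print(kept)
```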

Implement these combined solutions, and you will worry much less about spam contaminating your analytics data. This has the added benefit of freeing up more time for you to spend actually analyzing your valid data.

After stopping spam, you can also get clean reports from your historical data by using the same expressions in an Advanced Segment to exclude all the spam.
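Such a segment combines both conditions. It keeps sessions with a valid hostname, then drops those whose source is a known crawler. The second step is needed because crawlers hit your real site, so their hostname looks valid. This Python sketch applies that logic to hypothetical historical rows. The patterns and session counts are made up for illustration.

```python
import re

# Hypothetical historical rows collected before the filters existed.
history = [
    {"hostname": "yourmaindomain.com", "source": "google", "sessions": 120},
    {"hostname": "(not set)", "source": "(direct)", "sessions": 40},
    {"hostname": "yourmaindomain.com", "source": "semalt.com", "sessions": 15},
    {"hostname": "blog.yourmaindomain.com", "source": "newsletter", "sessions": 30},
]

valid_hosts = re.compile(r"yourmaindomain\.com", re.IGNORECASE)
crawlers = re.compile(r"semalt\.com|buttons-for-website\.com", re.IGNORECASE)


def in_clean_segment(row) -> bool:
    """Keep rows with a valid hostname whose source is not a known crawler."""
    return bool(valid_hosts.search(row["hostname"])) and not crawlers.search(row["source"])


clean = [r for r in history if in_clean_segment(r)]
print(sum(r["sessions"] for r in clean))  # → 150
```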

Bonus resources to help you manage spam

If you still need more information to help you understand and deal with the spam on your GA reports, you can read my main article on the subject here: http://www.ohow.co/what-is-referrer-spam-how-stop-it-guide/.

Additional information on how to stop spam can be found at these URLs:

In closing, I am eager to hear your ideas on this serious issue. Please share them in the comments below.

(Editor’s Note: All images featured in this post were created by the author.)

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!


Know What Your Audience Wants Before Investing in Content Creation and Marketing – Whiteboard Friday

Posted by randfish

Content marketing is an iterative process: We learn and improve by analyzing the success of the things we produce. That doesn’t mean, though, that we shouldn’t set ourselves up for that success in the first place, and the best way to do that is by knowing what our audiences want before we actually go through the effort to create it. In today’s Whiteboard Friday, Rand (along with his stick-figure friends Rainy Bill and Hailstorm Hal) explains how we can stack our own decks in our favor with that knowledge.

[Image: Know What Your Audience Wants – Whiteboard Friday]

For reference, here’s a still of this week’s whiteboard!

Video transcription

Howdy, Moz fans, and welcome to another edition of Whiteboard Friday. It’s 2015. It’s going to be a year where, again, many, many marketers engage in a ton of content investments and content marketing for a wide variety of purposes, from SEO to driving traffic to growing their email newsletters and lists to earning links and attention and growing their social channels. Unfortunately, there’s a content marketing problem that we see over and over and over again: folks are making investments in content without knowing beforehand whether their audience is going to know and love and appreciate what they’re doing.

That kind of sucks because it adds a lot of risk to a process that is already risk intensive. You’re going to put a lot of work into the content that you’re creating. Well, hopefully you are. If you’re not, I don’t know how well it’s going to do. All of that work can be for naught.

Let me show you two examples. Over here I have Rainy Bill from WhatTheWeather.com, and here’s Hailstorm Hal from KingOfClimate.com. We’ll start with Rainy Bill’s story.

So Rainy Bill, he’s thinking to himself, “You know, I want to invest in some content marketing for WhatTheWeather.com.” He has an idea. He’s like, “You know, maybe I could make a chart of the T-shirts that meteorologists wear by season. I’ll look at all the TV meteorologists, all the Internet meteorologists, and I’ll look at the T-shirts that they wear. They all wear T-shirts, and I’ll make a big chart of them.”

You might think this is a ridiculous idea. I have seen worse. But Rainy Bill is thinking to himself, “Well, if I do this, it’s kind of ego bait. I get all the meteorologists involved. I’ll feature all their T-shirts, and, of course, all of them will see it and they’ll all link to me, talk about me, share it on their social media channels, email their friends with it. Oh check it out. Put it on their Facebook.”

He makes it. He’s got this beautiful chart showing different kinds of T-shirts that meteorologists are wearing over the seasons, and Bill’s just as happy as a clam. He can’t believe how beautiful that is until he tries to launch and promote it. Then it’s just sadness. He’s just crying tears.

What happened here is that no one actually cared what Bill had to say. No one cared about T-shirt patterns that are worn by meteorologists, and Bill didn’t actually realize this until he had already made the investment and started trying to do the promotion.

This might be a slightly ridiculous example, but I can’t tell you how many times I’ve seen exactly this story play out for marketer after marketer making content investments. They put something together that they hope will achieve their goal of reaching a new audience, of getting promoted, but it falls flat, mostly because they had the idea before they talked to anyone else. Before they found out whether anyone else was interested, they went and built it.

That’s actually kind of a terrible idea. Unless you have your finger on the pulse of an industry or field so incredibly well that you don’t need that process, you need to go out and first talk to your audience and understand them. I’m going to say only the 1% of the 1% can skip that step.

Hailstorm Hal, from KingOfClimate, instead of having a great idea for a piece of content, Hailstorm Hal is going to start with the idea from which all content marketing springs, which is, “I want to make something people will really want and something they’ll really love.” Okay. They want it, and they’re going to love it when they see it and when they get it.

So Hailstorm Hal is going to go out and say, “Well, what are the weather watchers talking about? People who are active in this community, in this industry, the people who do the sharing and the amplification, who influence what the rest of us see, what are they talking about?”

So he goes onto this weather forum and hears someone complaining, “The weather in Cincinnati is totally unpredictable.” The reply, “Yeah, but it’s way more predictable than Seattle is.” “Nuh-uh, you liar.” From this, eureka, Hailstorm Hal has a great idea. “Wait a minute. What if I were to actually go and take all of this online commentary and turn it into something useful where these two commenters could prove to each other who’s correct and people would know for certain how much . . .”

It’s not just helpful to them. This is helpful to a huge, broad swath of society. How accurate are your meteorologists, on average, city by city? I don’t actually know, but I would be fascinated to know whether when I go to San Diego — I was there for the holidays to see my wife’s family — maybe the weather reports in San Diego are much more or much less accurate than what I’m used to here at home in Seattle.

So Hal’s going to put together this great map that’s got an illustration of different regions of the United States, and you can see that in the Midwest actually weather is more predictable than it is on the coast or less predictable than it is on the coast. That’s awesome. That’s terrific. This is going to work far, far better than anything that Hal could have come up with on his own without first understanding the industry.

Now the process and tips that I’m going to recommend here are not exhaustive. There are a lot more things in this. But if you follow these five, at least, I think you’re going to do much better with your content investment.

First off, even before you do this process, get to know the industry, the niche, or the community that you’re operating in. If Hal didn’t know where to find weather watchers, he might just search for “weather forum,” click on the first link in Google, and end up somewhere that doesn’t have a very serious investment from the community of people he’s trying to reach. Without understanding the sites and pages, who the big influencers in the community are on social media, which websites are popular, and what gets a lot of interaction and engagement and what doesn’t, it’s going to be really tough for him to figure that out.

So that’s why I would say you need to go out and learn about your industry before you make something for it. Incidentally, this is why it’s really tough to do this as a consultant, and why, if you are paying consultants to do this, you’re going to be paying quite a bit of money for the research time. This is going to be dozens of hours of research to understand the niche before you can effectively create content for it. It isn’t just an on-demand kind of thing.

Then from there you want to use the discussion forums, Q&A sites, social media, and blog comments to find topics and discussions that inspire questions, curiosity, and need. Some of that is going to be very blatant. Some of it is going to be much more latent, and you’re going to be drawing from both of those. Your job is to have insight and empathy, and that’s what a great marketer should be able to do when they’re researching these communities.

Number three, you want to validate that if you created something, (a) it would be unique, no one else has made it before, and (b) others would actually share it. You can do this very directly by reaching out and talking to people.

So Hal can go and say, “Hey, who’s this commenter right here? Let’s have a quick conversation. Would you like this?” The answer might be, “Yeah, not only would I like that, I would help share that. I would spread that. I would love to know the answer to this question.” Or there might be no reply, or “Sounds interesting, let me know when you get it up.” You’ll get a range of responses.

You can go and use Twitter, Google+, and email to reach out directly to these people. Most of the time, if you’re finding commentary on these forums and in these places, there will be a way to reach them. I also have two tools I’m going to recommend, both for email. One is Conspire and the other is VoilaNorbert. VoilaNorbert.com is an email finding tool. I think it’s the best one out there right now, and Conspire is a great tool for seeing who you’re connected to that’s connected to people you might want to reach. When you’re trying to reach someone, those can be very helpful.

Number four, it tends to be the case that visual and/or interactive content is going to perform a lot better than text. So if Hal’s list had simply been a list of data — here are all the major U.S. regions and here’s how predictable and unpredictable their weather is — well, that might work okay. But this map, this visual is probably going to sail around the weather world much faster, much better, be picked up by news sources, be written about, be embedded in social media graphics, all that kind of stuff, far better than a mere chart would be.

Number five, remember that as you’re doing the creation, you need to align the audience goals with your business goals. So if KingOfClimate’s goal is to get people signing up for a weather tracking service on an email list, well great, you should have this and then say, “We can send you variability reports. We can tell you if things are getting more or less accurate,” and have an email call to action to get people to sign up to the newsletter. But you want to tie those business goals together.

The one thing I’d be careful of, and this is a mistake that many, many folks who invest in content marketing make, is forgetting that a lot of those benefits are going to be indirect and long term. If KingOfClimate.com is trying to sell professional meteorologists on a software subscription service, well, you know what? You’re probably not going to sell a whole lot with this. But you are going to get a lot more professional meteorologists who remember the name KingOfClimate, and that brand memory is going to influence future purchase decisions, likely nudging conversion rates up a little bit.

It’s probably going to help with links. Links will lead to rankings. Rankings will lead to being higher up in search engines when professional meteorologists search for precisely, “I’m looking for weather tracking software or weather notification software.” So these kinds of things are long term and indirect. You have to make sure you’re tying together all of the benefits of content marketing with the business goals you might achieve.

I hope to see some phenomenal content here in 2015. I’m sure you guys are already working on some great stuff. Applying this can mean that you don’t have to be psychic. You just have to put in a little bit of elbow grease, and you can make things that will perform far better for your customers, for your community, and for your business.

All right, everyone. Look forward to the discussion, and we will see you again next week for another edition of Whiteboard Friday. Take care.

Video transcription by Speechpad.com

