Stop Ghost Spam in Google Analytics with One Filter

Posted by CarloSeo

Spam in Google Analytics (GA) is becoming a serious issue. Thanks to a deluge of referral spam from social buttons, adult sites, and many, many other sources, people are becoming overwhelmed by all the filters they have to set up to manage the useless data they are receiving.

The good news is, there is no need to panic. In this post, I’m going to focus on the most common mistakes people make when fighting spam in GA, and explain an efficient way to prevent it.

But first, let’s make sure we understand how spam works. A couple of months ago, Jared Gardner wrote an excellent article explaining what referral spam is, including its intended purpose. He also pointed out some great examples of referral spam.

Types of spam

Spam in Google Analytics falls into two categories: ghosts and crawlers.

Ghosts

The vast majority of spam is this type. They are called ghosts because they never access your site. It is important to keep this in mind, as it’s key to creating a more efficient solution for managing spam.

As unusual as it sounds, this type of spam doesn’t have any interaction with your site at all. You may wonder how that is possible since one of the main purposes of GA is to track visits to our sites.

They do it by using the Measurement Protocol, which allows people to send data directly to Google Analytics’ servers. Using this method, and probably randomly generated tracking codes (UA-XXXXX-1) as well, the spammers leave a “visit” with fake data, without even knowing who they are hitting.
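To make this concrete, here is a minimal sketch of a ghost hit in TypeScript, assuming the documented Measurement Protocol parameters; every value is a placeholder. Notice that the site named by the tracking ID is never contacted:

const params = new URLSearchParams({
  v: "1",                            // Measurement Protocol version
  tid: "UA-XXXXX-1",                 // tracking ID -- spammers generate these at random
  cid: "555",                        // anonymous client ID; any string works
  t: "pageview",                     // hit type
  dp: "/fake-page",                  // fabricated page path
  dr: "http://spammy-site.example",  // fabricated referrer that lands in your reports
});

fetch("https://www.google-analytics.com/collect", {
  method: "POST",
  headers: { "Content-Type": "application/x-www-form-urlencoded" },
  body: params.toString(),
});

Each request like this registers as a session in the targeted property, fake referrer and all, even though the site's server never saw any traffic.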

Crawlers

This type of spam, the opposite of ghost spam, does access your site. As the name implies, these spam bots crawl your pages, ignoring rules like those found in robots.txt that are supposed to stop them from reading your site. When they exit your site, they leave a record in your reports that appears similar to a legitimate visit.

Crawlers are harder to identify because they know their targets and use real data. But it is also true that new ones seldom appear. So if you detect a referral in your analytics that looks suspicious, researching it on Google or checking it against this list might help you answer the question of whether or not it is spammy.

Most common mistakes made when dealing with spam in GA

I’ve been following this issue closely for the last few months. According to the comments people have made on my articles and conversations I’ve found in discussion forums, there are primarily three mistakes people make when dealing with spam in Google Analytics.

Mistake #1. Blocking ghost spam from the .htaccess file

One of the biggest mistakes people make is trying to block ghost spam from the .htaccess file.

For those who are not familiar with this file, one of its main functions is to allow/block access to your site. Now we know that ghosts never reach your site, so adding them here won’t have any effect and will only add useless lines to your .htaccess file.

Ghost spam usually shows up for a few days and then disappears. As a result, sometimes people think that they successfully blocked it from here when really it’s just a coincidence of timing.

Then, when the spam later returns, site owners get worried because the solution seems to have stopped working, and they think the spammer somehow bypassed the barriers they set up.

The truth is, the .htaccess file can only effectively block crawlers such as buttons-for-website.com and a few others since these access your site. Most of the spam can’t be blocked using this method, so there is no other option than using filters to exclude them.

Mistake #2. Using the referral exclusion list to stop spam

Another error is trying to use the referral exclusion list to stop the spam. The name may be misleading: this list is not intended to exclude referrals the way we need to for spam. It has other purposes.

For example, when a customer buys something, sometimes they get redirected to a third-party page for payment. After making a payment, they’re redirected back to your website, and GA records that as a new referral. It is appropriate to use the referral exclusion list to prevent this from happening.

If you try to use the referral exclusion list to manage spam, however, the referral part will be stripped since there is no preexisting record. As a result, a direct visit will be recorded, and you will have a bigger problem than the one you started with, since you will still have spam, and direct visits are harder to track.

Mistake #3. Worrying that bounce rate changes will affect rankings

When people see that the bounce rate changes drastically because of the spam, they start worrying about the impact that it will have on their rankings in the SERPs.

[Screenshot: bounce rate distorted by spam]

This is another common mistake. With or without spam, Google doesn’t use Google Analytics metrics as a ranking factor. Here is an explanation about this from Matt Cutts, the former head of Google’s web spam team.

And if you think about it, Cutts’ explanation makes sense: although many people have GA, not everyone uses it, so Google couldn’t rely on those metrics to rank every site.

Assuming your site has been hacked

Another common concern when people see strange landing pages coming from spam on their reports is that they have been hacked.

[Screenshot: fake landing pages left by spam in the reports]

The page that the spam shows on the reports doesn’t exist, and if you try to open it, you will get a 404 page. Your site hasn’t been compromised.

But you do have to make sure the page really doesn’t exist, because there are cases (unrelated to spam) where a site suffers a security breach and gets injected with pages full of bad keywords that damage the website.

What should you worry about?

Now that we’ve discarded security issues and their effects on rankings, the only thing left to worry about is your data. The fake trail that the spam leaves behind pollutes your reports.

It might have greater or lesser impact depending on your site traffic, but everyone is susceptible to the spam.

Small and midsize sites are the most easily impacted – not only because a big part of their traffic can be spam, but also because usually these sites are self-managed and sometimes don’t have the support of an analyst or a webmaster.

Big sites with a lot of traffic can also be impacted by spam, and although the impact can be insignificant, invalid traffic means inaccurate reports no matter the size of the website. As an analyst, you should be able to explain what’s going on even in the most granular reports.

You only need one filter to deal with ghost spam

The usual recommendation is to add each spam referral to an exclusion filter as it is spotted. Although this is useful as a quick response to the spam, it has three big disadvantages.

  • Making filters every week for every new spam referral detected is tedious and time-consuming, especially if you manage many sites. Plus, by the time you apply the filter and it starts working, you already have some affected data.
  • Some of the spammers use direct visits along with the referrals.
  • These direct hits won’t be stopped by the filter, so even if you are excluding the referral you will still be receiving invalid traffic, which explains why some people have seen an unusual spike in direct traffic.

Luckily, there is a good way to prevent all these problems. Most of the spam (the ghost type) works by hitting random GA tracking IDs, meaning the offender doesn’t really know who the target is; for that reason, either the hostname is not set or a fake one is used. (See the report below.)

[Screenshot: hostname report showing fake or unset hostnames used by ghost spam]

You can see that they use some weird names or don’t even bother to set one. Although there are some familiar names in the list, a spammer can just as easily set those, too.

On the other hand, valid traffic will always use a real hostname. In most cases, this will be your domain, but it can also come from paid services, translation services, or any other place where you’ve inserted your GA tracking code.

[Screenshot: valid referral traffic using a real hostname]

Based on this, we can make a filter that will include only hits that use real hostnames. This will automatically exclude all hits from ghost spam, whether it shows up as a referral, keyword, or pageview; or even as a direct visit.

To create this filter, you will need to find the report of hostnames. Here’s how:

  1. Go to the Reporting tab in GA
  2. Click on Audience in the left-hand panel
  3. Expand Technology and select Network
  4. At the top of the report, click on Hostname

[Screenshot: hostname report listing valid and spammy hostnames]

You will see a list of all hostnames, including the ones that the spam uses. Make a list of all the valid hostnames you find, as follows:

  • yourmaindomain.com
  • blog.yourmaindomain.com
  • es.yourmaindomain.com
  • payingservice.com
  • translatetool.com
  • anotheruseddomain.com

For small to medium sites, this list of hostnames will likely consist of the main domain and a couple of subdomains. After you are sure you got all of them, create a regular expression similar to this one:

yourmaindomain\.com|anotheruseddomain\.com|payingservice\.com|translatetool\.com

You don’t need to put all of your subdomains in the regular expression. The main domain will match all of them. If you don’t have a view set up without filters, create one now.

Then create a Custom Filter.

Make sure you select INCLUDE, then select “Hostname” in the Filter Field, and copy your expression into the Filter Pattern box.

[Screenshot: custom include filter configuration]

You might want to verify the filter before saving to check that everything is okay. Once you’re ready, save it, and apply the filter to all the views you want (except the view without filters).

This single filter will get rid of future occurrences of ghost spam that use invalid hostnames, and it doesn’t require much maintenance. But it’s important that every time you add your tracking code to a new service, you also add that service’s hostname to the end of the filter expression.
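If you want to sanity-check your expression before trusting it, a quick script helps. This TypeScript sketch reuses the placeholder hostnames from above; “(not set)” and ghost-spam.example stand in for the junk you’ll see in the report:

const validHostnames = /yourmaindomain\.com|anotheruseddomain\.com|payingservice\.com|translatetool\.com/;

const hostnamesFromReport = [
  "yourmaindomain.com",       // valid
  "blog.yourmaindomain.com",  // valid -- subdomains match the main domain
  "translatetool.com",        // valid -- a service carrying your tracking code
  "(not set)",                // ghost spam that never set a hostname
  "ghost-spam.example",       // ghost spam using a fake hostname
];

for (const host of hostnamesFromReport) {
  console.log(`${host}: ${validHostnames.test(host) ? "kept" : "filtered out"}`);
}

Every hostname the pattern keeps should be one you recognize; anything filtered out is what the GA filter will discard.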

Now you should only need to take care of the crawler spam. Since crawlers access your site, you can block them by adding these lines to the .htaccess file:

## STOP REFERRER SPAM
## These rules require mod_rewrite; if it isn't already enabled elsewhere
## in your .htaccess, add "RewriteEngine On" above them.
RewriteCond %{HTTP_REFERER} semalt\.com [NC,OR]
RewriteCond %{HTTP_REFERER} buttons-for-website\.com [NC]
RewriteRule .* - [F]

It is important to note that this file is very sensitive, and misplacing a single character in it can bring down your entire site. Therefore, make sure you create a backup copy of your .htaccess file prior to editing it.

If you don’t feel comfortable messing around with your .htaccess file, you can alternatively build an expression covering all the crawlers (for example, semalt\.com|buttons-for-website\.com) and add it to an exclude filter on Campaign Source.

Implement these combined solutions, and you will worry much less about spam contaminating your analytics data. This will have the added benefit of freeing up more time for you to spend actually analyzing your valid data.

After stopping spam, you can also get clean reports from the historical data by using the same expressions in an Advanced Segment to exclude all the spam.

Bonus resources to help you manage spam

If you still need more information to help you understand and deal with the spam on your GA reports, you can read my main article on the subject here: http://www.ohow.co/what-is-referrer-spam-how-stop-it-guide/.


In closing, I am eager to hear your ideas on this serious issue. Please share them in the comments below.

(Editor’s Note: All images featured in this post were created by the author.)



Why Effective, Modern SEO Requires Technical, Creative, and Strategic Thinking – Whiteboard Friday

Posted by randfish

There’s no doubt that quite a bit has changed about SEO, and that the field is far more integrated with other aspects of online marketing than it once was. In today’s Whiteboard Friday, Rand pushes back against the idea that effective modern SEO doesn’t require any technical expertise, outlining a fantastic list of technical elements that today’s SEOs need to know about in order to be truly effective.


Video transcription

Howdy, Moz fans, and welcome to another edition of Whiteboard Friday. This week I’m going to do something unusual. I don’t usually point out these inconsistencies or sort of take issue with other folks’ content on the web, because I generally find that that’s not all that valuable and useful. But I’m going to make an exception here.

There is an article by Jayson DeMers, who I think might actually be here in Seattle — maybe he and I can hang out at some point — called “Why Modern SEO Requires Almost No Technical Expertise.” It was an article that got a shocking amount of traction and attention. On Facebook, it has thousands of shares. On LinkedIn, it did really well. On Twitter, it got a bunch of attention.

Some folks in the SEO world have already pointed out some issues around this. But because of the increasing popularity of this article, and because I think there’s, like, this hopefulness from worlds outside of kind of the hardcore SEO world that are looking to this piece and going, “Look, this is great. We don’t have to be technical. We don’t have to worry about technical things in order to do SEO.”

Look, I completely get the appeal of that. I did want to point out some of the reasons why this is not so accurate. At the same time, I don’t want to rain on Jayson, because I think that it’s very possible he’s writing an article for Entrepreneur, maybe he has sort of a commitment to them. Maybe he had no idea that this article was going to spark so much attention and investment. He does make some good points. I think it’s just really the title and then some of the messages inside there that I take strong issue with, and so I wanted to bring those up.

First off, some of the good points he did bring up.

One, he wisely says, “You don’t need to know how to code or to write and read algorithms in order to do SEO.” I totally agree with that. If today you’re looking at SEO and you’re thinking, “Well, am I going to get more into this subject? Am I going to try investing in SEO? But I don’t even know HTML and CSS yet.”

Those are good skills to have, and they will help you in SEO, but you don’t need them. Jayson’s totally right. You don’t have to have them, and you can learn and pick up some of these things, and do searches, watch some Whiteboard Fridays, check out some guides, and pick up a lot of that stuff later on as you need it in your career. SEO doesn’t have that hard requirement.

And secondly, he makes an intelligent point that we’ve made many times here at Moz, which is that, broadly speaking, a better user experience is well correlated with better rankings.

You make a great website that delivers great user experience, that provides the answers to searchers’ questions and gives them extraordinarily good content, way better than what’s out there already in the search results, generally speaking you’re going to see happy searchers, and that’s going to lead to higher rankings.

But not entirely. There are a lot of other elements that go in here. So I’ll bring up some frustrating points around the piece as well.

First off, there’s no acknowledgment — and I find this a little disturbing — that the ability to read and write code, or even HTML and CSS, which I think are the basic place to start, is helpful or can take your SEO efforts to the next level. I think both of those things are true.

So being able to look at a web page, view source on it, or pull up Firebug in Firefox or something and diagnose what’s going on and then go, “Oh, that’s why Google is not able to see this content. That’s why we’re not ranking for this keyword or term, or why even when I enter this exact sentence in quotes into Google, which is on our page, this is why it’s not bringing it up. It’s because it’s loading it after the page from a remote file that Google can’t access.” These are technical things, and being able to see how that code is built, how it’s structured, and what’s going on there, very, very helpful.

Some coding knowledge also can take your SEO efforts even further. I mean, so many times, SEOs are stymied by the conversations that we have with our programmers and our developers and the technical staff on our teams. When we can have those conversations intelligently, because at least we understand the principles of how an if-then statement works, or what software engineering best practices are being used, or they can upload something into a GitHub repository, and we can take a look at it there, that kind of stuff is really helpful.

Secondly, I don’t like that the article overly reduces all of this information that we have about what we’ve learned about Google. So he mentions two sources. One is things that Google tells us, and the other is SEO experiments. I think both of those are true. Although I’d add that there’s sort of a sixth sense of knowledge that we gain over time from looking at many, many search results and kind of having this feel for why things rank, and what might be wrong with a site, and getting really good at that using tools and data as well. There are people who can look at Open Site Explorer and then go, “Aha, I bet this is going to happen.” They can look, and 90% of the time they’re right.

So he boils this down to, one, write quality content, and two, reduce your bounce rate. Neither of those things are wrong. You should write quality content, although I’d argue there are lots of other forms of quality content that aren’t necessarily written — video, images and graphics, podcasts, lots of other stuff.

And secondly, that just doing those two things is not always enough. So you can see, like many, many folks look and go, “I have quality content. It has a low bounce rate. How come I don’t rank better?” Well, your competitors, they’re also going to have quality content with a low bounce rate. That’s not a very high bar.

Also, frustratingly, this really gets in my craw. I don’t think “write quality content” means anything. You tell me. When you hear that, to me that is a totally non-actionable, non-useful phrase that’s a piece of advice that is so generic as to be discardable. So I really wish that there was more substance behind that.

The article also makes, in my opinion, the totally inaccurate claim that modern SEO really is reduced to “the happier your users are when they visit your site, the higher you’re going to rank.”

Wow. Okay. Again, I think broadly these things are correlated. User happiness and rank is broadly correlated, but it’s not a one to one. This is not like a, “Oh, well, that’s a 1.0 correlation.”

I would guess that the correlation is probably closer to like the page authority range. I bet it’s like 0.35 or something correlation. If you were to actually measure this broadly across the web and say like, “Hey, were you happier with result one, two, three, four, or five,” the ordering would not be perfect at all. It probably wouldn’t even be close.

There’s a ton of reasons why sometimes someone who ranks on Page 2 or Page 3 or doesn’t rank at all for a query is doing a better piece of content than the person who does rank well or ranks on Page 1, Position 1.

Then the article suggests five and sort of a half steps to successful modern SEO, which I think is a really incomplete list. So Jayson gives us:

  • Good on-site experience
  • Writing good content
  • Getting others to acknowledge you as an authority
  • Rising in social popularity
  • Earning local relevance
  • Dealing with modern CMS systems (he notes that most modern CMS systems are SEO-friendly)

The thing is there’s nothing actually wrong with any of these. They’re all, generally speaking, correct, either directly or indirectly related to SEO. The one about local relevance, I have some issue with, because he doesn’t note that there’s a separate algorithm for sort of how local SEO is done and how Google ranks local sites in maps and in their local search results. Also not noted is that rising in social popularity won’t necessarily directly help your SEO, although it can have indirect and positive benefits.

I feel like this list is super incomplete. Okay, I brainstormed just off the top of my head in the 10 minutes before we filmed this video a list. The list was so long that, as you can see, I filled up the whole whiteboard and then didn’t have any more room. I’m not going to bother to erase and go try and be absolutely complete.

But there’s a huge, huge number of things that are important, critically important for technical SEO. If you don’t know how to do these things, you are sunk in many cases. You can’t be an effective SEO analyst, or consultant, or in-house team member, because you simply can’t diagnose the potential problems, rectify those potential problems, identify strategies that your competitors are using, be able to diagnose a traffic gain or loss. You have to have these skills in order to do that.

I’ll run through these quickly, but really the idea is just that this list is so huge and so long that I think it’s very, very, very wrong to say technical SEO is behind us. I almost feel like the opposite is true.

We have to be able to understand things like:

  • Content rendering and indexability
  • Crawl structure, internal links, JavaScript, Ajax. If something’s post-loading after the page and Google’s not able to index it, or there are links that are accessible via JavaScript or Ajax, maybe Google can’t necessarily see those or isn’t crawling them as effectively, or is crawling them, but isn’t assigning them as much link weight as they might be assigning other stuff, and you’ve made it tough to link to them externally, and so they can’t crawl it.
  • Disabling crawling and/or indexing of thin or incomplete or non-search-targeted content. We have a bunch of search results pages. Should we use rel=prev/next? Should we robots.txt those out? Should we disallow from crawling with meta robots? Should we rel=canonical them to other pages? Should we exclude them via the protocols inside Google Webmaster Tools, which is now Google Search Console?
  • Managing redirects, domain migrations, content updates. A new piece of content comes out, replacing an old piece of content, what do we do with that old piece of content? What’s the best practice? It varies by different things. We have a whole Whiteboard Friday about the different things that you could do with that. What about a big redirect or a domain migration? You buy another company and you’re redirecting their site to your site. You have to understand things about subdomain structures versus subfolders, which, again, we’ve done another Whiteboard Friday about that.
  • Proper error codes, downtime procedures, and not found pages. If your 404 pages turn out to all be 200 pages, well, now you’ve made a big error there, and Google could be crawling tons of 404 pages that they think are real pages, because you’ve made it a status code 200, or you’ve used a 404 code when you should have used a 410, which is a permanently removed, to be able to get it completely out of the indexes, as opposed to having Google revisit it and keep it in the index.

Downtime procedures. So there’s specifically a… I can’t even remember. It’s a 5xx code that you can use. Maybe it was a 503 or something that you can use that’s like, “Revisit later. We’re having some downtime right now.” Google urges you to use that specific code rather than using a 404, which tells them, “This page is now an error.”

Disney had that problem a while ago, if you guys remember, where they 404ed all their pages during an hour of downtime, and then their homepage, when you searched for Disney World, was, like, “Not found.” Oh, jeez, Disney World, not so good.
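For reference, the status Rand is reaching for is 503 (Service Unavailable) with a Retry-After header. A minimal TypeScript sketch using Node’s built-in http module, with an arbitrary port and message:

import { createServer } from "http";

createServer((_req, res) => {
  res.writeHead(503, {
    "Retry-After": "3600",        // tell crawlers to come back in an hour
    "Content-Type": "text/plain",
  });
  res.end("Down for maintenance -- back shortly.");
}).listen(8080);

Serving this during planned downtime tells Google to revisit later instead of treating every URL as gone.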

  • International and multi-language targeting issues. I won’t go into that. But you have to know the protocols there. Duplicate content, syndication, scrapers. How do we handle all that? Somebody else wants to take our content, put it on their site, what should we do? Someone’s scraping our content. What can we do? We have duplicate content on our own site. What should we do?
  • Diagnosing traffic drops via analytics and metrics. Being able to look at a rankings report, being able to look at analytics connecting those up and trying to see: Why did we go up or down? Did we have less pages being indexed, more pages being indexed, more pages getting traffic less, more keywords less?
  • Understanding advanced search parameters. Today, just today, I was checking out the related parameter in Google, which is fascinating for most sites. Well, for Moz, weirdly, related:oursite.com shows nothing. But for virtually every other site, well, most other sites on the web, it does show some really interesting data, and you can see how Google is connecting up, essentially, intentions and topics from different sites and pages, which can be fascinating, could expose opportunities for links, could expose understanding of how they view your site versus your competition or who they think your competition is.

Then there are tons of parameters, like inurl: and inanchor:, and da, da, da, da. Inanchor doesn’t work anymore, never mind about that one.

I have to go faster, because we’re just going to run out of these. Like, come on. Interpreting and leveraging data in Google Search Console. If you don’t know how to use that, Google could be telling you, you have all sorts of errors, and you don’t know what they are.

  • Leveraging topic modeling and extraction. Using all these cool tools that are coming out for better keyword research and better on-page targeting. I talked about a couple of those at MozCon, like MonkeyLearn. There’s the new Moz Context API, which will be coming out soon, around that. There’s the Alchemy API, which a lot of folks really like and use.
  • Identifying and extracting opportunities based on site crawls. You run a Screaming Frog crawl on your site and you’re going, “Oh, here’s all these problems and issues.” If you don’t have these technical skills, you can’t diagnose that. You can’t figure out what’s wrong. You can’t figure out what needs fixing, what needs addressing.
  • Using rich snippet format to stand out in the SERPs. This is just getting a better click-through rate, which can seriously help your site and obviously your traffic.
  • Applying Google-supported protocols like rel=canonical, meta description, rel=prev/next, hreflang, robots.txt, meta robots, x robots, NOODP, XML sitemaps, rel=nofollow. The list goes on and on and on. If you’re not technical, you don’t know what those are, you think you just need to write good content and lower your bounce rate, it’s not going to work.
  • Using APIs from services like AdWords, Mozscape, Ahrefs, Majestic, SEMrush, or the Alchemy API. Those APIs can have powerful things that they can do for your site. There are some powerful problems they could help you solve if you know how to use them. It’s actually not that hard to write something, even inside a Google Doc or Excel, to pull from an API and get some data in there. There’s a bunch of good tutorials out there. Richard Baxter has one, Annie Cushing has one, I think Distilled has some. So really cool stuff there.
  • Diagnosing page load speed issues, which goes right to what Jayson was talking about. You need that fast-loading page. Well, if you don’t have any technical skills, you can’t figure out why your page might not be loading quickly.
  • Diagnosing mobile friendliness issues
  • Advising app developers on the new protocols around App deep linking, so that you can get the content from your mobile apps into the web search results on mobile devices. Awesome. Super powerful. Potentially crazy powerful, as mobile search is becoming bigger than desktop.

Okay, I’m going to take a deep breath and relax. I don’t know Jayson’s intention, and in fact, if he were in this room, he’d be like, “No, I totally agree with all those things. I wrote the article in a rush. I had no idea it was going to be big. I was just trying to make the broader points around you don’t have to be a coder in order to do SEO.” That’s completely fine.

So I’m not going to try and rain criticism down on him. But I think if you’re reading that article, or you’re seeing it in your feed, or your clients are, or your boss is, or other folks are in your world, maybe you can point them to this Whiteboard Friday and let them know, no, that’s not quite right. There’s a ton of technical SEO that is required in 2015 and will be for years to come, I think, that SEOs have to have in order to be effective at their jobs.

All right, everyone. Look forward to some great comments, and we’ll see you again next time for another edition of Whiteboard Friday. Take care.

Video transcription by Speechpad.com



The Linkbait Bump: How Viral Content Creates Long-Term Lift in Organic Traffic – Whiteboard Friday

Posted by randfish

A single fantastic (or “10x”) piece of content can lift a site’s traffic curves long beyond the popularity of that one piece. In today’s Whiteboard Friday, Rand talks about why those curves settle into a “new normal,” and how you can go about creating the content that drives that change.


Video Transcription

Howdy, Moz fans, and welcome to another edition of Whiteboard Friday. This week we’re chatting about the linkbait bump, classic phrase in the SEO world and almost a little dated. I think today we’re talking a little bit more about viral content and how high-quality content, content that really is the cornerstone of a brand or a website’s content can be an incredible and powerful driver of traffic, not just when it initially launches but over time.

So let’s take a look.

This is a classic linkbait bump, viral content bump analytics chart. I’m seeing over here my traffic and over here the different months of the year. You know, January, February, March, like I’m under a thousand. Maybe I’m at 500 visits or something, and then I have this big piece of viral content. It performs outstandingly well from a relative standpoint for my site. It gets 10,000 or more visits, drives a ton more people to my site, and then what happens is that that traffic falls back down. But the new normal down here, new normal is higher than the old normal was. So the new normal might be at 1,000, 1,500 or 2,000 visits whereas before I was at 500.

Why does this happen?

A lot of folks see an analytics chart like this, see examples of content that’s done this for websites, and they want to know: Why does this happen and how can I replicate that effect? The reasons why are it sort of feeds back into that viral loop or the flywheel, which we’ve talked about in previous Whiteboard Fridays, where essentially you start with a piece of content. That content does well, and then you have things like more social followers on your brand’s accounts. So now next time you go to amplify content or share content socially, you’re reaching more potential people. You have a bigger audience. You have more people who share your content because they’ve seen that that content performs well for them in social. So they want to find other content from you that might help their social accounts perform well.

You see more RSS and email subscribers because people see your interesting content and go, “Hey, I want to see when these guys produce something else.” You see more branded search traffic because people are looking specifically for content from you, not necessarily just around this viral piece, although that’s often a big part of it, but around other pieces as well, especially if you do a good job of exposing them to that additional content. You get more bookmark and type in traffic, more searchers biased by personalization because they’ve already visited your site. So now when they search and they’re logged into their accounts, they’re going to see your site ranking higher than they normally would otherwise, and you get an organic SEO lift from all the links and shares and engagement.

So there’s a ton of different factors that feed into this, and you kind of want to hit all of these things. If you have a piece of content that gets a lot of shares, a lot of links, but then doesn’t promote engagement, doesn’t get more people signing up, doesn’t get more people searching for your brand or searching for that content specifically, then it’s not going to have the same impact. Your traffic might fall further and more quickly.

How do you achieve this?

How do we get content that’s going to do this? Well, we’re going to talk through a number of things that we’ve talked about previously on Whiteboard Friday. But there are some additional ones as well. This isn’t just creating good content or creating high quality content, it’s creating a particular kind of content. So for this what you want is a deep understanding, not necessarily of what your standard users or standard customers are interested in, but a deep understanding of what influencers in your niche will share and promote and why they do that.

This often means that you follow a lot of sharers and influencers in your field, and you understand, hey, they’re all sharing X piece of content. Why? Oh, because it does this, because it makes them look good, because it helps their authority in the field, because it provides a lot of value to their followers, because they know it’s going to get a lot of retweets and shares and traffic. Whatever that because is, you have to have a deep understanding of it in order to have success with viral kinds of content.

Next, you want to have empathy for users and what will give them the best possible experience. So if you know, for example, that a lot of people are coming on mobile and are going to be sharing on mobile, which is true of almost all viral content today, FYI, you need to be providing a great mobile and desktop experience. Oftentimes that mobile experience has to be different, not just responsive design, but actually a different format, a different way of being able to scroll through or watch or see or experience that content.

There are some good examples out there of content that does that. It makes a very different user experience based on the browser or the device you’re using.

You also need to be aware of what will turn them off. So promotional messages, pop-ups, trying to sell to them, oftentimes that diminishes user experience. It means that content that could have been more viral, that could have gotten more shares won’t.

Unique value and attributes that separate your content from everything else in the field. So if there’s like ABCD and whoa, what’s that? That’s very unique. That stands out from the crowd. That provides a different form of value in a different way than what everyone else is doing. That uniqueness is often a big reason why content spreads virally, why it gets more shared than just the normal stuff.

I’ve talked about this a number of times, but content that’s 10X better than what the competition provides. So unique value from the competition, but also quality that is not just a step up, but 10X better, massively, massively better than what else you can get out there. That makes it unique enough. That makes it stand out from the crowd, and that’s a very hard thing to do, but that’s why this is so rare and so valuable.

This is a critical one, and I think one that, I’ll just say, many organizations fail at. That is the freedom and support to fail many times, to try to create these types of effects, to have this impact many times before you hit on a success. A lot of managers and clients and teams and execs just don’t give marketing teams and content teams the freedom to say, “Yeah, you know what? You spent a month and developer resources and designer resources and spent some money to go do some research and contracted with this third party, and it wasn’t a hit. It didn’t work. We didn’t get the viral content bump. It just kind of did okay. You know what? We believe in you. You’ve got a lot of chances. You should try this another 9 or 10 times before we throw it out. We really want to have a success here.”

That is something that very few teams invest in. The powerful thing is because so few people are willing to invest that way, the ones that do, the ones that believe in this, the ones that invest long term, the ones that are willing to take those failures are going to have a much better shot at success, and they can stand out from the crowd. They can get these bumps. It’s powerful.

Not a requirement, but it really, really helps to have a strong engaged community, either on your site and around your brand, or at least in your niche and your topic area that will help, that wants to see you, your brand, your content succeed. If you’re in a space that has no community, I would work on building one, even if it’s very small. We’re not talking about building a community of thousands or tens of thousands. A community of 100 people, a community of 50 people even can be powerful enough to help content get that catalyst, that first bump that’ll boost it into viral potential.

Then finally, for this type of content, you need to have a logical and not overly promotional match between your brand and the content itself. You can see many sites in what I call sketchy niches. So like a criminal law site or a casino site or a pharmaceutical site that’s offering like an interactive musical experience widget, and you’re like, “Why in the world is this brand promoting this content? Why did they even make it? How does that match up with what they do? Oh, it’s clearly just intentionally promotional.”

Look, many of these brands go out there and they say, “Hey, the average web user doesn’t know and doesn’t care.” I agree. But the average web user is not an influencer. Influencers know, and they’re very, very suspicious of why content is being produced and promoted, and they’re very skeptical of promoting content that they don’t think is altruistic. So this kills a lot of content for brands that try and invest in it when there’s no match. So I think you really need that.

Now, when you do these linkbait bump kinds of things, I would strongly recommend that you follow up, that you consider the quality of the content that you’re producing thereafter, that you invest in keeping these resources updated, and that you don’t simply give up on content production after this. However, if you’re a small business site, a small or medium business, you might think about only doing one or two of these a year. If you are a heavy content player, you’re doing a lot of content marketing, content marketing is how you’re investing in web traffic, I’d probably be considering these weekly or monthly at the least.

All right, everyone. Look forward to your experiences with the linkbait bump, and I will see you again next week for another edition of Whiteboard Friday. Take care.

Video transcription by Speechpad.com



How to Rid Your Website of Six Common Google Analytics Headaches

Posted by amandaecking

I’ve been in and out of Google Analytics (GA) for the past five or so years agency-side. I’ve seen three different code libraries, dozens of new different features and reports roll out, IP addresses stop being reported, and keywords not-so-subtly phased out of the free platform.

Analytics has been a focus of mine for the past year or so—mainly, making sure clients get their data right. Right now, our new focus is closed loop tracking, but that’s a topic for another day. If you’re using Google Analytics, and only Google Analytics for the majority of your website stats, or it’s your primary vehicle for analysis, you need to make sure it’s accurate.

Not having data pulling in or reporting properly is like building a house on a shaky foundation: It doesn’t end well. Usually there are tears.

For some reason, a lot of people, including many of my clients, assume everything is tracking properly in Google Analytics… because Google. But it’s not Google who sets up your analytics. People do that. And people are prone to make mistakes.

I’m going to go through six scenarios where issues are commonly encountered with Google Analytics.

I’ll outline the remedy for each issue, and in the process, show you how to move forward with a diagnosis or resolution.

1. Self-referrals

This is probably one of the areas we’re all familiar with. If you’re seeing a lot of traffic from your own domain, there’s likely a problem somewhere—or you need to extend the default session length in Google Analytics. (For example, if you have a lot of long videos or music clips and don’t use event tracking: a website like TEDx or SoundCloud would be a good equivalent.)

Typically one of the first things I’ll do to help diagnose the problem is include an advanced filter to show the full referrer string. You do this by creating a filter, as shown below:

Filter Type: Custom filter > Advanced
Field A: Hostname
Extract A: (.*)
Field B: Request URI
Extract B: (.*)
Output To: Request URI
Constructor: $A1$B1

You’ll then start seeing the full hostnames pulling in: with the constructor above, for example, a pageview of /checkout on store.yourdomain.com shows up in your reports as store.yourdomain.com/checkout. Experience has shown me that if you have a separate subdomain hosted in another location (say, if you work with a separate company and they host and run your mobile site or your shopping cart), it gets treated by Google Analytics as a separate domain. Thus, you’ll need to implement cross-domain tracking. This way, you can narrow down whether or not it’s one particular subdomain that’s creating the self-referrals.

In this example below, we can see all the revenue is being reported to the booking engine (which ended up being cross domain issues) and their own site is the fourth largest traffic source:

It’s also a good idea to check the browser and device reports to start narrowing down whether the issue is specific to a particular element. If it’s not, keep digging. Look at pages pulling in the self-referrals and go through the code with a fine-tooth comb, drilling down as much as you can.

2. Unusually low bounce rate

If you have a crazy-low bounce rate, it could be too good to be true. Unfortunately. An unusually low bounce rate could (and probably does) mean that at least some pages of your website have the same Google Analytics tracking code installed twice.

Take a look at your source code, or use Google Tag Assistant (though it does have known bugs) to see if you’ve got GA tracking code installed twice.

While I tell clients that having Google Analytics installed twice on the same page can lead to double the pageviews, I’ve not actually encountered that—I usually just say it to scare them into removing the duplicate implementation more quickly. Don’t tell on me.
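One quick, unofficial check you can run in the browser console: count the tracking IDs and analytics.js script tags the page carries. More than one of either usually points to a duplicate installation. This sketch assumes the classic analytics.js snippet; adjust the patterns for other GA libraries:

const html = document.documentElement.outerHTML;
console.log("Tracking IDs found:", html.match(/UA-\d+-\d+/g) ?? []);
console.log(
  "analytics.js script tags:",
  document.querySelectorAll('script[src*="google-analytics.com/analytics.js"]').length
);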

3. Iframes anywhere

I’ve heard directly from Google engineers and Google Analytics evangelists that Google Analytics does not play well with iframes, and that it never will play nice with this dinosaur technology.

If you track the iframe, you inflate your pageviews, plus you still aren’t tracking everything with 100% clarity.

If you don’t track across iframes, you lose the source/medium attribution and everything becomes a self-referral.

Damned if you do; damned if you don’t.

My advice: Stop using iframes. They’re Netscape-era technology anyway, with rainbow marquees and Comic Sans on top. Interestingly, and unfortunately, a number of booking engines (for hotels) and third-party carts (for ecommerce) still use iframes.

If you have any clients in those verticals, or if you’re in the vertical yourself, check with your provider to see if they use iframes. Or you can check for yourself, by right-clicking as close as you can to the actual booking element:

[Screenshot: right-clicking near the booking element to check for an iframe]

There is no neat and tidy way to address iframes with Google Analytics, and usually iframes are not the only complicated element of setup you’ll encounter. I spent eight months dealing with a website on a subfolder, which used iframes and had a cross domain booking system, and the best visibility I was able to get was about 80% on a good day.

Typically, I’d approach diagnosing iframes (if, for some reason, I had absolutely no access to viewing a website or talking to the techs) similarly to diagnosing self-referrals, as self-referrals are one of the biggest symptoms of iframe use.

4. Massive traffic jumps

Massive jumps in traffic don’t typically just happen. (Unless, maybe, you’re Geraldine.) There’s always an explanation—a new campaign launched, you just turned on paid ads for the first time, you’re using content amplification platforms, you’re getting a ton of referrals from that recent press in The New York Times. And if you think it just happened, it’s probably a technical glitch.

I’ve seen everything from inflated pageviews resulting from including tracking on iframes and unnecessary implementation of virtual pageviews, to not realizing the tracking code was installed on other microsites for the same property. Oops.

Usually I’ve seen this happen when the tracking code was somewhere it shouldn’t be, so if you’re investigating a situation of this nature, first confirm the Google Analytics code is only in the places it needs to be. Tools like Google Tag Assistant and Screaming Frog can be your BFFs in helping you figure this out.

Also, I suggest bribing the IT department with sugar (or booze) to see if they’ve changed anything lately.

5. Cross-domain tracking

I wish cross-domain tracking with Google Analytics out of the box didn’t require any additional setup. But it does.

If you don’t have it set up properly, things break down quickly, and can be quite difficult to untangle.

The older the GA library you’re using, the harder it is. The easiest setup, by far, is Google Tag Manager with Universal Analytics. Hard-coded Universal Analytics is a bit more difficult because you have to implement autoLink manually and decorate forms, if you’re using them (and you probably are). Beyond that, rather than try and deal with it, I say update your Google Analytics code. Then we can talk.

Where I’ve seen the most murkiness with tracking is when parts of cross domain tracking are implemented, but not all. For some reason, if allowLinker isn’t included, or you forget to decorate all the forms, the cookies aren’t passed between domains.
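For hard-coded Universal Analytics, the documented cross-domain setup looks roughly like this; the domain name is a placeholder for your secondary domain (cart, booking engine, and so on), and ga is the global created by the standard snippet:

declare const ga: (...args: unknown[]) => void; // provided by the standard GA snippet

ga("create", "UA-XXXXX-1", "auto", { allowLinker: true }); // accept linker parameters
ga("require", "linker");
// Decorate links pointing at the secondary domain; passing true as the
// fourth argument also decorates forms.
ga("linker:autoLink", ["booking-engine.example"], false, true);
ga("send", "pageview");

The secondary domain needs the mirror-image configuration, pointing back at your primary domain.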

The absolute first place I would start with this would be confirming the cookies are all passing properly at all the right points, forms, links, and smoke signals. I’ll usually use a combination of the Real Time report in Google Analytics, Google Tag Assistant, and GA debug to start testing this. Any debug tool you use will mean you’re playing in the console, so get friendly with it.

6. Internal use of UTM strings

I’ve saved the best for last. Internal use of campaign tagging. We may think, oh, I use Google to tag my campaigns externally, and we’ve got this new promotion on site which we’re using a banner ad for. That’s a campaign. Why don’t I tag it with a UTM string?

Step away from the keyboard now. Please.

When you tag internal links with UTM strings, you override the original source/medium. So that visitor who came in through your paid ad and then who clicks on the campaign banner has now been manually tagged. You lose the ability to track that they came through on the ad the moment they click on the tagged internal link. Their source and medium is now your internal campaign, not that paid ad you’re spending gobs of money on and have to justify to your manager. See the problem?
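A hypothetical example of the override in action:

  1. A visitor lands through your paid ad: yoursite.com/?utm_source=google&utm_medium=cpc&utm_campaign=spring_sale
  2. They click a tagged homepage banner: yoursite.com/promo?utm_source=homepage&utm_medium=banner
  3. GA starts a new session attributed to homepage / banner, and everything that visitor does from here on is credited to your own site instead of the ad you paid for.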

I’ve seen at least three pretty spectacular instances of this in the past year, and a number of smaller instances of it. Annie Cushing also talks about the evils of internal UTM tags and the odd prevalence of it. (Oh, and if you haven’t explored her blog, and the amazing spreadsheets she shares, please do.)

One clothing company I worked with tagged all of their homepage offers with UTM strings, which resulted in the loss of visibility for one-third of their audience: One million visits over the course of a year, and $2.1 million in lost revenue.

Let me say that again. One million visits, and $2.1 million. That couldn’t be attributed to an external source/campaign/spend.

Another client I audited included campaign tagging on nearly every navigational element on their website. It still gives me nightmares.

If you want to see if you have any internal UTM strings, head straight to the Campaigns report in Acquisition in Google Analytics, and look for anything like “home” or “navigation” or any language you may use internally to refer to your website structure.

And if you want to see how users are moving through your website, go to the Flow reports. Or if you really, really, really want to know how many people click on that sidebar link, use event tracking. But please, for the love of all things holy (and to keep us analytics lovers from throwing our computers across the room), stop using UTM tagging on your internal links.

Now breathe and smile

Odds are, your Google Analytics setup is fine. If you are seeing any of these issues, though, you have somewhere to start in diagnosing and addressing the data.

We’ve looked at six of the most common points of friction I’ve encountered with Google Analytics and how to start investigating them: self-referrals, bounce rate, iframes, traffic jumps, cross-domain tracking, and internal campaign tagging.

What common data integrity issues have you encountered with Google Analytics? What are your favorite tools to investigate?


Why We Can’t Do Keyword Research Like It’s 2010 – Whiteboard Friday

Posted by randfish

Keyword research is a very different field than it was just five years ago, and if we don’t keep up with the times we might end up doing more harm than good. From the research itself to the selection and targeting process, in today’s Whiteboard Friday Rand explains what has changed and what we all need to do to conduct effective keyword research today.


What do we need to change to keep up with the changing world of keyword research?

Howdy, Moz fans, and welcome to another edition of Whiteboard Friday. This week we’re going to chat a little bit about keyword research, why it’s changed from the last five, six years and what we need to do differently now that things have changed. So I want to talk about changing up not just the research but also the selection and targeting process.

There are three big areas that I’ll cover here. There’s lots more in-depth stuff, but I think we should start with these three.

1) The AdWords keyword tool hides data!

This is where almost all of us in the SEO world start and oftentimes end with our keyword research. We go to the AdWords Keyword Tool, what used to be the external keyword tool and now lives inside AdWords as the Keyword Planner. We go inside that tool, and we look at the volume that’s reported and we sort of record that as, well, it’s not good, but it’s the best we’re going to do.

However, I think there are a few things to consider here. First off, that tool is hiding data. What I mean by that is not that they’re not telling the truth, but they’re not telling the whole truth. They’re not telling nothing but the truth, because those rounded-off numbers that you always see, you know that those are inaccurate. Anytime you’ve bought keywords, you’ve seen that the impression count never matches the count that you see in the AdWords tool. It’s not usually massively off, but it’s often off by a good degree, and the only thing it’s great for is telling relative volume from one term to another.

But because AdWords hides data essentially by saying like, “Hey, you’re going to type in . . .” Let’s say I’m going to type in “college tuition,” and Google knows that a lot of people search for how to reduce college tuition, but that doesn’t come up in the suggestions because it’s not a commercial term, or they don’t think that an advertiser who bids on that is going to do particularly well and so they don’t show it in there. I’m giving an example. They might indeed show that one.

But because that data is hidden, we need to go deeper. We need to go beyond and look at things like Google Suggest and related searches, which are down at the bottom. We need to start conducting customer interviews and staff interviews, which hopefully has always been part of your brainstorming process but really needs to be now. Then you can apply that to AdWords. You can apply that to suggest and related.

The beautiful thing is once you get these tools from places like visiting forums or communities, discussion boards and seeing what terms and phrases people are using, you can collect all this stuff up, plug it back into AdWords, and now they will tell you how much volume they’ve got. So you take that how to lower college tuition term, you plug it into AdWords, they will show you a number, a non-zero number. They were just hiding it in the suggestions because they thought, “Hey, you probably don’t want to bid on that. That won’t bring you a good ROI.” So you’ve got to be careful with that, especially when it comes to SEO kinds of keyword research.

2) Building separate pages for each term or phrase doesn’t make sense

It used to be the case that we built separate pages for every single term and phrase that was in there, because we wanted to have the maximum keyword targeting that we could. So it didn’t matter to us that college scholarship and university scholarships were essentially people looking for exactly the same thing, just using different terminology. We would make one page for one and one page for the other. That’s not the case anymore.

Today, we need to group by the same searcher intent. If two searchers are searching for two different terms or phrases but both of them have exactly the same intent, they want the same information, they’re looking for the same answers, their query is going to be resolved by the same content, we want one page to serve those, and that’s changed up a little bit of how we’ve done keyword research and how we do selection and targeting as well.

3) Build your keyword consideration and prioritization spreadsheet with the right metrics

Everybody’s got an Excel version of this, because I think there’s just no awesome tool out there that everyone loves yet that kind of solves this problem for us, and Excel is very, very flexible. So we go into Excel, we put in our keyword, the volume, and then a lot of times we almost stop there. We did keyword volume and then like value to the business and then we prioritize.

What are all these new columns you’re showing me, Rand? Well, here I think is how sophisticated, modern SEOs that I’m seeing in the more advanced agencies, the more advanced in-house practitioners, this is what I’m seeing them add to the keyword process.

Difficulty

A lot of folks have done this, but difficulty helps us say, “Hey, this has a lot of volume, but it’s going to be tremendously hard to rank.”

The difficulty score that Moz uses and attempts to calculate is a weighted average of the top 10 domain authorities. It also uses page authority, so it’s kind of a weighted stack out of the two. If you’re seeing very, very challenging pages, very challenging domains to get in there, it’s going to be super hard to rank against them. The difficulty is high. For all of these ones it’s going to be high because college and university terms are just incredibly lucrative.

That difficulty can help bias you against chasing after terms and phrases for which you are very unlikely to rank, at least early on. If you feel like, “Hey, I already have a powerful domain. I can rank for everything I want. I am the thousand pound gorilla in my space,” great. Go after the difficulty of your choice, but this helps prioritize.

Opportunity

This is actually very rarely used, but I think sophisticated marketers are using it extremely intelligently. Essentially what they’re saying is, “Hey, if you look at a set of search results, sometimes there are two or three ads at the top instead of just the ones on the sidebar, and that’s biasing some of the click-through rate curve.” Sometimes there’s an instant answer or a Knowledge Graph or a news box or images or video, or all these kinds of things that search results can be marked up with, that are not just the classic 10 web results. Unfortunately, if you’re building a spreadsheet like this and treating every single search result like it’s just 10 blue links, well you’re going to lose out. You’re missing the potential opportunity and the opportunity cost that comes with ads at the top or all of these kinds of features that will bias the click-through rate curve.

So what I’ve seen some really smart marketers do is essentially build some kind of a framework to say, “Hey, you know what? When we see that there’s a top ad and an instant answer, we’re saying the opportunity if I was ranking number 1 is not 10 out of 10. I don’t expect to get whatever the average traffic for the number 1 position is. I expect to get something considerably less than that. Maybe something around 60% of that, because of this instant answer and these top ads.” So I’m going to mark this opportunity as a 6 out of 10.

There are 2 top ads here, so I’m giving this a 7 out of 10. This has two top ads and then it has a news block below the first position. So again, I’m going to reduce that click-through rate. I think that’s going down to a 6 out of 10.

You can get more and less scientific and specific with this. Click-through rate curves are imperfect by nature because we truly can’t measure exactly how those things change. However, I think smart marketers can make some good assumptions from general click-through rate data, which there are several resources out there on that to build a model like this and then include it in their keyword research.
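One way to turn that into a repeatable model is sketched below in TypeScript. The penalty sizes are assumptions tuned to reproduce Rand’s example scores, not published click-through data:

const featurePenalty: Record<string, number> = {
  topAd: 0.15,          // each ad block above the organic results
  instantAnswer: 0.25,
  knowledgeGraph: 0.15,
  newsBlock: 0.10,
};

// Score opportunity on a 1-10 scale by discounting for each SERP feature present.
function opportunityScore(features: string[]): number {
  const discount = features.reduce((sum, f) => sum + (featurePenalty[f] ?? 0), 0);
  return Math.max(1, Math.round(10 * (1 - discount)));
}

console.log(opportunityScore(["topAd", "instantAnswer"]));       // 6
console.log(opportunityScore(["topAd", "topAd"]));               // 7
console.log(opportunityScore(["topAd", "topAd", "newsBlock"]));  // 6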

This does mean that you have to run a query for every keyword you’re thinking about, but you should be doing that anyway. You want to get a good look at who’s ranking in those search results and what kind of content they’re building. If you’re running a keyword difficulty tool, you are already getting something like that.

Business value

This is a classic one. Business value is essentially saying, “What’s it worth to us if visitors come through with this search term?” You can get that from bidding through AdWords. That’s the most sort of scientific, mathematically sound way to get it. Then, of course, you can also get it through your own intuition. It’s better to start with your intuition than nothing if you don’t already have AdWords data or you haven’t started bidding, and then you can refine your sort of estimate over time as you see search visitors visit the pages that are ranking, as you potentially buy those ads, and those kinds of things.

You can get more sophisticated around this. I think a 10 point scale is just fine. You could also use a one, two, or three there, that’s also fine.

Requirements or Options

Then I don’t exactly know what to call this column. I can’t remember the person who showed me theirs that had it in there. I think they called it Optional Data or Additional SERPs Data, but I’m going to call it Requirements or Options. Requirements because this is essentially saying, “Hey, if I want to rank in these search results, am I seeing that the top two or three are all video? Oh, they’re all video. They’re all coming from YouTube. If I want to be in there, I’ve got to be video.”

Or something like, “Hey, I’m seeing that most of the top results have been produced or updated in the last six months. Google appears to be biasing to very fresh information here.” So, for example, if I were searching for “university scholarships Cambridge 2015,” well, guess what? Google probably wants to bias to show results that have been either from the official page on Cambridge’s website or articles from this year about getting into that university and the scholarships that are available or offered. I saw that in two of these search results: both the college and university scholarship terms had a significant number of SERPs where a freshness bump appeared to be required. You can see that a lot because the date will be shown ahead of the description, and the date will be very fresh, sometime in the last six months or a year.

Prioritization

Then finally I can build my prioritization. So based on all the data I had here, I essentially said, “Hey, you know what? These are not 1 and 2. This is actually 1A and 1B, because these are the same concepts. I’m going to build a single page to target both of those keyword phrases.” I think that makes good sense. Someone who is looking for college scholarships, university scholarships, same intent.

I am giving it a slight prioritization, 1A versus 1B, and the reason I do this is because I always have one keyword phrase that I’m leaning on a little more heavily. Because Google isn’t perfect around this, the search results will be a little different. I want to bias to one versus the other. In this case, since I’m targeting university more than college, my title tag might say something like college and university scholarships so that university and scholarships are together, near the front of the title, that kind of thing. Then 1B, 2, 3.
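Pulling those columns together, here is an illustrative TypeScript sketch of how the spreadsheet math might work. The weighting is a judgment call rather than a formula from the video, and the numbers are made up:

interface KeywordRow {
  keyword: string;
  volume: number;      // from AdWords, after surfacing the hidden terms
  difficulty: number;  // 0-100, e.g. a Moz-style difficulty score
  opportunity: number; // 1-10, discounted for SERP features as above
  value: number;       // 1-10, business value of the visit
}

// Higher volume, opportunity, and value raise priority; difficulty drags it down.
function priorityScore(k: KeywordRow): number {
  return (k.volume * (k.opportunity / 10) * k.value) / (1 + k.difficulty);
}

const rows: KeywordRow[] = [
  { keyword: "university scholarships", volume: 12000, difficulty: 78, opportunity: 7, value: 8 },
  { keyword: "college scholarships", volume: 9500, difficulty: 80, opportunity: 6, value: 8 },
];

rows
  .sort((a, b) => priorityScore(b) - priorityScore(a))
  .forEach((k) => console.log(k.keyword, priorityScore(k).toFixed(1)));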

This is kind of the way that modern SEOs are building a more sophisticated process with better data, more inclusive data that helps them select the right kinds of keywords and prioritize to the right ones. I’m sure you guys have built some awesome stuff. The Moz community is filled with very advanced marketers, probably plenty of you who’ve done even more than this.

I look forward to hearing from you in the comments. I would love to chat more about this topic, and we’ll see you again next week for another edition of Whiteboard Friday. Take care.

Video transcription by Speechpad.com

