Stop Ghost Spam in Google Analytics with One Filter

Posted by CarloSeo

Spam in Google Analytics (GA) is becoming a serious issue. A deluge of referral spam from social buttons, adult sites, and many other sources has left people overwhelmed by all the filters they’re setting up to manage the useless data they receive.

The good news is, there is no need to panic. In this post, I’m going to focus on the most common mistakes people make when fighting spam in GA, and explain an efficient way to prevent it.

But first, let’s make sure we understand how spam works. A couple of months ago, Jared Gardner wrote an excellent article explaining what referral spam is, including its intended purpose. He also pointed out some great examples of referral spam.

Types of spam

Spam in Google Analytics falls into two categories: ghosts and crawlers.

Ghosts

The vast majority of spam is this type. They are called ghosts because they never access your site. It is important to keep this in mind, as it’s key to creating a more efficient solution for managing spam.

As unusual as it sounds, this type of spam doesn’t have any interaction with your site at all. You may wonder how that is possible since one of the main purposes of GA is to track visits to our sites.

Ghosts do it by using the Measurement Protocol, which allows anyone to send data directly to Google Analytics’ servers. Using this method, and probably randomly generated tracking IDs (UA-XXXXX-1) as well, the spammers leave a “visit” with fake data without even knowing whom they are hitting.
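To make the mechanism concrete, here is a minimal Python sketch of what such a ghost hit could look like (the tracking ID, page path, and referrer below are placeholders, not real spam sources). Note that nothing in it ever touches the target website:

import requests  # assumes the third-party 'requests' package is installed

# A Universal Analytics Measurement Protocol hit sent straight to GA's servers.
payload = {
    "v": "1",                              # protocol version
    "tid": "UA-XXXXX-1",                   # tracking ID -- spammers often just generate these at random
    "cid": "555",                          # any client ID will do
    "t": "pageview",                       # hit type
    "dp": "/fake-page",                    # fake page path that ends up in your reports
    "dr": "http://spam-referrer.example",  # fake referrer -- the "ghost" referral you see
    # no hostname (dh) is set here, which is exactly what the filter described later catches
}
requests.post("https://www.google-analytics.com/collect", data=payload)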

Crawlers

This type of spam, the opposite of ghost spam, does access your site. As the name implies, these spam bots crawl your pages, ignoring rules like those found in robots.txt that are supposed to stop them from reading your site. When they exit your site, they leave a record in your reports that looks similar to a legitimate visit.

Crawlers are harder to identify because they know their targets and use real data. But it is also true that new ones seldom appear. So if you detect a referral in your analytics that looks suspicious, researching it on Google or checking it against this list might help you answer the question of whether or not it is spammy.

Most common mistakes made when dealing with spam in GA

I’ve been following this issue closely for the last few months. According to the comments people have made on my articles and conversations I’ve found in discussion forums, there are primarily three mistakes people make when dealing with spam in Google Analytics.

Mistake #1. Blocking ghost spam from the .htaccess file

One of the biggest mistakes people make is trying to block ghost spam from the .htaccess file.

For those who are not familiar with this file, one of its main functions is to allow/block access to your site. Now we know that ghosts never reach your site, so adding them here won’t have any effect and will only add useless lines to your .htaccess file.

Ghost spam usually shows up for a few days and then disappears. As a result, sometimes people think that they successfully blocked it from here when really it’s just a coincidence of timing.

Then, when the spam later returns, they get worried because the solution is no longer working, and they assume the spammer somehow bypassed the barriers they set up.

The truth is, the .htaccess file can only effectively block crawlers such as buttons-for-website.com and a few others since these access your site. Most of the spam can’t be blocked using this method, so there is no other option than using filters to exclude them.

Mistake #2. Using the referral exclusion list to stop spam

Another error is trying to use the referral exclusion list to stop the spam. The name may confuse you, but this list is not intended to exclude referrals in the way we want to for the spam. It has other purposes.

For example, when a customer buys something, sometimes they get redirected to a third-party page for payment. After making a payment, they’re redirected back to your website, and GA records that as a new referral. It is appropriate to use the referral exclusion list to prevent this from happening.

If you try to use the referral exclusion list to manage spam, however, the referral part will be stripped since there is no preexisting record. As a result, a direct visit will be recorded, and you will have a bigger problem than the one you started with, since you will still have spam and direct visits are harder to track.

Mistake #3. Worrying that bounce rate changes will affect rankings

When people see that the bounce rate changes drastically because of the spam, they start worrying about the impact that it will have on their rankings in the SERPs.

bounce.png

This is another mistake commonly made. With or without spam, Google doesn’t take into consideration Google Analytics metrics as a ranking factor. Here is an explanation about this from Matt Cutts, the former head of Google’s web spam team.

And if you think about it, Cutts’ explanation makes sense: although many people have GA installed, not everyone uses it.

Assuming your site has been hacked

Another common concern when people see strange landing pages coming from spam on their reports is that they have been hacked.

landing page

The page that the spam shows on the reports doesn’t exist, and if you try to open it, you will get a 404 page. Your site hasn’t been compromised.

Still, you should make sure the page really doesn’t exist, because there are cases (unrelated to spam) where a site suffers a security breach and gets injected with pages full of spammy keywords meant to defame the website.

What should you worry about?

Now that we’ve discarded security issues and their effects on rankings, the only thing left to worry about is your data. The fake trail that the spam leaves behind pollutes your reports.

The impact might be greater or smaller depending on your site’s traffic, but everyone is susceptible to spam.

Small and midsize sites are the most easily impacted – not only because a big part of their traffic can be spam, but also because usually these sites are self-managed and sometimes don’t have the support of an analyst or a webmaster.

Big sites with a lot of traffic can also be impacted by spam, and although the impact may be insignificant, invalid traffic means inaccurate reports no matter the size of the website. As an analyst, you should be able to explain what’s going on even in the most granular reports.

You only need one filter to deal with ghost spam

The usual recommendation is to add each spam referral to an exclusion filter as soon as it is spotted. Although this is useful for quick action against the spam, it has three big disadvantages:

  • Making filters every week for every new spam source detected is tedious and time-consuming, especially if you manage many sites.
  • By the time you apply the filter and it starts working, you already have some affected data.
  • Some of the spammers use direct visits along with the referrals. These direct hits won’t be stopped by a referral filter, so even if you exclude the referral you will still be receiving invalid traffic, which explains why some people have seen an unusual spike in direct traffic.

Luckily, there is a good way to prevent all these problems. Most of the spam (the ghost variety) works by hitting randomly generated GA tracking IDs, meaning the offender doesn’t really know who the target is; for that reason, either the hostname is not set or a fake one is used. (See the report below.)

Ghost-Spam.png

You can see that they use some weird names or don’t even bother to set one. Although there are some known names in the list, these can be easily added by the spammer.

On the other hand, valid traffic will always use a real hostname. In most cases, this will be your domain, but it can also come from paid services, translation services, or any other place where you’ve inserted your GA tracking code.

Valid-Referral.png

Based on this, we can make a filter that will include only hits that use real hostnames. This will automatically exclude all hits from ghost spam, whether it shows up as a referral, keyword, or pageview; or even as a direct visit.

To create this filter, you will need to find the report of hostnames. Here’s how:

  1. Go to the Reporting tab in GA
  2. Click on Audience in the lefthand panel
  3. Expand Technology and select Network
  4. At the top of the report, click on Hostname

Valid-list

You will see a list of all hostnames, including the ones that the spam uses. Make a list of all the valid hostnames you find, as follows:

  • yourmaindomain.com
  • blog.yourmaindomain.com
  • es.yourmaindomain.com
  • payingservice.com
  • translatetool.com
  • anotheruseddomain.com

For small to medium sites, this list of hostnames will likely consist of the main domain and a couple of subdomains. After you are sure you got all of them, create a regular expression similar to this one:

yourmaindomain\.com|anotheruseddomain\.com|payingservice\.com|translatetool\.com
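If you’d like to sanity-check the expression before pasting it into GA, here’s a small Python sketch (the hostnames below are made up for illustration) that roughly mimics the partial regex matching an include filter performs:

import re

# The same pattern you would paste into the GA filter.
valid_hostnames = re.compile(
    r"yourmaindomain\.com|anotheruseddomain\.com|payingservice\.com|translatetool\.com"
)

# Hypothetical entries as they might appear in the Network > Hostname report.
for host in [
    "yourmaindomain.com",        # valid: the main domain
    "blog.yourmaindomain.com",   # valid: subdomains contain the main domain, so they match too
    "translatetool.com",         # valid: a third-party service carrying your tracking code
    "(not set)",                 # ghost spam: hostname never set
    "fake-hostname.xyz",         # ghost spam: invented hostname
]:
    print(host, "->", "keep" if valid_hostnames.search(host) else "exclude")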

You don’t need to put all of your subdomains in the regular expression; the main domain will match all of them. Also, if you don’t have an unfiltered view set up, create one now so you always keep a copy of your raw data.

Then create a Custom Filter.

Make sure you select Include, choose “Hostname” in the Filter Field drop-down, and copy your expression into the Filter Pattern box.

filter

You might want to verify the filter before saving to check that everything is okay. Once you’re ready, save it, and apply the filter to all the views you want (except the unfiltered view).

This single filter will get rid of future occurrences of ghost spam that use invalid hostnames, and it doesn’t require much maintenance. But it’s important that every time you add your tracking code to a new service, you also add that service’s hostname to the end of the filter expression.
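For example, if you later added the code to a hypothetical newservice.com, the expression would become:

yourmaindomain\.com|anotheruseddomain\.com|payingservice\.com|translatetool\.com|newservice\.com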

Now you should only need to take care of the crawler spam. Since crawlers access your site, you can block them by adding these lines to the .htaccess file:

## STOP REFERRER SPAM 
RewriteCond %{HTTP_REFERER} semalt\.com [NC,OR] 
RewriteCond %{HTTP_REFERER} buttons-for-website\.com [NC] 
RewriteRule .* - [F]

It is important to note that this file is very sensitive, and misplacing a single character in it can bring down your entire site. Therefore, make sure you create a backup copy of your .htaccess file prior to editing it.

If you don’t feel comfortable messing around with your .htaccess file, you can alternatively make an expression with all the crawlers and add it to an exclude filter on the Campaign Source field.
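For example, an exclude filter on Campaign Source with the pattern semalt\.com|buttons-for-website\.com covers the same two crawlers targeted by the .htaccess rules above.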

Implement these combined solutions, and you will worry much less about spam contaminating your analytics data. This will have the added benefit of freeing up more time for you to spend actually analyzing your valid data.

After stopping the spam, you can also get clean reports from your historical data by using the same expressions in an Advanced Segment to exclude all the spam.

Bonus resources to help you manage spam

If you still need more information to help you understand and deal with the spam on your GA reports, you can read my main article on the subject here: http://www.ohow.co/what-is-referrer-spam-how-stop-it-guide/.

Additional information on how to stop spam can be found at these URLs:

In closing, I am eager to hear your ideas on this serious issue. Please share them in the comments below.

(Editor’s Note: All images featured in this post were created by the author.)


Pinpoint vs. Floodlight Content and Keyword Research Strategies – Whiteboard Friday

Posted by randfish

When we’re doing keyword research and targeting, we have a choice to make: Are we targeting broader keywords with multiple potential searcher intents, or are we targeting very narrow keywords where it’s pretty clear what the searchers were looking for? Those different approaches, it turns out, apply to content creation and site architecture, as well. In today’s Whiteboard Friday, Rand illustrates that connection.

Pinpoint vs Floodlight Content and Keyword Research Strategy Whiteboard

For reference, here are stills of this week’s whiteboards. Click on them to open high-resolution images in new tabs!

Video Transcription

Howdy, Moz fans, and welcome to another edition of Whiteboard Friday. This week we’re going to chat about pinpoint versus floodlight tactics for content targeting, content strategy, and keyword research, keyword targeting strategy. This is also called the shotgun versus sniper approach, but I’m not a big gun fan. So I’m going to stick with my floodlight versus pinpoint, plus, you know, for the opening shot we don’t have a whole lot of weaponry here at Moz, but we do have lighting.

So let’s talk through this at first. You’re going through and doing some keyword research. You’re trying to figure out which terms and phrases to target. You might look down a list like this.

Well, maybe, I’m using an example here around antique science equipment. So you see these various terms and phrases. You’ve got your volume numbers. You probably have lots of other columns. Hopefully, you’ve watched the Whiteboard Friday on how to do keyword research like it’s 2015 and not 2010.

So you know you have all these other columns to choose from, but I’m simplifying here for the purpose of this experiment. So you might choose some of these different terms. Now, they’re going to have different kinds of tactics and a different strategic approach, depending on the breadth and depth of the topic that you’re targeting. That’s going to determine what types of content you want to create and where you place it in your information architecture. So I’ll show you what I mean.

The floodlight approach

For antique science equipment, this is a relatively broad phrase. I’m going to do my floodlight analysis on this, and floodlight analysis is basically saying like, “Okay, are there multiple potential searcher intents?” Yeah, absolutely. That’s a fairly broad phrase. People could be looking to transact around it. They might be looking for research information, historical information, different types of scientific equipment that they’re looking for.

<img src="http://d1avok0lzls2w.cloudfront.net/uploads/blog/55b15fc96679b8.73854740.jpg" rel="box-shadow: 0 0 10px 0 #999; border-radius: 20px;"

Are there four or more approximately unique keyword terms and phrases to target? Well, absolutely, in fact, there’s probably more than that. So antique science equipment, antique scientific equipment, 18th century scientific equipment, all these different terms and phrases that you might explore there.

Is this a broad content topic with many potential subtopics? Again, yes is the answer to this. Are we talking about generally larger search volume? Again, yes, this is going to have a much larger search volume than some of the narrower terms and phrases. That’s not always the case, but it is here.

The pinpoint approach

For pinpoint analysis, we kind of go the opposite direction. So we might look at a term like antique test tubes, which is a very specific kind of search, and that has a clear single searcher intent or maybe two. Someone might be looking for actually purchasing one of those, or they might be looking to research them and see what kinds there are. Not a ton of additional intents behind that. One to three unique keywords, yeah, probably. It’s pretty specific. Antique test tubes, maybe 19th century test tubes, maybe old science test tubes, but you’re talking about a limited set of keywords that you’re targeting. It’s a narrow content topic, typically smaller search volume.

<img src="http://d1avok0lzls2w.cloudfront.net/uploads/blog/55b160069eb6b1.12473448.jpg" rel="box-shadow: 0 0 10px 0 #999; border-radius: 20px;"

Now, these are going to feed into your IA, your information architecture, and your site structure in this way. So floodlight content generally sits higher up. It’s the category or the subcategory, those broad topic terms and phrases. Those are going to turn into those broad topic category pages. Then you might have multiple, narrower subtopics. So we could go into lab equipment versus astronomical equipment versus chemistry equipment, and then we’d get into those individual pinpoints from the pinpoint analysis.

How do I decide which approach is best for my keywords?

Why are we doing this? Well, generally speaking, if you can take your terms and phrases and categorize them like this and then target them differently, you’re going to provide a better, more logical user experience. Someone who searches for antique scientific equipment, they’re going to really expect to see that category and then to be able to drill down into things. So you’re providing them the experience they predict, the one that they want, the one that they expect.

It’s better for topic modeling analysis and for all of the algorithms around things like Hummingbird, where Google looks at: Are you using the types of terms and phrases, do you have the type of architecture that we expect to find for this keyword?

It’s better for search intent targeting, because the searcher intent is going to be fulfilled if you provide the multiple paths versus the narrow focus. It’s easier keyword targeting for you. You’re going to be able to know, “Hey, I need to target a lot of different terms and phrases and variations in floodlight and one very specific one in pinpoint.”

There’s usually higher searcher satisfaction, which means you get lower bounce rate. You get more engagement. You usually get a higher conversion rate. So it’s good for all those things.

For example…

I’ll actually create pages for each of antique scientific equipment and antique test tubes to illustrate this. So I’ve got two different types of pages here. One is my antique scientific equipment page.

<img src="http://d1avok0lzls2w.cloudfront.net/uploads/blog/55b161fa871e32.54731215.jpg" rel="box-shadow: 0 0 10px 0 #999; border-radius: 20px;"

This is that floodlight, shotgun approach, and what we’re doing here is going to be very different from a pinpoint approach. It’s looking at like, okay, you’ve landed on antique scientific equipment. Now, where do you want to go? What do you want to specifically explore? So we’re going to have a little bit of content specifically about this topic, and how robust that is depends on the type of topic and the type of site you are.

If this is an e-commerce site or a site that’s showing information about various antiques, well maybe we don’t need very much content here. You can see the filtration that we’ve got is going to be pretty broad. So I can go into different centuries. I can go into chemistry, astronomy, physics. Maybe I have a safe for kids type of stuff if you want to buy your kids antique lab equipment, which you might be. Who knows? Maybe you’re awesome and your kids are too. Then different types of stuff at a very broad level. So I can go to microscopes or test tubes, lab searches.

This is great because it’s got broad intent foci, serving many different kinds of searchers with the same page because we don’t know exactly what they want. It’s got multiple keyword targets so that we can go after broad phrases like antique or old or historical or 13th, 14th, whatever century, science and scientific equipment, materials, labs, etc., etc., etc. This is a broad page that could reach any and all of those. Then there’s lots of navigational and refinement options once you get there.

Total opposite of pinpoint content.

<img src="http://d1avok0lzls2w.cloudfront.net/uploads/blog/55b1622740f0b5.73477500.jpg" rel="box-shadow: 0 0 10px 0 #999; border-radius: 20px;"

Pinpoint content, like this antique test tubes page, we’re still going to have some filtration options, but one of the important things to note is how these are links that take you deeper. Depending on how deep the search volume goes in terms of the types of queries that people are performing, you might want to make a specific page for 17th century antique test tubes. You might not, and if you don’t want to do that, you can have these be filters that are simply clickable and change the content of the page here, narrowing the options rather than creating completely separate pages.

So if there’s no search volume for these different things and you don’t think you need to separately target them, go ahead and just make them filters on the data that already appears on this page or the results that are already in here as opposed to links that are going to take you deeper into specific content and create a new page, a new experience.

You can also see I’ve got my individual content here. I probably would go ahead and add some content specifically to this page that is just unique here and that describes antique test tubes and the things that your searchers need. They might want to know things about price. They might want to know things about make and model. They might want to know things about what they were used for. Great. You can have that information broadly, and then individual pieces of content that someone might dig into.

This is narrower intent foci obviously, serving maybe one or two searcher intents. This is really talking about targeting maybe one to two separate keywords. So antique test tubes, maybe lab tubes or test tube sets, but not much beyond that.

Then we’re going to have fewer navigational paths, fewer distractions. We want to keep the searcher. Because we know their intent, we want to guide them along the path that we know they probably want to take and that we want them to take.

So when you’re considering your content, choose wisely between shotgun/floodlight approach or sniper/pinpoint approach. Your searchers will be better served. You’ll probably rank better. You’ll be more likely to earn links and amplification. You’re going to be more successful.

Looking forward to the comments, and we’ll see you again next week for another edition of Whiteboard Friday. Take care.

Video transcription by Speechpad.com


How to Rid Your Website of Six Common Google Analytics Headaches

Posted by amandaecking

I’ve been in and out of Google Analytics (GA) for the past five or so years agency-side. I’ve seen three different code libraries, dozens of new different features and reports roll out, IP addresses stop being reported, and keywords not-so-subtly phased out of the free platform.

Analytics has been a focus of mine for the past year or so—mainly, making sure clients get their data right. Right now, our new focus is closed-loop tracking, but that’s a topic for another day. If you’re using Google Analytics, and only Google Analytics, for the majority of your website stats, or if it’s your primary vehicle for analysis, you need to make sure it’s accurate.

Not having data pulling in or reporting properly is like building a house on a shaky foundation: It doesn’t end well. Usually there are tears.

For some reason, a lot of people, including many of my clients, assume everything is tracking properly in Google Analytics… because Google. But it’s not Google who sets up your analytics. People do that. And people are prone to make mistakes.

I’m going to go through six scenarios where issues are commonly encountered with Google Analytics.

I’ll outline the remedy for each issue, and in the process, show you how to move forward with a diagnosis or resolution.

1. Self-referrals

This is probably one of the areas we’re all familiar with. If you’re seeing a lot of traffic from your own domain, there’s likely a problem somewhere—or you need to extend the default session length in Google Analytics. (For example, if you have a lot of long videos or music clips and don’t use event tracking; a website like TEDx or SoundCloud would be a good equivalent.)

Typically one of the first things I’ll do to help diagnose the problem is include an advanced filter to show the full referrer string. You do this by creating a filter, as shown below:

Filter Type: Custom filter > Advanced
Field A: Hostname
Extract A: (.*)
Field B: Request URI
Extract B: (.*)
Output To: Request URI
Constructor: $A1$B1
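With this filter in place, a pageview on blog.yourmaindomain.com/some-post (a made-up example) will appear in your content reports as blog.yourmaindomain.com/some-post rather than just /some-post, so you can see at a glance which hostname each hit came from.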

You’ll then start seeing the subdomains pulling in. Experience has shown me that if you have a separate subdomain hosted in another location (say, if you work with a separate company and they host and run your mobile site or your shopping cart), it gets treated by Google Analytics as a separate domain. Thus, you’ll need to implement cross-domain tracking. This way, you can narrow down whether or not it’s one particular subdomain that’s creating the self-referrals.

In this example below, we can see all the revenue is being reported to the booking engine (which ended up being cross domain issues) and their own site is the fourth largest traffic source:

It’s also a good idea to check the browser and device reports to start narrowing down whether the issue is specific to a particular element. If it’s not, keep digging. Look at the pages pulling in the self-referrals and go through the code with a fine-tooth comb, drilling down as much as you can.

2. Unusually low bounce rate

If you have a crazy-low bounce rate, it could be too good to be true. Unfortunately. An unusually low bounce rate could (and probably does) mean that at least some pages of your website have the same Google Analytics tracking code installed twice.

Take a look at your source code, or use Google Tag Assistant (though it does have known bugs) to see if you’ve got GA tracking code installed twice.

While I tell clients that having Google Analytics installed twice on the same page can lead to double the pageviews, I’ve not actually encountered that—I usually just say it to scare them into removing the duplicate implementation more quickly. Don’t tell on me.

3. Iframes anywhere

I’ve heard directly from Google engineers and Google Analytics evangelists that Google Analytics does not play well with iframes, and that it will never play nice with this dinosaur technology.

If you track the iframe, you inflate your pageviews, plus you still aren’t tracking everything with 100% clarity.

If you don’t track across iframes, you lose the source/medium attribution and everything becomes a self-referral.

Damned if you do; damned if you don’t.

My advice: Stop using iframes. They’re Netscape-era technology anyway, with rainbow marquees and Comic Sans on top. Interestingly, and unfortunately, a number of booking engines (for hotels) and third-party carts (for ecommerce) still use iframes.

If you have any clients in those verticals, or if you’re in the vertical yourself, check with your provider to see if they use iframes. Or you can check for yourself, by right-clicking as close as you can to the actual booking element:

iframe-booking.png

There is no neat and tidy way to address iframes with Google Analytics, and usually iframes are not the only complicated element of setup you’ll encounter. I spent eight months dealing with a website on a subfolder, which used iframes and had a cross domain booking system, and the best visibility I was able to get was about 80% on a good day.

Typically, I’d approach diagnosing iframes (if, for some reason, I had absolutely no access to viewing a website or talking to the techs) similarly to diagnosing self-referrals, as self-referrals are one of the biggest symptoms of iframe use.

4. Massive traffic jumps

Massive jumps in traffic don’t typically just happen. (Unless, maybe, you’re Geraldine.) There’s always an explanation—a new campaign launched, you just turned on paid ads for the first time, you’re using content amplification platforms, you’re getting a ton of referrals from that recent press in The New York Times. And if you think it just happened, it’s probably a technical glitch.

I’ve seen everything from inflated pageviews result from including tracking on iframes and unnecessary implementation of virtual pageviews, to not realizing the tracking code was installed on other microsites for the same property. Oops.

Usually I’ve seen this happen when the tracking code was somewhere it shouldn’t be, so if you’re investigating a situation of this nature, first confirm the Google Analytics code is only in the places it needs to be. Tools like Google Tag Assistant and Screaming Frog can be your BFFs in helping you figure this out.

Also, I suggest bribing the IT department with sugar (or booze) to see if they’ve changed anything lately.

5. Cross-domain tracking

I wish cross-domain tracking with Google Analytics out of the box didn’t require any additional setup. But it does.

If you don’t have it set up properly, things break down quickly, and can be quite difficult to untangle.

The older the GA library you’re using, the harder it is. The easiest setup, by far, is Google Tag Manager with Universal Analytics. Hard-coded universal analytics is a bit more difficult because you have to implement autoLink manually and decorate forms, if you’re using them (and you probably are). Beyond that, rather than try and deal with it, I say update your Google Analytics code. Then we can talk.

Where I’ve seen the most murkiness with tracking is when parts of cross domain tracking are implemented, but not all. For some reason, if allowLinker isn’t included, or you forget to decorate all the forms, the cookies aren’t passed between domains.

The absolute first place I would start with this would be confirming the cookies are all passing properly at all the right points, forms, links, and smoke signals. I’ll usually use a combination of the Real Time report in Google Analytics, Google Tag Assistant, and GA debug to start testing this. Any debug tool you use will mean you’re playing in the console, so get friendly with it.

6. Internal use of UTM strings

I’ve saved the best for last. Internal use of campaign tagging. We may think, oh, I use Google to tag my campaigns externally, and we’ve got this new promotion on site which we’re using a banner ad for. That’s a campaign. Why don’t I tag it with a UTM string?

Step away from the keyboard now. Please.

When you tag internal links with UTM strings, you override the original source/medium. So that visitor who came in through your paid ad and then who clicks on the campaign banner has now been manually tagged. You lose the ability to track that they came through on the ad the moment they click on the tagged internal link. Their source and medium is now your internal campaign, not that paid ad you’re spending gobs of money on and have to justify to your manager. See the problem?
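For example, an internal banner link like /summer-sale?utm_source=homepage&utm_medium=banner&utm_campaign=summer-sale (the parameter values here are made up) rewrites that visitor’s source/medium to homepage / banner the moment it’s clicked, wiping out the paid ad attribution you were trying to measure.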

I’ve seen at least three pretty spectacular instances of this in the past year, and a number of smaller instances of it. Annie Cushing also talks about the evils of internal UTM tags and the odd prevalence of it. (Oh, and if you haven’t explored her blog, and the amazing spreadsheets she shares, please do.)

One clothing company I worked with tagged all of their homepage offers with UTM strings, which resulted in the loss of visibility for one-third of their audience: One million visits over the course of a year, and $2.1 million in lost revenue.

Let me say that again. One million visits, and $2.1 million. That couldn’t be attributed to an external source/campaign/spend.

Another client I audited included campaign tagging on nearly every navigational element on their website. It still gives me nightmares.

If you want to see if you have any internal UTM strings, head straight to the Campaigns report in Acquisition in Google Analytics, and look for anything like “home” or “navigation” or any language you may use internally to refer to your website structure.

And if you want to see how users are moving through your website, go to the Flow reports. Or if you really, really, really want to know how many people click on that sidebar link, use event tracking. But please, for the love of all things holy (and to keep us analytics lovers from throwing our computers across the room), stop using UTM tagging on your internal links.

Now breathe and smile

Odds are, your Google Analytics setup is fine. If you are seeing any of these issues, though, you have somewhere to start in diagnosing and addressing the data.

We’ve looked at six of the most common points of friction I’ve encountered with Google Analytics and how to start investigating them: self-referrals, bounce rate, iframes, traffic jumps, cross domain tracking and internal campaign tagging.

What common data integrity issues have you encountered with Google Analytics? What are your favorite tools to investigate?


Why We Can’t Do Keyword Research Like It’s 2010 – Whiteboard Friday

Posted by randfish

Keyword Research is a very different field than it was just five years ago, and if we don’t keep up with the times we might end up doing more harm than good. From the research itself to the selection and targeting process, in today’s Whiteboard Friday Rand explains what has changed and what we all need to do to conduct effective keyword research today.

For reference, here’s a still of this week’s whiteboard. Click on it to open a high resolution image in a new tab!

What do we need to change to keep up with the changing world of keyword research?

Howdy, Moz fans, and welcome to another edition of Whiteboard Friday. This week we’re going to chat a little bit about keyword research, why it’s changed from the last five, six years and what we need to do differently now that things have changed. So I want to talk about changing up not just the research but also the selection and targeting process.

There are three big areas that I’ll cover here. There’s lots more in-depth stuff, but I think we should start with these three.

1) The Adwords keyword tool hides data!

This is where almost all of us in the SEO world start and oftentimes end with our keyword research. We go to AdWords Keyword Tool, what used to be the external keyword tool and now is inside AdWords Ad Planner. We go inside that tool, and we look at the volume that’s reported and we sort of record that as, well, it’s not good, but it’s the best we’re going to do.

However, I think there are a few things to consider here. First off, that tool is hiding data. What I mean by that is not that they’re not telling the truth, but they’re not telling the whole truth. They’re not telling nothing but the truth, because those rounded off numbers that you always see, you know that those are inaccurate. Anytime you’ve bought keywords, you’ve seen that the impression count never matches the count that you see in the AdWords tool. It’s not usually massively off, but it’s often off by a good degree, and the only thing it’s great for is telling relative volume from one from another.

But because AdWords hides data essentially by saying like, “Hey, you’re going to type in . . .” Let’s say I’m going to type in “college tuition,” and Google knows that a lot of people search for how to reduce college tuition, but that doesn’t come up in the suggestions because it’s not a commercial term, or they don’t think that an advertiser who bids on that is going to do particularly well and so they don’t show it in there. I’m giving an example. They might indeed show that one.

But because that data is hidden, we need to go deeper. We need to go beyond and look at things like Google Suggest and related searches, which are down at the bottom. We need to start conducting customer interviews and staff interviews, which hopefully has always been part of your brainstorming process but really needs to be now. Then you can apply that to AdWords. You can apply that to suggest and related.

The beautiful thing is once you get these tools from places like visiting forums or communities, discussion boards and seeing what terms and phrases people are using, you can collect all this stuff up, plug it back into AdWords, and now they will tell you how much volume they’ve got. So you take that how to lower college tuition term, you plug it into AdWords, they will show you a number, a non-zero number. They were just hiding it in the suggestions because they thought, “Hey, you probably don’t want to bid on that. That won’t bring you a good ROI.” So you’ve got to be careful with that, especially when it comes to SEO kinds of keyword research.

2) Building separate pages for each term or phrase doesn’t make sense

It used to be the case that we built separate pages for every single term and phrase that was in there, because we wanted to have the maximum keyword targeting that we could. So it didn’t matter to us that college scholarship and university scholarships were essentially people looking for exactly the same thing, just using different terminology. We would make one page for one and one page for the other. That’s not the case anymore.

Today, we need to group by the same searcher intent. If two searchers are searching for two different terms or phrases but both of them have exactly the same intent, they want the same information, they’re looking for the same answers, their query is going to be resolved by the same content, we want one page to serve those, and that’s changed up a little bit of how we’ve done keyword research and how we do selection and targeting as well.

3) Build your keyword consideration and prioritization spreadsheet with the right metrics

Everybody’s got an Excel version of this, because I think there’s just no awesome tool out there that everyone loves yet that kind of solves this problem for us, and Excel is very, very flexible. So we go into Excel, we put in our keyword, the volume, and then a lot of times we almost stop there. We did keyword volume and then like value to the business and then we prioritize.

What are all these new columns you’re showing me, Rand? Well, here I think is how sophisticated, modern SEOs that I’m seeing in the more advanced agencies, the more advanced in-house practitioners, this is what I’m seeing them add to the keyword process.

Difficulty

A lot of folks have done this, but difficulty helps us say, “Hey, this has a lot of volume, but it’s going to be tremendously hard to rank.”

The difficulty score that Moz uses and attempts to calculate is a weighted average of the top 10 domain authorities. It also uses page authority, so it’s kind of a weighted stack out of the two. If you’re seeing very, very challenging pages, very challenging domains to get in there, it’s going to be super hard to rank against them. The difficulty is high. For all of these ones it’s going to be high because college and university terms are just incredibly lucrative.

That difficulty can help bias you against chasing after terms and phrases for which you are very unlikely to rank for at least early on. If you feel like, “Hey, I already have a powerful domain. I can rank for everything I want. I am the thousand pound gorilla in my space,” great. Go after the difficulty of your choice, but this helps prioritize.

Opportunity

This is actually very rarely used, but I think sophisticated marketers are using it extremely intelligently. Essentially what they’re saying is, “Hey, if you look at a set of search results, sometimes there are two or three ads at the top instead of just the ones on the sidebar, and that’s biasing some of the click-through rate curve.” Sometimes there’s an instant answer or a Knowledge Graph or a news box or images or video, or all these kinds of things that search results can be marked up with, that are not just the classic 10 web results. Unfortunately, if you’re building a spreadsheet like this and treating every single search result like it’s just 10 blue links, well you’re going to lose out. You’re missing the potential opportunity and the opportunity cost that comes with ads at the top or all of these kinds of features that will bias the click-through rate curve.

So what I’ve seen some really smart marketers do is essentially build some kind of a framework to say, “Hey, you know what? When we see that there’s a top ad and an instant answer, we’re saying the opportunity if I was ranking number 1 is not 10 out of 10. I don’t expect to get whatever the average traffic for the number 1 position is. I expect to get something considerably less than that. Maybe something around 60% of that, because of this instant answer and these top ads.” So I’m going to mark this opportunity as a 6 out of 10.

There are 2 top ads here, so I’m giving this a 7 out of 10. This has two top ads and then it has a news block below the first position. So again, I’m going to reduce that click-through rate. I think that’s going down to a 6 out of 10.

You can get more and less scientific and specific with this. Click-through rate curves are imperfect by nature because we truly can’t measure exactly how those things change. However, I think smart marketers can make some good assumptions from general click-through rate data, which there are several resources out there on that to build a model like this and then include it in their keyword research.
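To make the arithmetic concrete, here is a toy Python calculation (the volumes and the click-through rate are invented for illustration; only the structure of the math matters) showing how an opportunity score discounts the traffic you would expect from ranking #1:

# Hypothetical numbers -- only the shape of the calculation matters here.
assumed_rank1_ctr = 0.30  # assumed average organic click-through rate for position 1

keywords = {
    "college scholarships":    {"volume": 9000, "opportunity": 6},  # instant answer + top ads
    "university scholarships": {"volume": 6500, "opportunity": 7},  # two top ads
}

for keyword, data in keywords.items():
    expected_visits = data["volume"] * assumed_rank1_ctr * data["opportunity"] / 10
    print(keyword, "->", round(expected_visits), "expected monthly visits at number 1")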

This does mean that you have to run a query for every keyword you’re thinking about, but you should be doing that anyway. You want to get a good look at who’s ranking in those search results and what kind of content they’re building. If you’re running a keyword difficulty tool, you are already getting something like that.

Business value

This is a classic one. Business value is essentially saying, “What’s it worth to us if visitors come through with this search term?” You can get that from bidding through AdWords. That’s the most sort of scientific, mathematically sound way to get it. Then, of course, you can also get it through your own intuition. It’s better to start with your intuition than nothing if you don’t already have AdWords data or you haven’t started bidding, and then you can refine your sort of estimate over time as you see search visitors visit the pages that are ranking, as you potentially buy those ads, and those kinds of things.

You can get more sophisticated around this. I think a 10 point scale is just fine. You could also use a one, two, or three there, that’s also fine.

Requirements or Options

Then I don’t exactly know what to call this column. I can’t remember the person who showed me theirs that had it in there. I think they called it Optional Data or Additional SERPs Data, but I’m going to call it Requirements or Options. Requirements because this is essentially saying, “Hey, if I want to rank in these search results, am I seeing that the top two or three are all video? Oh, they’re all video. They’re all coming from YouTube. If I want to be in there, I’ve got to be video.”

Or something like, “Hey, I’m seeing that most of the top results have been produced or updated in the last six months. Google appears to be biasing to very fresh information here.” So, for example, if I were searching for “university scholarships Cambridge 2015,” well, guess what? Google probably wants to bias to show results that are either from the official page on Cambridge’s website or articles from this year about getting into that university and the scholarships that are available or offered. I saw those in two of these search results; both the college and university scholarships queries had a significant number of SERPs where a freshness bump appeared to be required. You can see that a lot because the date will be shown ahead of the description, and the date will be very fresh, sometime in the last six months or a year.

Prioritization

Then finally I can build my prioritization. So based on all the data I had here, I essentially said, “Hey, you know what? These are not 1 and 2. This is actually 1A and 1B, because these are the same concepts. I’m going to build a single page to target both of those keyword phrases.” I think that makes good sense. Someone who is looking for college scholarships, university scholarships, same intent.

I am giving it a slight prioritization, 1A versus 1B, and the reason I do this is because I always have one keyword phrase that I’m leaning on a little more heavily. Because Google isn’t perfect around this, the search results will be a little different. I want to bias to one versus the other. In this case, since I’m targeting university more heavily than college, my title tag might say something like college and university scholarships so that university and scholarships are nicely together, near the front of the title, that kind of thing. Then 1B, 2, 3.

This is kind of the way that modern SEOs are building a more sophisticated process with better data, more inclusive data that helps them select the right kinds of keywords and prioritize to the right ones. I’m sure you guys have built some awesome stuff. The Moz community is filled with very advanced marketers, probably plenty of you who’ve done even more than this.

I look forward to hearing from you in the comments. I would love to chat more about this topic, and we’ll see you again next week for another edition of Whiteboard Friday. Take care.

Video transcription by Speechpad.com
