Stop Ghost Spam in Google Analytics with One Filter

Posted by CarloSeo

Spam in Google Analytics (GA) is becoming a serious issue. Thanks to a deluge of referral spam from social buttons, adult sites, and many, many other sources, people are becoming overwhelmed by all the filters they have to set up to manage the useless data they're receiving.

The good news is, there is no need to panic. In this post, I’m going to focus on the most common mistakes people make when fighting spam in GA, and explain an efficient way to prevent it.

But first, let’s make sure we understand how spam works. A couple of months ago, Jared Gardner wrote an excellent article explaining what referral spam is, including its intended purpose. He also pointed out some great examples of referral spam.

Types of spam

Spam in Google Analytics falls into two categories: ghosts and crawlers.

Ghosts

The vast majority of spam is this type. They are called ghosts because they never access your site. It is important to keep this in mind, as it’s key to creating a more efficient solution for managing spam.

As unusual as it sounds, this type of spam doesn’t have any interaction with your site at all. You may wonder how that is possible since one of the main purposes of GA is to track visits to our sites.

They do it by using the Measurement Protocol, which allows people to send data directly to Google Analytics’ servers. Using this method, and probably randomly generated tracking codes (UA-XXXXX-1) as well, the spammers leave a “visit” with fake data, without even knowing who they are hitting.
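To make this concrete, a ghost hit is nothing more than an HTTP request against the Measurement Protocol endpoint; no page is ever loaded. The sketch below only constructs the payload (it doesn't send anything), and the tracking ID, client ID, and URLs are made-up placeholders:

```python
from urllib.parse import urlencode

# Build a Measurement Protocol payload the way a ghost spammer might.
# A real spammer would POST this to www.google-analytics.com/collect;
# UA-123456-1 and the URLs below are placeholders, not real properties.
payload = {
    "v": "1",                              # protocol version
    "tid": "UA-123456-1",                  # (randomly guessed) tracking ID
    "cid": "555",                          # arbitrary client ID
    "t": "pageview",                       # hit type
    "dr": "http://spam-referrer.example",  # fake referrer
    "dp": "/fake-page",                    # fake page path
}
query = urlencode(payload)
print(query)
```

Because the tracking ID is guessed at random, the spammer never knows whose reports the fake "visit" lands in, which is exactly why the fix described later (filtering on hostname) works.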

Crawlers

This type of spam, the opposite of ghost spam, does access your site. As the name implies, these spam bots crawl your pages, ignoring rules like those found in robots.txt that are supposed to stop them from reading your site. When they exit your site, they leave a record on your reports that appears similar to a legitimate visit.

Crawlers are harder to identify because they know their targets and use real data. But it is also true that new ones seldom appear. So if you detect a referral in your analytics that looks suspicious, researching it on Google or checking it against this list might help you answer the question of whether or not it is spammy.

Most common mistakes made when dealing with spam in GA

I’ve been following this issue closely for the last few months. According to the comments people have made on my articles and conversations I’ve found in discussion forums, there are primarily three mistakes people make when dealing with spam in Google Analytics.

Mistake #1. Blocking ghost spam from the .htaccess file

One of the biggest mistakes people make is trying to block Ghost Spam from the .htaccess file.

For those who are not familiar with this file, one of its main functions is to allow/block access to your site. Now we know that ghosts never reach your site, so adding them here won’t have any effect and will only add useless lines to your .htaccess file.

Ghost spam usually shows up for a few days and then disappears. As a result, sometimes people think that they successfully blocked it from here when really it’s just a coincidence of timing.

Then when the spam later returns, people get worried because the solution no longer seems to work, and they think the spammer somehow bypassed the barriers they set up.

The truth is, the .htaccess file can only effectively block crawlers such as buttons-for-website.com and a few others since these access your site. Most of the spam can’t be blocked using this method, so there is no other option than using filters to exclude them.

Mistake #2. Using the referral exclusion list to stop spam

Another error is trying to use the referral exclusion list to stop the spam. The name may confuse you, but this list is not intended to exclude referrals in the way we want to for the spam. It has other purposes.

For example, when a customer buys something, sometimes they get redirected to a third-party page for payment. After making the payment, they're redirected back to your website, and GA records that as a new referral. It is appropriate to use the referral exclusion list to prevent this from happening.

If you try to use the referral exclusion list to manage spam, however, the referral part will be stripped since there is no preexisting record. As a result, a direct visit will be recorded, and you will have a bigger problem than the one you started with: you will still have spam, and direct visits are harder to track.

Mistake #3. Worrying that bounce rate changes will affect rankings

When people see that the bounce rate changes drastically because of the spam, they start worrying about the impact that it will have on their rankings in the SERPs.

[Image: bounce rate report affected by spam]

This is another mistake commonly made. With or without spam, Google doesn’t take into consideration Google Analytics metrics as a ranking factor. Here is an explanation about this from Matt Cutts, the former head of Google’s web spam team.

And if you think about it, Cutts' explanation makes sense: although many sites have GA, not everyone uses it, so it couldn't be a reliable ranking signal.

Assuming your site has been hacked

Another common concern, when people see strange landing pages from spam in their reports, is that their site has been hacked.

[Image: spam landing pages in reports]

The page that the spam shows on the reports doesn’t exist, and if you try to open it, you will get a 404 page. Your site hasn’t been compromised.

But you should verify that the page doesn't exist, because there are cases (unrelated to spam) where a site suffers a security breach and gets injected with pages full of bad keywords to defame the website.

What should you worry about?

Now that we’ve discarded security issues and their effects on rankings, the only thing left to worry about is your data. The fake trail that the spam leaves behind pollutes your reports.

It might have greater or lesser impact depending on your site traffic, but everyone is susceptible to the spam.

Small and midsize sites are the most easily impacted – not only because a big part of their traffic can be spam, but also because usually these sites are self-managed and sometimes don’t have the support of an analyst or a webmaster.

Big sites with a lot of traffic can also be impacted by spam, and although the impact can be insignificant, invalid traffic means inaccurate reports no matter the size of the website. As an analyst, you should be able to explain what's going on even in the most granular reports.

You only need one filter to deal with ghost spam

Usually it is recommended to add the referral to an exclusion filter after it is spotted. Although this is useful for a quick action against the spam, it has three big disadvantages.

  • Making filters every week for every newly detected spam source is tedious and time-consuming, especially if you manage many sites. Plus, by the time you apply the filter and it starts working, some of your data has already been affected.
  • Some spammers use direct visits along with referrals. These direct hits won't be stopped by a referral filter, so even if you exclude the referral you will still receive invalid traffic, which explains why some people have seen an unusual spike in direct traffic.

Luckily, there is a good way to prevent all these problems. Most spam (the ghost kind) works by hitting randomly generated GA tracking IDs, meaning the offender doesn't actually know who the target is; for that reason, the hostname is either not set or fake. (See the report below.)

[Image: hostname report showing ghost spam hostnames]

You can see that they use some weird names, or don't even bother to set one. Although there are some recognizable names in the list, spammers can easily fake those as well.

On the other hand, valid traffic will always use a real hostname. In most cases this will be your domain, but hits can also come from paid services, translation services, or any other place where you've inserted your GA tracking code.

[Image: report showing valid hostnames]

Based on this, we can make a filter that will include only hits that use real hostnames. This will automatically exclude all hits from ghost spam, whether it shows up as a referral, keyword, or pageview; or even as a direct visit.

To create this filter, you will need to find the report of hostnames. Here’s how:

  1. Go to the Reporting tab in GA
  2. Click on Audience in the lefthand panel
  3. Expand Technology and select Network
  4. At the top of the report, click on Hostname

[Image: hostname report in GA]

You will see a list of all hostnames, including the ones that the spam uses. Make a list of all the valid hostnames you find, as follows:

  • yourmaindomain.com
  • blog.yourmaindomain.com
  • es.yourmaindomain.com
  • payingservice.com
  • translatetool.com
  • anotheruseddomain.com

For small to medium sites, this list of hostnames will likely consist of the main domain and a couple of subdomains. After you are sure you got all of them, create a regular expression similar to this one:

yourmaindomain\.com|anotheruseddomain\.com|payingservice\.com|translatetool\.com

You don’t need to put all of your subdomains in the regular expression. The main domain will match all of them. If you don’t have a view set up without filters, create one now.
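Before pasting the expression into GA, it's worth sanity-checking it against a few hostnames. GA filter patterns are partial-match regular expressions, so Python's re.search is a reasonable stand-in; the domains below are the placeholder names from the list above, and the "ghost" hostnames are invented examples:

```python
import re

# The include-filter pattern from the article (placeholder domains).
# GA filter patterns are partial-match, so re.search mirrors GA's behavior;
# subdomains like blog.yourmaindomain.com match because the pattern is unanchored.
pattern = re.compile(
    r"yourmaindomain\.com|anotheruseddomain\.com|payingservice\.com|translatetool\.com"
)

valid = ["yourmaindomain.com", "blog.yourmaindomain.com", "payingservice.com"]
ghost = ["free-share-buttons.example", "(not set)", "fake-hostname.example"]

assert all(pattern.search(h) for h in valid)      # real traffic is kept
assert not any(pattern.search(h) for h in ghost)  # ghost hostnames are excluded
print("filter pattern behaves as expected")
```

A quick check like this is much cheaper than discovering weeks later that a typo in the expression has been excluding your own traffic.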

Then create a Custom Filter.

Make sure you select INCLUDE, then select "Hostname" in the Filter Field, and copy your expression into the Filter Pattern box.

[Image: custom filter settings]

You might want to verify the filter before saving to check that everything is okay. Once you're ready, save it, and apply the filter to all the views you want (except the view without filters).

This single filter will get rid of future occurrences of ghost spam that use invalid hostnames, and it doesn't require much maintenance. But it's important that every time you add your tracking code to a new service, you also add its hostname to the filter expression.

Now you should only need to take care of the crawler spam. Since crawlers access your site, you can block them by adding these lines to the .htaccess file:

## STOP REFERRER SPAM
RewriteEngine On
RewriteCond %{HTTP_REFERER} semalt\.com [NC,OR]
RewriteCond %{HTTP_REFERER} buttons-for-website\.com [NC]
RewriteRule .* - [F]

It is important to note that this file is very sensitive, and misplacing a single character in it can bring down your entire site. Therefore, make sure you create a backup copy of your .htaccess file prior to editing it.

If you don't feel comfortable editing your .htaccess file, you can alternatively build an expression with all the crawlers and add it to an exclude filter on Campaign Source.

Implement these combined solutions, and you will worry much less about spam contaminating your analytics data. This will have the added benefit of freeing up more time for you to actually analyze your valid data.

After stopping spam, you can also get clean reports from the historical data by using the same expressions in an Advanced Segment to exclude all the spam.

Bonus resources to help you manage spam

If you still need more information to help you understand and deal with the spam on your GA reports, you can read my main article on the subject here: http://www.ohow.co/what-is-referrer-spam-how-stop-it-guide/.

In closing, I am eager to hear your ideas on this serious issue. Please share them in the comments below.

(Editor’s Note: All images featured in this post were created by the author.)

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!

Reblogged 3 years ago from tracking.feedpress.it

Controlling Search Engine Crawlers for Better Indexation and Rankings – Whiteboard Friday

Posted by randfish

When should you disallow search engines in your robots.txt file, and when should you use meta robots tags in a page header? What about nofollowing links? In today’s Whiteboard Friday, Rand covers these tools and their appropriate use in four situations that SEOs commonly find themselves facing.

For reference, here’s a still of this week’s whiteboard. Click on it to open a high resolution image in a new tab!

Video transcription

Howdy Moz fans, and welcome to another edition of Whiteboard Friday. This week we’re going to talk about controlling search engine crawlers, blocking bots, sending bots where we want, restricting them from where we don’t want them to go. We’re going to talk a little bit about crawl budget and what you should and shouldn’t have indexed.

As a start, what I want to do is discuss the ways in which we can control robots. Those include the three primary ones: robots.txt, meta robots, and the nofollow tag (though nofollow is a little bit less about controlling bots).

There are a few others that we’re going to discuss as well, including Webmaster Tools (Search Console) and URL status codes. But let’s dive into those first few first.

Robots.txt lives at yoursite.com/robots.txt. It tells crawlers what they should and shouldn't access, but it doesn't always get respected by Google and Bing. So a lot of folks, when they say "hey, disallow this" and then suddenly see those URLs popping up in the results, wonder what's going on. Look, Google and Bing oftentimes think that they just know better. They think that maybe you've made a mistake; they think, "hey, there's a lot of links pointing to this content, there's a lot of people who are visiting and caring about this content, maybe you didn't intend for us to block it." The more specific you get about an individual URL, the better they usually are about respecting it. The less specific, meaning the more you use wildcards or say "everything behind this entire big directory," the worse they are about necessarily believing you.

Meta robots—a little different—that lives in the headers of individual pages, so you can only control a single page with a meta robots tag. That tells the engines whether or not they should keep a page in the index, and whether they should follow the links on that page, and it’s usually a lot more respected, because it’s at an individual-page level; Google and Bing tend to believe you about the meta robots tag.

And then the nofollow tag, that lives on an individual link on a page. It doesn’t tell engines where to crawl or not to crawl. All it’s saying is whether you editorially vouch for a page that is being linked to, and whether you want to pass the PageRank and link equity metrics to that page.

Interesting point about meta robots and robots.txt working together (or not working together so well)—many, many folks in the SEO world do this and then get frustrated.

What if, for example, we take a page like "blogtest.html" on our domain and we say "all user agents, you are not allowed to crawl blogtest.html"? Okay, that's a good way to keep that page from being crawled, but just because something is not crawled doesn't necessarily mean it won't be in the search results.
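That disallow rule, spelled out as a minimal robots.txt entry (blogtest.html is just the example filename from the scenario above), would look like this:

```text
User-agent: *
Disallow: /blogtest.html
```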

So then we have our SEO folks go, “you know what, let’s make doubly sure that doesn’t show up in search results; we’ll put in the meta robots tag:”

<meta name="robots" content="noindex, follow">

So, “noindex, follow” tells the search engine crawler they can follow the links on the page, but they shouldn’t index this particular one.

Then, you go and run a search for “blog test” in this case, and everybody on the team’s like “What the heck!? WTF? Why am I seeing this page show up in search results?”

The answer is, you told the engines that they couldn’t crawl the page, so they didn’t. But they are still putting it in the results. They’re actually probably not going to include a meta description; they might have something like “we can’t include a meta description because of this site’s robots.txt file.” The reason it’s showing up is because they can’t see the noindex; all they see is the disallow.

So, if you want something truly removed, unable to be seen in search results, you can’t just disallow a crawler. You have to say meta “noindex” and you have to let them crawl it.

So this creates some complications. Robots.txt can be great if we’re trying to save crawl bandwidth, but it isn’t necessarily ideal for preventing a page from being shown in the search results. I would not recommend, by the way, that you do what we think Twitter recently tried to do, where they tried to canonicalize www and non-www by saying “Google, don’t crawl the www version of twitter.com.” What you should be doing is rel canonical-ing or using a 301.

Meta robots can allow crawling and link-following while disallowing indexation, which is great, but it still consumes crawl budget (the page has to be crawled for the tag to be seen), even though it conserves indexation.

The nofollow tag, generally speaking, is not particularly useful for controlling bots or conserving indexation.

Webmaster Tools (now Google Search Console) has some special things that allow you to restrict access or remove a result from the search results. For example, if you have 404’d something or if you’ve told them not to crawl something but it’s still showing up in there, you can manually say “don’t do that.” There are a few other crawl protocol things that you can do.

And then URL status codes—these are a valid way to do things, but they’re going to obviously change what’s going on on your pages, too.

If you're not having a lot of luck using a 404 to remove something, you can use a 410 to permanently remove something from the index. Just be aware that once you use a 410, it can take a long time to get that page re-crawled or re-indexed if you later want to tell the search engines "it's back!" 410 is permanent removal.
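If your server is Apache, one way to serve that 410 is mod_alias's Redirect directive with the "gone" status; the path below is only an example:

```apache
# Permanently remove a retired page from the index with a 410 Gone response
# (requires mod_alias; the path is an illustrative placeholder)
Redirect gone /old-discontinued-page.html
```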

301—permanent redirect, we’ve talked about those here—and 302, temporary redirect.

Now let’s jump into a few specific use cases of “what kinds of content should and shouldn’t I allow engines to crawl and index” in this next version…

[Rand moves at superhuman speed to erase the board and draw part two of this Whiteboard Friday. Seriously, we showed Roger how fast it was, and even he was impressed.]

Four crawling/indexing problems to solve

So we’ve got these four big problems that I want to talk about as they relate to crawling and indexing.

1. Content that isn’t ready yet

The first one here is around, “If I have content of quality I’m still trying to improve—it’s not yet ready for primetime, it’s not ready for Google, maybe I have a bunch of products and I only have the descriptions from the manufacturer and I need people to be able to access them, so I’m rewriting the content and creating unique value on those pages… they’re just not ready yet—what should I do with those?”

My options around crawling and indexing? If I have a large quantity of those—maybe thousands, tens of thousands, hundreds of thousands—I would probably go the robots.txt route. I’d disallow those pages from being crawled, and then eventually as I get (folder by folder) those sets of URLs ready, I can then allow crawling and maybe even submit them to Google via an XML sitemap.

If I’m talking about a small quantity—a few dozen, a few hundred pages—well, I’d probably just use the meta robots noindex, and then I’d pull that noindex off of those pages as they are made ready for Google’s consumption. And then again, I would probably use the XML sitemap and start submitting those once they’re ready.

2. Dealing with duplicate or thin content

What about, “Should I noindex, nofollow, or potentially disallow crawling on largely duplicate URLs or thin content?” I’ve got an example. Let’s say I’m an ecommerce shop, I’m selling this nice Star Wars t-shirt which I think is kind of hilarious, so I’ve got starwarsshirt.html, and it links out to a larger version of an image, and that’s an individual HTML page. It links out to different colors, which change the URL of the page, so I have a gray, blue, and black version. Well, these four pages are really all part of this same one, so I wouldn’t recommend disallowing crawling on these, and I wouldn’t recommend noindexing them. What I would do there is a rel canonical.

Remember, rel canonical is one of those things that can be precluded by disallowing. So, if I were to disallow these from being crawled, Google couldn’t see the rel canonical back, so if someone linked to the blue version instead of the default version, now I potentially don’t get link credit for that. So what I really want to do is use the rel canonical, allow the indexing, and allow it to be crawled. If you really feel like it, you could also put a meta “noindex, follow” on these pages, but I don’t really think that’s necessary, and again that might interfere with the rel canonical.
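For the t-shirt example, each color-variant page would point back at the default version with a tag like this in its head (the domain and filenames are placeholders for the starwarsshirt.html scenario above):

```html
<!-- In the <head> of a color variant such as starwarsshirt-blue.html,
     pointing search engines at the default version of the page -->
<link rel="canonical" href="http://example.com/starwarsshirt.html" />
```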

3. Passing link equity without appearing in search results

Number three: “If I want to pass link equity (or at least crawling) through a set of pages without those pages actually appearing in search results—so maybe I have navigational stuff, ways that humans are going to navigate through my pages, but I don’t need those appearing in search results—what should I use then?”

What I would say here is, you can use the meta robots to say “don’t index the page, but do follow the links that are on that page.” That’s a pretty nice, handy use case for that.

Do NOT, however, disallow those pages in robots.txt; many, many folks make this mistake. If you disallow crawling on them, Google can't see the noindex, and they don't know that they can follow the links. Granted, as we talked about before, sometimes Google doesn't obey the robots.txt, but you can't rely on that behavior; you should assume the disallow will prevent them from crawling. So I would say the meta robots "noindex, follow" is the way to do this.

4. Search results-type pages

Finally, fourth, “What should I do with search results-type pages?” Google has said many times that they don’t like your search results from your own internal engine appearing in their search results, and so this can be a tricky use case.

Sometimes a search result page—a page that lists many types of results that might come from a database of types of content that you’ve got on your site—could actually be a very good result for a searcher who is looking for a wide variety of content, or who wants to see what you have on offer. Yelp does this: When you say, “I’m looking for restaurants in Seattle, WA,” they’ll give you what is essentially a list of search results, and Google does want those to appear because that page provides a great result. But you should be doing what Yelp does there, and make the most common or popular individual sets of those search results into category-style pages. A page that provides real, unique value, that’s not just a list of search results, that is more of a landing page than a search results page.

However, that being said, if you’ve got a long tail of these, or if you’d say “hey, our internal search engine, that’s really for internal visitors only—it’s not useful to have those pages show up in search results, and we don’t think we need to make the effort to make those into category landing pages.” Then you can use the disallow in robots.txt to prevent those.

Just be cautious here, because I have sometimes seen an over-swinging of the pendulum toward blocking all types of search results, and sometimes that can actually hurt your SEO and your traffic. Sometimes those pages can be really useful to people. So check your analytics, and make sure those aren’t valuable pages that should be served up and turned into landing pages. If you’re sure, then go ahead and disallow all your search results-style pages. You’ll see a lot of sites doing this in their robots.txt file.

That being said, I hope you have some great questions about crawling and indexing, controlling robots, blocking robots, allowing robots, and I’ll try and tackle those in the comments below.

We’ll look forward to seeing you again next week for another edition of Whiteboard Friday. Take care!


Reblogged 3 years ago from tracking.feedpress.it

How to Defeat Duplicate Content – Next Level

Posted by EllieWilkinson

Welcome to the third installment of Next Level! In the previous Next Level blog post, we shared a workflow showing you how to take on your competitors using Moz tools. We’re continuing the educational series with several new videos all about resolving duplicate content. Read on and level up!


Dealing with duplicate content can feel a bit like doing battle with your site’s evil doppelgänger—confusing and tricky to defeat! But identifying and resolving duplicates is a necessary part of helping search engines decide on relevant results. In this short video, learn about how duplicate content happens, why it’s important to fix, and a bit about how you can uncover it.

[Video: Next Level – Identifying Duplicates, Part 1]

[Quick clarification: Search engines don’t actively penalize duplicate content, per se; they just don’t always understand it as well, which can lead to a drop in rankings. More info here.]

Now that you have a better idea of how to identify those dastardly duplicates, let’s get rid of ’em once and for all. Watch this next video to review how to use Moz Analytics to find and fix duplicate content using three common solutions. (You’ll need a Moz Pro subscription to use Moz Analytics. If you aren’t yet a Moz Pro subscriber, you can always try out the tools with a 30-day free trial.)

Workflow summary

Here’s a review of the three common solutions to conquering duplicate content:

  1. 301 redirect. Use Open Site Explorer to check which page has the higher Page Authority (PA), then set up a 301 redirect from the duplicate page to the original page. This ensures that they no longer compete with one another in the search results. Wondering what a 301 redirect is and how to do it? Read more about redirection here.
  2. Rel=canonical. A rel=canonical tag passes the same amount of ranking power as a 301 redirect, and there’s a bonus: it often takes less development time to implement! Add this tag to the HTML head of a web page to tell search engines that it should be treated as a copy of the “canon,” or original, page:
    <head> <link rel="canonical" href="http://moz.com/blog/" /> </head>

    If you’re curious, you can read more about canonicalization here.

  3. noindex, follow. Add the values “noindex, follow” to the meta robots tag to tell search engines not to include the duplicate pages in their indexes, but to crawl their links. This works really well with paginated content or if you have a system set up to tag or categorize content (as with a blog). Here’s what it should look like:
    <head> <meta name="robots" content="noindex, follow" /> </head>

    If you’re looking to block the Moz crawler, Rogerbot, you can use the robots.txt file if you prefer—he’s a good robot, and he’ll obey!
    More about meta robots (and robots.txt) here.
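For option 1 above, a 301 on Apache can be as simple as a one-line mod_alias rule in .htaccess; the paths and domain here are placeholders, not a specific recommendation:

```apache
# 301-redirect the lower-PA duplicate to the original page
# (requires mod_alias; example paths only)
Redirect 301 /duplicate-page.html http://example.com/original-page.html
```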

Can’t get enough of duplicate content? Want to become a duplicate content connoisseur? This last video explains more about how Moz finds duplicates, if you’re curious. And you can read even more over at the Moz Developer Blog.

We’d love to hear about your techniques for defeating duplicates! Chime in below in the comments.


Reblogged 4 years ago from tracking.feedpress.it

12 Common Reasons Reconsideration Requests Fail

Posted by Modestos

There are several reasons a reconsideration request might fail. But some of the most common mistakes site owners and inexperienced SEOs make when trying to lift a link-related Google penalty are entirely avoidable. 

Here’s a list of the top 12 most common mistakes made when submitting reconsideration requests, and how you can prevent them.

1. Insufficient link data

This is one of the most common reasons why reconsideration requests fail. This mistake is readily evident each time a reconsideration request gets rejected and the example URLs provided by Google are unknown to the webmaster. Relying only on Webmaster Tools data isn’t enough, as Google has repeatedly said. You need to combine data from as many different sources as possible. 

A good starting point is to collate backlink data, at the very least:

  • Google Webmaster Tools (both latest and sample links)
  • Bing Webmaster Tools
  • Majestic SEO (Fresh Index)
  • Ahrefs
  • Open Site Explorer

If you use any toxic link-detection services (e.g., Linkrisk and Link Detox), then you need to take a few precautions to ensure the following:

  • They are 100% transparent about their backlink data sources
  • They have imported all backlink data
  • You can upload your own backlink data (e.g., Webmaster Tools) without any limitations

If you work on large websites with tons of backlinks, most of these automated services will likely process just a fraction of the links unless you pay for one of their premium packages. If you have direct access to the above data sources, it’s worthwhile to download all backlink data, then manually upload it into your tool of choice for processing. This is the only way to have full visibility over the backlink data that has to be analyzed and reviewed later. Starting with an incomplete data set at this early (yet crucial) stage could seriously hinder the outcome of your reconsideration request.
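The collation step above boils down to a merge-and-deduplicate over the per-tool exports, which can be sketched in a few lines; the sample rows are made up and stand in for the real CSV exports from each tool:

```python
# Merge backlink URL exports from several tools and deduplicate,
# so the manual review starts from the fullest possible picture.
# These lists stand in for parsed exports from GWT, Majestic SEO, and Ahrefs.
gwt = ["http://a.example/page1", "http://b.example/post"]
majestic = ["http://b.example/post", "http://c.example/dir"]
ahrefs = ["http://a.example/page1", "http://d.example/blog"]

# Set union drops the overlap between tools; sorting keeps the review stable.
all_links = sorted(set(gwt) | set(majestic) | set(ahrefs))
print(len(all_links), "unique linking URLs to review")
```

With real exports you would read each file with the csv module and normalize the URLs first, but the principle is the same: no single tool's list is complete, so union them before auditing.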

2. Missing vital legacy information

The more you know about a site’s history and past activities, the better. You need to find out (a) which pages were targeted in the past as part of link building campaigns, (b) which keywords were the primary focus and (c) the link building tactics that were scaled (or abused) most frequently. Knowing enough about a site’s past activities, before it was penalized, can help you home in on the actual causes of the penalty. Also, collect as much information as possible from the site owners.

3. Misjudgement

Misreading your current situation can lead to wrong decisions. One common mistake is to treat the example URLs provided by Google as gospel and try to identify only links with the same patterns. Google provides a very small number of examples of unnatural links. Often, these examples are the most obvious and straightforward ones. However, you should look beyond these examples to fully address the issues and take the necessary actions against all types of unnatural links. 

Google is very clear on the matter: “Please correct or remove all inorganic links, not limited to the samples provided above.”

Another common area of bad judgement is the inability to correctly identify unnatural links. This is a skill that requires years of experience in link auditing, as well as link building. Removing the wrong links won’t lift the penalty, and may also result in further ranking drops and loss of traffic. You must remove the right links.


4. Blind reliance on tools

There are numerous unnatural link-detection tools available on the market, and over the years I’ve had the chance to try out most (if not all) of them. Because (and without any exception) I’ve found them all very ineffective and inaccurate, I do not rely on any such tools for my day-to-day work. In some cases, a lot of the reported “high risk” links were 100% natural links, and in others, numerous toxic links were completely missed. If you have to manually review all the links to discover the unnatural ones, ensuring you don’t accidentally remove any natural ones, it makes no sense to pay for tools. 

If you solely rely on automated tools to identify the unnatural links, you will need a miracle for your reconsideration request to be successful. The only tool you really need is a powerful backlink crawler that can accurately report the current link status of each URL you have collected. You should then manually review all currently active links and decide which ones to remove. 

I could write an entire book on the numerous flaws and bugs I have come across each time I’ve tried one of the most popular link auditing tools. Many of these issues can be detrimental to the outcome of the reconsideration request; I have seen many reconsideration requests fail because of them. If Google cannot algorithmically identify all unnatural links and must operate entire teams of humans to review the sites (and their links), you shouldn’t trust a $99/month service to identify them.

If you have an in-depth understanding of Google’s link schemes, you can build your own process to prioritize which links are more likely to be unnatural, as I described in this post (see sections 7 & 8). In an ideal world, you should manually review every single link pointing to your site. Where this isn’t possible (e.g., when dealing with an enormous number of links, or when resources are unavailable), you should at least focus on the links with the most “unnatural” signals and review those manually.

5. Not looking beyond direct links

When trying to lift a link-related penalty, you need to look into all the links that may be pointing to your site directly or indirectly. Such checks include reviewing all links pointing to other sites that redirect to your site, legacy URLs with external inbound links that have been internally redirected, and third-party sites that include cross-domain canonicals to your site. For sites that used to buy and redirect domains in order to increase their rankings, the quickest solution is to get rid of the redirects. Both Majestic SEO and Ahrefs report redirects, but some manual digging usually reveals a lot more.


6. Not looking beyond the first link

All major link intelligence tools, including Majestic SEO, Ahrefs and Open Site Explorer, report only the first link pointing to a given site that they discover on a page. If a page on the web links to your site just once, this is no big deal. But if there are multiple links, the tools will miss all but the first one, so any automated attempt to identify links with commercial keywords will overlook the rest.

For example, if a page has five different links pointing to your site, and the first one uses branded anchor text, these tools will report only that first link. Most link-auditing tools will in turn evaluate the link as “natural” and completely miss the other four links, some of which may contain manipulative anchor text. The more links that get missed this way, the more likely your reconsideration request will fail.
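A quick way to see the gap is to collect every matching anchor yourself rather than stopping at the first. A stdlib sketch (the page snippet, domain, and anchor texts are invented for illustration):

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

class LinkCollector(HTMLParser):
    """Collect every <a href> on a page that points at a target domain."""
    def __init__(self, target_domain):
        super().__init__()
        self.target = target_domain
        self.links = []        # (href, anchor_text) pairs
        self._href = None
        self._text = []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href", "")
            if urlparse(href).netloc.endswith(self.target):
                self._href = href
                self._text = []
    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)
    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None

page = """<p><a href="http://example.com/">Example Brand</a>
<a href="http://example.com/page">cheap blue widgets</a></p>"""
collector = LinkCollector("example.com")
collector.feed(page)
first_link_only = collector.links[:1]  # what a first-link-only tool sees
all_links = collector.links            # what a manual review sees
```

Here a first-link-only tool would see one branded link and stop, while the full scan also surfaces the second, keyword-stuffed anchor.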

7. Going too thin

Many SEOs and webmasters (still) feel uncomfortable with the idea of losing links. They cannot accept that links which once helped their rankings are now devalued and must be removed. There is no point trying to save “authoritative” yet unnatural links out of fear of losing rankings. If the main objective is to lift the penalty, then all unnatural links need to be removed.

Often, in the first reconsideration request, SEOs and site owners go too thin, and only in subsequent attempts do they start cutting deeper. If you are already aware of unnatural links pointing to your site, get rid of them from the very beginning. I have seen examples of unnatural links provided by Google on PR 9/DA 98 sites. Metrics do not matter when it comes to lifting a penalty. If a link is manipulative, it has to go.

In any case, Google’s decision won’t be based only on the number of links that have been removed. Most important in the search giant’s eyes is the quality of the links still pointing to your site. If the remaining links are largely of low quality, the reconsideration request will almost certainly fail.

8. Insufficient effort to remove links

Google wants to see a “good faith” effort to get as many links removed as possible. The higher the percentage of unnatural links removed, the better. Some agencies and SEO consultants tend to rely too much on the disavow tool. However, it isn’t a panacea, and should be used as a last resort, for links that prove impossible to remove after exhausting all possibilities of physically removing them via the time-consuming (yet necessary) outreach route.

Google is very clear on this:

[Screenshot of Google's guidance on removing inorganic links]

Even if you’re unable to remove all of the links that need to be removed, demonstrating that you’ve made several attempts to have them removed can have a favorable impact on the outcome of the reconsideration request. Yes, in some cases it might be possible to have a penalty lifted simply by disavowing instead of removing the links, but these cases are rare and the strategy may backfire in the future. When I reached out to ex-Googler Fili Wiese for advice on the value of removing toxic links (instead of just disavowing them), his response was very straightforward:

[Screenshot of Fili Wiese's response]

9. Ineffective outreach

Simply identifying the unnatural links won’t get the penalty lifted unless a decent percentage of the links have been successfully removed. The more communication channels you try, the more likely it is that you reach the webmaster and get the links removed. Sending the same email hundreds or thousands of times is highly unlikely to result in a decent response rate. Trying to remove a link from a directory is very different from trying to get rid of a link appearing in a press release, so you should take a more targeted approach with a well-crafted, personalized email. Link removal request emails must be honest and to the point, or else they’ll be ignored.

Tracking the emails will also help you figure out which messages have been read, which webmasters might be worth contacting again, and when you need to try an alternative means of contact.

Creativity, too, can play a big part in the link removal process. For example, it might be necessary to use social media to reach the right contact. Again, don’t trust automated emails or contact form harvesters. In some cases, these applications will pull in any email address they find on the crawled page (without any guarantee of who the information belongs to). In others, they will completely miss masked email addresses or those appearing in images. If you really want to see that the links are removed, outreach should be carried out by experienced outreach specialists. Unfortunately, there aren’t any shortcuts to effective outreach.

10. Quality issues and human errors

All sorts of human errors can occur when filing a reconsideration request. The most common errors include submitting files that do not exist, files that do not open, files that contain incomplete data, and files that take too long to load. You need to triple-check that the files you are including in your reconsideration request are read-only, and that anyone with the URL can fully access them. 

Poor grammar and language is also bad practice, as it may be interpreted as “poor effort.” You should definitely get the reconsideration request proofread by a couple of people to be sure it is flawless. A poorly written reconsideration request can significantly hinder your overall efforts.

Quality issues can also occur with the disavow file submission. Disavowing at the URL level isn’t recommended because the link(s) you want to get rid of are often accessible to search engines via several URLs you may be unaware of. Therefore, it is strongly recommended that you disavow at the domain or sub-domain level.
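For reference, a disavow file is plain text with one entry per line: a `domain:` prefix covers every URL on that host, a bare URL disavows only that page, and lines starting with `#` are comments. A sketch with hypothetical domains:

```text
# Outreach attempted three times between March and May; no response.
domain:spammy-directory.example
domain:paid-links.example
# URL-level disavow (not recommended, for the reasons above):
http://blog.example/old-post-with-paid-link.html
```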

11. Insufficient evidence

How does Google know you have done everything you claim in your reconsideration request? Because you have to prove each claim is valid, you need to document every single action you take, from sent emails and submitted forms, to social media nudges and phone calls. The more information you share with Google in your reconsideration request, the better. This is the exact wording from Google:

“ …we will also need to see good-faith efforts to remove a large portion of inorganic links from the web wherever possible.”

12. Bad communication

How you communicate your link cleanup efforts is as essential as the work you are expected to carry out. Not only do you need to explain the steps you’ve taken to address the issues, but you also need to share supportive information and detailed evidence. The reconsideration request is the only chance you have to communicate to Google which issues you have identified, and what you’ve done to address them. Being honest and transparent is vital for the success of the reconsideration request.

There is absolutely no point using the space in a reconsideration request to argue with Google. Some of the unnatural link examples they share may not always be useful (e.g., URLs that include nofollow links, removed links, or even no links at all). But taking an argumentative approach virtually guarantees your request will be denied.

Cropped from photo by Keith Allison, licensed under Creative Commons.

Conclusion

Getting a Google penalty lifted requires a good understanding of why you have been penalized, a flawless process and a great deal of hands-on work. Performing link audits for the purpose of lifting a penalty can be very challenging, and should only be carried out by experienced consultants. If you are not 100% sure you can take all the required actions, seek out expert help rather than looking for inexpensive (and ineffective) automated solutions. Otherwise, you will almost certainly end up wasting weeks or months of your precious time, and in the end, see your request denied.

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!

Reblogged 4 years ago from moz.com

10 Predictions for the Marketing World in 2015

Posted by randfish

The beginning of the year marks the traditional week for bloggers to prognosticate about the 12 months ahead, and over the last decade I’ve created a tradition of joining in this festive custom to predict the big trends in SEO and web marketing. However, I divine the future by a strict code: I’m only allowed to make predictions IF my predictions from last year were at least moderately accurate (otherwise, why should you listen to me?). So, before I begin this year’s crystal-ball-gazing, let’s have a look at how I did for 2014.

Yes, we’ll get to that, but not until you prove you’re a real Wizard, mustache-man.

You can find my post from January 5th of last year here, but I won’t force you to read through it. Here’s how I do grading:

  • Spot On (+2) – when a prediction hits the nail on the head and the primary criteria are fulfilled
  • Partially Accurate (+1) – predictions that are in the area, but are somewhat different than reality
  • Not Completely Wrong (-1) – those that landed near the truth, but couldn’t be called “correct” in any real sense
  • Off the Mark (-2) – guesses which didn’t come close

If the score is positive, prepare for more predictions, and if it’s negative, I’m clearly losing the pulse of the industry. Let’s tally up the numbers.

In 2014, I made 6 predictions:

#1: Twitter will go Facebook’s route and create insights-style pages for at least some non-advertising accounts

Grade: +2

Twitter rolled out Twitter analytics for all users this year (starting in July for some accounts, and then in August for everyone), and while it’s not nearly as full-featured as Facebook’s “Insights” pages, it’s definitely in line with the spirit of this prediction.

#2: We will see Google test search results with no external, organic listings

Grade: -2

I’m very happy to be wrong about this one. To my knowledge, Google has yet to go this direction and completely eliminate external-pointing links on search results pages. Let’s hope they never do.

That said, there are plenty of SERPs where Google is taking more and more of the traffic away from everyone but themselves.

I think many SERPs that have basic, obvious functions like “timer” are going to be less and less valuable as traffic sources over time.

#3: Google will publicly acknowledge algorithmic updates targeting both guest posting and embeddable infographics/badges as manipulative linking practices

Grade: -1

Google most certainly did release an update (possibly several) targeted at guest posts, but they didn’t publicly talk about anything specifically algorithmic targeting embedded content/badges. It’s very possible this was included in the rolling Penguin updates, but the prediction said “publicly acknowledge,” so I’m giving myself a -1.

#4: One of these 5 marketing automation companies will be purchased in the 9-10 figure $ range: Hubspot, Marketo, Act-On, Silverpop, or Sailthru

Grade: +2

Silverpop was purchased by IBM in April of 2014. While a price wasn’t revealed, the “sources” quoted by the media estimated the deal in the ~$270mm range. I’m actually surprised there wasn’t another sale, but this one was spot-on, so it gets the full +2.

#5: Resumes listing “content marketing” will grow faster than either SEO or “social media marketing”

Grade: +1

As a percentage, this certainly appears to be the case. Here are some stats:

  • US profiles with “content marketing”
    • June 2013: 30,145
    • January 2015: 68,580
    • Growth: 127.5%
  • US profiles with “SEO”
    • June 2013: 364,119
    • January 2015: 596,050
    • Growth: 63.7%
  • US profiles with “social media marketing”
    • June 2013: 938,951
    • January 2015: 1,990,677
    • Growth: 112%
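For the record, here is the growth arithmetic from the profile counts above, computed as (after - before) / before:

```python
def pct_growth(before, after):
    """Percentage increase from `before` to `after`."""
    return round((after - before) / before * 100, 1)

# LinkedIn US profile counts from the bullets above
profiles = {
    "content marketing":      (30_145, 68_580),
    "SEO":                    (364_119, 596_050),
    "social media marketing": (938_951, 1_990_677),
}
growth = {skill: pct_growth(b, a) for skill, (b, a) in profiles.items()}
# "content marketing" shows the largest percentage increase of the three
```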

Granted, content marketing appears on far fewer profiles than SEO or social media marketing, but it has seen greater growth. I’m only giving myself a +1 rather than a +2 on this because, while the prediction was mathematically correct, the numbers of SEO and social still dwarf content marketing as a term. In fact, in LinkedIn’s annual year-end report of which skills got people hired the most, SEO was #5! Clearly, the term and the skillset continue to endure and be in high demand.

#6: There will be more traffic sent by Pinterest than Twitter in Q4 2014 (in the US)

Grade: +1

This is probably accurate, since Pinterest appears to have grown faster in 2014 than Twitter by a good amount AND this was already true in most of 2014 according to SharedCount (though I’m not totally sold on the methodology of coverage for their numbers). However, we won’t know the truth for a few months to come, so I’d be presumptuous in giving a full +2. I am a bit surprised that Pinterest continues to grow at such a rapid pace — certainly a very impressive feat for an established social network.


SOURCE: Global Web Index

With Twitter’s expected moves into embedded video, it’s my guess that we’ll continue to see a lot more Twitter engagement and activity on Twitter itself, and referring traffic outward won’t be as considerable a focus. Pinterest seems to be one of the only social networks that continues that push (as Facebook, Instagram, LinkedIn, and YouTube all seem to be pursuing a “keep them here” strategy).

——————————–

Final Score: +3
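The tally above, spelled out (the short labels are my own paraphrases of the six predictions):

```python
# Grading scale: spot-on +2, partially accurate +1,
# not completely wrong -1, off the mark -2
grades = {
    "Twitter insights-style analytics":   +2,
    "SERPs with no organic listings":     -2,
    "guest post / badge acknowledgement": -1,
    "marketing automation acquisition":   +2,
    "content marketing on resumes":       +1,
    "Pinterest traffic > Twitter":        +1,
}
final_score = sum(grades.values())  # positive, so the 2015 predictions proceed
```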

That positive number means I’ve passed my bar and can make another set of predictions for 2015. I’m going to be a little more aggressive this year, even though it risks ruining my sterling record, simply because I think it’s more exciting 🙂

Thus, here are my 10 predictions for what the marketing world will bring us in 2015:

#1: We’ll see the first major not-for-profit University in the US offer a degree in Internet Marketing, including classes on SEO.

There are already some private, for-profit offerings from places like Fullsail and Univ. of Phoenix, but I don’t know that these pedigrees carry much weight. Seeing a Stanford, a Wharton, or a University of Washington offer undergraduate or MBA programs in our field would be a boon to those seeking options and an equal boon to the universities.

The biggest reason I think we’re ripe for this in 2015 is the LinkedIn top 25 job skills data showing the immense value of SEO (#5) and digital/online marketing (#16) in a profile when seeking a new job. That should (hopefully) be a direct barometer for what colleges seek to include in their repertoire.

#2: Google will continue the trend of providing instant answers in search results with more interactive tools.

Google has been doing instant answers for a long time, but in addition to queries with immediate and direct responses, they’ve also undercut a number of online tool vendors by building their own versions directly into the SERPs, like they do currently for queries like “timer” and “calculator.”

I predict in 2015, we’ll see more partnerships like what’s provided with OpenTable and the ability to book reservations directly from the SERPs, possibly with companies like Uber, Flixster (they really need to get back to a better instant answer for movies+city), Zillow, or others that have unique data that could be surfaced directly.

#3: 2015 will be the year Facebook begins including some form of web content (not on Facebook’s site) in their search functionality.

Facebook severed their search relationship with Bing in 2014, and I’m going to make a very risky prediction that in 2015, we’ll see Facebook’s new search emerge and use some form of non-Facebook web data. Whether they’ll actually build their own crawler or merely license certain data from outside their properties is another matter, but I think Facebook’s shown an interest in getting more sophisticated with their ad offerings, and any form of search data/history about their users would provide a powerful addition to what they can do today.

#4: Google’s indexation of Twitter will grow dramatically, and a significantly higher percentage of tweets, hashtags, and profiles will be indexed by the year’s end.

Twitter has been putting more muscle behind their indexation and SEO efforts, and I’ve seen more and more Twitter URLs creeping into the search results over the last 6 months. I think that trend continues, and in 2015, we see Twitter.com enter the top 5-6 “big domains” in Mozcast.

#5: The EU will take additional regulatory action against Google that will create new, substantive changes to the search results for European searchers.

In 2014, we saw the EU enforce the “right to be forgotten” and settle some antitrust issues that require Google to edit what it displays in the SERPs. I don’t think the EU is done with Google. As the press has noted, there are plenty of calls in the European Parliament to break up the company, and while I think the EU will stop short of that measure, I believe we’ll see additional regulatory action that affects search results.

On a personal opinion note, I would add that while I’m not thrilled with how the EU has gone about their regulation of Google, I am impressed by their ability to do so. In the US, with Google becoming the second largest lobbying spender in the country and a masterful influencer of politicians, I think it’s extremely unlikely that they suffer any antitrust or regulatory action in their home country — not because they haven’t engaged in monopolistic behavior, but because they were smart enough to spend money to manipulate elected officials before that happened (unlike Microsoft, who, in the 1990’s, assumed they wouldn’t become a target).

Thus, if there is to be any hedge to Google’s power in search, it will probably come from the EU and the EU alone. There’s no competitor with the teeth or market share to have an impact (at least outside of China, Russia, and South Korea), and no other government is likely to take them on.

#6: Mobile search, mobile devices, SSL/HTTPS referrals, and apps will combine to make traffic source data increasingly hard to come by.

I’ll estimate that by year’s end, many major publishers will see 40%+ of their traffic coming from “direct” even though most of that is search and social referrers that fail to pass the proper referral string. Hopefully, we’ll be able to verify that through folks like Define Media Group, whose data sharing this year has made them one of the best allies marketers have in understanding the landscape of web traffic patterns.

BTW – I’d already estimate that 30-50% of all “direct” traffic is, in fact, search or social traffic that hasn’t been properly attributed. This is a huge challenge for web marketers — maybe one of the greatest challenges we face, because saying “I brought in a lot more traffic, I just can’t prove it or measure it,” isn’t going to get you nearly the buy-in, raises, or respect that your paid-traffic compatriots can earn by having every last visit they drive perfectly attributed.

#7: The content advertising/recommendation platforms will continue to consolidate, and either Taboola or Outbrain will be acquired or do some heavy acquiring themselves.

We just witnessed the surprising shutdown of nRelate, which I suspect had something to do with IAC politics more than just performance and potential for the company. But given that less than 2% of the web’s largest sites use content recommendation/promotion services and yet both Outbrain and Taboola are expected to have pulled in north of $200m in 2014, this is a massive area for future growth.

Yahoo!, Facebook, and Google are all potential acquirers here, and I could even see AOL (who already own Gravity) or Buzzfeed making a play. Likewise, there’s a slew of smaller/other players that Taboola or Outbrain themselves could acquire: Zemanta, Adblade, Zegnet, Nativo, Disqus, Gravity, etc. It’s a marketplace as ripe for acquisition as it is for growth.

#8: Promoted pins will make Pinterest an emerging juggernaut in the social media and social advertising world, particularly for e-commerce.

I’d estimate we’ll see figures north of $50m spent on promoted pins in 2015. This is coming after Pinterest only just opened their ad platform beyond a beta group this January. But, thanks to high engagement, lots of traffic, and a consumer base that B2C marketers absolutely love and often struggle to reach, I think Pinterest is going to have a big ad opportunity on their hands.

Note the promoted pin from Mad Hippie on the right (apologies for the very unappetizing recipes featured around it).

#9: Foursquare (and/or Swarm) will be bought, merge with someone, or shut down in 2015 (probably one of the first two).

I used to love Foursquare. I used the service multiple times every day, tracked where I went with it, ran into friends in foreign cities thanks to its notifications, and even used it to see where to go sometimes (in Brazil, for example, I found Foursquare’s business location data far superior to Google Maps’). Then came the split from Swarm. Most of my friends who were using Foursquare stopped, and the few who continued did so less frequently. Swarm itself tried to compete with Yelp, but it looks like neither is doing well in the app rankings these days.

I feel a lot of empathy for Dennis and the Foursquare team. I can totally understand the appeal, from a development and product perspective, of splitting up the two apps to let each concentrate on what it’s best at, and not dilute a single product with multiple primary use cases. Heck, we’re trying to learn that lesson at Moz and refocus our products back on SEO, so I’m hardly one to criticize. That said, I think there’s trouble brewing for the company and probably some pressure to sell while their location and check-in data, which is still hugely valuable, is robust enough and unique enough to command a high price.

#10: Amazon will not take considerable search share from Google, nor will mobile search harm Google’s ad revenue substantively.

The “Google’s-in-trouble” pundits are mostly talking about two trends that could hurt Google’s revenue in the year ahead. First, mobile searchers being less valuable to Google because they don’t click on ads as often and advertisers won’t pay as much for them. And, second, Amazon becoming the destination for direct, commercial queries ahead of Google.

In 2015, I don’t see either of these taking a toll on Google. I believe most of Amazon’s impact as a direct navigation destination for e-commerce shoppers has already taken place and while Google would love to get those searchers back, that’s already a lost battle (to the extent it was lost). I also don’t think mobile is a big concern for Google — in fact, I think they’re pivoting it into an opportunity, and taking advantage of their ability to connect mobile to desktop through Google+/Android/Chrome. Desktop search may have flatter growth, and it may even decline 5-10% before reaching a state of equilibrium, but mobile is growing at such a huge clip that Google has plenty of time and even plentier eyeballs and clicks to figure out how to drive more revenue per searcher.



Google Webmaster Tools Just Got a Lot More Important for Link Discovery and Cleanup

Posted by RobertFisher

What if you owned a paid directory site and every day received email upon email demanding that links be removed? As they stacked up in your inbox, whether they were pleasant or sternly demanding you cease and desist, would you just want to give up? What would you do to stop the barrage of emails if you thought the requests were just too overwhelming? How could you make it all go away, or at least the majority of it?

First, a bit of background

We had a new, important client come aboard on April 1, 2013 with a lot of work needed going forward. They had been losing rankings for some time and wanted help. With new clients, we want as much baseline data as possible so that we can measure progress going forward, so we do a lot of monitoring. On April 17th, one of our team members noticed something quite interesting. Using Ahrefs for link tracking, we saw there was a big spike in the number of external links coming to our new client’s site. 

When the client came on board two weeks prior, the site had about 5,500 links coming in, and many of those were of poor quality. Likely half or more were comment links from sites with no relevance to the client, using the domain as the anchor text. Now, overnight, they were at 6,100 links, and the next day even more. Each day the links kept increasing. We saw they were coming from a paid directory called Netwerker.com. Within a month to six weeks, they had over 30,000 new links from that site.

We sent a couple of emails asking that they please stop the linking, and we watched Google Webmaster Tools (GWT) every day like hawks waiting for the first link from Netwerker to show. The emails got no response, but in late May we saw the first links from there show up in GWT and we submitted a domain disavow immediately.

We launched their new site in late June and watched as they climbed in the rankings; that is a great feeling. Because the site was rising in the rankings rather well, we assumed the disavow tool had worked on Netwerker. Unfortunately, there was a cloud on the horizon concerning all of the link building that had been done for the client prior to our engagement. October arrived with a Penguin attack (Penguin 2.1, Oct. 4, 2013) and they fell considerably in the SERPs. In fact, they disappeared for many of the best terms they had again begun to rank for, falling to page five or deeper for key terms. (NOTE: This was all algorithmic; they had no manual penalty.)

While telling the client that their new drop was a Penguin issue related to the October Penguin update (and the large ratio of really bad links), we also looked for anything else that would cause the issue or might be affecting the results. We are constantly monitoring and changing things with our clients. As a result, there are times we do not make a good change and we have to move things back. (We always tell the client if we have caused a negative impact on their rankings, etc. This is one of the most important things we ever do in building trust over time and we have never lost a client because we made a mistake.) We went through everything thoroughly and eliminated any other potential causative factors. At every turn there was a Penguin staring back at us!

When we had launched the new site in late June 2013, we had seen them rise back to page one for key terms in a competitive vertical. Now, they were missing for the majority of their most important terms. In mid-March of 2014, nearly a year after engagement, they agreed to a severe link cleanup, and we began immediately. There would be roughly 45,000 – 50,000 links to clean up, but with 30,000 from the one domain already appropriately disavowed, it was a bit less daunting. I have to say here that I believe their reluctance to do the link cleanup was due to really bad SEO in the past. Over time, they had used several SEO people/firms, and at every turn they were given poor advice. I believe they were misled into believing that high rankings were easy to get and that there were “tricks” that would fool Google. So, it really isn’t a client’s fault when they believe things are easy in the world of SEO.

Finally, it begins to be fun

About two weeks in, we saw them start to pop up randomly in the rankings. We were regularly getting responses back from linking sites. Some responses were positive and some were requests for money to remove the links; the majority gave us the famous “no reply.” But, we were making progress and beginning to see a result. Around the first or second week of April their most precious term, geo location + product/service, was ranked number one and their rich snippets were beautiful. It came and went over the next week or two, staying longer each time.

To track links we use MajesticSEO, Ahrefs, Open Site Explorer, and Google Webmaster Tools. As the project progressed, our Director of Content and Media, who was overseeing the project, could not understand why so many links were falling off so quickly. Frankly, we were not getting that many sites agreeing to remove them.

Here is a screenshot of the lost links from Ahrefs.

Here are the lost links in MajesticSEO.

MajesticSEO Lost Links March to May

We were seeing links fall off as if the wording we had used in our emails to the sites was magical. This caused a bit of skepticism on our team’s part so they began to dig deeper. It took little time to realize the majority of the links that were falling off were from Netwerker! (Remember, a disavow does not keep the links from showing in the link research tools.) Were they suddenly good guys and willing to clear it all up? Had our changed wording caused a change of heart? No, the links from Netwerker still showed in GWT; Webmaster Tools had never shown all from Netwerker, only about 13,000, and it was still showing 13,000. But, was that just because Google was slower at showing the change? To check we did a couple of things. First, we just tried out the links that were “lost” and we saw they still resolved to the site, so we dug some more.

Using a bit of magic in the form of a User-Agent Switcher extension and eSolutions’ “What’s my info?” (to verify the correct user-agent was being presented), our head of development ran the user-agent strings for Ahrefs and MajesticSEO. What he found was that Netwerker was now blocking MajesticSEO and Ahrefs via a 406 response. We were unable to check Removeem, but the site was not yet blocking OSE. Here are some screenshots to show the results we are seeing. Notice in the first screenshot, all is well with Googlebot.


But A Different Story for Ahrefs


And a Different Story for MajesticSEO

We alerted both Ahrefs and MajesticSEO, and neither responded beyond a canned "we will look into it." We thought it important to let those dealing with link removal know to look even more carefully. Now, in August, three months in, both still return the original responses.

User-agents and how to run these tests

The user-agent, or user-agent string, is sent to the server along with every request. This allows the server to determine the best response to deliver, based on conditions set up by its developers. In the case of Netwerker's servers, that response appears to be to deny access to certain user-agents.
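Netwerker's actual server configuration is unknown; purely as an illustration, here is a minimal Python sketch of how a server can implement this kind of denial, answering 406 to requests whose user-agent string matches a blocklist. The blocked strings and handler below are assumptions for the demo, not Netwerker's code.

```python
# Minimal sketch of user-agent-based denial: a server that returns
# HTTP 406 to blocklisted crawlers and 200 to everyone else.
# The blocklist entries are illustrative assumptions.
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
import threading

BLOCKED_AGENTS = ("AhrefsBot", "MJ12bot")  # MJ12bot crawls for MajesticSEO

class UADenyHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        ua = self.headers.get("User-Agent", "")
        if any(bot in ua for bot in BLOCKED_AGENTS):
            self.send_response(406)  # Not Acceptable: the "block" response
            self.end_headers()
        else:
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"Welcome, regular visitors and Googlebot.")

    def log_message(self, *args):
        pass  # keep the demo quiet

def start_demo_server():
    """Start the demo server on a free local port in a background
    thread and return the port number."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), UADenyHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server.server_address[1]
```

A browser presenting a normal user-agent would see the page as usual, while a request claiming to be AhrefsBot would get the 406, which is exactly the asymmetry the screenshots above show.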

  1. We used the User-Agent Switcher extension for Chrome.
  2. Next, determine the user-agent string you would like to check. (These can be found on various sites; one set of examples is at http://www.useragentstring.com/. In most cases, the owner of the crawler or browser will have a webpage documenting its string, for example the Ahrefs bot.)
  3. Within the User-Agent Switcher extension, open the options panel and add the new user-agent string.
  4. Browse to the site you would like to check.
  5. Using the User-Agent Switcher, select the agent you would like to view the site as; the page will reload, and you will be viewing it as the new user-agent.
  6. We used eSolutions' "What's my info?" page to verify that the User-Agent Switcher was presenting the correct data.
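The manual steps above can also be scripted: request the same URL under several user-agent strings and compare the status codes. The sketch below does this with Python's standard library; the user-agent strings are illustrative (check each crawler's own documentation page for the current string), and the URL you pass in is whatever site you want to test.

```python
# Scripted version of the user-agent test: fetch one URL while
# presenting different user-agent strings and record each status code.
# The strings below are illustrative examples, not authoritative.
import urllib.error
import urllib.request

USER_AGENTS = {
    "Googlebot": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "AhrefsBot": "Mozilla/5.0 (compatible; AhrefsBot/7.0; +http://ahrefs.com/robot/)",
    "MJ12bot (MajesticSEO)": "Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/)",
}

def check_user_agents(url, agents=USER_AGENTS):
    """Return a dict mapping agent name -> HTTP status code for `url`."""
    results = {}
    for name, ua in agents.items():
        req = urllib.request.Request(url, headers={"User-Agent": ua})
        try:
            with urllib.request.urlopen(req, timeout=10) as resp:
                results[name] = resp.status
        except urllib.error.HTTPError as err:
            results[name] = err.code  # e.g. 406 when that agent is blocked
    return results

# Example (hypothetical target):
#   check_user_agents("http://www.example.com/")
```

If a site is blocking a crawler the way Netwerker was, the result will show 200 for Googlebot but 406 (or some other error code) for the blocked bots.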

A final summary

If you talk with anyone known for link removal (think people like Ryan Kent of Vitopian, an expert in link cleanup), they will tell you to use every link report you can get your hands on to ensure you miss nothing, and they always include Google Webmaster Tools as an important tool. Personally, while we always use GWT, early on I did not think it was important for anything beyond checking whether we had missed something, because it consistently shows fewer links than the other tools, and the links it does show usually appear in the other tools as well. My opinion has changed with this revelation.

Because we gather data on clients early on, we had something to refer back to during the link cleanup. Today, if someone comes to us and we have no history of their links, we must assume they have links from sites that block the major link discovery tools, and we proceed with a heightened sense of caution. We will never again believe we have cleaned everything; we can only believe we cleaned everything showing in GWT.

If various directories and other sites with a lot of outbound links start blocking link discovery tools because they "just don't want to hear any more removal requests," then GWT just became your most important tool for catching the sites that block those tools. They would not want to block Google or Bing, for obvious reasons.

So, as you go forward and look at links for your own site and/or for clients, I suggest you check GWT to make sure nothing is showing there that fails to show in the well-known link discovery tools.
