Why Effective, Modern SEO Requires Technical, Creative, and Strategic Thinking – Whiteboard Friday

Posted by randfish

There’s no doubt that quite a bit has changed about SEO, and that the field is far more integrated with other aspects of online marketing than it once was. In today’s Whiteboard Friday, Rand pushes back against the idea that effective modern SEO doesn’t require any technical expertise, outlining a fantastic list of technical elements that today’s SEOs need to know about in order to be truly effective.

For reference, here’s a still of this week’s whiteboard. Click on it to open a high resolution image in a new tab!

Video transcription

Howdy, Moz fans, and welcome to another edition of Whiteboard Friday. This week I’m going to do something unusual. I don’t usually point out these inconsistencies or sort of take issue with other folks’ content on the web, because I generally find that that’s not all that valuable and useful. But I’m going to make an exception here.

There is an article by Jayson DeMers, who I think might actually be here in Seattle — maybe he and I can hang out at some point — called “Why Modern SEO Requires Almost No Technical Expertise.” It was an article that got a shocking amount of traction and attention. On Facebook, it has thousands of shares. On LinkedIn, it did really well. On Twitter, it got a bunch of attention.

Some folks in the SEO world have already pointed out some issues around this. But because of the increasing popularity of this article, and because I think there’s, like, this hopefulness from worlds outside of kind of the hardcore SEO world that are looking to this piece and going, “Look, this is great. We don’t have to be technical. We don’t have to worry about technical things in order to do SEO.”

Look, I completely get the appeal of that. I did want to point out some of the reasons why this is not so accurate. At the same time, I don’t want to rain on Jayson, because I think that it’s very possible he’s writing an article for Entrepreneur, maybe he has sort of a commitment to them. Maybe he had no idea that this article was going to spark so much attention and investment. He does make some good points. I think it’s just really the title and then some of the messages inside there that I take strong issue with, and so I wanted to bring those up.

First off, some of the good points he did bring up.

One, he wisely says, “You don’t need to know how to code or to write and read algorithms in order to do SEO.” I totally agree with that. If today you’re looking at SEO and you’re thinking, “Well, am I going to get more into this subject? Am I going to try investing in SEO? But I don’t even know HTML and CSS yet.”

Those are good skills to have, and they will help you in SEO, but you don’t need them. Jayson’s totally right. You don’t have to have them, and you can learn and pick up some of these things, and do searches, watch some Whiteboard Fridays, check out some guides, and pick up a lot of that stuff later on as you need it in your career. SEO doesn’t have that hard requirement.

And secondly, he makes an intelligent point that we’ve made many times here at Moz, which is that, broadly speaking, a better user experience is well correlated with better rankings.

You make a great website that delivers great user experience, that provides the answers to searchers’ questions and gives them extraordinarily good content, way better than what’s out there already in the search results, generally speaking you’re going to see happy searchers, and that’s going to lead to higher rankings.

But not entirely. There are a lot of other elements that go in here. So I’ll bring up some frustrating points around the piece as well.

First off, there’s no acknowledgment — and I find this a little disturbing — that the ability to read and write code, or even HTML and CSS, which I think are the basic place to start, is helpful or can take your SEO efforts to the next level. I think both of those things are true.

So being able to look at a web page, view source on it, or pull up Firebug in Firefox or something and diagnose what’s going on and then go, “Oh, that’s why Google is not able to see this content. That’s why we’re not ranking for this keyword or term, or why even when I enter this exact sentence in quotes into Google, which is on our page, this is why it’s not bringing it up. It’s because it’s loading it after the page from a remote file that Google can’t access.” These are technical things, and being able to see how that code is built, how it’s structured, and what’s going on there, very, very helpful.

Some coding knowledge also can take your SEO efforts even further. I mean, so many times, SEOs are stymied by the conversations that we have with our programmers and our developers and the technical staff on our teams. When we can have those conversations intelligently, because at least we understand the principles of how an if-then statement works, or what software engineering best practices are being used, or they can upload something into a GitHub repository, and we can take a look at it there, that kind of stuff is really helpful.

Secondly, I don’t like that the article overly reduces all of this information that we have about what we’ve learned about Google. So he mentions two sources. One is things that Google tells us, and others are SEO experiments. I think both of those are true. Although I’d add that there’s sort of a sixth sense of knowledge that we gain over time from looking at many, many search results and kind of having this feel for why things rank, and what might be wrong with a site, and getting really good at that using tools and data as well. There are people who can look at Open Site Explorer and then go, “Aha, I bet this is going to happen.” They can look, and 90% of the time they’re right.

So he boils this down to, one, write quality content, and two, reduce your bounce rate. Neither of those things are wrong. You should write quality content, although I’d argue there are lots of other forms of quality content that aren’t necessarily written — video, images and graphics, podcasts, lots of other stuff.

And secondly, that just doing those two things is not always enough. So you can see, like many, many folks look and go, “I have quality content. It has a low bounce rate. How come I don’t rank better?” Well, your competitors, they’re also going to have quality content with a low bounce rate. That’s not a very high bar.

Also, frustratingly, this really gets in my craw. I don’t think “write quality content” means anything. You tell me. When you hear that, to me that is a totally non-actionable, non-useful phrase that’s a piece of advice that is so generic as to be discardable. So I really wish that there was more substance behind that.

The article also makes, in my opinion, the totally inaccurate claim that modern SEO really is reduced to “the happier your users are when they visit your site, the higher you’re going to rank.”

Wow. Okay. Again, I think broadly these things are correlated. User happiness and rank is broadly correlated, but it’s not a one to one. This is not like a, “Oh, well, that’s a 1.0 correlation.”

I would guess that the correlation is probably closer to like the page authority range. I bet it’s like 0.35 or something correlation. If you were to actually measure this broadly across the web and say like, “Hey, were you happier with result one, two, three, four, or five,” the ordering would not be perfect at all. It probably wouldn’t even be close.

There’s a ton of reasons why sometimes someone who ranks on Page 2 or Page 3 or doesn’t rank at all for a query is doing a better piece of content than the person who does rank well or ranks on Page 1, Position 1.

Then the article suggests five and sort of a half steps to successful modern SEO, which I think is a really incomplete list. So Jayson gives us;

  • Good on-site experience
  • Writing good content
  • Getting others to acknowledge you as an authority
  • Rising in social popularity
  • Earning local relevance
  • Dealing with modern CMS systems (which he notes most modern CMS systems are SEO-friendly)

The thing is there’s nothing actually wrong with any of these. They’re all, generally speaking, correct, either directly or indirectly related to SEO. The one about local relevance, I have some issue with, because he doesn’t note that there’s a separate algorithm for sort of how local SEO is done and how Google ranks local sites in maps and in their local search results. Also not noted is that rising in social popularity won’t necessarily directly help your SEO, although it can have indirect and positive benefits.

I feel like this list is super incomplete. Okay, I brainstormed just off the top of my head in the 10 minutes before we filmed this video a list. The list was so long that, as you can see, I filled up the whole whiteboard and then didn’t have any more room. I’m not going to bother to erase and go try and be absolutely complete.

But there’s a huge, huge number of things that are important, critically important for technical SEO. If you don’t know how to do these things, you are sunk in many cases. You can’t be an effective SEO analyst, or consultant, or in-house team member, because you simply can’t diagnose the potential problems, rectify those potential problems, identify strategies that your competitors are using, be able to diagnose a traffic gain or loss. You have to have these skills in order to do that.

I’ll run through these quickly, but really the idea is just that this list is so huge and so long that I think it’s very, very, very wrong to say technical SEO is behind us. I almost feel like the opposite is true.

We have to be able to understand things like;

  • Content rendering and indexability
  • Crawl structure, internal links, JavaScript, Ajax. If something’s post-loading after the page and Google’s not able to index it, or there are links that are accessible via JavaScript or Ajax, maybe Google can’t necessarily see those or isn’t crawling them as effectively, or is crawling them, but isn’t assigning them as much link weight as they might be assigning other stuff, and you’ve made it tough to link to them externally, and so they can’t crawl it.
  • Disabling crawling and/or indexing of thin or incomplete or non-search-targeted content. We have a bunch of search results pages. Should we use rel=prev/next? Should we robots.txt those out? Should we disallow from crawling with meta robots? Should we rel=canonical them to other pages? Should we exclude them via the protocols inside Google Webmaster Tools, which is now Google Search Console?
  • Managing redirects, domain migrations, content updates. A new piece of content comes out, replacing an old piece of content, what do we do with that old piece of content? What’s the best practice? It varies by different things. We have a whole Whiteboard Friday about the different things that you could do with that. What about a big redirect or a domain migration? You buy another company and you’re redirecting their site to your site. You have to understand things about subdomain structures versus subfolders, which, again, we’ve done another Whiteboard Friday about that.
  • Proper error codes, downtime procedures, and not found pages. If your 404 pages turn out to all be 200 pages, well, now you’ve made a big error there, and Google could be crawling tons of 404 pages that they think are real pages, because you’ve made it a status code 200, or you’ve used a 404 code when you should have used a 410, which is a permanently removed, to be able to get it completely out of the indexes, as opposed to having Google revisit it and keep it in the index.

Downtime procedures. So there’s specifically a… I can’t even remember. It’s a 5xx code that you can use. Maybe it was a 503 or something that you can use that’s like, “Revisit later. We’re having some downtime right now.” Google urges you to use that specific code rather than using a 404, which tells them, “This page is now an error.”

Disney had that problem a while ago, if you guys remember, where they 404ed all their pages during an hour of downtime, and then their homepage, when you searched for Disney World, was, like, “Not found.” Oh, jeez, Disney World, not so good.

  • International and multi-language targeting issues. I won’t go into that. But you have to know the protocols there. Duplicate content, syndication, scrapers. How do we handle all that? Somebody else wants to take our content, put it on their site, what should we do? Someone’s scraping our content. What can we do? We have duplicate content on our own site. What should we do?
  • Diagnosing traffic drops via analytics and metrics. Being able to look at a rankings report, being able to look at analytics connecting those up and trying to see: Why did we go up or down? Did we have less pages being indexed, more pages being indexed, more pages getting traffic less, more keywords less?
  • Understanding advanced search parameters. Today, just today, I was checking out the related parameter in Google, which is fascinating for most sites. Well, for Moz, weirdly, related:oursite.com shows nothing. But for virtually every other sit, well, most other sites on the web, it does show some really interesting data, and you can see how Google is connecting up, essentially, intentions and topics from different sites and pages, which can be fascinating, could expose opportunities for links, could expose understanding of how they view your site versus your competition or who they think your competition is.

Then there are tons of parameters, like in URL and in anchor, and da, da, da, da. In anchor doesn’t work anymore, never mind about that one.

I have to go faster, because we’re just going to run out of these. Like, come on. Interpreting and leveraging data in Google Search Console. If you don’t know how to use that, Google could be telling you, you have all sorts of errors, and you don’t know what they are.

  • Leveraging topic modeling and extraction. Using all these cool tools that are coming out for better keyword research and better on-page targeting. I talked about a couple of those at MozCon, like MonkeyLearn. There’s the new Moz Context API, which will be coming out soon, around that. There’s the Alchemy API, which a lot of folks really like and use.
  • Identifying and extracting opportunities based on site crawls. You run a Screaming Frog crawl on your site and you’re going, “Oh, here’s all these problems and issues.” If you don’t have these technical skills, you can’t diagnose that. You can’t figure out what’s wrong. You can’t figure out what needs fixing, what needs addressing.
  • Using rich snippet format to stand out in the SERPs. This is just getting a better click-through rate, which can seriously help your site and obviously your traffic.
  • Applying Google-supported protocols like rel=canonical, meta description, rel=prev/next, hreflang, robots.txt, meta robots, x robots, NOODP, XML sitemaps, rel=nofollow. The list goes on and on and on. If you’re not technical, you don’t know what those are, you think you just need to write good content and lower your bounce rate, it’s not going to work.
  • Using APIs from services like AdWords or MozScape, or hrefs from Majestic, or SEM refs from SearchScape or Alchemy API. Those APIs can have powerful things that they can do for your site. There are some powerful problems they could help you solve if you know how to use them. It’s actually not that hard to write something, even inside a Google Doc or Excel, to pull from an API and get some data in there. There’s a bunch of good tutorials out there. Richard Baxter has one, Annie Cushing has one, I think Distilled has some. So really cool stuff there.
  • Diagnosing page load speed issues, which goes right to what Jayson was talking about. You need that fast-loading page. Well, if you don’t have any technical skills, you can’t figure out why your page might not be loading quickly.
  • Diagnosing mobile friendliness issues
  • Advising app developers on the new protocols around App deep linking, so that you can get the content from your mobile apps into the web search results on mobile devices. Awesome. Super powerful. Potentially crazy powerful, as mobile search is becoming bigger than desktop.

Okay, I’m going to take a deep breath and relax. I don’t know Jayson’s intention, and in fact, if he were in this room, he’d be like, “No, I totally agree with all those things. I wrote the article in a rush. I had no idea it was going to be big. I was just trying to make the broader points around you don’t have to be a coder in order to do SEO.” That’s completely fine.

So I’m not going to try and rain criticism down on him. But I think if you’re reading that article, or you’re seeing it in your feed, or your clients are, or your boss is, or other folks are in your world, maybe you can point them to this Whiteboard Friday and let them know, no, that’s not quite right. There’s a ton of technical SEO that is required in 2015 and will be for years to come, I think, that SEOs have to have in order to be effective at their jobs.

All right, everyone. Look forward to some great comments, and we’ll see you again next time for another edition of Whiteboard Friday. Take care.

Video transcription by Speechpad.com

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!

[ccw-atrib-link]

Distance from Perfect

Posted by wrttnwrd

In spite of all the advice, the strategic discussions and the conference talks, we Internet marketers are still algorithmic thinkers. That’s obvious when you think of SEO.

Even when we talk about content, we’re algorithmic thinkers. Ask yourself: How many times has a client asked you, “How much content do we need?” How often do you still hear “How unique does this page need to be?”

That’s 100% algorithmic thinking: Produce a certain amount of content, move up a certain number of spaces.

But you and I know it’s complete bullshit.

I’m not suggesting you ignore the algorithm. You should definitely chase it. Understanding a little bit about what goes on in Google’s pointy little head helps. But it’s not enough.

A tale of SEO woe that makes you go “whoa”

I have this friend.

He ranked #10 for “flibbergibbet.” He wanted to rank #1.

He compared his site to the #1 site and realized the #1 site had five hundred blog posts.

“That site has five hundred blog posts,” he said, “I must have more.”

So he hired a few writers and cranked out five thousand blogs posts that melted Microsoft Word’s grammar check. He didn’t move up in the rankings. I’m shocked.

“That guy’s spamming,” he decided, “I’ll just report him to Google and hope for the best.”

What happened? Why didn’t adding five thousand blog posts work?

It’s pretty obvious: My, uh, friend added nothing but crap content to a site that was already outranked. Bulk is no longer a ranking tactic. Google’s very aware of that tactic. Lots of smart engineers have put time into updates like Panda to compensate.

He started like this:

And ended up like this:
more posts, no rankings

Alright, yeah, I was Mr. Flood The Site With Content, way back in 2003. Don’t judge me, whippersnappers.

Reality’s never that obvious. You’re scratching and clawing to move up two spots, you’ve got an overtasked IT team pushing back on changes, and you’ve got a boss who needs to know the implications of every recommendation.

Why fix duplication if rel=canonical can address it? Fixing duplication will take more time and cost more money. It’s easier to paste in one line of code. You and I know it’s better to fix the duplication. But it’s a hard sell.

Why deal with 302 versus 404 response codes and home page redirection? The basic user experience remains the same. Again, we just know that a server should return one home page without any redirects and that it should send a ‘not found’ 404 response if a page is missing. If it’s going to take 3 developer hours to reconfigure the server, though, how do we justify it? There’s no flashing sign reading “Your site has a problem!”

Why change this thing and not that thing?

At the same time, our boss/client sees that the site above theirs has five hundred blog posts and thousands of links from sites selling correspondence MBAs. So they want five thousand blog posts and cheap links as quickly as possible.

Cue crazy music.

SEO lacks clarity

SEO is, in some ways, for the insane. It’s an absurd collection of technical tweaks, content thinking, link building and other little tactics that may or may not work. A novice gets exposed to one piece of crappy information after another, with an occasional bit of useful stuff mixed in. They create sites that repel search engines and piss off users. They get more awful advice. The cycle repeats. Every time it does, best practices get more muddled.

SEO lacks clarity. We can’t easily weigh the value of one change or tactic over another. But we can look at our changes and tactics in context. When we examine the potential of several changes or tactics before we flip the switch, we get a closer balance between algorithm-thinking and actual strategy.

Distance from perfect brings clarity to tactics and strategy

At some point you have to turn that knowledge into practice. You have to take action based on recommendations, your knowledge of SEO, and business considerations.

That’s hard when we can’t even agree on subdomains vs. subfolders.

I know subfolders work better. Sorry, couldn’t resist. Let the flaming comments commence.

To get clarity, take a deep breath and ask yourself:

“All other things being equal, will this change, tactic, or strategy move my site closer to perfect than my competitors?”

Breaking it down:

“Change, tactic, or strategy”

A change takes an existing component or policy and makes it something else. Replatforming is a massive change. Adding a new page is a smaller one. Adding ALT attributes to your images is another example. Changing the way your shopping cart works is yet another.

A tactic is a specific, executable practice. In SEO, that might be fixing broken links, optimizing ALT attributes, optimizing title tags or producing a specific piece of content.

A strategy is a broader decision that’ll cause change or drive tactics. A long-term content policy is the easiest example. Shifting away from asynchronous content and moving to server-generated content is another example.

“Perfect”

No one knows exactly what Google considers “perfect,” and “perfect” can’t really exist, but you can bet a perfect web page/site would have all of the following:

  1. Completely visible content that’s perfectly relevant to the audience and query
  2. A flawless user experience
  3. Instant load time
  4. Zero duplicate content
  5. Every page easily indexed and classified
  6. No mistakes, broken links, redirects or anything else generally yucky
  7. Zero reported problems or suggestions in each search engines’ webmaster tools, sorry, “Search Consoles”
  8. Complete authority through immaculate, organically-generated links

These 8 categories (and any of the other bazillion that probably exist) give you a way to break down “perfect” and help you focus on what’s really going to move you forward. These different areas may involve different facets of your organization.

Your IT team can work on load time and creating an error-free front- and back-end. Link building requires the time and effort of content and outreach teams.

Tactics for relevant, visible content and current best practices in UX are going to be more involved, requiring research and real study of your audience.

What you need and what resources you have are going to impact which tactics are most realistic for you.

But there’s a basic rule: If a website would make Googlebot swoon and present zero obstacles to users, it’s close to perfect.

“All other things being equal”

Assume every competing website is optimized exactly as well as yours.

Now ask: Will this [tactic, change or strategy] move you closer to perfect?

That’s the “all other things being equal” rule. And it’s an incredibly powerful rubric for evaluating potential changes before you act. Pretend you’re in a tie with your competitors. Will this one thing be the tiebreaker? Will it put you ahead? Or will it cause you to fall behind?

“Closer to perfect than my competitors”

Perfect is great, but unattainable. What you really need is to be just a little perfect-er.

Chasing perfect can be dangerous. Perfect is the enemy of the good (I love that quote. Hated Voltaire. But I love that quote). If you wait for the opportunity/resources to reach perfection, you’ll never do anything. And the only way to reduce distance from perfect is to execute.

Instead of aiming for pure perfection, aim for more perfect than your competitors. Beat them feature-by-feature, tactic-by-tactic. Implement strategy that supports long-term superiority.

Don’t slack off. But set priorities and measure your effort. If fixing server response codes will take one hour and fixing duplication will take ten, fix the response codes first. Both move you closer to perfect. Fixing response codes may not move the needle as much, but it’s a lot easier to do. Then move on to fixing duplicates.

Do the 60% that gets you a 90% improvement. Then move on to the next thing and do it again. When you’re done, get to work on that last 40%. Repeat as necessary.

Take advantage of quick wins. That gives you more time to focus on your bigger solutions.

Sites that are “fine” are pretty far from perfect

Google has lots of tweaks, tools and workarounds to help us mitigate sub-optimal sites:

  • Rel=canonical lets us guide Google past duplicate content rather than fix it
  • HTML snapshots let us reveal content that’s delivered using asynchronous content and JavaScript frameworks
  • We can use rel=next and prev to guide search bots through outrageously long pagination tunnels
  • And we can use rel=nofollow to hide spammy links and banners

Easy, right? All of these solutions may reduce distance from perfect (the search engines don’t guarantee it). But they don’t reduce it as much as fixing the problems.
Just fine does not equal fixed

The next time you set up rel=canonical, ask yourself:

“All other things being equal, will using rel=canonical to make up for duplication move my site closer to perfect than my competitors?”

Answer: Not if they’re using rel=canonical, too. You’re both using imperfect solutions that force search engines to crawl every page of your site, duplicates included. If you want to pass them on your way to perfect, you need to fix the duplicate content.

When you use Angular.js to deliver regular content pages, ask yourself:

“All other things being equal, will using HTML snapshots instead of actual, visible content move my site closer to perfect than my competitors?”

Answer: No. Just no. Not in your wildest, code-addled dreams. If I’m Google, which site will I prefer? The one that renders for me the same way it renders for users? Or the one that has to deliver two separate versions of every page?

When you spill banner ads all over your site, ask yourself…

You get the idea. Nofollow is better than follow, but banner pollution is still pretty dang far from perfect.

Mitigating SEO issues with search engine-specific tools is “fine.” But it’s far, far from perfect. If search engines are forced to choose, they’ll favor the site that just works.

Not just SEO

By the way, distance from perfect absolutely applies to other channels.

I’m focusing on SEO, but think of other Internet marketing disciplines. I hear stuff like “How fast should my site be?” (Faster than it is right now.) Or “I’ve heard you shouldn’t have any content below the fold.” (Maybe in 2001.) Or “I need background video on my home page!” (Why? Do you have a reason?) Or, my favorite: “What’s a good bounce rate?” (Zero is pretty awesome.)

And Internet marketing venues are working to measure distance from perfect. Pay-per-click marketing has the quality score: A codified financial reward applied for seeking distance from perfect in as many elements as possible of your advertising program.

Social media venues are aggressively building their own forms of graphing, scoring and ranking systems designed to separate the good from the bad.

Really, all marketing includes some measure of distance from perfect. But no channel is more influenced by it than SEO. Instead of arguing one rule at a time, ask yourself and your boss or client: Will this move us closer to perfect?

Hell, you might even please a customer or two.

One last note for all of the SEOs in the crowd. Before you start pointing out edge cases, consider this: We spend our days combing Google for embarrassing rankings issues. Every now and then, we find one, point, and start yelling “SEE! SEE!!!! THE GOOGLES MADE MISTAKES!!!!” Google’s got lots of issues. Screwing up the rankings isn’t one of them.

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!

[ccw-atrib-link]

An Open-Source Tool for Checking rel-alternate-hreflang Annotations

Posted by Tom-Anthony

In the Distilled R&D department we have been ramping up the amount of automated monitoring and analysis we do, with an internal system monitoring our client’s sites both directly and via various data sources to ensure they remain healthy and we are alerted to any problems that may arise.

Recently we started work to add in functionality for including the rel-alternate-hreflang annotations in this system. In this blog post I’m going to share an open-source Python library we’ve just started work on for the purpose, which makes it easy to read the hreflang entries from a page and identify errors with them.

If you’re not a Python aficionado then don’t despair, as I have also built a ready-to-go tool for you to use, which will quickly do some checks on the hreflang entries for any URL you specify. 🙂

Google’s Search Console (formerly Webmaster Tools) does have some basic rel-alternate-hreflang checking built in, but it is limited in how you can use it and you are restricted to using it for verified sites.

rel-alternate-hreflang checklist

Before we introduce the code, I wanted to quickly review a list of five easy and common mistakes that we will want to check for when looking at rel-alternate-hreflang annotations:

  • return tag errors – Every alternate language/locale URL of a page should, itself, include a link back to the first page. This makes sense but I’ve seen people make mistakes with it fairly often.
  • indirect / broken links – Links to alternate language/region versions of the page should no go via redirects, and should not link to missing or broken pages.
  • multiple entries – There should never be multiple entries for a single language/region combo.
  • multiple defaults – You should never have more than one x-default entry.
  • conflicting modes – rel-alternate-hreflang entries can be implemented via inline HTML, XML sitemaps, or HTTP headers. For any one set of pages only one implementation mode should be used.

So now imagine that we want to simply automate these checks quickly and simply…

Introducing: polly – the hreflang checker library

polly is the name for the library we have developed to help us solve this problem, and we are releasing it as open source so the SEO community can use it freely to build upon. We only started work on it last week, but we plan to continue developing it, and will also accept contributions to the code from the community, so we expect its feature set to grow rapidly.

If you are not comfortable tinkering with Python, then feel free to skip down to the next section of the post, where there is a tool that is built with polly which you can use right away.

Still here? Ok, great. You can install polly easily via pip:

pip install polly

You can then create a PollyPage() object which will do all our work and store the data simply by instantiating the class with the desired URL:

my_page = PollyPage("http://www.facebook.com/")

You can quickly see the hreflang entries on the page by running:

print my_page.alternate_urls_map

You can list all the hreflang values encountered on a page, and which countries and languages they cover:

print my_page.hreflang_values
print my_page.languages
print my_page.regions

You can also check various aspects of a page, see whether the pages it includes in its rel-alternate-hreflang entries point back, or whether there are entries that do not see retrievable (due to 404 or 500 etc. errors):

print my_page.is_default
print my_page.no_return_tag_pages()
print my_page.non_retrievable_pages()

Get more instructions and grab the code at the polly github page. Hit me up in the comments with any questions.

Free tool: hreflang.ninja

I have put together a very simple tool that uses polly to run some of the checks we highlighted above as being common mistakes with rel-alternate-hreflang, which you can visit right now and start using:

http://hreflang.ninja

Simply enter a URL and hit enter, and you should see something like:

Example output from the ninja!

The tool shows you the rel-alternate-hreflang entries found on the page, the language and region of those entries, the alternate URLs, and any errors identified with the entry. It is perfect for doing quick’n’dirty checks of a URL to identify any errors.

As we add additional functionality to polly we will be updating hreflang.ninja as well, so please tweet me with feature ideas or suggestions.

To-do list!

This is the first release of polly and currently we only handle annotations that are in the HTML of the page, not those in the XML sitemap or HTTP headers. However, we are going to be updating polly (and hreflang.ninja) over the coming weeks, so watch this space! 🙂

Resources

Here are a few links you may find helpful for hreflang:

Got suggestions?

With the increasing number of SEO directives and annotations available, and the ever-changing guidelines around how to deploy them, it is important to automate whatever areas possible. Hopefully polly is helpful to the community in this regard, and we want to here what ideas you have for making these tools more useful – here in the comments or via Twitter.

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!

[ccw-atrib-link]

The Meta Referrer Tag: An Advancement for SEO and the Internet

Posted by Cyrus-Shepard

The movement to make the Internet more secure through HTTPS brings several useful advancements for webmasters. In addition to security improvements, HTTPS promises future technological advances and potential SEO benefits for marketers.

HTTPS in search results is rising. Recent MozCast data from Dr. Pete shows nearly 20% of first page Google results are now HTTPS.

Sadly, HTTPS also has its downsides.

Marketers run into their first challenge when they switch regular HTTP sites over to HTTPS. Technically challenging, the switch typically involves routing your site through a series of 301 redirects. Historically, these types of redirects are associated with a loss of link equity (thought to be around 15%) which can lead to a loss in rankings. This can offset any SEO advantage that Google claims switching.

Ross Hudgens perfectly summed it up in this tweet:

Many SEOs have anecdotally shared stories of HTTPS sites performing well in Google search results (and our soon-to-be-published Ranking Factors data seems to support this.) However, the short term effect of a large migration can be hard to take. When Moz recently switched to HTTPS to provide better security to our logged-in users, we saw an 8-9% dip in our organic search traffic.

Problem number two is the subject of this post. It involves the loss of referral data. Typically, when one site sends traffic to another, information is sent that identifies the originating site as the source of traffic. This invaluable data allows people to see where their traffic is coming from, and helps spread the flow of information across the web.

SEOs have long used referrer data for a number of beneficial purposes. Oftentimes, people will link back or check out the site sending traffic when they see the referrer in their analytics data. Spammers know this works, as evidenced by the recent increase in referrer spam:

This process stops when traffic flows from an HTTPS site to a non-secure HTTP site. In this case, no referrer data is sent. Webmasters can’t know where their traffic is coming from.

Here’s how referral data to my personal site looked when Moz switched to HTTPS. I lost all visibility into where my traffic came from.

Its (not provided) all over again!

Enter the meta referrer tag

While we can’t solve the ranking challenges imposed by switching a site to HTTPS, we can solve the loss of referral data, and it’s actually super-simple.

Almost completely unknown to most marketers, the relatively new meta referrer tag (it’s actually been around for a few years) was designed to help out in these situations.

Better yet, the tag allows you to control how your referrer information is passed.

The meta referrer tag works with most browsers to pass referrer information in a manner defined by the user. Traffic remains encrypted and all the benefits of using HTTPS remain in place, but now you can pass referrer data to all websites, even those that use HTTP.

How to use the meta referrer tag

What follows are extremely simplified instructions for using the meta referrer tag. For more in-depth understanding, we highly recommend referring to the W3C working draft of the spec.

The meta referrer tag is placed in the <head> section of your HTML, and references one of five states, which control how browsers send referrer information from your site. The five states are:

  1. None: Never pass referral data
    <meta name="referrer" content="none">
    
  2. None When Downgrade: Sends referrer information to secure HTTPS sites, but not insecure HTTP sites
    <meta name="referrer" content="none-when-downgrade">
    
  3. Origin Only: Sends the scheme, host, and port (basically, the subdomain) stripped of the full URL as a referrer, i.e. https://moz.com/example.html would simply send https://moz.com
    <meta name="referrer" content="origin">
    

  4. Origin When Cross-Origin: Sends the full URL as the referrer when the target has the same scheme, host, and port (i.e. subdomain) regardless if it’s HTTP or HTTPS, while sending origin-only referral information to external sites. (note: There is a typo in the official spec. Future versions should be “origin-when-cross-origin”)
    <meta name="referrer" content="origin-when-crossorigin">
    
  5. Unsafe URL: Always passes the URL string as a referrer. Note if you have any sensitive information contained in your URL, this isn’t the safest option. By default, URL fragments, username, and password are automatically stripped out.
    <meta name="referrer" content="unsafe-url">
    

The meta referrer tag in action

By clicking the link below, you can get a sense of how the meta referrer tag works.

Check Referrer

Boom!

We’ve set the meta referrer tag for Moz to “origin”, which means when we link out to another site, we pass our scheme, host, and port. The end result is you see http://moz.com as the referrer, stripped of the full URL path (/meta-referrer-tag).

My personal site typically receives several visits per day from Moz. Here’s what my analytics data looked like before and after we implemented the meta referrer tag.

For simplicity and security, most sites may want to implement the “origin” state, but there are drawbacks.

One negative side effect was that as soon as we implemented the meta referrer tag, our AdRoll analytics, which we use for retargeting, stopped working. It turns out that AdRoll uses our referrer information for analytics, but the meta referrer tag “origin” state meant that the only URL they ever saw reported was https://moz.com.

Conclusion

We love the meta referrer tag because it keeps information flowing on the Internet. It’s the way the web is supposed to work!

It helps marketers and webmasters see exactly where their traffic is coming from. It encourages engagement, communication, and even linking, which can lead to improvements in SEO.

Useful links:

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!

[ccw-atrib-link]

How to Use Server Log Analysis for Technical SEO

Posted by SamuelScott

It’s ten o’clock. Do you know where your logs are?

I’m introducing this guide with a pun on a common public-service announcement that has run on late-night TV news broadcasts in the United States because log analysis is something that is extremely newsworthy and important.

If your technical and on-page SEO is poor, then nothing else that you do will matter. Technical SEO is the key to helping search engines to crawl, parse, and index websites, and thereby rank them appropriately long before any marketing work begins.

The important thing to remember: Your log files contain the only data that is 100% accurate in terms of how search engines are crawling your website. By helping Google to do its job, you will set the stage for your future SEO work and make your job easier. Log analysis is one facet of technical SEO, and correcting the problems found in your logs will help to lead to higher rankings, more traffic, and more conversions and sales.

Here are just a few reasons why:

  • Too many response code errors may cause Google to reduce its crawling of your website and perhaps even your rankings.
  • You want to make sure that search engines are crawling everything, new and old, that you want to appear and rank in the SERPs (and nothing else).
  • It’s crucial to ensure that all URL redirections will pass along any incoming “link juice.”

However, log analysis is something that is unfortunately discussed all too rarely in SEO circles. So, here, I wanted to give the Moz community an introductory guide to log analytics that I hope will help. If you have any questions, feel free to ask in the comments!

What is a log file?

Computer servers, operating systems, network devices, and computer applications automatically generate something called a log entry whenever they perform an action. In a SEO and digital marketing context, one type of action is whenever a page is requested by a visiting bot or human.

Server log entries are specifically programmed to be output in the Common Log Format of the W3C consortium. Here is one example from Wikipedia with my accompanying explanations:

127.0.0.1 user-identifier frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326
  • 127.0.0.1 — The remote hostname. An IP address is shown, like in this example, whenever the DNS hostname is not available or DNSLookup is turned off.
  • user-identifier — The remote logname / RFC 1413 identity of the user. (It’s not that important.)
  • frank — The user ID of the person requesting the page. Based on what I see in my Moz profile, Moz’s log entries would probably show either “SamuelScott” or “392388” whenever I visit a page after having logged in.
  • [10/Oct/2000:13:55:36 -0700] — The date, time, and timezone of the action in question in strftime format.
  • GET /apache_pb.gif HTTP/1.0 — “GET” is one of the two commands (the other is “POST”) that can be performed. “GET” fetches a URL while “POST” is submitting something (such as a forum comment). The second part is the URL that is being accessed, and the last part is the version of HTTP that is being accessed.
  • 200 — The status code of the document that was returned.
  • 2326 — The size, in bytes, of the document that was returned.

Note: A hyphen is shown in a field when that information is unavailable.

Every single time that you — or the Googlebot — visit a page on a website, a line with this information is output, recorded, and stored by the server.

Log entries are generated continuously and anywhere from several to thousands can be created every second — depending on the level of a given server, network, or application’s activity. A collection of log entries is called a log file (or often in slang, “the log” or “the logs”), and it is displayed with the most-recent log entry at the bottom. Individual log files often contain a calendar day’s worth of log entries.

Accessing your log files

Different types of servers store and manage their log files differently. Here are the general guides to finding and managing log data on three of the most-popular types of servers:

What is log analysis?

Log analysis (or log analytics) is the process of going through log files to learn something from the data. Some common reasons include:

  • Development and quality assurance (QA) — Creating a program or application and checking for problematic bugs to make sure that it functions properly
  • Network troubleshooting — Responding to and fixing system errors in a network
  • Customer service — Determining what happened when a customer had a problem with a technical product
  • Security issues — Investigating incidents of hacking and other intrusions
  • Compliance matters — Gathering information in response to corporate or government policies
  • Technical SEO — This is my favorite! More on that in a bit.

Log analysis is rarely performed regularly. Usually, people go into log files only in response to something — a bug, a hack, a subpoena, an error, or a malfunction. It’s not something that anyone wants to do on an ongoing basis.

Why? This is a screenshot of ours of just a very small part of an original (unstructured) log file:

Ouch. If a website gets 10,000 visitors who each go to ten pages per day, then the server will create a log file every day that will consist of 100,000 log entries. No one has the time to go through all of that manually.

How to do log analysis

There are three general ways to make log analysis easier in SEO or any other context:

  • Do-it-yourself in Excel
  • Proprietary software such as Splunk or Sumo-logic
  • The ELK Stack open-source software

Tim Resnik’s Moz essay from a few years ago walks you through the process of exporting a batch of log files into Excel. This is a (relatively) quick and easy way to do simple log analysis, but the downside is that one will see only a snapshot in time and not any overall trends. To obtain the best data, it’s crucial to use either proprietary tools or the ELK Stack.

Splunk and Sumo-Logic are proprietary log analysis tools that are primarily used by enterprise companies. The ELK Stack is a free and open-source batch of three platforms (Elasticsearch, Logstash, and Kibana) that is owned by Elastic and used more often by smaller businesses. (Disclosure: We at Logz.io use the ELK Stack to monitor our own internal systems as well as for the basis of our own log management software.)

For those who are interested in using this process to do technical SEO analysis, monitor system or application performance, or for any other reason, our CEO, Tomer Levy, has written a guide to deploying the ELK Stack.

Technical SEO insights in log data

However you choose to access and understand your log data, there are many important technical SEO issues to address as needed. I’ve included screenshots of our technical SEO dashboard with our own website’s data to demonstrate what to examine in your logs.

Bot crawl volume

It’s important to know the number of requests made by Baidu, BingBot, GoogleBot, Yahoo, Yandex, and others over a given period time. If, for example, you want to get found in search in Russia but Yandex is not crawling your website, that is a problem. (You’d want to consult Yandex Webmaster and see this article on Search Engine Land.)

Response code errors

Moz has a great primer on the meanings of the different status codes. I have an alert system setup that tells me about 4XX and 5XX errors immediately because those are very significant.

Temporary redirects

Temporary 302 redirects do not pass along the “link juice” of external links from the old URL to the new one. Almost all of the time, they should be changed to permanent 301 redirects.

Crawl budget waste

Google assigns a crawl budget to each website based on numerous factors. If your crawl budget is, say, 100 pages per day (or the equivalent amount of data), then you want to be sure that all 100 are things that you want to appear in the SERPs. No matter what you write in your robots.txt file and meta-robots tags, you might still be wasting your crawl budget on advertising landing pages, internal scripts, and more. The logs will tell you — I’ve outlined two script-based examples in red above.

If you hit your crawl limit but still have new content that should be indexed to appear in search results, Google may abandon your site before finding it.

Duplicate URL crawling

The addition of URL parameters — typically used in tracking for marketing purposes — often results in search engines wasting crawl budgets by crawling different URLs with the same content. To learn how to address this issue, I recommend reading the resources on Google and Search Engine Land here, here, here, and here.

Crawl priority

Google might be ignoring (and not crawling or indexing) a crucial page or section of your website. The logs will reveal what URLs and/or directories are getting the most and least attention. If, for example, you have published an e-book that attempts to rank for targeted search queries but it sits in a directory that Google only visits once every six months, then you won’t get any organic search traffic from the e-book for up to six months.

If a part of your website is not being crawled very often — and it is updated often enough that it should be — then you might need to check your internal-linking structure and the crawl-priority settings in your XML sitemap.

Last crawl date

Have you uploaded something that you hope will be indexed quickly? The log files will tell you when Google has crawled it.

Crawl budget

One thing I personally like to check and see is Googlebot’s real-time activity on our site because the crawl budget that the search engine assigns to a website is a rough indicator — a very rough one — of how much it “likes” your site. Google ideally does not want to waste valuable crawling time on a bad website. Here, I had seen that Googlebot had made 154 requests of our new startup’s website over the prior twenty-four hours. Hopefully, that number will go up!

As I hope you can see, log analysis is critically important in technical SEO. It’s eleven o’clock — do you know where your logs are now?

Additional resources

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!

[ccw-atrib-link]

Spam Score: Moz’s New Metric to Measure Penalization Risk

Posted by randfish

Today, I’m very excited to announce that Moz’s Spam Score, an R&D project we’ve worked on for nearly a year, is finally going live. In this post, you can learn more about how we’re calculating spam score, what it means, and how you can potentially use it in your SEO work.

How does Spam Score work?

Over the last year, our data science team, led by 
Dr. Matt Peters, examined a great number of potential factors that predicted that a site might be penalized or banned by Google. We found strong correlations with 17 unique factors we call “spam flags,” and turned them into a score.

Almost every subdomain in 
Mozscape (our web index) now has a Spam Score attached to it, and this score is viewable inside Open Site Explorer (and soon, the MozBar and other tools). The score is simple; it just records the quantity of spam flags the subdomain triggers. Our correlations showed that no particular flag was more likely than others to mean a domain was penalized/banned in Google, but firing many flags had a very strong correlation (you can see the math below).

Spam Score currently operates only on the subdomain level—we don’t have it for pages or root domains. It’s been my experience and the experience of many other SEOs in the field that a great deal of link spam is tied to the subdomain-level. There are plenty of exceptions—manipulative links can and do live on plenty of high-quality sites—but as we’ve tested, we found that subdomain-level Spam Score was the best solution we could create at web scale. It does a solid job with the most obvious, nastiest spam, and a decent job highlighting risk in other areas, too.

How to access Spam Score

Right now, you can find Spam Score inside 
Open Site Explorer, both in the top metrics (just below domain/page authority) and in its own tab labeled “Spam Analysis.” Spam Score is only available for Pro subscribers right now, though in the future, we may make the score in the metrics section available to everyone (if you’re not a subscriber, you can check it out with a free trial). 

The current Spam Analysis page includes a list of subdomains or pages linking to your site. You can toggle the target to look at all links to a given subdomain on your site, given pages, or the entire root domain. You can further toggle source tier to look at the Spam Score for incoming linking pages or subdomains (but in the case of pages, we’re still showing the Spam Score for the subdomain on which that page is hosted).

You can click on any Spam Score row and see the details about which flags were triggered. We’ll bring you to a page like this:

Back on the original Spam Analysis page, at the very bottom of the rows, you’ll find an option to export a disavow file, which is compatible with Google Webmaster Tools. You can choose to filter the file to contain only those sites with a given spam flag count or higher:

Disavow exports usually take less than 3 hours to finish. We can send you an email when it’s ready, too.

WARNING: Please do not export this file and simply upload it to Google! You can really, really hurt your site’s ranking and there may be no way to recover. Instead, carefully sort through the links therein and make sure you really do want to disavow what’s in there. You can easily remove/edit the file to take out links you feel are not spam. When Moz’s Cyrus Shepard disavowed every link to his own site, it took more than a year for his rankings to return!

We’ve actually made the file not-wholly-ready for upload to Google in order to be sure folks aren’t too cavalier with this particular step. You’ll need to open it up and make some edits (specifically to lines at the top of the file) in order to ready it for Webmaster Tools

In the near future, we hope to have Spam Score in the Mozbar as well, which might look like this: 

Sweet, right? 🙂

Potential use cases for Spam Analysis

This list probably isn’t exhaustive, but these are a few of the ways we’ve been playing around with the data:

  1. Checking for spammy links to your own site: Almost every site has at least a few bad links pointing to it, but it’s been hard to know how much or how many potentially harmful links you might have until now. Run a quick spam analysis and see if there’s enough there to cause concern.
  2. Evaluating potential links: This is a big one where we think Spam Score can be helpful. It’s not going to catch every potentially bad link, and you should certainly still use your brain for evaluation too, but as you’re scanning a list of link opportunities or surfing to various sites, having the ability to see if they fire a lot of flags is a great warning sign.
  3. Link cleanup: Link cleanup projects can be messy, involved, precarious, and massively tedious. Spam Score might not catch everything, but sorting links by it can be hugely helpful in identifying potentially nasty stuff, and filtering out the more probably clean links.
  4. Disavow Files: Again, because Spam Score won’t perfectly catch everything, you will likely need to do some additional work here (especially if the site you’re working on has done some link buying on more generally trustworthy domains), but it can save you a heap of time evaluating and listing the worst and most obvious junk.

Over time, we’re also excited about using Spam Score to help improve the PA and DA calculations (it’s not currently in there), as well as adding it to other tools and data sources. We’d love your feedback and insight about where you’d most want to see Spam Score get involved.

Details about Spam Score’s calculation

This section comes courtesy of Moz’s head of data science, Dr. Matt Peters, who created the metric and deserves (at least in my humble opinion) a big round of applause. – Rand

Definition of “spam”

Before diving into the details of the individual spam flags and their calculation, it’s important to first describe our data gathering process and “spam” definition.

For our purposes, we followed Google’s definition of spam and gathered labels for a large number of sites as follows.

  • First, we randomly selected a large number of subdomains from the Mozscape index stratified by mozRank.
  • Then we crawled the subdomains and threw out any that didn’t return a “200 OK” (redirects, errors, etc).
  • Finally, we collected the top 10 de-personalized, geo-agnostic Google-US search results using the full subdomain name as the keyword and checked whether any of those results matched the original keyword. If they did not, we called the subdomain “spam,” otherwise we called it “ham.”

We performed the most recent data collection in November 2014 (after the Penguin 3.0 update) for about 500,000 subdomains.

Relationship between number of flags and spam

The overall Spam Score is currently an aggregate of 17 different “flags.” You can think of each flag a potential “warning sign” that signals that a site may be spammy. The overall likelihood of spam increases as a site accumulates more and more flags, so that the total number of flags is a strong predictor of spam. Accordingly, the flags are designed to be used together—no single flag, or even a few flags, is cause for concern (and indeed most sites will trigger at least a few flags).

The following table shows the relationship between the number of flags and percent of sites with those flags that we found Google had penalized or banned:

ABOVE: The overall probability of spam vs. the number of spam flags. Data collected in Nov. 2014 for approximately 500K subdomains. The table also highlights the three overall danger levels: low/green (< 10%) moderate/yellow (10-50%) and high/red (>50%)

The overall spam percent averaged across a large number of sites increases in lock step with the number of flags; however there are outliers in every category. For example, there are a small number of sites with very few flags that are tagged as spam by Google and conversely a small number of sites with many flags that are not spam.

Spam flag details

The individual spam flags capture a wide range of spam signals link profiles, anchor text, on page signals and properties of the domain name. At a high level the process to determine the spam flags for each subdomain is:

  • Collect link metrics from Mozscape (mozRank, mozTrust, number of linking domains, etc).
  • Collect anchor text metrics from Mozscape (top anchor text phrases sorted by number of links)
  • Collect the top five pages by Page Authority on the subdomain from Mozscape
  • Crawl the top five pages plus the home page and process to extract on page signals
  • Provide the output for Mozscape to include in the next index release cycle

Since the spam flags are incorporated into in the Mozscape index, fresh data is released with each new index. Right now, we crawl and process the spam flags for each subdomains every two – three months although this may change in the future.

Link flags

The following table lists the link and anchor text related flags with the the odds ratio for each flag. For each flag, we can compute two percents: the percent of sites with that flag that are penalized by Google and the percent of sites with that flag that were not penalized. The odds ratio is the ratio of these percents and gives the increase in likelihood that a site is spam if it has the flag. For example, the first row says that a site with this flag is 12.4 times more likely to be spam than one without the flag.

ABOVE: Description and odds ratio of link and anchor text related spam flags. In addition to a description, it lists the odds ratio for each flag which gives the overall increase in spam likelihood if the flag is present).

Working down the table, the flags are:

  • Low mozTrust to mozRank ratio: Sites with low mozTrust compared to mozRank are likely to be spam.
  • Large site with few links: Large sites with many pages tend to also have many links and large sites without a corresponding large number of links are likely to be spam.
  • Site link diversity is low: If a large percentage of links to a site are from a few domains it is likely to be spam.
  • Ratio of followed to nofollowed subdomains/domains (two separate flags): Sites with a large number of followed links relative to nofollowed are likely to be spam.
  • Small proportion of branded links (anchor text): Organically occurring links tend to contain a disproportionate amount of banded keywords. If a site does not have a lot of branded anchor text, it’s a signal the links are not organic.

On-page flags

Similar to the link flags, the following table lists the on page and domain name related flags:

ABOVE: Description and odds ratio of on page and domain name related spam flags. In addition to a description, it lists the odds ratio for each flag which gives the overall increase in spam likelihood if the flag is present).

  • Thin content: If a site has a relatively small ratio of content to navigation chrome it’s likely to be spam.
  • Site mark-up is abnormally small: Non-spam sites tend to invest in rich user experiences with CSS, Javascript and extensive mark-up. Accordingly, a large ratio of text to mark-up is a spam signal.
  • Large number of external links: A site with a large number of external links may look spammy.
  • Low number of internal links: Real sites tend to link heavily to themselves via internal navigation and a relative lack of internal links is a spam signal.
  • Anchor text-heavy page: Sites with a lot of anchor text are more likely to be spam then those with more content and less links.
  • External links in navigation: Spam sites may hide external links in the sidebar or footer.
  • No contact info: Real sites prominently display their social and other contact information.
  • Low number of pages found: A site with only one or a few pages is more likely to be spam than one with many pages.
  • TLD correlated with spam domains: Certain TLDs are more spammy than others (e.g. pw).
  • Domain name length: A long subdomain name like “bycheapviagra.freeshipping.onlinepharmacy.com” may indicate keyword stuffing.
  • Domain name contains numerals: domain names with numerals may be automatically generated and therefore spam.

If you’d like some more details on the technical aspects of the spam score, check out the 
video of Matt’s 2012 MozCon talk about Algorithmic Spam Detection or the slides (many of the details have evolved, but the overall ideas are the same):

We’d love your feedback

As with all metrics, Spam Score won’t be perfect. We’d love to hear your feedback and ideas for improving the score as well as what you’d like to see from it’s in-product application in the future. Feel free to leave comments on this post, or to email Matt (matt at moz dot com) and me (rand at moz dot com) privately with any suggestions.

Good luck cleaning up and preventing link spam!



Not a Pro Subscriber? No problem!



Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!

[ccw-atrib-link]

Moving 5 Domains to 1: An SEO Case Study

Posted by Dr-Pete

People often ask me if they should change domain names, and I always shudder just a little. Changing domains is a huge, risky undertaking, and too many people rush into it seeing only the imaginary upside. The success of the change also depends wildly on the details, and it’s not the kind of question anyone should be asking casually on social media.

Recently, I decided that it was time to find a new permanent home for my personal and professional blogs, which had gradually spread out over 5 domains. I also felt my main domain was no longer relevant to my current situation, and it was time for a change. So, ultimately I ended up with a scenario that looked like this:

The top three sites were active, with UserEffect.com being my former consulting site and blog (and relatively well-trafficked). The bottom two sites were both inactive and were both essentially gag sites. My one-pager, AreYouARealDoctor.com, did previously rank well for “are you a real doctor”, so I wanted to try to recapture that.

I started migrating the 5 sites in mid-January, and I’ve been tracking the results. I thought it would be useful to see how this kind of change plays out, in all of the gory details. As it turns out, nothing is ever quite “textbook” when it comes to technical SEO.

Why Change Domains at All?

The rationale for picking a new domain could fill a month’s worth of posts, but I want to make one critical point – changing domains should be about your business goals first, and SEO second. I did not change domains to try to rank better for “Dr. Pete” – that’s a crap shoot at best. I changed domains because my old consulting brand (“User Effect”) no longer represented the kind of work I do and I’m much more known by my personal brand.

That business case was strong enough that I was willing to accept some losses. We went through a similar transition here
from SEOmoz.org to Moz.com. That was a difficult transition that cost us some SEO ground, especially short-term, but our core rationale was grounded in the business and where it’s headed. Don’t let an SEO pipe dream lead you into a risky decision.

Why did I pick a .co domain? I did it for the usual reason – the .com was taken. For a project of this type, where revenue wasn’t on the line, I didn’t have any particular concerns about .co. The evidence on how top-level domains (TLDs) impact ranking is tough to tease apart (so many other factors correlate with .com’s), and Google’s attitude tends to change over time, especially if new TLDs are abused. Anecdotally, though, I’ve seen plenty of .co’s rank, and I wasn’t concerned.

Step 1 – The Boring Stuff

It is absolutely shocking how many people build a new site, slap up some 301s, pull the switch, and hope for the best. It’s less shocking how many of those people end up in Q&A a week later, desperate and bleeding money.


Planning is hard work, and it’s boring – get over it.

You need to be intimately familiar with every page on your existing site(s), and, ideally, you should make a list. Not only do you have to plan for what will happen to each of these pages, but you’ll need that list to make sure everything works smoothly later.

In my case, I decided it might be time to do some housekeeping – the User Effect blog had hundreds of posts, many outdated and quite a few just not very good. So, I started with the easy data – recent traffic. I’m sure you’ve seen this Google Analytics report (Behavior > Site Content > All Pages):

Since I wanted to focus on recent activity, and none of the sites had much new content, I restricted myself to a 3-month window (Q4 of 2014). Of course, I looked much deeper than the top 10, but the principle was simple – I wanted to make sure the data matched my intuition and that I wasn’t cutting off anything important. This helped me prioritize the list.

Of course, from an SEO standpoint, I also didn’t want to lose content that had limited traffic but solid inbound links. So, I checked my “Top Pages” report in
Open Site Explorer:

Since the bulk of my main site was a blog, the top trafficked and top linked-to pages fortunately correlated pretty well. Again, this is only a way to prioritize. If you’re dealing with sites with thousands of pages, you need to work methodically through the site architecture.

I’m going to say something that makes some SEOs itchy – it’s ok not to move some pages to the new site. It’s even ok to let some pages 404. In Q4, UserEffect.com had traffic to 237 URLs. The top 10 pages accounted for 91.9% of that traffic. I strongly believe that moving domains is a good time to refocus a site and concentrate your visitors and link equity on your best content. More is not better in 2015.

Letting go of some pages also means that you’re not 301-redirecting a massive number of old URLs to a new home-page. This can look like a low-quality attempt to consolidate link-equity, and at large scale it can raise red flags with Google. Content worth keeping should exist on the new site, and your 301s should have well-matched targets.

In one case, I had a blog post that had a decent trickle of traffic due to ranking for “50,000 push-ups,” but the post itself was weak and the bounce rate was very high:

The post was basically just a placeholder announcing that I’d be attempting this challenge, but I never recapped anything after finishing it. So, in this case,
I rewrote the post.

Of course, this process was repeated across the 3 active sites. The 2 inactive sites only constituted a handful of total pages. In the case of AreYouARealDoctor.com, I decided to turn the previous one-pager
into a new page on the new site. That way, I had a very well-matched target for the 301-redirect, instead of simply mapping the old site to my new home-page.

I’m trying to prove a point – this is the amount of work I did for a handful of sites that were mostly inactive and producing no current business value. I don’t need consulting gigs and these sites produce no direct revenue, and yet I still considered this process worth the effort.

Step 2 – The Big Day

Eventually, you’re going to have to make the move, and in most cases, I prefer ripping off the bandage. Of course, doing something all at once doesn’t mean you shouldn’t be careful.

The biggest problem I see with domain switches (even if they’re 1-to-1) is that people rely on data that can take weeks to evaluate, like rankings and traffic, or directly checking Google’s index. By then, a lot of damage is already done. Here are some ways to find out quickly if you’ve got problems…

(1) Manually Check Pages

Remember that list you were supposed to make? It’s time to check it, or at least spot-check it. Someone needs to physically go to a browser and make sure that each major section of the site and each important individual page is resolving properly. It doesn’t matter how confident your IT department/guy/gal is – things go wrong.

(2) Manually Check Headers

Just because a page resolves, it doesn’t mean that your 301-redirects are working properly, or that you’re not firing some kind of 17-step redirect chain. Check your headers. There are tons of free tools, but lately I’m fond of
URI Valet. Guess what – I screwed up my primary 301-redirects. One of my registrar transfers wasn’t working, so I had to have a setting changed by customer service, and I inadvertently ended up with 302s (Pro tip: Don’t change registrars and domains in one step):

Don’t think that because you’re an “expert”, your plan is foolproof. Mistakes happen, and because I caught this one I was able to correct it fairly quickly.

(3) Submit Your New Site

You don’t need to submit your site to Google in 2015, but now that Google Webmaster Tools allows it, why not do it? The primary argument I hear is “well, it’s not necessary.” True, but direct submission has one advantage – it’s fast.

To be precise, Google Webmaster Tools separates the process into “Fetch” and “Submit to index” (you’ll find this under “Crawl” > “Fetch as Google”). Fetching will quickly tell you if Google can resolve a URL and retrieve the page contents, which alone is pretty useful. Once a page is fetched, you can submit it, and you should see something like this:

This isn’t really about getting indexed – it’s about getting nearly instantaneous feedback. If Google has any major problems with crawling your site, you’ll know quickly, at least at the macro level.

(4) Submit New XML Sitemaps

Finally, submit a new set of XML sitemaps in Google Webmaster Tools, and preferably tiered sitemaps. While it’s a few years old now, Rob Ousbey has a great post on the subject of
XML sitemap structure. The basic idea is that, if you divide your sitemap into logical sections, it’s going to be much easier to diagnosis what kinds of pages Google is indexing and where you’re running into trouble.

A couple of pro tips on sitemaps – first, keep your old sitemaps active temporarily. This is counterintuitive to some people, but unless Google can crawl your old URLs, they won’t see and process the 301-redirects and other signals. Let the old accounts stay open for a couple of months, and don’t cut off access to the domains you’re moving.

Second (I learned this one the hard way), make sure that your Google Webmaster Tools site verification still works. If you use file uploads or meta tags and don’t move those files/tags to the new site, GWT verification will fail and you won’t have access to your old accounts. I’d recommend using a more domain-independent solution, like verifying with Google Analytics. If you lose verification, don’t panic – your data won’t be instantly lost.

Step 3 – The Waiting Game

Once you’ve made the switch, the waiting begins, and this is where many people start to panic. Even executed perfectly, it can take Google weeks or even months to process all of your 301-redirects and reevaluate a new domain’s capacity to rank. You have to expect short term fluctuations in ranking and traffic.

During this period, you’ll want to watch a few things – your traffic, your rankings, your indexed pages (via GWT and the site: operator), and your errors (such as unexpected 404s). Traffic will recover the fastest, since direct traffic is immediately carried through redirects, but ranking and indexation will lag, and errors may take time to appear.

(1) Monitor Traffic

I’m hoping you know how to check your traffic, but actually trying to determine what your new levels should be and comparing any two days can be easier said than done. If you launch on a Friday, and then Saturday your traffic goes down on the new site, that’s hardly cause for panic – your traffic probably
always goes down on Saturday.

In this case, I redirected the individual sites over about a week, but I’m going to focus on UserEffect.com, as that was the major traffic generator. That site was redirected, in full on January 21st, and the Google Analytics data for January for the old site looked like this:

So far, so good – traffic bottomed out almost immediately. Of course, losing traffic is easy – the real question is what’s going on with the new domain. Here’s the graph for January for DrPete.co:

This one’s a bit trickier – the first spike, on January 16th, is when I redirected the first domain. The second spike, on January 22nd, is when I redirected UserEffect.com. Both spikes are meaningless – I announced these re-launches on social media and got a short-term traffic burst. What we really want to know is where traffic is leveling out.

Of course, there isn’t a lot of history here, but a typical day for UserEffect.com in January was about 1,000 pageviews. The traffic to DrPete.co after it leveled out was about half that (500 pageviews). It’s not a complete crisis, but we’re definitely looking at a short-term loss.

Obviously, I’m simplifying the process here – for a large, ecommerce site you’d want to track a wide range of metrics, including conversion metrics. Hopefully, though, this illustrates the core approach. So, what am I missing out on? In this day of [not provided], tracking down a loss can be tricky. Let’s look for clues in our other three areas…

(2) Monitor Indexation

You can get a broad sense of your indexed pages from Google Webmaster Tools, but this data often lags real-time and isn’t very granular. Despite its shortcomings, I still prefer
the site: operator. Generally, I monitor a domain daily – any one measurement has a lot of noise, but what you’re looking for is the trend over time. Here’s the indexed page count for DrPete.co:

The first set of pages was indexed fairly quickly, and then the second set started being indexed soon after UserEffect.com was redirected. All in all, we’re seeing a fairly steady upward trend, and that’s what we’re hoping to see. The number is also in the ballpark of sanity (compared to the actual page count) and roughly matched GWT data once it started being reported.

So, what happened to UserEffect.com’s index after the switch?

The timeframe here is shorter, since UserEffect.com was redirected last, but we see a gradual decline in indexation, as expected. Note that the index size plateaus around 60 pages – about 1/4 of the original size. This isn’t abnormal – low-traffic and unlinked pages (or those with deep links) are going to take a while to clear out. This is a long-term process. Don’t panic over the absolute numbers – what you want here is a downward trend on the old domain accompanied by a roughly equal upward trend on the new domain.

The fact that UserEffect.com didn’t bottom out is definitely worth monitoring, but this timespan is too short for the plateau to be a major concern. The next step would be to dig into these specific pages and look for a pattern.

(3) Monitor Rankings

The old domain is dropping out of the index, and the new domain is taking its place, but we still don’t know why the new site is taking a traffic hit. It’s time to dig into our core keyword rankings.

Historically, UserEffect.com had ranked well for keywords related to “split test calculator” (near #1) and “usability checklist” (in the top 3). While [not provided] makes keyword-level traffic analysis tricky, we also know that the split-test calculator is one of the top trafficked pages on the site, so let’s dig into that one. Here’s the ranking data from Moz Analytics for “split test calculator”:

The new site took over the #1 position from the old site at first, but then quickly dropped down to the #3/#4 ranking. That may not sound like a lot, but given this general keyword category was one of the site’s top traffic drivers, the CTR drop from #1 to #3/#4 could definitely be causing problems.

When you have a specific keyword you can diagnose, it’s worth taking a look at the live SERP, just to get some context. The day after relaunch, I captured this result for “dr. pete”:

Here, the new domain is ranking, but it’s showing the old title tag. This may not be cause for alarm – weird things often happen in the very short term – but in this case we know that I accidentally set up a 302-redirect. There’s some reason to believe that Google didn’t pass full link equity during that period when 301s weren’t implemented.

Let’s look at a domain where the 301s behaved properly. Before the site was inactive, AreYouARealDoctor.com ranked #1 for “are you a real doctor”. Since there was an inactive period, and I dropped the exact-match domain, it wouldn’t be surprising to see a corresponding ranking drop.

In reality, the new site was ranking #1 for “are you a real doctor” within 2 weeks of 301-redirecting the old domain. The graph is just a horizontal line at #1, so I’m not going to bother you with it, but here’s a current screenshot (incognito):

Early on, I also spot-checked this result, and it wasn’t showing the strange title tag crossover that UserEffect.com pages exhibited. So, it’s very likely that the 302-redirects caused some problems.

Of course, these are just a couple of keywords, but I hope it provides a starting point for you to understand how to methodically approach this problem. There’s no use crying over spilled milk, and I’m not going to fire myself, so let’s move on to checking any other errors that I might have missed.

(4) Check Errors (404s, etc.)

A good first stop for unexpected errors is the “Crawl Errors” report in Google Webmaster Tools (Crawl > Crawl Errors). This is going to take some digging, especially if you’ve deliberately 404’ed some content. Over the couple of weeks after re-launch, I spotted the following problems:

The old site had a “/blog” directory, but the new site put the blog right on the home-page and had no corresponding directory. Doh. Hey, do as I say, not as I do, ok? Obviously, this was a big blunder, as the old blog home-page was well-trafficked.

The other two errors here are smaller but easy to correct. MinimalTalent.com had a “/free” directory that housed downloads (mostly PDFs). I missed it, since my other sites used a different format. Luckily, this was easy to remap.

The last error is a weird looking URL, and there are other similar URLs in the 404 list. This is where site knowledge is critical. I custom-designed a URL shortener for UserEffect.com and, in some cases, people linked to those URLs. Since those URLs didn’t exist in the site architecture, I missed them. This is where digging deep into historical traffic reports and your top-linked pages is critical. In this case, the fix isn’t easy, and I have to decide whether the loss is worth the time.

What About the New EMD?

My goal here wasn’t to rank better for “Dr. Pete,” and finally unseat Dr. Pete’s Marinades, Dr. Pete the Sodastream flavor (yes, it’s hilarious – you can stop sending me your grocery store photos), and 172 dentists. Ok, it mostly wasn’t my goal. Of course, you might be wondering how switching to an EMD worked out.

In the short term, I’m afraid the answer is “not very well.” I didn’t track ranking for “Dr. Pete” and related phrases very often before the switch, but it appears that ranking actually fell in the short-term. Current estimates have me sitting around page 4, even though my combined link profile suggests a much stronger position. Here’s a look at the ranking history for “dr pete” since relaunch (from Moz Analytics):

There was an initial drop, after which the site evened out a bit. This less-than-impressive plateau could be due to the bad 302s during transition. It could be Google evaluating a new EMD and multiple redirects to that EMD. It could be that the prevalence of natural anchor text with “Dr. Pete” pointing to my site suddenly looked unnatural when my domain name switched to DrPete.co. It could just be that this is going to take time to shake out.

If there’s a lesson here (and, admittedly, it’s too soon to tell), it’s that you shouldn’t rush to buy an EMD in 2015 in the wild hope of instantly ranking for that target phrase. There are so many factors involved in ranking for even a moderately competitive term, and your domain is just one small part of the mix.

So, What Did We Learn?

I hope you learned that I should’ve taken my own advice and planned a bit more carefully. I admit that this was a side project and it didn’t get the attention it deserved. The problem is that, even when real money is at stake, people rush these things and hope for the best. There’s a real cheerleading mentality when it comes to change – people want to take action and only see the upside.

Ultimately, in a corporate or agency environment, you can’t be the one sour note among the cheering. You’ll be ignored, and possibly even fired. That’s not fair, but it’s reality. What you need to do is make sure the work gets done right and people go into the process with eyes wide open. There’s no room for shortcuts when you’re moving to a new domain.

That said, a domain change isn’t a death sentence, either. Done right, and with sensible goals in mind – balancing not just SEO but broader marketing and business objectives – a domain migration can be successful, even across multiple sites.

To sum up: Plan, plan, plan, monitor, monitor, monitor, and try not to panic.

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!

[ccw-atrib-link]