Why Effective, Modern SEO Requires Technical, Creative, and Strategic Thinking – Whiteboard Friday

Posted by randfish

There’s no doubt that quite a bit has changed about SEO, and that the field is far more integrated with other aspects of online marketing than it once was. In today’s Whiteboard Friday, Rand pushes back against the idea that effective modern SEO doesn’t require any technical expertise, outlining a fantastic list of technical elements that today’s SEOs need to know about in order to be truly effective.

For reference, here’s a still of this week’s whiteboard. Click on it to open a high resolution image in a new tab!

Video transcription

Howdy, Moz fans, and welcome to another edition of Whiteboard Friday. This week I’m going to do something unusual. I don’t usually point out these inconsistencies or sort of take issue with other folks’ content on the web, because I generally find that that’s not all that valuable and useful. But I’m going to make an exception here.

There is an article by Jayson DeMers, who I think might actually be here in Seattle — maybe he and I can hang out at some point — called “Why Modern SEO Requires Almost No Technical Expertise.” It was an article that got a shocking amount of traction and attention. On Facebook, it has thousands of shares. On LinkedIn, it did really well. On Twitter, it got a bunch of attention.

Some folks in the SEO world have already pointed out some issues around this. But because of the increasing popularity of this article, and because I think there’s, like, this hopefulness from worlds outside of kind of the hardcore SEO world that are looking to this piece and going, “Look, this is great. We don’t have to be technical. We don’t have to worry about technical things in order to do SEO.”

Look, I completely get the appeal of that. I did want to point out some of the reasons why this is not so accurate. At the same time, I don’t want to rain on Jayson, because I think that it’s very possible he’s writing an article for Entrepreneur, maybe he has sort of a commitment to them. Maybe he had no idea that this article was going to spark so much attention and investment. He does make some good points. I think it’s just really the title and then some of the messages inside there that I take strong issue with, and so I wanted to bring those up.

First off, some of the good points he did bring up.

One, he wisely says, “You don’t need to know how to code or to write and read algorithms in order to do SEO.” I totally agree with that. If today you’re looking at SEO and you’re thinking, “Well, am I going to get more into this subject? Am I going to try investing in SEO? But I don’t even know HTML and CSS yet.”

Those are good skills to have, and they will help you in SEO, but you don’t need them. Jayson’s totally right. You don’t have to have them, and you can learn and pick up some of these things, and do searches, watch some Whiteboard Fridays, check out some guides, and pick up a lot of that stuff later on as you need it in your career. SEO doesn’t have that hard requirement.

And secondly, he makes an intelligent point that we’ve made many times here at Moz, which is that, broadly speaking, a better user experience is well correlated with better rankings.

You make a great website that delivers great user experience, that provides the answers to searchers’ questions and gives them extraordinarily good content, way better than what’s out there already in the search results, generally speaking you’re going to see happy searchers, and that’s going to lead to higher rankings.

But not entirely. There are a lot of other elements that go in here. So I’ll bring up some frustrating points around the piece as well.

First off, there’s no acknowledgment — and I find this a little disturbing — that the ability to read and write code, or even HTML and CSS, which I think are the basic place to start, is helpful or can take your SEO efforts to the next level. I think both of those things are true.

So being able to look at a web page, view source on it, or pull up Firebug in Firefox or something and diagnose what’s going on and then go, “Oh, that’s why Google is not able to see this content. That’s why we’re not ranking for this keyword or term, or why even when I enter this exact sentence in quotes into Google, which is on our page, this is why it’s not bringing it up. It’s because it’s loading it after the page from a remote file that Google can’t access.” These are technical things, and being able to see how that code is built, how it’s structured, and what’s going on there, very, very helpful.

Some coding knowledge also can take your SEO efforts even further. I mean, so many times, SEOs are stymied by the conversations that we have with our programmers and our developers and the technical staff on our teams. When we can have those conversations intelligently, because at least we understand the principles of how an if-then statement works, or what software engineering best practices are being used, or they can upload something into a GitHub repository, and we can take a look at it there, that kind of stuff is really helpful.

Secondly, I don’t like that the article overly reduces all of this information that we have about what we’ve learned about Google. So he mentions two sources. One is things that Google tells us, and others are SEO experiments. I think both of those are true. Although I’d add that there’s sort of a sixth sense of knowledge that we gain over time from looking at many, many search results and kind of having this feel for why things rank, and what might be wrong with a site, and getting really good at that using tools and data as well. There are people who can look at Open Site Explorer and then go, “Aha, I bet this is going to happen.” They can look, and 90% of the time they’re right.

So he boils this down to, one, write quality content, and two, reduce your bounce rate. Neither of those things are wrong. You should write quality content, although I’d argue there are lots of other forms of quality content that aren’t necessarily written — video, images and graphics, podcasts, lots of other stuff.

And secondly, that just doing those two things is not always enough. So you can see, like many, many folks look and go, “I have quality content. It has a low bounce rate. How come I don’t rank better?” Well, your competitors, they’re also going to have quality content with a low bounce rate. That’s not a very high bar.

Also, frustratingly, this really gets in my craw. I don’t think “write quality content” means anything. You tell me. When you hear that, to me that is a totally non-actionable, non-useful phrase that’s a piece of advice that is so generic as to be discardable. So I really wish that there was more substance behind that.

The article also makes, in my opinion, the totally inaccurate claim that modern SEO really is reduced to “the happier your users are when they visit your site, the higher you’re going to rank.”

Wow. Okay. Again, I think broadly these things are correlated. User happiness and rank is broadly correlated, but it’s not a one to one. This is not like a, “Oh, well, that’s a 1.0 correlation.”

I would guess that the correlation is probably closer to like the page authority range. I bet it’s like 0.35 or something correlation. If you were to actually measure this broadly across the web and say like, “Hey, were you happier with result one, two, three, four, or five,” the ordering would not be perfect at all. It probably wouldn’t even be close.

There’s a ton of reasons why sometimes someone who ranks on Page 2 or Page 3 or doesn’t rank at all for a query is doing a better piece of content than the person who does rank well or ranks on Page 1, Position 1.

Then the article suggests five and sort of a half steps to successful modern SEO, which I think is a really incomplete list. So Jayson gives us;

  • Good on-site experience
  • Writing good content
  • Getting others to acknowledge you as an authority
  • Rising in social popularity
  • Earning local relevance
  • Dealing with modern CMS systems (which he notes most modern CMS systems are SEO-friendly)

The thing is there’s nothing actually wrong with any of these. They’re all, generally speaking, correct, either directly or indirectly related to SEO. The one about local relevance, I have some issue with, because he doesn’t note that there’s a separate algorithm for sort of how local SEO is done and how Google ranks local sites in maps and in their local search results. Also not noted is that rising in social popularity won’t necessarily directly help your SEO, although it can have indirect and positive benefits.

I feel like this list is super incomplete. Okay, I brainstormed just off the top of my head in the 10 minutes before we filmed this video a list. The list was so long that, as you can see, I filled up the whole whiteboard and then didn’t have any more room. I’m not going to bother to erase and go try and be absolutely complete.

But there’s a huge, huge number of things that are important, critically important for technical SEO. If you don’t know how to do these things, you are sunk in many cases. You can’t be an effective SEO analyst, or consultant, or in-house team member, because you simply can’t diagnose the potential problems, rectify those potential problems, identify strategies that your competitors are using, be able to diagnose a traffic gain or loss. You have to have these skills in order to do that.

I’ll run through these quickly, but really the idea is just that this list is so huge and so long that I think it’s very, very, very wrong to say technical SEO is behind us. I almost feel like the opposite is true.

We have to be able to understand things like;

  • Content rendering and indexability
  • Crawl structure, internal links, JavaScript, Ajax. If something’s post-loading after the page and Google’s not able to index it, or there are links that are accessible via JavaScript or Ajax, maybe Google can’t necessarily see those or isn’t crawling them as effectively, or is crawling them, but isn’t assigning them as much link weight as they might be assigning other stuff, and you’ve made it tough to link to them externally, and so they can’t crawl it.
  • Disabling crawling and/or indexing of thin or incomplete or non-search-targeted content. We have a bunch of search results pages. Should we use rel=prev/next? Should we robots.txt those out? Should we disallow from crawling with meta robots? Should we rel=canonical them to other pages? Should we exclude them via the protocols inside Google Webmaster Tools, which is now Google Search Console?
  • Managing redirects, domain migrations, content updates. A new piece of content comes out, replacing an old piece of content, what do we do with that old piece of content? What’s the best practice? It varies by different things. We have a whole Whiteboard Friday about the different things that you could do with that. What about a big redirect or a domain migration? You buy another company and you’re redirecting their site to your site. You have to understand things about subdomain structures versus subfolders, which, again, we’ve done another Whiteboard Friday about that.
  • Proper error codes, downtime procedures, and not found pages. If your 404 pages turn out to all be 200 pages, well, now you’ve made a big error there, and Google could be crawling tons of 404 pages that they think are real pages, because you’ve made it a status code 200, or you’ve used a 404 code when you should have used a 410, which is a permanently removed, to be able to get it completely out of the indexes, as opposed to having Google revisit it and keep it in the index.

Downtime procedures. So there’s specifically a… I can’t even remember. It’s a 5xx code that you can use. Maybe it was a 503 or something that you can use that’s like, “Revisit later. We’re having some downtime right now.” Google urges you to use that specific code rather than using a 404, which tells them, “This page is now an error.”

Disney had that problem a while ago, if you guys remember, where they 404ed all their pages during an hour of downtime, and then their homepage, when you searched for Disney World, was, like, “Not found.” Oh, jeez, Disney World, not so good.

  • International and multi-language targeting issues. I won’t go into that. But you have to know the protocols there. Duplicate content, syndication, scrapers. How do we handle all that? Somebody else wants to take our content, put it on their site, what should we do? Someone’s scraping our content. What can we do? We have duplicate content on our own site. What should we do?
  • Diagnosing traffic drops via analytics and metrics. Being able to look at a rankings report, being able to look at analytics connecting those up and trying to see: Why did we go up or down? Did we have less pages being indexed, more pages being indexed, more pages getting traffic less, more keywords less?
  • Understanding advanced search parameters. Today, just today, I was checking out the related parameter in Google, which is fascinating for most sites. Well, for Moz, weirdly, related:oursite.com shows nothing. But for virtually every other sit, well, most other sites on the web, it does show some really interesting data, and you can see how Google is connecting up, essentially, intentions and topics from different sites and pages, which can be fascinating, could expose opportunities for links, could expose understanding of how they view your site versus your competition or who they think your competition is.

Then there are tons of parameters, like in URL and in anchor, and da, da, da, da. In anchor doesn’t work anymore, never mind about that one.

I have to go faster, because we’re just going to run out of these. Like, come on. Interpreting and leveraging data in Google Search Console. If you don’t know how to use that, Google could be telling you, you have all sorts of errors, and you don’t know what they are.

  • Leveraging topic modeling and extraction. Using all these cool tools that are coming out for better keyword research and better on-page targeting. I talked about a couple of those at MozCon, like MonkeyLearn. There’s the new Moz Context API, which will be coming out soon, around that. There’s the Alchemy API, which a lot of folks really like and use.
  • Identifying and extracting opportunities based on site crawls. You run a Screaming Frog crawl on your site and you’re going, “Oh, here’s all these problems and issues.” If you don’t have these technical skills, you can’t diagnose that. You can’t figure out what’s wrong. You can’t figure out what needs fixing, what needs addressing.
  • Using rich snippet format to stand out in the SERPs. This is just getting a better click-through rate, which can seriously help your site and obviously your traffic.
  • Applying Google-supported protocols like rel=canonical, meta description, rel=prev/next, hreflang, robots.txt, meta robots, x robots, NOODP, XML sitemaps, rel=nofollow. The list goes on and on and on. If you’re not technical, you don’t know what those are, you think you just need to write good content and lower your bounce rate, it’s not going to work.
  • Using APIs from services like AdWords or MozScape, or hrefs from Majestic, or SEM refs from SearchScape or Alchemy API. Those APIs can have powerful things that they can do for your site. There are some powerful problems they could help you solve if you know how to use them. It’s actually not that hard to write something, even inside a Google Doc or Excel, to pull from an API and get some data in there. There’s a bunch of good tutorials out there. Richard Baxter has one, Annie Cushing has one, I think Distilled has some. So really cool stuff there.
  • Diagnosing page load speed issues, which goes right to what Jayson was talking about. You need that fast-loading page. Well, if you don’t have any technical skills, you can’t figure out why your page might not be loading quickly.
  • Diagnosing mobile friendliness issues
  • Advising app developers on the new protocols around App deep linking, so that you can get the content from your mobile apps into the web search results on mobile devices. Awesome. Super powerful. Potentially crazy powerful, as mobile search is becoming bigger than desktop.

Okay, I’m going to take a deep breath and relax. I don’t know Jayson’s intention, and in fact, if he were in this room, he’d be like, “No, I totally agree with all those things. I wrote the article in a rush. I had no idea it was going to be big. I was just trying to make the broader points around you don’t have to be a coder in order to do SEO.” That’s completely fine.

So I’m not going to try and rain criticism down on him. But I think if you’re reading that article, or you’re seeing it in your feed, or your clients are, or your boss is, or other folks are in your world, maybe you can point them to this Whiteboard Friday and let them know, no, that’s not quite right. There’s a ton of technical SEO that is required in 2015 and will be for years to come, I think, that SEOs have to have in order to be effective at their jobs.

All right, everyone. Look forward to some great comments, and we’ll see you again next time for another edition of Whiteboard Friday. Take care.

Video transcription by Speechpad.com

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!

[ccw-atrib-link]

Controlling Search Engine Crawlers for Better Indexation and Rankings – Whiteboard Friday

Posted by randfish

When should you disallow search engines in your robots.txt file, and when should you use meta robots tags in a page header? What about nofollowing links? In today’s Whiteboard Friday, Rand covers these tools and their appropriate use in four situations that SEOs commonly find themselves facing.

For reference, here’s a still of this week’s whiteboard. Click on it to open a high resolution image in a new tab!

Video transcription

Howdy Moz fans, and welcome to another edition of Whiteboard Friday. This week we’re going to talk about controlling search engine crawlers, blocking bots, sending bots where we want, restricting them from where we don’t want them to go. We’re going to talk a little bit about crawl budget and what you should and shouldn’t have indexed.

As a start, what I want to do is discuss the ways in which we can control robots. Those include the three primary ones: robots.txt, meta robots, and—well, the nofollow tag is a little bit less about controlling bots.

There are a few others that we’re going to discuss as well, including Webmaster Tools (Search Console) and URL status codes. But let’s dive into those first few first.

Robots.txt lives at yoursite.com/robots.txt, it tells crawlers what they should and shouldn’t access, it doesn’t always get respected by Google and Bing. So a lot of folks when you say, “hey, disallow this,” and then you suddenly see those URLs popping up and you’re wondering what’s going on, look—Google and Bing oftentimes think that they just know better. They think that maybe you’ve made a mistake, they think “hey, there’s a lot of links pointing to this content, there’s a lot of people who are visiting and caring about this content, maybe you didn’t intend for us to block it.” The more specific you get about an individual URL, the better they usually are about respecting it. The less specific, meaning the more you use wildcards or say “everything behind this entire big directory,” the worse they are about necessarily believing you.

Meta robots—a little different—that lives in the headers of individual pages, so you can only control a single page with a meta robots tag. That tells the engines whether or not they should keep a page in the index, and whether they should follow the links on that page, and it’s usually a lot more respected, because it’s at an individual-page level; Google and Bing tend to believe you about the meta robots tag.

And then the nofollow tag, that lives on an individual link on a page. It doesn’t tell engines where to crawl or not to crawl. All it’s saying is whether you editorially vouch for a page that is being linked to, and whether you want to pass the PageRank and link equity metrics to that page.

Interesting point about meta robots and robots.txt working together (or not working together so well)—many, many folks in the SEO world do this and then get frustrated.

What if, for example, we take a page like “blogtest.html” on our domain and we say “all user agents, you are not allowed to crawl blogtest.html. Okay—that’s a good way to keep that page away from being crawled, but just because something is not crawled doesn’t necessarily mean it won’t be in the search results.

So then we have our SEO folks go, “you know what, let’s make doubly sure that doesn’t show up in search results; we’ll put in the meta robots tag:”

<meta name="robots" content="noindex, follow">

So, “noindex, follow” tells the search engine crawler they can follow the links on the page, but they shouldn’t index this particular one.

Then, you go and run a search for “blog test” in this case, and everybody on the team’s like “What the heck!? WTF? Why am I seeing this page show up in search results?”

The answer is, you told the engines that they couldn’t crawl the page, so they didn’t. But they are still putting it in the results. They’re actually probably not going to include a meta description; they might have something like “we can’t include a meta description because of this site’s robots.txt file.” The reason it’s showing up is because they can’t see the noindex; all they see is the disallow.

So, if you want something truly removed, unable to be seen in search results, you can’t just disallow a crawler. You have to say meta “noindex” and you have to let them crawl it.

So this creates some complications. Robots.txt can be great if we’re trying to save crawl bandwidth, but it isn’t necessarily ideal for preventing a page from being shown in the search results. I would not recommend, by the way, that you do what we think Twitter recently tried to do, where they tried to canonicalize www and non-www by saying “Google, don’t crawl the www version of twitter.com.” What you should be doing is rel canonical-ing or using a 301.

Meta robots—that can allow crawling and link-following while disallowing indexation, which is great, but it requires crawl budget and you can still conserve indexing.

The nofollow tag, generally speaking, is not particularly useful for controlling bots or conserving indexation.

Webmaster Tools (now Google Search Console) has some special things that allow you to restrict access or remove a result from the search results. For example, if you have 404’d something or if you’ve told them not to crawl something but it’s still showing up in there, you can manually say “don’t do that.” There are a few other crawl protocol things that you can do.

And then URL status codes—these are a valid way to do things, but they’re going to obviously change what’s going on on your pages, too.

If you’re not having a lot of luck using a 404 to remove something, you can use a 410 to permanently remove something from the index. Just be aware that once you use a 410, it can take a long time if you want to get that page re-crawled or re-indexed, and you want to tell the search engines “it’s back!” 410 is permanent removal.

301—permanent redirect, we’ve talked about those here—and 302, temporary redirect.

Now let’s jump into a few specific use cases of “what kinds of content should and shouldn’t I allow engines to crawl and index” in this next version…

[Rand moves at superhuman speed to erase the board and draw part two of this Whiteboard Friday. Seriously, we showed Roger how fast it was, and even he was impressed.]

Four crawling/indexing problems to solve

So we’ve got these four big problems that I want to talk about as they relate to crawling and indexing.

1. Content that isn’t ready yet

The first one here is around, “If I have content of quality I’m still trying to improve—it’s not yet ready for primetime, it’s not ready for Google, maybe I have a bunch of products and I only have the descriptions from the manufacturer and I need people to be able to access them, so I’m rewriting the content and creating unique value on those pages… they’re just not ready yet—what should I do with those?”

My options around crawling and indexing? If I have a large quantity of those—maybe thousands, tens of thousands, hundreds of thousands—I would probably go the robots.txt route. I’d disallow those pages from being crawled, and then eventually as I get (folder by folder) those sets of URLs ready, I can then allow crawling and maybe even submit them to Google via an XML sitemap.

If I’m talking about a small quantity—a few dozen, a few hundred pages—well, I’d probably just use the meta robots noindex, and then I’d pull that noindex off of those pages as they are made ready for Google’s consumption. And then again, I would probably use the XML sitemap and start submitting those once they’re ready.

2. Dealing with duplicate or thin content

What about, “Should I noindex, nofollow, or potentially disallow crawling on largely duplicate URLs or thin content?” I’ve got an example. Let’s say I’m an ecommerce shop, I’m selling this nice Star Wars t-shirt which I think is kind of hilarious, so I’ve got starwarsshirt.html, and it links out to a larger version of an image, and that’s an individual HTML page. It links out to different colors, which change the URL of the page, so I have a gray, blue, and black version. Well, these four pages are really all part of this same one, so I wouldn’t recommend disallowing crawling on these, and I wouldn’t recommend noindexing them. What I would do there is a rel canonical.

Remember, rel canonical is one of those things that can be precluded by disallowing. So, if I were to disallow these from being crawled, Google couldn’t see the rel canonical back, so if someone linked to the blue version instead of the default version, now I potentially don’t get link credit for that. So what I really want to do is use the rel canonical, allow the indexing, and allow it to be crawled. If you really feel like it, you could also put a meta “noindex, follow” on these pages, but I don’t really think that’s necessary, and again that might interfere with the rel canonical.

3. Passing link equity without appearing in search results

Number three: “If I want to pass link equity (or at least crawling) through a set of pages without those pages actually appearing in search results—so maybe I have navigational stuff, ways that humans are going to navigate through my pages, but I don’t need those appearing in search results—what should I use then?”

What I would say here is, you can use the meta robots to say “don’t index the page, but do follow the links that are on that page.” That’s a pretty nice, handy use case for that.

Do NOT, however, disallow those in robots.txt—many, many folks make this mistake. What happens if you disallow crawling on those, Google can’t see the noindex. They don’t know that they can follow it. Granted, as we talked about before, sometimes Google doesn’t obey the robots.txt, but you can’t rely on that behavior. Trust that the disallow in robots.txt will prevent them from crawling. So I would say, the meta robots “noindex, follow” is the way to do this.

4. Search results-type pages

Finally, fourth, “What should I do with search results-type pages?” Google has said many times that they don’t like your search results from your own internal engine appearing in their search results, and so this can be a tricky use case.

Sometimes a search result page—a page that lists many types of results that might come from a database of types of content that you’ve got on your site—could actually be a very good result for a searcher who is looking for a wide variety of content, or who wants to see what you have on offer. Yelp does this: When you say, “I’m looking for restaurants in Seattle, WA,” they’ll give you what is essentially a list of search results, and Google does want those to appear because that page provides a great result. But you should be doing what Yelp does there, and make the most common or popular individual sets of those search results into category-style pages. A page that provides real, unique value, that’s not just a list of search results, that is more of a landing page than a search results page.

However, that being said, if you’ve got a long tail of these, or if you’d say “hey, our internal search engine, that’s really for internal visitors only—it’s not useful to have those pages show up in search results, and we don’t think we need to make the effort to make those into category landing pages.” Then you can use the disallow in robots.txt to prevent those.

Just be cautious here, because I have sometimes seen an over-swinging of the pendulum toward blocking all types of search results, and sometimes that can actually hurt your SEO and your traffic. Sometimes those pages can be really useful to people. So check your analytics, and make sure those aren’t valuable pages that should be served up and turned into landing pages. If you’re sure, then go ahead and disallow all your search results-style pages. You’ll see a lot of sites doing this in their robots.txt file.

That being said, I hope you have some great questions about crawling and indexing, controlling robots, blocking robots, allowing robots, and I’ll try and tackle those in the comments below.

We’ll look forward to seeing you again next week for another edition of Whiteboard Friday. Take care!

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!

[ccw-atrib-link]

How to Defeat Duplicate Content – Next Level

Posted by EllieWilkinson

Welcome to the third installment of Next Level! In the previous Next Level blog post, we shared a workflow showing you how to take on your competitors using Moz tools. We’re continuing the educational series with several new videos all about resolving duplicate content. Read on and level up!


Dealing with duplicate content can feel a bit like doing battle with your site’s evil doppelgänger—confusing and tricky to defeat! But identifying and resolving duplicates is a necessary part of helping search engines decide on relevant results. In this short video, learn about how duplicate content happens, why it’s important to fix, and a bit about how you can uncover it.

Next Level – Identifying Duplicate_pt1

[
Quick clarification: Search engines don’t actively penalize duplicate content, per se; they just don’t always understand it as well, which can lead to a drop in rankings. More info here.]

Now that you have a better idea of how to identify those dastardly duplicates, let’s get rid of ’em once and for all. Watch this next video to review how to use Moz Analytics to find and fix duplicate content using three common solutions. (You’ll need a Moz Pro subscription to use Moz Analytics. If you aren’t yet a Moz Pro subscriber, you can always try out the tools with a
30-day free trial.)

Workflow summary

Here’s a review of the three common solutions to conquering duplicate content:

  1. 301 redirect. Check Page Authority to see if one page has a higher PA than the other using Open Site Explorer, then set up a 301 redirect from the duplicate page to the original page. This will ensure that they no longer compete with one another in the search results. Wondering what a 301 redirect is and how to do it? Read more about redirection here.
  2. Rel=canonical. A rel=canonical tag passes the same amount of ranking power as a 301 redirect, and there’s a bonus: it often takes less development time to implement! Add this tag to the HTML head of a web page to tell search engines that it should be treated as a copy of the “canon,” or original, page:
    <head> <link rel="canonical" href="http://moz.com/blog/" /> </head>

    If you’re curious, you can
    read more about canonicalization here.

  3. noindex, follow. Add the values “noindex, follow” to the meta robots tag to tell search engines not to include the duplicate pages in their indexes, but to crawl their links. This works really well with paginated content or if you have a system set up to tag or categorize content (as with a blog). Here’s what it should look like:
    <head> <meta name="robots" content="noindex, follow" /> </head>

    If you’re looking to block the Moz crawler, Rogerbot, you can use the robots.txt file if you prefer—he’s a good robot, and he’ll obey!
    More about meta robots (and robots.txt) here.

Can’t get enough of duplicate content? Want to become a duplicate content connoisseur? This last video explains more about how Moz finds duplicates, if you’re curious. And you can read even more over at the
Moz Developer Blog.

We’d love to hear about your techniques for defeating duplicates! Chime in below in the comments.

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!

[ccw-atrib-link]

Technical Site Audit Checklist: 2015 Edition

Posted by GeoffKenyon

Back in 2011, I wrote a technical site audit checklist, and while it was thorough, there have been a lot of additions to what is encompassed in a site audit. I have gone through and updated that old checklist for 2015. Some of the biggest changes were the addition of sections for mobile, international, and site speed.

This checklist should help you put together a thorough site audit and determine what is holding back the organic performance of your site. At the end of your audit, don’t write a document that says what’s wrong with the website. Instead, create a document that says what needs to be done. Then explain why these actions need to be taken and why they are important. What I’ve found to really helpful is to provide a prioritized list along with your document of all the actions that you would like them to implement. This list can be handed off to a dev or content team to be implemented easily. These teams can refer to your more thorough document as needed.


Quick overview

Check indexed pages  
  • Do a site: search.
  • How many pages are returned? (This can be way off so don’t put too much stock in this).
  • Is the homepage showing up as the first result? 
  • If the homepage isn’t showing up as the first result, there could be issues, like a penalty or poor site architecture/internal linking, affecting the site. This may be less of a concern as Google’s John Mueller recently said that your homepage doesn’t need to be listed first.

Review the number of organic landing pages in Google Analytics

  • Does this match with the number of results in a site: search?
  • This is often the best view of how many pages are in a search engine’s index that search engines find valuable.

Search for the brand and branded terms

  • Is the homepage showing up at the top, or are correct pages showing up?
  • If the proper pages aren’t showing up as the first result, there could be issues, like a penalty, in play.
Check Google’s cache for key pages
  • Is the content showing up?
  • Are navigation links present?
  • Are there links that aren’t visible on the site?
PRO Tip:
Don’t forget to check the text-only version of the cached page. Here is a
bookmarklet to help you do that.

Do a mobile search for your brand and key landing pages

  • Does your listing have the “mobile friendly” label?
  • Are your landing pages mobile friendly?
  • If the answer is no to either of these, it may be costing you organic visits.

On-page optimization

Title tags are optimized
  • Title tags should be optimized and unique.
  • Your brand name should be included in your title tag to improve click-through rates.
  • Title tags are about 55-60 characters (512 pixels) to be fully displayed. You can test here or review title pixel widths in Screaming Frog.
Important pages have click-through rate optimized titles and meta descriptions
  • This will help improve your organic traffic independent of your rankings.
  • You can use SERP Turkey for this.

Check for pages missing page titles and meta descriptions
  
The on-page content includes the primary keyword phrase multiple times as well as variations and alternate keyword phrases
  
There is a significant amount of optimized, unique content on key pages
 
The primary keyword phrase is contained in the H1 tag
  

Images’ file names and alt text are optimized to include the primary keyword phrase associated with the page.
 
URLs are descriptive and optimized
  • While it is beneficial to include your keyword phrase in URLs, changing your URLs can negatively impact traffic when you do a 301. As such, I typically recommend optimizing URLs when the current ones are really bad or when you don’t have to change URLs with existing external links.
Clean URLs
  • No excessive parameters or session IDs.
  • URLs exposed to search engines should be static.
Short URLs
  • 115 characters or shorter – this character limit isn’t set in stone, but shorter URLs are better for usability.

Content

Homepage content is optimized
  • Does the homepage have at least one paragraph?
  • There has to be enough content on the page to give search engines an understanding of what a page is about. Based on my experience, I typically recommend at least 150 words.
Landing pages are optimized
  • Do these pages have at least a few paragraphs of content? Is it enough to give search engines an understanding of what the page is about?
  • Is it template text or is it completely unique?
Site contains real and substantial content
  • Is there real content on the site or is the “content” simply a list of links?
Proper keyword targeting
  • Does the intent behind the keyword match the intent of the landing page?
  • Are there pages targeting head terms, mid-tail, and long-tail keywords?
Keyword cannibalization
  • Do a site: search in Google for important keyword phrases.
  • Check for duplicate content/page titles using the Moz Pro Crawl Test.
Content to help users convert exists and is easily accessible to users
  • In addition to search engine driven content, there should be content to help educate users about the product or service.
Content formatting
  • Is the content formatted well and easy to read quickly?
  • Are H tags used?
  • Are images used?
  • Is the text broken down into easy to read paragraphs?
Good headlines on blog posts
  • Good headlines go a long way. Make sure the headlines are well written and draw users in.
Amount of content versus ads
  • Since the implementation of Panda, the amount of ad-space on a page has become important to evaluate.
  • Make sure there is significant unique content above the fold.
  • If you have more ads than unique content, you are probably going to have a problem.

Duplicate content

There should be one URL for each piece of content
  • Do URLs include parameters or tracking code? This will result in multiple URLs for a piece of content.
  • Does the same content reside on completely different URLs? This is often due to products/content being replicated across different categories.
Pro Tip:
Exclude common parameters, such as those used to designate tracking code, in Google Webmaster Tools. Read more at
Search Engine Land.
Do a search to check for duplicate content
  • Take a content snippet, put it in quotes and search for it.
  • Does the content show up elsewhere on the domain?
  • Has it been scraped? If the content has been scraped, you should file a content removal request with Google.
Sub-domain duplicate content
  • Does the same content exist on different sub-domains?
Check for a secure version of the site
  • Does the content exist on a secure version of the site?
Check other sites owned by the company
  • Is the content replicated on other domains owned by the company?
Check for “print” pages
  • If there are “printer friendly” versions of pages, they may be causing duplicate content.

Accessibility & Indexation

Check the robots.txt

  • Has the entire site, or important content been blocked? Is link equity being orphaned due to pages being blocked via the robots.txt?

Turn off JavaScript, cookies, and CSS

Now change your user agent to Googlebot

PRO Tip:
Use
SEO Browser to do a quick spot check.

Check the SEOmoz PRO Campaign

  • Check for 4xx errors and 5xx errors.

XML sitemaps are listed in the robots.txt file

XML sitemaps are submitted to Google/Bing Webmaster Tools

Check pages for meta robots noindex tag

  • Are pages accidentally being tagged with the meta robots noindex command
  • Are there pages that should have the noindex command applied
  • You can check the site quickly via a crawl tool such as Moz or Screaming Frog

Do goal pages have the noindex command applied?

  • This is important to prevent direct organic visits from showing up as goals in analytics

Site architecture and internal linking

Number of links on a page
Vertical linking structures are in place
  • Homepage links to category pages.
  • Category pages link to sub-category and product pages as appropriate.
  • Product pages link to relevant category pages.
Horizontal linking structures are in place
  • Category pages link to other relevant category pages.
  • Product pages link to other relevant product pages.
Links are in content
  • Does not utilize massive blocks of links stuck in the content to do internal linking.
Footer links
  • Does not use a block of footer links instead of proper navigation.
  • Does not link to landing pages with optimized anchors.
Good internal anchor text
 
Check for broken links
  • Link Checker and Xenu are good tools for this.

Technical issues

Proper use of 301s
  • Are 301s being used for all redirects?
  • If the root is being directed to a landing page, are they using a 301 instead of a 302?
  • Use Live HTTP Headers Firefox plugin to check 301s.
“Bad” redirects are avoided
  • These include 302s, 307s, meta refresh, and JavaScript redirects as they pass little to no value.
  • These redirects can easily be identified with a tool like Screaming Frog.
Redirects point directly to the final URL and do not leverage redirect chains
  • Redirect chains significantly diminish the amount of link equity associated with the final URL.
  • Google has said that they will stop following a redirect chain after several redirects.
Use of JavaScript
  • Is content being served in JavaScript?
  • Are links being served in JavaScript? Is this to do PR sculpting or is it accidental?
Use of iFrames
  • Is content being pulled in via iFrames?
Use of Flash
  • Is the entire site done in Flash, or is Flash used sparingly in a way that doesn’t hinder crawling?
Check for errors in Google Webmaster Tools
  • Google WMT will give you a good list of technical problems that they are encountering on your site (such as: 4xx and 5xx errors, inaccessible pages in the XML sitemap, and soft 404s)
XML Sitemaps  
  • Are XML sitemaps in place?
  • Are XML sitemaps covering for poor site architecture?
  • Are XML sitemaps structured to show indexation problems?
  • Do the sitemaps follow proper XML protocols
Canonical version of the site established through 301s
 
Canonical version of site is specified in Google Webmaster Tools
 
Rel canonical link tag is properly implemented across the site
Uses absolute URLs instead of relative URLs
  • This can cause a lot of problems if you have a root domain with secure sections.

Site speed


Review page load time for key pages 

Make sure compression is enabled


Enable caching


Optimize your images for the web


Minify your CSS/JS/HTML

Use a good, fast host
  • Consider using a CDN for your images.

Optimize your images for the web

Mobile

Review the mobile experience
  • Is there a mobile site set up?
  • If there is, is it a mobile site, responsive design, or dynamic serving?


Make sure analytics are set up if separate mobile content exists


If dynamic serving is being used, make sure the Vary HTTP header is being used

Review how the mobile experience matches up with the intent of mobile visitors
  • Do your mobile visitors have a different intent than desktop based visitors?
Ensure faulty mobile redirects do not exist
  • If your site redirects mobile visitors away from their intended URL (typically to the homepage), you’re likely going to run into issues impacting your mobile organic performance.
Ensure that the relationship between the mobile site and desktop site is established with proper markup
  • If a mobile site (m.) exists, does the desktop equivalent URL point to the mobile version with rel=”alternate”?
  • Does the mobile version canonical to the desktop version?
  • Official documentation.

International

Review international versions indicated in the URL
  • ex: site.com/uk/ or uk.site.com
Enable country based targeting in webmaster tools
  • If the site is targeted to one specific country, is this specified in webmaster tools? 
  • If the site has international sections, are they targeted in webmaster tools?
Implement hreflang / rel alternate if relevant
If there are multiple versions of a site in the same language (such as /us/ and /uk/, both in English), update the copy been updated so that they are both unique
 

Make sure the currency reflects the country targeted
 
Ensure the URL structure is in the native language 
  • Try to avoid having all URLs in the default language

Analytics

Analytics tracking code is on every page
  • You can check this using the “custom” filter in a Screaming Frog Crawl or by looking for self referrals.
  • Are there pages that should be blocked?
There is only one instance of a GA property on a page
  • Having the same Google Analytics property will create problems with pageview-related metrics such as inflating page views and pages per visit and reducing the bounce rate.
  • It is OK to have multiple GA properties listed, this won’t cause a problem.
Analytics is properly tracking and capturing internal searches
 

Demographics tracking is set up

Adwords and Adsense are properly linked if you are using these platforms
Internal IP addresses are excluded
UTM Campaign Parameters are used for other marketing efforts
Meta refresh and JavaScript redirects are avoided
  • These can artificially lower bounce rates.
Event tracking is set up for key user interactions

This audit covers the main technical elements of a site and should help you uncover any issues that are holding a site back. As with any project, the deliverable is critical. I’ve found focusing on the solution and impact (business case) is the best approach for site audit reports. While it is important to outline the problems, too much detail here can take away from the recommendations. If you’re looking for more resources on site audits, I recommend the following:

Helpful tools for doing a site audit:

Annie Cushing’s Site Audit
Web Developer Toolbar
User Agent Add-on
Firebug
Link Checker
SEObook Toolbar
MozBar (Moz’s SEO toolbar)
Xenu
Screaming Frog
Your own scraper
Inflow’s technical mobile best practices

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!

[ccw-atrib-link]

Experiment: We Removed a Major Website from Google Search, for Science!

Posted by Cyrus-Shepard

The folks at Groupon surprised us earlier this summer when they reported the
results of an experiment that showed that up to 60% of direct traffic is organic.

In order to accomplish this, Groupon de-indexed their site, effectively removing themselves from Google search results. That’s crazy talk!

Of course, we knew we had to try this ourselves.

We rolled up our sleeves and chose to de-index
Followerwonk, both for its consistent Google traffic and its good analytics setup—that way we could properly measure everything. We were also confident we could quickly bring the site back into Google’s results, which minimized the business risks.

(We discussed de-indexing our main site moz.com, but… no soup for you!)

We wanted to measure and test several things:

  1. How quickly will Google remove a site from its index?
  2. How much of our organic traffic is actually attributed as direct traffic?
  3. How quickly can you bring a site back into search results using the URL removal tool?

Here’s what happened.

How to completely remove a site from Google

The fastest, simplest, and most direct method to completely remove an entire site from Google search results is by using the
URL removal tool

We also understood, via statements form Google engineers, that using this method gave us the biggest chance of bringing the site back, with little risk. Other methods of de-indexing, such as using meta robots NOINDEX, might have taken weeks and caused recovery to take months.

CAUTION: Removing any URLs from a search index is potentially very dangerous, and should be taken very seriously. Do not try this at home; you will not pass go, and will not collect $200!

CAUTION: Removing any URLs from a search index is potentially very dangerous, and should be taken very seriously. Do not try this at home; you will not pass go, and will not collect $200!

After submitting the request, Followerwonk URLs started
disappearing from Google search results in 2-3 hours

The information needs to propagate across different data centers across the globe, so the effect can be delayed in areas. In fact, for the entire duration of the test, organic Google traffic continued to trickle in and never dropped to zero.

The effect on direct vs. organic traffic

In the Groupon experiment, they found that when they lost organic traffic, they
actually lost a bunch of direct traffic as well. The Groupon conclusion was that a large amount of their direct traffic was actually organic—up to 60% on “long URLs”.

At first glance, the overall amount of direct traffic to Followerwonk didn’t change significantly, even when organic traffic dropped.

In fact, we could find no discrepancy in direct traffic outside the expected range.

I ran this by our contacts at Groupon, who said this wasn’t totally unexpected. You see, in their experiment they saw the biggest drop in direct traffic on
long URLs, defined as a URL that is at least as long enough to be in a subfolder, like https://followerwonk.com/bio/?q=content+marketer.

For Followerwonk, the vast majority of traffic goes to the homepage and a handful of other URLs. This means we didn’t have a statistically significant sample size of long URLs to judge the effect. For the long URLs we were able to measure, the results were nebulous. 

Conclusion: While we can’t confirm the Groupon results with our outcome, we can’t discount them either.

It’s quite likely that a portion of your organic traffic is attributed as direct. This is because of different browsers, operating systems and user privacy settings can potentially block referral information from reaching your website.

Bringing your site back from death

After waiting 2 hours,
we deleted the request. Within a few hours all traffic returned to normal. Whew!

Does Google need to recrawl the pages?

If the time period is short enough, and you used the URL removal tool, apparently not.

In the case of Followerwonk, Google removed over
300,000 URLs from its search results, and made them all reappear in mere hours. This suggests that the domain wasn’t completely removed from Google’s index, but only “masked” from appearing for a short period of time.

What about longer periods of de-indexation?

In both the Groupon and Followerwonk experiments, the sites were only de-indexed for a short period of time, and bounced back quickly.

We wanted to find out what would happen if you de-indexed a site for a longer period, like
two and a half days?

I couldn’t convince the team to remove any of our sites from Google search results for a few days, so I choose a smaller personal site that I often subject to merciless SEO experiments.

In this case, I de-indexed the site and didn’t remove the request until three days later. Even with this longer period, all URLs returned within just
a few hours of cancelling the URL removal request.

In the chart below, we revoked the URL removal request on Friday the 25th. The next two days were Saturday and Sunday, both lower traffic days.

Test #2: De-index a personal site for 3 days

Likely, the URLs were still in Google’s index, so we didn’t have to wait for them to be recrawled. 

Here’s another shot of organic traffic before and after the second experiment.

For longer removal periods, a few weeks for example, I speculate Google might drop these semi-permanently from the index and re-inclusion would comprise a much longer time period.

What we learned

  1. While a portion of your organic traffic may be attributed as direct (due to browsers, privacy settings, etc) in our case the effect on direct traffic was negligible.
  2. If you accidentally de-index your site using Google Webmaster Tools, in most cases you can quickly bring it back to life by deleting the request.
  3. Reinclusion happens quickly even after we removed a site for over 2 days. Longer than this, the result is unknown, and you could have problems getting all the pages of your site indexed again.

Further reading

Moz community member Adina Toma wrote an excellent YouMoz post on the re-inclusion process using the same technique, with some excellent tips for other, more extreme situations.

Big thanks to
Peter Bray for volunteering Followerwonk for testing. You are a brave man!

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!

[ccw-atrib-link]