Faceted Navigation Intro – Whiteboard Friday

Posted by sergeystefoglo

The topic of faceted navigation is bound to come up at some point in your SEO career. It’s a common solution to product filtering for e-commerce sites, but managing it on the SEO side can quickly spin out of control with the potential to cause indexing bloat and crawl errors. In this week’s Whiteboard Friday, we welcome our friend Sergey Stefoglo to give us a quick refresher on just what faceted nav is and why it matters, then dive into a few key solutions that can help you tame it.

Click on the whiteboard image above to open a high-resolution version in a new tab!

Video Transcription

Hey, Moz fans. My name is Serge. I’m from Distilled. I work at the Seattle office as a consultant. For those of you that don’t know about Distilled, we’re a full-service digital marketing agency specializing in SEO, but have branched out since to work on all sorts of things like content, PR, and recently a split testing tool, ODN.

Today I’m here to talk to you guys about faceted navigation, just the basics. We have a few minutes today, so I’m just going to cover kind of the 101 version of this. But essentially we’re going to go through what the definition is, why we should care as SEOs, why it’s important, what are some options we have with this, and then also what a solution could look like.

1. What is faceted navigation?

For those that don’t know, faceted navigation is essentially something like this, probably a lot nicer than this to be honest. But it’s essentially a page that allows you to filter down or allows a user to filter down based on what they’re looking for. So this is an example we have here of a list of products on a page that sells laptops, Apple laptops in this case.

Right here on the left side, in the green, we have a bunch of facets. Essentially, if you’re a user and you’re going in here, you could look at the size of the screen you might want. You could look at the price of the laptop, etc. That’s what faceted navigation is. Previously, when I worked at my previous agency, I worked on a lot of local SEO things, not really e-commerce, big-scale websites, so I didn’t run into this issue often. I actually didn’t even know it was a thing until I started at Distilled. So this might be interesting for you even if it doesn’t apply at the moment.

2. Why does faceted navigation matter?

Essentially, we should care as SEOs because this can get out of control really quickly. While being very useful to users, obviously it’s helpful to be able to filter down to the specific thing you want. this could get kind of ridiculous for Googlebot.

Faceted navigation can result in indexing bloat and crawl issues

We’ve had clients at Distilled that come to us that are e-commerce brands that have millions of pages in the index being crawled that really shouldn’t be. They don’t bring any value to the site, any revenue, etc. The main reason we should care is because we want to avoid indexation bloat and kind of crawl errors or issues.

3. What options do we have when it comes to controlling which pages are indexed/crawled?

The third thing we’ll talk about is what are some options we have in terms of controlling some of that, so controlling whether a page gets indexed or crawled, etc. I’m not going to get into the specifics of each of these today, but I have a blog post on this topic that we’ll link to at the bottom.

The main, most common options that we have for controlling this kind of thing would be around no indexing a page and stopping Google from indexing it, using canonical tags to choose a page that’s essentially the canonical version, using a disallow rule in robots.txt to stop Google from crawling a certain part of the site, or using the nofollow meta directive as well. Those are some of the most common options. Again, we’re not going to go into the nitty-gritty of each one. They each have their kind of pros and cons, so you can research that for yourselves.

4. What could a solution look like?

So okay, we know all of this. What could be an ideal solution? Before I jump into this, I don’t want you guys to run in to your bosses and say, “This is what we need to do.”

Please, please do your research beforehand because it’s going to vary a lot based on your site. Based on the dev resources you have, you might have to get scrappy with it. Also, do some keyword research mainly around the long tail. There are a lot of instances where you could and might want to have three or four facets indexed.

So again, a huge caveat: this isn’t the end-all be-all solution. It’s something that we’ve recommended at times, when appropriate, to clients. So let’s jump into what an ideal solution, or not ideal solution, a possible solution could look like.

Category, subcategory, and sub-subcategory pages open to indexing and crawling

What we’re looking at here is we’re going to have our category, subcategory, and sub-subcategory pages open to indexation and open to being crawled. In our example here, that would be this page, so /computers/laptops/apple. Perfectly fine. People are probably searching for Apple laptops. In fact, I know they are.

Any pages with one or more facets selected = indexed, facet links get nofollowed

The second step here is any page that has one facet selected, so for example, if I was on this page and I wanted an Apple laptop with a solid state drive in it, I would select that from these options. Those are fine to be indexed. But any time you have one or more facets selected, we want to make sure to nofollow all of these internal links pointing to other facets, essentially to stop link equity from being wasted and to stop Google from wasting time crawling those pages.

Any pages with 2+ facets selected = noindex tag gets added

Then, past that point, if a user selects two or more facets, so if I was interested in an Apple laptop with a solid state hard drive that was in the $1,000 price range for example, the chances of there being a lot of search volume for an Apple laptop for $1,000 with a solid state drive is pretty low.

So what we want to do here is add a noindex tag to those two-plus facet options, and that will again help us control crawl bloat and indexation bloat.

Already set up faceted nav? Think about keyword search volume, then go back and whitelist

The final thing I want to mention here, I touched on it a little bit earlier. But essentially, if you’re doing this after the fact, after the faceted navigation is already set up, which you probably are, it’s worth, again, having a strong think about where there is keyword search volume. If you do this, it’s worth also taking a look back a few months in to see the impact and also see if there’s anything you might want to whitelist. There might be a certain set of facets that do have search volume, so you might want to throw them back into the index. It’s worth taking a look at that.

That’s what faceted navigation is as a quick intro. Thank you for watching. I’d be really interested to hear what you guys think in the comments. Again, like I said, there isn’t a one-size-fits-all solution. So I’d be really interested to hear what’s worked for you, or if you have any questions, please ask them below.

Thank you.

Video transcription by Speechpad.com

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!

Reblogged 2 weeks ago from tracking.feedpress.it

Eliminate Duplicate Content in Faceted Navigation with Ajax/JSON/JQuery

Posted by EricEnge

One of the classic problems in SEO is that while complex navigation schemes may be useful to users, they create problems for search engines. Many publishers rely on tags such as rel=canonical, or the parameters settings in Webmaster Tools to try and solve these types of issues. However, each of the potential solutions has limitations. In today’s post, I am going to outline how you can use JavaScript solutions to more completely eliminate the problem altogether.

Note that I am not going to provide code examples in this post, but I am going to outline how it works on a conceptual level. If you are interested in learning more about Ajax/JSON/jQuery here are some resources you can check out:

  1. Ajax Tutorial
  2. Learning Ajax/jQuery

Defining the problem with faceted navigation

Having a page of products and then allowing users to sort those products the way they want (sorted from highest to lowest price), or to use a filter to pick a subset of the products (only those over $60) makes good sense for users. We typically refer to these types of navigation options as “faceted navigation.”

However, faceted navigation can cause problems for search engines because they don’t want to crawl and index all of your different sort orders or all your different filtered versions of your pages. They would end up with many different variants of your pages that are not significantly different from a search engine user experience perspective.

Solutions such as rel=canonical tags and parameters settings in Webmaster Tools have some limitations. For example, rel=canonical tags are considered “hints” by the search engines, and they may not choose to accept them, and even if they are accepted, they do not necessarily keep the search engines from continuing to crawl those pages.

A better solution might be to use JSON and jQuery to implement your faceted navigation so that a new page is not created when a user picks a filter or a sort order. Let’s take a look at how it works.

Using JSON and jQuery to filter on the client side

The main benefit of the implementation discussed below is that a new URL is not created when a user is on a page of yours and applies a filter or sort order. When you use JSON and jQuery, the entire process happens on the client device without involving your web server at all.

When a user initially requests one of the product pages on your web site, the interaction looks like this:

using json on faceted navigation

This transfers the page to the browser the user used to request the page. Now when a user picks a sort order (or filter) on that page, here is what happens:

jquery and faceted navigation diagram

When the user picks one of those options, a jQuery request is made to the JSON data object. Translation: the entire interaction happens within the client’s browser and the sort or filter is applied there. Simply put, the smarts to handle that sort or filter resides entirely within the code on the client device that was transferred with the initial request for the page.

As a result, there is no new page created and no new URL for Google or Bing to crawl. Any concerns about crawl budget or inefficient use of PageRank are completely eliminated. This is great stuff! However, there remain limitations in this implementation.

Specifically, if your list of products spans multiple pages on your site, the sorting and filtering will only be applied to the data set already transferred to the user’s browser with the initial request. In short, you may only be sorting the first page of products, and not across the entire set of products. It’s possible to have the initial JSON data object contain the full set of pages, but this may not be a good idea if the page size ends up being large. In that event, we will need to do a bit more.

What Ajax does for you

Now we are going to dig in slightly deeper and outline how Ajax will allow us to handle sorting, filtering, AND pagination. Warning: There is some tech talk in this section, but I will try to follow each technical explanation with a layman’s explanation about what’s happening.

The conceptual Ajax implementation looks like this:

ajax and faceted navigation diagram

In this structure, we are using an Ajax layer to manage the communications with the web server. Imagine that we have a set of 10 pages, the user has gotten the first page of those 10 on their device and then requests a change to the sort order. The Ajax requests a fresh set of data from the web server for your site, similar to a normal HTML transaction, except that it runs asynchronously in a separate thread.

If you don’t know what that means, the benefit is that the rest of the page can load completely while the process to capture the data that the Ajax will display is running in parallel. This will be things like your main menu, your footer links to related products, and other page elements. This can improve the perceived performance of the page.

When a user selects a different sort order, the code registers an event handler for a given object (e.g. HTML Element or other DOM objects) and then executes an action. The browser will perform the action in a different thread to trigger the event in the main thread when appropriate. This happens without needing to execute a full page refresh, only the content controlled by the Ajax refreshes.

To translate this for the non-technical reader, it just means that we can update the sort order of the page, without needing to redraw the entire page, or change the URL, even in the case of a paginated sequence of pages. This is a benefit because it can be faster than reloading the entire page, and it should make it clear to search engines that you are not trying to get some new page into their index.

Effectively, it does this within the existing Document Object Model (DOM), which you can think of as the basic structure of the documents and a spec for the way the document is accessed and manipulated.

How will Google handle this type of implementation?

For those of you who read Adam Audette’s excellent recent post on the tests his team performed on how Google reads Javascript, you may be wondering if Google will still load all these page variants on the same URL anyway, and if they will not like it.

I had the same question, so I reached out to Google’s Gary Illyes to get an answer. Here is the dialog that transpired:

Eric Enge: I’d like to ask you about using JSON and jQuery to render different sort orders and filters within the same URL. I.e. the user selects a sort order or a filter, and the content is reordered and redrawn on the page on the client site. Hence no new URL would be created. It’s effectively a way of canonicalizing the content, since each variant is a strict subset.

Then there is a second level consideration with this approach, which involves doing the same thing with pagination. I.e. you have 10 pages of products, and users still have sorting and filtering options. In order to support sorting and filtering across the entire 10 page set, you use an Ajax solution, so all of that still renders on one URL.

So, if you are on page 1, and a user executes a sort, they get that all back in that one page. However, to do this right, going to page 2 would also render on the same URL. Effectively, you are taking the 10 page set and rendering it all within one URL. This allows sorting, filtering, and pagination without needing to use canonical, noindex, prev/next, or robots.txt.

If this was not problematic for Google, the only downside is that it makes the pagination not visible to Google. Does that make sense, or is it a bad idea?

Gary Illyes
: If you have one URL only, and people have to click on stuff to see different sort orders or filters for the exact same content under that URL, then typically we would only see the default content.

If you don’t have pagination information, that’s not a problem, except we might not see the content on the other pages that are not contained in the HTML within the initial page load. The meaning of rel-prev/next is to funnel the signals from child pages (page 2, 3, 4, etc.) to the group of pages as a collection, or to the view-all page if you have one. If you simply choose to render those paginated versions on a single URL, that will have the same impact from a signals point of view, meaning that all signals will go to a single entity, rather than distributed to several URLs.


Keep in mind, the reason why Google implemented tags like rel=canonical, NoIndex, rel=prev/next, and others is to reduce their crawling burden and overall page bloat and to help focus signals to incoming pages in the best way possible. The use of Ajax/JSON/jQuery as outlined above does this simply and elegantly.

On most e-commerce sites, there are many different “facets” of how a user might want to sort and filter a list of products. With the Ajax-style implementation, this can be done without creating new pages. The end users get the control they are looking for, the search engines don’t have to deal with excess pages they don’t want to see, and signals in to the site (such as links) are focused on the main pages where they should be.

The one downside is that Google may not see all the content when it is paginated. A site that has lots of very similar products in a paginated list does not have to worry too much about Google seeing all the additional content, so this isn’t much of a concern if your incremental pages contain more of what’s on the first page. Sites that have content that is materially different on the additional pages, however, might not want to use this approach.

These solutions do require Javascript coding expertise but are not really that complex. If you have the ability to consider a path like this, you can free yourself from trying to understand the various tags, their limitations, and whether or not they truly accomplish what you are looking for.

Credit: Thanks for Clark Lefavour for providing a review of the above for technical correctness.

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!

Reblogged 3 years ago from tracking.feedpress.it

​The 3 Most Common SEO Problems on Listings Sites

Posted by Dom-Woodman

Listings sites have a very specific set of search problems that you don’t run into everywhere else. In the day I’m one of Distilled’s analysts, but by night I run a job listings site, teflSearch. So, for my first Moz Blog post I thought I’d cover the three search problems with listings sites that I spent far too long agonising about.

Quick clarification time: What is a listings site (i.e. will this post be useful for you)?

The classic listings site is Craigslist, but plenty of other sites act like listing sites:

  • Job sites like Monster
  • E-commerce sites like Amazon
  • Matching sites like Spareroom

1. Generating quality landing pages

The landing pages on listings sites are incredibly important. These pages are usually the primary drivers of converting traffic, and they’re usually generated automatically (or are occasionally custom category pages) .

For example, if I search “Jobs in Manchester“, you can see nearly every result is an automatically generated landing page or category page.

There are three common ways to generate these pages (occasionally a combination of more than one is used):

  • Faceted pages: These are generated by facets—groups of preset filters that let you filter the current search results. They usually sit on the left-hand side of the page.
  • Category pages: These pages are listings which have already had a filter applied and can’t be changed. They’re usually custom pages.
  • Free-text search pages: These pages are generated by a free-text search box.

Those definitions are still bit general; let’s clear them up with some examples:

Amazon uses a combination of categories and facets. If you click on browse by department you can see all the category pages. Then on each category page you can see a faceted search. Amazon is so large that it needs both.

Indeed generates its landing pages through free text search, for example if we search for “IT jobs in manchester” it will generate: IT jobs in manchester.

teflSearch generates landing pages using just facets. The jobs in China landing page is simply a facet of the main search page.

Each method has its own search problems when used for generating landing pages, so lets tackle them one by one.


Facets and free text search will typically generate pages with parameters e.g. a search for “dogs” would produce:


But to make the URL user friendly sites will often alter the URLs to display them as folders


These are still just ordinary free text search and facets, the URLs are just user friendly. (They’re a lot easier to work with in robots.txt too!)

Free search (& category) problems

If you’ve decided the base of your search will be a free text search, then we’ll have two major goals:

  • Goal 1: Helping search engines find your landing pages
  • Goal 2: Giving them link equity.


Search engines won’t use search boxes and so the solution to both problems is to provide links to the valuable landing pages so search engines can find them.

There are plenty of ways to do this, but two of the most common are:

  • Category links alongside a search

    Photobucket uses a free text search to generate pages, but if we look at example search for photos of dogs, we can see the categories which define the landing pages along the right-hand side. (This is also an example of URL friendly searches!)

  • Putting the main landing pages in a top-level menu

    Indeed also uses free text to generate landing pages, and they have a browse jobs section which contains the URL structure to allow search engines to find all the valuable landing pages.

Breadcrumbs are also often used in addition to the two above and in both the examples above, you’ll find breadcrumbs that reinforce that hierarchy.

Category (& facet) problems

Categories, because they tend to be custom pages, don’t actually have many search disadvantages. Instead it’s the other attributes that make them more or less desirable. You can create them for the purposes you want and so you typically won’t have too many problems.

However, if you also use a faceted search in each category (like Amazon) to generate additional landing pages, then you’ll run into all the problems described in the next section.

At first facets seem great, an easy way to generate multiple strong relevant landing pages without doing much at all. The problems appear because people don’t put limits on facets.

Lets take the job page on teflSearch. We can see it has 18 facets each with many options. Some of these options will generate useful landing pages:

The China facet in countries will generate “Jobs in China” that’s a useful landing page.

On the other hand, the “Conditional Bonus” facet will generate “Jobs with a conditional bonus,” and that’s not so great.

We can also see that the options within a single facet aren’t always useful. As of writing, I have a single job available in Serbia. That’s not a useful search result, and the poor user engagement combined with the tiny amount of content will be a strong signal to Google that it’s thin content. Depending on the scale of your site it’s very easy to generate a mass of poor-quality landing pages.

Facets generate other problems too. The primary one being they can create a huge amount of duplicate content and pages for search engines to get lost in. This is caused by two things: The first is the sheer number of possibilities they generate, and the second is because selecting facets in different orders creates identical pages with different URLs.

We end up with four goals for our facet-generated landing pages:

  • Goal 1: Make sure our searchable landing pages are actually worth landing on, and that we’re not handing a mass of low-value pages to the search engines.
  • Goal 2: Make sure we don’t generate multiple copies of our automatically generated landing pages.
  • Goal 3: Make sure search engines don’t get caught in the metaphorical plastic six-pack rings of our facets.
  • Goal 4: Make sure our landing pages have strong internal linking.

The first goal needs to be set internally; you’re always going to be the best judge of the number of results that need to present on a page in order for it to be useful to a user. I’d argue you can rarely ever go below three, but it depends both on your business and on how much content fluctuates on your site, as the useful landing pages might also change over time.

We can solve the next three problems as group. There are several possible solutions depending on what skills and resources you have access to; here are two possible solutions:

Category/facet solution 1: Blocking the majority of facets and providing external links
  • Easiest method
  • Good if your valuable category pages rarely change and you don’t have too many of them.
  • Can be problematic if your valuable facet pages change a lot

Nofollow all your facet links, and noindex and block category pages which aren’t valuable or are deeper than x facet/folder levels into your search using robots.txt.

You set x by looking at where your useful facet pages exist that have search volume. So, for example, if you have three facets for televisions: manufacturer, size, and resolution, and even combinations of all three have multiple results and search volume, then you could set you index everything up to three levels.

On the other hand, if people are searching for three levels (e.g. “Samsung 42″ Full HD TV”) but you only have one or two results for three-level facets, then you’d be better off indexing two levels and letting the product pages themselves pick up long-tail traffic for the third level.

If you have valuable facet pages that exist deeper than 1 facet or folder into your search, then this creates some duplicate content problems dealt with in the aside “Indexing more than 1 level of facets” below.)

The immediate problem with this set-up, however, is that in one stroke we’ve removed most of the internal links to our category pages, and by no-following all the facet links, search engines won’t be able to find your valuable category pages.

In order re-create the linking, you can add a top level drop down menu to your site containing the most valuable category pages, add category links elsewhere on the page, or create a separate part of the site with links to the valuable category pages.

The top level drop down menu you can see on teflSearch (it’s the search jobs menu), the other two examples are demonstrated in Photobucket and Indeed respectively in the previous section.

The big advantage for this method is how quick it is to implement, it doesn’t require any fiddly internal logic and adding an extra menu option is usually minimal effort.

Category/facet solution 2: Creating internal logic to work with the facets

  • Requires new internal logic
  • Works for large numbers of category pages with value that can change rapidly

There are four parts to the second solution:

  1. Select valuable facet categories and allow those links to be followed. No-follow the rest.
  2. No-index all pages that return a number of items below the threshold for a useful landing page
  3. No-follow all facets on pages with a search depth greater than x.
  4. Block all facet pages deeper than x level in robots.txt

As with the last solution, x is set by looking at where your useful facet pages exist that have search volume (full explanation in the first solution), and if you’re indexing more than one level you’ll need to check out the aside below to see how to deal with the duplicate content it generates.

Aside: Indexing more than one level of facets

If you want more than one level of facets to be indexable, then this will create certain problems.

Suppose you have a facet for size:

  • Televisions: Size: 46″, 44″, 42″

And want to add a brand facet:

  • Televisions: Brand: Samsung, Panasonic, Sony

This will create duplicate content because the search engines will be able to follow your facets in both orders, generating:

  • Television – 46″ – Samsung
  • Television – Samsung – 46″

You’ll have to either rel canonical your duplicate pages with another rule or set up your facets so they create a single unique URL.

You also need to be aware that each followable facet you add will multiply with each other followable facet and it’s very easy to generate a mass of pages for search engines to get stuck in. Depending on your setup you might need to block more paths in robots.txt or set-up more logic to prevent them being followed.

Letting search engines index more than one level of facets adds a lot of possible problems; make sure you’re keeping track of them.

2. User-generated content cannibalization

This is a common problem for listings sites (assuming they allow user generated content). If you’re reading this as an e-commerce site who only lists their own products, you can skip this one.

As we covered in the first area, category pages on listings sites are usually the landing pages aiming for the valuable search terms, but as your users start generating pages they can often create titles and content that cannibalise your landing pages.

Suppose you’re a job site with a category page for PHP Jobs in Greater Manchester. If a recruiter then creates a job advert for PHP Jobs in Greater Manchester for the 4 positions they currently have, you’ve got a duplicate content problem.

This is less of a problem when your site is large and your categories mature, it will be obvious to any search engine which are your high value category pages, but at the start where you’re lacking authority and individual listings might contain more relevant content than your own search pages this can be a problem.

Solution 1: Create structured titles

Set the <title> differently than the on-page title. Depending on variables you have available to you can set the title tag programmatically without changing the page title using other information given by the user.

For example, on our imaginary job site, suppose the recruiter also provided the following information in other fields:

  • The no. of positions: 4
  • The primary area: PHP Developer
  • The name of the recruiting company: ABC Recruitment
  • Location: Manchester

We could set the <title> pattern to be: *No of positions* *The primary area* with *recruiter name* in *Location* which would give us:

4 PHP Developers with ABC Recruitment in Manchester

Setting a <title> tag allows you to target long-tail traffic by constructing detailed descriptive titles. In our above example, imagine the recruiter had specified “Castlefield, Manchester” as the location.

All of a sudden, you’ve got a perfect opportunity to pick up long-tail traffic for people searching in Castlefield in Manchester.

On the downside, you lose the ability to pick up long-tail traffic where your users have chosen keywords you wouldn’t have used.

For example, suppose Manchester has a jobs program called “Green Highway.” A job advert title containing “Green Highway” might pick up valuable long-tail traffic. Being able to discover this, however, and find a way to fit it into a dynamic title is very hard.

Solution 2: Use regex to noindex the offending pages

Perform a regex (or string contains) search on your listings titles and no-index the ones which cannabalise your main category pages.

If it’s not possible to construct titles with variables or your users provide a lot of additional long-tail traffic with their own titles, then is a great option. On the downside, you miss out on possible structured long-tail traffic that you might’ve been able to aim for.

Solution 3: De-index all your listings

It may seem rash, but if you’re a large site with a huge number of very similar or low-content listings, you might want to consider this, but there is no common standard. Some sites like Indeed choose to no-index all their job adverts, whereas some other sites like Craigslist index all their individual listings because they’ll drive long tail traffic.

Don’t de-index them all lightly!

3. Constantly expiring content

Our third and final problem is that user-generated content doesn’t last forever. Particularly on listings sites, it’s constantly expiring and changing.

For most use cases I’d recommend 301’ing expired content to a relevant category page, with a message triggered by the redirect notifying the user of why they’ve been redirected. It typically comes out as the best combination of search and UX.

For more information or advice on how to deal with the edge cases, there’s a previous Moz blog post on how to deal with expired content which I think does an excellent job of covering this area.


In summary, if you’re working with listings sites, all three of the following need to be kept in mind:

  • How are the landing pages generated? If they’re generated using free text or facets have the potential problems been solved?
  • Is user generated content cannibalising the main landing pages?
  • How has constantly expiring content been dealt with?

Good luck listing, and if you’ve had any other tricky problems or solutions you’ve come across working on listings sites lets chat about them in the comments below!

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!

Reblogged 3 years ago from tracking.feedpress.it

Technical Site Audit Checklist: 2015 Edition

Posted by GeoffKenyon

Back in 2011, I wrote a technical site audit checklist, and while it was thorough, there have been a lot of additions to what is encompassed in a site audit. I have gone through and updated that old checklist for 2015. Some of the biggest changes were the addition of sections for mobile, international, and site speed.

This checklist should help you put together a thorough site audit and determine what is holding back the organic performance of your site. At the end of your audit, don’t write a document that says what’s wrong with the website. Instead, create a document that says what needs to be done. Then explain why these actions need to be taken and why they are important. What I’ve found to really helpful is to provide a prioritized list along with your document of all the actions that you would like them to implement. This list can be handed off to a dev or content team to be implemented easily. These teams can refer to your more thorough document as needed.

Quick overview

Check indexed pages  
  • Do a site: search.
  • How many pages are returned? (This can be way off so don’t put too much stock in this).
  • Is the homepage showing up as the first result? 
  • If the homepage isn’t showing up as the first result, there could be issues, like a penalty or poor site architecture/internal linking, affecting the site. This may be less of a concern as Google’s John Mueller recently said that your homepage doesn’t need to be listed first.

Review the number of organic landing pages in Google Analytics

  • Does this match with the number of results in a site: search?
  • This is often the best view of how many pages are in a search engine’s index that search engines find valuable.

Search for the brand and branded terms

  • Is the homepage showing up at the top, or are correct pages showing up?
  • If the proper pages aren’t showing up as the first result, there could be issues, like a penalty, in play.
Check Google’s cache for key pages
  • Is the content showing up?
  • Are navigation links present?
  • Are there links that aren’t visible on the site?
PRO Tip:
Don’t forget to check the text-only version of the cached page. Here is a
bookmarklet to help you do that.

Do a mobile search for your brand and key landing pages

  • Does your listing have the “mobile friendly” label?
  • Are your landing pages mobile friendly?
  • If the answer is no to either of these, it may be costing you organic visits.

On-page optimization

Title tags are optimized
  • Title tags should be optimized and unique.
  • Your brand name should be included in your title tag to improve click-through rates.
  • Title tags are about 55-60 characters (512 pixels) to be fully displayed. You can test here or review title pixel widths in Screaming Frog.
Important pages have click-through rate optimized titles and meta descriptions
  • This will help improve your organic traffic independent of your rankings.
  • You can use SERP Turkey for this.

Check for pages missing page titles and meta descriptions
The on-page content includes the primary keyword phrase multiple times as well as variations and alternate keyword phrases
There is a significant amount of optimized, unique content on key pages
The primary keyword phrase is contained in the H1 tag

Images’ file names and alt text are optimized to include the primary keyword phrase associated with the page.
URLs are descriptive and optimized
  • While it is beneficial to include your keyword phrase in URLs, changing your URLs can negatively impact traffic when you do a 301. As such, I typically recommend optimizing URLs when the current ones are really bad or when you don’t have to change URLs with existing external links.
Clean URLs
  • No excessive parameters or session IDs.
  • URLs exposed to search engines should be static.
Short URLs
  • 115 characters or shorter – this character limit isn’t set in stone, but shorter URLs are better for usability.


Homepage content is optimized
  • Does the homepage have at least one paragraph?
  • There has to be enough content on the page to give search engines an understanding of what a page is about. Based on my experience, I typically recommend at least 150 words.
Landing pages are optimized
  • Do these pages have at least a few paragraphs of content? Is it enough to give search engines an understanding of what the page is about?
  • Is it template text or is it completely unique?
Site contains real and substantial content
  • Is there real content on the site or is the “content” simply a list of links?
Proper keyword targeting
  • Does the intent behind the keyword match the intent of the landing page?
  • Are there pages targeting head terms, mid-tail, and long-tail keywords?
Keyword cannibalization
  • Do a site: search in Google for important keyword phrases.
  • Check for duplicate content/page titles using the Moz Pro Crawl Test.
Content to help users convert exists and is easily accessible to users
  • In addition to search engine driven content, there should be content to help educate users about the product or service.
Content formatting
  • Is the content formatted well and easy to read quickly?
  • Are H tags used?
  • Are images used?
  • Is the text broken down into easy to read paragraphs?
Good headlines on blog posts
  • Good headlines go a long way. Make sure the headlines are well written and draw users in.
Amount of content versus ads
  • Since the implementation of Panda, the amount of ad-space on a page has become important to evaluate.
  • Make sure there is significant unique content above the fold.
  • If you have more ads than unique content, you are probably going to have a problem.

Duplicate content

There should be one URL for each piece of content
  • Do URLs include parameters or tracking code? This will result in multiple URLs for a piece of content.
  • Does the same content reside on completely different URLs? This is often due to products/content being replicated across different categories.
Pro Tip:
Exclude common parameters, such as those used to designate tracking code, in Google Webmaster Tools. Read more at
Search Engine Land.
Do a search to check for duplicate content
  • Take a content snippet, put it in quotes and search for it.
  • Does the content show up elsewhere on the domain?
  • Has it been scraped? If the content has been scraped, you should file a content removal request with Google.
Sub-domain duplicate content
  • Does the same content exist on different sub-domains?
Check for a secure version of the site
  • Does the content exist on a secure version of the site?
Check other sites owned by the company
  • Is the content replicated on other domains owned by the company?
Check for “print” pages
  • If there are “printer friendly” versions of pages, they may be causing duplicate content.

Accessibility & Indexation

Check the robots.txt

  • Has the entire site, or important content been blocked? Is link equity being orphaned due to pages being blocked via the robots.txt?

Turn off JavaScript, cookies, and CSS

Now change your user agent to Googlebot

PRO Tip:
SEO Browser to do a quick spot check.

Check the SEOmoz PRO Campaign

  • Check for 4xx errors and 5xx errors.

XML sitemaps are listed in the robots.txt file

XML sitemaps are submitted to Google/Bing Webmaster Tools

Check pages for meta robots noindex tag

  • Are pages accidentally being tagged with the meta robots noindex command
  • Are there pages that should have the noindex command applied
  • You can check the site quickly via a crawl tool such as Moz or Screaming Frog

Do goal pages have the noindex command applied?

  • This is important to prevent direct organic visits from showing up as goals in analytics

Site architecture and internal linking

Number of links on a page
Vertical linking structures are in place
  • Homepage links to category pages.
  • Category pages link to sub-category and product pages as appropriate.
  • Product pages link to relevant category pages.
Horizontal linking structures are in place
  • Category pages link to other relevant category pages.
  • Product pages link to other relevant product pages.
Links are in content
  • Does not utilize massive blocks of links stuck in the content to do internal linking.
Footer links
  • Does not use a block of footer links instead of proper navigation.
  • Does not link to landing pages with optimized anchors.
Good internal anchor text
Check for broken links
  • Link Checker and Xenu are good tools for this.

Technical issues

Proper use of 301s
  • Are 301s being used for all redirects?
  • If the root is being directed to a landing page, are they using a 301 instead of a 302?
  • Use Live HTTP Headers Firefox plugin to check 301s.
“Bad” redirects are avoided
  • These include 302s, 307s, meta refresh, and JavaScript redirects as they pass little to no value.
  • These redirects can easily be identified with a tool like Screaming Frog.
Redirects point directly to the final URL and do not leverage redirect chains
  • Redirect chains significantly diminish the amount of link equity associated with the final URL.
  • Google has said that they will stop following a redirect chain after several redirects.
Use of JavaScript
  • Is content being served in JavaScript?
  • Are links being served in JavaScript? Is this to do PR sculpting or is it accidental?
Use of iFrames
  • Is content being pulled in via iFrames?
Use of Flash
  • Is the entire site done in Flash, or is Flash used sparingly in a way that doesn’t hinder crawling?
Check for errors in Google Webmaster Tools
  • Google WMT will give you a good list of technical problems that they are encountering on your site (such as: 4xx and 5xx errors, inaccessible pages in the XML sitemap, and soft 404s)
XML Sitemaps  
  • Are XML sitemaps in place?
  • Are XML sitemaps covering for poor site architecture?
  • Are XML sitemaps structured to show indexation problems?
  • Do the sitemaps follow proper XML protocols
Canonical version of the site established through 301s
Canonical version of site is specified in Google Webmaster Tools
Rel canonical link tag is properly implemented across the site
Uses absolute URLs instead of relative URLs
  • This can cause a lot of problems if you have a root domain with secure sections.

Site speed

Review page load time for key pages 

Make sure compression is enabled

Enable caching

Optimize your images for the web

Minify your CSS/JS/HTML

Use a good, fast host
  • Consider using a CDN for your images.

Optimize your images for the web


Review the mobile experience
  • Is there a mobile site set up?
  • If there is, is it a mobile site, responsive design, or dynamic serving?

Make sure analytics are set up if separate mobile content exists

If dynamic serving is being used, make sure the Vary HTTP header is being used

Review how the mobile experience matches up with the intent of mobile visitors
  • Do your mobile visitors have a different intent than desktop based visitors?
Ensure faulty mobile redirects do not exist
  • If your site redirects mobile visitors away from their intended URL (typically to the homepage), you’re likely going to run into issues impacting your mobile organic performance.
Ensure that the relationship between the mobile site and desktop site is established with proper markup
  • If a mobile site (m.) exists, does the desktop equivalent URL point to the mobile version with rel=”alternate”?
  • Does the mobile version canonical to the desktop version?
  • Official documentation.


Review international versions indicated in the URL
  • ex: site.com/uk/ or uk.site.com
Enable country based targeting in webmaster tools
  • If the site is targeted to one specific country, is this specified in webmaster tools? 
  • If the site has international sections, are they targeted in webmaster tools?
Implement hreflang / rel alternate if relevant
If there are multiple versions of a site in the same language (such as /us/ and /uk/, both in English), update the copy been updated so that they are both unique

Make sure the currency reflects the country targeted
Ensure the URL structure is in the native language 
  • Try to avoid having all URLs in the default language


Analytics tracking code is on every page
  • You can check this using the “custom” filter in a Screaming Frog Crawl or by looking for self referrals.
  • Are there pages that should be blocked?
There is only one instance of a GA property on a page
  • Having the same Google Analytics property will create problems with pageview-related metrics such as inflating page views and pages per visit and reducing the bounce rate.
  • It is OK to have multiple GA properties listed, this won’t cause a problem.
Analytics is properly tracking and capturing internal searches

Demographics tracking is set up

Adwords and Adsense are properly linked if you are using these platforms
Internal IP addresses are excluded
UTM Campaign Parameters are used for other marketing efforts
Meta refresh and JavaScript redirects are avoided
  • These can artificially lower bounce rates.
Event tracking is set up for key user interactions

This audit covers the main technical elements of a site and should help you uncover any issues that are holding a site back. As with any project, the deliverable is critical. I’ve found focusing on the solution and impact (business case) is the best approach for site audit reports. While it is important to outline the problems, too much detail here can take away from the recommendations. If you’re looking for more resources on site audits, I recommend the following:

Helpful tools for doing a site audit:

Annie Cushing’s Site Audit
Web Developer Toolbar
User Agent Add-on
Link Checker
SEObook Toolbar
MozBar (Moz’s SEO toolbar)
Screaming Frog
Your own scraper
Inflow’s technical mobile best practices

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!

Reblogged 3 years ago from moz.com

The Small Business Brief to the New SEO – SEO Lunch

Full Site – seolunch.org Search Engine Optimization is a multi-faceted process that, when done correctly, can elevate your web presence dramatically. Join Ma…

Reblogged 4 years ago from www.youtube.com