Controlling Search Engine Crawlers for Better Indexation and Rankings – Whiteboard Friday

Posted by randfish

When should you disallow search engines in your robots.txt file, and when should you use meta robots tags in a page header? What about nofollowing links? In today’s Whiteboard Friday, Rand covers these tools and their appropriate use in four situations that SEOs commonly find themselves facing.

Video transcription

Howdy Moz fans, and welcome to another edition of Whiteboard Friday. This week we’re going to talk about controlling search engine crawlers, blocking bots, sending bots where we want, restricting them from where we don’t want them to go. We’re going to talk a little bit about crawl budget and what you should and shouldn’t have indexed.

As a start, what I want to do is discuss the ways in which we can control robots. Those include the three primary ones: robots.txt, meta robots, and the nofollow tag (though the nofollow tag is a little bit less about controlling bots than the other two).

There are a few others that we’re going to discuss as well, including Webmaster Tools (Search Console) and URL status codes. But let’s dive into those first few first.

Robots.txt lives at yoursite.com/robots.txt. It tells crawlers what they should and shouldn’t access, but it doesn’t always get respected by Google and Bing. So a lot of folks, when you say, “hey, disallow this,” and then suddenly see those URLs popping up in the results, wonder what’s going on. Google and Bing oftentimes think they just know better. They think maybe you’ve made a mistake; they think, “hey, there’s a lot of links pointing to this content, there’s a lot of people visiting and caring about this content, maybe you didn’t intend for us to block it.” The more specific you get about an individual URL, the better they usually are about respecting it. The less specific, meaning the more you use wildcards or say “everything behind this entire big directory,” the worse they are about necessarily believing you.
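
To make that concrete, here’s a minimal robots.txt sketch showing the specific-versus-broad distinction (the file paths are invented for illustration):

User-agent: *
# A narrow, specific disallow like this is usually respected:
Disallow: /old-press-release.html
# A broad disallow over a whole directory is second-guessed more often:
Disallow: /archive/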

Meta robots—a little different—that lives in the headers of individual pages, so you can only control a single page with a meta robots tag. That tells the engines whether or not they should keep a page in the index, and whether they should follow the links on that page, and it’s usually a lot more respected, because it’s at an individual-page level; Google and Bing tend to believe you about the meta robots tag.

And then the nofollow tag, that lives on an individual link on a page. It doesn’t tell engines where to crawl or not to crawl. All it’s saying is whether you editorially vouch for a page that is being linked to, and whether you want to pass the PageRank and link equity metrics to that page.
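
Concretely, a nofollowed link is just an ordinary anchor with a rel attribute (the URL here is only a placeholder):

<a href="https://example.com/some-page" rel="nofollow">some page</a>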

Interesting point about meta robots and robots.txt working together (or not working together so well)—many, many folks in the SEO world do this and then get frustrated.

What if, for example, we take a page like “blogtest.html” on our domain and we say “all user agents, you are not allowed to crawl blogtest.html.” Okay—that’s a good way to keep that page away from being crawled, but just because something is not crawled doesn’t necessarily mean it won’t be in the search results.

So then we have our SEO folks go, “you know what, let’s make doubly sure that doesn’t show up in search results; we’ll put in the meta robots tag:”

<meta name="robots" content="noindex, follow">

So, “noindex, follow” tells the search engine crawler they can follow the links on the page, but they shouldn’t index this particular one.

Then, you go and run a search for “blog test” in this case, and everybody on the team’s like “What the heck!? WTF? Why am I seeing this page show up in search results?”

The answer is, you told the engines that they couldn’t crawl the page, so they didn’t. But they are still putting it in the results. They’re actually probably not going to include a meta description; they might have something like “we can’t include a meta description because of this site’s robots.txt file.” The reason it’s showing up is because they can’t see the noindex; all they see is the disallow.

So, if you want something truly removed, unable to be seen in search results, you can’t just disallow a crawler. You have to say meta “noindex” and you have to let them crawl it.

So this creates some complications. Robots.txt can be great if we’re trying to save crawl bandwidth, but it isn’t necessarily ideal for preventing a page from being shown in the search results. I would not recommend, by the way, that you do what we think Twitter recently tried to do, where they tried to canonicalize www and non-www by saying “Google, don’t crawl the www version of twitter.com.” What you should be doing is rel canonical-ing or using a 301.
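
For the www/non-www case specifically, the fix looks something like this (a sketch using example.com as a stand-in). On each www page, the canonical points at the same URL on the non-www host:

<link rel="canonical" href="https://example.com/some-page/">

Or, if you prefer the 301 route and happen to be on Apache with mod_rewrite enabled, you can redirect the www host wholesale:

RewriteEngine On
RewriteCond %{HTTP_HOST} ^www\.example\.com$ [NC]
RewriteRule ^(.*)$ https://example.com/$1 [R=301,L]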

Meta robots—that can allow crawling and link-following while disallowing indexation, which is great. The catch is that it still requires crawl budget, because the page has to be crawled for the engines to see the tag; the benefit is that it still keeps those pages out of the index.

The nofollow tag, generally speaking, is not particularly useful for controlling bots or conserving indexation.

Webmaster Tools (now Google Search Console) has some special things that allow you to restrict access or remove a result from the search results. For example, if you have 404’d something or if you’ve told them not to crawl something but it’s still showing up in there, you can manually say “don’t do that.” There are a few other crawl protocol things that you can do.

And then URL status codes—these are a valid way to do things, but they’re going to obviously change what’s going on on your pages, too.

If you’re not having a lot of luck using a 404 to remove something, you can use a 410 to permanently remove something from the index. Just be aware that once you use a 410, it can take a long time if you want to get that page re-crawled or re-indexed, and you want to tell the search engines “it’s back!” 410 is permanent removal.

301—permanent redirect, we’ve talked about those here—and 302, temporary redirect.
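
As a rough sketch of how those status codes are often set at the server level (Apache mod_alias syntax; the paths are placeholders):

# 301: this page has moved permanently
Redirect 301 /old-page.html /new-page.html
# 302: temporary redirect
Redirect 302 /seasonal-page.html /holding-page.html
# 410: gone for good, please drop it from the index
Redirect gone /discontinued-page.html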

Now let’s jump into a few specific use cases of “what kinds of content should and shouldn’t I allow engines to crawl and index” in this next version…

[Rand moves at superhuman speed to erase the board and draw part two of this Whiteboard Friday. Seriously, we showed Roger how fast it was, and even he was impressed.]

Four crawling/indexing problems to solve

So we’ve got these four big problems that I want to talk about as they relate to crawling and indexing.

1. Content that isn’t ready yet

The first one here is around, “If I have content whose quality I’m still trying to improve—it’s not yet ready for primetime, it’s not ready for Google; maybe I have a bunch of products and I only have the descriptions from the manufacturer, and I need people to be able to access them, so I’m rewriting the content and creating unique value on those pages… they’re just not ready yet—what should I do with those?”

My options around crawling and indexing? If I have a large quantity of those—maybe thousands, tens of thousands, hundreds of thousands—I would probably go the robots.txt route. I’d disallow those pages from being crawled, and then eventually as I get (folder by folder) those sets of URLs ready, I can then allow crawling and maybe even submit them to Google via an XML sitemap.
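
A sketch of what that might look like while a (hypothetical) product folder is still being rewritten, with the sitemap reference added for the sections that are ready:

User-agent: *
# Block the folders whose content isn't ready for primetime yet:
Disallow: /products/manufacturer-descriptions/
# As each folder is rewritten, remove its Disallow line and list it in the sitemap:
Sitemap: https://www.example.com/sitemap-products.xml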

If I’m talking about a small quantity—a few dozen, a few hundred pages—well, I’d probably just use the meta robots noindex, and then I’d pull that noindex off of those pages as they are made ready for Google’s consumption. And then again, I would probably use the XML sitemap and start submitting those once they’re ready.

2. Dealing with duplicate or thin content

What about, “Should I noindex, nofollow, or potentially disallow crawling on largely duplicate URLs or thin content?” I’ve got an example. Let’s say I’m an ecommerce shop, I’m selling this nice Star Wars t-shirt which I think is kind of hilarious, so I’ve got starwarsshirt.html, and it links out to a larger version of an image, and that’s an individual HTML page. It links out to different colors, which change the URL of the page, so I have a gray, blue, and black version. Well, these four pages are really all part of this same one, so I wouldn’t recommend disallowing crawling on these, and I wouldn’t recommend noindexing them. What I would do there is a rel canonical.

Remember, rel canonical is one of those things that can be precluded by disallowing. So, if I were to disallow these from being crawled, Google couldn’t see the rel canonical back, so if someone linked to the blue version instead of the default version, now I potentially don’t get link credit for that. So what I really want to do is use the rel canonical, allow the indexing, and allow it to be crawled. If you really feel like it, you could also put a meta “noindex, follow” on these pages, but I don’t really think that’s necessary, and again that might interfere with the rel canonical.
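
On each of those variant pages (the variant filenames here are hypothetical), the canonical simply points back at the default version:

<!-- On starwarsshirt-blue.html, starwarsshirt-grey.html, starwarsshirt-black.html: -->
<link rel="canonical" href="https://www.example.com/starwarsshirt.html">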

3. Passing link equity without appearing in search results

Number three: “If I want to pass link equity (or at least crawling) through a set of pages without those pages actually appearing in search results—so maybe I have navigational stuff, ways that humans are going to navigate through my pages, but I don’t need those appearing in search results—what should I use then?”

What I would say here is, you can use the meta robots to say “don’t index the page, but do follow the links that are on that page.” That’s a pretty nice, handy use case for that.

Do NOT, however, disallow those in robots.txt—many, many folks make this mistake. If you disallow crawling on those pages, Google can’t see the noindex; they don’t know that they can follow the links. Granted, as we talked about before, sometimes Google doesn’t obey the robots.txt, but you can’t rely on that behavior; assume that the disallow will prevent them from crawling. So I would say the meta robots “noindex, follow” is the way to do this.

4. Search results-type pages

Finally, fourth, “What should I do with search results-type pages?” Google has said many times that they don’t like your search results from your own internal engine appearing in their search results, and so this can be a tricky use case.

Sometimes a search result page—a page that lists many types of results that might come from a database of types of content that you’ve got on your site—could actually be a very good result for a searcher who is looking for a wide variety of content, or who wants to see what you have on offer. Yelp does this: When you say, “I’m looking for restaurants in Seattle, WA,” they’ll give you what is essentially a list of search results, and Google does want those to appear because that page provides a great result. But you should be doing what Yelp does there, and make the most common or popular individual sets of those search results into category-style pages. A page that provides real, unique value, that’s not just a list of search results, that is more of a landing page than a search results page.

However, that being said, if you’ve got a long tail of these, or if you’d say “hey, our internal search engine, that’s really for internal visitors only—it’s not useful to have those pages show up in search results, and we don’t think we need to make the effort to make those into category landing pages.” Then you can use the disallow in robots.txt to prevent those.
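
Assuming your internal search results live under a /search path or are generated by a query parameter (both assumptions; your URLs may differ), the robots.txt rule is simple:

User-agent: *
Disallow: /search
# or, if results are generated by a query parameter:
Disallow: /*?q=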

Just be cautious here, because I have sometimes seen an over-swinging of the pendulum toward blocking all types of search results, and sometimes that can actually hurt your SEO and your traffic. Sometimes those pages can be really useful to people. So check your analytics, and make sure those aren’t valuable pages that should be served up and turned into landing pages. If you’re sure, then go ahead and disallow all your search results-style pages. You’ll see a lot of sites doing this in their robots.txt file.

That being said, I hope you have some great questions about crawling and indexing, controlling robots, blocking robots, allowing robots, and I’ll try and tackle those in the comments below.

We’ll look forward to seeing you again next week for another edition of Whiteboard Friday. Take care!


​The 3 Most Common SEO Problems on Listings Sites

Posted by Dom-Woodman

Listings sites have a very specific set of search problems that you don’t run into everywhere else. By day I’m one of Distilled’s analysts, but by night I run a job listings site, teflSearch. So, for my first Moz Blog post I thought I’d cover the three search problems with listings sites that I spent far too long agonising about.

Quick clarification time: What is a listings site (i.e. will this post be useful for you)?

The classic listings site is Craigslist, but plenty of other sites act like listing sites:

  • Job sites like Monster
  • E-commerce sites like Amazon
  • Matching sites like Spareroom

1. Generating quality landing pages

The landing pages on listings sites are incredibly important. These pages are usually the primary drivers of converting traffic, and they’re usually generated automatically (or are occasionally custom category pages).

For example, if I search “Jobs in Manchester”, you can see nearly every result is an automatically generated landing page or category page.

There are three common ways to generate these pages (occasionally a combination of more than one is used):

  • Faceted pages: These are generated by facets—groups of preset filters that let you filter the current search results. They usually sit on the left-hand side of the page.
  • Category pages: These pages are listings which have already had a filter applied and can’t be changed. They’re usually custom pages.
  • Free-text search pages: These pages are generated by a free-text search box.

Those definitions are still a bit general; let’s clear them up with some examples:

Amazon uses a combination of categories and facets. If you click on browse by department you can see all the category pages. Then on each category page you can see a faceted search. Amazon is so large that it needs both.

Indeed generates its landing pages through free text search, for example if we search for “IT jobs in manchester” it will generate: IT jobs in manchester.

teflSearch generates landing pages using just facets. The jobs in China landing page is simply a facet of the main search page.

Each method has its own search problems when used for generating landing pages, so let’s tackle them one by one.


Aside

Facets and free-text search will typically generate pages with parameters; e.g., a search for “dogs” would produce:

www.mysite.com?search=dogs

But to make the URLs user-friendly, sites will often rewrite them to display as folders:

www.mysite.com/results/dogs/

These are still just ordinary free-text searches and facets; the URLs are simply more user-friendly. (They’re a lot easier to work with in robots.txt too!)
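
As a rough sketch of how that mapping is often done (Apache mod_rewrite, with the /results/ path and search parameter borrowed from the example above):

RewriteEngine On
# Serve /results/dogs/ from the underlying parameterised search internally:
RewriteRule ^results/([^/]+)/?$ /?search=$1 [L,QSA]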


Free search (& category) problems

If you’ve decided the base of your search will be a free text search, then we’ll have two major goals:

  • Goal 1: Helping search engines find your landing pages
  • Goal 2: Giving them link equity.

Solution

Search engines won’t use search boxes and so the solution to both problems is to provide links to the valuable landing pages so search engines can find them.

There are plenty of ways to do this, but two of the most common are:

  • Category links alongside a search

    Photobucket uses a free-text search to generate pages, but if we look at an example search for photos of dogs, we can see the categories which define the landing pages along the right-hand side. (This is also an example of URL-friendly searches!)

  • Putting the main landing pages in a top-level menu

    Indeed also uses free text to generate landing pages, and they have a browse jobs section which contains the URL structure to allow search engines to find all the valuable landing pages.

Breadcrumbs are often used in addition to the two approaches above; in both of the examples above, you’ll find breadcrumbs that reinforce that hierarchy.

Category (& facet) problems

Categories, because they tend to be custom pages, don’t actually have many search disadvantages. Instead it’s the other attributes that make them more or less desirable. You can create them for the purposes you want and so you typically won’t have too many problems.

However, if you also use a faceted search in each category (like Amazon) to generate additional landing pages, then you’ll run into all the problems described in the next section.

At first, facets seem great: an easy way to generate multiple strong, relevant landing pages without doing much at all. The problems appear because people don’t put limits on facets.

Let’s take the job page on teflSearch. We can see it has 18 facets, each with many options. Some of these options will generate useful landing pages:

The China option in the country facet will generate “Jobs in China”, and that’s a useful landing page.

On the other hand, the “Conditional Bonus” facet will generate “Jobs with a conditional bonus,” and that’s not so great.

We can also see that the options within a single facet aren’t always useful. As of writing, I have a single job available in Serbia. That’s not a useful search result, and the poor user engagement combined with the tiny amount of content will be a strong signal to Google that it’s thin content. Depending on the scale of your site it’s very easy to generate a mass of poor-quality landing pages.

Facets generate other problems too, the primary one being that they can create a huge amount of duplicate content and pages for search engines to get lost in. This is caused by two things: the first is the sheer number of possibilities they generate, and the second is that selecting facets in different orders creates identical pages with different URLs.

We end up with four goals for our facet-generated landing pages:

  • Goal 1: Make sure our searchable landing pages are actually worth landing on, and that we’re not handing a mass of low-value pages to the search engines.
  • Goal 2: Make sure we don’t generate multiple copies of our automatically generated landing pages.
  • Goal 3: Make sure search engines don’t get caught in the metaphorical plastic six-pack rings of our facets.
  • Goal 4: Make sure our landing pages have strong internal linking.

The first goal needs to be set internally; you’re always going to be the best judge of the number of results that need to be present on a page in order for it to be useful to a user. I’d argue you can rarely ever go below three, but it depends both on your business and on how much content fluctuates on your site, as the useful landing pages might also change over time.

We can solve the next three problems as a group. There are several possible solutions depending on what skills and resources you have access to; here are two of them:

Category/facet solution 1: Blocking the majority of facets and providing external links

  • Easiest method
  • Good if your valuable category pages rarely change and you don’t have too many of them
  • Can be problematic if your valuable facet pages change a lot

Nofollow all your facet links, and noindex and block (via robots.txt) any category pages which aren’t valuable or which sit deeper than x facet/folder levels into your search.

You set x by looking at where your useful facet pages that have search volume exist. So, for example, if you have three facets for televisions: manufacturer, size, and resolution, and even combinations of all three have multiple results and search volume, then you could set x so that you index everything up to three levels.

On the other hand, if people are searching for three levels (e.g. “Samsung 42″ Full HD TV”) but you only have one or two results for three-level facets, then you’d be better off indexing two levels and letting the product pages themselves pick up long-tail traffic for the third level.

If you have valuable facet pages that exist deeper than one facet or folder into your search, then this creates some duplicate content problems, which are dealt with in the aside “Indexing more than one level of facets” below.
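
As an illustration, if your facet pages are folder-style URLs under /search/ (an assumption; adjust to your own URL structure) and you’ve decided x is two levels, the robots.txt part of this might look like:

User-agent: *
# Allow up to two facet levels (e.g. /search/china/full-time/),
# block anything three or more levels deep (assumes trailing slashes):
Disallow: /search/*/*/*/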

The immediate problem with this set-up, however, is that in one stroke we’ve removed most of the internal links to our category pages, and by no-following all the facet links, search engines won’t be able to find your valuable category pages.

In order to re-create the linking, you can add a top-level drop-down menu to your site containing the most valuable category pages, add category links elsewhere on the page, or create a separate part of the site with links to the valuable category pages.

You can see the top-level drop-down menu on teflSearch (it’s the search jobs menu); the other two examples are demonstrated by Photobucket and Indeed, respectively, in the previous section.

The big advantage of this method is how quick it is to implement: it doesn’t require any fiddly internal logic, and adding an extra menu option is usually minimal effort.

Category/facet solution 2: Creating internal logic to work with the facets

  • Requires new internal logic
  • Works for large numbers of category pages with value that can change rapidly

There are four parts to the second solution:

  1. Select valuable facet categories and allow those links to be followed. Nofollow the rest.
  2. Noindex all pages that return a number of items below the threshold for a useful landing page.
  3. Nofollow all facets on pages with a search depth greater than x.
  4. Block all facet pages deeper than x levels in robots.txt.

As with the last solution, x is set by looking at where your useful facet pages that have search volume exist (there’s a full explanation in the first solution), and if you’re indexing more than one level you’ll need to check out the aside below to see how to deal with the duplicate content it generates.
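
Here’s a minimal sketch of that internal logic in Python (the facet names, threshold, and depth are illustrative values, not recommendations):

# Illustrative values only; set these from your own keyword and results data.
VALUABLE_FACETS = {"country", "job_type"}  # facet links worth following
MIN_RESULTS = 3                            # below this, the page is noindexed
MAX_DEPTH = 2                              # the "x" discussed above

def robots_meta(result_count, depth):
    """Rule 2: meta robots directive for a facet-generated landing page."""
    index = "index" if result_count >= MIN_RESULTS else "noindex"
    return f'<meta name="robots" content="{index}, follow">'

def follow_facet_link(facet_name, current_depth):
    """Rules 1 and 3: only follow valuable facets, and nothing beyond MAX_DEPTH."""
    return facet_name in VALUABLE_FACETS and current_depth < MAX_DEPTH

# Rule 4 (blocking pages deeper than MAX_DEPTH) is handled in robots.txt,
# as in the first solution.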


Aside: Indexing more than one level of facets

If you want more than one level of facets to be indexable, then this will create certain problems.

Suppose you have a facet for size:

  • Televisions: Size: 46″, 44″, 42″

And want to add a brand facet:

  • Televisions: Brand: Samsung, Panasonic, Sony

This will create duplicate content because the search engines will be able to follow your facets in both orders, generating:

  • Television – 46″ – Samsung
  • Television – Samsung – 46″

You’ll have to either rel canonical your duplicate pages with another rule or set up your facets so they create a single unique URL.

You also need to be aware that each followable facet you add will multiply with every other followable facet, and it’s very easy to generate a mass of pages for search engines to get stuck in. Depending on your setup, you might need to block more paths in robots.txt or set up more logic to prevent them being followed.

Letting search engines index more than one level of facets adds a lot of possible problems; make sure you’re keeping track of them.


2. User-generated content cannibalization

This is a common problem for listings sites (assuming they allow user-generated content). If you’re reading this as an e-commerce site that only lists its own products, you can skip this one.

As we covered in the first area, category pages on listings sites are usually the landing pages aiming for the valuable search terms, but as your users start generating pages they can often create titles and content that cannibalise your landing pages.

Suppose you’re a job site with a category page for PHP Jobs in Greater Manchester. If a recruiter then creates a job advert for PHP Jobs in Greater Manchester for the 4 positions they currently have, you’ve got a duplicate content problem.

This is less of a problem when your site is large and your categories are mature; it will be obvious to any search engine which pages are your high-value category pages. But at the start, when you’re lacking authority and individual listings might contain more relevant content than your own search pages, this can be a problem.

Solution 1: Create structured titles

Set the <title> differently from the on-page title. Depending on the variables available to you, you can set the title tag programmatically, using other information given by the user, without changing the on-page title.

For example, on our imaginary job site, suppose the recruiter also provided the following information in other fields:

  • The no. of positions: 4
  • The primary area: PHP Developer
  • The name of the recruiting company: ABC Recruitment
  • Location: Manchester

We could set the <title> pattern to be: *No of positions* *The primary area* with *recruiter name* in *Location* which would give us:

4 PHP Developers with ABC Recruitment in Manchester
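
A sketch of how that pattern might be filled in programmatically (the field names and the naive pluralisation are just for illustration):

def listing_title(positions, primary_area, recruiter, location):
    """Build the <title> from structured fields rather than the user's own title."""
    plural = primary_area + "s" if positions != 1 else primary_area  # naive pluralisation
    return f"{positions} {plural} with {recruiter} in {location}"

# listing_title(4, "PHP Developer", "ABC Recruitment", "Manchester")
# -> "4 PHP Developers with ABC Recruitment in Manchester"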

Setting a <title> tag allows you to target long-tail traffic by constructing detailed descriptive titles. In our above example, imagine the recruiter had specified “Castlefield, Manchester” as the location.

All of a sudden, you’ve got a perfect opportunity to pick up long-tail traffic for people searching in Castlefield in Manchester.

On the downside, you lose the ability to pick up long-tail traffic where your users have chosen keywords you wouldn’t have used.

For example, suppose Manchester has a jobs program called “Green Highway.” A job advert title containing “Green Highway” might pick up valuable long-tail traffic. Being able to discover this, however, and find a way to fit it into a dynamic title is very hard.

Solution 2: Use regex to noindex the offending pages

Perform a regex (or string-contains) search on your listings’ titles and noindex the ones which cannibalise your main category pages.

If it’s not possible to construct titles with variables or your users provide a lot of additional long-tail traffic with their own titles, then is a great option. On the downside, you miss out on possible structured long-tail traffic that you might’ve been able to aim for.

Solution 3: De-index all your listings

It may seem rash, but if you’re a large site with a huge number of very similar or low-content listings, you might want to consider this. There is no common standard: some sites, like Indeed, choose to noindex all their job adverts, whereas others, like Craigslist, index all their individual listings because they’ll drive long-tail traffic.

Don’t de-index them all lightly!

3. Constantly expiring content

Our third and final problem is that user-generated content doesn’t last forever. Particularly on listings sites, it’s constantly expiring and changing.

For most use cases I’d recommend 301’ing expired content to a relevant category page, with a message triggered by the redirect notifying the user of why they’ve been redirected. It typically comes out as the best combination of search and UX.
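
A small, framework-agnostic sketch of that behaviour (the data model, helper names, and URLs are invented for illustration):

from dataclasses import dataclass

@dataclass
class Listing:
    slug: str
    category: str
    expired: bool

CATEGORY_URLS = {"php-greater-manchester": "/jobs/php/greater-manchester/"}

def respond_to(listing):
    """Return (status, location_or_body) for a listing request."""
    if listing.expired:
        # 301 to the relevant category page; the ?expired=1 flag lets the
        # category template explain to the user why they were redirected.
        return 301, CATEGORY_URLS[listing.category] + "?expired=1"
    return 200, f"<h1>{listing.slug}</h1>"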

For more information or advice on how to deal with the edge cases, there’s a previous Moz blog post on how to deal with expired content which I think does an excellent job of covering this area.

Summary

In summary, if you’re working with listings sites, all three of the following need to be kept in mind:

  • How are the landing pages generated? If they’re generated using free-text search or facets, have the potential problems been solved?
  • Is user generated content cannibalising the main landing pages?
  • How has constantly expiring content been dealt with?

Good luck listing, and if you’ve come across any other tricky problems or solutions while working on listings sites, let’s chat about them in the comments below!


Leveraging Panda to Get Out of Product Feed Jail

Posted by MichaelC

This is a story about Panda, customer service, and differentiating your store from others selling the same products.

Many e-commerce websites get the descriptions, specifications, and imagery for products they sell from feeds or databases provided by the manufacturers. The manufacturers might like this, as they control how their product is described and shown. However, it does their retailers no good when they are trying to rank for searches for those products and they’ve got the exact same content as every other retailer. If the content in the feed is thin, then you’ll have pages with…well…thin content. And if there’s a lot of content for the products, then you’ll have giant blocks of content that Panda might spot as being the same as they’ve seen on many other sites. To rub salt in the wound, if the content is really crappy, badly written, or downright wrong, then the retailers’ sites will look low-quality to Panda and users as well.

Many webmasters see Panda as a type of Google penalty—but it’s not, really. Panda is a collection of measurements Google is taking of your web pages to try and give your pages a rating on how happy users are likely to be with those pages. It’s not perfect, but then again—neither is your website.

Many SEO folks (including me) tend to focus on the kinds of tactical and structural things you can do to make Panda see your web pages as higher quality: things like adding big, original images, interactive content like videos and maps, and lots and lots and lots and lots of text. These are all good tactics, but let’s step back a bit and look at a specific example to see WHY Panda was built to do this, and from that, what we can do as retailers to enrich the content we have for e-commerce products where our hands are a bit tied—we’re getting a feed of product info from the manufacturers, the same as every other retailer of those products.

I’m going to use a real-live example that I suffered through about a month ago. I was looking for a replacement sink
stopper for a bathroom sink. I knew the brand, but there wasn’t a part number on the part I needed to replace. After a few Google
searches, I think I’ve found it on Amazon:


Don’t you wish online shopping was always this exciting?

What content actually teaches the customer

All righty… my research has shown me that there are standard sizes for plug stoppers. In fact, I initially ordered a “universal fit sink stopper.” Which didn’t fit. Then I found 3 standard diameters, and 5 or 6 standard lengths. No problem…I possess that marvel of modern tool chests, a tape measure…so I measure the part I have that I need to replace. I get about 1.5″ x 5″. So let’s scroll down to the product details to see if it’s a match:

Kohler sink stopper product info from hell

Whoa. 1.2 POUNDS? This sink stopper must be made of Ununoctium. The one in my hand weighs about an ounce. But the dimensions are way off as well: a 2″ diameter stopper isn’t going to fit, and mine needs to be at least an inch longer.

I scroll down to the product description…maybe there’s more detail there, maybe the 2″ x 2″ is the box or something.

I've always wanted a sink stopper designed for long long

Well, that’s less than helpful, with a stupid typo AND incorrect capitalization AND a missing period at the end.
Doesn’t build confidence in the company’s quality control.

Looking at the additional info section, maybe this IS the right part…the weight quoted in there is about right:

Maybe this is my part after all

Where else customers look for answers

Next I looked at the questions and answers bit, which convinced me that it PROBABLY was the right part:

Customers will answer the question if the retailer won't...sometimes.

If I were smart, I would have hedged my bets by doing what a bunch of other customers also did: buy a bunch of different parts, and surely one of them will fit. Could there possibly be a clearer signal that the product info was lacking than this?

If you can't tell which one to buy, buy them all!

In this case, that was probably smarter than spending another 1/2 hour of my time snooping around online. But in general, people aren’t going to be willing to buy THREE of something just to make sure they get the right one. This cheap part was an exception.

So, surely SOMEONE out there has the correct dimensions of this part on their site—so I searched for the part number I saw on the Amazon listing. But as it turned out, that crappy description and wrong weight and dimensions were on every site I found…because they came from the manufacturer.

Better Homes and Gardens...but not better description.

A few of the sites had edited out the “designed for long long” bit, but apart from that, they were all the same.

What sucks for the customer is an opportunity for you

Many, many retailers are in this same boat—they get their product info from the manufacturer, and if the data sucks in their feed, it’ll suck on their site. Your page looks weak to both users and to Panda, and it looks the same as everybody else’s page for that product…to both users and to Panda. So (a) you won’t rank very well, and (b) if you DO manage to get a customer to that page, it’s not as likely to convert to a sale.

What can you do to improve on this? Here are a few tactics to consider.

1. Offer your own additional description and comments

Add a new field to your CMS for your own write-ups on products, and when you discover issues like the above, you can add your own information—and make it VERY clear what’s the manufacturer’s stock info and what you’ve added (that’s VALUE-ADDED) as well. My client Sports Car Market magazine does this with their collector car auction reports in their printed magazine: they list the auction company’s description of the car, then their reporter’s assessment of the car. This is why I buy the magazine and not the auction catalog.

2. Solicit questions

Be sure you solicit questions on every product page—your customers will tell you what’s wrong or what important information is missing. Sure, you’ve got millions of products to deal with, but what the customers are asking about (and your sales volume, of course) will help you prioritize as well as find the problems (er, opportunities).

Amazon does a great job of enabling this, but in this case, I used the Feedback option to update the product info, and got back a total bull-twaddle email from the seller about how the dimensions are in the product description, thank you for shopping with us, bye-bye. I tried to help them, for free, and they shat on me.

3. But I don’t get enough traffic to get the questions

Don’t have enough site volume to get many customer requests? No problem, the information is out there for you on Amazon :-).
Take your most important products, and look them up on Amazon, and see what questions are being asked—then answer those ONLY on your own site.

4. What fits with what?

Create fitment/cross-reference charts for products. You probably have in-house knowledge of what products fit/are compatible with what other products. Just because YOU know a certain accessory fits all makes and models, because it’s some industry-standard size, doesn’t mean that the customer knows this.

If there’s a particular way to measure a product so you get the correct size, explain that (with photos of what you’re measuring, if it seems
at all complicated). I’m getting a new front door for my house. 

  • How big is the door I need?
  • Do I measure the width of the door itself, or the width of the opening (probably 1/8″ wider)?
  • Or if it’s pre-hung, do I measure the frame too? Is it inswing or outswing?
  • Right or left hinged…am I supposed to look at the door from inside the house or outside to figure this out?

If you’re a door seller, this is all obvious stuff,
but it wasn’t obvious to me, and NOT having the info on a website means (a) I feel stupid, and (b) I’m going to look at your competitors’ sites
to see if they will explain it…and maybe I’ll find a door on THEIR site I like better anyway.

Again, prioritize based on customer requests.

5. Provide your own photos and measurements

If examples of the physical products are available to you, take your own photos, and take your own measurements.

In fact, take your OWN photo of YOURSELF taking the measurement—so the user can see exactly what part of the product you’re measuring. In the photo below, you can see that I’m measuring the diameter of the stopper, NOT the hole in the sink, NOT the stopper plus the rubber gasket. And no, Kohler, it’s NOT 2″ in diameter…by a long shot.

Don't just give the measurements, SHOW the measurements

Keep in mind, you shouldn’t have to tear apart your CMS to do any of this. You can put your additions in a new database table, just tied to the
core product content by SKU. In the page template code for the product page, you can check your database to see if you have any of your “extra bits” to display
alongside the feed content, and this way keep it separate from the core product catalog code. This will make updates to the CMS/product catalog less painful as well.

Fixing your content doesn’t have to be all that difficult, nor expensive

At this point, you’re probably thinking “hey, but I’ve got 1.2 million SKUs, and if I were to do this, it’d take me 20 years to update all of them.”
FINE. Don’t update all of them. Prioritize, based on factors like what you sell the most of, what you make the best margin on, what customers
ask questions about the most, etc. Maybe concentrate on your top 5% in terms of sales, and do those first. Take all that money you used to spend
buying spammy links every month, and spend it instead on junior employees or interns doing the product measurements, extra photos, etc.

And don’t be afraid to spend a little effort on a low value product, if it’s one that frequently gets questions from customers.
Simple things can make a life-long fan of the customer. I once needed to replace a dishwasher door seal, and didn’t know if I needed special glue,
special tools, how to cut it to fit with or without overlap, etc.
I found a video on how to do the replacement on
RepairClinic.com. So easy!
They got my business for the $10 seal, of course…but now I order my $50 fridge water filter from them every six months as well.

Benefits to your conversion rate

Certainly the tactics we’ve talked about will improve your conversion rate from visitors to purchasers. If JUST ONE of those sites I looked at for that damn sink stopper had the right measurement (and maybe some statement about how the manufacturer’s specs above are actually incorrect, we measured, etc.), I’d have stopped right there and bought from that site.

What does this have to do with Panda?

But, there’s a Panda benefit here too. You’ve just added a bunch of additional, unique text to your site…and maybe a few new unique photos as well.
Not only are you going to convert better, but you’ll probably rank better too.

If you’re NOT Amazon, or eBay, or Home Depot, etc., then Panda is your secret weapon to help you rank against those other sites whose backlink profiles are
stronger than
carbon fibre (that’s a really cool video, by the way).
If you saw my
Whiteboard Friday on Panda optimization, you’ll know that
Panda tuning can overcome incredible backlink profile deficits.

It’s go time

We’re talking about tactics that are time-consuming, yes—but relatively easy to implement, using relatively inexpensive staff (and in some
cases, your customers are doing some of the work for you).
And it’s something you can roll out a product at a time.
You’ll be doing things that really DO make your site a better experience for the user…we’re not just trying to trick Panda’s measurements.

  1. Your pages will rank better, and bring more traffic.
  2. Your pages will convert better, because users won’t leave your site, looking elsewhere for answers to their questions.
  3. Your customers will be more loyal, because you were able to help them when nobody else bothered.

Don’t be held hostage by other peoples’ crappy product feeds. Enhance your product information with your own info and imagery.
Like good link-building and outreach, it takes time and effort, but both Panda and your site visitors will reward you for it.
