Has Google Gone Too Far with the Bias Toward Its Own Content?

Posted by ajfried

Since the beginning of SEO time, practitioners have been trying to crack the Google algorithm. Every once in a while, the industry gets a glimpse into how the search giant works and we have an opportunity to deconstruct it. We don’t get many of these opportunities, but when we do, assuming we spot them in time, we try to take advantage of them so we can “fix the Internet.”

On Feb. 16, 2015, news started to circulate that NBC would start removing images and references of Brian Williams from its website.

This was it!

A golden opportunity.

This was our chance to learn more about the Knowledge Graph.

Expectation vs. reality

Often it’s difficult to predict what Google is truly going to do. We expect something to happen, but in reality it’s nothing like we imagined.

Expectation

What we expected to see was that Google would change the source of the image. Typically, if you hover over the image in the Knowledge Graph, it reveals the location of the image.

Keanu-Reeves-Image-Location.gif

This would mean that if the image disappeared from its original source, then the image displayed in the Knowledge Graph would likely change or even disappear entirely.

Reality (February 2015)

The only problem was, there was no official source (this changed, as you will soon see) and identifying where the image was coming from proved extremely challenging. In fact, when you clicked on the image, it took you to an image search result that didn’t even include the image.

Could it be? Had Google started its own database of owned or licensed images and was giving it priority over any other sources?

In order to find the source, we tried taking the image from the Knowledge Graph and “search by image” in images.google.com to find others like it. For the NBC Nightly News image, Google failed to even locate a match to the image it was actually using anywhere on the Internet. For other television programs, it was successful. Here is an example of what happened for Morning Joe:

Morning_Joe_image_search.png

So we found the potential source. In fact, we found three potential sources. Seemed kind of strange, but this seemed to be the discovery we were looking for.

This looks like Google is using someone else’s content and not referencing it. These images have a source, but Google is choosing not to show it.

Then Google pulled the ol’ switcheroo.

New reality (March 2015)

Now things changed and Google decided to put a source to their images. Unfortunately, I mistakenly assumed that hovering over an image showed the same thing as the file path at the bottom, but I was wrong. The URL you see when you hover over an image in the Knowledge Graph is actually nothing more than the title. The source is different.

Morning_Joe_Source.png

Luckily, I still had two screenshots I took when I first saw this saved on my desktop. Success. One screen capture was from NBC Nightly News, and the other from the news show Morning Joe (see above) showing that the source was changed.

NBC-nightly-news-crop.png

(NBC Nightly News screenshot.)

The source is a Google-owned property: gstatic.com. You can clearly see the difference in the source change. What started as a hypothesis is now a fact. Google is certainly creating a database of images.

If this is the direction Google is moving, then it is creating all kinds of potential risks for brands and individuals. The implications are a loss of control for any brand that is looking to optimize its Knowledge Graph results. As well, it seems this poses a conflict of interest to Google, whose mission is to organize the world’s information, not license and prioritize it.

How do we think Google is supposed to work?

Google is an information-retrieval system tasked with sourcing information from across the web and supplying the most relevant results to users’ searches. In recent months, the search giant has taken a more direct approach by answering questions and assumed questions in the Answer Box, some of which come from un-credited sources. Google has clearly demonstrated that it is building a knowledge base of facts that it uses as the basis for its Answer Boxes. When it sources information from that knowledge base, it doesn’t necessarily reference or credit any source.

However, I would argue there is a difference between an un-credited Answer Box and an un-credited image. An un-credited Answer Box provides a fact that is indisputable, part of the public domain, and unlikely to change (e.g., what year was Abraham Lincoln shot? How long is the George Washington Bridge?). Answer Boxes that offer more than just a basic fact (an opinion, instructions, etc.) always credit their sources.

There are four possibilities when it comes to Google referencing content:

  • Option 1: It credits the content because someone else owns the rights to it
  • Option 2: It doesn’t credit the content because it’s part of the public domain, as seen in some Answer Box results
  • Option 3: It doesn’t reference it because it owns or has licensed the content. If you search for “Chicken Pox” or other diseases, Google appears to be using images from licensed medical illustrators. The same goes for song lyrics, which Eric Enge discusses here: Google providing credit for content. This adds to the speculation that Google is giving preference to its own content by displaying it over everything else.
  • Option 4: It doesn’t credit the content, but neither does it necessarily own the rights to the content. This is a very gray area, and is where Google seemed to be back in February. If this were the case, it would imply that Google is “stealing” content—which I find hard to believe, but felt was necessary to include in this post for the sake of completeness.

Is this an isolated incident?

At Five Blocks, whenever we see these anomalies in search results, we try to compare the term in question against others like it. This is a categorization concept we use to bucket individuals or companies into similar groups. When we do this, we uncover some incredible trends that help us determine what a search result “should” look like for a given group. For example, when looking at searches for a group of people or companies in an industry, this grouping gives us a sense of how much social media presence the group has on average or how much media coverage it typically gets.

Upon further investigation of terms similar to NBC Nightly News (other news shows), we noticed the un-credited image scenario appeared to be a trend in February, but now all of the images are being hosted on gstatic.com. When we broadened the categories further to TV shows and movies, the trend persisted. Rather than show an image in the Knowledge Graph and from the actual source, Google tends to show an image and reference the source from Google’s own database of stored images.

And just to ensure this wasn’t a case of tunnel vision, we researched other categories, including sports teams, actors and video games, in addition to spot-checking other genres.

Unlike terms for specific TV shows and movies, terms in each of these other groups all link to the actual source in the Knowledge Graph.

Immediate implications

It’s easy to ignore this and say “Well, it’s Google. They are always doing something.” However, there are some serious implications to these actions:

  1. The TV shows/movies aren’t receiving their due credit because, from within the Knowledge Graph, there is no actual reference to the show’s official site
  2. The more Google moves toward licensing and then retrieving their own information, the more biased they become, preferring their own content over the equivalent—or possibly even superior—content from another source
  3. It feels wrong and misleading to get a Google Image Search result rather than an actual site because:
    • The search doesn’t include the original image
    • Considering how poor Image Search results are normally, it feels like a poor experience
  4. If Google is moving toward licensing as much content as possible, then it could make the Knowledge Graph infinitely more complicated when there is a “mistake” or something unflattering. How could one go about changing what Google shows about them?

Google is objectively becoming subjective

It is clear that Google is attempting to create databases of information, including lyrics stored in Google Play, photos, and, previously, facts in Freebase (which is now Wikidata and not owned by Google).

I am not normally one to point my finger and accuse Google of wrongdoing. But this really strikes me as an odd move, one bordering on a clear bias to direct users to stay within the search engine. The fact is, we trust Google with a heck of a lot of information with our searches. In return, I believe we should expect Google to return an array of relevant information for searchers to decide what they like best. The example cited above seems harmless, but what about determining which is the right religion? Or even who the prettiest girl in the world is?

Religion-and-beauty-queries.png

Questions such as these, which Google is returning credited answers for, could return results that are perceived as facts.

Should we next expect Google to decide who is objectively the best service provider (e.g., pizza chain, painter, or accountant), then feature them in an un-credited answer box? The direction Google is moving right now, it feels like we should be calling into question their objectivity.

But that’s only my (subjective) opinion.

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!

I Can’t Drive 155: Meta Descriptions in 2015

Posted by Dr-Pete

For years now, we (and many others) have been recommending keeping your Meta Descriptions shorter than about 155-160 characters. For months, people have been sending me examples of search snippets that clearly broke that rule, like this one (on a search for “hummingbird food”):

For the record, this one clocks in at 317 characters (counting spaces). So, I set out to discover if these long descriptions were exceptions to the rule, or if we need to change the rules. I collected the search snippets across the MozCast 10K, which resulted in 92,669 snippets. All of the data in this post was collected on April 13, 2015.

The Basic Data

The minimum snippet length was zero characters. There were 69 zero-length snippets, but most of these were the new generation of answer box, that appears organic but doesn’t have a snippet. To put it another way, these were misidentified as organic by my code. The other 0-length snippets were local one-boxes that appeared as organic but had no snippet, such as this one for “chichen itza”:

These zero-length snippets were removed from further analysis, but considering that they only accounted for 0.07% of the total data, they didn’t really impact the conclusions either way. The shortest legitimate, non-zero snippet was 7 characters long, on a search for “geek and sundry”, and appears to have come directly from the site’s meta description:

The maximum snippet length that day (this is a highly dynamic situation) was 372 characters. The winner appeared on a search for “benefits of apple cider vinegar”:

The average length of all of the snippets in our data set (not counting zero-length snippets) was 143.5 characters, and the median length was 152 characters. Of course, this can be misleading, since some snippets are shorter than the limit and others are being artificially truncated by Google. So, let’s dig a bit deeper.

The Bigger Picture

To get a better idea of the big picture, let’s take a look at the display length of all 92,600 snippets (with non-zero length), split into 20-character buckets (0-20, 21-40, etc.):

Most of the snippets (62.1%) cut off as expected, right in the 141-160 character bucket. Of course, some snippets were shorter than that, and didn’t need to be cut off, and some broke the rules. About 1% (1,010) of the snippets in our data set measured 200 or more characters. That’s not a huge number, but it’s enough to take seriously.
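For anyone who wants to repeat this kind of bucketing on their own data, here is a minimal Python sketch. It assumes you already have the snippets as a list of strings; the function names and the sample data are hypothetical and are not the code used for this post.

```python
from collections import Counter


def bucket_label(length, width=20):
    """Return a label such as '141-160' for a snippet length.

    Follows the bucket scheme described above: 0-20, 21-40, 41-60, ...
    """
    if length <= width:
        return f"0-{width}"
    # Round the length up to the next multiple of the bucket width.
    upper = -(-length // width) * width
    return f"{upper - width + 1}-{upper}"


def length_histogram(snippets, width=20):
    """Count snippets per bucket, skipping zero-length snippets."""
    return Counter(bucket_label(len(s), width) for s in snippets if s)


# Hypothetical sample data, not taken from the MozCast 10K.
sample = ["x" * 7, "x" * 152, "x" * 160, "x" * 317, ""]
hist = length_histogram(sample)
```

Changing `width` to 5 or 10 gives the finer bins used in the zoomed-in views later in this section.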

That 141-160 character bucket is dwarfing everything else, so let’s zoom in a bit on the cut-off range, and just look at snippets in the 120-200 character range (in this case, by 5-character bins):

Zooming in, the bulk of the snippets are displaying at lengths between about 146-165 characters. There are plenty of exceptions to the 155-160 character guideline, but for the most part, they do seem to be exceptions.

Finally, let’s zoom in on the rule-breakers. This is the distribution of snippets displaying 191+ characters, bucketed in 10-character bins (191-200, 201-210, etc.):

Please note that the Y-axis scale is much smaller than in the previous 2 graphs, but there is a pretty solid spread, with a decent chunk of snippets displaying more than 300 characters.

Without looking at every original meta description tag, it’s very difficult to tell exactly how many snippets have been truncated by Google, but we do have a proxy. Snippets that have been truncated end in an ellipsis (…), which rarely appears at the end of a natural description. In this data set, more than half of all snippets (52.8%) ended in an ellipsis, so we’re still seeing a lot of meta descriptions being cut off.
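The ellipsis proxy is easy to reproduce. The sketch below is one way to do it in Python, assuming the snippets are plain strings; it treats both the single ellipsis character and three trailing dots as a truncation marker, which is my assumption rather than a documented Google behavior.

```python
def ends_in_ellipsis(snippet):
    """True if the snippet ends in an ellipsis character or three dots."""
    text = snippet.rstrip()
    return text.endswith("\u2026") or text.endswith("...")


def truncated_share(snippets):
    """Fraction of non-empty snippets that appear to be truncated."""
    non_empty = [s for s in snippets if s.strip()]
    if not non_empty:
        return 0.0
    return sum(ends_in_ellipsis(s) for s in non_empty) / len(non_empty)


# Hypothetical sample snippets for illustration only.
sample = [
    "Learn how to make hummingbird food at home \u2026",
    "A complete description that ends naturally.",
    "Another snippet that was cut off ...",
    "",
]
share = truncated_share(sample)
```

Keep in mind that this only flags snippets that look truncated; it cannot tell you whether the text before the ellipsis came from the meta description.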

I should add that, unlike titles/headlines, it isn’t clear whether Google is cutting off snippets by pixel width or character count, since that cut-off is done on the server-side. In most cases, Google will cut before the end of the second line, but sometimes they cut well before this, which could suggest a character-based limit. They also cut off at whole words, which can make the numbers a bit tougher to interpret.

The Cutting Room Floor

There’s another difficulty with telling exactly how many meta descriptions Google has modified – some edits are minor, and some are major. One minor edit is when Google adds some additional information to a snippet, such as a date at the beginning. Here’s an example (from a search for “chicken pox”):

With the date (and minus the ellipsis), this snippet is 164 characters long, which suggests Google isn’t counting the added text against the length limit. What’s interesting is that the rest comes directly from the meta description on the site, except that the site’s description starts with “Chickenpox.” and Google has removed that keyword. As a human, I’d say this matches the meta description, but a bot has a very hard time telling a minor edit from a complete rewrite.

Another minor rewrite occurs in snippets that start with search result counts:

Here, we’re at 172 characters (with spaces and minus the ellipsis), and Google has even let this snippet roll over to a third line. So, again, it seems like the added information at the beginning isn’t counting against the length limit.

All told, 11.6% of the snippets in our data set had some kind of Google-generated data, so this type of minor rewrite is pretty common. Even if Google honors most of your meta description, you may see small edits.

Let’s look at our big winner, the 372-character description. Here’s what we saw in the snippet:

Jan 26, 2015 – Health• Diabetes Prevention: Multiple studies have shown a correlation between apple cider vinegar and lower blood sugar levels. … • Weight Loss: Consuming apple cider vinegar can help you feel more full, which can help you eat less. … • Lower Cholesterol: … • Detox: … • Digestive Aid: … • Itchy or Sunburned Skin: … • Energy Boost:1 more items

So, what about the meta description? Here’s what we actually see in the tag:

Were you aware of all the uses of apple cider vinegar? From cleansing to healing, to preventing diabetes, ACV is a pantry staple you need in your home.

That’s a bit more than just a couple of edits. So, what’s happening here? Well, there’s a clue on that same page, where we see yet another rule-breaking snippet:

You might be wondering why this snippet is any more interesting than the other one. If you could see the top of the SERP, you’d know why, because it looks something like this:

Google is automatically extracting list-style data from these pages to fuel the expansion of the Knowledge Graph. In one case, that data is replacing a snippet and going directly into an answer box, but they’re performing the same translation even for some other snippets on the page.

So, does every 2nd-generation answer box yield long snippets? After 3 hours of inadvisable MySQL queries, I can tell you that the answer is a resounding “probably not”. You can have 2nd-gen answer boxes without long snippets and you can have long snippets without 2nd-gen answer boxes, but there does appear to be a connection between long snippets and Knowledge Graph in some cases.

One interesting connection is that Google has begun bolding keywords that seem like answers to the query (and not just synonyms for the query). Below is an example from a search for “mono symptoms”. There’s an answer box for this query, but the snippet below is not from the site in the answer box:

Notice the bolded words – “fatigue”, “sore throat”, “fever”, “headache”, “rash”. These aren’t synonyms for the search phrase; these are actual symptoms of mono. This data isn’t coming from the meta description, but from a bulleted list on the target page. Again, it appears that Google is trying to use the snippet to answer a question, and has gone well beyond just matching keywords.

Just for fun, let’s look at one more, where there’s no clear connection to the Knowledge Graph. Here’s a snippet from a search for “sons of anarchy season 4”:

This page has no answer box, and the information extracted is odd at best. The snippet bears little or no resemblance to the site’s meta description. The number string at the beginning comes out of a rating widget, and some of the text isn’t even clearly available on the page. This seems to be an example of Google acknowledging IMDb as a high-authority site and desperately trying to match any text they can to the query, resulting in a Frankenstein’s snippet.

The Final Verdict

If all of this seems confusing, that’s probably because it is. Google is taking a lot more liberties with snippets these days, whether to better match queries, to add details they feel are important, or to help build and support the Knowledge Graph.

So, let’s get back to the original question – is it time to revise the 155(ish) character guideline? My gut feeling is: not yet. To begin with, the vast majority of snippets are still falling in that 145-165 character range. In addition, the exceptions to the rule are not only atypical situations, but in most cases those long snippets don’t seem to represent the original meta description. In other words, even if Google does grant you extra characters, they probably won’t be the extra characters you asked for in the first place.

Many people have asked: “How do I make sure that Google shows my meta description as is?” I’m afraid the answer is: “You don’t.” If this is very important to you, I would recommend keeping your description below the 155-character limit, and making sure that it’s a good match to your target keyword concepts. I suspect Google is going to take more liberties with snippets over time, and we’re going to have to let go of our obsession with having total control over the SERPs.

The 3 Most Common SEO Problems on Listings Sites

Posted by Dom-Woodman

Listings sites have a very specific set of search problems that you don’t run into everywhere else. By day I’m one of Distilled’s analysts, but by night I run a job listings site, teflSearch. So, for my first Moz Blog post I thought I’d cover the three search problems with listings sites that I spent far too long agonising about.

Quick clarification time: What is a listings site (i.e. will this post be useful for you)?

The classic listings site is Craigslist, but plenty of other sites act like listing sites:

  • Job sites like Monster
  • E-commerce sites like Amazon
  • Matching sites like Spareroom

1. Generating quality landing pages

The landing pages on listings sites are incredibly important. These pages are usually the primary drivers of converting traffic, and they’re usually generated automatically (or are occasionally custom category pages).

For example, if I search “Jobs in Manchester”, you can see nearly every result is an automatically generated landing page or category page.

There are three common ways to generate these pages (occasionally a combination of more than one is used):

  • Faceted pages: These are generated by facets—groups of preset filters that let you filter the current search results. They usually sit on the left-hand side of the page.
  • Category pages: These pages are listings which have already had a filter applied and can’t be changed. They’re usually custom pages.
  • Free-text search pages: These pages are generated by a free-text search box.

Those definitions are still a bit general; let’s clear them up with some examples:

Amazon uses a combination of categories and facets. If you click on browse by department you can see all the category pages. Then on each category page you can see a faceted search. Amazon is so large that it needs both.

Indeed generates its landing pages through free text search. For example, if we search for “IT jobs in manchester” it will generate: IT jobs in manchester.

teflSearch generates landing pages using just facets. The jobs in China landing page is simply a facet of the main search page.

Each method has its own search problems when used for generating landing pages, so let’s tackle them one by one.


Aside

Facets and free text search will typically generate pages with parameters e.g. a search for “dogs” would produce:

www.mysite.com?search=dogs

But to make the URL user friendly sites will often alter the URLs to display them as folders

www.mysite.com/results/dogs/

These are still just ordinary free text searches and facets; the URLs are simply user friendly. (They’re a lot easier to work with in robots.txt too!)
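As a rough illustration of that mapping, here is a small Python sketch that turns a parameter URL into its folder-style equivalent. The `search` parameter and `results` folder come from the example above; the function name, the `http://` scheme, and the lower-casing are my own assumptions, and a real site would normally do this in its routing or rewrite rules.

```python
from urllib.parse import parse_qs, quote, urlsplit


def to_friendly_path(url, param="search", prefix="results"):
    """Map ?search=dogs style URLs to /results/dogs/ style paths."""
    parts = urlsplit(url)
    values = parse_qs(parts.query).get(param)
    if not values:
        # No search parameter: leave the path as it is.
        return parts.path or "/"
    # Percent-encode the term so spaces and slashes stay in one folder.
    return f"/{prefix}/{quote(values[0].lower(), safe='')}/"


friendly = to_friendly_path("http://www.mysite.com?search=dogs")
```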


Free search (& category) problems

If you’ve decided the base of your search will be a free text search, then you’ll have two major goals:

  • Goal 1: Helping search engines find your landing pages
  • Goal 2: Giving them link equity.

Solution

Search engines won’t use search boxes and so the solution to both problems is to provide links to the valuable landing pages so search engines can find them.

There are plenty of ways to do this, but two of the most common are:

  • Category links alongside a search

    Photobucket uses a free text search to generate pages, but if we look at an example search for photos of dogs, we can see the categories which define the landing pages along the right-hand side. (This is also an example of URL friendly searches!)

  • Putting the main landing pages in a top-level menu

    Indeed also uses free text to generate landing pages, and they have a browse jobs section which contains the URL structure to allow search engines to find all the valuable landing pages.

Breadcrumbs are also often used in addition to the two above and in both the examples above, you’ll find breadcrumbs that reinforce that hierarchy.

Category (& facet) problems

Categories, because they tend to be custom pages, don’t actually have many search disadvantages. Instead it’s the other attributes that make them more or less desirable. You can create them for the purposes you want and so you typically won’t have too many problems.

However, if you also use a faceted search in each category (like Amazon) to generate additional landing pages, then you’ll run into all the problems described in the next section.

At first facets seem great, an easy way to generate multiple strong relevant landing pages without doing much at all. The problems appear because people don’t put limits on facets.

Let’s take the job page on teflSearch. We can see it has 18 facets each with many options. Some of these options will generate useful landing pages:

The China facet in countries will generate “Jobs in China”; that’s a useful landing page.

On the other hand, the “Conditional Bonus” facet will generate “Jobs with a conditional bonus,” and that’s not so great.

We can also see that the options within a single facet aren’t always useful. As of writing, I have a single job available in Serbia. That’s not a useful search result, and the poor user engagement combined with the tiny amount of content will be a strong signal to Google that it’s thin content. Depending on the scale of your site it’s very easy to generate a mass of poor-quality landing pages.

Facets generate other problems too. The primary one is that they can create a huge amount of duplicate content and pages for search engines to get lost in. This is caused by two things: the first is the sheer number of possibilities they generate, and the second is that selecting facets in different orders creates identical pages with different URLs.

We end up with four goals for our facet-generated landing pages:

  • Goal 1: Make sure our searchable landing pages are actually worth landing on, and that we’re not handing a mass of low-value pages to the search engines.
  • Goal 2: Make sure we don’t generate multiple copies of our automatically generated landing pages.
  • Goal 3: Make sure search engines don’t get caught in the metaphorical plastic six-pack rings of our facets.
  • Goal 4: Make sure our landing pages have strong internal linking.

The first goal needs to be set internally; you’re always going to be the best judge of the number of results that need to present on a page in order for it to be useful to a user. I’d argue you can rarely ever go below three, but it depends both on your business and on how much content fluctuates on your site, as the useful landing pages might also change over time.

We can solve the next three problems as a group. There are several possible solutions depending on what skills and resources you have access to; here are two possible solutions:

Category/facet solution 1: Blocking the majority of facets and providing external links
  • Easiest method
  • Good if your valuable category pages rarely change and you don’t have too many of them.
  • Can be problematic if your valuable facet pages change a lot

Nofollow all your facet links, and noindex and block category pages which aren’t valuable or are deeper than x facet/folder levels into your search using robots.txt.

You set x by looking at where your useful facet pages exist that have search volume. So, for example, if you have three facets for televisions: manufacturer, size, and resolution, and even combinations of all three have multiple results and search volume, then you could index everything up to three levels.

On the other hand, if people are searching for three levels (e.g. “Samsung 42″ Full HD TV”) but you only have one or two results for three-level facets, then you’d be better off indexing two levels and letting the product pages themselves pick up long-tail traffic for the third level.

(If you have valuable facet pages that exist deeper than 1 facet or folder into your search, then this creates some duplicate content problems dealt with in the aside “Indexing more than 1 level of facets” below.)

The immediate problem with this set-up, however, is that in one stroke we’ve removed most of the internal links to our category pages, and by no-following all the facet links, search engines won’t be able to find your valuable category pages.

In order to re-create the linking, you can add a top level drop down menu to your site containing the most valuable category pages, add category links elsewhere on the page, or create a separate part of the site with links to the valuable category pages.

You can see the top level drop down menu on teflSearch (it’s the search jobs menu); the other two approaches are demonstrated by Photobucket and Indeed respectively in the previous section.

The big advantage of this method is how quick it is to implement: it doesn’t require any fiddly internal logic, and adding an extra menu option is usually minimal effort.
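To make the robots.txt part of this solution concrete, here is a minimal Python sketch that builds a rule blocking everything deeper than x facet levels. It assumes folder-style URLs under a hypothetical /results/ prefix and a crawler that honors the * wildcard in robots.txt paths (Googlebot does); the function names are mine.

```python
def disallow_rule(prefix, max_depth):
    """Disallow any path with more than max_depth facet folders."""
    # One "*/" per allowed level, plus one more to match the first
    # level that is too deep.
    return f"Disallow: /{prefix}/" + "*/" * (max_depth + 1)


def robots_txt(prefix, max_depth):
    """Build a tiny robots.txt body containing the depth rule."""
    lines = ["User-agent: *", disallow_rule(prefix, max_depth)]
    return "\n".join(lines) + "\n"


# With x = 2, /results/samsung/42/ stays crawlable and
# /results/samsung/42/full-hd/ is blocked.
rule = disallow_rule("results", 2)
```

Remember that a page blocked in robots.txt can’t have its noindex tag read by the crawler, so test the combination on a handful of URLs before rolling it out.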

Category/facet solution 2: Creating internal logic to work with the facets

  • Requires new internal logic
  • Works for large numbers of category pages with value that can change rapidly

There are four parts to the second solution:

  1. Select valuable facet categories and allow those links to be followed. No-follow the rest.
  2. No-index all pages that return a number of items below the threshold for a useful landing page
  3. No-follow all facets on pages with a search depth greater than x.
  4. Block all facet pages deeper than x level in robots.txt

As with the last solution, x is set by looking at where your useful facet pages exist that have search volume (full explanation in the first solution), and if you’re indexing more than one level you’ll need to check out the aside below to see how to deal with the duplicate content it generates.
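The four parts above boil down to a couple of small decisions per page and per link. Here is a rough Python sketch of that logic; the facet names, the value of x, and the results threshold are all placeholder assumptions that you would replace with your own.

```python
VALUABLE_FACETS = {"country", "job_type"}  # hypothetical facet names
MAX_DEPTH = 2     # "x" in the text; an assumed value
MIN_RESULTS = 3   # assumed threshold for a useful landing page


def page_rules(depth, result_count):
    """Return crawl and index rules for a facet page."""
    too_deep = depth > MAX_DEPTH
    return {
        "noindex": result_count < MIN_RESULTS,  # part 2
        "nofollow_all_facets": too_deep,        # part 3
        "block_in_robots_txt": too_deep,        # part 4
    }


def link_is_followed(facet, page_depth):
    """Parts 1 and 3: follow only valuable facets on shallow pages."""
    return facet in VALUABLE_FACETS and page_depth <= MAX_DEPTH


rules_for_thin_page = page_rules(depth=1, result_count=1)
```

How you apply the result (a robots meta tag, a rel attribute on the link, a robots.txt rule) depends on your templates, which is exactly the extra internal logic this solution requires.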


Aside: Indexing more than one level of facets

If you want more than one level of facets to be indexable, then this will create certain problems.

Suppose you have a facet for size:

  • Televisions: Size: 46″, 44″, 42″

And want to add a brand facet:

  • Televisions: Brand: Samsung, Panasonic, Sony

This will create duplicate content because the search engines will be able to follow your facets in both orders, generating:

  • Television – 46″ – Samsung
  • Television – Samsung – 46″

You’ll have to either rel canonical your duplicate pages with another rule or set up your facets so they create a single unique URL.
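One way to get a single unique URL is to always write the facets in a fixed order, no matter what order the user clicked them in. The sketch below shows the idea in Python; the facet order, the slug rules, and the URL layout are assumptions for illustration. The same function can also supply the target for a rel canonical tag.

```python
import re

FACET_ORDER = ["brand", "size"]  # assumed fixed priority order


def slugify(value):
    """Lower-case a facet value and replace other characters with '-'."""
    return re.sub(r"[^a-z0-9]+", "-", value.lower()).strip("-")


def canonical_facet_path(base, selected):
    """Build one URL path regardless of the order facets were chosen."""
    ordered = [f for f in FACET_ORDER if f in selected]
    segments = [slugify(selected[f]) for f in ordered]
    return "/" + "/".join([base] + segments) + "/"


size_first = canonical_facet_path("televisions", {"size": '46"', "brand": "Samsung"})
brand_first = canonical_facet_path("televisions", {"brand": "Samsung", "size": '46"'})
```

Both calls produce the same path, so the two click orders in the example above collapse into one page.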

You also need to be aware that each followable facet you add will multiply with each other followable facet and it’s very easy to generate a mass of pages for search engines to get stuck in. Depending on your setup you might need to block more paths in robots.txt or set-up more logic to prevent them being followed.
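To get a feel for how quickly this multiplies, here is a small Python sketch that counts pages for a set of followable facets. It assumes each facet allows at most one selected option at a time, which is a simplification; the function names are mine.

```python
from itertools import permutations
from math import prod


def unique_pages(option_counts):
    """Distinct facet combinations, ignoring the 'nothing selected' page."""
    return prod(n + 1 for n in option_counts) - 1


def order_dependent_urls(option_counts):
    """URLs generated when every click order produces its own URL."""
    total = 0
    for k in range(1, len(option_counts) + 1):
        for facets in permutations(option_counts, k):
            total += prod(facets)
    return total


# The television example: 3 sizes and 3 brands.
pages = unique_pages([3, 3])
urls = order_dependent_urls([3, 3])
```

For the two television facets this gives 15 distinct pages but 24 URLs if click order is part of the URL, and the gap widens quickly with each followable facet you add.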

Letting search engines index more than one level of facets adds a lot of possible problems; make sure you’re keeping track of them.


2. User-generated content cannibalization

This is a common problem for listings sites (assuming they allow user generated content). If you’re reading this as an e-commerce site who only lists their own products, you can skip this one.

As we covered in the first area, category pages on listings sites are usually the landing pages aiming for the valuable search terms, but as your users start generating pages they can often create titles and content that cannibalise your landing pages.

Suppose you’re a job site with a category page for PHP Jobs in Greater Manchester. If a recruiter then creates a job advert for PHP Jobs in Greater Manchester for the 4 positions they currently have, you’ve got a duplicate content problem.

This is less of a problem when your site is large and your categories are mature, because it will be obvious to any search engine which are your high value category pages. But at the start, when you’re lacking authority and individual listings might contain more relevant content than your own search pages, this can be a problem.

Solution 1: Create structured titles

Set the <title> differently than the on-page title. Depending on the variables you have available, you can set the title tag programmatically, without changing the page title, using other information given by the user.

For example, on our imaginary job site, suppose the recruiter also provided the following information in other fields:

  • The no. of positions: 4
  • The primary area: PHP Developer
  • The name of the recruiting company: ABC Recruitment
  • Location: Manchester

We could set the <title> pattern to be: *No of positions* *The primary area* with *recruiter name* in *Location* which would give us:

4 PHP Developers with ABC Recruitment in Manchester

Setting a <title> tag allows you to target long-tail traffic by constructing detailed descriptive titles. In our above example, imagine the recruiter had specified “Castlefield, Manchester” as the location.

All of a sudden, you’ve got a perfect opportunity to pick up long-tail traffic for people searching in Castlefield in Manchester.
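
A minimal sketch of that title pattern might look like the following. The function and field names are hypothetical, and the pluralisation is deliberately naive (it just appends an "s"):

```python
def build_title(positions: int, primary_area: str, recruiter: str, location: str) -> str:
    """Build a <title> string from structured listing fields."""
    # Naive pluralisation: "PHP Developer" becomes "PHP Developers"
    # whenever there is more than one position.
    role = primary_area if positions == 1 else primary_area + "s"
    return f"{positions} {role} with {recruiter} in {location}"


print(build_title(4, "PHP Developer", "ABC Recruitment", "Manchester"))
print(build_title(4, "PHP Developer", "ABC Recruitment", "Castlefield, Manchester"))
```

The on-page title stays whatever the recruiter typed; only the <title> tag is generated from the fields.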

On the downside, you lose the ability to pick up long-tail traffic where your users have chosen keywords you wouldn’t have used.

For example, suppose Manchester has a jobs program called “Green Highway.” A job advert title containing “Green Highway” might pick up valuable long-tail traffic. Being able to discover this, however, and find a way to fit it into a dynamic title is very hard.

Solution 2: Use regex to noindex the offending pages

Perform a regex (or string contains) search on your listing titles and no-index the ones which cannibalise your main category pages.

If it’s not possible to construct titles with variables, or your users provide a lot of additional long-tail traffic with their own titles, then this is a great option. On the downside, you miss out on possible structured long-tail traffic that you might’ve been able to aim for.
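
Here is one way the check could be sketched, using the job site example from earlier. The pattern, the function names and the decision to keep links followable are all assumptions; you would need one pattern (or a generated list of them) per category page:

```python
import re

# Hypothetical pattern for the category page "PHP Jobs in Greater Manchester".
# It only matches titles that are essentially identical to the category name.
CATEGORY_PATTERN = re.compile(
    r"^\s*php\s+jobs\s+in\s+greater\s+manchester\s*$", re.IGNORECASE
)


def should_noindex(listing_title: str) -> bool:
    """True when a user-written title duplicates the category page title."""
    return CATEGORY_PATTERN.search(listing_title) is not None


def robots_meta(listing_title: str) -> str:
    """Return the robots meta tag to print in the listing page's <head>."""
    if should_noindex(listing_title):
        return '<meta name="robots" content="noindex, follow">'
    return '<meta name="robots" content="index, follow">'


print(robots_meta("PHP Jobs in Greater Manchester"))
print(robots_meta("Senior PHP Developer, Castlefield"))
```

How strict to make the pattern is a judgment call: a looser "contains" match catches more cannibalising titles but also no-indexes more listings that might have earned long-tail traffic.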

Solution 3: De-index all your listings

It may seem rash, but if you’re a large site with a huge number of very similar or low-content listings, you might want to consider this. There is no common standard here: some sites, like Indeed, choose to no-index all their job adverts, whereas others, like Craigslist, index all their individual listings because they’ll drive long-tail traffic.

Don’t de-index them all lightly!

3. Constantly expiring content

Our third and final problem is that user-generated content doesn’t last forever. Particularly on listings sites, it’s constantly expiring and changing.

For most use cases I’d recommend 301’ing expired content to a relevant category page, with a message triggered by the redirect notifying the user of why they’ve been redirected. It typically comes out as the best combination of search and UX.
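
As a framework-free sketch of that logic: the listing fields and the `expired=1` flag used to trigger the message are assumptions, not a standard, and if you use a query parameter like this you would want the category page to canonicalise back to its clean URL.

```python
from typing import Optional, Tuple


def response_for_listing(listing: dict) -> Tuple[int, Optional[str]]:
    """Return (status_code, redirect_location) for a listing request.

    `listing` is a hypothetical record with "expired" and "category_url" keys.
    """
    if listing["expired"]:
        # Permanent redirect to the most relevant category page. The flag
        # lets that page show a "this listing has expired" message.
        return 301, f"{listing['category_url']}?expired=1"
    # Live listings are served as normal.
    return 200, None


expired_job = {"expired": True, "category_url": "/jobs/php/greater-manchester"}
print(response_for_listing(expired_job))
```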

For more information or advice on how to deal with the edge cases, there’s a previous Moz blog post on how to deal with expired content which I think does an excellent job of covering this area.

Summary

In summary, if you’re working with listings sites, all three of the following need to be kept in mind:

  • How are the landing pages generated? If they’re generated using free text or facets, have the potential problems been solved?
  • Is user generated content cannibalising the main landing pages?
  • How has constantly expiring content been dealt with?

Good luck listing, and if you’ve come across any other tricky problems or solutions while working on listings sites, let’s chat about them in the comments below!

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!

Reblogged 3 years ago from tracking.feedpress.it

Whats New In SEO Trends 2015 Panel Discussion

2015 can be a record year for your digital marketing efforts. But, Google’s radical changes in 2014 – algorithm updates, Answer Boxes, Knowledge Graph – can be overwhelming to hardworking search…

Reblogged 3 years ago from www.youtube.com

How We Fixed the Internet (Ok, an Answer Box)

Posted by Dr-Pete

Last year, Google expanded the Knowledge Graph to use data extracted (*cough* scraped) from the index to create answer boxes. Back in October, I wrote about a failed experiment. One of my posts, an odd dive into Google’s revenue, was being answer-fied for the query “How much does Google make?”:

Objectively speaking, even I could concede that this wasn’t a very good answer in 2014. I posted it on Twitter, and David Iwanow asked the inevitable question:

Enthusiasm may have gotten the best of us. A few more people got involved (like my former Moz colleague Ruth Burr Reedy), and suddenly we were going to fix this once and for all:

There Was Just One Problem

I updated the post, carefully rewriting the first paragraph to reflect the new reality of Google’s revenue. I did my best to make the change user-friendly, adding valuable information but not disrupting the original post. I did, however, completely replace the old text that Google was scraping.

Within less than a day, Google had re-cached the content, and I just had to wait to see the new answer box. So, I waited, and waited… and waited. Two months later, still no change. Some days, the SERP showed no answer box at all (although I’ve since found these answer boxes are very dynamic), and I was starting to wonder if it was all a mistake.

Then, Something Happened

Last week, months after I had given up, I went to double-check this query for entirely different reasons, and I saw the following:

Google had finally updated the answer box with the new text, and they had even pulled an image from the post. It was a strange choice of images, but in fairness, it was a strange post.

Interestingly, Google also added the publication date of the post, perhaps recognizing that outdated answers aren’t always useful. Unfortunately, this doesn’t reflect the timing of the new content, but that’s understandable – Google doesn’t have easy access to that data.

It’s interesting to note that sometimes Google shows the image, and sometimes they don’t. This seems to be independent of whether the SERP is personalized or incognito. Here’s a capture of the image-free version, along with the #1 organic ranking:

You’ll notice that the #1 result is also my Moz post, and that result has an expanded meta description. So, the same URL is essentially double-dipping this SERP. This isn’t always the case – answers can be extracted from URLs that appear lower on page 1 (although almost always page 1, in my experience). Anecdotally, it’s also not always the case that the organic result ends up getting an expanded meta description.

However, it definitely seems that some of the quality signals driving organic ranking and expanded meta descriptions are also helping Google determine whether a query deserves a direct answer. Put simply, it’s not an accident that this post was chosen to answer this question.

What Does This Mean for You?

Let’s start with the obvious – Yes, the v2 answer boxes (driven by the index, not Freebase/WikiData) can be updated. However, the update cycle is independent of the index’s refresh cycle. In other words, just because a post is re-cached, it doesn’t mean the answer box will update. Presumably, Google is creating a second Knowledge Graph, based on the index, and this data is only periodically updated.

It’s also entirely possible that updating could cause you to lose an answer box, if the new data weren’t a strong match to the question or the quality of the content came into question. Here’s an interesting question – on a query where a competitor has an answer box, could you change your own content enough to either replace them or knock out the answer box altogether? We are currently testing this question, but it may be a few more months before we have any answers.

Another question is what triggers this style of answer box in the first place? Eric Enge has an in-depth look at 850,000 queries that’s well worth your time, and in many cases Google is still triggering on obvious questions (“how”, “what”, “where”, etc.). Nouns that could be interpreted as ambiguous also can trigger the new answer boxes. For example, a search for “ruby” is interpreted by Google as roughly meaning “What is Ruby?”:

This answer box also triggers “Related topics” that use content pulled from other sites but drive users to more Google searches. The small, gray links are the source sites. The much more visible, blue links are more Google searches.

Note that these also have to be questions (explicit or implied) that Google can’t answer with their curated Knowledge Graph (based on sources like Freebase and WikiData). So, for example, the question “When is Mother’s Day?” triggers an older-style answer:

Sites offering this data aren’t going to have a chance to get attribution, because Google essentially already owns the answer to this question as part of their core Knowledge Graph.

Do You Want to Be An Answer?

This is where things get tricky. At this point, we have no clear data on how these answer boxes impact CTR, and it’s likely that the impact depends a great deal on the context. I think we’re facing a certain degree of inevitability – if Google is going to list an answer, better it’s your answer than someone else’s, IMO. On the other hand, what if that answer is so complete that it renders your URL irrelevant? Consider, for example, the SERP for “how to make grilled cheese”:

Sorry, Food Network, but making a grilled cheese sandwich isn’t really that hard, and this answer box doesn’t leave much to the imagination. As these answers get more and more thorough, expect CTRs to fall.

For now, I’d argue that it’s better to have your link in the box than someone else’s, but that’s cold comfort in many cases. These new answer boxes represent what I feel is a dramatic shift in the relationship between Google and webmasters, and they may be tipping the balance. For now, we can’t do much but wait, see, and experiment.

Reblogged 3 years ago from tracking.feedpress.it