Unraveling Panda Patterns

Posted by billslawski

This is my first official blog post at Moz.com, and I’m going to be requesting your help and expertise and imagination.

I’m going to be asking you to take over as Panda for a little while to see if you can identify the kinds of things that Google’s Navneet Panda addressed when faced with what looked like an incomplete patent created to identify sites as parked domain pages, content farm pages, and link farm pages. You’re probably better at this now than he was then.

You’re a subject matter expert.

To put things in perspective, I’m going to include some information about what appears to be the very first Panda patent, and some of Google’s effort behind what they were calling the “high-quality site algorithm.”

I’m going to then include some of the patterns they describe in the patent to identify lower-quality pages, and then describe some of the features I personally would suggest to score and rank a higher-quality site of one type.

Google’s Amit Singhal identified a number of questions about higher quality sites that he might use, and told us in the blog post where he listed those that it was an incomplete list because they didn’t want to make it easy for people to abuse their algorithm.

In my opinion though, any discussion about improving the quality of webpages is one worth having, because it can help improve the quality of the Web for everyone, which Google should be happy to see anyway.

Warning searchers about low-quality content

In “Processing web pages based on content quality,” the original patent filing for Panda, there’s a somewhat mysterious statement that makes it sound as if Google might warn searchers before sending them to a low quality search result, and give them a choice whether or not they might actually click through to such a page.

As it notes, the types of low quality pages the patent was supposed to address included parked domain pages, content farm pages, and link farm pages (yes, link farm pages):

“The processor 260 is configured to receive from a client device (e.g., 110), a request for a web page (e.g., 206). The processor 260 is configured to determine the content quality of the requested web page based on whether the requested web page is a parked web page, a content farm web page, or a link farm web page.

Based on the content quality of the requested web page, the processor is configured to provide for display, a graphical component (e.g., a warning prompt). That is, the processor 260 is configured to provide for display a graphical component (e.g., a warning prompt) if the content quality of the requested web page is at or below a certain threshold.

The graphical component provided for display by the processor 260 includes options to proceed to the requested web page or to proceed to one or more alternate web pages relevant to the request for the web page (e.g., 206). The graphical component may also provide an option to stop proceeding to the requested web page.

The processor 260 is further configured to receive an indication of a selection of an option from the graphical component to proceed to the requested web page, or to proceed to an alternate web page. The processor 260 is further configured to provide for display, based on the received indication, the requested web page or the alternate web page.”

This did not sound like a good idea.

Recently, Google announced in a post on the Google Webmaster Central blog, “Promoting modern websites for modern devices in Google search results,” that they would start providing warning notices on mobile versions of sites if there were issues on those pages that visitors might go to.

I imagine that as a site owner, you might be disappointed to see such a warning notice shown to searchers about technology used on your site possibly not working correctly on a specific device. That recent blog post mentions Flash as an example of a technology that might not work correctly on some devices. For example, we know that Apple’s mobile devices and Flash don’t work well together.

That’s not a bad warning in that it provides enough information to act upon and fix to the benefit of a lot of potential visitors. 🙂

But imagine if you tried to visit your website in 2011, and instead of getting to the site, you received a Google warning that the page you were trying to visit was a content farm page or a link farm page, and it provided alternative pages to visit as well.

That “your website sucks” warning still doesn’t sound like a good idea. One of the inventors listed on the patent is described in LinkedIn as presently working on the Google Play store. The warning for mobile devices might have been something he brought to Google from his work on this Panda patent.

We know that when the Panda Update was released, it was targeting specific types of pages that people at places such as The New York Times were complaining about, such as parked domains and content farm sites. A follow-up from the Times after the algorithm update was released puts it into perspective for us.

It wasn’t easy to know that your pages might have been targeted by that particular Google update either, or if your site was a false positive—and many site owners ended up posting in the Google Help forums after a Google search engineer invited them to post there if they believed that they were targeted by the update when they shouldn’t have been.

The wording of that invitation is interesting in light of the original name of the Panda algorithm. (Note that the thread was broken into multiple threads when Google did a migration of posts to new software, and many appear to have disappeared at some point.)

As we were told in the invite from the Google search engineer:

“According to our metrics, this update improves overall search quality. However, we are interested in hearing feedback from site owners and the community as we continue to refine our algorithms. If you know of a high-quality site that has been negatively affected by this change, please bring it to our attention in this thread.

Note that as this is an algorithmic change we are unable to make manual exceptions, but in cases of high quality content we can pass the examples along to the engineers who will look at them as they work on future iterations and improvements to the algorithm.

So even if you don’t see us responding, know that we’re doing a lot of listening.”

The timing for such in-SERP warnings might have been troublesome. A site that mysteriously stops appearing in search results for queries that it used to rank well for might be said to have strayed from Google’s guidelines. Instead, such a warning might be a little like the purposefully embarrassing “Scarlet A” in Nathaniel Hawthorne’s novel The Scarlet Letter.

A page that shows up in search results with a warning to searchers stating that it was a content farm, or a link farm, or a parked domain probably shouldn’t be ranking well to begin with. Having Google continuing to display those results ranking highly, showing both a link and a warning to those pages, and then diverting searchers to alternative pages might have been more than those site owners could handle. Keep in mind that the fates of those businesses are usually tied to such detoured traffic.

My imagination is filled with the filing of lawsuits against Google based upon such tantalizing warnings, rather than site owners filling up a Google Webmaster Help Forum with information about the circumstances involving their sites being impacted by the upgrade.

In retrospect, it is probably a good idea that the warnings hinted at in the original Panda Patent were avoided.

Google seems to think that such warnings are appropriate now when it comes to multiple devices and technologies that may not work well together, like Flash and iPhones.

But there were still issues with how well or how poorly the algorithm described in the patent might work.

In the March 2011 interview with Google’s Head of Search Quality, Amit Singhal, and his team member and Head of Web Spam at Google, Matt Cutts, titled “TED 2011: The ‘Panda’ That Hates Farms: A Q&A With Google’s Top Search Engineers,” we learned that “Panda” was the code name Google claimed to be using for the algorithm update, after an engineer with that name came along and provided suggestions on patterns that could be used by the patent to identify high- and low-quality pages.

His input seems to have been pretty impactful—enough for Google to have changed the name of the update, from the “High Quality Site Algorithm” to the “Panda” update.

How the High-Quality Site Algorithm became Panda

Danny Sullivan named the update the “Farmer update” since it supposedly targeted content farm web sites. Soon afterwards the joint interview with Singhal and Cutts identified the Panda codename, and that’s what it’s been called ever since.

Google didn’t completely abandon the name found in the original patent, the “high quality sites algorithm,” as can be seen in the titles of these Google Blog posts:

The most interesting of those is the “more guidance” post, in which Amit Singhal lists 23 questions about things Google might look for on a page to determine whether or not it was high-quality. I’ve spent a lot of time since then looking at those questions thinking of features on a page that might convey quality.

The original patent is at:

Processing web pages based on content quality
Inventors: Brandon Bilinski and Stephen Kirkham
Assigned to Google

US Patent 8,775,924

Granted July 8, 2014

Filed: March 9, 2012

Abstract

“Computer-implemented methods of processing web pages based on content quality are provided. In one aspect, a method includes receiving a request for a web page.

The method includes determining the content quality of the requested web page based on whether it is a parked web page, a content farm web page, or a link farm web page. The method includes providing for display, based on the content quality of the requested web page, a graphical component providing options to proceed to the requested web page or to an alternate web page relevant to the request for the web page.

The method includes receiving an indication of a selection of an option from the graphical component to proceed to the requested web page or to an alternate web page. The method further includes providing, based on the received indication, the requested web page or an alternate web page.”

The patent expands on what are examples of low-quality web pages, including:

  • Parked web pages
  • Content farm web pages
  • Link farm web pages
  • Default pages
  • Pages that do not offer useful content, and/or pages that contain advertisements and little else

An invitation to crowdsource high-quality patterns

This is the section I mentioned above where I am asking for your help. You don’t have to publish your thoughts on how quality might be identified, but I’m going to start with some examples.

Under the patent, a content quality value score is calculated for every page on a website based upon patterns found on known low-quality pages, “such as parked web pages, content farm web pages, and/or link farm web pages.”

For each of the patterns identified on a page, the content quality value of the page might be reduced based upon the presence of that particular pattern—and each pattern might be weighted differently.
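To make the weighting idea concrete, here is a minimal Python sketch of how such a score might be computed. The pattern names, the weights, the starting score of 1.0, and the 0.5 threshold are all assumptions of mine for illustration; the patent does not publish actual values.

```python
# Hypothetical weights: how much each detected pattern lowers the score.
# None of these numbers come from the patent.
PATTERN_WEIGHTS = {
    "parking_service_reference": 0.5,
    "ad_network_reference": 0.2,
    "for_sale_text": 0.4,
    "mostly_links": 0.3,
}


def content_quality_value(detected_patterns, base_score=1.0):
    """Start from base_score and subtract the weight of each detected pattern."""
    score = base_score
    for pattern in detected_patterns:
        score -= PATTERN_WEIGHTS.get(pattern, 0.0)
    # Clamp so the score never drops below zero.
    return max(score, 0.0)


def is_low_quality(score, threshold=0.5):
    """The patent describes acting when the value is at or below a threshold."""
    return score <= threshold
```

A page showing only an ad network reference would keep most of its score here, while a page showing several patterns at once would fall to the floor.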

Some simple patterns that might be applied to a low-quality web page might be one or more references to:

  • A known advertising network,
  • A web page parking service, and/or
  • A content farm provider

One of these references may be in the form of an IP address that the destination hostname resolves to, a Domain Name Server (“DNS server”) that the destination domain name is pointing to, an “a href” attribute on the destination page, and/or an “img src” attribute on the destination page.

That’s a pretty simple pattern, but a web page resolving to an IP address known to exclusively serve parked web pages provided by a particular Internet domain registrar can be deemed a parked web page, so it can be pretty effective.

A web page with a DNS server known to be associated with web pages that contain little or no content other than advertisements may very well provide little or no content other than advertising. So that one can be effective, too.
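The “a href” and “img src” part of that pattern is easy to sketch with Python’s standard library. The hostnames below are placeholders I made up, and the IP address and DNS server checks are left out because they need live lookups; treat this as one possible reading of the pattern rather than what Google runs.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical hostnames standing in for a known ad network and parking service.
KNOWN_LOW_QUALITY_HOSTS = {"ads.example-network.test", "parking.example-registrar.test"}


class ReferenceCollector(HTMLParser):
    """Collects the hostnames found in <a href> and <img src> attributes."""

    def __init__(self):
        super().__init__()
        self.hosts = set()

    def handle_starttag(self, tag, attrs):
        wanted = {"a": "href", "img": "src"}.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value:
                host = urlparse(value).hostname
                if host:  # relative URLs have no hostname and are skipped
                    self.hosts.add(host)


def flagged_references(html):
    """Return the referenced hosts that appear on the known low-quality list."""
    collector = ReferenceCollector()
    collector.feed(html)
    collector.close()
    return collector.hosts & KNOWN_LOW_QUALITY_HOSTS
```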

Some of the patterns listed in the patent don’t seem quite as useful or informative. One example is the pattern stating that a web page containing a common typographical error of a bona fide domain name may likely be a low-quality web page, or a non-existent web page. I’ve seen more than a couple of legitimate sites with common misspellings of good domains, so I’m not too sure how helpful that pattern is.

Of course, the patent tells us that some textual content is a dead giveaway, such as pages carrying terms like “domain is for sale,” “buy this domain,” and/or “this page is parked.”
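A check for those phrases can be very small. The three phrases are the ones quoted from the patent; the case-insensitive substring match is my own simplification.

```python
PARKED_PHRASES = ("domain is for sale", "buy this domain", "this page is parked")


def has_parked_phrase(page_text):
    """True if the visible text contains one of the phrases quoted from the patent."""
    text = page_text.lower()
    return any(phrase in text for phrase in PARKED_PHRASES)
```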

Likewise, a web page with little or no content is probably (but not always) a low-quality web page.

This is a simple but effective pattern, even if not too imaginative:

… page providing 99% hyperlinks and 1% plain text is more likely to be a low-quality web page than a web page providing 50% hyperlinks and 50% plain text.
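Here is one way that ratio might be measured, sketched in Python. The patent does not say whether the percentages refer to characters, words, or something else, so counting visible characters is an assumption, and this simple version also counts script and style text as plain text.

```python
from html.parser import HTMLParser


class LinkTextCounter(HTMLParser):
    """Counts visible characters inside and outside <a> elements."""

    def __init__(self):
        super().__init__()
        self.link_depth = 0
        self.link_chars = 0
        self.plain_chars = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.link_depth += 1

    def handle_endtag(self, tag):
        if tag == "a" and self.link_depth > 0:
            self.link_depth -= 1

    def handle_data(self, data):
        count = len(data.strip())
        if self.link_depth > 0:
            self.link_chars += count
        else:
            self.plain_chars += count


def hyperlink_share(html):
    """Fraction of visible characters that sit inside links (0.0 for an empty page)."""
    counter = LinkTextCounter()
    counter.feed(html)
    counter.close()
    total = counter.link_chars + counter.plain_chars
    return counter.link_chars / total if total else 0.0
```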

Another pattern is one that I often check upon and address in site audits, and it involves how functional and responsive pages on a site are.

The determination of whether a web site is full functional may be based on an HTTP response code, information received from a DNS server (e.g., hostname records), and/or a lack of a response within a certain amount of time. As an example, an HTTP response that is anything other than 200 (e.g., “404 Not Found”) would indicate that a web site is not fully functional.

As another example, a DNS server that does not return authoritative records for a hostname would indicate that the web site is not fully functional. Similarly, a lack of a response within a certain amount of time, from the IP address of the hostname for a web site would indicate that the web site is not fully functional.
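Those three signals can be combined in a few lines. This sketch assumes the measurements were collected elsewhere (by a crawler or a site audit tool), and the 10-second timeout is a number I picked; the patent only says “a certain amount of time.”

```python
def is_fully_functional(status_code, has_authoritative_dns, response_seconds, timeout_seconds=10.0):
    """Apply the three signals quoted above to measurements collected elsewhere.

    status_code: HTTP status as an int, or None if no response arrived.
    has_authoritative_dns: whether a DNS server returned authoritative records.
    response_seconds: time to respond, or None if there was no response.
    """
    if not has_authoritative_dns:
        return False
    if response_seconds is None or response_seconds > timeout_seconds:
        return False
    # The quoted passage treats anything other than 200 as not fully functional.
    return status_code == 200
```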

As for user-data, sometimes it might play a role as well, as the patent tells us:

A web page may be suggested for review and/or its content quality value may be adapted based on the amount of time spent on that page.

For example, if a user reaches a web page and then leaves immediately, the brief nature of the visit may cause the content quality value of that page to be reviewed and/or reduced. The amount of time spent on a particular web page may be determined through a variety of approaches. For example, web requests for web pages may be used to determine the amount of time spent on a particular web page.”
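One plausible reading of “web requests may be used to determine the amount of time spent” is to measure the gap between one request and the next from the same visitor. The sketch below does that; the log format and the 5-second cutoff for a “brief” visit are hypothetical.

```python
def time_on_pages(requests):
    """Estimate seconds spent per page from one visitor's ordered web requests.

    requests: list of (timestamp_seconds, url) tuples sorted by time.
    The last page has no following request, so its duration is unknown and omitted.
    """
    durations = []
    for (start, url), (next_start, _next_url) in zip(requests, requests[1:]):
        durations.append((url, next_start - start))
    return durations


def quick_exits(requests, min_seconds=5):
    """Pages the visitor left in under min_seconds (a hypothetical cutoff)."""
    return [url for url, seconds in time_on_pages(requests) if seconds < min_seconds]
```

Note the blind spot: a visitor who lands on a page and then leaves the site entirely produces no second request, so this approach cannot time that visit on its own.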

My example of some patterns for an e-commerce website

There are a lot of things that you might want to include on an ecommerce site that help to indicate that it’s high quality. If you look at the questions that Amit Singhal raised in the last Google Blog post I mentioned above, one of his questions was “Would you be comfortable giving your credit card information to this site?” Patterns that might fit with this question could include:

  • Is there a privacy policy linked to on pages of the site?
  • Is there a “terms of service” page linked to on pages of the site?
  • Is there a “customer service” page or section linked to on pages of the site?
  • Do ordering forms function fully on the site? Do they return 404 pages or 500 server errors?
  • If an order is made, does a thank-you or acknowledgement page show up?
  • Does the site use an https protocol when sending data or personally identifiable data (like a credit card number)?
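If you wanted to audit a page against a checklist like the one above, it might look something like this in Python. Matching links by their label text is a simplifying assumption, and the inputs (link texts, form action URL, order form status code) are presumed to come from a crawler you already have.

```python
from urllib.parse import urlparse

# Link labels to look for; matching on label text is a simplifying assumption.
REQUIRED_LINK_LABELS = ("privacy policy", "terms of service", "customer service")


def trust_signals(link_texts, form_action_url, order_status_code):
    """Return a dict of pass/fail checks for one page of an e-commerce site."""
    labels = [text.strip().lower() for text in link_texts]
    checks = {label: any(label in text for text in labels) for label in REQUIRED_LINK_LABELS}
    checks["order form submits over https"] = urlparse(form_action_url).scheme == "https"
    # 404 and 500 responses from the ordering form count as failures.
    checks["order form responds"] = order_status_code == 200
    return checks
```

Each failed check could then feed a weighted score, with the weights left to whoever is doing the scoring.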

As I mentioned above, the patent tells us that a high-quality content score for a page might be different from one pattern to another.

The questions from Amit Singhal imply a lot of other patterns, but as SEOs who work on and build and improve a lot of websites, this is an area where we probably have more expertise than Google’s search engineers.

What other questions would you ask if you were tasked with looking at this original Panda Patent? What patterns would you suggest looking for when trying to identify high or low quality pages?  Perhaps if we share with one another patterns or features on a site that Google might look for algorithmically, we could build pages that might not be interpreted by Google as being a low quality site. I provided a few patterns for an ecommerce site above. What patterns would you suggest?

(Illustrations: Devin Holmes @DevinGoFish)


Your Google Algorithm Cheat Sheet: Panda, Penguin, and Hummingbird

Posted by MarieHaynes

If you’re reading the Moz blog, then you probably have a decent understanding of Google and its algorithm changes. However, there is probably a good percentage of the Moz audience that is still confused about the effects that Panda, Penguin, and Hummingbird can have on your site. I did write a post last year about the main differences between Penguin and a Manual Unnatural Links Penalty, and if you haven’t read that, it’ll give you a good primer.

The point of this article is to explain very simply what each of these algorithms is meant to do. It is hopefully a good reference that you can point your clients to if you want to explain an algorithm change and not overwhelm them with technical details about 301s, canonicals, crawl errors, and other confusing SEO terminology.

What is an algorithm change?

First of all, let’s start by discussing the Google algorithm. It’s immensely complicated and continues to get more complicated as Google tries its best to provide searchers with the information that they need. When search engines were first created, early search marketers were able to easily find ways to make the search engine think that their client’s site was the one that should rank well. In some cases it was as simple as putting in some code on the website called a meta keywords tag. The meta keywords tag would tell search engines what the page was about.

As Google evolved, its engineers, who were primarily focused on making the search engine results as relevant to users as possible, continued to work on ways to stop people from cheating, and looked at other ways to show the most relevant pages at the top of their searches. The algorithm now looks at hundreds of different factors. There are some that we know are significant, such as having a good descriptive title (between the <title></title> tags in the code). And there are many that are the subject of speculation, such as whether or not Google +1’s contribute to a site’s rankings.

In the past, the Google algorithm would change very infrequently. If your site was sitting at #1 for a certain keyword, it was guaranteed to stay there until the next update which might not happen for weeks or months. Then, they would push out another update and things would change. They would stay that way until the next update happened. If you’re interested in reading about how Google used to push updates out of its index, you may find this Webmaster World forum thread from 2002 interesting. (Many thanks to Paul Macnamara for explaining to me how algo changes used to work on Google in the past and pointing me to the Webmaster World thread.)

This all changed with the launch of “Caffeine” in 2010. Since Caffeine launched, the search engine results have been changing several times a day rather than every few weeks. Google makes over 600 changes to its algorithm in a year, and the vast majority of these are not announced. But, when Google makes a really big change, they give it a name, usually make an announcement, and everyone in the SEO world goes crazy trying to figure out how to understand the changes and use them to their advantage.

Three of the biggest changes that have happened in the last few years are the Panda algorithm, the Penguin algorithm and Hummingbird.

What is the Panda algorithm?

Panda first launched on February 23, 2011. It was a big deal. The purpose of Panda was to try to show high-quality sites higher in search results and demote sites that may be of lower quality. This algorithm change was unnamed when it first came out, and many of us called it the “Farmer” update as it seemed to affect content farms. (Content farms are sites that aggregate information from many sources, often stealing that information from other sites, in order to create large numbers of pages with the sole purpose of ranking well in Google for many different keywords.) However, it affected a very large number of sites. The algorithm change was eventually officially named after one of its creators, Navneet Panda.

When Panda first happened, a lot of SEOs in forums thought that this algorithm was targeting sites with unnatural backlink patterns. However, it turns out that links are most likely not a part of the Panda algorithm. It is all about on-site quality.

In most cases, sites that were affected by Panda were hit quite hard. But, I have also seen sites that have taken a slight loss on the date of a Panda update. Panda tends to be a site-wide issue which means that it doesn’t just demote certain pages of your site in the search engine results, but instead, Google considers the entire site to be of lower quality. In some cases though Panda can affect just a section of a site such as a news blog or one particular subdomain.

Whenever a Google employee is asked about what needs to be done to recover from Panda, they refer to a blog post by Google employee Amit Singhal that gives a checklist that you can use on your site to determine if your site really is high quality or not. Here is the list:

  • Would you trust the information presented in this article?
  • Is this article written by an expert or enthusiast who knows the topic well, or is it more shallow in nature?
  • Does the site have duplicate, overlapping, or redundant articles on the same or similar topics with slightly different keyword variations?
  • Would you be comfortable giving your credit card information to this site?
  • Does this article have spelling, stylistic, or factual errors?
  • Are the topics driven by genuine interests of readers of the site, or does the site generate content by attempting to guess what might rank well in search engines?
  • Does the article provide original content or information, original reporting, original research, or original analysis?
  • Does the page provide substantial value when compared to other pages in search results?
  • How much quality control is done on content?
  • Does the article describe both sides of a story?
  • Is the site a recognized authority on its topic?
  • Is the content mass-produced by or outsourced to a large number of creators, or spread across a large network of sites, so that individual pages or sites don’t get as much attention or care?
  • Was the article edited well, or does it appear sloppy or hastily produced?
  • For a health related query, would you trust information from this site?
  • Would you recognize this site as an authoritative source when mentioned by name?
  • Does this article provide a complete or comprehensive description of the topic?
  • Does this article contain insightful analysis or interesting information that is beyond obvious?
  • Is this the sort of page you’d want to bookmark, share with a friend, or recommend?
  • Does this article have an excessive amount of ads that distract from or interfere with the main content?
  • Would you expect to see this article in a printed magazine, encyclopedia or book?
  • Are the articles short, unsubstantial, or otherwise lacking in helpful specifics?
  • Are the pages produced with great care and attention to detail vs. less attention to detail?
  • Would users complain when they see pages from this site?

Phew! That list is pretty overwhelming! These questions do not necessarily mean that Google tries to algorithmically figure out whether your articles are interesting or whether you have told both sides of a story. Rather, the questions are there because all of these factors can contribute to how real-life users would rate the quality of your site. No one really knows all of the factors that Google uses in determining the quality of your site through the eyes of Panda. Ultimately though, the focus is on creating the best site possible for your users.  It is also important that only your best stuff is given to Google to have in its index. There are a few factors that are widely accepted as important things to look at in regards to Panda:

Thin content

A “thin” page is a page that adds little or no value to someone who is reading it. It doesn’t necessarily mean that a page has to be a certain number of words, but quite often, pages with very few words are not super-helpful. If you have a large number of pages on your site that contain just one or two sentences and those pages are all included in the Google index, then the Panda algorithm may determine that the majority of your indexed pages are of low quality.

Having the odd thin page is not going to cause you to run in to Panda problems. But, if a big enough portion of your site contains pages that are not helpful to users, then that is not good.

Duplicate content

There are several ways that duplicate content can cause your site to be viewed as a low-quality site by the Panda algorithm. The first is when a site has a large amount of content that is copied from other sources on the web. Let’s say that you have a blog on your site and you populate that blog with articles that are taken from other sources. Google is pretty good at figuring out that you are not the creator of this content. If the algorithm can see that a large portion of your site is made up of content that exists on other sites then this can cause Panda to look at you unfavorably.

You can also run into problems with duplicated content on your own site. One example would be for a site that has a large number of products for sale. Perhaps each product has a separate page for each color variation and size. But, all of these pages are essentially the same. If one product comes in 20 different colors and each of those come in 6 different sizes, then that means that you have 120 pages for the same product, all of which are almost identical. Now, imagine that you sell 4,000 products. This means that you’ve got almost half a million pages in the Google index when really 4,000 pages would suffice. In this type of situation, the fix for this problem is to use something called a canonical tag. Moz has got a really good guide on using canonical tags here, and Dr. Pete has also written this great article on canonical tag use.
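The arithmetic in that example, and what a canonical tag actually looks like, can be shown in a few lines of Python. The shop URL is a made-up placeholder; the idea is that every color and size variant carries the same tag pointing at the one main product page.

```python
colors, sizes, products = 20, 6, 4000

pages_per_product = colors * sizes                   # 120 near-identical variant pages
total_variant_pages = pages_per_product * products   # 480,000 pages in total


def canonical_tag(product_url):
    """Build the <link> element that each variant page would carry in its <head>."""
    return f'<link rel="canonical" href="{product_url}">'
```

So the red, size-small page and the blue, size-large page for the same product would both include the tag for the same main product URL.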

Low-quality content

When I write an article and publish it on one of my websites, the only type of information that I want to present to Google is information that is the absolute best of its kind. In the past, many SEOs have given advice to site owners saying that it was important to blog every day and make sure that you are always adding content for Google to index. But, if what you are producing is not high quality content, then you could be doing more harm than good. A lot of Amit Singhal’s questions listed above are asking whether the content on your site is valuable to readers. Let’s say that I have an SEO blog and every day I take a short blurb from each of the interesting SEO articles that I have read online and publish it as a blog post on my site. Is Google going to want to show searchers my summary of these articles, or would they rather show them the actual articles? Of course my summary is not going to be as valuable as the real thing! Now, let’s say that I have done this every day for 4 years. Now my site has over 4,000 pages that contain information that is not unique and not as valuable as other sites on the same topics.

Here is another example. Let’s say that I am a plumber. I’ve been told that I should blog regularly, so several times a week I write a 2-3 paragraph article on things like, “How to fix a leaky faucet” or “How to unclog a toilet.” But, I’m busy and don’t have much time to put into my website so each article I’ve written contains keywords in the title and a few times in the content, but the content is not in depth and is not that helpful to readers. If the majority of the pages on my site contain information that no one is engaging with, then this can be a sign of low quality in the eyes of the Panda algorithm.

There are other factors that probably play a role in the Panda algorithm. Glenn Gabe recently wrote an excellent article on his evaluation of sites affected by the most recent Panda update. His bullet point list of things to improve upon when affected by Panda is extremely thorough.

How to recover from a Panda hit

Google refreshes the Panda algorithm approximately monthly. They used to announce whenever they were refreshing the algorithm, but now they only do this if there is a really big change to the Panda algorithm. What happens when the Panda algorithm refreshes is that Google takes a new look at each site on the web and determines whether or not it looks like a quality site in regards to the criteria that the Panda algorithm looks at. If your site was adversely affected by Panda and you have made changes such as removing thin and duplicate content then, when Panda refreshes, you should see that things improve. However, for some sites it can take a couple of Panda refreshes to see the full extent of the improvements. This is because it can sometimes take several months for Google to revisit all of your pages and recognize the changes that you have made.

Every now and then, instead of just refreshing the algorithm, Google does what they call an update. When an update happens, this means that Google has changed the criteria that they use to determine what is and isn’t considered high quality. On May 20, 2014, Google did a major update which they called Panda 4.0. This caused a lot of sites to see significant changes in regards to Panda:

Not all Panda recoveries are as dramatic as this one. But, if you have been affected by Panda and you work hard to make changes to your site, you really should see some improvement.

What is the Penguin algorithm?


The Penguin algorithm initially rolled out on April 24, 2012. The goal of Penguin is to reduce the trust that Google has in sites that have cheated by creating unnatural backlinks in order to gain an advantage in the Google results. While the primary focus of Penguin is on unnatural links, there can be other factors that can affect a site in the eyes of Penguin as well. Links, though, are known to be by far the most important thing to look at.

Why are links important?

A link is like a vote for your site. If a well respected site links to your site, then this is a recommendation for your site. If a small, unknown site links to you then this vote is not going to count for as much as a vote from an authoritative site. Still, if you can get a large number of these small votes, they really can make a difference. This is why, in the past, SEOs would try to get as many links as they could from any possible source.

Another thing that is important in the Google algorithms is anchor text. Anchor text is the text that is underlined in a link. So, in this link to a great SEO blog, the anchor text would be “SEO blog.” If Moz.com gets a number of sites linking to them using the anchor text “SEO blog,” that is a hint to Google that people searching for “SEO blog” probably want to see sites like Moz in their search results.

It’s not hard to see how people could manipulate this part of the algorithm. Let’s say that I am doing SEO for a landscaping company in Orlando. In the past, one of the ways that I could cheat the algorithm into thinking that my company should be ranked highly would be to create a bunch of self-made links and use anchor text in these links that contains phrases like “Orlando Landscaping Company,” “Landscapers in Orlando,” and “Orlando Landscaping.” While an authoritative link from a well respected site is good, what people discovered is that creating a large number of links from low quality sites was quite effective. As such, what SEOs would do is create links from easy to get places like directory listings, self-made articles, and links in comments and forum posts.
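To make the anchor text idea a bit more concrete, here is a minimal Python sketch that counts how often each anchor text appears in a list of backlinks. The list of anchors and the function name `anchor_text_shares` are made up for illustration, and this is not how Google or Penguin evaluates links (we don’t know that). It is just one rough way to eyeball whether keyword-rich anchors dominate a link profile.

```python
from collections import Counter


def anchor_text_shares(anchors):
    """Return (anchor text, share of all links) pairs, most common first.

    Anchor texts are lowercased and trimmed so that "Orlando Landscaping"
    and "orlando landscaping " are counted together.
    """
    normalized = [a.strip().lower() for a in anchors if a.strip()]
    counts = Counter(normalized)
    total = len(normalized)
    return [(text, count / total) for text, count in counts.most_common()]


# Hypothetical anchor texts pulled from a backlink report.
anchors = [
    "Orlando Landscaping Company",
    "orlando landscaping company",
    "Landscapers in Orlando",
    "Orlando Landscaping",
    "Orlando Landscaping Company",
    "www.example.com",
    "click here",
    "Example Landscaping Inc.",
]

shares = anchor_text_shares(anchors)
for text, share in shares:
    print(f"{share:.0%}  {text}")
```

A natural link profile usually contains plenty of brand names, bare URLs and generic phrases like “click here,” so a report where a handful of keyword phrases make up most of the links is worth a closer look.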

While we don’t know exactly what factors the Penguin algorithm looks at, what we do know is that this type of low quality, self-made link is what the algorithm is trying to detect. In my mind, the Penguin algorithm is sort of like Google putting a “trust factor” on your links. I used to tell people that Penguin could affect a site on a page or even a keyword level, but Google employee John Mueller has said several times now that Penguin is a sitewide algorithm. This means that if the Penguin algorithm determines that a large number of the links to your site are untrustworthy, then this reduces Google’s trust in your entire site. As such, the whole site will see a reduction in rankings.

While Penguin affected a lot of sites drastically, I have seen many sites that saw only a small reduction in rankings. The difference, of course, depends on the amount of link manipulation that has been done.

How to recover from a Penguin hit?

Penguin is a filter just like Panda. What that means is that the algorithm is re-run periodically and sites are re-evaluated with each re-run. At this point it is not run very often at all. The last update was October 4, 2013, which means that we have currently been waiting eight months for a new Penguin update. In order to recover from Penguin, you need to identify the unnatural links pointing to your site and either remove them or, if you can’t remove them, ask Google to no longer count them by using the disavow tool. Then, the next time that Penguin refreshes or updates, if you have done a good enough job at cleaning up your unnatural links, you will once again regain trust in Google’s eyes. In some cases, it can take a couple of refreshes for a site to completely escape Penguin because it can take up to 6 months for all of a site’s disavow file to be completely processed.
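For reference, a disavow file is just a plain text file with one entry per line: either a full URL, or `domain:` followed by a domain name to disavow every link from that domain. Lines starting with `#` are comments. Here is a minimal Python sketch that builds such a file. The function name `build_disavow_file` and the domains and URLs below are made up for illustration, so treat this as a sketch and not as an official tool.

```python
def build_disavow_file(domains, urls, note=""):
    """Build the text of a disavow file.

    domains: domains to disavow entirely (written as "domain:example.com")
    urls:    individual page URLs to disavow
    note:    optional comment written on the first line
    """
    lines = []
    if note:
        lines.append(f"# {note}")
    # Remove duplicates and sort so the file is easy to review by hand.
    for domain in sorted(set(domains)):
        lines.append(f"domain:{domain}")
    for url in sorted(set(urls)):
        lines.append(url)
    return "\n".join(lines) + "\n"


# Hypothetical examples of links we could not get removed.
disavow_text = build_disavow_file(
    domains=["spammy-directory.example", "cheap-links.example"],
    urls=["http://article-site.example/orlando-landscaping.html"],
    note="Asked site owners to remove these links, no response",
)
print(disavow_text)
```

The resulting text would be saved as a `.txt` file and uploaded through the disavow tool. The code is the easy part. Deciding which links belong in the file is where the real work is.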

If you are not certain how to identify which links to your site are unnatural, here are some good resources for you:

The disavow tool is something that you probably should only be using if you really understand how it works. It is possible to do more harm than good to your site if you disavow the wrong links. Here is some information on using the disavow tool:

It’s important to note that when sites “recover” from Penguin, they often don’t skyrocket up to top rankings once again as those previously high rankings were probably based on the power of links that are now considered unnatural. Here is some information on what to expect when you have recovered from a link based penalty or algorithmic issue.

Also, the Penguin algorithm is not the same thing as a manual unnatural links penalty. You do not need to file a reconsideration request to recover from Penguin. You also do not need to document the work that you have done in order to get links removed as no Google employee will be manually reviewing your work. As mentioned previously, here is more information on the difference between the Penguin algorithm and a manual unnatural links penalty.

What is Hummingbird?

Hummingbird is a completely different animal than Penguin or Panda. (Yeah, I know…that was a bad pun.) I will commonly get people emailing me telling me that Hummingbird destroyed their rankings. I would say that in almost every case that I have evaluated, this was not true. Google made their announcement about Hummingbird on September 26, 2013. However, at that time, they announced that Hummingbird had already been live for about a month. If the Hummingbird algorithm was truly responsible for catastrophic ranking fluctuations, then we really should have seen an outcry from the SEO world about something drastic happening in August of 2013, and this did not happen. There did seem to be some type of fluctuation that happened around August 21 as reported here on Search Engine Round Table, but there were not many sites that reported huge ranking changes on that day.

If you think that Hummingbird affected you, it’s not a bad idea to look at your traffic to see if you noticed a drop on October 4, 2013 which was actually a refresh of the Penguin algorithm. I believe that a lot of people who thought that they were affected by Hummingbird were actually affected by Penguin which happened just a week after Google made their announcement about Hummingbird.
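If you can export daily visit counts from your analytics, a quick before-and-after comparison around a known date can help here. The following is a minimal Python sketch. The visit numbers are invented, and the function names `average_visits` and `change_around` are just for illustration. It compares the average of the seven days before a date with the seven days starting on that date, so that each side covers a full week and weekday patterns don’t skew the result.

```python
from datetime import date, timedelta


def average_visits(daily_visits, start, days):
    """Mean visits over `days` consecutive days beginning at `start`.

    Days missing from the data are skipped.
    """
    values = []
    for offset in range(days):
        day = start + timedelta(days=offset)
        if day in daily_visits:
            values.append(daily_visits[day])
    return sum(values) / len(values) if values else 0.0


def change_around(daily_visits, event, window=7):
    """Relative change in average daily visits from before `event` to after it."""
    before = average_visits(daily_visits, event - timedelta(days=window), window)
    after = average_visits(daily_visits, event, window)
    return (after - before) / before if before else 0.0


# Hypothetical data: about 1000 visits a day before the refresh, 600 after.
penguin_refresh = date(2013, 10, 4)
visits = {}
for i in range(1, 8):
    visits[penguin_refresh - timedelta(days=i)] = 1000
for i in range(0, 7):
    visits[penguin_refresh + timedelta(days=i)] = 600

change = change_around(visits, penguin_refresh)
print(f"Change around {penguin_refresh}: {change:.0%}")
```

A sharp drop that lines up with October 4, 2013 points toward Penguin rather than Hummingbird. A gradual decline over August and September would need a different explanation.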

There are some excellent articles on Hummingbird here and here. Hummingbird was a complete overhaul of the entire Google algorithm. As Danny Sullivan put it, if you consider the Google algorithm as an engine, Panda and Penguin are algorithm changes that were like putting a new part in the engine such as a filter or a fuel pump. But, Hummingbird wasn’t just a new part; it was a completely new engine. That new engine still makes use of many of the old parts (such as Panda and Penguin) but a good amount of the engine is completely original.

The goal of the Hummingbird algorithm is for Google to better understand a user’s query. Bill Slawski, who writes about Google patents, has a great example of this in his post here. He explains that when someone searches for “What is the best place to find and eat Chicago deep dish style pizza?”, Hummingbird is able to discern that by “place” the user likely would be interested in results that show “restaurants”. There is speculation that these changes were necessary in order for Google’s voice search to be more effective. When we’re typing a search query, we might type, “best Seattle SEO company” but when we’re speaking a query (e.g. via Google Glass or via Google Now) we’re more likely to say something like, “Which firm in Seattle offers the best SEO services?” The point of Hummingbird is to better understand what users mean when they have queries like this.

So how do I recover or improve in the eyes of Hummingbird?

If you read the posts referenced above, the answer to this question is essentially to create content that answers users’ queries rather than just trying to rank for a particular keyword. But really, this is what you should already be doing!

It appears that Google’s goal with all of these algorithm changes (Panda, Penguin and Hummingbird) is to encourage webmasters to publish content that is the best of its kind. Google’s goal is to deliver answers to people who are searching. If you can produce content that answers people’s questions, then you’re on the right track.

I know that that is a really vague answer when it comes to “recovering” from Hummingbird. Hummingbird really is different than Panda and Penguin. When a site has been demoted by the Panda or Penguin algorithm, it’s because Google has lost some trust in the site’s quality, whether it is on-site quality or the legitimacy of its backlinks. If you fix those quality issues you can regain the algorithm’s trust and subsequently see improvements. But, if your site seems to be doing poorly since the launch of Hummingbird, then there really isn’t a way to recover those keyword rankings that you once held. You can, however, get new traffic by finding ways to be more thorough and complete in what your website offers.

Do you have more questions?

My goal in writing this article was to have a resource to point people to when they had basic questions about Panda, Penguin and Hummingbird. Recently, when I published my penalty newsletter, I had a small business owner comment that it was very interesting but that most of it went over their head. I realized that many people outside of the SEO world are greatly affected by these algorithm changes, but don’t have much information on why they have affected their website.

Do you have more questions about Panda, Penguin or Hummingbird? If so, I’d be happy to address them in the comments. I also would love for those of you who are experienced with dealing with websites affected by these issues to comment as well.
