Becoming Better SEO Scientists – Whiteboard Friday

Posted by MarkTraphagen

Editor’s note: Today we’re featuring back-to-back episodes of Whiteboard Friday from our friends at Stone Temple Consulting. Make sure to also check out the second episode, “UX, Content Quality, and SEO” from Eric Enge.

Like many other areas of marketing, SEO incorporates elements of science. It becomes problematic for everyone, though, when theories that haven’t been the subject of real scientific rigor are passed off as proven facts. In today’s Whiteboard Friday, Stone Temple Consulting’s Mark Traphagen is here to teach us a thing or two about the scientific method and how it can be applied to our day-to-day work.

For reference, here’s a still of this week’s whiteboard.

Video transcription

Howdy, Mozzers. Mark Traphagen from Stone Temple Consulting here today to share with you how to become a better SEO scientist. We know that SEO is a science in a lot of ways, and everything I’m going to say today applies not only to SEO but also to testing things like your AdWords campaigns and quality scores. There are a lot of different applications you can make in marketing, but we’ll focus on the SEO world because that’s where we do a lot of testing. What I want to talk to you about today is how that really is a science and how we need to bring better science into it to get better results.

Here’s why. In astrophysics there’s something they’re talking about these days called dark matter, and dark matter is something we know is there. It’s pretty much accepted that it’s there. We can’t see it. We can’t measure it directly. We don’t even know what it is. We can’t even imagine what it is yet, and yet we know it’s there because we see its effect on things like gravity and mass. Its effects are everywhere. And that’s a lot like search engines, isn’t it? Like Google or Bing: we see the effects, but we don’t see inside the machine. We don’t know exactly what’s happening in there.

An artist’s depiction of how search engines work.

So what do we do? We do experiments. We do tests to try to figure that out, to see the effects, and from the effects outside we can make better guesses about what’s going on inside and do a better job of giving those search engines what they need to connect us with our customers and prospects. That’s the goal in the end.

Now, the problem is there’s a lot of testing going on out there, a lot of experiments that maybe aren’t being run very well. They’re not being run according to scientific principles that have been proven over centuries to get the best possible results.

Basic data science in 10 steps

So today I want to give you just very quickly 10 basic things that a real scientist goes through on their way to trying to give you better data. Let’s see what we can do with those in our SEO testing in the future.

So let’s start with number one. You’ve got to start with a hypothesis. Your hypothesis is the question that you want to solve. You always start with a good question in mind, and it’s got to be relatively narrow. You’ve got to narrow it down to something very specific. Something like how does time on page affect rankings, that’s pretty narrow. That’s very specific. That’s a good question. You might be able to test that. But something like how do social signals affect rankings, that’s too broad. You’ve got to narrow it down. Get it down to one simple question.

Then you choose a variable that you’re going to test. Out of all the things that you could do, that you could play with or you could tweak, you should choose one thing or at least a very few things that you’re going to tweak and say, “When we tweak this, when we change this, when we do this one thing, what happens? Does it change anything out there in the world that we are looking at?” That’s the variable.

The next step is to set a sample group. Where are you going to gather the data from? Where is it going to come from? That’s the world that you’re working in here. Out of all the possible data that’s out there, where are you going to gather your data and how much? That’s the small circle within the big circle. Now even though it’s smaller, you’re probably not going to get all the data in the world. You’re not going to scrape every search ranking that’s possible or visit every URL.

You’ve got to ask yourself, “Is it large enough that we’re at least going to get some validity?” If I wanted to find out what the typical person in Seattle is like and I walked through just one part of the Moz offices here, I’d get some kind of view. But is that a typical, average person from Seattle? I’ve been around here at Moz; probably not. Nor would it be a large enough sample.

Also, it should be randomized as much as possible. Again, going back to that example, if I just stayed here within the walls of Moz and did research about Mozzers, I’d learn a lot about what Mozzers do, what Mozzers think, and how they behave. But that may or may not be applicable to the larger world outside, so you randomize.

Next, we want a control. So we’ve got our sample group; if possible, it’s always good to have another sample group that you don’t do anything to. You do not manipulate the variable in that group. Now, why do you have that? You have it so that you can say, to some extent, that if we saw a change when we manipulated our variable and the same thing didn’t happen in the control group, then more likely it’s not just part of the natural things that happen in the world or in the search engine.

If possible, even better, you want to make that what scientists call double-blind, which means that even you, the experimenter, don’t know which of the SERPs (or whatever it is you’re looking at) belong to the control group. As careful and as honest as you might be, you can end up manipulating the results if you know who is who within the test group. It’s not going to apply to every test that we do in SEO, but it’s a good thing to have in mind as you work on that.

Next, very quickly, duration. How long does it have to be? Is there sufficient time? If you’re just testing something like “if I share a URL to Google+, how quickly does it get indexed in the SERPs,” you might only need a day, because typically it takes less than a day in that case. But if you’re looking at seasonality effects, you might need to go over several years to get a good test.

Let’s move to the second group here. The sixth thing is to keep a clean lab. What that means is try as much as possible to keep out anything that might be dirtying your results, any kind of variables creeping in that you didn’t want to have in the test. That’s hard to do, especially in what we’re testing, but do the best you can to keep out the dirt.

Manipulate only one variable. Out of all the things that you could tweak or change, choose one thing, or a very small set of things. That will give more accuracy to your test. The more variables you change, the more interaction effects you’re going to get that you may not be accounting for, and those will muddy your results.

Make sure you have statistical validity when you go to analyze those results. Now, that’s beyond the scope of this little talk, but you can read up on it. Or even better, if you are able to, hire somebody or work with somebody who is a trained data scientist or has training in statistics, so they can look at your evaluation and tell you whether the correlations or whatever you’re seeing are statistically significant. Very important.

Transparency. As much as possible, share with the world your data set, your full results, your methodology. What did you do? How did you set up the study? That’s going to be important to our last step here, which is replication and falsification, one of the most important parts of any scientific process.

So what you want to invite is: hey, we did this study. We did this test. Here’s what we found. Here’s how we did it. Here’s the data. If other people ask the same question and run the same kind of test, do they get the same results? Even better, if you have some people out there who say, “I don’t think you’re right about that, because I think you missed this, and I’m going to throw this in and see what happens,” and they falsify your result, that might make you feel like you failed, but it’s a success, because in the end what are we after? We’re after the truth about what really works.

Think about your next test, your next experiment that you do. How can you apply these 10 principles to do better testing, get better results, and have better marketing? Thanks.

Video transcription by Speechpad.com


UX, Content Quality, and SEO – Whiteboard Friday

Posted by EricEnge

Editor’s note: Today we’re featuring back-to-back episodes of Whiteboard Friday from our friends at Stone Temple Consulting. Make sure to also check out the first episode, “Becoming Better SEO Scientists” from Mark Traphagen.

User experience and the quality of your content have an incredibly broad impact on your SEO efforts. In this episode of Whiteboard Friday, Stone Temple’s Eric Enge shows you how paying attention to your users can benefit your position in the SERPs.

For reference, here’s a still of this week’s whiteboard.

Video transcription

Hi, Mozzers. I’m Eric Enge, CEO of Stone Temple Consulting. Today I want to talk to you about one of the most underappreciated aspects of SEO, and that is the interaction between user experience, content quality, and your SEO rankings and traffic.

I’m going to take you through a little history first. You know, we all know about the Panda algorithm update that came out on February 23, 2011, and of course more recently we have the search quality update that came out on May 19, 2015. Our Panda friend has had 27 different updates that we know of along the way. So a lot of stuff has gone on, but we need to realize that that is not where it all started.

The link algorithm from the very beginning was about search quality. Links allowed Google to have an algorithm that gave better results than the other search engines of the day, which were dependent on keywords. These things that I’ve just talked about, however, are still just the tip of the iceberg. Google goes a lot deeper than that, and I want to walk you through the different things that it does.

So consider for a moment, you have someone search on the phrase “men’s shoes” and they come to your website.

What is it that they want when they come to your website? Do they want sneakers, sandals, or dress shoes? Well, those are sort of the obvious things that they might want. But you need to think a little bit more about what the user really wants to know before they buy from you.

First of all, there has to be a way to buy. By the way, affiliate sites don’t have ways to buy. So the line of thinking I’m talking about might not work out so well for affiliate sites and works better for people who can actually sell the product directly. But in addition to a way to buy, they might want a privacy policy. They might want to see an About Us page. They might want to be able to see your phone number. These are all different kinds of things that users look for when they arrive on the pages of your site.

So as we think about this, what is it that we can do to do a better job with our websites? Well, first of all, lose the focus on keywords. Don’t get me wrong, keywords haven’t gone away entirely. But the days of pages that overemphasize one particular keyword over another, or over related phrases, are long gone, and you need to take a broader focus in how you approach things.

User experience is now a big deal. You really need to think about how users are interacting with your page and how that reflects your overall page quality. Think about percent satisfaction: if I send a hundred users to your page from my search engine, how many of those users are going to be happy with the content, the products, and everything else they see on your page? You need to think through the big picture. So at the end of the day, this impacts the content on your page to be sure, but more than that, it impacts the design and the related items that you have on the page.

So let me just give you an example of that. I looked at one page recently that was for a flower site. It was a page about annuals on that site, and that page had no link to their perennials page. Well, okay, a fairly good percentage of people who arrive on a page about annuals are also going to want to have perennials as something they might consider buying. So that page was probably coming across as a poor user experience. So these related items concepts are incredibly important.

Then the links on your page are actually a way to get to some of those related items, so those are really important as well. What are the related products that you link to?

Finally, it really impacts everything you do with your page design. You need to move past the old-fashioned way of thinking about SEO and into the era of: How am I doing at satisfying all the people who come to the pages of my site?

Thank you, Mozzers. Have a great day.

Video transcription by Speechpad.com


Eliminate Duplicate Content in Faceted Navigation with Ajax/JSON/JQuery

Posted by EricEnge

One of the classic problems in SEO is that while complex navigation schemes may be useful to users, they create problems for search engines. Many publishers rely on tags such as rel=canonical, or on the parameter settings in Webmaster Tools, to try to solve these types of issues. However, each of the potential solutions has limitations. In today’s post, I am going to outline how you can use JavaScript solutions to eliminate the problem altogether.

Note that I am going to keep this post at a conceptual level, with only a couple of brief illustrative sketches rather than production code. If you are interested in learning more about Ajax/JSON/jQuery, here are some resources you can check out:

  1. Ajax Tutorial
  2. Learning Ajax/jQuery

Defining the problem with faceted navigation

Having a page of products and then allowing users to sort those products the way they want (sorted from highest to lowest price), or to use a filter to pick a subset of the products (only those over $60) makes good sense for users. We typically refer to these types of navigation options as “faceted navigation.”

However, faceted navigation can cause problems for search engines because they don’t want to crawl and index all of your different sort orders or all your different filtered versions of your pages. They would end up with many different variants of your pages that are not significantly different from a search engine user experience perspective.

Solutions such as rel=canonical tags and the parameter settings in Webmaster Tools have limitations. For example, rel=canonical tags are considered “hints” by the search engines, which may choose not to accept them; and even when they are accepted, they do not necessarily keep the search engines from continuing to crawl those pages.

A better solution might be to use JSON and jQuery to implement your faceted navigation so that a new page is not created when a user picks a filter or a sort order. Let’s take a look at how it works.

Using JSON and jQuery to filter on the client side

The main benefit of the implementation discussed below is that a new URL is not created when a user is on a page of yours and applies a filter or sort order. When you use JSON and jQuery, the entire process happens on the client device without involving your web server at all.

When a user initially requests one of the product pages on your web site, the interaction looks like this:

[Diagram: using JSON on faceted navigation]

This transfers the page to the browser that the user used to request it. Now, when the user picks a sort order (or filter) on that page, here is what happens:

[Diagram: jQuery and faceted navigation]

When the user picks one of those options, a jQuery request is made to the JSON data object. Translation: the entire interaction happens within the client’s browser and the sort or filter is applied there. Simply put, the smarts to handle that sort or filter resides entirely within the code on the client device that was transferred with the initial request for the page.
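To make that concrete, here is a minimal sketch of what this kind of client-side handling could look like. Everything in it is hypothetical (the product data, the #product-list element, and the filter/sort controls are not from the original post); the point is simply that the filtering and sorting run against a JSON object that already arrived with the page, so no new request or URL is involved.

```javascript
// Hypothetical product data, embedded in the page with the initial HTML response.
var products = [
  { name: "Classic Oxford", type: "dress",   price: 120 },
  { name: "Trail Runner",   type: "sneaker", price: 85 },
  { name: "Beach Sandal",   type: "sandal",  price: 35 }
];

// Redraw the (hypothetical) #product-list element from an array of products.
function renderProducts(list) {
  var html = list.map(function (p) {
    return "<li>" + p.name + ": $" + p.price + "</li>";
  }).join("");
  $("#product-list").html(html);
}

// Filter control: show only products over $60. Nothing leaves the browser.
$("#filter-over-60").on("click", function () {
  renderProducts(products.filter(function (p) { return p.price > 60; }));
});

// Sort control: highest to lowest price, again handled entirely on the client.
$("#sort-price-desc").on("click", function () {
  renderProducts(products.slice().sort(function (a, b) { return b.price - a.price; }));
});
```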

As a result, there is no new page created and no new URL for Google or Bing to crawl. Any concerns about crawl budget or inefficient use of PageRank are completely eliminated. This is great stuff! However, there remain limitations in this implementation.

Specifically, if your list of products spans multiple pages on your site, the sorting and filtering will only be applied to the data set already transferred to the user’s browser with the initial request. In short, you may only be sorting the first page of products, and not across the entire set of products. It’s possible to have the initial JSON data object contain the full set of pages, but this may not be a good idea if the page size ends up being large. In that event, we will need to do a bit more.

What Ajax does for you

Now we are going to dig in slightly deeper and outline how Ajax will allow us to handle sorting, filtering, AND pagination. Warning: There is some tech talk in this section, but I will try to follow each technical explanation with a layman’s explanation about what’s happening.

The conceptual Ajax implementation looks like this:

[Diagram: Ajax and faceted navigation]

In this structure, we are using an Ajax layer to manage the communications with the web server. Imagine that we have a set of 10 pages: the user has gotten the first of those 10 pages on their device and then requests a change to the sort order. The Ajax layer requests a fresh set of data from your site’s web server, similar to a normal HTML transaction, except that it runs asynchronously in a separate thread.

If you don’t know what that means, the benefit is that the rest of the page can load completely while the process to capture the data that the Ajax will display is running in parallel. This will be things like your main menu, your footer links to related products, and other page elements. This can improve the perceived performance of the page.

When a user selects a different sort order, the code registers an event handler for a given object (e.g., an HTML element or other DOM object) and then executes an action. The browser performs the work in the background and triggers the callback in the main thread when the response is ready. This happens without needing to execute a full page refresh; only the content controlled by the Ajax call refreshes.

To translate this for the non-technical reader, it just means that we can update the sort order of the page, without needing to redraw the entire page, or change the URL, even in the case of a paginated sequence of pages. This is a benefit because it can be faster than reloading the entire page, and it should make it clear to search engines that you are not trying to get some new page into their index.

Effectively, it does this within the existing Document Object Model (DOM), which you can think of as the basic structure of the document and a spec for the way the document is accessed and manipulated.
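As a rough sketch of the Ajax variant just described (the /api/products endpoint, parameter names, and element IDs below are assumptions, not part of the original post), each user action asks the server for the re-sorted or re-paginated data asynchronously and redraws only the product list, so the URL never changes even across the paginated set:

```javascript
// Hypothetical endpoint returning JSON for a given sort order, filter, and page.
function loadProducts(options) {
  $.getJSON("/api/products", options, function (data) {
    // Only the list itself is redrawn; menus, footer, and the URL stay as they are.
    var html = data.products.map(function (p) {
      return "<li>" + p.name + ": $" + p.price + "</li>";
    }).join("");
    $("#product-list").html(html);
  });
}

// Each user action triggers an asynchronous request rather than a new page load.
$("#sort-order").on("change", function () {
  loadProducts({ sort: $(this).val(), page: 1 });
});

$("#pagination").on("click", "a", function (event) {
  event.preventDefault();                       // suppress normal link navigation
  loadProducts({ page: $(this).data("page") }); // fetch the requested page via Ajax
});
```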

How will Google handle this type of implementation?

For those of you who read Adam Audette’s excellent recent post on the tests his team performed on how Google reads JavaScript, you may be wondering whether Google will still load all these page variants on the same URL anyway, and whether Google will dislike it.

I had the same question, so I reached out to Google’s Gary Illyes to get an answer. Here is the dialog that transpired:

Eric Enge: I’d like to ask you about using JSON and jQuery to render different sort orders and filters within the same URL. I.e., the user selects a sort order or a filter, and the content is reordered and redrawn on the page on the client side. Hence no new URL would be created. It’s effectively a way of canonicalizing the content, since each variant is a strict subset.

Then there is a second level consideration with this approach, which involves doing the same thing with pagination. I.e. you have 10 pages of products, and users still have sorting and filtering options. In order to support sorting and filtering across the entire 10 page set, you use an Ajax solution, so all of that still renders on one URL.

So, if you are on page 1, and a user executes a sort, they get that all back in that one page. However, to do this right, going to page 2 would also render on the same URL. Effectively, you are taking the 10 page set and rendering it all within one URL. This allows sorting, filtering, and pagination without needing to use canonical, noindex, prev/next, or robots.txt.

If this was not problematic for Google, the only downside is that it makes the pagination not visible to Google. Does that make sense, or is it a bad idea?

Gary Illyes: If you have one URL only, and people have to click on stuff to see different sort orders or filters for the exact same content under that URL, then typically we would only see the default content.

If you don’t have pagination information, that’s not a problem, except we might not see the content on the other pages that are not contained in the HTML within the initial page load. The meaning of rel-prev/next is to funnel the signals from child pages (page 2, 3, 4, etc.) to the group of pages as a collection, or to the view-all page if you have one. If you simply choose to render those paginated versions on a single URL, that will have the same impact from a signals point of view, meaning that all signals will go to a single entity, rather than distributed to several URLs.

Summary

Keep in mind, the reason why Google implemented tags like rel=canonical, NoIndex, rel=prev/next, and others is to reduce their crawling burden and overall page bloat and to help focus signals to incoming pages in the best way possible. The use of Ajax/JSON/jQuery as outlined above does this simply and elegantly.

On most e-commerce sites, there are many different “facets” of how a user might want to sort and filter a list of products. With the Ajax-style implementation, this can be done without creating new pages. The end users get the control they are looking for, the search engines don’t have to deal with excess pages they don’t want to see, and signals in to the site (such as links) are focused on the main pages where they should be.

The one downside is that Google may not see all the content when it is paginated. A site that has lots of very similar products in a paginated list does not have to worry too much about Google seeing all the additional content, so this isn’t much of a concern if your incremental pages contain more of what’s on the first page. Sites that have content that is materially different on the additional pages, however, might not want to use this approach.

These solutions do require JavaScript coding expertise but are not really that complex. If you have the ability to consider a path like this, you can free yourself from trying to understand the various tags, their limitations, and whether or not they truly accomplish what you are looking for.

Credit: Thanks to Clark Lefavour for providing a review of the above for technical correctness.


Has Google Gone Too Far with the Bias Toward Its Own Content?

Posted by ajfried

Since the beginning of SEO time, practitioners have been trying to crack the Google algorithm. Every once in a while, the industry gets a glimpse into how the search giant works and we have opportunity to deconstruct it. We don’t get many of these opportunities, but when we do—assuming we spot them in time—we try to take advantage of them so we can “fix the Internet.”

On Feb. 16, 2015, news started to circulate that NBC would start removing images and references of Brian Williams from its website.

This was it!

A golden opportunity.

This was our chance to learn more about the Knowledge Graph.

Expectation vs. reality

Often it’s difficult to predict what Google is truly going to do. We expect something to happen, but in reality it’s nothing like we imagined.

Expectation

What we expected to see was that Google would change the source of the image. Typically, if you hover over the image in the Knowledge Graph, it reveals the location of the image.

[Image: Keanu-Reeves-Image-Location.gif]

This would mean that if the image disappeared from its original source, then the image displayed in the Knowledge Graph would likely change or even disappear entirely.

Reality (February 2015)

The only problem was, there was no official source (this changed, as you will soon see) and identifying where the image was coming from proved extremely challenging. In fact, when you clicked on the image, it took you to an image search result that didn’t even include the image.

Could it be? Had Google started its own database of owned or licensed images and was giving it priority over any other sources?

In order to find the source, we tried taking the image from the Knowledge Graph and running a “search by image” on images.google.com to find others like it. For the NBC Nightly News image, Google failed to locate a match to the image it was actually using anywhere on the Internet. For other television programs, it was successful. Here is an example of what happened for Morning Joe:

[Image: Morning_Joe_image_search.png]

So we found the potential source. In fact, we found three potential sources. Seemed kind of strange, but this seemed to be the discovery we were looking for.

This looks like Google is using someone else’s content and not referencing it. These images have a source, but Google is choosing not to show it.

Then Google pulled the ol’ switcheroo.

New reality (March 2015)

Now things changed and Google decided to put a source to their images. Unfortunately, I mistakenly assumed that hovering over an image showed the same thing as the file path at the bottom, but I was wrong. The URL you see when you hover over an image in the Knowledge Graph is actually nothing more than the title. The source is different.

[Image: Morning_Joe_Source.png]

Luckily, I still had two screenshots I took when I first saw this saved on my desktop. Success. One screen capture was from NBC Nightly News, and the other from the news show Morning Joe (see above) showing that the source was changed.

[Image: NBC-nightly-news-crop.png]

(NBC Nightly News screenshot.)

The source is a Google-owned property: gstatic.com. You can clearly see the difference in the source change. What started as a hypothesis is now a fact: Google is certainly creating a database of images.

If this is the direction Google is moving, then it is creating all kinds of potential risks for brands and individuals. The implication is a loss of control for any brand that is looking to optimize its Knowledge Graph results. It also seems to pose a conflict of interest for Google, whose mission is to organize the world’s information, not to license and prioritize it.

How do we think Google is supposed to work?

Google is an information-retrieval system tasked with sourcing information from across the web and supplying the most relevant results to users’ searches. In recent months, the search giant has taken a more direct approach by answering questions and assumed questions in the Answer Box, some of which come from un-credited sources. Google has clearly demonstrated that it is building a knowledge base of facts that it uses as the basis for its Answer Boxes. When it sources information from that knowledge base, it doesn’t necessarily reference or credit any source.

However, I would argue there is a difference between an un-credited Answer Box and an un-credited image. An un-credited Answer Box provides a fact that is indisputable, part of the public domain, and unlikely to change (e.g., what year was Abraham Lincoln shot? How long is the George Washington Bridge?). Answer Boxes that offer more than just a basic fact (or an opinion, instructions, etc.) always credit their sources.

There are four possibilities when it comes to Google referencing content:

  • Option 1: It credits the content because someone else owns the rights to it
  • Option 2: It doesn’t credit the content because it’s part of the public domain, as seen in some Answer Box results
  • Option 3: It doesn’t reference it because it owns or has licensed the content. If you search for “Chicken Pox” or other diseases, Google appears to be using images from licensed medical illustrators. The same goes for song lyrics, which Eric Enge discusses here: Google providing credit for content. This adds to the speculation that Google is giving preference to its own content by displaying it over everything else.
  • Option 4: It doesn’t credit the content, but neither does it necessarily own the rights to the content. This is a very gray area, and is where Google seemed to be back in February. If this were the case, it would imply that Google is “stealing” content—which I find hard to believe, but felt was necessary to include in this post for the sake of completeness.

Is this an isolated incident?

At Five Blocks, whenever we see these anomalies in search results, we try to compare the term in question against others like it. This is a categorization concept we use to bucket individuals or companies into similar groups. When we do this, we uncover some incredible trends that help us determine what a search result “should” look like for a given group. For example, when looking at searches for a group of people or companies in an industry, this grouping gives us a sense of how much social media presence the group has on average or how much media coverage it typically gets.

Upon further investigation of terms similar to NBC Nightly News (other news shows), we noticed the un-credited image scenario appeared to be a trend in February, but now all of the images are being hosted on gstatic.com. When we broadened the categories further to TV shows and movies, the trend persisted. Rather than show an image in the Knowledge Graph that is sourced from the actual site, Google tends to show an image referenced from its own database of stored images.

And just to ensure this wasn’t a case of tunnel vision, we researched other categories, including sports teams, actors and video games, in addition to spot-checking other genres.

Unlike terms for specific TV shows and movies, terms in each of these other groups all link to the actual source in the Knowledge Graph.

Immediate implications

It’s easy to ignore this and say “Well, it’s Google. They are always doing something.” However, there are some serious implications to these actions:

  1. The TV shows/movies aren’t receiving their due credit because, from within the Knowledge Graph, there is no actual reference to the show’s official site
  2. The more Google moves toward licensing and then retrieving their own information, the more biased they become, preferring their own content over the equivalent—or possibly even superior—content from another source
  3. It feels wrong and misleading to get a Google Image Search result rather than an actual site because:
    • The search doesn’t include the original image
    • Considering how poor Image Search results are normally, it feels like a poor experience
  4. If Google is moving toward licensing as much content as possible, then it could make the Knowledge Graph infinitely more complicated when there is a “mistake” or something unflattering. How could one go about changing what Google shows about them?

Google is objectively becoming subjective

It is clear that Google is attempting to create databases of information, including lyrics stored in Google Play, photos, and, previously, facts in Freebase (which is now Wikidata and not owned by Google).

I am not normally one to point my finger and accuse Google of wrongdoing. But this really strikes me as an odd move, one bordering on a clear bias to direct users to stay within the search engine. The fact is, we trust Google with a heck of a lot of information with our searches. In return, I believe we should expect Google to return an array of relevant information for searchers to decide what they like best. The example cited above seems harmless, but what about determining which is the right religion? Or even who the prettiest girl in the world is?

[Image: Religion-and-beauty-queries.png]

Questions such as these, which Google is returning credited answers for, could return results that are perceived as facts.

Should we next expect Google to decide who is objectively the best service provider (e.g., pizza chain, painter, or accountant), then feature them in an un-credited answer box? Given the direction Google is moving right now, it feels like we should be calling its objectivity into question.

But that’s only my (subjective) opinion.


How We Fixed the Internet (Ok, an Answer Box)

Posted by Dr-Pete

Last year, Google expanded the Knowledge Graph to use data extracted (*cough* scraped) from the index to create answer boxes. Back in October, I wrote about a failed experiment. One of my posts, an odd dive into Google’s revenue, was being answer-fied for the query “How much does Google make?”:

Objectively speaking, even I could concede that this wasn’t a very good answer in 2014. I posted it on Twitter, and David Iwanow asked the inevitable question:

Enthusiasm may have gotten the best of us, a few more people got involved (like my former Moz colleague Ruth Burr Reedy), and suddenly we were going to fix this once and for all:

There Was Just One Problem

I updated the post, carefully rewriting the first paragraph to reflect the new reality of Google’s revenue. I did my best to make the change user-friendly, adding valuable information but not disrupting the original post. I did, however, completely replace the old text that Google was scraping.

Within less than a day, Google had re-cached the content, and I just had to wait to see the new answer box. So, I waited, and waited… and waited. Two months later, still no change. Some days, the SERP showed no answer box at all (although I’ve since found these answer boxes are very dynamic), and I was starting to wonder if it was all a mistake.

Then, Something Happened

Last week, months after I had given up, I went to double-check this query for entirely different reasons, and I saw the following:

Google had finally updated the answer box with the new text, and they had even pulled an image from the post. It was a strange choice of images, but in fairness, it was a strange post.

Interestingly, Google also added the publication date of the post, perhaps recognizing that outdated answers aren’t always useful. Unfortunately, this doesn’t reflect the timing of the new content, but that’s understandable – Google doesn’t have easy access to that data.

It’s interesting to note that sometimes Google shows the image, and sometimes they don’t. This seems to be independent of whether the SERP is personalized or incognito. Here’s a capture of the image-free version, along with the #1 organic ranking:

You’ll notice that the #1 result is also my Moz post, and that result has an expanded meta description. So, the same URL is essentially double-dipping this SERP. This isn’t always the case – answers can be extracted from URLs that appear lower on page 1 (although almost always page 1, in my experience). Anecdotally, it’s also not always the case that the organic result ends up getting an expanded meta description.

However, it definitely seems that some of the quality signals driving organic ranking and expanded meta descriptions are also helping Google determine whether a query deserves a direct answer. Put simply, it’s not an accident that this post was chosen to answer this question.

What Does This Mean for You?

Let’s start with the obvious – Yes, the v2 answer boxes (driven by the index, not Freebase/WikiData) can be updated. However, the update cycle is independent of the index’s refresh cycle. In other words, just because a post is re-cached, it doesn’t mean the answer box will update. Presumably, Google is creating a second Knowledge Graph, based on the index, and this data is only periodically updated.

It’s also entirely possible that updating could cause you to lose an answer box, if the new data weren’t a strong match to the question or the quality of the content came into question. Here’s an interesting question – on a query where a competitor has an answer box, could you change your own content enough to either replace them or knock out the answer box altogether? We are currently testing this question, but it may be a few more months before we have any answers.

Another question is what triggers this style of answer box in the first place? Eric Enge has an in-depth look at 850,000 queries that’s well worth your time, and in many cases Google is still triggering on obvious questions (“how”, “what”, “where”, etc.). Nouns that could be interpreted as ambiguous can also trigger the new answer boxes. For example, a search for “ruby” is interpreted by Google as roughly meaning “What is Ruby?”:

This answer box also triggers “Related topics” that use content pulled from other sites but drive users to more Google searches. The small, gray links are the source sites. The much more visible, blue links are more Google searches.

Note that these also have to be questions (explicit or implied) that Google can’t answer with their curated Knowledge Graph (based on sources like Freebase and WikiData). So, for example, the question “When is Mother’s Day?” triggers an older-style answer:

Sites offering this data aren’t going to have a chance to get attribution, because Google essentially already owns the answer to this question as part of their core Knowledge Graph.

Do You Want to Be An Answer?

This is where things get tricky. At this point, we have no clear data on how these answer boxes impact CTR, and it’s likely that the impact depends a great deal on the context. I think we’re facing a certain degree of inevitability – if Google is going to list an answer, better that it’s your answer than someone else’s, IMO. On the other hand, what if that answer is so complete that it renders your URL irrelevant? Consider, for example, the SERP for “how to make grilled cheese”:

Sorry, Food Network, but making a grilled cheese sandwich isn’t really that hard, and this answer box doesn’t leave much to the imagination. As these answers get more and more thorough, expect CTRs to fall.

For now, I’d argue that it’s better to have your link in the box than someone else’s, but that’s cold comfort in many cases. These new answer boxes represent what I feel is a dramatic shift in the relationship between Google and webmasters, and they may be tipping the balance. For now, we can’t do much but wait, see, and experiment.


The Most Important Link Penalty Removal Tool: Your Mindset

Posted by Eric Enge

Let’s face it. Getting slapped by a manual link penalty, or by the Penguin algorithm, really stinks. Once this has happened to you, your business is in a world of hurt. Worse still is the fact that you can’t get clear information from Google on which of your links are the bad ones. In today’s post, I am going to focus on the number one reason why people fail to get out from under these types of problems, and how to improve your chances of success.

The mindset

Success begins, continues, and ends with the right mindset. A large percentage of people I see who go through a link cleanup process are not aggressive enough about cleaning up their links. They worry about preserving some of that hard-won link juice they obtained over the years.

You have to start by understanding what a link cleanup process looks like, and just how long it can take. Some of the people I have spoken with have gone through a process like this one:

[Image: link removal timeline]

In this fictitious timeline example, we see someone who spends four months working on trying to recover, and at the end of it all, they have not been successful.
A lot of time and money have been spent, and they have nothing to show for it. Then, the people at Google get frustrated and send them a message that basically tells them they are not getting it. At this point, they have no idea when they will be able to recover. The result is that the complete process might end up taking six months or more.

In contrast, imagine someone who is far more aggressive in removing and disavowing links. They are so aggressive that 20 percent of the links they cut out are actually ones that Google has not currently judged as being bad. They also start on March 9, and by April 30, the penalty has been lifted on their site.

Now they can begin rebuilding their business, five or more months sooner than the person who does not take as aggressive an approach. Yes, they cut out some links that Google was not currently penalizing, but this is a small price to pay for getting your penalty cleared five months sooner. In addition, with our mindset-based approach, the 20 percent of links we cut out were probably not helping much anyway, and Google might well take action on them in the future.

Now that you understand the approach, it’s time to make the commitment. You have to make the decision that you are going to do whatever it takes to get this done, and that getting it done means cutting hard and deep, because that’s what will get you through it the fastest. Once you’ve got your head on straight about what it will take and have summoned the courage to go through with it, then and only then, you’re ready to do the work. Now let’s look at what that work entails.

Obtaining link data

We use four sources of data for links:

  1. Google Webmaster Tools
  2. Open Site Explorer
  3. Majestic SEO
  4. ahrefs

You will want to pull in data from all four of these sources, get them into one list, and then dedupe them to create a master list. Focus only on followed links as well, as nofollowed links are not an issue. The overall process is shown here:

[Diagram: pulling a link set]

One other simplification is also possible at this stage. Once you have obtained a list of the followed links, there is another thing you can do to dramatically simplify your life.
You don’t need to look at every single link.

You do need to look at a small sampling of links from every domain that links to you. Chances are that this is a significantly smaller quantity of links to look at than all links. If a domain has 12 links to you, and you look at three of them, and any of those are bad, you will need to disavow the entire domain anyway.

I take the time to emphasize this because I’ve seen people with more than 1 million inbound links from 10,000 linking domains. Evaluating 1 million individual links could take a lifetime. Looking at 10,000 domains is not small, but it’s 100 times smaller than 1 million. But here is where the mindset comes in.
Do examine every domain.

This may be a grinding and brutal process, but there is no shortcut available here. What you don’t look at will hurt you. The sooner you start on the entire list, the sooner you will get the job done.
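To give a feel for the mechanical side of this, here is a small sketch (the data shapes are assumptions; the real exports from the four tools above will look different) that merges the lists, dedupes by URL, groups the links by linking domain, and pulls a sample of up to three links per domain for manual review:

```javascript
// Hypothetical merged export: one record per followed link, combined from
// Google Webmaster Tools, Open Site Explorer, Majestic SEO, and ahrefs.
var links = [
  { url: "http://example-directory.com/widgets/page1", anchor: "buy widgets" },
  { url: "http://example-directory.com/widgets/page2", anchor: "cheap widgets" },
  { url: "http://blog.example.org/post", anchor: "Acme Widgets" }
  // ...thousands more
];

// Dedupe by URL so a link reported by several tools is only counted once.
var byUrl = {};
links.forEach(function (link) { byUrl[link.url] = link; });
var deduped = Object.keys(byUrl).map(function (u) { return byUrl[u]; });

// Group the deduped links by linking domain.
var byDomain = {};
deduped.forEach(function (link) {
  var domain = link.url.split("/")[2]; // crude hostname extraction for this sketch
  (byDomain[domain] = byDomain[domain] || []).push(link);
});

// Sample up to three links per domain for manual review. If any sampled link is
// bad, the whole domain gets disavowed anyway, so there is no need to read them all.
Object.keys(byDomain).forEach(function (domain) {
  var sample = byDomain[domain].slice(0, 3);
  console.log(domain + ": review " + sample.length + " of " + byDomain[domain].length + " links");
});
```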

How to evaluate links

Now that you have a list, you can get to work. This is a key part where having the right mindset is critical. The first part of the process is really quite simple. You need to eliminate each and every one of these types of links:

  1. Article directory links
  2. Links in forum comments, or their related profiles
  3. Links in blog comments, or their related profiles
  4. Links from countries where you don’t operate/sell your products
  5. Links from link sharing schemes such as Link Wheels
  6. Any links you know were paid for

Here is an example of a foreign language link that looks somewhat out of place:

[Image: foreign language link]

For the most part, you should also remove any links you have from web directories. Sure, if you have a link from DMOZ, Business.com, or BestofTheWeb.com, and the most important one or two directories dedicated to your market space, you can probably keep those.

For a decade I have offered people a rule for these types of directories, which is “no more than seven links from directories.” Even the good ones carry little to no value, and the bad ones can definitely hurt you. So there is absolutely no win to be had running around getting links from a bunch of directories, and there is no win in trying to keep them during a link cleanup process.

Note that I am NOT talking about local business directories such as Yelp, CityPages, YellowPages, SuperPages, etc. Those are a different class of directory that you don’t need to worry about. But general purpose web directories are, generally speaking, a poison.

Rich anchor text

Rich anchor text has been the downfall of many a publisher. Here is one of my favorite examples ever of rich anchor text:

The author wanted the link to say “buy cars,” but was too lazy to fit the two words into the same sentence! Of course, you may have many guest posts that you have written that are not nearly as obvious as this one. One great way to deal with that is to take the list of links you built, sort it by URL, and look at the overall mix of anchor text. You know it’s a problem if it looks anything like this:

[Image: overly optimized anchor text]

The problem with the distribution in the above image is that the percentage of links that are non “rich” in nature is way too small. In the real world, most people don’t conveniently link to you using one of your key money phrases. Some do, but it’s normally a small percentage.
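For example, a quick way to see that mix is to count how often each anchor appears in your deduped link list. A minimal sketch (the data below is hypothetical):

```javascript
// Hypothetical deduped link list with the anchor text of each followed link.
var links = [
  { url: "http://site-a.com/page", anchor: "buy cars" },
  { url: "http://site-b.com/page", anchor: "buy cars" },
  { url: "http://site-c.com/page", anchor: "Acme Auto Sales" }
  // ...the rest of your deduped list
];

// Count how many links use each anchor text, then print the most common ones.
var anchorCounts = {};
links.forEach(function (link) {
  var anchor = (link.anchor || "").toLowerCase().trim();
  anchorCounts[anchor] = (anchorCounts[anchor] || 0) + 1;
});

Object.keys(anchorCounts)
  .sort(function (a, b) { return anchorCounts[b] - anchorCounts[a]; })
  .slice(0, 20)
  .forEach(function (anchor) {
    var pct = (100 * anchorCounts[anchor] / links.length).toFixed(1);
    console.log(pct + "%  " + anchor);
  });

// If exact-match money phrases dominate the top of this list rather than brand
// names and bare URLs, the anchor text profile is likely over-optimized.
```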

Other types of bad links

There is no way for me to cover every type of bad link in this post, but here are other types of links, or link scenarios, to be concerned about:

  1. If a large percentage of your links are coming from over on the right rail of sites, or in the footers of sites
  2. If there are sites that give you a site-wide link, or a very large number of links from one domain
  3. Links that come from sites whose IP address is identical in the A block, B block, and C block (read more about what these are here)
  4. Links from crappy sites

The definition of a crappy site may seem subjective, but if a site has not been updated in a while, or its information is of poor quality, or it just seems to have no one who cares about it, you can probably consider it a crappy site. Remember our discussion on mindset. Your objective is to be harsh in cleaning up your links.

In fact, the most important principle in evaluating links is this:
If you can argue that it’s a good link, it’s NOT. You don’t have to argue for good quality links. To put it another way, if they are not obviously good, then out they go!

Quick case study anecdote: I know of someone who really took a major knife to their backlinks. They removed and/or disavowed every link they had that was below a Moz Domain Authority of 70. They did not even try to justify or keep any links with lower DA than that. It worked like a champ. The penalty was lifted. If you are willing to try a hyper-aggressive approach like this one, you can avoid all the work evaluating links I just outlined above. Just get the Domain Authority data for all the links pointing to your site and bring out the hatchet.

No doubt they ended up cutting out a large number of links that were perfectly fine, but their approach was far faster than doing the complete domain-by-domain analysis.

Requesting link removals

Why is it that we request link removals? Can’t we just build a disavow file and submit that to Google? In my experience, for manual link penalties, the answer to this question is no, you can’t. (Note: if you have been hit by Penguin, and not a manual link penalty, you may not need to request link removals.)

Yes, disavowing a link is supposed to tell Google that you don’t want to receive any PageRank, or benefit, from it. However, there is a human element at play here.
Google likes to see that you put some effort into cleaning up the bad links that you have gotten that led to your penalty. The more bad links you have, the more important this becomes.

This does make the process a lot more expensive to get through, but if you approach it with the “whatever it takes” mindset, you dive into the link removal request process and go ahead and get it done.

I usually have people go through three rounds of requests asking people to remove links. This can be a very annoying process for those receiving your request, so you need to be aware of that. Don’t start your email with a line like “Your site is causing mine to be penalized …”, as that’s just plain offensive.

I’d be honest, and tell them “Hey, we’ve been hit by a penalty, and as part of our effort to recover we are trying to get many of the links we have gotten to our site removed. We don’t know which sites are causing the problem, but we’d appreciate your help …”

Note that some people will come back to you and ask for money to remove the link. Just ignore them, and put their domains in your disavow file.

Once you are done with the overall removal requests, and have had whatever success you have had, take the rest of the domains and disavow them. There is a complete guide to creating a disavow file here. The one incremental tip I would add is that you should nearly always disavow entire domains, not just the individual links you see.

This is important because even with the four tools we used to get information on as many links as we could, we still only have a subset of the total links. For example, the tools may have only seen one link from a domain, but in fact you have five. If you disavow only the one link, you still have four problem links, and that will torpedo your reconsideration request.

Disavowing the domain is a better-safe-than-sorry step you should take almost every time. As I illustrated at the beginning of this post, adding extra cleanup/reconsideration request loops is very expensive for your business.
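As a small sketch of that last step (the domain list here is hypothetical), the disavow file is plain text, and disavowing at the domain level means writing domain: lines rather than listing individual URLs:

```javascript
// Hypothetical list of linking domains you could not get removed and want to disavow.
var domainsToDisavow = [
  "spammy-article-directory.com",
  "linkwheel-network.net",
  "unrelated-foreign-site.example"
];

// Build the disavow file contents: optional comment lines start with "#",
// and each root domain gets a "domain:" line instead of individual URLs.
var lines = ["# Disavow file created after three rounds of link removal requests"];
domainsToDisavow.forEach(function (domain) {
  lines.push("domain:" + domain);
});

var disavowFile = lines.join("\n");
console.log(disavowFile); // save as a .txt file and upload it via Google's disavow tool
```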

The overall process

When all is said and done, the process looks something like this:

[Diagram: link removal process]

If you run this process efficiently, and you don’t try to cut corners, you might be able to get out from your penalty in a single pass through the process. If so, congratulations!

What about tools?

There are some fairly well-known tools that are designed to help you with the link cleanup process. These include Link Detox and Remove’em. In addition, at STC we have developed our own internal tool that we use with our clients.

These tools can be useful in flagging some of your links, but they are not comprehensive—they will help identify some really obvious offenders, but the great majority of links you need to deal with and remove/disavow are not identified. Plan on investing substantial manual time and effort to do the heavy lifting of a comprehensive review of all your links. Remember the “mindset.”

Summary

As I write this post, I have this sense of being heartless because I outline an approach that is often grueling to execute. But consider it tough love. Recovering from link penalties is indeed brutal.
In my experience, the winners are the ones who come with meat cleaver in hand, don’t try to cut corners, and take on the full task from the very start, no matter how extensive an effort it may be.

Does this type of process succeed? You bet. Here is an example of a traffic chart from a successful recovery:

[Image: manual penalty recovery graph]
