How Much Has Link Building Changed in Recent Years?

Posted by Paddy_Moogan

I get asked this question a lot. It’s mainly asked by people who are considering buying my link building book and want to know whether it’s still up to date. This is understandable given that the first edition was published in February 2013 and our industry has a deserved reputation for always changing.

I find myself giving the same answer, even though I’ve been asked it probably dozens of times in the last two years—“not that much”. I don’t think this is solely due to the book itself standing the test of time, although I’ll happily take a bit of credit for that 🙂 I think it’s more a sign of our industry as a whole not changing as much as we’d like to think.

I started to question whether I was right, and honestly, that’s one of the reasons it has taken me over two years to release the second edition of the book.

So I posed this question to a group of friends not so long ago, some via email and some via a Facebook group. I was expecting to be called out by many of them because my position was that in reality, it hasn’t actually changed that much. The thing is, many of them agreed and the conversations ended with a pretty long thread with lots of insights. In this post, I’d like to share some of them, share what my position is and talk about what actually has changed.

My personal view

Link building hasn’t changed as much as we think it has.

The core principles of link building haven’t changed. The signals around link building have changed, but mainly around new machine learning developments that have indirectly affected what we do. One thing that has definitely changed is the mindset of SEOs (and now clients) towards link building.

I think the last big change to link building came in April 2012 when Penguin rolled out. This genuinely did change our industry and put to bed a few techniques that should never have worked so well in the first place.

Since then, we’ve seen some things change, but the core principles haven’t changed if you want to build a business that will be around for years to come and not run the risk of being hit by a link related Google update. For me, these principles are quite simple:

  • You need to deserve links – either an asset you create or your product
  • You need to put this asset in front of a relevant audience who have the ability to share it
  • You need consistency – one new asset every year is unlikely to cut it
  • Anything that scales is at risk

For me, the move towards user data driving search results + machine learning has been the biggest change we’ve seen in recent years and it’s still going.

Let’s dive a bit deeper into all of this and I’ll talk about how this relates to link building.

The typical mindset for building links has changed

I think that most SEOs are coming round to the idea that you can’t get away with building low quality links any more, not if you want to build a sustainable, long-term business. Spammy link building still works in the short-term and I think it always will, but it’s much harder than it used to be to sustain websites that are built on spam. The approach is more “churn and burn” and spammers are happy to churn through lots of domains and just make a small profit on each one before moving onto another.

For everyone else, it’s all about the long-term and not putting client websites at risk.

This has led to many SEOs embracing different forms of link building and generally starting to use content as an asset when it comes to attracting links. A big part of me feels that it was actually Penguin in 2012 that drove the rise of content marketing amongst SEOs, but that’s a post for another day…! For today though, this goes some way towards explaining the trend we see below.

Slowly but surely, I’m seeing clients come to my company already knowing that low quality link building isn’t what they want. It’s taken a few years after Penguin for it to filter down to client / business owner level, but it’s definitely happening. This is a good thing but unfortunately, the main reason for this is that most of them have been burnt in the past by SEO companies who have built low quality links without giving thought to building good quality ones too.

I have no doubt that it’s this change in mindset which has led to trends like this:

The thing is, I don’t think this was by choice.

Let’s be honest. A lot of us used the kind of link building tactics that Google no longer likes because they worked. I don’t think many SEOs were under the illusion that it was genuinely high quality stuff, but it worked and it was far less risky to do than it is today. Unless you were super-spammy, the low-quality links just worked.

Fast forward to a post-Penguin world and things are far riskier. For me, it’s because of this that we see trends like the above. As an industry, we had the easiest link building methods taken away from us and we’re left with fewer options. One of the main options is content marketing which, if you do it right, can lead to good quality links and, importantly, the types of links you won’t be removing in the future. Get it wrong and you’ll lose budget and lose the trust of your boss or client in the power of content when it comes to link building.

There are still plenty of other methods to build links and sometimes we can forget this. Just look at this epic list from Jon Cooper. Even with this many tactics still available to us, it’s hard work. Way harder than it used to be.

My summary here is that as an industry, our mindset has shifted but it certainly wasn’t a voluntary shift. If the tactics that Penguin targeted still worked today, we’d still be using them.

A few other opinions…

I definitely think too many people want the next easy win. As someone surfing the edge of what Google is bringing our way, here’s my general take—SEO, in broad strokes, is changing a lot, *but* any given change is more and more niche and impacts fewer people. What we’re seeing isn’t radical, sweeping changes that impact everyone, but a sort of modularization of SEO, where we each have to be aware of what impacts our given industries, verticals, etc.

Dr. Pete

I don’t feel that techniques for acquiring links have changed that much. You can either earn them through content and outreach or you can just buy them. What has changed is the awareness of “link building” outside of the SEO community. This makes link building / content marketing much harder when pitching to journalists and even more difficult when pitching to bloggers.

Link building has to be more integrated with other channels and struggles to work in its own environment unless supported by brand, PR and social. Having other channels supporting your link development efforts also creates greater search signals and more opportunity to reach a bigger audience which will drive a greater ROI.

Carl Hendy

SEO has grown up in terms of more mature staff and SEOs becoming more ingrained into businesses so there is a smarter (less pressure) approach. At the same time, SEO has become more integrated into marketing and has made marketing teams and decision makers more intelligent in strategies and not pushing for the quick win. I’m also seeing that companies who used to rely on SEO and building links have gone through IPOs and the need to build 1000s of links per quarter has rightly reduced.

Danny Denhard

Signals that surround link building have changed

There is no question about this one in my mind. I actually wrote about this last year in my previous blog post where I talked about signals such as anchor text and deep links changing over time.

Many of the people I asked felt the same; here are some quotes from them, split out by the types of signal.

Domain level link metrics

I think domain level links have become increasingly important compared with page level factors, i.e. you can get a whole site ranking well off the back of one insanely strong page, even with sub-optimal PageRank flow from that page to the rest of the site.

Phil Nottingham

I’d agree with Phil here and this is what I was getting at in my previous post on how I feel “deep links” will matter less over time. It’s not just about domain level links here, it’s just as much about the additional signals available for Google to use (more on that later).

Anchor text

I’ve never liked anchor text as a link signal. I mean, who actually uses exact match commercial keywords as anchor text on the web?

SEOs. 🙂

Sure there will be natural links like this, but honestly, I struggle with the idea that it took Google so long to start turning down the dial on commercial anchor text as a ranking signal. They are starting to turn it down though, slowly but surely. Don’t get me wrong, it still matters and it still works. But like pure link spam, the barrier is a lot lower now in terms of what constitutes too much.

Rand feels that they matter more than we’d expect and I’d mostly agree with this statement:

Exact match anchor text links still have more power than you’d expect—I think Google still hasn’t perfectly sorted what is “brand” or “branded query” from generics (i.e. they want to start ranking a new startup like meldhome.com for “Meld” if the site/brand gets popular, but they can’t quite tell the difference between that and https://moz.com/learn/seo/redirection getting a few manipulative links that say “redirect”)

Rand Fishkin

What I do struggle with though, is that Google still haven’t figured this out and that short-term, commercial anchor text spam is still so effective. Even for a short burst of time.

I don’t think link building as a concept has changed loads—but I think links as a signal have, mainly because of filters and penalties but I don’t see anywhere near the same level of impact from coverage anymore, even against 18 months ago.

Paul Rogers

New signals have been introduced

It isn’t just about established signals changing though, there are new signals too and I personally feel that this is where we’ve seen the most change in Google algorithms in recent years—going all the way back to Panda in 2011.

With Panda, we saw a new level of machine learning where it almost felt like Google had found a way of incorporating human reaction / feelings into their algorithms. They could then run this against a website and answer questions like the ones included in this post. Things such as:

  • “Would you be comfortable giving your credit card information to this site?”
  • “Does this article contain insightful analysis or interesting information that is beyond obvious?”
  • “Are the pages produced with great care and attention to detail vs. less attention to detail?”

It is a touch scary that Google was able to run machine learning against answers to questions like this and write an algorithm to predict the answers for any given page on the web. They have though and this was four years ago now.
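To make the mechanics concrete, here is a minimal sketch (in Python, with scikit-learn) of the general technique: collect human answers to a quality question for a sample of pages, then train a model to predict those answers from measurable page features. Everything here is hypothetical, the feature names, the data, and the model choice; it illustrates the idea, not Google’s actual system.

```python
# Hypothetical sketch: train a model to predict human quality-survey answers
# from measurable page features. Illustrative only; not Google's system.
from sklearn.ensemble import RandomForestClassifier

# One row per rated page. Hypothetical features:
# [ad_density, spelling_error_rate, avg_paragraph_length, author_bio_present]
page_features = [
    [0.40, 0.030, 35, 0],
    [0.05, 0.001, 120, 1],
    [0.25, 0.015, 60, 0],
    [0.02, 0.000, 150, 1],
]

# Majority human answer per page to "Would you be comfortable giving your
# credit card information to this site?" (1 = yes, 0 = no).
trust_answers = [0, 1, 0, 1]

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(page_features, trust_answers)

# Predict the likely human answer for a page no rater has ever seen.
new_page = [[0.10, 0.005, 90, 1]]
print(model.predict(new_page))  # e.g. [1]
```

With enough rated pages and enough features, the model’s predictions stand in for the raters, which is the leap the Panda questionnaire represented.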

Since then, they’ve made various moves to utilize machine learning and AI to build out new products and improve their search results. For me, this was one of the biggest changes and it went pretty unnoticed by our industry, well, until Hummingbird came along. I feel pretty sure that we have Ray Kurzweil to thank for at least some of that.

There seems to be more weight on theme/topic related to sites, though it’s hard to tell if this is mostly link based or more user/usage data based. Google is doing a good job of ranking sites and pages that don’t earn the most links but do provide the most relevant/best answer. I have a feeling they use some combination of signals to say “people who perform searches like this seem to eventually wind up on this website—let’s rank it.” One of my favorite examples is the Audubon Society ranking for all sorts of birding-related searches with very poor keyword targeting, not great links, etc. I think user behavior patterns are stronger in the algo than they’ve ever been.

Rand Fishkin

Leading on from what Rand has said, it’s becoming more and more common to see search results that just don’t make sense if you look at the link metrics—but are a good result.

For me, the move towards user data driving search results + machine learning has been the biggest change we’ve seen in recent years and it’s still going.

Edit: since drafting this post, Tom Anthony released this excellent blog post on his views on the future of search and the shift to data-driven results. I’d recommend reading that as it approaches this whole area from a different perspective and I feel that an off-shoot of what Tom is talking about is the impact on link building.

You may be asking at this point, what does machine learning have to do with link building?

Everything. Because as strong as links are as a ranking signal, Google want more signals and user signals are far, far harder to manipulate than established link signals. Yes it can be done—I’ve seen it happen. There have even been a few public tests done. But it’s very hard to scale and I’d venture a guess that only the top 1% of spammers are capable of doing it, let alone maintaining it for a long period of time. When I think about the process for manipulation here, I actually think we go a step beyond spammers towards hackers and more cut and dry illegal activity.

For link building, this means that traditional methods of manipulating signals are going to become less and less effective as these user signals become stronger. For us as link builders, it means we can’t keep searching for that silver bullet or the next method of scaling link building just for an easy win. The fact is that scalable link building is always going to be at risk from penalization from Google—I don’t really want to live a life where I’m always worried about my clients being hit by the next update. Even if Google doesn’t catch up with a certain method, machine learning and user data mean that these methods may naturally become less effective and cost efficient over time.

There are of course other things such as social signals that have come into play. I certainly don’t feel like these are a strong ranking factor yet, but with deals like this one between Google and Twitter being signed, I wouldn’t be surprised if that ever-growing dataset is used at some point in organic results. The one advantage that Twitter has over Google is its breaking news freshness. Twitter is still way quicker at breaking news than Google is—140 characters in a tweet is far quicker than Google News! Google know this, which is why I feel they’ve pulled this partnership back into existence after a couple of years apart.

There is another important point to remember here and it’s nicely summarised by Dr. Pete:

At the same time, as new signals are introduced, these are layers, not replacements. People hear social signals or user signals or authorship and want it to be the link-killer, because they already fucked up link-building, but these are just layers on top of on-page and links and all of the other layers. As each layer is added, it can verify the layers that came before it, and what you need isn’t the magic signal but a combination of signals that generally matches what Google expects to see from real, strong entities. So, links still matter, but they matter in concert with other things, which basically means it’s getting more complicated and, frankly, a bit harder. Of course, no one wants to hear that.

Dr. Pete

The core principles have not changed

This is the crux of everything for me. With all the changes listed above, the key is that the core principles around link building haven’t changed. I could even argue that Penguin didn’t change the core principles because the techniques that Penguin targeted should never have worked in the first place. I won’t argue this too much though because even Google advised website owners to build directory links at one time.

You need an asset

You need to give someone a reason to link to you. Many won’t do it out of the goodness of their heart! One of the most effective ways to do this is to develop a content asset and use this as your reason to make people care. Once you’ve made someone care, they’re more likely to share the content or link to it from somewhere.

You need to promote that asset to the right audience

I really dislike the stance that some marketers take when it comes to content promotion—build great content and links will come.

No. Sorry, but for the vast majority of us, that’s simply not true. The exceptions are people who skydive from space or have huge existing audiences to leverage.

You simply have to spend time promoting your content or your asset for it to get shares and links. It is hard work and sometimes you can spend a long time on it and get little return, but it’s important to keep working at it until you’re at a point where you have two things:

  • A big enough audience where you can almost guarantee at least some traffic to your new content along with some shares
  • Enough strong relationships with relevant websites who you can speak to when new content is published and stand a good chance of them linking to it

Getting to this point is hard—but that’s kind of the point. There are various hacks you can use along the way but it will take time to get right.

You need consistency

Leading on from the previous point, it takes time and hard work to get links to your content—the types of links that stand the test of time and that you won’t be removing in 12 months’ time anyway! This means that you need to keep pushing content out and getting better each and every time. This isn’t to say you should just churn content out for the sake of it, far from it. I am saying that with each piece of content you create, you will learn to do at least one thing better the next time. Try to give yourself the leverage to do this.

Anything scalable is at risk

Scalable link building is exactly what Google has been trying to crack down on for the last few years. Penguin was the biggest move and hit some of the most scalable tactics we had at our disposal. When you scale something, you often lose some level of quality, which is exactly what Google doesn’t want when it comes to links. If you’re still relying on tactics that could fall into the scalable category, I think you need to be very careful and just look at the trend in the types of links Google has been penalizing to understand why.

The part Google plays in this

To finish up, I want to briefly talk about the part that Google plays in all of this and shaping the future they want for the web.

I’ve always tried to steer clear of arguments involving the idea that Google is actively pushing FUD into the community. I’ve preferred to concentrate more on things I can actually influence and change with my clients rather than what Google is telling us all to do.

However, for the purposes of this post, I want to talk about it.

General paranoia has increased. My bet is there are some companies out there carrying out zero specific linkbuilding activity through worry.

Dan Barker

Dan’s point is a very fair one and just a day or two after reading this in an email, I came across a page related to a client’s target audience that said:

“We are not publishing guest posts on SITE NAME any more. All previous guest posts are now deleted. For more information, see www.mattcutts.com/blog/guest-blogging/“.

I’ve reworded this as to not reveal the name of the site, but you get the point.

This is silly. Honestly, so silly. They are a good site, publish good content, and had good editorial standards. Yet they have ignored all of their own policies, hard work, and objectives to follow a blog post from Matt. I’m 100% confident that it wasn’t sites like this one that Matt was talking about in this blog post.

This is, of course, from the publishers’ angle rather than the link builders’ angle, but it does go to show the effect that statements from Google can have. Google know this so it does make sense for them to push out messages that make their jobs easier and suit their own objectives—why wouldn’t they? In a similar way, what did they do when they were struggling to classify at scale which links are bad vs. good and they didn’t have a big enough web spam team? They got us to do it for them 🙂

I’m mostly joking here, but you see the point.

The most recent infamous mobilegeddon update, discussed here by Dr. Pete, is another example of Google pushing out messages that ultimately scared a lot of people into action. Although to be fair, I think that despite the apparent small impact so far, the broad message from Google is a very serious one.

Because of this, I think we need to remember that Google does have their own agenda and many shareholders to keep happy. I’m not in the camp of believing everything that Google puts out is FUD, but I’m much more sensitive and questioning of the messages now than I’ve ever been.

What do you think? I’d love to hear your feedback and thoughts in the comments.


What Deep Learning and Machine Learning Mean For the Future of SEO – Whiteboard Friday

Posted by randfish

Imagine a world where even the high-up Google engineers don’t know what’s in the ranking algorithm. We may be moving in that direction. In today’s Whiteboard Friday, Rand explores and explains the concepts of deep learning and machine learning, drawing us a picture of how they could impact our work as SEOs.

For reference, here’s a still of this week’s whiteboard!

Video transcription

Howdy, Moz fans, and welcome to another edition of Whiteboard Friday. This week we are going to take a peek into Google’s future and look at what it could mean as Google advances their machine learning and deep learning capabilities. I know these sound like big, fancy, important words. They’re not actually that tough of topics to understand. In fact, they’re simplistic enough that even a lot of technology firms like Moz do some level of machine learning. We don’t do anything with deep learning and a lot of neural networks. We might be going that direction.

But I found an article that was published in January, absolutely fascinating and I think really worth reading, and I wanted to extract some of the contents here for Whiteboard Friday because I do think this is tactically and strategically important to understand for SEOs and really important for us to understand so that we can explain to our bosses, our teams, our clients how SEO works and will work in the future.

The article is called “Google Search Will Be Your Next Brain.” It’s by Steve Levy. It’s over on Medium. I do encourage you to read it. It’s a relatively lengthy read, but just a fascinating one if you’re interested in search. It starts with a profile of Geoff Hinton, who was a professor in Canada and worked on neural networks for a long time and then came over to Google and is now a distinguished engineer there. As the article says, a quote from the article: “He is versed in the black art of organizing several layers of artificial neurons so that the entire system, the system of neurons, could be trained or even train itself to divine coherence from random inputs.”

This sounds complex, but basically what we’re saying is we’re trying to get machines to come up with outcomes on their own rather than us having to tell them all the inputs to consider, how to process those inputs, and the outcome to spit out. So this is essentially machine learning. Google has used this, for example, to figure out when you give it a bunch of photos and it can say, “Oh, this is a landscape photo. Oh, this is an outdoor photo. Oh, this is a photo of a person.” Have you ever had that creepy experience where you upload a photo to Facebook or to Google+ and they say, “Is this your friend so and so?” And you’re like, “God, that’s a terrible shot of my friend. You can barely see most of his face, and he’s wearing glasses which he usually never wears. How in the world could Google+ or Facebook figure out that this is this person?”

That’s what they use, these neural networks, these deep machine learning processes for. So I’ll give you a simple example. Here at Moz, we do machine learning very simplistically for page authority and domain authority. We take all the inputs — number of links, number of linking root domains, every single metric that you could get from Moz on the page level, on the sub-domain level, on the root-domain level, all these metrics — and then we combine them together and we say, “Hey machine, we want you to build us the algorithm that best correlates with how Google ranks pages, and here’s a bunch of pages that Google has ranked.” I think we use a base set of 10,000, and we do it about quarterly or every 6 months, feed that back into the system and the system pumps out the little algorithm that says, “Here you go. This will give you the best correlating metric with how Google ranks pages.” That’s how you get page authority and domain authority.
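As a rough sketch of the pipeline Rand is describing (emphatically not Moz’s actual code, and with synthetic data standing in for the base set), the idea looks something like this:

```python
# Rough sketch of a PA/DA-style pipeline: combine link metrics into a single
# score that best matches observed rankings. Synthetic data; not Moz's code.
import numpy as np
from scipy.stats import spearmanr
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# One row per page: hypothetical metrics such as links, linking root domains...
X = rng.random((10_000, 5))

# Observed rank positions on a benchmark query set (lower = better).
# Synthetic here, loosely driven by the metrics plus noise.
y = 100 - X @ np.array([40.0, 30.0, 10.0, 5.0, 15.0]) + rng.normal(0, 5, 10_000)

# "Hey machine, build the combination of metrics that best matches the ranks."
model = LinearRegression().fit(X, y)

# The fitted combination is the metric; check how well it tracks the ranks.
scores = model.predict(X)
print(spearmanr(scores, y).correlation)  # close to 1.0 on this toy data
```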

Cool, really useful, helpful for us to say like, “Okay, this page is probably considered a little more important than this page by Google, and this one a lot more important.” Very cool. But it’s not a particularly advanced system. The more advanced system is to have these kinds of neural nets in layers. So you have a set of networks, and these neural networks, by the way, they’re designed to replicate nodes in the human brain, which is in my opinion a little creepy, but don’t worry. The article does talk about how there’s a board of scientists who make sure Terminator 2 doesn’t happen, or Terminator 1 for that matter. Apparently, no one’s stopping Terminator 4 from happening? That’s the new one that’s coming out.

So one layer of the neural net will identify features. Another layer of the neural net might classify the types of features that are coming in. Imagine this for search results. Search results are coming in, and Google’s looking at the features of all the websites and web pages, your websites and pages, to try and consider like, “What are the elements I could pull out from there?”

Well, there’s the link data about it, and there are things that happen on the page. There are user interactions and all sorts of stuff. Then we’re going to classify types of pages, types of searches, and then we’re going to extract the features or metrics that predict the desired result, that a user gets a search result they really like. We have an algorithm that can consistently produce those, and then neural networks are hopefully designed — that’s what Geoff Hinton has been working on — to train themselves to get better. So it’s not like with PA and DA, our data scientist Matt Peters and his team looking at it and going, “I bet we could make this better by doing this.”

This is standing back and the guys at Google just going, “All right machine, you learn.” They figure it out. It’s kind of creepy, right?
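If you want a toy picture of what “layers” means here, a small multi-layer network makes the point: you hand it raw signals and a target, and the hidden layers build their own intermediate combinations. This sketch is purely illustrative and nothing like production scale:

```python
# Toy illustration of layered learning: hidden layers build their own
# intermediate features from raw inputs. Nothing like production scale.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.random((1000, 8))  # raw page/link/usage signals (synthetic)

# A nonlinear "searcher satisfaction" target the net has to figure out.
y = np.sin(3 * X[:, 0]) + X[:, 1] * X[:, 2]

# Two hidden layers: roughly "extract features", then "combine them".
net = MLPRegressor(hidden_layer_sizes=(32, 16), max_iter=2000, random_state=0)
net.fit(X, y)

print(net.score(X, y))  # R^2 on the training data; no human chose the features
```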

In the original system, you needed those people, these individuals here to feed the inputs, to say like, “This is what you can consider, system, and the features that we want you to extract from it.”

Then unsupervised learning, which is kind of this next step, the system figures it out. So this takes us to some interesting places. Imagine the Google algorithm, circa 2005. You had basically a bunch of things in here. Maybe you’d have anchor text, PageRank and you’d have some measure of authority on a domain level. Maybe there are people who are tossing new stuff in there like, “Hey algorithm, let’s consider the location of the searcher. Hey algorithm, let’s consider some user and usage data.” They’re tossing new things into the bucket that the algorithm might consider, and then they’re measuring it, seeing if it improves.

But you get to the algorithm today, and gosh, there are going to be a lot of things in there that are driven by machine learning, if not deep learning yet. So there are derivatives of all of these metrics. There are conglomerations of them. There are extracted pieces like, “Hey, we only want to look at and measure anchor text on these types of results when we also see that the anchor text matches up to the search queries that have previously been performed by people who also search for this.” What does that even mean? But that’s what the algorithm is designed to do. The machine learning system figures out things that humans would never extract, metrics that we would never even create from the inputs that they can see.

Then, over time, the idea is that in the future even the inputs aren’t given by human beings. The machine is getting to figure this stuff out itself. That’s weird. That means that if you were to ask a Google engineer in a world where deep learning controls the ranking algorithm, if you were to ask the people who designed the ranking system, “Hey, does it matter if I get more links,” they might be like, “Well, maybe.” But they don’t know, because they don’t know what’s in this algorithm. Only the machine knows, and the machine can’t even really explain it. You could go take a snapshot and look at it, but (a) it’s constantly evolving, and (b) a lot of these metrics are going to be weird conglomerations and derivatives of a bunch of metrics mashed together and torn apart and considered only when certain criteria are fulfilled. Yikes.

So what does that mean for SEOs? Like, what do we have to care about from all of these systems and this evolution and this move towards deep learning? By the way, Jeff Dean, who is, I think, a senior fellow over at Google (he’s the dude that everyone mocks for being the world’s smartest computer scientist over there), has basically said, “Hey, we want to put this into search. It’s not there yet, but we want to take these models, these things that Hinton has built, and we want to put them into search.” That for SEOs in the future is going to mean much less distinct universal ranking inputs, ranking factors. We won’t really have ranking factors in the way that we know them today. It won’t be like, “Well, they have more anchor text and so they rank higher.” That might be something we’d still look at and we’d say, “Hey, they have this anchor text. Maybe that’s correlated with what the machine is finding, the system is finding to be useful, and that’s still something I want to care about to a certain extent.”

But we’re going to have to consider those things a lot more seriously. We’re going to have to take another look at them and decide and determine whether the things that we thought were ranking factors still are when the neural network system takes over. It also is going to mean something that I think many, many SEOs have been predicting for a long time and have been working towards, which is more success for websites that satisfy searchers. If the output is successful searches, and that’s what the system is looking for, and that’s what it’s trying to correlate all its metrics to, if you produce something that means more successful searches for Google searchers when they get to your site, and you ranking in the top means Google searchers are happier, well you know what? The algorithm will catch up to you. That’s kind of a nice thing. It does mean a lot less info from Google about how they rank results.

So today you might hear from someone at Google, “Well, page speed is a very small ranking factor.” In the future they might be, “Well, page speed is like all ranking factors, totally unknown to us.” Because the machine might say, “Well yeah, page speed as a distinct metric, one that a Google engineer could actually look at, looks very small.” But derivatives of things that are connected to page speed may be huge inputs. Maybe page speed is something, that across all of these, is very well connected with happier searchers and successful search results. Weird things that we never thought of before might be connected with them as the machine learning system tries to build all those correlations, and that means potentially many more inputs into the ranking algorithm, things that we would never consider today, things we might consider wholly illogical, like, “What servers do you run on?” Well, that seems ridiculous. Why would Google ever grade you on that?

If human beings are putting factors into the algorithm, they never would. But the neural network doesn’t care. It doesn’t care. It’s a honey badger. It doesn’t care what inputs it collects. It only cares about successful searches, and so if it turns out that Ubuntu is poorly correlated with successful search results, too bad.

This world is not here yet today, but certainly there are elements of it. Google has talked about how Panda and Penguin are based off of machine learning systems like this. I think, given what Geoff Hinton and Jeff Dean are working on at Google, it sounds like this will be making its way more seriously into search and therefore it’s something that we’re really going to have to consider as search marketers.

All right everyone, I hope you’ll join me again next week for another edition of Whiteboard Friday. Take care.

Video transcription by Speechpad.com


Cortex SEO Toolkit Demo

Using big data and machine learning to help your SEO within the Cortex ecommerce platform.


Information Architecture for SEO – Whiteboard Friday

Posted by randfish

It wasn’t too long ago that there was significant tension between information architects and SEOs; one group wanted to make things easier for humans, the other for search engines. That line is largely disappearing, and there are several best practices in IA that can lead to great benefits in search. In today’s Whiteboard Friday, Rand explains what they are and how we can benefit from them.

For reference, here’s a still of this week’s whiteboard!

Video transcription

Howdy, Moz fans, and welcome to another edition of Whiteboard Friday. This week we’re going to chat a little bit about information architecture, and specifically how you can organize the content of your website in such a fashion to make information architecture help your SEO and your rankings and how search engines interpret your pages and the links between those.

I want to start by talking broadly about IA and the interaction with SEO. IA is designed to say, “Hey, we want to help web users accomplish their goals on the website quickly and easily.” There are many more broad things around that, but basically that’s the concept.

This actually is not in conflict at all, should almost never be in conflict, even a little bit, with the goals that we have around SEO. In the past, this was not always true, and unfortunately in the past some mythology got created around the things that we have to worry about that could conflict between SEO and information architecture.

Here we’ve got a page that’s optimal for IA, and it’s got this top navigation and left side navigation, some footers, maybe a big image at the front and some text. Great, fine. Then, we have this other version that I’m not going to call it optimal for SEO, because it’s actually not optimal for SEO. It is instead SEO to the max! “At the Tacoma Dome this Sunday, Sunday, Sunday!”

The problem is this is kind of taking SEO much too far. It’s no longer SEO, it’s SE . . . I don’t know, ridiculousness.

The idea would be things like we know that keyword rich anchors are important, and linking internally we want to be descriptive. We know that as people use those terms and links other places on the web, that might help our rankings. So instead of making the navigation obvious for users, we’re going to make it keyword stuffed for SEO. This makes no sense anymore, as I’m sure, hopefully, all of you know.

Text high up on the page, this actually does mean something. It used to mean a little more than it does. So maybe we’re going to take oh, yeah, we want to have that leader image right up at the top because that grabs people’s attention, and the headline flows nicely into that image. But for SEO purposes, we want the text to be even higher. That doesn’t make any sense either.

Even if there is some part of Google’s algorithm, Bing’s algorithm, or Baidu’s algorithm, that says, “Oh, text higher up on the page is a teensy little spattering more meaningful,” this is totally overwhelmed and dwarfed by the fact that SEO today cares a ton about engagement. If people come to this page and are less engaged, are more likely to click the Back button, are less likely to stay here and consume the content and link to it and share it and all these kinds of things, it’s going to lose out even to the slightly less optimized version of the page over here, which really does grab people’s attention.

If your IA folks and your usability folks and your testing is showing you that that leader image up top there is grabbing people’s attention and is working, don’t break it by saying, “Oh, but SEO demands content higher on the page.”

Likewise, if you have something where you say, “Hey, in order to flow or sculpt the link equity around these things, we don’t want to link to this page and this page. We do want to link to these things. We want to make sure that we’ve got a very keyword heavy and link heavy footer so that we can point to all the places we need to point to, even though they’re not really for users. It’s mostly for engines.” Also, BS. One of the things that modern engines are doing is they’re kind of looking and saying, “Hey, if no one uses these links to navigate internally on a site, we’re not going to take them into consideration from a ranking perspective either.”

They have lots of modeling and machine learning and algorithmic ways to do that, but basic story is make links for users that search engines will also care about, because that’s the only thing that search engines really do want to care about. So IA and SEO, shouldn’t be in conflict.

Important information architecture best practices

Now that we know this, we can move on to some important IA best practices, generally speaking IA best practices that are also SEO best practices and that most of the time, 99.99% of the time work really well together.

1. Broad-to-narrow organization

The first one, in general, it’s the case that you want to do broad to narrow organization of your content. I’ll show you what I mean.

Let’s say that I’ve got a website about adorable animals, a particularly fun one this week, and on my adorable animals page I’ve got some subsections, sub-pages, one on the slow loris, which of course is super adorable, and hedgehogs, also super adorable. Then getting even more detailed from there, I have particular pages on hedgehogs in military uniforms — that page is probably going to bring down the Internet because it will be so popular — and hedgehogs wearing ridiculous hats. These are two sub-pages of my hedgehog page. My hedgehog page, subset of my adorable animals page.

This is generally speaking how I want to do things. I probably would not want to organize, at least from the top level down in my actual architecture for my site, I probably wouldn’t want to say adorable animals and here’s a list of hedgehogs in military uniforms, a list of hedgehogs wearing ridiculous hats, a list of slow loris licking itself. No. I want to have that organization of broad to more narrow to more narrow.

This makes general sense. By the way, for SEO purposes it does help if I link back and forth one level in each case. So for my hedgehog page, I do want to link down to my hedgehogs in military uniforms page, and I also want to link up to my adorable animals page.

You don’t have to do it with exactly these keyword anchor text phrases, that kind of stuff. Just make sure that you are linking. If you want, you can use breadcrumbs. Breadcrumbs are very kind of old-fashioned, been around since the late ’90s, sort of style system for showing off links, and that can work really well for some websites. It doesn’t have to be the only way things can work though.

2. Link to evergreen pages from fresh content

When you’re publishing fresh content is when I think many SEOs get into a lot of trouble. They’re like, “Well, I have a blog that does all this, but then I have the regular parts of my site that have all of my content or my product pages or my detailed descriptions. How do I make these two things work together?”

This has actually become much easier but different in the last five or six years. It used to be the case that we would talk, in the SEO world, about not having keyword cannibalization, meaning if I’ve got an adorable animals page in my main section of my website, I don’t actually want to publish a blog post called “New Adorable Animals to Add to My Collection,” because now I’m competing with myself and I’m diluting my link juice.

Actually, this has gotten way easier. Google, and Bing as well, have become much more intelligent about identifying what’s new content and what’s old, sort of evergreen content, and they’ll promote one. You even sometimes have an opportunity to get both in there. Certainly if you’re posting fresh content that gets into Google News, the blog or the news section can be an opportunity to get in Google News. The old one can be an opportunity to just stay in the search results for a long time period. Getting links to one doesn’t actually dilute your ranking ability for the other because of how Google is doing much more topic focused associations around entire websites.

So this can be actually a really good thing. However, that being said, you do still want to try and link back to the most relevant, evergreen kind of original page. If I publish a new blog post that has some aggregation of hedgehogs in military uniforms from the Swiss Naval Academy — I don’t know why Switzerland would have a navy since they’re landlocked — I would probably want to take that hedgehogs in Swiss military uniforms and link back to my original one here.

I wouldn’t necessarily want to do the same thing and link over here, unless I decide, hey, a lot of people who are interested in this are going to want to check out this article too, in which case it’s fine to do that.

I would worry a little bit that sometimes people bias to quantity over quality of links internally when they’re publishing their blog content or publishing these detail pages and they think, “Oh, I need to link to everything that’s possibly relevant.” I wouldn’t do that. I would actually link to the things that you are most certain that a high number, a high percent of the users who are enjoying or visiting or consuming one page, one piece of information are really going to want in their journey. If you don’t have that confidence, I wouldn’t necessarily put them in there. I wouldn’t try and stack those up with tons of extra links.

Like I said, you don’t need to worry about keyword cannibalization. If you want to publish a new article every week about hedgehogs in military uniforms, you go for it. That’s a great blog.

3. Make sub-pages if intent is unique, combine if not

Number three, and the last one here, make these sub-pages when there’s unique intent. Information architecture is actually really good about this in practice. They basically say, “Hey, why would we create a new page if we already have a page that serves the same goals and same intent?” One of the reasons that people used to say, “Well, I know that we have that, but it doesn’t do a great job of targeting phrase A and phrase B, which both have the same intent but aren’t going to rank for those two separate phrases A and B.”

That’s also not the case anymore in the SEO world. Google and Bing have both become incredibly good at sorting out searcher intent and matching those to the pages and the keywords that fit those intents, even if the keyword match isn’t perfect one-to-one exact.

So if I’ve got a page that’s on slow lorises yawning and another one on slow lorises that are sleepy, are those really all that different? Is the intent of the searcher very different? When someone is searching for a sleepy loris, are they looking for one that’s probably yawning? Yeah. You know what? I would say these are the same intent. I would make a single page for them.

However, over here I’ve got a slow loris in a sombrero and a slow loris wearing a top hat. Now, these are two very different kinds of head wear, and people who are searching for sombreros are not going to want to find a slow loris wearing a top hat. They might want to see a cross link over between them. They might say, “Oh, top hat wearing slow lorises are also interesting to me.” But this is very specific intent, different from this one. Two different intents means two different pages.

That’s how I do all of my information architecture when it comes to a keyword and SEO perspective. You want to go broad to narrow. You want to not worry too much about publishing fresh content, but you do want to link back to the original evergreen. You want to make sure that if there are pages or intents that are exactly the same, you make a single page. If they’re intents that are different, you have different pages targeting those different intents.

All right everyone, look forward to the comments, and we’ll see you again next week for another edition of Whiteboard Friday. Take care.

Video transcription by Speechpad.com


Panda 4.1: The Devil Is in the Aggregate

Posted by russvirante

I wish I didn’t have to say this. I wish I could look in the eyes of every victim of the last Panda 4.1 update and tell them it was something new, something unforeseeable, something out of their control. I wish I could tell them that Google pulled a fast one that no one saw coming. But I can’t.

Like many in the industry, I have been studying Panda closely since its inception. Google gave us a rare glimpse behind the curtain by providing us with the very guidelines they set in place to build their massive machine-learned algorithm which came to be known as Panda. Three and a half years later, Panda is still with us and seems to still catch us off guard. Enough is enough.

What I intend to show you throughout this piece is that the original Panda questionnaire still remains a powerful predictive tool to wield in defense of what can be a painful organic traffic loss. By analyzing the winner/loser reports of Panda 4.1 using standard Panda surveys, we can determine whether Google’s choices are still in line with their original vision. So let’s dive in.

The process

The first thing we need to do is acquire a winners and losers list. I picked this excellent one from SearchMetrics although any list would do as long as it is accurate. Second, I proceeded to run a Panda questionnaire with 10 questions on random pages from each of the sites (both the winners and losers). You can run your own Panda survey by following Distilled and Moz’s instructions here or just use PandaRisk like I did. After completing these analyses, we simply compare the scores across the board to determine whether Google’s choices are still in line with their original vision. So let’s dive in.
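For anyone scripting this themselves, the scoring step is simple: collapse each rater’s answers into per-question averages, then into one score per site. A minimal sketch, assuming responses have been normalized to a 0–1 scale (the site names and numbers below are made up):

```python
# Sketch of the scoring step: average each site's survey answers into one
# score. Sites, questions, and numbers below are made up for illustration.
from statistics import mean

# responses[site][question] = rater answers normalized to a 0-1 scale
responses = {
    "winner-example.com": {
        "trustworthy": [0.8, 0.9, 0.7],
        "written_by_experts": [0.9, 0.8, 0.9],
    },
    "loser-example.com": {
        "trustworthy": [0.4, 0.3, 0.5],
        "written_by_experts": [0.2, 0.3, 0.3],
    },
}

site_scores = {
    site: mean(mean(answers) for answers in questions.values())
    for site, questions in responses.items()
}
print(site_scores)  # one average score per site
```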

The aggregate results

I actually want to do this a little bit backwards to drive home a point. Normally we would build to the aggregate results, starting with the details and leaving you with the big picture. But Panda is a big-picture kind of algorithmic update. It is especially focused on the intersection of myriad features, where the sum is greater than the parts. While breaking down these features can give us some insight, at the end of the day we need to stay acutely aware that unless we do well across the board, we are at risk.

Below is a graph of the average cumulative scores across the winners and losers. The top row are winners, the bottom row are losers. The left and right red circles indicate the lowest and highest scores within those categories, and the blue circle represents the average. There is something very important that I want to point out on this graph: the highest individual average score of all the losers is less than the lowest average score of the winners. This means that in our randomly selected data set, not a single loser averaged as high a score as the worst winner. When we aggregate the data together, even with a crude system of averages rather than the far more sophisticated machine learning techniques employed by Google, there is a clear disparity between the sites that survive Panda and those that do not.
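Given per-site averages like those sketched above, the separation the graph shows reduces to a one-line check. The values here are hypothetical, but the test is exactly the claim: the best loser still scores below the worst winner.

```python
# Hypothetical per-site average scores for Panda 4.1 winners and losers.
winner_avgs = [0.78, 0.81, 0.74, 0.86, 0.79]
loser_avgs = [0.52, 0.61, 0.47, 0.66, 0.58]

# "Not a single loser averaged as high a score as the worst winner."
print(max(loser_avgs) < min(winner_avgs))  # True for this sample

# The blue circles on the graph: the group means.
print(sum(winner_avgs) / len(winner_avgs), sum(loser_avgs) / len(loser_avgs))
```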

It is also worth pointing out here that there is no positive Panda algorithm to our knowledge. Sites that perform well on Panda do not see boosts because they are being given ranking preference by Google; rather, their competitors have seen rankings loss or their own previous Panda penalties have been lifted. In either scenario, we should remember that performing well on Panda assessments isn’t going to necessarily increase your rankings, but it should help you sustain them.

Now, let’s move on to some of the individual questions. We are going to start with the least correlated questions and move to those which most strongly correlate with performance in Panda 4.1. While all of the questions had positive correlations, a few lacked statistical significance.


Insignificant correlation

The first question which was not statistically significant in its correlation with Panda performance was “This page has visible errors on it”. The scores have been inverted here so that the higher the score, the fewer the number of people who reported that the page has errors. You can see that while more respondents did say that the winners had no visible errors, the difference was very slight. In fact, there was only a 5.35% difference between the two. I will save comment on this until after we discuss the next question.

The second question which was not statistically significant in its correlation with Panda performance was “This page has too many ads”. The scores have once again been inverted here so that the higher the score, the fewer the number of people who reported that the page has too many ads. This was even closer. The winners performed only 2.3% better than the losers in Panda 4.1.

I think there is a clear takeaway from these two questions. Nearly everyone gets the easy stuff right, but that isn’t enough. First, a lot of pages just have no ads whatsoever because that isn’t their business model. Even those that do have ads have caught on for the most part and optimized their pages accordingly, especially given that Google has other layout algorithms in place aside from Panda. Moreover, content inaccuracy is more likely to impact scrapers and content spinners than most sites, so it is unsurprising that few if any reported that the pages were filled with errors. If you score poorly on either of these, you have only begun to scratch the surface, because most websites get these right enough.


Moderate correlation

A number of Panda questions drew statistically significant differences in means, but there was still substantial crossover between the winners and losers. Whenever the average of the losers was greater than the lowest of the winners, I considered it only a moderate correlation. While the difference between means remained strong, there was still a good deal of variance in the scores.

The first of these to consider was the question as to whether the content was “trustworthy”. You will notice a trend in a lot of these questions that there is a great deal of subjective human opinion. This subjectivity plays itself out quite a bit when the topics of the site might deal with very different categories of knowledge. For example, a celebrity fact site might be very trustworthy (although the site might be ad-laden) and an opinion piece in the New Yorker on the same celebrity might not be seen as trustworthy – even though it is plainly labeled as opinion. The trustworthy question ties back to the “does this page have errors” question quite nicely, drawing attention to the difference between a subjective and objective question and the way it can spread the means out nicely when you ask a respondent to give more of a personal opinion. This might seem unfair, but in the real world your site and Google itself are being judged by that subjective opinion, so it is understandable why Google wants to get at it algorithmically. Nevertheless, there was a strong difference in means between winners and losers of 12.57%, more than double the difference we saw between winners and losers on the question of Errors.

Original content has long been a known requirement of organic search success, so no one was surprised when it made its way into the Panda questionnaire. It still remains an influential piece of the puzzle with a difference in mean of nearly 20%. It was barely ruled out from being a heavily correlated feature due to one loser edging out a loss against the losers’ average mean. Notice though that one of the winners scored a perfect 100% on the survey. This perfect score was received despite hundreds of respondents. It can be done.

As you can imagine, perception on what is and is not an authority is very subjective. This question is powerful because it pulls in all kinds of assumptions and presuppositions about brand, subject matter, content quality, design, justification, citations, etc. This likely explains why this question is beleaguered by one of the highest variances on the survey. Nevertheless, there was a 13.42% difference in means. And, on the other side of the scale, we did see what it is like to have a site that is clearly not an authority, scoring the worst possible 0% on this question. This is what happens when you include highly irrelevant content on your site just for the purpose of picking up either links or traffic. Be wary.

Everyone hates the credit card question, and luckily there is huge variance in answers. At least one site survived Panda despite scoring 5% on this question. Notice that there is a huge overlap between the lowest winner and the average of the losing sites. Also, if you notice by the placement of the mean (blue circle) in the winners category, the average wasn’t skewed to the right indicating just one outlier. There was strong variance in the responses across the board. The same was true of the losers. However, with a +15% difference in means, there was a clear average differentiation between the performance of winners and losers. Once again, though, we are drawn back to that aggregate score at the top, where we see how Google can use all these questions together to build a much clearer picture of site and content quality. For example, it is possible that Google pays more attention to this question when it is analyzing a site that has other features like the words “shopping cart” or “check out” on the homepage. 

I must admit that the bookmarking question surprised me. I always considered it to be the most subjective of the bunch. It seemed unfair that a site might be judged because it has material that simply doesn’t appeal to the masses. The survey just didn’t bear this out though. There was a clear difference in means, but after comparing the sites that were from similar content categories, there just wasn’t any reason to believe that a bias was created by subject matter. The 14.64% difference seemed to be, editorially speaking, related more to the construction of the page and the quality of the content, not the topic being discussed. Perhaps a better way to think about this question is: would you be embarrassed if your friends knew THIS was the site you were getting your information from rather than another?

This wraps up the 5 questions that had good correlations but substantial enough variance that it was possible for the highest loser to beat out the average winner. I think one clear takeaway from this section is that these questions, while harder to improve upon than the Low Ads and No Errors questions before, are completely within the webmaster’s grasp. Making your content and site appear original, trustworthy, authoritative, and worthy of bookmarking aren’t terribly difficult. Sure, it takes some time and effort, but these goals, unlike the next, don’t appear that far out of reach.


Heavy correlation

The final three questions that seemed to distinguish the most between the winners and losers of Panda 4.1 all had high difference-in-means and, more importantly, had little to no crossover between the highest loser and lowest winner. In my opinion, these questions are also the hardest for the webmaster to address. They require thoughtful design, high quality content, and real, expert human authors.

The first question that met this classification was “could this content appear in print”. With a difference in mean of 22.62%, the winners thoroughly trounced the losers in this category. Their sites and content were just better designed and better written. They showed the kind of editorial oversight you would expect in a print publication. The content wasn’t trite and unimportant; it was thorough and timely.

The next heavily correlated question was whether the page was written by experts. With over a 34% difference in means between the winners and losers, and literally no overlap at all between the winners’ and losers’ individual averages, it was clearly the strongest question. You can see why Google would want to look into things like authorship when they knew that expertise was such a powerful distinguisher between Panda winners and losers. This really begs the question – who is writing your content, and do your readers know it?

Finally, insightful analysis had a huge difference in means of +32% between winners and losers. It is worth noting that the highest loser is an outlier, which is typified by the skewed mean (blue circle) being closer to the bottom than the top. Most of the answers were closer to the lower score than the top. Thus, the overlap is exaggerated a bit. But once again, this just draws us back to the original conclusion – that the devil is not in the details, the devil is in the aggregate. You might be able to score highly on one or two of the questions, but it won’t be enough to carry you through.


The takeaways

OK, so hopefully it is clear that Panda really hasn’t changed all that much. The same questions we looked at for Panda 1.0 still matter. In fact, I would argue that Google is just getting better at algorithmically answering those same questions, not changing them. They are still the right way to judge a site in Google’s eyes. So how should you respond?

The first and most obvious thing is you should run a Panda survey on your (or your clients’) sites. Select a random sample of pages from the site. The easiest way to do this is to get an export of all of the pages of your site, perhaps from Open Site Explorer, put them in Excel, and shuffle them. Then choose the top 10 that come up. You can follow the Moz instructions I linked to above, do it at PandaRisk, or just survey your employees, friends, colleagues, etc. While the latter will probably be positively biased, it is still better than nothing. Go ahead and get yourself a benchmark.
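If you would rather script the sampling than shuffle in Excel, a few lines of Python do the same job. The export file name is assumed; use whatever your URL export is actually called:

```python
# Sketch: pick 10 random pages to survey instead of shuffling in Excel.
# Assumes a plain-text export of your site's URLs, one per line; the file
# name "site_pages.txt" is hypothetical.
import random

with open("site_pages.txt") as f:
    urls = [line.strip() for line in f if line.strip()]

sample = random.sample(urls, min(10, len(urls)))
for url in sample:
    print(url)
```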

The next step is to start pushing those scores up one at a time. I give some solid examples on the Panda 4.0 release article about improving press release sites, but there is another better resource that just came out as well. Josh Bachynski released an amazing set of known Panda factors over at his website The Moral Concept. It is well worth a thorough read. There is a lot to take in, but there are tons of easy-to-implement improvements that could help you out quite a bit. Once you have knocked out a few for each of your low-scoring questions, run the exact same survey again and see how you improve. Keep iterating this process until you beat out each of the question averages for winners. At that point, you can rest assured that your site is safe from the Panda by beating the devil in the aggregate.


Podcast SEO Vol. I Ep. 15 : Sylvain Peyronnet bis

To celebrate the first anniversary of the SEO podcast, Sylvain Peyronnet returns, since he was my first guest. We discuss Machine Learning (a…
