Should I Use Relative or Absolute URLs? – Whiteboard Friday

Posted by RuthBurrReedy

It was once commonplace for developers to code relative URLs into a site. There are a number of reasons why that might not be the best idea for SEO, and in today’s Whiteboard Friday, Ruth Burr Reedy is here to tell you all about why.


Let’s discuss some non-philosophical absolutes and relatives

Howdy, Moz fans. My name is Ruth Burr Reedy. You may recognize me from such projects as when I used to be the Head of SEO at Moz. I’m now the Senior SEO Manager at BigWing Interactive in Oklahoma City. Today we’re going to talk about relative versus absolute URLs and why they are important.

At any given time, your website can have several different configurations that might be causing duplicate content issues. You could have just a standard http://www.example.com. That’s a pretty standard format for a website.

But the main sources that we see of domain level duplicate content are when the non-www.example.com does not redirect to the www or vice-versa, and when the HTTPS versions of your URLs are not forced to resolve to HTTP versions or, again, vice-versa. What this can mean is if all of these scenarios are true, if all four of these URLs resolve without being forced to resolve to a canonical version, you can, in essence, have four versions of your website out on the Internet. This may or may not be a problem.

It’s not ideal for a couple of reasons. Number one, some people think that duplicate content is going to give you a penalty. It won’t: duplicate content is not going to get your website penalized in the same way that you might see a spammy link penalty from Penguin. There’s no actual penalty involved. You won’t be punished for having duplicate content.

The problem with duplicate content is that you’re basically relying on Google to figure out what the real version of your website is. Google is seeing the URL from all four versions of your website. They’re going to try to figure out which URL is the real URL and just rank that one. The problem with that is you’re basically leaving that decision up to Google when it’s something that you could take control of for yourself.

There are a couple of other reasons that we’ll go into a little bit later for why duplicate content can be a problem. But in short, duplicate content is no good.

However, just having these URLs not resolve to each other may or may not be a huge problem. When it really becomes a serious issue is when that problem is combined with injudicious use of relative URLs in internal links. So let’s talk a little bit about the difference between a relative URL and an absolute URL when it comes to internal linking.

With an absolute URL, you are putting the entire web address of the page that you are linking to in the link. You’re putting your full domain, everything in the link, including /page. That’s an absolute URL.

However, when coding a website, it’s a fairly common web development practice to instead code internal links with what’s called a relative URL. A relative URL is just /page. Basically what that does is it relies on your browser to understand, “Okay, this link is pointing to a page that’s on the same domain that we’re already on. I’m just going to assume that that is the case and go there.”
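
To make the difference concrete, here is a minimal sketch of how a link gets resolved, using Python's standard `urllib.parse.urljoin`. The `resolve` helper and the example.com addresses are just illustrations, not anything from a real site.

```python
from urllib.parse import urljoin

def resolve(current_page: str, href: str) -> str:
    """Resolve an href the way a browser does for a link found on current_page."""
    return urljoin(current_page, href)

# A relative link inherits the scheme and host of whatever page it sits on.
on_www = resolve("http://www.example.com/", "/page")
on_bare_https = resolve("https://example.com/", "/page")

# An absolute link resolves to the same address no matter where it sits.
absolute = resolve("http://www.example.com/", "https://example.com/page")

print(on_www)
print(on_bare_https)
print(absolute)
```

The same `/page` link ends up at two different addresses depending on where the visitor (or crawler) already is, while the absolute link always lands in one place.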

There are a couple of really good reasons to code relative URLs

1) It is much easier and faster to code.

When you are a web developer and you’re building a site and there are thousands of pages, coding relative versus absolute URLs is a way to be more efficient. You’ll see it happen a lot.

2) Staging environments

Another reason why you might see relative versus absolute URLs is that some content management systems (SharePoint is a great example of this) have a staging environment that’s on its own domain. Instead of being example.com, it will be examplestaging.com. The entire website will basically be replicated on that staging domain. Having relative versus absolute URLs means that the same website can exist on staging and on production, or the live accessible version of your website, without having to go back in and recode all of those URLs. Again, it’s more efficient for your web development team. Those are really perfectly valid reasons to do those things. So don’t yell at your web dev team if they’ve coded relative URLs, because from their perspective it is a better solution.

Relative URLs will also cause your page to load slightly faster. However, in my experience, the SEO benefits of having absolute versus relative URLs in your website far outweigh the teeny-tiny bit longer that it will take the page to load. It’s very negligible. If you have a really, really long page load time, there’s going to be a whole boatload of things that you can change that will make a bigger difference than coding your URLs as relative versus absolute.

Page load time, in my opinion, not a concern here. However, it is something that your web dev team may bring up with you when you try to address with them the fact that, from an SEO perspective, coding your website with relative versus absolute URLs, especially in the nav, is not a good solution.

There are even better reasons to use absolute URLs

1) Scrapers

If you have all of your internal links as relative URLs, it would be very, very, very easy for a scraper to simply scrape your whole website and put it up on a new domain, and the whole website would just work. That sucks for you, and it’s great for that scraper. But unless you are out there doing public services for scrapers, for some reason, that’s probably not something that you want happening with your beautiful, hardworking, handcrafted website. That’s one reason. There is a scraper risk.

2) Preventing duplicate content issues

But the other reason why it’s very important to have absolute versus relative URLs is that it really mitigates the duplicate content risk that can be presented when you don’t have all of these versions of your website resolving to one version. Google could potentially enter your site on any one of these four pages. They’re the same page to you, but they’re four different pages to Google. They’re the same domain to you, but they’re four different domains to Google.

But they could enter your site, and if all of your URLs are relative, they can then crawl and index your entire domain using whatever format they entered on. Whereas if you have absolute links coded, even if Google enters your site on the www. version and that resolves, once they crawl to another page that you’ve got coded without the www., Google is not going to assume that all of your other pages, and all of that internal link juice, live at the www. version. That really cuts down on different versions of each page of your website. If you have relative URLs throughout, you basically have four different websites if you haven’t fixed this problem.
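
Here is a rough sketch of that effect as a toy crawl in Python. Everything in it is hypothetical: the three-page site, the `crawl` and `total_urls` helpers, and the assumption that all four host variants serve the same pages without redirecting. It only counts how many distinct URLs a crawler could discover.

```python
from urllib.parse import urljoin, urlsplit

# The four unredirected variants of the same (hypothetical) site.
ENTRY_POINTS = [
    "http://www.example.com/",
    "http://example.com/",
    "https://www.example.com/",
    "https://example.com/",
]

def crawl(entry, links_by_path):
    """Follow links from entry; every host variant serves the same pages."""
    seen, queue = set(), [entry]
    while queue:
        url = queue.pop()
        if url in seen:
            continue
        seen.add(url)
        path = urlsplit(url).path or "/"
        for href in links_by_path.get(path, []):
            queue.append(urljoin(url, href))
    return seen

def total_urls(links_by_path):
    """Count distinct URLs discovered across all four entry points."""
    found = set()
    for entry in ENTRY_POINTS:
        found |= crawl(entry, links_by_path)
    return len(found)

# A tiny three-page site whose internal links are all relative.
relative_site = {"/": ["/a", "/b"], "/a": ["/b"], "/b": ["/"]}

# The same site with every link rewritten to one preferred absolute form.
preferred = "https://example.com"
absolute_site = {
    path: [preferred + href for href in hrefs]
    for path, hrefs in relative_site.items()
}

print(total_urls(relative_site), total_urls(absolute_site))
```

With relative links, every page is reachable under all four variants. With absolute links, the only extra URLs are the entry points themselves, because the first link followed pulls the crawler onto the preferred version.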

Again, it’s not always a huge issue. Duplicate content, it’s not ideal. However, Google has gotten pretty good at figuring out what the real version of your website is.

You do want to think about internal linking, when you’re thinking about this. If you have basically four different versions of any URL that anybody could just copy and paste when they want to link to you or when they want to share something that you’ve built, you’re diluting your internal links by four, which is not great. You basically would have to build four times as many links in order to get the same authority. So that’s one reason.

3) Crawl Budget

The other reason why it’s pretty important not to do this is crawl budget. I’m going to point it out like this instead.

When we talk about crawl budget, basically what that is, is every time Google crawls your website, there is a finite depth that they will crawl to. There’s a finite number of URLs that they will crawl and then they decide, “Okay, I’m done.” That’s based on a few different things. Your site authority is one of them. Your actual PageRank, not toolbar PageRank, but how good Google actually thinks your website is, is a big part of that. But also how complex your site is, how often it’s updated, things like that are also going to contribute to how often and how deep Google is going to crawl your site.

It’s important to remember when we think about crawl budget that, for Google, crawl budget costs actual dollars. One of Google’s biggest expenditures as a company is the money and the bandwidth it takes to crawl and index the Web. All of that energy that’s going into crawling and indexing the Web, that lives on servers. That bandwidth comes from servers, and that means that using bandwidth costs Google actual real dollars.

So Google is incentivized to crawl as efficiently as possible, because when they crawl inefficiently, it costs them money. If your site is not efficient to crawl, Google is going to save itself some money by crawling it less frequently and crawling fewer pages per crawl. That can mean that if you have a site that’s updated frequently, your site may not be updating in the index as frequently as you’re updating it. It may also mean that Google, while it’s crawling and indexing, may be crawling and indexing a version of your website that isn’t the version that you really want it to crawl and index.

So having four different versions of your website, all of which are completely crawlable to the last page, because you’ve got relative URLs and you haven’t fixed this duplicate content problem, means that Google has to spend four times as much money in order to really crawl and understand your website. Over time they’re going to do that less and less frequently, especially if you don’t have a really high authority website. If you’re a small website, if you’re just starting out, if you’ve only got a medium number of inbound links, over time you’re going to see your crawl rate and frequency impacted, and that’s bad. We don’t want that. We want Google to come back all the time, see all our pages. They’re beautiful. Put them up in the index. Rank them well. That’s what we want. So that’s what we should do.

There are a couple of ways to fix your relative versus absolute URLs problem

1) Fix what is happening on the server side of your website

You have to make sure that you are forcing all of these different versions of your domain to resolve to one version of your domain. For me, I’m pretty agnostic as to which version you pick. You should probably already have a pretty good idea of which version of your website is the real version, whether that’s www, non-www, HTTPS, or HTTP. From my view, what’s most important is that all four of these versions resolve to one version.

From an SEO standpoint, there is evidence to suggest, and Google has certainly said, that HTTPS is a little bit better than HTTP. From a URL length perspective, I like to not have the www. in there because it doesn’t really do anything. It just makes your URLs four characters longer. If you don’t know which one to pick, I would pick this one: HTTPS, no W’s. But whichever one you pick, what’s really most important is that all of them resolve to one version. You can do that on the server side, and that’s usually pretty easy for your dev team to fix once you tell them that it needs to happen.
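
As an illustration of the logic your server would apply, here is a minimal Python sketch that maps any of the four variants to one preferred version (HTTPS, no www, as suggested above) and decides whether a 301 is needed. The function names `canonical_url` and `redirect_for` are made up for this example; in practice your dev team would most likely express the same rule in the web server's or framework's own redirect configuration.

```python
from urllib.parse import urlsplit, urlunsplit

def canonical_url(url: str, scheme: str = "https", strip_www: bool = True) -> str:
    """Rewrite url onto the preferred scheme and host, keeping path and query."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if strip_www and host.startswith("www."):
        host = host[4:]
    path = parts.path or "/"
    return urlunsplit((scheme, host, path, parts.query, parts.fragment))

def redirect_for(url: str):
    """Return (301, target) when url is not the preferred version, else None."""
    target = canonical_url(url)
    return None if target == url else (301, target)

for requested in (
    "http://www.example.com/page",
    "http://example.com/page",
    "https://www.example.com/page",
    "https://example.com/page",
):
    print(requested, "->", redirect_for(requested))
```

Three of the four variants get a permanent redirect to `https://example.com/page`, and the preferred version is served as-is.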

2) Fix your internal links

Great. So you fixed it on your server side. Now you need to fix your internal links, and you need to recode them from being relative to being absolute. This is something that your dev team is not going to want to do because it is time consuming and, from a web dev perspective, not that important. However, you should use resources like this Whiteboard Friday to explain to them that, from an SEO perspective, both from the scraper risk and from a duplicate content standpoint, having those absolute URLs is a high priority and something that should get done.

You’ll need to fix those, especially in your navigational elements. But once you’ve got your nav fixed, also pull out your database or run a Screaming Frog crawl or however you want to discover internal links that aren’t part of your nav, and make sure you’re updating those to be absolute as well.
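
If you would rather script that discovery step for a handful of templates, here is a small sketch using Python's built-in `html.parser`. The `RelativeLinkFinder` class and the sample markup are hypothetical, and this is no substitute for a full crawl. It simply flags any `href` that is missing a scheme or a host.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit

class RelativeLinkFinder(HTMLParser):
    """Collect hrefs on <a> tags that are not fully absolute URLs."""

    def __init__(self):
        super().__init__()
        self.relative_hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Skip missing hrefs, in-page anchors, and non-web schemes.
        if not href or href.startswith(("#", "mailto:", "tel:", "javascript:")):
            return
        parts = urlsplit(href)
        # No scheme or no host means the link depends on the current page.
        if not parts.scheme or not parts.netloc:
            self.relative_hrefs.append(href)

def find_relative_links(html: str):
    finder = RelativeLinkFinder()
    finder.feed(html)
    return finder.relative_hrefs

sample_nav = (
    '<nav><a href="/page">Page</a>'
    '<a href="https://example.com/about">About</a>'
    '<a href="contact.html">Contact</a>'
    '<a href="#top">Top</a>'
    '<a href="//example.com/blog">Blog</a></nav>'
)
print(find_relative_links(sample_nav))
```

Note that the protocol-relative link (`//example.com/blog`) is flagged too, since it still inherits HTTP or HTTPS from the page it sits on.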

Then you’ll do some education with everybody who touches your website saying, “Hey, when you link internally, make sure you’re using the absolute URL and make sure it’s in our preferred format,” because that’s really going to give you the most bang for your buck per internal link. So do some education. Fix your internal links.

Sometimes your dev team is going to say, “No, we can’t do that. We’re not going to recode the whole nav. It’s not a good use of our time,” and sometimes they are right. The dev team has more important things to do. That’s okay.

3) Canonicalize it!

If you can’t get your internal links fixed or if they’re not going to get fixed anytime in the near future, a stopgap or a Band-Aid that you can kind of put on this problem is to canonicalize all of your pages. As you’re changing your server to force all of these different versions of your domain to resolve to one, at the same time you should be implementing the canonical tag on all of the pages of your website to self-canonicalize. On every page, you have a canonical tag saying, “This page right here that they were already on is the canonical version of this page.” Or if there’s another page that’s the canonical version, then obviously you point to that instead.

But having each page self-canonicalize will mitigate both the risk of duplicate content internally and some of the risk posed by scrapers, because when they scrape, if they are scraping your website and slapping it up somewhere else, those canonical tags will often stay in place, and that lets Google know this is not the real version of the website.
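
A self-canonical tag can be generated in a page template with a few lines. The sketch below is only an illustration: `PREFERRED_SCHEME`, `PREFERRED_HOST`, and `self_canonical_tag` are made-up names, and keeping the query string in the canonical URL is an assumption that will not suit sites that use tracking parameters.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit

# Assumed preferred version of the site: HTTPS, no www.
PREFERRED_SCHEME = "https"
PREFERRED_HOST = "example.com"

def self_canonical_tag(requested_url: str) -> str:
    """Build a canonical tag that always names the preferred scheme and host."""
    parts = urlsplit(requested_url)
    canonical = urlunsplit(
        (PREFERRED_SCHEME, PREFERRED_HOST, parts.path or "/", parts.query, "")
    )
    return f'<link rel="canonical" href="{escape(canonical, quote=True)}" />'

print(self_canonical_tag("http://www.example.com/page"))
print(self_canonical_tag("https://example.com/page"))
```

Because the host is written into the tag as an absolute URL rather than taken from the request, the tag still names your domain when the markup is copied to another one.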

In conclusion, relative links, not as good. Absolute links, those are the way to go. Make sure that you’re fixing these very common domain level duplicate content problems. If your dev team tries to tell you that they don’t want to do this, just tell them I sent you. Thanks guys.

Video transcription by Speechpad.com

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!

Reblogged 3 years ago from tracking.feedpress.it

How to Defeat Duplicate Content – Next Level

Posted by EllieWilkinson

Welcome to the third installment of Next Level! In the previous Next Level blog post, we shared a workflow showing you how to take on your competitors using Moz tools. We’re continuing the educational series with several new videos all about resolving duplicate content. Read on and level up!


Dealing with duplicate content can feel a bit like doing battle with your site’s evil doppelgänger—confusing and tricky to defeat! But identifying and resolving duplicates is a necessary part of helping search engines decide on relevant results. In this short video, learn about how duplicate content happens, why it’s important to fix, and a bit about how you can uncover it.


[Quick clarification: Search engines don’t actively penalize duplicate content, per se; they just don’t always understand it as well, which can lead to a drop in rankings. More info here.]

Now that you have a better idea of how to identify those dastardly duplicates, let’s get rid of ’em once and for all. Watch this next video to review how to use Moz Analytics to find and fix duplicate content using three common solutions. (You’ll need a Moz Pro subscription to use Moz Analytics. If you aren’t yet a Moz Pro subscriber, you can always try out the tools with a 30-day free trial.)

Workflow summary

Here’s a review of the three common solutions to conquering duplicate content:

  1. 301 redirect. Check Page Authority to see if one page has a higher PA than the other using Open Site Explorer, then set up a 301 redirect from the duplicate page to the original page. This will ensure that they no longer compete with one another in the search results. Wondering what a 301 redirect is and how to do it? Read more about redirection here.
  2. Rel=canonical. A rel=canonical tag passes the same amount of ranking power as a 301 redirect, and there’s a bonus: it often takes less development time to implement! Add this tag to the HTML head of a web page to tell search engines that it should be treated as a copy of the “canon,” or original, page:
    <head> <link rel="canonical" href="http://moz.com/blog/" /> </head>

    If you’re curious, you can read more about canonicalization here.

  3. noindex, follow. Add the values “noindex, follow” to the meta robots tag to tell search engines not to include the duplicate pages in their indexes, but to crawl their links. This works really well with paginated content or if you have a system set up to tag or categorize content (as with a blog). Here’s what it should look like:
    <head> <meta name="robots" content="noindex, follow" /> </head>

    If you’re looking to block the Moz crawler, Rogerbot, you can use the robots.txt file if you prefer—he’s a good robot, and he’ll obey!
    More about meta robots (and robots.txt) here.
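
To check which of these signals a page is actually sending, a short script can read them back out of the HTML. This is a minimal sketch with Python's built-in `html.parser`. The `HeadSignals` class and `read_signals` helper are hypothetical names, and the two test strings are the tags shown in the list above.

```python
from html.parser import HTMLParser

class HeadSignals(HTMLParser):
    """Pick out the rel=canonical href and the meta robots values from a page."""

    def __init__(self):
        super().__init__()
        self.canonical = None
        self.robots = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "link" and (a.get("rel") or "").lower() == "canonical":
            self.canonical = a.get("href")
        elif tag == "meta" and (a.get("name") or "").lower() == "robots":
            content = a.get("content") or ""
            self.robots = [v.strip().lower() for v in content.split(",")]

def read_signals(html: str):
    """Return (canonical_href_or_None, list_of_robots_values)."""
    parser = HeadSignals()
    parser.feed(html)
    return parser.canonical, parser.robots

canonical_page = '<head> <link rel="canonical" href="http://moz.com/blog/" /> </head>'
noindex_page = '<head> <meta name="robots" content="noindex, follow" /> </head>'

print(read_signals(canonical_page))
print(read_signals(noindex_page))
```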

Can’t get enough of duplicate content? Want to become a duplicate content connoisseur? This last video explains more about how Moz finds duplicates, if you’re curious. And you can read even more over at the Moz Developer Blog.

We’d love to hear about your techniques for defeating duplicates! Chime in below in the comments.


Reblogged 3 years ago from tracking.feedpress.it

The Coming Integration of PR and SEO

Posted by SamuelScott

Earlier this year, I published a Moz post that aimed to introduce the basic principles of public relations that SEOs and digital marketers, I argued, need to know. (Specifically, the post was on media relations and story-pitching as a means of getting coverage and “earning” good links.)

Following the positive response to the post, Moz invited me to host a recent Mozinar on the integration of PR and SEO. (You can listen to it and download the slides here for free!) As a former print journalist who later became a digital marketer, I love to discuss this niche because I am very passionate about the topic.

In summary, the Mozinar discussed:

  • Traditional marketing and communications theory
  • Why both inbound and outbound marketing are needed
  • An overview of the basic PR process
  • How to use PR software
  • Examples of messaging and positioning
  • Where to research demographic data for audience profiles
  • How to integrate SEO into each step of the workflow
  • How SEO and PR teams can help each other
  • Why the best links come as natural results of doing good PR and marketing
  • “Don’t think about how to get links. Think about how to get coverage and publicity.”

At the end of the Mozinar, the community had some intriguing and insightful questions (no surprise there!), and Moz invited me to write a follow-up post to provide more answers and discuss the relationship between SEO and PR further.

Follow-ups to the PR Mozinar

Before I address the questions and ideas at the end of the Mozinar, I just wanted to give some more credit where the credit is certainly due.

People like me, who write for major publications or speak at large conferences, get a lot of attention. But, truth is, we are always helped immensely by so many of our talented colleagues behind the scenes. Since the beginning of my digital marketing career, I have known about SEO, but I have learned more about public relations from observing (albeit from a distance) The Cline Group’s front line PR team in Philadelphia over the years.

So, I just wanted to thank (in alphabetical order) Kim Cox, Gabrielle Dratch, Caitlin Driscoll, Max Marine, and Ariel Shore as well as our senior PR executives Bill Robinson and DeeDee Rudenstein and CEO Josh Cline. What I hope the Moz community learned from the Mozinar is what I have learned from them.

Now, onto the three Mozinar Q&A questions that had been left unanswered.

  • Why do you use Cision and not Vocus or Meltwater or others?

I do not want to focus on why The Cline Group specifically uses Cision. I would not want my agency (and indirectly Moz) to be seen as endorsing one type of PR software over another. What I can do is encourage people to read these writings from RMP Media Analysis, LinkedIn, Alaniz Marketing and Ombud, then do further research into which platform may work best for them and their specific companies and needs.

(Cision and Vocus recently agreed to merge, with the combined company continuing under the Cision brand.)

  • Do you have examples of good PR pitches?

I’ve anonymized and uploaded three successful client pitches to our website. You can download them here: a mobile-advertising network, a high-end vaporizer for the ingestion of medicinal herbs and a mobile app that helps to protect personal privacy. As you will see, these pitches incorporated the various tactics that I had detailed in the Mozinar.

Important caveat: Do not fall into the trap of relying too much on templates. Every reporter and every outlet you pitch will be different. The ideas in these examples of pitches may help, but please do not use them verbatim. 

  • Are there other websites similar to HARO (Help a Reporter Out) that people can use to find reporters who are looking for stories? Are there other free, simpler tools?

Some commonly mentioned tools are My Blog U, ProfNet, BuzzStream and My Local Reporter. Raven Tools also has a good-sized list. But I can only vouch for My Blog U because it’s the only one I have used personally. It’s also important to note that using a PR tool is not a magic bullet. You have to know how to use it in the context of the overall public relations process. Creating a media list is just one part of the puzzle.

An infographic of integration

And now, the promised infographic!

I told the Mozinar audience we would provide a detailed infographic as a quick guide to the step-by-step process of PR and SEO integration. Well, here it is:

[Infographic: the step-by-step process of PR and SEO integration]

A second credit to my awesome colleague Thomas Kerr, who designs most of The Cline Group’s presentations and graphics while also being our social media and overall digital wizard.

Just a few notes on the infographic:

First, I have segmented the two pillars by “PR and Traditional Marketing” and “SEO & Digital Marketing.” I hate to sound stereotypical, but the use of this differentiation was the easiest way to explain the integration process. The “PR” side deals with people and content (e.g., messaging, media relations, and materials), while the “SEO” side focuses on things (e.g., online data, analytics, and research). See the end of this post for an important prediction.

Second, I have put social media on the online side because that is where the practice seems to sit in most companies and agencies. However, social media is really just a set of PR and communications channels, so it will likely increasingly move to the “traditional marketing” side of things. Again, see the end.

Third, there is a CMO / VP of Marketing / Project Leader (based on the structure of a company and whether the context is an agency or an in-house department) column between SEO and PR. This position should be a person with enough experience in both disciplines to mediate between the two as well as make judgment calls and final decisions in the case of conflicts. “SEO,” for example, may want to use certain keyword-based language in messaging in an attempt to rank highly for certain search terms. “PR” might want to use different terms that may resonate more with media outlets and the public. Someone will need to make a decision.

Fourth, it is important to understand that companies with numerous brands, products or services, and/or a diverse set of target audiences will need to take additional steps:

The marketing work for each brand, product, or service will need its own specific goal and KPI(s) in step one. Separate audience research and persona development will need to be performed for each distinct audience in step two. So, for a larger company, such as the one described above, parts of steps 3-8 below will often need to be done, say, six times, once for each audience of each product.

However, the complexity does not end there.

Online and offline are the same thing

Essentially, as more and more human activity occurs online, we are rapidly approaching a point where the offline and online worlds are merging into the same space. “Traditional” and “online” marketing are all collectively becoming simply “marketing.”

Above is our modern version of traditional communications and marketing theory. A sender decides upon a message; the message is packaged into a piece of content; the content is transmitted via a desired channel; and the channel delivers the content to the receiver. Marketing is essentially sending a message that is packaged into a piece of content to a receiver via a channel. The rest is just details.

As Google becomes smarter and smarter, marketers will need to stop thinking only about SEO and think more like, well, marketers. Mad Men’s Don Draper, the subject of the meme at the top of the page, would best the performance of any link builder today because he understood how to gain mass publicity and coverage, both of which have always been more important than just building links here and there. The best and greatest numbers of links come naturally as a result of good marketing and not as a result of any direct linkbuilding. In the 2014 Linkbuilding Survey published on Moz, most of the (good) tactics that were described in the post – such as “content plus outreach” – are PR by another name.

At SMX West 2014 (where I gave a talk on SEO and PR strategy), Rand Fishkin took to the main stage to discuss what the future holds for SEO. Starting at 6:30 in the video above, he argued that there will soon be a bias towards brands in organic search. (For an extensive discussion of this issue, I’ll refer you to Bryson Meunier’s essay at Search Engine Land.) I agree that it will soon become crucial to use PR, advertising, and publicity to build a brand, but that action is something the Don Drapers of the world had already known to do long before the Internet had ever existed.

But things are changing

The process that I have outlined above is a little vague on purpose. The lines between SEO and PR are increasingly blurring as online and offline marketing becomes more and more integrated. For example, take this very post: is it me doing SEO or PR for our agency (while first and foremost aiming to help the readers)? The answer: Yes.

In a Moz post by Jason Acidre on SEO and brand building, I commented with the following:

Say, 10 years ago, “SEOs” were focused on techie things: keyword research, sitemaps, site hierarchy, site speed, backlinks, and a lot more. Then, as Google became smarter and the industry became more and more mature, “SEOs” woke up one day and realized that online marketers need to think, you know, like marketers. Now, I get the sense that digital marketers are trying to learn all about traditional marketing as much as possible because, in the end, all marketing is about people, not machines and algorithms. What the f&*# is a positioning statement? What is a pitch? I just wish “SEOs” had done this from the beginning.

Of course, the same thing has been occurring in the inverse in the traditional marketing world. Traditional marketers have usually focused on these types of things: messaging documents, media lists, promotional campaigns, the 4 Ps, and SWOT analyses. Then, as more human activity moved to the Internet, they also woke up one day and saw an anarchic set of communications channels that operate under different sets of rules. Now, on the other end, I get the sense that traditional marketers are trying to learn as much as possible about SEO and digital marketing. What the f&^% is a rel=canonical tag? What is Google+ authorship? I just wish traditional marketers had done this from the start.

In fact, such a separation between SEO and PR is quickly dying. Here is a simplified version of the marketing and communications process I outlined at the beginning:

Traditional marketers and communications professionals have used this process for decades, and almost everything that (the umbrella term of) SEO does can fit into one of these boxes. A message can appear in a newspaper article or in a blog post. Content can be a sales brochure or an e-book. A channel can be the television or Facebook. A lot of technical and on-page SEO is simply good web development. The most-effective type of off-page SEO is just PR and publicity. Public-relations executives, as I have written elsewhere, can also learn to use analytics as yet another way to gauge results.

It all goes back to this tweet from Rand, which I cite in nearly every offline conversation with the marketing community:

SEO as an entity (sorry for the pun) unto itself is quickly dying. The more SEO entails, the more the umbrella term becomes useless in any meaningful context. For this reason, it is crucial that digital marketers learn as much as possible about traditional marketing and PR.

So, in the end, how does one integrate public relations and SEO? By simply doing good marketing.

Want more? Don’t forget to watch the Mozinar — I’d love to get your feedback in the comments below!


Reblogged 3 years ago from feedproxy.google.com

Syndicating Content – Whiteboard Friday

Posted by Eric Enge

It’s hard to foresee a lot of benefit to your hard work creating content when you don’t have much of a following, and even if you do, scaling that content creation is difficult for any marketer. One viable answer is syndication, and in this Whiteboard Friday, Eric Enge shows you both reasons why you might want to syndicate as well as tips on how to go about it.

Heads-up! We published a one-two punch of Whiteboard Friday videos from our friends at Stone Temple Consulting today. Check out “I See Content (Everywhere)” by Mark Traphagen, too!


Video transcription

Hi everybody. I’m Eric Enge, CEO of Stone Temple Consulting. Welcome to another edition of Whiteboard Friday, and today we’re going to be talking about syndicated content. I probably just smeared my picture, but in any case, you hear about syndicated content and the first thing that comes across your mind is, “Doesn’t that create duplicate content, and isn’t somebody going to outrank me for my own stuff?” And it is a legitimate concern. But before I talk about how to do it, I want to tell you about why to do it, because there are really, really good sound reasons for syndicating content.

Why (and how) should I syndicate my content?

So first of all, here is your site. You get to be the site in purple by the way, and then here is an authority site, which is the site in green. You have an article that you’ve written called, “All About Fruit,” and you deliver that article to that authority site and they publish the same article, hence creating the duplicate content. So why would you consider doing this?

Well, the first reason is that by association with a higher authority site there is going to be some authority passed to you, first of all from a human perspective: people see that your authored content is up there on this authority site. That by itself is a great thing. When we do the right things, we’re also going to get some link juice or SEO authority passed to you as well. So these are really good reasons by themselves to do it.

But the other thing that happens is you get exposure to what I call OPA or Other People’s Audiences, and that’s a very helpful thing as well. These people, as I’ve mentioned before, they’re going to see you here, and this crowd, some of this crowd is going to start to become your crowd. This is great stuff. But let’s talk about how to do it. So here we go.

Three ways to contentedly syndicate content

#1 rel=canonical

There are three ways that you can do this that can make this work for you. The first is, here’s your site again, here’s the authority site. You get the authority site to implement a rel=canonical tag back to your page, the same page, the exact article page on your site. That tells Google and Bing that the real canonical version of the content is this one over here. The result of that is that all of the PageRank that accrues to this page on the authority site now gets passed over to you. So any links, all the links, in fact, that this page gets now gets passed through to you, and you get the PageRank from all that. This is great stuff. But that’s just one of the solutions. It’s actually the best one in my opinion.

#2 meta noindex

The second best one down here, okay, same scenario: your site, the authority’s site. The authority’s site implements a meta noindex tag on their page. That’s an instruction to the search engine to not keep this page in the index, so that solves the duplicate content problem for you in a different way. This does as well, but this is a way of just taking it out of the index. Now any links from this page here over to your page still pass PageRank. So you still want to make sure you’re getting those in the process. So a second great solution for this problem.

#3 Clean Link to Original Article

So these are both great, but it turns out that a lot of sites don’t really like to do either of these two things. They actually want to be able to have the page in the index, or they don’t want to take the trouble to do this extra coding. There is a third solution, which is not the best solution, but it’s still very workable in the right scenarios. That is you get them to implement a clean text link from the copied page that they have on their site over to your site, to the same article on your site. The search engines are pretty good at understanding, when they see that link, that it means that you’re the original author. So you’re still getting a lot of authority passed, and you’re probably eliminating a duplicate content problem.
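To make the three options concrete, here is a minimal Python sketch that inspects the HTML of a syndicated copy and reports which of the three signals it carries. The class and function names, the example URL, and the reading of "clean link" as a plain link without rel="nofollow" are all assumptions made for illustration; this is not how any search engine actually evaluates a page.

```python
from html.parser import HTMLParser


class SyndicationSignals(HTMLParser):
    """Collects canonical, noindex, and outbound-link signals from a page."""

    def __init__(self):
        super().__init__()
        self.canonical = None   # href of <link rel="canonical">, if any
        self.noindex = False    # True if <meta name="robots"> contains noindex
        self.links = []         # hrefs of <a> links without rel="nofollow"

    def handle_starttag(self, tag, attrs):
        a = {k: (v or "") for k, v in attrs}
        rel = a.get("rel", "").lower().split()
        if tag == "link" and "canonical" in rel:
            self.canonical = a.get("href")
        elif tag == "meta" and a.get("name", "").lower() == "robots":
            self.noindex = "noindex" in a.get("content", "").lower()
        elif tag == "a" and "href" in a and "nofollow" not in rel:
            self.links.append(a["href"])


def syndication_method(html, original_url):
    """Return which of the three approaches the syndicated copy uses."""
    parser = SyndicationSignals()
    parser.feed(html)
    if parser.canonical == original_url:
        return "rel=canonical"
    if parser.noindex:
        return "meta noindex"
    if original_url in parser.links:
        return "clean link"
    return "none"


# Hypothetical syndicated copy of "All About Fruit" on the authority site.
original = "https://yoursite.example/all-about-fruit"
copy_html = (
    '<html><head><link rel="canonical" href="' + original + '"></head>'
    "<body><h1>All About Fruit</h1></body></html>"
)
print(syndication_method(copy_html, original))
```

The checks run in the same order of preference given above: canonical first, then noindex, then the plain link.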

So again, let’s just recap briefly. The reason why you want to go through this trouble is you get authority from the authority site passed to you, both at a human level and at an SEO level, and you can gain audience from the audience of that authority site.

So that’s it for this edition of Whiteboard Friday.

Video transcription by Speechpad.com

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!

Reblogged 4 years ago from feedproxy.google.com

Your Google Algorithm Cheat Sheet: Panda, Penguin, and Hummingbird

Posted by MarieHaynes

If you’re reading the Moz blog, then you probably have a decent understanding of Google and its algorithm changes. However, there is probably a good percentage of the Moz audience that is still confused about the effects that Panda, Penguin, and Hummingbird can have on your site. I did write a post last year about the main differences between Penguin and a Manual Unnatural Links Penalty, and if you haven’t read that, it’ll give you a good primer.

The point of this article is to explain very simply what each of these algorithms is meant to do. It is hopefully a good reference that you can point your clients to if you want to explain an algorithm change and not overwhelm them with technical details about 301s, canonicals, crawl errors, and other confusing SEO terminology.

What is an algorithm change?

First of all, let’s start by discussing the Google algorithm. It’s immensely complicated and continues to get more complicated as Google tries its best to provide searchers with the information that they need. When search engines were first created, early search marketers were able to easily find ways to make the search engine think that their client’s site was the one that should rank well. In some cases it was as simple as putting in some code on the website called a meta keywords tag. The meta keywords tag would tell search engines what the page was about.

As Google evolved, its engineers, who were primarily focused on making the search engine results as relevant to users as possible, continued to work on ways to stop people from cheating, and looked at other ways to show the most relevant pages at the top of their searches. The algorithm now looks at hundreds of different factors. There are some that we know are significant, such as having a good descriptive title (between the <title></title> tags in the code). And there are many that are the subject of speculation, such as whether or not Google +1’s contribute to a site’s rankings.

In the past, the Google algorithm would change very infrequently. If your site was sitting at #1 for a certain keyword, it was guaranteed to stay there until the next update which might not happen for weeks or months. Then, they would push out another update and things would change. They would stay that way until the next update happened. If you’re interested in reading about how Google used to push updates out of its index, you may find this Webmaster World forum thread from 2002 interesting. (Many thanks to Paul Macnamara for explaining to me how algo changes used to work on Google in the past and pointing me to the Webmaster World thread.)

This all changed with the launch of “Caffeine” in 2010. Since Caffeine launched, the search engine results have been changing several times a day rather than every few weeks. Google makes over 600 changes to its algorithm in a year, and the vast majority of these are not announced. But, when Google makes a really big change, they give it a name, usually make an announcement, and everyone in the SEO world goes crazy trying to figure out how to understand the changes and use them to their advantage.

Three of the biggest changes that have happened in the last few years are the Panda algorithm, the Penguin algorithm and Hummingbird.

What is the Panda algorithm?

Panda first launched on February 23, 2011. It was a big deal. The purpose of Panda was to try to show high-quality sites higher in search results and demote sites that may be of lower quality. This algorithm change was unnamed when it first came out, and many of us called it the “Farmer” update as it seemed to affect content farms. (Content farms are sites that aggregate information from many sources, often stealing that information from other sites, in order to create large numbers of pages with the sole purpose of ranking well in Google for many different keywords.) However, it affected a very large number of sites. The algorithm change was eventually officially named after one of its creators, Navneet Panda.

When Panda first happened, a lot of SEOs in forums thought that this algorithm was targeting sites with unnatural backlink patterns. However, it turns out that links are most likely not a part of the Panda algorithm. It is all about on-site quality.

In most cases, sites that were affected by Panda were hit quite hard. But, I have also seen sites that have taken a slight loss on the date of a Panda update. Panda tends to be a site-wide issue which means that it doesn’t just demote certain pages of your site in the search engine results, but instead, Google considers the entire site to be of lower quality. In some cases though Panda can affect just a section of a site such as a news blog or one particular subdomain.

Whenever a Google employee is asked about what needs to be done to recover from Panda, they refer to a blog post by Google employee Amit Singhal that gives a checklist that you can use on your site to determine if your site really is high quality or not. Here is the list:

  • Would you trust the information presented in this article?
  • Is this article written by an expert or enthusiast who knows the topic well, or is it more shallow in nature?
  • Does the site have duplicate, overlapping, or redundant articles on the same or similar topics with slightly different keyword variations?
  • Would you be comfortable giving your credit card information to this site?
  • Does this article have spelling, stylistic, or factual errors?
  • Are the topics driven by genuine interests of readers of the site, or does the site generate content by attempting to guess what might rank well in search engines?
  • Does the article provide original content or information, original reporting, original research, or original analysis?
  • Does the page provide substantial value when compared to other pages in search results?
  • How much quality control is done on content?
  • Does the article describe both sides of a story?
  • Is the site a recognized authority on its topic?
  • Is the content mass-produced by or outsourced to a large number of creators, or spread across a large network of sites, so that individual pages or sites don’t get as much attention or care?
  • Was the article edited well, or does it appear sloppy or hastily produced?
  • For a health related query, would you trust information from this site?
  • Would you recognize this site as an authoritative source when mentioned by name?
  • Does this article provide a complete or comprehensive description of the topic?
  • Does this article contain insightful analysis or interesting information that is beyond obvious?
  • Is this the sort of page you’d want to bookmark, share with a friend, or recommend?
  • Does this article have an excessive amount of ads that distract from or interfere with the main content?
  • Would you expect to see this article in a printed magazine, encyclopedia or book?
  • Are the articles short, unsubstantial, or otherwise lacking in helpful specifics?
  • Are the pages produced with great care and attention to detail vs. less attention to detail?
  • Would users complain when they see pages from this site?

Phew! That list is pretty overwhelming! These questions do not necessarily mean that Google tries to algorithmically figure out whether your articles are interesting or whether you have told both sides of a story. Rather, the questions are there because all of these factors can contribute to how real-life users would rate the quality of your site. No one really knows all of the factors that Google uses in determining the quality of your site through the eyes of Panda. Ultimately though, the focus is on creating the best site possible for your users. It is also important that only your best stuff is given to Google to have in its index. There are a few factors that are widely accepted as important things to look at with regard to Panda:

Thin content

A “thin” page is a page that adds little or no value to someone who is reading it. It doesn’t necessarily mean that a page has to be a certain number of words, but quite often, pages with very few words are not super-helpful. If you have a large number of pages on your site that contain just one or two sentences and those pages are all included in the Google index, then the Panda algorithm may determine that the majority of your indexed pages are of low quality.

Having the odd thin page is not going to cause you to run into Panda problems. But, if a big enough portion of your site contains pages that are not helpful to users, then that is not good.
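As a rough way to see what share of a site might be thin, the sketch below computes the fraction of pages that fall under a word-count cutoff. The function name, the 50-word cutoff, and the sample paths are assumptions for illustration only. Word count is just a proxy, since a short page can still be helpful, and nothing here reflects a threshold that Google has published.

```python
def thin_page_share(word_counts, min_words=50):
    """Return the fraction of pages with fewer than `min_words` words.

    `word_counts` maps a page path to its word count. The default cutoff
    is an arbitrary illustrative value, not a known Panda threshold.
    """
    if not word_counts:
        return 0.0
    thin = sum(1 for count in word_counts.values() if count < min_words)
    return thin / len(word_counts)


# Hypothetical crawl results: page path -> number of words on the page.
pages = {"/about": 12, "/guide": 900, "/tag/misc": 20, "/faq": 450}
print(f"{thin_page_share(pages):.0%} of indexed pages look thin")
```

A low share corresponds to "the odd thin page"; a high share is the situation the paragraph above warns about.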

Duplicate content

There are several ways that duplicate content can cause your site to be viewed as a low-quality site by the Panda algorithm. The first is when a site has a large amount of content that is copied from other sources on the web. Let’s say that you have a blog on your site and you populate that blog with articles that are taken from other sources. Google is pretty good at figuring out that you are not the creator of this content. If the algorithm can see that a large portion of your site is made up of content that exists on other sites then this can cause Panda to look at you unfavorably.

You can also run into problems with duplicated content on your own site. One example would be for a site that has a large number of products for sale. Perhaps each product has a separate page for each color variation and size. But, all of these pages are essentially the same. If one product comes in 20 different colors and each of those come in 6 different sizes, then that means that you have 120 pages for the same product, all of which are almost identical. Now, imagine that you sell 4,000 products. This means that you’ve got almost half a million pages in the Google index when really 4,000 pages would suffice. In this type of situation, the fix for this problem is to use something called a canonical tag. Moz has got a really good guide on using canonical tags here, and Dr. Pete has also written this great article on canonical tag use.
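The arithmetic above, and the idea behind the fix, can be sketched in a few lines of Python. The sketch assumes that color and size variants are expressed as query-string parameters on a single product URL; the shop URL and function names are hypothetical, and a real site may structure its variant URLs differently.

```python
from urllib.parse import urlsplit, urlunsplit

colors, sizes, products = 20, 6, 4000
pages_per_product = colors * sizes                   # 120 near-identical pages
total_variant_pages = pages_per_product * products   # 480,000 pages in total
print(pages_per_product, total_variant_pages)


def canonical_url(variant_url):
    """Drop the query string and fragment so all variants share one URL."""
    parts = urlsplit(variant_url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def canonical_tag(variant_url):
    """Build the <link> element a variant page would place in its <head>."""
    return f'<link rel="canonical" href="{canonical_url(variant_url)}">'


# Hypothetical variant page for one product.
print(canonical_tag("https://shop.example/widgets/blue-widget?color=red&size=m"))
```

Each of the 120 variant pages would carry the same tag, so only one URL per product competes for a place in the index.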

Low-quality content

When I write an article and publish it on one of my websites, the only type of information that I want to present to Google is information that is the absolute best of its kind. In the past, many SEOs have given advice to site owners saying that it was important to blog every day and make sure that you are always adding content for Google to index. But, if what you are producing is not high quality content, then you could be doing more harm than good. A lot of Amit Singhal’s questions listed above are asking whether the content on your site is valuable to readers. Let’s say that I have an SEO blog and every day I take a short blurb from each of the interesting SEO articles that I have read online and publish it as a blog post on my site. Is Google going to want to show searchers my summary of these articles, or would they rather show them the actual articles? Of course my summary is not going to be as valuable as the real thing! Now, let’s say that I have done this every day for 4 years. Now my site has over 4,000 pages that contain information that is not unique and not as valuable as other sites on the same topics.

Here is another example. Let’s say that I am a plumber. I’ve been told that I should blog regularly, so several times a week I write a 2-3 paragraph article on things like, “How to fix a leaky faucet” or “How to unclog a toilet.” But, I’m busy and don’t have much time to put into my website so each article I’ve written contains keywords in the title and a few times in the content, but the content is not in depth and is not that helpful to readers. If the majority of the pages on my site contain information that no one is engaging with, then this can be a sign of low quality in the eyes of the Panda algorithm.

There are other factors that probably play a role in the Panda algorithm. Glenn Gabe recently wrote an excellent article on his evaluation of sites affected by the most recent Panda update. His bullet point list of things to improve upon when affected by Panda is extremely thorough.

How to recover from a Panda hit

Google refreshes the Panda algorithm approximately monthly. They used to announce whenever they were refreshing the algorithm, but now they only do this if there is a really big change to the Panda algorithm. What happens when the Panda algorithm refreshes is that Google takes a new look at each site on the web and determines whether or not it looks like a quality site in regards to the criteria that the Panda algorithm looks at. If your site was adversely affected by Panda and you have made changes such as removing thin and duplicate content then, when Panda refreshes, you should see that things improve. However, for some sites it can take a couple of Panda refreshes to see the full extent of the improvements. This is because it can sometimes take several months for Google to revisit all of your pages and recognize the changes that you have made.

Every now and then, instead of just refreshing the algorithm, Google does what they call an update. When an update happens, this means that Google has changed the criteria that they use to determine what is and isn’t considered high quality. On May 20, 2014, Google did a major update which they called Panda 4.0. This caused a lot of sites to see significant changes in regards to Panda:

Not all Panda recoveries are as dramatic as this one. But, if you have been affected by Panda and you work hard to make changes to your site, you really should see some improvement.

What is the Penguin algorithm?

The Penguin algorithm initially rolled out on April 24, 2012. The goal of Penguin is to reduce the trust that Google has in sites that have cheated by creating unnatural backlinks in order to gain an advantage in the Google results. While the primary focus of Penguin is on unnatural links, there can be other factors that can affect a site in the eyes of Penguin as well. Links, though, are known to be by far the most important thing to look at.

Why are links important?

A link is like a vote for your site. If a well respected site links to your site, then this is a recommendation for your site. If a small, unknown site links to you then this vote is not going to count for as much as a vote from an authoritative site. Still, if you can get a large number of these small votes, they really can make a difference. This is why, in the past, SEOs would try to get as many links as they could from any possible source.

Another thing that is important in the Google algorithms is anchor text. Anchor text is the text that is underlined in a link. So, in this link to a great SEO blog, the anchor text would be “SEO blog.” If Moz.com gets a number of sites linking to them using the anchor text “SEO blog,” that is a hint to Google that people searching for “SEO blog” probably want to see sites like Moz in their search results.
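To show what anchor text looks like in the underlying HTML, here is a small Python sketch that pulls the anchor text out of every link pointing at a given domain and counts how often each phrase is used. The class name and the sample markup are made up for illustration, and matching the domain by substring is a deliberate simplification.

```python
from collections import Counter
from html.parser import HTMLParser


class AnchorTextCollector(HTMLParser):
    """Counts the anchor text of links whose href contains `target_domain`."""

    def __init__(self, target_domain):
        super().__init__()
        self.target_domain = target_domain
        self.counts = Counter()
        self._in_link = False
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href") or ""
            self._in_link = self.target_domain in href  # crude substring match
            self._text = []

    def handle_data(self, data):
        if self._in_link:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_link:
            # Normalize whitespace and case before counting.
            phrase = " ".join("".join(self._text).split()).lower()
            self.counts[phrase] += 1
            self._in_link = False


# Hypothetical page that links to Moz twice and to another site once.
page = (
    'A great <a href="https://moz.com/blog">SEO blog</a>, another '
    '<a href="https://moz.com/">SEO  Blog</a> mention, and an unrelated '
    '<a href="https://other.example/">click here</a> link.'
)
collector = AnchorTextCollector("moz.com")
collector.feed(page)
print(collector.counts.most_common())
```

Run across many linking pages, a tally like this shows which phrases other sites associate with the target site.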

It’s not hard to see how people could manipulate this part of the algorithm. Let’s say that I am doing SEO for a landscaping company in Orlando. In the past, one of the ways that I could cheat the algorithm into thinking that my company should be ranked highly would be to create a bunch of self-made links and use anchor text in these links that contains phrases like “Orlando Landscaping Company,” “Landscapers in Orlando,” and “Orlando Landscaping.” While an authoritative link from a well respected site is good, what people discovered is that creating a large number of links from low-quality sites was quite effective. As such, what SEOs would do is create links from easy-to-get places like directory listings, self-made articles, and links in comments and forum posts.

While we don’t know exactly what factors the Penguin algorithm looks at, what we do know is that this type of low-quality, self-made link is what the algorithm is trying to detect. In my mind, the Penguin algorithm is sort of like Google putting a “trust factor” on your links. I used to tell people that Penguin could affect a site on a page or even a keyword level, but Google employee John Mueller has said several times now that Penguin is a sitewide algorithm. This means that if the Penguin algorithm determines that a large number of the links to your site are untrustworthy, then this reduces Google’s trust in your entire site. As such, the whole site will see a reduction in rankings.

While Penguin affected a lot of sites drastically, I have seen many sites that saw a small reduction in rankings. The difference, of course, depends on the amount of link manipulation that has been done.

How to recover from a Penguin hit

Penguin is a filter just like Panda. What that means is that the algorithm is re-run periodically and sites are re-evaluated with each re-run. At this point it is not run very often at all. The last update was October 4, 2013 which means that we have currently been waiting eight months for a new Penguin update. In order to recover from Penguin, you need to identify the unnatural links pointing to your site and either remove them, or if you can’t remove them you can ask Google to no longer count them by using the disavow tool. Then, the next time that Penguin refreshes or updates, if you have done a good enough job at cleaning up your unnatural links, you will once again regain trust in Google’s eyes. In some cases, it can take a couple of refreshes in order for a site to completely escape Penguin because it can take up to 6 months for all of a site’s disavow file to be completely processed.
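For readers who have never seen one, a disavow file is a plain text file with one entry per line: either a full URL or a whole domain written with a `domain:` prefix, with lines starting with `#` treated as comments. The Python sketch below assembles such a file from a list you have already reviewed by hand. The function name and the example domains are hypothetical, and the sketch does nothing to decide which links are unnatural.

```python
def build_disavow_file(domains, urls, note=""):
    """Return disavow-file text: an optional comment, then domains, then URLs."""
    lines = []
    if note:
        lines.append(f"# {note}")                              # comment line
    lines += [f"domain:{d}" for d in sorted(set(domains))]     # whole domains
    lines += sorted(set(urls))                                 # individual URLs
    return "\n".join(lines) + "\n"


# Hypothetical links that a manual review flagged and could not get removed.
text = build_disavow_file(
    domains=["spammy-directory.example", "article-farm.example"],
    urls=["http://forum.example/thread?id=1"],
    note="Owners contacted, no response",
)
print(text, end="")
```

Disavowing at the domain level covers every link from that site, which is why a wrong entry can do real damage.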

If you are not certain how to identify which links to your site are unnatural, here are some good resources for you:

The disavow tool is something that you probably should only be using if you really understand how it works. It is potentially possible for you to do more harm than good to your site if you disavow the wrong links. Here is some information on using the disavow tool:

It’s important to note that when sites “recover” from Penguin, they often don’t skyrocket up to top rankings once again as those previously high rankings were probably based on the power of links that are now considered unnatural. Here is some information on what to expect when you have recovered from a link based penalty or algorithmic issue.

Also, the Penguin algorithm is not the same thing as a manual unnatural links penalty. You do not need to file a reconsideration request to recover from Penguin. You also do not need to document the work that you have done in order to get links removed as no Google employee will be manually reviewing your work. As mentioned previously, here is more information on the difference between the Penguin algorithm and a manual unnatural links penalty.

What is Hummingbird?

Hummingbird is a completely different animal than Penguin or Panda. (Yeah, I know…that was a bad pun.) I will commonly get people emailing me telling me that Hummingbird destroyed their rankings. I would say that in almost every case that I have evaluated, this was not true. Google made their announcement about Hummingbird on September 26, 2013. However, at that time, they announced that Hummingbird had already been live for about a month. If the Hummingbird algorithm was truly responsible for catastrophic ranking fluctuations then we really should have seen an outcry from the SEO world of something drastic happening in August of 2013, and this did not happen. There did seem to be some type of fluctuation that happened around August 21 as reported here on Search Engine Round Table, but there were not many sites that reported huge ranking changes on that day.

If you think that Hummingbird affected you, it’s not a bad idea to look at your traffic to see if you noticed a drop on October 4, 2013 which was actually a refresh of the Penguin algorithm. I believe that a lot of people who thought that they were affected by Hummingbird were actually affected by Penguin which happened just a week after Google made their announcement about Hummingbird.
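One simple way to run that check is to compare average daily visits in the week before a date with the week starting on it. The Python sketch below does this for the two dates mentioned above. The function name, the seven-day window, and the traffic numbers are invented for illustration; in practice the daily figures would come from an analytics export.

```python
from datetime import date, timedelta


def change_around(daily_visits, event, window=7):
    """Percent change in mean daily visits: `window` days from `event`
    onward versus the `window` days before it."""
    before = [daily_visits[event - timedelta(days=i)] for i in range(1, window + 1)]
    after = [daily_visits[event + timedelta(days=i)] for i in range(window)]
    mean_before = sum(before) / window
    mean_after = sum(after) / window
    return (mean_after - mean_before) / mean_before * 100


# Made-up traffic: flat at 1,000 visits a day, then 600 from October 4 onward.
penguin_refresh = date(2013, 10, 4)
hummingbird_fluctuation = date(2013, 8, 21)
start = date(2013, 8, 1)
visits = {}
for offset in range(90):
    day = start + timedelta(days=offset)
    visits[day] = 1000 if day < penguin_refresh else 600

print(change_around(visits, hummingbird_fluctuation))
print(change_around(visits, penguin_refresh))
```

A drop that lines up with October 4 rather than with late August points toward Penguin, which is the distinction drawn above.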

There are some excellent articles on Hummingbird here and here. Hummingbird was a complete overhaul of the entire Google algorithm. As Danny Sullivan put it, if you consider the Google algorithm as an engine, Panda and Penguin are algorithm changes that were like putting a new part in the engine such as a filter or a fuel pump. But, Hummingbird wasn’t just a new part; it was a completely new engine. That new engine still makes use of many of the old parts (such as Panda and Penguin) but a good amount of the engine is completely original.

The goal of the Hummingbird algorithm is for Google to better understand a user’s query. Bill Slawski who writes about Google patents has a great example of this in his post here. He explains that when someone searches for “What is the best place to find and eat Chicago deep dish style pizza?”, Hummingbird is able to discern that by “place” the user likely would be interested in results that show “restaurants”. There is speculation that these changes were necessary in order for Google’s voice search to be more effective. When we’re typing a search query, we might type, “best Seattle SEO company” but when we’re speaking a query (e.g. via Google Glass or via Google Now) we’re more likely to say something like, “Which firm in Seattle offers the best SEO services?” The point of Hummingbird is to better understand what users mean when they have queries like this.

So how do I recover or improve in the eyes of Hummingbird?

If you read the posts referenced above, the answer to this question is essentially to create content that answers users’ queries rather than just trying to rank for a particular keyword. But really, this is what you should already be doing!

It appears that Google’s goal with all of these algorithm changes (Panda, Penguin and Hummingbird) is to encourage webmasters to publish content that is the best of its kind. Google’s goal is to deliver answers to people who are searching. If you can produce content that answers people’s questions, then you’re on the right track.

I know that that is a really vague answer when it comes to “recovering” from Hummingbird. Hummingbird really is different than Panda and Penguin. When a site has been demoted by the Panda or Penguin algorithm, it’s because Google has lost some trust in the site’s quality, whether it is on-site quality or the legitimacy of its backlinks. If you fix those quality issues you can regain the algorithm’s trust and subsequently see improvements. But, if your site seems to be doing poorly since the launch of Hummingbird, then there really isn’t a way to recover those keyword rankings that you once held. You can, however, get new traffic by finding ways to be more thorough and complete in what your website offers.

Do you have more questions?

My goal in writing this article was to have a resource to point people to when they had basic questions about Panda, Penguin and Hummingbird. Recently, when I published my penalty newsletter, I had a small business owner comment that it was very interesting but that most of it went over their head. I realized that many people outside of the SEO world are greatly affected by these algorithm changes, but don’t have much information on why they have affected their website.

Do you have more questions about Panda, Penguin or Hummingbird? If so, I’d be happy to address them in the comments. I also would love for those of you who are experienced with dealing with websites affected by these issues to comment as well.

Reblogged 4 years ago from feedproxy.google.com