The Linkbait Bump: How Viral Content Creates Long-Term Lift in Organic Traffic – Whiteboard Friday

Posted by randfish

A single fantastic (or “10x”) piece of content can lift a site’s traffic curves long beyond the popularity of that one piece. In today’s Whiteboard Friday, Rand talks about why those curves settle into a “new normal,” and how you can go about creating the content that drives that change.

Video Transcription

Howdy, Moz fans, and welcome to another edition of Whiteboard Friday. This week we’re chatting about the linkbait bump, a classic phrase in the SEO world and almost a little dated. I think today we’re talking a little bit more about viral content and how high-quality content, content that really is the cornerstone of a brand or a website’s content, can be an incredible and powerful driver of traffic, not just when it initially launches but over time.

So let’s take a look.

This is a classic linkbait bump, viral content bump analytics chart. I’m seeing over here my traffic and over here the different months of the year. You know, January, February, March, like I’m under a thousand. Maybe I’m at 500 visits or something, and then I have this big piece of viral content. It performs outstandingly well from a relative standpoint for my site. It gets 10,000 or more visits, drives a ton more people to my site, and then what happens is that that traffic falls back down. But the new normal down here, new normal is higher than the old normal was. So the new normal might be at 1,000, 1,500 or 2,000 visits whereas before I was at 500.

Why does this happen?

A lot of folks see an analytics chart like this, see examples of content that’s done this for websites, and they want to know: Why does this happen and how can I replicate that effect? The reason is that it feeds back into that viral loop or flywheel, which we’ve talked about in previous Whiteboard Fridays, where essentially you start with a piece of content. That content does well, and then you have things like more social followers on your brand’s accounts. So now the next time you go to amplify content or share content socially, you’re reaching more potential people. You have a bigger audience. You have more people who share your content because they’ve seen that that content performs well for them in social. So they want to find other content from you that might help their social accounts perform well.

You see more RSS and email subscribers because people see your interesting content and go, “Hey, I want to see when these guys produce something else.” You see more branded search traffic because people are looking specifically for content from you, not necessarily just around this viral piece, although that’s often a big part of it, but around other pieces as well, especially if you do a good job of exposing them to that additional content. You get more bookmark and type-in traffic, and more searchers biased by personalization because they’ve already visited your site. So now when they search and they’re logged into their accounts, they’re going to see your site ranking higher than they normally would otherwise, and you get an organic SEO lift from all the links and shares and engagement.

So there’s a ton of different factors that feed into this, and you kind of want to hit all of these things. If you have a piece of content that gets a lot of shares, a lot of links, but then doesn’t promote engagement, doesn’t get more people signing up, doesn’t get more people searching for your brand or searching for that content specifically, then it’s not going to have the same impact. Your traffic might fall further and more quickly.

How do you achieve this?

How do we get content that’s going to do this? Well, we’re going to talk through a number of things that we’ve talked about previously on Whiteboard Friday. But there are some additional ones as well. This isn’t just creating good content or creating high quality content, it’s creating a particular kind of content. So for this what you want is a deep understanding, not necessarily of what your standard users or standard customers are interested in, but a deep understanding of what influencers in your niche will share and promote and why they do that.

This often means that you follow a lot of sharers and influencers in your field, and you understand, hey, they’re all sharing X piece of content. Why? Oh, because it does this, because it makes them look good, because it helps their authority in the field, because it provides a lot of value to their followers, because they know it’s going to get a lot of retweets and shares and traffic. Whatever that because is, you have to have a deep understanding of it in order to have success with viral kinds of content.

Next, you want to have empathy for users and what will give them the best possible experience. So if you know, for example, that a lot of people are coming on mobile and are going to be sharing on mobile, which is true of almost all viral content today, FYI, you need to be providing a great mobile and desktop experience. Oftentimes that mobile experience has to be different, not just responsive design, but actually a different format, a different way of being able to scroll through or watch or see or experience that content.

There are some good examples out there of content that does that. It makes a very different user experience based on the browser or the device you’re using.

You also need to be aware of what will turn them off. So promotional messages, pop-ups, trying to sell to them, oftentimes that diminishes user experience. It means that content that could have been more viral, that could have gotten more shares won’t.

Unique value and attributes that separate your content from everything else in the field. So if there’s like ABCD and whoa, what’s that? That’s very unique. That stands out from the crowd. That provides a different form of value in a different way than what everyone else is doing. That uniqueness is often a big reason why content spreads virally, why it gets more shared than just the normal stuff.

I’ve talked about this a number of times, but content that’s 10X better than what the competition provides. So unique value from the competition, but also quality that is not just a step up, but 10X better, massively, massively better than what else you can get out there. That makes it unique enough. That makes it stand out from the crowd, and that’s a very hard thing to do, but that’s why this is so rare and so valuable.

This is a critical one, and I think one that, I’ll just say, many organizations fail at. That is the freedom and support to fail many times, to try to create these types of effects, to have this impact many times before you hit on a success. A lot of managers and clients and teams and execs just don’t give marketing teams and content teams the freedom to say, “Yeah, you know what? You spent a month and developer resources and designer resources and spent some money to go do some research and contracted with this third party, and it wasn’t a hit. It didn’t work. We didn’t get the viral content bump. It just kind of did okay. You know what? We believe in you. You’ve got a lot of chances. You should try this another 9 or 10 times before we throw it out. We really want to have a success here.”

That is something that very few teams invest in. The powerful thing is because so few people are willing to invest that way, the ones that do, the ones that believe in this, the ones that invest long term, the ones that are willing to take those failures are going to have a much better shot at success, and they can stand out from the crowd. They can get these bumps. It’s powerful.

Not a requirement, but it really, really helps to have a strong engaged community, either on your site and around your brand, or at least in your niche and your topic area that will help, that wants to see you, your brand, your content succeed. If you’re in a space that has no community, I would work on building one, even if it’s very small. We’re not talking about building a community of thousands or tens of thousands. A community of 100 people, a community of 50 people even can be powerful enough to help content get that catalyst, that first bump that’ll boost it into viral potential.

Then finally, for this type of content, you need to have a logical and not overly promotional match between your brand and the content itself. You can see many sites in what I call sketchy niches. So like a criminal law site or a casino site or a pharmaceutical site that’s offering like an interactive musical experience widget, and you’re like, “Why in the world is this brand promoting this content? Why did they even make it? How does that match up with what they do? Oh, it’s clearly just intentionally promotional.”

Look, many of these brands go out there and they say, “Hey, the average web user doesn’t know and doesn’t care.” I agree. But the average web user is not an influencer. Influencers know. Well, they’re very, very suspicious of why content is being produced and promoted, and they’re very skeptical of promoting content that they don’t think is altruistic. So this kills a lot of content for brands that try and invest in it when there’s no match. So I think you really need that.

Now, when you do these linkbait bump kinds of things, I would strongly recommend that you follow up, that you consider the quality of the content that you’re producing thereafter, that you invest in reproducing these resources and keeping those resources updated, and that you don’t simply give up on content production after this. However, if you’re a small business site, a small or medium business, you might think about only doing one or two of these a year. If you are a heavy content player, you’re doing a lot of content marketing, content marketing is how you’re investing in web traffic, I’d probably be considering these weekly or monthly at the least.

All right, everyone. Look forward to your experiences with the linkbait bump, and I will see you again next week for another edition of Whiteboard Friday. Take care.

Video transcription by Speechpad.com

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!

Reblogged 4 years ago from tracking.feedpress.it

Controlling Search Engine Crawlers for Better Indexation and Rankings – Whiteboard Friday

Posted by randfish

When should you disallow search engines in your robots.txt file, and when should you use meta robots tags in a page header? What about nofollowing links? In today’s Whiteboard Friday, Rand covers these tools and their appropriate use in four situations that SEOs commonly find themselves facing.

Video transcription

Howdy Moz fans, and welcome to another edition of Whiteboard Friday. This week we’re going to talk about controlling search engine crawlers, blocking bots, sending bots where we want, restricting them from where we don’t want them to go. We’re going to talk a little bit about crawl budget and what you should and shouldn’t have indexed.

As a start, what I want to do is discuss the ways in which we can control robots. Those include the three primary ones: robots.txt, meta robots, and—well, the nofollow tag is a little bit less about controlling bots.

There are a few others that we’re going to discuss as well, including Webmaster Tools (Search Console) and URL status codes. But let’s dive into those first few first.

Robots.txt lives at yoursite.com/robots.txt. It tells crawlers what they should and shouldn’t access, but it doesn’t always get respected by Google and Bing. So a lot of folks say, “hey, disallow this,” and then suddenly see those URLs popping up and wonder what’s going on. Look, Google and Bing oftentimes think that they just know better. They think that maybe you’ve made a mistake; they think, “hey, there’s a lot of links pointing to this content, there’s a lot of people who are visiting and caring about this content, maybe you didn’t intend for us to block it.” The more specific you get about an individual URL, the better they usually are about respecting it. The less specific, meaning the more you use wildcards or say “everything behind this entire big directory,” the worse they are about necessarily believing you.

Meta robots—a little different—that lives in the headers of individual pages, so you can only control a single page with a meta robots tag. That tells the engines whether or not they should keep a page in the index, and whether they should follow the links on that page, and it’s usually a lot more respected, because it’s at an individual-page level; Google and Bing tend to believe you about the meta robots tag.

And then the nofollow tag, that lives on an individual link on a page. It doesn’t tell engines where to crawl or not to crawl. All it’s saying is whether you editorially vouch for a page that is being linked to, and whether you want to pass the PageRank and link equity metrics to that page.

Interesting point about meta robots and robots.txt working together (or not working together so well)—many, many folks in the SEO world do this and then get frustrated.

What if, for example, we take a page like “blogtest.html” on our domain and we say “all user agents, you are not allowed to crawl blogtest.html”? Okay, that’s a good way to keep that page away from being crawled, but just because something is not crawled doesn’t necessarily mean it won’t be in the search results.

So then we have our SEO folks go, “you know what, let’s make doubly sure that doesn’t show up in search results; we’ll put in the meta robots tag:”

<meta name="robots" content="noindex, follow">

So, “noindex, follow” tells the search engine crawler they can follow the links on the page, but they shouldn’t index this particular one.
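
To make the two halves of that directive concrete, here is a minimal Python sketch of how a crawler-like script might read a meta robots tag. The class and function names are made up for illustration, and this is not how any search engine actually implements it; it only shows that “noindex” and “follow” are independent switches.

```python
from html.parser import HTMLParser


class MetaRobotsParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            # Directives are comma separated and case insensitive.
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def robots_directives(html):
    """Return the set of meta robots directives found in an HTML string."""
    parser = MetaRobotsParser()
    parser.feed(html)
    return parser.directives


page = '<html><head><meta name="robots" content="noindex, follow"></head><body></body></html>'
directives = robots_directives(page)
may_index = "noindex" not in directives    # should the page be kept in the index?
may_follow = "nofollow" not in directives  # should links on the page be followed?
print(may_index, may_follow)
```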

Then, you go and run a search for “blog test” in this case, and everybody on the team’s like “What the heck!? WTF? Why am I seeing this page show up in search results?”

The answer is, you told the engines that they couldn’t crawl the page, so they didn’t. But they are still putting it in the results. They’re actually probably not going to include a meta description; they might have something like “we can’t include a meta description because of this site’s robots.txt file.” The reason it’s showing up is because they can’t see the noindex; all they see is the disallow.

So, if you want something truly removed, unable to be seen in search results, you can’t just disallow a crawler. You have to say meta “noindex” and you have to let them crawl it.
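
The conflict can be sketched with Python’s standard `urllib.robotparser`. This is a rough model under stated assumptions: the robots.txt content and the `noindex_is_visible` helper are hypothetical, and real engines apply their own logic. The point it illustrates is simply that a crawler obeying the disallow never fetches the page, so it never gets to read the noindex.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks the test page for every crawler.
ROBOTS_TXT = """\
User-agent: *
Disallow: /blogtest.html
"""


def can_crawl(robots_txt, url, user_agent="*"):
    """True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def noindex_is_visible(robots_txt, url, page_has_noindex):
    """A crawler can only act on a meta noindex if it may fetch the page."""
    return page_has_noindex and can_crawl(robots_txt, url)


url = "https://yoursite.com/blogtest.html"
print(can_crawl(ROBOTS_TXT, url))
print(noindex_is_visible(ROBOTS_TXT, url, page_has_noindex=True))
# With the disallow removed, the noindex can be seen and honored.
print(noindex_is_visible("", url, page_has_noindex=True))
```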

So this creates some complications. Robots.txt can be great if we’re trying to save crawl bandwidth, but it isn’t necessarily ideal for preventing a page from being shown in the search results. I would not recommend, by the way, that you do what we think Twitter recently tried to do, where they tried to canonicalize www and non-www by saying “Google, don’t crawl the www version of twitter.com.” What you should be doing is rel canonical-ing or using a 301.

Meta robots can allow crawling and link-following while disallowing indexation, which is great. It does require crawl budget, since the engines have to fetch the page to see the tag, but you still conserve indexation.

The nofollow tag, generally speaking, is not particularly useful for controlling bots or conserving indexation.

Webmaster Tools (now Google Search Console) has some special things that allow you to restrict access or remove a result from the search results. For example, if you have 404’d something or if you’ve told them not to crawl something but it’s still showing up in there, you can manually say “don’t do that.” There are a few other crawl protocol things that you can do.

And then URL status codes—these are a valid way to do things, but they’re going to obviously change what’s going on on your pages, too.

If you’re not having a lot of luck using a 404 to remove something, you can use a 410 to permanently remove something from the index. Just be aware that once you use a 410, it can take a long time if you want to get that page re-crawled or re-indexed, and you want to tell the search engines “it’s back!” 410 is permanent removal.

301—permanent redirect, we’ve talked about those here—and 302, temporary redirect.
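
As a rough illustration of how these status codes fit together, the Python sketch below picks a response for a requested URL. The host name, the path sets, and the `respond` function are all hypothetical and not part of any real framework; it assumes the non-www host is the canonical one.

```python
from http import HTTPStatus
from urllib.parse import urlsplit, urlunsplit

# Hypothetical site data; none of these values come from the original post.
CANONICAL_HOST = "example.com"
GONE_PATHS = {"/discontinued-product.html"}
KNOWN_PATHS = {"/", "/blog.html"}


def respond(url):
    """Return (status, redirect_target) for a requested URL."""
    parts = urlsplit(url)
    if parts.netloc == "www." + CANONICAL_HOST:
        # Consolidate www and non-www with a permanent redirect
        # rather than blocking one version in robots.txt.
        target = urlunsplit((parts.scheme, CANONICAL_HOST, parts.path, parts.query, ""))
        return HTTPStatus.MOVED_PERMANENTLY, target
    if parts.path in GONE_PATHS:
        return HTTPStatus.GONE, None  # 410: removed on purpose, permanently
    if parts.path not in KNOWN_PATHS:
        return HTTPStatus.NOT_FOUND, None  # 404: nothing at this URL
    return HTTPStatus.OK, None


print(respond("https://www.example.com/blog.html"))
print(respond("https://example.com/discontinued-product.html"))
```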

Now let’s jump into a few specific use cases of “what kinds of content should and shouldn’t I allow engines to crawl and index” in this next version…

[Rand moves at superhuman speed to erase the board and draw part two of this Whiteboard Friday. Seriously, we showed Roger how fast it was, and even he was impressed.]

Four crawling/indexing problems to solve

So we’ve got these four big problems that I want to talk about as they relate to crawling and indexing.

1. Content that isn’t ready yet

The first one here is around, “If I have content of quality I’m still trying to improve—it’s not yet ready for primetime, it’s not ready for Google, maybe I have a bunch of products and I only have the descriptions from the manufacturer and I need people to be able to access them, so I’m rewriting the content and creating unique value on those pages… they’re just not ready yet—what should I do with those?”

My options around crawling and indexing? If I have a large quantity of those—maybe thousands, tens of thousands, hundreds of thousands—I would probably go the robots.txt route. I’d disallow those pages from being crawled, and then eventually as I get (folder by folder) those sets of URLs ready, I can then allow crawling and maybe even submit them to Google via an XML sitemap.
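
The folder-by-folder approach could be sketched like this in Python. The folder names and helper functions are invented for illustration; the idea is that the robots.txt file is regenerated from a list of folders that are not ready yet, and a folder drops out of the disallow list once its pages have been rewritten.

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(not_ready_folders):
    """Build robots.txt text that disallows every folder still being rewritten."""
    rules = [f"Disallow: {folder}" for folder in sorted(not_ready_folders)]
    if not rules:
        rules = ["Disallow:"]  # an empty Disallow allows everything
    return "\n".join(["User-agent: *"] + rules) + "\n"


def is_crawlable(robots_txt, url):
    """True if the robots.txt text lets a generic crawler fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("*", url)


# Hypothetical folders whose product descriptions are still manufacturer copy.
not_ready = {"/products/widgets/", "/products/gadgets/"}
before = build_robots_txt(not_ready)

# The widgets folder has been rewritten, so it is opened up for crawling.
not_ready.discard("/products/widgets/")
after = build_robots_txt(not_ready)
print(after)
```

Once a folder is crawlable again, its URLs are the ones to add to the XML sitemap.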

If I’m talking about a small quantity—a few dozen, a few hundred pages—well, I’d probably just use the meta robots noindex, and then I’d pull that noindex off of those pages as they are made ready for Google’s consumption. And then again, I would probably use the XML sitemap and start submitting those once they’re ready.

2. Dealing with duplicate or thin content

What about, “Should I noindex, nofollow, or potentially disallow crawling on largely duplicate URLs or thin content?” I’ve got an example. Let’s say I’m an ecommerce shop, I’m selling this nice Star Wars t-shirt which I think is kind of hilarious, so I’ve got starwarsshirt.html, and it links out to a larger version of an image, and that’s an individual HTML page. It links out to different colors, which change the URL of the page, so I have a gray, blue, and black version. Well, these four pages are really all part of this same one, so I wouldn’t recommend disallowing crawling on these, and I wouldn’t recommend noindexing them. What I would do there is a rel canonical.

Remember, rel canonical is one of those things that can be precluded by disallowing. So, if I were to disallow these from being crawled, Google couldn’t see the rel canonical pointing back to the default version, so if someone linked to the blue version instead of the default version, now I potentially don’t get link credit for that. So what I really want to do is use the rel canonical, allow the indexing, and allow it to be crawled. If you really feel like it, you could also put a meta “noindex, follow” on these pages, but I don’t really think that’s necessary, and again that might interfere with the rel canonical.
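
Here is one way the canonical tag for those variant pages might be generated, as a Python sketch. It assumes the color and image-size variants are expressed as `color` and `view` query parameters, which is a guess for illustration only; the post doesn’t say how the variant URLs are built, and the domain is a placeholder.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: variant pages differ from the default page only by these
# query parameters. Adjust to however your own URLs encode variants.
VARIANT_PARAMS = {"color", "view"}


def canonical_url(url):
    """Strip variant-only parameters so every variant maps to one URL."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in VARIANT_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def canonical_link_tag(url):
    """Build the <link rel="canonical"> tag to place in the page's <head>."""
    href = escape(canonical_url(url), quote=True)
    return f'<link rel="canonical" href="{href}">'


print(canonical_link_tag("https://example.com/starwarsshirt.html?color=blue"))
```

Every variant then points at the same default URL, so links to the blue version can be credited to the main page, provided the variants stay crawlable.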

3. Passing link equity without appearing in search results

Number three: “If I want to pass link equity (or at least crawling) through a set of pages without those pages actually appearing in search results—so maybe I have navigational stuff, ways that humans are going to navigate through my pages, but I don’t need those appearing in search results—what should I use then?”

What I would say here is, you can use the meta robots to say “don’t index the page, but do follow the links that are on that page.” That’s a pretty nice, handy use case for that.

Do NOT, however, disallow those in robots.txt. Many, many folks make this mistake. If you disallow crawling on those pages, Google can’t see the noindex, and they don’t know that they can follow the links. Granted, as we talked about before, sometimes Google doesn’t obey the robots.txt, but you can’t rely on that behavior. Trust that the disallow in robots.txt will prevent them from crawling. So I would say, the meta robots “noindex, follow” is the way to do this.

4. Search results-type pages

Finally, fourth, “What should I do with search results-type pages?” Google has said many times that they don’t like your search results from your own internal engine appearing in their search results, and so this can be a tricky use case.

Sometimes a search result page—a page that lists many types of results that might come from a database of types of content that you’ve got on your site—could actually be a very good result for a searcher who is looking for a wide variety of content, or who wants to see what you have on offer. Yelp does this: When you say, “I’m looking for restaurants in Seattle, WA,” they’ll give you what is essentially a list of search results, and Google does want those to appear because that page provides a great result. But you should be doing what Yelp does there, and make the most common or popular individual sets of those search results into category-style pages. A page that provides real, unique value, that’s not just a list of search results, that is more of a landing page than a search results page.

However, that being said, if you’ve got a long tail of these, or if you’d say, “hey, our internal search engine, that’s really for internal visitors only; it’s not useful to have those pages show up in search results, and we don’t think we need to make the effort to make those into category landing pages,” then you can use the disallow in robots.txt to prevent those.

Just be cautious here, because I have sometimes seen an over-swinging of the pendulum toward blocking all types of search results, and sometimes that can actually hurt your SEO and your traffic. Sometimes those pages can be really useful to people. So check your analytics, and make sure those aren’t valuable pages that should be served up and turned into landing pages. If you’re sure, then go ahead and disallow all your search results-style pages. You’ll see a lot of sites doing this in their robots.txt file.

That being said, I hope you have some great questions about crawling and indexing, controlling robots, blocking robots, allowing robots, and I’ll try and tackle those in the comments below.

We’ll look forward to seeing you again next week for another edition of Whiteboard Friday. Take care!

Case Study: How I Turned Autocomplete Ideas into Traffic & Ranking Results with Only 5 Hours of Effort

Posted by jamiejpress

Many of us have known for a while that Google Autocomplete can be a useful tool for identifying keyword opportunities. But did you know it is also an extremely powerful tool for content ideation?

And by pushing the envelope a little further, you can turn an Autocomplete topic from a good content idea into a link-building, traffic-generating powerhouse for your website.

Here’s how I did it for one of my clients. They are in the diesel power generator industry in the Australian market, but you can use this same process for businesses in literally any industry and market you can think of.

Step 1: Find the spark of an idea using Google Autocomplete

I start by seeking out long-tail keyword ideas from Autocomplete. By typing in some of my client’s core keywords, I come across one that sparks my interest in particular: diesel generator fuel consumption.

What’s more, the Google AdWords Keyword Planner says it is a high competition term. So advertisers are prepared to spend good money on this phrase—all the better to try to rank well organically for the term. We want to get the traffic without incurring the click costs.

Step 2: Check the competition and find an edge

Next, we find out what pages rank well for the phrase, and then identify how we can do better, with user experience top of mind.

In the case of “diesel generator fuel consumption” in Google.com.au, the top-ranking page is this one: a US-focused piece of content using gallons instead of litres.

This observation, paired with the fact that the #2 Autocomplete suggestion was “diesel generator fuel consumption in litres” gives me the right slant for the content that will give us the edge over the top competing page: Why not create a table using metric measurements instead of imperial measurements for our Australian audience?

So that’s what I do.

I work with the client to gather the information and create the post on their website. Also, I insert the target phrase in the page title, meta description, URL, and once in the body content. We also create a downloadable PDF with similar content.

Note: While figuring out how to make product/service pages better than those of competitors is the age-old struggle when it comes to working on core SEO keywords, with longer-tail keywords like the ones you work with using this tactic, users generally want detailed information, answers to questions, or implementable tips. So it makes it a little easier to figure out how you can do it better by putting yourself in the user’s shoes.

Step 3: Find the right way to market the content

If people are searching for the term in Google, then there must also be people on forums asking about it.

A quick search through Quora, Reddit and other forums brings up some relevant threads. I engage with the users in these forums and add non-spammy, helpful no-followed links to our new content in answering their questions.

Caveat: Forum marketing has had a bad reputation for some time, and rightly so, as SEOs have abused the tactic. Before you go linking to your content in forums, I strongly recommend you check out this resource on the right way to engage in forum marketing.

Okay, what about the results?

Since I posted the page in December 2014, referral traffic from the forums has been picking up speed; organic traffic to the page keeps building, too.

Yeah, yeah, but what about keyword rankings?

While we’re yet to hit the top-ranking post off its perch (give us time!), we are sitting at #2 and #3 in the search results as I write this. So it looks like creating that downloadable PDF paid off.

All in all, this tactic took minimal time to plan and execute—content ideation, research and creation (including the PDF version) took three hours, while link building research and implementation took an additional two hours. That’s only five hours, yet the payoff for the client is already evident, and will continue to grow in the coming months.

Why not take a crack at using this technique yourself? I would love to hear your ideas about how you could use it to benefit your business or clients.

The Meta Referrer Tag: An Advancement for SEO and the Internet

Posted by Cyrus-Shepard

The movement to make the Internet more secure through HTTPS brings several useful advancements for webmasters. In addition to security improvements, HTTPS promises future technological advances and potential SEO benefits for marketers.

HTTPS in search results is rising. Recent MozCast data from Dr. Pete shows nearly 20% of first page Google results are now HTTPS.

Sadly, HTTPS also has its downsides.

Marketers run into their first challenge when they switch regular HTTP sites over to HTTPS. Technically challenging, the switch typically involves routing your site through a series of 301 redirects. Historically, these types of redirects are associated with a loss of link equity (thought to be around 15%) which can lead to a loss in rankings. This can offset any SEO advantage that Google claims for switching.

Ross Hudgens perfectly summed it up in this tweet:

Many SEOs have anecdotally shared stories of HTTPS sites performing well in Google search results (and our soon-to-be-published Ranking Factors data seems to support this). However, the short-term effect of a large migration can be hard to take. When Moz recently switched to HTTPS to provide better security to our logged-in users, we saw an 8-9% dip in our organic search traffic.

Problem number two is the subject of this post. It involves the loss of referral data. Typically, when one site sends traffic to another, information is sent that identifies the originating site as the source of traffic. This invaluable data allows people to see where their traffic is coming from, and helps spread the flow of information across the web.

SEOs have long used referrer data for a number of beneficial purposes. Oftentimes, people will link back or check out the site sending traffic when they see the referrer in their analytics data. Spammers know this works, as evidenced by the recent increase in referrer spam:

This process stops when traffic flows from an HTTPS site to a non-secure HTTP site. In this case, no referrer data is sent. Webmasters can’t know where their traffic is coming from.

Here’s how referral data to my personal site looked when Moz switched to HTTPS. I lost all visibility into where my traffic came from.

It’s (not provided) all over again!

Enter the meta referrer tag

While we can’t solve the ranking challenges imposed by switching a site to HTTPS, we can solve the loss of referral data, and it’s actually super-simple.

Almost completely unknown to most marketers, the relatively new meta referrer tag (it’s actually been around for a few years) was designed to help out in these situations.

Better yet, the tag allows you to control how your referrer information is passed.

The meta referrer tag works with most browsers to pass referrer information in a manner defined by the user. Traffic remains encrypted and all the benefits of using HTTPS remain in place, but now you can pass referrer data to all websites, even those that use HTTP.

How to use the meta referrer tag

What follows are extremely simplified instructions for using the meta referrer tag. For more in-depth understanding, we highly recommend referring to the W3C working draft of the spec.

The meta referrer tag is placed in the <head> section of your HTML, and references one of five states, which control how browsers send referrer information from your site. The five states are:

  1. None: Never pass referral data
    <meta name="referrer" content="none">
    
  2. None When Downgrade: Sends referrer information to secure HTTPS sites, but not insecure HTTP sites
    <meta name="referrer" content="none-when-downgrade">
    
  3. Origin Only: Sends the scheme, host, and port (basically, the subdomain) stripped of the full URL as a referrer, i.e. https://moz.com/example.html would simply send https://moz.com
    <meta name="referrer" content="origin">
    

  4. Origin When Cross-Origin: Sends the full URL as the referrer when the target has the same scheme, host, and port (i.e. subdomain) regardless if it’s HTTP or HTTPS, while sending origin-only referral information to external sites. (note: There is a typo in the official spec. Future versions should be “origin-when-cross-origin”)
    <meta name="referrer" content="origin-when-crossorigin">
    
  5. Unsafe URL: Always passes the URL string as a referrer. Note if you have any sensitive information contained in your URL, this isn’t the safest option. By default, URL fragments, username, and password are automatically stripped out.
    <meta name="referrer" content="unsafe-url">
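
To make the five states concrete, here is a minimal Python sketch of the referrer each one would produce for a given link. The function names (referrer_for, origin_of, strip_url) are hypothetical, and the logic follows the simplified descriptions above rather than the exact behavior of any particular browser.

```python
from urllib.parse import urlsplit

def origin_of(url):
    """Return scheme://host[:port] for a URL, without any username/password."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc.rpartition('@')[2]}"

def strip_url(url):
    """Drop the fragment and any username/password from a URL."""
    parts = urlsplit(url)
    host = parts.netloc.rpartition("@")[2]
    return parts._replace(netloc=host, fragment="").geturl()

def referrer_for(policy, source, target):
    """Return the referrer implied by a policy, or None if nothing is sent."""
    src, dst = urlsplit(source), urlsplit(target)
    if policy == "none":
        return None
    if policy == "none-when-downgrade":
        # A downgrade is a link from an HTTPS page to a non-HTTPS page.
        downgrade = src.scheme == "https" and dst.scheme != "https"
        return None if downgrade else strip_url(source)
    if policy == "origin":
        return origin_of(source)
    if policy == "origin-when-crossorigin":
        same_origin = origin_of(source) == origin_of(target)
        return strip_url(source) if same_origin else origin_of(source)
    if policy == "unsafe-url":
        return strip_url(source)
    raise ValueError(f"unknown policy: {policy}")
```

For example, referrer_for("origin", "https://moz.com/example.html", "http://example.com/") returns "https://moz.com", matching the Origin Only description.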
    

The meta referrer tag in action

By clicking the link below, you can get a sense of how the meta referrer tag works.

Check Referrer

Boom!

We’ve set the meta referrer tag for Moz to “origin”, which means when we link out to another site, we pass our scheme, host, and port. The end result is that you see https://moz.com as the referrer, stripped of the full URL path (/meta-referrer-tag).

My personal site typically receives several visits per day from Moz. Here’s what my analytics data looked like before and after we implemented the meta referrer tag.

For simplicity and security, most sites may want to implement the “origin” state, but there are drawbacks.

One negative side effect was that as soon as we implemented the meta referrer tag, our AdRoll analytics, which we use for retargeting, stopped working. It turns out that AdRoll uses our referrer information for analytics, but the meta referrer tag “origin” state meant that the only URL they ever saw reported was https://moz.com.

Conclusion

We love the meta referrer tag because it keeps information flowing on the Internet. It’s the way the web is supposed to work!

It helps marketers and webmasters see exactly where their traffic is coming from. It encourages engagement, communication, and even linking, which can lead to improvements in SEO.

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!

Reblogged 4 years ago from tracking.feedpress.it

How to Use Server Log Analysis for Technical SEO

Posted by SamuelScott

It’s ten o’clock. Do you know where your logs are?

I’m introducing this guide with a pun on a common public-service announcement that has run on late-night TV news broadcasts in the United States because log analysis is something that is extremely newsworthy and important.

If your technical and on-page SEO is poor, then nothing else that you do will matter. Technical SEO is the key to helping search engines to crawl, parse, and index websites, and thereby rank them appropriately long before any marketing work begins.

The important thing to remember: Your log files contain the only data that is 100% accurate in terms of how search engines are crawling your website. By helping Google to do its job, you will set the stage for your future SEO work and make your job easier. Log analysis is one facet of technical SEO, and correcting the problems found in your logs will help to lead to higher rankings, more traffic, and more conversions and sales.

Here are just a few reasons why:

  • Too many response code errors may cause Google to reduce its crawling of your website and perhaps even your rankings.
  • You want to make sure that search engines are crawling everything, new and old, that you want to appear and rank in the SERPs (and nothing else).
  • It’s crucial to ensure that all URL redirections will pass along any incoming “link juice.”

However, log analysis is something that is unfortunately discussed all too rarely in SEO circles. So, here, I wanted to give the Moz community an introductory guide to log analytics that I hope will help. If you have any questions, feel free to ask in the comments!

What is a log file?

Computer servers, operating systems, network devices, and computer applications automatically generate something called a log entry whenever they perform an action. In an SEO and digital marketing context, one such action is a page request from a visiting bot or human.

Web server log entries are commonly output in the Common Log Format (also known as the NCSA Common Log Format). Here is one example from Wikipedia with my accompanying explanations:

127.0.0.1 user-identifier frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326
  • 127.0.0.1 — The remote hostname. An IP address is shown, like in this example, whenever the DNS hostname is not available or hostname lookups are turned off.
  • user-identifier — The remote logname / RFC 1413 identity of the user. (It’s not that important.)
  • frank — The user ID of the person requesting the page. Based on what I see in my Moz profile, Moz’s log entries would probably show either “SamuelScott” or “392388” whenever I visit a page after having logged in.
  • [10/Oct/2000:13:55:36 -0700] — The date, time, and timezone of the action in question in strftime format.
  • GET /apache_pb.gif HTTP/1.0 — “GET” is one of the most common HTTP request methods (another is “POST”). “GET” fetches a URL while “POST” submits something (such as a forum comment). The second part is the URL that is being accessed, and the last part is the version of HTTP that is being used.
  • 200 — The status code of the document that was returned.
  • 2326 — The size, in bytes, of the document that was returned.

Note: A hyphen is shown in a field when that information is unavailable.
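
The field breakdown above can be sketched in Python. This is a minimal illustration that assumes each line follows the Common Log Format exactly as in the example; the names CLF_PATTERN and parse_clf are hypothetical, not part of any logging library.

```python
import re
from datetime import datetime

# One named group per field of the Common Log Format.
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse_clf(line):
    """Parse one log line into a dict, or return None if it does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    # Timestamps are in strftime format, e.g. 10/Oct/2000:13:55:36 -0700.
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    # Split the request into its method, URL, and protocol version.
    method, _, rest = entry["request"].partition(" ")
    url, _, protocol = rest.rpartition(" ")
    entry.update(method=method, url=url, protocol=protocol)
    entry["status"] = int(entry["status"])
    # A hyphen means the size was unavailable; treat it as zero bytes here.
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    return entry

sample = ('127.0.0.1 user-identifier frank [10/Oct/2000:13:55:36 -0700] '
          '"GET /apache_pb.gif HTTP/1.0" 200 2326')
entry = parse_clf(sample)
```

Here entry["url"] is "/apache_pb.gif" and entry["status"] is 200, mirroring the explanations in the list.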

Every single time that you — or the Googlebot — visit a page on a website, a line with this information is output, recorded, and stored by the server.

Log entries are generated continuously and anywhere from several to thousands can be created every second — depending on the level of a given server, network, or application’s activity. A collection of log entries is called a log file (or often in slang, “the log” or “the logs”), and it is displayed with the most-recent log entry at the bottom. Individual log files often contain a calendar day’s worth of log entries.

Accessing your log files

Different types of servers store and manage their log files differently. Here are the general guides to finding and managing log data on three of the most-popular types of servers:

What is log analysis?

Log analysis (or log analytics) is the process of going through log files to learn something from the data. Some common reasons include:

  • Development and quality assurance (QA) — Creating a program or application and checking for problematic bugs to make sure that it functions properly
  • Network troubleshooting — Responding to and fixing system errors in a network
  • Customer service — Determining what happened when a customer had a problem with a technical product
  • Security issues — Investigating incidents of hacking and other intrusions
  • Compliance matters — Gathering information in response to corporate or government policies
  • Technical SEO — This is my favorite! More on that in a bit.

Log analysis is rarely performed regularly. Usually, people go into log files only in response to something — a bug, a hack, a subpoena, an error, or a malfunction. It’s not something that anyone wants to do on an ongoing basis.

Why? Here is a screenshot of just a very small part of one of our original (unstructured) log files:

Ouch. If a website gets 10,000 visitors who each go to ten pages per day, then the server will create a log file every day that will consist of 100,000 log entries. No one has the time to go through all of that manually.

How to do log analysis

There are three general ways to make log analysis easier in SEO or any other context:

  • Do-it-yourself in Excel
  • Proprietary software such as Splunk or Sumo Logic
  • The ELK Stack open-source software

Tim Resnik’s Moz essay from a few years ago walks you through the process of exporting a batch of log files into Excel. This is a (relatively) quick and easy way to do simple log analysis, but the downside is that one will see only a snapshot in time and not any overall trends. To obtain the best data, it’s crucial to use either proprietary tools or the ELK Stack.

Splunk and Sumo Logic are proprietary log analysis tools that are primarily used by enterprise companies. The ELK Stack is a free and open-source batch of three platforms (Elasticsearch, Logstash, and Kibana) that is owned by Elastic and used more often by smaller businesses. (Disclosure: We at Logz.io use the ELK Stack to monitor our own internal systems as well as for the basis of our own log management software.)

For those who are interested in using this process to do technical SEO analysis, monitor system or application performance, or for any other reason, our CEO, Tomer Levy, has written a guide to deploying the ELK Stack.

Technical SEO insights in log data

However you choose to access and understand your log data, there are many important technical SEO issues to address as needed. I’ve included screenshots of our technical SEO dashboard with our own website’s data to demonstrate what to examine in your logs.

Bot crawl volume

It’s important to know the number of requests made by Baidu, BingBot, GoogleBot, Yahoo, Yandex, and others over a given period of time. If, for example, you want to get found in search in Russia but Yandex is not crawling your website, that is a problem. (You’d want to consult Yandex Webmaster and see this article on Search Engine Land.)
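
As a rough illustration, crawl volume per bot can be counted straight from raw log lines. This sketch assumes the server writes the Combined Log Format, where the User-Agent is the last quoted field; the signature list and the function name count_bot_requests are illustrative assumptions, and User-Agent strings can be spoofed.

```python
import re
from collections import Counter

# Lowercase substrings that identify each crawler's User-Agent.
# Illustrative only; extend it for the bots you care about.
BOT_SIGNATURES = {
    "Googlebot": "googlebot",
    "Bingbot": "bingbot",
    "Baidu": "baiduspider",
    "Yandex": "yandex",
    "Yahoo": "slurp",
}

# In the Combined Log Format the User-Agent is the final quoted field.
USER_AGENT = re.compile(r'"([^"]*)"\s*$')

def count_bot_requests(lines):
    """Count requests per known crawler across an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = USER_AGENT.search(line)
        if match is None:
            continue
        agent = match.group(1).lower()
        for bot, signature in BOT_SIGNATURES.items():
            if signature in agent:
                counts[bot] += 1
                break
    return counts
```

A bot that is missing from the result, or whose count is zero for the period, is the kind of gap described above.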

Response code errors

Moz has a great primer on the meanings of the different status codes. I have an alert system set up that tells me about 4XX and 5XX errors immediately because those are very significant.
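
A simple version of such a check can be sketched in Python. This is not the alert system described above, only a minimal example that tallies 4XX and 5XX responses per URL; the names REQUEST_STATUS and error_report are hypothetical, and the lines are assumed to follow the Common Log Format.

```python
import re
from collections import Counter

# Captures the requested URL and the status code that follows the request.
REQUEST_STATUS = re.compile(r'"\S+ (?P<url>\S+)[^"]*" (?P<status>\d{3}) ')

def error_report(lines):
    """Count 4XX and 5XX responses, keyed by (status, url)."""
    errors = Counter()
    for line in lines:
        match = REQUEST_STATUS.search(line)
        if match and match.group("status")[0] in "45":
            errors[(int(match.group("status")), match.group("url"))] += 1
    return errors
```

Calling error_report(lines).most_common(10) would list the ten most frequent error and URL pairs.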

Temporary redirects

Temporary 302 redirects do not pass along the “link juice” of external links from the old URL to the new one. Almost all of the time, they should be changed to permanent 301 redirects.

Crawl budget waste

Google assigns a crawl budget to each website based on numerous factors. If your crawl budget is, say, 100 pages per day (or the equivalent amount of data), then you want to be sure that all 100 are things that you want to appear in the SERPs. No matter what you write in your robots.txt file and meta-robots tags, you might still be wasting your crawl budget on advertising landing pages, internal scripts, and more. The logs will tell you — I’ve outlined two script-based examples in red above.

If you hit your crawl limit but still have new content that should be indexed to appear in search results, Google may abandon your site before finding it.
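
One way to estimate this waste is to measure how many crawler requests land on paths you never wanted crawled. The sketch below assumes you have already extracted the paths that a search engine bot requested; the prefixes and the function name crawl_waste are hypothetical placeholders.

```python
from collections import Counter

# Hypothetical path prefixes that should never use up crawl budget.
UNWANTED_PREFIXES = ("/landing/", "/scripts/", "/cgi-bin/")

def crawl_waste(paths):
    """Return (wasted, total, per-prefix counts) for crawled paths."""
    per_prefix = Counter()
    for path in paths:
        for prefix in UNWANTED_PREFIXES:
            if path.startswith(prefix):
                per_prefix[prefix] += 1
                break
    return sum(per_prefix.values()), len(paths), per_prefix
```

Dividing the first value by the second gives the share of crawler requests spent on unwanted paths.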

Duplicate URL crawling

The addition of URL parameters — typically used in tracking for marketing purposes — often results in search engines wasting crawl budgets by crawling different URLs with the same content. To learn how to address this issue, I recommend reading the resources on Google and Search Engine Land here, here, here, and here.
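
To spot this pattern in your own logs, one option is to strip tracking parameters from each crawled URL and group the variants that collapse to the same address. This is a minimal sketch; the parameter list and the names canonical and duplicate_groups are assumptions for illustration.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed tracking parameters; adjust to whatever your campaigns use.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign",
                   "utm_term", "utm_content"}

def canonical(url):
    """Return the URL with tracking parameters and fragment removed."""
    parts = urlsplit(url)
    kept = [(key, value)
            for key, value in parse_qsl(parts.query, keep_blank_values=True)
            if key not in TRACKING_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(kept), ""))

def duplicate_groups(urls):
    """Map each canonical URL to its crawled variants, if more than one."""
    groups = defaultdict(set)
    for url in urls:
        groups[canonical(url)].add(url)
    return {base: sorted(variants)
            for base, variants in groups.items() if len(variants) > 1}
```

Any entry in the result is a page that a crawler fetched under several different URLs.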

Crawl priority

Google might be ignoring (and not crawling or indexing) a crucial page or section of your website. The logs will reveal what URLs and/or directories are getting the most and least attention. If, for example, you have published an e-book that attempts to rank for targeted search queries but it sits in a directory that Google only visits once every six months, then you won’t get any organic search traffic from the e-book for up to six months.

If a part of your website is not being crawled very often — and it is updated often enough that it should be — then you might need to check your internal-linking structure and the crawl-priority settings in your XML sitemap.
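
A rough way to see which sections get the most and least attention is to count crawler requests per top-level directory. The sketch assumes you already have the list of paths requested by a given bot; section_of and crawl_attention are hypothetical helper names.

```python
from collections import Counter

def section_of(path):
    """Return the top-level directory, e.g. '/blog/post-1' -> '/blog/'."""
    path = path.split("?", 1)[0]
    parts = path.split("/")
    # '/blog/post-1' -> ['', 'blog', 'post-1']; '/about' -> ['', 'about']
    return f"/{parts[1]}/" if len(parts) > 2 else "/"

def crawl_attention(paths):
    """Count requests per section, least-crawled sections first."""
    counts = Counter(section_of(path) for path in paths)
    return sorted(counts.items(), key=lambda item: item[1])
```

Sections near the top of the returned list are the ones the crawler visits least often.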

Last crawl date

Have you uploaded something that you hope will be indexed quickly? The log files will tell you when Google has crawled it.
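
As a minimal sketch, the most recent crawl of each URL can be pulled from the raw log lines. It assumes the bot's name appears somewhere in the line (as it does in the User-Agent field of the Combined Log Format); the names ENTRY and last_crawled are hypothetical.

```python
import re
from datetime import datetime

# Captures the timestamp and the requested URL from a log line.
ENTRY = re.compile(r'\[(?P<time>[^\]]+)\] "\S+ (?P<url>\S+)[^"]*"')

def last_crawled(lines, bot="Googlebot"):
    """Return {url: datetime of the latest request made by the given bot}."""
    latest = {}
    for line in lines:
        if bot.lower() not in line.lower():
            continue
        match = ENTRY.search(line)
        if match is None:
            continue
        when = datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z")
        url = match.group("url")
        if url not in latest or when > latest[url]:
            latest[url] = when
    return latest
```

If a newly published URL is absent from the result, the bot has not requested it in the logs you supplied.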

Crawl budget

One thing I personally like to check and see is Googlebot’s real-time activity on our site because the crawl budget that the search engine assigns to a website is a rough indicator — a very rough one — of how much it “likes” your site. Google ideally does not want to waste valuable crawling time on a bad website. Here, I had seen that Googlebot had made 154 requests of our new startup’s website over the prior twenty-four hours. Hopefully, that number will go up!
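
A figure like this can be approximated from the logs by counting a bot's requests inside a time window. The sketch below is only an illustration, not the dashboard described above; requests_in_window is a hypothetical name, and the bot is identified by a simple substring match on each line.

```python
import re
from datetime import datetime, timedelta

# Captures the bracketed timestamp of a log line.
TIMESTAMP = re.compile(r'\[([^\]]+)\]')

def requests_in_window(lines, now, hours=24, bot="Googlebot"):
    """Count the bot's requests made in the `hours` leading up to `now`."""
    cutoff = now - timedelta(hours=hours)
    total = 0
    for line in lines:
        if bot.lower() not in line.lower():
            continue
        match = TIMESTAMP.search(line)
        if match is None:
            continue
        when = datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z")
        if cutoff < when <= now:
            total += 1
    return total
```

The now argument must be timezone-aware, since the timestamps parsed from the log carry a UTC offset.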

As I hope you can see, log analysis is critically important in technical SEO. It’s eleven o’clock — do you know where your logs are now?


Reblogged 4 years ago from tracking.feedpress.it