Why Effective, Modern SEO Requires Technical, Creative, and Strategic Thinking – Whiteboard Friday

Posted by randfish

There’s no doubt that quite a bit has changed about SEO, and that the field is far more integrated with other aspects of online marketing than it once was. In today’s Whiteboard Friday, Rand pushes back against the idea that effective modern SEO doesn’t require any technical expertise, outlining a fantastic list of technical elements that today’s SEOs need to know about in order to be truly effective.

For reference, here’s a still of this week’s whiteboard. Click on it to open a high resolution image in a new tab!

Video transcription

Howdy, Moz fans, and welcome to another edition of Whiteboard Friday. This week I’m going to do something unusual. I don’t usually point out these inconsistencies or sort of take issue with other folks’ content on the web, because I generally find that that’s not all that valuable and useful. But I’m going to make an exception here.

There is an article by Jayson DeMers, who I think might actually be here in Seattle — maybe he and I can hang out at some point — called “Why Modern SEO Requires Almost No Technical Expertise.” It was an article that got a shocking amount of traction and attention. On Facebook, it has thousands of shares. On LinkedIn, it did really well. On Twitter, it got a bunch of attention.

Some folks in the SEO world have already pointed out some issues around this. But because of the increasing popularity of this article, and because I think there’s, like, this hopefulness from worlds outside of kind of the hardcore SEO world that are looking to this piece and going, “Look, this is great. We don’t have to be technical. We don’t have to worry about technical things in order to do SEO.”

Look, I completely get the appeal of that. I did want to point out some of the reasons why this is not so accurate. At the same time, I don’t want to rain on Jayson, because I think that it’s very possible he’s writing an article for Entrepreneur, maybe he has sort of a commitment to them. Maybe he had no idea that this article was going to spark so much attention and investment. He does make some good points. I think it’s just really the title and then some of the messages inside there that I take strong issue with, and so I wanted to bring those up.

First off, some of the good points he did bring up.

One, he wisely says, “You don’t need to know how to code or to write and read algorithms in order to do SEO.” I totally agree with that. If today you’re looking at SEO and you’re thinking, “Well, am I going to get more into this subject? Am I going to try investing in SEO? But I don’t even know HTML and CSS yet.”

Those are good skills to have, and they will help you in SEO, but you don’t need them. Jayson’s totally right. You don’t have to have them, and you can learn and pick up some of these things, and do searches, watch some Whiteboard Fridays, check out some guides, and pick up a lot of that stuff later on as you need it in your career. SEO doesn’t have that hard requirement.

And secondly, he makes an intelligent point that we’ve made many times here at Moz, which is that, broadly speaking, a better user experience is well correlated with better rankings.

You make a great website that delivers great user experience, that provides the answers to searchers’ questions and gives them extraordinarily good content, way better than what’s out there already in the search results, generally speaking you’re going to see happy searchers, and that’s going to lead to higher rankings.

But not entirely. There are a lot of other elements that go in here. So I’ll bring up some frustrating points around the piece as well.

First off, there’s no acknowledgment — and I find this a little disturbing — that the ability to read and write code, or even HTML and CSS, which I think are the basic place to start, is helpful or can take your SEO efforts to the next level. I think both of those things are true.

So being able to look at a web page, view source on it, or pull up Firebug in Firefox or something and diagnose what’s going on and then go, “Oh, that’s why Google is not able to see this content. That’s why we’re not ranking for this keyword or term, or why even when I enter this exact sentence in quotes into Google, which is on our page, this is why it’s not bringing it up. It’s because it’s loading it after the page from a remote file that Google can’t access.” These are technical things, and being able to see how that code is built, how it’s structured, and what’s going on there, very, very helpful.

Some coding knowledge also can take your SEO efforts even further. I mean, so many times, SEOs are stymied by the conversations that we have with our programmers and our developers and the technical staff on our teams. When we can have those conversations intelligently, because at least we understand the principles of how an if-then statement works, or what software engineering best practices are being used, or they can upload something into a GitHub repository, and we can take a look at it there, that kind of stuff is really helpful.

Secondly, I don’t like that the article overly reduces all of this information that we have about what we’ve learned about Google. So he mentions two sources. One is things that Google tells us, and others are SEO experiments. I think both of those are true. Although I’d add that there’s sort of a sixth sense of knowledge that we gain over time from looking at many, many search results and kind of having this feel for why things rank, and what might be wrong with a site, and getting really good at that using tools and data as well. There are people who can look at Open Site Explorer and then go, “Aha, I bet this is going to happen.” They can look, and 90% of the time they’re right.

So he boils this down to, one, write quality content, and two, reduce your bounce rate. Neither of those things are wrong. You should write quality content, although I’d argue there are lots of other forms of quality content that aren’t necessarily written — video, images and graphics, podcasts, lots of other stuff.

And secondly, that just doing those two things is not always enough. So you can see, like many, many folks look and go, “I have quality content. It has a low bounce rate. How come I don’t rank better?” Well, your competitors, they’re also going to have quality content with a low bounce rate. That’s not a very high bar.

Also, frustratingly, this really gets in my craw. I don’t think “write quality content” means anything. You tell me. When you hear that, to me that is a totally non-actionable, non-useful phrase that’s a piece of advice that is so generic as to be discardable. So I really wish that there was more substance behind that.

The article also makes, in my opinion, the totally inaccurate claim that modern SEO really is reduced to “the happier your users are when they visit your site, the higher you’re going to rank.”

Wow. Okay. Again, I think broadly these things are correlated. User happiness and rank is broadly correlated, but it’s not a one to one. This is not like a, “Oh, well, that’s a 1.0 correlation.”

I would guess that the correlation is probably closer to like the page authority range. I bet it’s like 0.35 or something correlation. If you were to actually measure this broadly across the web and say like, “Hey, were you happier with result one, two, three, four, or five,” the ordering would not be perfect at all. It probably wouldn’t even be close.

There’s a ton of reasons why sometimes someone who ranks on Page 2 or Page 3 or doesn’t rank at all for a query is doing a better piece of content than the person who does rank well or ranks on Page 1, Position 1.

Then the article suggests five and sort of a half steps to successful modern SEO, which I think is a really incomplete list. So Jayson gives us;

  • Good on-site experience
  • Writing good content
  • Getting others to acknowledge you as an authority
  • Rising in social popularity
  • Earning local relevance
  • Dealing with modern CMS systems (which he notes most modern CMS systems are SEO-friendly)

The thing is there’s nothing actually wrong with any of these. They’re all, generally speaking, correct, either directly or indirectly related to SEO. The one about local relevance, I have some issue with, because he doesn’t note that there’s a separate algorithm for sort of how local SEO is done and how Google ranks local sites in maps and in their local search results. Also not noted is that rising in social popularity won’t necessarily directly help your SEO, although it can have indirect and positive benefits.

I feel like this list is super incomplete. Okay, I brainstormed just off the top of my head in the 10 minutes before we filmed this video a list. The list was so long that, as you can see, I filled up the whole whiteboard and then didn’t have any more room. I’m not going to bother to erase and go try and be absolutely complete.

But there’s a huge, huge number of things that are important, critically important for technical SEO. If you don’t know how to do these things, you are sunk in many cases. You can’t be an effective SEO analyst, or consultant, or in-house team member, because you simply can’t diagnose the potential problems, rectify those potential problems, identify strategies that your competitors are using, be able to diagnose a traffic gain or loss. You have to have these skills in order to do that.

I’ll run through these quickly, but really the idea is just that this list is so huge and so long that I think it’s very, very, very wrong to say technical SEO is behind us. I almost feel like the opposite is true.

We have to be able to understand things like;

  • Content rendering and indexability
  • Crawl structure, internal links, JavaScript, Ajax. If something’s post-loading after the page and Google’s not able to index it, or there are links that are accessible via JavaScript or Ajax, maybe Google can’t necessarily see those or isn’t crawling them as effectively, or is crawling them, but isn’t assigning them as much link weight as they might be assigning other stuff, and you’ve made it tough to link to them externally, and so they can’t crawl it.
  • Disabling crawling and/or indexing of thin or incomplete or non-search-targeted content. We have a bunch of search results pages. Should we use rel=prev/next? Should we robots.txt those out? Should we disallow from crawling with meta robots? Should we rel=canonical them to other pages? Should we exclude them via the protocols inside Google Webmaster Tools, which is now Google Search Console?
  • Managing redirects, domain migrations, content updates. A new piece of content comes out, replacing an old piece of content, what do we do with that old piece of content? What’s the best practice? It varies by different things. We have a whole Whiteboard Friday about the different things that you could do with that. What about a big redirect or a domain migration? You buy another company and you’re redirecting their site to your site. You have to understand things about subdomain structures versus subfolders, which, again, we’ve done another Whiteboard Friday about that.
  • Proper error codes, downtime procedures, and not found pages. If your 404 pages turn out to all be 200 pages, well, now you’ve made a big error there, and Google could be crawling tons of 404 pages that they think are real pages, because you’ve made it a status code 200, or you’ve used a 404 code when you should have used a 410, which is a permanently removed, to be able to get it completely out of the indexes, as opposed to having Google revisit it and keep it in the index.

Downtime procedures. So there’s specifically a… I can’t even remember. It’s a 5xx code that you can use. Maybe it was a 503 or something that you can use that’s like, “Revisit later. We’re having some downtime right now.” Google urges you to use that specific code rather than using a 404, which tells them, “This page is now an error.”

Disney had that problem a while ago, if you guys remember, where they 404ed all their pages during an hour of downtime, and then their homepage, when you searched for Disney World, was, like, “Not found.” Oh, jeez, Disney World, not so good.

  • International and multi-language targeting issues. I won’t go into that. But you have to know the protocols there. Duplicate content, syndication, scrapers. How do we handle all that? Somebody else wants to take our content, put it on their site, what should we do? Someone’s scraping our content. What can we do? We have duplicate content on our own site. What should we do?
  • Diagnosing traffic drops via analytics and metrics. Being able to look at a rankings report, being able to look at analytics connecting those up and trying to see: Why did we go up or down? Did we have less pages being indexed, more pages being indexed, more pages getting traffic less, more keywords less?
  • Understanding advanced search parameters. Today, just today, I was checking out the related parameter in Google, which is fascinating for most sites. Well, for Moz, weirdly, related:oursite.com shows nothing. But for virtually every other sit, well, most other sites on the web, it does show some really interesting data, and you can see how Google is connecting up, essentially, intentions and topics from different sites and pages, which can be fascinating, could expose opportunities for links, could expose understanding of how they view your site versus your competition or who they think your competition is.

Then there are tons of parameters, like in URL and in anchor, and da, da, da, da. In anchor doesn’t work anymore, never mind about that one.

I have to go faster, because we’re just going to run out of these. Like, come on. Interpreting and leveraging data in Google Search Console. If you don’t know how to use that, Google could be telling you, you have all sorts of errors, and you don’t know what they are.

  • Leveraging topic modeling and extraction. Using all these cool tools that are coming out for better keyword research and better on-page targeting. I talked about a couple of those at MozCon, like MonkeyLearn. There’s the new Moz Context API, which will be coming out soon, around that. There’s the Alchemy API, which a lot of folks really like and use.
  • Identifying and extracting opportunities based on site crawls. You run a Screaming Frog crawl on your site and you’re going, “Oh, here’s all these problems and issues.” If you don’t have these technical skills, you can’t diagnose that. You can’t figure out what’s wrong. You can’t figure out what needs fixing, what needs addressing.
  • Using rich snippet format to stand out in the SERPs. This is just getting a better click-through rate, which can seriously help your site and obviously your traffic.
  • Applying Google-supported protocols like rel=canonical, meta description, rel=prev/next, hreflang, robots.txt, meta robots, x robots, NOODP, XML sitemaps, rel=nofollow. The list goes on and on and on. If you’re not technical, you don’t know what those are, you think you just need to write good content and lower your bounce rate, it’s not going to work.
  • Using APIs from services like AdWords or MozScape, or hrefs from Majestic, or SEM refs from SearchScape or Alchemy API. Those APIs can have powerful things that they can do for your site. There are some powerful problems they could help you solve if you know how to use them. It’s actually not that hard to write something, even inside a Google Doc or Excel, to pull from an API and get some data in there. There’s a bunch of good tutorials out there. Richard Baxter has one, Annie Cushing has one, I think Distilled has some. So really cool stuff there.
  • Diagnosing page load speed issues, which goes right to what Jayson was talking about. You need that fast-loading page. Well, if you don’t have any technical skills, you can’t figure out why your page might not be loading quickly.
  • Diagnosing mobile friendliness issues
  • Advising app developers on the new protocols around App deep linking, so that you can get the content from your mobile apps into the web search results on mobile devices. Awesome. Super powerful. Potentially crazy powerful, as mobile search is becoming bigger than desktop.

Okay, I’m going to take a deep breath and relax. I don’t know Jayson’s intention, and in fact, if he were in this room, he’d be like, “No, I totally agree with all those things. I wrote the article in a rush. I had no idea it was going to be big. I was just trying to make the broader points around you don’t have to be a coder in order to do SEO.” That’s completely fine.

So I’m not going to try and rain criticism down on him. But I think if you’re reading that article, or you’re seeing it in your feed, or your clients are, or your boss is, or other folks are in your world, maybe you can point them to this Whiteboard Friday and let them know, no, that’s not quite right. There’s a ton of technical SEO that is required in 2015 and will be for years to come, I think, that SEOs have to have in order to be effective at their jobs.

All right, everyone. Look forward to some great comments, and we’ll see you again next time for another edition of Whiteboard Friday. Take care.

Video transcription by Speechpad.com

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!

Reblogged 3 years ago from tracking.feedpress.it

Why We Can’t Do Keyword Research Like It’s 2010 – Whiteboard Friday

Posted by randfish

Keyword Research is a very different field than it was just five years ago, and if we don’t keep up with the times we might end up doing more harm than good. From the research itself to the selection and targeting process, in today’s Whiteboard Friday Rand explains what has changed and what we all need to do to conduct effective keyword research today.

For reference, here’s a still of this week’s whiteboard. Click on it to open a high resolution image in a new tab!

What do we need to change to keep up with the changing world of keyword research?

Howdy, Moz fans, and welcome to another edition of Whiteboard Friday. This week we’re going to chat a little bit about keyword research, why it’s changed from the last five, six years and what we need to do differently now that things have changed. So I want to talk about changing up not just the research but also the selection and targeting process.

There are three big areas that I’ll cover here. There’s lots more in-depth stuff, but I think we should start with these three.

1) The Adwords keyword tool hides data!

This is where almost all of us in the SEO world start and oftentimes end with our keyword research. We go to AdWords Keyword Tool, what used to be the external keyword tool and now is inside AdWords Ad Planner. We go inside that tool, and we look at the volume that’s reported and we sort of record that as, well, it’s not good, but it’s the best we’re going to do.

However, I think there are a few things to consider here. First off, that tool is hiding data. What I mean by that is not that they’re not telling the truth, but they’re not telling the whole truth. They’re not telling nothing but the truth, because those rounded off numbers that you always see, you know that those are inaccurate. Anytime you’ve bought keywords, you’ve seen that the impression count never matches the count that you see in the AdWords tool. It’s not usually massively off, but it’s often off by a good degree, and the only thing it’s great for is telling relative volume from one from another.

But because AdWords hides data essentially by saying like, “Hey, you’re going to type in . . .” Let’s say I’m going to type in “college tuition,” and Google knows that a lot of people search for how to reduce college tuition, but that doesn’t come up in the suggestions because it’s not a commercial term, or they don’t think that an advertiser who bids on that is going to do particularly well and so they don’t show it in there. I’m giving an example. They might indeed show that one.

But because that data is hidden, we need to go deeper. We need to go beyond and look at things like Google Suggest and related searches, which are down at the bottom. We need to start conducting customer interviews and staff interviews, which hopefully has always been part of your brainstorming process but really needs to be now. Then you can apply that to AdWords. You can apply that to suggest and related.

The beautiful thing is once you get these tools from places like visiting forums or communities, discussion boards and seeing what terms and phrases people are using, you can collect all this stuff up, plug it back into AdWords, and now they will tell you how much volume they’ve got. So you take that how to lower college tuition term, you plug it into AdWords, they will show you a number, a non-zero number. They were just hiding it in the suggestions because they thought, “Hey, you probably don’t want to bid on that. That won’t bring you a good ROI.” So you’ve got to be careful with that, especially when it comes to SEO kinds of keyword research.

2) Building separate pages for each term or phrase doesn’t make sense

It used to be the case that we built separate pages for every single term and phrase that was in there, because we wanted to have the maximum keyword targeting that we could. So it didn’t matter to us that college scholarship and university scholarships were essentially people looking for exactly the same thing, just using different terminology. We would make one page for one and one page for the other. That’s not the case anymore.

Today, we need to group by the same searcher intent. If two searchers are searching for two different terms or phrases but both of them have exactly the same intent, they want the same information, they’re looking for the same answers, their query is going to be resolved by the same content, we want one page to serve those, and that’s changed up a little bit of how we’ve done keyword research and how we do selection and targeting as well.

3) Build your keyword consideration and prioritization spreadsheet with the right metrics

Everybody’s got an Excel version of this, because I think there’s just no awesome tool out there that everyone loves yet that kind of solves this problem for us, and Excel is very, very flexible. So we go into Excel, we put in our keyword, the volume, and then a lot of times we almost stop there. We did keyword volume and then like value to the business and then we prioritize.

What are all these new columns you’re showing me, Rand? Well, here I think is how sophisticated, modern SEOs that I’m seeing in the more advanced agencies, the more advanced in-house practitioners, this is what I’m seeing them add to the keyword process.

Difficulty

A lot of folks have done this, but difficulty helps us say, “Hey, this has a lot of volume, but it’s going to be tremendously hard to rank.”

The difficulty score that Moz uses and attempts to calculate is a weighted average of the top 10 domain authorities. It also uses page authority, so it’s kind of a weighted stack out of the two. If you’re seeing very, very challenging pages, very challenging domains to get in there, it’s going to be super hard to rank against them. The difficulty is high. For all of these ones it’s going to be high because college and university terms are just incredibly lucrative.

That difficulty can help bias you against chasing after terms and phrases for which you are very unlikely to rank for at least early on. If you feel like, “Hey, I already have a powerful domain. I can rank for everything I want. I am the thousand pound gorilla in my space,” great. Go after the difficulty of your choice, but this helps prioritize.

Opportunity

This is actually very rarely used, but I think sophisticated marketers are using it extremely intelligently. Essentially what they’re saying is, “Hey, if you look at a set of search results, sometimes there are two or three ads at the top instead of just the ones on the sidebar, and that’s biasing some of the click-through rate curve.” Sometimes there’s an instant answer or a Knowledge Graph or a news box or images or video, or all these kinds of things that search results can be marked up with, that are not just the classic 10 web results. Unfortunately, if you’re building a spreadsheet like this and treating every single search result like it’s just 10 blue links, well you’re going to lose out. You’re missing the potential opportunity and the opportunity cost that comes with ads at the top or all of these kinds of features that will bias the click-through rate curve.

So what I’ve seen some really smart marketers do is essentially build some kind of a framework to say, “Hey, you know what? When we see that there’s a top ad and an instant answer, we’re saying the opportunity if I was ranking number 1 is not 10 out of 10. I don’t expect to get whatever the average traffic for the number 1 position is. I expect to get something considerably less than that. Maybe something around 60% of that, because of this instant answer and these top ads.” So I’m going to mark this opportunity as a 6 out of 10.

There are 2 top ads here, so I’m giving this a 7 out of 10. This has two top ads and then it has a news block below the first position. So again, I’m going to reduce that click-through rate. I think that’s going down to a 6 out of 10.

You can get more and less scientific and specific with this. Click-through rate curves are imperfect by nature because we truly can’t measure exactly how those things change. However, I think smart marketers can make some good assumptions from general click-through rate data, which there are several resources out there on that to build a model like this and then include it in their keyword research.

This does mean that you have to run a query for every keyword you’re thinking about, but you should be doing that anyway. You want to get a good look at who’s ranking in those search results and what kind of content they’re building . If you’re running a keyword difficulty tool, you are already getting something like that.

Business value

This is a classic one. Business value is essentially saying, “What’s it worth to us if visitors come through with this search term?” You can get that from bidding through AdWords. That’s the most sort of scientific, mathematically sound way to get it. Then, of course, you can also get it through your own intuition. It’s better to start with your intuition than nothing if you don’t already have AdWords data or you haven’t started bidding, and then you can refine your sort of estimate over time as you see search visitors visit the pages that are ranking, as you potentially buy those ads, and those kinds of things.

You can get more sophisticated around this. I think a 10 point scale is just fine. You could also use a one, two, or three there, that’s also fine.

Requirements or Options

Then I don’t exactly know what to call this column. I can’t remember the person who’ve showed me theirs that had it in there. I think they called it Optional Data or Additional SERPs Data, but I’m going to call it Requirements or Options. Requirements because this is essentially saying, “Hey, if I want to rank in these search results, am I seeing that the top two or three are all video? Oh, they’re all video. They’re all coming from YouTube. If I want to be in there, I’ve got to be video.”

Or something like, “Hey, I’m seeing that most of the top results have been produced or updated in the last six months. Google appears to be biasing to very fresh information here.” So, for example, if I were searching for “university scholarships Cambridge 2015,” well, guess what? Google probably wants to bias to show results that have been either from the official page on Cambridge’s website or articles from this year about getting into that university and the scholarships that are available or offered. I saw those in two of these search results, both the college and university scholarships had a significant number of the SERPs where a fresh bump appeared to be required. You can see that a lot because the date will be shown ahead of the description, and the date will be very fresh, sometime in the last six months or a year.

Prioritization

Then finally I can build my prioritization. So based on all the data I had here, I essentially said, “Hey, you know what? These are not 1 and 2. This is actually 1A and 1B, because these are the same concepts. I’m going to build a single page to target both of those keyword phrases.” I think that makes good sense. Someone who is looking for college scholarships, university scholarships, same intent.

I am giving it a slight prioritization, 1A versus 1B, and the reason I do this is because I always have one keyword phrase that I’m leaning on a little more heavily. Because Google isn’t perfect around this, the search results will be a little different. I want to bias to one versus the other. In this case, my title tag, since I more targeting university over college, I might say something like college and university scholarships so that university and scholarships are nicely together, near the front of the title, that kind of thing. Then 1B, 2, 3.

This is kind of the way that modern SEOs are building a more sophisticated process with better data, more inclusive data that helps them select the right kinds of keywords and prioritize to the right ones. I’m sure you guys have built some awesome stuff. The Moz community is filled with very advanced marketers, probably plenty of you who’ve done even more than this.

I look forward to hearing from you in the comments. I would love to chat more about this topic, and we’ll see you again next week for another edition of Whiteboard Friday. Take care.

Video transcription by Speechpad.com

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!

Reblogged 4 years ago from tracking.feedpress.it

The Nifty Guide to Local Content Strategy and Marketing

Posted by NiftyMarketing

This is my Grandma.

She helped raised me and I love her dearly. That chunky baby with the Gerber cheeks is
me. The scarlet letter “A” means nothing… I hope.

This is a rolled up newspaper. 

rolled up newspaper

When I was growing up, I was the king of mischief and had a hard time following parental guidelines. To ensure the lessons she wanted me to learn “sunk in” my grandma would give me a soft whack with a rolled up newspaper and would say,

“Mike, you like to learn the hard way.”

She was right. I have
spent my life and career learning things the hard way.

Local content has been no different. I started out my career creating duplicate local doorway pages using “find and replace” with city names. After getting whacked by the figurative newspaper a few times, I decided there had to be a better way. To save others from the struggles I experienced, I hope that the hard lessons I have learned about local content strategy and marketing help to save you fearing a rolled newspaper the same way I do.

Lesson one: Local content doesn’t just mean the written word

local content ecosystem

Content is everything around you. It all tells a story. If you don’t have a plan for how that story is being told, then you might not like how it turns out. In the local world, even your brick and mortar building is a piece of content. It speaks about your brand, your values, your appreciation of customers and employees, and can be used to attract organic visitors if it is positioned well and provides a good user experience. If you just try to make the front of a building look good, but don’t back up the inside inch by inch with the same quality, people will literally say, “Hey man, this place sucks… let’s bounce.”

I had this experience proved to me recently while conducting an interview at
Nifty for our law division. Our office is a beautifully designed brick, mustache, animal on the wall, leg lamp in the center of the room, piece of work you would expect for a creative company.

nifty offices idaho

Anywho, for our little town of Burley, Idaho it is a unique space, and helps to set apart our business in our community. But, the conference room has a fluorescent ballast light system that can buzz so loudly that you literally can’t carry on a proper conversation at times, and in the recent interviews I literally had to conduct them in the dark because it was so bad.

I’m cheap and slow to spend money, so I haven’t got it fixed yet. The problem is I have two more interviews this week and I am so embarrassed by the experience in that room, I am thinking of holding them offsite to ensure that we don’t product a bad content experience. What I need to do is just fix the light but I will end up spending weeks going back and forth with the landlord on whose responsibility it is.

Meanwhile, the content experience suffers. Like I said, I like to learn the hard way.

Start thinking about everything in the frame of content and you will find that you make better decisions and less costly mistakes.

Lesson two: Scalable does not mean fast and easy growth

In every sales conversation I have had about local content, the question of scalability comes up. Usually, people want two things:

  1. Extremely Fast Production 
  2. Extremely Low Cost

While these two things would be great for every project, I have come to find that there are rare cases where quality can be achieved if you are optimizing for fast production and low cost. A better way to look at scale is as follows:

The rate of growth in revenue/traffic is greater than the cost of continued content creation.

A good local content strategy at scale will create a model that looks like this:

scaling content graph

Lesson three: You need a continuous local content strategy

This is where the difference between local content marketing and content strategy kicks in. Creating a single piece of content that does well is fairly easy to achieve. Building a true scalable machine that continually puts out great local content and consistently tells your story is not. This is a graph I created outlining the process behind creating and maintaining a local content strategy:

local content strategy

This process is not a one-time thing. It is not a box to be checked off. It is a structure that should become the foundation of your marketing program and will need to be revisited, re-tweaked, and replicated over and over again.

1. Identify your local audience

Most of you reading this will already have a service or product and hopefully local customers. Do you have personas developed for attracting and retaining more of them? Here are some helpful tools available to give you an idea of how many people fit your personas in any given market.

Facebook Insights

Pretend for a minute that you live in the unique market of Utah and have a custom wedding dress line. You focus on selling modest wedding dresses. It is a definite niche product, but one that shows the idea of personas very well.

You have interviewed your customer base and found a few interests that your customer base share. Taking that information and putting it into Facebook insights will give you a plethora of data to help you build out your understanding of a local persona.

facebook insights data

We are able to see from the interests of our customers there are roughly 6k-7k current engaged woman in Utah who have similar interests to our customer base.

The location tab gives us a break down of the specific cities and, understandably, Salt Lake City has the highest percentage with Provo (home of BYU) in second place. You can also see pages this group would like, activity levels on Facebook, and household income with spending habits. If you wanted to find more potential locations for future growth you can open up the search to a region or country.

localized facebook insights data

From this data it’s apparent that Arizona would be a great expansion opportunity after Utah.

Neilson Prizm

Neilson offers a free and extremely useful tool for local persona research called Zip Code Lookup that allows you to identify pre-determined personas in a given market.

Here is a look at my hometown and the personas they have developed are dead on.

Neilson Prizm data

Each persona can be expanded to learn more about the traits, income level, and areas across the country with other high concentrations of the same persona group.

You can also use the segment explorer to get a better idea of pre-determined persona lists and can work backwards to determine the locations with the highest density of a given persona.

Google Keyword Planner Tool

The keyword tool is fantastic for local research. Using our same Facebook Insight data above we can match keyword search volume against the audience size to determine how active our persona is in product research and purchasing. In the case of engaged woman looking for dresses, it is a very active group with a potential of 20-30% actively searching online for a dress.

google keyword planner tool

2. Create goals and rules

I think the most important idea for creating the goals and rules around your local content is the following from the must read book Content Strategy for the Web.

You also need to ensure that everyone who will be working on things even remotely related to content has access to style and brand guides and, ultimately, understands the core purpose for what, why, and how everything is happening.

3. Audit and analyze your current local content

The point of this step is to determine how the current content you have stacks up against the goals and rules you established, and determine the value of current pages on your site. With tools like Siteliner (for finding duplicate content) and ScreamingFrog (identifying page titles, word count, error codes and many other things) you can grab a lot of information very fast. Beyond that, there are a few tools that deserve a more in-depth look.

BuzzSumo

With BuzzSumo you can see social data and incoming links behind important pages on your site. This can you a good idea which locations or areas are getting more promotion than others and identify what some of the causes could be.

Buzzsumo also can give you access to competitors’ information where you might find some new ideas. In the following example you can see that one of Airbnb.com’s most shared pages was a motiongraphic of its impact on Berlin.

Buzzsumo

urlProfiler

This is another great tool for scraping urls for large sites that can return about every type of measurement you could want. For sites with 1000s of pages, this tool could save hours of data gathering and can spit out a lovely formatted CSV document that will allow you to sort by things like word count, page authority, link numbers, social shares, or about anything else you could imagine.

url profiler

4. Develop local content marketing tactics

This is how most of you look when marketing tactics are brought up.

monkey

Let me remind you of something with a picture. 

rolled up newspaper

Do not start with tactics. Do the other things first. It will ensure your marketing tactics fall in line with a much bigger organizational movement and process. With the warning out of the way, here are a few tactics that could work for you.

Local landing page content

Our initial concept of local landing pages has stood the test of time. If you are scared to even think about local pages with the upcoming doorway page update then please read this analysis and don’t be too afraid. Here are local landing pages that are done right.

Marriott local content

Marriot’s Burley local page is great. They didn’t think about just ensuring they had 500 unique words. They have custom local imagery of the exterior/interior, detailed information about the area’s activities, and even their own review platform that showcases both positive and negative reviews with responses from local management.

If you can’t build your own platform handling reviews like that, might I recommend looking at Get Five Stars as a platform that could help you integrate reviews as part of your continuous content strategy.

Airbnb Neighborhood Guides

I not so secretly have a big crush on Airbnb’s approach to local. These neighborhood guides started it. They only have roughly 21 guides thus far and handle one at a time with Seoul being the most recent addition. The idea is simple, they looked at extremely hot markets for them and built out guides not just for the city, but down to a specific neighborhood.

air bnb neighborhood guides

Here is a look at Hell’s Kitchen in New York by imagery. They hire a local photographer to shoot the area, then they take some of their current popular listing data and reviews and integrate them into the page. This idea would have never flown if they only cared about creating content that could be fast and easy for every market they serve.

Reverse infographicing

Every decently sized city has had a plethora of infographics made about them. People spent the time curating information and coming up with the concept, but a majority just made the image and didn’t think about the crawlability or page title from an SEO standpoint.

Here is an example of an image search for Portland infographics.

image search results portland infographics

Take an infographic and repurpose it into crawlable content with a new twist or timely additions. Usually infographics share their data sources in the footer so you can easily find similar, new, or more information and create some seriously compelling data based content. You can even link to or share the infographic as part of it if you would like.

Become an Upworthy of local content

No one I know does this better than Movoto. Read the link for their own spin on how they did it and then look at these examples and share numbers from their local content.

60k shares in Boise by appealing to that hometown knowledge.

movoto boise content

65k shares in Salt Lake following the same formula.

movoto salt lake city content

It seems to work with video as well.

movoto video results

Think like a local directory

Directories understand where content should be housed. Not every local piece should be on the blog. Look at where Trip Advisor’s famous “Things to Do” page is listed. Right on the main city page.

trip advisor things to do in salt lake city

Or look at how many timely, fresh, quality pieces of content Yelp is showcasing from their main city page.

yelp main city page

The key point to understand is that local content isn’t just about being unique on a landing page. It is about BEING local and useful.

Ideas of things that are local:

  • Sports teams
  • Local celebrities or heroes 
  • Groups and events
  • Local pride points
  • Local pain points

Ideas of things that are useful:

  • Directions
  • Favorite local sports
  • Granular details only “locals” know

The other point to realize is that in looking at our definition of scale you don’t need to take shortcuts that un-localize the experience for users. Figure and test a location at a time until you have a winning formula and then move forward at a speed that ensures a quality local experience.

5. Create a content calendar

I am not going to get into telling you exactly how or what your content calendar needs to include. That will largely be based on the size and organization of your team and every situation might call for a unique approach. What I will do is explain how we do things at Nifty.

  1. We follow the steps above.
  2. We schedule the big projects and timelines first. These could be months out or weeks out. 
  3. We determine the weekly deliverables, checkpoints, and publish times.
  4. We put all of the information as tasks assigned to individuals or teams in Asana.

asana content calendar

The information then can be viewed by individual, team, groups of team, due dates, or any other way you would wish to sort. Repeatable tasks can be scheduled and we can run our entire operation visible to as many people as need access to the information through desktop or mobile devices. That is what works for us.

6. Launch and promote content

My personal favorite way to promote local content (other than the obvious ideas of sharing with your current followers or outreaching to local influencers) is to use Facebook ads to target the specific local personas you are trying to reach. Here is an example:

I just wrapped up playing Harold Hill in our communities production of The Music Man. When you live in a small town like Burley, Idaho you get the opportunity to play a lead role without having too much talent or a glee-based upbringing. You also get the opportunity to do all of the advertising, set design, and costuming yourself and sometime even get to pay for it.

For my advertising responsibilities, I decided to write a few blog posts and drive traffic to them. As any good Harold Hill would do, I used fear tactics.

music man blog post

I then created Facebook ads that had the following stats: Costs of $.06 per click, 12.7% click through rate, and naturally organic sharing that led to thousands of visits in a small Idaho farming community where people still think a phone book is the only way to find local businesses.

facebook ads setup

Then we did it again.

There was a protestor in Burley for over a year that parked a red pickup with signs saying things like, “I wud not trust Da Mayor” or “Don’t Bank wid Zions”. Basically, you weren’t working hard enough if you name didn’t get on the truck during the year.

Everyone knew that ol’ red pickup as it was parked on the corner of Main and Overland, which is one of the few stoplights in town. Then one day it was gone. We came up with the idea to bring the red truck back, put signs on it that said, “I wud Not Trust Pool Tables” and “Resist Sins n’ Corruption” and other things that were part of The Music Man and wrote another blog complete with pictures.

facebook ads red truck

Then I created another Facebook Ad.

facebook ads set up

A little under $200 in ad spend resulted in thousands more visits to the site which promoted the play and sold tickets to a generation that might not have been very familiar with the show otherwise.

All of it was local targeting and there was no other way would could have driven that much traffic in a community like Burley without paying Facebook and trying to create click bait ads in hope the promotion led to an organic sharing.

7. Measure and report

This is another very personal step where everyone will have different needs. At Nifty we put together very custom weekly or monthly reports that cover all of the plan, execution, and relevant stats such as traffic to specific content or location, share data, revenue or lead data if available, analysis of what worked and what didn’t, and the plan for the following period.

There is no exact data that needs to be shared. Everyone will want something slightly different, which is why we moved away from automated reporting years ago (when we moved away from auto link building… hehe) and built our report around our clients even if it took added time.

I always said that the product of a SEO or content shop is the report. That is what people buy because it is likely that is all they will see or understand.

8. In conclusion, you must refine and repeat the process

local content strategy - refine and repeat

From my point of view, this is by far the most important step and sums everything up nicely. This process model isn’t perfect. There will be things that are missed, things that need tweaked, and ways that you will be able to improve on your local content strategy and marketing all the time. The idea of the cycle is that it is never done. It never sleeps. It never quits. It never surrenders. You just keep perfecting the process until you reach the point that few locally-focused companies ever achieve… where your local content reaches and grows your target audience every time you click the publish button.

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!

Reblogged 4 years ago from tracking.feedpress.it

Understanding and Applying Moz’s Spam Score Metric – Whiteboard Friday

Posted by randfish

This week, Moz released a new feature that we call Spam Score, which helps you analyze your link profile and weed out the spam (check out the blog post for more info). There have been some fantastic conversations about how it works and how it should (and shouldn’t) be used, and we wanted to clarify a few things to help you all make the best use of the tool.

In today’s Whiteboard Friday, Rand offers more detail on how the score is calculated, just what those spam flags are, and how we hope you’ll benefit from using it.

For reference, here’s a still of this week’s whiteboard. 

Click on the image above to open a high resolution version in a new tab!

Video transcription

Howdy Moz fans, and welcome to another edition of Whiteboard Friday. This week, we’re going to chat a little bit about Moz’s Spam Score. Now I don’t typically like to do Whiteboard Fridays specifically about a Moz project, especially when it’s something that’s in our toolset. But I’m making an exception because there have been so many questions and so much discussion around Spam Score and because I hope the methodology, the way we calculate things, the look at correlation and causation, when it comes to web spam, can be useful for everyone in the Moz community and everyone in the SEO community in addition to being helpful for understanding this specific tool and metric.

The 17-flag scoring system

I want to start by describing the 17 flag system. As you might know, Spam Score is shown as a score from 0 to 17. You either fire a flag or you don’t. Those 17 flags you can see a list of them on the blog post, and we’ll show that in there. Essentially, those flags correlate to the percentage of sites that we found with that count of flags, not those specific flags, just any count of those flags that were penalized or banned by Google. I’ll show you a little bit more in the methodology.

Basically, what this means is for sites that had 0 spam flags, none of the 17 flags that we had fired, that actually meant that 99.5% of those sites were not penalized or banned, on average, in our analysis and 0.5% were. At 3 flags, 4.2% of those sites, that’s actually still a huge number. That’s probably in the millions of domains or subdomains that Google has potentially still banned. All the way down here with 11 flags, it’s 87.3% that we did find banned. That seems pretty risky or penalized. It seems pretty risky. But 12.7% of those is still a very big number, again probably in the hundreds of thousands of unique websites that are not banned but still have these flags.

If you’re looking at a specific subdomain and you’re saying, “Hey, gosh, this only has 3 flags or 4 flags on it, but it’s clearly been penalized by Google, Moz’s score must be wrong,” no, that’s pretty comfortable. That should fit right into those kinds of numbers. Same thing down here. If you see a site that is not penalized but has a number of flags, that’s potentially an indication that you’re in that percentage of sites that we found not to be penalized.

So this is an indication of percentile risk, not a “this is absolutely spam” or “this is absolutely not spam.” The only caveat is anything with, I think, more than 13 flags, we found 100% of those to have been penalized or banned. Maybe you’ll find an odd outlier or two. Probably you won’t.

Correlation â‰  causation

Correlation is not causation. This is something we repeat all the time here at Moz and in the SEO community. We do a lot of correlation studies around these things. I think people understand those very well in the fields of social media and in marketing in general. Certainly in psychology and electoral voting and election polling results, people understand those correlations. But for some reason in SEO we sometimes get hung up on this.

I want to be clear. Spam flags and the count of spam flags correlates with sites we saw Google penalize. That doesn’t mean that any of the flags or combinations of flags actually cause the penalty. It could be that the things that are flags are not actually connected to the reasons Google might penalize something at all. Those could be totally disconnected.

We are not trying to say with the 17 flags these are causes for concern or you need to fix these. We are merely saying this feature existed on this website when we crawled it, or it had this feature, maybe it still has this feature. Therefore, we saw this count of these features that correlates to this percentile number, so we’re giving you that number. That’s all that the score intends to say. That’s all it’s trying to show. It’s trying to be very transparent about that. It’s not trying to say you need to fix these.

A lot of flags and features that are measured are perfectly fine things to have on a website, like no social accounts or email links. That’s a totally reasonable thing to have, but it is a flag because we saw it correlate. A number in your domain name, I think it’s fine if you want to have a number in your domain name. There’s plenty of good domains that have a numerical character in them. That’s cool.

TLD extension that happens to be used by lots of spammers, like a .info or a .cc or a number of other ones, that’s also totally reasonable. Just because lots of spammers happen to use those TLD extensions doesn’t mean you are necessarily spam because you use one.

Or low link diversity. Maybe you’re a relatively new site. Maybe your niche is very small, so the number of folks who point to your site tends to be small, and lots of the sites that organically naturally link to you editorially happen to link to you from many of their pages, and there’s not a ton of them. That will lead to low link diversity, which is a flag, but it isn’t always necessarily a bad thing. It might still nudge you to try and get some more links because that will probably help you, but that doesn’t mean you are spammy. It just means you fired a flag that correlated with a spam percentile.

The methodology we use

The methodology that we use, for those who are curious — and I do think this is a methodology that might be interesting to potentially apply in other places — is we brainstormed a large list of potential flags, a huge number. We cut that down to the ones we could actually do, because there were some that were just unfeasible for our technology team, our engineering team to do.

Then, we got a huge list, many hundreds of thousands of sites that were penalized or banned. When we say banned or penalized, what we mean is they didn’t rank on page one for either their own domain name or their own brand name, the thing between the
www and the .com or .net or .info or whatever it was. If you didn’t rank for either your full domain name, www and the .com or Moz, that would mean we said, “Hey, you’re penalized or banned.”

Now you might say, “Hey, Rand, there are probably some sites that don’t rank on page one for their own brand name or their own domain name, but aren’t actually penalized or banned.” I agree. That’s a very small number. Statistically speaking, it probably is not going to be impactful on this data set. Therefore, we didn’t have to control for that. We ended up not controlling for that.

Then we found which of the features that we ideated, brainstormed, actually correlated with the penalties and bans, and we created the 17 flags that you see in the product today. There are lots things that I thought were going to correlate, for example spammy-looking anchor text or poison keywords on the page, like Viagra, Cialis, Texas Hold’em online, pornography. Those things, not all of them anyway turned out to correlate well, and so they didn’t make it into the 17 flags list. I hope over time we’ll add more flags. That’s how things worked out.

How to apply the Spam Score metric

When you’re applying Spam Score, I think there are a few important things to think about. Just like domain authority, or page authority, or a metric from Majestic, or a metric from Google, or any other kind of metric that you might come up with, you should add it to your toolbox and to your metrics where you find it useful. I think playing around with spam, experimenting with it is a great thing. If you don’t find it useful, just ignore it. It doesn’t actually hurt your website. It’s not like this information goes to Google or anything like that. They have way more sophisticated stuff to figure out things on their end.

Do not just disavow everything with seven or more flags, or eight or more flags, or nine or more flags. I think that we use the color coding to indicate 0% to 10% of these flag counts were penalized or banned, 10% to 50% were penalized or banned, or 50% or above were penalized or banned. That’s why you see the green, orange, red. But you should use the count and line that up with the percentile. We do show that inside the tool as well.

Don’t just take everything and disavow it all. That can get you into serious trouble. Remember what happened with Cyrus. Cyrus Shepard, Moz’s head of content and SEO, he disavowed all the backlinks to its site. It took more than a year for him to rank for anything again. Google almost treated it like he was banned, not completely, but they seriously took away all of his link power and didn’t let him back in, even though he changed the disavow file and all that.

Be very careful submitting disavow files. You can hurt yourself tremendously. The reason we offer it in disavow format is because many of the folks in our customer testing said that’s how they wanted it so they could copy and paste, so they could easily review, so they could get it in that format and put it into their already existing disavow file. But you should not do that. You’ll see a bunch of warnings if you try and generate a disavow file. You even have to edit your disavow file before you can submit it to Google, because we want to be that careful that you don’t go and submit.

You should expect the Spam Score accuracy. If you’re doing spam investigation, you’re probably looking at spammier sites. If you’re looking at a random hundred sites, you should expect that the flags would correlate with the percentages. If I look at a random hundred 4 flag Spam Score sites, 7.5% of those I would expect on average to be penalized or banned. If you are therefore seeing sites that don’t fit those, they probably fit into the percentiles that were not penalized, or up here were penalized, down here weren’t penalized, that kind of thing.

Hopefully, you find Spam Score useful and interesting and you add it to your toolbox. We would love to hear from you on iterations and ideas that you’ve got for what we can do in the future, where else you’d like to see it, and where you’re finding it useful/not useful. That would be great.

Hopefully, you’ve enjoyed this edition of Whiteboard Friday and will join us again next week. Thanks so much. Take care.

Video transcription by Speechpad.com

ADDITION FROM RAND: I also urge folks to check out Marie Haynes’ excellent Start-to-Finish Guide to Using Google’s Disavow Tool. We’re going to update the feature to link to that as well.

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!

Reblogged 4 years ago from tracking.feedpress.it

Spam Score: Moz’s New Metric to Measure Penalization Risk

Posted by randfish

Today, I’m very excited to announce that Moz’s Spam Score, an R&D project we’ve worked on for nearly a year, is finally going live. In this post, you can learn more about how we’re calculating spam score, what it means, and how you can potentially use it in your SEO work.

How does Spam Score work?

Over the last year, our data science team, led by 
Dr. Matt Peters, examined a great number of potential factors that predicted that a site might be penalized or banned by Google. We found strong correlations with 17 unique factors we call “spam flags,” and turned them into a score.

Almost every subdomain in 
Mozscape (our web index) now has a Spam Score attached to it, and this score is viewable inside Open Site Explorer (and soon, the MozBar and other tools). The score is simple; it just records the quantity of spam flags the subdomain triggers. Our correlations showed that no particular flag was more likely than others to mean a domain was penalized/banned in Google, but firing many flags had a very strong correlation (you can see the math below).

Spam Score currently operates only on the subdomain level—we don’t have it for pages or root domains. It’s been my experience and the experience of many other SEOs in the field that a great deal of link spam is tied to the subdomain-level. There are plenty of exceptions—manipulative links can and do live on plenty of high-quality sites—but as we’ve tested, we found that subdomain-level Spam Score was the best solution we could create at web scale. It does a solid job with the most obvious, nastiest spam, and a decent job highlighting risk in other areas, too.

How to access Spam Score

Right now, you can find Spam Score inside 
Open Site Explorer, both in the top metrics (just below domain/page authority) and in its own tab labeled “Spam Analysis.” Spam Score is only available for Pro subscribers right now, though in the future, we may make the score in the metrics section available to everyone (if you’re not a subscriber, you can check it out with a free trial). 

The current Spam Analysis page includes a list of subdomains or pages linking to your site. You can toggle the target to look at all links to a given subdomain on your site, given pages, or the entire root domain. You can further toggle source tier to look at the Spam Score for incoming linking pages or subdomains (but in the case of pages, we’re still showing the Spam Score for the subdomain on which that page is hosted).

You can click on any Spam Score row and see the details about which flags were triggered. We’ll bring you to a page like this:

Back on the original Spam Analysis page, at the very bottom of the rows, you’ll find an option to export a disavow file, which is compatible with Google Webmaster Tools. You can choose to filter the file to contain only those sites with a given spam flag count or higher:

Disavow exports usually take less than 3 hours to finish. We can send you an email when it’s ready, too.

WARNING: Please do not export this file and simply upload it to Google! You can really, really hurt your site’s ranking and there may be no way to recover. Instead, carefully sort through the links therein and make sure you really do want to disavow what’s in there. You can easily remove/edit the file to take out links you feel are not spam. When Moz’s Cyrus Shepard disavowed every link to his own site, it took more than a year for his rankings to return!

We’ve actually made the file not-wholly-ready for upload to Google in order to be sure folks aren’t too cavalier with this particular step. You’ll need to open it up and make some edits (specifically to lines at the top of the file) in order to ready it for Webmaster Tools

In the near future, we hope to have Spam Score in the Mozbar as well, which might look like this: 

Sweet, right? 🙂

Potential use cases for Spam Analysis

This list probably isn’t exhaustive, but these are a few of the ways we’ve been playing around with the data:

  1. Checking for spammy links to your own site: Almost every site has at least a few bad links pointing to it, but it’s been hard to know how much or how many potentially harmful links you might have until now. Run a quick spam analysis and see if there’s enough there to cause concern.
  2. Evaluating potential links: This is a big one where we think Spam Score can be helpful. It’s not going to catch every potentially bad link, and you should certainly still use your brain for evaluation too, but as you’re scanning a list of link opportunities or surfing to various sites, having the ability to see if they fire a lot of flags is a great warning sign.
  3. Link cleanup: Link cleanup projects can be messy, involved, precarious, and massively tedious. Spam Score might not catch everything, but sorting links by it can be hugely helpful in identifying potentially nasty stuff, and filtering out the more probably clean links.
  4. Disavow Files: Again, because Spam Score won’t perfectly catch everything, you will likely need to do some additional work here (especially if the site you’re working on has done some link buying on more generally trustworthy domains), but it can save you a heap of time evaluating and listing the worst and most obvious junk.

Over time, we’re also excited about using Spam Score to help improve the PA and DA calculations (it’s not currently in there), as well as adding it to other tools and data sources. We’d love your feedback and insight about where you’d most want to see Spam Score get involved.

Details about Spam Score’s calculation

This section comes courtesy of Moz’s head of data science, Dr. Matt Peters, who created the metric and deserves (at least in my humble opinion) a big round of applause. – Rand

Definition of “spam”

Before diving into the details of the individual spam flags and their calculation, it’s important to first describe our data gathering process and “spam” definition.

For our purposes, we followed Google’s definition of spam and gathered labels for a large number of sites as follows.

  • First, we randomly selected a large number of subdomains from the Mozscape index stratified by mozRank.
  • Then we crawled the subdomains and threw out any that didn’t return a “200 OK” (redirects, errors, etc).
  • Finally, we collected the top 10 de-personalized, geo-agnostic Google-US search results using the full subdomain name as the keyword and checked whether any of those results matched the original keyword. If they did not, we called the subdomain “spam,” otherwise we called it “ham.”

We performed the most recent data collection in November 2014 (after the Penguin 3.0 update) for about 500,000 subdomains.

Relationship between number of flags and spam

The overall Spam Score is currently an aggregate of 17 different “flags.” You can think of each flag a potential “warning sign” that signals that a site may be spammy. The overall likelihood of spam increases as a site accumulates more and more flags, so that the total number of flags is a strong predictor of spam. Accordingly, the flags are designed to be used together—no single flag, or even a few flags, is cause for concern (and indeed most sites will trigger at least a few flags).

The following table shows the relationship between the number of flags and percent of sites with those flags that we found Google had penalized or banned:

ABOVE: The overall probability of spam vs. the number of spam flags. Data collected in Nov. 2014 for approximately 500K subdomains. The table also highlights the three overall danger levels: low/green (< 10%) moderate/yellow (10-50%) and high/red (>50%)

The overall spam percent averaged across a large number of sites increases in lock step with the number of flags; however there are outliers in every category. For example, there are a small number of sites with very few flags that are tagged as spam by Google and conversely a small number of sites with many flags that are not spam.

Spam flag details

The individual spam flags capture a wide range of spam signals link profiles, anchor text, on page signals and properties of the domain name. At a high level the process to determine the spam flags for each subdomain is:

  • Collect link metrics from Mozscape (mozRank, mozTrust, number of linking domains, etc).
  • Collect anchor text metrics from Mozscape (top anchor text phrases sorted by number of links)
  • Collect the top five pages by Page Authority on the subdomain from Mozscape
  • Crawl the top five pages plus the home page and process to extract on page signals
  • Provide the output for Mozscape to include in the next index release cycle

Since the spam flags are incorporated into in the Mozscape index, fresh data is released with each new index. Right now, we crawl and process the spam flags for each subdomains every two – three months although this may change in the future.

Link flags

The following table lists the link and anchor text related flags with the the odds ratio for each flag. For each flag, we can compute two percents: the percent of sites with that flag that are penalized by Google and the percent of sites with that flag that were not penalized. The odds ratio is the ratio of these percents and gives the increase in likelihood that a site is spam if it has the flag. For example, the first row says that a site with this flag is 12.4 times more likely to be spam than one without the flag.

ABOVE: Description and odds ratio of link and anchor text related spam flags. In addition to a description, it lists the odds ratio for each flag which gives the overall increase in spam likelihood if the flag is present).

Working down the table, the flags are:

  • Low mozTrust to mozRank ratio: Sites with low mozTrust compared to mozRank are likely to be spam.
  • Large site with few links: Large sites with many pages tend to also have many links and large sites without a corresponding large number of links are likely to be spam.
  • Site link diversity is low: If a large percentage of links to a site are from a few domains it is likely to be spam.
  • Ratio of followed to nofollowed subdomains/domains (two separate flags): Sites with a large number of followed links relative to nofollowed are likely to be spam.
  • Small proportion of branded links (anchor text): Organically occurring links tend to contain a disproportionate amount of banded keywords. If a site does not have a lot of branded anchor text, it’s a signal the links are not organic.

On-page flags

Similar to the link flags, the following table lists the on page and domain name related flags:

ABOVE: Description and odds ratio of on page and domain name related spam flags. In addition to a description, it lists the odds ratio for each flag which gives the overall increase in spam likelihood if the flag is present).

  • Thin content: If a site has a relatively small ratio of content to navigation chrome it’s likely to be spam.
  • Site mark-up is abnormally small: Non-spam sites tend to invest in rich user experiences with CSS, Javascript and extensive mark-up. Accordingly, a large ratio of text to mark-up is a spam signal.
  • Large number of external links: A site with a large number of external links may look spammy.
  • Low number of internal links: Real sites tend to link heavily to themselves via internal navigation and a relative lack of internal links is a spam signal.
  • Anchor text-heavy page: Sites with a lot of anchor text are more likely to be spam then those with more content and less links.
  • External links in navigation: Spam sites may hide external links in the sidebar or footer.
  • No contact info: Real sites prominently display their social and other contact information.
  • Low number of pages found: A site with only one or a few pages is more likely to be spam than one with many pages.
  • TLD correlated with spam domains: Certain TLDs are more spammy than others (e.g. pw).
  • Domain name length: A long subdomain name like “bycheapviagra.freeshipping.onlinepharmacy.com” may indicate keyword stuffing.
  • Domain name contains numerals: domain names with numerals may be automatically generated and therefore spam.

If you’d like some more details on the technical aspects of the spam score, check out the 
video of Matt’s 2012 MozCon talk about Algorithmic Spam Detection or the slides (many of the details have evolved, but the overall ideas are the same):

We’d love your feedback

As with all metrics, Spam Score won’t be perfect. We’d love to hear your feedback and ideas for improving the score as well as what you’d like to see from it’s in-product application in the future. Feel free to leave comments on this post, or to email Matt (matt at moz dot com) and me (rand at moz dot com) privately with any suggestions.

Good luck cleaning up and preventing link spam!



Not a Pro Subscriber? No problem!



Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!

Reblogged 4 years ago from tracking.feedpress.it

What Deep Learning and Machine Learning Mean For the Future of SEO – Whiteboard Friday

Posted by randfish

Imagine a world where even the high-up Google engineers don’t know what’s in the ranking algorithm. We may be moving in that direction. In today’s Whiteboard Friday, Rand explores and explains the concepts of deep learning and machine learning, drawing us a picture of how they could impact our work as SEOs.

For reference, here’s a still of this week’s whiteboard!

Video transcription

Howdy, Moz fans, and welcome to another edition of Whiteboard Friday. This week we are going to take a peek into Google’s future and look at what it could mean as Google advances their machine learning and deep learning capabilities. I know these sound like big, fancy, important words. They’re not actually that tough of topics to understand. In fact, they’re simplistic enough that even a lot of technology firms like Moz do some level of machine learning. We don’t do anything with deep learning and a lot of neural networks. We might be going that direction.

But I found an article that was published in January, absolutely fascinating and I think really worth reading, and I wanted to extract some of the contents here for Whiteboard Friday because I do think this is tactically and strategically important to understand for SEOs and really important for us to understand so that we can explain to our bosses, our teams, our clients how SEO works and will work in the future.

The article is called “Google Search Will Be Your Next Brain.” It’s by Steve Levy. It’s over on Medium. I do encourage you to read it. It’s a relatively lengthy read, but just a fascinating one if you’re interested in search. It starts with a profile of Geoff Hinton, who was a professor in Canada and worked on neural networks for a long time and then came over to Google and is now a distinguished engineer there. As the article says, a quote from the article: “He is versed in the black art of organizing several layers of artificial neurons so that the entire system, the system of neurons, could be trained or even train itself to divine coherence from random inputs.”

This sounds complex, but basically what we’re saying is we’re trying to get machines to come up with outcomes on their own rather than us having to tell them all the inputs to consider and how to process those incomes and the outcome to spit out. So this is essentially machine learning. Google has used this, for example, to figure out when you give it a bunch of photos and it can say, “Oh, this is a landscape photo. Oh, this is an outdoor photo. Oh, this is a photo of a person.” Have you ever had that creepy experience where you upload a photo to Facebook or to Google+ and they say, “Is this your friend so and so?” And you’re like, “God, that’s a terrible shot of my friend. You can barely see most of his face, and he’s wearing glasses which he usually never wears. How in the world could Google+ or Facebook figure out that this is this person?”

That’s what they use, these neural networks, these deep machine learning processes for. So I’ll give you a simple example. Here at MOZ, we do machine learning very simplistically for page authority and domain authority. We take all the inputs — numbers of links, number of linking root domains, every single metric that you could get from MOZ on the page level, on the sub-domain level, on the root-domain level, all these metrics — and then we combine them together and we say, “Hey machine, we want you to build us the algorithm that best correlates with how Google ranks pages, and here’s a bunch of pages that Google has ranked.” I think we use a base set of 10,000, and we do it about quarterly or every 6 months, feed that back into the system and the system pumps out the little algorithm that says, “Here you go. This will give you the best correlating metric with how Google ranks pages.” That’s how you get page authority domain authority.

Cool, really useful, helpful for us to say like, “Okay, this page is probably considered a little more important than this page by Google, and this one a lot more important.” Very cool. But it’s not a particularly advanced system. The more advanced system is to have these kinds of neural nets in layers. So you have a set of networks, and these neural networks, by the way, they’re designed to replicate nodes in the human brain, which is in my opinion a little creepy, but don’t worry. The article does talk about how there’s a board of scientists who make sure Terminator 2 doesn’t happen, or Terminator 1 for that matter. Apparently, no one’s stopping Terminator 4 from happening? That’s the new one that’s coming out.

So one layer of the neural net will identify features. Another layer of the neural net might classify the types of features that are coming in. Imagine this for search results. Search results are coming in, and Google’s looking at the features of all the websites and web pages, your websites and pages, to try and consider like, “What are the elements I could pull out from there?”

Well, there’s the link data about it, and there are things that happen on the page. There are user interactions and all sorts of stuff. Then we’re going to classify types of pages, types of searches, and then we’re going to extract the features or metrics that predict the desired result, that a user gets a search result they really like. We have an algorithm that can consistently produce those, and then neural networks are hopefully designed — that’s what Geoff Hinton has been working on — to train themselves to get better. So it’s not like with PA and DA, our data scientist Matt Peters and his team looking at it and going, “I bet we could make this better by doing this.”

This is standing back and the guys at Google just going, “All right machine, you learn.” They figure it out. It’s kind of creepy, right?

In the original system, you needed those people, these individuals here to feed the inputs, to say like, “This is what you can consider, system, and the features that we want you to extract from it.”

Then unsupervised learning, which is kind of this next step, the system figures it out. So this takes us to some interesting places. Imagine the Google algorithm, circa 2005. You had basically a bunch of things in here. Maybe you’d have anchor text, PageRank and you’d have some measure of authority on a domain level. Maybe there are people who are tossing new stuff in there like, “Hey algorithm, let’s consider the location of the searcher. Hey algorithm, let’s consider some user and usage data.” They’re tossing new things into the bucket that the algorithm might consider, and then they’re measuring it, seeing if it improves.

But you get to the algorithm today, and gosh there are going to be a lot of things in there that are driven by machine learning, if not deep learning yet. So there are derivatives of all of these metrics. There are conglomerations of them. There are extracted pieces like, “Hey, we only ant to look and measure anchor text on these types of results when we also see that the anchor text matches up to the search queries that have previously been performed by people who also search for this.” What does that even mean? But that’s what the algorithm is designed to do. The machine learning system figures out things that humans would never extract, metrics that we would never even create from the inputs that they can see.

Then, over time, the idea is that in the future even the inputs aren’t given by human beings. The machine is getting to figure this stuff out itself. That’s weird. That means that if you were to ask a Google engineer in a world where deep learning controls the ranking algorithm, if you were to ask the people who designed the ranking system, “Hey, does it matter if I get more links,” they might be like, “Well, maybe.” But they don’t know, because they don’t know what’s in this algorithm. Only the machine knows, and the machine can’t even really explain it. You could go take a snapshot and look at it, but (a) it’s constantly evolving, and (b) a lot of these metrics are going to be weird conglomerations and derivatives of a bunch of metrics mashed together and torn apart and considered only when certain criteria are fulfilled. Yikes.

So what does that mean for SEOs. Like what do we have to care about from all of these systems and this evolution and this move towards deep learning, which by the way that’s what Jeff Dean, who is, I think, a senior fellow over at Google, he’s the dude that everyone mocks for being the world’s smartest computer scientist over there, and Jeff Dean has basically said, “Hey, we want to put this into search. It’s not there yet, but we want to take these models, these things that Hinton has built, and we want to put them into search.” That for SEOs in the future is going to mean much less distinct universal ranking inputs, ranking factors. We won’t really have ranking factors in the way that we know them today. It won’t be like, “Well, they have more anchor text and so they rank higher.” That might be something we’d still look at and we’d say, “Hey, they have this anchor text. Maybe that’s correlated with what the machine is finding, the system is finding to be useful, and that’s still something I want to care about to a certain extent.”

But we’re going to have to consider those things a lot more seriously. We’re going to have to take another look at them and decide and determine whether the things that we thought were ranking factors still are when the neural network system takes over. It also is going to mean something that I think many, many SEOs have been predicting for a long time and have been working towards, which is more success for websites that satisfy searchers. If the output is successful searches, and that’ s what the system is looking for, and that’s what it’s trying to correlate all its metrics to, if you produce something that means more successful searches for Google searchers when they get to your site, and you ranking in the top means Google searchers are happier, well you know what? The algorithm will catch up to you. That’s kind of a nice thing. It does mean a lot less info from Google about how they rank results.

So today you might hear from someone at Google, “Well, page speed is a very small ranking factor.” In the future they might be, “Well, page speed is like all ranking factors, totally unknown to us.” Because the machine might say, “Well yeah, page speed as a distinct metric, one that a Google engineer could actually look at, looks very small.” But derivatives of things that are connected to page speed may be huge inputs. Maybe page speed is something, that across all of these, is very well connected with happier searchers and successful search results. Weird things that we never thought of before might be connected with them as the machine learning system tries to build all those correlations, and that means potentially many more inputs into the ranking algorithm, things that we would never consider today, things we might consider wholly illogical, like, “What servers do you run on?” Well, that seems ridiculous. Why would Google ever grade you on that?

If human beings are putting factors into the algorithm, they never would. But the neural network doesn’t care. It doesn’t care. It’s a honey badger. It doesn’t care what inputs it collects. It only cares about successful searches, and so if it turns out that Ubuntu is poorly correlated with successful search results, too bad.

This world is not here yet today, but certainly there are elements of it. Google has talked about how Panda and Penguin are based off of machine learning systems like this. I think, given what Geoff Hinton and Jeff Dean are working on at Google, it sounds like this will be making its way more seriously into search and therefore it’s something that we’re really going to have to consider as search marketers.

All right everyone, I hope you’ll join me again next week for another edition of Whiteboard Friday. Take care.

Video transcription by Speechpad.com

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!

Reblogged 4 years ago from tracking.feedpress.it

How to Defeat Duplicate Content – Next Level

Posted by EllieWilkinson

Welcome to the third installment of Next Level! In the previous Next Level blog post, we shared a workflow showing you how to take on your competitors using Moz tools. We’re continuing the educational series with several new videos all about resolving duplicate content. Read on and level up!


Dealing with duplicate content can feel a bit like doing battle with your site’s evil doppelgänger—confusing and tricky to defeat! But identifying and resolving duplicates is a necessary part of helping search engines decide on relevant results. In this short video, learn about how duplicate content happens, why it’s important to fix, and a bit about how you can uncover it.

Next Level – Identifying Duplicate_pt1

[
Quick clarification: Search engines don’t actively penalize duplicate content, per se; they just don’t always understand it as well, which can lead to a drop in rankings. More info here.]

Now that you have a better idea of how to identify those dastardly duplicates, let’s get rid of ’em once and for all. Watch this next video to review how to use Moz Analytics to find and fix duplicate content using three common solutions. (You’ll need a Moz Pro subscription to use Moz Analytics. If you aren’t yet a Moz Pro subscriber, you can always try out the tools with a
30-day free trial.)

Workflow summary

Here’s a review of the three common solutions to conquering duplicate content:

  1. 301 redirect. Check Page Authority to see if one page has a higher PA than the other using Open Site Explorer, then set up a 301 redirect from the duplicate page to the original page. This will ensure that they no longer compete with one another in the search results. Wondering what a 301 redirect is and how to do it? Read more about redirection here.
  2. Rel=canonical. A rel=canonical tag passes the same amount of ranking power as a 301 redirect, and there’s a bonus: it often takes less development time to implement! Add this tag to the HTML head of a web page to tell search engines that it should be treated as a copy of the “canon,” or original, page:
    <head> <link rel="canonical" href="http://moz.com/blog/" /> </head>

    If you’re curious, you can
    read more about canonicalization here.

  3. noindex, follow. Add the values “noindex, follow” to the meta robots tag to tell search engines not to include the duplicate pages in their indexes, but to crawl their links. This works really well with paginated content or if you have a system set up to tag or categorize content (as with a blog). Here’s what it should look like:
    <head> <meta name="robots" content="noindex, follow" /> </head>

    If you’re looking to block the Moz crawler, Rogerbot, you can use the robots.txt file if you prefer—he’s a good robot, and he’ll obey!
    More about meta robots (and robots.txt) here.

Can’t get enough of duplicate content? Want to become a duplicate content connoisseur? This last video explains more about how Moz finds duplicates, if you’re curious. And you can read even more over at the
Moz Developer Blog.

We’d love to hear about your techniques for defeating duplicates! Chime in below in the comments.

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!

Reblogged 4 years ago from tracking.feedpress.it

Announcing the New &amp; Improved Link Intersect Tool

Posted by randfish

Y’all remember how last October, we launched a new section in Open Site Explorer called “Link Opportunities?” While I was proud of that work, there was one section that really disappointed me at the time (and I said as much in my comments on the post).

Well, today, that disappointment is over, because we’re stepping up the Link Intersect tool inside OSE big time:

Literally thousands of sweet, sweet link opportunities are now yours at the click of a button

In the initial launch, Link Intersect used Freshscape (which powers Fresh Web Explorer). Freshscape is great for certain kinds of data – links and mentions that come from newly published pages that are in news sources, blogs, and feeds. But it’s not great for non-news/blogs/feed sources because it’s intentionally avoiding those!

For example, in the screenshot above, I wanted to see all the pages that link to SeriousEats.com and SplendidTable.org but don’t link to SmittenKitchen.com.

That’s 671 more, juicy link opportunities thanks to the hard work of the Moz Big Data and Research Tools teams.

How does the new Link Intersect work?

The tool looks at the top 250,000 links our index has pointing to each of the intersecting targets you enter, and the top 1 mllion links in our index pointing to the excluded URL.

Link Intersect then runs a differential comparison to determine which of the 250K links to each of the intersecting targets are from the same URL or root domain, and removes any of those links that point to the top million links to the excluded URL/root/sub domain.

This means it’s possible for sites and pages with massive quantities of links that we won’t show every intersecting link we know about, but since the sorting is in Page Authority order, you’ll get the highest quality/most important ones at the top.

You can use Link Intersect to see three unique views on the data:

  • Pages that link to subdomains (particularly useful if you’re interested in shared links to sites on hosted subdomains like blogspot, wordpress, etc or to a specific subdomain section of a competitor’s site)
  • Pages that link to root domains (my personal favorite, as I find the results the most comprehensive)
  • Root domains that link to the root domains (great if you’re trying to get a broad sense of domain-level outreach/marketing targets)

Note that it’s possible the root domains will actually expose more links that pages because the domain-level link graph is easier and faster to sort through, so the 250K limit is less of a barrier.

Like most of the reports in Open Site Explorer, Link Intersect comes with a handy CSV Export option:

When it finishes (my most recent one took just under 3 minutes to run and email me), you’ll get a nice email like this one:

Please ignore the grammatical errors. I’m sure our team will fix those up soon 🙂

Why are these such good link/outreach/marketing targets?

Generally speaking, this type of data is invaluable for link outreach because these sites and pages are ones that clearly care about the shared topics or content of the intersecting targets. If you enter two of your primary competitors, you’ll often get news media, blog posts, reference resources, events, trade publications, and more that produce content in your topical niche.

They’re also good targets because they actually link out! This means you can avoid sifting through sites whose policies or practices mean they’re unlikely to ever link to you – if they’ve linked to those other two chaps, why not you, too?!

Basically, you can check the trifecta of link opportunity goodness boxes (which I’ve helpfully illustrated above, because that’s just the kind of SEO dork I am).

Link Intersect is limited only by your own creativity – so long as you can keep finding sites and pages on the web whose links might also be a match for your own site, we can keep digging through trillions of links, finding the intersects, and giving them back to you.

3 examples of Link Intersect in action

Let’s look at some ways we might put this to use in the real world:

#1: I’m trying to figure out who links to my two big competitors in the world of book reviews

First off, remember that Link Intersect works on a root domain or subdomain level, so we wouldn’t want to use something like the NYTimes’ review of books, because we’d be finding all the intersections to NYTimes.com. Instead, we want to pick more topically-focused domains, like these two:

You’ll also note that I’ve used a fake website as my excluded URL – this is a great trick for when you’re simply interested in any sites/pages that link to two domains and don’t need to remove a particular target.

#2: I’ve got a locally-focused website doing plumbing and need a few link sources to help boost my potential to rank in local and organic SERPs

In this instance, I’ll certainly look at pages linking to combinations of the top ranking sites in the local results, e.g. the 15 results for this query:

This is a solid starting point, especially considering how few links local sites often need to perform well. But we can get creative by branching outside of plumbing and exploring related fields like construction:

Focusing on better-linked-to industries and websites will give more results, so we want to try to broaden rather than narrow our categories and look for the most-linked-to sites in given verticals for comparisons.

#3: I’m planning some new content around weather patterns for my air conditioning website and want to know what news and blog sites cover extreme weather content

First, I’m going to start by browsing some search results for content in this field that’s received some serious link activity. By turning on my Mozbar’s SERPs overlay, I can see the sites and pages that have generated loads of links:

Now I can run a few combinations of these through the Link Intersect Tool:

While those domain names make me fear for humanity’s intelligence and future survival, they also expose a great link opportunity tactic I hadn’t previously considered – climate science deniers and the more politically charged universe of climate science overall.


I hope you enjoy the new Link Intersect tool as much as I have been – I think it’s one of the best things we’ve put in Open Site Explorer in the last few months, though what we’re releasing in March might beat even that, so stay tuned!

And, as always, please do give us feedback and feel free to ask questions in the comments below or through the Moz Community Q+A.

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!

Reblogged 4 years ago from tracking.feedpress.it