Understanding and Applying Moz’s Spam Score Metric – Whiteboard Friday

Posted by randfish

This week, Moz released a new feature that we call Spam Score, which helps you analyze your link profile and weed out the spam (check out the blog post for more info). There have been some fantastic conversations about how it works and how it should (and shouldn’t) be used, and we wanted to clarify a few things to help you all make the best use of the tool.

In today’s Whiteboard Friday, Rand offers more detail on how the score is calculated, just what those spam flags are, and how we hope you’ll benefit from using it.

For reference, here’s a still of this week’s whiteboard. 

Click on the image above to open a high resolution version in a new tab!

Video transcription

Howdy Moz fans, and welcome to another edition of Whiteboard Friday. This week, we’re going to chat a little bit about Moz’s Spam Score. Now I don’t typically like to do Whiteboard Fridays specifically about a Moz project, especially when it’s something that’s in our toolset. But I’m making an exception because there have been so many questions and so much discussion around Spam Score and because I hope the methodology, the way we calculate things, the look at correlation and causation, when it comes to web spam, can be useful for everyone in the Moz community and everyone in the SEO community in addition to being helpful for understanding this specific tool and metric.

The 17-flag scoring system

I want to start by describing the 17 flag system. As you might know, Spam Score is shown as a score from 0 to 17. You either fire a flag or you don’t. Those 17 flags you can see a list of them on the blog post, and we’ll show that in there. Essentially, those flags correlate to the percentage of sites that we found with that count of flags, not those specific flags, just any count of those flags that were penalized or banned by Google. I’ll show you a little bit more in the methodology.

Basically, what this means is for sites that had 0 spam flags, none of the 17 flags that we had fired, that actually meant that 99.5% of those sites were not penalized or banned, on average, in our analysis and 0.5% were. At 3 flags, 4.2% of those sites, that’s actually still a huge number. That’s probably in the millions of domains or subdomains that Google has potentially still banned. All the way down here with 11 flags, it’s 87.3% that we did find banned. That seems pretty risky or penalized. It seems pretty risky. But 12.7% of those is still a very big number, again probably in the hundreds of thousands of unique websites that are not banned but still have these flags.

If you’re looking at a specific subdomain and you’re saying, “Hey, gosh, this only has 3 flags or 4 flags on it, but it’s clearly been penalized by Google, Moz’s score must be wrong,” no, that’s pretty comfortable. That should fit right into those kinds of numbers. Same thing down here. If you see a site that is not penalized but has a number of flags, that’s potentially an indication that you’re in that percentage of sites that we found not to be penalized.

So this is an indication of percentile risk, not a “this is absolutely spam” or “this is absolutely not spam.” The only caveat is anything with, I think, more than 13 flags, we found 100% of those to have been penalized or banned. Maybe you’ll find an odd outlier or two. Probably you won’t.

Correlation ≠ causation

Correlation is not causation. This is something we repeat all the time here at Moz and in the SEO community. We do a lot of correlation studies around these things. I think people understand those very well in the fields of social media and in marketing in general. Certainly in psychology and electoral voting and election polling results, people understand those correlations. But for some reason in SEO we sometimes get hung up on this.

I want to be clear. Spam flags and the count of spam flags correlates with sites we saw Google penalize. That doesn’t mean that any of the flags or combinations of flags actually cause the penalty. It could be that the things that are flags are not actually connected to the reasons Google might penalize something at all. Those could be totally disconnected.

We are not trying to say with the 17 flags these are causes for concern or you need to fix these. We are merely saying this feature existed on this website when we crawled it, or it had this feature, maybe it still has this feature. Therefore, we saw this count of these features that correlates to this percentile number, so we’re giving you that number. That’s all that the score intends to say. That’s all it’s trying to show. It’s trying to be very transparent about that. It’s not trying to say you need to fix these.

A lot of flags and features that are measured are perfectly fine things to have on a website, like no social accounts or email links. That’s a totally reasonable thing to have, but it is a flag because we saw it correlate. A number in your domain name, I think it’s fine if you want to have a number in your domain name. There’s plenty of good domains that have a numerical character in them. That’s cool.

TLD extension that happens to be used by lots of spammers, like a .info or a .cc or a number of other ones, that’s also totally reasonable. Just because lots of spammers happen to use those TLD extensions doesn’t mean you are necessarily spam because you use one.

Or low link diversity. Maybe you’re a relatively new site. Maybe your niche is very small, so the number of folks who point to your site tends to be small, and lots of the sites that organically naturally link to you editorially happen to link to you from many of their pages, and there’s not a ton of them. That will lead to low link diversity, which is a flag, but it isn’t always necessarily a bad thing. It might still nudge you to try and get some more links because that will probably help you, but that doesn’t mean you are spammy. It just means you fired a flag that correlated with a spam percentile.

The methodology we use

The methodology that we use, for those who are curious — and I do think this is a methodology that might be interesting to potentially apply in other places — is we brainstormed a large list of potential flags, a huge number. We cut that down to the ones we could actually do, because there were some that were just unfeasible for our technology team, our engineering team to do.

Then, we got a huge list, many hundreds of thousands of sites that were penalized or banned. When we say banned or penalized, what we mean is they didn’t rank on page one for either their own domain name or their own brand name, the thing between the
www and the .com or .net or .info or whatever it was. If you didn’t rank for either your full domain name, www and the .com or Moz, that would mean we said, “Hey, you’re penalized or banned.”

Now you might say, “Hey, Rand, there are probably some sites that don’t rank on page one for their own brand name or their own domain name, but aren’t actually penalized or banned.” I agree. That’s a very small number. Statistically speaking, it probably is not going to be impactful on this data set. Therefore, we didn’t have to control for that. We ended up not controlling for that.

Then we found which of the features that we ideated, brainstormed, actually correlated with the penalties and bans, and we created the 17 flags that you see in the product today. There are lots things that I thought were going to correlate, for example spammy-looking anchor text or poison keywords on the page, like Viagra, Cialis, Texas Hold’em online, pornography. Those things, not all of them anyway turned out to correlate well, and so they didn’t make it into the 17 flags list. I hope over time we’ll add more flags. That’s how things worked out.

How to apply the Spam Score metric

When you’re applying Spam Score, I think there are a few important things to think about. Just like domain authority, or page authority, or a metric from Majestic, or a metric from Google, or any other kind of metric that you might come up with, you should add it to your toolbox and to your metrics where you find it useful. I think playing around with spam, experimenting with it is a great thing. If you don’t find it useful, just ignore it. It doesn’t actually hurt your website. It’s not like this information goes to Google or anything like that. They have way more sophisticated stuff to figure out things on their end.

Do not just disavow everything with seven or more flags, or eight or more flags, or nine or more flags. I think that we use the color coding to indicate 0% to 10% of these flag counts were penalized or banned, 10% to 50% were penalized or banned, or 50% or above were penalized or banned. That’s why you see the green, orange, red. But you should use the count and line that up with the percentile. We do show that inside the tool as well.

Don’t just take everything and disavow it all. That can get you into serious trouble. Remember what happened with Cyrus. Cyrus Shepard, Moz’s head of content and SEO, he disavowed all the backlinks to its site. It took more than a year for him to rank for anything again. Google almost treated it like he was banned, not completely, but they seriously took away all of his link power and didn’t let him back in, even though he changed the disavow file and all that.

Be very careful submitting disavow files. You can hurt yourself tremendously. The reason we offer it in disavow format is because many of the folks in our customer testing said that’s how they wanted it so they could copy and paste, so they could easily review, so they could get it in that format and put it into their already existing disavow file. But you should not do that. You’ll see a bunch of warnings if you try and generate a disavow file. You even have to edit your disavow file before you can submit it to Google, because we want to be that careful that you don’t go and submit.

You should expect the Spam Score accuracy. If you’re doing spam investigation, you’re probably looking at spammier sites. If you’re looking at a random hundred sites, you should expect that the flags would correlate with the percentages. If I look at a random hundred 4 flag Spam Score sites, 7.5% of those I would expect on average to be penalized or banned. If you are therefore seeing sites that don’t fit those, they probably fit into the percentiles that were not penalized, or up here were penalized, down here weren’t penalized, that kind of thing.

Hopefully, you find Spam Score useful and interesting and you add it to your toolbox. We would love to hear from you on iterations and ideas that you’ve got for what we can do in the future, where else you’d like to see it, and where you’re finding it useful/not useful. That would be great.

Hopefully, you’ve enjoyed this edition of Whiteboard Friday and will join us again next week. Thanks so much. Take care.

Video transcription by Speechpad.com

ADDITION FROM RAND: I also urge folks to check out Marie Haynes’ excellent Start-to-Finish Guide to Using Google’s Disavow Tool. We’re going to update the feature to link to that as well.

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!

Reblogged 4 years ago from tracking.feedpress.it

Spam Score: Moz’s New Metric to Measure Penalization Risk

Posted by randfish

Today, I’m very excited to announce that Moz’s Spam Score, an R&D project we’ve worked on for nearly a year, is finally going live. In this post, you can learn more about how we’re calculating spam score, what it means, and how you can potentially use it in your SEO work.

How does Spam Score work?

Over the last year, our data science team, led by 
Dr. Matt Peters, examined a great number of potential factors that predicted that a site might be penalized or banned by Google. We found strong correlations with 17 unique factors we call “spam flags,” and turned them into a score.

Almost every subdomain in 
Mozscape (our web index) now has a Spam Score attached to it, and this score is viewable inside Open Site Explorer (and soon, the MozBar and other tools). The score is simple; it just records the quantity of spam flags the subdomain triggers. Our correlations showed that no particular flag was more likely than others to mean a domain was penalized/banned in Google, but firing many flags had a very strong correlation (you can see the math below).

Spam Score currently operates only on the subdomain level—we don’t have it for pages or root domains. It’s been my experience and the experience of many other SEOs in the field that a great deal of link spam is tied to the subdomain-level. There are plenty of exceptions—manipulative links can and do live on plenty of high-quality sites—but as we’ve tested, we found that subdomain-level Spam Score was the best solution we could create at web scale. It does a solid job with the most obvious, nastiest spam, and a decent job highlighting risk in other areas, too.

How to access Spam Score

Right now, you can find Spam Score inside 
Open Site Explorer, both in the top metrics (just below domain/page authority) and in its own tab labeled “Spam Analysis.” Spam Score is only available for Pro subscribers right now, though in the future, we may make the score in the metrics section available to everyone (if you’re not a subscriber, you can check it out with a free trial). 

The current Spam Analysis page includes a list of subdomains or pages linking to your site. You can toggle the target to look at all links to a given subdomain on your site, given pages, or the entire root domain. You can further toggle source tier to look at the Spam Score for incoming linking pages or subdomains (but in the case of pages, we’re still showing the Spam Score for the subdomain on which that page is hosted).

You can click on any Spam Score row and see the details about which flags were triggered. We’ll bring you to a page like this:

Back on the original Spam Analysis page, at the very bottom of the rows, you’ll find an option to export a disavow file, which is compatible with Google Webmaster Tools. You can choose to filter the file to contain only those sites with a given spam flag count or higher:

Disavow exports usually take less than 3 hours to finish. We can send you an email when it’s ready, too.

WARNING: Please do not export this file and simply upload it to Google! You can really, really hurt your site’s ranking and there may be no way to recover. Instead, carefully sort through the links therein and make sure you really do want to disavow what’s in there. You can easily remove/edit the file to take out links you feel are not spam. When Moz’s Cyrus Shepard disavowed every link to his own site, it took more than a year for his rankings to return!

We’ve actually made the file not-wholly-ready for upload to Google in order to be sure folks aren’t too cavalier with this particular step. You’ll need to open it up and make some edits (specifically to lines at the top of the file) in order to ready it for Webmaster Tools

In the near future, we hope to have Spam Score in the Mozbar as well, which might look like this: 

Sweet, right? 🙂

Potential use cases for Spam Analysis

This list probably isn’t exhaustive, but these are a few of the ways we’ve been playing around with the data:

  1. Checking for spammy links to your own site: Almost every site has at least a few bad links pointing to it, but it’s been hard to know how much or how many potentially harmful links you might have until now. Run a quick spam analysis and see if there’s enough there to cause concern.
  2. Evaluating potential links: This is a big one where we think Spam Score can be helpful. It’s not going to catch every potentially bad link, and you should certainly still use your brain for evaluation too, but as you’re scanning a list of link opportunities or surfing to various sites, having the ability to see if they fire a lot of flags is a great warning sign.
  3. Link cleanup: Link cleanup projects can be messy, involved, precarious, and massively tedious. Spam Score might not catch everything, but sorting links by it can be hugely helpful in identifying potentially nasty stuff, and filtering out the more probably clean links.
  4. Disavow Files: Again, because Spam Score won’t perfectly catch everything, you will likely need to do some additional work here (especially if the site you’re working on has done some link buying on more generally trustworthy domains), but it can save you a heap of time evaluating and listing the worst and most obvious junk.

Over time, we’re also excited about using Spam Score to help improve the PA and DA calculations (it’s not currently in there), as well as adding it to other tools and data sources. We’d love your feedback and insight about where you’d most want to see Spam Score get involved.

Details about Spam Score’s calculation

This section comes courtesy of Moz’s head of data science, Dr. Matt Peters, who created the metric and deserves (at least in my humble opinion) a big round of applause. – Rand

Definition of “spam”

Before diving into the details of the individual spam flags and their calculation, it’s important to first describe our data gathering process and “spam” definition.

For our purposes, we followed Google’s definition of spam and gathered labels for a large number of sites as follows.

  • First, we randomly selected a large number of subdomains from the Mozscape index stratified by mozRank.
  • Then we crawled the subdomains and threw out any that didn’t return a “200 OK” (redirects, errors, etc).
  • Finally, we collected the top 10 de-personalized, geo-agnostic Google-US search results using the full subdomain name as the keyword and checked whether any of those results matched the original keyword. If they did not, we called the subdomain “spam,” otherwise we called it “ham.”

We performed the most recent data collection in November 2014 (after the Penguin 3.0 update) for about 500,000 subdomains.

Relationship between number of flags and spam

The overall Spam Score is currently an aggregate of 17 different “flags.” You can think of each flag a potential “warning sign” that signals that a site may be spammy. The overall likelihood of spam increases as a site accumulates more and more flags, so that the total number of flags is a strong predictor of spam. Accordingly, the flags are designed to be used together—no single flag, or even a few flags, is cause for concern (and indeed most sites will trigger at least a few flags).

The following table shows the relationship between the number of flags and percent of sites with those flags that we found Google had penalized or banned:

ABOVE: The overall probability of spam vs. the number of spam flags. Data collected in Nov. 2014 for approximately 500K subdomains. The table also highlights the three overall danger levels: low/green (< 10%) moderate/yellow (10-50%) and high/red (>50%)

The overall spam percent averaged across a large number of sites increases in lock step with the number of flags; however there are outliers in every category. For example, there are a small number of sites with very few flags that are tagged as spam by Google and conversely a small number of sites with many flags that are not spam.

Spam flag details

The individual spam flags capture a wide range of spam signals link profiles, anchor text, on page signals and properties of the domain name. At a high level the process to determine the spam flags for each subdomain is:

  • Collect link metrics from Mozscape (mozRank, mozTrust, number of linking domains, etc).
  • Collect anchor text metrics from Mozscape (top anchor text phrases sorted by number of links)
  • Collect the top five pages by Page Authority on the subdomain from Mozscape
  • Crawl the top five pages plus the home page and process to extract on page signals
  • Provide the output for Mozscape to include in the next index release cycle

Since the spam flags are incorporated into in the Mozscape index, fresh data is released with each new index. Right now, we crawl and process the spam flags for each subdomains every two – three months although this may change in the future.

Link flags

The following table lists the link and anchor text related flags with the the odds ratio for each flag. For each flag, we can compute two percents: the percent of sites with that flag that are penalized by Google and the percent of sites with that flag that were not penalized. The odds ratio is the ratio of these percents and gives the increase in likelihood that a site is spam if it has the flag. For example, the first row says that a site with this flag is 12.4 times more likely to be spam than one without the flag.

ABOVE: Description and odds ratio of link and anchor text related spam flags. In addition to a description, it lists the odds ratio for each flag which gives the overall increase in spam likelihood if the flag is present).

Working down the table, the flags are:

  • Low mozTrust to mozRank ratio: Sites with low mozTrust compared to mozRank are likely to be spam.
  • Large site with few links: Large sites with many pages tend to also have many links and large sites without a corresponding large number of links are likely to be spam.
  • Site link diversity is low: If a large percentage of links to a site are from a few domains it is likely to be spam.
  • Ratio of followed to nofollowed subdomains/domains (two separate flags): Sites with a large number of followed links relative to nofollowed are likely to be spam.
  • Small proportion of branded links (anchor text): Organically occurring links tend to contain a disproportionate amount of banded keywords. If a site does not have a lot of branded anchor text, it’s a signal the links are not organic.

On-page flags

Similar to the link flags, the following table lists the on page and domain name related flags:

ABOVE: Description and odds ratio of on page and domain name related spam flags. In addition to a description, it lists the odds ratio for each flag which gives the overall increase in spam likelihood if the flag is present).

  • Thin content: If a site has a relatively small ratio of content to navigation chrome it’s likely to be spam.
  • Site mark-up is abnormally small: Non-spam sites tend to invest in rich user experiences with CSS, Javascript and extensive mark-up. Accordingly, a large ratio of text to mark-up is a spam signal.
  • Large number of external links: A site with a large number of external links may look spammy.
  • Low number of internal links: Real sites tend to link heavily to themselves via internal navigation and a relative lack of internal links is a spam signal.
  • Anchor text-heavy page: Sites with a lot of anchor text are more likely to be spam then those with more content and less links.
  • External links in navigation: Spam sites may hide external links in the sidebar or footer.
  • No contact info: Real sites prominently display their social and other contact information.
  • Low number of pages found: A site with only one or a few pages is more likely to be spam than one with many pages.
  • TLD correlated with spam domains: Certain TLDs are more spammy than others (e.g. pw).
  • Domain name length: A long subdomain name like “bycheapviagra.freeshipping.onlinepharmacy.com” may indicate keyword stuffing.
  • Domain name contains numerals: domain names with numerals may be automatically generated and therefore spam.

If you’d like some more details on the technical aspects of the spam score, check out the 
video of Matt’s 2012 MozCon talk about Algorithmic Spam Detection or the slides (many of the details have evolved, but the overall ideas are the same):

We’d love your feedback

As with all metrics, Spam Score won’t be perfect. We’d love to hear your feedback and ideas for improving the score as well as what you’d like to see from it’s in-product application in the future. Feel free to leave comments on this post, or to email Matt (matt at moz dot com) and me (rand at moz dot com) privately with any suggestions.

Good luck cleaning up and preventing link spam!



Not a Pro Subscriber? No problem!



Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!

Reblogged 4 years ago from tracking.feedpress.it

Headline Writing and Title Tag SEO in a Clickbait World – Whiteboard Friday

Posted by randfish

When writing headlines and title tags, we’re often conflicted in what we’re trying to say and (more to the point) how we’re trying to say it. Do we want it to help the page rank in SERPs? Do we want people to be intrigued enough to click through? Or are we trying to best satisfy the searcher’s intent? We’d like all three, but a headline that achieves them all is incredibly difficult to write.

In today’s Whiteboard Friday, Rand illustrates just how small the intersection of those goals is, and offers a process you can use to find the best way forward.

For reference, here’s a still of this week’s whiteboard!

Video transcription

Howdy, Moz fans, and welcome to another edition of Whiteboard Friday. This week we’re going to chat about writing titles and headlines, both for SEO and in this new click-bait, Facebook social world. This is kind of a challenge, because I think many folks are seeing and observing that a lot of the ranking signals that can help a page perform well are often preceded or well correlated with social activity, which would kind of bias us towards saying, “Hey, how can I do these click-baity, link-baity sorts of social viral pieces,” versus we’re also a challenge with, “Gosh, those things aren’t as traditionally well performing in search results from a perhaps click-through rate and certainly from a search conversion perspective. So how do we balance out these two and make them work together for us based on our marketing goals?” So I want to try and help with that.

Let’s look at a search query for Viking battles, in Google. These are the top two results. One is from Wikipedia. It’s a category page — Battles Involving the Vikings. That’s pretty darn straightforward. But then our second result — actually this might be a third result, I think there’s a indented second Wikipedia result — is the seven most bad ass last stands in the history of battles. It turns out that there happen to be a number of Viking related battles in there, and you can see that in the meta description that Google pulls. This one’s from Crack.com.

These are pretty representative of the two different kinds of results or of content pieces that I’m talking about. One is very, very viral, very social focused, clearly designed to sort of do well in the Facebook world. One is much more classic search focused, clearly designed to help answer the user query — here’s a list of Viking battles and their prominence and importance in history, and structure, and all those kinds of things.

Okay. Here’s another query — Viking jewelry. Going to stick with my Viking theme, because why not? We can see a website from Viking jewelry. This one’s on JellDragon.com. It’s an eCommerce site. They’re selling sterling silver and bronze Viking jewelry. They’ve actually done very classic SEO focus. Not only do they have Viking jewelry mentioned twice, in the second instance of Viking jewelry, I think they’ve intentionally — I hope it was intentionally — misspelled the word “jewelry” to hopefully catch misspellings. That’s some old-school SEO. I would actually not recommend this for any purpose.

But I thought it was interesting to highlight versus in this search result it takes until page three until I could really find a viral, social, targeted, more link-baity, click-baity type of article, this one from io9 — 1,000 Year-old Viking Jewelry Found On Danish Farm. You know what the interesting part is? In this case, both of these are on powerful domains. They both have quite a few links to them from many external sources. They’re pretty well SEO’d pages.

In this case, the first two pages of results are all kind of small jewelry website stores and a few results from like Etsy and Amazon, more powerful authoritative domains. But it really takes a long time before you get these, what I’d consider, very powerful, very strong attempts at ranking for Viking jewelry from more of your click-bait, social, headline, viral sites. io9 certainly, I would kind of expect them to perform higher, except that this doesn’t serve the searcher intent.

I think Google knows that when people look for Viking jewelry, they’re not looking for the history of Viking jewelry or where recent archeological finds of Viking jewelry happened. They’re looking specifically for eCommerce sites. They’re trying to transact and buy, or at least view and see what Viking jewelry looks like. So they’re looking for photo heavy, visual heavy, potentially places where they might buy stuff. Maybe it’s some people looking for artifacts as well, to view the images of those, but less of the click-bait focus kind of stuff.

This one I think it’s very likely that this does indeed perform well for this search query, and lots of people do click on that as a positive result for what they’re looking for from Viking battles, because they’d like to see, “Okay, what were the coolest, most amazing Viking battles that happened in history?”

You can kind of see what’s happened here with two things. One is with Hummingbird and Google’s focus on topic modeling, and the other with searcher intent and how Google has gotten so incredibly good at pattern matching to serve user intent. This is really important from an SEO perspective to understand as well, and I like how these two examples highlight it. One is saying, “Hey, just because you have the most links, the strongest domain, the best keyword targeting, doesn’t necessarily mean you’ll rank if you’re not serving searcher intent.”

Now, when we think about doing this for ourselves, that click-bait versus searched optimized experience for our content, what is it about? It’s really about choosing. It’s about choosing searcher intent, our website and marketing goals, or click-bait types of goals. I’ve visualized the intersection here with a Venn diagram. So these in pink here, the click-bait pieces that are going to resonate in social media — Facebook, Twitter, etc. Blue is the intent of searchers, and purple is your marketing goals, what you want to achieve when visitors get to your site, the reason you’re trying to attract this traffic in the first place.

This intersection, as you will notice, is super, uber tiny. It is miniscule. It is molecule sized, and it’s a very, very hard intersection to hit. In fact, for the vast majority of content pieces, I’m going to say that it’s going to be close to, not always, but close to impossible to get that perfect mix of click-bait, intent of searchers, and your marketing goals. The times when it works best is really when you’re trying to educate your audience or provide them with informational value, and that’s also something that’s going to resonate in the social web and something searchers are going to be looking for. It works pretty well in B2B types of things, particularly in spaces where there’s lots of influencers and amplifiers who also care about educating their followers. It doesn’t work so well when you’re trying to target Viking battles or Viking jewelry. What can I say, the historians of the Viking world simply aren’t that huge on Twitter yet. I hope they will be one day.

This is kind of the process that I would use to think about the structure of these and how to choose between them. First off, I think you need to ask, “Should I create a single piece of content to target all of these, or should I instead be thinking about individual pieces that hit one or two at a time?”

So it could be the case that maybe you’ve got an intersection of intent for searchers and your marketing goals. This happens quite a bit, and oftentimes for these folks, for the Jell Dragon Viking Jewelry, the intent of searchers and what they’re trying to accomplish on their site, perfectly in harmony, but definitely not with click-bait pieces that are going to resonate on the web. More challenging for io9 with this kind of a thing, because searchers just aren’t looking for that around Viking jewelry. They might instead be thinking about, “Hey, we’re trying to target the specific news item. We want anyone who looks for Viking jewelry in Danish farm, or Viking jewelry found, or those kind of things to be finding our site.”

Then, I would ask, “How can I best serve my own marketing goals, the marketing goals of my website through the pages that are targeted at search or social?” Sometimes that’s going to be very direct, like it is over here with JellDagon.com trying to convert folks and folks looking for Viking jewelry to buy.

Sometimes it’s going to be indirect,. A Moz Whiteboard Friday, for example, is a very indirect example. We’re trying to serve the intent of searchers and in the long term eventually, maybe sometime in the future some folks who watch this video might be interested in Moz’ tools or going to MozCon or signing up for an email list, or whatever it is. But our marketing goals are secondary and they’re further in the future. You could also think about that happening at the very end of a funnel, coming in if someone searches for say Moz versus Searchmetrics and maybe Searchmetrics has a great page comparing what’s better about their service versus Moz’ service and those types of things, and getting right in at the end of the funnel. So that should be a consideration as well. Same thing with social.

Then lastly, where are you going to focus that keyword targeting and the content foci efforts? What kind of content are you going to build? How are you going to keyword target them best to achieve this, and how much you interlink between those pages?

I’ll give you a quick example over here, but this can be expanded upon. So for my conversion page, I may try and target the same keywords or a slightly more commercial variation on the search terms I’m targeting with my more informational style content versus entertainment social style content. Then, conversion page might be separate, depending on how I’m structuring things and what the intent of searchers is. My click-bait piece may be not very keyword focused at all. I might write that headline and say, “I don’t care about the keywords at all. I don’t need to rank here. I’m trying to go viral on social media. I’m trying to achieve my click-bait goals. My goal is to drive traffic, get some links, get some topical authority around this subject matter, and later hopefully rank with this page or maybe even this page in search engines.” That’s a viable goal as well.

When you do that, what you want to do then is have a link structure that optimizes around this. So your click-bait piece, a lot of times with click-bait pieces they’re going to perform worse if you go over and try and link directly to your conversion page, because it looks like you’re trying to sell people something. That’s not what plays on Facebook, on Twitter, on social media in general. What plays is, “Hey, this is just entertainment, and I can just visit this piece and it’s fun and funny and interesting.”

What plays well in search, however, is something that let’s someone accomplish their tasks. So it’s fine to have information and then a call to action, and that call to action can point to the conversion page. The click-bait pieces content can do a great job of helping to send link equity, ranking signals, and maybe some visitor traffic who’s interested in truly learning more over to the informational page that you want ranking for search. This is kind of a beautiful way to think about the interaction between the three of these when you have these different levels of foci, when you have these different searcher versus click-bait intents, and how to bring them all together.

All right everyone, hope to see you again next week for another edition of Whiteboard Friday. Take care.

Video transcription by Speechpad.com

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!

Reblogged 4 years ago from tracking.feedpress.it

In-App Social &amp; Contact Data – New in Open Site Explorer

Posted by randfish

Today I’m excited to announce the launch of a new feature inside 
Open Site Explorer—In-App Social & Contact Data. 

With this launch, you’ll be able to see the
social or email accounts we’ve discovered associated with a given website, and have one-click access to those pages.


Initially, the feature offers:

  1. Availability today on the inbound links tab and in Link Intersect on the “pages -> subdomains” view. In the future, if y’all find it useful, we hope to expand its presence to other areas of the tool as well.
  2. Email accounts will only be shown if they match the domain name (e.g. rand@moz.com would be shown next to moz.com, randfishkin@yahoo.com would not) and if they appear in standard format on the page (we don’t try to grab emails in JavaScript or that use alternate formats to obsfucate).
  3. We show Facebook, Twitter, Google+, and email addresses we’ve found on multiple pages of the site (we take a small random set and analyze whether these social/contact data pieces are uniform). If we find multiple accounts, you’ll see this:

Use cases

There are three major use cases for this feature (at least for me; you might have more!):

1) Link/Outreach prospecting

It can be a pain to visit sites, find social accounts/emails, and copy them into a spreadsheet or send messages (and recall which ones you have/haven’t done yet). By including social/contact data in the same interface where you’re doing link analysis, we hope to save you time and clicks.

2) Link/site trust and audience reach analysis

We’re actually using this data on the back end at Moz for our upcoming Spam Score feature (coming very soon), but you can use it manually to help with a quick mental filter for trustworthy/authoritative/non-spammy sites, and to get a sense for the size and reach of a site’s social audience.

3) At-a-glance analysis of social networks among a group

If you’re in a given space (e.g. travel blogs), it’s a process to determine which social networks are/aren’t being used by industry participants and influencers. Social/contact data in OSE can help with that by showing which social networks various sites are using and linking to from their pages:

We need your feedback

This first implementation is relatively light in the app—we haven’t yet placed this data anywhere/everywhere it might be useful. Before we do, we want to hear what you think: Is this useful and valuable to your work? Does it help save you time? Would you want to see the feature expanded and if so, in what sections would it provide the greatest value to you? Please let us know in the comments, and by getting back in touch with us after you’ve had a chance to try it out for yourself.

Thanks for giving social/contact data a spin, and look for more upgrades to Open Site Explorer in the very near future!

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!

Reblogged 4 years ago from tracking.feedpress.it

What Deep Learning and Machine Learning Mean For the Future of SEO – Whiteboard Friday

Posted by randfish

Imagine a world where even the high-up Google engineers don’t know what’s in the ranking algorithm. We may be moving in that direction. In today’s Whiteboard Friday, Rand explores and explains the concepts of deep learning and machine learning, drawing us a picture of how they could impact our work as SEOs.

For reference, here’s a still of this week’s whiteboard!

Video transcription

Howdy, Moz fans, and welcome to another edition of Whiteboard Friday. This week we are going to take a peek into Google’s future and look at what it could mean as Google advances their machine learning and deep learning capabilities. I know these sound like big, fancy, important words. They’re not actually that tough of topics to understand. In fact, they’re simplistic enough that even a lot of technology firms like Moz do some level of machine learning. We don’t do anything with deep learning and a lot of neural networks. We might be going that direction.

But I found an article that was published in January, absolutely fascinating and I think really worth reading, and I wanted to extract some of the contents here for Whiteboard Friday because I do think this is tactically and strategically important to understand for SEOs and really important for us to understand so that we can explain to our bosses, our teams, our clients how SEO works and will work in the future.

The article is called “Google Search Will Be Your Next Brain.” It’s by Steve Levy. It’s over on Medium. I do encourage you to read it. It’s a relatively lengthy read, but just a fascinating one if you’re interested in search. It starts with a profile of Geoff Hinton, who was a professor in Canada and worked on neural networks for a long time and then came over to Google and is now a distinguished engineer there. As the article says, a quote from the article: “He is versed in the black art of organizing several layers of artificial neurons so that the entire system, the system of neurons, could be trained or even train itself to divine coherence from random inputs.”

This sounds complex, but basically what we’re saying is we’re trying to get machines to come up with outcomes on their own rather than us having to tell them all the inputs to consider and how to process those incomes and the outcome to spit out. So this is essentially machine learning. Google has used this, for example, to figure out when you give it a bunch of photos and it can say, “Oh, this is a landscape photo. Oh, this is an outdoor photo. Oh, this is a photo of a person.” Have you ever had that creepy experience where you upload a photo to Facebook or to Google+ and they say, “Is this your friend so and so?” And you’re like, “God, that’s a terrible shot of my friend. You can barely see most of his face, and he’s wearing glasses which he usually never wears. How in the world could Google+ or Facebook figure out that this is this person?”

That’s what they use, these neural networks, these deep machine learning processes for. So I’ll give you a simple example. Here at MOZ, we do machine learning very simplistically for page authority and domain authority. We take all the inputs — numbers of links, number of linking root domains, every single metric that you could get from MOZ on the page level, on the sub-domain level, on the root-domain level, all these metrics — and then we combine them together and we say, “Hey machine, we want you to build us the algorithm that best correlates with how Google ranks pages, and here’s a bunch of pages that Google has ranked.” I think we use a base set of 10,000, and we do it about quarterly or every 6 months, feed that back into the system and the system pumps out the little algorithm that says, “Here you go. This will give you the best correlating metric with how Google ranks pages.” That’s how you get page authority domain authority.

Cool, really useful, helpful for us to say like, “Okay, this page is probably considered a little more important than this page by Google, and this one a lot more important.” Very cool. But it’s not a particularly advanced system. The more advanced system is to have these kinds of neural nets in layers. So you have a set of networks, and these neural networks, by the way, they’re designed to replicate nodes in the human brain, which is in my opinion a little creepy, but don’t worry. The article does talk about how there’s a board of scientists who make sure Terminator 2 doesn’t happen, or Terminator 1 for that matter. Apparently, no one’s stopping Terminator 4 from happening? That’s the new one that’s coming out.

So one layer of the neural net will identify features. Another layer of the neural net might classify the types of features that are coming in. Imagine this for search results. Search results are coming in, and Google’s looking at the features of all the websites and web pages, your websites and pages, to try and consider like, “What are the elements I could pull out from there?”

Well, there’s the link data about it, and there are things that happen on the page. There are user interactions and all sorts of stuff. Then we’re going to classify types of pages, types of searches, and then we’re going to extract the features or metrics that predict the desired result, that a user gets a search result they really like. We have an algorithm that can consistently produce those, and then neural networks are hopefully designed — that’s what Geoff Hinton has been working on — to train themselves to get better. So it’s not like with PA and DA, our data scientist Matt Peters and his team looking at it and going, “I bet we could make this better by doing this.”

This is standing back and the guys at Google just going, “All right machine, you learn.” They figure it out. It’s kind of creepy, right?

In the original system, you needed those people, these individuals here to feed the inputs, to say like, “This is what you can consider, system, and the features that we want you to extract from it.”

Then unsupervised learning, which is kind of this next step, the system figures it out. So this takes us to some interesting places. Imagine the Google algorithm, circa 2005. You had basically a bunch of things in here. Maybe you’d have anchor text, PageRank and you’d have some measure of authority on a domain level. Maybe there are people who are tossing new stuff in there like, “Hey algorithm, let’s consider the location of the searcher. Hey algorithm, let’s consider some user and usage data.” They’re tossing new things into the bucket that the algorithm might consider, and then they’re measuring it, seeing if it improves.

But you get to the algorithm today, and gosh there are going to be a lot of things in there that are driven by machine learning, if not deep learning yet. So there are derivatives of all of these metrics. There are conglomerations of them. There are extracted pieces like, “Hey, we only ant to look and measure anchor text on these types of results when we also see that the anchor text matches up to the search queries that have previously been performed by people who also search for this.” What does that even mean? But that’s what the algorithm is designed to do. The machine learning system figures out things that humans would never extract, metrics that we would never even create from the inputs that they can see.

Then, over time, the idea is that in the future even the inputs aren’t given by human beings. The machine is getting to figure this stuff out itself. That’s weird. That means that if you were to ask a Google engineer in a world where deep learning controls the ranking algorithm, if you were to ask the people who designed the ranking system, “Hey, does it matter if I get more links,” they might be like, “Well, maybe.” But they don’t know, because they don’t know what’s in this algorithm. Only the machine knows, and the machine can’t even really explain it. You could go take a snapshot and look at it, but (a) it’s constantly evolving, and (b) a lot of these metrics are going to be weird conglomerations and derivatives of a bunch of metrics mashed together and torn apart and considered only when certain criteria are fulfilled. Yikes.

So what does that mean for SEOs. Like what do we have to care about from all of these systems and this evolution and this move towards deep learning, which by the way that’s what Jeff Dean, who is, I think, a senior fellow over at Google, he’s the dude that everyone mocks for being the world’s smartest computer scientist over there, and Jeff Dean has basically said, “Hey, we want to put this into search. It’s not there yet, but we want to take these models, these things that Hinton has built, and we want to put them into search.” That for SEOs in the future is going to mean much less distinct universal ranking inputs, ranking factors. We won’t really have ranking factors in the way that we know them today. It won’t be like, “Well, they have more anchor text and so they rank higher.” That might be something we’d still look at and we’d say, “Hey, they have this anchor text. Maybe that’s correlated with what the machine is finding, the system is finding to be useful, and that’s still something I want to care about to a certain extent.”

But we’re going to have to consider those things a lot more seriously. We’re going to have to take another look at them and decide and determine whether the things that we thought were ranking factors still are when the neural network system takes over. It also is going to mean something that I think many, many SEOs have been predicting for a long time and have been working towards, which is more success for websites that satisfy searchers. If the output is successful searches, and that’ s what the system is looking for, and that’s what it’s trying to correlate all its metrics to, if you produce something that means more successful searches for Google searchers when they get to your site, and you ranking in the top means Google searchers are happier, well you know what? The algorithm will catch up to you. That’s kind of a nice thing. It does mean a lot less info from Google about how they rank results.

So today you might hear from someone at Google, “Well, page speed is a very small ranking factor.” In the future they might be, “Well, page speed is like all ranking factors, totally unknown to us.” Because the machine might say, “Well yeah, page speed as a distinct metric, one that a Google engineer could actually look at, looks very small.” But derivatives of things that are connected to page speed may be huge inputs. Maybe page speed is something, that across all of these, is very well connected with happier searchers and successful search results. Weird things that we never thought of before might be connected with them as the machine learning system tries to build all those correlations, and that means potentially many more inputs into the ranking algorithm, things that we would never consider today, things we might consider wholly illogical, like, “What servers do you run on?” Well, that seems ridiculous. Why would Google ever grade you on that?

If human beings are putting factors into the algorithm, they never would. But the neural network doesn’t care. It doesn’t care. It’s a honey badger. It doesn’t care what inputs it collects. It only cares about successful searches, and so if it turns out that Ubuntu is poorly correlated with successful search results, too bad.

This world is not here yet today, but certainly there are elements of it. Google has talked about how Panda and Penguin are based off of machine learning systems like this. I think, given what Geoff Hinton and Jeff Dean are working on at Google, it sounds like this will be making its way more seriously into search and therefore it’s something that we’re really going to have to consider as search marketers.

All right everyone, I hope you’ll join me again next week for another edition of Whiteboard Friday. Take care.

Video transcription by Speechpad.com

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!

Reblogged 4 years ago from tracking.feedpress.it