In common: dotdigital’s new diversity series

If you were lucky enough to join us at the Summit earlier this year, you may have decided to pop in to one of our personal development sessions. Because to us, empowering marketers isn’t only about supplying you with the tools to work smarter. We wanted to ensure all visitors, who took time out of their busy work schedule, left our event fully equipped to be the best marketers they could be.

Your feedback
proved they were a massive success. So, we wanted to find a way to keep giving
back to the community. To share learning, support one another, and help us
develop personally, as well as professionally.

Whether it’s
breaking down stereotypes, improving physical and mental wellbeing, or breaking
the glass ceiling, we’ll be tackling the hot topics facing everyone in the
workforce today.

Building a stronger community

dotdigital is frequently
talking about how the modern world is changing the behaviors of consumers – but
it’s also changed the workplace.

We’re under pressure
to deliver results faster which is leading to a rise in stress and burnout. It’s
now acceptable to talk openly about topics such as sexuality and gender. Society
is a glorious mish-mash of cultures and personalities and organizations must
work harder to reflect this.

This change must come from within. From us. Not because businesses aren’t trying – they are – but because, to accurately reflect the workforce, the workforce needs to be the ones to shape it. And the best way to do it is to work together.

Women at work

The goal of our first In common event was exactly that. Over the past 100 years, women have taken great strides towards greater equality, and not just at work. In the UK, women finally won their hard-fought right to vote in 1918. By 2018, there’d been two female Prime Ministers and women made up over 30% of board members in FSTE 100.

These positions would
have been unthinkable for those fighting for suffrage.

And women are only
getting stronger. By 2025, it’s predicted that women will possess over 60% of the
UK’s wealth. But that certainly doesn’t mean they’ve reached the peak. There
are still many steps that need to be taken to achieve greater gender equality.

For men and women to achieve this, we need to work together, support one another, and ensure that every advancement is equal.

Women diversity In common

Sparking discussion

In common events are an exciting and safe space where like-minded people can have conversations that spark change. We want every event to reflect the needs, interests, and desires of our audience – and that’s why your feedback is vital. If you have a subject or theme you’d like us to cover, please get in touch and let us know

If you attended our first event, women at work, please be sure to send us your feedback. And be sure to keep an eye out for upcoming In common events, including our mental health awareness event in October.

Speakers at women diversity event

Thank you to each and every one of the speakers who took the time out of their busy days to make the evening a massive success.

The post In common: dotdigital’s new diversity series appeared first on dotdigital blog.

Reblogged 4 months ago from blog.dotdigital.com

Big Data, Big Problems: 4 Major Link Indexes Compared

Posted by russangular

Given this blog’s readership, chances are good you will spend some time this week looking at backlinks in one of the growing number of link data tools. We know backlinks continue to be one of, if not the most important
parts of Google’s ranking algorithm. We tend to take these link data sets at face value, though, in part because they are all we have. But when your rankings are on the line, is there a better way to get at which data set is the best? How should we go
about assessing these different link indexes like
Moz,
Majestic, Ahrefs and SEMrush for quality? Historically, there have been 4 common approaches to this question of index quality…

  • Breadth: We might choose to look at the number of linking root domains any given service reports. We know
    that referring domains correlates strongly with search rankings, so it makes sense to judge a link index by how many unique domains it has
    discovered and indexed.
  • Depth: We also might choose to look at how deep the web has been crawled, looking more at the total number of URLs
    in the index, rather than the diversity of referring domains.
  • Link Overlap: A more sophisticated approach might count the number of links an index has in common with Google Webmaster
    Tools.
  • Freshness: Finally, we might choose to look at the freshness of the index. What percentage of links in the index are
    still live?

There are a number of really good studies (some newer than others) using these techniques that are worth checking out when you get a chance:

  • BuiltVisible analysis of Moz, Majestic, GWT, Ahrefs and Search Metrics
  • SEOBook comparison of Moz, Majestic, Ahrefs, and Ayima
  • MatthewWoodward
    study of Ahrefs, Majestic, Moz, Raven and SEO Spyglass
  • Marketing Signals analysis of Moz, Majestic, Ahrefs, and GWT
  • RankAbove comparison of Moz, Majestic, Ahrefs and Link Research Tools
  • StoneTemple study of Moz and Majestic

While these are all excellent at addressing the methodologies above, there is a particular limitation with all of them. They miss one of the
most important metrics we need to determine the value of a link index: proportional representation to Google’s link graph
. So here at Angular Marketing, we decided to take a closer look.

Proportional representation to Google Search Console data

So, why is it important to determine proportional representation? Many of the most important and valued metrics we use are built on proportional
models. PageRank, MozRank, CitationFlow and Ahrefs Rank are proportional in nature. The score of any one URL in the data set is relative to the
other URLs in the data set. If the data set is biased, the results are biased.

A Visualization

Link graphs are biased by their crawl prioritization. Because there is no full representation of the Internet, every link graph, even Google’s,
is a biased sample of the web. Imagine for a second that the picture below is of the web. Each dot represents a page on the Internet,
and the dots surrounded by green represent a fictitious index by Google of certain sections of the web.

Of course, Google isn’t the only organization that crawls the web. Other organizations like Moz,
Majestic, Ahrefs, and SEMrush
have their own crawl prioritizations which result in different link indexes.

In the example above, you can see different link providers trying to index the web like Google. Link data provider 1 (purple) does a good job
of building a model that is similar to Google. It isn’t very big, but it is proportional. Link data provider 2 (blue) has a much larger index,
and likely has more links in common with Google that link data provider 1, but it is highly disproportional. So, how would we go about measuring
this proportionality? And which data set is the most proportional to Google?

Methodology

The first step is to determine a measurement of relativity for analysis. Google doesn’t give us very much information about their link graph.
All we have is what is in Google Search Console. The best source we can use is referring domain counts. In particular, we want to look at
what we call
referring domain link pairs. A referring domain link pair would be something like ask.com->mlb.com: 9,444 which means
that ask.com links to mlb.com 9,444 times.

Steps

  1. Determine the root linking domain pairs and values to 100+ sites in Google Search Console
  2. Determine the same for Ahrefs, Moz, Majestic Fresh, Majestic Historic, SEMrush
  3. Compare the referring domain link pairs of each data set to Google, assuming a
    Poisson Distribution
  4. Run simulations of each data set’s performance against each other (ie: Moz vs Maj, Ahrefs vs SEMrush, Moz vs SEMrush, et al.)
  5. Analyze the results

Results

When placed head-to-head, there seem to be some clear winners at first glance. In head-to-head, Moz edges out Ahrefs, but across the board, Moz and Ahrefs fare quite evenly. Moz, Ahrefs and SEMrush seem to be far better than Majestic Fresh and Majestic Historic. Is that really the case? And why?

It turns out there is an inversely proportional relationship between index size and proportional relevancy. This might seem counterintuitive,
shouldn’t the bigger indexes be closer to Google? Not Exactly.

What does this mean?

Each organization has to create a crawl prioritization strategy. When you discover millions of links, you have to prioritize which ones you
might crawl next. Google has a crawl prioritization, so does Moz, Majestic, Ahrefs and SEMrush. There are lots of different things you might
choose to prioritize…

  • You might prioritize link discovery. If you want to build a very large index, you could prioritize crawling pages on sites that
    have historically provided new links.
  • You might prioritize content uniqueness. If you want to build a search engine, you might prioritize finding pages that are unlike
    any you have seen before. You could choose to crawl domains that historically provide unique data and little duplicate content.
  • You might prioritize content freshness. If you want to keep your search engine recent, you might prioritize crawling pages that
    change frequently.
  • You might prioritize content value, crawling the most important URLs first based on the number of inbound links to that page.

Chances are, an organization’s crawl priority will blend some of these features, but it’s difficult to design one exactly like Google. Imagine
for a moment that instead of crawling the web, you want to climb a tree. You have to come up with a tree climbing strategy.

  • You decide to climb the longest branch you see at each intersection.
  • One friend of yours decides to climb the first new branch he reaches, regardless of how long it is.
  • Your other friend decides to climb the first new branch she reaches only if she sees another branch coming off of it.

Despite having different climb strategies, everyone chooses the same first branch, and everyone chooses the same second branch. There are only
so many different options early on.

But as the climbers go further and further along, their choices eventually produce differing results. This is exactly the same for web crawlers
like Google, Moz, Majestic, Ahrefs and SEMrush. The bigger the crawl, the more the crawl prioritization will cause disparities. This is not a
deficiency; this is just the nature of the beast. However, we aren’t completely lost. Once we know how index size is related to disparity, we
can make some inferences about how similar a crawl priority may be to Google.

Unfortunately, we have to be careful in our conclusions. We only have a few data points with which to work, so it is very difficult to be
certain regarding this part of the analysis. In particular, it seems strange that Majestic would get better relative to its index size as it grows,
unless Google holds on to old data (which might be an important discovery in and of itself). It is most likely that at this point we can’t make
this level of conclusion.

So what do we do?

Let’s say you have a list of domains or URLs for which you would like to know their relative values. Your process might look something like
this…

  • Check Open Site Explorer to see if all URLs are in their index. If so, you are looking metrics most likely to be proportional to Google’s link graph.
  • If any of the links do not occur in the index, move to Ahrefs and use their Ahrefs ranking if all you need is a single PageRank-like metric.
  • If any of the links are missing from Ahrefs’s index, or you need something related to trust, move on to Majestic Fresh.
  • Finally, use Majestic Historic for (by leaps and bounds) the largest coverage available.

It is important to point out that the likelihood that all the URLs you want to check are in a single index increases as the accuracy of the metric
decreases. Considering the size of Majestic’s data, you can’t ignore them because you are less likely to get null value answers from their data than
the others. If anything rings true, it is that once again it makes sense to get data
from as many sources as possible. You won’t
get the most proportional data without Moz, the broadest data without Majestic, or everything in-between without Ahrefs.

What about SEMrush? They are making progress, but they don’t publish any relative statistics that would be useful in this particular
case. Maybe we can hope to see more from them soon given their already promising index!

Recommendations for the link graphing industry

All we hear about these days is big data; we almost never hear about good data. I know that the teams at Moz,
Majestic, Ahrefs, SEMrush and others are interested in mimicking Google, but I would love to see some organization stand up against the
allure of
more data in favor of better data—data more like Google’s. It could begin with testing various crawl strategies to see if they produce
a result more similar to that of data shared in Google Search Console. Having the most Google-like data is certainly a crown worth winning.

Credits

Thanks to Diana Carter at Angular for assistance with data acquisition and Andrew Cron with statistical analysis. Thanks also to the representatives from Moz, Majestic, Ahrefs, and SEMrush for answering questions about their indices.

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!

Reblogged 4 years ago from tracking.feedpress.it

Why the Links You’ve Built Aren’t Helping Your Page Rank Higher – Whiteboard Friday

Posted by randfish

Link building can be incredibly effective, but sometimes a lot of effort can go into earning links with absolutely no improvement in rankings. Why? In today’s Whiteboard Friday, Rand shows us four things we should look at in these cases, help us hone our link building skills and make the process more effective.

For reference, here’s a still of this week’s whiteboard. Click on it to open a high resolution image in a new tab!

Video transcription

Howdy, Moz fans, and welcome to another edition of Whiteboard Friday. This week we’re chatting about why link building sometimes fails.

So I’ve got an example here. I’m going to do a search for artificial sweeteners. Let’s say I’m working for these guys, ScienceMag.org. Well, this is actually in position 10. I put it in position 3 here, but I see that I’m position 10. I think to myself, “Man, if I could get higher up on this page, that would be excellent. I’ve already produced the content. It’s on my domain. Like, Google seems to have indexed it fine. It’s performing well enough to perform on page one, granted at the bottom of page one, for this competitive query. Now I want to move my rankings up.”

So a lot of SEOs, naturally and historically, for a long time have thought, “I need to build more links to that page. If I can get more links pointing to this page, I can move up the rankings.” Granted, there are some other ways to do that too, and we’ve discussed those in previous Whiteboard Fridays. But links are one of the big ones that people use.

I think one of the challenges that we encounter is sometimes we invest that effort. We go through the process of that outreach campaign, talking to bloggers and other news sites and looking at where our link sources are coming from and trying to get some more of those. It just doesn’t seem to do anything. The link building appears to fail. It’s like, man, I’ve got all these nice links and no new results. I didn’t move up at all. I am basically staying where I am, or maybe I’m even falling down. Why is that? Why does link building sometimes work so well and so clearly and obviously, and sometimes it seems to do nothing at all?

What are some possible reasons link acquisition efforts may not be effective?

Oftentimes if you get a fresh set of eyes on it, an outside SEO perspective, they can do this audit, and they’ll walk through a lot of this stuff and help you realize, “Oh yeah, that’s probably why.” These are things that you might need to change strategically or tactically as you approach this problem. But you can do this yourself as well by looking at why a link building campaign, why a link building effort, for a particular page, might not be working.

1) Not the right links

First one, it’s not the right links. Not the right links, I mean a wide range of things, even broader than what I’ve listed here. But a lot of times that could mean low domain diversity. Yeah, you’re getting new links, but they’re coming from all the same places that you always get links from. Google, potentially, maybe views that as not particularly worthy of moving you up the rankings, especially around competitive queries.

It might be trustworthiness of source. So maybe they’re saying “Yeah, you got some links, but they’re not from particularly trustworthy places.” Tied into that maybe we don’t think or we’re sure that they’re not editorial. Maybe we think they’re paid, or we think they’re promotional in some way rather than being truly editorially given by this independent resource.

They might not come from a site or from a page that has the authority that’s necessary to move you up. Again, particularly for competitive queries, sometimes low-value links are just that. They’re not going to move the needle, especially not like they used to three, four, five or six years ago, where really just a large quantity of links, even from diverse domains, even if they were crappy links on crappy pages on relatively crappy or unknown websites would move the needle, not so much anymore. Google is seeing a lot more about these things.

Where else does the source link to? Is that source pointing to other stuff that is potentially looking manipulative to Google and so they discounted the outgoing links from that particular domain or those sites or those pages on those sites?

They might look at the relevance and say, “Hey, you know what? Yeah, you got linked to by some technology press articles. That doesn’t really have anything to do with artificial sweeteners, this topic, this realm, or this region.” So you’re not getting the same result. Now we’ve shown that off-topic links can oftentimes move the rankings, but in particular areas and in health, in fact, may be one of those Google might be more topically sensitive to where the links are coming from than other places.

Location on page. So I’ve got a page here and maybe all of my links are coming from a bunch of different domains, but it’s always in the right sidebar and it’s always in this little feed section. So Google’s saying, “Hey, that’s not really an editorial endorsement. That’s just them showing all the links that come through your particular blog feed or a subscription that they’ve got to your content or whatever it is promotionally pushing out. So we’re not going to count it that way.” Same thing a lot of times with footer links. Doesn’t work quite as well. If you’re being honest with yourself, you really want those in content links. Generally speaking, those tend to perform the best.

Or uniqueness. So they might look and they might say, “Yeah, you’ve got a ton of links from people who are republishing your same article and then just linking back to it. That doesn’t feel to us like an editorial endorsement, and so we’re just going to treat those copies as if those links didn’t exist at all.” But the links themselves may not actually be the problem. I think this can be a really important topic if you’re doing link acquisition auditing, because sometimes people get too focused on, “Oh, it must be something about the links that we’re getting.” That’s not always the case actually.

2) Not the right content

Sometimes it’s not the right content. So that could mean things like it’s temporally focused versus evergreen. So for different kinds of queries, Google interprets the intent of the searchers to be different. So it could be that when they see a search like “artificial sweeteners,” they say, “Yeah, it’s great that you wrote this piece about this recent research that came out. But you know what, we’re actually thinking that searchers are going to want in the top few results something that’s evergreen, that contains all the broad information that a searcher might need around this particular topic.”

That speaks to it might not answer the searchers questions. You might think, “Well, I’m answering a great question here.” The problem is, yeah you’re answering one. Searchers may have many questions that they’re asking around a topic, and Google is looking for something comprehensive, something that doesn’t mean a searcher clicks your result and then says, “Well, that was interesting, but I need more from a different result.” They’re looking for the one true result, the one true answer that tells them, “Hey, this person is very happy with these types of results.”

It could be poor user experience causing people to bounce back. That could be speed things, UI things, layout things, browser support things, multi-device support things. It might not use language formatting or text that people or engines can interpret as on the topic. Perhaps this is way over people’s heads, far too scientifically focused, most searchers can’t understand the language, or the other way around. It’s a highly scientific search query and a very advanced search query and your language is way dumbed down. Google isn’t interpreting that as on-topic. All the Hummingbird and topic modeling kind of things that they have say this isn’t for them.

Or it might not match expectations of searchers. This is distinct and different from searchers’ questions. So searchers’ questions is, “I want to know how artificial sweeteners might affect me.” Expectations might be, “I expect to learn this kind of information. I expect to find out these things.” For example, if you go down a rabbit hole of artificial sweeteners will make your skin shiny, they’re like, “Well, that doesn’t meet with my expectation. I don’t think that’s right.” Even if you have some data around that, that’s not what they were expecting to find. They might bounce back. Engines might not interpret you as on-topic, etc. So lots of content kinds of things.

3) Not the right domain

Then there are also domain issues. You might not have the right domain. Your domain might not be associated with the topic or content that Google and searchers are expecting. So they see Mayo Clinic, they see MedicineNet, and they go, “ScienceMag? Do they do health information? I don’t think they do. I’m not sure if that’s an appropriate one.” It might be perceived, even if you aren’t, as spammy or manipulative by Google, more probably than by searchers. Or searchers just won’t click your brand for that content. This is a very frustrating one, because we have seen a ton of times when search behavior is biased by the brand itself, by what’s in this green text here, the domain name or the brand name that Google might show there. That’s very frustrating, but it means that you need to build brand affinity between that topic, that keyword, and what’s in searchers’ heads.

4) Accessibility or technical issues

Then finally, there could be some accessibility or technical issues. Usually when that’s the case, you will notice pretty easily because the page will have an error. It won’t show the content properly. The cache will be an issue. That’s a rare one, but you might want to check for it as well.

But hopefully, using this kind of an audit system, you can figure out why a link building campaign, a link building effort isn’t working to move the needle on your rankings.

With that, we will see you again next week for another edition of Whiteboard Friday. Take care.

Video transcription by Speechpad.com

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!

Reblogged 4 years ago from tracking.feedpress.it

Moz’s 2014 Annual Report

Posted by SarahBird

Moz has a tradition of sharing its financials (check out 2012 and 2013 for funzies). It’s an important part of TAGFEE.

Why do we do it? Moz gets its strength from the community of marketers and entrepreneurs that support it. We celebrated 10 years of our community last October. In some ways, the purpose of this report is to give you an inside look into our company. It’s one of many lenses that tell the story of Moz.

Yep. I know. It’s April. I’m not proud. Better late than never, right?

I had a very long and extensive version of this post planned, something closer to last year’s extravaganza. I finally had to admit to myself that I was letting the perfect become the enemy of the good (or at least the done). There was no way I could capture an entire year’s worth of ups and downs—along with supporting data—in a single blog post.

Without further ado, here’s the meat-and-potatoes 2014 Year In Review (and here’s an infographic with more statistics for your viewing pleasure!):

Moz ended 2014 with $31.3 million in revenue. About $30 million was recurring revenue (mostly from subscriptions to Moz Pro and the API).

Here’s a breakdown of all our major revenue sources:

Compared to previous years, 2014 was a much slower growth year. We knew very early that it was going to be a tough year because we started Q1 with negative growth. We worked very hard and successfully shifted the momentum back to increasingly positive quarterly growth rates. I’m proud of what we’ve accomplished so far. We still have a long ways to go to meet our potential, but we’re on the path.

In subscription businesses, If you start the year with negative or even slow growth it is very hard to have meaningful annual growth. All things being equal, you’re better off having a bad quarter in Q4 than Q1. If you get a new customer in Q1, you usually earn revenue from that customer all year. If you get a new customer in Q4, it will barely make a dent in that year, although it should set you up nicely for the following year.

We exited 2014 on a good flight path, which bodes well for 2015. We slammed right into some nasty billing system challenges in Q1 2015, but still managed to grow revenue 6.5%. Mad props to the team for shifting momentum last year and for digging into the billing system challenges we’re experiencing now.

We were very successful in becoming more efficient and managing costs in 2014. Our Cost of Revenue (COR), the cost of producing what we sell, fell by 30% to $8.2 million. These savings drove our gross profit margin up from 63% in 2013 to 74%.

Our operating profit increased by 30%. Here’s a breakdown of our major expenses (both operating expenses and COR):

Total operating expenses (which don’t include COR) clocked in at about $29.9 million this year.

The efficiency gains positively impacted EBITDA (Earnings Before Interest, Taxes, Depreciation, and Amortization) by pushing it up 50% year over year. In 2013, EBITDA was -$4.5 million. We improved it to -$2.1 million in 2014. We’re a VC-backed startup, so this was a planned loss.

One of the most dramatic indicators of our improved efficiency in 2014 is the substantial decline in our consumption of cash.

In 2014, we spent $1.5 million in cash. This was a planned burn, and is actually very impressive for a startup. In fact, we are intentionally increasing our burn, so we don’t expect EBITDA and cash burn to look as good in 2015! Hopefully, though, you will see that revenue growth rate increase.

Let’s check in on some other Moz KPIs:

At the end of 2014, we reported a little over 27,000 Pro users. When billing system issues hit in Q1 2015, we discovered some weird under- and over-reporting, so the number of subscribers was adjusted down by about ~450 after we scrubbed a bunch of inactive accounts out of the database. We expect accounts to stabilize and be more reliable now that we’ve fixed those issues.

We launched Moz Local about a year ago. I’m amazed and thrilled that we were able to end the year managing 27,000 locations for a range of customers. We just recently took our baby steps into the UK, and we’ve got a bunch of great additional features planned. What an incredible launch year!

We published over 300 posts combined on the Moz Blog and YouMoz. Nearly 20,000 people left comments. Well done, team!

Our content and social efforts are paying off with a 26% year-over-year increase in organic search traffic.

We continue to see good growth across many of our off-site communities, too:

The team grew to 149 people last year. We’re at ~37% women, which is nowhere near where I want it to be. We have a long way to go before the team reflects the diversity of the communities around us.

Our paid, paid vacation perk is very popular with Mozzers, and why wouldn’t it be? Everyone gets $3,000/year to use toward their vacations. In 2014, we spent over $420,000 to help our Mozzers take a break and get connected with matters most.

PPV.png

Also, we’re hiring! You’ll have my undying gratitude if you send me your best software engineers. Help us, help you. 😉

Last, but certainly not least, Mozzers continue to be generous (the ‘G’ in TAGFEE) and donate to the charities of their choice. In 2014, Mozzers donated $48k, and Moz added another $72k to increase the impact of their gifts. Combining those two figures, we donated $120k to causes our team members are passionate about. That’s an average of $805 per employee!

Mozzers are optimists with initiative. I think that’s why they are so generous with their time and money to folks in need. They believe the world can be a better place if we act to change it.

That’s a wrap on 2014! A year with many ups and downs. Fortunately, Mozzers don’t quit when things get hard. They embrace TAGFEE and lean into the challenge.

Revenue is growing again. We’re still operating very efficiently, and TAGFEE is strong. We’re heads-down executing on some big projects that customers have been clamoring for. Thank you for sticking with us, and for inspiring us to make marketing better every day.

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!

Reblogged 4 years ago from tracking.feedpress.it

Understanding and Applying Moz’s Spam Score Metric – Whiteboard Friday

Posted by randfish

This week, Moz released a new feature that we call Spam Score, which helps you analyze your link profile and weed out the spam (check out the blog post for more info). There have been some fantastic conversations about how it works and how it should (and shouldn’t) be used, and we wanted to clarify a few things to help you all make the best use of the tool.

In today’s Whiteboard Friday, Rand offers more detail on how the score is calculated, just what those spam flags are, and how we hope you’ll benefit from using it.

For reference, here’s a still of this week’s whiteboard. 

Click on the image above to open a high resolution version in a new tab!

Video transcription

Howdy Moz fans, and welcome to another edition of Whiteboard Friday. This week, we’re going to chat a little bit about Moz’s Spam Score. Now I don’t typically like to do Whiteboard Fridays specifically about a Moz project, especially when it’s something that’s in our toolset. But I’m making an exception because there have been so many questions and so much discussion around Spam Score and because I hope the methodology, the way we calculate things, the look at correlation and causation, when it comes to web spam, can be useful for everyone in the Moz community and everyone in the SEO community in addition to being helpful for understanding this specific tool and metric.

The 17-flag scoring system

I want to start by describing the 17 flag system. As you might know, Spam Score is shown as a score from 0 to 17. You either fire a flag or you don’t. Those 17 flags you can see a list of them on the blog post, and we’ll show that in there. Essentially, those flags correlate to the percentage of sites that we found with that count of flags, not those specific flags, just any count of those flags that were penalized or banned by Google. I’ll show you a little bit more in the methodology.

Basically, what this means is for sites that had 0 spam flags, none of the 17 flags that we had fired, that actually meant that 99.5% of those sites were not penalized or banned, on average, in our analysis and 0.5% were. At 3 flags, 4.2% of those sites, that’s actually still a huge number. That’s probably in the millions of domains or subdomains that Google has potentially still banned. All the way down here with 11 flags, it’s 87.3% that we did find banned. That seems pretty risky or penalized. It seems pretty risky. But 12.7% of those is still a very big number, again probably in the hundreds of thousands of unique websites that are not banned but still have these flags.

If you’re looking at a specific subdomain and you’re saying, “Hey, gosh, this only has 3 flags or 4 flags on it, but it’s clearly been penalized by Google, Moz’s score must be wrong,” no, that’s pretty comfortable. That should fit right into those kinds of numbers. Same thing down here. If you see a site that is not penalized but has a number of flags, that’s potentially an indication that you’re in that percentage of sites that we found not to be penalized.

So this is an indication of percentile risk, not a “this is absolutely spam” or “this is absolutely not spam.” The only caveat is anything with, I think, more than 13 flags, we found 100% of those to have been penalized or banned. Maybe you’ll find an odd outlier or two. Probably you won’t.

Correlation ≠ causation

Correlation is not causation. This is something we repeat all the time here at Moz and in the SEO community. We do a lot of correlation studies around these things. I think people understand those very well in the fields of social media and in marketing in general. Certainly in psychology and electoral voting and election polling results, people understand those correlations. But for some reason in SEO we sometimes get hung up on this.

I want to be clear. Spam flags and the count of spam flags correlates with sites we saw Google penalize. That doesn’t mean that any of the flags or combinations of flags actually cause the penalty. It could be that the things that are flags are not actually connected to the reasons Google might penalize something at all. Those could be totally disconnected.

We are not trying to say with the 17 flags these are causes for concern or you need to fix these. We are merely saying this feature existed on this website when we crawled it, or it had this feature, maybe it still has this feature. Therefore, we saw this count of these features that correlates to this percentile number, so we’re giving you that number. That’s all that the score intends to say. That’s all it’s trying to show. It’s trying to be very transparent about that. It’s not trying to say you need to fix these.

A lot of flags and features that are measured are perfectly fine things to have on a website, like no social accounts or email links. That’s a totally reasonable thing to have, but it is a flag because we saw it correlate. A number in your domain name, I think it’s fine if you want to have a number in your domain name. There’s plenty of good domains that have a numerical character in them. That’s cool.

TLD extension that happens to be used by lots of spammers, like a .info or a .cc or a number of other ones, that’s also totally reasonable. Just because lots of spammers happen to use those TLD extensions doesn’t mean you are necessarily spam because you use one.

Or low link diversity. Maybe you’re a relatively new site. Maybe your niche is very small, so the number of folks who point to your site tends to be small, and lots of the sites that organically naturally link to you editorially happen to link to you from many of their pages, and there’s not a ton of them. That will lead to low link diversity, which is a flag, but it isn’t always necessarily a bad thing. It might still nudge you to try and get some more links because that will probably help you, but that doesn’t mean you are spammy. It just means you fired a flag that correlated with a spam percentile.

The methodology we use

The methodology that we use, for those who are curious — and I do think this is a methodology that might be interesting to potentially apply in other places — is we brainstormed a large list of potential flags, a huge number. We cut that down to the ones we could actually do, because there were some that were just unfeasible for our technology team, our engineering team to do.

Then, we got a huge list, many hundreds of thousands of sites that were penalized or banned. When we say banned or penalized, what we mean is they didn’t rank on page one for either their own domain name or their own brand name, the thing between the
www and the .com or .net or .info or whatever it was. If you didn’t rank for either your full domain name, www and the .com or Moz, that would mean we said, “Hey, you’re penalized or banned.”

Now you might say, “Hey, Rand, there are probably some sites that don’t rank on page one for their own brand name or their own domain name, but aren’t actually penalized or banned.” I agree. That’s a very small number. Statistically speaking, it probably is not going to be impactful on this data set. Therefore, we didn’t have to control for that. We ended up not controlling for that.

Then we found which of the features that we ideated, brainstormed, actually correlated with the penalties and bans, and we created the 17 flags that you see in the product today. There are lots things that I thought were going to correlate, for example spammy-looking anchor text or poison keywords on the page, like Viagra, Cialis, Texas Hold’em online, pornography. Those things, not all of them anyway turned out to correlate well, and so they didn’t make it into the 17 flags list. I hope over time we’ll add more flags. That’s how things worked out.

How to apply the Spam Score metric

When you’re applying Spam Score, I think there are a few important things to think about. Just like domain authority, or page authority, or a metric from Majestic, or a metric from Google, or any other kind of metric that you might come up with, you should add it to your toolbox and to your metrics where you find it useful. I think playing around with spam, experimenting with it is a great thing. If you don’t find it useful, just ignore it. It doesn’t actually hurt your website. It’s not like this information goes to Google or anything like that. They have way more sophisticated stuff to figure out things on their end.

Do not just disavow everything with seven or more flags, or eight or more flags, or nine or more flags. I think that we use the color coding to indicate 0% to 10% of these flag counts were penalized or banned, 10% to 50% were penalized or banned, or 50% or above were penalized or banned. That’s why you see the green, orange, red. But you should use the count and line that up with the percentile. We do show that inside the tool as well.

Don’t just take everything and disavow it all. That can get you into serious trouble. Remember what happened with Cyrus. Cyrus Shepard, Moz’s head of content and SEO, he disavowed all the backlinks to its site. It took more than a year for him to rank for anything again. Google almost treated it like he was banned, not completely, but they seriously took away all of his link power and didn’t let him back in, even though he changed the disavow file and all that.

Be very careful submitting disavow files. You can hurt yourself tremendously. The reason we offer it in disavow format is because many of the folks in our customer testing said that’s how they wanted it so they could copy and paste, so they could easily review, so they could get it in that format and put it into their already existing disavow file. But you should not do that. You’ll see a bunch of warnings if you try and generate a disavow file. You even have to edit your disavow file before you can submit it to Google, because we want to be that careful that you don’t go and submit.

You should expect the Spam Score accuracy. If you’re doing spam investigation, you’re probably looking at spammier sites. If you’re looking at a random hundred sites, you should expect that the flags would correlate with the percentages. If I look at a random hundred 4 flag Spam Score sites, 7.5% of those I would expect on average to be penalized or banned. If you are therefore seeing sites that don’t fit those, they probably fit into the percentiles that were not penalized, or up here were penalized, down here weren’t penalized, that kind of thing.

Hopefully, you find Spam Score useful and interesting and you add it to your toolbox. We would love to hear from you on iterations and ideas that you’ve got for what we can do in the future, where else you’d like to see it, and where you’re finding it useful/not useful. That would be great.

Hopefully, you’ve enjoyed this edition of Whiteboard Friday and will join us again next week. Thanks so much. Take care.

Video transcription by Speechpad.com

ADDITION FROM RAND: I also urge folks to check out Marie Haynes’ excellent Start-to-Finish Guide to Using Google’s Disavow Tool. We’re going to update the feature to link to that as well.

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!

Reblogged 4 years ago from tracking.feedpress.it

Spam Score: Moz’s New Metric to Measure Penalization Risk

Posted by randfish

Today, I’m very excited to announce that Moz’s Spam Score, an R&D project we’ve worked on for nearly a year, is finally going live. In this post, you can learn more about how we’re calculating spam score, what it means, and how you can potentially use it in your SEO work.

How does Spam Score work?

Over the last year, our data science team, led by 
Dr. Matt Peters, examined a great number of potential factors that predicted that a site might be penalized or banned by Google. We found strong correlations with 17 unique factors we call “spam flags,” and turned them into a score.

Almost every subdomain in 
Mozscape (our web index) now has a Spam Score attached to it, and this score is viewable inside Open Site Explorer (and soon, the MozBar and other tools). The score is simple; it just records the quantity of spam flags the subdomain triggers. Our correlations showed that no particular flag was more likely than others to mean a domain was penalized/banned in Google, but firing many flags had a very strong correlation (you can see the math below).

Spam Score currently operates only on the subdomain level—we don’t have it for pages or root domains. It’s been my experience and the experience of many other SEOs in the field that a great deal of link spam is tied to the subdomain-level. There are plenty of exceptions—manipulative links can and do live on plenty of high-quality sites—but as we’ve tested, we found that subdomain-level Spam Score was the best solution we could create at web scale. It does a solid job with the most obvious, nastiest spam, and a decent job highlighting risk in other areas, too.

How to access Spam Score

Right now, you can find Spam Score inside 
Open Site Explorer, both in the top metrics (just below domain/page authority) and in its own tab labeled “Spam Analysis.” Spam Score is only available for Pro subscribers right now, though in the future, we may make the score in the metrics section available to everyone (if you’re not a subscriber, you can check it out with a free trial). 

The current Spam Analysis page includes a list of subdomains or pages linking to your site. You can toggle the target to look at all links to a given subdomain on your site, given pages, or the entire root domain. You can further toggle source tier to look at the Spam Score for incoming linking pages or subdomains (but in the case of pages, we’re still showing the Spam Score for the subdomain on which that page is hosted).

You can click on any Spam Score row and see the details about which flags were triggered. We’ll bring you to a page like this:

Back on the original Spam Analysis page, at the very bottom of the rows, you’ll find an option to export a disavow file, which is compatible with Google Webmaster Tools. You can choose to filter the file to contain only those sites with a given spam flag count or higher:

Disavow exports usually take less than 3 hours to finish. We can send you an email when it’s ready, too.

WARNING: Please do not export this file and simply upload it to Google! You can really, really hurt your site’s ranking and there may be no way to recover. Instead, carefully sort through the links therein and make sure you really do want to disavow what’s in there. You can easily remove/edit the file to take out links you feel are not spam. When Moz’s Cyrus Shepard disavowed every link to his own site, it took more than a year for his rankings to return!

We’ve actually made the file not-wholly-ready for upload to Google in order to be sure folks aren’t too cavalier with this particular step. You’ll need to open it up and make some edits (specifically to lines at the top of the file) in order to ready it for Webmaster Tools

In the near future, we hope to have Spam Score in the Mozbar as well, which might look like this: 

Sweet, right? 🙂

Potential use cases for Spam Analysis

This list probably isn’t exhaustive, but these are a few of the ways we’ve been playing around with the data:

  1. Checking for spammy links to your own site: Almost every site has at least a few bad links pointing to it, but it’s been hard to know how much or how many potentially harmful links you might have until now. Run a quick spam analysis and see if there’s enough there to cause concern.
  2. Evaluating potential links: This is a big one where we think Spam Score can be helpful. It’s not going to catch every potentially bad link, and you should certainly still use your brain for evaluation too, but as you’re scanning a list of link opportunities or surfing to various sites, having the ability to see if they fire a lot of flags is a great warning sign.
  3. Link cleanup: Link cleanup projects can be messy, involved, precarious, and massively tedious. Spam Score might not catch everything, but sorting links by it can be hugely helpful in identifying potentially nasty stuff, and filtering out the more probably clean links.
  4. Disavow Files: Again, because Spam Score won’t perfectly catch everything, you will likely need to do some additional work here (especially if the site you’re working on has done some link buying on more generally trustworthy domains), but it can save you a heap of time evaluating and listing the worst and most obvious junk.

Over time, we’re also excited about using Spam Score to help improve the PA and DA calculations (it’s not currently in there), as well as adding it to other tools and data sources. We’d love your feedback and insight about where you’d most want to see Spam Score get involved.

Details about Spam Score’s calculation

This section comes courtesy of Moz’s head of data science, Dr. Matt Peters, who created the metric and deserves (at least in my humble opinion) a big round of applause. – Rand

Definition of “spam”

Before diving into the details of the individual spam flags and their calculation, it’s important to first describe our data gathering process and “spam” definition.

For our purposes, we followed Google’s definition of spam and gathered labels for a large number of sites as follows.

  • First, we randomly selected a large number of subdomains from the Mozscape index stratified by mozRank.
  • Then we crawled the subdomains and threw out any that didn’t return a “200 OK” (redirects, errors, etc).
  • Finally, we collected the top 10 de-personalized, geo-agnostic Google-US search results using the full subdomain name as the keyword and checked whether any of those results matched the original keyword. If they did not, we called the subdomain “spam,” otherwise we called it “ham.”

We performed the most recent data collection in November 2014 (after the Penguin 3.0 update) for about 500,000 subdomains.

Relationship between number of flags and spam

The overall Spam Score is currently an aggregate of 17 different “flags.” You can think of each flag a potential “warning sign” that signals that a site may be spammy. The overall likelihood of spam increases as a site accumulates more and more flags, so that the total number of flags is a strong predictor of spam. Accordingly, the flags are designed to be used together—no single flag, or even a few flags, is cause for concern (and indeed most sites will trigger at least a few flags).

The following table shows the relationship between the number of flags and percent of sites with those flags that we found Google had penalized or banned:

ABOVE: The overall probability of spam vs. the number of spam flags. Data collected in Nov. 2014 for approximately 500K subdomains. The table also highlights the three overall danger levels: low/green (< 10%) moderate/yellow (10-50%) and high/red (>50%)

The overall spam percent averaged across a large number of sites increases in lock step with the number of flags; however there are outliers in every category. For example, there are a small number of sites with very few flags that are tagged as spam by Google and conversely a small number of sites with many flags that are not spam.

Spam flag details

The individual spam flags capture a wide range of spam signals link profiles, anchor text, on page signals and properties of the domain name. At a high level the process to determine the spam flags for each subdomain is:

  • Collect link metrics from Mozscape (mozRank, mozTrust, number of linking domains, etc).
  • Collect anchor text metrics from Mozscape (top anchor text phrases sorted by number of links)
  • Collect the top five pages by Page Authority on the subdomain from Mozscape
  • Crawl the top five pages plus the home page and process to extract on page signals
  • Provide the output for Mozscape to include in the next index release cycle

Since the spam flags are incorporated into in the Mozscape index, fresh data is released with each new index. Right now, we crawl and process the spam flags for each subdomains every two – three months although this may change in the future.

Link flags

The following table lists the link and anchor text related flags with the the odds ratio for each flag. For each flag, we can compute two percents: the percent of sites with that flag that are penalized by Google and the percent of sites with that flag that were not penalized. The odds ratio is the ratio of these percents and gives the increase in likelihood that a site is spam if it has the flag. For example, the first row says that a site with this flag is 12.4 times more likely to be spam than one without the flag.

ABOVE: Description and odds ratio of link and anchor text related spam flags. In addition to a description, it lists the odds ratio for each flag which gives the overall increase in spam likelihood if the flag is present).

Working down the table, the flags are:

  • Low mozTrust to mozRank ratio: Sites with low mozTrust compared to mozRank are likely to be spam.
  • Large site with few links: Large sites with many pages tend to also have many links and large sites without a corresponding large number of links are likely to be spam.
  • Site link diversity is low: If a large percentage of links to a site are from a few domains it is likely to be spam.
  • Ratio of followed to nofollowed subdomains/domains (two separate flags): Sites with a large number of followed links relative to nofollowed are likely to be spam.
  • Small proportion of branded links (anchor text): Organically occurring links tend to contain a disproportionate amount of banded keywords. If a site does not have a lot of branded anchor text, it’s a signal the links are not organic.

On-page flags

Similar to the link flags, the following table lists the on page and domain name related flags:

ABOVE: Description and odds ratio of on page and domain name related spam flags. In addition to a description, it lists the odds ratio for each flag which gives the overall increase in spam likelihood if the flag is present).

  • Thin content: If a site has a relatively small ratio of content to navigation chrome it’s likely to be spam.
  • Site mark-up is abnormally small: Non-spam sites tend to invest in rich user experiences with CSS, Javascript and extensive mark-up. Accordingly, a large ratio of text to mark-up is a spam signal.
  • Large number of external links: A site with a large number of external links may look spammy.
  • Low number of internal links: Real sites tend to link heavily to themselves via internal navigation and a relative lack of internal links is a spam signal.
  • Anchor text-heavy page: Sites with a lot of anchor text are more likely to be spam then those with more content and less links.
  • External links in navigation: Spam sites may hide external links in the sidebar or footer.
  • No contact info: Real sites prominently display their social and other contact information.
  • Low number of pages found: A site with only one or a few pages is more likely to be spam than one with many pages.
  • TLD correlated with spam domains: Certain TLDs are more spammy than others (e.g. pw).
  • Domain name length: A long subdomain name like “bycheapviagra.freeshipping.onlinepharmacy.com” may indicate keyword stuffing.
  • Domain name contains numerals: domain names with numerals may be automatically generated and therefore spam.

If you’d like some more details on the technical aspects of the spam score, check out the 
video of Matt’s 2012 MozCon talk about Algorithmic Spam Detection or the slides (many of the details have evolved, but the overall ideas are the same):

We’d love your feedback

As with all metrics, Spam Score won’t be perfect. We’d love to hear your feedback and ideas for improving the score as well as what you’d like to see from it’s in-product application in the future. Feel free to leave comments on this post, or to email Matt (matt at moz dot com) and me (rand at moz dot com) privately with any suggestions.

Good luck cleaning up and preventing link spam!



Not a Pro Subscriber? No problem!



Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!

Reblogged 4 years ago from tracking.feedpress.it

Dominate SE rankings – Search Engine Optimization made easy.mp4

http://article-submission-service.org understands that link diversity is an important aspect of any link building campaign. article-submission-service.org he…

Reblogged 4 years ago from www.youtube.com

Backlinks Genie – Daily Link Building

The best Link Building Service. Link Quality and Diversity. Get Your Free Trial http://topseosoft.com/Backlinks-Genie/

Reblogged 5 years ago from www.youtube.com