Pinpoint vs. Floodlight Content and Keyword Research Strategies – Whiteboard Friday

Posted by randfish

When we’re doing keyword research and targeting, we have a choice to make: Are we targeting broader keywords with multiple potential searcher intents, or are we targeting very narrow keywords where it’s pretty clear what the searchers were looking for? Those different approaches, it turns out, apply to content creation and site architecture, as well. In today’s Whiteboard Friday, Rand illustrates that connection.

Pinpoint vs Floodlight Content and Keyword Research Strategy Whiteboard

For reference, here are stills of this week’s whiteboards. Click on any of them to open a high-resolution image in a new tab!

Video Transcription

Howdy, Moz fans, and welcome to another edition of Whiteboard Friday. This week we’re going to chat about pinpoint versus floodlight tactics for content targeting, content strategy, and keyword research, keyword targeting strategy. This is also called the shotgun versus sniper approach, but I’m not a big gun fan. So I’m going to stick with my floodlight versus pinpoint, plus, you know, for the opening shot we don’t have a whole lot of weaponry here at Moz, but we do have lighting.

So let’s talk through this at first. You’re going through and doing some keyword research. You’re trying to figure out which terms and phrases to target. You might look down a list like this.

Well, maybe, I’m using an example here around antique science equipment. So you see these various terms and phrases. You’ve got your volume numbers. You probably have lots of other columns. Hopefully, you’ve watched the Whiteboard Friday on how to do keyword research like it’s 2015 and not 2010.

So you know you have all these other columns to choose from, but I’m simplifying here for the purpose of this experiment. So you might choose some of these different terms. Now, they’re going to have different kinds of tactics and a different strategic approach, depending on the breadth and depth of the topic that you’re targeting. That’s going to determine what types of content you want to create and where you place it in your information architecture. So I’ll show you what I mean.

The floodlight approach

For antique science equipment, this is a relatively broad phrase. I’m going to do my floodlight analysis on this, and floodlight analysis is basically saying like, “Okay, are there multiple potential searcher intents?” Yeah, absolutely. That’s a fairly broad phrase. People could be looking to transact around it. They might be looking for research information, historical information, different types of scientific equipment that they’re looking for.

Are there four or more approximately unique keyword terms and phrases to target? Well, absolutely, in fact, there’s probably more than that. So antique science equipment, antique scientific equipment, 18th century scientific equipment, all these different terms and phrases that you might explore there.

Is this a broad content topic with many potential subtopics? Again, yes is the answer to this. Are we talking about generally larger search volume? Again, yes, this is going to have a much larger search volume than some of the narrower terms and phrases. That’s not always the case, but it is here.

The pinpoint approach

For pinpoint analysis, we kind of go the opposite direction. So we might look at a term like antique test tubes, which is a very specific kind of search, and that has a clear single searcher intent or maybe two. Someone might be looking for actually purchasing one of those, or they might be looking to research them and see what kinds there are. Not a ton of additional intents behind that. One to three unique keywords, yeah, probably. It’s pretty specific. Antique test tubes, maybe 19th century test tubes, maybe old science test tubes, but you’re talking about a limited set of keywords that you’re targeting. It’s a narrow content topic, typically smaller search volume.
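The two checklists above can be read as a rough decision rule. Here is a minimal sketch in Python, assuming you have already estimated the number of searcher intents, keyword variants, and subtopics for each phrase. The `KeywordProfile` class, its field names, and the sample numbers are hypothetical, and the thresholds simply restate the "four or more" versus "one to three" rule of thumb. Search volume is left out because, as noted above, it does not always track breadth.

```python
from dataclasses import dataclass


@dataclass
class KeywordProfile:
    phrase: str
    searcher_intents: int   # estimated distinct intents behind the query
    keyword_variants: int   # roughly unique terms/phrases you could target
    subtopics: int          # potential subtopics under this phrase


def targeting_approach(profile: KeywordProfile) -> str:
    """Return 'floodlight' for broad phrases, 'pinpoint' for narrow ones."""
    broad_signals = [
        profile.searcher_intents > 2,    # more than "one or maybe two" intents
        profile.keyword_variants >= 4,   # "four or more" unique keywords
        profile.subtopics > 1,           # several potential subtopics
    ]
    # Treat a phrase as floodlight when most of the signals say "broad".
    return "floodlight" if sum(broad_signals) >= 2 else "pinpoint"


# Hypothetical estimates for the two whiteboard examples.
profiles = [
    KeywordProfile("antique science equipment", 4, 6, 5),
    KeywordProfile("antique test tubes", 2, 3, 0),
]
for p in profiles:
    print(p.phrase, "->", targeting_approach(p))
```

The majority vote is only one way to combine the signals; a stricter rule could require all three.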

Now, these are going to feed into your IA, your information architecture, and your site structure in this way. So floodlight content generally sits higher up. It’s the category or the subcategory, those broad topic terms and phrases. Those are going to turn into those broad topic category pages. Then you might have multiple, narrower subtopics. So we could go into lab equipment versus astronomical equipment versus chemistry equipment, and then we’d get into those individual pinpoints from the pinpoint analysis.
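As a rough illustration of that structure, the sketch below models floodlight topics as categories and pinpoint topics as the leaf pages beneath them. The URL slugs and the extra leaf pages (beakers, telescopes, scales) are made up for the example; only the category names come from the whiteboard.

```python
# Hypothetical site structure: floodlight topics are categories,
# pinpoint topics are the leaf pages beneath them.
site = {
    "antique-science-equipment": {          # floodlight category
        "lab-equipment": ["antique-test-tubes", "antique-beakers"],
        "astronomical-equipment": ["antique-telescopes"],
        "chemistry-equipment": ["antique-scales"],
    }
}


def url_paths(tree):
    """Yield every URL path, each parent page before its children."""
    for category, subcategories in tree.items():
        yield f"/{category}/"
        for subcategory, pinpoints in subcategories.items():
            yield f"/{category}/{subcategory}/"
            for page in pinpoints:
                yield f"/{category}/{subcategory}/{page}/"


paths = list(url_paths(site))
for path in paths:
    print(path)
```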

How do I decide which approach is best for my keywords?

Why are we doing this? Well, generally speaking, if you can take your terms and phrases and categorize them like this and then target them differently, you’re going to provide a better, more logical user experience. Someone who searches for antique scientific equipment, they’re going to really expect to see that category and then to be able to drill down into things. So you’re providing them the experience they predict, the one that they want, the one that they expect.

It’s better for topic modeling analysis and for all of the algorithms around things like Hummingbird, where Google looks at: Are you using the types of terms and phrases, do you have the type of architecture that we expect to find for this keyword?

It’s better for search intent targeting, because the searcher intent is going to be fulfilled if you provide the multiple paths versus the narrow focus. It’s easier keyword targeting for you. You’re going to be able to know, “Hey, I need to target a lot of different terms and phrases and variations in floodlight and one very specific one in pinpoint.”

There’s usually higher searcher satisfaction, which means you get lower bounce rate. You get more engagement. You usually get a higher conversion rate. So it’s good for all those things.

For example…

I’ll actually create pages for each of antique scientific equipment and antique test tubes to illustrate this. So I’ve got two different types of pages here. One is my antique scientific equipment page.

This is that floodlight, shotgun approach, and what we’re doing here is going to be very different from a pinpoint approach. It’s looking at like, okay, you’ve landed on antique scientific equipment. Now, where do you want to go? What do you want to specifically explore? So we’re going to have a little bit of content specifically about this topic, and how robust that is depends on the type of topic and the type of site you are.

If this is an e-commerce site or a site that’s showing information about various antiques, well maybe we don’t need very much content here. You can see the filtration that we’ve got is going to be pretty broad. So I can go into different centuries. I can go into chemistry, astronomy, physics. Maybe I have a safe for kids type of stuff if you want to buy your kids antique lab equipment, which you might be. Who knows? Maybe you’re awesome and your kids are too. Then different types of stuff at a very broad level. So I can go to microscopes or test tubes, lab searches.

This is great because it’s got broad intent foci, serving many different kinds of searchers with the same page because we don’t know exactly what they want. It’s got multiple keyword targets so that we can go after broad phrases like antique or old or historical or 13th, 14th, whatever century, science and scientific equipment, materials, labs, etc., etc., etc. This is a broad page that could reach any and all of those. Then there’s lots of navigational and refinement options once you get there.

Total opposite of pinpoint content.

Pinpoint content, like this antique test tubes page, we’re still going to have some filtration options, but one of the important things to note is how these are links that take you deeper. Depending on how deep the search volume goes in terms of the types of queries that people are performing, you might want to make a specific page for 17th century antique test tubes. You might not, and if you don’t want to do that, you can have these be filters that are simply clickable and change the content of the page here, narrowing the options rather than creating completely separate pages.

So if there’s no search volume for these different things and you don’t think you need to separately target them, go ahead and just make them filters on the data that already appears on this page or the results that are already in here as opposed to links that are going to take you deeper into specific content and create a new page, a new experience.
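That choice between a filter and a separate page can be sketched as a simple threshold on search volume. The `facet_treatment` function, the cutoff of 50 monthly searches, and the sample volumes are all assumptions for illustration; pick a cutoff that makes sense for your own site.

```python
def facet_treatment(monthly_searches: int, min_volume: int = 50) -> str:
    """Decide how a facet such as '17th century' should behave on a pinpoint page."""
    if monthly_searches >= min_volume:
        # Enough demand to target separately: link to its own page.
        return "link to dedicated page"
    # Little or no demand: just narrow the results already on this page.
    return "in-page filter"


# Hypothetical monthly search volumes for three facets.
facets = {
    "17th century antique test tubes": 90,
    "18th century antique test tubes": 10,
    "glass antique test tubes": 0,
}
for query, volume in facets.items():
    print(f"{query}: {facet_treatment(volume)}")
```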

You can also see I’ve got my individual content here. I probably would go ahead and add some content specifically to this page that is just unique here and that describes antique test tubes and the things that your searchers need. They might want to know things about price. They might want to know things about make and model. They might want to know things about what they were used for. Great. You can have that information broadly, and then individual pieces of content that someone might dig into.

This is narrower intent foci obviously, serving maybe one or two searcher intents. This is really talking about targeting maybe one to two separate keywords. So antique test tubes, maybe lab tubes or test tube sets, but not much beyond that.

Then we’re going to have fewer navigational paths, fewer distractions. We want to keep the searcher. Because we know their intent, we want to guide them along the path that we know they probably want to take and that we want them to take.

So when you’re considering your content, choose wisely between shotgun/floodlight approach or sniper/pinpoint approach. Your searchers will be better served. You’ll probably rank better. You’ll be more likely to earn links and amplification. You’re going to be more successful.

Looking forward to the comments, and we’ll see you again next week for another edition of Whiteboard Friday. Take care.

Video transcription by Speechpad.com

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!

Reblogged 3 years ago from tracking.feedpress.it

An Open-Source Tool for Checking rel-alternate-hreflang Annotations

Posted by Tom-Anthony

In the Distilled R&D department we have been ramping up the amount of automated monitoring and analysis we do, with an internal system monitoring our clients’ sites both directly and via various data sources to ensure they remain healthy and we are alerted to any problems that may arise.

Recently we started work to add in functionality for including the rel-alternate-hreflang annotations in this system. In this blog post I’m going to share an open-source Python library we’ve just started work on for the purpose, which makes it easy to read the hreflang entries from a page and identify errors with them.

If you’re not a Python aficionado then don’t despair, as I have also built a ready-to-go tool for you to use, which will quickly do some checks on the hreflang entries for any URL you specify. 🙂

Google’s Search Console (formerly Webmaster Tools) does have some basic rel-alternate-hreflang checking built in, but it is limited in how you can use it and you are restricted to using it for verified sites.

rel-alternate-hreflang checklist

Before we introduce the code, I wanted to quickly review a list of five easy and common mistakes that we will want to check for when looking at rel-alternate-hreflang annotations:

  • return tag errors – Every alternate language/locale URL of a page should, itself, include a link back to the first page. This makes sense but I’ve seen people make mistakes with it fairly often.
  • indirect / broken links – Links to alternate language/region versions of the page should not go via redirects, and should not link to missing or broken pages.
  • multiple entries – There should never be multiple entries for a single language/region combo.
  • multiple defaults – You should never have more than one x-default entry.
  • conflicting modes – rel-alternate-hreflang entries can be implemented via inline HTML, XML sitemaps, or HTTP headers. For any one set of pages only one implementation mode should be used.
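Two of the checks in this list, multiple entries and multiple defaults, need nothing more than the list of entries found on a page. The sketch below is plain Python and is not polly's implementation; the function names and the sample entries are hypothetical.

```python
from collections import Counter

# Hypothetical (hreflang value, alternate URL) pairs collected from one page.
entries = [
    ("en-gb", "http://example.com/uk/"),
    ("en-us", "http://example.com/us/"),
    ("en-us", "http://example.com/usa/"),        # duplicate language/region combo
    ("x-default", "http://example.com/"),
    ("x-default", "http://example.com/home/"),   # second default
]


def duplicate_hreflang_values(entries):
    """Return hreflang values (other than x-default) that appear more than once."""
    counts = Counter(value.lower() for value, _ in entries)
    return sorted(v for v, n in counts.items() if n > 1 and v != "x-default")


def has_multiple_defaults(entries):
    """Return True when more than one x-default entry is present."""
    return sum(1 for value, _ in entries if value.lower() == "x-default") > 1


print(duplicate_hreflang_values(entries))
print(has_multiple_defaults(entries))
```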

So now imagine that we want to automate these checks quickly and simply…

Introducing: polly – the hreflang checker library

polly is the name for the library we have developed to help us solve this problem, and we are releasing it as open source so the SEO community can use it freely to build upon. We only started work on it last week, but we plan to continue developing it, and will also accept contributions to the code from the community, so we expect its feature set to grow rapidly.

If you are not comfortable tinkering with Python, then feel free to skip down to the next section of the post, where there is a tool that is built with polly which you can use right away.

Still here? Ok, great. You can install polly easily via pip:

pip install polly

You can then create a PollyPage() object which will do all our work and store the data simply by instantiating the class with the desired URL:

my_page = PollyPage("http://www.facebook.com/")

You can quickly see the hreflang entries on the page by running:

print(my_page.alternate_urls_map)

You can list all the hreflang values encountered on a page, and which countries and languages they cover:

print(my_page.hreflang_values)
print(my_page.languages)
print(my_page.regions)

You can also check various aspects of a page, such as whether the pages it includes in its rel-alternate-hreflang entries point back, or whether there are entries that do not seem retrievable (due to 404, 500, or other errors):

print(my_page.is_default)
print(my_page.no_return_tag_pages())
print(my_page.non_retrievable_pages())
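To make the return tag idea concrete without touching the network, here is a minimal sketch that works on pre-fetched data. It is an illustration of the check itself, not of how polly implements it; the `alternates` mapping and the `pages_missing_return_tag` function are hypothetical.

```python
# Hypothetical, pre-fetched data: each page URL maps to the set of
# alternate URLs listed in its rel-alternate-hreflang entries.
alternates = {
    "http://example.com/en/": {
        "http://example.com/en/",
        "http://example.com/fr/",
        "http://example.com/de/",
    },
    "http://example.com/fr/": {"http://example.com/en/", "http://example.com/fr/"},
    "http://example.com/de/": {"http://example.com/de/"},   # no link back
}


def pages_missing_return_tag(page_url, alternates):
    """Return the alternate pages that do not link back to page_url."""
    missing = []
    for alt_url in sorted(alternates.get(page_url, ())):
        if alt_url == page_url:
            continue  # a self-reference needs no return tag
        if page_url not in alternates.get(alt_url, set()):
            missing.append(alt_url)
    return missing


print(pages_missing_return_tag("http://example.com/en/", alternates))
```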

Get more instructions and grab the code at the polly GitHub page. Hit me up in the comments with any questions.

Free tool: hreflang.ninja

I have put together a very simple tool that uses polly to run some of the checks we highlighted above as being common mistakes with rel-alternate-hreflang, which you can visit right now and start using:

http://hreflang.ninja

Simply enter a URL and hit enter, and you should see something like:

Example output from the ninja!

The tool shows you the rel-alternate-hreflang entries found on the page, the language and region of those entries, the alternate URLs, and any errors identified with the entry. It is perfect for doing quick’n’dirty checks of a URL to identify any errors.

As we add additional functionality to polly we will be updating hreflang.ninja as well, so please tweet me with feature ideas or suggestions.

To-do list!

This is the first release of polly and currently we only handle annotations that are in the HTML of the page, not those in the XML sitemap or HTTP headers. However, we are going to be updating polly (and hreflang.ninja) over the coming weeks, so watch this space! 🙂
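For readers curious what reading annotations from the HTML of a page involves, here is a minimal sketch using only Python's standard library. It is not taken from polly's source; the `HreflangParser` class and the sample markup are assumptions for illustration.

```python
from html.parser import HTMLParser


class HreflangParser(HTMLParser):
    """Collect (hreflang, href) pairs from <link rel="alternate"> tags."""

    def __init__(self):
        super().__init__()
        self.entries = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower().split()
        # Keep only alternate links that carry both hreflang and href.
        if "alternate" in rel and attributes.get("hreflang") and attributes.get("href"):
            self.entries.append((attributes["hreflang"].lower(), attributes["href"]))


# Hypothetical page markup.
sample_html = """
<html><head>
<link rel="alternate" hreflang="en-GB" href="http://example.com/uk/" />
<link rel="alternate" hreflang="x-default" href="http://example.com/" />
<link rel="stylesheet" href="/style.css" />
</head><body></body></html>
"""

parser = HreflangParser()
parser.feed(sample_html)
print(parser.entries)
```

Annotations delivered via XML sitemaps or HTTP headers would need separate parsing and are not covered by this sketch.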

Resources

Here are a few links you may find helpful for hreflang:

Got suggestions?

With the increasing number of SEO directives and annotations available, and the ever-changing guidelines around how to deploy them, it is important to automate whatever areas possible. Hopefully polly is helpful to the community in this regard, and we want to hear what ideas you have for making these tools more useful – here in the comments or via Twitter.

8 Ways Content Marketers Can Hack Facebook Multi-Product Ads

Posted by Alan_Coleman

The trick most content marketers are missing

Creating great content is the first half of success in content marketing. Getting quality content read by, and amplified to, a relevant audience is the oft overlooked second half of success. Facebook can be a content marketer’s best friend for this challenge. For reach, relevance and amplification potential, Facebook is unrivaled.

  1. Reach: 1 in 6 mobile minutes on planet earth is somebody reading something on Facebook.
  2. Relevance: Facebook is a lean mean interest and demo targeting machine. There is no online or offline media that owns as much juicy interest and demographic information on its audience and certainly no media has allowed advertisers to utilise this information as effectively as Facebook has.
  3. Amplification: Facebook is literally built to encourage sharing. Here are the first 10 words from their mission statement: “Facebook’s mission is to give people the power to share…” Enough said!

Because of these three digital marketing truths, if a content marketer gets their paid promotion* right on Facebook, the battle for eyeballs and amplification is already won.

For this reason it’s crucial that content marketers keep a close eye on Facebook advertising innovations and seek out ways to use them in new and creative ways.

In this post I will share with you eight ways we’ve hacked a new Facebook ad format to deliver content marketing success.

Multi-Product Ads (MPAs)

In 2014, Facebook unveiled multi-product ads (MPAs) for US advertisers; we got them in Europe earlier this year. They allow retailers to show multiple products in a carousel-type ad unit.

They look like this:

If the user clicks on the featured product, they are guided directly to the landing page for that specific product, from where they can make a purchase.

You could say MPAs are Facebook’s answer to Google Shopping.

Facebook’s mistake is a content marketer’s gain

I believe Facebook has misunderstood how people want to use their social network and the transaction-focused format is OK at best for selling products. People aren’t really on Facebook to hit the “buy now” button. I’m a daily Facebook user and I can’t recall a time this year where I have gone directly from Facebook to an e-commerce website and transacted. Can you remember a recent time when you did?

So, this isn’t an innovation that removes a layer of friction from something that we are all doing online already (as the most effective innovations do). Instead, it’s a bit of a “hit and hope” that, by providing this functionality, Facebook would encourage people to try to buy online in a way they never have before.

The Wolfgang crew felt the MPA format would be much more useful to marketers and users if they were leveraging Facebook for the behaviour we all demonstrate on the platform every day, guiding users to relevant content. We attempted to see if Facebook Ads Manager would accept MPAs promoting content rather than products. We plugged in the images, copy and landing pages, hit “place order”, and lo and behold the ads became active. We’re happy to say that the engagement rates, and more importantly the amplification rates, are fantastic!

Multi-Content Ads

We’ve re-invented the MPA format for multi-advertisers in multi-ways, eight ways to be exact! Here are eight MPA hacks that have worked well for us. All eight hacks use the MPA format to promote content rather than promote products.

Hack #1: Multi-Package Ads

Our first variation wasn’t a million miles away from multi-product ads; we were promoting the various packages offered by a travel operator.

By looking at the number of likes, comments, and shares (in blue below the ads) you can see the ads were a hit with Facebook users and they earned lots of free engagement and amplification.

NB: If you have selected “clicks to website” as your advertising objective, all those likes, comments and shares are free!

Independent Travel Multi Product Ad

The ad sparked plenty of conversation amongst Facebook friends in the comments section.

Comments on a Facebook MPA

Hack #2: Multi-Offer Ads

Everybody knows the Internet loves a bargain. So we decided to try another variation moving away from specific packages, focusing instead on deals for a different travel operator.

Here’s how the ads looked:

These ads got valuable amplification beyond the share. In the comments section, you can see people tagging specific friends. This led to the MPAs receiving further amplification, and a very targeted and personalised form of amplification to boot.

Abbey Travel Facebook Ad Comments

Word of mouth referrals have been a trader’s best friend since the stone age. These “personalised” word of mouth referrals en masse are a powerful marketing proposition. It’s worth mentioning again that those engagements are free!

Hack #3: Multi-Locations Ads

Putting the Lo in SOLOMO.

This multi-product feed ad was hacked to promote numerous locations of a waterpark. “Where to go?” is among the first questions somebody asks when researching a holiday. In creating this top of funnel content, we can communicate with our target audience at the very beginning of their research process. A simple truth of digital marketing is: the more interactions you have with your target market on their journey to purchase, the more likely they are to seal the deal with you when it comes time to hit the “buy now” button. Starting your relationship early gives you an advantage over those competitors who are hanging around the bottom of the purchase funnel hoping to make a quick and easy conversion.

Abbey Travel SplashWorld Facebook MPA

What was surprising here was the timing. Because we expected to reach people at the very beginning of their research journey, we assumed the booking enquiries would be some time away. What actually happened was these ads sparked an enquiry frenzy as Facebook users could see other people enquiring and the holidays selling out in real time.

Abbey Travel comments and replies

In fact nearly all of the 35 comments on this ad were booking enquiries. This means what we were measuring as an “engagement” was actually a cold hard “conversion”! You don’t need me to tell you a booking enquiry is far closer to the money than a Facebook like.

The three examples outlined so far are for travel companies. Travel is a great fit for Facebook as it sits naturally in the Facebook feed; my Facebook feed is full of envy-inducing friends’ holiday pictures right now. Another interesting reason why travel is a great fit for Facebook ads is that typically there are multiple parties to a travel purchase. What happened here is the comments section actually became a very visible and measurable forum for discussion between friends and family before becoming a stampede-inducing medium of enquiry.

So, stepping outside of the travel industry, how do other industries fare with hacked MPAs?

Hack #3a: Multi-Location Ads (combined with location targeting)

Location, location, location. For a property listings website, we applied location targeting and repeated our Multi-Location Ad format to advertise properties for sale to people in and around that location.

Hack #4: Multi-Big Content Ad

“The future of big content is multi platform”

– Cyrus Shepard

The same property website had produced a report and an accompanying infographic to provide their audience with unique and up-to-the-minute market information via their blog. We used the MPA format to promote the report, the infographic and the search rentals page of the website. This brought their big content piece to a larger audience via a new platform.

Rental Report Multi Product Ad

Hack #5: Multi-Episode Ad

This MPA hack was for an online TV player. As you can see we advertised the most recent episodes of a TV show set in a fictional Dublin police station, Red Rock.

Engagement was high, opinion was divided.

TV3s Red Rock viewer feedback

LOL.

Hack #6: Multi-People Ads

In the cosmetic surgery world, past patients’ stories are valuable marketing material. Particularly when the past patients are celebrities. We recycled some previously published stories from celebrity patients using multi-people ads and targeted them to a very specific audience.

Avoca Clinic Multi People Ads

Hack #7: Multi-UGC Ads

Have you witnessed the power of user generated content (UGC) in your marketing yet? We’ve found interaction rates with authentic UGC images can be up to 10 times those of the usual stylised images. In order to encourage further UGC, we posted a number of customers’ images in our Multi-UGC Ads.

The CTR on the above ads was 6% (2% is the average CTR for Facebook News feed ads according to our study). Strong CTRs earn you more traffic for your budget. Facebook’s relevancy score lowers your CPC as your CTR increases.
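To put those two click-through rates side by side, here is a small worked example. The impression count is hypothetical; only the 6% and 2% figures come from the paragraph above, and the sketch ignores any change in CPC.

```python
def clicks(impressions: int, ctr: float) -> float:
    """Expected clicks for a given number of impressions and click-through rate."""
    return impressions * ctr


impressions = 100_000                        # hypothetical number of ad impressions
ugc_clicks = clicks(impressions, 0.06)       # 6% CTR reported for the UGC ads
average_clicks = clicks(impressions, 0.02)   # 2% average CTR cited above

print(round(ugc_clicks), round(average_clicks), round(ugc_clicks / average_clicks, 1))
```

At the same number of impressions, the higher CTR works out to roughly three times as many clicks.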

When it comes to the conversion, UGC is a power player. We’ve learned that “customers attracting new customers” is a powerful acquisition tool.

Hack #8: Target past customers for amplification

“Who will support and amplify this content and why?”

– Rand Fishkin

Your happy customers, Rand, that’s the who and the why! Check out these Multi-Package Ads targeted to past customers via custom audiences. The Camino walkers have already told all their friends about their great trip; now allow them to share their great experiences on Facebook and connect the tour operator with their Facebook friends via a valuable word of mouth referral. Just look at the ratio of shares:likes and shares:comments. Astonishingly sharable ads!

Camino Ways Multi Product Ads

Targeting past converters in an intelligent manner is a super smart way to find an audience ready to share your content.

How will hacking Multi-Product Ads work for you?

People don’t share ads, but they do share great content. So why not hack MPAs to promote your content and reap the rewards of the world’s greatest content sharing machine, Facebook?

MPAs allow you to tell a richer story by allowing you to promote multiple pieces of content simultaneously. So consider which pieces of content you have that will work well as “content bundles” and who the relevant audience for each “content bundle” is.

As Hack #8 above illustrates, the big wins come when you match a smart use of the format with the clever and relevant targeting Facebook allows. We’re massive fans of custom audiences so if you aren’t sure where to start, I’d suggest starting there.

So ponder your upcoming content pieces, consider your older content you’d like to breathe some new life into and perhaps you could become a Facebook Ads Hacker.

I’d love to hear about your ideas for turning Multi-Product Ads into Multi-Content Ads in the comments section below.

We could even take the conversation offline at Mozcon!

Happy hacking.


*Yes, I did say paid promotion; it’s no secret that Facebook’s organic reach continues to dwindle. The cold commercial reality is you need to pay to play on FB. The good news is that if you select ‘website clicks’ as your objective you only pay for website traffic and engagement while amplification by likes, comments, and shares is free! Those website clicks you pay for are typically substantially cheaper than AdWords, Taboola, Outbrain, Twitter or LinkedIn. How does it compare to display? It doesn’t. Paying for clicks is always preferable to paying for impressions. If you are spending money on display advertising I’d urge you to fling a few spondoolas towards Facebook ads and compare results. You will be pleasantly surprised.

Big Data, Big Problems: 4 Major Link Indexes Compared

Posted by russangular

Given this blog’s readership, chances are good you will spend some time this week looking at backlinks in one of the growing number of link data tools. We know backlinks continue to be one of, if not the most important parts of Google’s ranking algorithm. We tend to take these link data sets at face value, though, in part because they are all we have. But when your rankings are on the line, is there a better way to get at which data set is the best? How should we go about assessing these different link indexes like Moz, Majestic, Ahrefs and SEMrush for quality? Historically, there have been 4 common approaches to this question of index quality…

  • Breadth: We might choose to look at the number of linking root domains any given service reports. We know
    that referring domains correlates strongly with search rankings, so it makes sense to judge a link index by how many unique domains it has
    discovered and indexed.
  • Depth: We also might choose to look at how deep the web has been crawled, looking more at the total number of URLs
    in the index, rather than the diversity of referring domains.
  • Link Overlap: A more sophisticated approach might count the number of links an index has in common with Google Webmaster
    Tools.
  • Freshness: Finally, we might choose to look at the freshness of the index. What percentage of links in the index are
    still live?

There are a number of really good studies (some newer than others) using these techniques that are worth checking out when you get a chance:

  • BuiltVisible analysis of Moz, Majestic, GWT, Ahrefs and Search Metrics
  • SEOBook comparison of Moz, Majestic, Ahrefs, and Ayima
  • MatthewWoodward study of Ahrefs, Majestic, Moz, Raven and SEO Spyglass
  • Marketing Signals analysis of Moz, Majestic, Ahrefs, and GWT
  • RankAbove comparison of Moz, Majestic, Ahrefs and Link Research Tools
  • StoneTemple study of Moz and Majestic

While these are all excellent at addressing the methodologies above, there is a particular limitation with all of them. They miss one of the most important metrics we need to determine the value of a link index: proportional representation to Google’s link graph. So here at Angular Marketing, we decided to take a closer look.

Proportional representation to Google Search Console data

So, why is it important to determine proportional representation? Many of the most important and valued metrics we use are built on proportional
models. PageRank, MozRank, CitationFlow and Ahrefs Rank are proportional in nature. The score of any one URL in the data set is relative to the
other URLs in the data set. If the data set is biased, the results are biased.

A Visualization

Link graphs are biased by their crawl prioritization. Because there is no full representation of the Internet, every link graph, even Google’s,
is a biased sample of the web. Imagine for a second that the picture below is of the web. Each dot represents a page on the Internet,
and the dots surrounded by green represent a fictitious index by Google of certain sections of the web.

Of course, Google isn’t the only organization that crawls the web. Other organizations like Moz, Majestic, Ahrefs, and SEMrush have their own crawl prioritizations which result in different link indexes.

In the example above, you can see different link providers trying to index the web like Google. Link data provider 1 (purple) does a good job of building a model that is similar to Google. It isn’t very big, but it is proportional. Link data provider 2 (blue) has a much larger index, and likely has more links in common with Google than link data provider 1, but it is highly disproportional. So, how would we go about measuring this proportionality? And which data set is the most proportional to Google?

Methodology

The first step is to determine a measurement of relativity for analysis. Google doesn’t give us very much information about their link graph.
All we have is what is in Google Search Console. The best source we can use is referring domain counts. In particular, we want to look at
what we call referring domain link pairs. A referring domain link pair would be something like ask.com->mlb.com: 9,444, which means
that ask.com links to mlb.com 9,444 times.
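As a small illustration, a link pair written in that notation can be turned into structured data like this. The function name `parse_link_pair` is hypothetical, and the text format is simply the notation used above, not an export format from Google Search Console or any link tool.

```python
def parse_link_pair(line):
    """Parse 'source->target: count' into a (source, target, count) tuple."""
    # Split off the count first, so the domains are handled separately.
    source_and_target, count_text = line.rsplit(":", 1)
    source, target = source_and_target.split("->", 1)
    # Counts are written with thousands separators, e.g. "9,444".
    count = int(count_text.strip().replace(",", ""))
    return source.strip(), target.strip(), count


pair = parse_link_pair("ask.com->mlb.com: 9,444")
print(pair)
```

Keeping the source domain, target domain and count together makes it straightforward to line up the same pair across several link indexes.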

Steps

  1. Determine the root linking domain pairs and values to 100+ sites in Google Search Console
  2. Determine the same for Ahrefs, Moz, Majestic Fresh, Majestic Historic, SEMrush
  3. Compare the referring domain link pairs of each data set to Google, assuming a Poisson distribution
  4. Run simulations of each data set’s performance against each other (e.g., Moz vs. Majestic, Ahrefs vs. SEMrush, Moz vs. SEMrush, et al.)
  5. Analyze the results
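The steps above do not spell out the exact statistics, so the following is only one plausible reading of step 3, sketched under stated assumptions: scale a provider's link pair counts to Google's total, treat each scaled count as the mean of a Poisson distribution, and sum the log-likelihood of the counts Google reports. The function names, the `floor` smoothing value and all of the numbers are made up for illustration; they are not the study's data or its actual method.

```python
import math


def poisson_log_pmf(k, lam):
    """Log-probability of observing count k under a Poisson with mean lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def proportionality_score(google, provider, floor=0.5):
    """Higher (closer to zero) means the provider's counts are more proportional to Google's.

    `google` and `provider` map (source_domain, target_domain) -> link count.
    `floor` is an assumed smoothing value for pairs the provider is missing.
    """
    total_provider = sum(provider.values())
    if total_provider == 0:
        raise ValueError("provider has no link data")
    # Rescale so index size alone does not decide the outcome.
    scale = sum(google.values()) / total_provider
    score = 0.0
    for pair in set(google) | set(provider):
        expected = max(provider.get(pair, 0) * scale, floor)
        score += poisson_log_pmf(google.get(pair, 0), expected)
    return score


# Invented counts: one small but proportional index, one large but skewed index.
google = {("a.com", "x.com"): 100, ("b.com", "x.com"): 50, ("c.com", "x.com"): 10}
small_proportional = {("a.com", "x.com"): 50, ("b.com", "x.com"): 25, ("c.com", "x.com"): 5}
large_skewed = {("a.com", "x.com"): 1000, ("b.com", "x.com"): 1000, ("c.com", "x.com"): 1000}

print(proportionality_score(google, small_proportional))
print(proportionality_score(google, large_skewed))
```

In this toy setup the smaller index scores better because its counts keep the same ratios as Google's, even though the larger index knows about far more links.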

Results

When placed head-to-head, there seem to be some clear winners at first glance. In head-to-head, Moz edges out Ahrefs, but across the board, Moz and Ahrefs fare quite evenly. Moz, Ahrefs and SEMrush seem to be far better than Majestic Fresh and Majestic Historic. Is that really the case? And why?

It turns out there is an inverse relationship between index size and proportional relevancy. This might seem counterintuitive:
shouldn’t the bigger indexes be closer to Google? Not exactly.

What does this mean?

Each organization has to create a crawl prioritization strategy. When you discover millions of links, you have to prioritize which ones you
might crawl next. Google has a crawl prioritization, and so do Moz, Majestic, Ahrefs and SEMrush. There are lots of different things you might
choose to prioritize…

  • You might prioritize link discovery. If you want to build a very large index, you could prioritize crawling pages on sites that
    have historically provided new links.
  • You might prioritize content uniqueness. If you want to build a search engine, you might prioritize finding pages that are unlike
    any you have seen before. You could choose to crawl domains that historically provide unique data and little duplicate content.
  • You might prioritize content freshness. If you want to keep your search engine recent, you might prioritize crawling pages that
    change frequently.
  • You might prioritize content value, crawling the most important URLs first based on the number of inbound links to that page.
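One way to picture these choices is as a scoring function plugged into a crawl frontier. The sketch below is a simplified illustration, not how any of these organizations actually crawl: the seven-page site, the `freshness` and `link_yield` numbers, and the `crawl` function are all hypothetical.

```python
import heapq
from itertools import count


def crawl(graph, seed, priority, budget):
    """Fetch up to `budget` pages, always taking the highest-priority URL in the frontier."""
    tie_breaker = count()  # keeps ordering stable when priorities are equal
    frontier = [(-priority(seed), next(tie_breaker), seed)]
    seen = {seed}
    order = []
    while frontier and len(order) < budget:
        _, _, url = heapq.heappop(frontier)
        order.append(url)
        for link in graph.get(url, []):
            if link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (-priority(link), next(tie_breaker), link))
    return order


# Hypothetical site and hypothetical signals for each page.
graph = {
    "home": ["news", "archive"],
    "news": ["story1", "story2"],
    "archive": ["old1", "old2"],
}
freshness = {"home": 5, "news": 9, "archive": 1, "story1": 8, "story2": 7, "old1": 1, "old2": 1}
link_yield = {"home": 5, "news": 2, "archive": 9, "story1": 1, "story2": 1, "old1": 6, "old2": 4}

fresh_first = crawl(graph, "home", lambda url: freshness.get(url, 0), budget=4)
links_first = crawl(graph, "home", lambda url: link_yield.get(url, 0), budget=4)

print(fresh_first)  # ['home', 'news', 'story1', 'story2']
print(links_first)  # ['home', 'archive', 'old1', 'old2']
```

Both crawlers start on the same page with the same budget, yet they end up indexing different parts of the site. Swap the scoring function and you get a different index.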

Chances are, an organization’s crawl priority will blend some of these features, but it’s difficult to design one exactly like Google’s. Imagine
for a moment that instead of crawling the web, you want to climb a tree. You have to come up with a tree climbing strategy.

  • You decide to climb the longest branch you see at each intersection.
  • One friend of yours decides to climb the first new branch he reaches, regardless of how long it is.
  • Your other friend decides to climb the first new branch she reaches only if she sees another branch coming off of it.

Despite having different climb strategies, everyone chooses the same first branch, and everyone chooses the same second branch. There are only
so many different options early on.

But as the climbers go further and further along, their choices eventually produce differing results. This is exactly the same for web crawlers
like Google, Moz, Majestic, Ahrefs and SEMrush. The bigger the crawl, the more the crawl prioritization will cause disparities. This is not a
deficiency; this is just the nature of the beast. However, we aren’t completely lost. Once we know how index size is related to disparity, we
can make some inferences about how similar a crawl priority may be to Google.

Unfortunately, we have to be careful in our conclusions. We only have a few data points with which to work, so it is very difficult to be
certain regarding this part of the analysis. In particular, it seems strange that Majestic would get better relative to its index size as it grows,
unless Google holds on to old data (which might be an important discovery in and of itself). Most likely, we simply don’t have enough data
at this point to draw a conclusion at that level.

So what do we do?

Let’s say you have a list of domains or URLs for which you would like to know their relative values. Your process might look something like
this…

  • Check Open Site Explorer to see if all URLs are in its index. If so, you are looking at the metrics most likely to be proportional to Google’s link graph.
  • If any of the links do not occur in the index, move to Ahrefs and use Ahrefs Rank if all you need is a single PageRank-like metric.
  • If any of the links are missing from Ahrefs’s index, or you need something related to trust, move on to Majestic Fresh.
  • Finally, use Majestic Historic for (by leaps and bounds) the largest coverage available.
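The fallback order above can be sketched as a simple loop. This is only an illustration: `pick_index` is a hypothetical helper, the coverage sets are invented, and a real check would query each tool rather than a hard-coded set. It also ignores the "something related to trust" case, which would send you to Majestic regardless of coverage.

```python
def pick_index(urls, indexes):
    """Return the first index, in order of preference, that contains every URL.

    `indexes` is a list of (name, set_of_known_urls), ordered from most
    proportional to broadest. If no index covers everything, fall back to the
    last (broadest) one.
    """
    for name, covered in indexes:
        if all(url in covered for url in urls):
            return name
    return indexes[-1][0]


# Invented coverage, ordered as in the process above.
indexes = [
    ("Moz", {"a.com", "b.com"}),
    ("Ahrefs", {"a.com", "b.com", "c.com"}),
    ("Majestic Fresh", {"a.com", "b.com", "c.com", "d.com"}),
    ("Majestic Historic", {"a.com", "b.com", "c.com", "d.com", "e.com"}),
]

print(pick_index(["a.com", "b.com"], indexes))  # Moz
print(pick_index(["a.com", "c.com"], indexes))  # Ahrefs
print(pick_index(["d.com", "e.com"], indexes))  # Majestic Historic
```

The ordering encodes the trade-off: start with the index most likely to be proportional, and only give up proportionality when you need the coverage.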

It is important to point out that the likelihood that all the URLs you want to check are in a single index increases as the accuracy of the metric
decreases. Considering the size of Majestic’s data, you can’t ignore them because you are less likely to get null value answers from their data than
the others. If anything rings true, it is that once again it makes sense to get data from as many sources as possible. You won’t
get the most proportional data without Moz, the broadest data without Majestic, or everything in-between without Ahrefs.

What about SEMrush? They are making progress, but they don’t publish any relative statistics that would be useful in this particular
case. Maybe we can hope to see more from them soon given their already promising index!

Recommendations for the link graphing industry

All we hear about these days is big data; we almost never hear about good data. I know that the teams at Moz,
Majestic, Ahrefs, SEMrush and others are interested in mimicking Google, but I would love to see some organization stand up against the
allure of more data in favor of better data: data more like Google’s. It could begin with testing various crawl strategies to see if they produce
a result more similar to that of data shared in Google Search Console. Having the most Google-like data is certainly a crown worth winning.

Credits

Thanks to Diana Carter at Angular for assistance with data acquisition and Andrew Cron with statistical analysis. Thanks also to the representatives from Moz, Majestic, Ahrefs, and SEMrush for answering questions about their indices.

