Stop Ghost Spam in Google Analytics with One Filter

Posted by CarloSeo

Spam in Google Analytics (GA) is becoming a serious issue. Due to a deluge of referral spam from social buttons, adult sites, and many, many other sources, people are becoming overwhelmed by all the filters they are setting up to manage the useless data they are receiving.

The good news is, there is no need to panic. In this post, I’m going to focus on the most common mistakes people make when fighting spam in GA, and explain an efficient way to prevent it.

But first, let’s make sure we understand how spam works. A couple of months ago, Jared Gardner wrote an excellent article explaining what referral spam is, including its intended purpose. He also pointed out some great examples of referral spam.

Types of spam

The spam in Google Analytics can be categorized into two types: ghosts and crawlers.

Ghosts

The vast majority of spam is of this type. They are called ghosts because they never access your site. It is important to keep this in mind, as it’s key to creating a more efficient solution for managing spam.

As unusual as it sounds, this type of spam doesn’t have any interaction with your site at all. You may wonder how that is possible since one of the main purposes of GA is to track visits to our sites.

They do it by using the Measurement Protocol, which allows people to send data directly to Google Analytics’ servers. Using this method, and probably randomly generated tracking codes (UA-XXXXX-1) as well, the spammers leave a “visit” with fake data, without even knowing who they are hitting.

Crawlers

This type of spam, the opposite of ghost spam, does access your site. As the name implies, these spam bots crawl your pages, ignoring rules like those found in robots.txt that are supposed to stop them from reading your site. When they exit your site, they leave a record on your reports that appears similar to a legitimate visit.

Crawlers are harder to identify because they know their targets and use real data. But it is also true that new ones seldom appear. So if you detect a referral in your analytics that looks suspicious, researching it on Google or checking it against this list might help you answer the question of whether or not it is spammy.

Most common mistakes made when dealing with spam in GA

I’ve been following this issue closely for the last few months. According to the comments people have made on my articles and conversations I’ve found in discussion forums, there are primarily three mistakes people make when dealing with spam in Google Analytics.

Mistake #1. Blocking ghost spam from the .htaccess file

One of the biggest mistakes people make is trying to block ghost spam from the .htaccess file.

For those who are not familiar with this file, one of its main functions is to allow/block access to your site. Now we know that ghosts never reach your site, so adding them here won’t have any effect and will only add useless lines to your .htaccess file.

Ghost spam usually shows up for a few days and then disappears. As a result, sometimes people think that they successfully blocked it from here when really it’s just a coincidence of timing.

Then, when the spammers later return, site owners get worried because the solution is no longer working, and they assume the spammer somehow bypassed the barriers they set up.

The truth is, the .htaccess file can only effectively block crawlers such as buttons-for-website.com and a few others since these access your site. Most of the spam can’t be blocked using this method, so there is no other option than using filters to exclude them.

Mistake #2. Using the referral exclusion list to stop spam

Another error is trying to use the referral exclusion list to stop the spam. The name may confuse you, but this list is not intended to exclude referrals the way we need to for spam; it has other purposes.

For example, when a customer buys something, sometimes they get redirected to a third-party page for payment. After making a payment, they’re redirected back to your website, and GA records that as a new referral. It is appropriate to use the referral exclusion list to prevent this from happening.

If you try to use the referral exclusion list to manage spam, however, the referral part will be stripped since there is no preexisting record. As a result, a direct visit will be recorded, and you will have a bigger problem than the one you started with: you will still have spam, and direct visits are harder to track.

Mistake #3. Worrying that bounce rate changes will affect rankings

When people see that the bounce rate changes drastically because of the spam, they start worrying about the impact that it will have on their rankings in the SERPs.

[Image: bounce rate report]

This is another common mistake. With or without spam, Google doesn’t take Google Analytics metrics into consideration as a ranking factor. Here is an explanation of this from Matt Cutts, the former head of Google’s web spam team.

And if you think about it, Cutts’ explanation makes sense: although many people have GA, not everyone uses it.

Assuming your site has been hacked

Another common concern when people see strange landing pages from spam in their reports is that their site has been hacked.

[Image: spam landing page in reports]

The page that the spam shows on the reports doesn’t exist, and if you try to open it, you will get a 404 page. Your site hasn’t been compromised.

But you do have to make sure the page doesn’t exist, because there are cases (unrelated to spam) where sites suffer a security breach and get injected with pages full of bad keywords to defame the website.

What should you worry about?

Now that we’ve discarded security issues and their effects on rankings, the only thing left to worry about is your data. The fake trail that the spam leaves behind pollutes your reports.

It might have greater or lesser impact depending on your site traffic, but everyone is susceptible to the spam.

Small and midsize sites are the most easily impacted – not only because a big part of their traffic can be spam, but also because usually these sites are self-managed and sometimes don’t have the support of an analyst or a webmaster.

Big sites with a lot of traffic can also be impacted by spam, and although the impact can be insignificant, invalid traffic means inaccurate reports no matter the size of the website. As an analyst, you should be able to explain what’s going on in even the most granular reports.

You only need one filter to deal with ghost spam

Usually it is recommended to add the referral to an exclusion filter after it is spotted. Although this is useful for a quick action against the spam, it has three big disadvantages.

  • Making filters every week for every new spam detected is tedious and time-consuming, especially if you manage many sites. Plus, by the time you apply the filter, and it starts working, you already have some affected data.
  • Some of the spammers use direct visits along with the referrals.
  • These direct hits won’t be stopped by the filter, so even if you are excluding the referral you will still be receiving invalid traffic, which explains why some people have seen an unusual spike in direct traffic.

Luckily, there is a good way to prevent all these problems. Most of the spam (the ghost type) works by hitting random GA tracking IDs, meaning the offender doesn’t really know who the target is; for that reason, either the hostname is not set or a fake one is used. (See the report below.)

[Image: hostname report showing ghost spam]

You can see that they use some weird names or don’t even bother to set one. Although there are some recognizable names in the list, these can easily be faked by the spammer.

On the other hand, valid traffic will always use a real hostname. In most cases, this will be your domain, but it can also come from paid services, translation services, or any other place where you’ve inserted your GA tracking code.

[Image: valid referral with a real hostname]

Based on this, we can make a filter that will include only hits that use real hostnames. This will automatically exclude all hits from ghost spam, whether it shows up as a referral, keyword, or pageview; or even as a direct visit.

To create this filter, you will need to find the report of hostnames. Here’s how:

  1. Go to the Reporting tab in GA
  2. Click on Audience in the lefthand panel
  3. Expand Technology and select Network
  4. At the top of the report, click on Hostname

[Image: list of valid hostnames]

You will see a list of all hostnames, including the ones that the spam uses. Make a list of all the valid hostnames you find, as follows:

  • yourmaindomain.com
  • blog.yourmaindomain.com
  • es.yourmaindomain.com
  • payingservice.com
  • translatetool.com
  • anotheruseddomain.com

For small to medium sites, this list of hostnames will likely consist of the main domain and a couple of subdomains. After you are sure you got all of them, create a regular expression similar to this one:

yourmaindomain\.com|anotheruseddomain\.com|payingservice\.com|translatetool\.com
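
If you want to sanity-check the expression before pasting it into GA, a small script can help. Below is a minimal Python sketch assuming the placeholder domains above; substitute the hostnames from your own report. (GA’s filter regex engine is not identical to Python’s, but a simple alternation like this should behave the same way.)

import re

# The same expression you would paste into the GA Filter Pattern box.
# These are the placeholder domains from above -- swap in your own hostnames.
valid_hostnames = r"yourmaindomain\.com|anotheruseddomain\.com|payingservice\.com|translatetool\.com"

# A mix of hostnames as they might appear in the Network > Hostname report.
test_hostnames = [
    "yourmaindomain.com",
    "blog.yourmaindomain.com",   # subdomains match because the main domain is in the pattern
    "translatetool.com",
    "(not set)",                 # typical of ghost spam
    "spammy-referral-site.xyz",  # made-up spam hostname
]

for hostname in test_hostnames:
    verdict = "keep" if re.search(valid_hostnames, hostname) else "filter out"
    print(f"{hostname}: {verdict}")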

You don’t need to put all of your subdomains in the regular expression. The main domain will match all of them. If you don’t have a view set up without filters, create one now.

Then create a Custom Filter.

Make sure you select INCLUDE, select Hostname as the Filter Field, and copy your expression into the Filter Pattern box.

[Image: custom include filter setup]

You might want to verify the filter before saving to check that everything is okay. Once you’re ready, save it and apply the filter to all the views you want (except the view without filters).

This single filter will get rid of future occurrences of ghost spam that use invalid hostnames, and it doesn’t require much maintenance. But it’s important that every time you add your tracking code to a new service, you also add that hostname to the end of the filter expression.

Now you should only need to take care of the crawler spam. Since crawlers access your site, you can block them by adding these lines to the .htaccess file:

## STOP REFERRER SPAM
# Return a 403 Forbidden response when the referrer matches a known crawler-spam domain
RewriteEngine on
RewriteCond %{HTTP_REFERER} semalt\.com [NC,OR]
RewriteCond %{HTTP_REFERER} buttons-for-website\.com [NC]
RewriteRule .* - [F]

It is important to note that this file is very sensitive, and misplacing a single character in it can bring down your entire site. Therefore, make sure you create a backup copy of your .htaccess file prior to editing it.

If you don’t feel comfortable messing around with your .htaccess file, you can alternatively build an expression with all the crawlers and add it to an exclude filter on Campaign Source.

Implement these combined solutions, and you will worry much less about spam contaminating your analytics data. This will have the added benefit of freeing up more time for you to spend actually analyzing your valid data.

After stopping spam, you can also clean up your historical data by using the same expressions in an Advanced Segment to exclude all the spam.

Bonus resources to help you manage spam

If you still need more information to help you understand and deal with the spam on your GA reports, you can read my main article on the subject here: http://www.ohow.co/what-is-referrer-spam-how-stop-it-guide/.

Additional information on how to stop spam can be found at these URLs:

In closing, I am eager to hear your ideas on this serious issue. Please share them in the comments below.

(Editor’s Note: All images featured in this post were created by the author.)


Moz Local Officially Launches in the UK

Posted by David-Mihm

To all Moz Local fans in the UK, I’m excited to announce that your wait is over. As the sun rises “across the pond” this morning, Moz Local is officially live in the United Kingdom!

A bit of background

As many of you know, we released the US version of Moz Local in March 2014. After 12 months of terrific growth in the US, and a boatload of technical improvements and feature releases–especially for Enterprise customers–we released the Check Listing feature for a limited set of partner search engines and directories in the UK in April of this year.

Over 20,000 of you have checked your listings (or your clients’ listings) in the last 3-1/2 months. Those lookups have helped us refine and improve the background technology immensely (more on that below). We’ve been just as eager to release the fully-featured product as you’ve been to use it, and the technical pieces have finally fallen into place for us to do so.

How does it work?

The concept is the same as the US version of Moz Local: show you how accurately and completely your business is listed on the most important local search platforms and directories, and optimize and perfect as many of those business listings as we can on your behalf.

For customers specifically looking for you, accurate business listings are obviously important. For customers who might not know about you yet, they’re also among the most important factors for ranking in local searches on Google. Basically, the more times Google sees your name, address, phone, and website listed the same way on quality local websites, the more trust they have in your business, and the higher you’re likely to rank.

Moz Local is designed to help on both these fronts.

To use the product, you simply need to type a name and postcode at moz.com/local. We’ll then show you a list of the closest matching listings we found. We prioritize verified listing information that we find on Google or Facebook, and selecting one of those verified listings means we’ll be able to distribute it on your behalf.

Clicking on a result brings you to a full details report for that listing. We’ll show you how accurate and complete your listings are now, and where they could be after using our product.

Clicking the tabs beneath the Listing Score graphic will show you some of the incompletions and inconsistencies that publishing your listing with Moz Local will address.

For customers with hundreds or thousands of locations, bulk upload is also available using a modified version of your data from Google My Business–feel free to e-mail enterpriselocal@moz.com for more details.

Where do we distribute your data?

We’ve prioritized the most important commercial sites in the UK local search ecosystem, and made them the centerpieces of Moz Local. We’ll update your data directly on globally-important players Factual and Foursquare, and the UK-specific players CentralIndex, Thomson Local, and the Scoot network–which includes key directories like TouchLocal, The Independent, The Sun, The Mirror, The Daily Scotsman, and Wales Online.

We’ll be adding two more major destinations shortly, and for those of you who sign up before that time, your listings will be automatically distributed to the additional destinations when the integrations are complete.

How much does it cost?

The cost per listing is £84/year, which includes distribution to the sites mentioned above with unlimited updates throughout the year, monitoring of your progress over time, geographically-focused reporting, and the ability to find and close duplicate listings right from your Moz Local dashboard–all the great upgrades that my colleague Noam Chitayat blogged about here.

What’s next?

Well, as I mentioned just a couple paragraphs ago, we’ve got two additional destinations to which we’ll be sending your data in very short order. Once those integrations are complete, we’ll be just a few weeks away from releasing our biggest set of features since we launched. I look forward to sharing more about these features at BrightonSEO at the end of the summer!

For those of you around the world in Canada, Australia, and other countries, we know there’s plenty of demand for Moz Local overseas, and we’re working as quickly as we can to build additional relationships abroad. And to our friends in the UK, please let us know how we can continue to make the product even better!


The Meta Referrer Tag: An Advancement for SEO and the Internet

Posted by Cyrus-Shepard

The movement to make the Internet more secure through HTTPS brings several useful advancements for webmasters. In addition to security improvements, HTTPS promises future technological advances and potential SEO benefits for marketers.

HTTPS in search results is rising. Recent MozCast data from Dr. Pete shows nearly 20% of first page Google results are now HTTPS.

Sadly, HTTPS also has its downsides.

Marketers run into their first challenge when they switch regular HTTP sites over to HTTPS. Technically challenging, the switch typically involves routing your site through a series of 301 redirects. Historically, these types of redirects are associated with a loss of link equity (thought to be around 15%), which can lead to a loss in rankings. This can offset any SEO advantage that Google claims comes from switching.

Ross Hudgens perfectly summed it up in this tweet:

Many SEOs have anecdotally shared stories of HTTPS sites performing well in Google search results (and our soon-to-be-published Ranking Factors data seems to support this). However, the short-term effect of a large migration can be hard to take. When Moz recently switched to HTTPS to provide better security to our logged-in users, we saw an 8-9% dip in our organic search traffic.

Problem number two is the subject of this post. It involves the loss of referral data. Typically, when one site sends traffic to another, information is sent that identifies the originating site as the source of traffic. This invaluable data allows people to see where their traffic is coming from, and helps spread the flow of information across the web.

SEOs have long used referrer data for a number of beneficial purposes. Oftentimes, people will link back or check out the site sending traffic when they see the referrer in their analytics data. Spammers know this works, as evidenced by the recent increase in referrer spam:

This process stops when traffic flows from an HTTPS site to a non-secure HTTP site. In this case, no referrer data is sent. Webmasters can’t know where their traffic is coming from.

Here’s how referral data to my personal site looked when Moz switched to HTTPS. I lost all visibility into where my traffic came from.

It’s (not provided) all over again!

Enter the meta referrer tag

While we can’t solve the ranking challenges imposed by switching a site to HTTPS, we can solve the loss of referral data, and it’s actually super-simple.

Almost completely unknown to most marketers, the relatively new meta referrer tag (it’s actually been around for a few years) was designed to help out in these situations.

Better yet, the tag allows you to control how your referrer information is passed.

The meta referrer tag works with most browsers to pass referrer information in a manner defined by the user. Traffic remains encrypted and all the benefits of using HTTPS remain in place, but now you can pass referrer data to all websites, even those that use HTTP.

How to use the meta referrer tag

What follows are extremely simplified instructions for using the meta referrer tag. For more in-depth understanding, we highly recommend referring to the W3C working draft of the spec.

The meta referrer tag is placed in the <head> section of your HTML, and references one of five states, which control how browsers send referrer information from your site. The five states are:

  1. None: Never pass referral data
    <meta name="referrer" content="none">
    
  2. None When Downgrade: Sends referrer information to secure HTTPS sites, but not insecure HTTP sites
    <meta name="referrer" content="none-when-downgrade">
    
  3. Origin Only: Sends the scheme, host, and port (basically, the subdomain) as the referrer, stripped of the full URL path; i.e. https://moz.com/example.html would simply send https://moz.com
    <meta name="referrer" content="origin">
    

  4. Origin When Cross-Origin: Sends the full URL as the referrer when the target has the same scheme, host, and port (i.e. subdomain) regardless if it’s HTTP or HTTPS, while sending origin-only referral information to external sites. (note: There is a typo in the official spec. Future versions should be “origin-when-cross-origin”)
    <meta name="referrer" content="origin-when-crossorigin">
    
  5. Unsafe URL: Always passes the URL string as a referrer. Note if you have any sensitive information contained in your URL, this isn’t the safest option. By default, URL fragments, username, and password are automatically stripped out.
    <meta name="referrer" content="unsafe-url">
    

The meta referrer tag in action

By clicking the link below, you can get a sense of how the meta referrer tag works.

Check Referrer

Boom!

We’ve set the meta referrer tag for Moz to “origin”, which means when we link out to another site, we pass our scheme, host, and port. The end result is you see https://moz.com as the referrer, stripped of the full URL path (/meta-referrer-tag).

My personal site typically receives several visits per day from Moz. Here’s what my analytics data looked like before and after we implemented the meta referrer tag.

For simplicity and security, most sites may want to implement the “origin” state, but there are drawbacks.

One negative side effect was that as soon as we implemented the meta referrer tag, our AdRoll analytics, which we use for retargeting, stopped working. It turns out that AdRoll uses our referrer information for analytics, but the meta referrer tag “origin” state meant that the only URL they ever saw reported was https://moz.com.

Conclusion

We love the meta referrer tag because it keeps information flowing on the Internet. It’s the way the web is supposed to work!

It helps marketers and webmasters see exactly where their traffic is coming from. It encourages engagement, communication, and even linking, which can lead to improvements in SEO.

Useful links:


How to Use Server Log Analysis for Technical SEO

Posted by SamuelScott

It’s ten o’clock. Do you know where your logs are?

I’m introducing this guide with a pun on a common public-service announcement that has run on late-night TV news broadcasts in the United States because log analysis is something that is extremely newsworthy and important.

If your technical and on-page SEO is poor, then nothing else that you do will matter. Technical SEO is the key to helping search engines to crawl, parse, and index websites, and thereby rank them appropriately long before any marketing work begins.

The important thing to remember: Your log files contain the only data that is 100% accurate in terms of how search engines are crawling your website. By helping Google to do its job, you will set the stage for your future SEO work and make your job easier. Log analysis is one facet of technical SEO, and correcting the problems found in your logs will help to lead to higher rankings, more traffic, and more conversions and sales.

Here are just a few reasons why:

  • Too many response code errors may cause Google to reduce its crawling of your website and perhaps even your rankings.
  • You want to make sure that search engines are crawling everything, new and old, that you want to appear and rank in the SERPs (and nothing else).
  • It’s crucial to ensure that all URL redirections will pass along any incoming “link juice.”

However, log analysis is something that is unfortunately discussed all too rarely in SEO circles. So, here, I wanted to give the Moz community an introductory guide to log analytics that I hope will help. If you have any questions, feel free to ask in the comments!

What is a log file?

Computer servers, operating systems, network devices, and computer applications automatically generate something called a log entry whenever they perform an action. In an SEO and digital marketing context, one type of action is whenever a page is requested by a visiting bot or human.

Server log entries are typically output in the standardized Common Log Format. Here is one example from Wikipedia with my accompanying explanations:

127.0.0.1 user-identifier frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326
  • 127.0.0.1 — The remote hostname. An IP address is shown, like in this example, whenever the DNS hostname is not available or DNSLookup is turned off.
  • user-identifier — The remote logname / RFC 1413 identity of the user. (It’s not that important.)
  • frank — The user ID of the person requesting the page. Based on what I see in my Moz profile, Moz’s log entries would probably show either “SamuelScott” or “392388” whenever I visit a page after having logged in.
  • [10/Oct/2000:13:55:36 -0700] — The date, time, and timezone of the action in question in strftime format.
  • GET /apache_pb.gif HTTP/1.0 — “GET” is one of the two commands (the other is “POST”) that can be performed. “GET” fetches a URL while “POST” is submitting something (such as a forum comment). The second part is the URL that is being accessed, and the last part is the version of HTTP that is being accessed.
  • 200 — The status code of the document that was returned.
  • 2326 — The size, in bytes, of the document that was returned.

Note: A hyphen is shown in a field when that information is unavailable.
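
If you want to pull those fields apart programmatically, a few lines of code will do it. Here is a rough Python sketch that parses the Wikipedia example above; real-world servers often use the combined log format (which appends the referrer and user-agent), so adjust the pattern to match your own configuration.

import re

# The Common Log Format example from above.
line = '127.0.0.1 user-identifier frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'

# One named group per field: host, logname, user, timestamp, request, status, size.
clf = re.compile(
    r'(?P<host>\S+) (?P<logname>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
)

match = clf.match(line)
if match:
    fields = match.groupdict()
    method, url, protocol = fields["request"].split()
    print(fields["host"], fields["time"], method, url, fields["status"], fields["size"])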

Every single time that you — or the Googlebot — visit a page on a website, a line with this information is output, recorded, and stored by the server.

Log entries are generated continuously and anywhere from several to thousands can be created every second — depending on the level of a given server, network, or application’s activity. A collection of log entries is called a log file (or often in slang, “the log” or “the logs”), and it is displayed with the most-recent log entry at the bottom. Individual log files often contain a calendar day’s worth of log entries.

Accessing your log files

Different types of servers store and manage their log files differently. Here are the general guides to finding and managing log data on three of the most-popular types of servers:

What is log analysis?

Log analysis (or log analytics) is the process of going through log files to learn something from the data. Some common reasons include:

  • Development and quality assurance (QA) — Creating a program or application and checking for problematic bugs to make sure that it functions properly
  • Network troubleshooting — Responding to and fixing system errors in a network
  • Customer service — Determining what happened when a customer had a problem with a technical product
  • Security issues — Investigating incidents of hacking and other intrusions
  • Compliance matters — Gathering information in response to corporate or government policies
  • Technical SEO — This is my favorite! More on that in a bit.

Log analysis is rarely performed regularly. Usually, people go into log files only in response to something — a bug, a hack, a subpoena, an error, or a malfunction. It’s not something that anyone wants to do on an ongoing basis.

Why? This is a screenshot of ours of just a very small part of an original (unstructured) log file:

Ouch. If a website gets 10,000 visitors who each go to ten pages per day, then the server will create a log file every day that will consist of 100,000 log entries. No one has the time to go through all of that manually.

How to do log analysis

There are three general ways to make log analysis easier in SEO or any other context:

  • Do-it-yourself in Excel
  • Proprietary software such as Splunk or Sumo Logic
  • The ELK Stack open-source software

Tim Resnik’s Moz essay from a few years ago walks you through the process of exporting a batch of log files into Excel. This is a (relatively) quick and easy way to do simple log analysis, but the downside is that one will see only a snapshot in time and not any overall trends. To obtain the best data, it’s crucial to use either proprietary tools or the ELK Stack.

Splunk and Sumo Logic are proprietary log analysis tools that are primarily used by enterprise companies. The ELK Stack is a free and open-source batch of three platforms (Elasticsearch, Logstash, and Kibana) that is owned by Elastic and used more often by smaller businesses. (Disclosure: We at Logz.io use the ELK Stack to monitor our own internal systems as well as for the basis of our own log management software.)

For those who are interested in using this process to do technical SEO analysis, monitor system or application performance, or for any other reason, our CEO, Tomer Levy, has written a guide to deploying the ELK Stack.

Technical SEO insights in log data

However you choose to access and understand your log data, there are many important technical SEO issues to address as needed. I’ve included screenshots of our technical SEO dashboard with our own website’s data to demonstrate what to examine in your logs.

Bot crawl volume

It’s important to know the number of requests made by Baidu, BingBot, GoogleBot, Yahoo, Yandex, and others over a given period of time. If, for example, you want to get found in search in Russia but Yandex is not crawling your website, that is a problem. (You’d want to consult Yandex Webmaster and see this article on Search Engine Land.)
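
As a rough illustration, here is how you might tally requests per crawler from an access log. This Python sketch assumes a combined-format log (which, unlike the plain Common Log Format shown earlier, records the user-agent); the file name and the list of bot identifiers are placeholders.

from collections import Counter

# Substrings that identify the major crawlers in the user-agent field.
BOTS = ["Googlebot", "bingbot", "Baiduspider", "YandexBot", "Slurp"]

crawl_volume = Counter()
with open("access.log") as log:          # hypothetical combined-format log file
    for line in log:
        for bot in BOTS:
            if bot in line:
                crawl_volume[bot] += 1
                break

for bot, requests in crawl_volume.most_common():
    print(f"{bot}: {requests} requests")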

Response code errors

Moz has a great primer on the meanings of the different status codes. I have an alert system set up that tells me about 4XX and 5XX errors immediately because those are very significant.

Temporary redirects

Temporary 302 redirects do not pass along the “link juice” of external links from the old URL to the new one. Almost all of the time, they should be changed to permanent 301 redirects.

Crawl budget waste

Google assigns a crawl budget to each website based on numerous factors. If your crawl budget is, say, 100 pages per day (or the equivalent amount of data), then you want to be sure that all 100 are things that you want to appear in the SERPs. No matter what you write in your robots.txt file and meta-robots tags, you might still be wasting your crawl budget on advertising landing pages, internal scripts, and more. The logs will tell you — I’ve outlined two script-based examples in red above.

If you hit your crawl limit but still have new content that should be indexed to appear in search results, Google may abandon your site before finding it.

Duplicate URL crawling

The addition of URL parameters — typically used in tracking for marketing purposes — often results in search engines wasting crawl budgets by crawling different URLs with the same content. To learn how to address this issue, I recommend reading the resources on Google and Search Engine Land here, here, here, and here.
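
One quick way to gauge how much of this is happening is to group the URLs a crawler requests by path, ignoring the query string; paths with many parameter variations are the first candidates for canonicalization or parameter handling. A minimal sketch, again assuming a combined-format log with a hypothetical file name:

from collections import defaultdict
from urllib.parse import urlsplit

param_variants = defaultdict(set)

with open("access.log") as log:               # hypothetical log file
    for line in log:
        if "Googlebot" not in line:
            continue
        try:
            request = line.split('"')[1]      # e.g. 'GET /tvs?utm_source=x HTTP/1.1'
            url = request.split()[1]
        except IndexError:
            continue
        parts = urlsplit(url)
        if parts.query:
            param_variants[parts.path].add(parts.query)

# Paths crawled under the most distinct query strings are the likeliest budget-wasters.
worst = sorted(param_variants.items(), key=lambda item: len(item[1]), reverse=True)[:20]
for path, queries in worst:
    print(f"{len(queries):5d} parameter variations  {path}")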

Crawl priority

Google might be ignoring (and not crawling or indexing) a crucial page or section of your website. The logs will reveal what URLs and/or directories are getting the most and least attention. If, for example, you have published an e-book that attempts to rank for targeted search queries but it sits in a directory that Google only visits once every six months, then you won’t get any organic search traffic from the e-book for up to six months.

If a part of your website is not being crawled very often — and it is updated often enough that it should be — then you might need to check your internal-linking structure and the crawl-priority settings in your XML sitemap.

Last crawl date

Have you uploaded something that you hope will be indexed quickly? The log files will tell you when Google has crawled it.
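
For example, to see when Googlebot last requested a particular URL, you can scan the log for the most recent matching line. A rough sketch, with a hypothetical URL and log file name:

# Find the most recent Googlebot request for a specific URL (both values are placeholders).
target = "/blog/new-post/"
last_crawled = None

with open("access.log") as log:
    for line in log:
        if "Googlebot" in line and f"GET {target} " in line:
            # The timestamp sits between the square brackets.
            last_crawled = line.split("[", 1)[1].split("]", 1)[0]

print(last_crawled or "Not crawled yet (in this log file)")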

Crawl budget

One thing I personally like to check and see is Googlebot’s real-time activity on our site because the crawl budget that the search engine assigns to a website is a rough indicator — a very rough one — of how much it “likes” your site. Google ideally does not want to waste valuable crawling time on a bad website. Here, I had seen that Googlebot had made 154 requests of our new startup’s website over the prior twenty-four hours. Hopefully, that number will go up!

As I hope you can see, log analysis is critically important in technical SEO. It’s eleven o’clock — do you know where your logs are now?

Additional resources


I Can’t Drive 155: Meta Descriptions in 2015

Posted by Dr-Pete

For years now, we (and many others) have been recommending keeping your Meta Descriptions shorter than about 155-160 characters. For months, people have been sending me examples of search snippets that clearly broke that rule, like this one (on a search for “hummingbird food”):

For the record, this one clocks in at 317 characters (counting spaces). So, I set out to discover if these long descriptions were exceptions to the rule, or if we need to change the rules. I collected the search snippets across the MozCast 10K, which resulted in 92,669 snippets. All of the data in this post was collected on April 13, 2015.

The Basic Data

The minimum snippet length was zero characters. There were 69 zero-length snippets, but most of these were the new generation of answer box that appears organic but doesn’t have a snippet. To put it another way, these were misidentified as organic by my code. The other zero-length snippets were local one-boxes that appeared as organic but had no snippet, such as this one for “chichen itza”:

These zero-length snippets were removed from further analysis, but considering that they only accounted for 0.07% of the total data, they didn’t really impact the conclusions either way. The shortest legitimate, non-zero snippet was 7 characters long, on a search for “geek and sundry”, and appears to have come directly from the site’s meta description:

The maximum snippet length that day (this is a highly dynamic situation) was 372 characters. The winner appeared on a search for “benefits of apple cider vinegar”:

The average length of all of the snippets in our data set (not counting zero-length snippets) was 143.5 characters, and the median length was 152 characters. Of course, this can be misleading, since some snippets are shorter than the limit and others are being artificially truncated by Google. So, let’s dig a bit deeper.

The Bigger Picture

To get a better idea of the big picture, let’s take a look at the display length of all 92,600 snippets (with non-zero length), split into 20-character buckets (0-20, 21-40, etc.):

Most of the snippets (62.1%) cut off as expected, right in the 141-160 character bucket. Of course, some snippets were shorter than that, and didn’t need to be cut off, and some broke the rules. About 1% (1,010) of the snippets in our data set measured 200 or more characters. That’s not a huge number, but it’s enough to take seriously.

That 141-160 character bucket is dwarfing everything else, so let’s zoom in a bit on the cut-off range, and just look at snippets in the 120-200 character range (in this case, by 5-character bins):

Zooming in, the bulk of the snippets are displaying at lengths between about 146-165 characters. There are plenty of exceptions to the 155-160 character guideline, but for the most part, they do seem to be exceptions.

Finally, let’s zoom in on the rule-breakers. This is the distribution of snippets displaying 191+ characters, bucketed in 10-character bins (191-200, 201-210, etc.):

Please note that the Y-axis scale is much smaller than in the previous 2 graphs, but there is a pretty solid spread, with a decent chunk of snippets displaying more than 300 characters.

Without looking at every original meta description tag, it’s very difficult to tell exactly how many snippets have been truncated by Google, but we do have a proxy. Snippets that have been truncated end in an ellipsis (…), which rarely appears at the end of a natural description. In this data set, more than half of all snippets (52.8%) ended in an ellipsis, so we’re still seeing a lot of meta descriptions being cut off.
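
If you want to run the same check on your own snippet data, the ellipsis test is simple. A minimal Python sketch with made-up snippet text rather than the MozCast data set:

# Estimate how many snippets were truncated, using a trailing ellipsis as a proxy.
snippets = [
    "A short description that fit within the limit just fine.",
    "A longer description that ran past the limit and was cut off mid-sentence by Google ...",
    "Another snippet ending naturally, with no truncation.",
]

truncated = sum(1 for s in snippets if s.rstrip().endswith(("...", "…")))
print(f"{truncated} of {len(snippets)} snippets ({truncated / len(snippets):.1%}) end in an ellipsis")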

I should add that, unlike titles/headlines, it isn’t clear whether Google is cutting off snippets by pixel width or character count, since that cut-off is done on the server-side. In most cases, Google will cut before the end of the second line, but sometimes they cut well before this, which could suggest a character-based limit. They also cut off at whole words, which can make the numbers a bit tougher to interpret.

The Cutting Room Floor

There’s another difficulty with telling exactly how many meta descriptions Google has modified – some edits are minor, and some are major. One minor edit is when Google adds some additional information to a snippet, such as a date at the beginning. Here’s an example (from a search for “chicken pox”):

With the date (and minus the ellipsis), this snippet is 164 characters long, which suggests Google isn’t counting the added text against the length limit. What’s interesting is that the rest comes directly from the meta description on the site, except that the site’s description starts with “Chickenpox.” and Google has removed that keyword. As a human, I’d say this matches the meta description, but a bot has a very hard time telling a minor edit from a complete rewrite.

Another minor rewrite occurs in snippets that start with search result counts:

Here, we’re at 172 characters (with spaces and minus the ellipsis), and Google has even let this snippet roll over to a third line. So, again, it seems like the added information at the beginning isn’t counting against the length limit.

All told, 11.6% of the snippets in our data set had some kind of Google-generated data, so this type of minor rewrite is pretty common. Even if Google honors most of your meta description, you may see small edits.

Let’s look at our big winner, the 372-character description. Here’s what we saw in the snippet:

Jan 26, 2015 – Health• Diabetes Prevention: Multiple studies have shown a correlation between apple cider vinegar and lower blood sugar levels. … • Weight Loss: Consuming apple cider vinegar can help you feel more full, which can help you eat less. … • Lower Cholesterol: … • Detox: … • Digestive Aid: … • Itchy or Sunburned Skin: … • Energy Boost:1 more items

So, what about the meta description? Here’s what we actually see in the tag:

Were you aware of all the uses of apple cider vinegar? From cleansing to healing, to preventing diabetes, ACV is a pantry staple you need in your home.

That’s a bit more than just a couple of edits. So, what’s happening here? Well, there’s a clue on that same page, where we see yet another rule-breaking snippet:

You might be wondering why this snippet is any more interesting than the other one. If you could see the top of the SERP, you’d know why, because it looks something like this:

Google is automatically extracting list-style data from these pages to fuel the expansion of the Knowledge Graph. In one case, that data is replacing a snippet and going directly into an answer box, but they’re performing the same translation even for some other snippets on the page.

So, does every 2nd-generation answer box yield long snippets? After 3 hours of inadvisable MySQL queries, I can tell you that the answer is a resounding “probably not”. You can have 2nd-gen answer boxes without long snippets and you can have long snippets without 2nd-gen answer boxes, but there does appear to be a connection between long snippets and the Knowledge Graph in some cases.

One interesting connection is that Google has begun bolding keywords that seem like answers to the query (and not just synonyms for the query). Below is an example from a search for “mono symptoms”. There’s an answer box for this query, but the snippet below is not from the site in the answer box:

Notice the bolded words – “fatigue”, “sore throat”, “fever”, “headache”, “rash”. These aren’t synonyms for the search phrase; these are actual symptoms of mono. This data isn’t coming from the meta description, but from a bulleted list on the target page. Again, it appears that Google is trying to use the snippet to answer a question, and has gone well beyond just matching keywords.

Just for fun, let’s look at one more, where there’s no clear connection to the Knowledge Graph. Here’s a snippet from a search for “sons of anarchy season 4”:

This page has no answer box, and the information extracted is odd at best. The snippet bears little or no resemblance to the site’s meta description. The number string at the beginning comes out of a rating widget, and some of the text isn’t even clearly available on the page. This seems to be an example of Google acknowledging IMDb as a high-authority site and desperately trying to match any text they can to the query, resulting in a Frankenstein’s snippet.

The Final Verdict

If all of this seems confusing, that’s probably because it is. Google is taking a lot more liberties with snippets these days, whether to better match queries, to add details they feel are important, or to help build and support the Knowledge Graph.

So, let’s get back to the original question – is it time to revise the 155(ish) character guideline? My gut feeling is: not yet. To begin with, the vast majority of snippets are still falling in that 145-165 character range. In addition, the exceptions to the rule are not only atypical situations, but in most cases those long snippets don’t seem to represent the original meta description. In other words, even if Google does grant you extra characters, they probably won’t be the extra characters you asked for in the first place.

Many people have asked: “How do I make sure that Google shows my meta description as is?” I’m afraid the answer is: “You don’t.” If this is very important to you, I would recommend keeping your description below the 155-character limit, and making sure that it’s a good match to your target keyword concepts. I suspect Google is going to take more liberties with snippets over time, and we’re going to have to let go of our obsession with having total control over the SERPs.


​The 3 Most Common SEO Problems on Listings Sites

Posted by Dom-Woodman

Listings sites have a very specific set of search problems that you don’t run into everywhere else. By day I’m one of Distilled’s analysts, but by night I run a job listings site, teflSearch. So, for my first Moz Blog post I thought I’d cover the three search problems with listings sites that I spent far too long agonising about.

Quick clarification time: What is a listings site (i.e. will this post be useful for you)?

The classic listings site is Craigslist, but plenty of other sites act like listing sites:

  • Job sites like Monster
  • E-commerce sites like Amazon
  • Matching sites like Spareroom

1. Generating quality landing pages

The landing pages on listings sites are incredibly important. These pages are usually the primary drivers of converting traffic, and they’re usually generated automatically (or are occasionally custom category pages).

For example, if I search “Jobs in Manchester”, you can see nearly every result is an automatically generated landing page or category page.

There are three common ways to generate these pages (occasionally a combination of more than one is used):

  • Faceted pages: These are generated by facets—groups of preset filters that let you filter the current search results. They usually sit on the left-hand side of the page.
  • Category pages: These pages are listings which have already had a filter applied and can’t be changed. They’re usually custom pages.
  • Free-text search pages: These pages are generated by a free-text search box.

Those definitions are still a bit general; let’s clear them up with some examples:

Amazon uses a combination of categories and facets. If you click on browse by department you can see all the category pages. Then on each category page you can see a faceted search. Amazon is so large that it needs both.

Indeed generates its landing pages through free text search, for example if we search for “IT jobs in manchester” it will generate: IT jobs in manchester.

teflSearch generates landing pages using just facets. The jobs in China landing page is simply a facet of the main search page.

Each method has its own search problems when used for generating landing pages, so let’s tackle them one by one.


Aside

Facets and free text search will typically generate pages with parameters e.g. a search for “dogs” would produce:

www.mysite.com?search=dogs

But to make the URL user-friendly, sites will often alter the URLs to display them as folders:

www.mysite.com/results/dogs/

These are still just ordinary free-text searches and facets; the URLs are just user-friendly. (They’re a lot easier to work with in robots.txt too!)


Free search (& category) problems

If you’ve decided the base of your search will be a free text search, then we’ll have two major goals:

  • Goal 1: Helping search engines find your landing pages
  • Goal 2: Giving them link equity.

Solution

Search engines won’t use search boxes and so the solution to both problems is to provide links to the valuable landing pages so search engines can find them.

There are plenty of ways to do this, but two of the most common are:

  • Category links alongside a search

    Photobucket uses a free text search to generate pages, but if we look at an example search for photos of dogs, we can see the categories which define the landing pages along the right-hand side. (This is also an example of URL-friendly searches!)

  • Putting the main landing pages in a top-level menu

    Indeed also uses free text to generate landing pages, and they have a browse jobs section which contains the URL structure to allow search engines to find all the valuable landing pages.

Breadcrumbs are also often used in addition to the two above and in both the examples above, you’ll find breadcrumbs that reinforce that hierarchy.

Category (& facet) problems

Categories, because they tend to be custom pages, don’t actually have many search disadvantages. Instead it’s the other attributes that make them more or less desirable. You can create them for the purposes you want and so you typically won’t have too many problems.

However, if you also use a faceted search in each category (like Amazon) to generate additional landing pages, then you’ll run into all the problems described in the next section.

At first, facets seem great: an easy way to generate multiple strong, relevant landing pages without doing much at all. The problems appear because people don’t put limits on facets.

Let’s take the job page on teflSearch. We can see it has 18 facets, each with many options. Some of these options will generate useful landing pages:

The China facet in countries will generate “Jobs in China”, and that’s a useful landing page.

On the other hand, the “Conditional Bonus” facet will generate “Jobs with a conditional bonus,” and that’s not so great.

We can also see that the options within a single facet aren’t always useful. As of writing, I have a single job available in Serbia. That’s not a useful search result, and the poor user engagement combined with the tiny amount of content will be a strong signal to Google that it’s thin content. Depending on the scale of your site it’s very easy to generate a mass of poor-quality landing pages.

Facets generate other problems too. The primary one being they can create a huge amount of duplicate content and pages for search engines to get lost in. This is caused by two things: The first is the sheer number of possibilities they generate, and the second is because selecting facets in different orders creates identical pages with different URLs.

We end up with four goals for our facet-generated landing pages:

  • Goal 1: Make sure our searchable landing pages are actually worth landing on, and that we’re not handing a mass of low-value pages to the search engines.
  • Goal 2: Make sure we don’t generate multiple copies of our automatically generated landing pages.
  • Goal 3: Make sure search engines don’t get caught in the metaphorical plastic six-pack rings of our facets.
  • Goal 4: Make sure our landing pages have strong internal linking.

The first goal needs to be set internally; you’re always going to be the best judge of the number of results that need to be present on a page in order for it to be useful to a user. I’d argue you can rarely ever go below three, but it depends both on your business and on how much content fluctuates on your site, as the useful landing pages might also change over time.

We can solve the next three problems as a group. There are several possible solutions depending on what skills and resources you have access to; here are two possible solutions:

Category/facet solution 1: Blocking the majority of facets and providing external links
  • Easiest method
  • Good if your valuable category pages rarely change and you don’t have too many of them.
  • Can be problematic if your valuable facet pages change a lot

Nofollow all your facet links, and noindex and block (via robots.txt) category pages which aren’t valuable or are deeper than x facet/folder levels into your search.

You set x by looking at where your useful facet pages that have search volume exist. So, for example, if you have three facets for televisions: manufacturer, size, and resolution, and even combinations of all three have multiple results and search volume, then you could index everything up to three levels.

On the other hand, if people are searching for three levels (e.g. “Samsung 42″ Full HD TV”) but you only have one or two results for three-level facets, then you’d be better off indexing two levels and letting the product pages themselves pick up long-tail traffic for the third level.

If you have valuable facet pages that exist deeper than one facet or folder into your search, then this creates some duplicate content problems, which are dealt with in the aside “Indexing more than one level of facets” below.

The immediate problem with this set-up, however, is that in one stroke we’ve removed most of the internal links to our category pages, and by no-following all the facet links, search engines won’t be able to find your valuable category pages.

In order to re-create the linking, you can add a top-level drop-down menu to your site containing the most valuable category pages, add category links elsewhere on the page, or create a separate part of the site with links to the valuable category pages.

You can see the top-level drop-down menu on teflSearch (it’s the search jobs menu); the other two examples are demonstrated by Photobucket and Indeed, respectively, in the previous section.

The big advantage of this method is how quick it is to implement: it doesn’t require any fiddly internal logic, and adding an extra menu option is usually minimal effort.

Category/facet solution 2: Creating internal logic to work with the facets

  • Requires new internal logic
  • Works for large numbers of category pages with value that can change rapidly

There are four parts to the second solution:

  1. Select valuable facet categories and allow those links to be followed. No-follow the rest.
  2. No-index all pages that return a number of items below the threshold for a useful landing page
  3. No-follow all facets on pages with a search depth greater than x.
  4. Block all facet pages deeper than x level in robots.txt

As with the last solution, x is set by looking at where your useful facet pages exist that have search volume (full explanation in the first solution), and if you’re indexing more than one level you’ll need to check out the aside below to see how to deal with the duplicate content it generates.
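
To make that logic concrete, here is a rough Python sketch of how the meta robots value for a facet-generated page might be decided at render time. The result threshold, depth limit, and set of valuable facets are placeholders you would set from your own research, and the robots.txt blocking in step 4 still has to be handled separately.

# Hypothetical values -- set these from your own keyword and volume research.
MIN_RESULTS = 3          # below this, the page is thin content
MAX_FOLLOW_DEPTH = 2     # deepest facet level worth letting crawlers follow

VALUABLE_FACETS = {"country", "city", "job_type"}

def robots_directives(selected_facets, result_count):
    """Return the meta robots value for a facet-generated landing page.

    selected_facets: facet names already applied, in order (e.g. ["country", "city"])
    result_count:    number of listings the page would show
    """
    depth = len(selected_facets)
    index = result_count >= MIN_RESULTS and all(f in VALUABLE_FACETS for f in selected_facets)
    follow = depth < MAX_FOLLOW_DEPTH
    return f"{'index' if index else 'noindex'}, {'follow' if follow else 'nofollow'}"

# A "Jobs in China" page with plenty of results stays indexable,
# while a deep, thin facet combination does not.
print(robots_directives(["country"], 120))                  # index, follow
print(robots_directives(["country", "city", "bonus"], 1))   # noindex, nofollow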


Aside: Indexing more than one level of facets

If you want more than one level of facets to be indexable, then this will create certain problems.

Suppose you have a facet for size:

  • Televisions: Size: 46″, 44″, 42″

And want to add a brand facet:

  • Televisions: Brand: Samsung, Panasonic, Sony

This will create duplicate content because the search engines will be able to follow your facets in both orders, generating:

  • Television – 46″ – Samsung
  • Television – Samsung – 46″

You’ll have to either rel canonical your duplicate pages with another rule or set up your facets so they create a single unique URL.

You also need to be aware that each followable facet you add will multiply with each other followable facet, and it’s very easy to generate a mass of pages for search engines to get stuck in. Depending on your setup you might need to block more paths in robots.txt or set up more logic to prevent them being followed.

Letting search engines index more than one level of facets adds a lot of possible problems; make sure you’re keeping track of them.


2. User-generated content cannibalization

This is a common problem for listings sites (assuming they allow user-generated content). If you’re reading this as an e-commerce site that only lists its own products, you can skip this one.

As we covered in the first area, category pages on listings sites are usually the landing pages aiming for the valuable search terms, but as your users start generating pages they can often create titles and content that cannibalise your landing pages.

Suppose you’re a job site with a category page for PHP Jobs in Greater Manchester. If a recruiter then creates a job advert for PHP Jobs in Greater Manchester for the 4 positions they currently have, you’ve got a duplicate content problem.

This is less of a problem when your site is large and your categories are mature; it will be obvious to any search engine which pages are your high-value category pages. But at the start, when you’re lacking authority and individual listings might contain more relevant content than your own search pages, this can be a problem.

Solution 1: Create structured titles

Set the <title> differently than the on-page title. Depending on the variables you have available, you can set the title tag programmatically, without changing the page title, using other information given by the user.

For example, on our imaginary job site, suppose the recruiter also provided the following information in other fields:

  • The no. of positions: 4
  • The primary area: PHP Developer
  • The name of the recruiting company: ABC Recruitment
  • Location: Manchester

We could set the <title> pattern to be: *No of positions* *The primary area* with *recruiter name* in *Location* which would give us:

4 PHP Developers with ABC Recruitment in Manchester
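
Building that title from the structured fields is straightforward. Here is a minimal Python sketch using the made-up recruiter data above (the field names are hypothetical):

# Build a structured <title> from the fields the recruiter filled in.
listing = {
    "positions": 4,
    "primary_area": "PHP Developer",
    "recruiter": "ABC Recruitment",
    "location": "Manchester",
}

def listing_title(data):
    # Pluralise the role name when there is more than one opening.
    role = data["primary_area"] + ("s" if data["positions"] > 1 else "")
    return f"{data['positions']} {role} with {data['recruiter']} in {data['location']}"

print(listing_title(listing))   # 4 PHP Developers with ABC Recruitment in Manchester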

Setting a <title> tag allows you to target long-tail traffic by constructing detailed descriptive titles. In our above example, imagine the recruiter had specified “Castlefield, Manchester” as the location.

All of a sudden, you’ve got a perfect opportunity to pick up long-tail traffic for people searching in Castlefield in Manchester.

On the downside, you lose the ability to pick up long-tail traffic where your users have chosen keywords you wouldn’t have used.

For example, suppose Manchester has a jobs program called “Green Highway.” A job advert title containing “Green Highway” might pick up valuable long-tail traffic. Being able to discover this, however, and find a way to fit it into a dynamic title is very hard.

Solution 2: Use regex to noindex the offending pages

Perform a regex (or string contains) search on your listings titles and no-index the ones which cannibalise your main category pages.

If it’s not possible to construct titles with variables, or your users provide a lot of additional long-tail traffic with their own titles, then this is a great option. On the downside, you miss out on possible structured long-tail traffic that you might’ve been able to aim for.
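
In practice the check can be as simple as matching each listing title against the category page names you want to protect. A rough Python sketch with hypothetical category names:

import re

# Category landing pages whose rankings you want to protect (hypothetical examples).
protected_categories = ["PHP Jobs in Greater Manchester", "Python Jobs in Leeds"]
pattern = re.compile("|".join(re.escape(c) for c in protected_categories), re.IGNORECASE)

def should_noindex(listing_title):
    """Noindex any listing whose title duplicates a protected category page."""
    return bool(pattern.search(listing_title))

print(should_noindex("PHP Jobs in Greater Manchester - 4 positions at ABC Recruitment"))  # True
print(should_noindex("Senior PHP Developer, Castlefield, Manchester"))                    # False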

Solution 3: De-index all your listings

It may seem rash, but if you’re a large site with a huge number of very similar or low-content listings, you might want to consider this. There’s no common standard here: some sites like Indeed choose to no-index all their job adverts, whereas others like Craigslist index all their individual listings because they’ll drive long-tail traffic.

Don’t de-index them all lightly!

3. Constantly expiring content

Our third and final problem is that user-generated content doesn’t last forever. Particularly on listings sites, it’s constantly expiring and changing.

For most use cases I’d recommend 301’ing expired content to a relevant category page, with a message triggered by the redirect notifying the user of why they’ve been redirected. It typically comes out as the best combination of search and UX.
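
For example, a bare-bones version of that redirect-plus-message pattern might look like this in Flask (the routes, the expired flag, and the notice text are illustrative assumptions, not a recommended implementation):

    from flask import Flask, redirect, request

    app = Flask(__name__)

    # Hypothetical lookup: listing id -> (is_expired, category_path)
    LISTINGS = {"php-job-123": (True, "/jobs/php/greater-manchester")}

    @app.route("/listing/<listing_id>")
    def listing(listing_id):
        expired, category_path = LISTINGS.get(listing_id, (False, "/"))
        if expired:
            # Permanent redirect to the category, flagging why in the URL
            # so the category template can explain the redirect to the user.
            return redirect(f"{category_path}?expired=1", code=301)
        return f"Listing {listing_id}"

    @app.route("/jobs/php/greater-manchester")
    def category():
        notice = "The listing you followed has expired. " if request.args.get("expired") else ""
        return notice + "Here are the current PHP jobs in Greater Manchester."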

For more information or advice on how to deal with the edge cases, there’s a previous Moz blog post on how to deal with expired content which I think does an excellent job of covering this area.

Summary

In summary, if you’re working with listings sites, all three of the following need to be kept in mind:

  • How are the landing pages generated? If they’re generated using free text or facets, have the potential problems been solved?
  • Is user generated content cannibalising the main landing pages?
  • How has constantly expiring content been dealt with?

Good luck listing, and if you’ve had any other tricky problems or solutions you’ve come across working on listings sites, let’s chat about them in the comments below!


Reblogged 4 years ago from tracking.feedpress.it

Image Forward – Review Monitoring

Review Monitoring Service. Image Forward will automatically monitor the internet for reviews of your company and send you alerts to track, investigate and ac…

Reblogged 4 years ago from www.youtube.com

Inbound Lead Generation: eCommerce Marketing’s Missing Link

Posted by Everett

If eCommerce businesses hope to remain competitive with Amazon, eBay, big box brands, and other online retail juggernauts, they’ll need to learn how to conduct content marketing, lead generation, and contact nurturing as part of a comprehensive inbound marketing strategy.

First, I will discuss some of the ways most online retailers are approaching email from the bottom of the funnel upward, and why this needs to be turned around. Then we can explore how to go about doing this within the framework of “inbound marketing” for eCommerce businesses. Lastly, I’ll look at popular marketing automation and email marketing solutions in the context of inbound marketing for eCommerce.

Key differences between eCommerce and lead generation approaches to email

Different list growth strategies

Email acquisition sources differ greatly between lead gen. sites and online stores. The biggest driver of email acquisition for most eCommerce businesses is their shoppers, especially when the business doesn’t collect an email address for its contact database until the shopper provides it during the check-out process—possibly not until the very end.

With most B2B/B2C lead gen. websites, the entire purpose of every landing page is to get visitors to submit a contact form or pick up the phone. Often, the price tag for their products or services is much higher than those of an eCommerce site or involves recurring payments. In other words, what they’re selling is more difficult to sell. People take longer to make those purchasing decisions. For this reason, leads—in the form of contact names and email addresses—are typically acquired and nurtured without having first become a customer.

Contacts vs. leads

Whether it is a B2B or B2C website, lead gen. contacts (called leads) are thought of as potential customers (clients, subscribers, patients) who need to be nurtured to the point of becoming “sales qualified,” meaning they’ll eventually get a sales call or email that attempts to convert them into a customer.

On the other hand, eCommerce contacts are often thought of primarily as existing customers to whom the marketing team can blast coupons and other offers by email.

Retail sites typically don’t capture leads at the top or middle of the funnel. Only once a shopper has checked out do they get added to the list. Historically, the buying cycle has been short enough that eCommerce sites could move many first-time visitors directly to customers in a single visit.
But this has changed.

Unless your brand is very strong—possibly a luxury brand or one with an offline retail presence—it is probably getting more difficult (i.e. expensive) to acquire new customers. At the same time, attrition rates are rising. Conversion optimization helps by converting more bottom of the funnel visitors. SEO helps drive more traffic into the site, but mostly for middle-of-funnel (category page) and bottom-of-funnel (product page) visitors who may not also be price/feature comparison shopping, or are unable to convert right away because of device or time limitations.

Even savvy retailers publishing content for shoppers higher up in the funnel, such as buyer guides and reviews, aren’t getting an email address and are missing a lot of opportunities because of it.

attract-convert-grow-funnel-inflow-2.jpg

Here’s a thought. If your eCommerce site has a 10 percent conversion rate, you’re doing pretty good by most standards. But what happened to the other 90 percent of those visitors? Will you have the opportunity to connect with them again? Even if you bump that up a few percentage points with retargeting, a lot of potential revenue has seeped out of your funnel without a trace.

I don’t mean to bash the eCommerce marketing community with generalizations. Most lead gen. sites aren’t doing anything spectacular either, and a lot of opportunity is missed all around.

There are many eCommerce brands doing great things marketing-wise. I’m a big fan of Crutchfield for their educational resources targeting early-funnel traffic, and Neman Tools, Saddleback Leather and Feltraiger for the stories they tell. Amazon is hard to beat when it comes to scalability, product suggestions and user-generated reviews.

Sadly, most eCommerce sites (including many of the major household brands) still approach marketing in this way…

The ol’ bait n’ switch: promising value and delivering spam

Established eCommerce brands have gigantic mailing lists (compared with their lead gen. counterparts), to whom they typically send out at least one email each week with “offers” like free shipping, $ off, buy-one-get-one, or % off their next purchase. The lists are minimally segmented, if at all. At best, there might be lists for repeat customers, best customers, unresponsive contacts, recent purchasers, shoppers with abandoned carts, purchases by category, etc.

The missing points of segmentation include which campaign resulted in the initial contact (sometimes referred to as a cohort) and—most importantly—the persona and buying cycle stage that best applies to each contact.

Online retailers often send frequent “blasts” to their entire list or to a few of the large segments mentioned above. Lack of segmentation means contacts aren’t receiving emails based on their interests, problems, or buying cycle stage, but instead, are receiving what they perceive as “generic” emails.
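
To make the missing segmentation concrete, here is a tiny sketch of what persona-, stage-, and cohort-aware contact records might look like. The field names and values are invented purely for illustration; any real marketing automation system models this its own way:

    from dataclasses import dataclass

    @dataclass
    class Contact:
        email: str
        persona: str   # e.g. an assumed label like "AT Hiker"
        stage: str     # "awareness", "consideration", or "customer"
        cohort: str    # the campaign that first captured the address

    contacts = [
        Contact("a@example.com", "AT Hiker", "consideration", "trail-guide-ebook"),
        Contact("b@example.com", "Day Tripper", "customer", "spring-sale"),
    ]

    def segment(contacts, persona, stage):
        """Pull one hyper-segment instead of blasting the whole list."""
        return [c for c in contacts if c.persona == persona and c.stage == stage]

    print(segment(contacts, "AT Hiker", "consideration"))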

The result of these missing segments and the lack of overarching strategy looks something like this:

My, What a Big LIST You Have!

iStock_000017047747Medium.jpg

TIME reported in 2012 on stats from Responsys that the average online retailer sent out between five and six emails the week after Thanksgiving. Around the same time, the Wall Street Journal reported that the top 100 online retailers sent an average of 177 emails apiece to each of their contacts in 2011. Averaged out, that’s somewhere between three and four emails each week that the contact is receiving from these retailers.

The better to SPAM you with!

iStock_000016088853Medium.jpg

A 2014 whitepaper from SimpleRelevance titled Email Fail: An In-Depth Evaluation of Top 20 Internet Retailer’s Email Personalization Capabilities (PDF) found that, while 70 percent of marketing executives believed personalization was of “utmost importance” to their business…

“Only 17 percent of marketing leaders are going beyond basic transactional data to deliver personalized messages to consumers.”

Speaking of email overload, the same report found that some major online retailers sent ten or more emails per week!

simplerelevance-email-report-frequency.png

The result?

All too often, the eCommerce business will carry around big, dead lists of contacts who don’t even bother reading their emails anymore. They end up scrambling toward other channels to “drive more demand,” but because the real problems were never addressed, this ends up increasing new customer acquisition costs.

The cycle looks something like this:

  1. Spend a fortune driving in unqualified traffic from top-of-the-funnel channels
  2. Ignore the majority of those visitors who aren’t ready to purchase
  3. Capture email addresses only for the few visitors who made a purchase
  4. Spam the hell out of those people until they unsubscribe
  5. Spend a bunch more money trying to fill the top of the funnel with even more traffic

It’s like trying to fill your funnel with a bucket full of holes, some of them patched with band-aids.

The real problems

  1. Lack of a cohesive strategy across marketing channels
  2. Lack of a cohesive content strategy throughout all stages of the buying cycle
  3. Lack of persona, buying cycle stage, and cohort-based list segmentation to nurture contacts
  4. Lack of tracking across customer touchpoints and devices
  5. Lack of gated content that provides enough value to early-funnel visitors to get them to provide their email address

So, what’s the answer?

Inbound marketing allows online retailers to stop competing with Amazon and other “price focused” competitors with leaky funnels, and to instead focus on:

  1. Persona-based content marketing campaigns designed to acquire email addresses from high-quality leads (potential customers) by offering them the right content for each stage in their buyer’s journey
  2. A robust marketing automation system that makes true personalization scalable
  3. Automated contact nurturing emails triggered by certain events, such as viewing specific content, abandoning their shopping cart, adding items to their wish list or performing micro-conversions like downloading a look book
  4. Intelligent SMM campaigns that match visitors and customers with social accounts by email addresses, interests and demographics—as well as social monitoring
  5. Hyper-segmented email contact lists to support the marketing automation described above, as well as to provide highly-customized email and shopping experiences
  6. Cross-channel, closed loop reporting to provide a complete “omnichannel” view of online marketing efforts and how they assist offline conversions, if applicable

Each of these areas will be covered in more detail below. First, let’s take a quick step back and define what it is we’re talking about here.

Inbound marketing: a primer

A lot of people think “inbound marketing” is just a way some SEO agencies are re-cloaking themselves to avoid negative associations with search engine optimization. Others think it’s synonymous with “internet marketing.” I think it goes more like this:

Inbound marketing is to Internet marketing as SEO is to inbound marketing: One piece of a larger whole.

There are many ways to define inbound marketing. A cursory review of definitions from several trusted sources reveals some fundamental similarities:

Rand Fishkin

randfishkin.jpeg

“Inbound Marketing is the practice of earning traffic and attention for your business on the web rather than buying it or interrupting people to get it. Inbound channels include organic search, social media, community-building content, opt-in email, word of mouth, and many others. Inbound marketing is particularly powerful because it appeals to what people are looking for and what they want, rather than trying to get between them and what they’re trying to do with advertising. Inbound’s also powerful due to the flywheel-effect it creates. The more you invest in Inbound and the more success you have, the less effort required to earn additional benefit.”


Mike King

mikeking.jpeg

“Inbound Marketing is a collection of marketing activities that leverage remarkable content to penetrate earned media channels such as Organic Search, Social Media, Email, News and the Blogosphere with the goal of engaging prospects when they are specifically interested in what the brand has to offer.”

This quote is from 2012, and is still just as accurate today. It’s from an Inbound.org comment thread where you can also see many other takes on it from the likes of Ian Lurie, Jonathon Colman, and Larry Kim.


Inflow

inflow-logo.jpeg

“Inbound Marketing is a multi-channel, buyer-centric approach to online marketing that involves attracting, engaging, nurturing and converting potential customers from wherever they are in the buying cycle.”

From Inflow’s Inbound Services page.


Wikipedia

wikipedia.jpeg

“Inbound marketing refers to marketing activities that bring visitors in, rather than marketers having to go out to get prospects’ attention. Inbound marketing earns the attention of customers, makes the company easy to be found, and draws customers to the website by producing interesting content.”

From Inbound Marketing – Wikipedia.


Larry-Kim.jpeg

Larry Kim

“‘Inbound marketing’ refers to marketing activities that bring leads and customers in when they’re ready, rather than you having to go out and wave your arms to try to get people’s attention.”

Via Marketing Land in 2013. You can also read more of Larry Kim’s interpretation, along with many others, on Inbound.org.


Hubspot

“Instead of the old outbound marketing methods of buying ads, buying email lists, and praying for leads, inbound marketing focuses on creating quality content that pulls people toward your company and product, where they naturally want to be.”

Via Hubspot, a marketing automation platform for inbound marketing.

When everyone has their own definition of something, it helps to think about what they have in common, as opposed to how they differ. In the case of inbound, this includes concepts such as:

  • Pull (inbound) vs. push (interruption) marketing
  • “Earning” media coverage, search engine rankings, visitors and customers with outstanding content
  • Marketing across channels
  • Meeting potential customers where they are in their buyer’s journey

Running your first eCommerce inbound marketing campaign

Audience personas—priority no. 1

The magic happens when retailers begin to hyper-segment their list based on buyer personas and other relevant information (i.e. what they’ve downloaded, what they’ve purchased, if they abandoned their cart…). This all starts with audience research to develop personas. If you need more information on persona development, try these resources:

Once personas are developed, retailers should choose one on which to focus. A complete campaign strategy should be developed around this persona, with the aim of providing the “right value” to them at the “right time” in their buyer’s journey.

Ready to get started?

We’ve developed a quick-start guide in the form of a checklist for eCommerce marketers who want to get started with inbound marketing, which you can access below.

inbound ecommerce checklist

Hands-on experience running one campaign will teach you more about inbound marketing than a dozen articles. My advice: Just do one. You will make mistakes. Learn from them and get better each time.

Example inbound marketing campaign

Below is an example of how a hypothetical inbound marketing campaign might play out, assuming you have completed all of the steps in the checklist above. Imagine you handle marketing for an online retailer of high-end sporting goods.

AT Hiker Tommy campaign: From awareness to purchase

When segmenting visitors and customers for a “high-end sporting goods / camping retailer” based on the East Coast, you identified a segment of “Trail Hikers.” These are people with disposable income who care about high-quality gear, and will pay top dollar if they know it is tested and reliable. The top trail on their list of destinations is the Appalachian Trail (AT).

Top of the Funnel: SEO & Strategic Content Marketing

at-tommy.jpg

Tommy’s first action is to do “top of the funnel” research from search engines (one reason why SEO is still so important to a complete inbound marketing strategy).

A search for “Hiking the Appalachian Trail” turns up your article titled “What NOT to Pack When Hiking the Appalachian Trail,” which lists common items that are bulky/heavy, and highlights slimmer, lighter alternatives from your online catalog.

It also highlights the difference between cheap gear and the kind that won’t let you down on your 2,181 mile journey through the wilderness of Appalachia, something you learned was important to Tommy when developing his persona. This allows you to get the company’s value proposition of “tested, high-end, quality gear only” in front of readers very early in their buyer’s journey—important if you want to differentiate your site from all of the retailers racing Amazon to the bottom of their profit margins.

So far you have yet to make “contact” with AT Hiker Tommy. The key to “acquiring” a contact before the potential customer is ready to make a purchase is to provide something of value to that specific type of person (i.e. their persona) at that specific point in time (i.e. their buying cycle stage).

In this case, we need to provide value to AT Hiker Tommy while he is getting started on his research about hiking the Appalachian Trail. He has an idea of what gear not to bring, as well as some lighter, higher-end options sold on your site. At this point, however, he is not ready to buy anything without researching the trail more. This is where retailers lose most of their potential customers. But not you. Not this time…

Middle of the funnel: Content offers, personalization, social & email nurturing

at-hiker-ebook.png

On the “What NOT to Pack When Hiking the Appalachian Trail” article (and probably several others), you have placed a call-to-action (CTA) in the form of a button that offers something like:

Download our Free 122-page Guide to Hiking the Appalachian Trail

This takes Tommy to a landing page showcasing some of the quotes from the book, and highlighting things like:

“We interviewed over 50 ‘thru-hikers’ who completed the AT and have curated and organized the best first-hand tips, along with our own significant research to develop a free eBook that should answer most of your questions about the trail.”

By entering their email address, potential customers agree to let you send them the free downloadable PDF guide to hiking the AT, as well as other relevant information about hiking.

An automated email is sent with a link to the downloadable PDF guide, and several other useful content links, such as “The AT Hiker’s Guide to Gear for the Appalachian Trail”—content designed to move Tommy further toward the purchase of hiking gear.

If Tommy still has not made a purchase within the next two weeks, another automated email is sent asking for feedback about the PDF guide (providing the link again), and to again provide the link to the “AT Hiker’s Guide to Gear…” along with a compelling offer just for him, perhaps “Get 20% off your first hiking gear purchase, and a free wall map of the AT!”
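
A stripped-down sketch of that trigger logic is below. The event names, the 14-day window, and the offer copy are assumptions made for illustration; in practice the marketing automation platform manages these workflows for you:

    from datetime import datetime, timedelta

    def next_nurture_email(contact):
        """Decide which automated email, if any, a contact should get next."""
        events = contact["events"]
        if "guide_downloaded" not in events:
            return None
        if "purchase" in events:
            return None  # converted; hand off to post-purchase flows
        if datetime.utcnow() - events["guide_downloaded"] >= timedelta(days=14):
            return ("Did the AT guide help? Here's 20% off your first hiking "
                    "gear purchase, plus a free wall map of the AT.")
        return "Your AT guide, plus our AT Hiker's Guide to Gear"

    tommy = {"events": {"guide_downloaded": datetime.utcnow() - timedelta(days=15)}}
    print(next_nurture_email(tommy))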

Having Tommy’s email address also allows you to hyper-target him on social channels, while also leveraging his initial visit to initiate retargeting efforts.

Bottom of the funnel: Email nurturing & strategic, segmented offers

Eventually Tommy makes a purchase, and he may or may not receive further emails related to this campaign, such as post-purchase emails for reviews, up-sells and cross-sells.

Upon checkout, Tommy checked the box to opt in to weekly promotional emails. He is now on multiple lists. Your marketing automation system will automatically update Tommy’s status from “Contact” (or lead) to “Customer,” and potentially remove or deactivate him from the marketing automation system database. This is accomplished either by the platform’s built-in integration features or with the help of integration tools like Zapier and IFTTT.

You have now nurtured Tommy from his initial research on Google all the way to his first purchase without ever having sent a spammy newsletter email full of irrelevant coupons and other offers. However, now that he is a loyal customer, Tommy finds value in these bottom-of-funnel email offers.

And this is just the start

Every inbound marketing campaign will have its own mix of appropriate channels. This post has focused mostly on email because acquiring the initial permission to contact the person is what fuels most of the other features offered by marketing automation systems, including:

  • Personalization of offers and other content on the site.
  • Knowing exactly which visitors are interacting on social media
  • Knowing where visitors and social followers are in the buying cycle and which persona best represents them, among other things.
  • Smart forms that don’t require visitors to put in the same information twice and allow you to build out more detailed profiles of them over time.
  • Blogging platforms that tie into email and marketing automation systems
  • Analytics data that isn’t blocked by Google and is tied directly to real people.
  • Closed-loop reporting that integrates with call-tracking and Google’s Data Import tool
  • Up-sell, cross-sell, and abandoned cart reclamation features

Three more things…

  1. If you can figure out a way to get Tommy to “log in” when he comes to your site, the personalization possibilities are nearly limitless.
  2. The persona above is based on a real customer segment. I named it after my friend Tommy Bailey, who actually did write the eBook Guide to Hiking the Appalachian Trail, featured in the image above.
  3. This Moz post is part of an inbound marketing campaign targeting eCommerce marketers, a segment Inflow identified while building out our own personas. Our hope, and the whole point of inbound marketing, is that it provides value to you.

Current state of the inbound marketing industry

Inbound has, for the most part, been applied to businesses in which the website objective is to generate leads for a sales team to follow up with and close the deal. An examination of various marketing automation platforms—a key component of scalable inbound marketing programs—highlights this issue.

Popular marketing automation systems

Most of the major marketing automation systems can be used very effectively as the backbone of an inbound marketing program for eCommerce businesses. However, only one of them (Silverpop) has made significant efforts to court the eCommerce market with content and out-of-the-box features. The next closest thing is HubSpot, so let’s start with those two:

Silverpop – an IBMⓇ Company

silver-pop.jpeg

Unlike the other platforms below, right out of the box Silverpop allows marketers to tap into very specific behaviors, including the items purchased or left in the cart.

You can easily segment based on metrics like the Recency, Frequency and Monetary Value (RFM) of purchases:

silverpop triggered campaigns
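
As a rough idea of what RFM segmentation involves (independent of Silverpop’s actual implementation; the order history and cutoff date below are invented for illustration):

    from datetime import date

    # Hypothetical order history per contact: list of (order_date, order_value).
    orders = {
        "a@example.com": [(date(2015, 3, 1), 120.0), (date(2015, 4, 20), 80.0)],
        "b@example.com": [(date(2014, 6, 5), 25.0)],
    }

    def rfm(history, today=date(2015, 5, 1)):
        """Recency (days since last order), Frequency (orders), Monetary (total spend)."""
        recency = (today - max(d for d, _ in history)).days
        frequency = len(history)
        monetary = sum(v for _, v in history)
        return recency, frequency, monetary

    for email, history in orders.items():
        print(email, rfm(history))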

You can automate personalized shopping cart abandonment recovery emails:

silverpop cart abandonment recovery

You can integrate with many leading brands offering complementary services, including: couponing, CRM, analytics, email deliverability enhancement, social and most major eCommerce platforms.

What you can’t do with Silverpop is blog, find pricing info on their website, get a free trial on their website or have a modern-looking user experience. Sounds like an IBMⓇ company, doesn’t it?

HubSpot

Out of all the marketing automation platforms on this list, HubSpot is the most capable of handling “inbound marketing” campaigns from start to finish. This should come as no surprise, given the phrase is credited to Brian Halligan, HubSpot’s co-founder and CEO.

While they don’t specifically cater to eCommerce marketing needs with the same gusto they give to lead gen. marketing, HubSpot does have an eCommerce landing page and a demo landing page for eCommerce leads, which suggests that their own personas include eCommerce marketers. Additionally, there is some good content on their blog written specifically for eCommerce.

HubSpot has allowed some key partners to develop plug-ins that integrate with leading eCommerce platforms. This approach works well with curation, and is not dissimilar to how Google handles Android or Apple handles their approved apps.

magento and hubspot

The Magento Connector for HubSpot, which costs $80 per month, was developed by EYEMAGiNE, a creative design firm for eCommerce websites. A similar HubSpot-approved third-party integration is on the way for Bigcommerce.

Another eCommerce integration for HubSpot is a Shopify plug-in called HubShoply, which was developed by Groove Commerce and costs $100 per month.

You can also use HubSpot’s native integration capabilities with Zapier to sync data between HubSpot and most major eCommerce SaaS vendors, including the ones above, as well as WooCommerce, Shopify, PayPal, Infusionsoft and more. However, the same could be said of some of the other marketing automation platforms, and using these third-party solutions can sometimes feel like fitting a square peg into a round hole.

HubSpot can and does handle inbound marketing for eCommerce websites. All of the features are there, or easy enough to integrate. But let’s put some pressure on them to up their eCommerce game even more. The least they can do is put an eCommerce link in the footer:

hubspot menus

Despite the lack of clear navigation to their eCommerce content, HubSpot seems to be paying more attention to the needs of eCommerce businesses than the rest of the platforms below.

Marketo

Nothing about Marketo’s in-house marketing strategy suggests “Ecommerce Director Bob” might be one of their personas. The description for each of their marketing automation packages (from Spark to Enterprise) mentions that it is “for B2B” websites.

marketo screenshot

“Driving Sales” could apply to a retail business, so I clicked on the link. Nope. Clearly, this is for lead generation.

marketo marketing automation

Passing “purchase-ready leads” over to your “sales reps” is a good example of the type of language used throughout the site.

Make no mistake, Marketo is a top-notch marketing automation platform. It’s powerful and clean, and it’s a shame they don’t launch a full-scale eCommerce version of their core product. In the meantime, there’s the Magento Integration for Marketo Plug-in developed by an agency out of Australia called Hoosh Marketing.

magento marketo integration

I’ve never used this integration, but it’s part of Marketo’s LaunchPoint directory, which I imagine is vetted, and Hoosh seems like a reputable agency.

Their pricing page is blurred and gated, which is annoying, but perhaps they’ll come on here and tell everyone how much they charge.

marketo pricing page

As with all others except Silverpop, the Marketo navigation provides no easy paths to landing pages that would appeal to “Ecommerce Director Bob.”

Pardot

This option is a SalesForce product, so—though I’ve never had the opportunity to use it—I can imagine Pardot is heavy on B2B/Sales and very light on B2C marketing for retail sites.

The hero image on their homepage says as much.

pardot tagline

pardot marketing automation

Again, no mention of eCommerce or retail, but clear navigation to lead gen and sales.

Eloqua / OMC

eloqua-logo.jpeg

Eloqua, now part of the Oracle Marketing Cloud (OMC), has a landing page for the retail industry, on which they proclaim:

“Retail marketers know that the path to lifelong loyalty and increased revenue goes through building and growing deep client relationships.”

Since when did retail marketers start calling customers clients?

eloqua integration

The Integration tab on OMC’s “…Retail.html” page helpfully informs eCommerce marketers that their sales teams can continue using CRM systems like SalesForce and Microsoft Dynamics but doesn’t mention anything about eCommerce platforms and other SaaS solutions for eCommerce businesses.

Others

There are many other players in this arena. Though I haven’t used them yet, three I would love to try out are SharpSpring, Hatchbuck and Act-On. But none of them appear to be any better suited to handle the concerns of eCommerce websites.

Where there’s a gap, there’s opportunity

The purpose of the section above wasn’t to highlight deficiencies in the tools themselves, but to illustrate a gap in who they are being marketed to and developed for.

So far, most of your eCommerce competitors probably aren’t using tools like these because they are not marketed to by the platforms, and don’t know how to apply the technology to online retail in a way that would justify the expense.

The thing is, a tool is just a tool

The key concepts behind inbound marketing apply just as much to online retail as they do to lead generation.

In order to “do inbound marketing,” a marketing automation system isn’t even strictly necessary (in theory). They just help make the activities scalable for most businesses.

They also bring a lot of different marketing activities under one roof, which saves time and allows data to be moved and utilized between channels and systems. For example, what a customer is doing on social could influence the emails they receive, or content they see on your site. Here are some potential uses for most of the platforms above:

Automated marketing uses

  • Personalized abandoned cart emails
  • Post-purchase nurturing/reorder marketing
  • Welcome campaigns for newsletter (or other free offer) signups
  • Winback campaigns
  • Lead-nurturing email campaigns for cohorts and persona-based segments

Content marketing uses

  • Optimized, strategic blogging platforms, and frameworks
  • Landing pages for pre-transactional/educational offers or contests
  • Social media reporting, monitoring, and publishing
  • Personalization of content and user experience

Reporting uses

  • Revenue reporting (by segment or marketing action)
  • Attribution reporting (by campaign or content)

Assuming you don’t have the budget for a marketing automation system, but already have a good email marketing platform, you can still get started with inbound marketing. Eventually, however, you may want to graduate to a dedicated marketing automation solution to reap the full benefits.

Email marketing platforms

Most of the marketing automation systems claim to replace your email marketing platform, while many email marketing platforms claim to be marketing automation systems. Neither statement is completely accurate.

Marketing automation systems, especially those created specifically for the type of “inbound” campaigns described above, provide a powerful suite of tools all in one place. On the other hand, dedicated email platforms tend to offer “email marketing” features that are better, and more robust, than those offered by marketing automation systems. Some of them are also considerably cheaper—such as MailChimp—but those are often light on even the email-specific features for eCommerce.

A different type of campaign

Email “blasts” in the form of B.O.G.O., $10 off or free shipping offers can still be very successful in generating incremental revenue boosts — especially for existing customers and seasonal campaigns.

The conversion rate on a 20% off coupon sent to existing customers, for instance, would likely pulverize the conversion rate of an email going out to middle-of-funnel contacts with a link to content (at least with how CR is currently being calculated by email platforms).

Inbound marketing campaigns can also offer quick wins, but they tend to focus mostly on non-customers after the first segmentation campaign (a campaign for the purpose of segmenting your list, such as an incentivised survey). This means lower initial conversion rates, but long-term success with the growth of new customers.

Here’s a good bet if it works with your budget: rely on a marketing automation system for inbound marketing to drive new customer acquisition from initial visit to first purchase, while using a good email marketing platform to run your “promotional email” campaigns to existing customers.

If you have to choose one or the other, I’d go with a robust marketing automation system.

Some of the most popular email platforms used by eCommerce businesses, with a focus on how they handle various Inbound Marketing activities, include:

Bronto

bronto.jpeg

This platform builds in features like abandoned cart recovery, advanced email list segmentation and automated email workflows that nurture contacts over time.

They also offer a host of eCommerce-related features that you just don’t get with marketing automation systems like HubSpot and Marketo. This includes easy integration with a variety of eCommerce platforms like ATG, Demandware, Magento, Miva Merchant, Mozu and MarketLive, not to mention apps for coupons, product recommendations, social shopping and more. Integration with enterprise eCommerce platforms is one reason why Bronto is seen over and over again when browsing the Internet Retailer Top 500 reports.

On the other hand, Bronto—like the rest of these email platforms—doesn’t have many of the features that assist with content marketing outside of emails. As an “inbound” marketing automation system, it is incomplete because it focuses almost solely on one channel: email.

Vertical Response

verticalresponse.jpeg

Another juggernaut in eCommerce email marketing platforms, Vertical Response, has even fewer inbound-related features than Bronto, though it is a good email platform with a free version that includes up to 1,000 contacts and 4,000 emails per month (i.e. 4 emails to a full list of 1,000).

Oracle Marketing Cloud (OMC)

Responsys (the email platform), like Eloqua (the marketing automation system), was gobbled up by Oracle and is now part of their “Marketing Cloud.”

It has been my experience that when a big technology firm like IBM or Oracle buys a great product, it isn’t “great” for the users. Time will tell.

Listrak

listrak.jpeg

Out of the established email platforms for eCommerce, Listrak may do the best job at positioning themselves as a full inbound marketing platform.

Listrak’s value proposition is that they’re an “Omnichannel” solution. Everything is all in one “Single, Integrated Digital Marketing Platform for Retailers.” The homepage image promises solutions for Email, Mobile, Social, Web and In-Store channels.

I haven’t had the opportunity to work with Listrak yet, but would love to hear feedback in the comments on whether they could handle the kind of persona-based content marketing and automated email nurturing campaigns described in the example campaign above.

Key takeaways

Congratulations for making it this far! Here are a few things I hope you’ll take away from this post:

  • There is a lot of opportunity right now for eCommerce sites to take advantage of marketing automation systems and robust email marketing platforms as the infrastructure to run comprehensive inbound marketing campaigns.
  • There is a lot of opportunity right now for marketing automation systems to develop content and build in eCommerce-specific features to lure eCommerce marketers.
  • Inbound marketing isn’t email marketing, although email is an important piece to inbound because it allows you to begin forming lasting relationships with potential customers much earlier in the buying cycle.
  • To see the full benefits of inbound marketing, you should focus on getting the right content to the right person at the right time in their shopping journey. This necessarily involves several different channels, including search, social and email. One of the many benefits of marketing automation systems is their ability to track your efforts here across marketing channels, devices and touch-points.

Tools, resources, and further reading

There is a lot of great content on the topic of Inbound marketing, some of which has greatly informed my own understanding and approach. Here are a few resources you may find useful as well.


Reblogged 4 years ago from tracking.feedpress.it

Spam Score: Moz’s New Metric to Measure Penalization Risk

Posted by randfish

Today, I’m very excited to announce that Moz’s Spam Score, an R&D project we’ve worked on for nearly a year, is finally going live. In this post, you can learn more about how we’re calculating spam score, what it means, and how you can potentially use it in your SEO work.

How does Spam Score work?

Over the last year, our data science team, led by Dr. Matt Peters, examined a great number of potential factors that predicted that a site might be penalized or banned by Google. We found strong correlations with 17 unique factors we call “spam flags,” and turned them into a score.

Almost every subdomain in Mozscape (our web index) now has a Spam Score attached to it, and this score is viewable inside Open Site Explorer (and soon, the MozBar and other tools). The score is simple; it just records the quantity of spam flags the subdomain triggers. Our correlations showed that no particular flag was more likely than others to mean a domain was penalized/banned in Google, but firing many flags had a very strong correlation (you can see the math below).

Spam Score currently operates only on the subdomain level—we don’t have it for pages or root domains. It’s been my experience and the experience of many other SEOs in the field that a great deal of link spam is tied to the subdomain-level. There are plenty of exceptions—manipulative links can and do live on plenty of high-quality sites—but as we’ve tested, we found that subdomain-level Spam Score was the best solution we could create at web scale. It does a solid job with the most obvious, nastiest spam, and a decent job highlighting risk in other areas, too.
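
Since the score is just a count of triggered flags, the core calculation is easy to picture. Here is a toy sketch; the flag names and values are invented, and the real detection logic for the 17 flags lives in Mozscape:

    # Hypothetical flag evaluations for one subdomain (True = flag triggered).
    flags = {
        "low_moztrust_to_mozrank": True,
        "large_site_few_links": False,
        "low_link_diversity": True,
        "thin_content": True,
        "no_contact_info": False,
        # ...the real system checks 17 flags in total.
    }

    spam_score = sum(flags.values())  # each triggered flag adds one point
    print(f"Spam Score: {spam_score} of {len(flags)} example flags triggered")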

How to access Spam Score

Right now, you can find Spam Score inside Open Site Explorer, both in the top metrics (just below domain/page authority) and in its own tab labeled “Spam Analysis.” Spam Score is only available for Pro subscribers right now, though in the future, we may make the score in the metrics section available to everyone (if you’re not a subscriber, you can check it out with a free trial).

The current Spam Analysis page includes a list of subdomains or pages linking to your site. You can toggle the target to look at all links to a given subdomain on your site, given pages, or the entire root domain. You can further toggle source tier to look at the Spam Score for incoming linking pages or subdomains (but in the case of pages, we’re still showing the Spam Score for the subdomain on which that page is hosted).

You can click on any Spam Score row and see the details about which flags were triggered. We’ll bring you to a page like this:

Back on the original Spam Analysis page, at the very bottom of the rows, you’ll find an option to export a disavow file, which is compatible with Google Webmaster Tools. You can choose to filter the file to contain only those sites with a given spam flag count or higher:

Disavow exports usually take less than 3 hours to finish. We can send you an email when it’s ready, too.

WARNING: Please do not export this file and simply upload it to Google! You can really, really hurt your site’s ranking and there may be no way to recover. Instead, carefully sort through the links therein and make sure you really do want to disavow what’s in there. You can easily remove/edit the file to take out links you feel are not spam. When Moz’s Cyrus Shepard disavowed every link to his own site, it took more than a year for his rankings to return!

We’ve actually made the file not-wholly-ready for upload to Google in order to be sure folks aren’t too cavalier with this particular step. You’ll need to open it up and make some edits (specifically to lines at the top of the file) in order to ready it for Webmaster Tools.
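
If you want to script part of that review, a minimal sketch might filter the export down to domains at or above a chosen flag count before you hand-check every line. The column names and threshold are assumptions about the export, not its documented format; only the domain: syntax is standard disavow-file markup:

    import csv

    def build_disavow(export_csv, min_flags=7, out_path="disavow.txt"):
        """Keep only domains at or above a flag-count threshold, for manual review."""
        with open(export_csv, newline="") as f, open(out_path, "w") as out:
            out.write("# Reviewed by hand before upload\n")
            for row in csv.DictReader(f):
                # Assumed columns: "domain" and "spam_flags".
                if int(row["spam_flags"]) >= min_flags:
                    out.write(f"domain:{row['domain']}\n")

    # build_disavow("ose_spam_export.csv", min_flags=9)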

In the near future, we hope to have Spam Score in the Mozbar as well, which might look like this: 

Sweet, right? 🙂

Potential use cases for Spam Analysis

This list probably isn’t exhaustive, but these are a few of the ways we’ve been playing around with the data:

  1. Checking for spammy links to your own site: Almost every site has at least a few bad links pointing to it, but until now it’s been hard to know how many potentially harmful links you might have. Run a quick spam analysis and see if there’s enough there to cause concern.
  2. Evaluating potential links: This is a big one where we think Spam Score can be helpful. It’s not going to catch every potentially bad link, and you should certainly still use your brain for evaluation too, but as you’re scanning a list of link opportunities or surfing to various sites, having the ability to see if they fire a lot of flags is a great warning sign.
  3. Link cleanup: Link cleanup projects can be messy, involved, precarious, and massively tedious. Spam Score might not catch everything, but sorting links by it can be hugely helpful in identifying potentially nasty stuff, and filtering out the links that are probably clean.
  4. Disavow Files: Again, because Spam Score won’t perfectly catch everything, you will likely need to do some additional work here (especially if the site you’re working on has done some link buying on more generally trustworthy domains), but it can save you a heap of time evaluating and listing the worst and most obvious junk.

Over time, we’re also excited about using Spam Score to help improve the PA and DA calculations (it’s not currently in there), as well as adding it to other tools and data sources. We’d love your feedback and insight about where you’d most want to see Spam Score get involved.

Details about Spam Score’s calculation

This section comes courtesy of Moz’s head of data science, Dr. Matt Peters, who created the metric and deserves (at least in my humble opinion) a big round of applause. – Rand

Definition of “spam”

Before diving into the details of the individual spam flags and their calculation, it’s important to first describe our data gathering process and “spam” definition.

For our purposes, we followed Google’s definition of spam and gathered labels for a large number of sites as follows.

  • First, we randomly selected a large number of subdomains from the Mozscape index stratified by mozRank.
  • Then we crawled the subdomains and threw out any that didn’t return a “200 OK” (redirects, errors, etc).
  • Finally, we collected the top 10 de-personalized, geo-agnostic Google-US search results using the full subdomain name as the keyword and checked whether any of those results matched the original keyword. If they did not, we called the subdomain “spam,” otherwise we called it “ham.”

We performed the most recent data collection in November 2014 (after the Penguin 3.0 update) for about 500,000 subdomains.
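
In other words, the label came from a simple heuristic: if Google won’t return a subdomain in the top 10 for a search on its own name, treat it as spam. A rough sketch of that check is below, assuming you already have the top-10 result URLs from some rank-tracking source (fetching Google results isn’t shown, and this is a simplified reading of the steps above):

    from urllib.parse import urlparse

    def label_subdomain(subdomain, top10_result_urls):
        """Return 'ham' if the subdomain appears in its own top-10 results, else 'spam'."""
        for url in top10_result_urls:
            if urlparse(url).hostname == subdomain:
                return "ham"
        return "spam"

    results = ["http://example.moz.com/about", "http://other-site.com/"]
    print(label_subdomain("example.moz.com", results))   # ham
    print(label_subdomain("spammy.example.pw", results))  # spam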

Relationship between number of flags and spam

The overall Spam Score is currently an aggregate of 17 different “flags.” You can think of each flag as a potential “warning sign” that signals that a site may be spammy. The overall likelihood of spam increases as a site accumulates more and more flags, so the total number of flags is a strong predictor of spam. Accordingly, the flags are designed to be used together—no single flag, or even a few flags, is cause for concern (and indeed most sites will trigger at least a few flags).

The following table shows the relationship between the number of flags and percent of sites with those flags that we found Google had penalized or banned:

ABOVE: The overall probability of spam vs. the number of spam flags. Data collected in Nov. 2014 for approximately 500K subdomains. The table also highlights the three overall danger levels: low/green (<10%), moderate/yellow (10-50%), and high/red (>50%).

The overall spam percent, averaged across a large number of sites, increases in lock step with the number of flags; however, there are outliers in every category. For example, there are a small number of sites with very few flags that are tagged as spam by Google and, conversely, a small number of sites with many flags that are not spam.

Spam flag details

The individual spam flags capture a wide range of spam signals: link profiles, anchor text, on-page signals and properties of the domain name. At a high level, the process to determine the spam flags for each subdomain is:

  • Collect link metrics from Mozscape (mozRank, mozTrust, number of linking domains, etc).
  • Collect anchor text metrics from Mozscape (top anchor text phrases sorted by number of links)
  • Collect the top five pages by Page Authority on the subdomain from Mozscape
  • Crawl the top five pages plus the home page and process to extract on page signals
  • Provide the output for Mozscape to include in the next index release cycle

Since the spam flags are incorporated into the Mozscape index, fresh data is released with each new index. Right now, we crawl and process the spam flags for each subdomain every two to three months, although this may change in the future.

Link flags

The following table lists the link and anchor text related flags, with the odds ratio for each flag. For each flag, we can compute two percentages: the percent of sites with that flag that are penalized by Google, and the percent of sites with that flag that were not penalized. The odds ratio is the ratio of these percentages and gives the increase in likelihood that a site is spam if it has the flag. For example, the first row says that a site with this flag is 12.4 times more likely to be spam than one without the flag.

ABOVE: Description and odds ratio of link and anchor text related spam flags. In addition to a description, it lists the odds ratio for each flag (which gives the overall increase in spam likelihood if the flag is present).
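
One common way a ratio like this is computed, comparing how often a flag appears among penalized sites versus non-penalized sites, is sketched below with made-up counts. This is an illustrative interpretation of the description above, not necessarily the exact formula Moz used:

    def flag_likelihood_ratio(flagged_penalized, total_penalized,
                              flagged_ok, total_ok):
        """How much more common the flag is among penalized sites than clean ones."""
        pct_in_penalized = flagged_penalized / total_penalized
        pct_in_ok = flagged_ok / total_ok
        return pct_in_penalized / pct_in_ok

    # Made-up counts: the flag appears on 62% of penalized sites
    # but only 5% of non-penalized ones, giving a ratio of 12.4.
    print(round(flag_likelihood_ratio(620, 1000, 500, 10000), 1))  # 12.4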

Working down the table, the flags are:

  • Low mozTrust to mozRank ratio: Sites with low mozTrust compared to mozRank are likely to be spam.
  • Large site with few links: Large sites with many pages tend to also have many links and large sites without a corresponding large number of links are likely to be spam.
  • Site link diversity is low: If a large percentage of links to a site are from a few domains it is likely to be spam.
  • Ratio of followed to nofollowed subdomains/domains (two separate flags): Sites with a large number of followed links relative to nofollowed are likely to be spam.
  • Small proportion of branded links (anchor text): Organically occurring links tend to contain a disproportionate amount of branded keywords. If a site does not have a lot of branded anchor text, it’s a signal the links are not organic.

On-page flags

Similar to the link flags, the following table lists the on page and domain name related flags:

ABOVE: Description and odds ratio of on-page and domain name related spam flags. In addition to a description, it lists the odds ratio for each flag (which gives the overall increase in spam likelihood if the flag is present).

  • Thin content: If a site has a relatively small ratio of content to navigation chrome, it’s likely to be spam.
  • Site mark-up is abnormally small: Non-spam sites tend to invest in rich user experiences with CSS, Javascript and extensive mark-up. Accordingly, a large ratio of text to mark-up is a spam signal.
  • Large number of external links: A site with a large number of external links may look spammy.
  • Low number of internal links: Real sites tend to link heavily to themselves via internal navigation and a relative lack of internal links is a spam signal.
  • Anchor text-heavy page: Sites with a lot of anchor text are more likely to be spam than those with more content and fewer links.
  • External links in navigation: Spam sites may hide external links in the sidebar or footer.
  • No contact info: Real sites prominently display their social and other contact information.
  • Low number of pages found: A site with only one or a few pages is more likely to be spam than one with many pages.
  • TLD correlated with spam domains: Certain TLDs are more spammy than others (e.g. pw).
  • Domain name length: A long subdomain name like “bycheapviagra.freeshipping.onlinepharmacy.com” may indicate keyword stuffing.
  • Domain name contains numerals: Domain names with numerals may be automatically generated and therefore spam.

If you’d like some more details on the technical aspects of the spam score, check out the video of Matt’s 2012 MozCon talk about Algorithmic Spam Detection or the slides (many of the details have evolved, but the overall ideas are the same):

We’d love your feedback

As with all metrics, Spam Score won’t be perfect. We’d love to hear your feedback and ideas for improving the score, as well as what you’d like to see from its in-product application in the future. Feel free to leave comments on this post, or to email Matt (matt at moz dot com) and me (rand at moz dot com) privately with any suggestions.

Good luck cleaning up and preventing link spam!




Reblogged 4 years ago from tracking.feedpress.it