Google Updates The Search Quality Rating Guidelines

Google has updated their search quality raters guidelines document on March 28, 2016 reducing it from 160 to 146 pages.

The post Google Updates The Search Quality Rating Guidelines appeared first on Search Engine Land.

Please visit Search Engine Land for the full article.

Reblogged 3 years ago from feeds.searchengineland.com

From Editorial Calendars to SEO: Setting Yourself Up to Create Fabulous Content

Posted by Isla_McKetta

Quick note: This article is meant to apply to teams of all sizes, from the sole proprietor who spends all night writing their copy (because they’re doing business during the day) to the copy team who occupies an entire floor and produces thousands of pieces of content per week. So if you run into a section that you feel requires more resources than you can devote just now, that’s okay. Bookmark it and revisit when you can, or scale the step down to a more appropriate size for your team. We believe all the information here is important, but that does not mean you have to do everything right now.

If you thought ideation was fun, get ready for content creation. Sure, we’ve all written some things before, but the creation phase of content marketing is where you get to watch that beloved idea start to take shape.

Before you start creating, though, you want to get (at least a little) organized, and an editorial calendar is the perfect first step.

Editorial calendars

Creativity and organization are not mutually exclusive. In fact, they can feed each other. A solid schedule gives you and your writers the time and space to be wild and creative. If you’re just starting out, this document may be sparse, but it’s no less important. Starting early with your editorial calendar also saves you from creating content willy-nilly and then finding out months later that no one ever finished that pesky (but crucial) “About” page.

There’s no wrong way to set up your editorial calendar, as long as it’s meeting your needs. Remember that an editorial calendar is a living document, and it will need to change as a hot topic comes up or an author drops out.

There are a lot of different types of documents that pass for editorial calendars. You get to pick the one that’s right for your team. The simplest version is a straight-up calendar with post titles written out on each day. You could even use a wall calendar and a Sharpie.

Monday Tuesday Wednesday Thursday Friday
Title
The Five Colors of Oscar Fashion 12 Fabrics We’re Watching for Fall Is Charmeuse the New Corduroy? Hot Right Now: Matching Your Handbag to Your Hatpin Tea-length and Other Fab Vocab You Need to Know
Author Ellie James Marta Laila Alex

Teams who are balancing content for different brands at agencies or other more complex content environments will want to add categories, author information, content type, social promo, and more to their calendars.

Truly complex editorial calendars are more like hybrid content creation/editorial calendars, where each of the steps to create and publish the content are indicated and someone has planned for how long all of that takes. These can be very helpful if the content you’re responsible for crosses a lot of teams and can take a long time to complete. It doesn’t matter if you’re using Excel or a Google Doc, as long as the people who need the calendar can easily access it. Gantt charts can be excellent for this. Here’s a favorite template for creating a Gantt chart in Google Docs (and they only get more sophisticated).

Complex calendars can encompass everything from ideation through writing, legal review, and publishing. You might even add content localization if your empire spans more than one continent to make sure you have the currency, date formatting, and even slang right.

Content governance

Governance outlines who is taking responsibility for your content. Who evaluates your content performance? What about freshness? Who decides to update (or kill) an older post? Who designs and optimizes workflows for your team or chooses and manages your CMS?

All these individual concerns fall into two overarching components to governance: daily maintenance and overall strategy. In the long run it helps if one person has oversight of the whole process, but the smaller steps can easily be split among many team members. Read this to take your governance to the next level.

Finding authors

The scale of your writing enterprise doesn’t have to be limited to the number of authors you have on your team. It’s also important to consider the possibility of working with freelancers and guest authors. Here’s a look at the pros and cons of outsourced versus in-house talent.

In-house authors

Guest authors and freelancers

Responsible to

You

Themselves

Paid by

You (as part of their salary)

You (on a per-piece basis)

Subject matter expertise

Broad but shallow

Deep but narrow

Capacity for extra work

As you wish

Show me the Benjamins

Turnaround time

On a dime

Varies

Communication investment

Less

More

Devoted audience

Smaller

Potentially huge

From that table, it might look like in-house authors have a lot more advantages. That’s somewhat true, but do not underestimate the value of occasionally working with a true industry expert who has name recognition and a huge following. Whichever route you take (and there are plenty of hybrid options), it’s always okay to ask that the writers you are working with be professional about communication, payment, and deadlines. In some industries, guest writers will write for links. Consider yourself lucky if that’s true. Remember, though, that the final paycheck can be great leverage for getting a writer to do exactly what you need them to (such as making their deadlines).

Tools to help with content creation

So those are some things you need to have in place before you create content. Now’s the fun part: getting started. One of the beautiful things about the Internet is that new and exciting tools crop up every day to help make our jobs easier and more efficient. Here are a few of our favorites.

Calendars

You can always use Excel or a Google Doc to set up your editorial calendar, but we really like Trello for the ability to gather a lot of information in one card and then drag and drop it into place. Once there are actual dates attached to your content, you might be happier with something like a Google Calendar.

Ideation and research

If you need a quick fix for ideation, turn your keywords into wacky ideas with Portent’s Title Maker. You probably won’t want to write to the exact title you’re given (although “True Facts about Justin Bieber’s Love of Pickles” does sound pretty fascinating…), but it’s a good way to get loose and look at your topic from a new angle.

Once you’ve got that idea solidified, find out what your audience thinks about it by gathering information with Survey Monkey or your favorite survey tool. Or, use Storify to listen to what people are saying about your topic across a wide variety of platforms. You can also use Storify to save those references and turn them into a piece of content or an illustration for one. Don’t forget that a simple social ask can also do wonders.

Format

Content doesn’t have to be all about the words. Screencasts, Google+ Hangouts, and presentations are all interesting ways to approach content. Remember that not everyone’s a reader. Some of your audience will be more interested in visual or interactive content. Make something for everyone.

Illustration

Don’t forget to make your content pretty. It’s not that hard to find free stock images online (just make sure you aren’t violating someone’s copyright). We like Morgue File, Free Images, and Flickr’s Creative Commons. If you aren’t into stock images and don’t have access to in-house graphic design, it’s still relatively easy to add images to your content. Pull a screenshot with Skitch or dress up an existing image with Pixlr. You can also use something like Canva to create custom graphics.

Don’t stop with static graphics, though. There are so many tools out there to help you create gifs, quizzes and polls, maps, and even interactive timelines. Dream it, then search for it. Chances are whatever you’re thinking of is doable.

Quality, not quantity

Mediocre content will hurt your cause

Less is more. That’s not an excuse to pare your blog down to one post per month (check out our publishing cadence experiment), but it is an important reminder that if you’re writing “How to Properly Install a Toilet Seat” two days after publishing “Toilet Seat Installation for Dummies,” you might want to rethink your strategy.

The thing is, and I’m going to use another cliché here to drive home the point, you never get a second chance to make a first impression. Potential customers are roving the Internet right now looking for exactly what you’re selling. And if what they find is an only somewhat informative article stuffed with keywords and awful spelling and grammar mistakes… well, you don’t want that. Oh, and search engines think it’s spammy too…

A word about copyright

We’re not copyright lawyers, so we can’t give you the ins and outs on all the technicalities. What we can tell you (and you already know this) is that it’s not okay to steal someone else’s work. You wouldn’t want them to do it to you. This includes images. So whenever you can, make your own images or find images that you can either purchase the rights to (stock imagery) or license under Creative Commons.

It’s usually okay to quote short portions of text, as long as you attribute the original source (and a link is nice). In general, titles and ideas can’t be copyrighted (though they might be trademarked or patented). When in doubt, asking for permission is smart.

That said, part of the fun of the Internet is the remixing culture which includes using things like memes and gifs. Just know that if you go that route, there is a certain amount of risk involved.

Editing

Your content needs to go through at least one editing cycle by someone other than the original author. There are two types of editing, developmental (which looks at the underlying structure of a piece that happens earlier in the writing cycle) and copy editing (which makes sure all the words are there and spelled right in the final draft).

If you have a very small team or are in a rush (and are working with writers that have some skill), you can often skip the developmental editing phase. But know that an investment in that close read of an early draft is often beneficial to the piece and to the writer’s overall growth.

Many content teams peer-edit work, which can be great. Other organizations prefer to run their work by a dedicated editor. There’s no wrong answer, as long as the work gets edited.

Ensuring proper basic SEO

The good news is that search engines are doing their best to get closer and closer to understanding and processing natural language. So good writing (including the natural use of synonyms rather than repeating those keywords over and over and…) will take you a long way towards SEO mastery.

For that reason (and because it’s easy to get trapped in keyword thinking and veer into keyword stuffing), it’s often nice to think of your SEO check as a further edit of the post rather than something you should think about as you’re writing.

But there are still a few things you can do to help cover those SEO bets. Once you have that draft, do a pass for SEO to make sure you’ve covered the following:

  • Use your keyword in your title
  • Use your keyword (or long-tail keyword phrase) in an H2
  • Make sure the keyword appears at least once (though not more than four times, especially if it’s a phrase) in the body of the post
  • Use image alt text (including the keyword when appropriate)

Finding time to write when you don’t have any

Writing (assuming you’re the one doing the writing) can require a lot of energy—especially if you want to do it well. The best way to find time to write is to break each project down into little tasks. For example, writing a blog post actually breaks down into these steps (though not always in this order):

  • Research
  • Outline
  • Fill in outline
  • Rewrite and finish post
  • Write headline
  • SEO check
  • Final edit
  • Select hero image (optional)

So if you only have random chunks of time, set aside 15-30 minutes one day (when your research is complete) to write a really great outline. Then find an hour the next to fill that outline in. After an additional hour the following day, (unless you’re dealing with a research-heavy post) you should have a solid draft by the end of day three.

The magic of working this way is that you engage your brain and then give it time to work in the background while you accomplish other tasks. Hemingway used to stop mid-sentence at the end of his writing days for the same reason.

Once you have that draft nailed, the rest of the steps are relatively easy (even the headline, which often takes longer to write than any other sentence, is easier after you’ve immersed yourself in the post over a few days).

Working with design/development

Every designer and developer is a little different, so we can’t give you any blanket cure-alls for inter-departmental workarounds (aka “smashing silos”). But here are some suggestions to help you convey your vision while capitalizing on the expertise of your coworkers to make your content truly excellent.

Ask for feedback

From the initial brainstorm to general questions about how to work together, asking your team members what they think and prefer can go a long way. Communicate all the details you have (especially the unspoken expectations) and then listen.

If your designer tells you up front that your color scheme is years out of date, you’re saving time. And if your developer tells you that the interactive version of that timeline will require four times the resources, you have the info you need to fight for more budget (or reassess the project).

Check in

Things change in the design and development process. If you have interim check-ins already set up with everyone who’s working on the project, you’ll avoid the potential for nasty surprises at the end. Like finding out that no one has experience working with that hot new coding language you just read about and they’re trying to do a workaround that isn’t working.

Proofread

Your job isn’t done when you hand over the copy to your designer or developer. Not only might they need help rewriting some of your text so that it fits in certain areas, they will also need you to proofread the final version. Accidents happen in the copy-and-paste process and there’s nothing sadder than a really beautiful (and expensive) piece of content that wraps up with a typo:

Know when to fight for an idea

Conflict isn’t fun, but sometimes it’s necessary. The more people involved in your content, the more watered down the original idea can get and the more roadblocks and conflicting ideas you’ll run into. Some of that is very useful. But sometimes you’ll get pulled off track. Always remember who owns the final product (this may not be you) and be ready to stand up for the idea if it’s starting to get off track.

We’re confident this list will set you on the right path to creating some really awesome content, but is there more you’d like to know? Ask us your questions in the comments.

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!

Reblogged 4 years ago from tracking.feedpress.it

A Vision for Brand Engagement Online, or "The Goal"

Posted by EricEnge

Today’s post focuses on a vision for your online presence. This vision outlines what it takes to be the best, both from an overall reputation and visibility standpoint, as well as an SEO point of view. The reason these are tied together is simple: Your overall online reputation and visibility is a huge factor in your SEO. Period. Let’s start by talking about why.

Core ranking signals

For purposes of this post, let’s define three cornerstone ranking signals that most everyone agrees on:

Links

Links remain a huge factor in overall ranking. Both Cyrus Shepard and Marcus Tober re-confirmed this on the Periodic Table of SEO Ranking Factors session at the SMX Advanced conference in Seattle this past June.

On-page content

On-page content remains a huge factor too, but with some subtleties now thrown in. I wrote about some of this in earlier posts I did on Moz about Term Frequency and Inverse Document Frequency. Suffice it to say that on-page content is about a lot more than pure words on the page, but also includes the supporting pages that you link to.

User engagement with your site

This is not one of the traditional SEO signals from the early days of SEO, but most advanced SEO pros that I know consider it a real factor these days. One of the most popular concepts people talk about is called pogo-sticking, which is illustrated here:

You can learn more about the pogosticking concept by visiting this Whiteboard Friday video by a rookie SEO with a last name of Fishkin.

New, lesser-known signals

OK, so these are the more obvious signals, but now let’s look more broadly at the overall web ecosystem and talk about other types of ranking signals. Be warned that some of these signals may be indirect, but that just doesn’t matter. In fact, my first example below is an indirect factor which I will use to demonstrate why whether a signal is direct or indirect is not an issue at all.

Let me illustrate with an example. Say you spend $1 billion dollars building a huge brand around a product that is massively useful to people. Included in this is a sizable $100 million dollar campaign to support a highly popular charitable foundation, and your employees regularly donate time to help out in schools across your country. In short, the great majority of people love your brand.

Do you think this will impact the way people link to your site? Of course it does. Do you think it will impact how likely people are to be satisified with quality of the pages of your site? Consider this A/B test scenario of 2 pages from different “brands” (for the one on the left, imagine the image of Coca Cola or Pepsi Cola, whichever one you prefer):

Do you think that the huge brand will get a benefit of a doubt on their page that the no-name brand does not even though the pages are identical? Of course they will. Now let’s look at some simpler scenarios that don’t involve a $1 billion investment.

1. Cover major options related to a product or service on “money pages”

Imagine that a user arrives on your auto parts site after searching on the phrase “oil filter” at Google or Bing. Chances are pretty good that they want an oil filter, but here are some other items they may also want:

  • A guide to picking the right filter for their car
  • Oil
  • An oil filter wrench
  • A drainage pan to drain the old oil into

This is just the basics, right? But, you would be surprised with how many sites don’t include links or information on directly related products on their money pages. Providing this type of smart site and page design can have a major impact on user engagement with the money pages of your site.

2. Include other related links on money pages

In the prior item we covered the user’s most directly related needs, but they may have secondary needs as well. Someone who is changing a car’s oil is either a mechanic or a do-it-yourself-er. What else might they need? How about other parts, such as windshield wipers or air filters?

These are other fairly easy maintenance steps for someone who is working on their car to complete. Presence of these supporting products could be one way to improve user engagement with your pages.

3. Offer industry-leading non-commercial content on-site

Publishing world-class content on your site is a great way to produce links to your site. Of course, if you do this on a blog on your site, it may not provide links directly to your money pages, but it will nonetheless lift overall site authority.

In addition, if someone has consumed one or more pieces of great content on your site, the chance of their engaging in a more positive manner with your site overall go way up. Why? Because you’ve earned their trust and admiration.

4. Be everywhere your audiences are with more high-quality, relevant, non-commercial content

Are there major media sites that cover your market space? Do they consider you to be an expert? Will they quote you in articles they write? Can you provide them with guest posts or let you be a guest columnist? Will they collaborate on larger content projects with you?

All of these activities put you in front of their audiences, and if those audiences overlap with yours, this provides a great way to build your overall reputation and visibility. This content that you publish, or collaborate on, that shows up on 3rd-party sites will get you mentions and links. In addition, once again, it will provide you with a boost to your branding. People are now more likely to consume your other content more readily, including on your money pages.

5. Leverage social media

The concept here shares much in common with the prior point. Social media provides opportunities to get in front of relevant audiences. Every person that’s an avid follower of yours on a social media site is more likely to show very different behavior characteristics interacting with your site than someone that does not know you well at all.

Note that links from social media sites are nofollowed, but active social media behavior can lead to people implementing “real world” links to your site that are followed, from their blogs and media web sites.

6. Be active in the offline world as well

Think your offline activity doesn’t matter online? Think again. Relationships are still most easily built face-to-face. People you meet and spend time with can well become your most loyal fans online. This is particularly important when it comes to building relationships with influential people.

One great way to do that is to go to public events related to your industry, such as conferences. Better still, obtain speaking engagements at those conferences. This can even impact people who weren’t there to hear you speak, as they become aware that you have been asked to do that. This concept can also work for a small local business. Get out in your community and engage with people at local events.

The payoff here is similar to the payoff for other items: more engaged, highly loyal fans who engage with you across the web, sending more and more positive signals, both to other people and to search engines, that you are the real deal.

7. Provide great customer service/support

Whatever your business may be, you need to take care of your customers as best you can. No one can make everyone happy, that’s unrealistic, but striving for much better than average is a really sound idea. Having satisfied customers saying nice things about you online is a big impact item in the grand scheme of things.

8. Actively build relationships with influencers too

While this post is not about the value of influencer relationships, I include this in the list for illustration purposes, for two reasons:

  1. Some opportunities are worth extra effort. Know of someone who could have a major impact on your business? Know that they will be at a public event in the near future? Book your plane tickets and get your butt out there. No guarantee that you will get the result you are looking for, or that it will happen quickly, but your chances go WAY up if you get some face time with them.
  2. Influencers are worth special attention and focus, but your relationship-building approach to the web and SEO is not only about influencers. It’s about the entire ecosystem.

It’s an integrated ecosystem

The web provides a level of integrated, real-time connectivity of a kind that the world has never seen before. This is only going to increase. Do something bad to a customer in Hong Kong? Consumers in Boston will know within 5 minutes. That’s where it’s all headed.

Google and Bing (and any future search engine that may emerge) want to measure these types of signals because they tell them how to improve the quality of the experience on their platforms. There are may ways they can perform these measurements.

One simple concept is covered by Rand in this recent Whiteboard Friday video. The discussion is about a recent patent granted to Google that shows how the company can use search queries to detect who is an authority on a topic.

The example he provides is about people who search on “email finding tool”. If Google also finds that a number of people search on “voila norbert email tool”, Google may use that as an authority signal.

Think about that for a moment. How are you going to get people to search on your brand more while putting it together with a non-branded querly like that? (OK, please leave Mechanical Turk and other services like that out of the discussion).

Now you can start to see the bigger picture. Measurements like pogosticking and this recent search behavior related patent are just the tip of the iceberg. Undoubtedly, there are many other ways that search engines can measure what people like and engage with the most.

This is all part of SEO now. UX, product breadth, problem solving, UX, engaging in social media, getting face to face, creating great content that you publish in front of other people’s audiences, and more.

For the small local business, you can still win at this game, as your focus just needs to be on doing it better than your competitors. The big brands will never be hyper-local like you are, so don’t think you can’t play the game, because you can.

Whoever you are, get ready, because this new integrated ecosystem is already upon us, and you need to be a part of it.

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!

Reblogged 4 years ago from tracking.feedpress.it

Help Us Improve the Moz Blog: 2015 Reader Survey

Posted by Trevor-Klein

In late 2013, we asked you all about your experience with the Moz Blog. It was the first time we’d collected direct feedback from our readers in more than three years—an eternity in the marketing industry. With the pace of change in our line of work (not to mention your schedules and reading habits) we didn’t want to wait that long again, so we’re taking this opportunity to ask you how well we’re keeping up.

Our mission is to help you all become better marketers, and to do that, we need to know more about you. What challenges do you all face? What are your pain points? Your day-to-day frustrations? If you could learn more about one or two (or three) topics, what would those be?

If you’ll help us out by taking this five-minute survey, we can make sure we’re offering the most useful and valuable content we possibly can. When we’re done looking through the responses, we’ll follow up with a post about what we learned.

Thanks, everyone; we’re excited to see what you have to say!

(function(){var qs,js,q,s,d=document,gi=d.getElementById,ce=d.createElement,gt=d.getElementsByTagName,id=’typef_orm’,b=’https://s3-eu-west-1.amazonaws.com/share.typeform.com/’;if(!gi.call(d,id)){js=ce.call(d,’script’);js.id=id;js.src=b+’widget.js’;q=gt.call(d,’script’)[0];q.parentNode.insertBefore(js,q)}})()

Can’t see the survey? Click here to take it in a new tab.

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!

Reblogged 4 years ago from tracking.feedpress.it

Eliminate Duplicate Content in Faceted Navigation with Ajax/JSON/JQuery

Posted by EricEnge

One of the classic problems in SEO is that while complex navigation schemes may be useful to users, they create problems for search engines. Many publishers rely on tags such as rel=canonical, or the parameters settings in Webmaster Tools to try and solve these types of issues. However, each of the potential solutions has limitations. In today’s post, I am going to outline how you can use JavaScript solutions to more completely eliminate the problem altogether.

Note that I am not going to provide code examples in this post, but I am going to outline how it works on a conceptual level. If you are interested in learning more about Ajax/JSON/jQuery here are some resources you can check out:

  1. Ajax Tutorial
  2. Learning Ajax/jQuery

Defining the problem with faceted navigation

Having a page of products and then allowing users to sort those products the way they want (sorted from highest to lowest price), or to use a filter to pick a subset of the products (only those over $60) makes good sense for users. We typically refer to these types of navigation options as “faceted navigation.”

However, faceted navigation can cause problems for search engines because they don’t want to crawl and index all of your different sort orders or all your different filtered versions of your pages. They would end up with many different variants of your pages that are not significantly different from a search engine user experience perspective.

Solutions such as rel=canonical tags and parameters settings in Webmaster Tools have some limitations. For example, rel=canonical tags are considered “hints” by the search engines, and they may not choose to accept them, and even if they are accepted, they do not necessarily keep the search engines from continuing to crawl those pages.

A better solution might be to use JSON and jQuery to implement your faceted navigation so that a new page is not created when a user picks a filter or a sort order. Let’s take a look at how it works.

Using JSON and jQuery to filter on the client side

The main benefit of the implementation discussed below is that a new URL is not created when a user is on a page of yours and applies a filter or sort order. When you use JSON and jQuery, the entire process happens on the client device without involving your web server at all.

When a user initially requests one of the product pages on your web site, the interaction looks like this:

using json on faceted navigation

This transfers the page to the browser the user used to request the page. Now when a user picks a sort order (or filter) on that page, here is what happens:

jquery and faceted navigation diagram

When the user picks one of those options, a jQuery request is made to the JSON data object. Translation: the entire interaction happens within the client’s browser and the sort or filter is applied there. Simply put, the smarts to handle that sort or filter resides entirely within the code on the client device that was transferred with the initial request for the page.

As a result, there is no new page created and no new URL for Google or Bing to crawl. Any concerns about crawl budget or inefficient use of PageRank are completely eliminated. This is great stuff! However, there remain limitations in this implementation.

Specifically, if your list of products spans multiple pages on your site, the sorting and filtering will only be applied to the data set already transferred to the user’s browser with the initial request. In short, you may only be sorting the first page of products, and not across the entire set of products. It’s possible to have the initial JSON data object contain the full set of pages, but this may not be a good idea if the page size ends up being large. In that event, we will need to do a bit more.

What Ajax does for you

Now we are going to dig in slightly deeper and outline how Ajax will allow us to handle sorting, filtering, AND pagination. Warning: There is some tech talk in this section, but I will try to follow each technical explanation with a layman’s explanation about what’s happening.

The conceptual Ajax implementation looks like this:

ajax and faceted navigation diagram

In this structure, we are using an Ajax layer to manage the communications with the web server. Imagine that we have a set of 10 pages, the user has gotten the first page of those 10 on their device and then requests a change to the sort order. The Ajax requests a fresh set of data from the web server for your site, similar to a normal HTML transaction, except that it runs asynchronously in a separate thread.

If you don’t know what that means, the benefit is that the rest of the page can load completely while the process to capture the data that the Ajax will display is running in parallel. This will be things like your main menu, your footer links to related products, and other page elements. This can improve the perceived performance of the page.

When a user selects a different sort order, the code registers an event handler for a given object (e.g. HTML Element or other DOM objects) and then executes an action. The browser will perform the action in a different thread to trigger the event in the main thread when appropriate. This happens without needing to execute a full page refresh, only the content controlled by the Ajax refreshes.

To translate this for the non-technical reader, it just means that we can update the sort order of the page, without needing to redraw the entire page, or change the URL, even in the case of a paginated sequence of pages. This is a benefit because it can be faster than reloading the entire page, and it should make it clear to search engines that you are not trying to get some new page into their index.

Effectively, it does this within the existing Document Object Model (DOM), which you can think of as the basic structure of the documents and a spec for the way the document is accessed and manipulated.

How will Google handle this type of implementation?

For those of you who read Adam Audette’s excellent recent post on the tests his team performed on how Google reads Javascript, you may be wondering if Google will still load all these page variants on the same URL anyway, and if they will not like it.

I had the same question, so I reached out to Google’s Gary Illyes to get an answer. Here is the dialog that transpired:

Eric Enge: I’d like to ask you about using JSON and jQuery to render different sort orders and filters within the same URL. I.e. the user selects a sort order or a filter, and the content is reordered and redrawn on the page on the client site. Hence no new URL would be created. It’s effectively a way of canonicalizing the content, since each variant is a strict subset.

Then there is a second level consideration with this approach, which involves doing the same thing with pagination. I.e. you have 10 pages of products, and users still have sorting and filtering options. In order to support sorting and filtering across the entire 10 page set, you use an Ajax solution, so all of that still renders on one URL.

So, if you are on page 1, and a user executes a sort, they get that all back in that one page. However, to do this right, going to page 2 would also render on the same URL. Effectively, you are taking the 10 page set and rendering it all within one URL. This allows sorting, filtering, and pagination without needing to use canonical, noindex, prev/next, or robots.txt.

If this was not problematic for Google, the only downside is that it makes the pagination not visible to Google. Does that make sense, or is it a bad idea?

Gary Illyes
: If you have one URL only, and people have to click on stuff to see different sort orders or filters for the exact same content under that URL, then typically we would only see the default content.

If you don’t have pagination information, that’s not a problem, except we might not see the content on the other pages that are not contained in the HTML within the initial page load. The meaning of rel-prev/next is to funnel the signals from child pages (page 2, 3, 4, etc.) to the group of pages as a collection, or to the view-all page if you have one. If you simply choose to render those paginated versions on a single URL, that will have the same impact from a signals point of view, meaning that all signals will go to a single entity, rather than distributed to several URLs.

Summary

Keep in mind, the reason why Google implemented tags like rel=canonical, NoIndex, rel=prev/next, and others is to reduce their crawling burden and overall page bloat and to help focus signals to incoming pages in the best way possible. The use of Ajax/JSON/jQuery as outlined above does this simply and elegantly.

On most e-commerce sites, there are many different “facets” of how a user might want to sort and filter a list of products. With the Ajax-style implementation, this can be done without creating new pages. The end users get the control they are looking for, the search engines don’t have to deal with excess pages they don’t want to see, and signals in to the site (such as links) are focused on the main pages where they should be.

The one downside is that Google may not see all the content when it is paginated. A site that has lots of very similar products in a paginated list does not have to worry too much about Google seeing all the additional content, so this isn’t much of a concern if your incremental pages contain more of what’s on the first page. Sites that have content that is materially different on the additional pages, however, might not want to use this approach.

These solutions do require Javascript coding expertise but are not really that complex. If you have the ability to consider a path like this, you can free yourself from trying to understand the various tags, their limitations, and whether or not they truly accomplish what you are looking for.

Credit: Thanks for Clark Lefavour for providing a review of the above for technical correctness.

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!

Reblogged 4 years ago from tracking.feedpress.it

How to Use Server Log Analysis for Technical SEO

Posted by SamuelScott

It’s ten o’clock. Do you know where your logs are?

I’m introducing this guide with a pun on a common public-service announcement that has run on late-night TV news broadcasts in the United States because log analysis is something that is extremely newsworthy and important.

If your technical and on-page SEO is poor, then nothing else that you do will matter. Technical SEO is the key to helping search engines to crawl, parse, and index websites, and thereby rank them appropriately long before any marketing work begins.

The important thing to remember: Your log files contain the only data that is 100% accurate in terms of how search engines are crawling your website. By helping Google to do its job, you will set the stage for your future SEO work and make your job easier. Log analysis is one facet of technical SEO, and correcting the problems found in your logs will help to lead to higher rankings, more traffic, and more conversions and sales.

Here are just a few reasons why:

  • Too many response code errors may cause Google to reduce its crawling of your website and perhaps even your rankings.
  • You want to make sure that search engines are crawling everything, new and old, that you want to appear and rank in the SERPs (and nothing else).
  • It’s crucial to ensure that all URL redirections will pass along any incoming “link juice.”

However, log analysis is something that is unfortunately discussed all too rarely in SEO circles. So, here, I wanted to give the Moz community an introductory guide to log analytics that I hope will help. If you have any questions, feel free to ask in the comments!

What is a log file?

Computer servers, operating systems, network devices, and computer applications automatically generate something called a log entry whenever they perform an action. In a SEO and digital marketing context, one type of action is whenever a page is requested by a visiting bot or human.

Server log entries are specifically programmed to be output in the Common Log Format of the W3C consortium. Here is one example from Wikipedia with my accompanying explanations:

127.0.0.1 user-identifier frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326
  • 127.0.0.1 — The remote hostname. An IP address is shown, like in this example, whenever the DNS hostname is not available or DNSLookup is turned off.
  • user-identifier — The remote logname / RFC 1413 identity of the user. (It’s not that important.)
  • frank — The user ID of the person requesting the page. Based on what I see in my Moz profile, Moz’s log entries would probably show either “SamuelScott” or “392388” whenever I visit a page after having logged in.
  • [10/Oct/2000:13:55:36 -0700] — The date, time, and timezone of the action in question in strftime format.
  • GET /apache_pb.gif HTTP/1.0 — “GET” is one of the two commands (the other is “POST”) that can be performed. “GET” fetches a URL while “POST” is submitting something (such as a forum comment). The second part is the URL that is being accessed, and the last part is the version of HTTP that is being accessed.
  • 200 — The status code of the document that was returned.
  • 2326 — The size, in bytes, of the document that was returned.

Note: A hyphen is shown in a field when that information is unavailable.

Every single time that you — or the Googlebot — visit a page on a website, a line with this information is output, recorded, and stored by the server.

Log entries are generated continuously and anywhere from several to thousands can be created every second — depending on the level of a given server, network, or application’s activity. A collection of log entries is called a log file (or often in slang, “the log” or “the logs”), and it is displayed with the most-recent log entry at the bottom. Individual log files often contain a calendar day’s worth of log entries.

Accessing your log files

Different types of servers store and manage their log files differently. Here are the general guides to finding and managing log data on three of the most-popular types of servers:

What is log analysis?

Log analysis (or log analytics) is the process of going through log files to learn something from the data. Some common reasons include:

  • Development and quality assurance (QA) — Creating a program or application and checking for problematic bugs to make sure that it functions properly
  • Network troubleshooting — Responding to and fixing system errors in a network
  • Customer service — Determining what happened when a customer had a problem with a technical product
  • Security issues — Investigating incidents of hacking and other intrusions
  • Compliance matters — Gathering information in response to corporate or government policies
  • Technical SEO — This is my favorite! More on that in a bit.

Log analysis is rarely performed regularly. Usually, people go into log files only in response to something — a bug, a hack, a subpoena, an error, or a malfunction. It’s not something that anyone wants to do on an ongoing basis.

Why? This is a screenshot of ours of just a very small part of an original (unstructured) log file:

Ouch. If a website gets 10,000 visitors who each go to ten pages per day, then the server will create a log file every day that will consist of 100,000 log entries. No one has the time to go through all of that manually.

How to do log analysis

There are three general ways to make log analysis easier in SEO or any other context:

  • Do-it-yourself in Excel
  • Proprietary software such as Splunk or Sumo-logic
  • The ELK Stack open-source software

Tim Resnik’s Moz essay from a few years ago walks you through the process of exporting a batch of log files into Excel. This is a (relatively) quick and easy way to do simple log analysis, but the downside is that one will see only a snapshot in time and not any overall trends. To obtain the best data, it’s crucial to use either proprietary tools or the ELK Stack.

Splunk and Sumo-Logic are proprietary log analysis tools that are primarily used by enterprise companies. The ELK Stack is a free and open-source batch of three platforms (Elasticsearch, Logstash, and Kibana) that is owned by Elastic and used more often by smaller businesses. (Disclosure: We at Logz.io use the ELK Stack to monitor our own internal systems as well as for the basis of our own log management software.)

For those who are interested in using this process to do technical SEO analysis, monitor system or application performance, or for any other reason, our CEO, Tomer Levy, has written a guide to deploying the ELK Stack.

Technical SEO insights in log data

However you choose to access and understand your log data, there are many important technical SEO issues to address as needed. I’ve included screenshots of our technical SEO dashboard with our own website’s data to demonstrate what to examine in your logs.

Bot crawl volume

It’s important to know the number of requests made by Baidu, BingBot, GoogleBot, Yahoo, Yandex, and others over a given period time. If, for example, you want to get found in search in Russia but Yandex is not crawling your website, that is a problem. (You’d want to consult Yandex Webmaster and see this article on Search Engine Land.)

Response code errors

Moz has a great primer on the meanings of the different status codes. I have an alert system setup that tells me about 4XX and 5XX errors immediately because those are very significant.

Temporary redirects

Temporary 302 redirects do not pass along the “link juice” of external links from the old URL to the new one. Almost all of the time, they should be changed to permanent 301 redirects.

Crawl budget waste

Google assigns a crawl budget to each website based on numerous factors. If your crawl budget is, say, 100 pages per day (or the equivalent amount of data), then you want to be sure that all 100 are things that you want to appear in the SERPs. No matter what you write in your robots.txt file and meta-robots tags, you might still be wasting your crawl budget on advertising landing pages, internal scripts, and more. The logs will tell you — I’ve outlined two script-based examples in red above.

If you hit your crawl limit but still have new content that should be indexed to appear in search results, Google may abandon your site before finding it.

Duplicate URL crawling

The addition of URL parameters — typically used in tracking for marketing purposes — often results in search engines wasting crawl budgets by crawling different URLs with the same content. To learn how to address this issue, I recommend reading the resources on Google and Search Engine Land here, here, here, and here.

Crawl priority

Google might be ignoring (and not crawling or indexing) a crucial page or section of your website. The logs will reveal what URLs and/or directories are getting the most and least attention. If, for example, you have published an e-book that attempts to rank for targeted search queries but it sits in a directory that Google only visits once every six months, then you won’t get any organic search traffic from the e-book for up to six months.

If a part of your website is not being crawled very often — and it is updated often enough that it should be — then you might need to check your internal-linking structure and the crawl-priority settings in your XML sitemap.

Last crawl date

Have you uploaded something that you hope will be indexed quickly? The log files will tell you when Google has crawled it.

Crawl budget

One thing I personally like to check and see is Googlebot’s real-time activity on our site because the crawl budget that the search engine assigns to a website is a rough indicator — a very rough one — of how much it “likes” your site. Google ideally does not want to waste valuable crawling time on a bad website. Here, I had seen that Googlebot had made 154 requests of our new startup’s website over the prior twenty-four hours. Hopefully, that number will go up!

As I hope you can see, log analysis is critically important in technical SEO. It’s eleven o’clock — do you know where your logs are now?

Additional resources

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!

Reblogged 4 years ago from tracking.feedpress.it

The Long Click and the Quality of Search Success

Posted by billslawski

“On the most basic level, Google could see how satisfied users were. To paraphrase Tolstoy, happy users were all the same. The best sign of their happiness was the “Long Click” — This occurred when someone went to a search result, ideally the top one, and did not return. That meant Google has successfully fulfilled the query.”

~ Steven Levy. In the Plex: How Google Thinks, Works, and Shapes our Lives

I often explore and read patents and papers from the search engines to try to get a sense of how they may approach different issues, and learn about the assumptions they make about search, searchers, and the Web. Lately, I’ve been keeping an eye open for papers and patents from the search engines where they talk about a metric known as the “long click.”

A recently granted Google patent uses the metric of a “Long Click” as the center of a process Google may use to track results for queries that were selected by searchers for long visits in a set of search results.

This concept isn’t new. In 2011, I wrote about a Yahoo patent in How a Search Engine May Measure the Quality of Its Search Results, where they discussed a metric that they refer to as a “target page success metric.” It included “dwell time” upon a result as a sign of search success (Yes, search engines have goals, too).

5543947f5bb408.24541747.jpg

Another Google patent described assigning web pages “reachability scores” based upon the quality of pages linked to from those initially visited pages. In the post Does Google Use Reachability Scores in Ranking Resources? I described how a Google patent that might view a long click metric as a sign to see if visitors to that page are engaged by the links to content they find those links pointing to, including links to videos. Google tells us in that patent that it might consider a “long click” to have been made on a video if someone watches at least half the video or 30 seconds of it. The patent suggests that a high reachability score on a page may mean that page could be boosted in Google search results.

554394a877e8c8.30299132.jpg

But the patent I’m writing about today is focused primarily upon looking at and tracking a search success metric like a long click or long dwell time. Here’s the abstract:

Modifying ranking data based on document changes

Invented by Henele I. Adams, and Hyung-Jin Kim

Assigned to Google

US Patent 9,002,867

Granted April 7, 2015

Filed: December 30, 2010

Abstract

Methods, systems, and apparatus, including computer programs encoded on computer storage media for determining a weighted overall quality of result statistic for a document.

One method includes receiving quality of result data for a query and a plurality of versions of a document, determining a weighted overall quality of result statistic for the document with respect to the query including weighting each version specific quality of result statistic and combining the weighted version-specific quality of result statistics, wherein each quality of result statistic is weighted by a weight determined from at least a difference between content of a reference version of the document and content of the version of the document corresponding to the version specific quality of result statistic, and storing the weighted overall quality of result statistic and data associating the query and the document with the weighted overall quality of result statistic.

This patent tells us that search results may be be ranked in an order, according to scores assigned to the search results by a scoring function or process that would be based upon things such as:

  • Where, and how often, query terms appear in the given document,
  • How common the query terms are in the documents indexed by the search engine, or
  • A query-independent measure of quality of the document itself.

Last September, I wrote about how Google might identify a category associated with a query term base upon clicks, in the post Using Query User Data To Classify Queries. In a query for “Lincoln.” the results that appear in response might be about the former US President, the town of Lincoln, Nebraska, and the model of automobile. When someone searches for [Lincoln], Google returning all three of those responses as a top result could be said to be reasonable. The patent I wrote about in that post told us that Google might collect information about “Lincoln” as a search entity, and track which category of results people clicked upon most when they performed that search, to determine what categories of pages to show other searchers. Again, that’s another “search success” based upon a past search history.

There likely is some value in working to find ways to increase the amount of dwell time someone spends upon the pages of your site, if you are already having some success in crafting page titles and snippets that persuade people to click on your pages when they those appear in search results. These approaches can include such things as:

  1. Making visiting your page a positive experience in terms of things like site speed, readability, and scannability.
  2. Making visiting your page a positive experience in terms of things like the quality of the content published on your pages including spelling, grammar, writing style, interest, quality of images, and the links you share to other resources.
  3. Providing a positive experience by offering ideas worth sharing with others, and offering opportunities for commenting and interacting with others, and by being responsive to people who do leave comments.

Here are some resources I found that discuss this long click metric in terms of “dwell time”:

Your ability to create pages that can end up in a “long click” from someone who has come to your site in response to a query, is also a “search success” metric on the search engine’s part, and you both succeed. Just be warned that as the most recent patent from Google on Long Clicks shows us, Google will be watching to make sure that the content of your page doesn’t change too much, and that people are continuing to click upon it in search results, and spend a fair amount to time upon it.

(Images for this post are from my Go Fish Digital Design Lead Devin Holmes @DevinGoFish. Thank you, Devin!)

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!

Reblogged 4 years ago from tracking.feedpress.it

Inverse Document Frequency and the Importance of Uniqueness

Posted by EricEnge

In my last column, I wrote about how to use term frequency analysis in evaluating your content vs. the competition’s. Term frequency (TF) is only one part of the TF-IDF approach to information retrieval. The other part is inverse document frequency (IDF), which is what I plan to discuss today.

Today’s post will use an explanation of how IDF works to show you the importance of creating content that has true uniqueness. There are reputation and visibility reasons for doing this, and it’s great for users, but there are also SEO benefits.

If you wonder why I am focusing on TF-IDF, consider these words from a Google article from August 2014: “This is the idea of the famous TF-IDF, long used to index web pages.” While the way that Google may apply these concepts is far more than the simple TF-IDF models I am discussing, we can still learn a lot from understanding the basics of how they work.

What is inverse document frequency?

In simple terms, it’s a measure of the rareness of a term. Conceptually, we start by measuring document frequency. It’s easiest to illustrate with an example, as follows:

IDF table

In this example, we see that the word “a” appears in every document in the document set. What this tells us is that it provides no value in telling the documents apart. It’s in everything.

Now look at the word “mobilegeddon.” It appears in 1,000 of the documents, or one thousandth of one percent of them. Clearly, this phrase provides a great deal more differentiation for the documents that contain them.

Document frequency measures commonness, and we prefer to measure rareness. The classic way that this is done is with a formula that looks like this:

idf equation

For each term we are looking at, we take the total number of documents in the document set and divide it by the number of documents containing our term. This gives us more of a measure of rareness. However, we don’t want the resulting calculation to say that the word “mobilegeddon” is 1,000 times more important in distinguishing a document than the word “boat,” as that is too big of a scaling factor.

This is the reason we take the Log Base 10 of the result, to dampen that calculation. For those of you who are not mathematicians, you can loosely think of the Log Base 10 of a number as being a count of the number of zeros – i.e., the Log Base 10 of 1,000,000 is 6, and the log base 10 of 1,000 is 3. So instead of saying that the word “mobilegeddon” is 1,000 times more important, this type of calculation suggests it’s three times more important, which is more in line with what makes sense from a search engine perspective.

With this in mind, here are the IDF values for the terms we looked at before:

idf table logarithm values

Now you can see that we are providing the highest score to the term that is the rarest.

What does the concept of IDF teach us?

Think about IDF as a measure of uniqueness. It helps search engines identify what it is that makes a given document special. This needs to be much more sophisticated than how often you use a given search term (e.g. keyword density).

Think of it this way: If you are one of 6.78 million web sites that comes up for the search query “super bowl 2015,” you are dealing with a crowded playing field. Your chances of ranking for this term based on the quality of your content are pretty much zero.

massive number of results for broad keyword

Overall link authority and other signals will be the only way you can rank for a term that competitive. If you are a new site on the landscape, well, perhaps you should chase something else.

That leaves us with the question of what you should target. How about something unique? Even the addition of a simple word like “predictions”—changing our phrase to “super bowl 2015 predictions”—reduces this playing field to 17,800 results.

Clearly, this is dramatically less competitive already. Slicing into this further, the phrase “super bowl 2015 predictions and odds” returns only 26 pages in Google. See where this is going?

What IDF teaches us is the importance of uniqueness in the content we create. Yes, it will not pay nearly as much money to you as it would if you rank for the big head term, but if your business is a new entrant into a very crowded space, you are not going to rank for the big head term anyway

If you can pick out a smaller number of terms with much less competition and create content around those needs, you can start to rank for these terms and get money flowing into your business. This is because you are making your content more unique by using rarer combinations of terms (leveraging what IDF teaches us).

Summary

People who do keyword analysis are often wired to pursue the major head terms directly, simply based on the available keyword search volume. The result from this approach can, in fact, be pretty dismal.

Understanding how inverse document frequency works helps us understand the importance of standing out. Creating content that brings unique angles to the table is often a very potent way to get your SEO strategy kick-started.

Of course, the reasons for creating content that is highly differentiated and unique go far beyond SEO. This is good for your users, and it’s good for your reputation, visibility, AND also your SEO.

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!

Reblogged 4 years ago from tracking.feedpress.it

Using Term Frequency Analysis to Measure Your Content Quality

Posted by EricEnge

It’s time to look at your content differently—time to start understanding just how good it really is. I am not simply talking about titles, keyword usage, and meta descriptions. I am talking about the entire page experience. In today’s post, I am going to introduce the general concept of content quality analysis, why it should matter to you, and how to use term frequency (TF) analysis to gather ideas on how to improve your content.

TF analysis is usually combined with inverse document frequency analysis (collectively TF-IDF analysis). TF-IDF analysis has been a staple concept for information retrieval science for a long time. You can read more about TF-IDF and other search science concepts in Cyrus Shepard’s
excellent article here.

For purposes of today’s post, I am going to show you how you can use TF analysis to get clues as to what Google is valuing in the content of sites that currently outrank you. But first, let’s get oriented.

Conceptualizing page quality

Start by asking yourself if your page provides a quality experience to people who visit it. For example, if a search engine sends 100 people to your page, how many of them will be happy? Seventy percent? Thirty percent? Less? What if your competitor’s page gets a higher percentage of happy users than yours does? Does that feel like an “uh-oh”?

Let’s think about this with a specific example in mind. What if you ran a golf club site, and 100 people come to your page after searching on a phrase like “golf clubs.” What are the kinds of things they may be looking for?

Here are some things they might want:

  1. A way to buy golf clubs on your site (you would need to see a shopping cart of some sort).
  2. The ability to select specific brands, perhaps by links to other pages about those brands of golf clubs.
  3. Information on how to pick the club that is best for them.
  4. The ability to select specific types of clubs (drivers, putters, irons, etc.). Again, this may be via links to other pages.
  5. A site search box.
  6. Pricing info.
  7. Info on shipping costs.
  8. Expert analysis comparing different golf club brands.
  9. End user reviews of your company so they can determine if they want to do business with you.
  10. How your return policy works.
  11. How they can file a complaint.
  12. Information about your company. Perhaps an “about us” page.
  13. A link to a privacy policy page.
  14. Whether or not you have been “in the news” recently.
  15. Trust symbols that show that you are a reputable organization.
  16. A way to access pages to buy different products, such as golf balls or tees.
  17. Information about specific golf courses.
  18. Tips on how to improve their golf game.

This is really only a partial list, and the specifics of your site can certainly vary for any number of reasons from what I laid out above. So how do you figure out what it is that people really want? You could pull in data from a number of sources. For example, using data from your site search box can be invaluable. You can do user testing on your site. You can conduct surveys. These are all good sources of data.

You can also look at your analytics data to see what pages get visited the most. Just be careful how you use that data. For example, if most of your traffic is from search, this data will be biased by incoming search traffic, and hence what Google chooses to rank. In addition, you may only have a small percentage of the visitors to your site going to your privacy policy, but chances are good that there are significantly more users than that who notice whether or not you have a privacy policy. Many of these will be satisfied just to see that you have one and won’t actually go check it out.

Whatever you do, it’s worth using many of these methods to determine what users want from the pages of your site and then using the resulting information to improve your overall site experience.

Is Google using this type of info as a ranking factor?

At some level, they clearly are. Clearly Google and Bing have evolved far beyond the initial TF-IDF concepts, but we can still use them to better understand our own content.

The first major indication we had that Google was performing content quality analysis was with the release of the
Panda algorithm in February of 2011. More recently, we know that on April 21 Google will release an algorithm that makes the mobile friendliness of a web site a ranking factor. Pure and simple, this algo is about the user experience with a page.

Exactly how Google is performing these measurements is not known, but
what we do know is their intent. They want to make their search engine look good, largely because it helps them make more money. Sending users to pages that make them happy will do that. Google has every incentive to improve the quality of their search results in as many ways as they can.

Ultimately, we don’t actually know what Google is measuring and using. It may be that the only SEO impact of providing pages that satisfy a very high percentage of users is an indirect one. I.e., so many people like your site that it gets written about more, linked to more, has tons of social shares, gets great engagement, that Google sees other signals that it uses as ranking factors, and this is why your rankings improve.

But, do I care if the impact is a direct one or an indirect one? Well, NO.

Using TF analysis to evaluate your page

TF-IDF analysis is more about relevance than content quality, but we can still use various precepts from it to help us understand our own content quality. One way to do this is to compare the results of a TF analysis of all the keywords on your page with those pages that currently outrank you in the search results. In this section, I am going to outline the basic concepts for how you can do this. In the next section I will show you a process that you can use with publicly available tools and a spreadsheet.

The simplest form of TF analysis is to count the number of uses of each keyword on a page. However, the problem with that is that a page using a keyword 10 times will be seen as 10 times more valuable than a page that uses a keyword only once. For that reason, we dampen the calculations. I have seen two methods for doing this, as follows:

term frequency calculation

The first method relies on dividing the number of repetitions of a keyword by the count for the most popular word on the entire page. Basically, what this does is eliminate the inherent advantage that longer documents might otherwise have over shorter ones. The second method dampens the total impact in a different way, by taking the log base 10 for the actual keyword count. Both of these achieve the effect of still valuing incremental uses of a keyword, but dampening it substantially. I prefer to use method 1, but you can use either method for our purposes here.

Once you have the TF calculated for every different keyword found on your page, you can then start to do the same analysis for pages that outrank you for a given search term. If you were to do this for five competing pages, the result might look something like this:

term frequency spreadsheet

I will show you how to set up the spreadsheet later, but for now, let’s do the fun part, which is to figure out how to analyze the results. Here are some of the things to look for:

  1. Are there any highly related words that all or most of your competitors are using that you don’t use at all?
  2. Are there any such words that you use significantly less, on average, than your competitors?
  3. Also look for words that you use significantly more than competitors.

You can then tag these words for further analysis. Once you are done, your spreadsheet may now look like this:

second stage term frequency analysis spreadsheet

In order to make this fit into this screen shot above and keep it legibly, I eliminated some columns you saw in my first spreadsheet. However, I did a sample analysis for the movie “Woman in Gold”. You can see the
full spreadsheet of calculations here. Note that we used an automated approach to marking some items at “Low Ratio,” “High Ratio,” or “All Competitors Have, Client Does Not.”

None of these flags by themselves have meaning, so you now need to put all of this into context. In our example, the following words probably have no significance at all: “get”, “you”, “top”, “see”, “we”, “all”, “but”, and other words of this type. These are just very basic English language words.

But, we can see other things of note relating to the target page (a.k.a. the client page):

  1. It’s missing any mention of actor ryan reynolds
  2. It’s missing any mention of actor helen mirren
  3. The page has no reviews
  4. Words like “family” and “story” are not mentioned
  5. “Austrian” and “maria altmann” are not used at all
  6. The phrase “woman in gold” and words “billing” and “info” are used proportionally more than they are with the other pages

Note that the last item is only visible if you open
the spreadsheet. The issues above could well be significant, as the lead actors, reviews, and other indications that the page has in-depth content. We see that competing pages that rank have details of the story, so that’s an indication that this is what Google (and users) are looking for. The fact that the main key phrase, and the word “billing”, are used to a proportionally high degree also makes it seem a bit spammy.

In fact, if you look at the information closely, you can see that the target page is quite thin in overall content. So much so, that it almost looks like a doorway page. In fact, it looks like it was put together by the movie studio itself, just not very well, as it presents little in the way of a home page experience that would cause it to rank for the name of the movie!

In the many different times I have done an analysis using these methods, I’ve been able to make many different types of observations about pages. A few of the more interesting ones include:

  1. A page that had no privacy policy, yet was taking personally identifiable info from users.
  2. A major lack of important synonyms that would indicate a real depth of available content.
  3. Comparatively low Domain Authority competitors ranking with in-depth content.

These types of observations are interesting and valuable, but it’s important to stress that you shouldn’t be overly mechanical about this. The value in this type of analysis is that it gives you a technical way to compare the content on your page with that of your competitors. This type of analysis should be used in combination with other methods that you use for evaluating that same page. I’ll address this some more in the summary section of this below.

How do you execute this for yourself?

The
full spreadsheet contains all the formulas so all you need to do is link in the keyword count data. I have tried this with two different keyword density tools, the one from Searchmetrics, and this one from motoricerca.info.

I am not endorsing these tools, and I have no financial interest in either one—they just seemed to work fairly well for the process I outlined above. To provide the data in the right format, please do the following:

  1. Run all the URLs you are testing through the keyword density tool.
  2. Copy and paste all the one word, two word, and three word results into a tab on the spreadsheet.
  3. Sort them all so you get total word counts aligned by position as I have shown in the linked spreadsheet.
  4. Set up the formulas as I did in the demo spreadsheet (you can just use the demo spreadsheet).
  5. Then do your analysis!

This may sound a bit tedious (and it is), but it has worked very well for us at STC.

Summary

You can also use usability groups and a number of other methods to figure out what users are really looking for on your site. However, what this does is give us a look at what Google has chosen to rank the highest in its search results. Don’t treat this as some sort of magic formula where you mechanically tweak the content to get better metrics in this analysis.

Instead, use this as a method for slicing into your content to better see it the way a machine might see it. It can yield some surprising (and wonderful) insights!

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!

Reblogged 4 years ago from tracking.feedpress.it