Optimizing AngularJS Single-Page Applications for Googlebot Crawlers

Posted by jrridley

It’s almost certain that you’ve encountered AngularJS on the web somewhere, even if you weren’t aware of it at the time. Here’s a list of just a few sites using Angular:

  • Upwork.com
  • Freelancer.com
  • Udemy.com
  • Youtube.com

Any of those look familiar? If so, it’s because AngularJS is taking over the Internet. There’s a good reason for that: Angular- and other React-style frameworks make for a better user and developer experience on a site. For background, AngularJS and ReactJS are part of a web design movement called single-page applications, or SPAs. While a traditional website loads each individual page as the user navigates the site, including calls to the server and cache, loading resources, and rendering the page, SPAs cut out much of the back-end activity by loading the entire site when a user first lands on a page. Instead of loading a new page each time you click on a link, the site dynamically updates a single HTML page as the user interacts with the site.

Image c/o Microsoft

Why is this movement taking over the Internet? With SPAs, users are treated to a screaming fast site through which they can navigate almost instantaneously, while developers have a template that allows them to customize, test, and optimize pages seamlessly and efficiently. AngularJS and ReactJS use advanced Javascript templates to render the site, which means the HTML/CSS page speed overhead is almost nothing. All site activity runs behind the scenes, out of view of the user.

Unfortunately, anyone who’s tried performing SEO on an Angular or React site knows that the site activity is hidden from more than just site visitors: it’s also hidden from web crawlers. Crawlers like Googlebot rely heavily on HTML/CSS data to render and interpret the content on a site. When that HTML content is hidden behind website scripts, crawlers have no website content to index and serve in search results.

Of course, Google claims they can crawl Javascript (and SEOs have tested and supported this claim), but even if that is true, Googlebot still struggles to crawl sites built on a SPA framework. One of the first issues we encountered when a client first approached us with an Angular site was that nothing beyond the homepage was appearing in the SERPs. ScreamingFrog crawls uncovered the homepage and a handful of other Javascript resources, and that was it.

SF Javascript.png

Another common issue is recording Google Analytics data. Think about it: Analytics data is tracked by recording pageviews every time a user navigates to a page. How can you track site analytics when there’s no HTML response to trigger a pageview?

After working with several clients on their SPA websites, we’ve developed a process for performing SEO on those sites. By using this process, we’ve not only enabled SPA sites to be indexed by search engines, but even to rank on the first page for keywords.

5-step solution to SEO for AngularJS

  1. Make a list of all pages on the site
  2. Install Prerender
  3. “Fetch as Google”
  4. Configure Analytics
  5. Recrawl the site

1) Make a list of all pages on your site

If this sounds like a long and tedious process, that’s because it definitely can be. For some sites, this will be as easy as exporting the XML sitemap for the site. For other sites, especially those with hundreds or thousands of pages, creating a comprehensive list of all the pages on the site can take hours or days. However, I cannot emphasize enough how helpful this step has been for us. Having an index of all pages on the site gives you a guide to reference and consult as you work on getting your site indexed. It’s almost impossible to predict every issue that you’re going to encounter with an SPA, and if you don’t have an all-inclusive list of content to reference throughout your SEO optimization, it’s highly likely you’ll leave some part of the site un-indexed by search engines inadvertently.

One solution that might enable you to streamline this process is to divide content into directories instead of individual pages. For example, if you know that you have a list of storeroom pages, include your /storeroom/ directory and make a note of how many pages that includes. Or if you have an e-commerce site, make a note of how many products you have in each shopping category and compile your list that way (though if you have an e-commerce site, I hope for your own sake you have a master list of products somewhere). Regardless of what you do to make this step less time-consuming, make sure you have a full list before continuing to step 2.

2) Install Prerender

Prerender is going to be your best friend when performing SEO for SPAs. Prerender is a service that will render your website in a virtual browser, then serve the static HTML content to web crawlers. From an SEO standpoint, this is as good of a solution as you can hope for: users still get the fast, dynamic SPA experience while search engine crawlers can identify indexable content for search results.

Prerender’s pricing varies based on the size of your site and the freshness of the cache served to Google. Smaller sites (up to 250 pages) can use Prerender for free, while larger sites (or sites that update constantly) may need to pay as much as $200+/month. However, having an indexable version of your site that enables you to attract customers through organic search is invaluable. This is where that list you compiled in step 1 comes into play: if you can prioritize what sections of your site need to be served to search engines, or with what frequency, you may be able to save a little bit of money each month while still achieving SEO progress.

3) “Fetch as Google”

Within Google Search Console is an incredibly useful feature called “Fetch as Google.” “Fetch as Google” allows you to enter a URL from your site and fetch it as Googlebot would during a crawl. “Fetch” returns the HTTP response from the page, which includes a full download of the page source code as Googlebot sees it. “Fetch and Render” will return the HTTP response and will also provide a screenshot of the page as Googlebot saw it and as a site visitor would see it.

This has powerful applications for AngularJS sites. Even with Prerender installed, you may find that Google is still only partially displaying your website, or it may be omitting key features of your site that are helpful to users. Plugging the URL into “Fetch as Google” will let you review how your site appears to search engines and what further steps you may need to take to optimize your keyword rankings. Additionally, after requesting a “Fetch” or “Fetch and Render,” you have the option to “Request Indexing” for that page, which can be handy catalyst for getting your site to appear in search results.

4) Configure Google Analytics (or Google Tag Manager)

As I mentioned above, SPAs can have serious trouble with recording Google Analytics data since they don’t track pageviews the way a standard website does. Instead of the traditional Google Analytics tracking code, you’ll need to install Analytics through some kind of alternative method.

One method that works well is to use the Angulartics plugin. Angulartics replaces standard pageview events with virtual pageview tracking, which tracks the entire user navigation across your application. Since SPAs dynamically load HTML content, these virtual pageviews are recorded based on user interactions with the site, which ultimately tracks the same user behavior as you would through traditional Analytics. Other people have found success using Google Tag Manager “History Change” triggers or other innovative methods, which are perfectly acceptable implementations. As long as your Google Analytics tracking records user interactions instead of conventional pageviews, your Analytics configuration should suffice.

5) Recrawl the site

After working through steps 1–4, you’re going to want to crawl the site yourself to find those errors that not even Googlebot was anticipating. One issue we discovered early with a client was that after installing Prerender, our crawlers were still running into a spider trap:

As you can probably tell, there were not actually 150,000 pages on that particular site. Our crawlers just found a recursive loop that kept generating longer and longer URL strings for the site content. This is something we would not have found in Google Search Console or Analytics. SPAs are notorious for causing tedious, inexplicable issues that you’ll only uncover by crawling the site yourself. Even if you follow the steps above and take as many precautions as possible, I can still almost guarantee you will come across a unique issue that can only be diagnosed through a crawl.

If you’ve come across any of these unique issues, let me know in the comments! I’d love to hear what other issues people have encountered with SPAs.

Results

As I mentioned earlier in the article, the process outlined above has enabled us to not only get client sites indexed, but even to get those sites ranking on first page for various keywords. Here’s an example of the keyword progress we made for one client with an AngularJS site:

Also, the organic traffic growth for that client over the course of seven months:

All of this goes to show that although SEO for SPAs can be tedious, laborious, and troublesome, it is not impossible. Follow the steps above, and you can have SEO success with your single-page app website.

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!

Reblogged 1 year ago from tracking.feedpress.it

SearchCap: Chrome without GoogleBot, stacked PPC bidding & local SEO

Below is what happened in search today, as reported on Search Engine Land and from other places across the web.

The post SearchCap: Chrome without GoogleBot, stacked PPC bidding & local SEO appeared first on Search Engine Land.

Please visit Search Engine Land for the full article.

Reblogged 1 year ago from feeds.searchengineland.com

Distance from Perfect

Posted by wrttnwrd

In spite of all the advice, the strategic discussions and the conference talks, we Internet marketers are still algorithmic thinkers. That’s obvious when you think of SEO.

Even when we talk about content, we’re algorithmic thinkers. Ask yourself: How many times has a client asked you, “How much content do we need?” How often do you still hear “How unique does this page need to be?”

That’s 100% algorithmic thinking: Produce a certain amount of content, move up a certain number of spaces.

But you and I know it’s complete bullshit.

I’m not suggesting you ignore the algorithm. You should definitely chase it. Understanding a little bit about what goes on in Google’s pointy little head helps. But it’s not enough.

A tale of SEO woe that makes you go “whoa”

I have this friend.

He ranked #10 for “flibbergibbet.” He wanted to rank #1.

He compared his site to the #1 site and realized the #1 site had five hundred blog posts.

“That site has five hundred blog posts,” he said, “I must have more.”

So he hired a few writers and cranked out five thousand blogs posts that melted Microsoft Word’s grammar check. He didn’t move up in the rankings. I’m shocked.

“That guy’s spamming,” he decided, “I’ll just report him to Google and hope for the best.”

What happened? Why didn’t adding five thousand blog posts work?

It’s pretty obvious: My, uh, friend added nothing but crap content to a site that was already outranked. Bulk is no longer a ranking tactic. Google’s very aware of that tactic. Lots of smart engineers have put time into updates like Panda to compensate.

He started like this:

And ended up like this:
more posts, no rankings

Alright, yeah, I was Mr. Flood The Site With Content, way back in 2003. Don’t judge me, whippersnappers.

Reality’s never that obvious. You’re scratching and clawing to move up two spots, you’ve got an overtasked IT team pushing back on changes, and you’ve got a boss who needs to know the implications of every recommendation.

Why fix duplication if rel=canonical can address it? Fixing duplication will take more time and cost more money. It’s easier to paste in one line of code. You and I know it’s better to fix the duplication. But it’s a hard sell.

Why deal with 302 versus 404 response codes and home page redirection? The basic user experience remains the same. Again, we just know that a server should return one home page without any redirects and that it should send a ‘not found’ 404 response if a page is missing. If it’s going to take 3 developer hours to reconfigure the server, though, how do we justify it? There’s no flashing sign reading “Your site has a problem!”

Why change this thing and not that thing?

At the same time, our boss/client sees that the site above theirs has five hundred blog posts and thousands of links from sites selling correspondence MBAs. So they want five thousand blog posts and cheap links as quickly as possible.

Cue crazy music.

SEO lacks clarity

SEO is, in some ways, for the insane. It’s an absurd collection of technical tweaks, content thinking, link building and other little tactics that may or may not work. A novice gets exposed to one piece of crappy information after another, with an occasional bit of useful stuff mixed in. They create sites that repel search engines and piss off users. They get more awful advice. The cycle repeats. Every time it does, best practices get more muddled.

SEO lacks clarity. We can’t easily weigh the value of one change or tactic over another. But we can look at our changes and tactics in context. When we examine the potential of several changes or tactics before we flip the switch, we get a closer balance between algorithm-thinking and actual strategy.

Distance from perfect brings clarity to tactics and strategy

At some point you have to turn that knowledge into practice. You have to take action based on recommendations, your knowledge of SEO, and business considerations.

That’s hard when we can’t even agree on subdomains vs. subfolders.

I know subfolders work better. Sorry, couldn’t resist. Let the flaming comments commence.

To get clarity, take a deep breath and ask yourself:

“All other things being equal, will this change, tactic, or strategy move my site closer to perfect than my competitors?”

Breaking it down:

“Change, tactic, or strategy”

A change takes an existing component or policy and makes it something else. Replatforming is a massive change. Adding a new page is a smaller one. Adding ALT attributes to your images is another example. Changing the way your shopping cart works is yet another.

A tactic is a specific, executable practice. In SEO, that might be fixing broken links, optimizing ALT attributes, optimizing title tags or producing a specific piece of content.

A strategy is a broader decision that’ll cause change or drive tactics. A long-term content policy is the easiest example. Shifting away from asynchronous content and moving to server-generated content is another example.

“Perfect”

No one knows exactly what Google considers “perfect,” and “perfect” can’t really exist, but you can bet a perfect web page/site would have all of the following:

  1. Completely visible content that’s perfectly relevant to the audience and query
  2. A flawless user experience
  3. Instant load time
  4. Zero duplicate content
  5. Every page easily indexed and classified
  6. No mistakes, broken links, redirects or anything else generally yucky
  7. Zero reported problems or suggestions in each search engines’ webmaster tools, sorry, “Search Consoles”
  8. Complete authority through immaculate, organically-generated links

These 8 categories (and any of the other bazillion that probably exist) give you a way to break down “perfect” and help you focus on what’s really going to move you forward. These different areas may involve different facets of your organization.

Your IT team can work on load time and creating an error-free front- and back-end. Link building requires the time and effort of content and outreach teams.

Tactics for relevant, visible content and current best practices in UX are going to be more involved, requiring research and real study of your audience.

What you need and what resources you have are going to impact which tactics are most realistic for you.

But there’s a basic rule: If a website would make Googlebot swoon and present zero obstacles to users, it’s close to perfect.

“All other things being equal”

Assume every competing website is optimized exactly as well as yours.

Now ask: Will this [tactic, change or strategy] move you closer to perfect?

That’s the “all other things being equal” rule. And it’s an incredibly powerful rubric for evaluating potential changes before you act. Pretend you’re in a tie with your competitors. Will this one thing be the tiebreaker? Will it put you ahead? Or will it cause you to fall behind?

“Closer to perfect than my competitors”

Perfect is great, but unattainable. What you really need is to be just a little perfect-er.

Chasing perfect can be dangerous. Perfect is the enemy of the good (I love that quote. Hated Voltaire. But I love that quote). If you wait for the opportunity/resources to reach perfection, you’ll never do anything. And the only way to reduce distance from perfect is to execute.

Instead of aiming for pure perfection, aim for more perfect than your competitors. Beat them feature-by-feature, tactic-by-tactic. Implement strategy that supports long-term superiority.

Don’t slack off. But set priorities and measure your effort. If fixing server response codes will take one hour and fixing duplication will take ten, fix the response codes first. Both move you closer to perfect. Fixing response codes may not move the needle as much, but it’s a lot easier to do. Then move on to fixing duplicates.

Do the 60% that gets you a 90% improvement. Then move on to the next thing and do it again. When you’re done, get to work on that last 40%. Repeat as necessary.

Take advantage of quick wins. That gives you more time to focus on your bigger solutions.

Sites that are “fine” are pretty far from perfect

Google has lots of tweaks, tools and workarounds to help us mitigate sub-optimal sites:

  • Rel=canonical lets us guide Google past duplicate content rather than fix it
  • HTML snapshots let us reveal content that’s delivered using asynchronous content and JavaScript frameworks
  • We can use rel=next and prev to guide search bots through outrageously long pagination tunnels
  • And we can use rel=nofollow to hide spammy links and banners

Easy, right? All of these solutions may reduce distance from perfect (the search engines don’t guarantee it). But they don’t reduce it as much as fixing the problems.
Just fine does not equal fixed

The next time you set up rel=canonical, ask yourself:

“All other things being equal, will using rel=canonical to make up for duplication move my site closer to perfect than my competitors?”

Answer: Not if they’re using rel=canonical, too. You’re both using imperfect solutions that force search engines to crawl every page of your site, duplicates included. If you want to pass them on your way to perfect, you need to fix the duplicate content.

When you use Angular.js to deliver regular content pages, ask yourself:

“All other things being equal, will using HTML snapshots instead of actual, visible content move my site closer to perfect than my competitors?”

Answer: No. Just no. Not in your wildest, code-addled dreams. If I’m Google, which site will I prefer? The one that renders for me the same way it renders for users? Or the one that has to deliver two separate versions of every page?

When you spill banner ads all over your site, ask yourself…

You get the idea. Nofollow is better than follow, but banner pollution is still pretty dang far from perfect.

Mitigating SEO issues with search engine-specific tools is “fine.” But it’s far, far from perfect. If search engines are forced to choose, they’ll favor the site that just works.

Not just SEO

By the way, distance from perfect absolutely applies to other channels.

I’m focusing on SEO, but think of other Internet marketing disciplines. I hear stuff like “How fast should my site be?” (Faster than it is right now.) Or “I’ve heard you shouldn’t have any content below the fold.” (Maybe in 2001.) Or “I need background video on my home page!” (Why? Do you have a reason?) Or, my favorite: “What’s a good bounce rate?” (Zero is pretty awesome.)

And Internet marketing venues are working to measure distance from perfect. Pay-per-click marketing has the quality score: A codified financial reward applied for seeking distance from perfect in as many elements as possible of your advertising program.

Social media venues are aggressively building their own forms of graphing, scoring and ranking systems designed to separate the good from the bad.

Really, all marketing includes some measure of distance from perfect. But no channel is more influenced by it than SEO. Instead of arguing one rule at a time, ask yourself and your boss or client: Will this move us closer to perfect?

Hell, you might even please a customer or two.

One last note for all of the SEOs in the crowd. Before you start pointing out edge cases, consider this: We spend our days combing Google for embarrassing rankings issues. Every now and then, we find one, point, and start yelling “SEE! SEE!!!! THE GOOGLES MADE MISTAKES!!!!” Google’s got lots of issues. Screwing up the rankings isn’t one of them.

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!

Reblogged 3 years ago from tracking.feedpress.it

How to Use Server Log Analysis for Technical SEO

Posted by SamuelScott

It’s ten o’clock. Do you know where your logs are?

I’m introducing this guide with a pun on a common public-service announcement that has run on late-night TV news broadcasts in the United States because log analysis is something that is extremely newsworthy and important.

If your technical and on-page SEO is poor, then nothing else that you do will matter. Technical SEO is the key to helping search engines to crawl, parse, and index websites, and thereby rank them appropriately long before any marketing work begins.

The important thing to remember: Your log files contain the only data that is 100% accurate in terms of how search engines are crawling your website. By helping Google to do its job, you will set the stage for your future SEO work and make your job easier. Log analysis is one facet of technical SEO, and correcting the problems found in your logs will help to lead to higher rankings, more traffic, and more conversions and sales.

Here are just a few reasons why:

  • Too many response code errors may cause Google to reduce its crawling of your website and perhaps even your rankings.
  • You want to make sure that search engines are crawling everything, new and old, that you want to appear and rank in the SERPs (and nothing else).
  • It’s crucial to ensure that all URL redirections will pass along any incoming “link juice.”

However, log analysis is something that is unfortunately discussed all too rarely in SEO circles. So, here, I wanted to give the Moz community an introductory guide to log analytics that I hope will help. If you have any questions, feel free to ask in the comments!

What is a log file?

Computer servers, operating systems, network devices, and computer applications automatically generate something called a log entry whenever they perform an action. In a SEO and digital marketing context, one type of action is whenever a page is requested by a visiting bot or human.

Server log entries are specifically programmed to be output in the Common Log Format of the W3C consortium. Here is one example from Wikipedia with my accompanying explanations:

127.0.0.1 user-identifier frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326
  • 127.0.0.1 — The remote hostname. An IP address is shown, like in this example, whenever the DNS hostname is not available or DNSLookup is turned off.
  • user-identifier — The remote logname / RFC 1413 identity of the user. (It’s not that important.)
  • frank — The user ID of the person requesting the page. Based on what I see in my Moz profile, Moz’s log entries would probably show either “SamuelScott” or “392388” whenever I visit a page after having logged in.
  • [10/Oct/2000:13:55:36 -0700] — The date, time, and timezone of the action in question in strftime format.
  • GET /apache_pb.gif HTTP/1.0 — “GET” is one of the two commands (the other is “POST”) that can be performed. “GET” fetches a URL while “POST” is submitting something (such as a forum comment). The second part is the URL that is being accessed, and the last part is the version of HTTP that is being accessed.
  • 200 — The status code of the document that was returned.
  • 2326 — The size, in bytes, of the document that was returned.

Note: A hyphen is shown in a field when that information is unavailable.

Every single time that you — or the Googlebot — visit a page on a website, a line with this information is output, recorded, and stored by the server.

Log entries are generated continuously and anywhere from several to thousands can be created every second — depending on the level of a given server, network, or application’s activity. A collection of log entries is called a log file (or often in slang, “the log” or “the logs”), and it is displayed with the most-recent log entry at the bottom. Individual log files often contain a calendar day’s worth of log entries.

Accessing your log files

Different types of servers store and manage their log files differently. Here are the general guides to finding and managing log data on three of the most-popular types of servers:

What is log analysis?

Log analysis (or log analytics) is the process of going through log files to learn something from the data. Some common reasons include:

  • Development and quality assurance (QA) — Creating a program or application and checking for problematic bugs to make sure that it functions properly
  • Network troubleshooting — Responding to and fixing system errors in a network
  • Customer service — Determining what happened when a customer had a problem with a technical product
  • Security issues — Investigating incidents of hacking and other intrusions
  • Compliance matters — Gathering information in response to corporate or government policies
  • Technical SEO — This is my favorite! More on that in a bit.

Log analysis is rarely performed regularly. Usually, people go into log files only in response to something — a bug, a hack, a subpoena, an error, or a malfunction. It’s not something that anyone wants to do on an ongoing basis.

Why? This is a screenshot of ours of just a very small part of an original (unstructured) log file:

Ouch. If a website gets 10,000 visitors who each go to ten pages per day, then the server will create a log file every day that will consist of 100,000 log entries. No one has the time to go through all of that manually.

How to do log analysis

There are three general ways to make log analysis easier in SEO or any other context:

  • Do-it-yourself in Excel
  • Proprietary software such as Splunk or Sumo-logic
  • The ELK Stack open-source software

Tim Resnik’s Moz essay from a few years ago walks you through the process of exporting a batch of log files into Excel. This is a (relatively) quick and easy way to do simple log analysis, but the downside is that one will see only a snapshot in time and not any overall trends. To obtain the best data, it’s crucial to use either proprietary tools or the ELK Stack.

Splunk and Sumo-Logic are proprietary log analysis tools that are primarily used by enterprise companies. The ELK Stack is a free and open-source batch of three platforms (Elasticsearch, Logstash, and Kibana) that is owned by Elastic and used more often by smaller businesses. (Disclosure: We at Logz.io use the ELK Stack to monitor our own internal systems as well as for the basis of our own log management software.)

For those who are interested in using this process to do technical SEO analysis, monitor system or application performance, or for any other reason, our CEO, Tomer Levy, has written a guide to deploying the ELK Stack.

Technical SEO insights in log data

However you choose to access and understand your log data, there are many important technical SEO issues to address as needed. I’ve included screenshots of our technical SEO dashboard with our own website’s data to demonstrate what to examine in your logs.

Bot crawl volume

It’s important to know the number of requests made by Baidu, BingBot, GoogleBot, Yahoo, Yandex, and others over a given period time. If, for example, you want to get found in search in Russia but Yandex is not crawling your website, that is a problem. (You’d want to consult Yandex Webmaster and see this article on Search Engine Land.)

Response code errors

Moz has a great primer on the meanings of the different status codes. I have an alert system setup that tells me about 4XX and 5XX errors immediately because those are very significant.

Temporary redirects

Temporary 302 redirects do not pass along the “link juice” of external links from the old URL to the new one. Almost all of the time, they should be changed to permanent 301 redirects.

Crawl budget waste

Google assigns a crawl budget to each website based on numerous factors. If your crawl budget is, say, 100 pages per day (or the equivalent amount of data), then you want to be sure that all 100 are things that you want to appear in the SERPs. No matter what you write in your robots.txt file and meta-robots tags, you might still be wasting your crawl budget on advertising landing pages, internal scripts, and more. The logs will tell you — I’ve outlined two script-based examples in red above.

If you hit your crawl limit but still have new content that should be indexed to appear in search results, Google may abandon your site before finding it.

Duplicate URL crawling

The addition of URL parameters — typically used in tracking for marketing purposes — often results in search engines wasting crawl budgets by crawling different URLs with the same content. To learn how to address this issue, I recommend reading the resources on Google and Search Engine Land here, here, here, and here.

Crawl priority

Google might be ignoring (and not crawling or indexing) a crucial page or section of your website. The logs will reveal what URLs and/or directories are getting the most and least attention. If, for example, you have published an e-book that attempts to rank for targeted search queries but it sits in a directory that Google only visits once every six months, then you won’t get any organic search traffic from the e-book for up to six months.

If a part of your website is not being crawled very often — and it is updated often enough that it should be — then you might need to check your internal-linking structure and the crawl-priority settings in your XML sitemap.

Last crawl date

Have you uploaded something that you hope will be indexed quickly? The log files will tell you when Google has crawled it.

Crawl budget

One thing I personally like to check and see is Googlebot’s real-time activity on our site because the crawl budget that the search engine assigns to a website is a rough indicator — a very rough one — of how much it “likes” your site. Google ideally does not want to waste valuable crawling time on a bad website. Here, I had seen that Googlebot had made 154 requests of our new startup’s website over the prior twenty-four hours. Hopefully, that number will go up!

As I hope you can see, log analysis is critically important in technical SEO. It’s eleven o’clock — do you know where your logs are now?

Additional resources

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!

Reblogged 3 years ago from tracking.feedpress.it

Technical Site Audit Checklist: 2015 Edition

Posted by GeoffKenyon

Back in 2011, I wrote a technical site audit checklist, and while it was thorough, there have been a lot of additions to what is encompassed in a site audit. I have gone through and updated that old checklist for 2015. Some of the biggest changes were the addition of sections for mobile, international, and site speed.

This checklist should help you put together a thorough site audit and determine what is holding back the organic performance of your site. At the end of your audit, don’t write a document that says what’s wrong with the website. Instead, create a document that says what needs to be done. Then explain why these actions need to be taken and why they are important. What I’ve found to really helpful is to provide a prioritized list along with your document of all the actions that you would like them to implement. This list can be handed off to a dev or content team to be implemented easily. These teams can refer to your more thorough document as needed.


Quick overview

Check indexed pages  
  • Do a site: search.
  • How many pages are returned? (This can be way off so don’t put too much stock in this).
  • Is the homepage showing up as the first result? 
  • If the homepage isn’t showing up as the first result, there could be issues, like a penalty or poor site architecture/internal linking, affecting the site. This may be less of a concern as Google’s John Mueller recently said that your homepage doesn’t need to be listed first.

Review the number of organic landing pages in Google Analytics

  • Does this match with the number of results in a site: search?
  • This is often the best view of how many pages are in a search engine’s index that search engines find valuable.

Search for the brand and branded terms

  • Is the homepage showing up at the top, or are correct pages showing up?
  • If the proper pages aren’t showing up as the first result, there could be issues, like a penalty, in play.
Check Google’s cache for key pages
  • Is the content showing up?
  • Are navigation links present?
  • Are there links that aren’t visible on the site?
PRO Tip:
Don’t forget to check the text-only version of the cached page. Here is a
bookmarklet to help you do that.

Do a mobile search for your brand and key landing pages

  • Does your listing have the “mobile friendly” label?
  • Are your landing pages mobile friendly?
  • If the answer is no to either of these, it may be costing you organic visits.

On-page optimization

Title tags are optimized
  • Title tags should be optimized and unique.
  • Your brand name should be included in your title tag to improve click-through rates.
  • Title tags are about 55-60 characters (512 pixels) to be fully displayed. You can test here or review title pixel widths in Screaming Frog.
Important pages have click-through rate optimized titles and meta descriptions
  • This will help improve your organic traffic independent of your rankings.
  • You can use SERP Turkey for this.

Check for pages missing page titles and meta descriptions
  
The on-page content includes the primary keyword phrase multiple times as well as variations and alternate keyword phrases
  
There is a significant amount of optimized, unique content on key pages
 
The primary keyword phrase is contained in the H1 tag
  

Images’ file names and alt text are optimized to include the primary keyword phrase associated with the page.
 
URLs are descriptive and optimized
  • While it is beneficial to include your keyword phrase in URLs, changing your URLs can negatively impact traffic when you do a 301. As such, I typically recommend optimizing URLs when the current ones are really bad or when you don’t have to change URLs with existing external links.
Clean URLs
  • No excessive parameters or session IDs.
  • URLs exposed to search engines should be static.
Short URLs
  • 115 characters or shorter – this character limit isn’t set in stone, but shorter URLs are better for usability.

Content

Homepage content is optimized
  • Does the homepage have at least one paragraph?
  • There has to be enough content on the page to give search engines an understanding of what a page is about. Based on my experience, I typically recommend at least 150 words.
Landing pages are optimized
  • Do these pages have at least a few paragraphs of content? Is it enough to give search engines an understanding of what the page is about?
  • Is it template text or is it completely unique?
Site contains real and substantial content
  • Is there real content on the site or is the “content” simply a list of links?
Proper keyword targeting
  • Does the intent behind the keyword match the intent of the landing page?
  • Are there pages targeting head terms, mid-tail, and long-tail keywords?
Keyword cannibalization
  • Do a site: search in Google for important keyword phrases.
  • Check for duplicate content/page titles using the Moz Pro Crawl Test.
Content to help users convert exists and is easily accessible to users
  • In addition to search engine driven content, there should be content to help educate users about the product or service.
Content formatting
  • Is the content formatted well and easy to read quickly?
  • Are H tags used?
  • Are images used?
  • Is the text broken down into easy to read paragraphs?
Good headlines on blog posts
  • Good headlines go a long way. Make sure the headlines are well written and draw users in.
Amount of content versus ads
  • Since the implementation of Panda, the amount of ad-space on a page has become important to evaluate.
  • Make sure there is significant unique content above the fold.
  • If you have more ads than unique content, you are probably going to have a problem.

Duplicate content

There should be one URL for each piece of content
  • Do URLs include parameters or tracking code? This will result in multiple URLs for a piece of content.
  • Does the same content reside on completely different URLs? This is often due to products/content being replicated across different categories.
Pro Tip:
Exclude common parameters, such as those used to designate tracking code, in Google Webmaster Tools. Read more at
Search Engine Land.
Do a search to check for duplicate content
  • Take a content snippet, put it in quotes and search for it.
  • Does the content show up elsewhere on the domain?
  • Has it been scraped? If the content has been scraped, you should file a content removal request with Google.
Sub-domain duplicate content
  • Does the same content exist on different sub-domains?
Check for a secure version of the site
  • Does the content exist on a secure version of the site?
Check other sites owned by the company
  • Is the content replicated on other domains owned by the company?
Check for “print” pages
  • If there are “printer friendly” versions of pages, they may be causing duplicate content.

Accessibility & Indexation

Check the robots.txt

  • Has the entire site, or important content been blocked? Is link equity being orphaned due to pages being blocked via the robots.txt?

Turn off JavaScript, cookies, and CSS

Now change your user agent to Googlebot

PRO Tip:
Use
SEO Browser to do a quick spot check.

Check the SEOmoz PRO Campaign

  • Check for 4xx errors and 5xx errors.

XML sitemaps are listed in the robots.txt file

XML sitemaps are submitted to Google/Bing Webmaster Tools

Check pages for meta robots noindex tag

  • Are pages accidentally being tagged with the meta robots noindex command
  • Are there pages that should have the noindex command applied
  • You can check the site quickly via a crawl tool such as Moz or Screaming Frog

Do goal pages have the noindex command applied?

  • This is important to prevent direct organic visits from showing up as goals in analytics

Site architecture and internal linking

Number of links on a page
Vertical linking structures are in place
  • Homepage links to category pages.
  • Category pages link to sub-category and product pages as appropriate.
  • Product pages link to relevant category pages.
Horizontal linking structures are in place
  • Category pages link to other relevant category pages.
  • Product pages link to other relevant product pages.
Links are in content
  • Does not utilize massive blocks of links stuck in the content to do internal linking.
Footer links
  • Does not use a block of footer links instead of proper navigation.
  • Does not link to landing pages with optimized anchors.
Good internal anchor text
 
Check for broken links
  • Link Checker and Xenu are good tools for this.

Technical issues

Proper use of 301s
  • Are 301s being used for all redirects?
  • If the root is being directed to a landing page, are they using a 301 instead of a 302?
  • Use Live HTTP Headers Firefox plugin to check 301s.
“Bad” redirects are avoided
  • These include 302s, 307s, meta refresh, and JavaScript redirects as they pass little to no value.
  • These redirects can easily be identified with a tool like Screaming Frog.
Redirects point directly to the final URL and do not leverage redirect chains
  • Redirect chains significantly diminish the amount of link equity associated with the final URL.
  • Google has said that they will stop following a redirect chain after several redirects.
Use of JavaScript
  • Is content being served in JavaScript?
  • Are links being served in JavaScript? Is this to do PR sculpting or is it accidental?
Use of iFrames
  • Is content being pulled in via iFrames?
Use of Flash
  • Is the entire site done in Flash, or is Flash used sparingly in a way that doesn’t hinder crawling?
Check for errors in Google Webmaster Tools
  • Google WMT will give you a good list of technical problems that they are encountering on your site (such as: 4xx and 5xx errors, inaccessible pages in the XML sitemap, and soft 404s)
XML Sitemaps  
  • Are XML sitemaps in place?
  • Are XML sitemaps covering for poor site architecture?
  • Are XML sitemaps structured to show indexation problems?
  • Do the sitemaps follow proper XML protocols
Canonical version of the site established through 301s
 
Canonical version of site is specified in Google Webmaster Tools
 
Rel canonical link tag is properly implemented across the site
Uses absolute URLs instead of relative URLs
  • This can cause a lot of problems if you have a root domain with secure sections.

Site speed


Review page load time for key pages 

Make sure compression is enabled


Enable caching


Optimize your images for the web


Minify your CSS/JS/HTML

Use a good, fast host
  • Consider using a CDN for your images.

Optimize your images for the web

Mobile

Review the mobile experience
  • Is there a mobile site set up?
  • If there is, is it a mobile site, responsive design, or dynamic serving?


Make sure analytics are set up if separate mobile content exists


If dynamic serving is being used, make sure the Vary HTTP header is being used

Review how the mobile experience matches up with the intent of mobile visitors
  • Do your mobile visitors have a different intent than desktop based visitors?
Ensure faulty mobile redirects do not exist
  • If your site redirects mobile visitors away from their intended URL (typically to the homepage), you’re likely going to run into issues impacting your mobile organic performance.
Ensure that the relationship between the mobile site and desktop site is established with proper markup
  • If a mobile site (m.) exists, does the desktop equivalent URL point to the mobile version with rel=”alternate”?
  • Does the mobile version canonical to the desktop version?
  • Official documentation.

International

Review international versions indicated in the URL
  • ex: site.com/uk/ or uk.site.com
Enable country based targeting in webmaster tools
  • If the site is targeted to one specific country, is this specified in webmaster tools? 
  • If the site has international sections, are they targeted in webmaster tools?
Implement hreflang / rel alternate if relevant
If there are multiple versions of a site in the same language (such as /us/ and /uk/, both in English), update the copy been updated so that they are both unique
 

Make sure the currency reflects the country targeted
 
Ensure the URL structure is in the native language 
  • Try to avoid having all URLs in the default language

Analytics

Analytics tracking code is on every page
  • You can check this using the “custom” filter in a Screaming Frog Crawl or by looking for self referrals.
  • Are there pages that should be blocked?
There is only one instance of a GA property on a page
  • Having the same Google Analytics property will create problems with pageview-related metrics such as inflating page views and pages per visit and reducing the bounce rate.
  • It is OK to have multiple GA properties listed, this won’t cause a problem.
Analytics is properly tracking and capturing internal searches
 

Demographics tracking is set up

Adwords and Adsense are properly linked if you are using these platforms
Internal IP addresses are excluded
UTM Campaign Parameters are used for other marketing efforts
Meta refresh and JavaScript redirects are avoided
  • These can artificially lower bounce rates.
Event tracking is set up for key user interactions

This audit covers the main technical elements of a site and should help you uncover any issues that are holding a site back. As with any project, the deliverable is critical. I’ve found focusing on the solution and impact (business case) is the best approach for site audit reports. While it is important to outline the problems, too much detail here can take away from the recommendations. If you’re looking for more resources on site audits, I recommend the following:

Helpful tools for doing a site audit:

Annie Cushing’s Site Audit
Web Developer Toolbar
User Agent Add-on
Firebug
Link Checker
SEObook Toolbar
MozBar (Moz’s SEO toolbar)
Xenu
Screaming Frog
Your own scraper
Inflow’s technical mobile best practices

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!

Reblogged 3 years ago from moz.com

Can I place multiple breadcrumbs on a page?

Many of my items belong to multiple categories on my eCommerce site. Can I place multiple breadcrumbs on a page? Do they confuse Googlebot? Do you properly u…

Reblogged 4 years ago from www.youtube.com