The 2015 #MozCon Video Bundle Has Arrived!

Posted by EricaMcGillivray

The bird has landed, and by bird, I mean the MozCon 2015 Video Bundle! That’s right: 27 sessions and over 15 hours of knowledge from our top-notch speakers, right at your fingertips. Watch presentations about SEO, personalization, content strategy, local SEO, Facebook graph search, and more to level up your online marketing expertise.

If these videos were already on your wish list, skip ahead:

If you attended MozCon, the videos are included with your ticket. You should have an email in your inbox (sent to the address you registered for MozCon with) containing your unique URL for a free “purchase.”

MozCon 2015 was fantastic! This year, we opened up the room to a few more attendees and to our growing staff, which meant 1,600 people showed up. Each year we work to take our programming one step further with incredible speakers, diverse topics, and tons of tactics and tips for you.


What did attendees say?

We heard directly from 30% of MozCon attendees. Here’s what they had to say about the content:

Did you find the presentations to be advanced enough? 74% found them to be just perfect.

Wil Reynolds at MozCon 2015


What do I get in the bundle?

Our videos feature the presenter and their presentation side-by-side, so there’s no need to flip to another program to view a slide deck. You’ll have easy access to links and reference tools, and the videos even offer closed captioning for your enjoyment and ease of understanding.

For $299, the 2015 MozCon Video Bundle gives you instant access to:

  • 27 videos (over 15 hours) from MozCon 2015
  • Stream or download the videos to your computer, tablet, phone, phablet, or whatever you’ve got handy
  • Downloadable slide decks for all presentations


Bonus! A free full session from 2015!

Because some sessions are just too good to hide behind a paywall. Sample what the conference is all about with a full session from Cara Harshman about personalization on the web:


Surprised and excited to see these videos so early? Huge thanks is due to the Moz team for working hard to process, build, program, write, design, and do all the necessaries to make these happen. You’re the best!

Still not convinced you want the videos? Watch the preview for the Sherlock Christmas Special. Want to attend the live show? Buy your early bird ticket for MozCon 2016. We’ve sold out the conference for the last five years running, so grab your ticket now!

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!


How to Use Server Log Analysis for Technical SEO

Posted by SamuelScott

It’s ten o’clock. Do you know where your logs are?

I’m introducing this guide with a pun on a common public-service announcement that has run on late-night TV news broadcasts in the United States because log analysis is something that is extremely newsworthy and important.

If your technical and on-page SEO is poor, then nothing else that you do will matter. Technical SEO is the key to helping search engines to crawl, parse, and index websites, and thereby rank them appropriately long before any marketing work begins.

The important thing to remember: Your log files contain the only data that is 100% accurate in terms of how search engines are crawling your website. By helping Google to do its job, you will set the stage for your future SEO work and make your job easier. Log analysis is one facet of technical SEO, and correcting the problems found in your logs will help to lead to higher rankings, more traffic, and more conversions and sales.

Here are just a few reasons why:

  • Too many response code errors may cause Google to reduce its crawling of your website and perhaps even your rankings.
  • You want to make sure that search engines are crawling everything, new and old, that you want to appear and rank in the SERPs (and nothing else).
  • It’s crucial to ensure that all URL redirections will pass along any incoming “link juice.”

However, log analysis is something that is unfortunately discussed all too rarely in SEO circles. So, here, I wanted to give the Moz community an introductory guide to log analytics that I hope will help. If you have any questions, feel free to ask in the comments!

What is a log file?

Computer servers, operating systems, network devices, and computer applications automatically generate something called a log entry whenever they perform an action. In an SEO and digital marketing context, one type of action is whenever a page is requested by a visiting bot or human.

Server log entries are typically written in the standardized Common Log Format. Here is one example from Wikipedia with my accompanying explanations:

127.0.0.1 user-identifier frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326
  • 127.0.0.1 — The remote hostname. An IP address is shown, like in this example, whenever the DNS hostname is not available or DNSLookup is turned off.
  • user-identifier — The remote logname / RFC 1413 identity of the user. (It’s not that important.)
  • frank — The user ID of the person requesting the page. Based on what I see in my Moz profile, Moz’s log entries would probably show either “SamuelScott” or “392388” whenever I visit a page after having logged in.
  • [10/Oct/2000:13:55:36 -0700] — The date, time, and timezone of the action in question in strftime format.
  • GET /apache_pb.gif HTTP/1.0 — “GET” and “POST” are the two most common HTTP methods. “GET” fetches a URL, while “POST” submits something (such as a forum comment). The second part is the URL that is being accessed, and the last part is the version of HTTP that is being used.
  • 200 — The status code of the document that was returned.
  • 2326 — The size, in bytes, of the document that was returned.

Note: A hyphen is shown in a field when that information is unavailable.
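To make this concrete, here is a minimal sketch in Python (not part of the original format specification) that parses a single Common Log Format entry; the regular expression and field names are my own illustrative choices:

    import re

    # Fields of the Common Log Format described above:
    # host, identity, user, timestamp, request, status, size.
    CLF_PATTERN = re.compile(
        r'(?P<host>\S+) (?P<identity>\S+) (?P<user>\S+) '
        r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
        r'(?P<status>\d{3}) (?P<size>\S+)'
    )

    def parse_log_line(line):
        """Return a dict of the fields in one Common Log Format entry, or None if it doesn't match."""
        match = CLF_PATTERN.match(line)
        if not match:
            return None
        entry = match.groupdict()
        # The request field combines method, URL, and protocol, e.g. "GET /apache_pb.gif HTTP/1.0".
        parts = entry["request"].split()
        if len(parts) == 3:
            entry["method"], entry["url"], entry["protocol"] = parts
        return entry

    example = '127.0.0.1 user-identifier frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
    print(parse_log_line(example)["url"])  # /apache_pb.gif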

Every single time that you — or the Googlebot — visit a page on a website, a line with this information is output, recorded, and stored by the server.

Log entries are generated continuously and anywhere from several to thousands can be created every second — depending on the level of a given server, network, or application’s activity. A collection of log entries is called a log file (or often in slang, “the log” or “the logs”), and it is displayed with the most-recent log entry at the bottom. Individual log files often contain a calendar day’s worth of log entries.

Accessing your log files

Different types of servers store and manage their log files differently. Here are the general guides to finding and managing log data on three of the most-popular types of servers:

What is log analysis?

Log analysis (or log analytics) is the process of going through log files to learn something from the data. Some common reasons include:

  • Development and quality assurance (QA) — Creating a program or application and checking for problematic bugs to make sure that it functions properly
  • Network troubleshooting — Responding to and fixing system errors in a network
  • Customer service — Determining what happened when a customer had a problem with a technical product
  • Security issues — Investigating incidents of hacking and other intrusions
  • Compliance matters — Gathering information in response to corporate or government policies
  • Technical SEO — This is my favorite! More on that in a bit.

Log analysis is rarely performed regularly. Usually, people go into log files only in response to something — a bug, a hack, a subpoena, an error, or a malfunction. It’s not something that anyone wants to do on an ongoing basis.

Why? Here is a screenshot of just a very small part of one of our original (unstructured) log files:

Ouch. If a website gets 10,000 visitors who each go to ten pages per day, then the server will create a log file every day that will consist of 100,000 log entries. No one has the time to go through all of that manually.

How to do log analysis

There are three general ways to make log analysis easier in SEO or any other context:

  • Do-it-yourself in Excel
  • Proprietary software such as Splunk or Sumo Logic
  • The ELK Stack open-source software

Tim Resnik’s Moz essay from a few years ago walks you through the process of exporting a batch of log files into Excel. This is a (relatively) quick and easy way to do simple log analysis, but the downside is that one will see only a snapshot in time and not any overall trends. To obtain the best data, it’s crucial to use either proprietary tools or the ELK Stack.

Splunk and Sumo Logic are proprietary log analysis tools that are primarily used by enterprise companies. The ELK Stack is a free and open-source batch of three platforms (Elasticsearch, Logstash, and Kibana) that is owned by Elastic and used more often by smaller businesses. (Disclosure: We at Logz.io use the ELK Stack to monitor our own internal systems and as the basis of our own log management software.)

For those who are interested in using this process to do technical SEO analysis, monitor system or application performance, or for any other reason, our CEO, Tomer Levy, has written a guide to deploying the ELK Stack.

Technical SEO insights in log data

However you choose to access and understand your log data, there are many important technical SEO issues to address as needed. I’ve included screenshots of our technical SEO dashboard with our own website’s data to demonstrate what to examine in your logs.

Bot crawl volume

It’s important to know the number of requests made by Baidu, BingBot, GoogleBot, Yahoo, Yandex, and others over a given period of time. If, for example, you want to get found in search in Russia but Yandex is not crawling your website, that is a problem. (You’d want to consult Yandex Webmaster and see this article on Search Engine Land.)
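As a rough illustration, counting requests per crawler might look like the sketch below. It assumes the log lines include a user-agent field (the Combined Log Format rather than the bare Common Log Format), uses a hypothetical file name of access.log, and matches bots on user-agent substrings only; a production setup would also verify crawlers via reverse DNS:

    from collections import Counter

    # User-agent substrings for the major crawlers; Yahoo's crawler identifies itself as "Slurp".
    BOT_SIGNATURES = ["Googlebot", "bingbot", "Baiduspider", "YandexBot", "Slurp"]

    def bot_crawl_volume(log_lines):
        """Count requests per search-engine crawler over whatever period the log file covers."""
        counts = Counter()
        for line in log_lines:
            for bot in BOT_SIGNATURES:
                if bot in line:
                    counts[bot] += 1
                    break
        return counts

    with open("access.log") as log_file:  # hypothetical file name
        print(bot_crawl_volume(log_file))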

Response code errors

Moz has a great primer on the meanings of the different status codes. I have an alert system set up that tells me about 4XX and 5XX errors immediately because those are very significant.
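A minimal sketch of pulling those errors out of the logs, reusing the parse_log_line helper sketched earlier and the hypothetical access.log file name (the alerting itself is left out):

    def error_hits(log_lines):
        """Yield (status, URL) pairs for every 4XX and 5XX response in the logs."""
        for line in log_lines:
            entry = parse_log_line(line)  # the Common Log Format parser sketched earlier
            if entry and entry["status"].startswith(("4", "5")):
                yield entry["status"], entry.get("url", "-")

    with open("access.log") as log_file:  # hypothetical file name
        for status, url in error_hits(log_file):
            print(status, url)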

Temporary redirects

Temporary 302 redirects do not pass along the “link juice” of external links from the old URL to the new one. Almost all of the time, they should be changed to permanent 301 redirects.
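To find candidates for that change, a small sketch like this (again reusing the earlier parse_log_line helper) lists every URL in the logs that answered with a 302:

    def temporary_redirects(log_lines):
        """Return the set of URLs that answered with a 302 and may need a permanent 301 instead."""
        entries = (parse_log_line(line) for line in log_lines)  # parser sketched earlier
        return {entry["url"] for entry in entries
                if entry and entry["status"] == "302" and "url" in entry}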

Crawl budget waste

Google assigns a crawl budget to each website based on numerous factors. If your crawl budget is, say, 100 pages per day (or the equivalent amount of data), then you want to be sure that all 100 are things that you want to appear in the SERPs. No matter what you write in your robots.txt file and meta-robots tags, you might still be wasting your crawl budget on advertising landing pages, internal scripts, and more. The logs will tell you — I’ve outlined two script-based examples in red above.

If you hit your crawl limit but still have new content that should be indexed to appear in search results, Google may abandon your site before finding it.

Duplicate URL crawling

The addition of URL parameters — typically used in tracking for marketing purposes — often results in search engines wasting crawl budgets by crawling different URLs with the same content. To learn how to address this issue, I recommend reading the resources on Google and Search Engine Land here, here, here, and here.
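For illustration, here is one way to spot those parameter duplicates once you have pulled the requested URLs out of your logs; the grouping by path is my own simplification and ignores parameters that genuinely change the content:

    from collections import defaultdict
    from urllib.parse import urlsplit

    def parameter_duplicates(crawled_urls):
        """Group crawled URLs by path to show how many query-string variants of each page were fetched."""
        variants = defaultdict(set)
        for url in crawled_urls:
            parts = urlsplit(url)
            variants[parts.path].add(parts.query)
        # Paths crawled under more than one query string are candidates for wasted crawl budget.
        return {path: queries for path, queries in variants.items() if len(queries) > 1}

    print(parameter_duplicates([
        "/shoes?utm_source=email",
        "/shoes?utm_source=social",
        "/shoes",
    ]))  # one path, three crawled variants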

Crawl priority

Google might be ignoring (and not crawling or indexing) a crucial page or section of your website. The logs will reveal what URLs and/or directories are getting the most and least attention. If, for example, you have published an e-book that attempts to rank for targeted search queries but it sits in a directory that Google only visits once every six months, then you won’t get any organic search traffic from the e-book for up to six months.

If a part of your website is not being crawled very often — and it is updated often enough that it should be — then you might need to check your internal-linking structure and the crawl-priority settings in your XML sitemap.

Last crawl date

Have you uploaded something that you hope will be indexed quickly? The log files will tell you when Google has crawled it.
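A small sketch of looking that up, assuming the parse_log_line helper from earlier and the timestamp format shown in the Common Log Format example:

    from datetime import datetime

    def last_googlebot_crawl(log_lines, target_url):
        """Return the most recent time Googlebot requested target_url, or None if it never did."""
        latest = None
        for line in log_lines:
            entry = parse_log_line(line)  # parser sketched earlier
            if entry and "Googlebot" in line and entry.get("url") == target_url:
                crawled_at = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
                if latest is None or crawled_at > latest:
                    latest = crawled_at
        return latest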

Crawl budget

One thing I personally like to check and see is Googlebot’s real-time activity on our site because the crawl budget that the search engine assigns to a website is a rough indicator — a very rough one — of how much it “likes” your site. Google ideally does not want to waste valuable crawling time on a bad website. Here, I had seen that Googlebot had made 154 requests of our new startup’s website over the prior twenty-four hours. Hopefully, that number will go up!
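Counting that recent Googlebot activity yourself might look like this sketch (same assumptions as above about the parser and timestamp format):

    from datetime import datetime, timedelta, timezone

    def googlebot_requests_last_24h(log_lines):
        """Count Googlebot requests stamped within the last twenty-four hours."""
        cutoff = datetime.now(timezone.utc) - timedelta(hours=24)
        count = 0
        for line in log_lines:
            entry = parse_log_line(line)  # parser sketched earlier
            if entry and "Googlebot" in line:
                crawled_at = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
                if crawled_at >= cutoff:
                    count += 1
        return count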

As I hope you can see, log analysis is critically important in technical SEO. It’s eleven o’clock — do you know where your logs are now?

Additional resources


Misuses of 4 Google Analytics Metrics Debunked

Posted by Tom.Capper

In this post I’ll pull apart four of the most commonly used metrics in Google Analytics, how they are collected, and why they are so easily misinterpreted.

Average Time on Page

Average time on page should be a really useful metric, particularly if you’re interested in engagement with content that’s all on a single page. Unfortunately, this is actually its worst use case. To understand why, you need to understand how time on page is calculated in Google Analytics:

Time on Page: Total across all pageviews of time from pageview to last engagement hit on that page (where an engagement hit is any of: next pageview, interactive event, e-commerce transaction, e-commerce item hit, or social plugin). (Source)

If there is no subsequent engagement hit, or if there is a gap between the last engagement hit on a site and leaving the site, the assumption is that no further time was spent on the site. Below are some scenarios with an intuitive time on page of 20 seconds, and their Google Analytics time on page:

  • Scenario 1: 0s: Pageview; 10s: Social plugin; 20s: Click through to next page. Intuitive time on page: 20s. GA time on page: 20s.
  • Scenario 2: 0s: Pageview; 10s: Social plugin; 20s: Leave site. Intuitive time on page: 20s. GA time on page: 10s.
  • Scenario 3: 0s: Pageview; 20s: Leave site. Intuitive time on page: 20s. GA time on page: 0s.

Google doesn’t want exits to influence the average time on page, because of scenarios like the third example above, where they have a time on page of 0 seconds (source). To avoid this, they use the following formula (remember that Time on Page is a total):

Average Time on Page: (Time on Page) / (Pageviews – Exits)

However, as the second scenario above shows, this assumption doesn’t always hold. That visit feeds into the top half of the average time on page fraction, but not the bottom half:

Average Time on Page (across all three scenarios): (20s + 10s + 0s) / (3 - 2) = 30s

There are two issues here:

  1. Overestimation
    Excluding exits from the second half of the average time on page equation doesn’t have the desired effect when their time on page wasn’t 0 seconds—note that 30s is longer than any of the individual visits. This is why average time on page can often be longer than average visit duration. Nonetheless, 30 seconds doesn’t seem too far out in the above scenario (the intuitive average is 20s), but in the real world many pages have much higher exit rates than the 67% in this example, and/or much less engagement with events on page.
  2. Ignored visits
    Consider visitors who exit without an engagement hit: whether they stayed for 2 seconds, 10 minutes, or anything in between, it doesn’t influence average time on page in the slightest. On many sites, a 10-minute view of a single page without interaction (e.g. a blog post) would be considered a success, but it wouldn’t influence this metric.

Solution: Unfortunately, there isn’t an easy solution to this issue. If you want to use average time on page, you just need to keep in mind how it’s calculated. You could also consider setting up more engagement events on page (like a scroll event without the “nonInteraction” parameter)—this solves issue #2 above, but potentially worsens issue #1.
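To see how the calculation behaves, here is a small sketch (mine, not Google’s code) that reproduces the three scenarios above using the Time on Page totals and the exit rule described in this section:

    # Each scenario from the table: (GA-recorded time on page in seconds, whether the pageview was an exit).
    scenarios = [
        (20, False),  # pageview -> social plugin -> click through to next page
        (10, True),   # pageview -> social plugin -> leave site
        (0, True),    # pageview -> leave site
    ]

    time_on_page = sum(seconds for seconds, _ in scenarios)    # 30s total
    pageviews = len(scenarios)                                 # 3
    exits = sum(1 for _, exited in scenarios if exited)        # 2

    average_time_on_page = time_on_page / (pageviews - exits)
    print(average_time_on_page)  # 30.0, longer than any individual visit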

Site Speed

If you’ve used the Site Speed reports in Google Analytics in the past, you’ve probably noticed that the numbers can sometimes be pretty difficult to believe. This is because the way that Site Speed is tracked is extremely vulnerable to outliers—it starts with a 1% sample of your users and then takes a simple average for each metric. This means that a few extreme values (for example, the occasional user with a malware-infested computer or a questionable wifi connection) can create a very large swing in your data.

The use of an average as a metric is not in itself bad, but in an area so prone to outliers and working with such a small sample, it can lead to questionable results.

Fortunately, you can increase the sampling rate right up to 100% (or the cap of 10,000 hits per day). Depending on the size of your site, this may still only be useful for top-level data. For example, if your site gets 1,000,000 hits per day and you’re interested in the performance of a new page that’s receiving 100 hits per day, Google Analytics will throttle your sampling back to the 10,000 hits per day cap—1%. As such, you’ll only be looking at a sample of 1 hit per day for that page.

Solution: Turn up the sampling rate. If you receive more than 10,000 hits per day, keep the sampling rate in mind when digging into less visited pages. You could also consider external tools and testing, such as Pingdom or WebPagetest.

Conversion Rate (by channel)

Obviously, conversion rate is not in itself a bad metric, but it can be rather misleading in certain reports if you don’t realise that, by default, conversions are attributed using a last non-direct click attribution model.

From Google Analytics Help:

“…if a person clicks over to your site from google.com, then returns as “direct” traffic to convert, Google Analytics will report 1 conversion for “google.com / organic” in All Traffic.”

This means that when you’re looking at conversion numbers in your acquisition reports, it’s quite possible that every single number is different to what you’d expect under last click—every channel other than direct has a total that includes some conversions that occurred during direct sessions, and direct itself has conversion numbers that don’t include some conversions that occurred during direct sessions.

Solution: This is just something to be aware of. If you do want to know your last-click numbers, there’s always the Multi-Channel Funnels and Attribution reports to help you out.

Exit Rate

Unlike some of the other metrics I’ve discussed here, the calculation behind exit rate is very intuitive—”for all pageviews to the page, Exit Rate is the percentage that were the last in the session.” The problem with exit rate is that it’s so often used as a negative metric: “Which pages had the highest exit rate? They’re the problem with our site!” Sometimes this might be true: Perhaps, for example, if those pages are in the middle of a checkout funnel.

Often, however, a user will exit a site when they’ve found what they want. This doesn’t just mean that a high exit rate is ok on informational pages like blog posts or about pages—it could also be true of product pages and other pages with a highly conversion-focused intent. Even on ecommerce sites, not every visitor has the intention of converting. They might be researching towards a later online purchase, or even planning to visit your physical store. This is particularly true if your site ranks well for long tail queries or is referenced elsewhere. In this case, an exit could be a sign that they found the information they wanted and are ready to purchase once they have the money, the need, the right device at hand or next time they’re passing by your shop.

Solution: When judging a page by its exit rate, think about the various possible user intents. It could be useful to take a segment of visitors who exited on a certain page (in the Advanced tab of the new segment menu), and investigate their journey in User Flow reports, or their landing page and acquisition data.

Discussion

If you know of any other similarly misunderstood metrics, you have any questions or you have something to add to my analysis, tweet me at @THCapper or leave a comment below.


The Long Click and the Quality of Search Success

Posted by billslawski

“On the most basic level, Google could see how satisfied users were. To paraphrase Tolstoy, happy users were all the same. The best sign of their happiness was the “Long Click” — This occurred when someone went to a search result, ideally the top one, and did not return. That meant Google has successfully fulfilled the query.”

~ Steven Levy. In the Plex: How Google Thinks, Works, and Shapes our Lives

I often explore and read patents and papers from the search engines to try to get a sense of how they may approach different issues, and learn about the assumptions they make about search, searchers, and the Web. Lately, I’ve been keeping an eye open for papers and patents from the search engines where they talk about a metric known as the “long click.”

A recently granted Google patent uses the metric of a “Long Click” as the center of a process Google may use to track results for queries that were selected by searchers for long visits in a set of search results.

This concept isn’t new. In 2011, I wrote about a Yahoo patent in How a Search Engine May Measure the Quality of Its Search Results, where they discussed a metric that they refer to as a “target page success metric.” It included “dwell time” upon a result as a sign of search success (Yes, search engines have goals, too).


Another Google patent described assigning web pages “reachability scores” based upon the quality of pages linked to from those initially visited pages. In the post Does Google Use Reachability Scores in Ranking Resources?, I described a Google patent that might treat a long click as a sign that visitors to a page are engaged by the content its links point to, including links to videos. Google tells us in that patent that it might consider a “long click” to have been made on a video if someone watches at least half the video or 30 seconds of it. The patent suggests that a high reachability score on a page may mean that page could be boosted in Google search results.


But the patent I’m writing about today is focused primarily upon looking at and tracking a search success metric like a long click or long dwell time. Here’s the abstract:

Modifying ranking data based on document changes

Invented by Henele I. Adams, and Hyung-Jin Kim

Assigned to Google

US Patent 9,002,867

Granted April 7, 2015

Filed: December 30, 2010

Abstract

Methods, systems, and apparatus, including computer programs encoded on computer storage media for determining a weighted overall quality of result statistic for a document.

One method includes receiving quality of result data for a query and a plurality of versions of a document, determining a weighted overall quality of result statistic for the document with respect to the query including weighting each version specific quality of result statistic and combining the weighted version-specific quality of result statistics, wherein each quality of result statistic is weighted by a weight determined from at least a difference between content of a reference version of the document and content of the version of the document corresponding to the version specific quality of result statistic, and storing the weighted overall quality of result statistic and data associating the query and the document with the weighted overall quality of result statistic.

This patent tells us that search results may be ranked in an order, according to scores assigned to the search results by a scoring function or process that would be based upon things such as:

  • Where, and how often, query terms appear in the given document,
  • How common the query terms are in the documents indexed by the search engine, or
  • A query-independent measure of quality of the document itself.

Last September, I wrote about how Google might identify a category associated with a query term based upon clicks, in the post Using Query User Data To Classify Queries. In a query for “Lincoln,” the results that appear in response might be about the former US President, the town of Lincoln, Nebraska, and the model of automobile. When someone searches for [Lincoln], Google returning all three of those responses as a top result could be said to be reasonable. The patent I wrote about in that post told us that Google might collect information about “Lincoln” as a search entity, and track which category of results people clicked upon most when they performed that search, to determine what categories of pages to show other searchers. Again, that’s another “search success” based upon a past search history.

There likely is some value in working to find ways to increase the amount of dwell time someone spends upon the pages of your site, if you are already having some success in crafting page titles and snippets that persuade people to click on your pages when those appear in search results. These approaches can include such things as:

  1. Making visiting your page a positive experience in terms of things like site speed, readability, and scannability.
  2. Making visiting your page a positive experience in terms of things like the quality of the content published on your pages including spelling, grammar, writing style, interest, quality of images, and the links you share to other resources.
  3. Providing a positive experience by offering ideas worth sharing with others, and offering opportunities for commenting and interacting with others, and by being responsive to people who do leave comments.

Here are some resources I found that discuss this long click metric in terms of “dwell time”:

Your ability to create pages that can end up in a “long click” from someone who has come to your site in response to a query is also a “search success” metric on the search engine’s part, and you both succeed. Just be warned that, as the most recent patent from Google on long clicks shows us, Google will be watching to make sure that the content of your page doesn’t change too much, that people are continuing to click upon it in search results, and that they spend a fair amount of time upon it.

(Images for this post are from my Go Fish Digital Design Lead Devin Holmes @DevinGoFish. Thank you, Devin!)


What Deep Learning and Machine Learning Mean For the Future of SEO – Whiteboard Friday

Posted by randfish

Imagine a world where even the high-up Google engineers don’t know what’s in the ranking algorithm. We may be moving in that direction. In today’s Whiteboard Friday, Rand explores and explains the concepts of deep learning and machine learning, drawing us a picture of how they could impact our work as SEOs.

For reference, here’s a still of this week’s whiteboard!

Video transcription

Howdy, Moz fans, and welcome to another edition of Whiteboard Friday. This week we are going to take a peek into Google’s future and look at what it could mean as Google advances their machine learning and deep learning capabilities. I know these sound like big, fancy, important words. They’re not actually that tough of topics to understand. In fact, they’re simplistic enough that even a lot of technology firms like Moz do some level of machine learning. We don’t do anything with deep learning and a lot of neural networks. We might be going that direction.

But I found an article that was published in January, absolutely fascinating and I think really worth reading, and I wanted to extract some of the contents here for Whiteboard Friday because I do think this is tactically and strategically important to understand for SEOs and really important for us to understand so that we can explain to our bosses, our teams, our clients how SEO works and will work in the future.

The article is called “Google Search Will Be Your Next Brain.” It’s by Steve Levy. It’s over on Medium. I do encourage you to read it. It’s a relatively lengthy read, but just a fascinating one if you’re interested in search. It starts with a profile of Geoff Hinton, who was a professor in Canada and worked on neural networks for a long time and then came over to Google and is now a distinguished engineer there. As the article says, a quote from the article: “He is versed in the black art of organizing several layers of artificial neurons so that the entire system, the system of neurons, could be trained or even train itself to divine coherence from random inputs.”

This sounds complex, but basically what we’re saying is we’re trying to get machines to come up with outcomes on their own rather than us having to tell them all the inputs to consider and how to process those inputs and the outcome to spit out. So this is essentially machine learning. Google has used this, for example, to figure out when you give it a bunch of photos and it can say, “Oh, this is a landscape photo. Oh, this is an outdoor photo. Oh, this is a photo of a person.” Have you ever had that creepy experience where you upload a photo to Facebook or to Google+ and they say, “Is this your friend so and so?” And you’re like, “God, that’s a terrible shot of my friend. You can barely see most of his face, and he’s wearing glasses which he usually never wears. How in the world could Google+ or Facebook figure out that this is this person?”

That’s what they use, these neural networks, these deep machine learning processes for. So I’ll give you a simple example. Here at Moz, we do machine learning very simplistically for page authority and domain authority. We take all the inputs — numbers of links, number of linking root domains, every single metric that you could get from Moz on the page level, on the sub-domain level, on the root-domain level, all these metrics — and then we combine them together and we say, “Hey machine, we want you to build us the algorithm that best correlates with how Google ranks pages, and here’s a bunch of pages that Google has ranked.” I think we use a base set of 10,000, and we do it about quarterly or every 6 months, feed that back into the system and the system pumps out the little algorithm that says, “Here you go. This will give you the best correlating metric with how Google ranks pages.” That’s how you get page authority and domain authority.
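As a rough illustration of that kind of supervised correlation exercise, here is a sketch of my own; it is not Moz’s actual PA/DA pipeline, the metrics are synthetic stand-ins, and the choice of a gradient-boosted model is an assumption:

    import numpy as np
    from scipy.stats import spearmanr
    from sklearn.ensemble import GradientBoostingRegressor

    rng = np.random.default_rng(0)

    # Synthetic stand-ins: one row per URL with metrics such as links and linking root domains,
    # plus an invented "observed ranking" signal to train against.
    link_metrics = rng.random((10000, 5))
    observed_ranking = link_metrics @ np.array([3.0, 2.0, 1.0, 0.5, 0.1]) + rng.normal(0, 0.3, 10000)

    # Ask the machine for the combination of metrics that best predicts the observed ranking.
    model = GradientBoostingRegressor().fit(link_metrics, observed_ranking)
    learned_metric = model.predict(link_metrics)

    # "The best correlating metric with how Google ranks pages": check with a rank correlation.
    rho, _ = spearmanr(learned_metric, observed_ranking)
    print(round(rho, 3))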

Cool, really useful, helpful for us to say like, “Okay, this page is probably considered a little more important than this page by Google, and this one a lot more important.” Very cool. But it’s not a particularly advanced system. The more advanced system is to have these kinds of neural nets in layers. So you have a set of networks, and these neural networks, by the way, they’re designed to replicate nodes in the human brain, which is in my opinion a little creepy, but don’t worry. The article does talk about how there’s a board of scientists who make sure Terminator 2 doesn’t happen, or Terminator 1 for that matter. Apparently, no one’s stopping Terminator 4 from happening? That’s the new one that’s coming out.

So one layer of the neural net will identify features. Another layer of the neural net might classify the types of features that are coming in. Imagine this for search results. Search results are coming in, and Google’s looking at the features of all the websites and web pages, your websites and pages, to try and consider like, “What are the elements I could pull out from there?”

Well, there’s the link data about it, and there are things that happen on the page. There are user interactions and all sorts of stuff. Then we’re going to classify types of pages, types of searches, and then we’re going to extract the features or metrics that predict the desired result, that a user gets a search result they really like. We have an algorithm that can consistently produce those, and then neural networks are hopefully designed — that’s what Geoff Hinton has been working on — to train themselves to get better. So it’s not like with PA and DA, our data scientist Matt Peters and his team looking at it and going, “I bet we could make this better by doing this.”

This is standing back and the guys at Google just going, “All right machine, you learn.” They figure it out. It’s kind of creepy, right?

In the original system, you needed those people, these individuals here to feed the inputs, to say like, “This is what you can consider, system, and the features that we want you to extract from it.”

Then unsupervised learning, which is kind of this next step, the system figures it out. So this takes us to some interesting places. Imagine the Google algorithm, circa 2005. You had basically a bunch of things in here. Maybe you’d have anchor text, PageRank and you’d have some measure of authority on a domain level. Maybe there are people who are tossing new stuff in there like, “Hey algorithm, let’s consider the location of the searcher. Hey algorithm, let’s consider some user and usage data.” They’re tossing new things into the bucket that the algorithm might consider, and then they’re measuring it, seeing if it improves.

But you get to the algorithm today, and gosh there are going to be a lot of things in there that are driven by machine learning, if not deep learning yet. So there are derivatives of all of these metrics. There are conglomerations of them. There are extracted pieces like, “Hey, we only want to look and measure anchor text on these types of results when we also see that the anchor text matches up to the search queries that have previously been performed by people who also search for this.” What does that even mean? But that’s what the algorithm is designed to do. The machine learning system figures out things that humans would never extract, metrics that we would never even create from the inputs that they can see.

Then, over time, the idea is that in the future even the inputs aren’t given by human beings. The machine is getting to figure this stuff out itself. That’s weird. That means that if you were to ask a Google engineer in a world where deep learning controls the ranking algorithm, if you were to ask the people who designed the ranking system, “Hey, does it matter if I get more links,” they might be like, “Well, maybe.” But they don’t know, because they don’t know what’s in this algorithm. Only the machine knows, and the machine can’t even really explain it. You could go take a snapshot and look at it, but (a) it’s constantly evolving, and (b) a lot of these metrics are going to be weird conglomerations and derivatives of a bunch of metrics mashed together and torn apart and considered only when certain criteria are fulfilled. Yikes.

So what does that mean for SEOs? Like what do we have to care about from all of these systems and this evolution and this move towards deep learning, which by the way that’s what Jeff Dean, who is, I think, a senior fellow over at Google, he’s the dude that everyone mocks for being the world’s smartest computer scientist over there, and Jeff Dean has basically said, “Hey, we want to put this into search. It’s not there yet, but we want to take these models, these things that Hinton has built, and we want to put them into search.” That for SEOs in the future is going to mean much less distinct universal ranking inputs, ranking factors. We won’t really have ranking factors in the way that we know them today. It won’t be like, “Well, they have more anchor text and so they rank higher.” That might be something we’d still look at and we’d say, “Hey, they have this anchor text. Maybe that’s correlated with what the machine is finding, the system is finding to be useful, and that’s still something I want to care about to a certain extent.”

But we’re going to have to consider those things a lot more seriously. We’re going to have to take another look at them and decide and determine whether the things that we thought were ranking factors still are when the neural network system takes over. It also is going to mean something that I think many, many SEOs have been predicting for a long time and have been working towards, which is more success for websites that satisfy searchers. If the output is successful searches, and that’s what the system is looking for, and that’s what it’s trying to correlate all its metrics to, if you produce something that means more successful searches for Google searchers when they get to your site, and you ranking in the top means Google searchers are happier, well you know what? The algorithm will catch up to you. That’s kind of a nice thing. It does mean a lot less info from Google about how they rank results.

So today you might hear from someone at Google, “Well, page speed is a very small ranking factor.” In the future they might be, “Well, page speed is like all ranking factors, totally unknown to us.” Because the machine might say, “Well yeah, page speed as a distinct metric, one that a Google engineer could actually look at, looks very small.” But derivatives of things that are connected to page speed may be huge inputs. Maybe page speed is something, that across all of these, is very well connected with happier searchers and successful search results. Weird things that we never thought of before might be connected with them as the machine learning system tries to build all those correlations, and that means potentially many more inputs into the ranking algorithm, things that we would never consider today, things we might consider wholly illogical, like, “What servers do you run on?” Well, that seems ridiculous. Why would Google ever grade you on that?

If human beings are putting factors into the algorithm, they never would. But the neural network doesn’t care. It doesn’t care. It’s a honey badger. It doesn’t care what inputs it collects. It only cares about successful searches, and so if it turns out that Ubuntu is poorly correlated with successful search results, too bad.

This world is not here yet today, but certainly there are elements of it. Google has talked about how Panda and Penguin are based off of machine learning systems like this. I think, given what Geoff Hinton and Jeff Dean are working on at Google, it sounds like this will be making its way more seriously into search and therefore it’s something that we’re really going to have to consider as search marketers.

All right everyone, I hope you’ll join me again next week for another edition of Whiteboard Friday. Take care.

Video transcription by Speechpad.com
