by Ben Lorica (last updated Aug/2011)

The material below was inspired by a Foo Camp discussion I had with Google's webspam lead Matt Cutts, about the need to organize and summarize the key ideas contained in the materials Google releases. Matt and his team have published hundreds of videos over the last two years, and this document is an attempt to organize and capture the suggestions and factoids in one place. I wanted to create a reference that is both a checklist (of concrete steps to take) and a myth-busting tool. If you're new to search marketing and SEO, this document should get you up to speed on the many dimensions of this fast-moving topic. It also clears up a lot of SEO myths, and thus should be useful to experienced practitioners as well.
My goal is to try to keep this document updated, so if you have any suggestions or comments, feel free to contact me. If you're interested in learning more about the subjects discussed below, follow the hyperlinks to the original sources.
Contents: Site and Information Architecture | Search Results and Ranking | Site Content | Search Engine Crawlers | Spam | Keywords & Landing Pages | Miscellaneous topics: SEO Myths, Site Code, and Site Performance
DOMAIN NAMES: A brandable domain name (think "Twitter" or "Groupon") can set your company apart from competitors. One major consideration is the importance Google and other search engines place on keywords in domain names. Based on user feedback, Google is considering adjusting its ranking algorithms to lessen the importance of keywords in domains. Before buying a domain, check its history on www.archive.org: you probably won't want to inherit a domain previously used by spammers or pornographers! If you search for a domain by typing in the exact URL and it doesn't show up on Google, chances are Google has flagged it as an untrustworthy domain. This is especially true if you know there are actual links pointing to that domain (in other words, it's a site that should have a nontrivial PageRank, yet fails to show up in search results).
If your web host offers you add-on domains (e.g. my-site.com in addition to mysite.com), redirect them to your main site rather than maintaining them as related but separate sites. Finally, using nonstandard ports (www.example.com:90) won't affect Google's crawlers, but users will have a difficult time remembering your URL.
From an information architecture perspective, you want to make your site easier for Googlebot to crawl. If you have thousands of pages, just being in the Sitemap.xml file doesn't guarantee that a page will be indexed. To get more pages crawled, try to get more links, particularly from reputable sites. The Sitemap.xml file can have up to 50,000 entries. If you need more than that you can use a site index and accompanying site maps.
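Since a single sitemap file tops out at 50,000 entries, larger sites need a sitemap index pointing at multiple sitemap files. Here is a minimal sketch of that split using only Python's standard library; the example.com base URL and the sitemap-N.xml filenames are placeholders, not a required naming scheme.

```python
import xml.etree.ElementTree as ET

SITEMAP_LIMIT = 50000  # maximum number of entries per sitemap file

def build_sitemap_index(urls, base="https://example.com"):
    """Split a long URL list into chunks of at most 50,000 entries and
    return (index_xml, [chunk_xml, ...]) as strings."""
    ns = "http://www.sitemaps.org/schemas/sitemap/0.9"
    chunks = [urls[i:i + SITEMAP_LIMIT] for i in range(0, len(urls), SITEMAP_LIMIT)]

    index = ET.Element("sitemapindex", xmlns=ns)
    chunk_docs = []
    for n, chunk in enumerate(chunks, 1):
        entry = ET.SubElement(index, "sitemap")
        ET.SubElement(entry, "loc").text = f"{base}/sitemap-{n}.xml"
        urlset = ET.Element("urlset", xmlns=ns)
        for u in chunk:
            ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = u
        chunk_docs.append(ET.tostring(urlset, encoding="unicode"))
    return ET.tostring(index, encoding="unicode"), chunk_docs
```

For 120,000 URLs this yields an index referencing three sitemap files, each within the limit.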
As the amount of content on your site grows, try navigation and content-management techniques such as displaying a list of "related" content, or buttons that let users vote content up or down. Consider using web analytics to highlight the most popular content on your root page. Users can also navigate your site through tags and categories; between those two options, it's better to spend resources improving your taxonomy. Google is pretty good at identifying keywords, so carefully assigning tags isn't likely to yield much SEO benefit. HTML site maps are good for users and are a great way to distribute PageRank across your site.
Be careful when measuring the number of pages on a site using Google's site: tool (example: # of pages in a given blog). The resulting page counts only provide an estimate, up to three significant digits, of the number of pages.
Linking strategies: Google interprets nofollow at the link level, not at the page level. So if you have multiple links between two pages, one nofollow link doesn't mean the rest are interpreted as nofollow. More importantly, internal links (links between pages on your site) shouldn't use nofollow. With regard to links from your site to other sites, Google does not penalize sites that only use nofollow links. Bear in mind that, according to an analysis conducted by Google, only a small percentage of all links on the web are nofollow. And by relying on nofollow links, you might be cutting your site off from conversations on the web. Lastly, fix or remove links that result in 404 errors.
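The link-level (not page-level) interpretation of nofollow is easy to see with a small audit script. This is a sketch using Python's built-in HTML parser; the page snippet and URLs are made up for illustration.

```python
from html.parser import HTMLParser

class LinkAudit(HTMLParser):
    """Collect (href, is_nofollow) pairs from a page. rel="nofollow"
    applies only to the individual link that carries it, not to other
    links on the page -- even links pointing at the same URL."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            d = dict(attrs)
            rels = (d.get("rel") or "").lower().split()
            self.links.append((d.get("href"), "nofollow" in rels))

audit = LinkAudit()
audit.feed('<a href="/about">About</a> '
           '<a href="/about" rel="nofollow">About (ad copy)</a>')
# audit.links -> [('/about', False), ('/about', True)]
```

The first link still flows PageRank even though a second link to the same URL is nofollow.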
URL structure: The number of subdirectories in a URL does not affect its Google PageRank. Moreover, it doesn't matter much to Google's rankings whether keywords are in the file name or file path. One step worth taking is using your URLs to help Google detect duplicate content: through Webmaster Tools, you can help Googlebot identify duplicates by highlighting which URL parameters it can ignore. Another important usability consideration: make sure your pages (URLs) can be bookmarked. Social bookmarking services can be a major source of traffic for your site. Lastly, will switching to HTTPS affect search ranking results? Probably not, but as of May/2011 it is still uncommon for sites to do this, so it's prudent to experiment before making a wholesale transition.
The above considerations need to accommodate the fact that marketing and web analytics require tracking "codes" embedded within URLs. When using Google AdWords, the Destination URL Autotagging option will let you distinguish between visits from organic and paid search listings. Many web analytics tools, including Google Analytics, use link tagging to add extra marketing-related information to URLs. [Related issue: for duplicate content on multiple URLs, also see the sections on Site Content: Content Duplication and Search Engine Crawlers.]
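Link tagging is just appending query parameters to a landing-page URL. A minimal sketch with Python's standard library follows; the utm_* names mirror the Google Analytics link-tagging convention, and the example URL and values are hypothetical.

```python
from urllib.parse import urlparse, urlunparse, parse_qsl, urlencode

def tag_url(url, **params):
    """Append campaign-tracking parameters to a URL, preserving any
    query string it already carries."""
    parts = urlparse(url)
    # Keep existing parameters first, then add the new ones (sorted
    # so the output is deterministic).
    query = parse_qsl(parts.query) + sorted(params.items())
    return urlunparse(parts._replace(query=urlencode(query)))

tagged = tag_url("https://example.com/offer?ref=1",
                 utm_source="newsletter", utm_medium="email")
# tagged -> 'https://example.com/offer?ref=1&utm_medium=email&utm_source=newsletter'
```

Building URLs this way (instead of string concatenation) keeps existing parameters intact and handles escaping correctly.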
Mobile and international content: Google has a crawler specifically for mobile search -- if you detect it, make sure you display content for your mobile web site. Note that Google recommends that you maintain a separate URL for your mobile site (see here & here). Similarly, place localized content on country-specific domain names: country-code top-level domains make it easier for search engines to associate your content with a specific country.
User experience and customer satisfaction: As I note in the section on Site Content, Google's Panda/Farmer update means that site architects and designers need to be aware that customer satisfaction and overall user experience have direct SEO (i.e., search rankings) consequences.
UX & site navigation: Navigation is an important aspect of user-experience design. Ideally, each page on your site can be reached within a few clicks from the homepage/sitemap. For retailers this means a logical flow of links from your home page, to categories, to detailed product pages. Tools like breadcrumb navigation provide additional context and enhance user experience.

PAGERANK: Your site's reputation (PageRank) depends on the sites that link to you: both the number of links and the reputation of the linking sites. (Yes, it's recursive.) PageRank is publicly refreshed a few times each year. (Internally, PageRank is a "float" that is constantly recomputed by Google.) Strictly speaking it is independent of the content on your site, but in practice other sites link to you because of the (compelling) content on your site. Since content doesn't factor into the computation of PageRank, factors such as the browser compatibility of a site and the types of ads displayed on a site aren't taken into account (at least directly). Of course, sites that serve lots of annoying ads probably won't receive many links from reputable sites.
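The recursive definition of PageRank can be made concrete with a toy power iteration. This is a teaching sketch only -- Google's production system differs in scale and detail -- and the handling of dangling pages (spreading their rank evenly) is one common textbook simplification.

```python
def pagerank(links, damping=0.85, iters=50):
    """Toy power-iteration PageRank. `links` maps each page to the
    list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        # Every page gets a small baseline share (the "random surfer"
        # teleport), plus damped contributions from pages linking to it.
        new = {p: (1 - damping) / n for p in pages}
        for p, outs in links.items():
            targets = outs or pages          # dangling page: spread evenly
            share = rank[p] / len(targets)
            for t in targets:
                new[t] += damping * share
        rank = new
    return rank

# A page linked to by both others ends up with the highest rank:
r = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
```

Note the recursion at work: "c" ranks highest because two pages link to it, and "a" beats "b" because its single inbound link comes from the high-ranked "c".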
Optimized sites (sites that use SEO techniques) don't always outrank non-optimized sites: Google looks at page content along with off-domain links. Also note that tools for determining & comparing links are usually not exhaustive. So there may well be links from high PageRank sites to a particular site that these tools aren't accounting for.
Both redirects and custom URL shorteners flow PageRank & anchor text: links that point to shortened URLs and redirects count towards the PageRank of their destination pages.
RANKING: PageRank is one of more than 200 factors used by Google in ranking search results. The exact list of signals, as well as how they're combined into a composite score, is a closely guarded secret. Besides PageRank, other ranking factors include anchor text, content on the page, proximity of different words on the page, the URL & title of the page, and content found in the header (for a more detailed discussion, see here). Since Google cares about the overall experience of its users, Google recently announced that site speed is a minor factor in determining search rankings (see here, here, & here). The important thing to remember is that, as far as Google is concerned, site reputation (which depends a lot on a site's content) matters a lot more than site speed.
Non-factors: Use of other Google products (Google Analytics, AdSense), won't affect your search rankings. The age of a domain isn't an important ranking factor. HTML validation isn't that important for ranking or SEO. Modern browsers do a good job of rendering pages even when there are mistakes in the underlying HTML code.
RESULTS: Google algorithmically determines the text (and age) of the snippets it shows to users. Nevertheless, site owners sometimes ask Google to change snippets that appear in search results. Google does look at the description meta tag, and will often use it as part of a search result snippet for the site. To deter search engine spammers, Google's algorithms compare the description meta tag with what's on the page, to ensure that it is representative of the actual content.
Another potential source for snippets are the user navigation tools on your site. A popular navigation tool for traversing (product) categories is breadcrumb navigation. Note that it takes time before Googlebot can gather and incorporate breadcrumb navigation data into search snippets. As of mid-2010, Google engineers were hoping to update breadcrumb navigation data every few weeks.
REAL-TIME: Caffeine was Google's response to the growing amount of real-time content on the web. With Caffeine, Google can index "new" content and make it available in its search results within seconds.

INBOUND LINKS: Google measures your site's reputation by counting and analyzing the links that point to your site. A surefire way of generating inbound links is publishing interesting content. Your goal should be to organically grow your number of inbound links, without resorting to paid or even deceptive schemes. For content publishers, here are some ideas from Matt Cutts.
Controversy: It's a common enough method for generating links that it even has a name: link baiting. While controversy can be effective, use it sparingly.
Participate in a discussion by answering questions: If you have a point of view on a trending topic, don't hesitate to join the discussion. Use blogs & social media sites to weigh in.
Conduct original (data rich) research: Speaking from personal experience, it doesn't take too much original research (factoids or data) for a post to attract links.
Lists: Readers don't seem to get tired of Top 10 lists, tips, and best practices. While you may not get a lot of (nofollow) links, you still might get traffic from social media sites.
Newsletters, conference presentations: If you develop expertise in a topic, don't hesitate to share your findings & materials. The easiest step to take is to publish slides of presentations that you've already given.
Videos and slidecasts: Related to the previous item, but requires more effort to produce.
HOW-TOS and Tutorials: The key is truth in advertising: if the title of the page claims that readers will learn something specific, make sure you deliver.
Release products & services: plug-ins, free apps, etc
CONTENT DUPLICATION: The rel=canonical tag is the primary tool for flagging duplicate content. As of June/2011, Google web search supports rel=canonical. If you have duplicate content on multiple URLs on your site, consolidate PageRank by using 301 redirects or rel=canonical. An important thing to remember is that 301 redirects sometimes take a while to take effect.
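A 301 redirect is just a response with status "301 Moved Permanently" and a Location header pointing at the canonical URL. Here is a minimal sketch as a WSGI app; the www.example.com target is a placeholder, and real deployments usually configure this at the web-server level instead.

```python
def redirect_app(environ, start_response):
    """Minimal WSGI app for a retired domain: every request is
    301-redirected to the same path on the main site, consolidating
    inbound PageRank on a single set of URLs."""
    location = "https://www.example.com" + environ.get("PATH_INFO", "/")
    start_response("301 Moved Permanently", [("Location", location)])
    return [b""]

# Exercise it directly with a fake WSGI environ (no server needed):
captured = {}
def fake_start_response(status, headers):
    captured["status"], captured["headers"] = status, dict(headers)

redirect_app({"PATH_INFO": "/products/42"}, fake_start_response)
# captured["status"] -> '301 Moved Permanently'
```

Because the path is preserved, deep links on the old domain land on their counterparts on the new one rather than all funneling to the home page.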
Let search engines know that your site is the original source of a particular piece of content: tweet as soon as you post the content, file a DMCA request, use PubSubHubbub or related tools.
Google's Panda/Farmer update: In late June/2011, Google rolled out machine-learning algorithms to identify "scrape and republish" sites. Webmasters have long complained that in many instances "scrape and republish" sites used original content from their sites to outrank them in search results. There are currently no manual reviews of the results generated by these algorithms, so false positives (wrongly identifying pages as products of "scraping") can have serious consequences. The inputs used by these new machine-learning classifiers are rumored to include metrics from consumer user-experience studies Google conducts. If true, this means SEO and search marketers now have to place more emphasis on user experience and customer satisfaction. As of mid Aug/2011, Google was on the verge of deploying Panda to non-English search queries.
Within a domain: As mentioned in the section on Site and Information Architecture, the rel=canonical tag is the primary tool for flagging duplicate content on a domain.
Content duplication across domains: Rather than using a cross-domain rel="canonical" tag, consider using a 301 redirect (especially when you're permanently migrating to another domain). You lose a little bit of "link juice" or PageRank, but not enough to worry much about it. There is no limit to the number of 301 redirects from your old to your new site.
Placing an excerpt of recent posts or other content on your home page, isn't considered a "bad or spammy" form of content duplication: While Google is good at detecting that your home page changes a lot and content on it may appear elsewhere on your site, try to limit yourself to excerpts and avoid copying entire posts/articles onto your home page.
What if you mistakenly set the rel="canonical" attribute of a page to itself? This actually occurs quite frequently in practice. Fortunately the programmers of Googlebot took this possibility into account, and it won't cause a problem with Google.
Provide information about images on your site: either directly (alt attribute, page title, descriptive text) or indirectly (comments or tags from users), to help Google interpret your image files. For video, transcripts that you submit can be turned automatically into captions by YouTube.
Provide recommendations: Keep users on your site using short lists of related products or articles.
Don't go overboard with Tag Clouds: Tag Clouds with thousands of links run the risk of appearing as keyword stuffing.
Use special characters (ligatures, soft hyphens, interpuncts, hyphenation points) carefully: there is wide variation in how well search engines are able to handle them.
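One defensive option is to normalize such characters in the text you publish (or in the copy you analyze). A sketch using Python's unicodedata module: NFKC normalization folds compatibility characters such as the "fi" ligature, and soft hyphens are stripped explicitly since NFKC leaves them alone.

```python
import unicodedata

def normalize_for_search(text):
    """Collapse typographic variants that search engines handle
    unevenly: soft hyphens (U+00AD) are removed, then NFKC folds
    compatibility characters such as ligatures."""
    return unicodedata.normalize("NFKC", text.replace("\u00ad", ""))

clean = normalize_for_search("\ufb01nan\u00adcial")   # "ﬁnan­cial"
# clean -> 'financial'
```

Legitimate non-ASCII text (accented characters, non-Latin scripts) passes through unchanged; only compatibility variants are folded.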
If you've gone to the trouble of creating unique descriptions & other meta data for your product pages, skip publishing the generic text from manufacturers, found on other sites.
If you are syndicating your content, use the rel="canonical" tag.
When possible, try not to mix multiple languages on a single page. Google's algorithms try to infer the primary language used on a page, but in rare cases when it guesses incorrectly, the page may not rank as high for your target audience. If you can, produce single-language versions of the same page.
Recent Microformats: Google announced support for an authorship markup (rel="author", provides a way to connect authors with their content), around the same time that Google/Bing/Yahoo launched a broader set of microformats through Schema.org. Major search engines seem to be moving in the direction of supporting microformats, so it is a good idea to slowly roll them out on your content.
SPAM PENALTIES: The duration of a penalty for spam depends on how Google flagged the problem. If the problem sites/pages were detected algorithmically, then the penalty is lifted once Googlebot revisits the site and finds the problems have been fixed. If the problem sites/pages were manually reported, then Google suspends the site for a fixed duration (e.g. 30 days). In either case, you can file a reconsideration request with Google to determine the status of your site. Here are some things to consider:
Avoid cloaking ("presenting different content or URLs to users and search engines"): Google crawlers are from U.S. IP addresses and should see pages visible to regular U.S. Internet users. Cloaking can result in your site being removed from Google's index. But cloaking is more than just the content you display. If you create "speedier" pages specifically for Googlebot, you would be guilty of cloaking and run the risk of getting penalized by Google.
Preventing pages from being crawled: The "noindex" attribute should contain no spaces. Google doesn't have tools that let site owners highlight sections of a page that they don't want indexed, but Yahoo! does.
Don't use robots.txt to manage and direct Googlebot (e.g. crawl this section of my site this week, and crawl the rest next week): Rather than changing robots.txt, simply link to the pages you want crawled from your root page.
Google and other search engine crawlers eventually detect links that point to your site; you don't need to notify them about the existence of inbound links.
BUT, if you have new pages that you want crawled immediately, consider notifying search engines: There are cases when you might NOT want to wait until search engines detect key pages on your site. If that's the case, investigate your options with each search engine. Recently Google introduced the Fetch as Googlebot feature in Webmaster Tools for precisely this purpose.
Forms: Google does attempt to crawl simple HTML forms. So if you have pages that are accessible only after the successful completion of forms, Google might still be able to reach them.
The If-Modified-Since HTTP header should be kept in sync with content changes: if you can't maintain it correctly, omit it, so Googlebot doesn't assume that content on your pages hasn't changed.
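The conditional-GET logic behind this is simple: return 304 Not Modified only when the page truly hasn't changed since the timestamp the crawler sent. A sketch with Python's standard library (the dates are made up for illustration):

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

def conditional_status(last_modified, if_modified_since):
    """Return 304 when the resource hasn't changed since the
    If-Modified-Since timestamp the client sent, else 200."""
    if not if_modified_since:
        return 200
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        # Unparseable header: serve the full response to be safe.
        return 200
    return 304 if last_modified <= since else 200

header = format_datetime(datetime(2011, 7, 1, tzinfo=timezone.utc), usegmt=True)
unchanged = conditional_status(datetime(2011, 6, 1, tzinfo=timezone.utc), header)  # 304
changed = conditional_status(datetime(2011, 8, 1, tzinfo=timezone.utc), header)    # 200
```

Answering 304 correctly saves bandwidth; answering it incorrectly (a stale Last-Modified) is exactly the failure mode the tip above warns about.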
Symbols: Google's parsers break at the end of a word, so it treats "brandname" and "brandname®" in the same way. (also see Best Practices in the Site Content section)
Googlebot tends not to index or look at RSS/Atom feeds.
Besides http and https, Google can crawl ftp (but stick with http & https).
Pay attention to how search engines handle query parameters: In late July/2011, Google announced improvements in how Googlebot handles query parameters. By taking advantage of Google's tips, your site "... can be crawled more effectively, reducing your bandwidth usage and likely allowing more unique content from your site to be indexed."
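The idea of ignorable query parameters can be sketched as a canonicalization function: strip parameters that don't change page content so many URL variants collapse to one form. The parameter names below are illustrative; in practice you declare the ignorable ones through Webmaster Tools rather than in code.

```python
from urllib.parse import urlparse, urlunparse, parse_qsl, urlencode

# Hypothetical set of parameters that don't affect page content:
IGNORABLE = {"sessionid", "sort", "utm_source", "utm_medium", "utm_campaign"}

def canonical_url(url, ignorable=IGNORABLE):
    """Drop content-neutral query parameters, collapsing URL variants
    to a single canonical form."""
    parts = urlparse(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k not in ignorable]
    return urlunparse(parts._replace(query=urlencode(kept)))

canonical_url("https://example.com/item?id=7&sessionid=abc&sort=price")
# -> 'https://example.com/item?id=7'
```

Without this collapsing, a crawler wastes its budget fetching many URLs that all render the same page.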
KEYWORDS & LANDING PAGES: A recent meta-analysis by researchers from Google found that 89% of incremental traffic (site visits that would not have occurred without search ad campaigns) came from search ads. Not surprisingly, major search engines have for years derived most of their revenue from keyword ad programs, and many companies have emerged to provide keyword research tools. Below are some useful tools, factoids, and tips on how to identify & manage keywords.
If your site was hacked (and malicious content was inserted), run a site: search (e.g. site:google.com) after you clean up your pages. If you aren't seeing results, file a reconsideration request. Also log into Google Webmaster Tools to check if you have messages.
Google's webspam team uses both algorithms and manual reviews.
Guilt by association: The good news for many small site owners: if your shared hosting environment contains the normal mix of spammy and legitimate sites, the rankings of the legitimate sites won't be affected by their spammy peers.
Having dofollow comments on your blog can affect the reputation (PageRank) of your site: But commenting on a dofollow blog won't affect your site's reputation.
When considering the use of "hidden content" (content accessible to users only by selecting elements in a user interface), stick to conventional web interface designs, and limit hidden content to a reasonable amount of text. These steps decrease the likelihood that Googlebot will mistakenly believe you are hiding text for optimization purposes.
It's fine to sell links as long as you use the nofollow attribute: This applies to links from all the different types of content (text, banner ads, images, etc ) on your site.
Google has algorithms for detecting crypto 404 pages: pages that look like 404 pages to a user, but return a 200 response code.
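A crude sketch of what such a detector might check -- Google's actual classifier is not public, and the tell-tale phrases below are purely illustrative:

```python
import re

# Hypothetical phrases that suggest an error page:
NOT_FOUND_HINTS = re.compile(r"page not found|no longer exists|error 404", re.I)

def is_crypto_404(status_code, body):
    """Flag a "crypto 404": the body reads like an error page while the
    server reports success. Genuine error pages should return 404/410."""
    return status_code == 200 and bool(NOT_FOUND_HINTS.search(body))

is_crypto_404(200, "<h1>Page not found</h1>")   # True: looks broken, claims success
is_crypto_404(404, "<h1>Page not found</h1>")   # False: correct status code
```

The fix on the site-owner side is simply to return a real 404 (or 410) status for missing pages.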
Here are some attributes of great keyword initiatives: (1) Flexibility: The effectiveness of keywords and their accompanying ads is subject to change. What worked last year, last month, or even yesterday may prove ineffective in the days and weeks ahead. The needs and interests of search engine users change, but equally important, competitors may drive up the cost of certain keywords, diminishing what was formerly a decent ROI. (2) Experimentation: It's important to adopt a test-and-learn mindset. Google AdWords has canned reports to help you measure the effectiveness of keyword campaigns. Evidence-based search marketing teams are more likely to maintain a diverse portfolio of keywords at any given time. (3) Outcome-based: Metrics (page views, conversions, transactions) are clearly defined and tracked.
KEYWORD DISCOVERY: Common techniques:
Intuition: Keywords are based on what you think users are interested in.
Data mining internal logs: By looking at your own search/referral/web logs, you probably will come across words and phrases that you are unaware of, but are what users are typing to discover your site. Welcome to the long tail!
Simple NLP (natural language processing) and IR (information retrieval): A simple example is to take a bunch of documents (product pages, blog posts, news articles, content from competitors) that you think are pertinent to the campaign you are embarking on. You can then associate keywords with individual documents using popular information-retrieval metrics, like TF-IDF (term frequency, inverse document frequency) weights.
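TF-IDF fits in a few lines of standard-library Python. This sketch uses naive whitespace tokenization and a toy three-document corpus; real pipelines would add stemming, stop words, and smoothing.

```python
import math
from collections import Counter

def tfidf(docs):
    """Score each term in each document by term frequency times inverse
    document frequency; within a document, high-scoring terms are
    keyword candidates."""
    tokenized = [d.lower().split() for d in docs]
    # Document frequency: in how many documents does each term appear?
    df = Counter(t for doc in tokenized for t in set(doc))
    n = len(docs)
    return [{t: (tf / len(doc)) * math.log(n / df[t])
             for t, tf in Counter(doc).items()}
            for doc in tokenized]

scores = tfidf(["cheap flights to paris",
                "cheap hotels in paris",
                "rust systems programming"])
# "flights" appears in only one document, so it outscores "cheap",
# which appears in two.
```

The intuition: a term that is frequent in one document but rare across the corpus is distinctive of that document, and hence a good keyword candidate for it.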
Unsupervised learning: TF-IDF associates keywords that can be found in individual documents. What if you're interested in related words & phrases not found in a document? Topic models assume that documents were generated by mixing words from a set of topics. The algorithm inverts this assumption: given a set of documents, it discovers the topics (and the words associated with each topic) responsible for generating them. Returning to keyword discovery: for each document you can associate the words and phrases from the key topics that comprise the document. Another related technique is document clustering: find non-obvious keywords for a document by looking at "nearby" documents in your collection. The good news is that both techniques have been the subject of much research, and recent progress has made it possible to scale these algorithms and use them in real time.
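The "nearby documents" idea can be sketched with cosine similarity over bag-of-words vectors: find the most similar document in the corpus and borrow the words it has that the target lacks. The toy corpus below is invented; real topic modeling (e.g. LDA) is considerably more involved.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(n * b.get(t, 0) for t, n in a.items())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def nearby_keywords(target, corpus):
    """Borrow keyword candidates from the most similar document in the
    corpus: words it contains that the target document does not."""
    tvec = Counter(target.lower().split())
    best = max((Counter(d.lower().split()) for d in corpus),
               key=lambda v: cosine(tvec, v))
    return sorted(set(best) - set(tvec))

nearby_keywords("cheap flights to paris",
                ["cheap airfare deals to paris", "rust compiler internals"])
# -> ['airfare', 'deals']
```

Here "airfare" and "deals" surface as candidate keywords even though the target document never uses them -- which is exactly the gap TF-IDF alone can't fill.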
THIRD-PARTY KEYWORD TOOLS: There are so many that they warrant a separate section.
Tools from Google: Trends gives search volume for a given phrase over the past few years. Besides information about key geographical sources of search traffic for the phrase, you can export the data in CSV format. Google Insights for Search furnishes similar data and also includes a list of related search phrases (keyword discovery); its drawback is that the data isn't exportable to CSV. Google's Wonder Wheel, an effective visual tool for displaying related search queries, was quietly removed in early July/2011. If you use AdWords, similar data can be had using the AdWords Keyword Tool and Traffic Estimator.
Seasonality: Use the Google tools listed above to measure seasonality of your candidate search phrases.
Google advanced search operators: There are instances when you might be interested in estimating the number of search results for a narrower slice of Google's index. For example, the allintitle: operator will return results for pages whose titles contain your search phrase.
Keyword research tools: Trellian and Wordtracker; EfficientFrontier offers a suite of tools which includes keyword campaign management.
Keyword Competitiveness/Difficulty: There are several tools that use factors including search volume, number of search results, number of paid search competitors, and bid price, to derive composite scores that purportedly measure the amount of competition for given search phrases. The most well-known are SEOMoz and SEOlogs keyword difficulty tools. In addition, there are several tools designed to help you track search & display advertising activities by companies (see this list of tools for competitive analysis).
Once you've chosen the right keywords and optimized your ad copy, prepare your landing pages (the pages on your site that users first see) before launching your marketing campaign. Search marketing will bring users to your site; conversion depends on what they see when they land on those first few pages. Here are some important considerations:
Call to Action: Your landing page must contain a clear call to action ("buy/download/register/read" now). The message should be concise and the page should be free from clutter.
Ask for only essential information: Don't ask users for unnecessary information; if an email address will suffice, ask for just that.
Simplify navigation: All clickable links should reinforce the call-to-action, with one additional "more information" link for undecided users.
Headlines matter: Users tend to decide within a few seconds whether to stay or leave your site. Headlines should convey that you have an offering worth pursuing.
Testimonials: Sometimes referred to as social proof, testimonials communicate who else has signed up for your offering. Examples of what you might include on a landing page: the number of users of the product/service, media coverage if any, and short personal testimonials & endorsements.
Do some A/B (or even multivariate) testing: You should always test different versions of landing pages. Ideally you can settle on a few elements that you can tweak and vary for multivariate testing purposes. Proper test design is critical for consumer research projects of this nature. In many cases Google's Website Optimizer might be sufficient for your A/B testing needs.
Tools for designing and testing landing pages: Web analytics providers also offer tools for optimizing landing pages (e.g. Google Analytics' sister product, Website Optimizer, mentioned above). There are also a number of startups ready to help with your landing pages: Unbounce offers hosted landing pages (if you want to speed up your dev/testing process), Optimizely offers simple A/B testing, Concept Feedback lets you solicit feedback from experienced web designers, and [x+1] claims that its dynamically assembled landing pages increase conversion rates.
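The statistics behind the A/B testing advice above can be sketched with a standard two-proportion z-test; the conversion counts below are invented for illustration, and real tests also need pre-committed sample sizes to avoid peeking bias.

```python
import math

def ab_significance(conv_a, n_a, conv_b, n_b, z_crit=1.96):
    """Two-proportion z-test on conversion counts from an A/B test,
    using the pooled-proportion standard error. Returns the z score
    and whether the difference is significant at the 95% level."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    return z, abs(z) > z_crit

# Variant B converts 6.9% vs 5.0% for A, on 2,400 visitors each:
z, significant = ab_significance(conv_a=120, n_a=2400, conv_b=165, n_b=2400)
```

With these (hypothetical) numbers the lift clears the 1.96 threshold, so you'd be justified in rolling out variant B; a smaller gap on the same traffic would not be distinguishable from noise.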
KEYWORDS AND SEARCH ENGINES:
A site can rank high for a keyword that doesn't appear on its site, if many other sites link to it using the given keyword as anchor text.
As far as search results rankings go, it doesn't matter much to Google whether keywords are in file name or file path.
Google and other major search engines do not use the meta keywords tag in their rankings.
Avoid keyword cannibalization: By placing the same keyword or phrase across multiple pages on the same site, you end up competing with yourself: search engines are forced to choose among the many pages and display the one that best fits a query. If a page contains keywords targeted by another page, linking to that page with the right anchor text avoids the cannibalization issue.
SEO MYTHS: Doing negative things against your competitors (e.g. negative product reviews) will hurt their PageRank: Many "product review" sites use "nofollow" precisely to guard against coordinated negative campaigns.
The keywords meta-tag counts in Google's search rankings: Google does not use the keywords meta-tag in its ranking algorithms.
Domains created before 2004 tend to have higher PageRank: This is absolutely false.
The amount of time left on a domain's registration is an important SEO factor: Time-to-expiration is not an important ranking factor.
Google DNS data is available to other groups inside Google, including the Webspam team: This is absolutely false.
Google deliberately implements changes over the Christmas Holidays, to force marketers to buy ads: Google does everything possible to minimize disruption during the 4th quarter of each year.
Site performance (page speed) is a slight factor in Google's rankings: see here and here.
Googlebot isn't the sole tool used to determine site performance. Besides response time to Googlebot, static analysis is used to evaluate site speed, but not usage statistics gathered from the Chrome browser.
Underscore vs dash for separators: Avoid using "_" (underscore) as a separator in your title tag. Google views this as a secondary consideration, but they continue to recommend "-" (dashes) as separators, especially within your URL.
There are situations when "noindex,follow" robots meta tag makes sense. You may want Google to not index a page, yet still follow the links that are on that page.
NOTE: Reproduction & reuse allowed under Creative Commons Attribution.