Web site Crawlability
Is your Web site crawlable?
The SEO community is totally confused as to the meaning of Web site crawlability in reference to preventing problems with Google.
Crawlability refers to the ability of a search engine to crawl through the entire text content of your Web site, easily navigating to every one of your webpages, without encountering an unexpected dead-end.
It has absolutely nothing to do with W3C Validation issues.
Always avoid unexpected dead-ends, both for humans using Web browsers and for search engine bot visitors.
Crawlability is the number one factor for preventing Google problems (supported by Googler Matt Cutts’s rather inaccessible video [Transcription]).
- Test Crawlability with a TEXT Browser – Starting from the home page of your Web site, a visitor should easily be able to locate every one of your webpages. Avoiding Google problems is all about Web site fundamentals. If you can access your entire site well enough to read its informational content using only an old, out-of-date, CSS-challenged Web browser (such as Netscape v. 4.08 for Windows 95/98) in text mode only, then you are going to be in pretty good shape search engine wise. Google itself provides a Cache Text option that explicitly shows what Google sees on each webpage that it has cached. Alternatively, you can check one page at a time using the Poodle Predictor Web site. Even performing a simple Ctrl-A select all in one of your webpages and then pasting it into a text editor can reveal interesting information about what search engines see on your webpage.
- Sitemaps – HTML sitemaps enable human visitors to reach orphan pages on your Web site. Accordingly, HTML sitemaps should be followed, but not indexed, by Google. Text and XML formatted sitemaps are for search engine bots, spiders, and crawlers. They artificially make uncrawlable Web sites more crawlable.
- Bad, or Broken, Links are NOT Crawlable – Every hyperlink on your entire Web site, whether internal or external, should be functional and in working condition. Links that Google chooses to ignore are not crawlable either.
“Make sure your navigation links are in HTML, and not in Flash or Javascript. Search engines have trouble extracting links from anything other than HTML.” — X-Googler Vanessa Fox
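One rough way to approximate the text-browser test in code is to strip a page down to its visible text and its plain HTML links. The sketch below uses only Python's standard `html.parser`; the class name `TextOnlyView`, the helper `text_only_view`, and the sample page are made up for illustration, and real search engine crawlers are certainly more sophisticated than this.

```python
from html.parser import HTMLParser


class TextOnlyView(HTMLParser):
    """Collects visible text and plain <a href> links from an HTML page."""

    SKIPPED = {"script", "style"}  # their contents are never shown as text

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.links = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are outside <script> and <style>.
        if self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def text_only_view(html):
    """Return (visible_text, html_links) for one page of HTML."""
    parser = TextOnlyView()
    parser.feed(html)
    parser.close()
    return " ".join(parser.text_parts), parser.links


# Hypothetical sample page: one HTML link and one link written by JavaScript.
SAMPLE_PAGE = """
<html><head><title>Home</title><style>p { color: red; }</style></head>
<body>
<p>Welcome to the site.</p>
<a href="/about.html">About us</a>
<script>document.write('<a href="/hidden.html">Hidden</a>');</script>
</body></html>
"""

if __name__ == "__main__":
    text, links = text_only_view(SAMPLE_PAGE)
    print(text)
    print(links)
```

In this sample, the link written by JavaScript never shows up in the extracted link list, which is the kind of gap the text-browser test is meant to reveal.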
Web site Crawlability Rating
Bad or broken links are an unexpected and unwanted roadblock to navigating the Web. When traveling on the Web, nobody likes to run into an unexpected dead-end, and neither does Google. Dead-ends on your site lead visitors nowhere. Every bad, or broken, hyperlink on your entire Web site reduces the crawlability rating of your Web site. If a hyperlink does NOT work, a visitor to your site cannot follow it. Google assigns each Web site a crawlability rating. Too many bad or broken hyperlinks on your entire Web site and your site will be completely filtered out of Google’s listings or SERPs.
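Checking links by hand gets tedious quickly, so a small script can help. The following is a minimal sketch, assuming that "broken" means the request fails outright or returns an HTTP status of 400 or higher. The function names `fetch_status` and `find_broken_links` are illustrative, not part of any tool mentioned here. Some servers refuse HEAD requests, so a real checker might need to fall back to GET.

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=10):
    """Return the HTTP status code for url, or None if the request fails."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # The server answered, but with an error status such as 404.
        return error.code
    except (urllib.error.URLError, OSError, ValueError):
        # DNS failure, refused connection, timeout, or malformed URL.
        return None


def find_broken_links(urls, get_status=fetch_status):
    """Return (url, status) pairs for every link that looks broken.

    Assumption: a link is "broken" when no response arrives (status None)
    or the status code is 400 or higher.
    """
    broken = []
    for url in urls:
        status = get_status(url)
        if status is None or status >= 400:
            broken.append((url, status))
    return broken


if __name__ == "__main__":
    # Hypothetical list of links collected from your own pages.
    for url, status in find_broken_links(["https://example.com/"]):
        print(url, status)
```

Passing the status lookup in as `get_status` keeps the checking logic separate from the network call, so the function can be tried out on a table of known results without touching the Web.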
The links on this Web site are only a few months old, yet I have had to replace over a half-dozen of them already. Once your site starts voting for other Web sites, you are committing yourself to a never-ending maintenance headache. Web sites are constantly changing their URLs. Webpages are here today, but gone tomorrow. Put off maintaining your outbound links long enough, and Google will filter out your site 100%.
Webpage Crawlability
For one of your webpages to be crawled, the most important consideration is that other links point to it. That means designing your Web site so that:
- Other internal webpages on your site link to it, the more the better.
- Every page on your site links to your home page as well as to other pages on your site.
- The more internal links a webpage has pointing to it, the more important it is considered by Google, and the higher its PageRank.
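The points above can be checked against a simple map of which page links to which. The sketch below assumes you already have such a map as a Python dictionary. The name `SITE` and the page paths are invented for illustration. It counts inbound internal links and lists pages that cannot be reached by following links from the home page. The raw count is only a rough indicator and is not Google's PageRank calculation.

```python
from collections import Counter, deque


def inbound_link_counts(link_graph):
    """Count how many distinct other pages link to each page."""
    counts = Counter({page: 0 for page in link_graph})
    for source, targets in link_graph.items():
        for target in set(targets):   # count each source page only once
            if target != source:      # ignore links from a page to itself
                counts[target] += 1
    return dict(counts)


def unreachable_pages(link_graph, home):
    """Return pages that cannot be reached by following links from home."""
    seen = {home}
    queue = deque([home])
    while queue:                      # breadth-first walk of the link graph
        page = queue.popleft()
        for target in link_graph.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return sorted(set(link_graph) - seen)


# Hypothetical site: each key is a page, each value the pages it links to.
SITE = {
    "/": ["/about", "/products"],
    "/about": ["/"],
    "/products": ["/", "/about"],
    "/old-promo": ["/"],
}

if __name__ == "__main__":
    print(inbound_link_counts(SITE))
    print(unreachable_pages(SITE, "/"))
```

Here `/old-promo` links to the home page but nothing links back to it, so it has zero inbound links and is reported as unreachable, which makes it an orphan page in the sense used above.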
To be crawlable, a webpage has to be able to take a visitor someplace else. Visitors must be able to navigate to either another one of your webpages or to an external site. Bad or broken links reduce the crawlability of your webpages.
Beyond your site’s internal link structure, another consideration in determining webpage crawlability is whether the page contains text that can be indexed. A further one is whether the links on a given page are good, bad, or being ignored by Google. Remember that Google has chosen to ignore image hyperlinks that lack alt text, some types of JavaScript, and Flash.
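Image links without alt text are easy to miss by eye. As a minimal sketch, the standard `html.parser` module can flag them. The class name `ImageLinkAudit` and the helper `image_links_missing_alt` are hypothetical, and the check is deliberately simple: it reports any link whose image has an empty or missing alt attribute, even if the same link also contains ordinary text.

```python
from html.parser import HTMLParser


class ImageLinkAudit(HTMLParser):
    """Records the href of every link whose <img> has no usable alt text."""

    def __init__(self):
        super().__init__()
        self.missing_alt = []
        self._current_href = None  # href of the <a> we are inside, if any

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a":
            self._current_href = attributes.get("href")
        elif tag == "img" and self._current_href:
            # alt may be absent, valueless (None), or blank.
            if not (attributes.get("alt") or "").strip():
                self.missing_alt.append(self._current_href)

    def handle_endtag(self, tag):
        if tag == "a":
            self._current_href = None


def image_links_missing_alt(html):
    """Return hrefs of links that wrap an image lacking alt text."""
    audit = ImageLinkAudit()
    audit.feed(html)
    audit.close()
    return audit.missing_alt


if __name__ == "__main__":
    page = (
        '<a href="/a"><img src="a.png"></a>'
        '<a href="/b"><img src="b.png" alt="Products"></a>'
    )
    print(image_links_missing_alt(page))
```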