This topic has been discussed at length in the SEO community, but recent events brought me back to it. I have to tell this story and get it off my chest, as I found myself on both the validation advocate side and the SEO side.
So a few days ago I stumbled upon a huge traffic drop over the weekend: half of the traffic was gone. That evening, I checked every possible reason why half of a site’s traffic would suddenly disappear: referrers? A drop in search positions? Email delivery? None gave a clear signal. Everything was going in the same direction: down. So I went to bed hoping for another Google Analytics data glitch. The next day, I asked the development team whether they had recently made changes that might have affected the results. Luckily, a colleague found out that there was an issue with the footer, on Internet Explorer only. The browser capabilities analytics report confirmed that. It’s the kind of report you hardly ever look at.
The footer, which includes the GA code, wasn’t loading because of a badly formatted piece of HTML. Obviously, this should have been caught by testing or at least by proper W3C code validation. Long story short, it made me realize that even though I’m strongly against the idea that code validation is good for SEO, it still has an impact. Invalid code can cause problems that you may or may not see firsthand: things that can ruin the user experience, your analytics data or the robots’ ability to crawl your site.
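To illustrate the kind of check that could flag a footer like this before release, here is a minimal sketch using Python’s standard html.parser module. It is not the W3C validator and not what our team ran. The names TagBalanceChecker and check_markup are made up for this example, and the check is deliberately crude: it is stricter than HTML itself, which allows some closing tags (such as those for p and li) to be omitted, so treat its output as a hint rather than a verdict.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag in HTML.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class TagBalanceChecker(HTMLParser):
    """Hypothetical helper: records closing tags that do not match the innermost open tag."""

    def __init__(self):
        super().__init__()
        self.open_tags = []   # stack of currently open element names
        self.problems = []    # human-readable descriptions of mismatches

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS:
            return
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()
        else:
            line, _ = self.getpos()
            self.problems.append(f"line {line}: unexpected </{tag}>")


def check_markup(html):
    """Return a list of tag-balance problems; an empty list means none were found."""
    checker = TagBalanceChecker()
    checker.feed(html)
    checker.close()
    # Anything still on the stack was opened but never closed.
    checker.problems.extend(f"unclosed <{tag}>" for tag in checker.open_tags)
    return checker.problems


if __name__ == "__main__":
    # A footer fragment with a paragraph that is never closed.
    for problem in check_markup('<div id="footer"><p>Contact us</div>'):
        print(problem)
```

A check like this only looks at tag nesting. A real validator also checks attributes, allowed nesting and the doctype, so this would complement proper validation and browser testing, not replace them.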
Myth and reality of code validation for Search Engines
As much as a Web Standards advocate would like to see the Web more compliant, the fact is that a very low proportion of web pages currently validate. A 2008 Opera study showed that no more than 4% of all web pages do. We’re a long way from a standardized web, and guess what? Search engines need to index those non-valid pages, so they must understand them. This is the main argument used by SEO folks. I agree with that, but why not help search engines as much as possible, just to be sure they get it the way we want?
Yet at the end of the day, site compliance won’t move your site up in the SERPs one bit. It’s not a ranking factor, but I firmly believe it should be taken into consideration at the Web strategy level, as it could help your site be indexed faster and more deeply by search engines.
On the other hand, the validation advocates in the “your website must be 100% W3C compliant” camp are ignorant of how information retrieval works. Unlike browsers, search crawlers don’t need to render the page; they just need to parse it. Search engines will parse your pages to determine the content, the links and a page hierarchy. The latter is the important part of information retrieval: not only crawling the content but also getting a context associated with it. To do so, they don’t need W3C compliance at all.
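The point about parsing without rendering can be sketched in a few lines of Python. This is only an illustration using the standard html.parser module, not a description of how any real search engine works. The names PageOutline and outline are hypothetical. The sketch pulls link targets and a heading hierarchy out of markup that would not validate.

```python
from html.parser import HTMLParser

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class PageOutline(HTMLParser):
    """Hypothetical helper: collects link targets and heading text without rendering."""

    def __init__(self):
        super().__init__()
        self.links = []       # href values in document order
        self.headings = []    # (level, text) pairs in document order
        self._level = None    # level of the heading currently being read, if any
        self._text = []       # text fragments of that heading

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        elif tag in HEADINGS:
            self._level = int(tag[1])
            self._text = []

    def handle_data(self, data):
        if self._level is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in HEADINGS and self._level is not None:
            self.headings.append((self._level, "".join(self._text).strip()))
            self._level = None


def outline(html):
    """Return (links, headings) extracted from possibly invalid HTML."""
    parser = PageOutline()
    parser.feed(html)
    parser.close()
    return parser.links, parser.headings


if __name__ == "__main__":
    # No doctype, an unclosed paragraph and an unclosed link: far from valid.
    sloppy = '<h1>Shop</h1><p>Unclosed <a href="/shoes">Shoes<h2>Sale</h2>'
    links, headings = outline(sloppy)
    print(links)
    print(headings)
```

The parser never complains about the missing closing tags. It simply reports what it can recognize, which is the behaviour the SEO argument relies on.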
Enter Web Semantics
Code validation is nothing more than a syntax check; it does not measure how much you comply with Web Standards. That’s why we should look at a broader level: Web Semantics. For example, tables are semantically appropriate for tabular data, not layout. Text paragraphs should be in well-formed P elements, and so on. The HTML5 specification goes further by providing HTML tags for specific parts of the layout: header, sidebar (aside), navigation (nav). Semantic elements help machines get more context out of the usual text and image content of a website.
A lot of these elements of Web Standards add to the search engine friendliness of a site while enhancing the front-end development code.
What are the positive sides of Web Standards that might be worth looking at from an SEO perspective?
- Lower page loading time
- Better code to text ratio
- Parsing is faster and easier
- Richer information context
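The code to text ratio in the list above has no single agreed definition. As a rough sketch, assuming it means visible text length divided by total markup length, it could be estimated like this in Python. The names VisibleText and text_ratio are invented for the example.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Hypothetical helper: accumulates text nodes, skipping script and style contents."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0  # greater than zero while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def text_ratio(html):
    """Return visible text length divided by total markup length (0.0 to 1.0)."""
    if not html:
        return 0.0
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    text = " ".join("".join(parser.chunks).split())  # collapse whitespace
    return len(text) / len(html)


if __name__ == "__main__":
    table_layout = '<table><tr><td><font size="2">Hello world</font></td></tr></table>'
    lean = "<p>Hello world</p>"
    # The same words carry far less markup in the lean version.
    print(round(text_ratio(table_layout), 2), round(text_ratio(lean), 2))
```

The number itself matters less than the comparison: the same content wrapped in layout tables and presentational tags scores lower than its leaner, standards-based equivalent.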
As always in SEO, go after what will get you the most benefit for your site first, given that you have limited time and budget. Hopefully I’ve brought the pros, cons and myths of code validation to light so you can make an informed decision about it.