Google learning to index SWF files. Still a long way to go.
Posted by igarciaoliver | Filed under General Information
I’ve learned through Google’s blog that, in collaboration with Adobe, Googlebot can now index text inside a SWF. Yahoo is apparently capable of this too, since it was also involved with Adobe during the research.
The post on the Google blog doesn’t give many details. There is also a post at Google Webmaster Central that provides much more information but still manages to leave the important points in the dark.
Google’s spider runs the SWF file and clicks everywhere, much like a hot-blooded visitor would. Whenever it captures text, that text gets indexed. If it finds a link, it adds the target to its “to-crawl” list and will eventually index that too.
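To make that loop concrete, here is a minimal Python sketch of a crawler with a “to-crawl” list. It is only an illustration of the general idea, not how Google actually does it. The `extract` callable is a hypothetical stand-in for “run the SWF and collect whatever text and links show up”, and the file names are invented.

```python
from collections import deque

def crawl(start_url, extract, limit=100):
    """Breadth-first crawl.

    `extract` is a hypothetical callable that returns
    (text_fragments, link_targets) for a given URL.
    """
    to_crawl = deque([start_url])   # the "to-crawl" list
    seen = {start_url}              # avoid queueing the same target twice
    index = {}                      # url -> captured text
    while to_crawl and len(index) < limit:
        url = to_crawl.popleft()
        texts, links = extract(url)
        index[url] = " ".join(texts)
        for target in links:
            if target not in seen:
                seen.add(target)
                to_crawl.append(target)
    return index

# Toy data standing in for real SWF files (invented for this example).
FAKE_SITE = {
    "main.swf": (["Home", "Products"], ["products.swf", "about.swf"]),
    "products.swf": (["Our products"], ["main.swf"]),
    "about.swf": (["About us"], []),
}

def fake_extract(url):
    return FAKE_SITE.get(url, ([], []))

index = crawl("main.swf", fake_extract)
```

Note that every file ends up as its own entry in `index`, which is exactly the behaviour that causes trouble further down.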
That’s the good part, but there are still plenty of limitations and a lot left to accomplish.
At the moment, SWF files loaded through JavaScript are not indexed. There is some debate on related blogs about whether SWFObject, a widely used script for embedding SWF files in HTML, is supported or not.
Another setback concerns links, as already mentioned. Google can crawl through the links inside a SWF, and that includes loaded XML, JPG and other SWF files. However, these files are not interpreted as part of the main file currently being spidered, but as completely independent files.
This goes against the way any professional-level ActionScript project is built.
For one thing, only very small sites don’t rely on external data to load and populate their content. So, for Google, there would be an empty shell on one hand and a bunch of unreadable data on the other. They don’t add up, mix or interact. This means your real text content won’t be viewable, and your SWF will show up in searches for your typical menu items but not much more.
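A toy example shows why indexing the shell and its data separately hurts. This is a rough Python sketch of a tiny word index. It assumes, generously, that the external XML gets indexed at all, and the file names and text are invented.

```python
def build_index(documents):
    """Map each word to the set of document names that contain it."""
    index = {}
    for name, text in documents.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(name)
    return index

def search(index, query):
    """Return the documents that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

# How the site is seen today: shell and data are independent files.
separate = build_index({
    "site.swf": "home products contact",
    "products.xml": "handmade oak furniture",
})

# How the visitor experiences it: one page with menu and content together.
merged = build_index({
    "site.swf": "home products contact handmade oak furniture",
})

menu_hit = search(separate, "products")
mixed_separate = search(separate, "products oak")
mixed_merged = search(merged, "products oak")
```

With the separate index, a search for a menu item finds the SWF, but a query that mixes a menu word with a content word finds nothing, because no single file contains both. With the merged index the same query finds the site.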
Also, different sections loaded from different SWFs will not be glued together. Splitting a site into several SWFs helps keep the downloaded KB to the necessary minimum, which makes the overall user experience much more enjoyable.
The mining of text is still quite raw, and there are no tools to add hierarchy to your content. All content is treated the same and all links are followed. There is no equivalent to the ‘rel=nofollow’ HTML attribute for links.
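For comparison, this is roughly what a crawler can do with plain HTML. It is a minimal sketch using Python’s standard `html.parser` module. The `LinkCollector` class and the sample markup are made up for illustration and don’t represent any real search engine’s code.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Sort <a> targets into links to follow and links marked nofollow."""

    def __init__(self):
        super().__init__()
        self.follow = []
        self.skip = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        if href is None:
            return
        # rel may hold several space-separated tokens, or be missing.
        rel_tokens = (attrs.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens:
            self.skip.append(href)
        else:
            self.follow.append(href)

collector = LinkCollector()
collector.feed(
    '<a href="/products">Products</a> '
    '<a rel="nofollow" href="/login">Log in</a>'
)
```

Here `collector.follow` ends up holding `/products` and `collector.skip` holds `/login`. Inside a SWF there is nothing comparable to check, so every link lands in the follow list.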
Still, this looks like a promising first step toward real and powerful SWF indexing.
Lots of people are complaining about the limitations, but the day the blogosphere gets rid of complainers is the day someone pulls the plug on the internet. It’s just not going to happen.
Technology is evolving, and small steps are just as necessary as bigger ones. Adobe, Google and Yahoo are on the right path to getting SWF finally indexed, and before too long we should see a decent indexing system working.
In my opinion, Adobe should add a way of defining standardized metadata for the project as a whole, for each SWF and for individual pieces of text. With that in place, programmers would find it much easier to adapt their code so that content is properly understood by search engines.