BizJournals Portfolio
Oct 31 2008 4:05pm EDT

Google Is Now Scanning Documents

Google has begun using Optical Character Recognition (OCR) technology to index documents posted online that contain images of text, the company announced yesterday on its blog.

Previously, only docs converted to PDFs containing text were indexed and included in results. Because scanned docs are only pictures of text, they are typically more difficult to interpret, and the pages can include wrinkles, smudges or stains.

This advance opens up a whole new collection of information, including many government and academic documents once hidden from public searches.

The news comes a few days after Google settled its book-scan suit, giving it the go-ahead to continue its book search project.

By Chris Snyder for Wired.com


