Google using OCR technology to index scanned PDF documents

Post by sherrick on Sat Nov 01, 2008 7:19 am

Google announced that its search system now uses OCR technology to index scanned PDF documents and show an HTML version of them. In the past, Google only showed an HTML version of PDFs that were created with a text layer (text-enabled formatting). Now, even if a document was scanned as an image, Google can create an HTML version of it using OCR.
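To make the difference between the two kinds of PDF concrete, here is a rough sketch of the general idea: use the embedded text when a page has it, and fall back to OCR when it does not. This is only an illustration, not Google's actual pipeline. The names `page_text`, `pdf_to_html`, `extract_text`, `run_ocr` and the `min_chars` threshold are all made up for this example; you would plug in a real PDF text extractor and a real OCR engine of your choice.

```python
import html
from typing import Callable, Sequence


def page_text(
    page: bytes,
    extract_text: Callable[[bytes], str],  # hypothetical: reads the embedded text layer
    run_ocr: Callable[[bytes], str],       # hypothetical: recognizes text in the page image
    min_chars: int = 20,                   # assumed threshold for "has a usable text layer"
) -> str:
    """Return the embedded text if the page has enough of it, else fall back to OCR."""
    text = extract_text(page).strip()
    if len(text) >= min_chars:
        return text
    # Little or no embedded text: treat the page as a scanned image.
    return run_ocr(page).strip()


def pdf_to_html(
    pages: Sequence[bytes],
    extract_text: Callable[[bytes], str],
    run_ocr: Callable[[bytes], str],
    min_chars: int = 20,
) -> str:
    """Build a very plain HTML version with one paragraph per page."""
    parts = []
    for page in pages:
        text = page_text(page, extract_text, run_ocr, min_chars)
        parts.append("<p>" + html.escape(text) + "</p>")
    return "<html><body>\n" + "\n".join(parts) + "\n</body></html>"
```

With a setup like this, a text-enabled PDF never reaches the OCR step, while a scanned one gets its text from `run_ocr` on every page.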

To see the new system at work, try these search queries. Note the document excerpt shown in the search results, and the full text available through the 'View as HTML' link:

[repairing aluminum wiring]
[spin lock performance]
[Mumps and Severe Neutropenia]