I know Canon can do it with some $900.00 MSRP add-on software. Xerox can do it with their Scan Performance kit.
Any idea if Ricoh or Sharp can do this?
Funny you asked about Sharp. I've never sold it, so I don't know whether Sharpdesk is an add-on, comes standard, or has higher-end versions you can buy, but they've been doing it for a while.
Does a Searchable PDF require OCR software operating in the background?
Although Xerox and Konica Minolta offer Searchable PDFs created within the copier, does anybody have real-world experience of how good these onboard features are at processing larger, more complex documents? I suspect they are acceptable for smaller 1-2 page documents that consist mostly of text, but what happens to their accuracy when processing more complex 10+ page documents?
I do not see how a built-in Xerox/Konica Minolta OCR engine could compete in both speed and accuracy with the industry-leading OmniPage or ABBYY OCR engines, which run on PCs with more processing power than a copier could have.
That's the rhetoric I've always heard from Ricoh when bringing up deals in which I had to add middleware to provide this functionality and lost because of the price gap those additions create across a fleet of 5+ machines. In reading the literature on the new Ricoh MP C3003/C3503/C4503/C5503/C6003, it looks like Scan to Searchable PDF will be standard on those models. I look forward to seeing how well and how quickly it works compared to NSI Autostore, et al.
Well at least my sales spin is not so different from Ricoh's. I would be interested to hear about your test results when you have a chance.
Ricoh must have been taking some hits and losing some sales on this knock out feature.
IT Depts like the idea of them not having to manage 3rd party software to create Searchable PDFs.
The question remains how happy users will be with the quality and speed that an MFP's onboard processor can deliver when dealing with longer, more complex documents.
I'll be happy to share the results once I have the chance to test the new models. Based on Ricoh's track record for delivering new models, I should have some feedback in December. *Sarcasm*
I've spoken to Monte (fellow P4P'er) on this subject and his take is that most of these units will choke when trying to process large files (many documents to be scanned). The other day I had an instance with an HP plotter where the customer had placed the documents on a USB drive and was trying to print from it. After 2 hours they gave up.
I will also test when it comes out and I'm hoping I'm done before December!!!
In my territory, Xerox, to their credit, has made this standard feature of their product the "poster boy" of knock out features.
Sooner or later I will get the chance to test this feature on their product.
I now feel that I have enough ammo to at least put some doubt in the buyer's mind that the glossy brochure and reality might not quite match in many situations.
Canon: Finding this topic of interest, I decided to run a test on one of our demo room devices, a Canon ImageRunner Advance C2230. While a newer model, this family of IRAdvance does not offer all of the horsepower and features of the larger and more costly models. It does offer PDF (searchable) as a scanned file format. So, I placed a memory stick in the USB port and chose B&W, 300dpi, PDF searchable. I placed a "typical" office document (actually, my Treeno EDM Administrator's Guide) in the ADF and scanned 50 pages. Result: File scanned, written to USB, and completed in 3 minutes 15 seconds. File size: 3.1MB.
I then opened the file on my Mac using Adobe Reader and it is indeed searchable.
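For anyone who wants to compare their own tests against this one, the figures above can be turned into a throughput number. This is only a rough sketch of the arithmetic: the function names are made up for illustration, and the per-page size assumes 1 MB = 1000 KB.

```python
def pages_per_minute(pages: int, minutes: int, seconds: int = 0) -> float:
    """Scan-to-searchable-PDF throughput in pages per minute."""
    total_seconds = minutes * 60 + seconds
    return pages / (total_seconds / 60)


def kb_per_page(file_size_mb: float, pages: int) -> float:
    """Average file size per page in KB (assumes 1 MB = 1000 KB)."""
    return file_size_mb * 1000 / pages


# Figures from the Canon test above: 50 pages, 3 min 15 s, 3.1 MB.
rate = pages_per_minute(50, 3, 15)
size = kb_per_page(3.1, 50)
print(f"{rate:.1f} pages/min, {size:.0f} KB/page")
```

A single run on one document type says little about complex originals, so treat the result as a baseline and not a spec.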
Admirable performance from the Canon. Not knowing what a Treeno Admin Guide looks like, is it mostly text in an easy to read font? Are there many lines, bars, boxes, logos, font changes or images in the content?
Yes, I was impressed by this simple test. The Treeno EDM manual is as you described: easy to read font, layout that includes tables and simple graphics. I will repeat the test with a 100 page document as soon as I can find something that's more similar to a customer's document. We need a standard document to test; anyone have a 100 page Slerexe letter?!!
A lot of our solutions have text searchable PDFs as a requirement, and I spend a lot of time in the legal vertical where it's crucial. All the newer HP MFPs have "flow" versions which create text searchable PDFs, do background clean up, etc. So in my experience, if it's low volume convenience scanning then sure, do it using the machine's embedded OCR capabilities. But if it's volume scanning, this won't cut it. You need to use DSS/Autostore etc., and even then you can get bottlenecks depending on the number of OCR engines available.
But here is a thought:
Recently we had a customer with high volume OCR requirements. What actually worked out to be a better solution was to drop in non text searchable PDFs so they are available immediately, then use content crawler software to process and convert all the documents in their ECMS to searchable.
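To make the crawler idea concrete, here is a minimal sketch of the pattern: documents are stored right away as image-only PDFs, and a background pass converts whatever is not yet searchable. Everything here is hypothetical. The `Document` class, the in-memory `store`, and the `ocr` callable stand in for a real ECMS and OCR engine, which will have their own interfaces.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Document:
    name: str
    image_pdf: bytes          # scanned, image-only content
    searchable: bool = False  # flipped once OCR has run
    text: str = ""            # text layer produced by OCR


def crawl(store: Dict[str, Document],
          ocr: Callable[[bytes], str],
          batch_size: int = 10) -> int:
    """Convert up to batch_size non-searchable documents.

    Returns the number converted, so a scheduler can tell when
    the backlog is empty. `ocr` is a stand-in for a real engine.
    """
    converted = 0
    for doc in store.values():
        if converted >= batch_size:
            break
        if doc.searchable:
            continue  # already processed on an earlier pass
        doc.text = ocr(doc.image_pdf)
        doc.searchable = True
        converted += 1
    return converted
```

The point of the design is that scanning never waits on OCR: users see the file immediately, and the conversion load can be scheduled for off hours in small batches.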
Triple thanks for your helpful reply!
I am currently trying to scan a large pile of client data, in documents of approx 20 pages each, into searchable PDFs. The copier is scanning directly to an OmniPage Pro 16 OCR engine. There are a lot of different content pages, including application forms with lots of lines, boxes, changing fonts, logos and handwriting. OmniPage is having a hard time with accuracy and quality, so it seems fair to presume that trying to complete this task via a copier's on-board processing capabilities would likely result in unhappy customers.
The content crawler is a great idea I will have to ponder. Can you recommend a specific crawler for me to look at?
Can be quite expensive depending on the scope of your deal. 20 pages doesn't seem like a lot. What's the frequency? Urgency? I did a test this week on a 40 page document and it took 3 minutes using device-resident HP OCR, which isn't too bad. Also, maybe try something higher end like the ABBYY OCR engine, which is what Autostore uses.
I have hundreds of stacks of approx 20 page complex documents to scan. One idea you mentioned earlier was to suggest to the customer that they purchase a second or third OCR engine to put inline. I never thought of that and it would not be a big investment for them.
One thing that I am doing that the customer likes is the insertion of a bar code separator page. I have stored a predefined bar code in the eFiling box of the copier. Its content does not change. The operator can print off as many bar codes as they need. The operator inserts these paper bar codes where appropriate into a stack of documents waiting to be scanned. In a stack of documents, every time the OCR engine sees this bar code, it knows to create a new file. We are currently just scanning as PDFs, not searchable, for the highest accuracy.
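The separator page logic described above can be sketched roughly as follows. This is an illustration only, not how any particular OCR engine implements it: the `is_separator` callback stands in for the engine's bar code detection, and dropping the separator page from the output is an assumption.

```python
from typing import Callable, Iterable, List, TypeVar

Page = TypeVar("Page")


def split_on_separator(pages: Iterable[Page],
                       is_separator: Callable[[Page], bool]) -> List[List[Page]]:
    """Split one scanned stack into documents at each separator page.

    Separator pages are dropped (an assumption), and back-to-back
    separators do not produce empty documents.
    """
    documents: List[List[Page]] = []
    current: List[Page] = []
    for page in pages:
        if is_separator(page):
            if current:
                documents.append(current)
            current = []  # the next page starts a new file
        else:
            current.append(page)
    if current:
        documents.append(current)  # last document has no trailing separator
    return documents


# Hypothetical stack: "SEP" marks a page where the bar code was detected.
stack = ["a1", "a2", "SEP", "b1", "SEP", "c1", "c2"]
print(split_on_separator(stack, lambda p: p == "SEP"))
```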
I learned today that while a Xerox copier is processing a Searchable PDF, the engine is not available for any other task. Long complicated PDFs can knock the Xerox completely out of action for many minutes until the OCR task is completed.
SalesServiceGuy,
What Model Xerox? Is this true for all Xerox models?
Can any Xerox folks confirm or add to this info?
Thanks,
Vince