There are always some papers that are not OCRed well. What is more, I tried running OCR text recognition on these papers once more with Acrobat. Frustratingly, Acrobat gave me an error: 'This page contains renderable text'. I googled it and found a troubleshooting post, but it did not work for me. Although there is still no clear solution, thanks to the post, which gave me a good explanation of renderable text.
Finally, I found a solution that is quite simple:
-     Export all images of the troubled PDF in TIFF format;
-     Combine these images into a new PDF;
-     OCR the new PDF.
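
I did these steps by hand in Acrobat, but the same three steps could probably be scripted. Below is a minimal sketch, not what I actually ran: it assumes the command-line tools `pdfimages` (from Poppler), `img2pdf`, and `ocrmypdf` are installed, and that each page of the troubled PDF is a single scanned image. The function names, the `page` file prefix, and `merged.pdf` are made up for illustration.

```python
import subprocess
from pathlib import Path


def build_commands(src_pdf, work_dir, out_pdf):
    """Return the fixed part of the three command lines, one per step."""
    work = Path(work_dir)
    prefix = str(work / "page")        # hypothetical prefix for the TIFFs
    merged = str(work / "merged.pdf")  # hypothetical name of the image-only PDF
    # Step 1: export the embedded images as TIFF (page-000.tif, page-001.tif, ...)
    extract = ["pdfimages", "-tiff", str(src_pdf), prefix]
    # Step 2: combine the images; the TIFF paths are appended after extraction
    combine = ["img2pdf", "-o", merged]
    # Step 3: OCR the new, image-only PDF
    ocr = ["ocrmypdf", merged, str(out_pdf)]
    return extract, combine, ocr


def image_number(path):
    """Sort key: the numeric suffix that pdfimages puts after the prefix."""
    return int(Path(path).stem.rsplit("-", 1)[1])


def rebuild_and_ocr(src_pdf, work_dir, out_pdf):
    """Run the three steps in order; stops at the first failing tool."""
    work = Path(work_dir)
    work.mkdir(parents=True, exist_ok=True)
    extract, combine, ocr = build_commands(src_pdf, work_dir, out_pdf)
    subprocess.run(extract, check=True)
    tiffs = sorted((str(p) for p in work.glob("page-*.tif")), key=image_number)
    if not tiffs:
        raise RuntimeError("no images were exported from " + str(src_pdf))
    subprocess.run(combine + tiffs, check=True)
    subprocess.run(ocr, check=True)
```

Calling `rebuild_and_ocr("troubled.pdf", "work", "fixed.pdf")` would then write the OCRed copy to `fixed.pdf`. The point of the detour is that the rebuilt PDF contains only images, so there is no renderable text left for the OCR step to complain about.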