python - How to provide image to Tesseract from memory


I'm using Tesseract OCR on millions of PDFs, so I'm trying to squeeze out every bit of performance I can.

My current pipeline uses convert to turn each PDF into PNG files (one per page), and then runs Tesseract on each of those.

During profiling, I've discovered that a lot of time is spent writing files to disk and reading them back again, so I'd like to move all of this into memory.

I've got the PDF-to-PNG conversion working in memory, but I need a way to pass an in-memory blob to Tesseract instead of giving it a path to a file. I haven't been able to find any documentation or examples of this.

You can use pytesseract. It's a Python wrapper for Google's Tesseract.

Usage:

    import pytesseract

    image = ...  # read the image into memory
    result = pytesseract.image_to_string(image, lang="eng")
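If you'd rather not go through the wrapper, another option is to pipe the PNG blob straight into the tesseract command line. This is a minimal sketch, not a definitive implementation. It assumes the installed tesseract binary accepts the special names stdin and stdout for the input file and the output base, and the helper names build_tesseract_command and ocr_from_bytes are made up for illustration.

```python
import subprocess


def build_tesseract_command(lang="eng", binary="tesseract"):
    """Build an argv list that asks tesseract to read the image from
    standard input and write the recognized text to standard output.

    Assumption: the installed tesseract build supports the special
    "stdin" / "stdout" names.
    """
    return [binary, "stdin", "stdout", "-l", lang]


def ocr_from_bytes(image_bytes, lang="eng"):
    """Run tesseract on an encoded image (e.g. PNG bytes) held in memory.

    The bytes are piped to the child process, so no image file has to be
    written to disk by this code.
    """
    completed = subprocess.run(
        build_tesseract_command(lang),
        input=image_bytes,        # sent to tesseract's stdin
        stdout=subprocess.PIPE,   # recognized text comes back here
        stderr=subprocess.PIPE,   # keep tesseract's diagnostics off the console
        check=True,               # raise CalledProcessError on a non-zero exit
    )
    return completed.stdout.decode("utf-8")
```

With a PNG blob from the conversion step in a variable such as png_bytes, the call would be text = ocr_from_bytes(png_bytes). Each call still starts a new tesseract process, so it's worth profiling again to see whether this actually helps at your volume.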
