OCR with ColdFusion

Nov 17, 2017

Has anyone attempted to perform OCR with CF? I have been tasked with extracting the text from a faxed image file (TIF). Are there any open-source OCR APIs that integrate with CF?

Nov 20, 2017

The Google Cloud Vision API can extract text from an image or PDF file.

https://cloud.google.com/vision/
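ColdFusion runs on the JVM and can load Java classes with createObject("java", ...), so one way to try the Vision API from CF is a small Java helper. The sketch below is a minimal, unofficial example. It assumes Java 11+ (for java.net.http), an API key passed as the key query parameter, and an image in a format the images:annotate endpoint accepts (such as JPEG). The class and method names (VisionOcrSketch, buildAnnotateBody, annotate) are hypothetical. It requests the TEXT_DETECTION feature and returns the raw JSON response without parsing it.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

// Hypothetical helper: sends one image to Cloud Vision's images:annotate REST endpoint.
public class VisionOcrSketch {

    static final String ENDPOINT = "https://vision.googleapis.com/v1/images:annotate?key=";

    // Builds the JSON body: a single request with the image inlined as base64
    // and the TEXT_DETECTION feature selected.
    public static String buildAnnotateBody(byte[] imageBytes) {
        String content = Base64.getEncoder().encodeToString(imageBytes);
        return "{\"requests\":[{\"image\":{\"content\":\"" + content + "\"},"
                + "\"features\":[{\"type\":\"TEXT_DETECTION\"}]}]}";
    }

    // POSTs the body and returns the raw JSON response (including any error payload).
    public static String annotate(String apiKey, String imagePath)
            throws IOException, InterruptedException {
        String body = buildAnnotateBody(Files.readAllBytes(Path.of(imagePath)));
        HttpRequest request = HttpRequest.newBuilder(URI.create(ENDPOINT + apiKey))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}
```

Once compiled and placed on CF's classpath, it could be called with something like createObject("java", "VisionOcrSketch").annotate(apiKey, filePath). For TEXT_DETECTION, the full extracted text typically appears in responses[0].textAnnotations[0].description of the returned JSON.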

Nov 20, 2017

rbuong, that is neither open-source nor ColdFusion.

armandof44358221, I did a quick Google search and found a blog entry from 2009 that might get you started on the right path.

http://coldfusion.sys-con.com/node/1173727

HTH,

^ _ ^

UPDATE: Okay, it's not CF, but it is an API that can be used from CF (ColdFusion or ColdBox). However, it is a commercial API and not free: you will be charged anywhere from US$1.50 to US$3.50 per 1,000 API calls, depending on what you are trying to do with it.
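Turning the quoted rates into a budget is linear arithmetic: (calls / 1,000) * rate. A minimal Java sketch of that calculation follows. The class and method names are hypothetical, and it deliberately ignores any free tier, volume tier, or later price changes the provider may apply, so treat it as a rough estimate rather than a billing calculator.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical helper: linear cost estimate from a per-1,000-call rate in US dollars.
// Ignores free tiers, volume tiers, and price changes.
public class VisionCostEstimate {

    public static BigDecimal estimateUsd(long apiCalls, BigDecimal ratePer1000) {
        return ratePer1000
                .multiply(BigDecimal.valueOf(apiCalls))
                .divide(BigDecimal.valueOf(1000))   // exact: dividing by 1,000 always terminates
                .setScale(2, RoundingMode.HALF_UP); // round to cents for display
    }
}
```

For example, assuming one call per faxed page, 10,000 pages would come to about US$15.00 at the low rate and US$35.00 at the high rate.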

Nov 20, 2017

You will likely not find a free OCR platform out there.

However, my company wrote a ColdFusion and Java integration for Google Cloud Vision (also referenced above). You would, of course, still need to pay for your Cloud Vision usage.

https://github.com/Construction-Monitor/coldfusion-vision-api

Nov 20, 2017

Also, you mention that you have TIF images. As far as I am aware, you can't send TIF images in directly, so you would need to convert them to JPEG first. For that, I would recommend falling back to command-line tools like GraphicsMagick, as their performance is much better than ColdFusion's built-in image tag.
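If you go the command-line route, the conversion can also be driven from a small Java helper that CF loads with createObject("java", ...). This is a minimal sketch, assuming GraphicsMagick is installed with its gm executable on the PATH; TifToJpeg, buildCommand and convert are hypothetical names. It runs the equivalent of gm convert input.tif output.jpg with a timeout and does not attempt any special handling of multi-page fax TIFs.

```java
import java.io.IOException;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical helper that shells out to GraphicsMagick to convert a TIF to a JPEG.
// Assumes the "gm" executable is installed and on the PATH.
public class TifToJpeg {

    // Equivalent of the command line: gm convert <tifPath> <jpgPath>
    public static List<String> buildCommand(String tifPath, String jpgPath) {
        return List.of("gm", "convert", tifPath, jpgPath);
    }

    // Runs the conversion and returns gm's exit code (conventionally 0 on success).
    public static int convert(String tifPath, String jpgPath, long timeoutSeconds)
            throws IOException, InterruptedException {
        Process p = new ProcessBuilder(buildCommand(tifPath, jpgPath))
                .inheritIO() // gm's messages go to this JVM's console
                .start();
        if (!p.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            p.destroyForcibly(); // don't leave a stuck conversion running
            throw new IOException("gm convert timed out after " + timeoutSeconds + " s");
        }
        return p.exitValue();
    }
}
```

The timeout keeps a stuck conversion from tying up a request thread indefinitely.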

Nov 21, 2017

Thanks, Daniel,

I went ahead and purchased an SDK and combined it with a DLL I created, which CF calls through CFOBJECT to convert the image to text. Thanks for your input.

Nov 21, 2017

That sounds pretty sweet. Any chance we could convince you to share details, just to help anyone else who might have this same requirement?

V/r,

^ _ ^

Nov 21, 2017

Sure. Once I get it going and into production, I'll post the Visio diagram and some details.

Nov 21, 2017

Much appreciated. I'm eager to see how you did it.

V/r,

^ _ ^

Jan 05, 2024

Here's what worked for me:

Download Tesseract OCR; it's free and open-source: https://tesseract-ocr.github.io/tessdoc/Downloads.html

Then in cfscript:

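public function ocr(FilePath) {
    // var-scope the result so it doesn't leak into the component's variables scope
    var ocrtext = "";
    // "stdout" as the output base makes Tesseract print the text instead of writing a .txt file
    cfexecute(
        name=full_tesseract_filepath,
        arguments="""" & FilePath & """ stdout",
        variable="ocrtext",
        timeout="10"
        );
    return ocrtext;
}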

Note: full_tesseract_filepath is usually "C:\Program Files\Tesseract-OCR\tesseract.exe" on Windows machines.

Jan 05, 2024

Oops: my first version was missing a few commas in that function call; it should be:

cfexecute(
    name=full_tesseract_filepath,
    arguments="""" & FilePath & """ stdout",
    variable="ocrtext",
    timeout="10"
);

Jan 05, 2024

Note that to avoid the clumsy doubled quotes and concatenation, you can write the arguments parameter like this instead:

arguments='"#FilePath#" stdout',

It's easier to read.
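For comparison, here is the same Tesseract call sketched in Java, which CF can load with createObject("java", ...). Because ProcessBuilder takes each argument as a separate list element, the image path needs no hand-written quotes at all. This is a minimal sketch: TesseractOcr, buildCommand and ocr are hypothetical names, the executable path is passed in (for example the Windows location from the note above), and a 10-second timeout would mirror the cfexecute call. Standard output goes to a temporary file so that the timeout bounds the whole run.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: runs "tesseract <image> stdout" and returns the recognized text.
public class TesseractOcr {

    // One list element per argument, so paths containing spaces need no manual quoting.
    public static List<String> buildCommand(String tesseractExe, String imagePath) {
        return List.of(tesseractExe, imagePath, "stdout");
    }

    public static String ocr(String tesseractExe, String imagePath, long timeoutSeconds)
            throws IOException, InterruptedException {
        // Capture stdout in a temp file so waitFor(timeout) covers the whole run.
        Path outFile = Files.createTempFile("ocr", ".txt");
        try {
            Process p = new ProcessBuilder(buildCommand(tesseractExe, imagePath))
                    .redirectOutput(outFile.toFile())
                    .redirectError(ProcessBuilder.Redirect.DISCARD) // stderr ignored for simplicity
                    .start();
            if (!p.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
                p.destroyForcibly();
                throw new IOException("tesseract timed out after " + timeoutSeconds + " s");
            }
            if (p.exitValue() != 0) {
                throw new IOException("tesseract exited with code " + p.exitValue());
            }
            // Tesseract writes its text output as UTF-8.
            return Files.readString(outFile, StandardCharsets.UTF_8);
        } finally {
            Files.deleteIfExists(outFile);
        }
    }
}
```

Writing to a temp file instead of reading the pipe directly is a design choice: reading the pipe first would block until Tesseract finished, so the timeout could never fire on a hung run.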