Has anyone attempted to perform OCR with CF? I have been tasked with extracting the text from a faxed image file (TIF). Are there any open source OCR APIs to integrate with CF?
The Google Cloud Vision API can extract text from an image or PDF file.
rbuong, that is neither open-source, nor ColdFusion.
armandof44358221, I did a quick Google search, and found a blog entry from 2009 that might get you started on the right path.
http://coldfusion.sys-con.com/node/1173727
HTH,
^ _ ^
UPDATE: Okay.. it's not CF, but it is an API that can be used from CF (ColdFusion or ColdBox). However, it is a commercial API, and not free. You will be charged anywhere from US$1.50 to US$3.50 per 1,000 API calls, depending on what you are trying to do with it.
You will likely not find a free OCR platform out there.
However, my company wrote a ColdFusion and Java integration for Google Cloud Vision (also referenced above). You would obviously still need to pay for your usage of Cloud Vision.
https://github.com/Construction-Monitor/coldfusion-vision-api
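For anyone curious what the underlying request looks like without a wrapper, here is a rough sketch of calling the Cloud Vision REST endpoint straight from cfscript. It is not taken from the repository above. The function name `visionOcr`, the `apiKey` argument, and the choice of API-key authentication are assumptions for illustration, and it presumes a CF11+ or Lucee engine that supports the script form of cfhttp. Error handling is deliberately minimal.

```cfml
// Hypothetical helper, not part of the linked library.
public string function visionOcr(required string imagePath, required string apiKey) {
    var httpResult = "";
    // Cloud Vision expects the image bytes as a base64 string.
    // Quoted keys keep their case when serialized to JSON.
    var payload = {
        "requests": [{
            "image": { "content": toBase64(fileReadBinary(arguments.imagePath)) },
            "features": [{ "type": "TEXT_DETECTION" }]
        }]
    };

    cfhttp(
        method="POST",
        url="https://vision.googleapis.com/v1/images:annotate?key=#arguments.apiKey#",
        result="local.httpResult",
        timeout=30
    ) {
        cfhttpparam(type="header", name="Content-Type", value="application/json");
        cfhttpparam(type="body", value=serializeJSON(payload));
    }

    var response = deserializeJSON(httpResult.fileContent);

    // fullTextAnnotation is only present when text was detected.
    if (
        structKeyExists(response, "responses")
        && arrayLen(response.responses)
        && structKeyExists(response.responses[1], "fullTextAnnotation")
    ) {
        return response.responses[1].fullTextAnnotation.text;
    }
    return "";
}
```

Every call to this function is a billable API request, so it is worth caching results per image if the same fax might be processed twice.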
Also, as a note: you mention you have TIF images. As far as I am aware, you can't send TIF images in directly, so you would need to convert them to JPEG first. My recommendation for that would be to fall back to command-line tools like GraphicsMagick, as their performance is much better than ColdFusion's built-in image tag.
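A minimal sketch of that conversion step, shelling out to GraphicsMagick with cfexecute. It assumes the `gm` executable is installed and on the server's PATH; the function name `tifToJpeg` and both path arguments are hypothetical. Multi-page fax TIFs may need extra handling that is not covered here.

```cfml
// Hypothetical helper: converts a TIF to JPEG using the GraphicsMagick CLI.
public string function tifToJpeg(required string tifPath, required string jpegPath) {
    var stdOut = "";
    var stdErr = "";

    // Equivalent to running: gm convert "input.tif" "output.jpg"
    cfexecute(
        name="gm",
        arguments='convert "#arguments.tifPath#" "#arguments.jpegPath#"',
        variable="local.stdOut",
        errorVariable="local.stdErr",
        timeout="30"
    );

    // gm reports problems on stderr; surface them instead of failing silently.
    if (!fileExists(arguments.jpegPath)) {
        throw(type="ImageConversionError", message="gm convert produced no output file", detail=stdErr);
    }
    return arguments.jpegPath;
}
```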
Thanks Daniel,
I went ahead and purchased an SDK and combined it with a DLL I created to communicate with CF, which converts the image to text via CFOBJECT. Thanks for your input.
That sounds pretty sweet. Any chance we could convince you to share the details, just to help anyone else who might have this same requirement?
V/r,
^ _ ^
Sure .. once I get it going and in production I'll put up the Visio and some details...
Much appreciated. I'm eager to see how you did it.
V/r,
^ _ ^
Here's what worked for me:
Download Tesseract OCR; it's free and open source: https://tesseract-ocr.github.io/tessdoc/Downloads.html
Then in cfscript:
public function ocr(FilePath) {
ocrtext = "";
cfexecute(
name=full_tesseract_filepath
arguments="""" & FilePath & """ stdout"
variable="ocrtext",
timeout="10"
);
return ocrtext;
}
Note: full_tesseract_filepath is usually "C:\Program Files\Tesseract-OCR\tesseract.exe" on Windows machines.
Oops: missing a few commas on that function call. It should be:
cfexecute(
name=full_tesseract_filepath,
arguments="""" & FilePath & """ stdout",
variable="ocrtext",
timeout="10"
);
Note that to avoid clumsy double quotes and concatenation you can change the arguments parameter syntax to the following:
arguments='"#FilePath#" stdout',
It's easier to read.
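Putting the corrections from this thread together, the whole function might look something like the sketch below. The `tesseractPath` argument with its Windows default, the local-scoped variables, and the `errorVariable` capture are additions for illustration rather than part of the original posts. Adjust the path for your own install.

```cfml
// Sketch of the combined function; tesseractPath and errorVariable are illustrative additions.
public string function ocr(
    required string filePath,
    string tesseractPath = "C:\Program Files\Tesseract-OCR\tesseract.exe"
) {
    // Local scoping keeps results from leaking between concurrent requests.
    var ocrText = "";
    var ocrError = "";

    // "stdout" tells Tesseract to print the recognized text instead of writing a file.
    cfexecute(
        name=arguments.tesseractPath,
        arguments='"#arguments.filePath#" stdout',
        variable="local.ocrText",
        errorVariable="local.ocrError",
        timeout="10"
    );

    return trim(ocrText);
}
```

Usage would be along the lines of `faxText = ocr("C:\faxes\page1.tif");`, where the path is just an example. Large or multi-page scans may need a longer timeout than 10 seconds.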