
How to OCR using SDK in C#

New Here, May 10, 2016

Hi All,

Below is my requirement in detail.

I have a PDF that contains scanned documents.

I want to convert the PDF content to XML.

Can anyone help me achieve this using the SDK with C#?

Manoj K Singh

LEGEND, May 12, 2016

This isn't that useful, but I believe it'll end up being a two-stage process, and you may need two or more libraries to perform the various steps.

Don't get me wrong, there are OCR libraries for C# that read PDFs full of images, and no doubt you saw that the price was over $4k for a developer license, haha. There are a few. I'm assuming you want to avoid that.

There are tools like Xpdf and others that you should find and try, just so you can extract the images from the PDF. After you get those images, you might need to convert them to a different image format and then feed them into an OCR library. Google manages a project, Tesseract OCR, which you may want to look at. I believe it is only available as a C++ library, but there are ways to use a C++ library from C#.

It's a lot of work, but that's probably why they made their direct PDF image -> OCR text plugins so expensive.
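
For what it's worth, the two-stage process above could be sketched roughly like this in C#. This is only a sketch under several assumptions, not a tested solution: it assumes the `pdfimages` tool (from Xpdf) and the `tesseract` command-line program are both installed and on the PATH, and it simply shells out to them rather than binding to the C++ library. The class and method names (`PdfOcrSketch`, `OcrPdf`, and so on) and the `page` file prefix are made up for illustration. Tesseract's `hocr` config should produce hOCR output, which is XHTML rather than a custom XML schema, so a further conversion step may still be needed.

```csharp
using System;
using System.Diagnostics;
using System.IO;

static class PdfOcrSketch
{
    // Arguments for Xpdf's pdfimages: "pdfimages PDF-file image-root".
    // It writes one file per embedded image, named image-root-NNNN.ext.
    public static string PdfImagesArgs(string pdfPath, string imageRoot)
    {
        return "\"" + pdfPath + "\" \"" + imageRoot + "\"";
    }

    // Arguments for the Tesseract CLI: "tesseract image outputbase hocr".
    // The "hocr" config asks for hOCR (XHTML) output instead of plain text.
    public static string TesseractArgs(string imagePath, string outputBase)
    {
        return "\"" + imagePath + "\" \"" + outputBase + "\" hocr";
    }

    // Runs an external program and throws if it reports a failure.
    static void Run(string exe, string args)
    {
        var psi = new ProcessStartInfo(exe, args)
        {
            UseShellExecute = false,
            CreateNoWindow = true
        };
        using (var process = Process.Start(psi))
        {
            process.WaitForExit();
            if (process.ExitCode != 0)
                throw new InvalidOperationException(
                    exe + " exited with code " + process.ExitCode);
        }
    }

    // Stage 1: extract the scanned images. Stage 2: OCR each image.
    // Assumption: the extracted image format is one Tesseract can read;
    // otherwise an image conversion step is needed in between.
    public static void OcrPdf(string pdfPath, string imageDir, string outputDir)
    {
        Directory.CreateDirectory(imageDir);
        Directory.CreateDirectory(outputDir);

        Run("pdfimages", PdfImagesArgs(pdfPath, Path.Combine(imageDir, "page")));

        foreach (var image in Directory.GetFiles(imageDir, "page-*"))
        {
            var outputBase = Path.Combine(
                outputDir, Path.GetFileNameWithoutExtension(image));
            Run("tesseract", TesseractArgs(image, outputBase));
        }
    }
}
```

You would call it with something like `PdfOcrSketch.OcrPdf("scan.pdf", "images", "ocr")` (hypothetical paths) and then read the OCR output files from the output folder. Keeping the images and the OCR output in separate folders stops a second run from picking up earlier output files as input.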
