Participant
December 9, 2023
Question

Does the PDF Extract API Use OCR?

Hello, I am using the Extract API to analyze my PDF files. I have a question: does this API use OCR to extract PDF text? Accuracy is really important in my workflow, but when I checked in the demo, the text in some PDFs could not be recognized. The output looks like this: Text: "□H�Q!i ". If this API uses OCR, I will have to find another way to extract PDF text.

    Joel Geraci
    Community Expert
    December 11, 2023

    OCR is used when the entire page is an image. Otherwise, we extract the text from the PDF page. It's possible that the font encoding of your PDF is bad and that's why you are seeing the results you are getting.  Can you share the PDF in question?
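
Before sharing the file, it can help to check whether the extracted text looks like an encoding problem rather than an OCR problem. The following is a minimal sketch, not part of the Extract API or its SDK: the class `GarbledTextCheck` and its methods are hypothetical helpers that use only standard Java. It assumes you have already pulled a text string out of the extraction result, and it measures how much of that string consists of replacement, unassigned, private-use, or control characters, and whether any Hangul is present.

```java
public class GarbledTextCheck {

    // Hypothetical helper: fraction of code points that usually signal a
    // broken font encoding (U+FFFD, unassigned, private-use, lone surrogates,
    // or non-whitespace control characters).
    static double suspiciousRatio(String text) {
        if (text.isEmpty()) {
            return 0.0;
        }
        int total = 0;
        int suspicious = 0;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            total++;
            int type = Character.getType(cp);
            boolean bad = cp == 0xFFFD
                    || type == Character.UNASSIGNED
                    || type == Character.PRIVATE_USE
                    || type == Character.SURROGATE
                    || (type == Character.CONTROL && !Character.isWhitespace(cp));
            if (bad) {
                suspicious++;
            }
        }
        return (double) suspicious / total;
    }

    // Hypothetical helper: true if at least one code point is in the Hangul script.
    static boolean containsHangul(String text) {
        return text.codePoints()
                .anyMatch(cp -> Character.UnicodeScript.of(cp) == Character.UnicodeScript.HANGUL);
    }

    public static void main(String[] args) {
        // Sample string taken from the question above.
        String extracted = "\u25A1H\uFFFDQ!i ";
        System.out.println("suspicious ratio: " + suspiciousRatio(extracted));
        System.out.println("contains Hangul:  " + containsHangul(extracted));
    }
}
```

If a page that visibly contains Korean text yields a high ratio and no Hangul, a bad font encoding in the PDF is a plausible cause. Any threshold you pick for the ratio is a judgment call, not a documented value.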

    Participant
    December 12, 2023

    Sure, I will post my PDF. I am using ExtractTextTableInfoWithTableStructureFromPdf.java. In a DeveloperLive video, they said the APIs use OCR and Adobe Sensei to increase accuracy, so I ran the code to extract text from both an image PDF and a non-image PDF. Can you confirm whether I have this right?
    The APIs use OCR only when the PDF file is an image, and if the PDF is not an image, the APIs do not use OCR?

    Participant
    December 12, 2023

    Also, the PDF file is in Korean, but the API recognizes it as English. So that is the reason the text is □H�Q!

    I have attached my PDF file. Thanks for the help.