Participant
November 22, 2021
Question

Hidden text behind an image is extracted while visible text is not extracted

I ran the ExtractTextInfoWithCharBoundsFromPDF sample code on the PDF at this URL:

https://drive.google.com/file/d/14qy_GPS3dzXI-meJiCKkvqwUb59Q1yWk/view?usp=sharing
The following text, which is hidden behind an image, is extracted:
ANNUAL REPORT 2018

Text that is actually visible is not extracted:
Xcel Energy became the first public utility to receive

Is it possible to extract only the visible text?

    1 reply

    Participant
    November 23, 2021

    I have removed the PDF sample from my cloud folder.
    It is now available here:
    https://s25.q4cdn.com/680186029/files/doc_financials/ar-interactive/2018-interactive/ar/images/Xcel_Energy-AR2018.pdf
    The problem is on page 14 (zero-based counting, i.e. the 15th page of the document).
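
As a possible workaround while waiting for an answer, here is a rough sketch of how the extracted elements for that page could be post-processed to flag text that lies entirely inside a figure's bounding box. It assumes the JSON output has already been loaded into a list of element dicts, each with a zero-based `Page`, a `Bounds` list in the order `[left, bottom, right, top]`, a `Path`, and an optional `Text` key; please verify these names against your own output. The function names `contains` and `flag_possibly_hidden_text` are made up for this example. Note that overlap alone cannot tell whether the text is drawn above or below the image, so this only produces candidates to review, and it does nothing about visible text that is missing from the output.

```python
def contains(outer, inner, tol=1.0):
    """Return True if box `inner` lies inside box `outer`, within `tol` units.

    Boxes are assumed to be [left, bottom, right, top].
    """
    return (
        outer[0] - tol <= inner[0]
        and outer[1] - tol <= inner[1]
        and outer[2] + tol >= inner[2]
        and outer[3] + tol >= inner[3]
    )


def flag_possibly_hidden_text(elements, page):
    """Collect text on `page` whose bounds fall entirely inside a figure.

    `elements` is assumed to be a list of dicts with "Page", "Bounds",
    "Path" and (for text elements) "Text" keys. This is a heuristic:
    it does not know the drawing order of text and image.
    """
    on_page = [e for e in elements if e.get("Page") == page and "Bounds" in e]

    # Treat elements without text whose path mentions "Figure" as images.
    figures = [
        e["Bounds"]
        for e in on_page
        if "Figure" in e.get("Path", "") and "Text" not in e
    ]

    flagged = []
    for e in on_page:
        if "Text" not in e:
            continue
        if any(contains(fig, e["Bounds"]) for fig in figures):
            flagged.append(e["Text"])
    return flagged
```

Calling `flag_possibly_hidden_text(elements, 14)` would then list the text candidates on the page in question for manual review.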