API integration with RoR and extracting column-based content from PDF

Report · Aug 10, 2021

Hi There,

I would like to read a PDF and extract content based on search strings supplied from code.

The content may be laid out in columns, so the service should be able to identify the columns and return the content accordingly.

I am using Ruby on Rails and would like to know whether these APIs can be integrated into a Rails application.

Regards,

Mru

Report · Aug 10, 2021

Just so you know: PDF files don't contain columns. Only text, which might appear on screen as columns. There are no markers in the file to say "here is column 1" etc.
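To illustrate the point, here is a minimal Ruby sketch of what any tool has to do to "find" columns: guess them from where the text fragments sit on the page. Everything in it is an assumption for illustration only: the `Fragment` struct, the sample coordinates, and the `gap` threshold are hypothetical and do not come from any particular PDF library or API.

```ruby
# Hypothetical positioned-text fragment, as a PDF text extractor might report it.
# x and y are page coordinates in points; y grows upward, as in PDF user space.
Fragment = Struct.new(:x, :y, :text)

# Guess columns by walking the fragments from left to right and starting a new
# column whenever the left edge jumps by more than `gap` points.
# `gap` is an assumed tuning value, not something stored in the PDF.
def infer_columns(fragments, gap: 50.0)
  columns = []
  fragments.sort_by(&:x).each do |frag|
    last = columns.last
    if last && (frag.x - last.last.x) <= gap
      last << frag
    else
      columns << [frag]
    end
  end
  # Within each guessed column, read top to bottom (higher y first).
  columns.map { |col| col.sort_by { |f| -f.y }.map(&:text) }
end

# Return [column_index, line] pairs for lines containing the search string.
def search_columns(columns, needle)
  columns.each_with_index.flat_map do |lines, idx|
    lines.select { |line| line.include?(needle) }.map { |line| [idx, line] }
  end
end

# Made-up sample data: two visual columns, two lines each.
fragments = [
  Fragment.new(72.0, 700.0, "Invoice number: 1001"),
  Fragment.new(320.0, 700.0, "Ship to: Pune"),
  Fragment.new(72.0, 685.0, "Total due: 250.00"),
  Fragment.new(320.0, 685.0, "Carrier: Road"),
]

columns = infer_columns(fragments)
matches = search_columns(columns, "Total")
```

A heuristic like this can easily fail on real documents (indented lines, tables, ragged layouts), which is why column extraction is a best guess and not something read from the file.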

Report · Aug 10, 2021

Thanks for your reply.

Yes, I am referring to the text being present in columns.

Report · Aug 10, 2021

But I am telling you the text is not present in columns. It only looks that way.

Report · Aug 10, 2021

Okay.

Could you suggest which APIs I should use and how I could integrate them with RoR?

Report · Aug 10, 2021

I am not an expert on the APIs, but an expert on the internals of PDF.

I am pointing out that extracting text by column may not be possible. Do not be surprised if the APIs do not offer this option.
