Batch removing all vertical text in hundreds of PDFs

New Here, Mar 27, 2018

Dear community,

I have hundreds of PDFs that I'm converting to text and feeding into a text-to-audio program. 90% of them have vertical text fields on every page (e.g., "This PDF was generated by... and downloaded from... at ..."), which mess up the conversion and the text-to-audio output horrendously. Is there a way to batch remove all vertical text from PDFs? All solutions are welcome, including code in Python, VBA, and R.

Thank you for any info/advice

TOPICS
Edit and convert PDFs
Community Expert, Mar 27, 2018

There's no such thing as "vertical text" in a PDF. Also, there's no support for any of these languages in Acrobat.

It might be possible to do it using JavaScript, if the text can be identified based on its location or contents, but it's impossible to say for sure without seeing the actual files.
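
If the unwanted text can be identified by its contents, as suggested above, one option outside Acrobat is to filter it out after the text has been extracted. Here is a minimal Python sketch, assuming the extracted text is already available as a string and that the extractor keeps the notice on its own line (rotated text may come out split or interleaved, depending on the tool). The patterns are based only on the phrases quoted in the question, and `strip_watermark_lines` is an illustrative name, not a function from any library:

```python
import re

# Assumed patterns, based on the example phrases quoted in the question.
# Adjust them to the actual wording found in your files.
WATERMARK_PATTERNS = [
    re.compile(r"^\s*This PDF was generated by\b", re.IGNORECASE),
    re.compile(r"\bdownloaded from\b", re.IGNORECASE),
]


def strip_watermark_lines(text: str) -> str:
    """Return text with every line matching a watermark pattern removed."""
    kept = [
        line
        for line in text.splitlines()
        if not any(p.search(line) for p in WATERMARK_PATTERNS)
    ]
    return "\n".join(kept)


# Made-up sample; "ExampleTool" and "example.org" are placeholders.
sample = (
    "Fortune favors the bold.\n"
    "This PDF was generated by ExampleTool and downloaded from example.org\n"
    "The second paragraph continues here."
)
cleaned = strip_watermark_lines(sample)
print(cleaned)
```

Note that a loose pattern such as "downloaded from" could also match a genuine sentence in the body text, so it is worth checking a few files by hand before running this over hundreds of them.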

New Here, Mar 27, 2018

Here is a link to the first page of one such PDF. I'm not familiar with the internal design of PDFs, so the box on the right looks like vertical text to me, even though it's really text rotated 90 degrees clockwise.

Dropbox - Meneghetti and Williams - 2017ysis - Fortune Favors the Bold 1.pdf

As for Python and R, there are libraries that read PDFs, and there are also libraries that save PDFs, hence my question.
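
On the 90-degree rotation: text rotated like this is typically positioned with a rotated text or transformation matrix, so a library that exposes that matrix (or a per-line direction vector) could be used to tell the sideways box apart from the body text. Below is a minimal Python sketch of the classification step only, assuming you already have the first four matrix components `(a, b, c, d)` for each text run. `rotation_degrees` and `is_sideways` are hypothetical helper names, and how you obtain the matrix depends on the library you use:

```python
import math


def rotation_degrees(a: float, b: float, c: float, d: float) -> float:
    """Counterclockwise rotation, in degrees, of a text matrix [a b c d].

    A pure rotation by theta is [cos t, sin t, -sin t, cos t], so the
    angle can be recovered from the first row with atan2(b, a).
    c and d are accepted so the full matrix can be passed in unchanged.
    """
    return math.degrees(math.atan2(b, a))


def is_sideways(a, b, c, d, tolerance=5.0):
    """True if the text runs roughly 90 degrees either way from horizontal."""
    angle = abs(rotation_degrees(a, b, c, d))
    return abs(angle - 90.0) <= tolerance


# Upright body text: identity-like matrix scaled by a 12-point font size.
print(is_sideways(12, 0, 0, 12))
# Text rotated 90 degrees clockwise, as described for the box on the right.
print(is_sideways(0, -12, 12, 0))
```

Text runs flagged this way could then be skipped when building the text that goes to the text-to-audio program.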

Community Expert, Mar 28, 2018

I didn't say it can't be done with those languages, just that they can't be used in Acrobat, which is the subject of this forum.

You can do it in Acrobat (Pro) using the Redaction tool, which is located under Tools - Protection. Draw a redaction area around the text on the first page, then right-click it and select "Repeat mark across pages". Then apply the redactions and you're done.

This process can also be automated using JavaScript and incorporated into an Action, to process multiple files at once.

If you're interested I could write this code for you, for a small fee. You can contact me privately via try6767 at gmail.com to discuss it further.

Community Expert, Mar 28, 2018

Do the terms of use allow this?

Community Beginner, Mar 28, 2018

If the sole purpose is to convert the files for text-to-audio, could you solve the problem by cropping the pages to remove the vertical text? I know that software such as PitStop allows you to delete objects outside certain dimensions of a PDF page.
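
One caveat with cropping alone: it changes the visible area of the page, and a text extractor may still return text that lies outside it. The same idea can be applied at extraction time instead, by discarding words whose position falls in the strip that holds the notice. Here is a minimal Python sketch, assuming your extractor can give you a bounding box for each word. The `Word` tuple, the page size, and the 40-point strip width are all illustrative assumptions, not values taken from the sample file:

```python
from typing import NamedTuple


class Word(NamedTuple):
    """A word with its bounding box in page coordinates (points)."""
    x0: float
    y0: float
    x1: float
    y1: float
    text: str


def words_inside(words, keep_box):
    """Keep only words whose bounding box lies fully inside keep_box."""
    kx0, ky0, kx1, ky1 = keep_box
    return [
        w for w in words
        if w.x0 >= kx0 and w.y0 >= ky0 and w.x1 <= kx1 and w.y1 <= ky1
    ]


# Hypothetical US Letter page (612 x 792 points) with a 40-point strip
# on the right-hand edge that holds the rotated notice.
page_width, page_height, strip_width = 612, 792, 40
keep_box = (0, 0, page_width - strip_width, page_height)

words = [
    Word(72, 700, 130, 712, "Fortune"),
    Word(135, 700, 180, 712, "favors"),
    Word(585, 300, 597, 520, "Downloaded"),  # inside the right-hand strip
]
body = " ".join(w.text for w in words_inside(words, keep_box))
print(body)
```

If the notice sits in the same place on every page, one `keep_box` per page size should be enough for the whole batch.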
