Mike Witherell
Community Expert
February 16, 2024
Answered

Salvaging text from a PDF back to InDesign

  • February 16, 2024
  • 4 replies
  • 540 views

As many of you know, Acrobat Pro has an Export feature that can pull the text out of a PDF. While trying to rebuild a lost publication, I exported the text. Acrobat does a nearly-great job but has one fault:

Words that break at a syllable end up in the .docx text as two misspelled words.

For example, the running text contains broken words such as:

muscu lar

overwhelm ing

The gap is just an ordinary space, yet in both Word and InDesign the spell checker sees two misspelled words ... and doesn't offer to join them. It flags them, but is brutally stupid about knowing what to do with them.

 

I have miles and miles of text, and this sort of thing is in every paragraph. What should I do to fix them speedily?

 


4 replies

Robert at ID-Tasker
Legend
February 17, 2024

Where is this space? At the end of the line - or start of the next line?

 

What about the "-" at the end of the line?

 

Mike Witherell
Community Expert
February 16, 2024

Thanks, Leo. I just got back from trying the very same idea. ChatGPT fixed probably 95% of the mid-sentence line-break errors, and it also did well at rejoining syllable fragments. I had to tell it explicitly not to rewrite the text, just to fix the breaks.

I also tried it with Google Gemini, and that AI engine could not comprehend what I was asking it to do. ChatGPT's limitation appears to be a cap of about 2,000 words. Still, the possibility of ChatGPT as a text-editing workaround is very encouraging!
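
For anyone hitting the same word limit, here is a minimal Python sketch of one way to split a long text into chunks on paragraph boundaries and prepend a "fix the breaks, don't rewrite" instruction to each. The function names, the prompt wording, and the assumption that paragraphs are separated by blank lines are all illustrative. The 2,000-word figure is only the approximate limit observed above, and actually sending the prompts to a chatbot is left out.

```python
# Hypothetical helper names; nothing here comes from Acrobat, InDesign, or ChatGPT.

INSTRUCTION = (
    "Rejoin words that were split by a stray space or line break. "
    "Do not rewrite, rephrase, or correct anything else.\n\n"
)


def chunk_paragraphs(text, max_words=2000):
    """Group blank-line-separated paragraphs into chunks of at most max_words words.

    A single paragraph longer than max_words is kept whole as its own
    (oversized) chunk rather than being cut mid-sentence.
    """
    chunks, current, count = [], [], 0
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        n = len(para.split())
        # Start a new chunk if adding this paragraph would exceed the budget.
        if current and count + n > max_words:
            chunks.append("\n\n".join(current))
            current, count = [], 0
        current.append(para)
        count += n
    if current:
        chunks.append("\n\n".join(current))
    return chunks


def build_prompts(text, max_words=2000):
    """Return one prompt per chunk, each starting with the same instruction."""
    return [INSTRUCTION + chunk for chunk in chunk_paragraphs(text, max_words)]
```

Splitting on paragraph boundaries keeps each request self-contained, so the corrected chunks can be pasted back together in order. The instruction text itself also counts toward any word limit, so a somewhat smaller max_words may be safer.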

Mike Witherell
leo.r
Community Expert
February 17, 2024

good to know it worked!

leo.r
Community Expert
Correct answer
February 16, 2024

Try running it through ChatGPT?

Request in ChatGPT:

combine broken words correctly: "muscu lar overwhelm ing"

 

result:

"muscular" "overwhelming"

 

You may need to make a more specific and detailed request to get better results for your entire text.

James Gifford—NitroPress
Legend
February 16, 2024
  • Cry.
  • Blanton's.
  • Use some efficient combination of system macros to grab a selected broken word and do a global find-and-replace. Eventually you will be down to a few that can be corrected with grab-and-macro to remove the space. Speed tools to help a wetware/manual process.

 

Choose wisely.

 

(I really can't think of anything automated beyond those speed-editing macros/scripts. Sorry. Hire it out to someone?)
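
One possible way to script the find-and-replace idea above, sketched in Python under some loud assumptions: the exported text has been saved as plain text, the breaks are single ordinary spaces, and you supply your own word list (the `vocabulary` argument is hypothetical; a system word list file or a custom list of the publication's terms could fill that role). Two neighboring fragments are joined only when the joined form is a known word and at least one fragment is not, so legitimate pairs like "in deed" are left alone. It will miss words that are not in the list, proper nouns especially, so it is a speed tool and not a replacement for proofreading.

```python
# Hypothetical helper names; a sketch, not a tested production tool.

LEADING = "\"'(["
TRAILING = ".,;:!?\"')]"


def should_join(left, right, vocab):
    """True if left+right is a known word and at least one half is not."""
    a = left.lstrip(LEADING)    # ignore an opening quote or bracket
    b = right.rstrip(TRAILING)  # ignore closing punctuation
    if not (a.isalpha() and b.isalpha()):
        return False
    a, b = a.lower(), b.lower()
    return (a + b) in vocab and not (a in vocab and b in vocab)


def rejoin_line(line, vocab):
    """Rejoin broken words within one line, scanning left to right."""
    words = line.split(" ")
    out, i = [], 0
    while i < len(words):
        if i + 1 < len(words) and should_join(words[i], words[i + 1], vocab):
            out.append(words[i] + words[i + 1])
            i += 2
        else:
            out.append(words[i])
            i += 1
    return " ".join(out)


def rejoin_broken_words(text, vocabulary):
    """Apply rejoin_line to every line; vocabulary is any iterable of words."""
    vocab = {w.lower() for w in vocabulary}
    return "\n".join(rejoin_line(line, vocab) for line in text.split("\n"))


# Tiny illustrative vocabulary; a real run would need a full word list.
sample_vocab = {"the", "muscular", "system", "is", "overwhelm", "overwhelming"}
print(rejoin_broken_words("the muscu lar system is overwhelm ing.", sample_vocab))
# prints: the muscular system is overwhelming.
```

Running it on a copy of the text and diffing the result against the original is a cheap way to review every join before anything goes back into InDesign.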

BobLevine
Community Expert
February 16, 2024

I, too, was going to suggest outsourcing it. You could also try sacrificing a goat.

 

All kidding aside, just to be clear, you did save as Word from Acrobat, right?

Mike Witherell
Community Expert
February 16, 2024

Right.

Funny you mention sacrificing a goat. This is an ancient text about an altar in Pergamum.

Mike Witherell