Find and replace all hard returns, except when they precede an indented paragraph

Aug 18, 2024

I'm formatting a newbie author's work, and for some strange reason the author hit the return key after EVERY SENTENCE. I just want to remove all those extra returns (ONE PER LINE) but keep the return that naturally goes at the end of a paragraph, where a new indented paragraph begins.

I've attached a screenshot of the situation, recreated with a chunk of text from one of my own books, so you can see what I mean. The paragraph ends with "you woke up"; "I fell," begins the next indented paragraph. That's hard to see, since all the other returns make every line indent. I also wondered whether replacing the beginning of each real paragraph with an asterisk (or hash marks) would let me find them all again once I figure out how to remove the extra returns. Something like a command to "Find and replace all hard returns, except when they precede an indented paragraph."

Aug 18, 2024

We just covered something like this. It can be fixed, within certain limits.

First off, the usual cause of this one-para-per-line fault is a glitched export or file conversion, or (very common) cut and paste from a PDF. If you can back up to any prior version, even that PDF, you might be able to export a better-structured file to start with.

While it's possible to write an elaborate GREP string that can sort this out, the more efficient, kind of bulldozer method I've used most often is a series of Find/Replace operations. That may be tricky here because I don't see how those paragraphs are indented. There has to be some "marker" for the start of the real paragraphs. You have hidden characters turned on, and I don't see a tab, and it looks as if every single paragraph is the same style with that indent.

The way it would work, if a marker can be found, is more or less this:

FIND <paragraph return><marker>
REPLACE both with a tag like #*#*#
FIND all remaining paragraph returns and REPLACE with spaces
FIND all double spaces and REPLACE with single (just cleanup — optional)
FIND your tag and REPLACE with a paragraph return

That, in theory, will remove all extraneous paragraph returns and restore only those wanted at the end of paragraphs.
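For anyone who wants to see the sequence in action before trying it in InDesign, here is a minimal sketch that runs the same Find/Replace passes on a plain string. It assumes "\r" is the paragraph return and that `marker` (a tab in the example, purely hypothetical) reliably starts every real paragraph; `fixWithMarker` is an illustrative name, not an InDesign feature.

```javascript
// Sketch of the Find/Replace sequence on a plain string.
// Assumptions: "\r" is the paragraph return; `marker` is whatever starts a
// real paragraph (hypothetical: a tab in the example below).
function fixWithMarker(text, marker) {
  const TAG = "#*#*#";
  // split/join behaves like a literal "Change All" (no regex escaping needed)
  const changeAll = (s, find, repl) => s.split(find).join(repl);

  let out = changeAll(text, "\r" + marker, TAG); // FIND <return><marker>, REPLACE with tag
  out = changeAll(out, "\r", " ");               // remaining returns -> spaces
  while (out.includes("  ")) {                   // optional double-space cleanup
    out = changeAll(out, "  ", " ");
  }
  return changeAll(out, TAG, "\r");              // tag -> paragraph return
}

const sample = "She ran.\rThen she stopped.\r\tI fell.\rIt hurt.";
console.log(JSON.stringify(fixWithMarker(sample, "\t")));
// prints "She ran. Then she stopped.\rI fell. It hurt."
```

As in the steps above, the marker itself is consumed by the tag; if the marker were real text rather than a tab, it would need to be put back in the last pass.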

But if there's no marker to hang this on... I can't think of any easy solution. You may have to replace all paragraph returns with spaces, then go through and manually break the content back into paragraphs. If the marker is some unique text string (instead of a tab or leading space or such), a GREP string might be composed to find it.

ETA: It seems unlikely, but are any styles used here? Are all those paragraphs the same (BODY or NORMAL or such)? Or is that last paragraph ("Minutes later...") a different style? That would give a hook for the above process.
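If the real paragraph starts do carry a different style, the logic is simple: any paragraph in a "start" style begins a new paragraph, and every other paragraph gets joined to the one before it. The sketch below models paragraphs as plain `{ style, text }` objects; the style names "Normal" and "Indented" are hypothetical stand-ins, and this is not InDesign's scripting API.

```javascript
// Sketch only: paragraphs modelled as { style, text } objects.
// A paragraph whose style is in `startStyles` begins a real paragraph;
// any other paragraph is treated as a stray break and appended to the previous one.
function mergeByStyle(paras, startStyles) {
  const merged = [];
  for (const p of paras) {
    if (merged.length === 0 || startStyles.has(p.style)) {
      merged.push({ style: p.style, text: p.text });
    } else {
      merged[merged.length - 1].text += " " + p.text;
    }
  }
  return merged;
}

// Hypothetical style names
const paras = [
  { style: "Normal", text: "She ran." },
  { style: "Normal", text: "Then she stopped." },
  { style: "Indented", text: "I fell." },
  { style: "Normal", text: "It hurt." },
];
console.log(mergeByStyle(paras, new Set(["Indented"])).map((p) => p.text));
// prints [ 'She ran. Then she stopped.', 'I fell. It hurt.' ]
```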

┋┊ InDesign to Kindle (& EPUB): A Professional Guide, v3.1 ┊ (Amazon) ┊┋

Aug 18, 2024

Hi @Kelli Jae Baeli, from your capture it looks like periods indicate the real end of a paragraph? If that's the case, a script could remove the return at the end of any line that doesn't have a period before it. For example:

[Screenshots: before and after running the script]

Using this script with a text frame selected:

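// Run with a text frame selected: works on that frame's whole parent story.
// Removes the return at the end of any line whose last character is not a period.
var story = app.activeDocument.selection[0].parentStory;
var lines = story.lines.everyItem().getElements();
var lastTwo;
// Walk backward so edits don't disturb lines that haven't been checked yet
for (var i = lines.length - 1; i > -1; i--) {
    if (lines[i].characters.length < 2) continue; // skip empty or one-character lines
    lastTwo = lines[i].characters.itemByRange(-2, -1);
    if (lastTwo.characters[0].contents != "." && lastTwo.characters[1].contents == "\r") {
        // Replace only the return (not the character before it) with a space,
        // so the last letter survives and neighbouring words don't run together
        lastTwo.characters[1].contents = " ";
    }
}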
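The same "no period before the return" rule can be written as a single regex with a negative lookbehind, which makes it easy to test on sample text outside InDesign. This is a sketch under one assumption: every stray line break in the file is a return, so checking returns is close to checking composed lines the way the script does. `joinUnlessPeriod` is an illustrative name.

```javascript
// Sketch of the script's rule on a plain string: a return NOT preceded by a
// period becomes a space; returns after a period are kept.
// Assumption: each stray line break is a "\r" (the script checks composed lines instead).
function joinUnlessPeriod(text) {
  return text.replace(/(?<!\.)\r/g, " ");
}

console.log(JSON.stringify(joinUnlessPeriod("She ran\rfast.\rI fell.")));
// prints "She ran fast.\rI fell."
```

Note that this keeps a break after every sentence-ending period, which is exactly why it only works if periods really do mark paragraph ends.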

Aug 18, 2024

...periods indicate the real end of a paragraph?

Uh, I don't think that's even close to correct. I see periods within what seem to be paragraphs even in that short sample, although the run of short one-sentence paras is a bit misleading. I don't see any consistent pattern of character/space/period/return to use.

Aug 18, 2024

If no marker string can be found, and no better version of the file can be obtained, the only "good" solution may be to go through and add an extra paragraph return before each genuine paragraph start. That would be fairly fast and efficient, if tedious beyond belief, but then any of these fixes can be applied by treating each double return as a real paragraph break and eliminating all the single returns.
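Once the double returns are in place, the cleanup is mechanical. A minimal sketch on a plain string, assuming "\r\r" now marks every real paragraph break and every single "\r" is a stray one (`collapseStrayReturns` is a hypothetical name):

```javascript
// Sketch: "\r\r" marks a real paragraph break; any single "\r" is stray.
function collapseStrayReturns(text) {
  return text
    .split("\r\r")                             // chunks between real breaks
    .map((chunk) => chunk.replace(/\r/g, " ")) // stray returns -> spaces
    .join("\r");                               // one return per real paragraph
}

const marked = "She ran.\rThen she stopped.\r\rI fell.\rIt hurt.";
console.log(JSON.stringify(collapseStrayReturns(marked)));
// prints "She ran. Then she stopped.\rI fell. It hurt."
```

In InDesign terms this is the same tag-and-replace sequence as before, with the double return acting as the marker.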

Aug 18, 2024

@Kelli Jae Baeli

I think @James Gifford—NitroPress meant this thread:

https://community.adobe.com/t5/indesign-discussions/how-to-correct-wrong-page-break/m-p/14787906#M58...

Aug 18, 2024

I thought there was another, similar thread, a bit closer in specific topic than that one. I'll skim the recent stuff and see.

Aug 19, 2024

Assuming that paragraphs start with a capital letter or an opening double quote, I think you can get at least 90% of the way there with a simple GREP Find/Change using a negative lookahead:

Find \r(?!["\u]) and change to a space

If there are hyphens at line ends (I don't see any in your sample), they might be left in the text. And of course you may still have some unwanted breaks where a sentence happens to start a new line, but I think this will cut down on manual changes.
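To try the lookahead on sample text outside InDesign, here is a rough JavaScript equivalent. InDesign's `\u` (any uppercase letter) is written `\p{Lu}` here, which needs the `u` flag, and the curly opening quote (U+201C) is listed alongside the straight one; both are assumptions about how the manuscript is typed. `joinUnlessCapOrQuote` is an illustrative name.

```javascript
// Approximation of the GREP  \r(?!["\u])  ->  space
// \p{Lu} stands in for InDesign's \u; \u201C is the curly opening double quote.
function joinUnlessCapOrQuote(text) {
  return text.replace(/\r(?!["\u201C\p{Lu}])/gu, " ");
}

const before = "She ran\rfast.\r\u201CWait,\u201D I said.\rI fell.";
console.log(JSON.stringify(joinUnlessCapOrQuote(before)));
// prints "She ran fast.\r“Wait,” I said.\rI fell."
```

The last break survives even if "I fell." belongs to the same paragraph, which is the residue of unwanted breaks mentioned above.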

Aug 19, 2024

… Supposing each "bad carriage return" is preceded or followed by a "space":

Find: ((?<=\h)\r)|(\r(?=\h))

Replace by: nothing
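A rough JavaScript version of that GREP, handy for checking which returns it would touch. InDesign's `\h` (horizontal whitespace) is narrowed to `[ \t]` here, which is an assumption; `dropReturnNextToSpace` is an illustrative name.

```javascript
// Approximation of  ((?<=\h)\r)|(\r(?=\h))  ->  nothing
// A return with a space or tab on either side is deleted outright.
function dropReturnNextToSpace(text) {
  return text.replace(/((?<=[ \t])\r)|(\r(?=[ \t]))/g, "");
}

console.log(JSON.stringify(dropReturnNextToSpace("She ran \rfast,\r and fell.\rThe end.")));
// prints "She ran fast, and fell.\rThe end."
```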

(^/) The Jedi

Aug 19, 2024

Close, but no cheap see-gar for you. I can see exceptions even in the short sample.

I concur with Peter. One "fix" or another might get the job mostly done, but every possibility leads to a need for close proofing and fix of any skipped instances. There's just no path for this but applied wetware.

Unless the document is extremely long, I still feel the optimum path is reading and inserting paragraph breaks, then one or another cleanup method from there.

Of course, there's always the position that this is the author's problem to resolve, not the designer's.

Aug 19, 2024

If a "bad carriage return" is followed by a "lowercase char" but not preceded by a "space", you will need a second regex to finish the job:

Find: (?<!\h)\r(?=\l)

Replace by: a normal space
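Run in order, the two regexes above can be sketched in JavaScript as two passes. As before, `\h` is narrowed to `[ \t]` and InDesign's `\l` (lowercase) becomes `\p{Ll}` with the `u` flag; both substitutions are assumptions, and `joinStrayReturns` is a hypothetical name.

```javascript
// Pass 1: a return touching a space or tab is deleted.
// Pass 2: a return with no space before it but a lowercase letter after it becomes a space.
function joinStrayReturns(text) {
  const out = text.replace(/((?<=[ \t])\r)|(\r(?=[ \t]))/g, "");
  return out.replace(/(?<![ \t])\r(?=\p{Ll})/gu, " ");
}

console.log(JSON.stringify(joinStrayReturns("She ran \rfast,\rand fell.\rThe end.")));
// prints "She ran fast, and fell.\rThe end."
```

Breaks followed by a capital letter are left alone by both passes, so sentences that happen to start a line still need a manual check.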
