Participant
September 26, 2023
Question

How to remove hard returns from the end of each line only if not preceded by a period, etc.

I have the usual problem. I am trying to remove carriage returns at the end of each line, but not if the carriage return is preceded by a period, exclamation point or question mark.

Just doing a find and replace with \r in GREP removes all carriage returns, so it melds all the paragraphs together along with the intra-paragraph carriage returns. I would like to keep the paragraphs, but separate them with one carriage return.

How do I adjust the grep? Thanks.

2 replies

Joel Cherney
Community Expert
September 27, 2023

Well, I can't really call myself an expert either, but maybe I'm "intermediate"? (My syntax is going to be ugly.)

I've processed a whole lot of text extracted from PDFs in the last few decades.  The answer to your question, if taken literally, is quite simple:

(?<!\.|\?|\!)\r

It's what Peter suggested; a negative lookbehind. Lots of ways to do it, of course, but this is what immediately occurred to me as an answer to your question. This assumes, of course, that nowhere in the text you're cleaning up is there an instance of sentence-ending punctuation followed by a space, or a close quote, or a close parenthesis, or anything else. 
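To see what that assumption costs, here is a minimal Python sketch of the same lookbehind. The sample strings, the name `naive_join`, and the choice to replace each matched return with a space are assumptions for illustration, not part of the suggestion above. Python's `re` module is also not InDesign's GREP engine, although it accepts this particular pattern unchanged.

```python
import re

# The lookbehind from the post, unchanged: a return that is NOT
# immediately preceded by a period, question mark, or exclamation point.
SOFT_RETURN = re.compile(r"(?<!\.|\?|\!)\r")

def naive_join(text: str) -> str:
    """Replace every matched return with a single space (an assumed choice)."""
    return SOFT_RETURN.sub(" ", text)

# A return after plain text is joined; a return right after "." is kept.
clean = naive_join("broken\rline.\rNext paragraph.")

# A trailing space or a close quote after the punctuation defeats the
# lookbehind, so these paragraph breaks are wrongly joined as well.
trailing_space = naive_join("The end. \rNext paragraph.")
close_quote = naive_join('"Stop!"\rNext paragraph.')
```

In the last two cases the character directly before the return is a space or a quotation mark rather than the punctuation itself, which is exactly the situation the caveat describes.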

FRIdNGE
Inspiring
September 27, 2023

Find:  (?<![.!?])(\h*\r)+

Replace by: a normal space
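For anyone who wants to try this find/replace pair outside InDesign, a rough Python sketch follows. Python's `re` module does not support `\h`, so `[ \t]` is substituted here; that is an assumption and covers only plain spaces and tabs. The function name `join_soft_lines` and the sample text are hypothetical.

```python
import re

# Python's re module has no \h, so [ \t] stands in for it here
# (an assumption: it matches only plain spaces and tabs).
SOFT_BREAK = re.compile(r"(?<![.!?])([ \t]*\r)+")

def join_soft_lines(text: str) -> str:
    """Collapse each matched run of returns, with any spaces before them, into one space."""
    return SOFT_BREAK.sub(" ", text)

sample = "one two \rthree\r\rfour.\rFive"
result = join_soft_lines(sample)
```

With this sample, the trailing space before the first return and the doubled return after "three" each collapse to a single space, while the return directly after "four." is left in place.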

(^/)  The Jedi

Robert at ID-Tasker
Legend
September 26, 2023

I think Positive Lookahead would be the answer:

https://carijansen.com/positive-lookahead-grep-for-designers/

but you need to wait for experts to confirm. 

Peter Spier
Community Expert
September 26, 2023

Actually I think negative look behind, but I'm not one of the "experts".