After importing a .docx file – well, sometimes we have to do these things – I was hoping to find empty lines with the (relatively) familiar ^$ wildcard, but all it manages to locate is empty table cells. Any hints?
I think you said previously you are on Fm12? 2015 can now remove the returns on import, fyi.
Remember that the | wildcard matches spaces or punctuation. Hence \p| or \P| find paragraphs containing only spaces and punctuation as well as those that are empty or contain only spaces.
By the way, in a simple search, I was surprised to discover that \P\p finds paragraphs containing only markers or anchors. Furthermore, Change All of \P\p deletes paragraphs containing only anchors (but not those containing only markers). The anchored table or anchored frame is not deleted; the anchor moves to the next paragraph. I detected this behavior in FM 2015 and confirmed it in FM 9 but did not test in intervening or earlier versions.
I use that combo for this situation, which I thought FieryPantone was asking about, but perhaps I misunderstood?
Of course with GREP we have better options now!
Sorry, had to run back into class. I have used the F/C sequence I mentioned above forever, but it leaves trailing whitespace behind. We all know that we want to leave nothing to chance in our extremely long docs, or Murphy's law will kick in and will add a blank line just for a hard return.
This GREP works for trailing whitespace and multiple returns (but falls short on just multiple returns).
Regular Expressions (aka GREP) on
This is where I remember why I want Fm to allow us to save our queries, so that we can set them up once, save them and just choose them from a list later. Mine are all tucked safely away on a piece of paper that I can't find at the moment.
Try ^\n. If it doesn't work, check maker.ini and make sure the regular expression syntax option is set to Perl.
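For anyone who wants to try the two patterns outside FrameMaker, here is a minimal Python sketch. It assumes plain text with \n standing in for paragraph breaks, and Python's re module is not FrameMaker's engine, so treat it as an analogy only. In Python, ^$ does match the empty line, but the match is zero characters wide, so there is nothing to delete; ^\n takes the line break with it.

```python
import re

# Plain-text stand-in for three paragraphs, the middle one empty.
text = "first\n\nthird"

# ^$ matches the empty line, but the match is zero characters wide,
# so replacing it with nothing leaves the text unchanged.
empty_line_matches = re.findall(r"^$", text, flags=re.MULTILINE)
after_caret_dollar = re.sub(r"^$", "", text, flags=re.MULTILINE)

# ^\n consumes the line break itself, so the empty line disappears.
after_caret_newline = re.sub(r"^\n", "", text, flags=re.MULTILINE)

print(repr(after_caret_dollar))
print(repr(after_caret_newline))
```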
Lots of interesting information, for which my thanks … but:
Remedy? export to html, convert html tagging to mml, use editor to discard empty paragraphs … Remaining minor pain, reimport the tables by hand.
Oops. My mistake. I wrote:
Find: \p| where I meant \P|
The case makes all the difference. It's in the screen shot correctly, but I typed it incorrectly.
When you have time, could you try it one more time, FieryPantone? I'm testing on 2015 and don't have 12 installed at this point.
\P| does the trick … (curiously enough, so does \P\p, this morning; the wind must have changed) Thanks for the correction.
I'll bookmark this thread for future reference, though I did quite enjoy the .mml route ;-}
Oh good! I'm horrified that I typed it wrong and you pulled out all the returns—that would be devastating to a new user. I trust you've been around the block and knew to either undo or revert? These are the scenarios that wake me up at night!
Anyway, happy weekend to you, FieryPantone!
FieryPantone and Barb,
\P| may not be what you want
Once again, the wildcard search \P| will find empty paragraphs, but it will also find paragraphs containing only punctuation and paragraphs that begin with punctuation. For example, if you have a paragraph that contains a quotation and hence begins with a quotation mark, changing \P| to the empty text string will delete the opening quotation mark, which is probably not desired.
In more detail, \P matches the beginning of a paragraph (other than the first paragraph in a flow). | matches a sequence of one or more spaces and punctuation. I do not know the complete list of characters that are matched, but it includes spaces, non-breaking spaces, and all the special characters on a typical English keyboard except the underscore. A single match can span multiple paragraphs and multiple lines in one paragraph.
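To make the quotation-mark hazard concrete, here is a rough Python analogue. It is only a sketch under assumptions: the character class (spaces, non-breaking spaces, ASCII punctuation minus the underscore) is my approximation of what | matches, the function name is made up for illustration, and the sketch does not model matches that span several paragraphs.

```python
import re
import string

# Assumed approximation of the | wildcard: spaces, non-breaking spaces,
# and ASCII punctuation except the underscore.
PUNCT = re.escape(string.punctuation.replace("_", ""))
LEADING_RUN = re.compile("^[" + PUNCT + " \u00a0]+", flags=re.MULTILINE)

def strip_leading_run(text: str) -> str:
    """Delete a run of spaces/punctuation at the start of each line."""
    return LEADING_RUN.sub("", text)

quote = '"Hello," she said.'
print(strip_leading_run(quote))   # the opening quotation mark is gone
print(strip_leading_run("_keep")) # underscore is not in the class
```

The first call shows the side effect described above: a paragraph that merely begins with punctuation loses that punctuation.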
The sequence \p| produces results similar to \P|. However, while \P matches the start of a new paragraph, \p matches the end of a paragraph (except the final paragraph in a flow). Thus if you have a three paragraph sequence such as
Changing \P| to nothing will delete the middle line; changing \p| to nothing will delete the middle line and the preceding paragraph break, thereby merging the first and third paragraphs into one.
A simple search or wildcard search for \P\p will find empty paragraphs (except the first and last in a flow). Unfortunately, it also finds paragraphs containing only anchors (of tables or of anchored frames) and markers. If your document does not contain tables, anchored frames, or markers, you may want to use this expression. Note that it will not find paragraphs that are visibly empty but contain one or more spaces.
FrameMaker offers three different regular expression options, controlled by maker.ini. With the default, Perl, ^$ does not match an empty paragraph, but ^\n does. Furthermore, ^ *\n matches a paragraph that is either empty or contains only spaces. It still finds paragraphs containing anchors for anchored frames (but not table anchors).
I too am working on a project that involves converting documents from Microsoft Word. We use conditional text to hide the anchored frames and then use ^[ \t]+ to remove spaces and tabs from the beginning of all paragraphs (which includes deleting these characters in paragraphs that are otherwise empty). Then we use ^\n to remove empty paragraphs. Lastly, we show the anchored frames again.
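The two regular-expression passes can be tried on plain text with a short Python sketch. This is only an approximation: it assumes \n stands in for a paragraph break, the function name is hypothetical, and the steps of hiding and showing anchored frames have no equivalent here.

```python
import re

def clean_paragraphs(text: str) -> str:
    """Two-pass cleanup on plain text (a stand-in for the Find/Change passes)."""
    # Pass 1: remove spaces and tabs at the start of every line,
    # which also empties lines that held only spaces or tabs.
    text = re.sub(r"^[ \t]+", "", text, flags=re.MULTILINE)
    # Pass 2: remove lines that are now completely empty.
    text = re.sub(r"^\n", "", text, flags=re.MULTILINE)
    return text

sample = "First\n   \n\tSecond\n\n\nThird\n"
print(repr(clean_paragraphs(sample)))
```

Running the passes in this order matters: a line containing only spaces is not matched by ^\n until the first pass has emptied it.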
By the way, I did not find any changes between FM 12 and FM 2015 behavior in this area.
More excellent information, Lynne; thanks!
Very good information and clarification. Thanks, Lynne.
I wrote a free ExtendScript, "RemoveEmptyPgf", that deletes all empty paragraphs (in the marked text or in the whole document).
The problem with converting documents from Microsoft Word is that you get a paragraph that seems to be empty but isn't. It contains chr(13), which is not displayed. My script just ignores this unprintable character and deletes the paragraph.
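The chr(13) trap is easy to reproduce in a few lines of Python. This is not the ExtendScript itself, just a sketch of the same idea; the function name and the exact set of ignored characters are assumptions for illustration.

```python
def looks_empty(paragraph_text: str) -> bool:
    """True if nothing is left after ignoring whitespace and chr(13)."""
    # Ignore spaces, tabs, carriage returns, line feeds, non-breaking spaces.
    return paragraph_text.strip(" \t\r\n\u00a0") == ""

stray = chr(13)              # invisible carriage return left by the import
print(stray == "")           # a naive emptiness check fails
print(looks_empty(stray))    # ignoring chr(13) treats it as empty
```

A plain comparison against the empty string misses such paragraphs, which is why a search for strictly empty paragraphs can skip them.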
But at the moment it's only available in German.
To remove trailing spaces:
It's working great on my docs, but I'd love to hear if either of you discover any issues that I'm not experiencing.
And while I'm sharing queries, here's another one that I'm loving for removing those extra spaces that typists randomly add in the middle of their Word docs (2, 3, 10, whatever!)
Now if only we could save them in the Find/Change panel so that we don't have to keep typing them each day!
Thanks for the extra input! I'll keep you posted – though when I'm reusing content from a Word doc it's usually as part of a complete make-over involving a text editor with a query stack ;-}