
How to limit GREP search to words which do not start with certain characters?

New Here, Aug 24, 2024


In InDesign, I’m using the GREP expression (?<=.)/(?=.) to locate all occurrences of the slash character / throughout a document. For example, I want to find the character / in Color/Colour or American English/British English in order to apply a certain styling to the slash.
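To see what this pattern accepts outside InDesign, here is a minimal sketch in Python's `re` module. It assumes plain text with one paragraph per line stands in for an InDesign story. In Python, `.` does not match a newline, so the lookarounds only accept a slash that has at least one character on each side within the same line.

```python
import re

# A slash with at least one non-newline character on each side.
SLASH = re.compile(r"(?<=.)/(?=.)")

def find_slashes(text: str) -> list[int]:
    """Return the index of every slash the pattern accepts."""
    return [m.start() for m in SLASH.finditer(text)]

sample = "Color/Colour\nAmerican English/British English\n/\n"
print(find_slashes(sample))
print(find_slashes("a / b"))
```

A lone slash is only skipped here because it sits alone on its line. In `a / b`, the surrounding spaces satisfy `.`, so that slash still matches.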

 

As a next step, I want to limit this to words/strings that do not begin with http, https or www, so the slashes in https://usa.gov/about or www.gov.uk/about should not be included in the results. Lone slashes should be ignored as well. How can this be achieved?

 

I have managed to find all words/strings that begin with either http or www with \<www|\<http; however, I’m not able to combine the two.

 

I’ve tried the following pattern, based on an answer to my question on Stack Overflow (https://stackoverflow.com/a/78494627/3103254). It should work with the Boost regex engine that InDesign GREP uses (https://community.adobe.com/t5/indesign-discussions/grep-what-is-the-base-syntax-of-indesign-grep/td...). However, while it works fine in a testing environment (https://regex101.com/r/a0x0zG/1), it does not seem to work in InDesign:

(?<!\S)(?:(?:https?|www)\S+|/+(?!\S))(*SKIP)(*F)|/
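The `(*SKIP)(*F)` verbs are not available in every regex engine. Python's standard `re` module, for one, does not support them. A common engine-neutral way to express the same idea is to list the unwanted matches (URLs and lone slashes) in the first alternative, capture the wanted slash in a group in the second, and keep only the matches where that group took part. Below is a sketch of that technique in Python, reusing the pattern from the question. It illustrates the logic only and is not a claim about what InDesign accepts.

```python
import re

# Alternative 1 (skipped): a whitespace-delimited token starting with http,
# https or www, or a run of slashes standing on its own.
# Alternative 2 (kept): any other slash, captured in group 1.
PATTERN = re.compile(r"(?<!\S)(?:(?:https?|www)\S+|/+(?!\S))|(/)")

def wanted_slashes(text: str) -> list[int]:
    """Indices of slashes that are neither inside a URL nor standing alone."""
    return [m.start(1) for m in PATTERN.finditer(text) if m.group(1) is not None]

sample = "Color/Colour, see https://usa.gov/about or www.gov.uk/about, a / b"
print(wanted_slashes(sample))
```

Because the URL is consumed as a whole by the first alternative, the scan resumes after it and never reaches the slashes inside it.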

Community Expert, Aug 24, 2024


What do you want to do with found text? 

 

Are you changing its contents (adding or removing characters), or do you just want to change the formatting/appearance?

 

InDesign uses its own implementation of RegExp. 

 

 

This would be extremely easy to achieve with my tool: first, find everything with "/", then filter out web addresses or limit the results even further, then do whatever you want with what's left. But you are probably looking for a free solution / it's a one-off / you work on a Mac?

 

 

If it's just styling/formatting, why not do it in two steps:

1) apply whatever styling / formatting you want to all texts meeting your criteria, 

2) "reset" all web links. 

 

New Here, Aug 25, 2024


Thank you, Robert.

 

Yes, I would love to achieve this, or at least learn how to, with InDesign GREP and without third-party tools. And yes, I’m working on a Mac.

 

The goal is not to apply styling, but to run a search and replace across multiple documents that allows me to

  1. either skip a search result, e.g. if the / is found in Berlin/London, i.e. Word/Word
  2. or replace result, e.g. if the / is found in Mexico City/New York, i.e. Multiple Words/Multiple Words (as in German this should be Mexico City / New York, i.e. with a space on both sides of the /).

New Here, Aug 25, 2024


To clarify the context: the documents contain many URLs starting with https, http or www, and these contain a lot of slashes, e.g. https://example.org/path/to/subpage. To cut down the time spent on the search and replace mentioned above, I’d like to exclude these URLs from the search.

Community Expert, Aug 25, 2024


Do the texts you want to "fix" have multiple ParaStyles applied, or just one, like "body" with the "[None]" CharStyle?

 

If so, you could narrow your search by first applying a dedicated CharStyle to hyperlinks, and then using the Char & ParaStyle combination for your fix.

 

Community Expert, Aug 25, 2024


Or... You can do as I've suggested earlier, with a slight modification:

 

1) style hyperlinks with a dedicated CharStyle,

2) do your fix on everything,

3) remove the added spaces from text that has the CharStyle from 1) applied.

 

Community Expert, Aug 25, 2024


Or you could use the fact that you are looking for

 

(lowercase)(/)(Uppercase).

 

so probably something like this, if that's the correct syntax for lookbehind/lookahead:

 

(?<=\l)(\/)(?=\u)

 

To:

 

[space]$1[space]

 

Or just:

 

[space]/[space] 

 

Of course change "[space]" to actual space.

 

Web addresses should be lowercase anyway. 
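A quick way to try this suggestion outside InDesign is Python's `re.sub`. Python has no `\l`/`\u` classes, so this sketch assumes ASCII `[a-z]`/`[A-Z]` as stand-ins, which ignores accented letters such as ä or É. Group references written `$n` in InDesign's Change To field are written `\n` in Python.

```python
import re

# A slash preceded by a lowercase letter and followed by an uppercase letter.
LOWER_SLASH_UPPER = re.compile(r"(?<=[a-z])(/)(?=[A-Z])")

def space_slashes(text: str) -> str:
    # Only the captured slash is replaced; the lookarounds consume nothing.
    return LOWER_SLASH_UPPER.sub(r" \1 ", text)

for s in ["Mexico City/New York", "https://usa.gov/about",
          "https://en.wikipedia.org/wiki/Regulatory_capture"]:
    print(space_slashes(s))
```

The last sample shows the limitation. A URL whose path has an uppercase letter right after a slash is still changed, and Word/Word pairs such as Berlin/London are spaced as well.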

 

 

Or you could search for:

 

(\u\l+)(\/)(\u\l+)

 

to be more precise. 
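With capture groups instead of lookarounds, the words become part of the match, so the replacement has to write them back. In InDesign that would presumably be `$1 $2 $3`; in Python it is `\1 \2 \3`. The following sketch uses the same ASCII stand-ins `[A-Z]`/`[a-z]` for `\u`/`\l` (an assumption that ignores non-ASCII letters):

```python
import re

# Capitalized word, slash, capitalized word: all three parts captured.
WORD_SLASH_WORD = re.compile(r"([A-Z][a-z]+)(/)([A-Z][a-z]+)")

def space_word_pairs(text: str) -> str:
    # Write the captured words back, with a space on both sides of the slash.
    return WORD_SLASH_WORD.sub(r"\1 \2 \3", text)

print(space_word_pairs("Berlin/London and Mexico City/New York"))
print(space_word_pairs("https://en.wikipedia.org/wiki/Regulatory_capture"))
```

The Wikipedia URL is left alone because the segment before its uppercase path part (`wiki`) is not capitalized.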

 

Or maybe even use "\w"? 

 

New Here, Aug 25, 2024


This is a great idea! I’ll use this for now, as it will exclude most URLs (some, e.g. https://en.wikipedia.org/wiki/Regulatory_capture, do use uppercase letters).

 

Also, there is a small chance that this might exclude edge cases that are not URLs but in which names have to be written a certain way, e.g. BRAND/brand or Brand©/“Brand”, but I’d be willing to take the risk. Thank you!

 

I have considered solutions involving character and paragraph styles before; however, I’d also like to understand InDesign’s GREP syntax a little better, which is why I’m aiming for a GREP-only solution, if possible.

Community Expert, Aug 25, 2024


I've updated my last reply a bit. 

 

You could also include spaces, punctuation, etc. before/after, as they are not allowed in web addresses.

 

New Here, Aug 25, 2024


Not sure I can follow.

 

The lookbehind/lookahead is needed so that only the / is found, which is why (\u\l+)(\/)(\u\l+) would not work.

 

Using \w in the lookbehind/lookahead would match slashes inside URLs again.
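The point about `\w` is easy to confirm. `\w` also matches lowercase letters, digits and underscores, so a pattern like `(?<=\w)/(?=\w)` accepts most slashes in a URL path. Here is a small check in Python. Note that Python's `\w` is Unicode-aware, which may differ in detail from InDesign's.

```python
import re

# A slash with a word character on both sides.
WORD_SLASH = re.compile(r"(?<=\w)/(?=\w)")

def word_slashes(text: str) -> list[int]:
    return [m.start() for m in WORD_SLASH.finditer(text)]

print(word_slashes("https://example.org/path/to/subpage"))
print(word_slashes("Color/Colour"))
```

Only the two slashes after `https:` are rejected, because each has a non-word character (`:` or `/`) in front of it.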

 

So far, (?<=\l)(\/)(?=\u) is a great solution; however, a few edge cases I found are

Community Expert, Aug 25, 2024


You are right, then:

 

(?<=\u\l+)(\/)(?=\u\l+)

 

will find Word/Word so your example URL will be ignored. 

 

Not sure if you can use "exclude" in a lookbehind/lookahead, but [^\/] placed before/after should limit the results to a single "/" between two Words?

 

Or a "?" placed correctly should make the GREP non-greedy, but that would require adding spaces, punctuation, etc. to the mix.
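Whether `(?<=\u\l+)` is accepted depends on the engine. A lookbehind containing `+` has variable width, and Python's `re`, for example, rejects it with an error. The sketch below keeps the idea of the last two suggestions: a single slash between two capitalized words, with no other slash or letter directly on either side. It moves the left word into a capture group so that only fixed-width lookbehinds remain. As before, `[A-Z]`/`[a-z]` are ASCII stand-ins for `\u`/`\l`, and the exact boundary rule is an assumption.

```python
import re

# A variable-width lookbehind such as (?<=[A-Z][a-z]+) is rejected by Python's re.
try:
    re.compile(r"(?<=[A-Z][a-z]+)/")
except re.error as err:
    print("rejected:", err)

# Workaround: capture the left word, check the right word with a lookahead,
# and refuse a neighbouring slash or word character on either side.
SINGLE_SLASH = re.compile(r"(?<![/\w])([A-Z][a-z]+)/(?=[A-Z][a-z]+(?![/\w]))")

def space_single_slashes(text: str) -> str:
    return SINGLE_SLASH.sub(r"\1 / ", text)

print(space_single_slashes("Berlin/London, Mexico City/New York"))
print(space_single_slashes("Red/Green/Blue"))
```

`Red/Green/Blue` is left unchanged because every slash in it has another slash on one side, which mirrors the "single slash" condition.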

 

Community Expert, Aug 25, 2024


Can you share your file, privately of course, for some testing?

 
