dublove
May 8, 2025

Can anyone explain the respective roles of ~k and \K with examples?


~k is a discretionary line break. I've tried it, but I don't know what it's used for.

 

I learned about \K last year. What is it called? Something like "backwards"? I've read a couple of posts about it, but maybe because of the language difference, I still haven't been able to understand it.


I've been frustrated trying to understand why (?<=\d). works but (?<=\d+). doesn't.
Is \K used to solve the "(?<=\d+)." problem?

 

Thanks

 

Correct answer Eugene Tyson

It sounds like you're working through some tricky regex concepts, and I can see where the confusion might be coming from. Let’s break it down.

~k - In InDesign, ~k is the code for a Discretionary Line Break: an invisible, optional break point used in typesetting. It's a hint for where a line may be split, and the break only takes effect when it's needed (for example, when that point falls at the end of a line). You won't use it often in regular expressions, but it comes in handy when formatting text that needs soft breaks.
For example, in a very long URL you might insert discretionary line breaks so that it only breaks at logical points. If the text reflows, or the size, tracking, or scaling changes, the breaks appear and disappear as needed instead of the URL breaking at awkward places. This works in many situations, not just URLs.
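To make the idea concrete outside InDesign, here is a small Python sketch. It assumes ~k can be modelled as the invisible character U+200B ZERO WIDTH SPACE, and it uses a hypothetical helper, add_soft_breaks, that inserts one after every "." and "/" in a URL, so the text only has break opportunities at those logical points.

```python
import re

# Assumption for this sketch: model InDesign's ~k as U+200B ZERO WIDTH SPACE.
ZWSP = "\u200b"


def add_soft_breaks(url: str) -> str:
    """Hypothetical helper: add an invisible break opportunity after '.' and '/'.

    The characters in the class are an arbitrary choice of "logical" break
    points; the visible text is unchanged.
    """
    return re.sub(r"[./]", lambda m: m.group(0) + ZWSP, url)


url = "www.example.com/docs/getting-started"
soft = add_soft_breaks(url)

# The pieces a line could break between:
print(soft.split(ZWSP))  # ['www.', 'example.', 'com/', 'docs/', 'getting-started']

# Removing the invisible characters gives back exactly the original text.
print(soft.replace(ZWSP, "") == url)  # True
```

Because the inserted characters have no width, nothing changes visually until a line actually needs to wrap at one of them.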

 

\K is a lesser-known regex feature, sometimes called a "reset" or "keep" marker. It resets the start of the reported match: everything matched before \K is "forgotten," and only what follows it is returned as the match. It's particularly useful when you need certain characters for the pattern to match but don't want them in the final result.

 

For example, if you're trying to match a string but exclude a specific part of it, \K helps you reset the starting point.

 

Here’s an example:

abc\Kdef

This matches "abcdef" but returns only "def", because \K discards the "abc" part from the reported match.
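Python's built-in re module doesn't support \K, so this sketch emulates it with a capturing group: the whole pattern still has to match "abcdef", but you read only the group, which is what abc\Kdef would report. The sample text is made up for illustration.

```python
import re

text = "xxabcdefxx"  # made-up sample text

# Emulate abc\Kdef: "abc" must still match, but only group 1 is "kept".
m = re.search(r"abc(def)", text)

print(m.group(0))  # abcdef  (everything the pattern had to match)
print(m.group(1))  # def     (what abc\Kdef would report as the match)
print(m.start(1))  # 5       (where the kept part starts)

# In a find/change, replacing abc\Kdef with XYZ would typically touch only "def".
# Without \K, the same effect needs the prefix captured and put back.
print(re.sub(r"(abc)def", r"\1XYZ", text))  # xxabcXYZxx
```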

 

The (?<=\d) vs (?<=\d+) issue
This is a tricky one. You’re running into a common regex behaviour involving lookbehinds.

(?<=\d) works because it checks if there's a digit just before your match (i.e., a "positive lookbehind" for a single digit).

(?<=\d+) doesn't work because lookbehinds in many engines (including InDesign's) require a fixed-length pattern. \d+ (one or more digits) is variable-length, so those engines can't use it inside a lookbehind.
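You can reproduce this restriction with Python's built-in re module, which, like the engines described above, only accepts fixed-width lookbehinds. This is a stand-in for InDesign, not InDesign itself; the helper compiles is a hypothetical name used only for this sketch.

```python
import re


def compiles(pattern: str) -> bool:
    """Hypothetical helper: True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


# One digit is always exactly one character wide, so this lookbehind is allowed.
print(compiles(r"(?<=\d)."))               # True
print(re.findall(r"(?<=\d).", "a1b2c"))    # ['b', 'c']

# \d+ could be one digit or fifty, so re rejects it
# ("look-behind requires fixed-width pattern").
print(compiles(r"(?<=\d+)."))              # False
```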

 

This is where \K helps: you match the variable-length part (such as \d+) as a normal part of the pattern, then use \K to reset the match start, so the digits work like a lookbehind but aren't included in the result.

 

You could experiment with this in InDesign's GREP Find/Change.

For example, this:

(?<=\d\b).+

 

Could do much the same job as:

\d+\K.+
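Here's a rough Python comparison of the two patterns. Python's re has no \K, so \d+\K.+ is emulated as \d+(.+), keeping only the group; the function names and sample strings are hypothetical. The two agree when the number is followed by a space or punctuation, but differ when letters follow the digits directly, because \b requires a word boundary right after the digit.

```python
import re

LOOKBEHIND = re.compile(r"(?<=\d\b).+")
# Emulation of \d+\K.+ : match the digits normally, then keep only group 1.
K_STYLE = re.compile(r"\d+(.+)")


def text_after_number_lookbehind(text: str):
    """Return what the lookbehind pattern matches, or None."""
    m = LOOKBEHIND.search(text)
    return m.group(0) if m else None


def text_after_number_k_style(text: str):
    """Return what the \\K-style pattern would report, or None."""
    m = K_STYLE.search(text)
    return m.group(1) if m else None


for sample in ["Chapter 12 Introduction", "Item 123abc"]:
    print(repr(text_after_number_lookbehind(sample)),
          repr(text_after_number_k_style(sample)))
# ' Introduction' ' Introduction'
# None 'abc'
```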

 

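Note that this restriction applies to lookbehinds only: lookaheads containing a + (such as (?=\d+)) work fine, but variable-length lookbehinds aren't supported in many regex flavours, including InDesign's, as explained earlier.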
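A quick check of the lookahead side, again using Python's re as a stand-in with a made-up sample string: a lookahead containing \d+ compiles and works, because a lookahead is matched forward from the current position, so its length doesn't need to be known in advance.

```python
import re

# Words followed by a space and a pixel value with any number of digits.
# The \d+ sits inside a lookahead, which Python's re accepts.
pattern = re.compile(r"\b\w+(?= \d+px)")

print(pattern.findall("width 12px height 300px top 5em"))  # ['width', 'height']
```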


dublove (Author)
May 8, 2025

Hi Eugene Tyson

This explanation of yours is very informative.
It's specific and to the point.

Community Expert
May 8, 2025

Thank you, no problem. The issues you face bring up interesting challenges. Keep them coming 😄 

Joel Cherney
Community Expert
May 8, 2025

Do you ever see any Southeast Asian languages in your work, such as Thai, Khmer, Lao, or Burmese? Much like Chinese, these languages don't use spaces between words. (They use spaces like commas, as pauses between phrases, if they use spaces at all.) However, in these languages you're not allowed to hyphenate words or break them in the middle when wrapping to the next line. Also, web browsers and operating systems generally had poor support for Southeast Asian languages until about ten years ago. So, to make Burmese text word-wrap correctly in InDesign when you change the column width, your translator can insert a zero-width space between words. That's its Unicode name: "ZERO WIDTH SPACE" (U+200B). I have no idea why Adobe decided to call it a "discretionary line break".
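A small Python sketch of the character described above and of joining words with it. The helper name join_with_break_points and the placeholder words (standing in for Thai or Burmese words) are hypothetical.

```python
import unicodedata

ZWSP = "\u200b"

print(unicodedata.name(ZWSP))  # ZERO WIDTH SPACE
print(f"U+{ord(ZWSP):04X}")    # U+200B


def join_with_break_points(words: list[str]) -> str:
    """Hypothetical helper: put an invisible break opportunity between words."""
    return ZWSP.join(words)


# Placeholder words standing in for words in a script written without spaces.
joined = join_with_break_points(["word1", "word2", "word3"])
print(len(joined))               # 17: 15 visible characters plus 2 zero-width spaces
print(joined.replace(ZWSP, ""))  # word1word2word3 (what the reader sees)
```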

 

People also sometimes use zero-width spaces to make a very long URL break only at a few predetermined places, without any hyphens or spaces. 

 

www.thisURLisfartoolongtofitanarrowcolumnbutIhatehyphenation.com 

 

www.this|URL|is|far|too|long|to|fit|a|narrow|column|but|I|hate|hyphenation.com 

 

If you were to replace all of those | with zero-width spaces, then you could put that URL into a narrow column, and it would only wrap where you see the | characters. 

 

www.thisURLisfartoolongtofitanarrow

columnbutIhatehyphenation.com
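The wrapping behaviour can be modelled with a simple greedy line-filling sketch in Python. It treats each U+200B as the only allowed break point and counts characters instead of measuring real glyph widths, so the width of 35 characters is an illustrative assumption, not an InDesign setting. With that width, it reproduces the two-line result shown above.

```python
ZWSP = "\u200b"


def wrap_at_break_points(text: str, width: int) -> list[str]:
    """Greedy line filling that may only break at U+200B characters.

    A segment longer than `width` is kept whole (no hyphenation), so it
    simply overflows, as an unbreakable word would.
    """
    lines: list[str] = []
    current = ""
    for segment in text.split(ZWSP):
        if current and len(current) + len(segment) > width:
            lines.append(current)  # break at this zero-width space
            current = segment
        else:
            current += segment     # no break needed here
    if current:
        lines.append(current)
    return lines


marked = "www.this|URL|is|far|too|long|to|fit|a|narrow|column|but|I|hate|hyphenation.com"
url = marked.replace("|", ZWSP)

for line in wrap_at_break_points(url, 35):
    print(line)
# www.thisURLisfartoolongtofitanarrow
# columnbutIhatehyphenation.com
```

With a wide enough width (say 100), the same text stays on one line: the break points are only used when needed.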