As you can see in the screenshot above, the first text frame (on the left) hyphenates the word "Clean-cut" incorrectly. Its text is set in a custom language that uses the sample hyphenator plugin from "sdksamples/hyphenator".
The second text frame (on the right), set in the built-in language "English: USA", hyphenates the word correctly, without a double dash.
I wonder why the sample plugin gets hyphenation wrong for words that contain dashes.
------
Here are some code snippets from that plugin, along with debug logs:
In HypAdapter.cpp, the HypAdapter::findHyphenationPoints function contains the following; maybe something is going wrong here:
// Scan the characters and selectively generate hyphenation points.
for (int32 i = 0; rWord.CharCount() > 0; i++, rWord.RemoveFirst())
{
    Hyp::HyphenQuality hyphenQuality = IHyphenatedWord::kNoHyphenPoint;
    switch (rWord.First().GetValue()) {
        case kTextChar_HyphenMinus:
        case kTextChar_UnicodeHyphen:
        case kTextChar_FigureDash:
        case kTextChar_HorizontalBar:
        case kTextChar_EmDash:
        case kTextChar_EnDash: {
            hyphenQuality = IHyphenatedWord::kHardHyphenPoint;
            break;
        }
        case kTextChar_DiscretionaryHyphen: {
            hyphenQuality = IHyphenatedWord::kDiscretionaryHyphenPoint;
            break;
        }
        case kTextChar_Solidus:
        case kTextChar_ReverseSolidus:
        case kTextChar_Ellipse:
        case kTextChar_FlushSpace:
        case kTextChar_EnSpace:
        case kTextChar_EmSpace:
        case kTextChar_FigureSpace:
        case kTextChar_PunctuationSpace:
        case kTextChar_ThinSpace:
        case kTextChar_HairSpace:
        case kTextChar_HardSpace:
        case kTextChar_ZeroSpaceBreak: {
            hyphenQuality = IHyphenatedWord::kUnpreferredHyphenPoint;
            break;
        }
    }
    if (hyphenQuality == IHyphenatedWord::kNoHyphenPoint) {
        continue;
    }
    hyphenationPoints.push_back(Hyp::HyphenationPoint(i, hyphenQuality));
}
From the logs, I can see this:
HypDiagnostic:TraceHyphenatedWord("Clean-cut")-->In
word(ascii)="Clean-cut"
nNthPoint, hyphenIndex, hyphenQuality
0, 5, 90
hyphenatedWord="Clean--cut"
HypDiagnostic:TraceHyphenatedWord()-->Out
Will be waiting for your response, thanks!
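The trace above can be reproduced outside InDesign with a minimal sketch in standard C++. This is only an assumption about how the doubled dash might arise, not the SDK's actual code: `findHardHyphenPoints` and `naiveBreak` are hypothetical helpers, and `naiveBreak` assumes the first part is built by copying the characters up to and including the break index and then always appending a hyphen.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the scan loop: report the index of every
// hyphen-minus in the word (the real sample also checks other dash characters).
std::vector<std::size_t> findHardHyphenPoints(const std::string& word) {
    std::vector<std::size_t> points;
    for (std::size_t i = 0; i < word.size(); ++i) {
        if (word[i] == '-') {
            points.push_back(i);
        }
    }
    return points;
}

// ASSUMPTION: the break keeps everything up to and including the break index
// and then appends a hyphen to the first part, whatever the point's quality.
std::pair<std::string, std::string> naiveBreak(const std::string& word,
                                               std::size_t index) {
    assert(index < word.size());
    std::string first = word.substr(0, index + 1) + "-";
    std::string last = word.substr(index + 1);
    return {first, last};
}
```

For "Clean-cut" the scan reports a single point at index 5, as in the log, and `naiveBreak("Clean-cut", 5)` yields "Clean--" and "cut", which joined together give the "Clean--cut" seen in the trace.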
If I recall correctly, hyphenation depends on the dictionary, so you might have to set your own dictionary to get it right. Looking at the plugin code, I see there is an option to do that in the setDictionaryFolderPath method.
-Manan
This is crazy.
Is there an option to disable the dictionary for hyphenation? I've tried calling
hyphenationService->CheckUserDictionary(kFalse);
where the hyphenation service is installed, but I still get the same incorrect result.
Also, the hyphenation service mode is set to Algorithm, so I think it shouldn't rely on dictionary lookups. The service mode can be seen in the debug trace log:
Hyphenator | HypHyphenationService::Hyphenate(rWord='keyin', serviceMode=Algorithm, nMinTail=1, nMinHead=1, providerHyphStyle=All) | languageID = 348, languageName = Uzbek: Latin
----
Nevertheless, I've tried setting a valid dictionary file path, which is used and managed by the custom UserDict manager service from the CHLinguistic sample plugin. I added all combinations of the words to the dictionary: "clean-cut", "clean" and "cut". I still get the same issue...
----
The built-in English hyphenator works correctly even when I enter the invalid word "Clead-cud", so I don't think this is related to the dictionary file.
Meanwhile, the Uzbek word "baxt-saodat", which is marked as correct (no red underlines) and is spell-checked algorithmically by the custom speller service, is hyphenated incorrectly.
Also, when I open the dictionary and press "Hyphenate", I get the same result: "-~"
Hi @jasuryusupov,
It is not a dictionary problem. I implemented custom hyphenation a long time back.
You need to tweak the business logic of the SDK sample hyphenation code.
The sample code assumes there is no physical hyphen ("-") present in the word.
All functions such as IHyphenatedWord->GetFirstPartOfPoint(), IHyphenatedWord->GetLastPartOfPoint(), IHyphenatedWord->GetPartsOfPoints() and the others should fill in the WideString as if no physical "-" were present in the word.
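One way to read this advice, as a minimal sketch in standard C++ rather than the actual InDesign SDK interfaces: when the break point sits on a physical hyphen, both parts are returned without it. `BreakParts`, `partsWithoutPhysicalHyphen` and `renderBrokenLine` are hypothetical names for illustration. The sketch also assumes that the hard point's index refers to the hyphen character itself (index 5 in "Clean-cut", as in the trace log) and that the caller adds the single visible hyphen at the line break.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

struct BreakParts {
    std::string first;  // text left on the current line, without a trailing hyphen
    std::string last;   // text carried over to the next line
};

// Hypothetical helper: split at a hard hyphen point but leave the physical
// '-' out of both parts, as if the word had been written without it.
BreakParts partsWithoutPhysicalHyphen(const std::string& word, std::size_t index) {
    assert(index < word.size() && word[index] == '-');
    return BreakParts{word.substr(0, index), word.substr(index + 1)};
}

// ASSUMPTION: the caller (the composer in this sketch) appends exactly one
// visible hyphen to the first part when it breaks the line.
std::string renderBrokenLine(const BreakParts& parts) {
    return parts.first + "-";
}
```

With these helpers, "Clean-cut" split at index 5 gives the parts "Clean" and "cut", and the rendered first line is "Clean-" with a single dash.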
Hi @Rahul_Rastogi,
I was thinking along the same lines, but got stuck on one point: if we remove the hyphen from the word during hyphenation, how do we keep track of putting it back when the hyphenator decides not to break the word?
-Manan
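One possible answer to this question, sketched in standard C++ under the assumption that the parts are computed as copies and the stored word is never modified, so there is nothing to put back. `HyphenatedWordSketch` is a hypothetical class for illustration and is not the SDK's IHyphenatedWord.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

class HyphenatedWordSketch {
public:
    explicit HyphenatedWordSketch(std::string word) : original_(std::move(word)) {}

    // Part before a break at a physical hyphen, copied without the '-'.
    std::string firstPart(std::size_t index) const {
        assert(index < original_.size() && original_[index] == '-');
        return original_.substr(0, index);
    }

    // Part after the same break, also copied without the '-'.
    std::string lastPart(std::size_t index) const {
        assert(index < original_.size() && original_[index] == '-');
        return original_.substr(index + 1);
    }

    // Text to use when no break is taken: the untouched original word.
    const std::string& unbroken() const { return original_; }

private:
    std::string original_;  // never modified after construction
};
```

For "baxt-saodat", `firstPart(4)` and `lastPart(4)` return "baxt" and "saodat", while `unbroken()` still returns "baxt-saodat" with its hyphen in place.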
@Rahul_Rastogi Thanks for the tip, I managed to solve the problem.
Please share the solution so that it helps the next person with this issue.
-Manan