We are typesetting in InDesign 2021 with Minion Pro as the base font. Some special characters are not available in Minion (for example, t with a dot below), so we have used Times New Roman for those particular characters.
Minion font used for text:
Times font used for special character:
There are no issues until the PDF is exported from InDesign. The issue arises when we copy this content from the PDF for indexing purposes: some characters are repeated.
Copying content from the PDF into a text file: the text "bi" is repeated twice. (mutarābibiṭa)
Here Minion + Times fonts are used
Copying content from the PDF into a text file: no issues. (mutarābiṭa)
Here only the Times font is used
Please check and advise on any solution to avoid this.
>> The issue arises when we copy this content from the PDF for indexing purposes: some characters are repeated.
What are you doing? And why are you doing this through the PDF?
InDesign has an index feature:
We are not using InDesign's index functionality.
We generate the output PDF of a book and send it to the author. The author collects the required content as index terms and sends it to us as a Word file for typesetting. Finally, we import the Word file into InDesign and create the index PDF. This is the process we follow for this client.
As per your suggestion, we turned off ligatures, but the problem was not resolved. When I change the italic formatting to roman, the issue is resolved. But that content needs to be set in italic only.
Not able to reproduce.
I set the exact same word in the same fonts and did not have the same issue. It copied and pasted out of the PDF just fine (not that I would recommend that either).
Can you upload a sample PDF where that went wrong for you?
When I create a PDF of that particular page only, there are no issues. When I create the full PDF of around 300 pages, the issue arises.
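Since the repetition only shows up in the full 300-page export, one way to find every affected word is to scan the text copied out of the PDF for short fragments that occur twice in a row. The following is only a minimal Python sketch under stated assumptions: the function name find_repeated_fragments and the fragment length of 2 to 4 characters are hypothetical choices for illustration, and it expects the copied text as a plain string. It will also flag ordinary words that legitimately contain a doubled fragment (such as "banana"), so the result is a list of candidates to check by eye, not a list of confirmed errors.

```python
import re
import unicodedata

# A run of 2-4 word characters immediately followed by the same run,
# e.g. the doubled "bi" in "mutarābibiṭa". The length range is an assumption.
REPEAT = re.compile(r"(\w{2,4})\1")


def find_repeated_fragments(text):
    """Return (word, fragment) pairs for words containing a doubled fragment."""
    # Normalize to NFC so letters such as ṭ and ā are single code points
    # and are not split off from their base letter when matching words.
    text = unicodedata.normalize("NFC", text)
    hits = []
    for word in re.findall(r"\w+", text):
        match = REPEAT.search(word)
        if match:
            hits.append((word, match.group(1)))
    return hits


if __name__ == "__main__":
    sample = "mutar\u0101bibi\u1e6da and mutar\u0101bi\u1e6da"
    for word, fragment in find_repeated_fragments(sample):
        print(word, "contains doubled fragment:", fragment)
```

Run on the two spellings from this thread, the sketch flags mutarābibiṭa with the fragment "bi" and leaves mutarābiṭa alone.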
Relevant to the issue, though perhaps not the resolution here, as it sounds like you are at the end of this project: one of the best font sets for different languages is Google Noto, available in both serif and sans serif. We've used these fonts for many projects in a variety of languages, and they have very full glyph/character coverage: https://fonts.google.com/noto
What paragraph composer is applied to the text?
Look that up in the Paragraph Styles panel or in the Paragraph panel.
And yes, please provide a sample InDesign document plus the exported PDF.
What is the PDF reading application, and if it is Adobe Acrobat Pro DC, what exact function do you use when you copy the text?
It could also be a bug in the PDF reading application.