A2D2
Inspiring
January 21, 2025
Question

Font not embedding. PDF alternatives for searchable text and pagination?


Hi!

 

The problem...

I have an InDesign document with hundreds of pages and different text stories. I can export or print a PDF that looks just fine, but the text in the PDF is not searchable, and I am not sure why. It may be down to embedding restrictions in the font, or to the way the PDF engine handles stylistic alternates, as I do not get this problem with other fonts. The font is a Unicode font, by the way.

 

The issue...

Although the text can be viewed as a PDF, it needs to be indexed by a 3rd-party without InDesign or the fonts. Ideally, they would be able to work from the PDF searching for words and phrases using the search feature and then noting the page number.

 

A solution?

For the purposes of creating an index, what would be the best workaround in lieu of a PDF? Exporting to rich text or plain text would be an option if the page numbers could be noted. But what would the procedure be for a script?

 

Possible procedure in words...

  1. Export to text file
  2. Create text file
  3. Write "---Start of page n---" in file, where n is the page number
  4. Export text of page n to file
  5. Write "---End of page n---" in file
  6. Write "---Start of page n+1---" in file, where n is the page number
  7. Export text of page n+1 to file
  8. Write "---End of page n+1---" in file
  9. etc. until end of InDesign document
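
The marker format in the steps above can be sketched in Python. This is only an illustration of the output layout, under the assumption that the text of each page has already been pulled out of InDesign by some other means (for example an InDesign script, which is not shown here). The function name `mark_pages` and its parameters are hypothetical.

```python
def mark_pages(page_texts, first_page=1):
    """Join per-page text into one string, wrapping each page in start/end markers.

    page_texts: list of strings, one per page, in page order (assumed to be
    extracted from InDesign beforehand).
    first_page: page number to assign to the first entry.
    """
    parts = []
    for offset, text in enumerate(page_texts):
        n = first_page + offset
        parts.append(f"---Start of page {n}---")
        parts.append(text.strip("\n"))  # drop stray blank lines at page edges
        parts.append(f"---End of page {n}---")
    return "\n".join(parts) + "\n"
```

The result can then be written to a single file, ideally with `encoding="utf-8"` so that Arabic text survives intact.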

 

Exporting text to a text file is easy through the user menu but, of course, the pagination is lost so I am trying to retain a sense of the pagination.

 

 

3 replies

Robert at ID-Tasker
Legend
January 21, 2025

@A2D2 

 

There are many ways to skin the proverbial cat - and all of them depend on the preferences of the person / people doing this job.

 

Your text can be exported in pieces - all the text from each page as separate blocks.

 

Or...

 

... you can give them - or they can buy - InCopy - $4.99/month or something like that...

 

InCopy is a version of InDesign with limited functionality - so they could add words / phrases as they go - including adding all occurrences of the same word / phrase from the whole document...

 

A2D2
Author
Inspiring
January 23, 2025

Yes, I think exporting each page as plain or rich text separately is the way to go. The problem is collating the information. Say I have a 500-page document and each page is exported as a separate text file. I then have to run a separate operation to put all the text together in one file, clearly demarcated with page numbers and/or separators.
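
The collation step described here could look roughly like the following Python sketch. It assumes, purely for illustration, that the per-page exports are UTF-8 files named `page_<n>.txt` in one folder; that naming scheme and the function name `collate_pages` are hypothetical, not something InDesign produces by default.

```python
import re
from pathlib import Path


def collate_pages(folder, output_path):
    """Merge page_<n>.txt files from `folder` into one file with page markers.

    Returns the list of page numbers written, in order.
    """
    pattern = re.compile(r"page_(\d+)\.txt$")
    pages = []
    for path in Path(folder).iterdir():
        match = pattern.match(path.name)
        if match:
            pages.append((int(match.group(1)), path))
    pages.sort()  # numeric order, so page 10 comes after page 9, not after page 1

    with open(output_path, "w", encoding="utf-8") as out:
        for number, path in pages:
            text = path.read_text(encoding="utf-8").strip("\n")
            out.write(f"---Start of page {number}---\n")
            out.write(f"{text}\n")
            out.write(f"---End of page {number}---\n")
    return [number for number, _ in pages]
```

Sorting on the parsed integer rather than on the filename matters for a 500-page job, because a plain alphabetical sort would place `page_10.txt` before `page_2.txt`.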

 

If I use a script or ID-Tasker, can I export text from an InDesign document page by page, or does it have to be frame by frame or story by story? If it is story by story, I would have to break the threads between the frames, which is not a good option.

 

Thanks for the InCopy suggestion, but that would mean the indexer also has to license the font.

A2D2
Author
Inspiring
January 23, 2025
quote

The issue...

Although the text can be viewed as a PDF, it needs to be indexed by a 3rd-party without InDesign or the fonts. Ideally, they would be able to work from the PDF searching for words and phrases using the search feature and then noting the page number.


By @A2D2

 

This, from your opening post, would suggest that your indexing people would build the index manually - copy a phrase and add the page number?

 

Or are you looking for a way to use InDesign's built-in mechanism - where text from the index is linked with the occurrence in the text?

 


Yes, that's right, as you said it: the text exported from InDesign needs to be searchable, e.g. for manual indexing. (I am not looking to use InDesign's indexing feature.)
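
For the manual-indexing workflow, a search over a text file that uses the "---Start of page n---" / "---End of page n---" markers from the opening post might be sketched like this in Python. The function name `pages_containing` is hypothetical, and the sketch matches within single lines only, so a phrase that wraps across a line break would be missed.

```python
import re

START = re.compile(r"^---Start of page (\d+)---$")
END = re.compile(r"^---End of page (\d+)---$")


def pages_containing(marked_text, phrase):
    """Return the sorted page numbers whose text contains `phrase`.

    Matching is case-insensitive and done line by line.
    """
    needle = phrase.casefold()
    hits = set()
    current = None  # page number we are currently inside, if any
    for line in marked_text.splitlines():
        start = START.match(line)
        if start:
            current = int(start.group(1))
        elif END.match(line):
            current = None
        elif current is not None and needle in line.casefold():
            hits.add(current)
    return sorted(hits)
```

An indexer without InDesign or the font could run this kind of lookup, or simply use the search function of any text editor and read the page number off the nearest marker.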

Robert at ID-Tasker
Legend
January 21, 2025

@A2D2 

 

Can you share your PDF? Even one page with text you think should be searchable.

 

Maybe you are exporting to such an old PDF version - and you have a lot of transparency - that your text is being outlined?

 

A2D2
Author
Inspiring
January 23, 2025

Hi @31971975. I appreciate the input. The PDF output definitely renders the InDesign text as text and not as outlines. In fact, I even tried running OCR on the PDF to see if it would help, but it failed with the error that the "page contains renderable text".

Mike Witherell
Community Expert
January 21, 2025

There is a lot that is unclear to me:

How are you making the PDF, exactly? 

Are you using Export to PDF (preferred) and not Print to Adobe PDF virtual printer driver (not preferred)?

What settings are you using when exporting to PDF?

By Unicode font, do you mean OpenType font?

How do you know if the font has an embedding restriction? It would say so in the font licencing information.

If your indexer/editor cannot work within InDesign, does that mean the person is untrained in how to use computer programs like InDesign? Are they familiar with Acrobat Pro? Are you really saying they need to use pencil and paper or type up their notes in a Word doc?

Deciding what should be indexed and what should not is largely a manual, editorial process. Someone is going to be combing through the text, reading and taking notes. If that person did so in Acrobat PDF Comments, the file could come back to you, the InDesign user, and you would have a list of comments to slowly translate into an InDesign Index.

Mike Witherell
A2D2
Author
Inspiring
January 23, 2025

Hi Mike. I could go into the details of how the PDF was produced, but it might be a bit of a distraction, as the post is asking how to work around an issue that different users might experience in different ways. In my case it does indeed seem there is some restriction in the font with regard to embedding.

 

But for the sake of completeness, I tried making the PDF in different ways: exporting to PDF, printing to Adobe PDF, printing to Microsoft PDF, and printing to PostScript. Subsets of the fonts are embedded, as shown in the PDF properties. On one iteration I reduced the font subsetting percentage from 100% to 1% in File>Export>Export Adobe PDF>Advanced.

 

The font designer says that due to the complexity of the font (an Arabic font), InDesign makes some internal changes. I imagine this to be for stylistic alternates, ligatures and justification characters (kashida) as mentioned in the original post.

 

For my case it is interesting to copy a few words directly from InDesign as plain text (it is from the first line of the Arabic placeholder text):

لق البرناول بالترغب لا خلا الطبالتستطيع

 

The same string copied from the PDF is as follows:

ع􀘚􀉑ط􀋪ت�􀉑س􀋉ت􀉙 ب�ال 􀉑ط􀋪ل􀌼 خ لا ا 􀊑 ب لا 􀉷غ�
􀋰
ر􀉪 ال ت 􀉍 اول ب 􀉍ن ر􀉪 ال ب 􀌌ل ق 􀌼
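
A quick way to check whether text copied from a PDF has lost its Unicode mapping is to count characters that are private-use code points or the replacement character U+FFFD. The Python sketch below does that; it rests on the assumption that the unreadable characters in the string above are of that kind, and the function name `unmapped_ratio` is hypothetical.

```python
import unicodedata


def unmapped_ratio(text):
    """Return the fraction of non-whitespace characters that carry no usable meaning.

    A character counts as unmapped if it is in a Unicode private-use area
    (general category "Co") or is the replacement character U+FFFD.
    """
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    unmapped = sum(
        1 for c in chars
        if unicodedata.category(c) == "Co" or c == "\ufffd"
    )
    return unmapped / len(chars)
```

A ratio near zero for text copied from InDesign and a high ratio for the same passage copied from the PDF would suggest that the PDF stores glyphs without a mapping back to standard Arabic code points, which would explain why search finds nothing even though the page displays correctly.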

 

By Unicode font, I mean a font which uses Unicode mapping for the glyphs.