Font not embedding. PDF alternatives for searchable text and pagination?

Report · Jan 21, 2025

Hi!

The problem...

I have an InDesign document with hundreds of pages and several text stories. I can export or print a PDF which looks just fine. However, the text in the PDF is not searchable, and I am not sure why. Maybe it is down to embedding restrictions in the font, or to the way the PDF engine handles stylistic alternates, as I do not get this problem with other fonts. The font is a Unicode font, by the way.

The issue...

Although the text can be viewed as a PDF, it needs to be indexed by a 3rd-party without InDesign or the fonts. Ideally, they would be able to work from the PDF searching for words and phrases using the search feature and then noting the page number.

A solution?

For the purposes of creating an index, what would be the best workaround in lieu of a PDF? Exporting to rich text or plain text would be an option if the page numbers could be noted. But what would the procedure be for a script?

Possible procedure in words...

Export to text file
Create text file
Write "---Start of page n---" in file, where n is the page number
Export text of page n to file
Write "---End of page n---" in file
Write "---Start of page n+1---" in file
Export text of page n+1 to file
Write "---End of page n+1---" in file
etc. until end of InDesign document

Exporting text to a text file is easy through the user menu but, of course, the pagination is lost so I am trying to retain a sense of the pagination.
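The procedure above could be sketched in Python roughly as follows. This is only a sketch under one big assumption: the text of each page has already been pulled out of InDesign by some other means, which is not shown here. The function name `format_paginated` is made up for illustration; only the marker wording comes from the steps above.

```python
from typing import Iterable, Tuple


def format_paginated(pages: Iterable[Tuple[int, str]]) -> str:
    """Wrap each page's text in start/end markers, in page order.

    `pages` is an iterable of (page_number, page_text) pairs; how those
    pairs are obtained from InDesign is outside this sketch.
    """
    chunks = []
    for number, text in sorted(pages, key=lambda pair: pair[0]):
        chunks.append(f"---Start of page {number}---")
        chunks.append(text.rstrip("\n"))  # avoid a blank line before the end marker
        chunks.append(f"---End of page {number}---")
    return "\n".join(chunks) + "\n"
```

The returned string can be written to a single text file; opening that file with `encoding="utf-8"` should keep non-Latin text such as Arabic intact.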

Report · Jan 21, 2025

There is a lot that is unclear to me:

How are you making the PDF, exactly?

Are you using Export to PDF (preferred) and not Print to Adobe PDF virtual printer driver (not preferred)?

What settings are you using when exporting to PDF?

By Unicode font, do you mean OpenType font?

How do you know if the font has an embedding restriction? It would say so in the font licensing information.

If your indexer/editor cannot work within InDesign, does that mean the person is untrained in how to use computer programs like InDesign? Are they familiar with Acrobat Pro? Are you really saying they need to use pencil and paper or type up their notes in a Word doc?

Deciding what should be indexed and what should not is largely a manual, editorial process. Someone is going to be combing through the text, reading and taking notes. If that person did so in Acrobat PDF Comments, the PDF could come back to you, the InDesign user, and you would have a list of comments to slowly translate into an InDesign Index.

Mike Witherell

Report · Jan 23, 2025

Hi Mike. I could go into the details of how the PDF was produced but it might be a bit of a distraction as the post is asking how to work around the issue that might be experienced by different users in different ways. In my case it does indeed seem there is some restriction in the font with regard to embedding.

But for the sake of completeness, I tried making the PDF in different ways, i.e. exporting to PDF, printing to Adobe PDF, printing to Microsoft PDF, and printing to PostScript. Subsets of the fonts are embedded, as shown in the PDF properties. On one iteration I reduced the font subsetting threshold from 100% to 1% in File>Export>Export Adobe PDF>Advanced.

The font designer says that due to the complexity of the font (an Arabic font), InDesign makes some internal changes. I imagine this to be for stylistic alternates, ligatures and justification characters (kashida) as mentioned in the original post.

For my case it is interesting to copy a few words directly from InDesign as plain text (it is from the first line of the Arabic placeholder text):

لق البرناول بالترغب لا خلا الطبالتستطيع

The same string copied from the PDF is as follows:

ع􀘚􀉑ط􀋪ت�􀉑س􀋉ت􀉙 ب�ال 􀉑ط􀋪ل􀌼 خ لا ا 􀊑 ب لا 􀉷غ�
􀋰
ر􀉪 ال ت 􀉍 اول ب 􀉍ن ر􀉪 ال ب 􀌌ل ق 􀌼

By Unicode font, I mean a font which uses Unicode mapping for the glyphs.
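A rough way to put a number on this kind of damage is sketched below in Python. It rests on an assumption I have not verified for this font: that the stray symbols in the text copied from the PDF are private-use code points or U+FFFD replacement characters, i.e. glyphs that did not come back with a usable Unicode value. The function name `unmapped_ratio` is hypothetical.

```python
import unicodedata


def unmapped_ratio(text: str) -> float:
    """Share of non-space characters that are private-use or U+FFFD.

    Assumption: such characters indicate glyphs that lost their Unicode
    mapping on the way into the PDF. A high ratio suggests the PDF text
    will not be searchable in any meaningful way.
    """
    visible = [ch for ch in text if not ch.isspace()]
    if not visible:
        return 0.0
    suspect = [
        ch for ch in visible
        if unicodedata.category(ch) == "Co" or ch == "\ufffd"
    ]
    return len(suspect) / len(visible)
```

Pasting the string copied from InDesign and the string copied from the PDF into this function should give a ratio of 0.0 for the first and a clearly higher value for the second, if the assumption holds.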

Report · Jan 21, 2025

@A2D2

Can you share your PDF? Even one page with text you think should be searchable.

Maybe you are exporting to such a low PDF version - and you have a lot of transparency - that your text gets outlined?

Report · Jan 23, 2025

Hi @rOB . I appreciate the input. The output to PDF definitely renders the InDesign text as text and not as outline. In fact I even tried running an OCR on the PDF to see if it would help but the error was that the "page contains renderable text".

Report · Jan 21, 2025

@A2D2

There are many ways to skin the proverbial cat - and all of them depend on the preferences of the person / people doing this job.

Your text can be exported in pieces - all texts from each page as separate blocks.

Or...

... you can give them - or they can buy - InCopy - $4.99/month or something like that...

InCopy is a version of InDesign with limited functionality - so they could add words / phrases as they go - including adding all occurrences of the same word / phrase from the whole document...

Report · Jan 21, 2025

https://www.adobe.com/products/incopy.html

Report · Jan 23, 2025

Yes, I think exporting each page as plain or rich text separately is the way to go. The problem is collating the information. Say I have a 500-page document and each page is exported as a separate text file. Then I have to run a separate operation to put all the text together in one file, clearly demarcated with page numbers and/or separators.
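For the collating step, something like the following Python sketch might do. It assumes the per-page files already exist in one folder and are named `page_<n>.txt` (a naming scheme invented here for illustration), and it reuses the marker wording from the opening post. The function name `collate_pages` is likewise hypothetical.

```python
import re
from pathlib import Path


def collate_pages(folder: str, output: str) -> int:
    """Merge page_<n>.txt files into one file with page markers.

    Returns the number of pages written. Files that do not match the
    assumed naming scheme are ignored.
    """
    pattern = re.compile(r"page_(\d+)\.txt$")
    numbered = []
    for path in Path(folder).iterdir():
        match = pattern.match(path.name)
        if match:
            numbered.append((int(match.group(1)), path))
    numbered.sort(key=lambda item: item[0])  # numeric order, so 10 follows 9

    with open(output, "w", encoding="utf-8") as out:
        for number, path in numbered:
            text = path.read_text(encoding="utf-8").rstrip("\n")
            out.write(f"---Start of page {number}---\n")
            out.write(text + "\n")
            out.write(f"---End of page {number}---\n")
    return len(numbered)
```

Sorting on the parsed number rather than the file name matters with 500 pages, since a plain alphabetical sort would put `page_10.txt` before `page_2.txt`.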

If I use a script or ID Tasker, can I export text from an InDesign document page by page or does it have to be frame by frame or story by story? If story by story it would mean I have to break the threads between the frames which is not a good option.

Thanks for the InCopy suggestion but that would also mean the indexer has to license the font also.

Report · Jan 23, 2025

An indexer needs searchable text and to see pagination but not necessarily formatting. I am sure others can suggest applications where this priority is important and that my own application is not completely idiosyncratic such that we need to worry about the manner of producing the PDF or the method of the indexer.

For example, if the text of a novel, scholarly book, or legal document goes into a database for archiving, the format would be less important, but the text would need to be searchable and the pagination would need to be identified.

So the question is, when PDF is not a suitable option, how do we get paginated textual output from InDesign?

Report · Jan 23, 2025

The issue...

Although the text can be viewed as a PDF, it needs to be indexed by a 3rd-party without InDesign or the fonts. Ideally, they would be able to work from the PDF searching for words and phrases using the search feature and then noting the page number.

By @A2D2

This, from your opening post, would suggest that your indexing people would do the index manually - copy a phrase and add the page number?

Or are you looking for a way to use InDesign's built-in mechanism - where text from the index is linked with the occurrence in the text?

Report · Jan 23, 2025

Yes, that's right, as you said it: the text exported from InDesign needs to be searchable e.g. for manual indexing (I am not looking to use InDesign's indexing feature)
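To make the manual-indexing use concrete: if the export ends up as one text file using the "---Start of page n---" / "---End of page n---" markers from the opening post, a lookup like the Python sketch below would return the page numbers for a phrase. The function name `pages_containing` and the regular expression are illustrative assumptions, not part of any existing tool.

```python
import re
from typing import List

# Matches one marked page block; group 1 is the page number, group 2 the text.
PAGE_BLOCK = re.compile(
    r"---Start of page (\d+)---\n(.*?)\n?---End of page \1---",
    re.DOTALL,
)


def pages_containing(paginated_text: str, phrase: str) -> List[int]:
    """Return the page numbers whose text contains the phrase."""
    hits = []
    for match in PAGE_BLOCK.finditer(paginated_text):
        if phrase in match.group(2):
            hits.append(int(match.group(1)))
    return hits
```

This is an exact substring match per page, so a phrase that is split across a page break or a line break would be missed; an indexer would still get the same result by searching the file in any text editor and reading off the nearest marker.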

Report · Jan 23, 2025

Yes, that's right, as you said it: the text exported from InDesign needs to be searchable e.g. for manual indexing (I am not looking to use InDesign's indexing feature)

By @A2D2

Then I really don't see a problem 🙂

You can have, for example, a Table in WORD, with all texts from each page in a separate cell of the Table - with info about the page.

Here is an example - it exports Tables but can export anything:

Report · Jan 23, 2025

So if there is a story that flows across multiple pages, IDT can still copy the text page by page? How is that specified? Do you need to break the threads between the text frames first?

Report · Jan 23, 2025

So if there is a story that flows across multiple pages, IDT can still copy the text page by page? How is that specified? Do you need to break the threads between the text frames first?

By @A2D2

IDT can do anything 😉

IDT won't do anything to your text in the InDesign document - as we only want to EXTRACT information - so IDT will select each TextFrame and export its contents - either with or WITHOUT formatting.

If you have multiple TextFrames from multiple Stories running together on pages - it's just a case of sorting and/or extracting additional information about the Parent Story that a particular TF belongs to - so your indexers will know which parts of the text are from the same Story.

Or applying different colors to different Stories. Or pasting them into separate cells within the destination Table. Etc. Whatever you prefer.

Or each Story can be exported separately for clarity - separate WORD documents.

Think of it as you would do it manually:

1) go to page,

2) select contents of the TextFrame,

3) CTRL+C,

4) switch to a destination where you want to have this text copied to - 2nd document in InDesign, WORD/Excel application opened together on your computer,

5) CTRL+V - of course this step will require adding extra information - page number, extra info about the Story, so text won't get mixed-up,

6) go to step 1) until you get to the end of the Story - or Pages.

And that's how IDT works - it just replicates manual steps you would do.
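As an illustration of the same loop outside ID-Tasker (this is not how ID-Tasker is implemented, just a Python sketch of the idea): assume each TextFrame has already been read out as a record with its page number, a label for its Parent Story, and the text visible in that frame only. The `FrameRecord` type, its field names, and `frames_to_csv` are all hypothetical. The result is a CSV table that WORD or Excel can open, one row per frame, so threaded Stories stay intact in InDesign while the text is still listed page by page.

```python
import csv
import io
from dataclasses import dataclass
from typing import Iterable


@dataclass
class FrameRecord:
    page: int   # page the text frame sits on
    story: str  # label for the parent story (hypothetical naming)
    text: str   # text visible in that frame only, not the whole story


def frames_to_csv(frames: Iterable[FrameRecord]) -> str:
    """Return CSV text with one row per frame, sorted by page, then story."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["page", "story", "text"])
    for frame in sorted(frames, key=lambda f: (f.page, f.story)):
        writer.writerow([frame.page, frame.story, frame.text])
    return buffer.getvalue()
```

Keeping the story label as its own column is the "extra information" from step 5: text from different Stories on the same page lands in different rows and cannot get mixed up.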

Report · Jan 23, 2025

You can, of course, limit the scope of the export by filtering layers - if you have multiple languages - or by applied Parent Page - or even by the color applied to the layer.

Or if TF is part of the group, or is Anchored / InLined, etc.

Or only export texts from specific Cells from Tables on specific layers with applied specific Cell / Table Style AND ParaStyle.

Whatever you can imagine - and more.

But, as you know, IDT is PC only and not free - but I can always give anyone access to the full version for free for testing.

Report · Jan 23, 2025

As a sidenote - there is only one small step to custom XML / HTML / ePUB / etc. export...

Report · Jan 23, 2025

I looked at the YouTube video you linked above. I understand the 6 steps and the idea that IDT just works through the manual steps taken by the user.

In IDT I can see how to load and select the required text frames.

However, what is the pathway for "select contents of the TextFrame"?

How do you switch to another document to paste the contents of the clipboard?

How do you switch back to the InDesign document?

How do you loop a routine and then tell IDT to stop?

Report · Jan 23, 2025

In IDT, what is the pathway for "select contents of the TextFrame"?

When all text frames from a document are loaded and then checked in IDT, how do you select the contents of the TextFrame? I only see select paragraphs.

By @A2D2

Last option on the list:

Please download latest version.

Report · Jan 24, 2025

I looked at the YouTube video you linked above. I understand the 6 steps and the idea that IDT just works through the manual steps taken by the user.

In IDT I can see how to load and select the required text frames.

However, what is the pathway for "select contents of the TextFrame"?

When you run a Task in BatchMode - ID-Tasker will automatically select each object from the list - which makes that object "ready for processing".

If you want to select its contents - you need to add a rule / command to your Task:

MISC / MODIFY CURRENT SELECTION / OBJECT - TEXT CONTENTS

The "modus operandi" of ID-Tasker is quite simple - every rule / command expects something to be selected in InDesign - unless the rule itself selects something or does something unrelated to InDesign.

How do you switch to another document to paste the contents of the clipboard?

In this case, you just need to select the "destination" object - the TextFrame for the texts you want to collect - and execute another Task, with only one Rule, before you run BatchMode with your "main" Task - and the rule is:

OBJECT / REMEMBER CURRENTLY SELECT OBJECT

This will "remember" this object internally, and whenever you execute the rule:

OBJECT / SELECT LAST SAVED OBJECT

ID-Tasker will select this object - no matter if it's in the same document or another document - exactly the same way as if you pressed CTRL+TAB on your keyboard to switch between documents.

How do you switch back to the InDesign document?

As above - ID-Tasker will do the switching automatically - it will select the next object on the list for processing - and switch to the correct document.

How do you loop a routine and then tell IDT to stop?

When you run a Task in BatchMode, ID-Tasker processes the objects on the list from first to last or from last to first - sometimes the order of processing is very important. You just need to select / check the elements on the list for processing - either manually or by using another "pre-processing" Task. Everything in ID-Tasker - sorting & filtering, switching between lists, adding new columns / object properties, etc. - can also be automated.

BatchMode is kind of "double click on the item on the list (TYPE column) to select it in InDesign" and "click Play to execute Task" - repeated on every selected & visible item on the list.

The main goal of ID-Tasker is to minimise mundane clicking - whether it's in InDesign or ID-Tasker itself.

Report · Jan 23, 2025

I could adapt ID-Tasker to whatever way your indexing people would prefer.

And I really mean "whatever way".

As I have full control over the contents of the INDD file - I can create a custom export.

Then it's just a case of how your Indexing people would like / prefer to work with this exported text - be it a website, WORD, Excel, dedicated application, etc.