August 13, 2017
Question

How to remove only unused embedded fonts using PDF Library

  • August 13, 2017
  • 5 replies
  • 7388 views

I'm not sure this is the correct forum; however, I couldn't find a forum specific to PDF Library. If there is one, I would love to get a link to it.

Now to the question(s).

I have an automated C# process that optimizes PDF files coming from multiple sources, using PDF Library:

    using (Library library = new Library())
    {
        using (PDFOptimizer optimizer = new PDFOptimizer())
        {
            SetOptimizerOptions(optimizer);
            try
            {
                // Dispose the source document once optimization is done.
                using (Document document = new Document(sourceFile))
                {
                    optimizer.Optimize(document, targetFile);
                }
            }
            catch (Datalogics.PDFL.LibraryException e)
            {
                // Wrap library errors in the application's own exception type.
                throw new PdfException(e);
            }
        }
    }

    private void SetOptimizerOptions(PDFOptimizer optimizer)
    {
        optimizer.SetOption(OptimizerOption.MergeDuplicateFonts, true);
        optimizer.SetOption(OptimizerOption.DiscardUnusedForms, true);
        optimizer.SetOption(OptimizerOption.Linearize, true);
        optimizer.SetOption(OptimizerOption.SubsetAllEmbeddedFonts, true);
        optimizer.SetOption(OptimizerOption.RemoveAllBase14Fonts, true);
        // The remaining flags come from the caller's configuration.
        optimizer.SetOption(OptimizerOption.RemoveAllEmbeddedFonts, RemoveEmbeddedFonts);
        optimizer.SetOption(OptimizerOption.DiscardOutputIntent, DiscardOutputIntent);
        optimizer.SetOption(OptimizerOption.DiscardStructureTrees, DiscardStructureTrees);
    }

Some of the files use embedded fonts, some do not.

1. Is there a way to find out if embedded fonts are used in the document?

2. Is there a way to remove only the embedded fonts that are not used? (I didn't see an OptimizerOption for that.)

This topic has been closed for replies.

5 replies

Legend
August 13, 2017

I think part of the problem is that we are using the word "unused" in different ways. To me an unused font is one that appears in the PDF structure as a font object but is not used to put text on a page (so the unused object can be removed). I suspect you may be using it to mean "one where the PDF looks OK even if the font is not embedded (but the font is not to be removed)." If so, please ignore my advice on how to do this! But understand that your meaning is certainly different from anything the PDF Library intends, or that their support would expect.

August 13, 2017

Your suspicion is correct... I don't really care if the font is actually used as long as the file looks the same in Acrobat Reader without the font being embedded into it... but I guess for that this font must be installed on the machine used for viewing the file, and since the entire point of embedding fonts is to enable viewing the file even if the fonts are not installed on the target machine... well, I guess we will have to live with embedded fonts...

Legend
August 13, 2017

What I mean is that the work will be wasted, not just long. If you go to a lot of trouble to write code that removes unused embedded fonts - weeks of work it could be - but there weren't any, that's a bit of a waste.

What do you mean by "needed"? Clearly if a system happens to have the same font installed, Acrobat will use it and the file will look identical. There are hundreds of thousands, perhaps millions of fonts, so "standard font" is a bit hard to define. If the font is not embedded, Acrobat will make a substitute font. The spacing will be right but the look may be quite wrong. Is that important? Only you can know. You can verify whether the fonts you remove exist in the system though.

August 13, 2017

I see. Well, thanks again for your time and answers. I guess I'll have to keep the embedded fonts for now.

Legend
August 13, 2017

Actually, 10 fonts is quite likely. That's only 3 fonts, with bold and italic variants. I'd recommend examining this in detail, because you could have a very big project that does nothing. 100K for fonts isn't unusual. So start with Acrobat's font properties, then see if you can find them all in the document.
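To make the "3 fonts with variants" point concrete, here is a rough sketch that groups a list of PDF font names into families. It assumes you already have the font names as strings (from Acrobat's font properties or from your own code). `FontFamilies`, `FamilyOf` and `CountByFamily` are made-up names for illustration, not part of PDF Library, and the family rule (cut at the first `-` or `,`) is only a heuristic. The one firm rule is the subset tag: six uppercase letters followed by `+` at the start of a subsetted font's name.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not a PDF Library type.
public static class FontFamilies
{
    // Subset tag on subsetted fonts: six uppercase letters, then '+'.
    private static readonly Regex SubsetTag = new Regex(@"^[A-Z]{6}\+");

    // Heuristic: drop the subset tag, then cut at the first '-' or ','
    // so "ABCDEF+Arial-BoldMT" and "Arial,Italic" both map to "Arial".
    public static string FamilyOf(string baseFontName)
    {
        string name = SubsetTag.Replace(baseFontName, "");
        int cut = name.IndexOfAny(new[] { '-', ',' });
        return cut > 0 ? name.Substring(0, cut) : name;
    }

    // Counts how many listed fonts fall into each family.
    public static SortedDictionary<string, int> CountByFamily(IEnumerable<string> names)
    {
        var counts = new SortedDictionary<string, int>(StringComparer.Ordinal);
        foreach (string n in names)
        {
            string family = FamilyOf(n);
            counts.TryGetValue(family, out int current); // current is 0 if absent
            counts[family] = current + 1;
        }
        return counts;
    }
}
```

If a list of 10 names collapses to three or four families, the fonts are probably all in use. Names that differ only by a suffix without a separator (for example a trailing "MT") will not be merged by this heuristic.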

Legend
August 13, 2017

Oh, one follow up question. It's entirely possible to have unused font dictionaries, which might or might not be embedded fonts. But what specifically leads you to believe that you have unused embedded fonts? It's possible that the info you are seeing means something a little different.

August 13, 2017

I'm not sure I have unused embedded fonts, but here is why I think it might be a problem: I have a file from one of the users that's only 2 pages long and 118k. When I optimize it and remove all embedded fonts I get an 18k file, but I can't use it since it uses at least one embedded font. Checking the file in code (`document.GetFonts()`) shows a list of 10 embedded fonts. It seems highly unlikely to me that a 2-page document is using 10 different fonts.

lrosenth
Adobe Employee
August 13, 2017

First off – removing embedded fonts is a really bad idea! You want embedded fonts – which is why they are mandatory for all PDF subset standards (PDF/A, PDF/X, etc.)

Second, does this file have form fields? Annotations? I don’t know what this GetFonts() method does, since it’s a DL feature – so you will need to ask them. What does Acrobat’s Font listing show?

Legend
August 13, 2017

Generally you'd contact support via your support contract. I believe the C# PDF Optimizer is exclusively from DataLogics? So, no forum but you probably already paid for support.

In general: an unused font is a curious thing. (Whether it is embedded makes no difference to this task.) In ISO 32000-1 terms it's an entry in a font resource dictionary with no references in the corresponding page content stream. A C program using the C interface and Cos layer, with detailed 32000-1 knowledge, can consolidate the two sources of information and remove the unused fonts from the resource dictionaries, descending recursively into form XObjects and patterns and allowing for the rules of resource inheritance. Not a quick or easy task.
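To give a feel for the shape of that task, here is a toy sketch in C# rather than C. It does not use PDF Library or the Cos layer at all: `Resources`, `Content` and `UnusedFontFinder` are hypothetical stand-ins, and the content stream is assumed to be already split into tokens. It compares the font names declared in a resource dictionary against the names selected by `Tf` operators, and recurses into form XObjects. It deliberately ignores patterns, resource inheritance, fonts set through graphics state dictionaries, and annotation appearances, which is where most of the real work would be.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a resource dictionary; not a PDF Library type.
public sealed class Resources
{
    // Font resource names (e.g. "F1") declared under /Font.
    public HashSet<string> Fonts { get; } = new HashSet<string>();
    // Form XObjects declared under /XObject, keyed by resource name.
    public Dictionary<string, Content> Forms { get; } = new Dictionary<string, Content>();
}

// Hypothetical stand-in for a page or form XObject: tokens plus its resources.
public sealed class Content
{
    public List<string> Tokens { get; } = new List<string>();
    public Resources Resources { get; } = new Resources();
}

public static class UnusedFontFinder
{
    // Returns "path/name" for every declared font that no Tf operator in the
    // same content stream selects. Assumes the forms do not reference each
    // other cyclically.
    public static List<string> FindUnused(Content root, string path = "page")
    {
        var used = new HashSet<string>();
        List<string> t = root.Tokens;
        for (int i = 2; i < t.Count; i++)
        {
            // "/F1 12 Tf" selects font F1: the name sits two tokens before Tf.
            if (t[i] == "Tf" && t[i - 2].StartsWith("/", StringComparison.Ordinal))
                used.Add(t[i - 2].Substring(1));
        }

        // Fonts declared at this level but never selected here.
        var unused = root.Resources.Fonts
            .Where(f => !used.Contains(f))
            .OrderBy(f => f, StringComparer.Ordinal)
            .Select(f => path + "/" + f)
            .ToList();

        // Each form XObject has its own content stream and resources.
        foreach (string name in root.Resources.Forms.Keys.OrderBy(k => k, StringComparer.Ordinal))
            unused.AddRange(FindUnused(root.Resources.Forms[name], path + "/" + name));

        return unused;
    }
}
```

Even in this simplified form the result is only a list of candidates. Actually removing the entries, and any font objects that become unreferenced as a result, is a separate step.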

August 13, 2017

Thanks for your reply. I guess I'll have to ask my boss about the kind of support we can get from DataLogics, then.

lrosenth
Adobe Employee
August 13, 2017

You can use a variety of APIs to find which fonts are used and which are not – and then choose to remove them yourself if you wish. As you iterate fonts, you can get their info.