How to remove only unused embedded fonts using PDF Library

New Here, Aug 13, 2017

I'm not sure this is the correct forum, but I couldn't find one specific to PDF Library. If there is one, I'd appreciate a link to it.

Now to the question(s).

I have an automated C# process that uses PDF Library to optimize PDF files coming from multiple sources:

    using Datalogics.PDFL;

    // Initialize the library, then optimize sourceFile into targetFile.
    using (Library library = new Library())
    {
        using (PDFOptimizer optimizer = new PDFOptimizer())
        {
            SetOptimizerOptions(optimizer);
            try
            {
                // Dispose the source document once it has been optimized.
                using (Document document = new Document(sourceFile))
                {
                    optimizer.Optimize(document, targetFile);
                }
            }
            catch (LibraryException e)
            {
                // PdfException is this application's own exception type.
                throw new PdfException(e);
            }
        }
    }

    private void SetOptimizerOptions(PDFOptimizer optimizer)
    {
        optimizer.SetOption(OptimizerOption.MergeDuplicateFonts, true);
        optimizer.SetOption(OptimizerOption.DiscardUnusedForms, true);
        optimizer.SetOption(OptimizerOption.Linearize, true);
        optimizer.SetOption(OptimizerOption.SubsetAllEmbeddedFonts, true);
        optimizer.SetOption(OptimizerOption.RemoveAllBase14Fonts, true);

        // RemoveEmbeddedFonts, DiscardOutputIntent and DiscardStructureTrees
        // are application settings defined elsewhere in this class.
        optimizer.SetOption(OptimizerOption.RemoveAllEmbeddedFonts, RemoveEmbeddedFonts);
        optimizer.SetOption(OptimizerOption.DiscardOutputIntent, DiscardOutputIntent);
        optimizer.SetOption(OptimizerOption.DiscardStructureTrees, DiscardStructureTrees);
    }

Some of these files use embedded fonts and some do not.

1. Is there a way to find out if embedded fonts are used in the document?

2. Is there a way to remove only the embedded fonts that are not used? (I didn't see an OptimizerOption for that.)

TOPICS
Acrobat SDK and JavaScript

Most Valuable Participant, Aug 13, 2017

Generally you'd contact support through your support contract. I believe the C# PDF Optimizer comes exclusively from Datalogics? So there's no forum, but you have probably already paid for support.

In general, an unused font is a curious thing. (Whether it is embedded makes no difference to this task.) In ISO 32000-1 terms, it's an entry in the Font subdictionary of a resources dictionary that is never referenced from the corresponding content stream. A C program using the C interface and the Cos layer, written with detailed knowledge of ISO 32000-1, can reconcile the two sources of information and remove the unused entries from the resource dictionaries, descending recursively into form XObjects and patterns and allowing for the rules of resource inheritance. It is not a quick or easy task.
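
To make that definition concrete, here is a minimal C# sketch of the core idea. It uses no PDF Library or Adobe API: `ToyResources`, `ToyForm` and `FontUsage` are hypothetical types standing in for a resources dictionary, a form XObject and the analysis. It tokenizes an (already decompressed) content stream, records the font resource names selected with the `Tf` operator, follows form XObjects painted with `Do`, and reports Font entries that were never selected. Patterns, annotation appearance streams, Type 3 fonts, inline images and resource inheritance are deliberately left out, and they are a large part of what makes the real task hard.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical toy model (not a PDF library type): the Font and form
// XObject entries of one resources dictionary.
public sealed class ToyResources
{
    public HashSet<string> FontNames { get; } = new HashSet<string>();
    public Dictionary<string, ToyForm> Forms { get; } = new Dictionary<string, ToyForm>();
}

// A form XObject: its own content stream plus its own resources.
public sealed class ToyForm
{
    public string Content { get; set; } = "";
    public ToyResources Resources { get; set; } = new ToyResources();
}

public static class FontUsage
{
    static bool IsName(string t) => t.Length > 1 && t[0] == '/';

    // Splits a content stream into tokens. Literal (...) and hex <...> strings
    // become placeholders so text shown on the page is never read as an
    // operator. Inline image data (BI ... ID ... EI) is not handled.
    public static List<string> Tokenize(string s)
    {
        var tokens = new List<string>();
        int i = 0;
        while (i < s.Length)
        {
            char c = s[i];
            if (char.IsWhiteSpace(c)) { i++; continue; }
            if (c == '%') { while (i < s.Length && s[i] != '\n' && s[i] != '\r') i++; continue; }
            if (c == '(')
            {
                int depth = 0;
                do
                {
                    if (s[i] == '\\') i++;              // skip the escaped character
                    else if (s[i] == '(') depth++;
                    else if (s[i] == ')') depth--;
                    i++;
                } while (i < s.Length && depth > 0);
                tokens.Add("(str)");
                continue;
            }
            if (c == '<' && i + 1 < s.Length && s[i + 1] == '<') { tokens.Add("<<"); i += 2; continue; }
            if (c == '>' && i + 1 < s.Length && s[i + 1] == '>') { tokens.Add(">>"); i += 2; continue; }
            if (c == '<')
            {
                int end = s.IndexOf('>', i);
                i = end < 0 ? s.Length : end + 1;
                tokens.Add("<hex>");
                continue;
            }
            if ("[]{}".IndexOf(c) >= 0) { tokens.Add(c.ToString()); i++; continue; }
            int start = i++;                           // name, number or operator
            while (i < s.Length && !char.IsWhiteSpace(s[i]) && "()<>[]{}/%".IndexOf(s[i]) < 0) i++;
            tokens.Add(s.Substring(start, i - start));
        }
        return tokens;
    }

    // Records, per resources dictionary, the font names selected with Tf
    // ("/F1 12 Tf") and recurses into form XObjects painted with Do.
    public static void CollectUsed(string content, ToyResources res,
        Dictionary<ToyResources, HashSet<string>> used, HashSet<ToyForm> visited)
    {
        if (!used.TryGetValue(res, out var names)) used[res] = names = new HashSet<string>();
        var t = Tokenize(content);
        for (int k = 0; k < t.Count; k++)
        {
            if (t[k] == "Tf" && k >= 2 && IsName(t[k - 2]))
                names.Add(t[k - 2].Substring(1));
            else if (t[k] == "Do" && k >= 1 && IsName(t[k - 1])
                     && res.Forms.TryGetValue(t[k - 1].Substring(1), out var form)
                     && visited.Add(form))
                CollectUsed(form.Content, form.Resources, used, visited);
        }
    }

    // Font entries of a resources dictionary that no Tf operator selected.
    public static List<string> Unused(ToyResources res, Dictionary<ToyResources, HashSet<string>> used)
    {
        used.TryGetValue(res, out var names);
        return res.FontNames.Where(n => names == null || !names.Contains(n))
                            .OrderBy(n => n, StringComparer.Ordinal).ToList();
    }
}
```

For example, for a page whose Font entries are F1, F2 and F3, whose content selects only `/F1 12 Tf` and paints a form XObject that selects its own F9, `Unused` reports F2 and F3 for the page and nothing for the form. A real tool would also have to write the pruned dictionaries back without dropping fonts that other pages still share.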

New Here, Aug 13, 2017

Thanks for your reply. I guess I'll have to ask my boss what kind of support we can get from Datalogics, then.

Adobe Employee, Aug 13, 2017

You can use a variety of APIs to find which fonts are used and which are not, and then remove the unused ones yourself if you wish. As you iterate over the fonts, you can get information about each one.

New Here, Aug 13, 2017

I'm sorry, but "a variety of APIs" is far too general for me. I have searched for and tested everything I can think of. Can you be more specific, please?

Adobe Employee, Aug 13, 2017

Well, you can use the PDPageEnum methods to get all the font resources, but that only tells you whether they are there or not. If you want to see which fonts are actually being used, you need to use the PDEdit APIs to iterate over all the content, find the elements of type kPDEText, and then get their fonts. Again, this is using the Adobe PDF Library C++ APIs; whatever you may be using from Datalogics (DL) via C# is up to them.
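
The PDEdit approach described above amounts to a depth-first walk over page elements. The C# sketch below shows only that shape. It does not call PDEdit or Datalogics: `ToyElement`, `ToyText`, `ToyContainer` and `ToyPath` are hypothetical types, since the thread does not give the real class names or signatures. The result is the difference between the fonts a resource listing reports and the fonts actually attached to text runs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for a PDEdit-style content tree; these are not
// Adobe or Datalogics types.
public abstract class ToyElement { }

public sealed class ToyText : ToyElement
{
    // The font name of each text run in this text element.
    public List<string> RunFonts { get; } = new List<string>();
}

// Anything that nests further elements (forms, groups, marked content).
public sealed class ToyContainer : ToyElement
{
    public List<ToyElement> Children { get; } = new List<ToyElement>();
}

// Paths, images and shadings carry no fonts.
public sealed class ToyPath : ToyElement { }

public static class ContentWalk
{
    // Depth-first walk that records the font of every text run it meets.
    public static void CollectFonts(IEnumerable<ToyElement> elements, ISet<string> used)
    {
        foreach (var element in elements)
        {
            switch (element)
            {
                case ToyText text:
                    used.UnionWith(text.RunFonts);
                    break;
                case ToyContainer container:
                    CollectFonts(container.Children, used);
                    break;
            }
        }
    }

    // Fonts reported by a resource listing that no text element uses.
    public static List<string> ListedButUnused(IEnumerable<string> listed, IEnumerable<ToyElement> page)
    {
        var used = new HashSet<string>(StringComparer.Ordinal);
        CollectFonts(page, used);
        return listed.Where(f => !used.Contains(f))
                     .Distinct(StringComparer.Ordinal)
                     .OrderBy(f => f, StringComparer.Ordinal)
                     .ToList();
    }
}
```

With a resource listing of Arial, Arial-Bold and Symbol, and text runs (one of them nested inside a container) using only Arial and Arial-Bold, `ListedButUnused` returns just Symbol.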

New Here, Aug 13, 2017

First, thank you both for your time and answers.

I know it seems like a lot of work for just 100 KB, but I'm handling about 400-600 of these files for each user, and we have a lot of users (I don't even have an estimate). In my tests, most files display correctly after optimization (with all embedded fonts removed); I guess they use standard fonts. My main test set has 486 files, most of them about 100-200 KB, and only 3 of them actually need their embedded fonts. So if I can find a way to determine which files actually need their embedded fonts, I can save about 70-100 KB on most of my files. In a 600-file folder, that's a lot of space.

Is there a way to determine (in code) whether a file actually needs its embedded fonts? If I can do that, I'm happy to keep the embedded fonts where they are actually needed.

Most Valuable Participant, Aug 13, 2017

Oh, one follow-up question. It's entirely possible to have unused font dictionaries, which may or may not be embedded fonts. But what specifically leads you to believe that you have unused embedded fonts? The information you are seeing may mean something a little different.

New Here, Aug 13, 2017

I'm not sure I have unused embedded fonts, but here is why I think it might be a problem: one of the users sent a file that is only 2 pages long and 118 KB. When I optimize it and remove all embedded fonts, I get an 18 KB file, but I can't use it because it uses at least one embedded font. Checking the file in code (`document.GetFonts()`) shows a list of 10 embedded fonts, and it seems highly unlikely to me that a 2-page document uses 10 different fonts.

Adobe Employee, Aug 13, 2017

First off, removing embedded fonts is a really bad idea! You want embedded fonts, which is why they are mandatory for all PDF subset standards (PDF/A, PDF/X, etc.).

Second, does this file have form fields? Annotations? I don't know what this GetFonts() method does, since it's a Datalogics (DL) feature, so you will need to ask them. What does Acrobat's font listing show?

Most Valuable Participant, Aug 13, 2017

Actually, 10 fonts is quite likely: that could be just 3 fonts, each with bold and italic variants. I'd recommend examining this in detail, because you could end up with a very big project that achieves nothing. 100 KB of fonts isn't unusual. So start with Acrobat's font properties, then see if you can find them all.
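
A quick way to test that arithmetic on a real file is to group the listed font names by family. The C# sketch below assumes the BaseFont names are already available as strings (obtaining them, for instance through the `document.GetFonts()` call mentioned earlier, is not shown). It strips the subset tag that ISO 32000-1 prescribes for subset fonts (six uppercase letters and a plus sign) and then applies a naive, hypothetical rule: the family is everything before the first '-' or ','.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class FontFamilies
{
    // ISO 32000-1 subset tag: six uppercase letters followed by '+'.
    static readonly Regex SubsetTag = new Regex(@"^[A-Z]{6}\+");

    // Heuristic, not a rule: treat everything before the first '-' or ','
    // as the family ("Calibri-BoldItalic", "Arial,Bold"). Names such as
    // "ArialMT" vs "Arial-BoldMT" will not group correctly.
    public static string Family(string baseFont)
    {
        string name = SubsetTag.Replace(baseFont, "");
        int cut = name.IndexOfAny(new[] { '-', ',' });
        return cut < 0 ? name : name.Substring(0, cut);
    }

    // Number of listed fonts per (heuristic) family.
    public static Dictionary<string, int> CountByFamily(IEnumerable<string> baseFonts) =>
        baseFonts.GroupBy(f => Family(f)).ToDictionary(g => g.Key, g => g.Count());
}
```

With made-up names such as Calibri, Calibri-Bold, Calibri-Italic and Calibri-BoldItalic plus three Cambria and three Verdana variants (some carrying subset tags), `CountByFamily` reports 10 fonts in only 3 families.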

Most Valuable Participant, Aug 13, 2017

What I mean is that the work could be wasted, not just long. If you go to a lot of trouble to write code that removes unused embedded fonts (it could be weeks of work) and it turns out there weren't any, that's a bit of a waste.

What do you mean by "needed"? Clearly, if a system happens to have the same font installed, Acrobat will use it and the file will look identical. There are hundreds of thousands, perhaps millions, of fonts, so "standard font" is a bit hard to define. If the font is neither embedded nor installed, Acrobat will substitute a font. The spacing will be right, but the look may be quite wrong. Is that important? Only you can know. You can, however, verify whether the fonts you remove exist on the system.
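
A minimal C# sketch of that verification, under two stated assumptions: the caller already has a list of installed font names, and both lists use the same naming (for example PostScript names such as "Arial-BoldMT"). Obtaining the installed list is platform-specific and not shown; on Windows, `System.Drawing.Text.InstalledFontCollection` reports family names such as "Arial" rather than PostScript names, so some mapping would be needed. Note that this only checks the machine running the code; whoever opens the file later may not have the same fonts installed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class FontAvailability
{
    // ISO 32000-1 subset tag: six uppercase letters followed by '+'.
    static readonly Regex SubsetTag = new Regex(@"^[A-Z]{6}\+");

    // Returns the fonts (subset tag removed) whose names do not appear in the
    // supplied list of installed fonts, compared case-insensitively.
    // Assumes both sides use the same naming convention.
    public static List<string> NotInstalled(IEnumerable<string> baseFonts, IEnumerable<string> installedNames)
    {
        var installed = new HashSet<string>(installedNames, StringComparer.OrdinalIgnoreCase);
        return baseFonts.Select(f => SubsetTag.Replace(f, ""))
                        .Where(f => !installed.Contains(f))
                        .Distinct(StringComparer.OrdinalIgnoreCase)
                        .ToList();
    }
}
```

Any font this returns would be substituted on this machine if its embedded copy were removed.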

New Here, Aug 13, 2017

I see. Well, thanks again for your time and answers. I guess I'll have to keep the embedded fonts for now.

Most Valuable Participant, Aug 13, 2017

I think part of the problem is that we are using the word "unused" in different ways. To me, an unused font is one that appears in the PDF structure as a font object but is not used to put text on a page (so the unused object can be removed). I suspect you may be using it to mean "one where the PDF looks OK even if the font is not embedded (but the font is not to be removed)." If so, please ignore my advice on how to do this! But understand that your meaning is certainly different from anything the PDF library intends, or that their support would expect.

New Here, Aug 13, 2017

Your suspicion is correct. I don't really care whether the font is actually used, as long as the file looks the same in Acrobat Reader without the font embedded in it. But for that, the font must be installed on the machine used to view the file, and the entire point of embedding fonts is to let the file display correctly even when the fonts are not installed on the target machine. So I guess we will have to live with embedded fonts.
