
Help with PDF parsing

New Here, Apr 19, 2024

Hello.

I’m a developer on Cross Stitch Magic Embroidery, a mobile app for embroidery. We work with PDF files and recognise the symbols of cross stitch patterns. Our service is Java-based. We have run into some problems while working with PDF files and are seeking your help.

 

The user flow starts with uploading a PDF cross stitch pattern to our application. We then parse the PDF and transform it into our electronic editor, where the user can highlight the symbols.

Here is an example:

Katsiaryna36846969q3lr_0-1713519594481.png

 



The problem arises when using org.apache.pdfbox: some PDFs contain special symbols that are located in the Unicode Private Use Areas (PUA), and we don't understand how to extract them from the PDF.

We can see them in the PDF file, but they do not show up when we parse it.
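One way to check whether the symbols are really missing from the parsed text, or are present but invisible because they are PUA code points, might be to scan the extracted string. The sketch below is only an illustration: `PuaScanner` is a hypothetical helper (not part of PDFBox), and it assumes the page text has already been extracted into a `String`, for example with PDFBox's `PDFTextStripper`.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of PDFBox): finds Private Use Area code points in text. */
final class PuaScanner {

    private PuaScanner() {}

    /** True for the BMP Private Use Area and the two supplementary private use planes. */
    static boolean isPrivateUse(int codePoint) {
        return (codePoint >= 0xE000 && codePoint <= 0xF8FF)
                || (codePoint >= 0xF0000 && codePoint <= 0xFFFFD)
                || (codePoint >= 0x100000 && codePoint <= 0x10FFFD);
    }

    /** Returns each PUA code point in the text as "U+XXXX", in order of appearance. */
    static List<String> findPrivateUse(String text) {
        List<String> found = new ArrayList<>();
        text.codePoints()
            .filter(PuaScanner::isPrivateUse)
            .forEach(cp -> found.add(String.format("U+%04X", cp)));
        return found;
    }
}
```

If `PuaScanner.findPrivateUse(extractedText)` returns a non-empty list, the symbols did survive parsing and only a font that defines glyphs for those code points can display them. If it returns nothing, the PDF may not map those glyphs to Unicode at all.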

 

Here is the code:



PDFontDescriptor fontDescriptor = pdFont.getFontDescriptor();
if (fontDescriptor != null) {
    PDStream fontFile2 = fontDescriptor.getFontFile2();
    if (fontFile2 != null) {
        byte[] embeddedFontArray = fontFile2.toByteArray();
        try (InputStream embeddedFontStream = new ByteArrayInputStream(embeddedFontArray)) {
            String embeddedFontFilename = UUID.randomUUID() + TTF;
            minioRepository.putInputStreamObject(embeddedFontFilename, fontsBucketName, embeddedFontStream);
            log.info(EMBEDDED_FONT + fontDescriptor.getFontName());
            return embeddedFontFilename;
        }
    }
}

 

However, if some symbols are in the PUA, the byte[] we get is not empty, but it does not contain a usable font.

We save this font in MinIO (our storage), and the mobile app downloads it but cannot display anything.
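To narrow down why the saved bytes are not usable as a font, it may help to inspect them before uploading. Fonts embedded in PDFs are often subsets and can lack tables such as `cmap`, `name`, or `OS/2` that platform font loaders expect. That is one possible cause here, not a confirmed diagnosis. The sketch below uses only the JDK; `SfntInspector` is a hypothetical helper, and it assumes the array holds a single sfnt-wrapped font (TrueType or OpenType), which is what a FontFile2 stream normally carries.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical diagnostic helper (not part of PDFBox) for embedded font bytes. */
final class SfntInspector {

    private SfntInspector() {}

    /** Guesses the container format from the first four bytes. */
    static String detectFormat(byte[] font) {
        if (font == null || font.length < 4) return "unknown";
        int magic = ByteBuffer.wrap(font).getInt(); // ByteBuffer is big-endian by default
        switch (magic) {
            case 0x00010000: return "TrueType";
            case 0x74727565: return "TrueType";            // 'true'
            case 0x4F54544F: return "OpenType/CFF";        // 'OTTO'
            case 0x74746366: return "TrueType Collection"; // 'ttcf'
            default: return "unknown";
        }
    }

    /** Lists the table tags from the table directory of a single sfnt font. */
    static List<String> tableTags(byte[] font) {
        List<String> tags = new ArrayList<>();
        String format = detectFormat(font);
        if (!format.equals("TrueType") && !format.equals("OpenType/CFF")) return tags;
        if (font.length < 12) return tags;
        // Header: 4-byte version, then uint16 numTables at offset 4.
        int numTables = ByteBuffer.wrap(font).getShort(4) & 0xFFFF;
        for (int i = 0; i < numTables; i++) {
            int record = 12 + i * 16; // each record: tag, checksum, offset, length
            if (record + 16 > font.length) break; // truncated directory
            tags.add(new String(font, record, 4, StandardCharsets.US_ASCII));
        }
        return tags;
    }
}
```

Calling `SfntInspector.tableTags(embeddedFontArray)` and logging the result shows whether tables like `cmap` are present. If they are missing, the file would likely need to be rebuilt or the glyphs rendered another way before a mobile font loader could use it.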

 

Is there any way to extract symbols from Private Use Areas? 

We’ll be grateful for your help!

 

TOPICS
Create PDFs, Edit and convert PDFs, JavaScript, PDF, PDF forms
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
Community Expert, Apr 19, 2024

As you are not using Acrobat but a third-party tool, you should ask this question in that tool's support forum.

This forum is specifically about Acrobat, not about the underlying PDF file format. The PDF format, while initially developed by Adobe, is now an ISO standard and no longer a proprietary format.

(This does not mean that you can't get an answer here, but you will be better off with the support forum for the tool you are using.)

ABAMBO | Hard- and Software Engineer | Photographer
New Here, Apr 19, 2024

Thank you! We'll try!
