Participant
September 22, 2021
Question

PDF Embed API: Linearized PDFs not displaying the first page before the rest is loaded

Hi.

We are using the PDF embed API to display PDFs to our end users.

We request the PDFs from a client CDN, which delivers each PDF as a byte stream.


I have read both the documentation on linearized PDFs and the sample demo on GitHub, but our linearized PDFs still do not display the first page before the whole file has loaded. Many of the PDFs we request are large, so fixing this is crucial for the user experience.

All the promises are resolved in what I believe is the correct order:
1. getInfo => metadata (returns fileSize)
2. getInitialBuffer => initial buffer, Range bytes=0-1024 (returns a buffer: [ArrayBuffer(1025)], since HTTP byte ranges include both ends)
3. getFileBufferRanges => fetches the requested ranges (awaits all promises and returns bufferList: ArrayBuffer[])
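For readers who cannot see the screenshots, here is a minimal TypeScript sketch of a linearization object with these three functions. It is not the poster's code: the return shapes (`{ fileSize }`, `{ buffer }`, `{ bufferList }`) are taken from the description above rather than checked against the Adobe documentation, and `fetchRange` and `memoryFetcher` are hypothetical helpers (the in-memory one lets the sketch run without a server). How the object is handed to `previewFile` is not shown. The PDF specification places a linearized file's parameter dictionary within its first 1024 bytes, which is presumably why the initial buffer covers bytes 0-1024.

```typescript
// Inclusive byte range, as in an HTTP "Range: bytes=start-end" header.
interface ByteRange {
  start: number;
  end: number;
}

// Hypothetical helper: resolves to bytes start..end (inclusive) of the PDF.
type RangeFetcher = (start: number, end: number) => Promise<ArrayBuffer>;

// Shapes as described in this thread, not verified against Adobe's docs.
interface LinearizationInfo {
  getInfo(): Promise<{ fileSize: number }>;
  getInitialBuffer(): Promise<{ buffer: ArrayBuffer }>;
  getFileBufferRanges(ranges: ByteRange[]): Promise<{ bufferList: ArrayBuffer[] }>;
}

function makeLinearizationInfo(fileSize: number, fetchRange: RangeFetcher): LinearizationInfo {
  return {
    // Only the total size is returned here.
    getInfo: async () => ({ fileSize }),
    // Bytes 0..1024 (1025 bytes), where the linearization dictionary lives.
    getInitialBuffer: async () => ({
      buffer: await fetchRange(0, Math.min(1024, fileSize - 1)),
    }),
    // One request per range; Promise.all keeps results in request order.
    getFileBufferRanges: async (ranges) => ({
      bufferList: await Promise.all(ranges.map((r) => fetchRange(r.start, r.end))),
    }),
  };
}

// In-memory stand-in for a CDN, so the sketch runs without a network.
function memoryFetcher(data: Uint8Array): RangeFetcher {
  return async (start, end) => {
    const out = new Uint8Array(end - start + 1);
    out.set(data.subarray(start, end + 1));
    return out.buffer as ArrayBuffer;
  };
}
```

Swapping `memoryFetcher` for a function that issues real `Range` requests against the CDN gives the same structure as the setup described in this post.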

Below are some images of our setup, as well as the header for the requested PDF.

Thanks in advance!

Header for the requested PDF:

Code using the URL directly, letting the API handle needed promises:

Code using a promise with linearizationObject (separated in two images):

    Legend
    September 30, 2021

    What are your timings right now?

    How long does the request to load the entire file take?

    How long does it take before ANY info is shown on the page?

    Why do you expect a different pattern of network requests, and what difference are you looking for?

    You say "preview": what are you expecting to see on screen that is different from a non-linearized file?
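To answer these timing questions per step rather than from the network tab alone, each function in the linearization object can be wrapped so that it logs how long its promise takes to settle. A minimal sketch; `timed` is a hypothetical helper, not part of PDF Embed API, and because it uses `Date.now()` its resolution is whole milliseconds.

```typescript
// Wraps an async function so every call logs how long its promise took to settle.
function timed<A extends unknown[], R>(
  label: string,
  fn: (...args: A) => Promise<R>,
  log: (line: string) => void = console.log,
): (...args: A) => Promise<R> {
  return async (...args: A): Promise<R> => {
    const startedAt = Date.now();
    try {
      return await fn(...args);
    } finally {
      // Runs on success and on failure, so rejected calls are timed too.
      log(`${label}: ${Date.now() - startedAt} ms`);
    }
  };
}
```

For example, `getInitialBuffer: timed("getInitialBuffer", info.getInitialBuffer)` keeps the original behaviour and adds one log line per call; if the original methods rely on `this`, bind them before wrapping.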

    Participant
    September 30, 2021

    Hi.

    Thanks again for the reply.

    As you suggested, I have tried a much larger PDF.
    Timings (network requests below):

    • Whole file: 46 s for 90.1 MB
    • PDF metadata: 73 ms for 8.3 kB
    • Initial buffer: 116 ms for the initial 1024 B
    • Range request: 86 ms for 11.2 kB (I see that the range requested by the Embed API is always 10240 B. Is this a coincidence? I would expect it to use the value in the PDF header that gives the number of bytes for the first page.)
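On the 10240 B versus 11.2 kB question: HTTP byte ranges include both ends, so `bytes=start-end` carries end - start + 1 bytes, and the size a browser's network tab reports for a response generally includes the response headers too, so the two numbers need not disagree. The helpers below (hypothetical names, a sketch only) check that every buffer returned from getFileBufferRanges has exactly the length its range implies.

```typescript
interface ByteRange {
  start: number;
  end: number;
}

// HTTP byte ranges are inclusive: "bytes=0-1024" covers 1025 bytes.
function rangeLength(r: ByteRange): number {
  return r.end - r.start + 1;
}

// Formats the value of a Range request header.
function rangeHeader(r: ByteRange): string {
  return `bytes=${r.start}-${r.end}`;
}

// Lists every mismatch between the requested ranges and the returned buffers.
function checkBufferLengths(ranges: ByteRange[], buffers: ArrayBuffer[]): string[] {
  const problems: string[] = [];
  if (ranges.length !== buffers.length) {
    problems.push(`expected ${ranges.length} buffers, got ${buffers.length}`);
  }
  ranges.forEach((r, i) => {
    const buf = buffers[i];
    if (buf && buf.byteLength !== rangeLength(r)) {
      problems.push(`${rangeHeader(r)}: expected ${rangeLength(r)} bytes, got ${buf.byteLength}`);
    }
  });
  return problems;
}
```

An empty result means each buffer matches its range; any entry points at the specific range to look at in the network tab.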

    Network requests:

    The embedded viewer stays at 0% (see image below) until the whole file has been received, and only then displays it. In other words, for this particular PDF the file is shown after 46 seconds. The viewer with the loader appears as soon as the script is loaded and attached to the div.

    I am not necessarily looking for a different pattern of network requests. The current behaviour is expected for non-linearized PDFs, but I expect different behaviour when linearization support is enabled in the API, as the documentation specifically says, and I quote: "Linearization is an approach to optimize PDFs for faster viewing by displaying the first page as quickly as possible before the entire PDF gets downloaded... PDF Embed API supports the rendering of linearized PDFs which are hosted on servers with byte-range support." From this, I expect to see the first page almost immediately.

    Since the documentation suggests my expectations are realistic, I suspect something is wrong with either my code, the PDF Embed API, the PDF itself, or the byte-range responses from the server.

    I really don't know how to test this further, as I feel I have tried everything, so I would appreciate any help.

    Adobe Employee
    September 30, 2021

    Hi, thank you for sharing these findings. 
    It looks like no further range calls are made after the first one to fetch more of the PDF's content for the first-page render. That could mean the data returned for the first range request led the viewer to fall back to the usual workflow and wait for the entire PDF. This might be due either to the PDF's structure or to incorrect range data being returned.
    Could you inspect the data being returned for the first range request and see if it seems correct?
    Also, would it be possible to share the URL or the PDF?
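One way to inspect the first range response is to compare it byte for byte with the same slice of the complete file, and to confirm that the server answered 206 Partial Content with a Content-Range whose total equals the size returned by getInfo. A sketch under these assumptions: `compareRangeWithFullFile`, `parseContentRange` and `bytesEqual` are hypothetical helpers, the URL is whatever the CDN serves, and the full file is downloaded once for the comparison (46 s for the 90 MB file above).

```typescript
// Parsed "Content-Range: bytes start-end/total" (total "*" means unknown).
interface ContentRange {
  start: number;
  end: number;
  total: number | null;
}

function parseContentRange(value: string): ContentRange | null {
  const m = /^bytes (\d+)-(\d+)\/(\d+|\*)$/.exec(value.trim());
  if (!m) return null;
  return { start: Number(m[1]), end: Number(m[2]), total: m[3] === "*" ? null : Number(m[3]) };
}

function bytesEqual(a: Uint8Array, b: Uint8Array): boolean {
  if (a.length !== b.length) return false;
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) return false;
  }
  return true;
}

// Fetches bytes start..end (inclusive) with a Range request and the whole
// file without one, then reports whether the ranged bytes are the right slice.
async function compareRangeWithFullFile(url: string, start: number, end: number) {
  const ranged = await fetch(url, { headers: { Range: `bytes=${start}-${end}` } });
  const part = new Uint8Array(await ranged.arrayBuffer());
  const full = new Uint8Array(await (await fetch(url)).arrayBuffer());
  // May be null cross-origin unless the server exposes the header.
  const header = ranged.headers.get("Content-Range");
  return {
    status: ranged.status, // expect 206
    contentRange: header === null ? null : parseContentRange(header),
    fullSize: full.length, // compare with the fileSize getInfo returns
    matchesFullFile: bytesEqual(part, full.subarray(start, end + 1)),
  };
}
```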

    Participant
    September 22, 2021

    An addition to this post: we are able to display the PDFs normally, but the goal is to display the first page before the whole PDF is loaded. :)

    Adobe Employee
    September 22, 2021

    Hi! Thank you for using PDF Embed API. 
    To be able to display the first page of a linearized PDF before the whole file is loaded, the server hosting the PDF must support range requests. 
    Could you please confirm through the network tab whether the range calls for the initial buffer and further requested ranges are getting resolved correctly?
    Also, is this issue observed for specific files or every file you've tried with?
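A quick way to confirm range support from the browser console or Node 18+ is to request a single byte and look at the status: a server that honours ranges replies 206 Partial Content with a Content-Range header, while one that ignores the Range header replies 200 with the whole body. When the CDN is on another origin, browsers only let scripts read Content-Range if the server lists it in Access-Control-Expose-Headers. A minimal sketch; `probeRangeSupport` and `classifyRangeResponse` are hypothetical names, and the verdict strings are only labels.

```typescript
type RangeVerdict =
  | "ranges supported"
  | "range ignored: full body returned"
  | "206 but Content-Range not readable (check Access-Control-Expose-Headers)"
  | "unexpected response";

// Classifies the reply to a request sent with "Range: bytes=0-0".
function classifyRangeResponse(status: number, contentRange: string | null): RangeVerdict {
  if (status === 200) return "range ignored: full body returned";
  if (status !== 206) return "unexpected response";
  if (contentRange === null) {
    return "206 but Content-Range not readable (check Access-Control-Expose-Headers)";
  }
  return contentRange.startsWith("bytes 0-0/") ? "ranges supported" : "unexpected response";
}

async function probeRangeSupport(url: string): Promise<RangeVerdict> {
  const res = await fetch(url, { headers: { Range: "bytes=0-0" } });
  // If the server ignored the range this body may be the whole file; stop reading it.
  await res.body?.cancel();
  return classifyRangeResponse(res.status, res.headers.get("Content-Range"));
}
```

For example, `probeRangeSupport(url).then(console.log)`, with `url` set to one of the PDFs on the CDN.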

    Participant
    September 23, 2021
    Hi.

    Thanks for the fast reply.

    Below is the network tab for the requests:
    Requests:
    1. Whole file
    2. getInfo (a metadata fetch to the server; I only return the file size from it)
    3. getInitialBuffer (Range request: bytes=0-1024)
    4. getFileBufferRanges (Range request: bytes=range.start-range.end)

    I also included the resolved responses for each function in the linearizationObject (including the requested range from getFileBufferRanges).

    The issue occurs with every file I have tested: three so far, all exported from Acrobat DC with the "Optimize for fast web view" option. We have many more PDFs on the server, but I have chosen only a few to test with, which I know were saved with this option.
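Since all three files were saved with "Optimize for fast web view", a further check is whether they are still valid linearized files as served. Per the PDF specification (ISO 32000, Annex F), the linearization parameter dictionary sits in the first 1024 bytes; its /L entry must equal the actual file length, and on a mismatch the specification says the file is to be treated as an ordinary PDF, ignoring the linearization information (for instance after an update has been appended). /E gives the offset of the end of the first page, which bears on the question about the 10240 B range. Whether PDF Embed API performs exactly this check is not stated in this thread. The sketch below uses a regex rather than a real PDF parser and assumes the buffer from getInitialBuffer and the size from getInfo; `readLinearizationParams` and `checkAgainstFileSize` are hypothetical helpers.

```typescript
interface LinearizationParams {
  fileLength: number; // /L
  firstPageObject: number; // /O
  firstPageEnd: number; // /E
  pageCount: number; // /N
}

// Rough: finds the dictionary starting at /Linearized in the initial bytes
// and reads a few integer entries. Not a real PDF parser.
function readLinearizationParams(initial: ArrayBuffer): LinearizationParams | null {
  // One char per byte, so offsets in the string match offsets in the file.
  const text = Array.from(new Uint8Array(initial), (b) => String.fromCharCode(b)).join("");
  const start = text.indexOf("/Linearized");
  if (start < 0) return null;
  const end = text.indexOf(">>", start);
  if (end < 0) return null; // dictionary not entirely inside the buffer
  const dict = text.slice(start, end);
  // "/L 123" needs whitespace after the key, so "/Linearized" itself never matches.
  const num = (key: string): number | null => {
    const m = new RegExp(`/${key}\\s+(\\d+)`).exec(dict);
    return m ? Number(m[1]) : null;
  };
  const L = num("L"), O = num("O"), E = num("E"), N = num("N");
  if (L === null || O === null || E === null || N === null) return null;
  return { fileLength: L, firstPageObject: O, firstPageEnd: E, pageCount: N };
}

function checkAgainstFileSize(p: LinearizationParams, fileSize: number): string {
  return p.fileLength === fileSize
    ? "ok: /L matches the file size"
    : `mismatch: /L is ${p.fileLength} but the file is ${fileSize} bytes`;
}
```

A `null` result or a mismatch on a file that Acrobat reports as linearized would point at the file as served, rather than at the viewer code.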