Participant
August 13, 2021
Question

How to fetch the URI of a cross-reference link?


I have been trying a few Python libraries to read data from a PDF created with FrameMaker. I can fetch hyperlinks that point to external public web pages, but I cannot find the hyperlink URI for cross-reference links. For example:

 

This is the annotation for cross reference hyperlinks.

{'/A': IndirectObject(2374, 0), '/BS': {'/S': '/S', '/Type': '/Border', '/W': 0}, '/Border': [0, 0, 0], '/Rect': [53.9978, 711.363, 131.515, 699.124], '/Subtype': '/Link', '/Type': '/Annot'}

 

This is the annotation for external hyperlinks.

{'/A': {'/S': '/URI', '/URI': 'https://en.wikipedia.org/wiki/Apple_Inc.'}, '/BS': {'/S': '/S', '/Type': '/Border', '/W': 0}, '/Border': [0, 0, 0], '/Rect': [53.9978, 673.685, 178.073, 661.685], '/Subtype': '/Link', '/Type': '/Annot'}

 

How can I obtain a URI-like link for a cross-reference hyperlink, so that it opens the PDF in a new tab (rather than navigating within the same tab) at the exact section, figure, or table that the cross-reference points to?
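
For readers comparing the two annotations above: in the second one '/A' is an inline action dictionary of type '/URI', while in the first one '/A' is an indirect reference that has to be dereferenced before its action type is visible. The sketch below is one way to tell the two apart. It assumes the annotation behaves like a Python dict and that indirect references expose a `get_object()` (or older `getObject()`) method, as PyPDF2-style objects do. It also assumes the cross-reference action turns out to be a '/GoTo' action with a '/D' destination, which the thread does not confirm. The names `deref` and `classify_link` are hypothetical helpers, not part of any library.

```python
def deref(obj):
    """Follow an indirect reference if the object exposes a getter; else return it unchanged."""
    getter = getattr(obj, "get_object", None) or getattr(obj, "getObject", None)
    return getter() if callable(getter) else obj


def classify_link(annot):
    """Return (kind, target) for a link annotation dictionary."""
    annot = deref(annot)
    if annot.get("/Subtype") != "/Link":
        return ("not-a-link", None)
    if "/Dest" in annot:
        # The destination can be stored directly on the annotation.
        return ("internal", deref(annot["/Dest"]))
    action = deref(annot.get("/A"))
    if action is None:
        return ("unknown", None)
    kind = action.get("/S")
    if kind == "/URI":
        # External hyperlink: the URI is stored as-is.
        return ("external", action.get("/URI"))
    if kind == "/GoTo":
        # Assumed case for cross-references: '/D' is a destination
        # (either a name/string or an explicit array).
        return ("internal", deref(action.get("/D")))
    return ("other", kind)
```

For the external annotation quoted above, `classify_link` would return `("external", "https://en.wikipedia.org/wiki/Apple_Inc.")`. For the cross-reference annotation, the result depends on what object 2374 actually contains.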

 


    1 reply

    frameexpert
    Community Expert
    August 13, 2021

    Here is how I would handle it. I would open a PDF in Acrobat Pro and manually enter some links with the link tool. I would see if there are any options for opening the link in a new tab or window. If there are, use them, test them, and save the PDF and examine the links with Python.

     

    As far as I know, opening links in a new tab or window is a PDF viewer setting, not a setting for the PDF links themselves.

     

    To get more information, try downloading the pdfmark Reference or check Microtype.com's resources.

    Participant
    August 16, 2021

    Thanks for your quick response @frameexpert 

    However, manually editing the links is not practical in my case. I need to automate fetching the cross-reference ID/URI for over 1000 documents of 150-200 pages each. The links are already in cross-reference form, meaning they have a destination mapping instead of a URI. According to the document structure explained by O'Reilly, I need to find '/Dests', which is supposed to hold the information about the cross-references. But I cannot find it when I read the data structure with popular Python PDF libraries such as PyPDF2.

    The above is what '/Dests' holds, and when I go deeper, I find this.

    I do not understand where the destination IDs are, or how an indirect object on one page is mapped to another indirect object on another page within the same PDF. Some clarity would help me build on this.
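
As background on the question above: in the PDF format, named destinations live either in a plain '/Dests' dictionary on the document catalog (the older form) or in a name tree under '/Names' -> '/Dests', where nodes carry '/Kids' (child nodes) and '/Names' (a flat array of alternating key and value entries). Each value is either an explicit destination array whose first element is the target page, or a dictionary with a '/D' entry holding that array. The sketch below flattens both forms into one Python dict. It assumes dict-like and list-like objects whose indirect references expose `get_object()` or `getObject()`, as PyPDF2-style objects do. The names `deref`, `walk_name_tree`, and `named_destinations` are hypothetical helpers.

```python
def deref(obj):
    """Follow an indirect reference if the object exposes a getter; else return it unchanged."""
    getter = getattr(obj, "get_object", None) or getattr(obj, "getObject", None)
    return getter() if callable(getter) else obj


def _key(name):
    """Normalise a destination key to str (byte strings are decoded as Latin-1)."""
    return name.decode("latin-1") if isinstance(name, bytes) else str(name)


def walk_name_tree(node, out=None):
    """Collect key/value pairs from a name tree node and all of its descendants."""
    out = {} if out is None else out
    node = deref(node)
    names = deref(node.get("/Names"))
    if names:
        # '/Names' is a flat array: [key1, value1, key2, value2, ...]
        for i in range(0, len(names) - 1, 2):
            out[_key(deref(names[i]))] = deref(names[i + 1])
    for kid in deref(node.get("/Kids")) or []:
        walk_name_tree(kid, out)
    return out


def named_destinations(catalog):
    """Return {destination name: destination} from a document catalog."""
    catalog = deref(catalog)
    found = {}
    old_style = deref(catalog.get("/Dests"))      # plain dictionary form
    if old_style:
        for key, value in old_style.items():
            found[_key(key)] = deref(value)
    names = deref(catalog.get("/Names"))          # name tree form
    if names and "/Dests" in names:
        walk_name_tree(names["/Dests"], found)
    return found
```

With PyPDF2 the catalog is typically reachable as `reader.trailer["/Root"]`. Matching a link's '/D' value against the keys returned here would give the destination array, and its first element identifies the target page object.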

    frameexpert
    Community Expert
    August 16, 2021

    Since you are really dealing with the internal PDF structure, you may need to consult one of the Acrobat forums or consult the PDF Reference. The first question you need answered: is it possible to always force a PDF link to open in a new tab or window with a setting in the PDF? It may just be a viewer setting and if that's the case, then you won't get anywhere with modifying each PDF.
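
On the new-tab question: if the PDFs are served over HTTP and linked from HTML pages, one option is to keep the new-tab behaviour outside the PDF altogether. Adobe's PDF open parameters define URL fragments such as `#nameddest=...` and `#page=N` (1-based), and a new tab is requested by the HTML anchor's `target="_blank"` attribute. Whether the fragment is honoured depends on the viewer, so this is a sketch under that assumption, not a guaranteed solution. The URL, destination name, and function names below are hypothetical.

```python
from html import escape
from urllib.parse import quote


def pdf_deep_link(pdf_url, named_dest=None, page=None):
    """Append a PDF open-parameter fragment to a PDF URL."""
    if named_dest is not None:
        # Percent-encode the name so special characters survive in the fragment.
        return f"{pdf_url}#nameddest={quote(named_dest, safe='')}"
    if page is not None:
        return f"{pdf_url}#page={int(page)}"
    return pdf_url


def html_anchor(href, text):
    """Build an HTML link that asks the browser to open it in a new tab."""
    return (
        f'<a href="{escape(href, quote=True)}" target="_blank" rel="noopener">'
        f"{escape(text)}</a>"
    )


# Hypothetical usage: link to a named destination extracted from the PDF.
link = pdf_deep_link("https://example.com/manual.pdf", named_dest="G1.123")
anchor = html_anchor(link, "See Table 3")
```

The fragment carries the position inside the PDF, and the surrounding HTML decides where the document opens, which fits the point above that new-tab behaviour is not a property of the PDF link itself.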