Hi, I have a PDF document that contains images which are digitised pages from a historic hand-written text. The text cannot be OCR'd. We have manually transcribed the text, so I have it available to add to the document, but I want to do this a) automatically (I have potentially about 15 of these volumes, each with ca 600 pages); and b) so that it isn't visible to people looking at the document but does allow them to search and navigate directly to the pages on which a given word occurs.
Currently I have this script (obviously I'll loop round pages and possibly read the text for each in from a file, but this is just to illustrate one page):
var f = this.addField("Page1Text","text",0,[10,10,500,500]);
f.display = display.noView;
f.textSize = 0;
f.textColor = color.transparent;
f.fillColor = color.transparent;
f.multiline = true;
f.readonly = true;
f.value = "1\nSpecimens illustrating Flora Tertiaria Helvetica\nFossil leaves &c Disco\nPetrified woods\nCoals British Columbia H.M.S Plumper\nFossils Port Stephen Dr Odenheimer\nCoals, James Russell, mineral surveyor\nCoal, Torbane Hill &c\nOne parcel miscellaneous\nEgyptian pebbles\nBoulders from Lebanon\nSoapstone &c";
Setting f.fillColor = color.transparent appears to work, but setting f.textColor = color.transparent does not (it appears to give just an empty textbox).
Setting f.display = display.hidden or f.display = display.noView makes the field invisible on screen as expected, but seems to stop it from being searchable.
I have also tried adjusting the order of the values but without success.
I was wondering if there is a simple way to push the textbox behind the image (or bring the image to the front)?
Any other suggestions most welcome - this is my first attempt at using scripting within pdfs.
From the documentation of the textColor property:
Note: An exception is thrown if a transparent color space is used to set textColor.
Ah, thanks for confirming. So that means making the text 'invisible' isn't an option, and pushing the text behind the image will be the only way to handle this I presume? Unless it can be forced into 1 pixel? I hadn't thought of that but may try it.
You can't "push text behind an image", unless you add it as a part of a background watermark.
You can set the textSize property to a very low value, though.
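A minimal sketch of both suggestions, assuming the Acrobat JavaScript methods addField and addWatermarkFromText behave as documented. The function names, the transcriptions array and the 1-point size are illustrative and not from this thread. Note that a textSize of 0, as in the original script, is documented as auto-size rather than "as small as possible", so a small positive value is used here. Whether such text is small enough to go unnoticed, and whether search picks it up, would need checking in your viewer.

```javascript
// Variant 1: one read-only text field per page with very small text.
// doc is the Acrobat Doc object (for example `this` in the console);
// transcriptions is a hypothetical array holding one string per page.
function addTinyTextFields(doc, transcriptions) {
    var names = [];
    for (var p = 0; p < doc.numPages && p < transcriptions.length; p++) {
        var name = "Page" + (p + 1) + "Text";
        var f = doc.addField(name, "text", p, [10, 10, 500, 500]);
        f.multiline = true;
        f.readonly = true;
        f.textSize = 1; // small but non-zero; 0 would mean auto-size
        f.value = transcriptions[p];
        names.push(name);
    }
    return names;
}

// Variant 2: put the transcription behind the page content as a
// background text watermark on a single page (pageNum is 0-based).
function addBackgroundText(doc, pageNum, text) {
    doc.addWatermarkFromText({
        cText: text,
        nFontSize: 1,
        nStart: pageNum,
        nEnd: pageNum,
        bOnTop: false // behind the page content, so an opaque scan covers it
    });
}
```

With the document open, addTinyTextFields(this, pageTexts) could be run from the JavaScript console, where pageTexts is whatever array you build from your transcription files.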
How about the following?
Create a new PDF document which has the same number of pages as the original PDF document with images. The pages of the new PDF document would be blank. However, since the images are opaque, you could instead make a copy of the original PDF document and use that copy as the new PDF document.
For each page in the new document create a text field and a button field – make sure that the button field is on top of the text field. The button field should be configured to show icon only; set the icon of the button for every page in the new document to the corresponding page of the original document.
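The per-page step could be sketched roughly as follows. This assumes that a field created later is drawn above one created earlier, which is not guaranteed, and that buttonImportIcon is allowed to read the given path (with an explicit path it is understood to be restricted to console or batch use). The function name and the path argument are hypothetical.

```javascript
// Adds a text field holding the transcription, then a button covering the
// same rectangle whose icon is the matching page of the original scan.
// doc is the new document; pageNum is 0-based; scanPdfPath is a
// hypothetical device-independent path to the original PDF.
function addTextUnderButton(doc, pageNum, text, scanPdfPath) {
    var rect = [10, 10, 500, 500];
    var label = "Page" + (pageNum + 1);

    var t = doc.addField(label + "Text", "text", pageNum, rect);
    t.multiline = true;
    t.readonly = true;
    t.value = text;

    // Created second, in the hope that it ends up on top of the text field.
    var b = doc.addField(label + "Image", "button", pageNum, rect);
    b.buttonPosition = position.iconOnly; // show the icon only, no caption
    // Returns 0 on success; non-zero codes indicate cancel or failure.
    return b.buttonImportIcon(scanPdfPath, pageNum);
}
```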
Visible form fields should never overlap. The display order is not defined, and may change in different software or between releases. Hence the concept of "make sure the button field is on top..." fails (or may fail in future, or with different software).
I created an example PDF form with a button field containing an image and two text fields under the button field with texts. The search for text appears to work with the form fields using Adobe Acrobat Pro XI and Adobe Acrobat Reader DC. The PDF form is available at <An Example of Text Fields behind an Image: A Reply to “How to make ad… >.
In the transparency imaging model (introduced in PDF 1.4)
“objects can be less than fully opaque, allowing previously painted marks to show through. Each object is painted on the page with a specified opacity, which may be constant at every point within the object’s shape or may vary from point to point. The previously existing contents of the page form a backdrop with which the new object is composited, producing results that combine the colors of the object and backdrop according to their respective opacity characteristics. The objects at any given point on the page can be thought of as forming a transparency stack, where the stacking order is defined to be the order in which the objects are specified, bottommost object first. All objects in the stack can potentially contribute to the result, depending on their colors, shapes and opacities” (Adobe, 2006, 195).
For more information please see Chapter 4 Graphics and Chapter 7 Transparency (Adobe, 2006).
Therefore it should be possible for an author to “make sure that a field is under another field” in a PDF document so that the bottom field will be painted first followed by the top field by a PDF reader that complies with the PDF standard.
Adobe Systems Incorporated. (2006). 4.1 Graphics Objects. In PDF Reference, sixth edition: Adobe Portable Document Format version 1.7: 194. <https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/pdf_reference_1-7.pdf#page=194>. (Aug. 13, 2019).
32000-1 defines how to blend elements given their order. The stack of elements in the page contents is well defined, and hence (barring other ambiguity), the rendering of page contents without annotations and form fields is defined by 32000-1.
It's also well defined (I think, but cannot recall where) that form fields are rendered after page contents. And it seems that form fields participate in the transparency model, so to be rendered, they are assigned an order which places them in the stack above page contents. Yes.
However, I do not believe that the order of rendering of form fields (relative to other form fields) is well defined. You might ASSUME that the order of elements in Annots defines it. Have you found something in 32000-1 which actually defines that?