Kathlene Sage
Known Participant
September 17, 2009
Answered

Scripting Annotations

My edit work comes from PDF annotations. I work off a server for my ID file but have the annotated PDF local. Switching back and forth is cumbersome. I'm hoping to find a script that transfers the annotations from the PDF to the ID file, in position, on their own layer. Then, when edit rounds #2, #3, #4, etc. come in (I often get 10 rounds of edits due to legal), each round gets a new layer with some sort of naming convention to track the edits. Does anyone know of a script in production that comes close to those requirements?

Jongware
Community Expert
Correct answer
September 17, 2009

I agree, it could be hugely useful. Perhaps make it a Feature Request ...?

I did a quick little experiment, and the document you get when "Exporting comments" from Acrobat is a structurally simple PDF-ity file. Unfortunately, the way its objects refer to each other isn't something I could grasp at a single glance (one note and one strikeout yield 5 separate objects). I can imagine a JavaScript that reads and parses this file, but, really, I can imagine quite a lot. It'd be easier if the Acrobat engineers got together with the ID crew and created an 'import comments' function.

Jongware
Community Expert
September 17, 2009

So maybe it could work... Here is the result of a preliminary "feasibility study":

For the moment, I'm ignoring all lines except those defining an object. These seem to come in pairs:

- a popup object (containing a reference to the next object), usually at the outer boundary of your page
- an annotation object (text, strikeout, whatever) in its correct place.

I'm ignoring the connection between the two for now, but perhaps I could draw a line between them.

Getting values out of the single lines is a bit of a hit-and-miss affair -- parentheses? slashes? values? text encoding? I'd have to think about that.

This scriptlet puts everything onto a single page, but the page number is given in the annotations, so placing each one on its own page oughta be a possibility.
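
For the page number, something like this might do as a start. It is only a sketch: pageIndexOf is a made-up helper name, and it assumes the annotation's object line carries a "/Page N" entry with a zero-based N, which I haven't checked against every Acrobat version.

```javascript
// Hypothetical helper: pull the page index out of one annotation object line.
// Assumption: the line contains "/Page N", where N counts pages from zero.
function pageIndexOf(line) {
  var m = line.match(/\/Page\s+(\d+)/);
  // null means "no page entry found on this line"
  return m ? Number(m[1]) : null;
}
```

In InDesign the result could then select the target page, e.g. app.activeDocument.pages[pageIndexOf(line)], provided the PDF and the document have the same page order.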

Note that the measurements are in points and upside down (in true PDF style). For this to work, you really need an annotated PDF of exactly the same size as your ID document (which is not the case with this hardcoded test!).
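
To spell out the "upside down" part: a PDF Rect is [left bottom right top] measured upward from the bottom-left corner, while InDesign's geometricBounds wants [top, left, bottom, right] measured downward from the top-left. A minimal sketch of the conversion, assuming both pages have the same height in points (pdfRectToBounds is just a name I picked):

```javascript
// Hypothetical helper: convert a PDF rectangle to InDesign-style bounds.
// rect is [left, bottom, right, top] in points, origin at the bottom-left.
// The result is [top, left, bottom, right], origin at the top-left.
function pdfRectToBounds(rect, pageHeight) {
  var left = rect[0], bottom = rect[1], right = rect[2], top = rect[3];
  return [pageHeight - top, left, pageHeight - bottom, right];
}
```

For a US Letter page (792 pt high), pdfRectToBounds([100, 700, 300, 750], 792) gives [42, 100, 92, 300].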

I'll have to go over my own needs before continuing, as this seems a sizeable project (unless Harbs or Marc or Laurent or Robin or Dave S. or Peter K. or any other of our band of scripters sees "the challenge" in this!)

// Work in points, the unit PDF coordinates use
var doc = app.activeDocument;
doc.viewPreferences.horizontalMeasurementUnits = MeasurementUnits.POINTS;
doc.viewPreferences.verticalMeasurementUnits = MeasurementUnits.POINTS;
var pageHeight = doc.documentPreferences.pageHeight;

// Hardcoded test file: comments exported from Acrobat as FDF
var myFile = File("D:/Temp/indesign cs4 sdk learning-indesign-plugin-development.fdf");
if (!myFile.open("r"))
  throw new Error("Cannot open " + myFile.fsName);

while (!myFile.eof)
{
  var line = myFile.readln();
  // Expect "digit(s) digit(s) obj"
  if (line.match(/^\d+ \d+ obj/) == null)
    continue;
  // Expect "<</Rect[left bottom right top]"
  var rect = line.match(/<<\/Rect\[(\d+\.?\d*) (\d+\.?\d*) (\d+\.?\d*) (\d+\.?\d*)\]/);
  if (rect == null)
    continue;
  // Skip objects without a subtype instead of failing on subtype[1]
  var subtype = line.match(/\/Subtype\/(\w+)/);
  if (subtype == null)
    continue;

  // Convert to numbers and invert vertically: PDF measures y upward from the bottom edge
  var x1 = Number(rect[1]);
  var y2 = pageHeight - Number(rect[2]);  // bottom edge
  var x2 = Number(rect[3]);
  var y1 = pageHeight - Number(rect[4]);  // top edge

  var frame, match;
  switch (subtype[1])
  {
    case "Popup":
      frame = doc.textFrames.add();
      // geometricBounds order is [top, left, bottom, right]
      frame.geometricBounds = [ y1, x1, y2, x2 ];
      match = line.match(/\/Subj\(([^)]+)\)/);
      if (match != null)
        frame.contents = match[1];
      break;
    case "Text":
      frame = doc.textFrames.add();
      frame.geometricBounds = [ y1, x1, y2, x2 ];
      match = line.match(/\/Contents\(([^)]+)\)/);
      if (match != null)
        frame.contents = match[1];
      break;
    case "StrikeOut":
      // Separate variable, so the text in "line" is not overwritten
      var strike = doc.graphicLines.add();
      var midY = (y1 + y2) / 2;
      strike.paths[0].pathPoints[0].anchor = [ x1, midY ];
      strike.paths[0].pathPoints[1].anchor = [ x2, midY ];
      // Assumes the document has a swatch named "Red"
      strike.strokeColor = doc.swatches.item("Red");
      break;
  }
}
myFile.close();

Jongware
Community Expert
September 18, 2009

I did some more research. Discovery #1: different versions of Acrobat use different FDF export syntaxes (theoretically, one should check the version number for this).

You can get all kinds of properties out of annotations in the PDF, but exact placement is still hit-and-miss. A strikethrough, for example, seems to have some internal bounding box apart from the one exported into the FDF. If you simply draw a line from edge to edge of the exported rectangle, it's too wide and you cannot really see which character or characters should be deleted.
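
One thing that might be worth checking: the PDF specification gives text markup annotations such as strikeouts a /QuadPoints array (eight numbers per marked-up quadrilateral) that is tighter than /Rect. Whether a given Acrobat version writes it into the exported FDF is an assumption here, not something I have confirmed. If it is there, a sketch like the following could place the line. The name strikeLineFromQuad is made up, and it only looks at the first quad, so a strikeout running over several lines would need a loop.

```javascript
// Hypothetical helper: derive a strike-through line from the first quad.
// Assumption: the object line contains "/QuadPoints[x1 y1 x2 y2 x3 y3 x4 y4 ...]".
// Returns [[startX, y], [endX, y]] in top-left-origin points, or null.
function strikeLineFromQuad(line, pageHeight) {
  var m = line.match(/\/QuadPoints\s*\[([^\]]+)\]/);
  if (m == null) return null;
  var n = m[1].replace(/^\s+|\s+$/g, "").split(/\s+/);
  if (n.length < 8) return null;
  var xs = [], ys = [];
  for (var i = 0; i < 8; i += 2) {
    xs.push(Number(n[i]));
    ys.push(Number(n[i + 1]));
  }
  // Min/max avoids depending on the order in which the corners are written
  var minX = Math.min.apply(null, xs);
  var maxX = Math.max.apply(null, xs);
  var midY = (Math.min.apply(null, ys) + Math.max.apply(null, ys)) / 2;
  var y = pageHeight - midY; // flip: PDF measures y from the bottom edge
  return [[minX, y], [maxX, y]];
}
```

The two returned points could be assigned to pathPoints[0].anchor and pathPoints[1].anchor of a new graphic line.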

Another 'bad' discovery is that any text is encoded in either plain ASCII or in Unicode; and parentheses, backslashes, and returns are escaped by another backslash, making it hard to extract the 'plain note text' in a straightforward manner. Formatted text is even worse; it uses XHTML markup with loads of "span"s and CSS styling to format it.
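
For the plain ASCII case, the escaping can be undone with a small helper. This is a sketch only: contentsOf and unescapePdfString are names I invented, the escape list follows the PDF rules for literal strings, and neither Unicode-encoded strings nor the XHTML formatted text are handled. It also assumes every parenthesis inside the note is backslash-escaped.

```javascript
// Hypothetical helper: undo backslash escapes in a PDF literal string (ASCII only).
function unescapePdfString(s) {
  var simple = { n: "\n", r: "\r", t: "\t", b: "\b", f: "\f" };
  return s.replace(/\\(\r\n|\r|\n|[0-7]{1,3}|[\s\S])/g, function (whole, c) {
    if (/^[\r\n]/.test(c)) return "";  // backslash + line break: continuation
    if (/^[0-7]/.test(c)) return String.fromCharCode(parseInt(c, 8) & 255); // octal code
    if (simple.hasOwnProperty(c)) return simple[c];
    return c;                          // \( \) \\ and anything unrecognised
  });
}

// Hypothetical helper: extract and unescape the /Contents(...) note text.
// The pattern steps over escaped characters, so "\)" does not end the match.
function contentsOf(line) {
  var m = line.match(/\/Contents\(((?:\\[\s\S]|[^\\)])*)\)/);
  return m ? unescapePdfString(m[1]) : null;
}
```

With this, a line containing /Contents(Delete \(this\) word) yields the note text "Delete (this) word".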

Additional problems are that my PDFs for a book are created as separate articles, and the editor returns them concatenated into one huge PDF, as well as imposed onto an A4 (for easier printing, I guess). Well, that could be handled by even more clever scripting -- I hardcoded it for testing purposes.

However, the biggest prob is that the annotations "live" on a separate layer from the text. That means that with the slightest edit, the text may move away from its annotations -- and there is no escaping that. You'd have to start correcting at the end of your file and work towards the beginning, so that each correction only shifts text you have already dealt with. It also makes keeping the annotation layer practically worthless.
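
If the parsed annotations were collected in an array first, the end-to-start order could at least be produced automatically. A rough sketch, assuming each entry is a plain object with a zero-based page index and top-left-origin x and y in points (the field names and sortLastFirst are my own invention). Position is only a stand-in for text order, so multi-column layouts would need something smarter.

```javascript
// Hypothetical helper: order annotations from the end of the document to the start.
// Each note is assumed to look like { page: 0, x: 50, y: 100 }.
function sortLastFirst(notes) {
  // slice() copies the array so the caller's order is left untouched
  return notes.slice().sort(function (a, b) {
    // Later pages first, then lower on the page, then further right
    return (b.page - a.page) || (b.y - a.y) || (b.x - a.x);
  });
}
```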