Inspiring
September 10, 2016
Question

Codes that appear in Tools, Action, Create Comment Summary


I use Acrobat Pro XI to highlight important PDF documents. Then I like to extract those highlights to put in Evernote with the PDF below it. I have the option to copy highlighted text into the comments turned on. I then use Tools, Action, Create Comment Summary to export to an Excel file and then move that into Word. The problem is that codes appear in the output, like "\r" and Greek characters. I have a macro that does find and replace to change them into the appropriate characters / styles, like bulleted text or quotation marks. However, I would love a table that shows which real character each code stands for.


2 replies

try67
Community Expert
September 11, 2016

It's weird that these codes appear in the Comments Summary... Would it be possible for you to share one of your files, so we can test it ourselves?

Inspiring
September 11, 2016

Great to hear from you, try67. This is almost the same thing I first asked for your help with 6 years ago. Back then, you wrote me a script to export comments from a PDF. Later, you updated it for Acrobat 8, I think. In Acrobat XI, they added the ability to make a comment summary.

Also, the nature of my work has changed a bit. Instead of reading published papers where I just want the text, now I read FDA Guidances where I want more text, often the bullets with it too.

Normally, I highlight the text, and then export it using this tool to Excel (tab-delimited file). Then I move it from Excel to Word. I have a macro in Word that finds all the special characters and replaces them with what I know they need to be, including single and double quotation marks, bullets, en dashes, and em dashes.

What I am finding, though, is that different FDA Guidances behave differently. If you look at the files I've included, the eCTD Technical Conformance guidance is relatively straightforward. My macro handles that with no problem. Go to the 510(k) substantial equivalence guidance and they use a funny "i" for a bullet. Go to the software guidance, and you even find dollar signs.

Files are here: https://www.dropbox.com/sh/06vuj36708eezkd/AACE3kFFhl8KmHBIRDXke1hNa?dl=0

I'm trying to figure out what the standard is so I can write my own macro to clean these up. Then it makes for a clean move to Evernote.

try67
Community Expert
September 11, 2016

Hi! I saw your name in the comments and found our correspondence from 6 and 4 years ago...

I examined the files and there are several issues there. For one, these files were created in different ways, which might explain why each one of them behaves differently. The first file (eCTD Guidance) was created from Word using the Adobe PDFMaker plugin, which is the recommended method and probably why it works the best of all three. The "Software Guidance" file was created using "Acrobat PDFWriter", which I believe is the old name for the PDFMaker plugin. That makes sense, since it was done on a Windows NT computer, so probably a long time ago. The last file was created using Word's internal Export to PDF command, which is not the best way of doing it.

So you have those issues, and on top of that there are issues with the fonts used. The bullets, for example, were created using a font called "Symbol", which seems to just map its own symbols onto arbitrary Unicode code points instead of using the correct ones for the job. So a bullet in this font is represented by "\uF0B7", when there's a perfectly valid character that should have been used, i.e. "\u2022" (•). That's another issue, which you might be able to solve if you used that font in your Word file, or converted these characters to something else.
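
The bullet substitution described above could be sketched in Python roughly like this. The only mapping taken from this thread is "\uF0B7" to "\u2022"; the function names and the idea of listing any other private-use characters for manual review are illustrative assumptions, not something Acrobat provides.

```python
# Known Symbol-font code points and the standard character to use instead.
# Only U+F0B7 -> U+2022 comes from the thread; add entries as you identify them.
PUA_FIXES = {"\uf0b7": "\u2022"}


def fix_symbol_font(text: str) -> str:
    """Replace known private-use code points with standard characters."""
    return text.translate(str.maketrans(PUA_FIXES))


def find_private_use(text: str) -> list[str]:
    """List the distinct private-use code points (U+E000..U+F8FF) in text."""
    return sorted({f"U+{ord(c):04X}" for c in text if 0xE000 <= ord(c) <= 0xF8FF})


if __name__ == "__main__":
    sample = "\uf0b7 First item"
    print(find_private_use(sample))
    print(fix_symbol_font(sample))
```

Running `find_private_use` on an exported summary first gives a list of the code points that still need an entry in the table.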

I couldn't figure out where those strange dollar-signs come from... Probably also a font encoding issue, but I'm not sure.

Inspiring
September 11, 2016

The \r indicates a carriage return. What other ones have you encountered?
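
As a rough sketch of handling that code, assuming the export contains the literal two characters backslash and "r" (as quoted in the question) and possibly real carriage returns as well, something like this would turn both into ordinary line breaks. The function name is hypothetical.

```python
def normalize_breaks(text: str) -> str:
    """Turn carriage-return codes into plain newlines."""
    # Literal backslash + "r" as it appears in the exported summary.
    text = text.replace("\\r", "\n")
    # Real carriage returns, with or without a following line feed.
    return text.replace("\r\n", "\n").replace("\r", "\n")
```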

Inspiring
September 11, 2016

Quite a few. Note that what I'm writing below is the best I can type it - the actual characters are not part of the alphabet. I've created a Find/Replace macro to address these, but can't get everything:

aeae is a beginning quotation mark

ae? is an ending quotation mark; because of the non-printable last character, I can't do a find/replace from Visual Basic, but I can in the main window

aeTM is a single quotation mark

ae is an em dash

ae" is an en dash

ae cent sign is a bullet

I want to capture all of them so I can convert them appropriately.
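
The codes listed above look consistent with UTF-8 text being read as Windows-1252, where each curly quote, dash, or bullet turns into "â€" plus one more character. That is an assumption based on the descriptions here, not something confirmed in this thread. Under that assumption, a lookup table can be generated rather than typed by hand. The sketch below is Python; the names `TARGETS`, `mojibake_of`, and `repair` are hypothetical.

```python
# Characters to recover: curly quotes, en/em dash, bullet, ellipsis.
# Extend this string as new ones turn up.
TARGETS = "\u201c\u201d\u2018\u2019\u2013\u2014\u2022\u2026"


def mojibake_of(ch: str) -> str:
    """Return how ch looks when its UTF-8 bytes are read as Windows-1252."""
    out = []
    for b in ch.encode("utf-8"):
        try:
            out.append(bytes([b]).decode("cp1252"))
        except UnicodeDecodeError:
            # Byte has no cp1252 meaning (e.g. 0x9D). Assumption: it came
            # through as the non-printable control character with that code.
            out.append(chr(b))
    return "".join(out)


# Garbled sequence -> intended character.
REPAIRS = {mojibake_of(ch): ch for ch in TARGETS}


def repair(text: str) -> str:
    """Replace each known garbled sequence with the intended character."""
    for bad, good in REPAIRS.items():
        text = text.replace(bad, good)
    return text


if __name__ == "__main__":
    for bad, good in REPAIRS.items():
        print(f"{bad!r} -> {good!r} (U+{ord(good):04X})")
```

Printing `REPAIRS` gives the kind of code-to-character table asked for in the question, including the ending quotation mark whose last character is non-printable.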

Eventually, I want to write code to remove line numbers (I read a lot of draft guidances written by the US Food and Drug Administration that have line numbers).
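
A minimal Python sketch of that line-number cleanup, assuming each number sits at the very start of a line as 1 to 4 digits followed by spaces or tabs. How the numbers actually come through from a given guidance is not shown in this thread, so the pattern is a guess to adjust. The function name is hypothetical.

```python
import re

# Assumed shape: 1-4 digits at the start of a line, then spaces or tabs,
# then text. "1. Scope" is left alone because a period follows the digit.
LINE_NUMBER = re.compile(r"^\d{1,4}[ \t]+(?=\S)", flags=re.MULTILINE)


def strip_line_numbers(text: str) -> str:
    """Remove leading line numbers from each line of text."""
    return LINE_NUMBER.sub("", text)
```

Note that this would also strip a genuine number that happens to start a line (for example "510 devices were reviewed"), so it is worth checking the result against the PDF.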