Hello, I am trying to extract a table of data from a PDF using Acrobat. The table is from a research article and doesn't have column or row borders. Unfortunately, when I export the table to Excel, the data comes out with rows merged or with a single table row split across two vertically stacked cells, which is very time-consuming to fix. See below.
You won't get any better results. Acrobat has no clue what a PDF visually represents to you. It can interpret vertical spaces as columns and line breaks as rows, but it can't determine whether a row should contain multiple lines of text.
But it looks like it's having trouble even working out where the columns and rows are.
Do you have access to the document that produced the original PDF? Using the original might produce better results. You can often determine which software created a PDF by going to File > Properties > Description. If you don't need the chart for anything other than a visual reference, you may be able to add the PDF to your Excel file as an image (I don't currently use Excel, so I'm guessing here). You could also try exporting to a spreadsheet and then copying and pasting, unless you have already tried this.
No, unfortunately, I don't have access to the original file. My goal is to be able to extract this data and create another table by adding this data to other data and analyzing them all in aggregate.
So you did try exporting as a spreadsheet? Try exporting as Word and then copying and pasting into Excel. Also try saving as rich text. With a bit of luck, one of these methods might show some improvement.
In the past, when I needed to recreate a table (in InDesign), I would place the table as an image, reduce its opacity to 50%, re-set all the copy on top of the original in a different color, and delete the image when done. The advantage was that errors were easy to catch.
Yes, the screenshot above is the result of that export. Word and RTF have the same problem. Is there a way to manually set where the column and row boundaries are?
No, not that I'm aware of.
As mentioned, you can't change the output Acrobat creates. However, it might be possible to get better results using a custom-made script, especially if the columns are always the same size. It's not a simple task, though. If you're interested in hiring a professional to create such a script for you, feel free to contact me privately (click my username and then on Send a Message to do so).
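To give a rough idea of what a script like that might involve, here is a minimal Python sketch. It is not an Acrobat feature and not necessarily how anyone in this thread would write it. It assumes you have already pulled each word out of the PDF together with its position on the page using some text-extraction tool, and that the columns start at known, fixed x positions. The function name `words_to_rows`, the `(x, y, text)` tuple layout, the `column_edges` list, and the `row_tolerance` value are all made up for illustration.

```python
from bisect import bisect_right


def words_to_rows(words, column_edges, row_tolerance=3.0):
    """Rebuild a borderless table from positioned words (hypothetical helper).

    words: iterable of (x, y, text) tuples. Assumption: x is the left edge of
        the word and y its vertical position, with y increasing down the page.
    column_edges: sorted x positions where the second, third, ... columns start.
    row_tolerance: a word joins the current row if its y is within this
        distance of the first (topmost) word placed in that row.
    """
    # Step 1: walk the words top to bottom and group them into rows.
    lines = []  # each entry is [anchor_y, [words in that row]]
    for word in sorted(words, key=lambda w: w[1]):
        if lines and abs(word[1] - lines[-1][0]) <= row_tolerance:
            lines[-1][1].append(word)
        else:
            lines.append([word[1], [word]])

    # Step 2: inside each row, read left to right and drop every word into
    # the column whose x range contains the word's left edge.
    table = []
    for _, line_words in lines:
        cells = [[] for _ in range(len(column_edges) + 1)]
        for x, _, text in sorted(line_words, key=lambda w: w[0]):
            cells[bisect_right(column_edges, x)].append(text)
        table.append([" ".join(cell) for cell in cells])
    return table


# Made-up sample data: two rows, three columns starting at x = 0, 100 and 180.
sample_words = [
    (10, 100.5, "Group"), (40, 100.0, "A"), (120, 100.2, "12.5"), (200, 100.0, "0.03"),
    (10, 115.0, "Group"), (40, 115.0, "B"), (120, 115.0, "9.1"), (200, 115.0, "0.20"),
]
for row in words_to_rows(sample_words, column_edges=[100, 180]):
    print(row)
```

The rows it returns are plain lists of strings, so they could be written out with Python's `csv` module and opened in Excel. The hard parts in practice are the ones mentioned above: picking column edges that hold for every page, and deciding when wrapped text in a cell belongs to the previous row. This sketch does not attempt the second one.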