May 23, 2012
Answered

Converting PDF to Excel and Retain Formatting

  • May 23, 2012
  • 1 reply
  • 57635 views

I am trying to convert a PDF to an Excel file.

I just launched the free test version of Adobe Acrobat X Pro (version 10.1.3) and opened a newly created PDF.

I clicked on the following: File > Save As > Spreadsheet > XML Spreadsheet 2003 (have Office 2003).

When I opened the new XML file, the main headings appear horizontally across the spreadsheet. All of the dollar amounts appear vertically in Column A. We want the dollar amounts to appear under the appropriate headings. How do I retain the formatting from the PDF?

This topic has been closed for replies.
Correct answer CtDave

Thank you Dave for the detailed information. This process is overwhelming.

I was not successful at doing the following:

select the content and right click for the context menu. Select Copy As Table, Save As Table, or Open Table in Spreadsheet. You may have to try each in turn to see which provides something adequate for your needs.

It sounds as if I will not be able to easily convert a PDF to an Excel file using Adobe Acrobat X Pro as advertised. Is this true?

Thanks,

Ginger

GINGER GREENBERG

Database Manager

T 503.699.6258

TF 800.634.9982 ext. 6258

MARYLHURST UNIVERSITY

You. Unlimited.

17600 PACIFIC HIGHWAY

MARYLHURST, OR 97036-0261

marylhurst.edu


It sounds as if I will not be able to easily convert a PDF to an Excel file using Adobe Acrobat X Pro as advertised. Is this true?

Not true.

The issue is not Acrobat.

Rather it is the PDF (how source content was mastered and the PDF created).

Use an authoring application having good tag management.

Examples: FrameMaker, InDesign, MS Word (with PDFMaker from Acrobat), MS Word 2010 (using MS Save As PDF-XPS, accessible PDF & making use of the UI that promotes authoring for accessible PDF).

None is 100% (yet) on every aspect of producing a well-formed Tagged PDF. However, they are (currently) "best-of-breed".

In the authoring file you'd master a proper table. Table header row(s) must be properly identified.

The Tagged output PDF's Table element must be properly post-processed with Acrobat (header row cells' Span attribute set, Scope attribute checked, and, perhaps, Headers attribute with associated ID set).

With proper content mastering (of the table in particular), tag management, and post-processing the Table element you'd have a properly "tagged" PDF.

Properly tagged, the table content in the Tagged PDF can be exported to Excel with rather nice results.
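
For anyone unsure whether a given PDF is tagged at all, here is a rough Python sketch of a check. In a Tagged PDF the document catalog carries a /MarkInfo dictionary with /Marked true and a /StructTreeRoot entry, and the sketch simply searches the raw file bytes for those keys. The helper names (looks_tagged, file_looks_tagged) are made up for illustration, and this is a heuristic under stated assumptions: it will miss files where the catalog sits inside a compressed object stream or where /MarkInfo is an indirect reference. As far as I know, Acrobat's File > Properties > Description tab also reports "Tagged PDF: Yes/No", which is the more reliable check.

```python
import re

# Heuristic only: looks for the catalog's /MarkInfo << ... /Marked true ... >>
# entry and a /StructTreeRoot key in the raw bytes of the file.
MARKED_RE = re.compile(rb"/MarkInfo\s*<<[^>]*/Marked\s+true")
STRUCT_RE = re.compile(rb"/StructTreeRoot\b")


def looks_tagged(pdf_bytes):
    """Return True if the raw bytes suggest a Tagged PDF (hypothetical helper)."""
    has_mark = MARKED_RE.search(pdf_bytes) is not None
    has_tree = STRUCT_RE.search(pdf_bytes) is not None
    return has_mark and has_tree


def file_looks_tagged(path):
    """Read a PDF from disk and apply the same byte-level heuristic."""
    with open(path, "rb") as handle:
        return looks_tagged(handle.read())
```

A result of False does not prove the PDF is untagged; it only means the plain-text markers were not found.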

If the table is properly mastered in the authoring file, it can still be exported to Excel with fairly good results most of the time, even when the PDF is untagged.

It is all in how content is mastered:

Example (an extreme one, but it conveys the point):

Use of space bar and Tab can yield the appearance of a "table" in Word or Notepad.

As with so much, the "perception" is not reality.

Such content is merely "body text" tricked out to look like a table.

In the PDF such content is "body text" and has no correlation to tabular data.
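
To make the space-bar-and-Tab example concrete, here is a small Python sketch. The sample rows are invented for illustration, and split_aligned_row is a hypothetical helper, not anything Acrobat does. The point is that each line of the fake "table" is just one string, so an exporter with no other information has one cell per line; any column structure has to be guessed from the gaps.

```python
import re

# A tab, or two or more spaces, is treated as a column gap.
# This is a guess about the layout, not information stored in the text.
COLUMN_GAP = re.compile(r"(?:\t| {2,})[\t ]*")


def split_aligned_row(line):
    """Split one line of space/tab-aligned text into guessed cells."""
    return [cell.strip() for cell in COLUMN_GAP.split(line.strip())]


# Made-up sample rows: body text tricked out to look like a table.
fake_table = [
    "Department    Budget      Spent",
    "Admissions    $12,000     $9,500",
    "Library\t$8,250\t$8,100",
]

# With no table structure, each line is a single value: one cell per row.
as_body_text = [[line] for line in fake_table]

# Guessing at the gaps recovers columns, but only because this sample is tidy.
as_guessed_table = [split_aligned_row(line) for line in fake_table]
```

A cell that itself contains two spaces in a row, or a row with a blank cell, would break the guess, which is why a real table in the authoring file exports so much better.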

There's a spectrum from "OMG, yuck" to "Spot On".

What lands in Excel will reflect where the mastered content falls within this spectrum.

The variables are what was used to create the PDF, what was used for tag management (if any), and how content is placed in the authoring file.

Often the PDFs that are least "supportive" are those that are programmatically created via a server application.

It also matters which PDF library is in use (they are not all equal) and how effectively it is being used.

Be well...

Message was edited by: CtDave

1 reply

CtDave
Participating Frequently
May 24, 2012

Hi,

Start with a Tagged PDF. Export/Save As to spreadsheet works better with that.

Be well...

May 24, 2012

Thanks for responding Dave. What is a Tagged PDF?

Ginger


May 24, 2012


Hi Dave,

The authoring application is a Sybase product called InfoMaker 11.5.

Would you happen to know if it has good tag management?

Thanks for your help.

Ginger
