I need to extract information from a PDF of a US Corporate Income Tax Return (Form 1120) into Excel.
Are there any features in the Adobe suite that would allow me to perform this conversion accurately? OCR does not appear to be very reliable given the complexity of these forms. Similarly, exporting to Excel has problems with merged columns and checkboxes.
The IRS publishes various sample (fictitious) Form 1120 returns. Take a look at this relatively standard return of low to moderate complexity: https://www.irs.gov/pub/irs-wi/ty24-f1120-ats-scenario-4.pdf Try extracting the table labeled Schedule L on page 9 of 64 of the PDF.
Thank you,
Pablo
If you only want page 9, drag that page out of the pages panel onto your desktop and it will create a new file called "Untitled Extracted Pages". Open that file and from the Acrobat menu select File > Export To > Spreadsheet > Excel Workbook. It's not perfect (the Assets column from 1 - 15 will come out as a merged cell), but columns (a) through (d) will be a table. You can do File > Export To > Spreadsheet > Excel on the entire file but it will produce one long page.
Hi there
Hope you are doing well and thanks for reaching out.
Adobe Acrobat has several tools and features that can assist with extracting data from PDFs like the US Corporate Income Tax Return (Form 1120). Here's how Adobe Acrobat can help with this task:
Table Selection for Export: Adobe Acrobat provides a "Select Table" feature that works well for structured tables, like Schedule L on page 9 of the form.
Enhanced Scanned Document Handling: If the PDF includes scanned pages, use the Enhanced OCR (Optical Character Recognition) functionality in Acrobat, which can improve recognition of complex layouts.
Customizing Export to Excel: Acrobat allows for customization when exporting tables.
Form Field Detection: If the form contains interactive fields, Acrobat's "Prepare Form" tool can detect these fields for easier data handling.
Merged Cells & Complex Layouts: Post-export, Excel may require manual adjustments for merged cells and columns. You can simplify the process by splitting and formatting data appropriately using Excel's built-in tools.
Checkbox Representation: Acrobat identifies checkboxes as filled/unfilled. Use the form preparation tool to map these to a binary (Yes/No) representation in Excel.
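If you would rather script the post-export cleanup than fix it by hand, here is a rough sketch in Python. It is not an Acrobat feature, and it assumes you have already read the exported sheet into plain lists of strings. The function names and the set of "unchecked" spellings are my own assumptions for illustration. The first function maps a raw checkbox value to Yes/No, and the second copies a label down into the blank cells that are left behind after a merged cell is unmerged in Excel (only the top cell keeps its value).

```python
from typing import Optional

# Assumption: spellings an unchecked box might have in exported data.
# "Off" is the standard PDF name for the unchecked state; the name of the
# checked state varies from form to form, so anything else counts as checked.
UNCHECKED = {"", "off", "no", "false", "0"}


def checkbox_to_yes_no(raw: Optional[str]) -> str:
    """Map a raw checkbox value to "Yes" or "No" (hypothetical helper)."""
    value = (raw or "").strip().lower()
    return "No" if value in UNCHECKED else "Yes"


def fill_down(rows: list[list[str]], col: int) -> list[list[str]]:
    """Copy the last non-blank value in column `col` into the blank cells below it.

    Returns new rows and leaves the input untouched (hypothetical helper).
    """
    filled = []
    last = ""
    for row in rows:
        new_row = list(row)
        if new_row[col].strip():
            last = new_row[col]   # remember the most recent label
        else:
            new_row[col] = last   # blank cell left over from a merged range
        filled.append(new_row)
    return filled
```

For example, calling fill_down on the rows ["Assets", "Cash"] and ["", "Inventories"] with col=0 gives "Assets" in the first column of both rows. You would still want to spot-check the result against the PDF, since the export itself can misplace values.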
Hope this info will help.