AllUnderControl
Participant
April 8, 2026
Question

When using PDF Services API, the Adobe service is failing to recognize multiple pages from the input PDF, even though all pages in the document are identically formatted.


I used Claude.AI to troubleshoot. Here is its feedback:

Adobe Extract PDF API is silently dropping pages from this 94-page PDF. The text itself is fine: pdftotext extracts every contract perfectly, and there is no difference in text format or layout between the missed and found contracts. The common thread is therefore not the content but which pages Adobe Extract skipped. Adobe returns no text elements for certain pages, so the workflow never sees those Contract IDs.

  • The input PDF has 94 pages with 52 unique contracts
  • All 20 missing contracts have Contract ID: NNNNNNN in identical format to the 32 that succeeded — there's no text/regex issue
  • Every page has a footer: Affidavit: Page X of Y (that's the numbering you mentioned)
  • 18 of 20 missing contracts are single-page (Page 1 of 1)
  • Pages 1-6 are ALL missing — the first 6 contracts were entirely skipped
  • The remaining missing contracts are scattered (pages 21, 29, 48-55, 74-75, 77, 84, 94)
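
For anyone who wants to reproduce the comparison, here is a minimal sketch of how the dropped pages could be listed. It rests on assumptions not confirmed in this post: that the Extract result is the usual JSON with a top-level "elements" array whose entries carry a zero-based "Page" and a "Text" field, and that the reference text comes from pdftotext, which separates pages with a form feed character. The function names are hypothetical helpers, not part of any Adobe SDK.

```python
import re

# Same pattern the post describes: "Contract ID: NNNNNNN"
CONTRACT_ID = re.compile(r"Contract ID:\s*(\d+)")


def split_pdftotext_pages(raw: str) -> list[str]:
    """Split pdftotext output into per-page strings (pages end with a form feed)."""
    pages = raw.split("\f")
    # The output normally ends with a form feed, leaving one empty trailing chunk.
    if pages and not pages[-1].strip():
        pages.pop()
    return pages


def pages_with_text(extract_json: dict) -> set[int]:
    """Return 1-based page numbers that have at least one non-empty text element.

    Assumes each element has a zero-based "Page" key and an optional "Text" key.
    """
    found = set()
    for element in extract_json.get("elements", []):
        text = element.get("Text", "")
        if "Page" in element and text.strip():
            found.add(element["Page"] + 1)
    return found


def dropped_contract_ids(page_texts: list[str], extract_json: dict) -> dict[int, list[str]]:
    """Map each page with no Extract text to the Contract IDs pdftotext found on it."""
    covered = pages_with_text(extract_json)
    dropped = {}
    for number, text in enumerate(page_texts, start=1):
        if number not in covered:
            dropped[number] = CONTRACT_ID.findall(text)
    return dropped
```

In practice, page_texts would come from something like the output of `pdftotext -layout input.pdf -`, and extract_json from loading the structuredData.json file in the Extract result archive with json.load. Both names are assumptions about a typical setup, so adjust them to your workflow.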