Hi,
I'm using Acrobat Reader (23.006.20360) on Windows 10, but I ran into the same issue in the Pro version.
The attached PDF contains the word "Foobar", with "bar" set in subscript and therefore in a smaller font size. The second table is scaled to 70%, which brings the font size of "bar" down to under 4 pt. A search for "foobar" (without quotes) only finds the instance in the first table; it fails for the one in the second table. Selecting and copy-pasting either instance of "Foobar" works just fine.
The issue lies with how the file was created. In the second instance you'll see that the two parts are separate.
If you double-click the word "Foo" in the second instance, it only selects the first two letters. If you do the same on the first instance, it selects the whole word. That means the former is composed of separate parts, and the latter of a single one.
The solution needs to come from the application that generated the PDF, something called "Antenna House PDF Output Library".
Thanks for your response! I agree that looks odd. In Acrobat Pro, however, the text box shows up as a single object (see attached). Also interesting: even though double-clicking parts of the word in Reader treats "Fo", "o" and "bar" as three separate pieces, a search for "foo" still finds it.
Extracting the objects with pdfminer shows a single text object for both instances; I'm not sure how else to check what counts as a separate object. The built-in PDF viewers in Firefox and Edge, as well as PDF-XChange, find both instances without issues. Are you sure the double-click behaviour is not just another bug in how Acrobat Reader handles this document?
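In case anyone wants to repeat the pdfminer check, here is a minimal sketch of one way to do it, assuming pdfminer.six is installed. The helper names `group_by_size` and `chars_from_pdf` are made up for this post, and the code only shows how pdfminer's layout analysis groups the glyphs. It says nothing about how Acrobat builds its search index.

```python
from itertools import groupby


def group_by_size(chars):
    """Collapse (text, size) pairs into runs of consecutive equal font size."""
    runs = []
    # Round the size so tiny floating-point differences do not split a run.
    for size, group in groupby(chars, key=lambda c: round(c[1], 1)):
        runs.append(("".join(text for text, _ in group), size))
    return runs


def chars_from_pdf(path):
    """Yield (text, size) for each glyph pdfminer finds in text boxes.

    Requires pdfminer.six; imported lazily so group_by_size works without it.
    Text nested inside figures is not visited by this simple walk.
    """
    from pdfminer.high_level import extract_pages
    from pdfminer.layout import LTChar, LTTextContainer, LTTextLine

    for page in extract_pages(path):
        for box in page:
            if not isinstance(box, LTTextContainer):
                continue
            for line in box:
                if not isinstance(line, LTTextLine):
                    continue
                for obj in line:
                    # LTAnno objects (inserted spaces/newlines) are skipped.
                    if isinstance(obj, LTChar):
                        yield obj.get_text(), obj.size
```

Calling `group_by_size(chars_from_pdf(path))` on the attached file should list each run of same-sized glyphs, which makes it easy to see where the subscript part starts and how small it ends up after the 70% scaling.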
The double-click method is actually a better indication than the Edit Text & Images tool, as that tool uses all kinds of internal algorithms that try to divide the file's contents into editable sections, and it doesn't necessarily represent the internal structure of the file.
Analyzing this kind of issue can be quite complicated, but if you believe it's a bug then report it to Adobe, by all means: http://www.adobe.com/products/wishform.html
OK, I've filed it over there. Looking at the existing tickets, I doubt they will care 😞
Unfortunately, I have to agree. It's a very bad platform and there's almost no feedback on any reports.