Participant
October 30, 2023
Question

Bug in search for subscripted text with a small font size


Hi,

I'm using Acrobat Reader (23.006.20360) on Windows 10, but the same issue also occurs in the Pro version.

The attached PDF contains the word "Foobar", with "bar" in subscript and therefore in a smaller font size. The second table is scaled to 70%, which brings the font size of "bar" below 4 pt. A search for foobar (without quotes or any other operators) only finds the instance in the first table; the one in the second table is not found. Selecting and copy-pasting either instance of "Foobar" works fine.

This topic has been closed for replies.

1 reply

try67
Community Expert
October 30, 2023

The issue lies in how the file was created. In the second instance, you'll see that the two parts are separate.

If you double-click the word "Foo" in the second instance, it only selects the first two letters. If you do the same on the first instance, it selects the whole word. That means the former is composed of two parts, and the latter of one.

The solution needs to come from the application that generated the PDF, something called "Antenna House PDF Output Library".

Participant
October 31, 2023

Thanks for your response! I agree that looks odd. However, when I look at the text box in Acrobat Pro, it does show up as a single object (see attached). Also interesting: even though Reader treats "Fo", "o", and "bar" as three separate pieces when I double-click parts of the word, it still finds "foo" when I search for it.

Extracting the objects with pdfminer shows a single text object for both instances; I'm not sure how else to check what is a separate object and what isn't. The built-in PDF viewers in Firefox and Edge, as well as PDF-XChange, all find both instances without issues. Are you sure the double-click behavior isn't just another bug in how Acrobat Reader handles this document?
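To look below the level of text objects, a glyph-by-glyph dump can show whether "Foo" and the subscript "bar" are laid out as one run or positioned separately. This is a minimal sketch, assuming the pdfminer.six package is installed; `Glyph`, `gaps_between`, `report`, and `load_glyphs` are hypothetical helper names for illustration, not something used in this thread. It only reports each character's size and horizontal position; it does not reproduce Acrobat's own word-grouping logic.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional


@dataclass
class Glyph:
    text: str
    size: float  # glyph size reported by pdfminer, roughly the rendered font size in pt
    x0: float    # left edge of the glyph's bounding box
    x1: float    # right edge of the glyph's bounding box


def gaps_between(glyphs: List[Glyph]) -> List[float]:
    """Horizontal gap from each glyph to the next one (negative means overlap)."""
    return [round(b.x0 - a.x1, 2) for a, b in zip(glyphs, glyphs[1:])]


def report(glyphs: List[Glyph]) -> List[str]:
    """One line per glyph: character, size, and gap to the following glyph."""
    gaps: List[Optional[float]] = [*gaps_between(glyphs), None]
    return [f"{g.text!r} size={g.size:.2f} gap_to_next={gap}"
            for g, gap in zip(glyphs, gaps)]


def load_glyphs(path: str, page_number: int = 0) -> List[Glyph]:
    """Collect every character on one (zero-based) page using pdfminer.six."""
    # Imported lazily so the helpers above work even without pdfminer installed.
    from pdfminer.high_level import extract_pages
    from pdfminer.layout import LTChar, LTContainer

    def walk(obj) -> Iterator[LTChar]:
        # Descend through text boxes, lines and figures down to single characters.
        if isinstance(obj, LTChar):
            yield obj
        elif isinstance(obj, LTContainer):
            for child in obj:
                yield from walk(child)

    glyphs: List[Glyph] = []
    for page in extract_pages(path, page_numbers=[page_number]):
        for char in walk(page):
            glyphs.append(Glyph(char.get_text(), char.size, char.x0, char.x1))
    return glyphs
```

For example, `print("\n".join(report(load_glyphs("foobar.pdf"))))` (with `foobar.pdf` as a placeholder for the attached file) would list every character on the first page. Gaps are only meaningful between neighbouring glyphs on the same line, but comparing the `size` and `gap_to_next` values around "Foo"/"bar" in the two tables may show whether the scaled instance is positioned differently.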

try67
Community Expert
October 31, 2023

The double-click method is actually a better indication than the Edit Text & Images tool, since that tool uses all kinds of internal algorithms to split the file's contents into editable sections, and it doesn't necessarily reflect the internal structure of the file.

Analyzing this kind of issue can be quite complicated, but if you believe it's a bug, then by all means report it to Adobe: http://www.adobe.com/products/wishform.html