Participating Frequently
January 22, 2020
Question

Different digits are combined into one block in the PDF


Hello dear forum,

As part of a spare-parts online shop project, I have the following problem: position numbers in a PDF that sit in one exact row get combined into a single text block, or into blocks of three, no matter which program I use to create the PDF.

When the online catalogue software "hotspots" the numbers, item numbers are lost because they are read as one series. The clearest example is at the top left of this page, where many position numbers sit on one line:

https://vap.gea.com/stationaryapplication/Data/ArticleConstructionImages/16057-3.PDF

 

Do you have any advice on how to influence this? I just need all numbers as single digits, not combined.

Thank you very much for your tips and answers!

Alex

Greetings ... Letschi


3 replies

try67
Community Expert
January 22, 2020

It's not related to the position on the page, but to the order in which the numbers were added. If you click next to "1270" in the screenshot you provided above, then hold down Shift and press the right arrow, you'll see that the selection moves from that number to 1260 (skipping 1255), then to 1240, then to 1230 (skipping 1200). That's because this is the order in which the application that created the PDF placed those elements on the page. There's really not much you can do about that.
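To illustrate the difference between stored order and visual order, here is a small Python sketch. The word indices and left x-coordinates are the ones printed later in this thread; the tuple layout and variable names are only illustrative, and this is not how Acrobat itself is implemented.

```python
# (word index on the page, text, left x-coordinate), taken from the
# coordinates posted later in this thread.
words = [
    (26, "1270", 19.669174194335938),
    (27, "1260", 64.49247741699219),
    (28, "1240", 84.51318359375),
    (29, "1230", 168.28009033203125),
    (84, "1255", 41.9405517578125),
    (85, "1200", 124.28240966796875),
]

# Order in which the words are reported for the page (by word index).
stored_order = [text for _, text, _ in sorted(words, key=lambda w: w[0])]

# Order a human reads them in, left to right (by x-coordinate).
visual_order = [text for _, text, _ in sorted(words, key=lambda w: w[2])]

print(stored_order)  # ['1270', '1260', '1240', '1230', '1255', '1200']
print(visual_order)  # ['1270', '1255', '1260', '1240', '1200', '1230']
```

The stored order matches the Shift+arrow behaviour described above: 1255 and 1200 are skipped because they come much later in the word sequence.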

Participating Frequently
January 23, 2020

Thanks very much so far 🙂

Just to clarify: it is NOT the hotspotting software that creates the blocks of numbers. We tried every piece of PDF software, from Distiller to Acrobat; what you see at the download link is the original PDF. So I'm confused by these arrays of numbers with no apparent logic behind them.

As a second step, the catalogue software comes into play to identify the numbers within the PDF, and it obviously struggles with the confusingly mixed numbers as well.

So it really comes from Adobe and not from the software tool for the online parts catalogue.

Thanks so much!

try67
Community Expert
January 23, 2020

I did a bit more digging and I believe I found the cause for this error. You see, Acrobat uses an internal logic to decide which words are on the same "line". Basically, they must have the EXACT same Y-value on the page to be considered a part of a single line, and the words in your page do not have that.

I've printed out their exact locations so you could see it for yourself. The first number is the word index on the page, then the word itself, then an array that defines its rectangle. Focus on the second value in that array. You can see it's identical for 1270, 1260, 1240 and 1230 (733.374755859375), but different for 1255 and 1200 (733.359375).

Even though the difference is minuscule, it seems to be enough for Acrobat to decide that they are on different lines. Since you can't change the way Acrobat works, your only solution is to make sure the numbers are placed at the EXACT same vertical location on the page, and that needs to be done in the authoring application.

 

 

26: 1270
19.669174194335938,733.374755859375,36.99198913574219,723.0100708007812

27: 1260
64.49247741699219,733.374755859375,81.81529235839844,723.0100708007812

28: 1240
84.51318359375,733.374755859375,101.83599853515625,723.0100708007812

29: 1230
168.28009033203125,733.374755859375,185.6029052734375,723.0100708007812

84: 1255
41.9405517578125,733.359375,59.26336669921875,722.9946899414062

85: 1200
124.28240966796875,733.359375,141.605224609375,722.9946899414062
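The effect of the exact-match rule can be reproduced with a minimal Python sketch using the six rectangles listed above. This is only an illustration of the idea, not Acrobat's actual algorithm; the function names and the 0.05 pt tolerance are assumptions chosen for the example.

```python
from collections import defaultdict

# (text, left, top, right, bottom) copied from the listing above.
WORDS = [
    ("1270", 19.669174194335938, 733.374755859375, 36.99198913574219, 723.0100708007812),
    ("1260", 64.49247741699219, 733.374755859375, 81.81529235839844, 723.0100708007812),
    ("1240", 84.51318359375, 733.374755859375, 101.83599853515625, 723.0100708007812),
    ("1230", 168.28009033203125, 733.374755859375, 185.6029052734375, 723.0100708007812),
    ("1255", 41.9405517578125, 733.359375, 59.26336669921875, 722.9946899414062),
    ("1200", 124.28240966796875, 733.359375, 141.605224609375, 722.9946899414062),
]


def group_exact(words):
    """Put words on the same line only if their top values are identical."""
    lines = defaultdict(list)
    for word in words:
        lines[word[2]].append(word)
    return [sorted(line, key=lambda w: w[1]) for line in lines.values()]


def group_with_tolerance(words, tol=0.05):
    """Put words on the same line if their top values differ by at most tol.

    tol is a hypothetical threshold in points, chosen for this example.
    """
    lines = []
    for word in sorted(words, key=lambda w: -w[2]):  # top of page first
        if lines and abs(lines[-1][0][2] - word[2]) <= tol:
            lines[-1].append(word)
        else:
            lines.append([word])
    return [sorted(line, key=lambda w: w[1]) for line in lines]


exact = [[w[0] for w in line] for line in group_exact(WORDS)]
tolerant = [[w[0] for w in line] for line in group_with_tolerance(WORDS)]

print(exact)     # [['1270', '1260', '1240', '1230'], ['1255', '1200']]
print(tolerant)  # [['1270', '1255', '1260', '1240', '1200', '1230']]
```

The two top values differ by about 0.015 pt, so an exact comparison yields two "lines", while any tolerance larger than that puts all six numbers on one line in left-to-right order.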

 

 

 

Participating Frequently
January 22, 2020

As Test_Screen_Name said, it is the hotspotting app you use that is grouping the block of numbers.

You could use a different hotspotting app so that the numbers don't get grouped.

Or you could temporarily disable that app to keep the numbers separate while creating the PDF.


Legend
January 22, 2020

In a PDF, the text is just free characters. Apps divide it into words by guesswork. My guess is that your "hotspotting" app is simply guessing differently than you'd like. There is no fix; it's just how things are.
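As a rough sketch of what such guesswork can look like, the Python example below splits positioned characters into words by the horizontal gap between them. The glyph positions and the `max_gap` thresholds are hypothetical values made up for illustration; real applications use their own, usually more elaborate, heuristics.

```python
# Hypothetical glyphs on one baseline: (character, left x, right x).
GLYPHS = [
    ("1", 10.0, 14.0), ("2", 14.2, 18.2), ("0", 18.4, 22.4),
    ("3", 30.0, 34.0), ("4", 34.2, 38.2),
]


def split_into_words(glyphs, max_gap):
    """Start a new word whenever the gap to the previous glyph exceeds max_gap."""
    words, current, prev_right = [], "", None
    for char, left, right in sorted(glyphs, key=lambda g: g[1]):
        if prev_right is not None and left - prev_right > max_gap:
            words.append(current)
            current = ""
        current += char
        prev_right = right
    if current:
        words.append(current)
    return words


narrow = split_into_words(GLYPHS, max_gap=1.0)
wide = split_into_words(GLYPHS, max_gap=10.0)

print(narrow)  # ['120', '34']
print(wide)    # ['12034']
```

The same characters come out as two numbers or as one, depending only on the threshold the reading application happens to use.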

Participating Frequently
January 22, 2020

Thanks a lot for the quick reply. So you think it's the software that can't properly distinguish the different positions? Did you open the online link to the PDF? THANKS!  🙂

Legend
January 22, 2020

I did, and I can't see what you mean. I see strings of numbers all over the page, no single digits. I take issue with your saying "can't distinguish properly". There is no "properly".