
text position

Guest
Nov 05, 2019

How do I calculate the position of the text shown by the Tj operator? Inside my PDF document the text does not follow the order in which it appears on the page. Thank you.

To explain better, this is inside the PDF document:

BT
/TT2 1 Tf
11.9951 0 0 12 110.04 406.3403 Tm
.0009 Tc
(Compone)Tj
ET
q
1 i
110.04 403.1 51.24 14.64 re
W n
BT
11.9951 0 0 12 158.04 406.3403 Tm
0 Tc
(n)Tj
ET
Q
BT
11.9951 0 0 12 110.04 223.5803 Tm
.0004 Tc
(Condition)Tj

 

To rebuild the text, if I follow this order, I obtain: ComponenCondition

But this is not correct: the 't' is missing because it is in another part of the PDF. So I need to know how to calculate the text position in order to rebuild the PDF content. This is what I would like to know. Thanks.

TOPICS
How to

Views

2.9K
Community Expert
Nov 05, 2019

Hi,

 

Please share a screenshot of the issue and provide a little more information about the operating system you are using and your Acrobat Pro version (have updates been applied?).

 

Thank you.

Community Expert
Nov 05, 2019

Have you looked at the PDF Reference?

Community Expert
Nov 05, 2019

Where do you display the 't'?

LEGEND
Nov 05, 2019

It is entirely normal for the text in a PDF to appear out of order. The PDF Reference contains full and complete information on how text is represented in a PDF, and how it is to be displayed. It can be difficult to understand some of it, so if you are facing difficulties with the PDF Reference, please tell us the exact paragraph, page and section number, and tell us the problem (e.g. seems incomplete, contradicts another portion). Please do not skimp on reading all of it.
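As a rough illustration of why stream order and reading order differ, here is a minimal Python sketch. It takes the three fragments from the first post, uses only the translation part (e, f) of each Tm as the start position, and sorts them top to bottom, then left to right. It assumes an identity CTM and horizontal text. The `Fragment` type, the `reading_order` function and the `line_tolerance` value are made up for this example, and the input list is deliberately shuffled.

```python
from typing import NamedTuple


class Fragment(NamedTuple):
    text: str
    x: float  # e component of the Tm operands
    y: float  # f component of the Tm operands


def reading_order(fragments, line_tolerance=2.0):
    """Group fragments into lines (top to bottom), each sorted left to right.

    Baselines closer than line_tolerance count as the same line. This is a
    heuristic chosen for the example, not something the PDF format defines.
    """
    lines = []
    for frag in sorted(fragments, key=lambda f: -f.y):
        if lines and abs(lines[-1][0].y - frag.y) < line_tolerance:
            lines[-1].append(frag)
        else:
            lines.append([frag])
    return [sorted(line, key=lambda f: f.x) for line in lines]


# Positions copied from the Tm operands in the first post, listed in a
# deliberately shuffled order to show that stream order does not matter.
fragments = [
    Fragment("Condition", 110.04, 223.5803),
    Fragment("n", 158.04, 406.3403),
    Fragment("Compone", 110.04, 406.3403),
]

for line in reading_order(fragments):
    print("".join(f.text for f in line))
```

A missing 't' drawn elsewhere in the stream would slot into the first line in the same way, as long as its Tm puts it on the same baseline to the right of the 'n'.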

Guest
Nov 06, 2019

I think the section of interest to me is 5.3.3 of the PDF Reference 1.7, Text Space Details.

In particular, given this example:

BT
/TT3 1 Tf
10.016 0 0 10.02 108.96 635.4203 Tm
<0372>Tj
/TT4 1 Tf
.2995 0 TD
.0023 Tc
(20)Tj
/TT3 1 Tf
1.0124 0 TD
0 Tc
<0003>Tj
/TT4 1 Tf
1.6294 .6048 TD
.0002 Tc
[(.Ur)13.4(e)]TJ
ET
q
1 i
142.62 638.78 18.66 12.24 re
W n
BT
10.016 0 0 10.02 157.38 641.4803 Tm
0 Tc
(a)Tj
ET

 

Is it possible to show me how to find the position of the text elements? Or any other example... I don't understand how to go from the theory written in that section to practice. Thanks.

LEGEND
Nov 06, 2019

I suggest this strategy.

 

1. Make sure you understand exactly how to work with transformation matrices. You don't have to understand their effect, but you do have to understand all the mathematical operators. You also need specifically to understand how the current text matrix defines exactly the position of every character on the page; there is no other definition of position than the text matrix.

2. Understand the role of coordinate spaces, and know how a transformation matrix can go between them.

3. Know all of the features of text space, and how they are set. You do not need to try to understand their effect, only their definitions and corresponding operators. Writing mode is important; this is just whether text is laid out horizontally (as normal) or in vertical lines (as in some languages). You need to know the definitions even if you will never work with vertical text.

4. Be sure you fully understand about character sets/encodings in general, and how they are used in PDF specifically. THE TEXT CHARACTERS MUST NOT BE JUST COPIED FROM STRINGS.

 

Now, you are ready to work through "Text space details". This is an exercise in simple arithmetic, NOT in understanding fonts or text. When people get stuck, I find it is because they are trying to use what they believe they know about fonts and text. Assume NOTHING and follow EVERY part of the maths exactly. You don't have to know what Trm means, or worry about where it is stored, just calculate and use it.

 

Finally you will be able to say the exact position of every piece of text on the page, and what glyph it represents. What you do with this information is up to you. Some people sort the information into a positional order. Extracting text does require fuzzy logic. IT IS NOT AN EXACT SCIENCE, unlike getting the position of each glyph, which is exact. Displaying text is also exact.
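To show what "just follow the maths" can look like, here is a minimal Python sketch that tracks the text matrix (Tm) and text line matrix (Tlm) through the operators in the example posted above. It handles only BT, Tm, Td/TD and the show operators, assumes the CTM is the identity, and does not advance Tm after showing glyphs (that needs the font widths), so each reported position is the origin of the first glyph of a string. The tuple-based operator list and the function names are invented for this example.

```python
IDENTITY = [1, 0, 0, 1, 0, 0]


def matmul(m, n):
    """Multiply two PDF matrices [a b c d e f] (row-vector convention): m x n."""
    a1, b1, c1, d1, e1, f1 = m
    a2, b2, c2, d2, e2, f2 = n
    return [
        a1 * a2 + b1 * c2,
        a1 * b2 + b1 * d2,
        c1 * a2 + d1 * c2,
        c1 * b2 + d1 * d2,
        e1 * a2 + f1 * c2 + e2,
        e1 * b2 + f1 * d2 + f2,
    ]


def text_origins(ops):
    """Return (string, x, y) for the start of every Tj/TJ in ops."""
    tm = tlm = IDENTITY
    origins = []
    for operator, operands in ops:
        if operator == "BT":
            tm = tlm = IDENTITY  # BT resets both matrices
        elif operator == "Tm":
            tm = tlm = list(operands)  # Tm replaces both matrices
        elif operator in ("Td", "TD"):
            tx, ty = operands
            # The offset is relative to the start of the current line (Tlm).
            tm = tlm = matmul([1, 0, 0, 1, tx, ty], tlm)
        elif operator in ("Tj", "TJ"):
            origins.append((operands, tm[4], tm[5]))
    return origins


# Operators from the example above; Tf, Tc and the clipping path are omitted
# because they do not change where a string starts.
ops = [
    ("BT", ()),
    ("Tm", (10.016, 0, 0, 10.02, 108.96, 635.4203)),
    ("Tj", "<0372>"),
    ("TD", (0.2995, 0)),
    ("Tj", "(20)"),
    ("TD", (1.0124, 0)),
    ("Tj", "<0003>"),
    ("TD", (1.6294, 0.6048)),
    ("TJ", "[(.Ur)13.4(e)]"),
    ("BT", ()),
    ("Tm", (10.016, 0, 0, 10.02, 157.38, 641.4803)),
    ("Tj", "(a)"),
]

for string, x, y in text_origins(ops):
    print(f"{string:16} x={x:.4f} y={y:.4f}")
```

On this data the [(.Ur)13.4(e)] string and the final (a) come out on practically the same baseline (y close to 641.48), which suggests they belong to the same line on the page even though they sit in different BT/ET blocks.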

Guest
Nov 07, 2019

Sorry, I have 2 questions:

Where can I find the value for W0?

And I have not managed to find the meaning of the following: "/TT2 1 Tf"... I understand that 1 is the font size, but /TT2??

Thank you.

Community Expert
Nov 07, 2019

What do you mean by W0?

LEGEND
Nov 07, 2019

I think you have skipped a great deal of the PDF Reference. This is a Resource reference. "Set the text font, Tf, to font and the text font size, Tfs, to size. font shall be the name of a font resource in the Font subdictionary of the current resource dictionary;" If you don't know about Font Resources, you probably also don't know about font encoding and many other vital things. Read it all, we can't do that for you.
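As a minimal sketch of what that quoted sentence means in practice: the name after the slash is just a key into the /Font subdictionary of the current resource dictionary. The Python below models the resource dictionary as a plain dict. The font entry, the `TextState` class and the `apply_tf` function are hypothetical stand-ins; in a real file the values are indirect objects that a PDF parser has to resolve.

```python
class TextState:
    """Holds the two text state parameters that the Tf operator sets."""

    def __init__(self):
        self.font = None       # Tf: the resolved font dictionary
        self.font_size = None  # Tfs: the size operand


def apply_tf(state, resources, name, size):
    """Handle '/name size Tf' against the current resource dictionary."""
    fonts = resources.get("Font", {})
    if name not in fonts:
        raise KeyError(f"font resource /{name} not found in /Font")
    state.font = fonts[name]
    state.font_size = size
    return state


# Hypothetical resource dictionary; the BaseFont value is a placeholder,
# not taken from the poster's file.
page_resources = {
    "Font": {
        "TT2": {"Type": "Font", "Subtype": "TrueType", "BaseFont": "ExampleFont"},
    }
}

state = apply_tf(TextState(), page_resources, "TT2", 1)
print(state.font["BaseFont"], state.font_size)
```

With a size of 1, as in "/TT2 1 Tf", the visible size comes from the scaling in the text matrix (for example the 11.9951 and 12 in the Tm operands).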

Guest
Nov 07, 2019

I've found the reference to the Tf TTx in my PDF.

In one case, in particular, I have this:

<</BaseFont/DPKECP+Calibri/Encoding/WinAnsiEncoding/FirstChar 37/FontDescriptor 78 0 R/LastChar 148/Subtype
/TrueType/Type/Font/Widths[715 0 0 303 303 0 498 0 0 252 0 507 507 507 507 507 507 507 0 0 0 0 0 0 0 0
463 0 579 0 533 615 488 459 631 623 252 0 0 420 0 646 0 517 0 543 459 487 642 567 0 0 0 0 0 0 0 0 0 0
479 525 423 525 498 305 471 525 229 0 455 229 799 525 527 525 525 349 391 335 525 452 715 433 453 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 418 418]>>

My question is: the first char corresponds to the ASCII %, so the width for % is 715, and the last char, 148, is ö, whose width is 418... or is the corresponding character different? Thank you.

LEGEND
Nov 07, 2019

You ask "My question is: the first char corresponds to the ASCII %, so the width for % is 715, and the last char, 148, is ö, whose width is 418... or is the corresponding character different? Thank you."

I will repeat "4. Be sure you fully understand about character sets/encodings in general, and how they are used in PDF specifically. THE TEXT CHARACTERS MUST NOT BE JUST COPIED FROM STRINGS." This means you cannot assume character numbers are ASCII at all. [You should also know that ASCII codes stop at 127]. Every number must be processed through the Encoding or CMap (according to the font type), or the default Encoding derived as documented. You will also need a table of the widths of the base fonts, since Widths need not be present in this case.
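To make the point about code 148 concrete, here is a small Python sketch using the font dictionary posted above. It assumes Python's cp1252 codec is a close enough stand-in for WinAnsiEncoding at these particular codes; the two tables are not identical everywhere, so a real extractor should use the table in the PDF Reference and apply any /Differences array. The function names are made up for this example.

```python
import unicodedata

FIRST_CHAR = 37
LAST_CHAR = 148
# /Widths array from the posted font dictionary, with zero runs written as repeats.
WIDTHS = (
    [715, 0, 0, 303, 303, 0, 498, 0, 0, 252, 0] + [507] * 7 + [0] * 8
    + [463, 0, 579, 0, 533, 615, 488, 459, 631, 623, 252, 0, 0, 420, 0, 646,
       0, 517, 0, 543, 459, 487, 642, 567] + [0] * 10
    + [479, 525, 423, 525, 498, 305, 471, 525, 229, 0, 455, 229, 799, 525,
       527, 525, 525, 349, 391, 335, 525, 452, 715, 433, 453, 0, 0]
    + [0] * 23 + [418, 418]
)


def width_for_code(code):
    """Width in glyph-space units (1/1000 of a text-space unit) for a code."""
    if FIRST_CHAR <= code <= LAST_CHAR:
        return WIDTHS[code - FIRST_CHAR]
    return None  # a real reader would fall back to /MissingWidth here


def char_for_code(code):
    """Approximate WinAnsiEncoding with cp1252 (an assumption of this sketch)."""
    return bytes([code]).decode("cp1252")


for code in (37, 110, 148):
    ch = char_for_code(code)
    print(code, repr(ch), unicodedata.name(ch), width_for_code(code))
```

Under that assumption, code 148 is the right double quotation mark, not ö, and the index into /Widths is simply the code minus FirstChar.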

Guest
Nov 13, 2019

But for a font that is, let me call it, standard (no CMap and not Unicode), how can I determine the width of each glyph?

LEGEND
Nov 14, 2019

If a font contains /Widths you must use it, but it may be absent for the base fonts. You need to get hold of the widths for the standard (base 14) fonts from somewhere, or generate test files to calculate them.
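Once the widths are known, they feed the horizontal displacement formula from "Text space details" (section 5.3.3 of the PDF Reference 1.7), where w0 is the glyph width divided by 1000. The Python sketch below is only an illustration for horizontal writing mode with a simple single-byte font. The three-entry width table reuses the a, b, c widths from the /Widths array posted earlier, and the function names are invented.

```python
def horizontal_advance(w0, tfs, tc=0.0, tw=0.0, th=1.0, tj=0.0):
    """tx = ((w0 - Tj/1000) * Tfs + Tc + Tw) * Th, in text-space units."""
    return ((w0 - tj / 1000.0) * tfs + tc + tw) * th


def string_advance(codes, widths, first_char, tfs, tc=0.0, tw=0.0, th=1.0):
    """Sum the advance of every character code in a shown string."""
    total = 0.0
    for code in codes:
        w0 = widths[code - first_char] / 1000.0  # glyph space -> text space
        word_space = tw if code == 32 else 0.0   # Tw applies to code 32 only
        total += horizontal_advance(w0, tfs, tc, word_space, th)
    return total


# Widths for codes 97..99 ('a', 'b', 'c') taken from the posted /Widths array.
first_char = 97
widths = [479, 525, 423]

# '/TTx 1 Tf' gives Tfs = 1; '.0009 Tc' gives Tc = 0.0009.
tx = string_advance(b"abc", widths, first_char, tfs=1, tc=0.0009)
print("advance in text space:", tx)

# With a text matrix such as '11.9951 0 0 12 e f Tm' and no rotation, the
# distance on the page is tx scaled by the matrix's first element.
print("advance on the page:", tx * 11.9951)
```

After each shown string the text matrix is updated by this displacement, which is how the start of the next glyph is obtained without another Tm or TD.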
