
How to get the character '±' correctly?

New Here, Feb 06, 2023

I use the function 'AVDocGetPageText' to get the '±' character from the page, but it keeps returning '?'.

My code is as follows:

PDTextSelect textSelect = PDDocCreateTextSelect(pdDoc, 0, &rect);
bool ret = AVDocSetSelection(avDoc, ASAtomFromString("Text"), textSelect,true);
ASAtom format = ASAtomFromString("Text");
string title = "";
AVDocGetPageText(m_avDoc, vecPages[i], textSelect, format, TextSelectProc, &title);

 

void TextSelectProc(ASAtom format, void *buf, AVTBufferSize bufLen, void *clientData)
{
    // Inspecting the memory at *buf shows 3f where '±' is expected
}
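
One way to inspect the buffer without a debugger is to dump it as hex from inside the callback. The helper below is a minimal sketch that does not depend on the SDK: `hexDump` is a hypothetical name, not an Acrobat SDK function, and it assumes `buf` and `bufLen` can be passed along as a plain pointer and byte count.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper (not part of the Acrobat SDK): formats the bytes of a
// buffer as space-separated lowercase hex pairs, e.g. "4d 65 61 6e".
std::string hexDump(const void* buf, std::size_t len) {
    assert(buf != nullptr || len == 0);
    const unsigned char* bytes = static_cast<const unsigned char*>(buf);
    std::string out;
    char pair[4];
    for (std::size_t i = 0; i < len; ++i) {
        // Two hex digits per byte, separated by single spaces.
        std::snprintf(pair, sizeof pair, "%02x", static_cast<unsigned>(bytes[i]));
        if (i != 0) out += ' ';
        out += pair;
    }
    return out;
}
```

Logging `hexDump(buf, bufLen)` on every call of TextSelectProc would show whether 3f appears in each buffer the callback receives.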

TOPICS
Acrobat SDK and JavaScript , Windows

Community Expert, Feb 06, 2023

Have you tested with other selections to ensure that other selected text is returned correctly? Did you check the entire buffer to ensure that ASCII text is returned? Have you checked the same character codes using a different method?

 

 

Thom Parker - Software Developer at PDFScripting
Use the Acrobat JavaScript Reference early and often

New Here, Feb 06, 2023

Thank you for your reply. Please see the attached file.

The memory returned is shown below:

[Image: 28276054uy7c_0-1675731382531.png]

The character returned by AVDocGetPageText is incorrect.

Community Expert, Feb 06, 2023

It looks like several special characters may be off. I have no idea why.

 


Enthusiast, Feb 06, 2023

You have chosen "Text" as the format, so TextSelectProc is called twice for your selection. Have you verified that the '±' character is turned into the replacement character on both calls?
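
To check the two calls separately, the callback can branch on the format it receives and keep each buffer. The sketch below avoids the SDK headers so that it compiles on its own: `CopiedText`, `decodeUtf16BE` and `onTextCopied` are hypothetical names, and the format is passed as a plain string rather than an ASAtom. The name "UCSText" for the second call and the big-endian UTF-16 byte order are assumptions that should be verified against the SDK documentation and an actual memory dump.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical container for what the two callback invocations deliver.
struct CopiedText {
    std::string platformText;    // bytes from the platform-encoded call
    std::u16string unicodeText;  // code units from the Unicode call
};

// Decodes big-endian UTF-16 bytes into code units, skipping a leading
// FE FF byte-order mark if present. The byte order is an assumption.
std::u16string decodeUtf16BE(const void* buf, std::size_t len) {
    assert(buf != nullptr || len == 0);
    const unsigned char* p = static_cast<const unsigned char*>(buf);
    std::size_t i = (len >= 2 && p[0] == 0xFE && p[1] == 0xFF) ? 2 : 0;
    std::u16string out;
    for (; i + 1 < len; i += 2) {
        out.push_back(static_cast<char16_t>((p[i] << 8) | p[i + 1]));
    }
    return out;
}

// Same shape as TextSelectProc, but with the format given by name so the
// sketch builds without the SDK headers.
void onTextCopied(const std::string& formatName, const void* buf,
                  std::size_t bufLen, void* clientData) {
    CopiedText* out = static_cast<CopiedText*>(clientData);
    if (formatName == "UCSText") {      // assumed name of the Unicode format
        out->unicodeText = decodeUtf16BE(buf, bufLen);
    } else if (formatName == "Text") {  // platform-encoded bytes
        out->platformText.assign(static_cast<const char*>(buf), bufLen);
    }
}
```

In a real plug-in the format name would presumably come from `ASAtomGetString(format)`. If the platform-encoded buffer holds 3f but the Unicode buffer holds 00 b1, the character is only lost in the conversion to the platform encoding.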

New Here, Feb 07, 2023

Yes, I tried it twice, and it returned 0x3f

LEGEND, Feb 07, 2023

Maybe there is a problem with the encoding inside the PDF. Are you able to copy and paste the text, including the plus-minus character, into other apps? Maybe you can share a PDF.

Community Expert, Feb 09, 2023

Edited reply: corrected a typo (the hex value 1B should read B1).

 

Hi,

Adding to the topic:

 

The hexadecimal value 3f equals 63 in decimal, which is the code for the question mark "?" in both ASCII and Unicode (typing ALT+63 produces it).

 

My guess is that when you produced your document, you copied the expression "Mean±SEM" from another source (e.g. a web page or a text reference) and pasted it directly into the PDF document that you created in Acrobat.

 

The code that you are using does not seem to be the problem.

I assume this because I couldn't help but notice that you wrote in your document:

 

  • Mean±SEM

 

The "n" character (in the text string "Mean") seems to touch the minus sign portion of the plus-or-minus symbol.

 

It almost looks as if two glyphs were combined, which may indicate that you copied the plus-or-minus Unicode symbol from another source, and that could be the reason why the question mark is mapped instead of the plus-or-minus symbol.

 

However, if we test this assumption in reverse and copy the string "Mean±SEM" (taken from your final document) and paste it into another document, it shows:

 

  • Mean ±SEM  (with a space after the word "Mean")

 

That shows the correct expression if you were using it with an actual formula with numbers.

 

But for the sake of proper Unicode character mapping, you should type the symbol manually using ALT+0177 (hold the ALT key while typing the number on the numeric keypad).

 

Thus, you can test by manually typing in the whole expression in your PDF document as:

 

  • Mean ±SEM (where the plus-or-minus symbol is typed using ALT+0177)

  • In addition, note the blank space that keeps the symbol and the letters from sitting too close to each other, which may avoid a glyph misinterpretation. I may be wrong, though; this is not my area.

 

Anyway, to be sure, I would also add a space before the string "SEM", like so:

 

  • Mean ± SEM

 

Try that, and then execute the script normally.

 

When you inspect the memory again, you should see B1 where the hexadecimal value 3f was.

 

B1 is 177 in decimal, so the memory view should display ".(.M.e.a.n. . ±. .S.E.M.)."
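
As a cross-check on these values: the plus-or-minus sign is U+00B1, so the bytes to look for depend on the encoding of the buffer. The sketch below lists them; `encodeUtf8` and the constant names are hypothetical, and the helper only covers code points below U+0800, which is enough for this character.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: UTF-8 encoding for code points below U+0800 only.
std::string encodeUtf8(char32_t cp) {
    assert(cp < 0x800);
    std::string out;
    if (cp < 0x80) {
        out.push_back(static_cast<char>(cp));                  // one byte
    } else {
        out.push_back(static_cast<char>(0xC0 | (cp >> 6)));    // lead byte
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));  // continuation
    }
    return out;
}

// U+00B1 PLUS-MINUS SIGN in the encodings a text buffer might use.
const char32_t kPlusMinus = 0x00B1;              // decimal 177
const unsigned char kLatin1Byte = 0xB1;          // ISO 8859-1 / Windows-1252
const unsigned char kUtf16BE[2] = {0x00, 0xB1};  // appears as ".±" in a memory view
```

In UTF-8 the character would appear as c2 b1. A 3f byte is a literal question mark in all of these encodings, which suggests the substitution happened before the buffer reached the callback.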

 

See the screenshots:

[Image: Screenshot_20230210_002039_Epic.png]

[Image: Screenshot_20230210_002018_Epic.png]

 
