I use the function 'AVDocGetPageText' to get the '±' character on the page, but it keeps returning '?'.
My code is as follows:
PDTextSelect textSelect = PDDocCreateTextSelect(pdDoc, 0, &rect);
bool ret = AVDocSetSelection(avDoc, ASAtomFromString("Text"), textSelect,true);
ASAtom format = ASAtomFromString("Text");
string title = "";
AVDocGetPageText(m_avDoc, vecPages[i], textSelect, format, TextSelectProc, &title);
void TextSelectProc(ASAtom format, void *buf, AVTBufferSize bufLen, void *clientData)
// Look at the memory of *buf and it returns 3f
Have you tested with other selections to ensure that other selected text is returned correctly? Did you check the entire buffer to ensure that ASCII text is returned? Have you checked the same character codes using a different method?
Looks like several special characters may be off. Have no idea.
You have chosen "Text" as format. Thus, TextSelectProc is called twice for your selection. Have you verified that on both calls the '±' character is transformed to the replacement character?
Yes, I tried it twice, and it returned 0x3f
Maybe there is a problem with the encoding inside the PDF. Are you able to copy/paste the text, including the plus-minus character, into other apps? Maybe you can share a PDF.
+ EDITED REPLY: corrected a typo, the hex value 1B should read B1
++ Adding to the topic,
The hexadecimal value 3f equals 63 in decimal. Typing the ALT+63 keyboard combo produces the question mark "?", which has the same code in ASCII and Unicode.
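To check which bytes actually arrive in the callback, a small helper can dump the buffer as hex and count the 3f bytes. This is only a minimal sketch in plain C++: hexDump and countByte are hypothetical helper names, not part of the Acrobat SDK, and it assumes the buffer can be read as bufLen raw bytes.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: render a raw buffer as space-separated
// lowercase hex bytes, e.g. "4d 65 61 6e".
std::string hexDump(const void *buf, std::size_t len) {
    const unsigned char *bytes = static_cast<const unsigned char *>(buf);
    std::string out;
    char tmp[4];
    for (std::size_t i = 0; i < len; ++i) {
        std::snprintf(tmp, sizeof tmp, "%02x", static_cast<unsigned>(bytes[i]));
        if (i != 0) out += ' ';
        out += tmp;
    }
    return out;
}

// Hypothetical helper: count how many bytes in the buffer equal `value`.
// countByte(buf, len, 0x3f) tells you how many '?' bytes came back.
std::size_t countByte(const void *buf, std::size_t len, unsigned char value) {
    const unsigned char *bytes = static_cast<const unsigned char *>(buf);
    std::size_t n = 0;
    for (std::size_t i = 0; i < len; ++i) {
        if (bytes[i] == value) ++n;
    }
    return n;
}
```

Inside TextSelectProc you could call something like hexDump(buf, bufLen) on every invocation and log the result. If 3f shows up where you expect the '±', the substitution has already happened before your callback sees the text.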
My guess is that when you produced your document, you copied the equation "Mean±SEM" from another source (i.e. web or text reference) and pasted it directly in the Acrobat PDF document that you created.
I assume this because I couldn't help but notice that you expressed in your document:
The "n" character (in the text string "Mean") seems to touch the minus sign portion of the plus-or-minus symbol.
It almost looks as if two glyphs were combined together, which may indicate that you copied the plus-or-minus Unicode symbol from another source, and could be the reason why the question mark symbol is mapped instead of the plus-or-minus symbol.
However, if we test this assumption backwards and copy the string "Mean±SEM" (taken from your final document) and paste it in another document, it shows:
That shows the correct expression if you were using it with an actual formula with numbers.
But for the sake of proper Unicode character mapping, you should manually type the plus-or-minus symbol (Unicode U+00B1) using ALT+0177 (hold the ALT key while typing the number on the numeric keypad).
Thus, you can test by manually typing in the whole expression in your PDF document as:
Anyway, to be sure I would also add a space before the string "SEM", like so:
Try that, and then execute the script normally.
When you analyze the memory again, you should see that the hexadecimal value 3f is now B1.
B1 is 177 in decimal, and the memory view should then display ".(.M.e.a.n. . ±. .S.E.M.)."
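Note that B1 will only appear as a single byte if the text comes back in a one-byte encoding such as Latin-1. The dots in the memory view above may suggest two bytes per character, but the thread does not confirm which encoding the "Text" format uses, so treat that as an assumption. As a rough sketch, here is how U+00B1 looks in UTF-8 and UTF-16BE; encodeUtf8 and encodeUtf16BE are hypothetical helpers written for illustration, not SDK calls.

```cpp
#include <string>

// Hypothetical helper: encode one Unicode code point as UTF-8 bytes.
std::string encodeUtf8(char32_t cp) {
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Hypothetical helper: encode one code point as big-endian UTF-16.
// Only handles the Basic Multilingual Plane (cp < 0x10000, no
// surrogate pairs), which is enough for U+00B1.
std::string encodeUtf16BE(char32_t cp) {
    std::string out;
    out += static_cast<char>((cp >> 8) & 0xFF);  // high byte first
    out += static_cast<char>(cp & 0xFF);         // then low byte
    return out;
}
```

With these, encodeUtf8(0x00B1) gives the two bytes C2 B1, and encodeUtf16BE(0x00B1) gives 00 B1. So depending on the encoding, a correct '±' in the buffer may show up as B1, C2 B1 or 00 B1, while a 3f in that position means the character was replaced.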