I use the function 'AVDocGetPageText' to get the '±' character from the page, but it keeps returning '?'.
My code is as follows:
PDTextSelect textSelect = PDDocCreateTextSelect(pdDoc, 0, &rect);
bool ret = AVDocSetSelection(avDoc, ASAtomFromString("Text"), textSelect,true);
ASAtom format = ASAtomFromString("Text");
string title = "";
AVDocGetPageText(m_avDoc, vecPages[i], textSelect, format, TextSelectProc, &title);
void TextSelectProc(ASAtom format, void *buf, AVTBufferSize bufLen, void *clientData)
{
// Inspecting the memory at buf shows the byte 0x3f ('?') where '±' is expected
}
Have you tested with other selections to ensure that other selected text is returned correctly? Did you check the entire buffer to ensure that ASCII text is returned? Have you checked the same character codes using a different method?
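To check the entire buffer rather than only its first byte, a small hex dump helper can be called from inside the callback. This is only a sketch: 'HexDump' is a hypothetical helper name, not part of the Acrobat SDK, and it assumes the callback's buf and bufLen can be passed through as a pointer and a byte count.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper (not an Acrobat SDK function): formats every byte
// of a buffer as two-digit lowercase hex, separated by single spaces.
std::string HexDump(const void *buf, std::size_t len)
{
    const unsigned char *bytes = static_cast<const unsigned char *>(buf);
    std::string out;
    char tmp[4];
    for (std::size_t i = 0; i < len; ++i) {
        std::snprintf(tmp, sizeof(tmp), "%02x", static_cast<unsigned>(bytes[i]));
        if (i != 0) {
            out += ' ';
        }
        out += tmp;
    }
    return out;
}
```

Inside TextSelectProc this could be called as HexDump(buf, bufLen) and the result logked to a file or debugger output. If the neighbouring letters come back correctly and only the '±' position holds 3f, that suggests the character was already replaced before the callback received the buffer.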
Looks like several special characters may be off. Have no idea.
You have chosen "Text" as format. Thus, TextSelectProc is called twice for your selection. Have you verified that on both calls the '±' character is transformed to the replacement character?
Yes, I tried it twice, and it returned 0x3f both times.
Maybe there is a problem with the encoding inside the PDF. Are you able to copy and paste the text, including the plus-minus character, into other apps? Maybe you can share a PDF.
EDITED REPLY: corrected a typo; the hex value 1B should read B1.
Hi,
Adding to the topic:
The hexadecimal value 3f equals 63 in decimal. Typing ALT+63 on the numeric keypad produces the question mark "?", which has the same code in ASCII and Unicode.
My guess is that when you produced your document, you copied the expression "Mean±SEM" from another source (e.g. a web page or text reference) and pasted it directly into the Acrobat PDF document that you created.
The code that you are using does not seem to be the problem.
I assume this because I couldn't help but notice the following in your document:
The "n" character (in the text string "Mean") seems to touch the minus-sign portion of the plus-or-minus symbol.
It almost looks as if two glyphs were combined, which may indicate that you copied the plus-or-minus Unicode symbol from another source, and that could be the reason why the question mark is mapped instead of the plus-or-minus symbol.
However, if we test this assumption backwards and copy the string "Mean±SEM" (taken from your final document) and paste it in another document, it shows:
That shows the correct expression if you were using it with an actual formula with numbers.
But for the sake of proper Unicode character mapping, you should type the symbol manually using ALT+0177 (hold down the ALT key while typing the number on the numeric keypad).
Thus, you can test by manually typing in the whole expression in your PDF document as:
Anyway, to be sure I would also add a space before the string "SEM", like so:
Try that, and then run your code as usual.
When you inspect the memory again, the hexadecimal value 3f should now be B1.
B1 is 177 in decimal, and the memory view should display ".(.M.e.a.n. . ±. .S.E.M.)."
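As a reference for reading the memory view, here is a small self-contained sketch of how the plus-or-minus sign (U+00B1, decimal 177) and the question mark (U+003F, decimal 63) look as bytes. The helper names are made up for illustration, and which encoding AVDocGetPageText actually uses for its buffer is not confirmed in this thread, so treat the byte patterns as things to compare against, not as a statement of what the SDK returns.

```cpp
#include <cstdint>
#include <vector>

// Illustrative helpers only; the names are hypothetical and not from any SDK.

constexpr std::uint32_t kPlusMinus = 0xB1;    // '±'
constexpr std::uint32_t kQuestionMark = 0x3F; // '?'

static_assert(kPlusMinus == 177, "0xB1 is 177 in decimal");
static_assert(kQuestionMark == 63, "0x3F is 63 in decimal");

// Encode a code point as UTF-8. Only valid for code points below U+0800,
// which covers both characters discussed here.
std::vector<std::uint8_t> ToUtf8(std::uint32_t cp)
{
    if (cp < 0x80) {
        return { static_cast<std::uint8_t>(cp) };
    }
    // Two-byte form: 110xxxxx 10xxxxxx
    return { static_cast<std::uint8_t>(0xC0 | (cp >> 6)),
             static_cast<std::uint8_t>(0x80 | (cp & 0x3F)) };
}

// Encode a Basic Multilingual Plane code point as UTF-16, big-endian bytes.
std::vector<std::uint8_t> ToUtf16BE(std::uint32_t cp)
{
    return { static_cast<std::uint8_t>(cp >> 8),
             static_cast<std::uint8_t>(cp & 0xFF) };
}
```

With these, ToUtf8(kPlusMinus) gives the bytes C2 B1 and ToUtf16BE(kPlusMinus) gives 00 B1, while the question mark is the single byte 3F in UTF-8. In Latin-1 and Windows-1252 the plus-or-minus sign is the single byte B1, which is why ALT+0177 produces it.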
See slides: