
How to get the coordinates of a word character?

Explorer, Dec 20, 2019


Hello from St. Petersburg.

I need to extract the text from a page of a PDF document in the order in which it is displayed on the screen. I read the words from the page sequentially, and then I want to sort the characters I receive into the desired order. For this I use the PDWordGetCharQuad method, which should return a character's quad in user-space coordinates. However, for all characters of one word, PDWordGetCharQuad returns a quad with the same coordinate values. Why is that?

I would be grateful for your help.
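
For readers following along, the sorting step described in the question could look roughly like the sketch below. It does not use the Acrobat SDK: CharBox is a hypothetical stand-in for a character plus the top-left corner of its quad, already converted to floating point, and sort_reading_order is an illustrative name. It assumes the default PDF user space, where the vertical coordinate grows upward, and it compares vertical positions exactly; real pages would probably need characters grouped into lines with some tolerance first.

```c
#include <stdlib.h>

/* Hypothetical stand-in for one extracted character: the character byte
   plus the top-left corner of its quad in user-space units. */
typedef struct {
    char   ch;
    double left;
    double top;
} CharBox;

/* Reading order: higher on the page first (larger top, because the
   vertical axis is assumed to grow upward), then left to right. */
int compare_reading_order(const void *a, const void *b)
{
    const CharBox *x = a;
    const CharBox *y = b;
    if (x->top > y->top) return -1;   /* x is above y */
    if (x->top < y->top) return 1;    /* x is below y */
    if (x->left < y->left) return -1; /* same height: leftmost first */
    if (x->left > y->left) return 1;
    return 0;
}

/* Sorts the characters in place into reading order. */
void sort_reading_order(CharBox *boxes, size_t count)
{
    qsort(boxes, count, sizeof(CharBox), compare_reading_order);
}
```

A sort like this is only meaningful if every character has its own quad; when all characters of a word report identical coordinates, they compare as equal and their relative order after qsort is unspecified.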

TOPICS
Acrobat SDK and JavaScript

Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
LEGEND, Dec 20, 2019


This is not expected. Does this Nth Word have any normal characters?

Explorer, Dec 23, 2019


This word has normal characters.

The most surprising thing is that I created the PDF file from scratch. I wrote several lines into it and checked which quads the PDWordGetCharQuad method returns. It turned out that for all characters of one word the coordinates of the quad are the same. And this is true for all words.

Here is the code with which I got these results.

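ACCB1 ASBool ACCB2 wordEnumerator(PDWordFinder wObj, PDWord pdWord, ASInt32 pgNum, void* clientData)
{
	char str[128];
	PDWordGetString(pdWord, str, sizeof(str));

	/* Append to the file: "w+b" truncated it on every call, so only the last word survived. */
	FILE* pOutput = fopen("1.txt", "ab");
	if (pOutput == NULL)
		return false;	/* halt the enumeration if the file cannot be opened */

	ASFixedQuad quad;
	/* Stay inside str even if the word is longer than the buffer. */
	for (int i = 0; i < PDWordGetLength(pdWord) && i < (int)sizeof(str) - 1; i++) {
		ASBool b = PDWordGetCharQuad(pdWord, i, &quad);	/* return value is not checked here */
		/* The coordinates are ASFixed values and are printed as raw integers. */
		fprintf(pOutput, "%c %d, %d   %d, %d   %d, %d   %d, %d\n",
			str[i], quad.tl.h, quad.tl.v, quad.tr.h, quad.tr.v, quad.bl.h, quad.bl.v, quad.br.h, quad.br.v);
	}
	fclose(pOutput);
	return true;
}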
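
The numbers printed by this callback are raw ASFixed values, which can make the quads hard to compare by eye. The sketch below shows one way to turn them into user-space units. It does not use the Acrobat SDK: Fixed16_16 is a hypothetical stand-in for ASFixed, and the code assumes the value is a signed 32-bit integer in 16.16 fixed-point format, so that 65536 represents 1.0. The function names are illustrative only.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for ASFixed, assumed to be signed 16.16 fixed point. */
typedef int32_t Fixed16_16;

/* Converts a 16.16 fixed-point value to a double (65536 -> 1.0). */
double fixed_to_double(Fixed16_16 value)
{
    return (double)value / 65536.0;
}

/* Writes one quad corner as "(h, v)" in user-space units. */
void print_corner(FILE *out, Fixed16_16 h, Fixed16_16 v)
{
    fprintf(out, "(%.2f, %.2f)", fixed_to_double(h), fixed_to_double(v));
}
```

Under that assumption, a raw value of 4718592 corresponds to 72.0 user-space units, which is one inch at the default scale.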

 

LEGEND, Dec 24, 2019


Maybe that method is broken. What is your exact Acrobat version?

Explorer, Dec 25, 2019


I have Adobe Acrobat Pro DC, version 2019.021.20061.

Explorer, Dec 25, 2019


It is not written anywhere in help that the method does not work. Can I get some advice from the company's programmers?

LEGEND, Dec 25, 2019


"It is not written anywhere in help that the method does not work. Can I get some advice from the company's programmers?"  I said it was broken, not planned. Bugs happen.

 

"Can I get some advice from the company's programmers?" No, I'm quite sure you cannot. I never have in 20 years. 

 

Some thoughts (though you might like to consider using PDFEdit instead, I'm sure it is much more visited by programmers).

1. You do not check the return value of PDWordGetCharQuad. I suggest you do.

2. You say you created the PDFs yourself (in a text editor?) Does it happen with PDF files you did not make?

Explorer, Dec 25, 2019


"1. You do not check the return value of PDWordGetCharQuad. I suggest you do."

I checked. PDWordGetCharQuad always returns true.

"2. You say you created the PDFs yourself (in a text editor?) Does it happen with PDF files you did not make?"

I created the PDF file in Acrobat. I also checked several other files, including ones I did not create. The results are the same.

Explorer, Dec 25, 2019


Test_Screen_Name, if you have the time and opportunity, please check for yourself. Create a PDF file and read all the words from it. I used the algorithm described on the page https://help.adobe.com/en_US/acrobat/acrobat_dc_sdk/2015/HTMLHelp/#t=Acro12_MasterBook%2FPlugins_Wor...

Explorer, Dec 26, 2019


I noticed one very interesting thing. For a word where PDWordGetNumQuads(pdWord) == 1, the PDWordGetCharQuad and PDWordGetNthQuad methods return the same quad.

Explorer, Jan 20, 2020


This behavior occurs when the PDWordFinderConfigRec.noExtCharOffset property is set to true.

LEGEND, Jan 21, 2020


I never noticed this option before. Good that you have a solution.
