How to determine if a character is a superscript or subscript?

Mar 09, 2020

Hi,

I need to determine whether a character is a superscript or subscript. Using the PDWordFinder object, I find all the text on the current page of the document. Each character of this text belongs to a specific PDWord object. I tried using the PDWordGetCharacterTypes method to get the attributes of each character, but I did not find any superscript or subscript property among the character attributes. Then I thought I might find these properties in the PDFont object of the current character, but I could not find them there either.

I would be grateful for any help.

Correct answer by Test_Screen_Name | Most Valuable Participant

You'd need to get the individual character metrics and apply the font size. However, the result is going to go up and down as you read ordinary characters along a line. Look at "along": the "l" rises higher and the "g" descends below the baseline, so their vertical extents differ even though they sit on the same line. That kind of per-character extent is only really useful for graphics work. Use the font size to start the guess as to whether you have subscripts or superscripts. At least, that's how I'd do it. Because this is an exercise in guesswork, each programmer may do it differently and get different results.

Topics

Acrobat SDK and JavaScript

Most Valuable Participant ,
Mar 09, 2020

I'm not too familiar with the plugins SDK, but if there isn't a specific property that defines the text as being super- or subscript (which is what I suspect), maybe you can get the quads or font size of the word and compare them with those of the words before and/or after it to work it out yourself.

Most Valuable Participant ,
Mar 09, 2020

There is absolutely nothing in PDF for superscript or subscript, nor for justification, paragraphs, hyphenation, columns, or any similar concept. There are only characters on the page. So you have to do your own guesswork. You can indeed analyze a line to find the dominant baseline and look for exceptions to it. The font size might also be a clue. Two programmers will get different results.
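
To make the "dominant baseline" idea concrete, here is a minimal sketch in plain C++. It is not SDK code: `CharBox`, `classifyLine`, and the 0.15 threshold are all made up for illustration, and it assumes you have already collected a baseline y and a font size for every character on one line (y growing upward, as in PDF user space). The median stands in for "dominant", which is only one of several reasonable choices.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical record for one character on a line. In a plug-in the values
// would come from whatever the word finder and style calls report.
struct CharBox {
    double baselineY;  // y of the character's baseline (y grows upward)
    double fontSize;   // point size of the character's style
};

enum class Script { Normal, Superscript, Subscript };

// Median of a non-empty list (upper median for even counts).
static double median(std::vector<double> v) {
    assert(!v.empty());
    std::nth_element(v.begin(), v.begin() + v.size() / 2, v.end());
    return v[v.size() / 2];
}

// Treat the median baseline as the dominant one and flag characters whose
// baseline is shifted by more than shiftRatio times the median font size.
std::vector<Script> classifyLine(const std::vector<CharBox>& line,
                                 double shiftRatio = 0.15) {
    std::vector<Script> out(line.size(), Script::Normal);
    if (line.empty()) return out;

    std::vector<double> ys, sizes;
    for (const CharBox& c : line) {
        ys.push_back(c.baselineY);
        sizes.push_back(c.fontSize);
    }
    const double dominantY = median(ys);
    const double tolerance = shiftRatio * median(sizes);

    for (std::size_t i = 0; i < line.size(); ++i) {
        const double dy = line[i].baselineY - dominantY;
        if (dy > tolerance) out[i] = Script::Superscript;
        else if (dy < -tolerance) out[i] = Script::Subscript;
    }
    return out;
}
```

The threshold is pure guesswork and will need tuning per document. A line that consists mostly of raised or lowered characters will fool the median, which is one reason two programmers will get different results.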

Mar 10, 2020

And how do I determine the area occupied by a symbol? The PDWordGetCharQuad method reports the same vertical extent for all characters of one word.

Most Valuable Participant ,
Mar 10, 2020

Then that method won't work.

Most Valuable Participant ,
Mar 10, 2020

Maybe you can work with the PDStyle of individual characters. Quads aren't made to tell you the font size; they are made to tell you about the space occupied. Quads help with many fuzzy-logic decisions.

Mar 10, 2020

I can indeed get the font size:
ASFixed fontSize = PDStyleGetFontSize(pdStyle);
But I would like to know the size of the area occupied by the symbol.

Most Valuable Participant ,
Mar 10, 2020

You'd need to get the individual character metrics and apply the font size. However, the result is going to go up and down as you read ordinary characters along a line. Look at "along": the "l" rises higher and the "g" descends below the baseline, so their vertical extents differ even though they sit on the same line. That kind of per-character extent is only really useful for graphics work. Use the font size to start the guess as to whether you have subscripts or superscripts. At least, that's how I'd do it. Because this is an exercise in guesswork, each programmer may do it differently and get different results.
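
As a rough illustration of "apply the font size", the sketch below scales a glyph's vertical metrics into points and then uses the size ratio as a first guess. It is plain C++, not SDK code. `GlyphBox`, `scaledExtent`, `isScriptCandidate`, and the 0.8 ratio are invented for the example, and the 1000-units-per-em default is the usual convention for Type 1 and TrueType fonts in PDF, which may not hold for every font (Type 3 fonts define their own scale). `fixedToDouble` assumes the 16.16 fixed-point layout that ASFixed uses.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical glyph bounding box in glyph space, relative to the baseline.
struct GlyphBox {
    double yMin;  // lowest point of the glyph (negative for descenders)
    double yMax;  // highest point of the glyph
};

// Vertical extent in points, as offsets from the baseline.
struct Extent {
    double bottom;
    double top;
};

// Convert a 16.16 fixed-point value to a double.
double fixedToDouble(std::int32_t fixedValue) {
    return fixedValue / 65536.0;
}

// Scale glyph-space metrics by the font size to get the extent in points.
Extent scaledExtent(const GlyphBox& g, double fontSize,
                    double unitsPerEm = 1000.0) {
    assert(fontSize > 0 && unitsPerEm > 0);
    return {g.yMin * fontSize / unitsPerEm, g.yMax * fontSize / unitsPerEm};
}

// First-pass guess: a character set noticeably smaller than the main size of
// its line is a candidate; its position then decides super- versus subscript.
bool isScriptCandidate(double charSize, double lineSize,
                       double maxRatio = 0.8) {
    assert(lineSize > 0);
    return charSize / lineSize <= maxRatio;
}
```

With made-up metrics, a "g" whose box runs from -200 to 500 at 10 pt would span 2 pt below to 5 pt above the baseline, while an "l" from 0 to 700 would span 0 to 7 pt. That is exactly the up-and-down variation described above, so the size ratio is the steadier signal to start from.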

sorin.n
Mar 10, 2020

Thank you very much for your answer. I will do so.

Mar 09, 2020

Hi,

I found the following links useful as a starting point for building your JavaScript (if at all possible):

https://stackoverflow.com/questions/46444635/how-to-check-a-string-contains-superscript-or-subscript...

See here for the complete Unicode table of the subscript alphabet: https://stackoverflow.com/questions/17908593/how-to-find-the-unicode-of-the-subscript-alphabet

An example of a script to accomplish this in C+ to check a string: https://www.codeproject.com/Questions/990692/How-to-check-a-string-contains-superscript-or-subs

iText library example: http://itextsharp.10939.n7.nabble.com/Super-and-Subscript-td763.html

Example of applying superscript in Photoshop: https://community.adobe.com/t5/photoshop/applying-superscript-to-textitem/td-p/10482051?page=1

Most Valuable Participant ,
Mar 09, 2020

Ooh, yes, thank you for that. So you need to handle BOTH superscript Unicode characters AND regular characters set on a higher baseline, perhaps smaller. Similarly, there are subscript Unicode characters AND regular characters set smaller, perhaps on the same baseline, perhaps a little below it, but definitely overlapping the line. The Word Finder may not return these in natural reading order, so you should do your own X,Y sorting.
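
Both halves of that can be sketched in plain C++. The first function checks a code point against the dedicated Unicode superscript and subscript characters (U+00B2, U+00B3, U+00B9, and the assigned parts of the Superscripts and Subscripts block); it deliberately ignores other superscript-like characters such as modifier letters. The second does a simple X,Y sort. `Placed`, `groupIntoLines`, and `lineTolerance` are hypothetical names, and the grouping rule (anything within a tolerance of the line's topmost character belongs to that line) is just one assumption that keeps raised and lowered characters on the same line as their neighbours.

```cpp
#include <algorithm>
#include <vector>

enum class UnicodeScript { None, Superscript, Subscript };

// Classify a code point that is itself a Unicode superscript or subscript.
UnicodeScript unicodeScriptOf(char32_t cp) {
    if (cp == 0x00B2 || cp == 0x00B3 || cp == 0x00B9)
        return UnicodeScript::Superscript;  // superscript two, three, one
    if (cp == 0x2070 || cp == 0x2071 || (cp >= 0x2074 && cp <= 0x207F))
        return UnicodeScript::Superscript;
    if ((cp >= 0x2080 && cp <= 0x208E) || (cp >= 0x2090 && cp <= 0x209C))
        return UnicodeScript::Subscript;
    return UnicodeScript::None;
}

// Hypothetical positioned character (y grows upward, as in PDF user space).
struct Placed {
    double x;
    double y;
    char32_t cp;
};

// Sort top to bottom, start a new line whenever a character lies more than
// lineTolerance below the first (topmost) character of the current line,
// then sort each line left to right.
std::vector<std::vector<Placed>> groupIntoLines(std::vector<Placed> chars,
                                                double lineTolerance) {
    std::sort(chars.begin(), chars.end(),
              [](const Placed& a, const Placed& b) { return a.y > b.y; });

    std::vector<std::vector<Placed>> lines;
    for (const Placed& c : chars) {
        if (lines.empty() || lines.back().front().y - c.y > lineTolerance)
            lines.emplace_back();
        lines.back().push_back(c);
    }
    for (std::vector<Placed>& line : lines) {
        std::sort(line.begin(), line.end(),
                  [](const Placed& a, const Placed& b) { return a.x < b.x; });
    }
    return lines;
}
```

A tolerance somewhere around half the line spacing is a plausible starting value, but like everything else here it is a guess. Rotated text or multi-column pages would need more than this.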
