February 25, 2011
Question

Problem with getting word count in TLF text


Hi,

I want to get the word count of my TLF text, but I am not able to handle spaces correctly.

I am using the findNextWordBoundary method of ParagraphElement as shown below:

private function countWords( para : ParagraphElement ) : void
{
            var wordBoundary:int = 0;
            var prevBoundary:int = 0;
           
            while ( wordBoundary != para.findNextWordBoundary( wordBoundary ) )
            {

               // If the value is greater than 1, then it's a word, otherwise it's a space.
                if ( para.findNextWordBoundary( wordBoundary ) - wordBoundary > 1)
                {
                    wordCount += 1;                   
                }
               
                prevBoundary = wordBoundary;
                wordBoundary = para.findNextWordBoundary( wordBoundary );                   
               
                // If the value is greater than 1, then it's a word, otherwise it's a space.
                if ( wordBoundary - prevBoundary > 1 )
                {
                    var s:String = para.getText().substring( prevBoundary, wordBoundary );
                    lenTotal += s.length;
                }
            }                  
}

Now I have two issues here.

Take the string Hi, I am writing in "TLF". as an example; I want to get its word count.

1) For the part Hi, the call para.getText().substring( prevBoundary, wordBoundary ) returns Hi, i.e. without the comma. The same happens with the string "TLF forums": each " is treated as a separate word instead of "TLF" as a whole being one word. Why doesn't it scan up to the next space? That would be the ideal behavior: everything up to the next space should count as a single word.

2) To detect spaces I added the condition if ( wordBoundary - prevBoundary > 1 ), i.e. a difference of <= 1 is treated as a space. But this also skips single-character words. For example, in "Hi, This is a string" the word 'a' is ignored.

Adding a further check alongside the length check, that the string between prevBoundary and wordBoundary is " " (i.e. a space), does not help either, because single-character words like a, &, and I are still ignored.

So I am stuck on this issue and would appreciate some help.

Thanks


1 reply

Adobe Employee
February 25, 2011

findNextWordBoundary is not going to serve your purpose.  I'd propose doing something like this:

// Untested, but something like this. The pattern matches any run of one or more whitespace characters.

static const whiteSpaceRegExp:RegExp = /[\u0020\u000A\u000D]+/;

public static function countWords( para : ParagraphElement ) : int
{

     return para.getText().split(whiteSpaceRegExp).length;

}
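
One caveat with splitting on whitespace: leading or trailing whitespace produces empty strings in the resulting array, and an empty paragraph still yields one element, so the raw array length can overcount. Below is a minimal, untested sketch that skips empty tokens. The class name WordCounter and the helper countWordsInText are hypothetical names chosen for illustration, and the whitespace set (space, tab, line feed, carriage return) is an assumption rather than a complete list.

```actionscript
package
{
    import flashx.textLayout.elements.ParagraphElement;

    // Hypothetical helper class, named for illustration only.
    public class WordCounter
    {
        // Assumed whitespace set: one or more spaces, tabs, line feeds, or carriage returns.
        private static const WHITESPACE_RUN:RegExp = /[\u0020\u0009\u000A\u000D]+/;

        // Counts the non-empty, whitespace-delimited tokens in a plain string.
        public static function countWordsInText(text:String):int
        {
            var tokens:Array = text.split(WHITESPACE_RUN);
            var count:int = 0;
            for each (var token:String in tokens)
            {
                // Leading or trailing whitespace yields empty tokens; skip them.
                if (token.length > 0)
                {
                    count++;
                }
            }
            return count;
        }

        // Convenience wrapper for a TLF paragraph.
        public static function countWords(para:ParagraphElement):int
        {
            return countWordsInText(para.getText());
        }
    }
}
```

With this approach punctuation stays attached to its word, so Hi, I am writing in "TLF". is counted as six words, and single-character words such as a, & and I are counted like any other token.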

A good list of everything considered whitespace in Unicode can be found here:

http://sourceforge.net/adobe/tlf/svn/449/tree/trunk/textLayout/src/flashx/textLayout/utils/CharacterUtil.as

See the function createWhiteSpaceObject.
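
If a broader whitespace set is needed, a lookup table keyed by character code avoids a long regular expression. The following is a rough, untested sketch of that idea. The class name WhitespaceScanner and the function createWhitespaceLookup are hypothetical, and the character codes listed are an illustrative subset chosen here, not a copy of the list in CharacterUtil.as.

```actionscript
package
{
    // Hypothetical helper class, named for illustration only.
    public class WhitespaceScanner
    {
        // Lookup table: character code -> true for codes treated as whitespace.
        private static const WHITESPACE:Object = createWhitespaceLookup();

        // Illustrative subset only; extend it with whatever codes you need.
        private static function createWhitespaceLookup():Object
        {
            var lookup:Object = {};
            var codes:Array = [0x0009, 0x000A, 0x000D, 0x0020,
                               0x00A0, 0x2028, 0x2029, 0x3000];
            for each (var code:int in codes)
            {
                lookup[code] = true;
            }
            return lookup;
        }

        // Counts words by counting transitions from whitespace to non-whitespace.
        public static function countWords(text:String):int
        {
            var count:int = 0;
            var inWord:Boolean = false;
            for (var i:int = 0; i < text.length; i++)
            {
                var isSpace:Boolean = WHITESPACE[text.charCodeAt(i)] === true;
                if (!isSpace && !inWord)
                {
                    // First character of a new word.
                    count++;
                }
                inWord = !isSpace;
            }
            return count;
        }
    }
}
```

Scanning character by character means no intermediate array is built, and empty or whitespace-only text naturally gives a count of zero.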

Hope that helps,

Richard

February 28, 2011

Thanks Richard, will try this out.