Problem with getting word count in TLF text

Question

Hi,

I want to get the word count from my TLF text, but I am not able to handle the case of spaces. I am using the findNextWordBoundary method of ParagraphElement as shown below:

private function countWords( para : ParagraphElement ) : void
{
    var wordBoundary:int = 0;
    var prevBoundary:int = 0;

    while ( wordBoundary != para.findNextWordBoundary( wordBoundary ) )
    {
        // If the value is greater than 1, then it's a word, otherwise it's a space.
        if ( para.findNextWordBoundary( wordBoundary ) - wordBoundary > 1 )
        {
            wordCount += 1;
        }

        prevBoundary = wordBoundary;
        wordBoundary = para.findNextWordBoundary( wordBoundary );

        // If the value is greater than 1, then it's a word, otherwise it's a space.
        if ( wordBoundary - prevBoundary > 1 )
        {
            var s:String = para.getText().substring( prevBoundary, wordBoundary );
            lenTotal += s.length;
        }
    }
}

Now I have 2 issues here. Take as an example the string: Hi, I am writing in "TLF". I want to get its word count.

1) For the string "Hi," para.getText().substring( prevBoundary, wordBoundary ) gives the text as "Hi", i.e. without the comma. The same happens with the string "TLF forums": it treats each " as a separate word instead of treating the whole "TLF" as a single word. Why doesn't it just go up to the next space? That should be the ideal case: until a space appears, the whole run of characters should count as one word.

2) I have applied the condition if ( wordBoundary - prevBoundary > 1 ) to detect spaces, i.e. if the difference is <= 1 it is a space. But with this check I miss single-character words. For example, in "Hi, This is a string" the 'a' is ignored too. I could add a check, alongside the length check, that the text between prevBoundary and wordBoundary is " " (a space), but then single-character words like a, & and I will still be ignored.

So I am stuck on this issue and need some help from you guys.

Thanks

rdermer · Answer

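findNextWordBoundary is not going to serve your purpose: as you found, it treats punctuation such as commas and quotation marks as separate words. I'd propose splitting the text on whitespace instead, something like this: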

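// Didn't test this, but something like this. The regular expression matches any run of
// one or more whitespace characters (space, line feed, carriage return).
private static const whiteSpaceRegExp:RegExp = /[\u0020\u000A\u000D]+/;

public static function countWords( para : ParagraphElement ) : int
{
    var count:int = 0;
    for each ( var word:String in para.getText().split( whiteSpaceRegExp ) )
    {
        // Leading or trailing whitespace produces empty strings; don't count them
        if ( word.length > 0 )
            count++;
    }
    return count;
}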
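Splitting on whitespace works per paragraph. To count the whole TLF text, one option is to walk the TextFlow paragraph by paragraph and add up the counts. This is a minimal, untested sketch that assumes the TLF 2.x element API (TextFlow.getFirstLeaf(), FlowLeafElement.getParagraph(), ParagraphElement.getNextParagraph()); the class FlowWordCounter and its method names are hypothetical. Because it splits only on whitespace, "Hi," and "TLF" in quotes from the question each count as one word, and single-character words such as "a" are counted too.

```actionscript
package
{
    import flashx.textLayout.elements.FlowLeafElement;
    import flashx.textLayout.elements.ParagraphElement;
    import flashx.textLayout.elements.TextFlow;

    // Hypothetical helper class; not part of TLF
    public class FlowWordCounter
    {
        // Any run of one or more space, line feed or carriage return characters
        private static const WHITESPACE:RegExp = /[\u0020\u000A\u000D]+/;

        // Counts whitespace-separated tokens, so punctuation stays attached to its word
        public static function countWordsInText(text:String):int
        {
            var count:int = 0;
            for each (var token:String in text.split(WHITESPACE))
            {
                // Leading or trailing whitespace yields empty tokens; skip them
                if (token.length > 0)
                    count++;
            }
            return count;
        }

        // Sums the per-paragraph counts, starting from the paragraph of the first leaf
        public static function countWordsInFlow(flow:TextFlow):int
        {
            var total:int = 0;
            var leaf:FlowLeafElement = flow.getFirstLeaf();
            var para:ParagraphElement = leaf ? leaf.getParagraph() : null;
            while (para != null)
            {
                total += countWordsInText(para.getText());
                para = para.getNextParagraph();
            }
            return total;
        }
    }
}
```

Counting per paragraph and summing means paragraph breaks always separate words, even if a paragraph ends without trailing whitespace.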

A good list of every character that Unicode classifies as whitespace can be found in the TLF source:

http://sourceforge.net/adobe/tlf/svn/449/tree/trunk/textLayout/src/flashx/textLayout/utils/CharacterUtil.as

See the function createWhiteSpaceObject in that file.
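If you want that full whitespace list rather than a hand-written regular expression, one option is to scan the characters and count each transition from whitespace to non-whitespace. This is a minimal sketch, assuming your TLF version exposes the public static CharacterUtil.isWhitespace(charCode), which as far as I know reads the table that createWhiteSpaceObject builds; the class UnicodeWordCounter is hypothetical.

```actionscript
package
{
    import flashx.textLayout.utils.CharacterUtil;

    // Hypothetical helper class; not part of TLF
    public class UnicodeWordCounter
    {
        // Counts one word each time a non-whitespace character follows whitespace
        // or the start of the text.
        // Assumption: CharacterUtil.isWhitespace(charCode:int):Boolean is public in your TLF build.
        public static function countWordsInText(text:String):int
        {
            var count:int = 0;
            var inWord:Boolean = false;
            for (var i:int = 0; i < text.length; i++)
            {
                if (CharacterUtil.isWhitespace(int(text.charCodeAt(i))))
                {
                    // Any whitespace character ends the current word
                    inWord = false;
                }
                else if (!inWord)
                {
                    // First character of a new word
                    inWord = true;
                    count++;
                }
            }
            return count;
        }
    }
}
```

A single pass like this avoids allocating an array of tokens, and it never produces empty tokens, so no filtering step is needed.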

Hope that helps,

Richard
