Participant
August 3, 2010
Question

Importing HTML converts CR+LF or just LF to a space

  • August 3, 2010
  • 1 reply
  • 952 views

When using the following code:

myTextArea.textFlow = TextConverter.importToFlow(someText, TextConverter.TEXT_FIELD_HTML_FORMAT, null);

with HTML like this (note: CRLF stands for two characters, carriage return and linefeed, the newline sequence on Windows):

Just a test<br/>CRLF

Line 2<br/>CRLF

Line 3<br/>CRLF

I end up with HTML being displayed like this:

Just a test

<space>Line 2

<space>Line 3

I tried replacing the CRLF sequences with just LF (Unix style) and the same thing still occurs. If I remove all CR and LF characters, then I get the expected output:

Just a test

Line 2

Line 3

The importer has somehow interpreted the CR/LF sequence as a space or non-breaking space. Is there a way to have the CR/LF dropped, or should I just continue to use a filter function that removes \r and \n characters before importing? Is this a known issue?

I am using Flex 4.1 build 16076
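
For reference, the filter-function workaround mentioned above could look roughly like this. It is only a sketch: the names stripLineBreaks and importWithoutLineBreaks are made up for illustration, and it assumes that dropping every CR and LF is acceptable for the HTML being imported.

```actionscript
import flashx.textLayout.conversion.TextConverter;
import flashx.textLayout.elements.TextFlow;

// Hypothetical helper: removes every CR and LF character from the markup.
function stripLineBreaks(html:String):String
{
    // The g flag makes replace() act on every match, not only the first one.
    return html.replace(/[\r\n]+/g, "");
}

// Hypothetical wrapper: filters the markup, then imports it with the
// same call used in the question.
function importWithoutLineBreaks(html:String):TextFlow
{
    return TextConverter.importToFlow(
        stripLineBreaks(html), TextConverter.TEXT_FIELD_HTML_FORMAT, null);
}
```

Usage would then be myTextArea.textFlow = importWithoutLineBreaks(someText); so only the <br/> tags produce line breaks.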


Adobe Employee
August 3, 2010

I believe you need to create a custom importer. whiteSpaceCollapse defaults to "collapse".

        import flashx.textLayout.conversion.ITextImporter;
        import flashx.textLayout.conversion.TextConverter;
        import flashx.textLayout.elements.Configuration;
        import flashx.textLayout.formats.TextLayoutFormat;

        // Create an importer for TEXT_FIELD_HTML_FORMAT that preserves whitespace.
        // Note: We have to make a copy of the textFlowInitialFormat,
        // which has various formats set to "inherit",
        // and then modify it and set it back.
        var config:Configuration = new Configuration();
        var format:TextLayoutFormat = new TextLayoutFormat(config.textFlowInitialFormat);
        format.whiteSpaceCollapse = "preserve";
        config.textFlowInitialFormat = format;
        var preservingHTMLImporter:ITextImporter =
            TextConverter.getImporter(TextConverter.TEXT_FIELD_HTML_FORMAT, config);
        // Collect errors on the importer instead of throwing them.
        preservingHTMLImporter.throwOnError = false;
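
The snippet above only creates the importer. A minimal sketch of actually using it might look like the following. The helper name importWith is made up for illustration, and it assumes that with throwOnError set to false any problems are reported through the importer's errors property rather than thrown.

```actionscript
import flashx.textLayout.conversion.ITextImporter;
import flashx.textLayout.elements.TextFlow;

// Hypothetical helper: runs the given importer and traces any reported errors.
function importWith(importer:ITextImporter, html:String):TextFlow
{
    var flow:TextFlow = importer.importToFlow(html);
    // With throwOnError = false, errors are collected on the importer.
    if (importer.errors != null && importer.errors.length > 0)
    {
        trace("HTML import reported errors:\n" + importer.errors.join("\n"));
    }
    return flow;
}
```

It would be called as myTextArea.textFlow = importWith(preservingHTMLImporter, someText); in place of the TextConverter.importToFlow call from the question.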

Hope that helps,

Richard

Alon_K_Author
Participant
August 7, 2010

Thanks for the tip, Richard. Unfortunately, now the newline sequences are turned into line breaks instead of spaces. I think this is odd, as most HTML has plenty of newline sequences (unless it is "compressed"); otherwise it would be impossible to read and maintain manually. This is not a special case for HTML. What am I missing here?
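
One possible middle ground, sketched here as an assumption rather than a confirmed TLF feature, is to normalize the markup before handing it to the default (collapsing) importer: line breaks that directly follow a <br/> tag are dropped, and any other line breaks are turned into a single space, which is roughly how a browser treats them. The function name normalizeHtmlLineBreaks is made up for illustration.

```actionscript
// Hypothetical pre-filter: makes CR/LF behave the way it does in browser-rendered HTML.
function normalizeHtmlLineBreaks(html:String):String
{
    // 1. A line break right after a <br> tag adds nothing, so drop it
    //    together with any surrounding spaces, tabs, and blank lines.
    var result:String = html.replace(
        /(<br\s*\/?>)[ \t]*(?:\r\n|\r|\n)[ \t\r\n]*/gi, "$1");

    // 2. Any remaining run of line breaks separates words, so it becomes one space.
    return result.replace(/[ \t]*(?:\r\n|\r|\n)[ \t\r\n]*/g, " ");
}
```

The result can then be passed to TextConverter.importToFlow with TextConverter.TEXT_FIELD_HTML_FORMAT as in the original call, so hand-formatted HTML stays readable in source while only the <br/> tags produce line breaks.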