K.Daube
Community Expert
March 10, 2015
Answered

How to get formatted text into arrays


Dear experts and helpers,

For my project I import an RTF file and then read the data from it into 3 arrays. This works fine when just using the string contents of the paragraphs. However, the final script should be able to read and replace formatted text...
Why use the intermediate arrays? Because otherwise I need to switch back and forth between two fm-documents (and one may be a book component).

The imported file starts with a number of lines separated into two items by a TAB (» denotes a TAB, in FM \x08)
[[Garneau, 1990 #12]]    »   [9]
The right item may also be locally formatted text, e.g. [9]
Then follow the same (or smaller) number of paragraphs with formatted text like this:
[9] » D. Garneau, Ed., National Language Support Reference Manual (National language Information Design Guide. Toronto, CDN: IBM National Language Technical Centre, 1990.

Is it possible to replace in the body of the function below the following piece

  while(pgf.ObjectValid()) {
    pgfText = GetText (pgf, newDoc);
    gaBibliography.push(pgfText);
    pgf = pgf.NextPgfInFlow;
  }

with this

  while(pgf.ObjectValid()) {
    gaBibliography.push(pgf);
    pgf = pgf.NextPgfInFlow;
  }

Do I need a special declaration of the array gaBibliography?
And how do I get the right part of the intro lines, as formatted text, into the array gaFmtCitsFmt?

Currently I read into arrays only the 'strings' (function GetText not shown):

var gaFmtCitsRaw  = [];                           // left column in processed RTF
var gaFmtCitsFmt  = [];                           // right column in processed RTF
var gaBibliography= [];                           // bibliography lines from processed RTF
// filename is something like E:\_DDDprojects\FM+EN-escript\FM-testfiles\BibFM-collected-IEEE.rtf

function ReadFileRTF (fileName) {
  var nCits=0, nBib = 0, openParams, openReturnParams, newDoc, pgf, pgfText ;
  var TAB = String.fromCharCode(8);               // FM uses \x08 for TAB instead of ASCII 9
  var parts = [];
 
  openParams = GetOpenDefaultParams();
  openReturnParams =  new PropVals(); 
  newDoc = Open (fileName, openParams, openReturnParams); 
  pgf = newDoc.MainFlowInDoc.FirstTextFrameInFlow.FirstPgf;  // get first pgf in flow

// --- read the temp/formatted citations 
  while(pgf.ObjectValid()) {
    pgfText = GetText (pgf, newDoc);
    if (pgfText.substring (0,2) == "[[") {        // citation lines start with [[
      parts = pgfText.split(TAB);                 // get the two parts of the line
      gaFmtCitsRaw.push (parts[0]);               // Push the result onto the global array
      gaFmtCitsFmt.push (parts[1]);
      pgf = pgf.NextPgfInFlow;
    } else { break }
  }

// --- read the bibliography
  while(pgf.ObjectValid()) {                      // until end of doc
    pgfText = GetText (pgf, newDoc);
    gaBibliography.push(pgfText);
    pgf = pgf.NextPgfInFlow;
  }
  newDoc.Close (Constants.FF_CLOSE_MODIFIED);
} // --- end ReadFileRTF
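
The two-column line format described above can be split a little more defensively. This is only a sketch: `splitCitationLine` is a hypothetical helper, not part of the original script, and it works on the plain string that GetText returns, so it carries no formatting. It returns null for lines that are not citation lines and an empty right column when no TAB is present, instead of pushing `undefined` into the array.

```javascript
// Hypothetical helper, not part of the original script.
// Splits one citation line at the first TAB (FrameMaker reports TAB as \x08).
function splitCitationLine(pgfText) {
  var TAB = String.fromCharCode(8);
  if (pgfText.substring(0, 2) !== "[[") {
    return null;                          // not a citation line
  }
  var pos = pgfText.indexOf(TAB);
  if (pos < 0) {
    return { raw: pgfText, fmt: "" };     // citation line without a TAB
  }
  return {
    raw: pgfText.substring(0, pos),       // left column, e.g. [[Garneau, 1990 #12]]
    fmt: pgfText.substring(pos + 1)       // right column as a plain string
  };
}
```

Inside the first loop of ReadFileRTF, the result could replace the `split(TAB)` call: break out of the loop when it is null, otherwise push `raw` and `fmt` onto the two arrays.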

The next question will then be how to modify Ian Proudfoot's FindAndReplace script to handle formatted text as the replacement. IMHO I will need to use copy/paste ...

Correct answer frameexpert

Thanks Russ for this advice. The first part (filling the clipboard) was OK; I checked the contents of the clipboard with a clipboard-inspector utility. Nevertheless I tested again:

var oDoc = app.ActiveDoc;
var pgf  = oDoc.MainFlowInDoc.FirstTextFrameInFlow.FirstPgf;
var lastPfg = oDoc.MainFlowInDoc.FirstTextFrameInFlow.LastPgf;

  var tr = new TextRange();                       //get text selection for paragraph            
  tr.beg.obj = tr.end.obj = pgf;
  tr.beg.offset = 0;
  tr.end.offset = Constants.FV_OBJ_END_OFFSET;   

  oDoc.TextSelection = tr;
  oDoc.Copy();                                    // Clipboard can be pasted manually

// Add a new paragraph after the current paragraph. 
  var newPgf = oDoc.NewSeriesPgf (lastPfg);       // OK
  var textLoc = new TextLoc (newPgf, 0);          // cursor nowhere 

Run script
=> new para at end
Put cursor therein and paste => OK
... OK all the time

However, the second part (paste) is quirky, and I don't know where the problem really is. Since the ESTK on my system has had no connection to FM (since FM-10), I tested on my wife's system, but saw the same effects there:

var oDoc = app.ActiveDoc;
var lastPfg = oDoc.MainFlowInDoc.FirstTextFrameInFlow.LastPgf;

// Add a new paragraph after the current paragraph. 
  var newPgf = oDoc.NewSeriesPgf (lastPfg);       // OK
  var textLoc = new TextLoc (newPgf, 0); 
 
oDoc.Paste ();                         // random results

Copy the first paragraph manually
Run script
=> new para at end, nothing pasted into
Copy again manually
Run script
=> new para at end, nothing pasted into
Copy again manually
Run script
=> new para at end, nothing pasted into
Run script (with old clipboard contents)
=> new para at end, pasted as second para
Run script (with old clipboard contents)
=> new para at end, pasted as third para

Is this black or white magic?


Klaus, OK, before you paste, you need to set the TextSelection to your insertion point.

// Add a new paragraph after the current paragraph.
var newPgf = oDoc.NewSeriesPgf (lastPgf);
var textRange = new TextRange (new TextLoc (newPgf, 0), new TextLoc (newPgf, 0));

oDoc.TextSelection = textRange;
oDoc.Paste ();
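
Putting the copy step and the paste step together, the whole round trip could look roughly like the sketch below. It uses only calls that already appear in this thread; `copyPgfAfter` is a hypothetical name, and the function is meant to run inside FrameMaker, where `TextRange`, `TextLoc` and `Constants` are provided.

```javascript
// Sketch only: copies srcPgf (with its formatting) into a new paragraph
// created after lastPgf. copyPgfAfter is a hypothetical helper name.
function copyPgfAfter(oDoc, srcPgf, lastPgf) {
  // Select the whole source paragraph and copy it to the clipboard.
  var src = new TextRange();
  src.beg.obj = src.end.obj = srcPgf;
  src.beg.offset = 0;
  src.end.offset = Constants.FV_OBJ_END_OFFSET;
  oDoc.TextSelection = src;
  oDoc.Copy();

  // Create the target paragraph and move the insertion point into it.
  // Without this step the paste goes wherever the selection happens to be.
  var newPgf = oDoc.NewSeriesPgf(lastPgf);
  oDoc.TextSelection = new TextRange(new TextLoc(newPgf, 0), new TextLoc(newPgf, 0));
  oDoc.Paste();
  return newPgf;
}
```

A call such as `copyPgfAfter(oDoc, pgf, lastPgf)` would then replace the two separate test scripts.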

-Rick

2 replies

Legend
March 11, 2015

Klaus, I would suggest that copy/paste might be the easiest way. However, I would not suggest that it is 100% reliable. Usually, I think, but I would not bet on it.

The alternative is to query the text range of each paragraph for any format changes, store each set of properties from the original, then iterate over the new text and reapply. You can find out where formatting changes occur with something like:

textItems = doc.GetTextForRange (textRange, Constants.FTI_CharPropsChange);

Now, I realize this doesn't tell you much and the truth is that it is a complicated concept. I would have to spend all day writing about it, because you need an intimate knowledge of text ranges and text item structures to make it work. Obviously, I can't do that.
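
To give at least a rough idea of the bookkeeping involved: once the offsets of the format changes are known, they can be turned into a list of runs, each of which has uniform character formatting. The sketch below is plain JavaScript and makes some assumptions: `changeOffsets` is an ordinary array that you fill from the `offset` property of each returned text item, the offsets are sorted in ascending order, and `formattingRuns` is a hypothetical helper name. Reading the properties of each run and reapplying them is not shown.

```javascript
// Hypothetical helper: converts sorted format-change offsets into runs.
// Each run { beg, end } covers the offsets beg (inclusive) to end (exclusive).
function formattingRuns(changeOffsets, begOffset, endOffset) {
  var runs = [];
  var start = begOffset;
  for (var i = 0; i < changeOffsets.length; i += 1) {
    var off = changeOffsets[i];
    // Ignore changes at the current run start, duplicates, and out-of-range offsets.
    if (off > start && off < endOffset) {
      runs.push({ beg: start, end: off });
      start = off;
    }
  }
  if (endOffset > start) {
    runs.push({ beg: start, end: endOffset });   // last run up to the end
  }
  return runs;
}
```

For a 12-character range with changes reported at offsets 0, 4 and 9, this yields three runs: 0 to 4, 4 to 9 and 9 to 12.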

What I can do is provide a working sample that shows the concept, although for a somewhat different application. I ran into this same type of issue with a script that applies character formatting, where I wanted to have an Undo feature as well. In order to accomplish an undo, I have to effectively remember the original formatting of the entire text snippet where the new formatting was applied. This is similar to what I think you want... to remember (and reapply) the original formatting of text snippets from the imported RTF content. If you are interested, go here and get the script called ADVANCED_Create_formatting_shortcuts.jsx:

FrameMaker ExtendScript Samples - West Street Consulting

Then, look up the following functions:

CaptureChrFormatUndoSnapshot()

UndoChrFormatApply()

Please accept the disclaimer that this is a complicated concept embedded within a complicated script. I hope it can be of some assistance.

Russ

K.Daube
Community Expert
March 11, 2015

Thanks to Rick and Russ for the initial feedback. Russ, your example is really complicated, but thanks to your extensive comments I should get at least some insight.

My major problem seems to be the understanding of textrange.

- How can I 'grab' a full paragraph?

- How can I 'grab' a part of a paragraph, such as the part behind the first TAB character?

I know that you all do not have much time, in particular compared with me as a retired person. I hope to be patient enough for you. I'm experimenting a lot to enhance my knowledge, mainly based on examples from others.

Legend
March 11, 2015

Klaus,

Working with text is about the most complicated thing to do within FrameMaker. It seems counter-intuitive, since it is about the easiest thing to do with the GUI. But alas, once you remove the ability to select with a mouse and type with a keyboard, text becomes a wild jungle of complexity.

Text ranges are not too bad, once you get the general idea. It is just that... a range of text, like something you would select with a mouse. Like a mouse selection, it starts before some character in some paragraph and ends after some character in some paragraph. It may be the same paragraph, which is a selection within a paragraph. The character can even be the same, which is then just an insertion point (cursor) somewhere.

So, a text range is a data structure that defines two paragraphs and two characters. In the jargon of scripting, the character is called an "offset." An offset is simply the number of characters past the beginning of said paragraph, where 0 is the beginning.

For example, if you want to capture the first five characters of a paragraph as a text range, you can do this, where 'pgf' is some paragraph object:

var textRange = new TextRange();
textRange.beg.obj = pgf;
textRange.beg.offset = 0;
textRange.end.obj = pgf;
textRange.end.offset = 5;

If you want to capture a whole paragraph, change that last line to the number of characters in the pgf, or you can do this:

textRange.end.offset = Constants.FV_OBJ_END_OFFSET;

...where that constant is just some built-in thing that means "get me to the end of whatever." It's a convenience of the interface.

I'll also note that a text range is actually just a structure holding two text location structures, one named 'beg' and one named 'end.' If you think of a text location as defined by a paragraph and an offset from the first character, maybe that will make more sense.
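
Applied to the two "grab" questions, the assignments above could be wrapped as in the following sketch. The helper names are hypothetical, and the code is meant to run inside FrameMaker, where `TextRange` and `Constants` exist. One loud assumption: `rangeAfterFirstTab` treats the position of the TAB in the paragraph string as the paragraph offset. That should hold for plain text, but as far as I understand it anchors and markers also occupy offsets, so a paragraph containing such objects before the TAB would need the offsets taken from the text items instead.

```javascript
// Hypothetical helpers built from the assignments shown above.
function makeTextRange(pgf, begOffset, endOffset) {
  var tr = new TextRange();
  tr.beg.obj = pgf;
  tr.beg.offset = begOffset;
  tr.end.obj = pgf;
  tr.end.offset = endOffset;
  return tr;
}

// The full paragraph.
function wholePgfRange(pgf) {
  return makeTextRange(pgf, 0, Constants.FV_OBJ_END_OFFSET);
}

// The part behind the first TAB (FrameMaker reports TAB as \x08).
// pgfText is the plain string of the paragraph; returns null if there is no TAB.
// ASSUMPTION: string index equals paragraph offset (no anchors or markers before the TAB).
function rangeAfterFirstTab(pgf, pgfText) {
  var pos = pgfText.indexOf(String.fromCharCode(8));
  if (pos < 0) {
    return null;
  }
  return makeTextRange(pgf, pos + 1, Constants.FV_OBJ_END_OFFSET);
}
```

Either result can be assigned to `oDoc.TextSelection` before a Copy, or passed to `GetTextForRange`.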

Text item structures are a whole new mess of complexity. I can't possibly go into an explanation of them here.

I think that many ES developers (definitely myself included) still use the FDK documentation because it is considerably more comprehensive. The two interfaces are largely parallel, but of course somewhat different in the language syntax. Consider that as a potential resource.

Russ

frameexpert
Community Expert
March 10, 2015

Hi Klaus, You can push paragraph objects into an array without a special declaration. I am pressed for time, but will try to look at the rest of your question later. -Rick
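
As a minimal sketch of that: a plain `[]` is enough, and the loop from the question can be wrapped as below (`collectPgfs` is a hypothetical name). One caveat, stated as an assumption rather than a tested fact: the stored paragraph objects presumably stay usable only while their document is open, so the `newDoc.Close` call in ReadFileRTF would have to move to a point after the paragraphs have been used.

```javascript
// Hypothetical helper: collects the paragraph objects of a flow, starting at firstPgf.
// ASSUMPTION: the document that owns the paragraphs stays open while the array is in use.
function collectPgfs(firstPgf) {
  var pgfs = [];                 // an ordinary array, no special declaration
  var pgf = firstPgf;
  while (pgf.ObjectValid()) {
    pgfs.push(pgf);              // store the object itself, not its text
    pgf = pgf.NextPgfInFlow;
  }
  return pgfs;
}
```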

www.frameexpert.com