I am trying to extract definitions from a document glossary with a script, and have run into a problem with my lookbehind that I can't seem to sort out. Glossary entries look like the image below:
The problem is that each entry may have one or two tabs. The first tab is rendered as "......" separating the acronym from the definition. Some glossaries have a second tab that appears as a blank space before the definition. The following works fine for glossaries with a single tab.
(?<=\bDRB\x08).*
However, if the glossary uses two tabs, the regexp picks up the second tab along with the definition. If I change my lookbehind to:
(?<=\bDRB\x08\x08).*
It works with two tabs, but not with one. If I change it to:
(?<=\bDRB\x08+).*
...which should find one or more occurrences of the tab character, I get a "Not Found" error. Apparently operators do not work the same way in assertions as they do in the rest of a regexp.
I cannot check this with FrameMaker, but here is what I would test: put the tab in parentheses:
(?<=\bDRB(\x08)+).*
Does this help?
In FrameMaker it does not matter which of these I enter in the Find/Replace dialog:
\x08+
(\x08)+
Both find one or several tabs.
Obviously this is different in ExtendScript.
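For comparison, here is a small sketch of how the two forms behave in a standard JavaScript RegExp when used outside a lookbehind. The sample strings and variable names are made up for illustration, and ExtendScript is assumed (not verified here) to behave the same way for these constructs. Both forms match one or several tabs; the only practical difference is that the parenthesized form adds a capture group, which shifts the index of the definition.

```javascript
// "\x08" stands for the tab character, as in the patterns quoted in this thread.
// The strings below are invented sample data, not real glossary entries.
var oneTab = "DRB\x08sample definition";
var twoTabs = "DRB\x08\x08sample definition";

// Quantifier applied directly to the tab character.
var plain = /DRB\x08+(.+)/;
// Quantifier applied to a parenthesized tab: the tab is now capture group 1,
// so the definition moves to capture group 2.
var grouped = /DRB(\x08)+(.+)/;

var results = {
    plainOne: plain.exec(oneTab)[1],
    plainTwo: plain.exec(twoTabs)[1],
    groupedOne: grouped.exec(oneTab)[2],
    groupedTwo: grouped.exec(twoTabs)[2]
};
```

In all four cases the captured definition excludes the tabs, because the greedy `+` consumes every tab before the capture group starts.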
How are you using the regular expressions in your script? Are you using the doc.Find () method or the RegExp object? The RegExp object in ExtendScript, which implements an older JavaScript standard (ECMAScript 3), does not support lookbehind.
Here is an example of using a capture group instead of lookbehind:
#target framemaker
var doc, pgf, text, regex, definition;
doc = app.ActiveDoc;
// Get the paragraph at the cursor.
pgf = doc.TextSelection.beg.obj;
// Use a function to get the text (not shown).
text = CP.getText (pgf, doc);
// ExtendScript regular expression literal.
regex = /DRB\x08+(.+)/;
if (regex.test (text) === true) {
    definition = regex.exec (text)[1];
    alert (definition);
}
The best practice would be to create a function that you could call, perhaps passing in a paragraph and an acronym and then returning the definition. It depends on the overall functionality of your script.
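A minimal sketch of such a function, under some assumptions: it takes the paragraph text (already extracted, for example with a helper like the CP.getText call mentioned above, which is not shown) plus an acronym, and returns the definition. The names getDefinition and escapeForRegex are hypothetical, and "\x08" is used as the tab character as in the example above.

```javascript
// Hypothetical helper: escape regex metacharacters so that acronyms such as
// "A/C" or "C++" are matched literally.
function escapeForRegex(s) {
    return s.replace(/[\\^$.*+?()[\]{}|\/]/g, "\\$&");
}

// Hypothetical function: return the definition that follows the acronym and
// one or more tab characters, or null if the text does not match.
function getDefinition(text, acronym) {
    // No "g" flag, and exec is called only once per lookup.
    var regex = new RegExp(escapeForRegex(acronym) + "\\x08+(.+)");
    var match = regex.exec(text);
    return match ? match[1] : null;
}

// Usage with made-up sample text (one tab and two tabs).
var oneTabResult = getDefinition("DRB\x08sample definition", "DRB");
var twoTabResult = getDefinition("DRB\x08\x08sample definition", "DRB");
var noMatchResult = getDefinition("XYZ\x08something else", "DRB");
```

Like the pattern above, this does not anchor the acronym to the start of the paragraph; whether that matters depends on how the glossary paragraphs are built.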
Let me thank everyone for your quick responses. Yatani...your solution worked great for selecting each acronym, single or multiple tabs, and associated definition, which are all in one pgf. Rick, your suggestion is the piece I needed to separate out the definition from the rest of the paragraph. I'm not very smart on regex, but this was a great exercise in learning how to use capture groups instead of a lookbehind. Much learning I have yet to do.
I took Rick's suggestion and came up with the following to extract both the acronym (before the tab(s)) and the definition (following the tab(s)). However, I can't figure out how to combine the two searches and capture groups into a single function. Any suggestions to streamline this?
var text = "";
var definition = "";
var acronym = "";
text = "MPM miles per minute"; // blank space is two tabs
getAcronym(); //add acronym to array
getDefinition(); //add definition to array
function getAcronym() {
    var regex = /([\w\/]+)\t+.+/ig;
    acronym = "";
    if (regex.test(text) === true) {
        acronym = regex.exec(text)[1]
    }
    return acronym;
}
function getDefinition() {
    var regex = /[\w\/]+\t+(.+)/ig;
    definition = "";
    if (regex.test(text) === true) {
        definition = regex.exec(text)[1]
    }
    return definition;
}
The bottom line is that it works, and it was a good exercise for me in working with regexes.
This is untested, but I would do something like this:
function getAcronymAndDefinition (text) {
    var regex, data;
    // Regular expression for capturing the data.
    regex = /([\w\/]+)\t+(.+)/ig;
    if (regex.test (text) === true) {
        // Make an object to return with both values.
        data = {};
        data.acronym = regex.exec (text)[1];
        data.definition = regex.exec (text)[2];
        return data;
    }
}
Rick...thanks for the assist. Unfortunately, every time I run it I get a "null is not an object" on the line...
data.definition = regex.exec (text)[2];
I thought maybe your "data = {}" was declaring an array, so I changed it to "data = []" but got the same result. Here's the code snippet I tested before turning it into a function:
var regex, data;
var text = "DTM data transfer module";
var regex = /([\w\/]+)\t+(.+)/ig;
if (regex.test (text) === true) {
    // Make an object to return with both values.
    data = [];
    data.acronym = regex.exec (text)[1];
    data.definition = regex.exec (text)[2];
    //return data;
}
//$.writeln(data[acronym]);
//$.writeln(data[1]);
The space in the text string is two tabs.
I am sorry about that. The "g" flag on the regex changes the behavior of the exec method. This one will work:
var text, data;
text = "MPM miles per minute"; // blank space is two tabs
data = getAcronymAndDefinition (text);
if (data) {
    alert (data.acronym);
    alert (data.definition);
}
function getAcronymAndDefinition (text) {
    var regex, data;
    // Regular expression for capturing the data.
    regex = /([\w\/]+)\t+(.+)/i;
    if (regex.test (text) === true) {
        data = {};
        data.acronym = regex.exec (text)[1];
        data.definition = regex.exec (text)[2];
        return data;
    }
}
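To illustrate why the "g" flag matters, here is a small sketch as it would run in a standard JavaScript engine; ExtendScript is assumed to follow the same rule, which is consistent with the "null is not an object" error reported above. With "g" set, each test or exec call resumes from the regex's lastIndex, so an exec call right after a successful test starts at the end of the previous match and finds nothing. The sketch also shows a hypothetical variant, parseGlossaryEntry, that sidesteps the issue by calling exec only once.

```javascript
var text = "MPM\t\tmiles per minute"; // two tabs between acronym and definition

// With the "g" flag, the regex remembers where the last match ended.
var globalRegex = /([\w\/]+)\t+(.+)/ig;
var testResult = globalRegex.test(text);     // matches the whole string
var indexAfterTest = globalRegex.lastIndex;  // now at the end of the match
var execAfterTest = globalRegex.exec(text);  // search resumes there: no match

// Hypothetical variant: no "g" flag and a single exec call, whose result
// supplies both capture groups.
function parseGlossaryEntry(text) {
    var match = /([\w\/]+)\t+(.+)/i.exec(text);
    if (match === null) {
        return null;
    }
    return { acronym: match[1], definition: match[2] };
}

var entry = parseGlossaryEntry(text);
```

Calling exec once and keeping its result also avoids running the same pattern three times per paragraph.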
Thanks Rick...that works like a champ. When I first tried to run your code, the regex.test would not test true. I had to build it up backwards to get it to work, ending up where you started. I should have closed and restarted FM and the ExtendScript Toolkit to purge any flags or variables from memory. This has been an excellent lesson for me in using regexes and the .exec method, which I had not seen before. Now I can finally use capture groups in my scripts. Hope you have a great Labor Day weekend.