Help With Regexp Lookaheads to Extract Definitions

Question

I am trying to extract definitions from a document glossary with a script, and have run into a problem with my lookahead that I can't seem to sort out. Glossary entries look like the image below:

The problem is that each entry may have one or two tabs. The first tab is rendered as "......" separating the acronym from the definition. Some glossaries have a second tab that appears as a blank space before the definition. The following works fine for glossaries with a single tab.

(?<=\bDRB\x08).*

However, if the glossary uses two tabs, the regexp picks up the second tab along with the definition. If I change the assertion to:

(?<=\bDRB\x08\x08).*

It works with two tabs, but not with one. If I change it to:

(?<=\bDRB\x08+).*

...which should find one or more occurrences of the tab character, I get a "Not Found" error. Apparently operators do not work the same way in assertions as they do in regexps.

frameexpert · Accepted Answer

Rick...thanks for the assist. Unfortunately, every time I run it I get a "null is not an object" on the line...

data.definition = regex.exec (text)[2];

Thought maybe your "data = {}" was declaring an array, so I changed it to "data = []" but got the same result. Here's the code snippet I tested before turning it into a function:

var regex, data;
var text = "DTM		data transfer module";
var regex = /([\w\/]+)\t+(.+)/ig;

if (regex.test (text) === true) {
        // Make an object to return with both values.
        data = [];
        data.acronym = regex.exec (text)[1];
        data.definition = regex.exec (text)[2];
        //return data;
        }
//$.writeln(data[acronym]);
//$.writeln(data[1]);

The space in the text string is two tabs.

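I am sorry about that. The "g" flag on the regex changes the behavior of the exec method: with "g" set, the regex remembers where its last match ended (in lastIndex), so an exec call made right after test resumes searching from that position and returns null. This one will work: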

var text, data; 

text = "MPM		miles per minute";  // blank space is two tabs

data = getAcronymAndDefinition (text);
if (data) {
    alert (data.acronym);
    alert (data.definition);
}

function getAcronymAndDefinition (text) {
    
    var regex, data;
    
    // Regular expression for capturing the data.
    regex = /([\w\/]+)\t+(.+)/i;
    if (regex.test (text) === true) {
        data = {};
        data.acronym = regex.exec (text)[1];
        data.definition = regex.exec (text)[2];
        return data;
    }
}
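
To make the "g" flag behavior concrete, here is a minimal sketch in plain JavaScript. It shows test moving lastIndex on a global regex so that the following exec returns null, plus a variant that calls exec only once. The name parseGlossaryEntry is hypothetical and not part of the answer above; the sample text is the one from the failing snippet.

```javascript
// Sketch: why test() followed by exec() returns null when the regex has the "g" flag.
var text = "DTM\t\tdata transfer module"; // two tabs between acronym and definition

var globalRegex = /([\w\/]+)\t+(.+)/ig;
var testResult = globalRegex.test(text);    // matches the whole string
var indexAfterTest = globalRegex.lastIndex; // now points past the end of that match
var execAfterTest = globalRegex.exec(text); // search resumes at lastIndex, finds nothing

// Hypothetical variant: no "g" flag, and exec is called once and its result reused.
function parseGlossaryEntry(entryText) {
    var match = /([\w\/]+)\t+(.+)/i.exec(entryText);
    if (match === null) {
        return null; // no tab-separated acronym/definition pair found
    }
    return { acronym: match[1], definition: match[2] };
}

var entry = parseGlossaryEntry(text);
```

Checking the result of exec for null before indexing into it also avoids the "null is not an object" error if a line does not match at all.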

yatani · Answer

Hi, Fightergator. Try this code instead:

(?<=\x08)\b.*
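
The pattern above only requires a single tab immediately before the definition, so it should not matter how many tabs precede it. A rough sketch to check this, assuming a modern JavaScript engine with lookbehind support (for example Node.js; ExtendScript itself may not accept lookbehind in a regex literal). It uses \x08 for the tab as in the question; extractDefinition and the sample definition text are made up for illustration.

```javascript
// Sketch: a fixed-width lookbehind for one tab, followed by a word boundary.
// With two tabs, the position after the first tab is skipped because \b
// does not match between two tab characters.
function extractDefinition(line) {
    var match = /(?<=\x08)\b.*/.exec(line);
    return match ? match[0] : null;
}

var oneTab = extractDefinition("DRB\x08sample definition");
var twoTabs = extractDefinition("DRB\x08\x08sample definition");
var noTab = extractDefinition("DRB sample definition");
```

Because of the \b, this relies on the definition starting with a word character (a letter, digit, or underscore).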
