The script error looks like Illustrator being a buggy jerk: e.g. `paragraphs.length` reporting 3 paragraphs when there are only 2. You can work around it like this:

var foundParagraphs = [];
for (var i = 0; i < doc.stories.length; i++) {
    var paragraphs = doc.stories[i].paragraphs;
    for (var j = 0; j < paragraphs.length; j++) {
        try {
            // trim leading/trailing whitespace (note: replace needs the "" replacement argument)
            foundParagraphs.push(paragraphs[j].contents.replace(/^\s+|\s+$/g, ""));
        } catch (e) {
            // bug workaround: paragraphs.length can report e.g. 3 items when only 2 exist
            if (e.number !== 1302) { throw e; }
        }
    }
}
foundParagraphs.sort();

I have to say, testing with the supplied files (which you should really anonymize before sharing, btw), the results are disappointing. That’s not really surprising to me: I was prototyping this sort of QA/revision automation over a decade ago, and it is never as straightforward as it sounds. Pack copy documents rarely match exactly what appears on the artwork, e.g. “en_GB • Jute roller blind” ends up as “Jute roller blind”, and some fields don’t appear as text at all (e.g. picto names like “Wash care symbol - professional cleaning_do not dry clean.pdf”).

BTW, the SourceIllustrator file is not an improvement over the plain text file (which itself is not great), so I wouldn’t bother with that. If you can get the original pack copy document in XML or another structured format, that’s a better starting point, as it tells you what each piece of text actually means. From there, you can selectively clean each field according to the same rules the artist follows, e.g. removing the “en_GB/fr_FR/etc •” crud, before trying to match it in the artwork.

I must emphasize: your entire ability to automate anything hangs on the quality of your inputs. If the customer’s pack copy is a sloppy mess that only a human artworker/QAer can reliably interpret, don’t even try. Maybe when AGI gets good enough, you’ll be able to throw unpredictable data at it and it’ll do as good a job as a human at making sense of it, but classical automation demands strictly defined, absolutely consistent inputs, or you only dig yourself an even bigger hole.

The part of my script which extracts the artwork text would be improved by having it ignore the non-pack-copy text on “cutter” and “legend” layers. My implementation (which was really just proof of concept) gets the document’s stories, ’cos that’s simplest, but it should really iterate through the individual text frames on the “artwork” layer to extract their text.
Oh, and it should also check for any text box overflows while it’s doing that, as an obvious QA issue. (This still won’t guarantee the text is visible on pack, e.g. a text frame could be hidden under an image or solid fill instead of on top of it, but it’d be a very badly built artwork that did that. For that level of verification, you’d need an OCR approach, which has its own challenges.)

Beyond that, for the text comparison step, you really want to use fuzzy text matching, e.g. fuse.js. This will identify text which is mostly, but not quite, identical. Touch wood, that should match up most of the pack copy fields to the text that appears in the artwork, at which point the fine differences (which might just be extra whitespace or fixed punctuation) can be highlighted for manual review. Find yourself a Node.js developer if you can’t write that script yourself. Interop between ExtendScript and Node scripts is a chore (for sharing data between them, it’s easiest to read and write JSON files).

TBH, it’d be better to automate first artwork production, using scripts to place all the pack copy into a tear-off sheet of pre-built text frames/panels which the artworker can drop into place, or to take a lead artwork and replace its original text to assist in producing the adaptations. Eliminating manual copy-paste will reduce QA work, as the text is already correctly formatted within Illustrator and most text frames can be dropped straight into the artwork, sized and positioned to fit. Pre-built text frames/panels can also be tagged so that QA scripts can easily locate them and compare them for changes. It is much easier to automate checking/amending an artwork that was partly or fully created by automation than one built completely manually. Some human interaction will be unavoidable with artworks of this complexity and range, but you can reduce the variability that human operators always introduce.
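To illustrate the fuzzy matching step without pulling in a dependency (in production I’d still use fuse.js as suggested above), here’s a rough Node.js sketch. The function names, the normalized Levenshtein similarity, and the threshold value are all my own illustration, not fuse.js’s API:

```javascript
// Classic dynamic-programming Levenshtein edit distance (one-row version).
function levenshtein(a, b) {
    var row = [];
    for (var j = 0; j <= b.length; j++) row[j] = j;
    for (var i = 1; i <= a.length; i++) {
        var prev = row[0]; // d[i-1][j-1]
        row[0] = i;
        for (var j = 1; j <= b.length; j++) {
            var cur = row[j]; // d[i-1][j], saved for the next diagonal
            row[j] = Math.min(
                row[j] + 1,                              // deletion
                row[j - 1] + 1,                          // insertion
                prev + (a[i - 1] === b[j - 1] ? 0 : 1)   // substitution
            );
            prev = cur;
        }
    }
    return row[b.length];
}

// Similarity in [0, 1]: 1 means identical, 0 means nothing in common.
function similarity(a, b) {
    if (a.length === 0 && b.length === 0) return 1;
    return 1 - levenshtein(a, b) / Math.max(a.length, b.length);
}

// For each pack copy field, find the closest artwork string; anything
// scoring below the threshold gets flagged for manual review.
function matchFields(packCopy, artworkText, threshold) {
    return packCopy.map(function (field) {
        var best = null, bestScore = -1;
        artworkText.forEach(function (text) {
            var score = similarity(field, text);
            if (score > bestScore) { bestScore = score; best = text; }
        });
        return { field: field, match: best, score: bestScore,
                 needsReview: bestScore < threshold };
    });
}
```

Near-matches (extra whitespace, fixed punctuation) score just under 1, so they surface as the best match but with needsReview set, which is exactly the “highlight the fine differences” behaviour you want.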
Sorry that’s not an easy answer, but it’s not an easy problem, both technologically and logistically. (I was 90% automating those artworks over a decade ago and the industry still hasn’t progressed since. And while I had first artworks solved, I still hadn’t figured out a satisfactory solution to QA and amends by the time I left, so I can’t give you all the answers as I don’t know them myself.)