riskypotato
Known Participant
August 17, 2009
Question

caption to keywords, minus grammar

Hi all,

I need a script that will take the caption of a photo (the Description field in IPTC Core), remove all pronouns, prepositions, articles and conjunctions, plus of course all punctuation, and then place the remaining words into the Keywords field. There may be some other words I'll want removed too. I have never written a script and know nothing about scripting, so I was hoping someone could point me in the right direction. Thanks for any help!

Phil

Paul Riggott
Inspiring
August 19, 2009

Here is a start for you.

You will need to create a dictionary of words. I just created a text file with one word (or phrase) per line, e.g.:

most 
much  
my
myself
neither
no one  
nobody
none
nothing
one
one another

NB: You will need to amend the code with the correct path to your dictionary!!!
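A minimal sketch of reading a dictionary in this format, written in plain JavaScript for Node.js rather than Bridge so it can be tried on its own (`parseDictionary` is a hypothetical name). It trims each line before checking whether anything is left, so trailing spaces such as those after "much" are removed, blank lines are skipped, and two-letter words like "my" are kept.

```javascript
// Hypothetical helper: one dictionary entry (word or phrase) per line
function parseDictionary(text) {
  const entries = [];
  // Accept \n, \r\n and old Mac \r line endings
  for (const rawLine of text.split(/\r\n|\r|\n/)) {
    // Trim first, then keep anything non-empty (no minimum length)
    const line = rawLine.trim();
    if (line.length > 0) entries.push(line);
  }
  return entries;
}

console.log(parseDictionary("most \nmuch  \nmy\r\n\nno one  \n"));
// [ 'most', 'much', 'my', 'no one' ]
```

Checking the length only after trimming matters: a rule like "skip lines of two characters or fewer" would silently drop short stop words such as "my", "in", "on" or "or".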

The script adds a "Keys from Description" command to the right-click menu of the thumbnails.

#target bridge
if (BridgeTalk.appName == "bridge") {
    // Add "Keys from Description" to the end of the thumbnail right-click menu
    var CreateKeys = new MenuElement("command", "Keys from Description", "at the end of Thumbnail");
    CreateKeys.onSelect = function () {
        keysFromDesc();
    };
}

function keysFromDesc() {
    ///////////////////////////////////////////////////////////////////////////////////////////
    // AMEND PATH TO SUIT!
    var pronouns = File("~/desktop/pronouns.txt");
    ///////////////////////////////////////////////////////////////////////////////////////////
    if (!pronouns.exists) {
        alert("Dictionary does not exist " + pronouns);
        return;
    }
    // Read the dictionary: one word or phrase per line
    var pronounsList = [];
    pronouns.open("r");
    while (!pronouns.eof) {
        // Trim first, then skip blank lines (short words such as "my" are kept)
        var line = pronouns.readln().replace(/^[ \t]+|[ \t]+$/g, '');
        if (line.length > 0) pronounsList.push(line);
    }
    pronouns.close();
    // Longest entries first, so a phrase such as "no one" is tried before "no"
    pronounsList.sort(function (a, b) { return b.length - a.length; });
    // Whole words only: a space, the entry, then a space (lookahead),
    // so "my" does not match inside "enemy"
    var rex = new RegExp(" (" + pronounsList.join("|") + ")(?= )", "gi");

    // Keep only files (not folders) from the current selection
    var items = app.document.selections;
    var items2 = [];
    for (var a = 0; a < items.length; a++) {
        if (items[a].type == "file") items2.push(items[a]);
    }
    items = items2;

    for (var i = 0; i < items.length; ++i) {
        var md = items[i].synchronousMetadata;
        var str = md.read("http://purl.org/dc/elements/1.1/", "dc:description");
        if (!str) continue; // no description: leave this file's keywords alone
        md.namespace = "http://ns.adobe.com/photoshop/1.0/";
        md.Keywords = formatDesc(str);
    }

    function formatDesc(str) {
        // Pad with spaces so the first and last words can also be matched
        str = " " + str + " ";
        str = str.replace(/[\!\"\£\$\%\^\&\*\(\)\:\;\,\'\\\/]/g, '');
        var keys = str.replace(rex, '').split(' ');
        return ReturnUniqueSortedList(keys).join(';');
    }

    // Remove empty strings and duplicates, then sort alphabetically
    function ReturnUniqueSortedList(ArrayName) {
        var unduped = {};
        for (var i = 0; i < ArrayName.length; i++) {
            if (ArrayName[i] != "") unduped[ArrayName[i]] = ArrayName[i];
        }
        var uniques = [];
        for (var k in unduped) {
            uniques.push(unduped[k]);
        }
        return uniques.sort();
    }
}
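To try the word-filtering logic outside Bridge, here is a minimal sketch of the same idea in plain JavaScript for Node.js. The names `captionToKeywords` and `escapeRegExp` are hypothetical. It assumes the stop words are already loaded into an array, removes them as whole words or phrases (longest first, case-insensitive), strips the same punctuation characters as the Bridge script (so periods are still left in), and returns a sorted, de-duplicated, semicolon-separated string.

```javascript
// Escape characters that have a special meaning in a regular expression
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical Node.js version of the caption-to-keywords logic
function captionToKeywords(caption, stopWords) {
  // Pad with spaces so the first and last words can be matched too
  let str = " " + caption + " ";
  // Same punctuation set as the Bridge script (periods are not removed)
  str = str.replace(/[!"£$%^&*():;,'\\\/]/g, "");
  if (stopWords.length > 0) {
    // Longest entries first, so "no one" is tried before "no"
    const sorted = [...stopWords].sort((a, b) => b.length - a.length);
    // A space, then the entry, then a space checked by lookahead: whole
    // words only, and back-to-back stop words are all removed
    const rex = new RegExp(" (?:" + sorted.map(escapeRegExp).join("|") + ")(?= )", "gi");
    str = str.replace(rex, "");
  }
  // Split on whitespace, drop empty strings, de-duplicate and sort
  const words = str.split(/\s+/).filter((w) => w.length > 0);
  return [...new Set(words)].sort().join(";");
}

console.log(captionToKeywords("The dog and my enemy, on the beach", ["the", "and", "my", "on"]));
// beach;dog;enemy
```

Because the lookahead does not consume the trailing space, "the and" in a row is removed completely, and "one" is not stripped out of "someone". Note that `sort()` is case-sensitive, so capitalised words sort before lowercase ones.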

riskypotato
Known Participant
August 19, 2009

Paul,

Wow, this is fantastic! Okay, let's say I have this list of words: as, on, in, the, and, or, but (the real list will be much longer), and they are in a file called "wordsnot.txt" on the root level of my Macintosh HD. Would this be correct?

var pronouns =File("Macintosh HD/wordsnot.txt");

Also, what about punctuation marks in the caption, such as commas and periods?

Many thanks!

Phil

Paul Riggott
Inspiring
August 19, 2009

Ah, I don't think I have checked for periods and CR/LF (line breaks); I will do that. I'm not too sure about the Mac file path, as I try to keep well away from my Macs unless I have to. At the moment the list needs to be one word per line, but that could be changed if required.

Paul.
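Periods and line breaks are exactly what the Bridge script's character list misses. A minimal sketch of a broader approach, in plain JavaScript for Node.js (`cleanCaption` is a hypothetical name): instead of listing punctuation characters one by one, it keeps only letters, digits, whitespace and hyphens, then collapses CR/LF and repeated spaces into single spaces. It relies on Unicode property escapes (`\p{L}`, `\p{N}`), which need a modern JavaScript engine; as far as I know Bridge's ExtendScript engine does not support them, so inside Bridge you would still extend the script's explicit character list (and turn line breaks into spaces rather than deleting them).

```javascript
// Hypothetical helper: strip all punctuation (including periods) and
// line breaks from a caption, keeping letters, digits and hyphens.
function cleanCaption(caption) {
  return caption
    // \p{L} = any letter, \p{N} = any digit; the "u" flag enables these
    .replace(/[^\p{L}\p{N}\s-]/gu, "")
    // CR, LF, tabs and repeated spaces all become a single space
    .replace(/\s+/g, " ")
    .trim();
}

console.log(cleanCaption("Sunset over the bay.\r\nTaken in May, 2009 (£5 print)!"));
// Sunset over the bay Taken in May 2009 5 print
```

Keeping hyphens is a design choice here, so "well-known" stays one keyword; drop the `-` from the character class if you would rather remove hyphens too.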