Keywords export in PDF Explorer?

Wolfram:
Your PDF Explorer is a great product. Keyword editing is easy with your program, even with many PDF files.

What I need, and could not find, is a keyword list export.

PDF Explorer shows the keywords of each individual file. To search for specific keywords (stored as PDF metadata or XMP metadata), I need a list of all available keywords (sorted A-Z?).
This list would also be very helpful as a database of keywords already submitted.

Q1: How can I export keywords with the current version today?

Q2: Do you think such an export is worth considering? If so, what are your plans?

Q3: Do you know of any other software that can export keywords after scanning PDFs?

I look forward to your answer and comments.

RTT:

--- Quote ---Q1: How can I export keywords with the current version today?
--- End quote ---
You can use the File>ExportGridFields tool to export the keywords column to an external .csv or .txt file.
Then you can post-process that file (with a script, for example) to build a sorted list of unique keywords.

Here is a simple script you could use to post-process that exported keywords column file.


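--- Code: ---/*****************
 Helper prototype methods
 (the JScript engine used by Windows Script Host lacks these)
*****************/
// Add Array.prototype.indexOf only if the engine does not already provide it
if (!Array.prototype.indexOf) {
  Array.prototype.indexOf = function (obj, start) {
    // "start" is optional: the index where the search begins (defaults to 0)
    for (var i = (start || 0); i < this.length; i++) {
      if (this[i] === obj) {
        return i;
      }
    }
    return -1;
  };
}

// Add String.prototype.trim only if missing: strips leading and trailing whitespace
if (!String.prototype.trim) {
  String.prototype.trim = function () {
    var str = this.replace(/^\s\s*/, ''),
        ws = /\s/,
        i = str.length;
    while (ws.test(str.charAt(--i)));
    return str.slice(0, i + 1);
  };
}

// Append each item of "a" to this array after removing quote characters and
// surrounding whitespace, skipping empty strings and duplicates
Array.prototype.uniqueMerge = function (a) {
  for (var i = 0, l = a.length; i < l; ++i) {
    var s = a[i].replace(/['"]/g, '').trim();
    if (s && this.indexOf(s) === -1) {
      this.push(s);
    }
  }
  return this;
};

/*****************
 code starts here
*****************/
var fso = new ActiveXObject("Scripting.FileSystemObject");
// Open the exported keywords file (first command-line argument); 1 = ForReading
var f = fso.OpenTextFile(WScript.Arguments.Item(0), 1);
var keywordsList = [];
while (!f.AtEndOfStream) {
  // The keywords of one PDF may be separated by commas or semicolons
  var keywords = f.ReadLine().split(/,|;/);
  keywordsList.uniqueMerge(keywords);
}
f.Close();

// Note: the default sort is case-sensitive (uppercase initials come first)
keywordsList.sort();

WScript.Echo(keywordsList.join('; '));
--- End code ---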
Just save this code as a BuildKeywordsList.js file, then drag and drop the keywords column file exported by PDFE onto it. A popup will show the sorted list of all your PDFs' keywords.

If you want the output in a file, run the following command in the folder that contains BuildKeywordsList.js and Keywords.txt:
cscript //NoLogo BuildKeywordsList.js Keywords.txt > KeywordsList.txt
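If Windows Script Host is not available, the same post-processing can be sketched in plain JavaScript, for example under Node.js. This is a minimal sketch, not something that ships with PDFE: like the JScript script, it assumes the export holds one PDF's keywords per line, separated by commas or semicolons, and the function name `buildKeywordsList` is hypothetical. Unlike the plain `sort()` in the JScript version, it sorts case-insensitively, so "apple" comes before "Zebra".

```javascript
// Hypothetical helper: builds a sorted list of unique keywords from the text
// of an exported keywords column (assumed: one PDF's keywords per line).
function buildKeywordsList(text) {
  const seen = new Set(); // keeps first-seen order, ignores exact duplicates
  for (const line of text.split(/\r?\n/)) {
    // The keywords of one PDF may be separated by commas or semicolons
    for (const raw of line.split(/[,;]/)) {
      // Drop quote characters (e.g. from CSV quoting) and surrounding whitespace
      const keyword = raw.replace(/['"]/g, '').trim();
      if (keyword) {
        seen.add(keyword);
      }
    }
  }
  // Case-insensitive A-Z ordering ("apple" before "Zebra")
  return [...seen].sort((a, b) =>
    a.localeCompare(b, undefined, { sensitivity: 'base' })
  );
}
```

To run it on the exported file, read the text with Node's `fs.readFileSync('Keywords.txt', 'utf8')` and print `buildKeywordsList(text).join('; ')`. Duplicates are still detected case-sensitively, as in the JScript script, so "Tax" and "tax" would both be listed.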

Feel free to ask if you have any questions about using this solution.


--- Quote ---Q2: Do you think such an export is worth considering? If so, what are your plans?
--- End quote ---
The next PDFE release includes a scripting tool, so writing custom scripts for this kind of task will be quite easy.


--- Quote ---Q3: Do you know of any other software that can export keywords after scanning PDFs?
--- End quote ---
No

Padanges:

--- Code: ---if (!Array.indexOf) {
  Array.prototype.indexOf = function (obj, start) {
    for (var i = (start || 0); i < this.length; i++) {
      if (this[i] == obj) {
        return i;
      }
    }
    return -1;
  }
}

--- End code ---

Could you please clarify why the start parameter is used here? The prototype method works fine without it.

RTT:

--- Quote from: Padanges on September 05, 2016, 12:22:06 PM ---Could you please clarify why the start parameter is used here? The prototype method works fine without it.

--- End quote ---
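I'm not quite sure what you are asking here. :-\
The JScript language lacks the indexOf method for arrays, so the code above adds it to the Array prototype, implementing the same behavior as the standard JavaScript method. The start parameter is optional: when it is omitted, `start || 0` makes the search begin at index 0, which is why the method works fine without it. Read here about the optional start parameter.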
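To illustrate what start does, here is a short example using the native method in modern JavaScript (the ECMAScript specification calls the second argument fromIndex). The helper `allIndexesOf` is hypothetical, written only for this example: it finds every occurrence of a value by resuming each search one position past the previous match.

```javascript
// Hypothetical helper: collect every index of `value` in `arr` by restarting
// the search just after the previous match via indexOf's optional 2nd argument.
function allIndexesOf(arr, value) {
  const positions = [];
  let i = arr.indexOf(value); // no start argument: search begins at index 0
  while (i !== -1) {
    positions.push(i);
    i = arr.indexOf(value, i + 1); // start searching after the last hit
  }
  return positions;
}

const tags = ['tax', 'invoice', 'tax', '2016', 'tax'];
console.log(tags.indexOf('tax'));       // 0
console.log(tags.indexOf('tax', 1));    // 2
console.log(allIndexesOf(tags, 'tax')); // [ 0, 2, 4 ]
```

One difference remains: the native method also accepts a negative start, counted back from the end of the array, while the `start || 0` polyfill does not treat negative values that way.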

Padanges:
"It's an optional parameter which defines where to start the search". Thanks.
