RTTSoftware Support Forum
PDF-ShellTools => General => Topic started by: Grant Botes on December 06, 2017, 12:58:10 AM
-
I'm looking for a way to extract highlights and comments/notes from sets of PDF files into a CSV file. Is there a way to do it in PDF-ShellTools that I might have missed, or perhaps any suggestions in this regard?
Thanks in advance.
-
The following script dumps all the annotations found in each of the selected PDF files to a CSV file.
var wsShell = pdfe.CreateObject("WScript.Shell");
var dialog = pdfe.SaveDialog;
dialog.DefaultExt = '.csv';
dialog.filter = 'Comma-separated values file (*.csv)|*.csv';
dialog.filename = wsShell.SpecialFolders('MyDocuments') + '\\PDF_Annotations.csv';
dialog.Options = '[ofOverwritePrompt]';
if (dialog.execute) {
    var fso = new ActiveXObject("Scripting.FileSystemObject");
    //CreateTextFile(filename, overwrite, unicode): replace any existing file and write Unicode text
    var CSVFile = fso.CreateTextFile(dialog.Filename, true, true);
    //use the TAB character as the field delimiter
    var listSep = "\t";
    //alternative: use the list separator from the user regional settings
    //var listSep = GetUserListSeparator();
    //header row
    var CSVLine = StringFormat('Filename{0}Type{0}Comments{0}Author{0}Date{0}Name', [listSep]);
    CSVFile.WriteLine(CSVLine);
    for (var i = 0; i < pdfe.SelectedFiles.Count; i++) {
        var file = pdfe.SelectedFiles(i);
        pdfe.Echo(file.Filename + ' : Extracting annotations');
        var annotations = file.Annotations;
        if (annotations) {
            //one row per annotation
            for (var n = 0; n < annotations.Count; n++) {
                var annot = annotations(n);
                CSVLine = StringFormat('"{1}"{0}"{2}"{0}"{3}"{0}"{4}"{0}"{5}"{0}"{6}"', [listSep, file.Filename, annot.Type, annot.Contents, annot.Author, annot.Date, annot.Name]);
                CSVFile.WriteLine(CSVLine);
            }
            pdfe.Echo(' ' + annotations.Count + ' annotations extracted');
        } else {
            pdfe.Echo(' Annotations not found.', 0xFF0000);
        }
    }
    CSVFile.Close();
    pdfe.Echo('Loading CSV file');
    //quote the path so that it still opens when a folder name contains spaces
    wsShell.Run('"' + dialog.Filename + '"');
    pdfe.Echo('Done.', 0, 2);
}
//==============================================================================
//replaces each {n} placeholder in s with args[n]; unmatched placeholders are left as-is
function StringFormat(s, args) {
    return s.replace(/{(\d+)}/g, function(match, number) {
        return typeof args[number] != 'undefined' ? args[number] : match;
    });
}
//reads the list separator character defined in the user regional settings
function GetUserListSeparator() {
    var wsShell = pdfe.CreateObject("WScript.Shell");
    var ListSeparator = wsShell.RegRead("HKCU\\Control Panel\\International\\sList");
    return ListSeparator ? ListSeparator : ",";
}
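One thing to watch with the script above: each value is wrapped in double quotes, but a comment that itself contains a double quote is written unchanged, and a missing value leaves its {n} placeholder in the row. A minimal sketch of how the fields could be escaped first is shown below. The csvField and csvRow helpers are hypothetical names, not part of the original script or of PDF-ShellTools, and this has not been tested against the pdfe objects.

```javascript
// Hypothetical helper (not part of the original script): turns any value
// into a quoted CSV field.
function csvField(value) {
    // Missing values become an empty field instead of a leftover {n} placeholder.
    var text = (value === null || typeof value === 'undefined') ? '' : String(value);
    // Double every embedded quote so the field stays intact when parsed.
    return '"' + text.replace(/"/g, '""') + '"';
}

// Hypothetical helper: escapes each value and joins the fields with the separator.
function csvRow(values, listSep) {
    var fields = [];
    for (var i = 0; i < values.length; i++) {
        fields.push(csvField(values[i]));
    }
    return fields.join(listSep);
}

// Sample data standing in for file.Filename, annot.Type, annot.Contents and annot.Author.
var sampleRow = csvRow(['notes.pdf', 'Highlight', 'He said "check this"', undefined], '\t');
```

If this approach suits you, the line that builds CSVLine inside the inner loop could call csvRow with the same values and listSep instead of StringFormat.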
Just import the attached myscript file into your PDF-ShellTools list of scripts (http://www.rttsoftware.com/Manuals/STIndex.htm?pageURL=ST/English/MyScripts.htm), test it, and let me know if you need any changes.
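The script exports every annotation it finds. If only highlights and notes are wanted, a filter on the annotation type could be added inside the inner loop. The sketch below shows the idea on plain objects. The type names 'Highlight' and 'Text' are an assumption borrowed from the PDF annotation subtypes, so check the Type column of a first export to see which strings annot.Type really returns. isWantedAnnotation is a hypothetical helper name.

```javascript
// Assumed type names, borrowed from the PDF annotation subtypes; the strings
// that annot.Type actually returns in PDF-ShellTools may differ.
var wantedTypes = { 'Highlight': true, 'Text': true };

// Hypothetical helper: true when the annotation type is one we want to export.
function isWantedAnnotation(type) {
    return wantedTypes.hasOwnProperty(String(type));
}

// Plain objects standing in for the annotation objects of the real script.
var sampleAnnotations = [
    { Type: 'Highlight', Contents: 'Key result' },
    { Type: 'Link', Contents: '' },
    { Type: 'Text', Contents: 'Check this figure' }
];

// Keep only the comments of the wanted annotations.
var kept = [];
for (var n = 0; n < sampleAnnotations.length; n++) {
    if (isWantedAnnotation(sampleAnnotations[n].Type)) {
        kept.push(sampleAnnotations[n].Contents);
    }
}
```

In the real script the same test would wrap the two lines that build and write CSVLine, so that unwanted annotation types are skipped.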
-
That's perfect! Thank you very much! :)
-
Hello,
How do I import scripts? Thanks.
-
In the manager's context menu tools options, go to the "scripts" tab and click the "import" button.
(https://www.rttsoftware.com/Manuals/ST/English/images/MyScriptsManager.png)
Check the user's guide, scripts topic (https://www.rttsoftware.com/Manuals/STIndex.htm?pageURL=ST/English/MyScripts.htm), for more details.
-
Thanks a lot.