Topic: Question: I need a way to extract highlights and comments to a CSV

Grant Botes

I'm looking for a way to extract highlights and comments/notes from sets of PDF files into a CSV file. Is there a way to do it in PDF-ShellTools that I might have missed, or perhaps any suggestions in this regard?

Thanks in advance.

RTT

Re: Question: I need a way to extract highlights and comments to a CSV
« Reply #1 on: December 06, 2017, 01:06:40 AM »
The following script dumps all the annotations found in each of the selected PDF files to a CSV file.
Code:
var wsShell = pdfe.CreateObject("WScript.Shell");
var dialog = pdfe.SaveDialog;

dialog.DefaultExt = 'csv'; // given without the leading dot, as the Windows save dialog expects
dialog.filter = 'Comma-separated values file (*.csv)|*.csv';
dialog.filename = wsShell.SpecialFolders('MyDocuments') + '\\PDF_Annotations.csv';
dialog.Options = '[ofOverwritePrompt]';

if (dialog.execute) {

    var fso = new ActiveXObject("Scripting.FileSystemObject");
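    // CreateTextFile(filename, overwrite, unicode): overwrite an existing file and write it as Unicode
    var CSVFile = fso.CreateTextFile(dialog.Filename, true, true);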

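    // Use TAB as the delimiter: unlike commas, it rarely occurs in annotation text
    var listSep = "\t";
    // Alternatively, use the list separator from the user's regional settings:
    //var listSep = GetUserListSeparator();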
    var CSVLine = StringFormat('Filename{0}Type{0}Comments{0}Author{0}Date{0}Name', [listSep]);
    CSVFile.WriteLine(CSVLine);
    for (var i = 0; i < pdfe.SelectedFiles.Count; i++) {
        var file = pdfe.SelectedFiles(i);
        pdfe.Echo(file.Filename + ' : Extracting annotations');
        var annotations = file.Annotations;
        if (annotations) {
            for (var n = 0; n < annotations.Count; n++) {
                var annot = annotations(n);
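                // Quote every field, doubling any embedded double quotes and writing
                // missing values as "" so that comment text cannot break the row
                var fields = [file.Filename, annot.Type, annot.Contents, annot.Author, annot.Date, annot.Name];
                for (var f = 0; f < fields.length; f++) {
                    var value = (fields[f] == null) ? '' : String(fields[f]);
                    fields[f] = '"' + value.replace(/"/g, '""') + '"';
                }
                CSVLine = fields.join(listSep);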
                CSVFile.WriteLine(CSVLine);
            }
            pdfe.Echo('   ' + annotations.Count + ' annotations extracted');
        } else {
            pdfe.Echo('   Annotations not found.', 0xFF0000);
        }
    }
    CSVFile.Close();

    pdfe.Echo('Loading CSV file');
    wsShell.Run(dialog.Filename);
    pdfe.Echo('Done.', 0, 2);
}

//==============================================================================
// Replaces each {N} placeholder in s with args[N]; unknown placeholders are left as-is
function StringFormat(s, args) {
    return s.replace(/\{(\d+)\}/g, function(match, number) {
        return typeof args[number] != 'undefined' ? args[number] : match;
    });
}

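// Reads the list separator character defined in the user's regional settings,
// falling back to a comma if the value is empty or cannot be read
function GetUserListSeparator() {
    var wsShell = pdfe.CreateObject("WScript.Shell");
    try {
        var ListSeparator = wsShell.RegRead("HKCU\\Control Panel\\International\\sList");
        return ListSeparator ? ListSeparator : ",";
    } catch (e) {
        // RegRead throws an error when the registry value does not exist
        return ",";
    }
}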
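
The field quoting applied to each row can be tried outside PDF-ShellTools. This is a minimal sketch in plain JavaScript; csvQuote and csvRow are hypothetical helper names, not part of PDF-ShellTools. The helpers use only ES3 features (var, plain loops, no Array.prototype.map), so they could be pasted into a JScript script; the final console.log line is for running it under Node.

```javascript
// Hypothetical standalone helpers mirroring the per-row quoting in the script above.

// Quote one CSV field: null/undefined become "", embedded double quotes are doubled.
function csvQuote(value) {
    var text = (value == null) ? '' : String(value);
    return '"' + text.replace(/"/g, '""') + '"';
}

// Quote each raw value and join them with the given delimiter into one row.
function csvRow(values, sep) {
    var quoted = [];
    for (var i = 0; i < values.length; i++) {
        quoted.push(csvQuote(values[i]));
    }
    return quoted.join(sep);
}

// Example row: the comment contains double quotes and the author is missing.
var row = csvRow(['a.pdf', 'Highlight', 'see "Table 2"', null], '\t');
console.log(row);
```

Doubling the quote character inside a quoted field is the usual CSV escaping rule, so spreadsheet programs read the comment back as `see "Table 2"` in a single cell.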
Just import the attached myscript file into your PDF-ShellTools list of scripts, test it, and let me know if you need any changes.
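
The script exports every annotation type, so links, popups and similar entries will also appear in the CSV. If only highlights and notes are wanted, the annotation loop could skip the other types. This is a minimal sketch, assuming that annot.Type returns the PDF annotation subtype names (Highlight, Underline, StrikeOut, Text for sticky notes, FreeText); that is an assumption about PDF-ShellTools, so check the Type column of a full export first. WANTED_TYPES and isWantedType are hypothetical names, and plain objects stand in for the real annotations.

```javascript
// ASSUMPTION: annot.Type returns PDF subtype names such as 'Highlight' or 'Text'.
// Adjust this list to the values seen in the Type column of a full export.
var WANTED_TYPES = ['Highlight', 'Underline', 'StrikeOut', 'Text', 'FreeText'];

// Returns true when the annotation type is one we want to export (case-insensitive).
function isWantedType(type) {
    var t = String(type).toLowerCase();
    for (var i = 0; i < WANTED_TYPES.length; i++) {
        if (t === WANTED_TYPES[i].toLowerCase()) {
            return true;
        }
    }
    return false;
}

// Plain objects standing in for annotations(n) in the script's loop.
var sample = [
    { Type: 'Highlight', Contents: 'key result' },
    { Type: 'Link', Contents: '' },
    { Type: 'Text', Contents: 'check this' },
    { Type: 'Popup', Contents: '' }
];

var kept = [];
for (var n = 0; n < sample.length; n++) {
    if (isWantedType(sample[n].Type)) {
        kept.push(sample[n]);
    }
}
console.log(kept.length + ' of ' + sample.length + ' annotations kept'); // prints "2 of 4 annotations kept"
```

Inside the script, the loop body would then begin with `if (!isWantedType(annot.Type)) continue;`, and the "annotations extracted" message should count the rows actually written rather than annotations.Count.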

Grant Botes

Re: Question: I need a way to extract highlights and comments to a CSV
« Reply #2 on: December 06, 2017, 09:43:31 AM »
That's perfect! Thank you very much! :)