Question: I need a way to extract highlights and comments to a CSV

Grant Botes:
I'm looking for a way to extract highlights and comments/notes from sets of PDF files into a CSV file. Is there a way to do this in PDF-ShellTools that I might have missed? Or do you have any other suggestions?

Thanks in advance.

RTT:
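The following script dumps all the annotations found in each of the selected PDF files to a CSV file, one row per annotation (file name, type, comment text, author, date and name). Fields are separated by TAB characters; to use the list separator from your regional settings instead, enable the GetUserListSeparator() line.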

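--- Code: ---
// Exports the annotations of the selected PDF files to a delimited CSV file
var wsShell = pdfe.CreateObject("WScript.Shell");
var dialog = pdfe.SaveDialog;

dialog.DefaultExt = '.csv';
dialog.filter = 'Comma-separated values file (*.csv)|*.csv';
dialog.filename = wsShell.SpecialFolders('MyDocuments') + '\\PDF_Annotations.csv';
dialog.Options = '[ofOverwritePrompt]';

if (dialog.execute) {

    var fso = new ActiveXObject("Scripting.FileSystemObject");
    // CreateTextFile(path, overwrite, unicode): overwrite an existing file
    // (the dialog already asked for confirmation) and write it as Unicode
    var CSVFile = fso.CreateTextFile(dialog.Filename, true, true);

    // TAB is used as the delimiter because it rarely appears in annotation text;
    // swap the comments below to use the regional list separator instead
    var listSep = "\t";
    //var listSep = GetUserListSeparator();

    // Header row
    var CSVLine = StringFormat('Filename{0}Type{0}Comments{0}Author{0}Date{0}Name', [listSep]);
    CSVFile.WriteLine(CSVLine);

    for (var i = 0; i < pdfe.SelectedFiles.Count; i++) {
        var file = pdfe.SelectedFiles(i);
        pdfe.Echo(file.Filename + ' : Extracting annotations');
        var annotations = file.Annotations;
        if (annotations) {
            for (var n = 0; n < annotations.Count; n++) {
                var annot = annotations(n);
                // One row per annotation; every field is quoted and embedded quotes are doubled
                CSVLine = [CSVQuote(file.Filename), CSVQuote(annot.Type), CSVQuote(annot.Contents),
                           CSVQuote(annot.Author), CSVQuote(annot.Date), CSVQuote(annot.Name)].join(listSep);
                CSVFile.WriteLine(CSVLine);
            }
            pdfe.Echo('   ' + annotations.Count + ' annotations extracted');
        } else {
            pdfe.Echo('   Annotations not found.', 0xFF0000);
        }
    }
    CSVFile.Close();

    // Open the CSV file with its associated application; the path is quoted
    // so that folders containing spaces are handled correctly
    pdfe.Echo('Loading CSV file');
    wsShell.Run('"' + dialog.Filename + '"');
    pdfe.Echo('Done.', 0, 2);
}

//==============================================================================
// Replaces the {0}, {1}, ... placeholders in s with the matching entries of args
function StringFormat(s, args) {
    return s.replace(/\{(\d+)\}/g, function(match, number) {
        return typeof args[number] != 'undefined' ? args[number] : match;
    });
}

// Wraps a value in double quotes and doubles any embedded double quotes,
// so comments containing quotes do not break the CSV columns
function CSVQuote(value) {
    var text = (value == null) ? '' : String(value);
    return '"' + text.replace(/"/g, '""') + '"';
}

// Reads the list separator character defined in the user regional settings
function GetUserListSeparator() {
    var wsShell = pdfe.CreateObject("WScript.Shell");
    try {
        var listSeparator = wsShell.RegRead("HKCU\\Control Panel\\International\\sList");
        return listSeparator ? listSeparator : ",";
    } catch (e) {
        // Registry value missing or unreadable: fall back to a comma
        return ",";
    }
}

--- End code ---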
Just import the attached myscript file into your PDF-ShellTools list of scripts, test it, and let me know if you need any changes.
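Annotation comments can contain double quotes, tabs or line breaks, which is why each field needs to be quoted. Following the usual RFC 4180 convention, a field is wrapped in double quotes and every embedded double quote is doubled; line breaks inside a quoted field are then allowed. Below is a minimal standalone sketch of that rule, independent of PDF-ShellTools. The helper names `csvField` and `csvLine` are hypothetical, and it is only an assumption that this ES3-style code also runs unchanged under JScript; it was written with Node.js in mind.

```javascript
// Hypothetical helpers illustrating RFC 4180-style field quoting.

// Wraps one value in double quotes, doubling any embedded double quotes.
// null and undefined become an empty quoted field.
function csvField(value) {
    var text = (value == null) ? '' : String(value);
    return '"' + text.replace(/"/g, '""') + '"';
}

// Quotes each value and joins them with the given separator
// (e.g. "\t" as in the script above, or ",").
function csvLine(values, sep) {
    var parts = [];
    for (var i = 0; i < values.length; i++) {
        parts.push(csvField(values[i]));
    }
    return parts.join(sep);
}

// Example row whose comment text contains quotes, a tab and a line break
var row = csvLine(['report.pdf', 'Highlight', 'Check "Table 2"\tand\nfix', 'Grant'], ',');
console.log(row);
```

Because the tab and the line break stay inside a quoted field, spreadsheet applications that follow this convention should keep the whole comment in a single cell.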

Grant Botes:
That's perfect! Thank you very much! :)

addala:
Hello,
How do I import scripts? Thanks.

RTT:

--- Quote from: addala on November 02, 2018, 11:35:39 AM ---
Hello,
How do I import scripts? Thanks.

--- End quote ---

From the Manager's context menu, open the Tools options, go to the "Scripts" tab, and click the "Import" button.

See the "Scripts" topic in the user's guide for more details.