Question: I need a way to extract highlights and comments to a CSV


Grant Botes:
I'm looking for a way to extract highlights and comments/notes from sets of PDF files into a CSV file. Is there a way to do it in PDF-ShellTools that I might have missed, or perhaps any suggestions in this regard?

Thanks in advance.

The following script dumps every annotation found in each of the selected PDF files to a CSV file.

--- Code: ---
var wsShell = pdfe.CreateObject("WScript.Shell");
var dialog = pdfe.SaveDialog;

dialog.DefaultExt = '.csv';
dialog.filter = 'Comma-separated values file (*.csv)|*.csv';
dialog.filename = wsShell.SpecialFolders('MyDocuments') + '\\PDF_Annotations.csv';
dialog.Options = '[ofOverwritePrompt]';

if (dialog.execute) {

    var fso = new ActiveXObject("Scripting.FileSystemObject");
    //create the output file (overwrite = true, unicode = true)
    var CSVFile = fso.CreateTextFile(dialog.Filename, true, true);

    //using the more functional TAB character as delimiter
    var listSep = "\t";
    //var listSep = GetUserListSeparator();

    //header row
    var CSVLine = StringFormat('Filename{0}Type{0}Comments{0}Author{0}Date{0}Name', [listSep]);
    CSVFile.WriteLine(CSVLine);

    for (var i = 0; i < pdfe.SelectedFiles.Count; i++) {
        var file = pdfe.SelectedFiles(i);
        pdfe.Echo(file.Filename + ' : Extracting annotations');
        var annotations = file.Annotations;
        if (annotations) {
            //one row per annotation
            for (var n = 0; n < annotations.Count; n++) {
                var annot = annotations(n);
                CSVLine = StringFormat('"{1}"{0}"{2}"{0}"{3}"{0}"{4}"{0}"{5}"{0}"{6}"', [listSep, file.Filename, annot.Type, annot.Contents, annot.Author, annot.Date, annot.Name]);
                CSVFile.WriteLine(CSVLine);
            }
            pdfe.Echo('   ' + annotations.Count + ' annotations extracted');
        } else {
            pdfe.Echo('   Annotations not found.', 0xFF0000);
        }
    }
    CSVFile.Close();

    pdfe.Echo('Loading CSV file');
    //assumed step: the message above suggests the CSV is opened here
    //in its associated application
    wsShell.Run('"' + dialog.Filename + '"');
    pdfe.Echo('Done.', 0, 2);
}

//replaces {0}, {1}, ... in s with the matching entries of args
function StringFormat(s, args) {
    return s.replace(/{(\d+)}/g, function(match, number) {
        return typeof args[number] != 'undefined' ? args[number] : match;
    });
}

//reads the list separator character defined in the user regional settings
function GetUserListSeparator() {
    var wsShell = pdfe.CreateObject("WScript.Shell");
    var ListSeparator = wsShell.RegRead("HKCU\\Control Panel\\International\\sList");
    return ListSeparator ? ListSeparator : ",";
}

--- End code ---
Just import the attached myscript file into your PDF-ShellTools list of scripts, test it, and let me know if you need any changes.
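
One caveat: the script wraps each value in double quotes but does not escape quotes that occur inside a comment, so a note containing a " character could break its row when the file is imported. Below is a minimal sketch of one way to handle that, assuming the usual CSV convention of doubling embedded quotes. escapeCsvField and formatCsvRow are hypothetical helper names, not part of PDF-ShellTools.

```javascript
// Quote a single value for a delimited text file: wrap it in double
// quotes and double any quote characters already inside it.
// (Hypothetical helper, not a PDF-ShellTools function.)
function escapeCsvField(value) {
    // null/undefined become an empty field
    var text = (value === null || typeof value == 'undefined') ? '' : String(value);
    return '"' + text.replace(/"/g, '""') + '"';
}

// Build one delimited line from an array of values.
// (Hypothetical helper, not a PDF-ShellTools function.)
function formatCsvRow(values, listSep) {
    var fields = [];
    for (var i = 0; i < values.length; i++) {
        fields.push(escapeCsvField(values[i]));
    }
    return fields.join(listSep);
}
```

With these helpers added to the script, the row could be built with formatCsvRow([file.Filename, annot.Type, annot.Contents, annot.Author, annot.Date, annot.Name], listSep) in place of the StringFormat call inside the inner loop.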

Grant Botes:
That's perfect! Thank you very much!  :)

