Script to count how many colour pages in PDF?

nightslayer23:
Hi all, I need a script similar to RapidPDFCount, which uses a DLL to count how many pages of a PDF are colour and how many are black & white.

Is there a way to get a script for PDF-ShellTools to do the same thing, and then display the counts in a custom column?

RTT:
That functionality is not directly available from the scripts API, but we can create a script that automates the ImageMagick tool to get that info.
The idea is to render each PDF page and analyze the resulting bitmaps for color content.

I ran some tests, and here is a sample script that creates a .csv file with "Color Pages Count" and "BW/Gray Pages Count" columns. It renders each PDF page, converts the resulting bitmaps to the HSI colorspace and computes the mean value of the saturation channel. A page is considered color if this value is higher than 0, and BW/Gray otherwise. You may adjust this threshold to suit your needs.
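
To make the saturation test concrete, here is a minimal sketch of the same decision in plain JavaScript. It uses the textbook HSI saturation formula, which is an assumption and may not match ImageMagick's internal arithmetic exactly. The function names, the pixel arrays and the `threshold` parameter are hypothetical, for illustration only.

```javascript
// HSI saturation of one RGB pixel (components in 0..255).
// Textbook formula: S = 1 - 3*min(R,G,B)/(R+G+B), taken as 0 for pure black.
function hsiSaturation(r, g, b) {
    var sum = r + g + b;
    if (sum === 0) return 0;
    return 1 - (3 * Math.min(r, g, b)) / sum;
}

// Mean saturation over an array of [r, g, b] pixels.
function meanSaturation(pixels) {
    if (pixels.length === 0) return 0;
    var total = 0;
    for (var i = 0; i < pixels.length; i++) {
        total += hsiSaturation(pixels[i][0], pixels[i][1], pixels[i][2]);
    }
    return total / pixels.length;
}

// threshold is a hypothetical tuning knob; 0 mirrors the "mean > 0" rule.
function isColorPage(pixels, threshold) {
    return meanSaturation(pixels) > (threshold || 0);
}

// Tiny made-up "pages": one purely gray, one containing a single red pixel.
var grayPage = [[0, 0, 0], [128, 128, 128], [255, 255, 255]];
var colorPage = [[255, 255, 255], [255, 0, 0], [0, 0, 0]];
```

With these sample pages, `isColorPage(grayPage, 0)` is false and `isColorPage(colorPage, 0)` is true. Raising the threshold, for example to 0.5, would make the second page count as BW/Gray, which is how a page with only a small colored mark could be tolerated.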


--- Code: ---
// Add format function to the String prototype
// First, checks if it isn't implemented yet.
if (!String.prototype.format) {
    String.prototype.format = function() {
        var args = arguments;
        return this.replace(/{(\d+)}/g, function(match, number) {
            return typeof args[number] != 'undefined' ? args[number] : match;
        });
    };
}

var imo = new ActiveXObject("ImageMagickObject.MagickImage.1");
var fso = new ActiveXObject("Scripting.FileSystemObject");

var tmpfolder = fso.GetSpecialFolder(2 /*TemporaryFolder*/ );
var InfoFilename = tmpfolder + '\\PagesInfo.txt';
var CSVOutputFileName = tmpfolder + '\\' + fso.GetTempName();
var CSVOutputFile = fso.CreateTextFile(CSVOutputFileName, true, true);

//write header line to the csv file
CSVOutputFile.WriteLine('Filename,Status,"Pages Count","Color Pages Count","BW/Gray Pages Count"');

for (var i = 0; i < pdfe.SelectedFiles.Count; i++) {
    var file = pdfe.SelectedFiles(i);
    pdfe.echo('Processing ' + file.filename);
    try {
        //use imagemagick to render each pdf page, convert the result image colorspace
        //to HSI and output "1" if the mean of the saturation values is higher
        //than 0 (the page has color), and "0" if 0 (no color in the page)
        imo.convert(file.filename, "-colorspace", "HSI", "-format", "%[fx:mean.g>0?1:0]", "info:" + InfoFilename);

        //read the result info file, that contains a "0" or "1" for each page
        //in the PDF. E.g. 0110, for a 4 pages PDF with pages 1 and 4 being bw/gray
        //and 2 and 3 with color.
        var f = fso.GetFile(InfoFilename);
        var fts = f.OpenAsTextStream();
        var info = fts.ReadAll();
        fts.Close();
        f.Delete();

        //Count the number of "1"
        var ColorPagesCount = info.split('1').length - 1;
        //Count the number of "0"       
        var BWPagesCount = info.split('0').length - 1;

        pdfe.echo(file.filename + ': Color Pages Count = ' + ColorPagesCount + ', BW/Gray Pages Count = ' + BWPagesCount, 0, 2);
        CSVOutputFile.WriteLine('"{0}",OK,{1},{2},{3}'.format(file.filename, file.NumPages, ColorPagesCount, BWPagesCount));
    } catch (e) {
        pdfe.echo(file.filename + ': Error (' + e.message + ')', 0xff0000, 2);
        CSVOutputFile.WriteLine('"{0}",Failed'.format(file.filename));
    }
}
CSVOutputFile.Close();
var dialog = pdfe.SaveDialog;
dialog.DefaultExt = '.csv';
dialog.filter = 'CSV (*.csv)|*.csv';
dialog.Options = '[ofOverwritePrompt]';
dialog.Filename = fso.GetParentFolderName(file.filename) + '\\PDFsInfo.csv';
if (dialog.execute) {
    if (fso.FileExists(dialog.Filename)) fso.DeleteFile(dialog.Filename);
    fso.MoveFile(CSVOutputFileName, dialog.Filename);
    var WshShell = WScript.CreateObject("WScript.Shell");
    WshShell.Run(dialog.Filename);
} else {
    fso.DeleteFile(CSVOutputFileName);
}
--- End code ---
To test it, just import the attached .myscript file into the PDF-ShellTools My Scripts. You will get a script named "Number of Color and BW/Gray pages" that you can invoke for all the selected PDF files from the Windows shell context menu for PDF files, under the PDF-ShellTools>My Scripts submenu.
The script needs the 32-bit version of the ImageMagick tool installed. I've tested with the ImageMagick-7.0.5-5-Q16-x86-dll.exe installer. While installing, make sure you select the "Install ImageMagickObject OLE Control for VBScript,..." option, on the "additional tasks" page of the installer.
ImageMagick also needs the Ghostscript tool installed in order to handle the PDF format.

If the script performs as needed, we can change it to put the info into custom metadata properties, as you suggested.
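
The script above only counts the "1" and "0" characters in the info string. As a rough sketch, the same string could also be turned into per-page lists, assuming one character per page and no separators, as in the "0110" example from the code comments. The function name `summarizePages` is hypothetical.

```javascript
// info: string such as "0110", one character per page
// ("1" = color, "0" = BW/gray), in page order.
function summarizePages(info) {
    var colorPages = [];
    var bwPages = [];
    var page = 0;
    for (var i = 0; i < info.length; i++) {
        var c = info.charAt(i);
        if (c === '1') {
            page++;
            colorPages.push(page);
        } else if (c === '0') {
            page++;
            bwPages.push(page);
        }
        // Any other character (e.g. a stray line break) is skipped
        // and does not advance the page number.
    }
    return { colorPages: colorPages, bwPages: bwPages };
}

var summary = summarizePages('0110');
// summary.colorPages holds pages 2 and 3, summary.bwPages holds pages 1 and 4.
```

The lengths of the two arrays give the same counts as the `split`-based lines in the script, and the arrays themselves tell you which pages would need a color printer.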

nightslayer23:
This works!

Thank you so much!

Could we also have an extension of this script that displays the results as custom columns in Windows Explorer?

RTT:
Variant of the above script, to save the result in 2 custom metadata properties. This makes the result of the slow process to calculate these values immediately available to the shell, e.g. for easy management of the PDFs, the next time we need to know this information, without the need to run the script again. And because these custom metadata properties are saved in the PDF file itself, there is no risk of information loss if the files are moved to another disk or sent to someone else.

--- Code: ---
var imo = new ActiveXObject("ImageMagickObject.MagickImage.1");
var fso = new ActiveXObject("Scripting.FileSystemObject");

var tmpfolder = fso.GetSpecialFolder(2 /*TemporaryFolder*/ );
var InfoFilename = tmpfolder + '\\PagesInfo.txt';

var ProgressBar = pdfe.ProgressBar;
ProgressBar.max = pdfe.SelectedFiles.Count;

for (var i = 0; i < pdfe.SelectedFiles.Count; i++) {
    ProgressBar.position = i + 1;
    var file = pdfe.SelectedFiles(i);
    var FileMetadata = file.Metadata;

    //Bypass already processed files.
    if (FileMetadata.ColorPagesCount && FileMetadata.BWGrayPagesCount) {
        pdfe.echo(file.filename + ': Color Pages Count = ' + FileMetadata.ColorPagesCount + ', BW/Gray Pages Count = ' + FileMetadata.BWGrayPagesCount);
        pdfe.echo(' [Already set]', 0xFF, 1);
        continue;
    }

    pdfe.echo('Processing ' + file.filename + ' (' + file.NumPages + ' pages)');
    try {
        //use imagemagick to render each pdf page, convert the result image colorspace
        //to HSI and output "1" if the mean of the saturation values is higher
        //than 0 (the page has color), and "0" if 0 (no color in the page) 
        imo.convert(file.filename, "-colorspace", "HSI", "-format", "%[fx:mean.g>0?1:0]", "info:" + InfoFilename);

        //read the result info file, that contains a "0" or "1" for each page
        //in the PDF. E.g. 0110, for a 4 pages PDF with pages 1 and 4 being bw/gray
        //and 2 and 3 with color.
        var f = fso.GetFile(InfoFilename);
        var fts = f.OpenAsTextStream();
        var info = fts.ReadAll();
        fts.Close();
        f.Delete();

        //Count the number of "1"
        var ColorPagesCount = info.split('1').length - 1;
        //Count the number of "0"       
        var BWGrayPagesCount = info.split('0').length - 1;

        pdfe.echo(file.filename + ': Color Pages Count = ' + ColorPagesCount + ', BW/Gray Pages Count = ' + BWGrayPagesCount, 0, 2);

        if (FileMetadata.ColorPagesCount !== ColorPagesCount.toString() || FileMetadata.BWGrayPagesCount !== BWGrayPagesCount.toString()) {
            FileMetadata.ColorPagesCount = ColorPagesCount;
            FileMetadata.BWGrayPagesCount = BWGrayPagesCount;
            if (FileMetadata.CommitChanges()) {
                pdfe.echo(' [OK]', 0x006400, 1);
            } else {
                pdfe.echo(' [Setting metadata failed]', 0xFF0000, 1);
            }
        } else {
            pdfe.echo(' [Already set]', 0xFF, 1);
        }

    } catch (e) {
        pdfe.echo(file.filename + ' : ', 0, 2);
        pdfe.echo(e.name + ' ( ' + e.message + ' )', 0xff0000, 1);
    }
}

pdfe.echo('Done');

--- End code ---

As with the page size script, you need to define the custom properties that will hold the values. This script expects two custom properties, named "ColorPagesCount" and "BWGrayPagesCount", as shown in the attached screenshot.

nightslayer23:
What about another function similar to this, but instead of telling you colour or black & white, it tells you the ink coverage on each page and then displays that as a value in Explorer? It would be basically the same, just looking at a percentage of 0-33%, 34-66% or 67-100% and displaying that as Low, Medium or High as the value.
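
As a rough illustration of the banding asked for here, the sketch below maps a coverage percentage onto three labels. Nothing in it comes from PDF-ShellTools or ImageMagick: the function names and the label strings are placeholders, and defining coverage as the mean darkness of a grayscale rendering is only one possible assumption.

```javascript
// One possible definition of ink coverage (an assumption): the mean darkness
// of a grayscale rendering, where 255 = white paper and 0 = solid black ink.
function inkCoveragePercent(grayValues) {
    if (grayValues.length === 0) return 0;
    var total = 0;
    for (var i = 0; i < grayValues.length; i++) {
        total += 255 - grayValues[i];
    }
    return (100 * total) / (255 * grayValues.length);
}

// Map a percentage (0..100) onto the three bands 0-33, 34-66 and 67-100.
// Fractional values between bands (e.g. 33.5) fall into the lower band.
function coverageBand(percent) {
    if (percent < 0 || percent > 100) {
        throw new Error('percent out of range: ' + percent);
    }
    if (percent < 34) return 'Low';
    if (percent < 67) return 'Medium';
    return 'High';
}

// Made-up page: half the pixels white, half solid black.
var halfInked = [0, 255, 0, 255];
var band = coverageBand(inkCoveragePercent(halfInked));
```

For the made-up page above the coverage is 50%, so `band` is the middle label. The resulting label could then be stored in a custom metadata property in the same way as the page counts.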
