PDF Explorer > Ideas/Suggestions

Extract file Checksum

Padanges:
Moreover, I found (using the first version of the code) that large files often fail to get any checksum at all. This usually happens with files over 100 MB, though occasionally even those get calculated. A 600 MB file always failed. External software tools had no problem with the same files.
Could you investigate this issue? Why is the script not fully working? Is it a memory limitation? How can the limits be overcome, and how can they be estimated?

RTT:

--- Quote from: Padanges on August 27, 2016, 08:10:34 PM ---Actually, each time I change code in any Script, it does not work immediately after closing the editor.
Strangely enough, when I debug the code on the JScript side, it returns a number in the command prompt. But the Edit Info batch tool returns only empty strings (even when I change the return value to "any text" under Case Else in the VBScript). Something's buggy here.

--- End quote ---
I'm testing with my development version, and I forgot that the version you have doesn't support library includes in the tools (Renamer, Edit Info Fields, etc.) that call a specific function in the script. It works from the My Scripts batch tool and from the script editor, but I fixed the problem in these other tools only after version 1.5.66.2 was released. My other tool, PDF-ShellTools, already has this fixed, so you can test with it.


--- Quote from: Padanges on August 27, 2016, 08:30:55 PM ---Moreover, I found (using the first version of the code) that large files often fail to get any checksum at all. This usually happens with files over 100 MB, though occasionally even those get calculated. A 600 MB file always failed. External software tools had no problem with the same files.
Could you investigate this issue? Why is the script not fully working? Is it a memory limitation? How can the limits be overcome, and how can they be estimated?

--- End quote ---
The problem with that script is that it loads the whole file content into memory, so it fails with bigger files. Anyway, here is a totally rewritten CalcHash function, now in JScript, that calculates the file hash in chunks. Note that the main function is now named CalcHashJS, so name your script, and the tags that call it, accordingly. For convenience (for other, less experienced users), I'm attaching the standalone myscript file, so you just need to import it.

--- Code: ---var EL;

function CalcHashJS() {
    if (!EL) {
        var MSXML = new ActiveXObject("MSXML2.DOMDocument");
        EL = MSXML.createElement("tmp");
        EL.dataType = "bin.hex";
    }
    EL.nodeTypedValue = GetFileHash(BatchFile.Filename, CurrentField.Value);
    return EL.text;
}

function GetFileHash(filename, hashtype) {
    var stream = GetBinaryFileStream(filename);
    try {
        // Map the requested hash type to the matching .NET hash class.
        switch (hashtype) {
        case "md5":
            return ComputeHash(stream, "System.Security.Cryptography.MD5CryptoServiceProvider");
        case "sha1":
            return ComputeHash(stream, "System.Security.Cryptography.SHA1Managed");
        case "sha256":
            return ComputeHash(stream, "System.Security.Cryptography.SHA256Managed");
        case "sha384":
            return ComputeHash(stream, "System.Security.Cryptography.SHA384Managed");
        case "sha512":
            return ComputeHash(stream, "System.Security.Cryptography.SHA512Managed");
        case "ripemd160":
            return ComputeHash(stream, "System.Security.Cryptography.RIPEMD160Managed");
        default:
            return ""; // unknown hash type
        }
    } catch (err) {
        return err.message;
    } finally {
        stream.Close(); // parentheses are required to actually call the method
    }
}

function ComputeHash(stream, objectid) {
    var BlockSize = 65536;
    var HashObj = new ActiveXObject(objectid);
    try {
        var block;
        var nBlocks = Math.floor(stream.size / BlockSize);
        // Feed (nBlocks - 1) full blocks; the final call below takes the rest.
        for (var i = 1; i < nBlocks; i++) {
            block = stream.read(BlockSize);
            HashObj.TransformBlock(block, 0, BlockSize, block, 0);
        }
        var n = stream.size - stream.position; // bytes still unread
        block = stream.read(n);
        HashObj.TransformFinalBlock(block, 0, n);
        return HashObj.Hash;
    } finally {
        HashObj.Clear(); // parentheses are required to actually call the method
    }
}

function GetBinaryFileStream(filename) {
    var objStream = new ActiveXObject("ADODB.Stream"); // "var" keeps it out of the global scope
    objStream.Type = 1; //adTypeBinary
    objStream.Open();
    objStream.LoadFromFile(filename);
    return objStream;
}

--- End code ---
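
For anyone who wants to convince themselves that hashing in blocks gives the same result as hashing the whole file at once, here is a minimal sketch. It is an assumption-laden illustration, not part of the script above: it runs under Node.js (not in the PDF Explorer JScript host), it uses Node's built-in crypto module instead of the .NET classes, and the names hashInBlocks and hashAtOnce are made up for this example. The loop has the same shape as ComputeHash: all but the last full block are fed one by one, and the final call gets whatever is left.

```javascript
// Sketch for Node.js only; hashInBlocks/hashAtOnce are illustrative names.
var crypto = require("crypto");

// Hash a Buffer block by block, with the same loop shape as ComputeHash:
// (nBlocks - 1) full blocks first, then one final block with the remainder.
function hashInBlocks(data, algorithm, blockSize) {
    var hash = crypto.createHash(algorithm);
    var position = 0;
    var nBlocks = Math.floor(data.length / blockSize);
    for (var i = 1; i < nBlocks; i++) {
        hash.update(data.subarray(position, position + blockSize));
        position += blockSize;
    }
    // The final block holds everything still unread, so it can be
    // larger than blockSize (but always smaller than 2 * blockSize).
    hash.update(data.subarray(position));
    return hash.digest("hex");
}

// Reference: hash the whole buffer in a single call.
function hashAtOnce(data, algorithm) {
    return crypto.createHash(algorithm).update(data).digest("hex");
}

var sample = Buffer.alloc(200000, 7); // 200000 bytes, each with value 7
console.log(hashInBlocks(sample, "sha256", 65536) === hashAtOnce(sample, "sha256"));
```

The comparison should hold for any block size of 1 or more and any input length, including inputs shorter than one block, where the loop does not run at all and the final call hashes everything.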

Padanges:
Thanks! But there's still one problem: the value returned for archived files needs fixing. They get extracted and the log shows status "OK", but no value is returned for the entry.

By the way, what do you gain by writing "var EL;" before the "function CalcHashJS() {" line? The code works fine with that line inside the function body. And what does EL stand for? Just curious :)

RTT:

--- Quote from: Padanges on August 29, 2016, 08:48:11 AM ---Thanks! But there's still one problem: the value returned for archived files needs fixing. They get extracted and the log shows status "OK", but no value is returned for the entry.

--- End quote ---
Now with code to deal with archived files:

--- Code: ---var fso = new ActiveXObject("Scripting.FileSystemObject");
var EL;

function CalcHashJS() {
    if (!EL) {
        var MSXML = new ActiveXObject("MSXML2.DOMDocument");
        EL = MSXML.createElement("tmp");
        EL.dataType = "bin.hex";
    }

    var filename = BatchFile.Filename;
    if (filename.indexOf('>') > 0) { //file is archived
        filename = fso.getFileName(filename); //get archived file name, by removing the archive name and any internal path
        filename = fso.BuildPath(fso.GetSpecialFolder(2 /*TemporaryFolder*/ ), filename);
        BatchFile.BookmarkRoot; //trick to trigger the file extraction from the archive to the system temporary folder
        if (!fso.FileExists(filename)) return "extraction from archive failed";
    }

    EL.nodeTypedValue = GetFileHash(filename, CurrentField.Value);
    return EL.text;
}

function GetFileHash(filename, hashtype) {
    var stream = GetBinaryFileStream(filename);
    try {
        // Map the requested hash type to the matching .NET hash class.
        switch (hashtype) {
        case "md5":
            return ComputeHash(stream, "System.Security.Cryptography.MD5CryptoServiceProvider");
        case "sha1":
            return ComputeHash(stream, "System.Security.Cryptography.SHA1Managed");
        case "sha256":
            return ComputeHash(stream, "System.Security.Cryptography.SHA256Managed");
        case "sha384":
            return ComputeHash(stream, "System.Security.Cryptography.SHA384Managed");
        case "sha512":
            return ComputeHash(stream, "System.Security.Cryptography.SHA512Managed");
        case "ripemd160":
            return ComputeHash(stream, "System.Security.Cryptography.RIPEMD160Managed");
        default:
            return ""; // unknown hash type
        }
    } catch (err) {
        return err.message;
    } finally {
        stream.Close(); // parentheses are required to actually call the method
    }
}

function ComputeHash(stream, objectid) {
    var BlockSize = 65536;
    var HashObj = new ActiveXObject(objectid);
    try {
        var block;
        var nBlocks = Math.floor(stream.size / BlockSize);
        // Feed (nBlocks - 1) full blocks; the final call below takes the rest.
        for (var i = 1; i < nBlocks; i++) {
            block = stream.read(BlockSize);
            HashObj.TransformBlock(block, 0, BlockSize, block, 0);
        }
        var n = stream.size - stream.position; // bytes still unread
        block = stream.read(n);
        HashObj.TransformFinalBlock(block, 0, n);
        return HashObj.Hash;
    } finally {
        HashObj.Clear(); // parentheses are required to actually call the method
    }
}

function GetBinaryFileStream(filename) {
    var objStream = new ActiveXObject("ADODB.Stream"); // "var" keeps it out of the global scope
    objStream.Type = 1; //adTypeBinary
    objStream.Open();
    objStream.LoadFromFile(filename);
    return objStream;
}

--- End code ---

--- Quote ---By the way, what do you gain by writing "var EL;" before the "function CalcHashJS() {" line? The code works fine with that line inside the function body. And what does EL stand for? Just curious :)

--- End quote ---
It's just a small optimization, and other objects (e.g. the stream) could be optimized the same way. When the tools process multiple files, this function is called once per file, so there is no need to create the MSXML object and the node element (the "EL", used to convert the hash, a variant array of bytes, to its hexadecimal representation) on each call. Just keep it in a global variable and reuse it on subsequent calls.
Replace the "CalcHashJS" function in the script above with this one:

--- Code: ---function CalcHashJS() {
    var msg = '';
    if (!EL) {
        var MSXML = new ActiveXObject("MSXML2.DOMDocument");
        EL = MSXML.createElement("tmp");
        EL.dataType = "bin.hex";
        msg = '(EL created)';
    }

    var filename = BatchFile.Filename;
    if (filename.indexOf('>') > 0) { //file is archived
        filename = fso.getFileName(filename); //get archived file name, by removing the archive name and any internal path
        filename = fso.BuildPath(fso.GetSpecialFolder(2 /*TemporaryFolder*/ ), filename);
        BatchFile.BookmarkRoot; //trick to trigger the file extraction from the archive to the system temporary folder
        if (!fso.FileExists(filename)) return "extraction from archive failed";
    }

    EL.nodeTypedValue = GetFileHash(filename, CurrentField.Value);
    return msg + EL.text;
}
--- End code ---
Check it in the Renamer tool with multiple files selected.
Now move the "var EL;" declaration inside the function, and check again.  ;)
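
The same create-once-and-reuse idea can be sketched outside the tool. This is only an illustration in plain JavaScript, runnable under Node.js: createConverter is a hypothetical stand-in for creating the MSXML element, its toHex method imitates the bytes-to-hexadecimal conversion in ordinary code, and the created counter exists only to show how often the expensive step runs.

```javascript
// Plain JavaScript sketch; createConverter, getConverter and created
// are illustrative names, not part of the PDF Explorer script API.
var cachedConverter; // global: keeps its value between calls, like EL
var created = 0;     // counts how many times the converter was built

// Hypothetical stand-in for "create the MSXML element".
function createConverter() {
    created++;
    return {
        // Convert an array of byte values (0..255) to a hex string.
        toHex: function (bytes) {
            var out = "";
            for (var i = 0; i < bytes.length; i++) {
                var h = bytes[i].toString(16);
                out += (h.length < 2 ? "0" : "") + h; // pad to two digits
            }
            return out;
        }
    };
}

// Build the converter on the first call only, then reuse it.
function getConverter() {
    if (!cachedConverter) {
        cachedConverter = createConverter();
    }
    return cachedConverter;
}

console.log(getConverter().toHex([0, 15, 255]));
console.log(getConverter().toHex([171, 205]));
console.log("created: " + created);
```

If the variable were declared inside getConverter instead, it would start out undefined on every call and the converter would be rebuilt each time.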

Padanges:
The script always returns "extraction from archive failed" for archives. As far as I understand, the problem is with the getFileName() function.
We need to change the line:

--- Code: ---filename = fso.getFileName(filename); //get archived file name, by removing the archive name and any internal path
--- End code ---
to the line:

--- Code: ---filename = filename.substring(filename.indexOf('>')+1,filename.length); // remove the tag
--- End code ---
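
A small sketch of how the two approaches can differ, in plain JavaScript runnable under Node.js. Everything here is an assumption made for illustration: the sample paths only guess at what BatchFile.Filename looks like for an archived file, and lastPathComponent is a rough imitation of taking the last path component, not the real FileSystemObject method.

```javascript
// Illustrative only: the path format and both helper names are assumptions.

// Rough imitation of "take the last path component":
// keep everything after the last backslash or forward slash.
function lastPathComponent(path) {
    var cut = Math.max(path.lastIndexOf("\\"), path.lastIndexOf("/"));
    return path.substring(cut + 1);
}

// The replacement line: keep everything after the first '>' marker.
function afterArchiveMarker(path) {
    return path.substring(path.indexOf(">") + 1);
}

var atRoot = "C:\\docs\\books.zip>report.pdf";       // file at the archive root
var nested = "C:\\docs\\books.zip>2016\\report.pdf"; // file in an internal folder

console.log(lastPathComponent(atRoot));
console.log(afterArchiveMarker(atRoot));
console.log(lastPathComponent(nested));
console.log(afterArchiveMarker(nested));
```

Under these assumptions, the last-component approach keeps the archive name and the '>' marker when the file sits at the archive root, while the substring approach keeps any internal folder path. Which result matches the extracted file in the temporary folder depends on how the tool names it.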

Thanks for the explanation! So if I get it right, then the line:

--- Code: ---if (!EL) {
--- End code ---
is actually:

--- Code: ---if (typeof EL === 'undefined' || EL === null) {
--- End code ---
which optimizes the batch process by reusing an object that still exists from a previous call.
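
A quick way to compare the two checks is to run them over a few values. This is a plain JavaScript sketch, runnable under Node.js, and the helper names isFalsy and isMissing are made up for the example. The two tests agree for undefined, null and objects, which are the only values EL takes in the script, but "!value" is also true for 0, an empty string, false and NaN.

```javascript
// Illustrative helpers; the names are not part of any script API.
function isFalsy(value) {
    return !value;
}

function isMissing(value) {
    return typeof value === "undefined" || value === null;
}

// Values where the two checks may or may not agree.
var samples = [undefined, null, {}, [], 0, "", false, NaN];

for (var i = 0; i < samples.length; i++) {
    var v = samples[i];
    console.log(String(v) + ": falsy=" + isFalsy(v) + ", missing=" + isMissing(v));
}
```

So for a variable that is either not yet assigned or holds an object, the short form is a safe substitute for the long one.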

It is interesting to note that ZIP archives return "(EL created)+HASH" regardless of whether "var EL;" is inside or outside the function scope, while RAR archives behave like all other files. I checked this with the BatchInfoEdit tool, though, not the BatchRename tool.
