Archive scan depth limit

RTT:

--- Quote from: Padanges on November 26, 2016, 08:46:26 AM ---I used to extract file name from full path by checking whether it's inside an archive with such code:

--- Code: ---if (fileName.indexOf('>') > 0) {                // remove archive name tag
fileName = fileName.substring(fileName.indexOf('>') + 1); }

--- End code ---
After messing around I found out that it would not work properly depending on archive depth.

--- End quote ---
Try this way:
FileName = FileName.substring(FileName.indexOf('>') + 1).split('|').slice(-1)[0];
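
A minimal sketch of how the two approaches differ on a nested path, assuming the `<archive.zip>inner.zip|file.pdf` format discussed in this thread. The function names `baseNameOriginal` and `baseNameFixed` and the sample path are only illustrative; they are not part of PDF Explorer's scripting API.

```javascript
// Original approach: only strips the leading "<archive>" tag.
function baseNameOriginal(fileName) {
  if (fileName.indexOf('>') > 0) {
    fileName = fileName.substring(fileName.indexOf('>') + 1);
  }
  return fileName;
}

// Suggested approach: strip the tag, then keep the last '|' segment.
function baseNameFixed(fileName) {
  return fileName.substring(fileName.indexOf('>') + 1).split('|').slice(-1)[0];
}

var nested = '<archive.zip>archive-inside.zip|document-inside.pdf';
console.log(baseNameOriginal(nested)); // archive-inside.zip|document-inside.pdf
console.log(baseNameFixed(nested));    // document-inside.pdf
```

With one archive level both functions return the same name; they only diverge once an archive sits inside another archive.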


--- Quote ---Currently our file name pattern is: <archive.zip>archive-inside.zip|document-inside.pdf .
Wouldn't it be simpler if we had pattern like this: <archive.zip><archive-inside.zip>document-inside.pdf ?

--- End quote ---
No. The current format is easy to parse with a simple split operation. Everything after the main archive name is passed to the un-archive code as the filename to extract. That code splits it on '|' and follows the resulting array, level by level, until it reaches the last element, which is the file the caller requested.
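
A rough sketch of that split-and-follow idea. This is not the actual un-archive code: archives are modelled here as plain nested objects, and `parseArchivePath`, `extractEntry` and `sampleArchive` are hypothetical names made up for the illustration.

```javascript
// Hypothetical in-memory stand-in for an archive: keys are entry names,
// values are either file contents (strings) or nested archives (objects).
var sampleArchive = {
  'archive-inside.zip': {
    'document-inside.pdf': 'pdf bytes'
  },
  'readme.txt': 'hello'
};

// Split a full name into the main archive name and the inner entry path.
function parseArchivePath(fullName) {
  var end = fullName.indexOf('>');
  return {
    archive: fullName.substring(1, end),
    entryPath: fullName.substring(end + 1)
  };
}

// Follow the '|'-separated entry path one archive level at a time.
function extractEntry(archive, entryPath) {
  var parts = entryPath.split('|');
  var current = archive;
  for (var i = 0; i < parts.length; i++) {
    var isArchive = current !== null && typeof current === 'object';
    if (!isArchive || !Object.prototype.hasOwnProperty.call(current, parts[i])) {
      return undefined; // entry not found at this level
    }
    current = current[parts[i]];
  }
  return current;
}

var parsed = parseArchivePath('<archive.zip>archive-inside.zip|document-inside.pdf');
console.log(parsed.archive);                                  // archive.zip
console.log(extractEntry(sampleArchive, parsed.entryPath));   // pdf bytes
```

The point is only that a single separator lets each level be handled by the same loop, whatever the nesting depth.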

RTT:

--- Quote from: Padanges on November 26, 2016, 08:59:47 AM ---I think limiting scan depth should even speed-up file scanning in cases where we have archived archives of various recognizable file types.

--- End quote ---
How's that?

Padanges:

--- Quote ---How's that?
--- End quote ---

What about a case where a textbook is archived together with an archive of its CD contents, which holds many file formats the scanner recognizes (for example, *.txt) but which serve no purpose being indexed into the DB?

RTT:

--- Quote from: Padanges on November 30, 2016, 10:26:17 AM ---
--- Quote ---How's that?
--- End quote ---

What about a case where a textbook is archived together with an archive of its CD contents, which holds many file formats the scanner recognizes (for example, *.txt) but which serve no purpose being indexed into the DB?

--- End quote ---
The archive-within-archive scan depth, which I just finished implementing, tells the scanner how many levels of archives inside archives it should scan. If, in the scenario you describe, these .txt files sit in an archive inside a main archive, then setting the scan depth can indeed exclude them from indexing and speed up the scan. But if you just want to scan everything, the scan depth check ultimately makes the process a little slower. Not by much, though, and the feature is indeed useful.
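
To illustrate the idea, here is a small sketch of a depth-limited walk. It is not the scanner's real code: archives are modelled as nested objects, `listEntries` and `textbook` are hypothetical names, and the meaning of the limit (0 = only files directly inside the main archive) is an assumption made for this example.

```javascript
// Hypothetical archive model: objects are archives, strings are plain files.
var textbook = {
  'book.pdf': 'pdf bytes',
  'cd-content.zip': {
    'notes.txt': 'text',
    'setup.txt': 'text'
  }
};

// Collect entry paths, descending into nested archives only while
// the current depth is below maxDepth.
function listEntries(archive, maxDepth) {
  var found = [];
  function walk(node, depth, prefix) {
    for (var name in node) {
      if (!Object.prototype.hasOwnProperty.call(node, name)) continue;
      var entry = node[name];
      var path = prefix ? prefix + '|' + name : name;
      if (entry !== null && typeof entry === 'object') {
        // Nested archive: skipped entirely once the limit is reached.
        if (depth < maxDepth) walk(entry, depth + 1, path);
      } else {
        found.push(path);
      }
    }
  }
  walk(archive, 0, '');
  return found;
}

console.log(listEntries(textbook, 0)); // [ 'book.pdf' ]
console.log(listEntries(textbook, 1)); // [ 'book.pdf', 'cd-content.zip|notes.txt', 'cd-content.zip|setup.txt' ]
```

With the limit at 0 the nested CD archive is never opened, which is where the time saving comes from; with no effective limit, the extra `depth < maxDepth` comparison is the small overhead.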

Padanges:

--- Quote ---The archive within archive scan depth, that I just finished implementing, is about instructing the scanner how many levels of archives inside archives should be scanned.
--- End quote ---
That's sweet :)


--- Quote ---FileName = FileName.substring(FileName.indexOf('>') + 1).split('|').slice(-1)[0];
--- End quote ---
An alternative could be: FileName = FileName.substring(FileName.lastIndexOf('|') + 1);
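
A quick comparison of the two one-liners, as a sketch with made-up function names and sample paths. One thing to watch: when the path has no '|' (a file directly inside the main archive), `lastIndexOf('|')` returns -1, so the `lastIndexOf` version keeps the `<archive.zip>` tag. The third function is one possible variant that covers both cases; it is only a suggestion.

```javascript
// split-based version: strip the tag, keep the last '|' segment.
function lastSegmentSplit(fileName) {
  return fileName.substring(fileName.indexOf('>') + 1).split('|').slice(-1)[0];
}

// lastIndexOf-based version: keep whatever follows the last '|'.
function lastSegmentLastIndex(fileName) {
  return fileName.substring(fileName.lastIndexOf('|') + 1);
}

// Possible variant: cut after the last '|' or after the '>' tag,
// whichever comes later in the string.
function lastSegmentEither(fileName) {
  var cut = Math.max(fileName.lastIndexOf('|'), fileName.indexOf('>'));
  return fileName.substring(cut + 1);
}

var nestedPath = '<archive.zip>archive-inside.zip|document-inside.pdf';
var singlePath = '<archive.zip>document.pdf';

console.log(lastSegmentLastIndex(nestedPath)); // document-inside.pdf
console.log(lastSegmentLastIndex(singlePath)); // <archive.zip>document.pdf
console.log(lastSegmentSplit(singlePath));     // document.pdf
console.log(lastSegmentEither(singlePath));    // document.pdf
```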
