Topic: Archive scan depth limit

Padanges

  • Newbie
  • *
  • Posts: 179
Archive scan depth limit
« on: November 22, 2016, 10:36:23 AM »
Hi,
Is it possible to limit the archive depth for document scanning? For example, I have an archive within an archive, and I would like to find only the documents that sit directly in the primary archive. Is there a way to do that?


Thanks in advance

RTT

  • Administrator
  • *****
  • Posts: 907
Re: Archive scan depth limit
« Reply #1 on: November 23, 2016, 12:04:56 AM »
Not possible right now, but definitely something that may be implemented. I will check it. Thanks for the suggestion.

Padanges

  • Newbie
  • *
  • Posts: 179
Re: Archive scan depth limit
« Reply #2 on: November 25, 2016, 07:54:15 AM »
This feature would be most welcome ;D

Padanges

  • Newbie
  • *
  • Posts: 179
Re: Archive scan depth limit
« Reply #3 on: November 26, 2016, 08:46:26 AM »
I used to extract the file name from the full path, checking whether the file is inside an archive, with code like this:
Code:
if (fileName.indexOf('>') > 0) {                // remove archive name tag
    fileName = fileName.substring(fileName.indexOf('>') + 1);
}
After some experimenting I found that it does not work properly for every archive depth.
Currently the file name pattern is: <archive.zip>archive-inside.zip|document-inside.pdf
Wouldn't it be simpler if the pattern were: <archive.zip><archive-inside.zip>document-inside.pdf ?
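
To make the depth problem concrete, here is a small sketch of what the snippet in this post returns for the two path shapes mentioned. The wrapper name `stripArchiveTag` and the depth-0 sample path are only illustrative assumptions; the nested path is the pattern quoted in the post.

```javascript
// Illustrative wrapper around the snippet from this post (the name is hypothetical).
function stripArchiveTag(fileName) {
  if (fileName.indexOf('>') > 0) {                // remove archive name tag
    fileName = fileName.substring(fileName.indexOf('>') + 1);
  }
  return fileName;
}

// File directly in the main archive: the tag is removed as intended.
console.log(stripArchiveTag('<archive.zip>document-inside.pdf'));
// document-inside.pdf

// File in a nested archive: the inner archive name stays in the result.
console.log(stripArchiveTag('<archive.zip>archive-inside.zip|document-inside.pdf'));
// archive-inside.zip|document-inside.pdf
```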

Padanges

  • Newbie
  • *
  • Posts: 179
Re: Archive scan depth limit
« Reply #4 on: November 26, 2016, 08:59:47 AM »
I think limiting the scan depth should even speed up file scanning in cases where we have archives nested inside archives that contain various recognizable file types.

RTT

  • Administrator
  • *****
  • Posts: 907
Re: Archive scan depth limit
« Reply #5 on: November 27, 2016, 01:59:50 AM »
Quote
I used to extract the file name from the full path, checking whether the file is inside an archive, with code like this:
Code:
if (fileName.indexOf('>') > 0) {                // remove archive name tag
    fileName = fileName.substring(fileName.indexOf('>') + 1);
}
After some experimenting I found that it does not work properly for every archive depth.
Try it this way:
Code:
FileName = FileName.substring(FileName.indexOf('>') + 1).split('|').slice(-1)[0];

Quote
Currently the file name pattern is: <archive.zip>archive-inside.zip|document-inside.pdf
Wouldn't it be simpler if the pattern were: <archive.zip><archive-inside.zip>document-inside.pdf ?
No. The current format is easy to parse with a simple split operation. Whatever follows the main archive name is passed to the un-archive code as the name of the file to extract. That code splits it on '|' and follows the resulting array level by level until it reaches the last entry, which is the file the caller requested.
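
The split-based parsing described above could be sketched roughly as follows. This is not the program's actual un-archive code; the helper name `parseArchivePath` and the shape of its return value are assumptions made for illustration, based only on the `<archive>entry|entry` pattern quoted in this thread.

```javascript
// Hypothetical helper: split a scanned path into the main archive name
// and the chain of entries inside it ('|' separates nesting levels).
function parseArchivePath(path) {
  const open = path.indexOf('<');
  const close = path.indexOf('>', open + 1);
  if (open < 0 || close < 0) {
    // Assumption: a path without a <...> tag is not inside an archive.
    return { archive: null, chain: [path] };
  }
  return {
    archive: path.substring(open + 1, close),
    chain: path.substring(close + 1).split('|'),
  };
}

const parsed = parseArchivePath('<archive.zip>archive-inside.zip|document-inside.pdf');
console.log(parsed.archive);                         // archive.zip
console.log(parsed.chain);                           // [ 'archive-inside.zip', 'document-inside.pdf' ]
console.log(parsed.chain[parsed.chain.length - 1]);  // document-inside.pdf
```

The last element of `chain` is the same value the one-liner above produces; keeping the whole array also gives access to the intermediate archive names.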

RTT

  • Administrator
  • *****
  • Posts: 907
Re: Archive scan depth limit
« Reply #6 on: November 27, 2016, 02:03:57 AM »
Quote
I think limiting the scan depth should even speed up file scanning in cases where we have archives nested inside archives that contain various recognizable file types.
How's that?

Padanges

  • Newbie
  • *
  • Posts: 179
Re: Archive scan depth limit
« Reply #7 on: November 30, 2016, 10:26:17 AM »
Quote
How's that?

What about a case where a textbook is archived together with an archive of its CD contents? The CD archive holds many files in formats the scanner recognizes, for example *.txt, but there is ultimately no point in indexing them into the DB.

RTT

  • Administrator
  • *****
  • Posts: 907
Re: Archive scan depth limit
« Reply #8 on: December 01, 2016, 12:47:47 AM »
Quote
How's that?

What about a case where a textbook is archived together with an archive of its CD contents? The CD archive holds many files in formats the scanner recognizes, for example *.txt, but there is ultimately no point in indexing them into the DB.
The archive-within-archive scan depth, which I have just finished implementing, tells the scanner how many levels of archives inside archives should be scanned. If, in the scenario you describe, these .txt files are in an archive inside the main archive, then setting the scan depth can indeed exclude them from indexing and speed up the scan. But if you just want to scan everything, the scan depth check makes the process slightly slower. Not by much, though, and the feature is indeed useful.
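
For readers who want to reason about depth in their own scripts, here is a minimal sketch of how the nesting level could be derived from a path in the format used in this thread. It is not the scanner's implementation, and the thread does not show how the new setting is exposed. The helper names `archiveDepth` and `withinDepth` are hypothetical, and depth 0 is taken to mean "directly in the main archive". A script-side filter like this only discards results; unlike the scanner option, it does not save any scanning time.

```javascript
// Hypothetical helper: nesting depth of an archived file path.
// Depth 0 = the file sits directly in the main archive (no '|' after the tag).
function archiveDepth(path) {
  const inner = path.substring(path.indexOf('>') + 1);
  return inner.split('|').length - 1;
}

// Hypothetical helper: keep only paths at or below the given depth.
function withinDepth(paths, maxDepth) {
  return paths.filter((p) => archiveDepth(p) <= maxDepth);
}

const scanned = [
  '<archive.zip>textbook.pdf',
  '<archive.zip>cd-content.zip|readme.txt',
];
console.log(withinDepth(scanned, 0));  // [ '<archive.zip>textbook.pdf' ]
```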

Padanges

  • Newbie
  • *
  • Posts: 179
Re: Archive scan depth limit
« Reply #9 on: December 16, 2016, 09:26:56 AM »
Quote
The archive-within-archive scan depth, which I have just finished implementing, tells the scanner how many levels of archives inside archives should be scanned.
That's sweet :)

Quote
FileName = FileName.substring(FileName.indexOf('>') + 1).split('|').slice(-1)[0];
An alternative could be: FileName = FileName.substring(FileName.lastIndexOf('|') + 1);

RTT

  • Administrator
  • *****
  • Posts: 907
Re: Archive scan depth limit
« Reply #10 on: December 18, 2016, 01:13:14 AM »
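Quote
An alternative could be: FileName = FileName.substring(FileName.lastIndexOf('|') + 1);
No. It fails when the archived file is directly in the main archive (depth 0), i.e. when no '|' character is present: lastIndexOf then returns -1, so substring(0) returns the whole string, archive tag included.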
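
The difference between the two one-liners can be checked side by side. The function names below are illustrative, and the depth-0 sample path is an assumption based on the pattern quoted earlier in the thread. The third variant, `viaMax`, is only a suggestion for keeping the substring approach and was not proposed by anyone in the thread.

```javascript
// The split-based one-liner from Reply #5.
const viaSplit = (name) => name.substring(name.indexOf('>') + 1).split('|').slice(-1)[0];

// The lastIndexOf-based alternative from Reply #9.
const viaLastIndexOf = (name) => name.substring(name.lastIndexOf('|') + 1);

// Possible variant (assumption): cut after whichever separator comes last.
const viaMax = (name) =>
  name.substring(Math.max(name.lastIndexOf('|'), name.indexOf('>')) + 1);

const nested = '<archive.zip>archive-inside.zip|document-inside.pdf';
const top = '<archive.zip>document-inside.pdf';

console.log(viaSplit(nested));        // document-inside.pdf
console.log(viaLastIndexOf(nested));  // document-inside.pdf
console.log(viaSplit(top));           // document-inside.pdf
console.log(viaLastIndexOf(top));     // <archive.zip>document-inside.pdf
console.log(viaMax(top));             // document-inside.pdf
```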