Author Topic: questions  (Read 5013 times)


Client

  • Newbie
  • *
  • Posts: 2
questions
« on: March 27, 2007, 07:06:00 PM »
I'm trying PDF Explorer, and I have some questions. How many PDF files can it handle? I'd like to use it for about 100 000 PDF files with 6 index criteria (title;subject;keywords).

 As far as I have tested it, it is a great product.

 Thanks

RTT

  • Administrator
  • *****
  • Posts: 764
« Reply #1 on: March 27, 2007, 10:19:00 PM »
The database architecture is only limited by the 2GB file size boundary of any of the database files.
The database files are

PDEDB.inf
PDEDB.idx
PDFEInd.dls
PDFEInd.hbs
PDFEInd.hta
PDFEInd.hts

Because of that, there is no way to give exact numbers for how many files can be indexed: these database files grow as a function of the folder structure of the indexed resources, the length of the PDF metadata and, when indexing of page text contents is used, the number of text words. The 2GB boundary is huge. To give you an idea, we can index 100 000 PDF files easily and the .inf file (the one that contains the indexed metadata) is less than 50MB. These numbers can be slightly different on your system, so what I always say is: try it, stress it, and figure it out for yourself. If you run into any limitation, contact me and tell me the numbers.
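
One way to follow the "try it, stress it" advice is to watch how close each database file gets to the 2GB boundary while indexing. The Python sketch below is only an illustration and not part of PDF Explorer: the function name `db_usage` is hypothetical, the database folder location is left to the caller, and "2GB" is assumed to mean 2 GiB.

```python
from pathlib import Path

# Database file names exactly as listed in the post above.
DB_FILES = [
    "PDEDB.inf", "PDEDB.idx",
    "PDFEInd.dls", "PDFEInd.hbs", "PDFEInd.hta", "PDFEInd.hts",
]

# Assumption: the "2GB" boundary is 2 GiB (2 * 1024**3 bytes).
LIMIT_BYTES = 2 * 1024**3


def db_usage(db_dir):
    """Return {file name: fraction of the limit used} for the files found in db_dir."""
    usage = {}
    for name in DB_FILES:
        path = Path(db_dir) / name
        if path.is_file():
            usage[name] = path.stat().st_size / LIMIT_BYTES
    return usage
```

Calling `db_usage(...)` on the folder that holds the database files after each large indexing run shows which file grows fastest; a 50MB .inf file would come out at roughly 0.02 of the limit.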

It is also important, when there is this huge number of files, to spread these files in multiple folders to speed up PDF Explorer database accesses. It is always a bad option to put all the files in only one system disk folder, not only for PDFE but also for the Windows Shell itself.

malcolmdean

  • Newbie
  • *
  • Posts: 3
« Reply #2 on: May 15, 2008, 09:29:00 AM »
Quote from: "RTT"
It is also important, when there is this huge number of files, to spread these files in multiple folders to speed up PDF Explorer database accesses. It is always a bad option to put all the files in only one system disk folder, not only for PDFE but also for the Windows Shell itself.
This also applies, to a lesser extent, to Linux. I have around 25,000 PDFs, some on disk, some stuffed into Gmail. I'll have to IMAP them down to disk some day, but it is clear that no matter the desktop OS, none of them were designed for this problem.

Are there utilities which will take a single directory and automagically re-distribute its files into new sub-folders? For example, every file beginning with "A" goes into an "A" sub-directory, or into a sequentially-numbered sub-directory.

Would you consider adding this functionality to your program?

Malcolm
Los Angeles

RTT

  • Administrator
  • *****
  • Posts: 764
« Reply #3 on: May 15, 2008, 10:34:00 PM »
Quote from: "malcolmdean"
Are there utilities which will take a single directory and automagically re-distribute its files into new sub-folders? For example, every file beginning with "A" goes into an "A" sub-directory, or into a sequentially-numbered sub-directory.

Would you consider adding this functionality to your program?

There are many programs, usually referred to as "File Renamers", capable of renaming the full path, not only the file name. This feature is very useful for organizing files into folders. Right now I can't recall a good one, but if you Google for "File Renamer" you will find many.

My other tool, PDF-ShellTools, already contains a simple file renamer with this feature. It can create directories and move files into them, and it can even use the contents of the metadata fields to compose not just the file name but also the directory path.
http://rttsoftware.googlepages.com/STIndex.htm?pageURL=Renamer.htm

This is a feature I can also add to the already available PDFE Renamer batch tool.
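
For the first-letter scheme asked about above, a small script can also do the job. The following Python sketch is an illustration only and does not show how PDF-ShellTools or the PDFE Renamer work. The function name `bucket_by_first_letter` and the "_" folder for names that do not start with a letter or digit are assumptions made for the example.

```python
import shutil
from pathlib import Path


def bucket_by_first_letter(src_dir, dry_run=True):
    """Move each PDF in src_dir into a sub-folder named after its first character.

    Returns the list of (source, target) pairs. With dry_run=True nothing is moved.
    """
    src = Path(src_dir)
    moves = []
    # Take a sorted snapshot first so the folder is not modified while iterating.
    for pdf in sorted(src.iterdir()):
        if not pdf.is_file() or pdf.suffix.lower() != ".pdf":
            continue
        first = pdf.name[0].upper()
        # Assumption: names not starting with a letter or digit go into "_".
        bucket = first if first.isalnum() else "_"
        target = src / bucket / pdf.name
        if target.exists():
            continue  # never overwrite a file that is already in the bucket
        moves.append((pdf, target))
        if not dry_run:
            target.parent.mkdir(exist_ok=True)
            shutil.move(str(pdf), str(target))
    return moves
```

Running it once with the default `dry_run=True` and reading the returned list is a cheap way to check the plan before any file is actually moved.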

Anonymous

  • Guest
Directory Management - Repository versus Lists
« Reply #4 on: May 28, 2008, 09:13:00 AM »
Quote from: "RTT"
Quote from: "malcolmdean"
Are there utilities which will take a single directory and automagically re-distribute its files into new sub-folders? For example, every file beginning with "A" goes into an "A" sub-directory, or into a sequentially-numbered sub-directory.

Would you consider adding this functionality to your program?

There are many programs, usually referred to as "File Renamers", capable of renaming the full path, not only the file name. This feature is very useful for organizing files into folders. Right now I can't recall a good one, but if you Google for "File Renamer" you will find many.


After thinking more about this, I realized that my workflow is probably not like that of many of your users. A publisher will probably put PDFs in directories for a book or magazine issue. A graphic artist is likely to put PDFs in directories named after projects and clients.

I am currently using GMail as a repository -- a big bucket containing many PDFs and their citation data. I specifically want to avoid having to create directories, balance them for performance... all those OS-related tasks should be handled by the computer. I would simply send PDFs to "the repository" and expect the computer to retrieve them when needed, without regard to their sub-directory location.

For PDF Explorer, this would suggest an optional setting which allows the user to specifically give PDFE responsibility for creating and managing subdirectories for optimal disc performance.

This feature would appeal to users with very large collections, such as universities, insurance companies, engineering and aeronautical firms (NASA), and governments.

MD

RTT

  • Administrator
  • *****
  • Posts: 764
Re: Directory Management - Repository versus Lists
« Reply #5 on: May 28, 2008, 11:06:00 PM »
Quote from: "Malcolm Dean"
For PDF Explorer, this would suggest an optional setting which allows the user to specifically give PDFE responsibility for creating and managing subdirectories for optimal disc performance.

This feature would appeal to users with very large collections, such as universities, insurance companies, engineering and aeronautical firms (NASA), and governments.

So your idea is to have some kind of automation that spreads all the files across directories with no meaningful names, creating a new one automatically whenever the one in use reaches a defined maximum number of files per folder?
On an NTFS disk we can put 4 billion files in a directory. The problem with a large number of files per directory only occurs when browsing by directory, because all the file names and associated metadata must be read into memory to perform even a simple task such as sorting by one of these metadata attributes.
When looking for a file, search operations are used, not browsing, and there the problem is the possibly large number of results, not the number of files per folder. So in this case your idea works fine.
Tree categorization is also a good way to organize and search. We could use your idea of a non-categorized, one-level directory structure and later create a virtual categorization tree on top of it, but why not use the physical disk directory tree to get at least a simple organization from the start?
The "Extra>TaskAutomationFolders" functionality can be used to achieve either of these goals: with a new tool to execute your idea, or, if the renamer tool can also create folders using metadata for names, to achieve the second, more usual, approach.
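
The "maximum number of files per folder" idea discussed above can be sketched outside PDFE as a one-shot Python script. This is an assumption-laden illustration, not the TaskAutomationFolders feature: the function name `bucket_by_count`, the four-digit folder names and the default limit of 1000 are all made up for the example.

```python
import shutil
from pathlib import Path


def bucket_by_count(src_dir, max_per_folder=1000, dry_run=True):
    """Spread the PDFs in src_dir over numbered sub-folders (0000, 0001, ...).

    Each folder receives at most max_per_folder files. Returns the list of
    (source, target) pairs. With dry_run=True nothing is moved.
    """
    src = Path(src_dir)
    pdfs = sorted(
        p for p in src.iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )
    moves = []
    for index, pdf in enumerate(pdfs):
        # Integer division picks the folder: files 0..max-1 go to 0000, and so on.
        bucket = f"{index // max_per_folder:04d}"
        target = src / bucket / pdf.name
        moves.append((pdf, target))
        if not dry_run:
            target.parent.mkdir(exist_ok=True)
            shutil.move(str(pdf), str(target))
    return moves
```

The sketch numbers folders from 0000 on every run, so it suits a single pass over a flat folder. Handling files that arrive later would need extra logic to continue from the last partly filled folder.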