Topic: Scientific paper abstract in custom field ?

Al

  • Newbie
  • *
  • Posts: 4
Scientific paper abstract in custom field ?
« on: April 19, 2010, 06:29:35 AM »
Hi RTT
Many thanks for the updated English help file, really great. :-) I have a much better idea of the fine detail of the program. I was especially interested in the custom fields and the recent forum topics on metadata.

I use PDFExplorer to catalog scientific papers (references) stored as PDF files.
One question I have is whether it is practical to add the scientific abstract of the scientific paper as a custom field. The reason would be to allow searching for words in the abstract, as well as the current metadata (without indexing the text of the whole paper). The query is just how much data a custom field can hold, whether the PDFExplorer search function can search it and whether the large increase in words that would occur in adding the abstract would slow the search engine too much and make the database unacceptably large (have thousands of references).

Would appreciate any guidance, Regards, Al

RTT

  • Administrator
  • *****
  • Posts: 907
Re: Scientific paper abstract in custom field ?
« Reply #1 on: April 20, 2010, 01:24:40 AM »
Quote
Many thanks for the updated English help file, really great.
Now it just needs to be revised by someone for whom English is not a problem. :)
With build 59 patch 1, to be released soon, I will introduce context-sensitive help, so when invoked with the F1 key, help will open at the chapter related to whichever feature has focus at that moment. I expect this will greatly increase its usefulness.

Quote
The query is just how much data a custom field can hold
The combined length of all the custom fields of a file can reach a total of 65436 raw characters, so we can have just one custom field with 65436 characters, or two with 32718 characters each, etc.
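To make that budget concrete, here is a minimal Python sketch of the arithmetic. The 65436 figure comes from the reply above; the helper names `per_field_budget` and `fits` are hypothetical and are not part of PDFExplorer. The equal split is only an illustration, since the limit applies to the combined length.

```python
# Total characters shared by all custom fields of one file (figure quoted in the reply).
TOTAL_CUSTOM_FIELD_CHARS = 65436


def per_field_budget(field_count: int) -> int:
    """Characters available to each field if the total is shared equally (hypothetical helper)."""
    if field_count < 1:
        raise ValueError("field_count must be at least 1")
    return TOTAL_CUSTOM_FIELD_CHARS // field_count


def fits(field_lengths: list[int]) -> bool:
    """True if the combined length of the given fields stays within the total."""
    return sum(field_lengths) <= TOTAL_CUSTOM_FIELD_CHARS


for n in (1, 2, 4):
    print(n, per_field_budget(n))
```

With two fields this gives 32718 characters each, matching the example in the reply.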

Quote
whether the PDFExplorer search function can search
Yes, the DBSearch and the grid Search/Filter tools also search the custom fields.

Quote
whether the large increase in words that would occur in adding the abstract would slow the search engine too much and make the database unacceptably large (have thousands of references).
That's the main problem with this approach. Because the content of the metadata fields is saved in the database as a whole, without per-word indexing techniques (such as those used by the text content indexing tool), all the content we add will increase the size of the database file linearly. And a bigger database results in longer query times for the DBSearch scan mode.
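The difference between scanning whole field contents and looking words up in an index can be sketched in a few lines of Python. This is only a generic illustration of the two ideas and says nothing about how PDFExplorer stores or searches its data; the records and function names are made up.

```python
from collections import defaultdict

# Hypothetical records: file name -> custom field text stored as a whole.
records = {
    "a.pdf": "protein folding in yeast",
    "b.pdf": "yeast growth under heat stress",
    "c.pdf": "galaxy cluster dynamics",
}


def scan_search(records, word):
    """Scan mode: read every record's full text, so the work grows with total text size."""
    word = word.lower()
    return sorted(name for name, text in records.items()
                  if word in text.lower().split())


def build_word_index(records):
    """Word index: map each word to the files containing it, built once up front."""
    index = defaultdict(set)
    for name, text in records.items():
        for w in text.lower().split():
            index[w].add(name)
    return index


def index_search(index, word):
    """Indexed lookup: one dictionary access, regardless of how much text is stored."""
    return sorted(index.get(word.lower(), set()))


print(scan_search(records, "yeast"))
print(index_search(build_word_index(records), "yeast"))
```

Both calls return the same files; the scan has to touch all stored text on every query, which is why adding long abstracts to every record makes a scan slower.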
Supposing a hypothetical scenario of 500 words per abstract, with the common average of 5.5 characters per word (for English) and 100,000 indexed files, we will get 275MB added to the PDEDB.inf database file. I have already tested this scenario with a dummy, code-generated database, and what happens is that DBSearch takes a lot more time to finish the query, but that's the only problem related to database interaction.
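The 275MB estimate can be reproduced with a short Python sketch. It assumes one byte per stored character and ignores any per-record overhead, neither of which is stated in the reply; the function name is hypothetical.

```python
def added_database_bytes(words_per_abstract: int, chars_per_word: float,
                         file_count: int, bytes_per_char: int = 1) -> int:
    """Rough extra database size from storing one abstract per file.

    Assumes bytes_per_char bytes for every character and no storage overhead.
    """
    chars_per_abstract = words_per_abstract * chars_per_word
    return round(chars_per_abstract * file_count * bytes_per_char)


added = added_database_bytes(500, 5.5, 100_000)
print(f"{added / 1_000_000:.0f} MB")  # 275 MB
```

Under the same assumptions, each abstract is about 2750 characters, well inside the 65436-character limit, and a collection of, say, 5,000 references would add roughly 13.75 MB.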

Loading the grids also takes more time, because of the amount of data being exchanged and the initial operations needed to assemble that larger data.
Grids also become less responsive (but just a little) if this larger custom field is visible as a column in the grid layout being used.

Another performance issue arises from the way grids handle their content. Everything is done in memory, so memory usage is directly related to the metadata of the files present in the grid. This is only problematic if your database scan operation returns too many records and your PC is short on free RAM.

My advice is that you should test your scenario, to see whether the performance degradation that will, for sure, occur is compatible with your expectations. Just create a temp folder, copy a large number of PDFs there, and use the EditInfoFields batch tool to fill one of the custom fields with a dummy, worst-case abstract.
Now you can test it to see what happens on your system. Because in this test the custom fields of all the files will have the same content, it is better to search for words present in the other metadata fields; otherwise DBSearch will return all the files as the result, and that will not happen in a real scenario unless the searched words are generic ones.
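One way to produce the dummy abstract for such a test is a small Python script like the sketch below. It only generates filler text to paste into the EditInfoFields tool; it does not interact with PDFExplorer in any way, and the function name and the 500-word default are assumptions taken from the scenario discussed above.

```python
import random


def dummy_abstract(word_count: int = 500, seed: int = 0) -> str:
    """Build filler text of word_count random words, 5 or 6 letters each."""
    rng = random.Random(seed)  # fixed seed so the same text is produced every run
    letters = "abcdefghijklmnopqrstuvwxyz"
    words = []
    for _ in range(word_count):
        length = rng.choice((5, 6))  # averages about 5.5 characters per word
        words.append("".join(rng.choice(letters) for _ in range(length)))
    return " ".join(words)


text = dummy_abstract()
print(len(text.split()), len(text))
```

Because the words are random letter sequences, they are unlikely to match real search terms, which fits the advice to search on words from the other metadata fields during the test.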

Al

  • Newbie
  • *
  • Posts: 4
Re: Scientific paper abstract in custom field ?
« Reply #2 on: April 20, 2010, 07:59:57 AM »
RTT
Many thanks for the prompt and very informative reply.
Will have a test as you suggest - regards Al