Author Topic: Number of pdfs in work/search table does not match with the archiv!?

dp

  • Newbie
  • *
  • Posts: 8
Hello,

The archive contains 602 PDFs, but the search table in PDF Explorer, like the web server, counts only 556.

Furthermore, many PDFs that are in the archive folder can't be found through the web server.

I would be grateful if you have any idea how to solve this problem.

best regards,

dp

RTT

  • Administrator
  • *****
  • Posts: 778
What do you mean by "the archive"?
The PDF Explorer database, or the disk folder(s) where you have these files?

If the second option: when you scan these files using the DiskTree scan mode, are all 602 files present in the scan grid at the end of the scan, or only the 556?
If only the 556, check whether the files that are not indexed are corrupted, which would make PDFE discard them. If the files are OK, please send me one of them so I can check why PDFE fails to index it.
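For a quick first pass over many files, a rough sanity check can be scripted. The Python sketch below only looks for the `%PDF-` marker near the start of a file and the `%%EOF` marker near its end. The function name `looks_like_pdf` and the 1024-byte windows are assumptions made for this example; this is not how PDFE decides whether to discard a file, and a file that passes can still be damaged.

```python
from pathlib import Path


def looks_like_pdf(path):
    """Rough check only: '%PDF-' near the start and '%%EOF' near the end.

    Hypothetical helper for illustration. It does not parse the file,
    so it can only catch obvious truncation or non-PDF content.
    """
    data = Path(path).read_bytes()
    has_header = b"%PDF-" in data[:1024]
    has_trailer = b"%%EOF" in data[-1024:]
    return has_header and has_trailer
```

Files that fail this check are worth opening in a PDF reader first; files that pass it but are still not indexed are the interesting ones to send.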

If the first option: which scan method are you using to query the database for these files (DBDiskTree, DBDisk, DBSearch) when you notice that not all of them show up?

The web server uses the same routines to query the PDFE database, so you will not find more files than you can find using the PDFE GUI scan tools. The user will be able to search for these files only after all of them are indexed using the GUI DiskTree scan mode and the source folders of these files are enabled for the user accessing the web server (check the Groups options under the web server's "Users and Permissions" if you have web access users/groups defined).

If I'm getting this all wrong, please try to explain it again :)

dp

  • Newbie
  • *
  • Posts: 8
Hi,

I am sorry for expressing myself badly, but I am using a German version of PDFE and have no manual.

By "archive" I meant the disk folder that contains all the files. I now have a much better idea of how PDFE works, in particular the difference between the database and the disk folder. So let me try again and explain how I work with it, because there are still discrepancies:

1. I put files in the disk folder >> current number of files: 615
2. Open PDFE and scan these files by searching the disk folder using the disk tree, to index them >> the search grid counts only 553 files
3. Use the batch processing tool to index the text >> the DBDiskTree and DBDisk grids count 553 files
4. Use the web interface >> DBDiskTree counts only 302 files!

What am I doing wrong? Where are the missing files? I don't really believe that more than 50% of the files are damaged.

Please help me ;-)

PS: I already checked the settings of the web server...

RTT

  • Administrator
  • *****
  • Posts: 778
Quote
1. I put files in the disk folder >> current number of files: 615
2. Open PDFE and scan these files by searching the disk folder using the disk tree, to index them >> the search grid counts only 553 files

Can you please check one or two of the 62 files that PDFE fails to index and, if they open fine in a PDF reader, send them to me as an email attachment so I can check what's wrong?

Are all 615 files in the same main folder, or are they spread across sub-folders?
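To answer the folder question with numbers, the PDFs on disk can be counted per folder and compared with what the scan grid shows. This is a minimal Python sketch, assuming the files are recognised by a `.pdf` extension (case-insensitive); the function name `count_pdfs_by_folder` is invented for this example and has nothing to do with PDFE itself.

```python
from collections import Counter
from pathlib import Path


def count_pdfs_by_folder(root):
    """Count .pdf files per folder, walking all sub-folders of root.

    Returns a Counter mapping each folder path (as a string) to the
    number of PDF files located directly inside that folder.
    """
    counts = Counter()
    for path in Path(root).rglob("*"):
        # Match the extension case-insensitively so "X.PDF" is counted too.
        if path.is_file() and path.suffix.lower() == ".pdf":
            counts[str(path.parent)] += 1
    return counts
```

Calling `sum(count_pdfs_by_folder(folder).values())` gives the total for the whole tree, and the per-folder numbers show whether the missing files are concentrated in one sub-folder.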

Quote
3. Use the batch processing tool to index the text >> the DBDiskTree and DBDisk grids count 553 files

That's correct: all 553 indexed files are being queried correctly from the database.

Quote
4. Use the web interface >> DBDiskTree counts only 302 files!

Now that's weird!
Check whether you are using the latest PDFE build, 58+Patch 4, in the Help>About version information. If not, and the patch update doesn't show up in the Help>CheckForUpdates functionality, please request it.

Quote
What am I doing wrong? Where are the missing files? I don't really believe that more than 50% of the files are damaged.
Please help me ;-)

Don't worry, I'm sure we will figure out where they are :)

Quote
PS: I already checked the settings of the web server...

If your PDFs are spread across more than one folder, also check under Database>Edit that each of these folders has its checkbox checked.

dp

  • Newbie
  • *
  • Posts: 8
Hi,

I worked on this all day. The result is that the DBDiskTree of the web interface now counts 553, the same as the DBDiskTree of the database. YES! (I'll explain later how I managed this.)

The remaining problem is that there are still about 60 files in the disk folder that the database does not index. I have already separated these 60 files from the good ones. On Monday I'll check them with Acrobat Reader and let you know.

Have a nice weekend

RTT

  • Administrator
  • *****
  • Posts: 778
If these files are indeed good and don't raise any error when opened in Acrobat, check whether the problem is related to an erroneous PDF version being used in these files, a situation not correctly handled by the current online PDFE versions. The PDF files I'm talking about use PDF specification features only available since v1.5 but are marked with a lower version. PDF readers don't complain about this, so I'm going to adopt the same behavior in PDFE.

If these PDF files have a version < 1.5, try changing it to 1.5 or higher and see whether PDFE now indexes them. Just open the file in a text editor with binary edit capabilities (the free Notepad++ is a good one) and change the first bytes from %PDF-1.x to %PDF-1.5.
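For anyone who would rather script this than edit bytes by hand, here is a minimal Python sketch of the same header change. The function name `set_header_version_1_5` is made up for this example, and the sketch assumes the `%PDF-x.y` marker sits within the first 1024 bytes of the file. It overwrites the marker with a string of the same length, so no byte offsets inside the file move. Try it on a copy of the file first.

```python
import re

# Matches the version marker at the start of a PDF, e.g. b"%PDF-1.4".
HEADER = re.compile(rb"%PDF-(\d)\.(\d)")


def set_header_version_1_5(path):
    """Rewrite the header to %PDF-1.5 if it is marked lower than 1.5.

    Returns True if the file was changed, False otherwise (no header
    found, or the marked version is already 1.5 or higher).
    """
    with open(path, "r+b") as f:
        match = HEADER.search(f.read(1024))
        if match is None:
            return False
        version = (int(match.group(1)), int(match.group(2)))
        if version >= (1, 5):
            return False
        # Same length as the old marker, so the rest of the file is untouched.
        f.seek(match.start())
        f.write(b"%PDF-1.5")
        return True
```

The return value makes it easy to see which files were actually touched when the function is run over a whole folder.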

dp

  • Newbie
  • *
  • Posts: 8
I had the PDFs checked by IT, and they told me that the PDFs are good. The PDFs already have version 1.5. I am sure that these PDFs were created with the same software as the other, correct PDFs. Today I put new PDFs in the database without any problems.

I would like to attach one of the erroneous PDFs, but they are all bigger than 200 kB. You told me to send them as an email attachment, so can you give me an email address where I can send them?

Thanks a lot.

dp

  • Newbie
  • *
  • Posts: 8
Quote
Now that's weird!
Check whether you are using the latest PDFE build, 58+Patch 4, in the Help>About version information. If not, and the patch update doesn't show up in the Help>CheckForUpdates functionality, please request it.

I checked the version of PDFE: it is 1.5.0.58. I couldn't find any build 58+Patch 4 in the patch update. I requested the update, but it couldn't find any.

Is the patch still necessary for this problem? If yes, how can I get it?

RTT

  • Administrator
  • *****
  • Posts: 778
Quote
I would like to attach one of the erroneous PDFs, but they are all bigger than 200 kB. You told me to send them as an email attachment, so can you give me an email address where I can send them?
That's the forum attachment size limit, but if you do want to make it public so others can check too, you can always use a file sharing service (rapidshare, megaupload, ...) and put the download link here.
Anyway, my email address is all over the place: in the program's About box, the user guides, the readme files, many pages of this site, ... It's not here in the forum because I want to protect registered users against spam bots, and I haven't found a way to make only my user profile available.
And if I'm right about who you are, judging by the name and email address you used to register here in the forum, I can see you already requested support by email in the past. Don't tell me you haven't received my reply to that request!!!

Quote
Is the patch still necessary for this problem? If yes, how can I get it?
I don't know; I first have to see why PDFE "doesn't like" these 60 files. If you are a licensed PDFE user, just request the direct download link by email.

You haven't told me yet how you finally managed to make all 553 indexed files appear in the web interface. What was the cause?

RTT

  • Administrator
  • *****
  • Posts: 778
Re: Number of pdfs in work/search table does not match with the archiv!?
« Reply #9 on: September 01, 2009, 05:20:34 PM »
Thanks for the sample files. ;)
I have checked, and these files do have the problem I suggested you check. All the files that fail to index use a PDF feature, in this case a cross-reference object, introduced only in PDF version 1.5, but they are marked with a lower version.
I forgot to mention earlier that checking the PDF version in Acrobat's file properties doesn't work, because Acrobat reports the version it assumes the file really is, not the version the file is actually marked with. That's probably why the IT guys failed to report the correct version. Other readers (Foxit Reader, PDF Xchange Viewer, ...) show the version value hard-coded in the file.
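The hard-coded version can also be read straight from the file, without relying on any viewer. The Python sketch below reports files that are marked lower than 1.5 yet appear to contain a cross-reference stream. It assumes the "cross-reference object" above means a cross-reference stream, whose dictionary carries `/Type /XRef`. The names `marked_version`, `uses_xref_stream` and `is_mismarked` are invented for this example, the byte search is only a heuristic (the same bytes could appear in unrelated data), and this is not PDFE's own logic.

```python
import re
from pathlib import Path

HEADER = re.compile(rb"%PDF-(\d)\.(\d)")
# Dictionary entry that marks a cross-reference stream (PDF 1.5 and later).
XREF_STREAM = re.compile(rb"/Type\s*/XRef")


def marked_version(data):
    """Return the version hard-coded in the header as a tuple, or None."""
    match = HEADER.search(data[:1024])
    if match is None:
        return None
    return (int(match.group(1)), int(match.group(2)))


def uses_xref_stream(data):
    """Heuristic: True if a '/Type /XRef' entry appears anywhere in the file."""
    return XREF_STREAM.search(data) is not None


def is_mismarked(path):
    """True if the file is marked below 1.5 but seems to use a 1.5 feature."""
    data = Path(path).read_bytes()
    version = marked_version(data)
    return version is not None and version < (1, 5) and uses_xref_stream(data)
```

Running `is_mismarked` over the files that fail to index should show whether they all share this version mismatch.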

Because PDFE fails to index files with this version error, the version of these files needs to be changed to the correct one. To simplify this task, I have coded a simple tool, which I'm sending attached.
The tool will change the file version to 1.5, but only in the PDF files that have this problem.
If some files still fail to index after the version fix, please send me a sample so I can check it.