Title: Search for Arabic words (Right->Left) fails!
Post by: HammoD on May 12, 2008, 05:10:00 PM
Hi, I have tried to search using Arabic words, but PDF Explorer fails to give the results I am looking for. I have tried changing the char set setting to "ARABIC-CHARSET", but I get the same result!

Please advise.

Post by: RTT on May 12, 2008, 09:54:00 PM
Searching in metadata or indexed text content?
Can you please send me a sample PDF as an email attachment, or post the download link here?
Please also send me a list of words to search for. As you can imagine, Arabic letters are just funny glyphs to me :(
Post by: HammoD on May 13, 2008, 05:57:00 PM
Thanks for your prompt reply. Here is how I have been searching: in the lower right panel I use the tab called DiskTree, then in the left panel I open the tab called Search/Filter. There I type the word I am looking for, select Content instead of Filename, Title, etc., and click on Filter.
The above steps work perfectly with English phrases, but I get zero results with Arabic phrases.
OK, here is the test. I have prepared a file called TEST.pdf which contains an Arabic phrase meaning "This file to test the wonderful program".
Here is the link: http://rapidshare.com/files/114651990/TEST.pdf.html

Note: I tried to type Arabic in this post, but it seems the forum interface does not accept Arabic characters :( . You can download the PDF file from the above link and select the phrase from the file itself. It is selectable.
I really hope the above can help you. :)

Please let me know if you need more examples. I will be more than glad to provide them :)
Post by: RTT on May 14, 2008, 12:16:00 AM
I have run some tests and the problem appears to be related to text extraction. If you take a look at the text-only reader, the Arabic words appear inverted. Right now I don't know why this is happening, but I'm going to investigate...
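To illustrate why inverted words would give zero results, here is a minimal Python sketch. It assumes the extractor returns each Arabic word in visual (left-to-right display) order, while the search term is typed in logical (reading) order. This is only an assumption about the cause, not PDFE's actual code, and the names `is_rtl`, `visual_to_logical` and `contains_word` are hypothetical. The Arabic letters are written as Unicode escapes, and the example word means "file".

```python
import unicodedata

def is_rtl(ch: str) -> bool:
    # Bidi class "R" covers Hebrew-style letters, "AL" covers Arabic letters.
    return unicodedata.bidirectional(ch) in ("R", "AL")

def visual_to_logical(word: str) -> str:
    """Reverse a word only if every character is a right-to-left letter."""
    if word and all(is_rtl(ch) for ch in word):
        return word[::-1]
    return word

def contains_word(extracted_text: str, query: str) -> bool:
    """Whole-word search after undoing the (assumed) per-word reversal."""
    words = [visual_to_logical(w) for w in extracted_text.split()]
    return query in words

# Search term in logical (typing) order: meem, lam, feh.
query = "\u0645\u0644\u0641"
# The same word as a visual-order extractor might return it: feh, lam, meem.
extracted = "TEST \u0641\u0644\u0645"

print(query in extracted)               # plain substring search on the raw text
print(contains_word(extracted, query))  # search after normalizing word order
```

The plain search misses the word because the letters are stored in the opposite order, while the normalized search finds it. This sketch only handles words made entirely of right-to-left letters; digits, diacritics and mixed-direction lines would need the full Unicode bidirectional algorithm.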

Note that PDFE's main management features use the metadata fields for search operations, not the text content.
Most of the time the metadata fields are empty or don't contain meaningful metadata, but with the PDFE QuickInfoEdit mode (F4) and the help of the QuickEdit assistants (QuickPaste, QuickImageOCR), filling in the metadata fields is an easy task.

Thank you very much for your help. I hope you can do further tests if I'm able to fix the text extraction issue.