RTTSoftware Support Forum

PDF Explorer => General => Topic started by: Anonymous on April 10, 2007, 02:17:00 PM

Title: Export filter in PDF Explorer to be used in Endnote
Post by: Anonymous on April 10, 2007, 02:17:00 PM: Dear RTT,
I have two questions concerning PDF Explorer and EndNote.

First Question

Is there an automatic procedure to fill fields such as Author, Year, Title, and Keywords from a PDF file?

Or is the only method to copy the text from the PDF file and then assign it to the field?

Second Question

In which format can we export PDF references from PDF Explorer in order to create a file that can be imported into EndNote?

I tried the TXT, CSV, and HTML export filters without success.

Does anyone know the exact procedure?

Thanks,

kk
Title: Re: Export filter in PDF Explorer to be used in Endnote
Post by: RTT on April 10, 2007, 09:03:00 PM: Quote from: "kk"
Is there an automatic procedure to fill fields such as Author, Year, Title, and Keywords from a PDF file?

If the PDF files from which you want to extract information use a common text layout, you can use the Search&Extract batch tool to apply a regular expression that gathers the needed information and puts it directly into the PDF metadata fields. Depending on the complexity of the text layout, this regular expression can be more or less difficult to develop. You need to know, or learn, regular expression syntax.
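
To illustrate the idea of one pattern per metadata field, here is a minimal sketch in Python. It is not PDF Explorer's Search&Extract tool, and Python's regular expression syntax is not identical to the syntax PDF Explorer supports. The sample text, its layout, the volume and page numbers, and the names `SAMPLE_TEXT`, `FIELD_PATTERNS`, and `extract_fields` are all assumptions made up for the example.

```python
import re

# Hypothetical plain text, standing in for what a text extractor might
# return for the first page of an article. The layout is an assumption.
SAMPLE_TEXT = """\
Marine Ecology Progress Series
Vol. 1: 10-20, 2007
An example article title
P. J. Hansen, N. Lundholm, B. Rost
KEY WORDS: keyword one, keyword two, keyword three
"""

# One pattern per metadata field; each captures its value in group 1.
FIELD_PATTERNS = {
    # Assumes the journal name is the very first line of the text.
    "Journal": re.compile(r"\A\s*(.+)$", re.MULTILINE),
    # Assumes a "Vol. N: first-last, YYYY" line carries the year.
    "Year": re.compile(r"^Vol\.\s*\d+:\s*\d+-\d+,\s*(\d{4})\s*$", re.MULTILINE),
    # Assumes keywords follow a "KEY WORDS:" label on a single line.
    "Keywords": re.compile(r"^KEY\s?WORDS:\s*(.+)$", re.MULTILINE),
}


def extract_fields(text):
    """Return a dict of the fields whose pattern matched the text."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        if match:
            fields[name] = match.group(1).strip()
    return fields
```

Calling `extract_fields(SAMPLE_TEXT)` returns a dictionary with the Journal, Year, and Keywords values. A pattern like these only works while every file follows the same layout, which is why the layout has to be common across the PDFs.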

Quote from: "kk"
Or is the only method to copy the text from the PDF file and then assign it to the field?

It is always the most effective process. The QuickInfoEdit mode and the QuickPaste assistant simplify the task.

Quote from: "kk"
In which format can we export PDF references from PDF Explorer in order to create a file that can be imported into EndNote?
I tried the TXT, CSV, and HTML export filters without success.

The currently available export filter formats are not suitable for import by EndNote. I'm sure there are many tricks to use them, but to make this job simpler, the next release, build 56, adds an EndNote tagged file format to the available export filters.
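
For readers unfamiliar with tagged reference files, the following Python sketch writes one record using the commonly used EndNote (Refer-style) tags `%0` (reference type), `%A` (author), `%T` (title), `%J` (journal), and `%D` (year). It is only an illustration of the general format, not the output of the PDF Explorer export filter, whose exact tag set is not described here. The function name `to_endnote_tagged`, the record layout, and the placeholder title are assumptions.

```python
def to_endnote_tagged(record):
    """Serialize one reference (a plain dict) as EndNote tagged text."""
    lines = ["%0 Journal Article"]  # reference type comes first
    # One %A line per author.
    for author in record.get("authors", []):
        lines.append(f"%A {author}")
    # Single-valued fields, written only when present.
    for tag, key in (("%T", "title"), ("%J", "journal"), ("%D", "year")):
        value = record.get(key)
        if value:
            lines.append(f"{tag} {value}")
    # A trailing newline ends the record; records are separated by a blank line.
    return "\n".join(lines) + "\n"


# Hypothetical record; the title is a placeholder, not the real article title.
EXAMPLE_RECORD = {
    "authors": ["Hansen, P. J.", "Lundholm, N.", "Rost, B."],
    "title": "An example article title",
    "journal": "Marine Ecology Progress Series",
    "year": "2007",
}
```

Writing `to_endnote_tagged(EXAMPLE_RECORD)` to a text file gives a file that can be tried with EndNote's tagged import option.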
Title: Re: Export filter in PDF Explorer to be used in Endnote
Post by: Anonymous on April 10, 2007, 10:46:00 PM: Quote from: RTT
If the PDF files from which you want to extract information use a common text layout, you can use the Search&Extract batch tool to apply a regular expression that gathers the needed information and puts it directly into the PDF metadata fields.

Can you post some examples of regular expressions here, or some screen captures, to help me better understand this functionality?

Thanks, and congratulations on this software.

kk
Title: Re: Export filter in PDF Explorer to be used in Endnote
Post by: RTT on April 11, 2007, 12:44:00 AM: Just click the button at the right of the Regular Expression edit box (just below the question mark help hint button) to show the included generic examples. You need to consult the web or a book to better understand regular expression syntax. You can also send me a sample of these PDF files, telling me what you want to extract, and I can try to create a sample expression.

At the link below you can get a fantastic tool to help with creating regular expressions:
http://www.regexbuddy.com/

Here you can consult the syntax supported by PDF Explorer:
http://www.regexpstudio.com/
Title: Re: Export filter in PDF Explorer to be used in Endnote
Post by: Anonymous on April 11, 2007, 10:33:00 AM: Thanks for your help

here
w*w.int-res.com/articles/meps_oa/m334p063.pdf

It's an open-access PDF, a good example.

I'm very interested in extracting the following fields:

Author (P. J. Hansen, N. Lundholm, B. Rost)
Year (2007)
Journal (Marine Ecology Progress Series)
Keywords (Some keywords extracted in the Abstract)

(Year as Custom 1 field)
(Journal as Custom 2 field)

in this kind of journal and also in others such as Marine Ecology, Ecology Letters, Science, Nature, Ecology, ...

If you construct some automatic procedures for one journal, can they be applied to other journals?
Some other open-access papers from interesting journals:
w*w.springerlink.com/content/67h6k6w373t15124/fulltext.pdf
w*w.blackwell-synergy.com.gate1.inist.fr/doi/pdf/10.1111/j.1461-0248.2007.01018.x

In the Author field, is it possible to extract the authors in this format (the same as in EndNote)?
Hansen, P. J.
Lundholm, N.
Rost, B.
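
As a rough illustration of that conversion, here is a small Python sketch. It assumes the authors arrive as one comma-separated line and that the last word of each name is the surname, which fails for multi-word surnames. The function names `to_last_first` and `reformat_authors` are made up for the example and are not part of PDF Explorer.

```python
def to_last_first(name):
    """Turn 'P. J. Hansen' into 'Hansen, P. J.'.

    Assumption: the last whitespace-separated word is the surname.
    """
    parts = name.split()
    if len(parts) < 2:
        # Nothing to reorder for a single-word name.
        return name.strip()
    initials, surname = parts[:-1], parts[-1]
    return f"{surname}, {' '.join(initials)}"


def reformat_authors(author_line):
    """Split a comma-separated author line and reorder each name."""
    return [to_last_first(a) for a in author_line.split(",") if a.strip()]
```

With the author line from the example article, `reformat_authors("P. J. Hansen, N. Lundholm, B. Rost")` gives the three names in the "Surname, Initials" form listed above.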

I think this problem interests many researchers, and if you solve it, PDF Explorer will quickly be adopted as a tool to organize whole bibliographies in PDF.
Title: Re: Export filter in PDF Explorer to be used in Endnote
Post by: Jaar on April 11, 2007, 12:56:00 PM: Quote
I think this problem interests many researchers, and if you solve it, PDF Explorer will quickly be adopted as a tool to organize whole bibliographies in PDF.

I think so.
Title: Re: Export filter in PDF Explorer to be used in Endnote
Post by: RTT on April 12, 2007, 01:07:00 PM: Not easy to achieve, and there is probably no way to make a standard tool/formula that works successfully with all PDFs, even those with the same layout.
First I need to enhance the quality of the text extraction routines, with text column detection, and second, develop a more versatile tool. The Search&Extract tool is limited to searching for and extracting patterns and, in these documents, the patterns are not so evident in the extracted plain text.
Title: Re: Export filter in PDF Explorer to be used in Endnote
Post by: malcolmdean on May 15, 2008, 09:42:00 AM: Quote from: kk
I'm very interested in extracting the following fields

Author (P. J. Hansen, N. Lundholm, B. Rost)
Year (2007)
Journal (Marine Ecology Progress Series)
Keywords (Some keywords extracted in the Abstract)

(Year as Custom 1 field)
(Journal as Custom 2 field)

If you construct some automatic procedures for one journal, can they be applied to other journals? I think this problem interests many researchers, and if you solve it, PDF Explorer will quickly be adopted as a tool to organize whole bibliographies in PDF.

When you visit the Web page of the journal article, download the citation data in a popular format such as RIS. Keep the RIS file together with the PDF; then you can import the data into a citation manager or an academic word processor such as Nota Bene.
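
For anyone who has not looked inside an RIS file: it is plain text with one tagged line per value (for example `TY` for the reference type, `AU` for an author, `PY` for the year), and each record ends with an `ER` line. The Python sketch below reads such a file into dictionaries. The sample record and the names `SAMPLE_RIS` and `parse_ris` are made up for the example, and real files from publishers may use more tags and multi-line values that this sketch ignores.

```python
import re

# A tag is two characters, then two spaces and a hyphen, then the value.
RIS_LINE = re.compile(r"^([A-Z][A-Z0-9])  -\s?(.*)$")

# Hypothetical record built from the article details mentioned in this thread.
SAMPLE_RIS = """\
TY  - JOUR
AU  - Hansen, P. J.
AU  - Lundholm, N.
AU  - Rost, B.
PY  - 2007
JO  - Marine Ecology Progress Series
ER  -
"""


def parse_ris(text):
    """Return a list of records; each record maps a tag to a list of values."""
    records = []
    current = {}
    for raw in text.splitlines():
        match = RIS_LINE.match(raw.rstrip())
        if not match:
            continue  # skip blank lines and anything that is not a tag line
        tag, value = match.groups()
        if tag == "ER":
            # End of record: store it and start a new one.
            records.append(current)
            current = {}
        else:
            current.setdefault(tag, []).append(value)
    if current:
        # Keep a final record even if its ER line is missing.
        records.append(current)
    return records
```

Each value is kept in a list because tags such as `AU` repeat once per author.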

I think this will handle your extraction problem, but it does create another file management problem for the user. It would be great if PDFE could acquire a lower level of functionality to help manage the underlying directory structure and keep PDFs together with their citation data files.

Malcolm
Los Angeles
Title: Re: Export filter in PDF Explorer to be used in Endnote
Post by: RTT on May 15, 2008, 11:02:00 PM: Quote from: malcolmdean
It would be great if PDFE could acquire a lower level of functionality to help manage the underlying directory structure and keep PDFs together with their citation data files.

There is also the possibility of attaching the RIS file to the PDF file itself, using the "Tools>Attachments" tool. This way, the files can go anywhere and always stay linked.

How about an RIS import tool, to import the RIS tagged content into some of the 100 custom metadata fields available in PDFE? Later the user can export the full grid data to an EndNote tagged format file or, if some more bibliographic management tools are developed, use all this data from within PDFE itself.
Title: Re: Export filter in PDF Explorer to be used in Endnote
Post by: Anonymous on May 15, 2008, 11:17:00 PM: Quote from: RTT
Quote from: malcolmdean
It would be great if PDFE could acquire a lower level of functionality to help manage the underlying directory structure and keep PDFs together with their citation data files.

There is also the possibility of attaching the RIS file to the PDF file itself, using the "Tools>Attachments" tool. This way, the files can go anywhere and always stay linked.

How about an RIS import tool, to import the RIS tagged content into some of the 100 custom metadata fields available in PDFE? Later the user can export the full grid data to an EndNote tagged format file or, if some more bibliographic management tools are developed, use all this data from within PDFE itself.

That sounds good -- anything handling citations is good for the academic world market. UC spends nearly half a billion annually on PDF rights, and there are few tools which begin to address these problems.

All the citation data formats are plain text, but they are also very tricky. Sometimes an extra space makes the difference between a successful and a failed import/export. So much energy has been put into the problem of citation formatting that the programs do not "play well with others."
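
To show the kind of spacing problem meant here, the Python sketch below checks RIS tag lines, which are expected to be a two-character tag, two spaces, a hyphen, and one space before the value. It flags lines that look like tag lines but deviate from that spacing. The names `STRICT_TAG_LINE`, `LOOSE_TAG_LINE`, and `find_spacing_problems` are made up for the example, the check is deliberately simple, and it may flag ordinary text that happens to look like a tag.

```python
import re

# Strict form: tag, two spaces, hyphen, then either nothing or exactly one
# space followed by a value that does not itself start with a space.
STRICT_TAG_LINE = re.compile(r"^[A-Z][A-Z0-9]  -( (\S.*)?)?$")

# Loose form: anything that looks like "TAG -" with arbitrary spacing.
LOOSE_TAG_LINE = re.compile(r"^\s*[A-Z][A-Z0-9]\s*-")


def find_spacing_problems(text):
    """Return (line_number, line) pairs for tag lines with unusual spacing."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if LOOSE_TAG_LINE.match(line) and not STRICT_TAG_LINE.match(line):
            problems.append((number, line))
    return problems
```

For example, a line written as `AU - Hansen, P. J.` (one space before the hyphen) or `PY  -  2007` (two spaces after it) would be reported, while `TY  - JOUR` would pass.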

Two programs which would mutually benefit from some relationship with PDFE are the Firefox Zotero plug-in, and the powerful academic word processor Nota Bene. Both programs, at this stage, have limitations in the number of files they can handle. (Zotero's limit is in the Mozilla code, and NB's code is so old that some of it is 16-bit.)