RTTSoftware Support Forum
PDF Explorer => General => Topic started by: Anonymous on April 10, 2007, 02:17:00 PM
-
Dear RTT,
I have two questions concerning PDF Explorer and EndNote.
First question
Is there an automatic procedure to fill fields such as Author, Year, Title, and Keywords from a PDF file?
Or is the only method to copy the text from the PDF file and then assign it to the field?
Second question
In which format can we export PDF references from PDF Explorer in order to create a file that can be imported into EndNote?
I tried the TXT, CSV, and HTML export filters without success.
Does anyone know the exact procedure?
Thanks,
kk
-
Is there an automatic procedure to fill fields such as Author, Year, Title, and Keywords from a PDF file?
If the PDF files from which you want to extract information share a common text layout, you can use the Search&Extract batch tool to apply a regular expression that gathers the needed information and puts it directly into the PDF metadata fields. Depending on the complexity of the text layout, this regular expression can be more or less difficult to develop. You need to know, or learn, regular expression syntax.
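To give a rough idea of what such an expression does, here is a minimal sketch in Python. It is only an illustration: the sample text layout is made up, the function name `extract_fields` is hypothetical, and Python's regular expression flavor is not necessarily the one used by Search&Extract, so the patterns may need adapting.

```python
import re

# Made-up first-page text; the text extracted from a real PDF will differ.
sample_text = (
    "MARINE ECOLOGY PROGRESS SERIES\n"
    "Published 2007\n"
    "P. J. Hansen, N. Lundholm, B. Rost\n"
)

# A four-digit year starting with 19 or 20.
YEAR_RE = re.compile(r"\b(?:19|20)\d{2}\b")

# A whole line made of "Initials Surname" entries separated by commas.
AUTHORS_RE = re.compile(
    r"^((?:[A-Z]\. )+[A-Z][a-z]+(?:, (?:[A-Z]\. )+[A-Z][a-z]+)*)$",
    re.MULTILINE,
)


def extract_fields(text):
    """Return the first year and the first author line found, or None."""
    year = YEAR_RE.search(text)
    authors = AUTHORS_RE.search(text)
    return {
        "Year": year.group(0) if year else None,
        "Author": authors.group(1) if authors else None,
    }


print(extract_fields(sample_text))
```

The patterns only work when every file follows the same layout, which is why the difficulty depends so much on the documents.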
Or is the only method to copy the text from the PDF file and then assign it to the field?
It is always the most effective process. Using the QuickInfoEdit mode and the QuickPaste assistant simplifies the task.
In which format can we export PDF references from PDF Explorer in order to create a file that can be imported into EndNote?
I tried the TXT, CSV, and HTML export filters without success.
The currently available export filter formats are not suitable for import into EndNote. I'm sure there are many tricks to make them work, but to simplify this job the next release, build 56, adds an EndNote tagged file format to the available export filters.
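For reference, an EndNote tagged file is plain text with one `%`-prefixed tag per line and a blank line between records. The sketch below, in Python, writes one record using the commonly documented tags (`%0` reference type, `%A` author, `%D` year, `%J` journal, `%K` keywords). The function name `to_endnote_tagged` and the record layout are assumptions for illustration, not the output of the build 56 filter.

```python
def to_endnote_tagged(record):
    """Format one reference as an EndNote tagged record (illustrative only)."""
    lines = ["%0 Journal Article"]
    # One %A line per author, in "Surname, Initials" form.
    for author in record.get("authors", []):
        lines.append(f"%A {author}")
    if record.get("year"):
        lines.append(f"%D {record['year']}")
    if record.get("journal"):
        lines.append(f"%J {record['journal']}")
    # One %K line per keyword.
    for keyword in record.get("keywords", []):
        lines.append(f"%K {keyword}")
    # A blank line terminates the record.
    return "\n".join(lines) + "\n\n"


record = {
    "authors": ["Hansen, P. J.", "Lundholm, N.", "Rost, B."],
    "year": "2007",
    "journal": "Marine Ecology Progress Series",
}
print(to_endnote_tagged(record), end="")
```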
-
If the PDF files from which you want to extract information share a common text layout, you can use the Search&Extract batch tool to apply a regular expression that gathers the needed information and puts it directly into the PDF metadata fields.
Can you post some examples of regular expressions here, or some screen captures, to help me better understand this functionality?
Thanks, and congratulations on this software.
kk
-
Just click the button at the right of the Regular Expression edit box (just below the question mark help hint button) to show the included generic examples. You will need to consult the web or a book to better understand regular expression syntax. You can also send me a sample of these PDF files, telling me what you want to extract, and I can try to create a sample expression.
At the link below you can get a fantastic tool to help with creating regular expressions:
http://www.regexbuddy.com/
Here you can consult the syntax supported by PDF Explorer:
http://www.regexpstudio.com/
-
Thanks for your help.
Here is a sample:
w*w.int-res.com/articles/meps_oa/m334p063.pdf
It's an open access PDF, so it makes a good example.
I'm very interested in extracting the following fields:
Author (P. J. Hansen, N. Lundholm, B. Rost)
Year (2007)
Journal (Marine Ecology Progress Series)
Keywords (Some keywords extracted in the Abstract)
(Year as Custom 1 field)
(Journal as Custom 2 field)
in this kind of journal and also in others such as Marine Ecology, Ecology Letters, Science, Nature, Ecology, ...
If you construct some automatic procedures for one journal, can they be applied to other journals?
Other open access papers in some interesting journals:
w*w.springerlink.com/content/67h6k6w373t15124/fulltext.pdf
w*w.blackwell-synergy.com.gate1.inist.fr/doi/pdf/10.1111/j.1461-0248.2007.01018.x
In the Author field, is it possible to extract authors in this format (the same as in EndNote)?
Hansen, P. J.
Lundholm, N.
Rost, B.
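The reordering itself is simple once the author names have been extracted. A minimal sketch in Python, assuming every name is written as initials followed by a single-word surname (the helper names `to_surname_first` and `split_authors` are hypothetical, and multi-word surnames would need extra handling):

```python
def to_surname_first(name):
    """Turn 'P. J. Hansen' into 'Hansen, P. J.' (single-word surnames only)."""
    parts = name.split()
    if len(parts) < 2:
        return name.strip()
    initials, surname = parts[:-1], parts[-1]
    return f"{surname}, {' '.join(initials)}"


def split_authors(author_field):
    """Split a comma-separated author list and reorder each name."""
    names = [n.strip() for n in author_field.split(",")]
    return [to_surname_first(n) for n in names if n]


for author in split_authors("P. J. Hansen, N. Lundholm, B. Rost"):
    print(author)
```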
I think these problems interest many researchers, and if you resolve them, PDF Explorer will quickly be adopted as a tool for organizing an entire bibliography in PDF.
-
I think these problems interest many researchers, and if you resolve them, PDF Explorer will quickly be adopted as a tool for organizing an entire bibliography in PDF.
I think so.
-
This is not easy to achieve, and there is probably no way to make a standard tool or formula that works successfully with all PDFs, even those that share the same layout.
First I need to enhance the quality of the text extraction routines, adding text column detection, and second, develop a more versatile tool. Search&Extract is limited to searching for and extracting patterns and, in these documents, the patterns are not so evident in the extracted plain text.
-
I'm very interested in extracting the following fields
Author (P. J. Hansen, N. Lundholm, B. Rost)
Year (2007)
Journal (Marine Ecology Progress Series)
Keywords (Some keywords extracted in the Abstract)
(Year as Custom 1 field)
(Journal as Custom 2 field)
If you construct some automatic procedures for one journal, can they be applied to other journals? I think these problems interest many researchers, and if you resolve them, PDF Explorer will quickly be adopted as a tool for organizing an entire bibliography in PDF.
When you visit the Web page of the journal article, download the citation data in a popular format such as RIS. Keep the RIS file together with the PDF; you can then import the data into a citation manager or an academic word processor such as Nota Bene.
I think this will handle your extraction problem, but it does create another file management problem for the user. It would be great if PDFE could acquire a lower level of functionality to help manage the underlying directory structure and keep PDFs together with their citation data files.
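For anyone who has not looked inside one, an RIS file is just tagged plain text: a two-character tag, two spaces, a hyphen, a space, and then the value, with `ER` closing each record. A minimal sketch of reading one in Python (the function name `parse_ris` is hypothetical, the sample record is built from the article details mentioned in this thread, and real files from publishers carry many more tags):

```python
import re

# Two-character tag, two spaces, hyphen, optional space, then the value.
RIS_LINE = re.compile(r"^([A-Z][A-Z0-9])  - ?(.*)$")


def parse_ris(text):
    """Return a list of records, each mapping a tag to its list of values."""
    records, current = [], {}
    for line in text.splitlines():
        match = RIS_LINE.match(line)
        if not match:
            continue  # skip blank or malformed lines
        tag, value = match.group(1), match.group(2).strip()
        if tag == "ER":  # end of record
            records.append(current)
            current = {}
        else:
            current.setdefault(tag, []).append(value)
    return records


sample = "\n".join([
    "TY  - JOUR",
    "AU  - Hansen, P. J.",
    "AU  - Lundholm, N.",
    "AU  - Rost, B.",
    "PY  - 2007",
    "JO  - Marine Ecology Progress Series",
    "ER  - ",
])
print(parse_ris(sample))
```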
Malcolm
Los Angeles
-
It would be great if PDFE could acquire a lower level of functionality to help manage the underlying directory structure and keep PDFs together with their citation data files.
It is also possible to attach the RIS file to the PDF file itself, using the "Tools>Attachments" tool. This way, the files can go anywhere and always keep the link.
How about an RIS import tool, to import the RIS tagged content into some of the 100 custom metadata fields available in PDFE? Later the user can export the full grid data to an EndNote tagged format file or, if more bibliographic management tools are developed, use all this data from within PDFE itself.
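As a sketch of what such an import could amount to, the Python below maps the tags of one already-parsed RIS record onto metadata field names. Everything here is an assumption for illustration: the mapping (Year to Custom 1 and Journal to Custom 2, as requested earlier in this thread), the `"; "` separator, and the function name `ris_to_fields`. It does not describe an existing PDFE feature.

```python
# Hypothetical mapping from RIS tags to metadata field names.
FIELD_MAP = {
    "AU": "Author",
    "KW": "Keywords",
    "PY": "Custom 1",  # year
    "JO": "Custom 2",  # journal
}


def ris_to_fields(record):
    """Collapse each mapped RIS tag into a single metadata field value."""
    fields = {}
    for tag, values in record.items():
        field_name = FIELD_MAP.get(tag)
        if field_name:  # unmapped tags are ignored
            fields[field_name] = "; ".join(values)
    return fields


ris_record = {
    "TY": ["JOUR"],
    "AU": ["Hansen, P. J.", "Lundholm, N.", "Rost, B."],
    "PY": ["2007"],
    "JO": ["Marine Ecology Progress Series"],
}
print(ris_to_fields(ris_record))
```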
-
That sounds good -- anything handling citations is good for the academic world market. UC spends nearly half a billion annually on PDF rights, and there are few tools which begin to address these problems.
All the citation data formats are plain text, but they are also very tricky. Sometimes a single extra space makes the difference between a successful and a failed import or export. So much energy has been put into the problem of citation formatting that the programs do not "play well with others."
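The extra-space problem is easy to show with RIS, where the usual convention is a two-character tag, exactly two spaces, a hyphen, and one space. A minimal sketch in Python of a strict check and a repair step (the function names are hypothetical, and whether a particular importer is this strict is an assumption):

```python
import re

# Strict form: tag, exactly two spaces, hyphen, one space.
STRICT_RIS = re.compile(r"^[A-Z][A-Z0-9]  - ")

# Lenient form: tag, any spacing around the hyphen, then the value.
LOOSE_RIS = re.compile(r"^([A-Z][A-Z0-9])\s*-\s*(.*)$")


def is_strict_ris_line(line):
    """True only when the spacing matches the strict convention."""
    return bool(STRICT_RIS.match(line))


def normalize_ris_line(line):
    """Rewrite a loosely spaced line in strict form; leave other lines alone."""
    match = LOOSE_RIS.match(line)
    if not match:
        return line
    return f"{match.group(1)}  - {match.group(2)}"


for line in ["AU  - Hansen, P. J.", "AU - Hansen, P. J.", "AU   - Hansen, P. J."]:
    print(repr(line), is_strict_ris_line(line), repr(normalize_ris_line(line)))
```

Only the first of the three lines passes the strict check, although all three look nearly identical on screen.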
Two programs which would mutually benefit from some relationship with PDFE are the Firefox Zotero plug-in, and the powerful academic word processor Nota Bene. Both programs, at this stage, have limitations in the number of files they can handle. (Zotero's limit is in the Mozilla code, and NB's code is so old that some of it is 16-bit.)