RTTSoftware Support Forum

PDF-ShellTools => General => Topic started by: lkaiser on February 10, 2016, 09:02:48 PM

Title: extract text
Post by: lkaiser on February 10, 2016, 09:02:48 PM
Hi RTT,

Is there an API method to get the text from a PDF file?
I am looking for a way to extract address information (and other localized information, if possible) from files containing invoices, so I can sort and group them by destination (country, region, city...).
Thank you in advance

Lionel
Title: Re: extract text
Post by: RTT on February 10, 2016, 11:44:58 PM
Take a look at the Text and TextEx properties of the IPDFPage object (http://www.rttsoftware.com/Manuals/STIndex.htm?pageURL=ST/English/MyScriptsAPI.htm#IPDFPage).

There is a "View text" My Script (http://www.rttsoftware.com/Manuals/STIndex.htm?pageURL=ST/English/MyScripts.htm), under the samples tab, that demonstrates the TextEx property.
Title: Re: extract text
Post by: lkaiser on February 14, 2016, 12:25:53 AM
I managed to extract only the desired text, thanks to the code in the demo.
Perhaps a future version could include a method to extract text or items from a specified rectangular area of the page.
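
For anyone wanting the same thing, the rectangle filtering can be done on the script side once the text items and their positions are available. Below is a rough sketch of the logic only, written in Python for illustration; it is not PDF-ShellTools script code. The TextItem shape (text plus x, y, width, height, with y growing downward) and the function names are assumptions made for this example, not the documented structure of TextEx.

```python
from dataclasses import dataclass


@dataclass
class TextItem:
    # Hypothetical shape of one extracted text fragment; the real
    # TextEx items may expose different fields and units.
    text: str
    x: float       # left edge
    y: float       # top edge (assumed to grow downward)
    width: float
    height: float


def items_in_rect(items, left, top, right, bottom):
    """Return the items whose bounding box lies fully inside the rectangle."""
    return [
        it for it in items
        if it.x >= left
        and it.y >= top
        and it.x + it.width <= right
        and it.y + it.height <= bottom
    ]


def text_in_rect(items, left, top, right, bottom):
    """Join the text of the selected items in reading order."""
    selected = items_in_rect(items, left, top, right, bottom)
    # Top-to-bottom first, then left-to-right within the same line.
    selected.sort(key=lambda it: (it.y, it.x))
    return " ".join(it.text for it in selected)


# Made-up sample data standing in for one invoice page.
page_items = [
    TextItem("ACME", 10, 10, 40, 10),
    TextItem("France", 350, 100, 50, 10),
    TextItem("Paris", 300, 100, 40, 10),
    TextItem("Total", 300, 500, 40, 10),
]

address = text_in_rect(page_items, 280, 90, 420, 120)
```

Items that only partly overlap the rectangle are dropped here; loosening the test to an intersection check would be the other obvious choice, depending on how tightly the address block is laid out.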

Thank you, RTT