Recent Posts

31
Bug reports / Re: Persistent error in PDFE
« Last post by RTT on October 15, 2024, 02:09:06 AM »
Instead of using the Acrobat plugin as the PDFE PDF reader, try the Sumatra PDF one, through the NPAPI interface.
Sumatra PDF stopped shipping the NPAPI plugin with the 3.0 release, so you need to install the old 2.5.2 version (https://www.sumatrapdfreader.org/download-prev) to get the npPDFViewer.dll file. Make sure you select the "install browser plugin..." option in the SumatraPDF installer. You can then install the latest Sumatra PDF over that one. The DLL will remain and will still work with the latest versions, even the 64-bit ones.
After you configure PDFE to use it (check attached screenshot), you will have a much more stable reader, without these errors.
32
Bug reports / Persistent error in PDFE
« Last post by puckman on October 14, 2024, 08:52:17 PM »
Hello,
I've owned a licensed copy of PDFE for over 10 years. I often get a severe exception error that shuts down PDFE multiple times per boot session. I resolve it by rebooting my machine.

I tried to resolve it on my own through googling but to no avail.

I'll attach an image of the error notice I captured.  Hope you can help me out with this one.

I also receive a lot of DDE errors when viewing the file after filtering and correcting. However, I didn't capture one of those. Just wondering if they're related.

Cheers
33
General / Re: date format
« Last post by puckman on July 30, 2024, 03:22:18 PM »
Thanks RTT!
That's exactly the explanation I've been looking for. I've added the link to the Delphi Basics page on time and date formatting, https://delphibasics.co.uk/RTL.php?Name=FormatDateTime, in case someone else is looking for the same answer with specific date and time formatting.
34
General / Re: date format
« Last post by RTT on June 02, 2024, 04:34:43 PM »
You need to create a custom grid layout, and add to it a dynamic calculated column with a formula such as:
formatdate('YYY/MM/DD',date(CD))
This will create a grid column where the PDF creation date is shown with the format defined in that formula. Because there is no format reference to the time, only the date part will be visible.
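For readers who want to see the same effect outside PDFE, here is a minimal Python sketch of the idea: a format pattern with no time fields shows only the date part. This is only an analogy. `strftime`, the `date_only` helper, and the sample timestamp are illustrative assumptions, not part of PDFE's formula language.

```python
from datetime import datetime

def date_only(created: datetime) -> str:
    # The pattern has no hour/minute/second fields, so the time is dropped.
    return created.strftime("%Y/%m/%d")

# Hypothetical creation timestamp, used only for illustration.
sample = datetime(2024, 6, 2, 16, 34, 43)
print(date_only(sample))  # 2024/06/02
```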
35
General / date format
« Last post by Roberto Marson on June 02, 2024, 09:57:16 AM »
I wanted to display only the date, without the time, in the creation date column. Is it possible?
36
General / New Version
« Last post by dohnjoe on May 11, 2024, 05:12:49 PM »
Will there ever be a new version?
If yes, can you share the list of new features?
37
Bug reports / Re: Extract Images (no append option?)
« Last post by RTT on April 18, 2024, 01:56:20 AM »
These options only affect the filenames. The image is always extracted, even if the skip name option is selected. If a file with the same name exists, the newly extracted image gets the next available name. Inside the PDF, the image objects have no names. The tool generates the names by appending a sequential number to the specified name prefix.
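The "next available name" behaviour described above can be sketched roughly as follows. This is not PDFE's actual code: the `next_available_name` helper, the `.jpg` extension, the starting number, and the lack of zero padding are all assumptions made for illustration.

```python
def next_available_name(existing, prefix, ext=".jpg", start=1):
    """Return the first prefix+number+ext name not already in `existing`."""
    n = start
    # Step through sequential numbers until an unused name is found.
    while f"{prefix}{n}{ext}" in existing:
        n += 1
    return f"{prefix}{n}{ext}"

# Hypothetical folder contents, used only for illustration.
existing = {"img1.jpg", "img2.jpg"}
print(next_available_name(existing, "img"))  # img3.jpg
```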
38
Bug reports / Extract Images (no append option?)
« Last post by nightslayer23 on April 17, 2024, 05:43:45 AM »
Hey, when using the extract images function, the only options to handle duplicate filenames or images are to:

  • Replace It
  • Skip Name

However, shouldn't there be an option to save the duplicates, perhaps by appending to the filename?
39
Ideas/Suggestions / Re: optimize PDFs for machine learning / AI model training
« Last post by RTT on February 26, 2024, 02:48:50 AM »
There are functionalities to extract text, with the possibility to get font information (name, size,...), but not to edit it.

Note that it's not easy to segment a PDF in order to isolate the parts you want to remove. Internally, in the worst-case scenarios, you may have a "goto xy" and a "print" command for each of the characters, without any specific order. There is no indication of what is a word, a paragraph, etc. You need functionality like that used in OCR tools, which can provide that type of feature extraction in a useful format like hOCR.
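As a rough illustration only: assuming some extraction step has already produced text spans with a font size and page coordinates, filtering by size and position could be sketched like this. The `Span` type, its fields, the `keep_body_text` helper, and the thresholds are all hypothetical, and the sketch does nothing to solve the segmentation problem described above.

```python
from dataclasses import dataclass

@dataclass
class Span:
    text: str
    size: float  # font size in points
    x: float     # left edge, in points from the left of the page
    y: float     # baseline, in points from the bottom of the page

def keep_body_text(spans, page_w, page_h, min_size=9.0, margin=50.0):
    """Keep spans larger than min_size that lie inside the page margins."""
    kept = []
    for s in spans:
        if s.size <= min_size:
            continue  # small text, likely footnotes or index entries
        inside_x = margin <= s.x <= page_w - margin
        inside_y = margin <= s.y <= page_h - margin
        if inside_x and inside_y:
            kept.append(s)
    return kept

# Hypothetical spans on a US Letter page (612 x 792 points).
spans = [
    Span("Body paragraph", 11.0, 72.0, 600.0),
    Span("1. A footnote", 8.0, 72.0, 80.0),
    Span("Page 12", 11.0, 300.0, 20.0),
]
body = keep_body_text(spans, page_w=612.0, page_h=792.0)
print([s.text for s in body])  # ['Body paragraph']
```

A fixed size threshold and fixed margins are the simplest possible rule; real documents with unordered, per-character positioning would need the spans grouped into lines and blocks first.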
40
My company trains and grounds Large Language Models (LLMs) with PDF files. The problem is that the valuable part of a PDF is the body text, while the table of contents, footnotes, index, and headers/footers create problems (especially with semantic search).

Do any of your utilities allow for batch processing of files that will:
- delete all text at or below a given point size (e.g., deleting text of 9 points or smaller would remove footnotes and the index)
- remove the table of contents
- remove all text in the margins

There is a lot of demand for a user-friendly tool that preps PDFs for machine learning.
