Topic: Changing OCR'd text PDF View Tab in Text Mode


puckman

  • Newbie
  • Posts: 2
Changing OCR'd text PDF View Tab in Text Mode
« on: July 29, 2019, 08:36:45 AM »
Hi everyone,
My first post here.
After comparing OCR software from Acrobat, ABBYY, Epson and OmniPage for use with receipts, I am still looking for a solution. Neat does a good job, but I prefer not to put my financial information in the cloud or to subscribe to software.
My research on this project has shown that receipts present extraordinary OCR challenges because of their small print, poor-quality paper and ink, and myriad formats. Still, I haven't given up.
I found promising results from TOPOCR. However, the trial demo disables saving results as well as copying and pasting. Its main advantage is price: compared with similar software, it's affordable. It does lack some standard features, such as batch processing.
After comparing the text that Acrobat and TOPOCR produced from OCR'd TIFFs, I wanted to see whether I could change the underlying text in the PDF using the PDF View Tab in PDFE. I switched to text mode and manually altered a few words. Unfortunately, I haven't been able to make the edits persist.
I have concluded that perhaps this task is not possible with any software.
Am I correct?

RTT

  • Administrator
  • Posts: 838
Re: Changing OCR'd text PDF View Tab in Text Mode
« Reply #1 on: July 29, 2019, 09:09:52 PM »
Quote from: puckman
After comparing the text that Acrobat and TOPOCR produced from OCR'd TIFFs, I wanted to see whether I could change the underlying text in the PDF using the PDF View Tab in PDFE. I switched to text mode and manually altered a few words. Unfortunately, I haven't been able to make the edits persist.
I have concluded that perhaps this task is not possible with any software.
Am I correct?
Yes. The text mode editing feature exists mainly to make it easy to edit text before copying it, or to make minor changes (punctuation, etc.) for better text-to-speech results. There is no easy way to map changes made in a text-only view, which lacks accurate position and font style information, back onto the already formatted PDF.
Try MS Word. It can convert the PDF to an editable document, which you can then save again as a PDF.
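RTT's point about missing position information can be made concrete. The following is a minimal Python sketch, not how PDFE or any other PDF editor actually works. It assumes the OCR text layer has already been extracted as a list of words, each with a bounding box, and the `Word` type and `apply_text_edits` function are hypothetical names for illustration. It aligns an edited plain-text string against the original words using the standard library's `difflib.SequenceMatcher`. One-for-one word substitutions can reuse the old boxes, but an inserted, deleted, split or merged word has no known position, so the sketch gives up in that case.

```python
from __future__ import annotations

from dataclasses import dataclass, replace as dc_replace
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Word:
    """One word of a hypothetical OCR text layer: its text plus its page position."""
    text: str
    bbox: tuple[float, float, float, float]  # (x0, y0, x1, y1) in page units


def apply_text_edits(words: list[Word], edited: str) -> list[Word] | None:
    """Map an edited plain-text string back onto positioned OCR words.

    Returns a corrected word list when every change is a one-word-for-one-word
    substitution (the original boxes can be reused), or None when words were
    inserted, deleted, split or merged (no position is known for the new text).
    """
    old_tokens = [w.text for w in words]
    new_tokens = edited.split()
    matcher = SequenceMatcher(a=old_tokens, b=new_tokens, autojunk=False)
    result: list[Word] = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Unchanged words keep everything, including their boxes.
            result.extend(words[i1:i2])
        elif tag == "replace" and (i2 - i1) == (j2 - j1):
            # Same number of words on both sides: keep each box, swap the text.
            for w, new_text in zip(words[i1:i2], new_tokens[j1:j2]):
                result.append(dc_replace(w, text=new_text))
        else:
            # Insert, delete, or uneven replace: nothing reliable to anchor to.
            return None
    return result
```

For example, correcting `T0TAL $12.50` to `TOTAL $12.50` keeps both original boxes, while `TOTAL DUE $12.50` returns `None` because nothing says where `DUE` should go. Font style is ignored entirely here, which is the other half of the problem RTT describes.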

puckman

  • Newbie
  • Posts: 2
Re: Changing OCR'd text PDF View Tab in Text Mode
« Reply #2 on: July 31, 2019, 05:54:51 AM »
Quote from: RTT
Try MS Word. It can convert the PDF to an editable document, which you can then save again as a PDF.
Thanks for your prompt reply. Full disclosure: creating PDFs is a complex topic. I'm replying here from an i
I had thought of a similar solution. Before your suggestion I tried it with different software, and after reading your suggestion I tried MS Office 365 Plus. Both produced the same result.
Here's my observation:
In both cases, when I opened the re-saved PDF, the only part of the PDF DOM remaining was the text. The original scanned document was obscured or missing altogether.

This is a departure from the Adobe® document model. I know this is beyond the control of the PDFE software. The underlying text of the OCR'd image can be manipulated with JavaScript and then saved with the original document. How far, and how faithfully, the text can be manipulated depends on the script's sophistication and elegance.

I believe the PDF, when viewed, should retain both the original scanned document and the underlying (hidden) text, while also preserving as much of the text's perceptual accuracy as possible.
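The structure described here, a scanned image with a hidden but selectable text layer on top, is the usual layout of an OCR'd ("searchable") PDF. The PDF specification (ISO 32000) provides text rendering mode 3, set with the `Tr` operator, which neither fills nor strokes glyphs, so the text is invisible but can still be searched and copied. The Python sketch below only assembles a page content stream as a string, under several assumptions: the resource names `/Im0` and `/F1` are hypothetical and would have to be defined in the page's resource dictionary, word boxes are already in PDF user space (origin at bottom-left), text is plain ASCII, and the box height is used as a rough font size with no width fitting. It does not read or write a PDF file.

```python
from __future__ import annotations


def _pdf_literal(text: str) -> str:
    """Escape a string for use as a PDF literal string: (...)."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return f"({escaped})"


def searchable_page_stream(
    page_w: float,
    page_h: float,
    words: list[tuple[str, float, float, float, float]],
    image: str = "/Im0",  # hypothetical XObject name for the scanned image
    font: str = "/F1",    # hypothetical font resource name
) -> str:
    """Build a page content stream: paint the scan, then add invisible OCR text.

    Each word is (text, x0, y0, x1, y1) in PDF user space (origin bottom-left).
    """
    lines = [
        "q",
        f"{page_w:g} 0 0 {page_h:g} 0 0 cm",  # scale the unit-square image to the page
        f"{image} Do",                         # paint the scanned image
        "Q",
        "BT",
        "3 Tr",  # text rendering mode 3: neither fill nor stroke (invisible)
    ]
    for text, x0, y0, x1, y1 in words:
        size = max(y1 - y0, 1)  # crude assumption: box height as font size
        lines.append(f"{font} {size:g} Tf")
        lines.append(f"1 0 0 1 {x0:g} {y0:g} Tm")  # place the word at its box origin
        lines.append(f"{_pdf_literal(text)} Tj")
    lines.append("ET")
    return "\n".join(lines)
```

Regenerating only the text object (from `BT` to `ET`) from corrected words, while leaving the image-painting part untouched, is one way a tool could keep the scan and replace the hidden text.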