
OCR with 'editable text and images' degrades the image quality of text-covered images.

New Here, Sep 11, 2023

Hi Dov and others,

I think what is happening is that folks are modifying some portion of the PDF and then saving it.

When I edit the text of my high-res PDF and then use Save or Save As, the file size drops from 28 MB to 1.8 MB. I cannot find a way to prevent this. Most of the PDF shows no change in resolution, but some areas do become pixelated.
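To put those two numbers in perspective, here is a minimal sketch of the arithmetic. It uses only the file sizes quoted above; the variable names are illustrative. A drop of this size suggests that image data was recompressed or resampled on save, since editing text alone would not remove most of the file.

```python
# File sizes reported in the post above, in megabytes.
original_mb = 28.0
saved_mb = 1.8

# Fraction of the original file size that disappeared on save.
reduction = (original_mb - saved_mb) / original_mb

# How many times smaller the saved file is.
ratio = original_mb / saved_mb

print(f"Size reduction: {reduction:.1%}")             # Size reduction: 93.6%
print(f"Saved file is about {ratio:.1f}x smaller")    # Saved file is about 15.6x smaller
```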

 

[This post is detached from a different one.]

TOPICS
Edit and convert PDFs
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
New Here, Sep 11, 2023

Okay, I believe what's happening in my case is that when I edit the PDF, Acrobat converts it to a file with editable text, etc. The text is embedded in the image; I had thought it was layered and separate from the background. The conversion process does a sort of content-aware fill behind and around the text, at a lower resolution than the rest of the background.

Community Expert, Sep 11, 2023

Acrobat does not have any kind of Content-Aware Fill for PDFs.

As a side note: Dov, unfortunately, retired from Adobe a few years ago, and his expertise is no longer available.

ABAMBO | Hard- and Software Engineer | Photographer
New Here, Sep 11, 2023

Interesting. Do you know what is happening here? When I edit the PDF, it converts the scanned page to editable text and an image. After it converts and pulls out the text, I can edit the text over the image. But it makes the area behind and around the text lower resolution, and it appears to fill in what should be behind the text.

Because it's a flat scanned image, how does it fill in behind the text?

After I edit the text and save the file, the file size drops by 90% and the areas around and behind the text are blurry and low resolution. Is there any way to avoid this?

New Here, Sep 11, 2023

Here are screenshots of before and after:

[Images: 3.PNG, 2.PNG]

Community Expert, Sep 11, 2023

Yes, the second image has a very high compression rate. You can clearly see the compression artefacts. That is what makes your file size shrink and the quality diminish.

Without seeing the original file, it is difficult to tell what is going on.

ABAMBO | Hard- and Software Engineer | Photographer
Community Expert, Sep 11, 2023

I need to check, but from memory, the OCR has parameters that can be set. I suppose yours are set to low resolution and high compression.

Clearly, Adobe has the technology for content-aware fill, but I have never seen it applied to PDF files. I would love to see one of your original scans that exposes this feature when you run OCR on it.

ABAMBO | Hard- and Software Engineer | Photographer
New Here, Sep 11, 2023

Thank you. Yes, the scan outside of the editable text seems to preserve the original resolution; it's the areas around and behind the text it extracts that get resampled/compressed. I can send you a link to the file I'm using.
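Resampling and recompression are different things: resampling lowers the pixel count of an image region, while recompression alone keeps the pixel dimensions and only adds artefacts. If the pixel dimensions of an affected region can be read before and after the save with a tool of your choice, comparing effective resolution tells the two apart. The following is a minimal sketch; the helper name `effective_ppi` and every number in it are hypothetical and were not measured from the file discussed here.

```python
def effective_ppi(pixel_width: int, placed_width_in: float) -> float:
    """Pixels per inch of an image placed at a given physical width."""
    return pixel_width / placed_width_in


# Hypothetical values for illustration only (a region 8.5 inches wide).
before = effective_ppi(pixel_width=2550, placed_width_in=8.5)
after = effective_ppi(pixel_width=1275, placed_width_in=8.5)

# Halving the linear resolution quarters the number of pixels stored.
linear_factor = before / after
pixel_count_factor = linear_factor ** 2

print(f"{before:.0f} ppi -> {after:.0f} ppi, {pixel_count_factor:.0f}x fewer pixels")
```

If the effective resolution is unchanged but the region still looks worse, stronger compression rather than resampling is the more likely cause.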

Community Expert, Sep 12, 2023

OK, I got your data. I see the compression artefacts in the original file:

[Image: Abambo_0-1694514996886.png]

Normally, with a high-quality image, you do not see those artefacts. But I suspect that they were already in the image before it was used in this composition.

I'm using the default settings, except that I changed this one, as it is what you are (probably) using:

[Image: Abambo_3-1694516064752.png]

Interesting message! 😉

[Image: Abambo_4-1694516128709.png]

However, I can confirm your finding that the resulting image is a badly degraded, highly compressed version of the original one. This is not related to the original poster's question, though. (Edit) You are right: Acrobat is indeed heavily editing the file for a simple request. I will detach your post into a new thread for this reason.

 

For a solution that would give better results, you have several options (I'm aware that some are probably beyond your reach):

  1. Go back to the source file, where image and text are separate, and make your changes there.
  2. Find a PDF in which the text and background image are not merged. That would be nearly as good as (1).
  3. Find the background image and recreate the layout in InDesign, Illustrator, or a different tool. The text can be taken from the OCR results.
  4. Take the PDF to Photoshop (or similar) and use its tools to erase the text. Photoshop probably does a better job with Content-Aware Fill, but it will be more laborious. Recreate the text either in Photoshop (!) or as in (3). Photoshop can export text layers as text to PDF.

Maybe someone else knows how to fix the low-quality output of the OCR content-aware fill?

 

ABAMBO | Hard- and Software Engineer | Photographer