Removing Black Scan Edges from PDFs... pre OCR

New Here, Apr 30, 2014

I have been working on an 800-page scanned PDF of a book written in the 1800s... the objective... make it look as good as possible, and create a fully searchable, text-indexed PDF set that was - perfect.

I have worked with Acrobat a great deal in the past but never to do this type of task; normally it is print-based work where the content is perfect and the process I am about to explain to you would never be required. So this was a search and destroy mission for me... I did a lot of searching over the past few days, and a lot of downloading of software trials, only to see that they were, in short, buggy rubbish.

The biggest problem was that the entire 800 pages were scanned manually from a bound document, meaning they had very large black areas on all four edges of both the odd and even pages. These varied in thickness from page to page all the way through the document, and that meant cropping the PDF was pointless for a few reasons. The first is that when you crop a PDF you are doing nothing like a crop command in Photoshop. You are really only putting your hand over your eye, so to speak, so that you cannot see the side of the document that you have obscured. The crop in PDF alters the page size - meaning if you want your document to look like a dog's breakfast and have every page a different width or height... that's exactly what you will be getting. I did not.

It also means if you have varying widths of black in your document from page to page you are facing a nightmare, as each page would need to be cropped by itself... a feat requiring about 12 commands or keypresses/mouse clicks. 800 pages x 12 = 9,600 things I did not want to do. Especially when it doesn't fix it. The reason is - I want my document to remain A4. You might want yours to remain the size you want... and cropping individual pages would be the wrong direction to go in for that end.

So say you do go down the route of manually cropping each page - because you only have a few pages... but then after cropping each page you decide you are going to be smart and resize the PDF so that the page size is the right size and the edge comes back white (you assume). Wrong. This is where the crop command fails... all it is doing is concealing the edge of the page... you uncrop or resize the page and it gives you back the original black scan edge. Fail.
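As a rough illustration of why the uncrop brings the black edge back, here is a toy model in Python. It is only a sketch: the `Page` class and the `crop` function are made-up names for this example, not anything from Acrobat, and a real PDF stores a crop box in points rather than a pixel window.

```python
from dataclasses import dataclass

# Toy model only: these names are illustrative and are not Acrobat's API.
@dataclass
class Page:
    pixels: list    # rows of grey values, 0 = black, 255 = white (the scan)
    visible: tuple  # (left, top, right, bottom) window shown to the reader

def crop(page, box):
    """Cropping only moves the visible window; the pixel data is untouched."""
    page.visible = box

def visible_pixels(page):
    left, top, right, bottom = page.visible
    return [row[left:right] for row in page.pixels[top:bottom]]

# A 6x4 scan with a black strip two pixels wide down the left edge.
scan = [[0, 0, 255, 255, 255, 255] for _ in range(4)]
page = Page(pixels=scan, visible=(0, 0, 6, 4))

crop(page, (2, 0, 6, 4))              # hide the black strip
cropped_view = visible_pixels(page)   # all white, but the page is now narrower

crop(page, (0, 0, 6, 4))              # "uncrop" back to the original size
restored_view = visible_pixels(page)  # the black strip is back
```

The cropped view is clean but narrower than the original page, and restoring the window shows the untouched black pixels again, which is the behaviour described above.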

I went lateral... Acrobat has a tool for "obscuring" anything... the NSA probably use it... or should... and it's called "Redaction". It's nicely hidden in Acrobat - on the right-hand side under "Tools". If the "Protection" tab is not visible you will need to turn it on... under the "Comment" button there is a bunch of tick marks... turn on "Protection". It's a stupid name... but anyway here are the steps.

Redaction allows for two types of concealment of items in a PDF: "Mark for Redaction" & "Redact Pages". Within the Mark for Redaction section there is "Redacting Text" & "Redact Blocks". Text is handy if you are wanting to hide the names of xxx and xxx in a document you are sending to someone, as you can search and replace with this tool. But I was not. You will be using the "Mark for Redaction" tool, but only for areas or blocks. The tool automatically flips to redacting text when you get too close to some text... and the mouse arrow changes to an "I" bar. You need to move the mouse further away from the text and get back to the cross hairs target thingy.

(1) Firstly Select "Redaction Properties"

I do this but you may have a better method. I alter the colour of my redaction tool to white and also set the edge of the redaction to be RED or something... it doesn't matter as it's going to wipe out to white, as that is the fill colour. I only set it to red so that if I miss my target I can click on the area I have drawn and delete it... the red border allows you to see it.

(2) Masking out the redactions

Then simply go to the "Mark for Redaction" tool... and you are now going to be masking out the edges of your scanned PDF that are all black and crappy... make the PDF page sit inside the main area of the window so you can see the extents of the page... then to mask out the RIGHT edge of a page... drag from OUTSIDE the right of the page (the artboard if you like)... to the bottom left of that side of the page. You cannot EDIT the selection... once you click off, it's in place... BUT you can click on the selected area and hit delete and kill it if you accidentally go over something you didn't want to hide. I often do 2 or 3 areas per side of the page... in little blocks of white... if the blackness is an irregular shape it doesn't matter. So now continue masking the other black edges of the PDF page... FROM outside the edge of the page to inside the PDF page... you will have noticed by now that the mask that you have been drawing "vanishes" for any areas outside the PDF page... this is PERFECT as it means your page size has not been changed, unlike with the crop command... it has not stuffed up your document. That's huge.
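The way the mask "vanishes" outside the page is just a rectangle being clipped to the page bounds. A minimal sketch in Python, assuming a page that runs from (0, 0) to its width and height in PDF points; `clip_to_page` is a hypothetical helper written for this example, not something Acrobat exposes:

```python
def clip_to_page(rect, page_width, page_height):
    """Return the part of rect that lies on the page, or None if none does.

    rect is (x0, y0, x1, y1) in the same units as the page size,
    with the page occupying (0, 0) to (page_width, page_height).
    """
    x0, y0, x1, y1 = rect
    # Normalise so that a drag in any direction gives x0 <= x1 and y0 <= y1.
    x0, x1 = sorted((x0, x1))
    y0, y1 = sorted((y0, y1))
    cx0, cy0 = max(x0, 0), max(y0, 0)
    cx1, cy1 = min(x1, page_width), min(y1, page_height)
    if cx0 >= cx1 or cy0 >= cy1:
        return None  # the drag never touched the page
    return (cx0, cy0, cx1, cy1)

# A4 in PDF points (1/72 inch), rounded.
A4_WIDTH, A4_HEIGHT = 595, 842

# Drag from outside the right edge to a point 30 pt inside it.
mark = clip_to_page((650, -20, 565, 900), A4_WIDTH, A4_HEIGHT)
print(mark)  # (565, 0, 595, 842)
```

Because only the overlap with the page survives, the page keeps its size no matter how far outside the edge the drag starts.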

(3) Cementing the redactions.

So you can go on and on masking areas across pages and pages... before you make the redactions PERMANENT, and when I say that - I mean it. There is NO undo. It would not be much good if some smarty could open a redacted PDF and just delete all the white boxes and then read all the redacted information... so when you are ready to cement in place your redacted black scanned areas... hit the "Apply Redactions" button on the Protection tool bar. It gives you a warning... you confirm it and bang... it races through the document - totally wiping out everything under your masked areas. At the end of that it gives you the option to look for anything else that's invisible in the document, like overlapping text... this won't be the case on your scanned document though, so decline that.

(4) Saving.

Your redacted document is not saved yet... and when you hit save on a freshly redacted document it never just saves over the original. It asks you to confirm what you want to save and inserts the word "Redacted" into the PDF name. That's damn handy. Mix up a redacted document and send the wrong one to someone and your job could suddenly become untenable.

So you save the document as a redacted PDF... and now you have the original saved PDF without redactions, and the newly named Redacted PDF with no massive black scanned edges through it.

The whole process is incredibly fast... much much much faster than the 30 mins it took me to write this for you... but this is such an incredibly massive power user tip I figured you all needed to know the best way to get rid of scanned page edges.

Yes it works on both Mac and Windows Acrobat.

Once you have done this to a PDF the edges will be pristine white... no artifacts to trip up the OCR system... so you will get a lot fewer OCR errors out at the page edges!!

Have fun

Guy

TOPICS
Scan documents and OCR


New Here, Mar 04, 2015

Wow, this was written last year, and I'm the first to write? For shame... This is an excellent tip! I am only an undergrad in History, but we read a lot of scanned book PDF files. This tip helps tremendously when it comes to printing and also reading (it's quite irritating seeing all of those scanned black edge marks). Thank you very much for this!

New Here, Apr 28, 2015

This is something that has been bugging me for months. Whenever I scan a document, my Adobe Pro basically shrinks the height of the doc so that each page has a black space at the bottom of the page. After reading your article I used the redacting in white as you suggested, and that worked fine. I thought about you having to do this for an 800-page doc, and I figured there must be some way to prevent these black areas, rather than having to correct them. I fiddled with the scanning presets in Adobe Pro and found that if I indicate the page length to be 10.6", as opposed to the default 11", it stretches the doc back out to the normal length.

I had another problem, in that my scanner tends to pull the docs through slightly skewed. The result is that the scanned copy has partial black lines on the sides. I looked for a setting to correct this, but didn't find one in either Adobe Pro or in my scanner software. I again adjusted the page dimensions in the presets to indicate that the original was only 8.3" wide, instead of the actual 8.5". Now my scanned docs come out nice and white on all the edges - no need to crop or redact!

New Here, Oct 30, 2015

I'm working on a project to digitize old documents with scanning hardware that is less than optimal. And this works amazingly well in fixing the problem. Thank you!

New Here, Nov 29, 2015

Great tip. However, it still requires manual redaction on a page-by-page basis. I found another, quicker solution for cases where black margins are predictable/consistent throughout the document.

1. Crop all the pages in one go by the required amount (e.g. 5 mm left and right).

2. Perform OCR. This process also happens to erase the area beyond the cropped region by resetting the page size.

3. (Optional) Go back into the crop dialogue box and enlarge the page size to the original dimensions.
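For step 1, it can help to know what a margin in millimetres means in PDF units. A small sketch in Python, assuming the default PDF unit of 1/72 inch and an A4 page; `mm_to_points` and `crop_box` are hypothetical helper names, and the result is only the arithmetic, not a call into Acrobat:

```python
MM_PER_INCH = 25.4
POINTS_PER_INCH = 72  # default PDF user-space units per inch

def mm_to_points(mm):
    return mm * POINTS_PER_INCH / MM_PER_INCH

def crop_box(page_width_pt, page_height_pt,
             left_mm=0, right_mm=0, top_mm=0, bottom_mm=0):
    """Return (x0, y0, x1, y1) in points, origin at the bottom-left corner."""
    return (
        mm_to_points(left_mm),
        mm_to_points(bottom_mm),
        page_width_pt - mm_to_points(right_mm),
        page_height_pt - mm_to_points(top_mm),
    )

# A4 is 210 mm x 297 mm.
a4_w, a4_h = mm_to_points(210), mm_to_points(297)
box = crop_box(a4_w, a4_h, left_mm=5, right_mm=5)
print([round(v, 1) for v in box])  # [14.2, 0.0, 581.1, 841.9]
```

So a 5 mm crop on each side takes roughly 14 points off the left and right of an A4 page.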

[Using Acrobat DC]

Community Beginner, Apr 18, 2016

Thanks. This tip is most invaluable. Like others on this forum I was having problems cleaning the black borders. This did it in no time. Only thing I found was that using white as a fill colour did not match my document perfectly, so I used a custom colour of (250,250,250), white being (255,255,255).
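If you are not sure which off-white to use, one option is to take the most common colour from a clean patch of the page. A rough sketch in Python; the sample values below are invented for illustration, and how you read pixels out of your scan is left open:

```python
from collections import Counter

def most_common_colour(pixels):
    """Return the RGB tuple that occurs most often in a list of RGB tuples."""
    return Counter(pixels).most_common(1)[0][0]

def to_hex(rgb):
    return "#{:02X}{:02X}{:02X}".format(*rgb)

# Hypothetical samples taken from a clean area of a scanned page.
samples = [(250, 250, 250)] * 7 + [(255, 255, 255)] * 2 + [(248, 249, 250)]
fill = most_common_colour(samples)
print(fill, to_hex(fill))  # (250, 250, 250) #FAFAFA
```

The resulting RGB values can then be typed into the custom colour picker for the redaction fill.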

Community Beginner, Oct 07, 2016 (marked as correct answer)

I'm glad to find this tip. I have a compatible tip that's worth trying *before* this one. Like "redaction," the "watermark" feature doesn't sound like it would be relevant, but it is.

Use the EditPDF > Add WATERMARK feature, choosing a simple .jpg file which consists of a narrow white rectangle (say, 765 pixels tall by 90 pixels wide). Set opacity to 100%, rescale relative to target page if needed, locate at Horizontal Distance "0" from center, and apply the "watermark" to ALL pages in just one step.

Note, this will only work well if your pdf has been scanned very consistently. Since my workflow involves books (2-page spreads) that I scan myself, I just make sure that the book's binding always aligns with the 5.5" mark on the scanning bed, precisely so that I can wipe out the gutter shadow with this one step. (You can then apply additional watermarks for side margins and top and bottom margins too, although various cropping tricks may serve better there.)
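If you don't have an image editor handy, the white rectangle can be generated with a few lines of Python using only the standard library. This sketch writes a BMP rather than the .jpg mentioned above, on the assumption that your version of Acrobat also accepts BMP as a watermark source; if it does not, convert the file to .jpg first. The function name and output path are made up for the example.

```python
import struct
import tempfile
from pathlib import Path

def white_bmp_bytes(width, height):
    """Build an uncompressed 24-bit BMP filled with pure white."""
    row_bytes = width * 3
    padding = (4 - row_bytes % 4) % 4  # BMP rows are padded to 4 bytes
    row = b"\xff" * row_bytes + b"\x00" * padding
    pixel_data = row * height
    header_size = 14 + 40  # file header + BITMAPINFOHEADER
    file_header = struct.pack(
        "<2sIHHI", b"BM", header_size + len(pixel_data), 0, 0, header_size
    )
    info_header = struct.pack(
        "<IiiHHIIiiII",
        40, width, height,   # header size, width, height (bottom-up rows)
        1, 24,               # colour planes, bits per pixel
        0, len(pixel_data),  # no compression, size of the pixel data
        2835, 2835,          # about 72 dpi, in pixels per metre
        0, 0,                # no palette
    )
    return file_header + info_header + pixel_data

# 90 pixels wide by 765 tall, as suggested above.
data = white_bmp_bytes(90, 765)
out_path = Path(tempfile.gettempdir()) / "white_strip.bmp"
out_path.write_bytes(data)
print(out_path, len(data), "bytes")
```

Change the width and height to suit the gutter or margin you want to cover; the watermark dialog can rescale the image relative to the page anyway.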

Of course, if your document does NOT have consistent placement on the scanning/photocopy bed, OP's tip of "redacting" with white rectangles is a good fallback option.

Honestly, I'm a bit baffled that Adobe's development team hasn't yet seen this as a common task that warrants its own straightforward tool...

New Here, Jan 01, 2018

Thanks a lot, it was the best method.

New Here, Jul 24, 2021

It's been many years, but I recall using Adobe Acrobat 4, and it had a crop feature that allowed you to select alternating pages and define how far in you wanted the margin on all edges - allowing for print closer to the left or right depending on the page number.

I now have Acrobat X (I don't want to pay subscription fees) and find it frustrating that they removed the Crop feature - at least from X.

Redaction takes a long time on a document of a hundred pages or more when trying to clean up the edges...

New Here, Oct 30, 2021

I was mistaken, Acrobat X has Crop - it is under the Pages setting - quite clearly. I did a search for the word Crop and it didn't find it in the menus so I stopped looking...

Gah!

New Here, Apr 10, 2023

I have had the same trouble with Acrobat DC. If I scan with the feeder on my printer, I often get marks on the scan that are not really there. I have been unable to find an answer. However, I do have an alternative. I have also been using a program called PaperPort for many years. You can scan directly into this program and you get a file that resembles a stacked document. With this program, you can simply outline the area that has the black marks and click delete, and it is gone. If you have a lot of black in the margins, you can enclose the text in a box and then click on a tab that says Erase Outside, and everything outside the box will be deleted. It does not change the size of the document. I used this to do the same thing you are doing, for a dissertation that was also bound. It has a number of other tools that I really like as well.

For example, if you want to combine two files into one, all you have to do is drag one file on top of the other and they merge. A lot easier and faster than Acrobat. The program used to be published by Nuance, but it has been taken over by Kofax. I have not used the new version by Kofax.

That said, I really do like Acrobat, have used it for many years and would not be without it. Overall, it is much faster than PaperPort and does things that are hard or cannot be done in PaperPort. Also, it is more stable than PaperPort.

I just wish that Acrobat would incorporate more of these kinds of functions. I still find it hard to believe that there is not a way to simply put a box around a mark on the page and delete it. I thought I was just missing it, but I spoke with a tech and he said that is not supported in Acrobat. So I guess I am stuck with using both programs to do what I need to do.
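The "Erase Outside" idea mentioned above is simple to picture on a raster image. Here is a minimal sketch in Python on a tiny grid of grey values; it is not how PaperPort or Acrobat implement it, and `erase_outside` is a name invented for this example.

```python
def erase_outside(pixels, box, fill=255):
    """Return a copy of the pixel grid with everything outside box set to fill.

    pixels is a list of rows of grey values (0 = black, 255 = white).
    box is (left, top, right, bottom), with right and bottom exclusive.
    """
    left, top, right, bottom = box
    return [
        [value if (top <= y < bottom and left <= x < right) else fill
         for x, value in enumerate(row)]
        for y, row in enumerate(pixels)
    ]

# 5x5 page: black border (0), mid-grey "text" (90) in the middle.
page = [
    [0, 0, 0, 0, 0],
    [0, 255, 90, 255, 0],
    [0, 90, 90, 90, 0],
    [0, 255, 90, 255, 0],
    [0, 0, 0, 0, 0],
]
cleaned = erase_outside(page, (1, 1, 4, 4))
for row in cleaned:
    print(row)
```

The cleaned grid has the same dimensions as the original, which matches the point that the document size does not change; only the pixels outside the box are replaced with the fill value.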
