Convert web page to editable PDF

Community Beginner, Apr 11, 2024

I know similar threads exist, but I haven't run into one as messy as this. I use a page creation system for my lesson notes. Because it has all kinds of editing tools and document inclusion tools, the working page is a nightmare of JS, Ajax, and all manner of dark arts. On top of everything else, it is bilingual: English and Hebrew in adjacent text boxes. I want to convert these pages to PDF, so that I can distribute them more easily. 

 

I'm using Edge (although I'm open to suggestions). If I use the Edge extension to convert the "edit mode" version of the page, I get the text but not any images. I get placeholders instead.

 

There's also a "non-editable" mode which has the same content in a somewhat different format. The non-editable version of the page has a lot of its own magic. For example, you can click on text to open a citation or a dictionary. If I use the Adobe extension to convert that to a PDF, not only do I not get the images, I get nothing but the page's menu.

 

If I go at this by giving the Acrobat desktop (Windows) app a URL, I get a single empty page. It doesn't matter which version of the page I feed it.

 

If I use the browser's print mechanism and print to "Adobe PDF," I get a PDF file that looks right. The problem is that the text is not rendered as text. It is actually chunks of graphics in various odd places and even odder shapes.

 

If I let Adobe scan the resulting PDF, it will recognize the English text but not the Hebrew. Since the Hebrew is sitting inside chunks of graphics, sometimes the Hebrew is in front of the English. That leaves me unable to edit some of the English and any of the Hebrew.
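
One rough way to see how much of each language the scan actually recovered is to copy the text out of the PDF and count characters by script. The following is a minimal Python sketch, not a definitive tool: `script_counts` is a hypothetical helper name, and it assumes recognized Hebrew would land in the Unicode Hebrew block (U+0590 to U+05FF).

```python
def script_counts(text: str) -> dict:
    """Count Hebrew and basic Latin letters in text copied out of a PDF."""
    counts = {"hebrew": 0, "latin": 0}
    for ch in text:
        if "\u0590" <= ch <= "\u05FF":
            # Assumption: recognized Hebrew falls in the Unicode Hebrew block.
            counts["hebrew"] += 1
        elif ch.isascii() and ch.isalpha():
            counts["latin"] += 1
    return counts


# Hypothetical sample: three Latin letters and a four-letter Hebrew word.
sample = "abc \u05e9\u05dc\u05d5\u05dd"
print(script_counts(sample))
```

If the Hebrew count stays at zero for text copied from a bilingual page, the Hebrew is most likely still sitting in the graphics rather than in a text layer.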

 

You are welcome to play with this. You won't be able to do any damage, since it's all protected behind my account credentials. There's no confidential or sensitive information.

 

The editable form is at

 

https://www.sefaria.org/sheets/554913?editor=1

 

The non-editable form is at

 

https://www.sefaria.org/sheets/554913?lang=bi

 

Does anyone have any ideas?

TOPICS
Create PDFs , How to , PDF , Scan documents and OCR

Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Community Expert, Apr 11, 2024

Hi Jerry,

 

I think I may have something for you. It's not perfect, but it's probably better than what you've been doing.

 

First off, you are absolutely correct that the scripts and markup behind a website wreak havoc when you try to "translate" the page into a "normal" one. Acrobat suffers considerably on these pages. One thing that can help is if the folks who created a page offer a "Print" option on the page. When that is done properly, they have taken the time to create a print version of the content so that it comes out great. However, sometimes a website will offer a Print button that does nothing more than open your computer's print dialog.

 

However, one piece of software that I rely upon when wishing to convert a web page into something better is from "Print Friendly." (https://www.printfriendly.com/). One of their options is to provide a plugin that will show up as a saved web page on your computer. If you are on a page that you wish to print, you select that plugin, and it will convert the page you are looking at into something that can be printed or saved as a PDF.

 

BTW, one of the features I really like about Print Friendly is that when viewing their result (before printing or saving as a PDF), you can remove unwanted items such as images you do not need or other content. You cannot edit the content, though.

 

I used it on your two links. For the first link, it was successful for the English but was a disaster on the Hebrew. However, on the second link it was successful on the English and much better on the Hebrew.

 

To examine what was achieved, I copied the text and pasted it into Word. For the first link, it put a carriage return in place of every space and produced other strange results. As I stated, the second link did better with the Hebrew. The caveat is that it put a "†" in place of each space (yes, ironic, no?). Anyhow, using Word, I did a global replace of every "†" with a space, and that worked very well. (Note: the "†" did not visibly show up in the PDF, only when the text was copied and pasted.)
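
The same global replace can be done outside Word. Here is a minimal Python sketch, assuming the pasted text uses the dagger character U+2020 wherever a space should be; `restore_spaces` and the sample string are made up for illustration.

```python
DAGGER = "\u2020"  # the "†" character that appeared in place of spaces


def restore_spaces(pasted_text: str) -> str:
    """Replace every dagger in the pasted text with an ordinary space."""
    return pasted_text.replace(DAGGER, " ")


# Hypothetical sample of text copied out of the PDF.
sample = "lesson\u2020notes\u2020in\u2020two\u2020languages"
print(restore_spaces(sample))
```

Note that this would also replace any dagger that legitimately belongs in the text, so it is worth checking the source for real footnote marks first.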

 

The one question I do have in all this is what kind of editing you plan on doing. Editing in Acrobat is fine if you only wish to fix a spelling or change a date. Unfortunately, extensive rewriting is not possible or realistic. Acrobat is neither a word processing program nor a page layout application. Your best bet for any extensive editing is to go back to the original document that was used to create the content and edit there.

 

Anyhow, I'm afraid that's the best I can offer you.

 

Good luck!

Community Beginner, Apr 13, 2024

Thanks for putting all that time into it.

 

As it happens, I use PrintFriendly, but it doesn't seem to help much. The fundamental issue lies with the way browsers render the page as all those scripts execute. PrintFriendly can't make a silk purse out of a sow's ear.

 

There is a print function in that system. If you're in edit mode, it's under File; but if you're in read mode, you have to click on an element of the page. That will open up a panel on the right side of the screen, and print will be under tools. Either way, the results are similar.

 

I'm not trying to do any serious editing. For example, there are links in my page. The links get lost when I print to Adobe PDF, so I want to restore them. If you look at the bottom of the PDF, you'll see a contact link. In the PDF, though, it isn't a functioning link. Actually, each letter in the English text is a graphic! I can create a link by drawing a box around the text.
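
When restoring links by hand, it can help to have a checklist of the link text and targets from the original page. The following is a rough sketch using only Python's standard `html.parser`. It assumes you have the rendered HTML of the page available as a string (for a script-heavy page, that means HTML saved after the scripts have run). `LinkCollector`, `collect_links`, and the sample markup are hypothetical names and data.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (link text, href) pairs from <a href="..."> elements."""

    def __init__(self):
        super().__init__()
        self.links = []     # finished (text, href) pairs
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def collect_links(html: str) -> list:
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical markup standing in for the bottom of a lesson page.
sample_html = '<p>Questions? <a href="mailto:teacher@example.org">Contact me</a></p>'
print(collect_links(sample_html))
```

This only produces the list of links. Each one still has to be recreated in the PDF, for example by drawing a box around the matching text as described above.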

 

There are times when I want to momentarily highlight a chunk of text. I can highlight part of the English text by drawing a box around it. But that doesn't reliably work for the Hebrew. Sometimes, if I'm in select mode and hold the mouse down, I'll get the crosshairs that let me draw a box around the Hebrew. Other times, that doesn't work; and I get large blocks of the page selected.

 

I just figured out the sequence of events that determines which I get! It's tricky, but essentially I have to trick Acrobat into going into box mode. If I draw a box somewhere away from the Hebrew text, then Acrobat will stay in box mode when I select an area where the Hebrew is. It works, but it's clumsy to do in real time.
