Preserving Text Copyability/Searchability in Imported PDFs

Participant ,
Sep 18, 2020

My group's work (using TCS2019) frequently involves incorporating PDFs from other sources (e.g., engineering documents and outputs from our pricing system) by importing them into frames in our FrameMaker books. Generally this works well, retaining the copyability and searchability of the embedded documents in our final PDF output... but recently I've discovered that documents from one source are not searchable or copyable in our final output, even though the PDFs I'm incorporating are searchable and copyable in their own right.

 

My preliminary read is that it's some artifact of how the providing group is generating their PDFs... but before I go asking them to change their process, I was wondering if anyone could think of something I might be able to do on my end to correct this? Something in the settings or properties of the input files that might be relevant, for instance.

 

FWIW, we use "Print Selected Files..." rather than Publish to make our PDFs, because there are some utility files in our book templates that we don't want to include in our output (and because in my experiments, Publish tends to hash some aspects of our documents). This may not be the optimum approach, I acknowledge, but in this instance I'm looking to fix a specific problem, not retool a longstanding (if potentially somewhat out-of-date) process.

TOPICS
PDF output


Advocate ,
Sep 18, 2020

You can use a book function to 'hide' certain components from processing (book update, publish).

(I'm currently on an iPad and cannot verify the exact command; it is available from the context menu on a selected book component.)

Advisor ,
Sep 20, 2020

I'd go with Klaus' suggestion – right-click on a chapter component and select Exclude. For your own peace of mind, you might then prefer to update the ToC … off-hand, I don't think it's actually necessary. Check all assumptions!

dauphinb
Participant ,
Sep 22, 2020

Thank you (and Klaus) for the reply, but I'm afraid I didn't communicate clearly: Switching to a Publish-based workflow isn't really my problem (though I'm happy to have learned how Exclude works!). The problem is that importing PDFs from a particular group within my organization into FrameMaker and then generating a new PDF of the book makes the text in the imported PDF content unsearchable/uncopyable (even though it's fine in the original input PDFs).

 

As it turns out, using Publish doesn't seem to fix this. I've been playing around with the settings and PDF subformat of these input files, but haven't been able to fix it that way, either... and I haven't been able to replicate the problem with PDFs I made myself. I suspect something about our pricing system's built-in PDF generation is hashing the fonts, which will, unfortunately, be opaque to that system's users. I know a workaround -- having them give me RTFs instead of PDFs, and making the PDFs myself -- but I was hoping for something more elegant!

 

-Bill
