I have a formatted Word document that mixes LTR (English) and RTL (Arabic) text within a single line or paragraph. The Arabic font is either "Courier New" or "Times New Roman", both of which are Unicode fonts. The created PDF looks fine, an exact replica of the Word document. The problem appears when you search for an Arabic word that is visibly present in the PDF: the search may come back with the message "No matches were found". Yet when you highlight, copy, and paste the same word from the PDF, it comes out in a disjointed/split/broken form. I guess that is how it is stored in the search map, and the only way to find the word is to search for that same disjointed form. The behavior seems random, though it sometimes appears consistent. Both Acrobat 9 and 11 Standard show the same results. Is there a way to fix this problem?
For example, كِتَابًا may exist as كِ تَابًا in the search map.
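One way to make the difference visible is to list the code points of both forms. The following is only a sketch under stated assumptions: the word is spelled out with escapes as a transcription of the example above, the split is assumed to fall right after the first letter and its vowel mark, and `describe` is a hypothetical helper name.

```python
import unicodedata

# Assumed transcription of the example word:
# kaf, kasra, teh, fatha, alef, beh, fathatan, alef.
INTACT = "\u0643\u0650\u062a\u064e\u0627\u0628\u064b\u0627"

# Assumed split form: a plain space after the first letter and its vowel mark.
SPLIT = INTACT[:2] + " " + INTACT[2:]


def describe(text):
    """Return one 'U+XXXX NAME' string per code point in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNKNOWN')}" for ch in text]


if __name__ == "__main__":
    for label, word in (("intact", INTACT), ("split", SPLIT)):
        print(label)
        for line in describe(word):
            print("  " + line)
```

Pasting the text copied from the PDF in place of `SPLIT` shows whether the extra character is really a plain space or something else, such as a no-break space.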
Any help is much appreciated.
Asim
How did you convert the files from Word to PDF, exactly? Also, what version
of Office do you use? Does the same thing happen if you use a different
font in the Word file?
On Fri, Apr 1, 2016 at 7:43 PM, m. asims46535133 <forums_noreply@adobe.com>
Over time I have suspected the following:
1 - Formatting:
a - Line spacing
b - Font size
c - Paragraph justification, etc.
2 - PDF producer used:
a - Acrobat PDFMaker
b - Microsoft Office Word
3 - The way the Word file was converted:
a - Acrobat->Create->PDF From File
b - Word->Save As->Adobe PDF (Acrobat PDFMaker Office COM Addin)
c - Word->Save As->PDF (Publish a copy of the document as a PDF)
4 - Unicode fonts:
a - Courier New
b - Arial
c - Times New Roman
d - Traditional Arabic
After giving up on pinpointing the cause of the problem, I finally settled on "Microsoft Office Word 2007", the Unicode font "Times New Roman", "Adobe Acrobat 9 Standard", 2(a), and 3(a), for my own ease and peace of mind, though the problem is still bothersome.
I was able to confirm the following:
- Switching the paragraph alignment among Justify, Justify Low, Justify Medium, and Justify High can suddenly produce the best or the worst results.
- The Unicode font "Courier New" behaves better in some ways.
- 3(c) produced the worst splitting of all, in Word 2007, 2013, and 2016 alike.
- Acrobat 9 and 11 give the same splitting results.
I was hoping to find out why PDFMaker would introduce a blank space (Unicode hex 0020) only in the search map while showing none in the actual rendition (display).
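Since the stray character is a plain space, text copied or extracted from the PDF can still be searched by allowing optional whitespace between the characters of the search term. This is a workaround sketch for text already extracted from the PDF, not a fix for Acrobat's own search; `tolerant_pattern` and `find_word` are hypothetical helper names.

```python
import re


def tolerant_pattern(word):
    """Build a regex that matches word even if whitespace was inserted between its characters."""
    return re.compile(r"\s*".join(re.escape(ch) for ch in word))


def find_word(word, extracted_text):
    """Return (start, end) spans of word in extracted_text, ignoring stray whitespace inside it."""
    return [m.span() for m in tolerant_pattern(word).finditer(extracted_text)]


if __name__ == "__main__":
    # Assumed transcription of the example word and its split form.
    word = "\u0643\u0650\u062a\u064e\u0627\u0628\u064b\u0627"
    split = word[:2] + " " + word[2:]
    print(find_word(word, "abc " + split + " xyz"))
```

The trade-off is that the pattern also matches across genuine word boundaries, so two adjacent words that happen to spell the search term when joined are reported as a hit too.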
I appreciate your response.
Asim
When you use the File - Save As - Adobe PDF command in Word you're not actually using the PDFMaker plugin, though. You're using a built-in function of Word.
To use the PDFMaker plugin you need to either generate the file from within Acrobat (like you did in 3a) or to use the Acrobat tab in the ribbon (or the button in the toolbar, in earlier versions of Office). The other way to generate the PDF using Adobe technology is with the Adobe PDF printer, of course.
OK, I have updated the above list with two more wonderful ways to create an Adobe PDF.
3 - The way the Word file was converted:
a - Acrobat->Create->PDF From File
b - Word->Save As->Adobe PDF (Acrobat PDFMaker Office COM Addin)
c - Word->Save As->PDF (Publish a copy of the document as a PDF)
d - Word->Acrobat tab in the ribbon->Create PDF
e - Word->Print->Printer Name->Adobe PDF (using Adobe PDF converter)
But...
3(a), 3(b), and 3(d) all use the same path, because Adobe Acrobat 9 was installed over MS Office 2007. The PDFs created by all three show the same source in their document properties, i.e.
Application: Acrobat PDFMaker 9.0 for Word.
PDF Producer: Adobe PDF Library 9.0.
Whereas...
3(c) is the only one using the built-in function of Word 2007, because the PDF created with this option shows different information in its document properties, i.e.
Application: Microsoft Office Word 2007
PDF Producer: Microsoft Office Word 2007
And...
3(e) also uses Adobe Acrobat 9 in some way, I guess, i.e.
Application: PScript5.dll Version 5.2.2
PDF Producer: Acrobat Distiller 9.0.0 (Windows)
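The reasoning above, telling the conversion path apart by the Application and PDF Producer fields, can be sketched as a small lookup. The substring rules are an assumption based only on the values quoted in this post; other versions may report different strings, and `conversion_path` is a hypothetical helper name.

```python
def conversion_path(application, producer):
    """Guess which converter produced a PDF from its Application and PDF Producer fields."""
    app = application.lower()
    prod = producer.lower()
    if "pdfmaker" in app:
        return "Acrobat PDFMaker"        # values seen for 3(a), 3(b), 3(d)
    if "distiller" in prod:
        return "Adobe PDF printer"       # values seen for 3(e)
    if "microsoft" in prod and "word" in prod:
        return "Word built-in export"    # values seen for 3(c)
    return "unknown"


if __name__ == "__main__":
    print(conversion_path("Acrobat PDFMaker 9.0 for Word", "Adobe PDF Library 9.0"))
    print(conversion_path("Microsoft Office Word 2007", "Microsoft Office Word 2007"))
    print(conversion_path("PScript5.dll Version 5.2.2", "Acrobat Distiller 9.0.0 (Windows)"))
```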
So, in other words, there is no confirmed solution to the problem, because all these wonderful options show more or less the same results, at least for a heavily formatted Word document containing a mix of both languages.
Case Closed?
Thankfully appreciated!
Asim
Though I am still evaluating, choosing these options:
1 - (a) from above, along with ...
2 - Word Options->Save->Embed fonts in the file->Embed only the characters used in the document (best for reducing the file size).
3 - With "Do not embed common system fonts" unchecked.
... may produce better results.
Hi Asim
Can you please share a sample file with which the issue is reproducible?
Thanks
Tanvi
Hi Tanvi,
I apologize for the late reply. For some reason I missed the notification of your comment.
The problem is random and investigating it is very tiresome. It may or may not occur at the same location the second time. Here are the sample files with split Arabic words (in the search map only), as indicated by the highlights.
.... But I don't see the file attachment option!
Thanks for your concern as well as response.
Hi Asim
You can share the file by email. I have sent you my email ID in a private message.
Thanks
Tanvi