Participant
April 1, 2016
Question

Disjointed/Split/broken Arabic word in the search map


I have a formatted Word document that mixes LTR (English) and RTL (Arabic) text in the same line or paragraph. The Arabic font used is either "Courier New" or "Times New Roman", both of which are Unicode fonts. The created PDF looks fine, an exact replica of the Word document. However, the problem appears when you search for an Arabic word that is visibly present in the PDF. The search may come up with the message "No matches were found". But when you highlight, copy, and paste the same word from the PDF, it exists in a disjointed/split/broken form, I guess in the search map, and the only way to find the word is to search for that same disjointed form. It seems random, though it appears to be consistent sometimes. Both Acrobat 9 and 11 Standard show the same results. Is there a way to fix this problem?

For example, كِتَابًا may exist as كِ تَابًا in the search map.
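
One way to see exactly what was inserted is to list the code points of the text copied from the PDF. The following is a minimal Python sketch, not anything Acrobat provides: `describe` is a hypothetical helper name, and the escape sequences are an assumed spelling of the example word above (the real copied text may differ).

```python
import unicodedata

def describe(text):
    """Return one 'U+XXXX NAME' string per code point in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNKNOWN')}" for ch in text]

# Assumed code points for the word as typed in Word.
intact = "\u0643\u0650\u062A\u064E\u0627\u0628\u064B\u0627"
# Assumed form of the same word as pasted from the PDF:
# a U+0020 space sits after the first letter and its vowel mark.
broken = "\u0643\u0650 \u062A\u064E\u0627\u0628\u064B\u0627"

# Print every code point so the stray space becomes visible.
for line in describe(broken):
    print(line)

# Indexes at which a plain space (U+0020) was inserted.
space_positions = [i for i, ch in enumerate(broken) if ch == "\u0020"]
```

Pasting the text copied from the PDF in place of `broken` shows whether the extra character really is U+0020 or some other whitespace or formatting code point.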

Thankfully appreciated

Asim


1 reply

try67
Community Expert
April 1, 2016

How did you convert the files from Word to PDF, exactly? Also, what version of Office do you use? Does the same thing happen if you use a different font in the Word file?


Participant
April 3, 2016

Over time, I have suspected the following factors:

1 - Formatting:

              a - Line spacing

              b - Font size

              c - Paragraph justification, etc.

2 - PDF producer used:

              a - Acrobat PDFMaker

              b - Microsoft Office Word

3 - The way the Word file was converted:

              a - Acrobat -> Create -> PDF From File

              b - Word -> Save As -> Adobe PDF (Acrobat PDFMaker Office COM Addin)

              c - Word -> Save As -> PDF (Publish a copy of the document as a PDF)

4 - Unicode fonts:

              a - Courier New

              b - Arial

              c - Times New Roman

              d - Traditional Arabic

After giving up the quest to pinpoint the cause of the problem, I finally settled on Microsoft Office Word 2007, the Unicode font "Times New Roman", Adobe Acrobat 9 Standard, 2(a), and 3(a), for my own ease and peace of mind, though the problem is still bothersome.

I was able to confirm the following:

               - Switching the paragraph alignment among Justify, Justify Low, Justify Medium, and Justify High can suddenly produce the best or the worst results.

               - The Unicode font "Courier New" behaves better in some ways.

               - 3(c) produced the worst splitting of all, in Word 2007, 2013, and 2016.

               - Acrobat 9 and 11 give the same splitting results.

I was hoping to find out why PDFMaker would introduce a blank space (Unicode hex 0020) only in the search map, while none shows in the actual rendition (display).
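
As a workaround outside Acrobat, text that has already been copied or extracted from the PDF can be searched with a pattern that tolerates stray whitespace between code points. This is only a sketch under that assumption: it does not change Acrobat's own Find behavior, and `space_tolerant_pattern` and `find_term` are hypothetical helper names.

```python
import re

def space_tolerant_pattern(term):
    """Compile a regex matching term with optional whitespace between code points."""
    return re.compile(r"\s*".join(re.escape(ch) for ch in term))

def find_term(term, extracted_text):
    """Return the (start, end) span of the first match, or None if absent."""
    match = space_tolerant_pattern(term).search(extracted_text)
    return match.span() if match else None

# Assumed code points for the example word, and a sample of extracted
# text in which a U+0020 was inserted inside that word.
word = "\u0643\u0650\u062A\u064E\u0627\u0628\u064B\u0627"
extracted = "abc \u0643\u0650 \u062A\u064E\u0627\u0628\u064B\u0627 xyz"

span = find_term(word, extracted)
```

The pattern also matches the intact word, so the same search works whether or not a given occurrence was split.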

Your response is greatly appreciated.

Asim

try67
Community Expert
April 3, 2016

When you use the File - Save As - Adobe PDF command in Word you're not actually using the PDFMaker plugin, though. You're using a built-in function of Word.

To use the PDFMaker plugin you need to either generate the file from within Acrobat (like you did in 3a) or to use the Acrobat tab in the ribbon (or the button in the toolbar, in earlier versions of Office). The other way to generate the PDF using Adobe technology is with the Adobe PDF printer, of course.