I've experienced problems with text disappearing when importing Word docx files into InDesign CS5.5 for PC, but I think the problem also arises in other versions. The solution seems to be to use the older Word doc format.
I'll illustrate with a short amount of text in two paragraphs (see screen shot). I have imported the text in docx format, then copied and pasted the same text. Then I saved the docx file in the older doc format and got a different (better) result. Pasting from that format achieved the same result as for the docx format.
The problems are the hyphen sometimes dropping out, and a whole chunk of text dropping out from the docx import. The hyphen is in 'so-called', which appears twice in parentheses in the first paragraph: on import the hyphen drops out the first time but not the second; on pasting it drops out the first time but leaves a space, and the second time it's fine. The dropped text doesn't happen with the doc import, though there are numerous missing glyphs. Those at least are easy to see and fix. The hyphens and text dropping out aren't easy to see, of course.
The solution seems to be to forget about docx imports and use doc always.
Looks to me like the Word files are a mess. I've never had an issue using DOCX and if there are images involved, it's a must to keep the original images intact.
I agree with Bob. Examine the Word files at the point where the text drops. That's where your problem is.
Select the first character that disappears both in Word and in the "paste" version. What can you tell us about it?
Maybe it's an InDesign CS5.5 problem ... because it's so old, its Word import filters are also old and have problems with .docx.
Do you have a friend w/CC 2018? (or any recent vintage of ID?) They could test to verify. Or post a link to that snippet of .docx file and we can test for you.
Thank you for your comments and suggestions. I'm not sure about linking the file - I tried uploading 'test2.docx' to Dropbox and the link is here (not sure whether this will work...): Dropbox - test2.docx
I did investigate the point at which the text drops. There's nothing on the hyphens, but at 'biography of...', if I change the fount to Times and look at the glyph panel, it says the character is GID 846, unicode 202A, 'left-to-right embedding', which is interesting. I hadn't tried before, but if I use the arrow keys to traverse over that point in the Word file, there is an invisible character (after the word space and before the capital R) at the point where the text drops out.
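For anyone who wants to check a suspect passage outside Word, here is a minimal Python sketch that lists the invisible format-control characters in a piece of text. It assumes you have copied the passage into a string; the sample string and the function name `describe_invisibles` are made up for illustration and are not taken from the test file.

```python
import unicodedata

def describe_invisibles(text):
    """Return (index, code point, Unicode name) for each format-control
    character (Unicode general category "Cf") found in text."""
    found = []
    for i, ch in enumerate(text):
        if unicodedata.category(ch) == "Cf":
            name = unicodedata.name(ch, "UNNAMED")
            found.append((i, f"U+{ord(ch):04X}", name))
    return found

# Hypothetical sample: an embedding mark after the word space, before the capital R.
sample = "biography of \u202aRahel\u202c"
for hit in describe_invisibles(sample):
    print(hit)
```

Category "Cf" also covers things like soft hyphens and zero-width joiners, so a hit is a prompt to look closer, not proof of a problem.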
The point is though that in the normal course of events you wouldn't want to be investigating the file before getting on with the job of importing it. These and similar problems recently cropped up on two completely separate jobs, which is what prompted me to post.
I just tested the docx file you uploaded by placing it, and the result was the trimmed text.
InDesign CC 2018.1 version 126.96.36.199 on Windows 10 (1803).
From my German InDesign:
Who created the files? The same person?
... The point is though that in the normal course of events you wouldn't want to be investigating the file before getting on with the job of importing it.
At our office "the normal course of events" is to do a rigid clean-up of all Word files, including downsaving them to .doc.
The funny thing is: even then we sometimes run into problems, as there are just some types of authors who manage to make Word do things that it was (hopefully) not designed for. But the latest CC2018 can take a lot more abuse from documents than its predecessors could.
The point is though that in the normal course of events you wouldn't want to be investigating the file before getting on with the job of importing it.
I could not possibly disagree more. When I get Word files, the first thing I do is open them and check them.
Many thanks Uwe - that's a useful confirmation.
Jane - files from two different authors, not linked in any way.
Thank you Jongware and Bob.
I always 'eyeball' a Word file and do the obvious tidy-up of things like superfluous tabs and spaces and so on. But I'd be interested to know what might be involved in a 'rigid clean-up' and what Bob is checking for, and how.
Following on from jane-e's post, I copied the character at the point where the text drops out and pasted that into a search in Word, and indeed Word found it. But it isn't revealed when you tell Word to show all formatting/characters (although I have just noticed a difference in the hyphens I mentioned above, so that's a step forward).
It is a Word issue, and it has to do with the unknown characters in Word. I can select them and copy and paste them into InDesign, but I can't figure out what they are. Copying them into the Glyphs panel reveals nothing.
The only direct formatting is the Arial italics on the space before (RV), so it's not a formatting issue.
Our 'rigid clean-up' consists of running a huge Word macro that checks for, and fixes, numerous possible problems. At least some of these have been resolved in CC2018, but apparently not all.
It can take a while -- something like an hour for a large book -- and I am still adding new stuff to it on a fairly regular basis, so no improvement in that.
But it seems worth the trouble, as I got your text just perfectly into InDesign. Including the LTR and RTL Unicode markers that InDesign does not consume as instructions, but rather leaves as 'visible' characters.
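This is not the Word macro described above, which isn't shown in this thread. As a rough illustration of one clean-up step, here is a Python sketch that strips the Unicode directional marks from a string before the text goes anywhere near InDesign. The function name and the exact set of characters removed are assumptions.

```python
import re

# Directional controls to remove (an assumed list, adjust as needed):
#   U+200E, U+200F   left-to-right / right-to-left marks
#   U+202A..U+202E   embeddings, overrides, POP DIRECTIONAL FORMATTING
#   U+2066..U+2069   directional isolates
BIDI_CONTROLS = re.compile("[\u200e\u200f\u202a-\u202e\u2066-\u2069]")

def strip_bidi_controls(text):
    """Return text with all directional control characters removed."""
    return BIDI_CONTROLS.sub("", text)

cleaned = strip_bidi_controls("of \u202aRahel\u202c (RV)")
print(repr(cleaned))
```

Only do this for text that really is left-to-right throughout. In a document with Hebrew or Arabic passages these marks can affect the display order, and removing them may scramble it.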
Without using a macro in Word, I was able to import the document as a doc file:
What a strange result.
The first character before "Rahel" is encoded as U+202A, which stands for LEFT-TO-RIGHT EMBEDDING.
And that is obviously tripping up the docx import filter.
Further, we can also see a lot of POP DIRECTIONAL FORMATTING (U+202C) special characters.
Especially at the end of the third paragraph.
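To get a quick count of these characters without opening Word at all, something like the following Python sketch might help. A docx file is a zip archive whose main text lives in `word/document.xml`, so the standard library is enough. The function name `count_bidi_controls` is made up, and the sketch only looks at the main document part, not headers, footnotes or comments.

```python
import unicodedata
import zipfile
import xml.etree.ElementTree as ET
from collections import Counter

# Embeddings/overrides/pop (U+202A..U+202E) and isolates (U+2066..U+2069).
BIDI_CODEPOINTS = set(range(0x202A, 0x202F)) | set(range(0x2066, 0x206A))

def count_bidi_controls(docx):
    """Count directional control characters in the main text of a docx.

    docx may be a file path or a binary file-like object.
    Returns a dict mapping "U+XXXX NAME" to the number of occurrences.
    """
    with zipfile.ZipFile(docx) as archive:
        xml_bytes = archive.read("word/document.xml")
    # Parsing the XML also resolves any numeric character references.
    text = "".join(ET.fromstring(xml_bytes).itertext())
    counts = Counter(
        f"U+{ord(ch):04X} {unicodedata.name(ch)}"
        for ch in text
        if ord(ch) in BIDI_CODEPOINTS
    )
    return dict(counts)
```

Calling `count_bidi_controls("test2.docx")` on a saved copy of the file returns a dictionary of the marks found. An empty dictionary means none of these particular characters are in the main text.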
I recently changed from CS4 to CC and still found the same problem described here; I thought it would have been solved. For lack of time I didn't read all of the comments here, but I think the least time-consuming solution is still to convert Word files from docx to doc. Annoying, but it helps.