What is the best way to remove mso styles from HTML files created from Word?
- November 7, 2025
I am in the process of migrating to RH 2022. The automatic conversion of the project file from 2019 to RH 2022 failed. We had old files that were corrupted as well as a large number of images. After several attempts by me and Adobe support engineers, we could not get the project to compile. The output required is CHM.
The Adobe team recommended migrating the files by importing the project files/folders manually. This worked until I had to import HTML files that were created from MS Word. These files have mso styles, which I think are inherited from the original Word document. The outputs do not reflect the new CSS I have created, but rather the mso styles that came along as baggage.
I discovered that I can manually delete those styles from a topic in the developer view. The mso styles usually add anywhere from 1000 to 3000 lines to the HTML code! I think it may be best to clean these HTML files before importing them, because manually deleting all these lines from each topic is very time consuming. Can you recommend the best method/software to clean a Word-generated HTML file? Will the resulting file still be compatible for import into RH 2022?
Note: The mso styles are included in the topic HTML code within the elements shown below (see also attached capture). To make sure my outputs do not show any of these styles, I have to delete these sections from each topic, including the thousands of lines of style definitions from Word. I would like to automate this, or clean the HTML files before bringing them into RH 2022.
/*<![CDATA[*/
/*Font definitions */
....
/*]]>*/
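For illustration, here is roughly the kind of automated cleanup I have in mind, written as a small Python script that would run on the files before import. It is only a sketch under assumptions: it treats any `<style>` block that mentions an `mso-` property or Word's font-definitions comment as Word baggage, it also drops hidden Office conditional comments, and it defaults to UTF-8 (Word often saves as windows-1252, so the `encoding` argument may need changing). The names `strip_word_styles` and `clean_folder` are made up for this example, and I have not confirmed that RH 2022 imports the cleaned files correctly.

```python
import re
from pathlib import Path

# Matches every <style>...</style> element, including multi-line ones.
STYLE_BLOCK = re.compile(r"<style\b[^>]*>.*?</style\s*>", re.IGNORECASE | re.DOTALL)
# Assumption: a style block came from Word if it mentions an mso- property
# or Word's "Font Definitions" comment.
WORD_MARKER = re.compile(r"mso-|font\s+definitions", re.IGNORECASE)
# Hidden Office conditional comments such as <!--[if gte mso 9]>...<![endif]-->.
CONDITIONAL_COMMENT = re.compile(
    r"<!--\[if\b.*?<!\[endif\]-->", re.IGNORECASE | re.DOTALL
)


def strip_word_styles(html: str) -> str:
    """Return html without Word-generated style blocks and conditional comments."""

    def drop_if_word(match: re.Match) -> str:
        block = match.group(0)
        # Keep style blocks that show no sign of Word (for example, hand-written CSS).
        return "" if WORD_MARKER.search(block) else block

    html = CONDITIONAL_COMMENT.sub("", html)
    return STYLE_BLOCK.sub(drop_if_word, html)


def clean_folder(src: Path, dst: Path, encoding: str = "utf-8") -> int:
    """Write cleaned copies of each .htm/.html file in src to dst; return the count."""
    dst.mkdir(parents=True, exist_ok=True)
    count = 0
    for path in sorted(src.iterdir()):
        if not path.is_file() or path.suffix.lower() not in {".htm", ".html"}:
            continue
        text = path.read_text(encoding=encoding)
        (dst / path.name).write_text(strip_word_styles(text), encoding=encoding)
        count += 1
    return count
```

For example, `clean_folder(Path("word_html"), Path("cleaned_html"), encoding="windows-1252")` would write cleaned copies to a second folder and leave the originals untouched; both folder names are placeholders. Inline `mso-` declarations inside `style="..."` attributes are deliberately left alone in this sketch.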
Recommendations appreciated! Thank you
