Bulk convert .DOT to PDF then count occurrences of words
Hi all
I am working on a project that has approximately 2000 Word template (.dot) files. We are re-branding, and I need to identify which documents contain certain words/phrases so that we can scope the work needed to replace the brand name, website addresses etc.
Word has very little built-in functionality for this, so my current plan is to:
a) Bulk convert these files to PDF
b) Use Adobe OCR or similar functionality to identify occurrences of pre-identified phrases
The ideal output is a table such as below.
File Name | Brand XYZ | brandxyz.com | 555 012 345
Doc 1 | 40 | 0 | 4 |
Doc 2 | 6 | 1 | 0 |
Doc 3 | 1 | 1 | 1 |
Doc 4 | 0 | 0 | 1 |
Doc 5 | 6 | 0 | 4 |
Is anybody aware of a way I could do this, please?
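To make the counting step concrete, here is a rough Python sketch of what I have in mind for the first table. It assumes the text of each template has already been extracted into a plain string by some other means (that extraction is the part I don't have), and it treats matches as case-insensitive, which is my assumption. The names count_phrases, build_table and sample_docs are just illustrative, and the sample text is made up.

```python
import re

# Phrases taken from the header of the example table above.
PHRASES = ["Brand XYZ", "brandxyz.com", "555 012 345"]

def count_phrases(text, phrases=PHRASES):
    """Return {phrase: count} using case-insensitive, non-overlapping matches."""
    return {
        p: len(re.findall(re.escape(p), text, flags=re.IGNORECASE))
        for p in phrases
    }

def build_table(docs, phrases=PHRASES):
    """Build a pipe-delimited table; docs maps a file name to its extracted text."""
    lines = ["File Name | " + " | ".join(phrases)]
    for name, text in docs.items():
        counts = count_phrases(text, phrases)
        lines.append(name + " | " + " | ".join(str(counts[p]) for p in phrases))
    return "\n".join(lines)

# Hypothetical sample text, not real project data.
sample_docs = {
    "Doc 1": "Brand XYZ welcomes you. Visit brandxyz.com or call 555 012 345. brand xyz",
    "Doc 2": "No old branding here.",
}
print(build_table(sample_docs))
```

The phrases are escaped with re.escape so that the dot in the web address only matches a literal dot.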
As a secondary goal, I would like to know if there's a way to count instances of all words. The idea here would be to identify occurrences of phone numbers and other data that we are not expecting to see. For example, we might find an old phone number, or letters that name people who no longer work for the company.
File Name | and | this | that
Doc 1 | 90 | 17 | 4 |
Doc 2 | 54 | 8 | 9 |
Doc 3 | 80 | 15 | 7 |
Doc 4 | 24 | 15 | 22 |
Doc 5 | 37 | 2 | 4 |
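For this second table, the sort of thing I'm imagining is sketched below in Python, again assuming the text has already been pulled out of each file. Treating a "word" as a run of letters, digits or apostrophes is my own assumption, and so is the phone pattern, which only covers the three-groups-of-three-digits format used in the first table. Counting individual words would split a number like that into three separate tokens, so the sketch looks for phone-like sequences separately. The function names are illustrative only.

```python
import re
from collections import Counter

def word_counts(text):
    """Count every word in text, ignoring case."""
    # Assumption: a word is a run of letters, digits or apostrophes.
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))

def phone_like(text):
    """Return sequences shaped like '555 012 345' (an assumed format)."""
    return re.findall(r"\b\d{3} \d{3} \d{3}\b", text)

# Hypothetical sample text, not real project data.
sample = "This and that and THIS. Call 555 012 345 or 555 012 999 today."

counts = word_counts(sample)
print(counts["and"], counts["this"], counts["that"])
print(phone_like(sample))
```

Comparing the phone-like matches against a list of numbers we still use would then flag the unexpected ones.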
Thanks, Brendan
