Hi there.
RH 9 (latest patch)
Windows 7-64-bit
WebHelp output
We have a couple of very large projects (2000+ topics in each) and publishing is taking a long time because even though we have Publish All cleared, it's still sending up all the files.
For example, in one of these large projects, I made changes to two topics, and added an index entry and then generated it and published it. The generation time isn't too bad and is what I would expect.
But the publish time for only changing two topics and an index entry is ridiculous. It's sending up thousands of files. Here are my results:
Total Files: 2519
Files Published: 2179
Elapsed Time: 18:04
How can we improve this? This is happening for me and my two coworkers on our Doc Team. We are publishing to an internal intranet web server that we use for documentation reviews from SMEs. We all share the same publishing location and upload our content the same way: by mapping a drive to the server and publishing using the File System option.
We are also sharing projects using the open-source Mercurial source control system.
Is RH getting confused because multiple authors are publishing to one spot? That is, if I publish to that location and someone else publishes there too, does RH no longer recognize the next day that I've published there before, and so think it has to send all the files again?
Thanks in advance!
Jared
No issue with what you say about the time, but I don't think RH is sending all the files; it just gives that impression.
You will see it running through all the filenames, but it is checking that the local and server copies are synchronised and only uploading what has changed. However, that is more than the two topics. The index has changed, and that affects a number of support files that also have to be uploaded.
Think of it this way: if you uploaded just what has changed using FTP, it would be quicker. If you had to stop and manually compare the server and local versions, it would take longer, and that is what RH is doing.
I'm not sure how source control works here. To the best of my knowledge, the process takes what is there and then works as if it were on your PC; in other words, it updates your copy. I'm not sure if multiple authors publishing, rather than one person doing it, is good practice. Maybe ask about the workflow in the source control forum.
See www.grainge.org for RoboHelp and Authoring tips

Hi Peter. Thanks for responding, but I'm watching the target directory as I type, and it's clearly pushing up all the .htm files. I didn't change these, and they have no relation to the files I did modify. The date modified for the .htm files in the publishing directory is the current date and time, not an older date as it would be if RH were only doing a compare.
I'll do some more testing.
Let us know how that goes.
See www.grainge.org for RoboHelp and Authoring tips

Perhaps try doing a Get Latest before generating and publishing, to be sure all the files are exactly the same as the last checkin? There's a .txt file in the publishing directory (at least there is one on a test project I have) that lists MD5 and SHA1 codes, so perhaps these aren't matching for some reason.
You could also try having only one person doing the publishing for a few days, just to see if that makes a difference.
Amber
Amber thanks for replying. I'm not sure what you mean by MD5 and SHA1 codes. What are those?
They're codes that can be generated against a file and that are supposed to be unique to that specific file. If the file is changed in any way, the code will no longer match. I assume RH is using these codes to determine whether the file has changed since the last upload. So if you all do a Get Latest before publishing, then I'm theorising that the unchanged files will match the codes on the server (rather than perhaps having older versions published from your local drive). But it's only a guess on my part.
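To illustrate what I mean (a generic sketch, not anything RoboHelp actually runs): a hash acts as a fingerprint of a file's bytes, so changing even one character produces a completely different digest.

```python
import hashlib

# Two versions of a topic's contents; the only difference is one added period.
original = b"<html><body>Topic text</body></html>"
edited = b"<html><body>Topic text.</body></html>"

# MD5 of each version; even a tiny edit changes the whole digest.
print(hashlib.md5(original).hexdigest())
print(hashlib.md5(edited).hexdigest())
print(hashlib.md5(original).hexdigest() == hashlib.md5(edited).hexdigest())  # False
```

So a tool that keeps a record of each file's digest can tell, without looking at dates, whether the content itself has changed.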
Amber
(Ah, I like how Wikipedia describes MD5: "also commonly used to check data integrity".)
I bet RH is only looking at the date stamps to figure out what's changed. Doing hashes on each file would suck up a lot of time/processing.
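If it really is just date stamps, the check would be as cheap as something like this (my own illustration, with made-up file names; not RH internals), since it only reads file metadata rather than hashing whole files:

```python
import os
import tempfile


def newer_than_published(local_path, published_path):
    """Treat a file as changed if the local copy was modified after the
    published copy — a cheap check that only reads file metadata."""
    if not os.path.exists(published_path):
        return True  # never published before, so upload it
    return os.path.getmtime(local_path) > os.path.getmtime(published_path)


# Hypothetical demo: two temporary files standing in for the local
# and published copies of a topic.
folder = tempfile.mkdtemp()
local = os.path.join(folder, "local.htm")
published = os.path.join(folder, "published.htm")
for path in (local, published):
    with open(path, "w") as f:
        f.write("<html></html>")
os.utime(published, (1000, 1000))  # published long ago
os.utime(local, (2000, 2000))      # edited more recently
print(newer_than_published(local, published))  # True
```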
Okay. I've seen this before. The bsscftp.txt file. There's one for each folder in my project.
"Perhaps these aren't matching," you said. What are they supposed to match? How do I interpret these files? It looks like there are three lines of text for each file in the project:
Here's a few:
100
FILENAME:Assigning_PC-DMIS_Functions_to_buttons_on_the_SpaceMouse_or_SpaceBall.htm MD5:8466113706989726975102507288529710379995797109686161 SHA-1:541226911911210852100981196582681001225299481005610067821051051007761
FILENAME:Automating_PC_DMIS.htm MD5:115102103871041118712253122491027452102116971039710357666161 SHA-1:1208343487111511678837711856857449847288845183556748115437261
FILENAME:Available_PC_DMIS_Functions_for_SpaceMouse_or_SpaceBall.htm MD5:75717310010089101114848773841117797779911978110120686161 SHA-1:1166612079687482821021071001161161085410149105100681137010384118717861
...
And so on.
As for getting the latest, I assume you're referring to source control? I do try to do a pull of the latest changes (we're using Mercurial, not RoboControl), but I don't know if that makes a difference.
Sorry for the delay, I've been away for a couple of weeks.
If RH were checking these codes, the process would be something like this: RH calculates the MD5 for a file and looks up the number in the .txt file; if the number is the same as the newly calculated one, it doesn't upload; if it's different (because the file has changed), it uploads.
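If that guess is right, the logic would look roughly like this sketch (the function and data names are my own invention, not RoboHelp internals):

```python
import hashlib


def files_to_upload(local_files, manifest):
    """Return the names of files whose current MD5 doesn't match the
    digest recorded in the server-side manifest (bsscftp.txt-style).

    local_files: dict of filename -> file contents (bytes)
    manifest:    dict of filename -> MD5 hex digest from the last publish
    """
    changed = []
    for name, data in local_files.items():
        digest = hashlib.md5(data).hexdigest()
        # Upload if the file is new or its digest differs from the record.
        if manifest.get(name) != digest:
            changed.append(name)
    return changed


# Hypothetical example: only topic2.htm was edited since the last publish.
local = {
    "topic1.htm": b"unchanged text",
    "topic2.htm": b"edited text",
}
manifest = {
    "topic1.htm": hashlib.md5(b"unchanged text").hexdigest(),
    "topic2.htm": hashlib.md5(b"original text").hexdigest(),
}
print(files_to_upload(local, manifest))  # ['topic2.htm']
```

If the authors' local files don't byte-for-byte match what was last published (say, because a different author published last), every digest would mismatch and everything would be re-uploaded, which would fit the behaviour you're seeing.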
Yes, I meant getting the latest version from source control.
The last suggestion is designating one person to do all publishing jobs for a few days, to see if that makes a difference. It might not be workable longer term, but at least it might indicate whether the problem lies in a single file that stores a list of "last changed" topics.
Amber
It does make a difference to have just one person publishing. Lately, I've been the one pulling the other authors' changes and publishing when they need it. In that case, the publish part of generation only puts up the newly changed/added topics (along with a bunch of files it creates on the fly to handle searching, the TOC, the index, etc., but that's "normal").