Splitting a Word file into several files

Report · Jun 10, 2021

Hello,

I have a Word file of a book, and I would like to split it automatically by chapter.

The problem is that I don't know where to start 😞

Would it be easier to convert it to PDF first?

Thank you for your ideas.

Regards

Report · Jun 13, 2021

Indeed. If you convert it to PDF beforehand, it will be easy to split it by chapter using ColdFusion's cfpdf.

Report · Jun 14, 2021

Hello,

Thank you for the answer.

But which CFPDF option should I use to do this? (CFPDF does so many things!)

Regards, and thank you in advance.

Report · Jun 14, 2021

You could use action="deletepages". 🙂

In the following example, I extracted the individual chapters from Guy Kawasaki's ebook.

<!--- Determine the chapter start and end. Preferably manually, for accuracy. --->
<cfset chapterStartPage=[15, 43, 81, 102, 119, 159, 193, 212, 239, 270, 285, 298, 318]>
<cfset chapterEndPage=[41, 80, 101, 118, 158, 191, 211, 238, 269, 284, 297, 316, 365]>

<cfset sourcePDF="C:\Users\bkbk\Desktop\kawasaki\The Art of the Start 2.0 - Guy Kawasaki.pdf">
<cfset chapterDestination="C:\Users\bkbk\Desktop\kawasaki\chapters">

<cfloop from="1" to="#arrayLen(chapterStartPage)#" index="i">

    <!---
    For example, chapter 3 corresponds to the third item in the above arrays:
    chapterStartPage[3]=81, chapterEndPage[3]=101.
    To extract chapter 3, delete all pages between 1 and 80 and
    between 102 and the end of the book.
    --->
    <cfset pagesToBeDeleted="1-#chapterStartPage[i]-1#,#chapterEndPage[i]+1#-*">

    <cfpdf
        action = "deletepages"
        pages = "#pagesToBeDeleted#"
        source = "#sourcePDF#"
        overwrite = "yes"
        destination = "#chapterDestination#\chapter #i#.pdf">
</cfloop>
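
The range string built in the loop above assumes that every chapter starts after page 1 and ends before the last page. If a chapter started on page 1, for instance, the string would begin with "1-0". The following is only a sketch of one way to guard against that: pagesOutsideRange is a hypothetical helper, and the total of 365 pages is an assumed value (the real count could be read with cfpdf action="getinfo", whose result struct has a TotalPages key).

```cfml
<cfscript>
// Hypothetical helper: builds the "pages" value for cfpdf action="deletepages"
// so that only the pages startPage..endPage remain in the result.
function pagesOutsideRange(required numeric startPage, required numeric endPage, required numeric totalPages) {
    var ranges = [];
    // Pages before the chapter, if there are any.
    if (startPage > 1) {
        arrayAppend(ranges, "1-" & (startPage - 1));
    }
    // Pages after the chapter, if there are any.
    if (endPage < totalPages) {
        arrayAppend(ranges, (endPage + 1) & "-" & totalPages);
    }
    return arrayToList(ranges);
}

// Chapter 3 of the example: pages 81 to 101; 365 is an assumed page count.
writeOutput(pagesOutsideRange(81, 101, 365));  // 1-80,102-365
</cfscript>
```

The returned string could then be passed as the pages attribute in place of pagesToBeDeleted.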

Report · Jun 17, 2021

@ZNB, does that answer your question?

Report · Jun 24, 2021

Hello,

This is exactly what I was looking for! THANK YOU

However, I still have to find a way to determine the beginning and the end of each chapter automatically.

If I can't find anything else, I thought I would ask the client to insert markers such as $start$ and $end$.

Does PDF put a special mark at the beginning and end of a chapter?

Thanks in advance.

Regards

Report · Jun 26, 2021

Hi @ZNB,

Your expectation is quite reasonable. I also expected that ColdFusion would by now offer a way to extract the chapter metadata of a PDF, especially since ColdFusion and PDF both belong to the Adobe family.

Perhaps there is such a way. If so, I am unaware of it and would be glad to hear about it.

It may indeed be a good idea to ask the client to put $XstartX$ and $XendX$ at the start and end of each chapter, respectively, where X is the chapter number. Using chapter numbers will make the code simpler.

In any case, it is possible to get information about PDF chapters automatically. You could do so by using the iText PDF library that is integrated in ColdFusion to extract the PDF's bookmarks.

In the example below, I extract the bookmarks of the Guy Kawasaki PDF ebook and build from them an array of chapter start pages, as defined in my previous code. What remains is for you to find a way to define the chapter end pages.

<cfset reader = CreateObject("java", "com.lowagie.text.pdf.PdfReader").init("C:\Users\bkbk\Desktop\kawasaki\The Art of the Start 2.0 - Guy Kawasaki.pdf")> 
<cfset simpleBookmark = createObject("java","com.lowagie.text.pdf.SimpleBookmark")> 
<cfset bookmarks = simpleBookmark.getBookmark(reader)> 

<cfif isNull(bookmarks)> 
	 No bookmarks. 
	<cfabort> 
</cfif>

<cfset chapterStartPages = arrayNew(1)>	
	
<cfset iterator = bookmarks.listIterator()>


<cfloop condition="iterator.hasNext()">

	<!--- Each bookmark is a java.util.HashMap, even though cfdump reports it as a struct. --->
	<cfset bookmark = iterator.next()>

	<!--- Debugging: uncomment to see the chapter titles and page numbers, if there are any. --->
	<!---<cfdump var="#bookmark#">--->

	<cfoutput>
		<!--- Child bookmarks, if any, are stored under the key 'Kids'. --->
		<cfif not isNull(bookmark.get('Kids'))>
			<cfloop from="1" to="#arrayLen(bookmark.get('Kids'))#" index="i">
				<cfif bookmark.get('Kids')[i]['Title'] contains "chapter">

					<cfset title = trim(bookmark.get('Kids')[i]['Title'])>
					<!--- 'Page' is a space-delimited value; its first item is the page number. --->
					<cfset pageNumber = listGetAt(trim(bookmark.get('Kids')[i]['Page']),1," ")>

					<p>
						Chapter Title: <strong>#title#</strong> <br>
						Chapter Start-Page: <strong>#pageNumber#</strong>
					</p>

					<cfset arrayAppend(chapterStartPages,pageNumber)>
				</cfif>
			</cfloop>
		</cfif>
	</cfoutput>
</cfloop>

<cfdump var="#chapterStartPages#" label="Chapter Start Pages">

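The output lists the title and start page of each chapter found, followed by a dump of the chapterStartPages array.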

Report · Aug 31, 2021

Hello,

Very interesting answer!

But isn't the end of a chapter simply the start of the next chapter minus 1?

What do you think?

Thanks in advance

Report · Aug 31, 2021

You're right, of course. The question is, to which chapter do the extra pages between the end of chapter 5 and the beginning of chapter 6 belong? To answer that question requires some knowledge of the content and context.

Interesting point. Do you have any ideas on this? Or any preference?

Report · Sep 01, 2021

I do not understand your question!
Where is the problem?
I look forward to your reply.

Report · Sep 01, 2021

...

End: Chapter X

Page + illustration 1 (Which chapter does this belong to?)

Start: Chapter XI

...

Report · Sep 01, 2021

It depends on where the start marker is placed!

Case 1: it belongs to chapter X

Page + illustration 1 (Which chapter does this belong to?)

End: Chapter X

Start: Chapter XI

Case 2: it belongs to chapter XI

End: Chapter X

Start: Chapter XI

Page + illustration 1 (Which chapter does this belong to?)

However, I don't know what "mark" Adobe uses for a page break.

Thanks in advance

Report · Sep 01, 2021

Page number
…	…
567	End: Chapter X
568	Text and illustration (Does this page belong to chapter X or to chapter XI?)
569	Start: Chapter XI
570	…

Report · Sep 01, 2021

Going back to my example:

Case 1: it belongs to chapter X

Page + illustration 1 (Which chapter does this belong to?)

Start: Chapter XI

Case 2: it belongs to chapter XI

Start: Chapter XI

Page + illustration 1 (Which chapter does this belong to?)

There is only the start tag!

Regards

Report · Sep 03, 2021

Hello,

A further point:

If "Page + illustration 1" stands on its own, then it should be treated as a chapter in its own right.

What do you think?

Do you know which tag Adobe uses for a page break?

Thanks in advance

Regards

Report · Sep 04, 2021

Going back to my example:

Case 1: it belongs to chapter X

Page + illustration 1 (Which chapter does this belong to?)

Start: Chapter XI

Case 2: it belongs to chapter XI

Start: Chapter XI

Page + illustration 1 (Which chapter does this belong to?)

By @ZNB

I now understand. In fact, we're both asking the same question.

It is what I meant when I asked:

...to which chapter do the extra pages between the end of chapter 5 and the beginning of chapter 6 belong?

By @BKBK

I then added:

To answer that question requires some knowledge of the content and context.

This means that you have to know whether the content of the extra page belongs to Chapter X or to Chapter XI, which implies that you cannot decide this programmatically; you have to do it manually.

I am sorry that I don't know how to automate PDF page-handling at page level, using ColdFusion. Like you, I don't know whether Adobe has a tag or function to denote PDF page-change. Nevertheless, the iText library might be of help. The iText code I gave earlier corresponds to:

Case 1: it belongs to chapter X

Page + illustration 1 (Which chapter does this belong to?)

Start: Chapter XI
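
Under this first case, where every page before the next chapter's start belongs to the current chapter, the chapter end pages follow directly from the chapter start pages. A minimal sketch, assuming the start pages from the earlier example and a hypothetical total of 365 pages (the actual count is an assumption here and would need to be checked against the PDF):

```cfml
<cfscript>
// Chapter start pages, taken from the earlier deletepages example.
chapterStartPage = [15, 43, 81, 102, 119, 159, 193, 212, 239, 270, 285, 298, 318];

// Assumed total number of pages in the PDF (hypothetical value).
totalPages = 365;

chapterEndPage = [];
chapterCount = arrayLen(chapterStartPage);

for (i = 1; i <= chapterCount; i++) {
    if (i < chapterCount) {
        // Case 1: a chapter ends on the page just before the next chapter starts.
        arrayAppend(chapterEndPage, chapterStartPage[i + 1] - 1);
    } else {
        // The last chapter runs to the end of the document.
        arrayAppend(chapterEndPage, totalPages);
    }
}

writeDump(var=chapterEndPage, label="Chapter End Pages");
</cfscript>
```

For chapter 1 this gives page 42, whereas the manually chosen end page in the earlier example was 41. Page 42 is exactly the kind of in-between page whose ownership cannot be decided without looking at the content.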

Report · Sep 04, 2021

Hello,

Thank you for the answer.

Do you have an e-mail address so that we can discuss this directly?

Regards

jmbusselet@hotmail.com

Report · Sep 04, 2021

There is no need to exchange e-mails. We can discuss by means of this forum's private-messaging. 🙂

In any case, I would advise you to delete your e-mail. You might otherwise get spammed.

Report · Sep 13, 2021

Hello,

I have had another look at your original question. If what you are looking for is a way to distinguish the beginning and the end of a chapter, section or paragraph, then you can simply create your own tags. That is what we do in our team. For paragraphs as well as for sections and chapters.

For example, the following tags are placed at the respective locations:

[*@@paragraph@@*] to denote the beginning of a paragraph;

[/*@@paragraph@@*] to denote the end of a paragraph;

[*@@section@@*] to denote the beginning of a section;

[/*@@section@@*] to denote the end of a section;

[*@@chapter@@*] to denote the beginning of a chapter;

[/*@@chapter@@*] to denote the end of a chapter;

You could refine the tags to your desire. For example, by adding numbers:

[*@@chapter@@*][6] beginning of chapter 6;

[/*@@chapter@@*][6] end of chapter 6.

The tags are in use throughout the life of the application. So we define them in onApplicationStart in Application.cfc, as follows:

Now suppose a client submits a file. With the tags in place, you can:

read the file;
use a regular expression to look for the tags, and so identify where paragraphs, sections and chapters start and end in the content.
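
The two steps above can be sketched as follows. This is only an illustration under stated assumptions: the struct name tags, the variable bookText and the sample text are all hypothetical, and the sample string stands in for the result of a file-read. It uses Java's java.util.regex.Pattern, which is callable from ColdFusion, to find numbered chapter tags.

```cfml
<cfscript>
// Hypothetical tag definitions. In a real application they might be stored once,
// for example as application.tags inside onApplicationStart() in Application.cfc.
tags = {
    chapterStart = "[*@@chapter@@*]",
    chapterEnd   = "[/*@@chapter@@*]"
};

// Sample text standing in for the content read from a submitted file.
bookText = "[*@@chapter@@*][1] First chapter text. [/*@@chapter@@*][1]"
         & "[*@@chapter@@*][2] Second chapter text. [/*@@chapter@@*][2]";

// Pattern.quote() escapes the tags so that they are matched literally.
// (?s) lets "." match line breaks; \1 requires the same chapter number in the end tag.
Pattern = createObject("java", "java.util.regex.Pattern");
regex = "(?s)" & Pattern.quote(tags.chapterStart) & "\[(\d+)\](.*?)"
      & Pattern.quote(tags.chapterEnd) & "\[\1\]";

matcher = Pattern.compile(regex).matcher(bookText);
chapters = [];

while (matcher.find()) {
    // Group 1 is the chapter number, group 2 the text between the tags.
    arrayAppend(chapters, {
        number = matcher.group(javaCast("int", 1)),
        text   = trim(matcher.group(javaCast("int", 2)))
    });
}

writeDump(var=chapters, label="Chapters found");
</cfscript>
```

The same pattern could be built for the paragraph and section tags by swapping in their start and end strings.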

Report · Sep 23, 2021

Hello,

In addition:

This solution has the following advantages:

It is customizable and intuitive. The tags for paragraph contain the name 'paragraph', those for chapter contain 'chapter', and so on.
It is extensible. You can create tags for paragraphs, quotes, images, chapters, sections, and so on.
It is reusable. You define the tags just once, in Application.cfc. You can then have access to them anywhere in the application, for the entire duration of the application.
It is "searchable". The @ characters and square brackets are expressly used to facilitate searching. Thus, you can apply a regular expression to find all the chapters in a given book.

Report · Sep 23, 2021

Hello,

I tried to find an example that uses these tags.

Without success!

Could you tell me where to find one?

Thanks in advance.

Report · Sep 24, 2021

It is unlikely that you will find the tags elsewhere. I created them myself. 🙂

That is the whole idea behind this method: you create tags unique to your own publishing environment.

Report · Sep 24, 2021

So it is your own example?

How can the tags be inserted automatically?

Thanks in advance

Report · Sep 24, 2021

I am beginning to think that we misunderstand each other. Your question suggests that, from your point of view, the author submits a bunch of content and the developer determines where the paragraphs, chapters and sections start or end.

From my point of view, the developer issues the list of start/end tags beforehand to every prospective author. It is then up to an author to place the respective tags at the locations where paragraphs, chapters and sections start or end. Then the developer's publishing software will, after parsing the content, know exactly how to format the entire book.

Report · Sep 25, 2021

No, no, we understand each other very well.

The difference is that I would like the operation to happen automatically.

The authors' level of computer skills is very low 😞

So the more automated it is, the better!

Adobe Community
