Splitting a Word file into several files

Contributor, Jun 10, 2021

Hello,

I have a Word file of a book and would like to split it automatically by chapter.

The problem is that I don't know where to start 😞

Is it easier to convert it to PDF first?

Thank you for your ideas.

Regards

Topics: Advanced techniques, Documentation

Community Expert, Jun 13, 2021

Indeed. If you convert it to PDF beforehand, it will be easy to split it by chapter using ColdFusion's cfpdf.

Contributor, Jun 14, 2021

Hello,

Thank you for the answer.

But which cfpdf option should I use to do this? (cfpdf does so many things!)

Regards, and thank you in advance.

Community Expert, Jun 14, 2021

You could use action="deletepages". 🙂

In the following example, I extracted the individual chapters from Guy Kawasaki's ebook. 

<!--- Determine the chapter start and end. Preferably manually, for accuracy. --->
<cfset chapterStartPage=[15, 43, 81, 102, 119, 159, 193, 212, 239, 270, 285, 298, 318]>
<cfset chapterEndPage=[41, 80, 101, 118, 158, 191, 211, 238, 269, 284, 297, 316, 365]>

<cfset sourcePDF="C:\Users\bkbk\Desktop\kawasaki\The Art of the Start 2.0 - Guy Kawasaki.pdf">
<cfset chapterDestination="C:\Users\bkbk\Desktop\kawasaki\chapters">

<cfloop from="1" to="#arrayLen(chapterStartPage)#" index="i">

    <!---
    For example, chapter 3 corresponds to the third item in the above arrays:
    chapterStartPage[3]=81, chapterEndPage[3]=101.
    To extract chapter 3, delete all pages between 1 and 80 and
    between 102 and the end of the book.
    This assumes at least one page precedes and follows each chapter.
    --->
    <cfset pagesToBeDeleted="1-#chapterStartPage[i]-1#,#chapterEndPage[i]+1#-*">

    <cfpdf
        action = "deletepages"
        pages = "#pagesToBeDeleted#"
        source = "#sourcePDF#"
        overwrite = "yes"
        destination = "#chapterDestination#\chapter #i#.pdf">
</cfloop>

Community Expert, Jun 17, 2021

@ZNB , does that answer your question?

Contributor, Jun 24, 2021

Hello,

This is exactly what I was looking for! THANK YOU

However, I still need a way to determine the beginning and the end of each chapter automatically.

If I can't find anything else, I thought of asking the client to insert markers such as $start$ and $end$.

Does PDF put a special mark at the beginning and end of a chapter?

Thanks in advance.

Regards

Community Expert, Jun 26, 2021 (marked as correct answer)

Hi @ZNB ,

Your expectation is quite reasonable. I also expected ColdFusion would by now offer a way to extract the chapter metadata of a PDF, especially because ColdFusion, like PDF, belongs to the Adobe family.

Perhaps there is such a way. If so, I am unaware of it and would be glad to hear about it.

 

It may indeed be a good idea to ask the client to put $XstartX$ and $XendX$, respectively, at the start and end of a chapter. Here, X is the chapter number. Using chapter numbers will make the code simpler.

 

In any case, it is possible to get information about PDF chapters automatically. You could do so by using the iText PDF library integrated in ColdFusion to extract the PDF's bookmarks.

In the example below, I extract the bookmarks of the Guy Kawasaki PDF ebook. From them, I could construct an array of chapter start pages, as defined in my previous code. What remains is for you to find a way to define the chapter end pages.

 

<cfset reader = createObject("java", "com.lowagie.text.pdf.PdfReader").init("C:\Users\bkbk\Desktop\kawasaki\The Art of the Start 2.0 - Guy Kawasaki.pdf")>
<cfset simpleBookmark = createObject("java", "com.lowagie.text.pdf.SimpleBookmark")>
<cfset bookmarks = simpleBookmark.getBookmark(reader)>

<cfif isNull(bookmarks)>
    No bookmarks.
    <cfabort>
</cfif>

<cfset chapterStartPages = arrayNew(1)>

<cfset iterator = bookmarks.listIterator()>

<cfloop condition="iterator.hasNext()">

    <!--- A HashMap --->
    <cfset bookmark = iterator.next()>

    <!--- Debugging code.
    Shows you an object containing chapter titles and page numbers,
    if there are any.
    ColdFusion will tell you that this object is a struct.
    But it is not; it is a HashMap.
    --->
    <!---<cfdump var="#bookmark#">--->

    <cfoutput>
        <cfif not isNull(bookmark.get('Kids'))>
            <cfloop from="1" to="#arrayLen(bookmark.get('Kids'))#" index="i">
                <cfif bookmark.get('Kids')[i]['Title'] contains "chapter">

                    <cfset title = trim(bookmark.get('Kids')[i]['Title'])>
                    <cfset pageNumber = listGetAt(trim(bookmark.get('Kids')[i]['Page']), 1, " ")>

                    <p>
                        Chapter Title: <strong>#title#</strong> <br>
                        Chapter Start-Page: <strong>#pageNumber#</strong>
                    </p>

                    <cfset arrayAppend(chapterStartPages, pageNumber)>
                </cfif>
            </cfloop>
        </cfif>
    </cfoutput>
</cfloop>

<cfdump var="#chapterStartPages#" label="Chapter Start Pages">

The output is:

[Image: BKBK_0-1624721397256.png]

Contributor, Aug 31, 2021

Hello,

Very interesting answer!

But isn't the end of a chapter simply the start of the next chapter minus 1?

What do you think?

Thanks in advance

Community Expert, Aug 31, 2021

You're right, of course. The question is, to which chapter do the extra pages between the end of chapter 5 and the beginning of chapter 6 belong? To answer that question requires some knowledge of the content and context. 

 

Interesting point. Do you have any ideas on this? Or any preference?
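
For reference, the "next start minus one" rule could be sketched as follows. This is only an illustration under stated assumptions: the start pages are the first three from the earlier example, and totalPages is a hypothetical value (I believe cfpdf action="getinfo" can report the real page count, but check that against the documentation). Any in-between page is simply assigned to the preceding chapter.

```cfml
<!--- First three chapter start pages from the earlier example. --->
<cfset chapterStartPages = [15, 43, 81]>

<!--- Hypothetical total page count, hard-coded for illustration only. --->
<cfset totalPages = 101>

<cfset chapterEndPages = []>
<cfset chapterCount = arrayLen(chapterStartPages)>

<cfloop from="1" to="#chapterCount#" index="i">
    <cfif i lt chapterCount>
        <!--- A chapter ends on the page just before the next chapter starts. --->
        <cfset arrayAppend(chapterEndPages, chapterStartPages[i + 1] - 1)>
    <cfelse>
        <!--- The last chapter runs to the end of the document. --->
        <cfset arrayAppend(chapterEndPages, totalPages)>
    </cfif>
</cfloop>

<cfdump var="#chapterEndPages#" label="Chapter End Pages">
```

With these inputs the rule gives 42 as the end of chapter 1, whereas the manually chosen value earlier was 41. Page 42 is exactly the kind of in-between page in question.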

Contributor, Sep 01, 2021

I don't understand your question!
Where is the problem?
I look forward to your reply.

Community Expert, Sep 01, 2021

...

End: Chapter X

Page + illustration 1 (Which chapter does this belong to?)

Start: Chapter XI

...

Contributor, Sep 01, 2021

It depends on where the start marker is placed!

Case 1: it belongs to chapter X

Page + illustration 1 (Which chapter does this belong to?)

End: Chapter X

Start: Chapter XI

Case 2: it belongs to chapter XI

End: Chapter X

Start: Chapter XI

Page + illustration 1 (Which chapter does this belong to?)

However, I don't know what "mark" Adobe inserts at a page change.

Thanks in advance

Community Expert, Sep 01, 2021

Page number   Content
567           End: Chapter X
568           Text and illustration (Does this page belong to chapter X or to chapter XI?)
569           Start: Chapter XI
570

Contributor, Sep 01, 2021

Going back to my example:

Case 1: it belongs to chapter X

Page + illustration 1 (Which chapter does this belong to?)

Start: Chapter XI

Case 2: it belongs to chapter XI

Start: Chapter XI

Page + illustration 1 (Which chapter does this belong to?)

There is only the start marker!

Regards

Contributor, Sep 03, 2021

Hello,

A further point:

If "Page + illustration 1" stands on its own, then it should be treated as a chapter.

What do you think?

Do you know which marker Adobe inserts at a page change?

Thanks in advance

Regards

Community Expert, Sep 04, 2021

quote

Going back to my example:

Case 1: it belongs to chapter X

Page + illustration 1 (Which chapter does this belong to?)

Start: Chapter XI

Case 2: it belongs to chapter XI

Start: Chapter XI

Page + illustration 1 (Which chapter does this belong to?)

By @ZNB

 

I now understand. In fact, we're both asking the same question.

 

It is what I meant when I asked:

quote

...to which chapter do the extra pages between the end of chapter 5 and the beginning of chapter 6 belong? 


By @BKBK


I then added:

[Image: BKBK_0-1630747735310.png]

To answer that question requires some knowledge of the content and context.

This means that you have to know whether the content of the extra page belongs to Chapter X or to Chapter XI, which implies that you won't be able to do this programmatically; you have to do it manually.

 

I am sorry that I don't know how to automate PDF page-handling at page level, using ColdFusion. Like you, I don't know whether Adobe has a tag or function to denote PDF page-change. Nevertheless, the iText library might be of help. The iText code I gave earlier corresponds to:

 

Case 1: it belongs to chapter X

Page + illustration 1 (Which chapter does this belong to?)

Start: Chapter XI

Contributor, Sep 04, 2021

Hello,

Thank you for the answer.

Do you have an e-mail address so that we can discuss this directly?

Regards

[e-mail address redacted]

Community Expert, Sep 04, 2021

There is no need to exchange e-mails. We can discuss by means of this forum's private-messaging. 🙂

In any case, I would advise you to delete your e-mail address from your post. You might otherwise get spammed.

[Image: BKBK_0-1630766153094.png]

Community Expert, Sep 13, 2021

Hello,

I have had another look at your original question. If what you are looking for is a way to distinguish the beginning and the end of a chapter, section or paragraph, then you can simply create your own tags. That is what we do in our team, for paragraphs as well as for sections and chapters.

 

For example, the following tags are placed at the respective locations:

 

[*@@paragraph@@*] to denote the beginning of a paragraph; 

[/*@@paragraph@@*] to denote the end of a paragraph; 

[*@@section@@*] to denote the beginning of a section; 

[/*@@section@@*] to denote the end of a section; 

[*@@chapter@@*] to denote the beginning of a chapter; 

[/*@@chapter@@*] to denote the end of a chapter; 

 

You could refine the tags to your desire. For example, by adding numbers:

[*@@chapter@@*][6] beginning of chapter 6;

[/*@@chapter@@*][6] end of chapter 6.

 

The tags are in use throughout the life of the application. So we define them in onApplicationStart in Application.cfc, as follows:

 

<cfset application.paragraphStartTag="[*@@paragraph@@*]">

<cfset application.paragraphEndTag="[/*@@paragraph@@*]">

<cfset application.sectionStartTag="[*@@section@@*]">
<cfset application.sectionEndTag="[/*@@section@@*]">

<cfset application.chapterStartTag="[*@@chapter@@*]">
<cfset application.chapterEndTag="[/*@@chapter@@*]">
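
For context, here is a minimal sketch of where such lines could sit in a tag-based Application.cfc. The application name bookPublisher is a hypothetical placeholder, and only the chapter tags are shown to keep it short.

```cfml
<cfcomponent output="false">

    <!--- Hypothetical application name; replace with your own. --->
    <cfset this.name = "bookPublisher">

    <!--- Runs once, when the application starts. --->
    <cffunction name="onApplicationStart" returntype="boolean" output="false">
        <cfset application.chapterStartTag = "[*@@chapter@@*]">
        <cfset application.chapterEndTag = "[/*@@chapter@@*]">
        <!--- Returning true lets the application continue starting up. --->
        <cfreturn true>
    </cffunction>

</cfcomponent>
```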

 

Now suppose a client submits file content. You can then:

  1. read the file;
  2. use a regular expression to look for the tags and so identify where in the content paragraphs, sections, and chapters start or end.
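
The two steps above could be sketched roughly as follows. This is an illustration, not our production code: the sample content is hypothetical and hard-coded (in practice it would come from a file read), and the pattern is escaped by hand because the tags contain the regex metacharacters [, ] and *. The sample is a single line; how the dot treats line breaks should be tested before relying on this for multi-line content.

```cfml
<!--- Tag values as defined earlier; repeated so the sketch is self-contained. --->
<cfset chapterStartTag = "[*@@chapter@@*]">
<cfset chapterEndTag = "[/*@@chapter@@*]">

<!--- Hypothetical sample content; in practice, read it from the submitted file. --->
<cfset content = "Preface. [*@@chapter@@*]Text of chapter one.[/*@@chapter@@*] [*@@chapter@@*]Text of chapter two.[/*@@chapter@@*]">

<!--- Start tag, then as little text as possible (.*?), then the end tag. --->
<cfset chapterPattern = "\[\*@@chapter@@\*\].*?\[/\*@@chapter@@\*\]">
<cfset matches = reMatch(chapterPattern, content)>

<cfset chapters = []>
<cfloop array="#matches#" index="match">
    <!--- Strip the literal start and end tags to keep only the chapter body. --->
    <cfset body = replace(match, chapterStartTag, "", "all")>
    <cfset body = replace(body, chapterEndTag, "", "all")>
    <cfset arrayAppend(chapters, trim(body))>
</cfloop>

<cfdump var="#chapters#" label="Chapter bodies">
```

The same approach should carry over to the paragraph and section tags by swapping the tag name in the pattern.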

Community Expert, Sep 23, 2021

Hello,

In addition, this solution has the following advantages:

  1. It is customizable and intuitive. The tags for paragraph contain the name 'paragraph', those for chapter contain 'chapter', and so on.
  2. It is extensible. You can create tags for paragraphs, quotes, images, chapters, sections, and so on.
  3. It is reusable. You define the tags just once, in Application.cfc. You can then have access to them anywhere in the application, for the entire duration of the application.
  4. It is "searchable". The @ characters and square brackets are expressly used to facilitate searching. Thus, you can apply a regular expression to find all the chapters in a given book.

Contributor, Sep 23, 2021

Hello,

I tried to find an example that uses these tags.

Without success!

Could you tell me where to find one?

Thanks in advance.

Community Expert, Sep 24, 2021

It is unlikely that you will find the tags elsewhere. I created them myself. 🙂

 

That is the whole idea behind this method: you create tags unique to your own publishing environment.

Contributor, Sep 24, 2021

A personal example?

How can the tags be inserted automatically?

Thanks in advance

Community Expert, Sep 24, 2021

I am beginning to think that we misunderstand each other. Your question suggests that, from your point of view, the author submits a bunch of content and the developer determines where the paragraphs, chapters and sections start or end. 

 

From my point of view, the developer issues the list of start/end tags beforehand to every prospective author. It is then up to an author to place the respective tags at the locations where paragraphs, chapters and sections start or end. Then the developer's publishing software will, after parsing the content, know exactly how to format the entire book.

Contributor, Sep 25, 2021

No, no, we understand each other very well.

The difference is that I would like the operation to happen automatically.

Authors' computer skills are very limited 😞

So the more automated it is, the better!!
