Hello,
I have a Word file of a book, and I would like to split it automatically by chapter.
The problem is that I don't know where to start 😞
Would it be easier to convert it to PDF first?
Thank you for your ideas.
Regards
Indeed. If you convert it to PDF beforehand, it will be easy to split it by chapter using ColdFusion's cfpdf.
Hello,
Thank you for the answer.
But which CFPDF option should I use to do this? (CFPDF does so many things!)
Regards, and thank you in advance.
You could use action="deletepages". 🙂
In the following example, I extracted the individual chapters from Guy Kawasaki's ebook.
<!--- Determine the chapter start and end. Preferably manually, for accuracy. --->
<cfset chapterStartPage=[15, 43, 81, 102, 119, 159, 193, 212, 239, 270, 285, 298, 318]>
<cfset chapterEndPage=[41, 80, 101, 118, 158, 191, 211, 238, 269, 284, 297, 316, 365]>
<cfset sourcePDF="C:\Users\bkbk\Desktop\kawasaki\The Art of the Start 2.0 - Guy Kawasaki.pdf">
<cfset chapterDestination="C:\Users\bkbk\Desktop\kawasaki\chapters">
<cfloop from="1" to="#arrayLen(chapterStartPage)#" index="i">
    <!---
        For example, chapter 3 corresponds to the third item in the above arrays:
        chapterStartPage[3]=81, chapterEndPage[3]=101.
        To extract chapter 3, delete all pages between 1 and 80 and
        between 102 and the end of the book.
    --->
    <cfset pagesToBeDeleted="1-#chapterStartPage[i]-1#,#chapterEndPage[i]+1#-*">
    <cfpdf
        action = "deletepages"
        pages = "#pagesToBeDeleted#"
        source = "#sourcePDF#"
        overwrite = "yes"
        destination = "#chapterDestination#\chapter #i#.pdf">
</cfloop>
@ZNB , does that answer your question?
Hello,
This is exactly what I was looking for! THANK YOU
However, I still need to find a way to determine the beginning and the end of each chapter automatically.
If I can't find anything else, I thought I would ask the client to insert markers such as $start$ and $end$ to identify them.
Does PDF put a special mark at the beginning and end of a chapter?
Thanks in advance.
Regards
Hi @ZNB ,
Your expectation is quite reasonable. I also expected that ColdFusion would by now offer a way to extract the chapter metadata of a PDF, especially because ColdFusion, like PDF, belongs to the Adobe family.
Perhaps there is such a way. If there is, I am unaware of it and would be glad to hear about it.
It may indeed be a good idea to ask the client to put $XstartX$ and $XendX$, respectively, at the start and end of a chapter. Here, X is the chapter number. Using chapter numbers will make the code simpler.
In any case, it is possible to get information about PDF chapters automatically. You could do so by using the iText PDF library integrated in ColdFusion to extract the PDF bookmark.
In the example below, I extract the bookmark of the Guy Kawasaki PDF ebook. From it, I could construct an array of chapter-start-pages, as defined in my previous code. What remains is for you to find a way to define the chapter-end-pages.
<cfset reader = CreateObject("java", "com.lowagie.text.pdf.PdfReader").init("C:\Users\bkbk\Desktop\kawasaki\The Art of the Start 2.0 - Guy Kawasaki.pdf")>
<cfset simpleBookmark = createObject("java","com.lowagie.text.pdf.SimpleBookmark")>
<cfset bookmarks = simpleBookmark.getBookmark(reader)>
<cfif isNull(bookmarks)>
    No bookmarks.
    <cfabort>
</cfif>
<cfset chapterStartPages = arrayNew(1)>
<cfset iterator = bookmarks.listIterator()>
<cfloop condition="iterator.hasNext()">
    <!--- A HashMap --->
    <cfset bookmark = iterator.next()>
    <!--- Debugging code.
        Shows you an object containing chapter titles and page numbers,
        if there are any.
        ColdFusion will tell you that this object is a struct.
        But it is not; it is a HashMap.
    --->
    <!---<cfdump var="#bookmark#">--->
    <cfoutput>
        <cfif not isNull(bookmark.get('Kids'))>
            <cfloop from="1" to="#arrayLen(bookmark.get('Kids'))#" index="i">
                <cfif bookmark.get('Kids')[i]['Title'] contains "chapter">
                    <cfset title = trim(bookmark.get('Kids')[i]['Title'])>
                    <cfset pageNumber = listGetAt(trim(bookmark.get('Kids')[i]['Page']),1," ")>
                    <p>
                        Chapter Title: <strong>#title#</strong> <br>
                        Chapter Start-Page: <strong>#pageNumber#</strong>
                    </p>
                    <cfset arrayAppend(chapterStartPages,pageNumber)>
                </cfif>
            </cfloop>
        </cfif>
    </cfoutput>
</cfloop>
<cfdump var="#chapterStartPages#" label="Chapter Start Pages">
Hello,
Very interesting answer!
But isn't the end of a chapter simply the start of the next chapter minus 1?
What do you think?
Thanks in advance
You're right, of course. The question is, to which chapter do the extra pages between the end of chapter 5 and the beginning of chapter 6 belong? To answer that question requires some knowledge of the content and context.
Interesting point. Do you have any ideas on this? Or any preference?
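For reference, here is a minimal sketch of the "start of the next chapter minus 1" rule. It assumes that every in-between page is assigned to the chapter that precedes it. The start pages are the ones from the manual example earlier in the thread. The names chapterEndPages and totalPages are hypothetical, and 365 is only a stand-in for the real page count of the document (which could probably be read with reader.getNumberOfPages() on the iText PdfReader).

```cfml
<!--- Chapter start pages, in ascending order (taken from the manual example above) --->
<cfset chapterStartPages = [15, 43, 81, 102, 119, 159, 193, 212, 239, 270, 285, 298, 318]>
<!--- Assumption: stand-in for the document's total page count --->
<cfset totalPages = 365>
<cfset chapterEndPages = []>
<cfloop from="1" to="#arrayLen(chapterStartPages)#" index="i">
    <cfif i LT arrayLen(chapterStartPages)>
        <!--- End of chapter i = start of chapter i+1, minus 1 --->
        <cfset arrayAppend(chapterEndPages, chapterStartPages[i+1] - 1)>
    <cfelse>
        <!--- Assumption: the last chapter runs to the end of the document --->
        <cfset arrayAppend(chapterEndPages, totalPages)>
    </cfif>
</cfloop>
<cfdump var="#chapterEndPages#" label="Chapter End Pages">
```

With these values the rule gives 42 as the end of chapter 1, whereas the manually determined list has 41. Page 42 is exactly the kind of in-between page the question above is about. Note also that if the last chapter really ends on the final page, the "#chapterEndPage[i]+1#-*" part of the deletepages range would point past the end of the document, so that case may need separate handling.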
I don't understand your question!
Where is the problem?
I look forward to your reply.
...
End: Chapter X
Page + illustration 1 (Which chapter does this belong to?)
Start: Chapter XI
...
It depends on where the start is positioned!
Case 1: it belongs to chapter X
Page + illustration 1 (Which chapter does this belong to?)
End: Chapter X
Start: Chapter XI
Case 2: it belongs to chapter XI
End: Chapter X
Start: Chapter XI
Page + illustration 1 (Which chapter does this belong to?)
However, I don't know what "mark" Adobe inserts at a page change.
Thanks in advance
Page number |
… | …
567 | End: Chapter X
568 | Text and illustration (Does this page belong to chapter X or to chapter XI?)
569 | Start: Chapter XI
570 | …
Back to my example:
Case 1: it belongs to chapter X
Page + illustration 1 (Which chapter does this belong to?)
Start: Chapter XI
Case 2: it belongs to chapter XI
Start: Chapter XI
Page + illustration 1 (Which chapter does this belong to?)
There is only the start tag!
Regards
Hello,
An addition:
If "Page + illustration 1" stands on its own, then it should be treated as a chapter.
What do you think?
Do you know which tag Adobe inserts at a page change?
Thanks in advance
Regards
Back to my example:
Case 1: it belongs to chapter X
Page + illustration 1 (Which chapter does this belong to?)
Start: Chapter XI
Case 2: it belongs to chapter XI
Start: Chapter XI
Page + illustration 1 (Which chapter does this belong to?)
By @ZNB
I now understand. In fact, we're both asking the same question.
It is what I meant when I asked:
...to which chapter do the extra pages between the end of chapter 5 and the beginning of chapter 6 belong?
By @BKBK
I then added:
To answer that question requires some knowledge of the content and context.
This means that you have to know whether the content of the extra page belongs to Chapter X or to Chapter XI, which implies that you won't be able to do this programmatically; you have to do it manually.
I am sorry that I don't know how to automate PDF page-handling at page level, using ColdFusion. Like you, I don't know whether Adobe has a tag or function to denote PDF page-change. Nevertheless, the iText library might be of help. The iText code I gave earlier corresponds to:
Case 1: it belongs to chapter X
Page + illustration 1 (Which chapter does this belong to?)
Start: Chapter XI
Hello,
Thank you for the answer.
Do you have an e-mail address so that we can discuss this directly?
Regards
jmbusselet@hotmail.com
There is no need to exchange e-mails. We can discuss by means of this forum's private-messaging. 🙂
In any case, I would advise you to delete your e-mail. You might otherwise get spammed.
Bonjour,
I have had another look at your original question. If what you are looking for is a way to distinguish the beginning and the end of a chapter, section or paragraph, then you can simply create your own tags. That is what we do in our team. For paragraphs as well as for sections and chapters.
For example, the following tags are placed at the respective locations:
[*@@paragraph@@*] to denote the beginning of a paragraph;
[/*@@paragraph@@*] to denote the end of a paragraph;
[*@@section@@*] to denote the beginning of a section;
[/*@@section@@*] to denote the end of a section;
[*@@chapter@@*] to denote the beginning of a chapter;
[/*@@chapter@@*] to denote the end of a chapter;
You could refine the tags as you wish, for example by adding numbers:
[*@@chapter@@*][6] beginning of chapter 6;
[/*@@chapter@@*][6] end of chapter 6.
The tags are in use throughout the life of the application. So we define them in onApplicationStart in Application.cfc, as follows:
<cfset application.paragraphStartTag="[*@@paragraph@@*]">
<cfset application.paragraphEndTag="[/*@@paragraph@@*]">
<cfset application.sectionStartTag="[*@@section@@*]">
<cfset application.sectionEndTag="[/*@@section@@*]">
<cfset application.chapterStartTag="[*@@chapter@@*]">
<cfset application.chapterEndTag="[/*@@chapter@@*]">
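As a rough illustration of how such tags might be used once content arrives, the sketch below collects the text between each pair of chapter tags. It is only an assumption about one possible parsing approach, not the team's actual code. The names submittedText and chapters and the sample text are hypothetical, and local variables stand in for the application-scoped tags so that the snippet runs on its own.

```cfml
<!--- Local copies of the tags defined above --->
<cfset chapterStartTag = "[*@@chapter@@*]">
<cfset chapterEndTag = "[/*@@chapter@@*]">
<!--- Hypothetical submitted content with two tagged chapters --->
<cfset submittedText = "[*@@chapter@@*]First chapter text.[/*@@chapter@@*][*@@chapter@@*]Second chapter text.[/*@@chapter@@*]">
<cfset chapters = []>
<cfset pos = 1>
<cfloop condition="true">
    <!--- Look for the next start tag; stop when there is none left --->
    <cfset startPos = find(chapterStartTag, submittedText, pos)>
    <cfif startPos EQ 0><cfbreak></cfif>
    <cfset textStart = startPos + len(chapterStartTag)>
    <!--- Look for the matching end tag; stop if the chapter is not closed --->
    <cfset endPos = find(chapterEndTag, submittedText, textStart)>
    <cfif endPos EQ 0><cfbreak></cfif>
    <!--- Keep the text between the two tags --->
    <cfset arrayAppend(chapters, mid(submittedText, textStart, endPos - textStart))>
    <cfset pos = endPos + len(chapterEndTag)>
</cfloop>
<cfdump var="#chapters#" label="Chapters">
```

The sketch uses plain find() rather than a regular expression, because the tags contain characters such as [ and * that would otherwise need escaping.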
Now suppose a client submits file content. In this way, you can
Bonjour,
In addition:
This solution has the following advantages:
Hello,
I tried to find an example that uses these tags.
No luck!
Could you tell me where to find one?
Thanks in advance.
It is unlikely that you will find the tags elsewhere. I created them myself. 🙂
That is the whole idea behind this method: you create tags unique to your own publishing environment.
Is this your own example?
How can the tags be inserted automatically?
Thanks in advance
I am beginning to think that we misunderstand each other. Your question suggests that, from your point of view, the author submits a bunch of content and the developer determines where the paragraphs, chapters and sections start or end.
From my point of view, the developer issues the list of start/end tags beforehand to every prospective author. It is then up to an author to place the respective tags at the locations where paragraphs, chapters and sections start or end. Then the developer's publishing software will, after parsing the content, know exactly how to format the entire book.
No, no, we understand each other very well.
The difference is that I would like the operation to be done automatically.
The authors' level of computer skills is very low 😞
So the more it is automated, the better!!