Dear Community
I found the (German) Bible at https://info2.sermon-online.com/german/MartinLuther-1912/
There, the PDF version (3'736 KB) is much smaller than the TXT version (4'368 KB):
Martin_Luther_Uebersetzung_1912.pdf | 2016-03-15 08:09 | 3.6M
Martin_Luther_Uebersetzung_1912.txt | 2014-02-03 00:06 | 4.2M
I counted the characters (of the pure text) and found 4'016'646 characters in the TXT version. The PDF version is smaller (only 3.6M). How is this possible?
Thanks for any explanation.
Bruno Meier
This is the result of text compression.
Of course!
But how is it done?
Doesn't PDF need 8 bits per character?
A PDF uses more space than TXT, but then it is compressed. To oversimplify, parts of the file are ZIP-compressed. ZIP your text file; it will probably be smaller still.
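To get a feel for the effect, here is a minimal sketch in Python. It uses the standard `zlib` module, which implements the same deflate method that ZIP commonly uses. The helper name `deflate_ratio` and the sample sentence are only assumptions for illustration; this is not the actual Bible file, and real prose will compress far less than a single repeated sentence.

```python
import zlib

def deflate_ratio(data: bytes, level: int = 9) -> float:
    """Return compressed size divided by original size (hypothetical helper)."""
    return len(zlib.compress(data, level)) / len(data)

# Illustrative sample only: one sentence repeated many times,
# stored as plain bytes (roughly 8 bits per character).
sample = ("Am Anfang schuf Gott Himmel und Erde. " * 200).encode("utf-8")

ratio = deflate_ratio(sample)
print(f"{len(sample)} bytes compress to {ratio:.1%} of the original size")
```

To try it on the real file, read `Martin_Luther_Uebersetzung_1912.txt` in binary mode and pass the bytes to the same function.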
Thanks. ZIP compression is much stronger. Zipped, the TXT version (4'368 KB) of the Bible takes up only 32.5% of its original size.
Read this:
https://www.prepressure.com/library/compression-algorithm/flate-deflate
A simple example:
When there are 5 characters "a", the algorithm stores only one "a".
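The simple example can be sketched as a toy run-length encoder. This is not what deflate does internally. Deflate combines back-references to earlier repeated sequences (LZ77) with Huffman coding, so it also shrinks repeated words and phrases that contain no doubled letters. The function name `run_length_encode` and the sample phrase are assumptions for illustration.

```python
from itertools import groupby
import zlib

def run_length_encode(text: str) -> list[tuple[str, int]]:
    """Toy run-length encoding: collapse each run of equal characters."""
    return [(ch, len(list(group))) for ch, group in groupby(text)]

# Five "a" characters collapse to a single entry with a count.
print(run_length_encode("aaaaab"))

# A phrase with no doubled letters gains nothing from run-length encoding...
phrase = "und es ward Licht. "
runs = run_length_encode(phrase)

# ...but deflate still shrinks it when the phrase itself repeats.
repeated = (phrase * 100).encode("utf-8")
compressed = zlib.compress(repeated)
print(len(repeated), "bytes ->", len(compressed), "bytes")
```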
Thanks for the pointer to the flate/deflate article. By the way: some 2% of the characters in the Bible text are repeated characters.
Kind regards
Bruno Meier
I posted a simplified example.