I found the (German) Bible at https://info2.sermon-online.com/german/MartinLuther-1912/
There, the PDF version (3'736 KB) is considerably smaller than the TXT version (4'368 KB).
I counted the characters (of the pure text) and found 4'016'646 characters in the TXT version. The PDF version is smaller (only 3.6M). How is that possible?
Thanks for any explanation.
This is the result of text compression.
But how is it done?
Doesn't a PDF need 8 bits per character?
Uncompressed, a PDF needs more space than TXT for the same text, but its contents are then compressed. To oversimplify, parts of the file are ZIP-compressed. ZIP your text file; it will probably be smaller still.
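To see the effect for yourself, here is a minimal Python sketch using the standard `zlib` module, which implements the deflate algorithm that ZIP also uses. The helper name `deflate_ratio` and the sample sentence are made up for illustration. The sample is one sentence repeated many times, so it shrinks far more than real prose would.

```python
import zlib

def deflate_ratio(text: str, level: int = 9) -> float:
    """Return compressed size divided by original size for UTF-8 text."""
    raw = text.encode("utf-8")
    packed = zlib.compress(raw, level)
    return len(packed) / len(raw)

# Hypothetical stand-in for a long text; heavily repetitive on purpose.
sample = "Am Anfang schuf Gott Himmel und Erde. " * 1000

ratio = deflate_ratio(sample)
print(f"compressed size is {ratio:.1%} of the original")
```

To try it on the real file, read the TXT version from disk and pass its contents to `deflate_ratio` instead of `sample`.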
Thanks. ZIP compression is much stronger: zipped, the TXT version (4'368 KB) of the Bible is only 32.5% of its original size.
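As a rough back-of-the-envelope check on the "8 bits per character" question, the figures quoted in this thread can be turned into an average number of bits per stored byte. This is only arithmetic on the two numbers above (4'368 KB and 32.5%); the variable names are made up.

```python
txt_size_kb = 4368      # size of the TXT version, as quoted above
zip_fraction = 0.325    # zipped size as a fraction of the original

zipped_kb = txt_size_kb * zip_fraction
bits_per_byte = 8 * zip_fraction  # average bits left per original byte

print(f"zipped size: about {zipped_kb:.0f} KB")
print(f"average: about {bits_per_byte:.1f} bits per original byte")
```

So after compression, each original 8-bit byte costs on average well under 8 bits of storage.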
Thanks for the hint about "flate-deflate". By the way, about 2% of the characters in the Bible text are multi-byte characters.
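If the 2% figure refers to characters that take more than one byte (for example the German umlauts and ß in UTF-8), a share like that could be counted roughly as follows. This is only a sketch: it assumes the text is UTF-8 encoded, and the function name `multibyte_share` is made up.

```python
def multibyte_share(text: str) -> float:
    """Fraction of characters that need more than one byte in UTF-8."""
    if not text:
        return 0.0
    multi = sum(1 for ch in text if len(ch.encode("utf-8")) > 1)
    return multi / len(text)

sample = "Größe"  # 5 characters; ö and ß take 2 bytes each in UTF-8

print(multibyte_share(sample))                    # 0.4
print(len(sample), len(sample.encode("utf-8")))   # 5 7
```

This is also why a character count and a file size in bytes do not have to match exactly.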
I posted a simplified example.