Hello everyone,
I have the following problem:
- I connect to a web server via a socket connection from InDesign in order to read a web page.
- the document I read is a PHP page
- the content I read is UTF-8 encoded
- unfortunately, the UTF-8 (non-ASCII) characters are dropped from my result
--> WHY? Any idea is welcome!
Here are the scripts I'm using:
First, the PHP script on the server:
<?php
header("Content-Type: text/html; charset=utf-8");
echo "Motörhead";
?>
Now the ExtendScript:
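// Fetch test.php from the local web server over a raw HTTP/1.0 socket
var Result = "";
var Connection = new Socket();
if (Connection.open("localhost:80")) {
    // HTTP specifies CRLF line endings; the empty line ends the request
    Connection.write("GET /test.php HTTP/1.0\r\n\r\n");
    Result = Connection.read(999999);
    Connection.close();
}
// Put the raw response (headers and body) into a new text frame
var myDocument = app.documents.add();
var myTextFrame = myDocument.pages.item(0).textFrames.add();
myTextFrame.geometricBounds = ["0mm", "0mm", "100mm", "100mm"];
myTextFrame.contents = Result;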
And finally the result in my layout document:
HTTP/1.1 200 OK
Date: Wed, 30 Dec 2009 09:34:32 GMT
Server: Apache/2.2.12 (Win32) DAV/2 mod_ssl/2.2.12 OpenSSL/0.9.8k mod_autoindex_color PHP/5.3.0 mod_perl/2.0.4 Perl/v5.10.0
X-Powered-By: PHP/5.3.0
Content-Length: 9
Connection: close
Content-Type: text/html; charset=utf-8
Motrhead
As you can see, the character "ö" in the last line has been swallowed.
What's wrong here?
Thanks for your help in advance,
Dr. TYPO
I do not know PHP, but I think something is wrong on that side. Apparently, setting the HTTP header to UTF-8 (plus wishful thinking) does not change the encoding PHP actually outputs. Besides, if you are not delivering HTML, you should use the correct MIME type, text/plain.
Motörhead is 9 characters. Your reported Content-Length is also 9 bytes, while in UTF-8 the umlaut ö takes 2 bytes (C3 B6), so a UTF-8 response would be 10 bytes long. That points to a single-byte encoding such as ISO-8859-1, where ö is the single byte F6.
Try the snippet below; alternatively, make sure that your PHP source file itself is saved as UTF-8.
<?php
header("Content-Type: text/plain; charset=utf-8");
// utf8_encode() converts an ISO-8859-1 string to UTF-8 (deprecated since PHP 8.2)
echo utf8_encode("Motörhead");
?>
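To make the byte count concrete, the following sketch in plain ES3-style JavaScript (so it should also run under ExtendScript, though that has not been tested) encodes "Motörhead" both as UTF-8 and as ISO-8859-1. The helper names utf8Bytes, latin1Bytes and toHex are hypothetical, and the UTF-8 encoder only covers characters in the Basic Multilingual Plane. The string is written with the \u00F6 escape so the result does not depend on how the script file itself is saved.

```javascript
// Encodes a string into UTF-8 bytes.
// Simplification: BMP characters only, surrogate pairs are not combined.
function utf8Bytes(str) {
    var bytes = [];
    for (var i = 0; i < str.length; i++) {
        var c = str.charCodeAt(i);
        if (c < 0x80) {
            bytes.push(c);                                   // 1 byte: ASCII
        } else if (c < 0x800) {
            bytes.push(0xC0 | (c >> 6), 0x80 | (c & 0x3F));  // 2 bytes, e.g. U+00F6
        } else {
            bytes.push(0xE0 | (c >> 12), 0x80 | ((c >> 6) & 0x3F), 0x80 | (c & 0x3F)); // 3 bytes
        }
    }
    return bytes;
}

// Encodes a string into ISO-8859-1: one byte per character, codes 0-255 only.
function latin1Bytes(str) {
    var bytes = [];
    for (var i = 0; i < str.length; i++) {
        var c = str.charCodeAt(i);
        if (c > 0xFF) {
            throw new Error("character not representable in ISO-8859-1");
        }
        bytes.push(c);
    }
    return bytes;
}

// Formats bytes as upper-case hex pairs, similar to a hexdump line.
function toHex(bytes) {
    var out = [];
    for (var i = 0; i < bytes.length; i++) {
        var h = bytes[i].toString(16).toUpperCase();
        out.push(h.length < 2 ? "0" + h : h);
    }
    return out.join(" ");
}
```

utf8Bytes("Mot\u00F6rhead").length is 10 and latin1Bytes("Mot\u00F6rhead").length is 9; toHex(utf8Bytes("\u00F6")) gives "C3 B6", the bytes a hexdump of a correct UTF-8 response would show, while the ISO-8859-1 form is the single byte "F6".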
http://en.wikipedia.org/wiki/MIME
http://php.net/manual/en/function.utf8-encode.php
http://linux.die.net/man/1/hexdump
Dirk
Hello Dirk,
thanks for the hint.
This does the trick:
In the PHP script, the UTF-8-encoded string must be output URL-encoded; otherwise the receiving ExtendScript would interpret it as UTF-8 a second time:
echo urlencode(utf8_encode("Motörhead"));
Then decode the content in ExtendScript, e.g.:
alert (decodeURI(Result));
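decodeURI() is enough for "Motörhead", whose only escape is %C3%B6, but it deliberately leaves escapes of reserved characters such as %2F (/) or %26 (&) undecoded, and PHP's urlencode() sends spaces as "+", which neither decodeURI() nor decodeURIComponent() turns back into a space. A more general sketch follows, assuming it is applied to the response body only (the HTTP headers in Result are not URL-encoded); the name decodePhpUrlencoded is hypothetical.

```javascript
// Reverses PHP's urlencode(): spaces arrive as "+", and every other
// non-alphanumeric byte (including the UTF-8 bytes of "ö") as a %XX escape.
function decodePhpUrlencoded(s) {
    // Turn "+" into spaces first; a literal "+" was sent as %2B, so it survives.
    return decodeURIComponent(s.replace(/\+/g, " "));
}
```

Using decodeURIComponent() rather than decodeURI() is the design choice here: it decodes every %XX escape, which matches what urlencode() produces.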
The second wrapper (urlencode) should not be needed if your web site is supposed to deliver UTF-8.
Instead, you would add a conversion step within ExtendScript.
For example, use one of the JavaScript UTF-8 decoders available on the web, or write the raw bytes out to a file and then set the File.encoding property before reading it back.
File.encoding is described in the JavaScript Tools Guide (see the ESTK Help menu).
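As a rough illustration of such a conversion step (not one of the decoders mentioned above), here is a minimal UTF-8 decoder in ES3-style JavaScript. It assumes the response was read as a "binary string" with one character per byte (for example with the socket's encoding set to "BINARY", an assumption based on the JavaScript Tools Guide), and it only handles 1- to 3-byte sequences. The name utf8Decode is hypothetical.

```javascript
// Decodes a "binary string" (one character per byte, codes 0-255) as UTF-8.
// Handles 1- to 3-byte sequences (Basic Multilingual Plane only). Continuation
// bytes are not validated; invalid lead bytes and truncated sequences become U+FFFD.
function utf8Decode(bin) {
    var out = "", i = 0, b0, b1, b2;
    while (i < bin.length) {
        b0 = bin.charCodeAt(i);
        if (b0 < 0x80) {
            // 0xxxxxxx: one-byte (ASCII) character
            out += String.fromCharCode(b0);
            i += 1;
        } else if (b0 >= 0xC0 && b0 < 0xE0 && i + 1 < bin.length) {
            // 110xxxxx 10xxxxxx: two-byte sequence, e.g. C3 B6 -> U+00F6
            b1 = bin.charCodeAt(i + 1);
            out += String.fromCharCode(((b0 & 0x1F) << 6) | (b1 & 0x3F));
            i += 2;
        } else if (b0 >= 0xE0 && b0 < 0xF0 && i + 2 < bin.length) {
            // 1110xxxx 10xxxxxx 10xxxxxx: three-byte sequence
            b1 = bin.charCodeAt(i + 1);
            b2 = bin.charCodeAt(i + 2);
            out += String.fromCharCode(((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F));
            i += 3;
        } else {
            // Stray continuation byte, truncated sequence or 4-byte lead byte
            out += "\uFFFD";
            i += 1;
        }
    }
    return out;
}
```

With the bytes from this thread, utf8Decode("Mot\u00C3\u00B6rhead") returns "Motörhead", while the lone ISO-8859-1 byte F6 from the original PHP output comes back as the replacement character U+FFFD.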
Dirk
Ah, I always forget that Socket also has its own encoding property. Just make sure it is set correctly.
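A sketch of the original request with the socket's encoding set, assuming "UTF-8" is an accepted value for Socket.encoding as described in the JavaScript Tools Guide (not checked here against a specific InDesign version). The function name httpGetUtf8 is hypothetical; the socket is passed in so the logic can also be exercised outside InDesign with a stand-in object.

```javascript
// Reads a page over a raw HTTP/1.0 connection and lets the socket decode the
// incoming bytes as UTF-8. "socket" is expected to be an ExtendScript Socket
// (new Socket()); any object with open/write/read/close and an encoding property works.
function httpGetUtf8(socket, host, path) {
    var result = "";
    if (socket.open(host)) {
        // Assumption: setting encoding after open() and before read() is sufficient
        socket.encoding = "UTF-8";
        socket.write("GET " + path + " HTTP/1.0\r\n\r\n");
        result = socket.read(999999);
        socket.close();
    }
    return result;
}

// Usage in ExtendScript:
// var Result = httpGetUtf8(new Socket(), "localhost:80", "/test.php");
```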
Dirk
https://forums.adobe.com/people/Dirk+Becker wrote
Ah, I always forget that Socket also has its own .encoding variable. Just make sure it is set correctly.
Dirk
OMG! Thank you! Sanity restored...
Ariel