
socket connection: no utf-8 characters?

Community Beginner,
Dec 30, 2009

Hello everyone,

I have the following problem:

- I connect to a webserver via socket connection from InDesign in order to read the website.

- the document that I read is a php page

- the content I read is UTF-8 encoded

- unfortunately the utf-8 characters are truncated in my result

--> WHY? Any idea is welcome!

Here are the scripts which I'm using:

First the php script on the server:

<?php
header("Content-Type: text/html; charset=utf-8");
echo "Motörhead";
?>

Now the ExtendScript:

var Result = "";
var Connection = new Socket();

if (Connection.open("localhost:80")) {
    // HTTP header lines end with CRLF; an empty line terminates the request.
    Connection.write("GET /test.php HTTP/1.0\r\n\r\n");
    Result = Connection.read(999999);
    Connection.close();
}

var myDocument = app.documents.add();
var myTextFrame = myDocument.pages.item(0).textFrames.add();
myTextFrame.geometricBounds = ["0mm", "0mm", "100mm", "100mm"];
myTextFrame.contents = Result;

And finally the result in my layout document:

HTTP/1.1 200 OK
Date: Wed, 30 Dec 2009 09:34:32 GMT
Server: Apache/2.2.12 (Win32) DAV/2 mod_ssl/2.2.12 OpenSSL/0.9.8k mod_autoindex_color PHP/5.3.0 mod_perl/2.0.4 Perl/v5.10.0
X-Powered-By: PHP/5.3.0
Content-Length: 9
Connection: close
Content-Type: text/html; charset=utf-8

Motrhead

As you can see, the character "ö" in the last line has been swallowed.

What's wrong here?

Thanks for your help in advance,

Dr. TYPO

TOPICS
Scripting
Mentor,
Dec 30, 2009

I do not know PHP, but I think there is something wrong at that side. Apparently setting the HTTP header to utf-8 plus wishful thinking does not change the applied output encoding of PHP. Besides, if you do not deliver HTML you should also use the correct MIME type text/plain.

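Motörhead is 9 characters. Your reported content length is also 9 bytes, while in UTF-8 the ö umlaut is encoded as 2 bytes (0xC3 0xB6), so a UTF-8 body would be 10 bytes. A length of 9 bytes points to a single-byte encoding such as ISO-8859-1, where ö is the single byte 0xF6.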
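To check that byte count without a hex dump, here is a small sketch in plain JavaScript (ES3-style, so it should run in ExtendScript as well as in Node). The helper names `utf8Bytes` and `toHex` are hypothetical; the code relies only on the standard `encodeURIComponent` and `unescape` functions to obtain the UTF-8 bytes of a string.

```javascript
// Hypothetical helper: return the UTF-8 encoding of a string as an array of byte values.
function utf8Bytes(str) {
    // encodeURIComponent emits one %XX escape per UTF-8 byte (for non-ASCII chars);
    // unescape turns every %XX back into one character whose code is that byte.
    var binary = unescape(encodeURIComponent(str));
    var bytes = [];
    for (var i = 0; i < binary.length; i++) {
        bytes.push(binary.charCodeAt(i));
    }
    return bytes;
}

// Format byte values as two-digit uppercase hex, like a hex dump.
function toHex(bytes) {
    var parts = [];
    for (var i = 0; i < bytes.length; i++) {
        var h = bytes[i].toString(16).toUpperCase();
        parts.push(h.length < 2 ? "0" + h : h);
    }
    return parts.join(" ");
}

var word = "Mot\u00F6rhead";   // "Motörhead"
var bytes = utf8Bytes(word);
// word.length is 9 (characters), bytes.length is 10 (UTF-8 bytes)
// toHex(bytes) is "4D 6F 74 C3 B6 72 68 65 61 64"
```

So a correctly UTF-8 encoded response would have to report `Content-Length: 10`.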

Try the snippet below. Alternatively, make sure that your PHP source document itself is saved as UTF-8.

<?php
header("Content-Type: text/plain; charset=utf-8");
echo utf8_encode("Motörhead");
?>
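PHP's `utf8_encode()` treats its input as ISO-8859-1 and re-encodes each byte as UTF-8. The JavaScript sketch below (hypothetical helper `latin1ToUtf8`, working on arrays of byte values) mimics that mapping to show why the ö grows from one byte to two; it is an illustration of the conversion, not PHP's actual implementation.

```javascript
// Hypothetical helper mirroring the idea of utf8_encode(): treat every input
// byte as an ISO-8859-1 code point (0-255) and emit its UTF-8 encoding.
function latin1ToUtf8(bytes) {
    var out = [];
    for (var i = 0; i < bytes.length; i++) {
        var b = bytes[i];
        if (b < 0x80) {
            out.push(b);                   // ASCII passes through unchanged
        } else {
            out.push(0xC0 | (b >> 6));     // lead byte 110xxxxx
            out.push(0x80 | (b & 0x3F));   // continuation byte 10xxxxxx
        }
    }
    return out;
}

// "Motörhead" in ISO-8859-1: 9 bytes, with ö = 0xF6
var latin1 = [0x4D, 0x6F, 0x74, 0xF6, 0x72, 0x68, 0x65, 0x61, 0x64];
var utf8 = latin1ToUtf8(latin1);
// utf8 has 10 bytes; 0xF6 became 0xC3 0xB6
```

Note that `utf8_encode()` is deprecated as of PHP 8.2; `mb_convert_encoding($s, 'UTF-8', 'ISO-8859-1')` is the usual replacement for the same conversion.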

http://en.wikipedia.org/wiki/MIME

http://php.net/manual/en/function.utf8-encode.php

http://linux.die.net/man/1/hexdump

Dirk

Community Beginner,
Dec 30, 2009

Hello Dirk,

thanks for the hint.

This does the trick:

In the PHP script, the UTF-8 encoded string must be output URL-encoded; otherwise it would be interpreted as UTF-8 twice by the receiving ExtendScript:

echo urlencode(utf8_encode("Motörhead"));

Then decode the content in the ExtendScript, e.g.:

alert (decodeURI(Result));
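One caveat with this workaround: PHP's `urlencode()` writes spaces as `+` and also escapes reserved characters such as `/` and `&`, which `decodeURI()` deliberately leaves encoded. A slightly more robust sketch (hypothetical helper `phpUrlDecode`) converts `+` back to spaces and then uses `decodeURIComponent()`. Since `Result` still contains the HTTP status line and headers, it should be applied to the body only.

```javascript
// Hypothetical helper: reverse PHP's urlencode() for a response body.
function phpUrlDecode(s) {
    // urlencode() writes spaces as "+", which decodeURIComponent does not undo.
    // A literal "+" arrives as %2B, so replacing "+" first is safe.
    return decodeURIComponent(s.replace(/\+/g, " "));
}

var encoded = "AC%2FDC+und+Mot%C3%B6rhead";
// decodeURI(encoded) leaves "%2F" and the "+" signs untouched
// phpUrlDecode(encoded) returns "AC/DC und Motörhead"
```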

Mentor,
Dec 30, 2009

The second wrapper should not be required if your website is supposed to deliver UTF-8.

Instead you'd add a conversion step within ExtendScript.

E.g. use one of the JS UTF-8 decoders from the web, or write the raw bytes out to a file and then adjust the File.encoding property before re-reading it.

File.encoding is described in the JavaScript Tools Guide (see the ESTK Help menu).
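A minimal sketch of such a decoder, assuming the data arrives as a "binary" string in which each character holds one byte (0-255), for example from a socket read with `Connection.encoding = "BINARY"` (an assumption; check the JavaScript Tools Guide). The name `decodeUtf8` is hypothetical, and the code assumes well-formed input: it does not validate continuation bytes or reject overlong sequences.

```javascript
// Hypothetical helper: decode a binary string (one char per byte) as UTF-8.
function decodeUtf8(binary) {
    var out = "";
    var i = 0;
    while (i < binary.length) {
        var b0 = binary.charCodeAt(i) & 0xFF;
        var cp;
        if (b0 < 0x80) {              // 0xxxxxxx: ASCII
            cp = b0;
            i += 1;
        } else if (b0 < 0xE0) {       // 110xxxxx 10xxxxxx
            cp = ((b0 & 0x1F) << 6) |
                 (binary.charCodeAt(i + 1) & 0x3F);
            i += 2;
        } else if (b0 < 0xF0) {       // 1110xxxx + 2 continuation bytes
            cp = ((b0 & 0x0F) << 12) |
                 ((binary.charCodeAt(i + 1) & 0x3F) << 6) |
                 (binary.charCodeAt(i + 2) & 0x3F);
            i += 3;
        } else {                      // 11110xxx + 3 continuation bytes
            cp = ((b0 & 0x07) << 18) |
                 ((binary.charCodeAt(i + 1) & 0x3F) << 12) |
                 ((binary.charCodeAt(i + 2) & 0x3F) << 6) |
                 (binary.charCodeAt(i + 3) & 0x3F);
            i += 4;
        }
        if (cp > 0xFFFF) {            // outside the BMP: emit a surrogate pair
            cp -= 0x10000;
            out += String.fromCharCode(0xD800 + (cp >> 10), 0xDC00 + (cp & 0x3FF));
        } else {
            out += String.fromCharCode(cp);
        }
    }
    return out;
}

// The UTF-8 bytes 4D 6F 74 C3 B6 72 68 65 61 64 decode to "Motörhead"
var decoded = decodeUtf8("Mot\u00C3\u00B6rhead");
```

A shorter, commonly used alternative is `decodeURIComponent(escape(binary))`, which does the same job in one line but throws a `URIError` on malformed input.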

Dirk

Mentor,
Dec 30, 2009

Ah, I always forget that Socket also has its own .encoding variable. Just make sure it is set correctly.
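Applied to the original script, that means setting the socket's `encoding` before reading. A sketch, assuming `"UTF-8"` is an accepted value for `Socket.encoding` (the JavaScript Tools Guide mentions values such as `"ASCII"`, `"BINARY"` and `"UTF-8"`). `fetchBody` is a hypothetical wrapper that also strips the status line and headers, so only the page body lands in the text frame. Because it only touches `open`, `write`, `read`, `close` and `encoding`, it can be exercised outside InDesign with a stand-in object.

```javascript
// Hypothetical wrapper: fetch `path` from `host` over a raw socket and return
// only the response body. `socket` is an ExtendScript Socket, or any object
// with the same open/write/read/close methods and an encoding property.
function fetchBody(socket, host, path) {
    var response = "";
    if (socket.open(host)) {
        // Decode incoming bytes as UTF-8; set once the connection is open,
        // before any data is exchanged.
        socket.encoding = "UTF-8";
        // Minimal HTTP/1.0 request: CRLF line endings, empty line at the end.
        socket.write("GET " + path + " HTTP/1.0\r\nHost: " + host + "\r\n\r\n");
        response = socket.read(999999);
        socket.close();
    }
    // Status line and headers end at the first empty line.
    var sep = response.indexOf("\r\n\r\n");
    return sep >= 0 ? response.substring(sep + 4) : response;
}

// In InDesign, e.g.:
// myTextFrame.contents = fetchBody(new Socket(), "localhost:80", "/test.php");
```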

Dirk

People's Champ,
Sep 12, 2017

https://forums.adobe.com/people/Dirk+Becker wrote:

Ah, I always forget that Socket also has its own .encoding variable. Just make sure it is set correctly.

Dirk

OMG! Thank you! Sanity restored...

Ariel
