
stripping non utf-8 characters from string

Guest
Sep 16, 2008
hello all,

I would like to strip (or replace) all non-UTF-8 characters from a string (for example, the value of a form text field). What is the simplest way to achieve that?

thanks in advance,
rudy struyf
TOPICS
Advanced techniques
LEGEND
Sep 16, 2008
alpenman69 wrote:
> I would like to strip (or replace) all non-UTF-8 characters from a string (for example, the value of a form text field). What is the simplest way to achieve that?

no such thing as non-utf8 chars. what exactly are you trying to do?
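
To make that distinction concrete: any ordinary decoded string can be written out as UTF-8, whereas a raw byte sequence may or may not be valid UTF-8. The following is a minimal Python sketch, used here only for illustration; the helper name `clean_utf8_bytes` and the sample data are assumptions, not something from this thread.

```python
def clean_utf8_bytes(raw: bytes, replace: bool = False) -> str:
    """Decode raw bytes as UTF-8, dropping (or replacing) invalid sequences."""
    # errors="ignore" silently drops bytes that are not valid UTF-8;
    # errors="replace" substitutes U+FFFD (the replacement character) instead.
    return raw.decode("utf-8", errors="replace" if replace else "ignore")


# Ordinary text, including punctuation typical of Word pastes,
# encodes to UTF-8 and decodes back unchanged.
text = "\u201csmart quotes\u201d and an ellipsis\u2026"
assert text.encode("utf-8").decode("utf-8") == text

# Bytes produced by a different encoding (hypothetical sample: Latin-1 "café")
# are not valid UTF-8, so this is the case where "stripping" makes sense.
latin1_bytes = "caf\u00e9".encode("latin-1")
print(repr(clean_utf8_bytes(latin1_bytes)))                # invalid byte dropped
print(repr(clean_utf8_bytes(latin1_bytes, replace=True)))  # invalid byte replaced
```

So the thing that can be "non-UTF-8" is a byte sequence, not a character, which is why the question of what exactly should be removed matters.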
Guest
Sep 16, 2008
I am trying to clean the string before sending it to a database (SQL Server).
LEGEND
Sep 16, 2008
alpenman69 wrote:
> I am trying to clean the string before sending it to a database (SQL Server).

ok, but strip it of what? everything's in unicode. utf-8's a stingy multi-byte encoding (i.e. it expands the bytes needed to represent a char only if needed) so what exactly are you trying to get rid of?
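
A small Python sketch of the "stingy" behaviour described above: the number of bytes UTF-8 uses grows only when the code point requires it. The helper name `utf8_length` and the sample characters are chosen here for illustration.

```python
def utf8_length(ch: str) -> int:
    """Return how many bytes a single character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


# Sample characters: ASCII letter, accented Latin letter, Word-style
# curly quote, a CJK ideograph, and an emoji outside the Basic Multilingual Plane.
for ch in ["A", "\u00e9", "\u201c", "\u65e5", "\U0001F600"]:
    print(f"U+{ord(ch):04X} -> {utf8_length(ch)} byte(s)")
```

ASCII stays at one byte per character, while the other samples take two, three, or four bytes.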
Guest
Sep 17, 2008
When you paste text (for example from MS Word) into a form field and write the string to a SQL Server database, you will see that some characters show up in the database as a symbol (a square).
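
One way to see which characters are involved is to list the non-ASCII code points in the pasted text. The Python sketch below does that and, as an optional fallback, maps common Word punctuation to plain ASCII. The function names, the replacement table, and the sample string are all assumptions made for illustration; if the database column and driver handle Unicode, no replacement should be necessary.

```python
import unicodedata


def describe_non_ascii(text: str) -> list:
    """Return (code point, Unicode name) for each distinct non-ASCII character."""
    seen = []
    for ch in text:
        if ord(ch) > 127 and ch not in seen:
            seen.append(ch)
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")) for ch in seen]


# Hypothetical replacement table for punctuation Word commonly inserts.
WORD_REPLACEMENTS = {
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2013": "-",    # en dash
    "\u2026": "...",  # horizontal ellipsis
}


def replace_word_punctuation(text: str) -> str:
    """Replace the characters listed in WORD_REPLACEMENTS with ASCII equivalents."""
    return text.translate(str.maketrans(WORD_REPLACEMENTS))


# Hypothetical pasted text containing curly quotes, an en dash and an ellipsis.
pasted = "\u201cHello\u201d \u2013 it\u2019s done\u2026"
for code_point, name in describe_non_ascii(pasted):
    print(code_point, name)
print(replace_word_punctuation(pasted))
```

Listing the code points first shows whether the "squares" are ordinary typographic characters, in which case the storage or display layer is the thing to fix rather than the text.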
Enthusiast
Sep 17, 2008
a square means either there's a slight encoding issue or more likely the font you chose to display these doesn't contain that glyph.

if your table is using one of the "N" datatypes (nchar, nvarchar, ntext) to hold your unicode text and you're using the JDBC driver (labeled as ms sql server) instead of the ODBC one, then it's most likely a simple font issue.

got a public page i can see that shows this issue?
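
As a rough analogy for why the storage type matters, the Python sketch below round-trips a string through a Unicode encoding and through a single-byte encoding that lacks the characters. This only illustrates lossy conversion in general; it does not reproduce what SQL Server or any particular driver does, and the helper name `round_trip` and the sample text are assumptions.

```python
def round_trip(text: str, encoding: str) -> str:
    """Encode then decode text, substituting '?' for characters the encoding lacks."""
    return text.encode(encoding, errors="replace").decode(encoding)


# Hypothetical sample: a word wrapped in Word-style curly quotes.
text = "\u201cquoted\u201d"

# A Unicode encoding (UTF-16 here) preserves every character.
print(round_trip(text, "utf-16-le"))

# Latin-1 has no curly quotes, so they are lost on the way in.
print(round_trip(text, "latin-1"))
```

If characters survive the round trip but still render as squares, that points toward the font explanation rather than an encoding problem.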