Inspiring
March 16, 2023
Answered

Unicode words mixed up with special characters and numeric values


Users copy Arabic words from a PDF file and paste them into a text box.

After pasting, the words look fine, with no alien characters. On submit, however, the value arrives with alien characters: special characters mixed with numeric values.

Please suggest a way to identify such alien string values on the client side and validate the user's input right there.

For example:

دھﺎﻧﺎراج ﺗﻮﻣﺎس ﺟﯿﺎراج
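
The numeric values in the accepted answer (65166, 65255, 64511 and so on) are code points in the Unicode Arabic Presentation Forms-A (U+FB50 to U+FDFF) and Arabic Presentation Forms-B (U+FE70 to U+FEFF) blocks, which text copied from PDFs often contains in place of the ordinary Arabic letters. Assuming those are the "alien" characters in question, a client-side check might look roughly like the TypeScript sketch below. `findPresentationForms` is a hypothetical helper name, not part of any library.

```typescript
// Code point ranges of the two Unicode "Arabic Presentation Forms" blocks.
const PRESENTATION_FORM_RANGES: number[][] = [
  [0xfb50, 0xfdff], // Arabic Presentation Forms-A
  [0xfe70, 0xfeff], // Arabic Presentation Forms-B
];

// Hypothetical helper: returns every presentation-form character in the input.
function findPresentationForms(input: string): string[] {
  const found: string[] = [];
  for (let i = 0; i < input.length; i++) {
    const code = input.charCodeAt(i);
    for (let r = 0; r < PRESENTATION_FORM_RANGES.length; r++) {
      const low = PRESENTATION_FORM_RANGES[r][0];
      const high = PRESENTATION_FORM_RANGES[r][1];
      if (code >= low && code <= high) {
        found.push(input.charAt(i));
        break;
      }
    }
  }
  return found;
}

// First word of the example above, written with escapes:
// U+FE8E, U+FEE7 and U+FE8E are presentation forms, the rest are ordinary letters.
const sample = "\u062F\u06BE\uFE8E\uFEE7\uFE8E\u0631\u0627\u062C";
console.log(findPresentationForms(sample).length); // 3
```

In a browser, a check like this could run on the text box's `input` or `paste` event and report the problem through the standard `setCustomValidity` method before the form is submitted.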

    This topic has been closed for replies.
    Correct answer BKBK


    2 replies

    BKBK
    Community Expert
    Correct answer
    March 16, 2023

    A suggestion:

    1.  Escape every # in the input string by doubling it to ##.
    2.  Then pass the string to the canonicalize function.

      Example:
      <cfscript>
      // A doubled ## is how a literal # is written inside a CFML string.
      originalString="دھ&##65166;&##65255;&##65166;راج &##65175;&##65262;&##65251;&##65166;س &##65183;&##64511;&##65166;را;";
      // Decode the numeric entities; restrictMultiple and restrictMixed are both false.
      canonicalizedString=canonicalize(originalString,false,false);
      writeoutput(canonicalizedString);
      </cfscript>
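
To illustrate the decoding step only: the submitted value carries decimal numeric character references such as `&#65166;`, and decoding turns each one back into the character it stands for. The TypeScript sketch below mimics just that part. It is not ColdFusion's `canonicalize` implementation, which handles more encodings, and `decodeDecimalEntities` is a hypothetical name. In CFML the doubled `##` is the way to write a literal `#` in source code, so the sketch uses a single `#`.

```typescript
// Hypothetical helper: decodes decimal references such as "&#65166;".
// Only Basic Multilingual Plane values are handled; anything else is left as-is.
function decodeDecimalEntities(input: string): string {
  return input.replace(/&#(\d+);/g, function (match: string, digits: string): string {
    const code = parseInt(digits, 10);
    if (code > 0xffff) {
      return match;
    }
    return String.fromCharCode(code);
  });
}

// "&#65166;" is U+FE8E and "&#65255;" is U+FEE7, as in the example above.
const decoded = decodeDecimalEntities("&#65166;&#65255;");
console.log(decoded === "\uFE8E\uFEE7"); // true
```

If the goal is to store ordinary letters rather than presentation forms, Unicode NFKC normalization (for example `String.prototype.normalize("NFKC")` in the browser) maps U+FE8E to the plain alef U+0627. Whether that fits this application is a separate decision.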
    Inspiring
    March 17, 2023

    BKBK,
    This is really amazing. The solution works.
    This issue had persisted in my application for a very long time, and I always advised users not to copy and paste Arabic words from PDFs. Now both our clients and I can relax.
    I still have to test it thoroughly before I implement it in production.

    Community Expert
    March 16, 2023

    Do you want to perform validation in the PDF or in the HTML form? Can you possibly accept the PDF as a form submission rather than copying from PDF to HTML? What's the character set in the PDF, and what is it in the HTML form and the CF server?

     

    Dave Watts, Eidolon LLC

    Known Participant
    March 16, 2023

    Sometimes users copy text from a PDF and paste it into the HTML form. That is when the issue occurs.

    Copying from Google Translate or typing directly into the HTML text box causes no problem. The issue arises only with text copied and pasted from a PDF. I hope the exact problem is clear now.

    Community Expert
    March 16, 2023

    Right, but my questions still stand. Do you want to change the PDF? Do you want to change how you accept form data? Right now, I think the problem is an incompatibility between the character set used by the PDF and the one used by CF and your HTML form.

     

    Dave Watts, Eidolon LLC
