Unicode Injection

Tip

Learn & practice AWS Hacking:HackTricks Training AWS Red Team Expert (ARTE)
Learn & practice GCP Hacking: HackTricks Training GCP Red Team Expert (GRTE)
Learn & practice Az Hacking: HackTricks Training Azure Red Team Expert (AzRTE)

Support HackTricks

Check the subscription plans!

Join the 💬 Discord group or the telegram group or follow us on Twitter 🐦 @hacktricks_live.

Share hacking tricks by submitting PRs to the HackTricks and HackTricks Cloud github repos.

Introduction

Depending on how the back-end/front-end is behaving when it receives weird unicode characters an attacker might be able to bypass protections and inject arbitrary characters that could be used to abused injection vulnerabilities such as XSS or SQLi.

Unicode Normalization

Unicode normalization occurs when unicode characters are normalized to ascii characters.

One common scenario of this type of vulnerability occurs when the system is modifying somehow the input of the user after having checked it. For example, in some languages a simple call to make the input uppercase or lowercase could normalize the given input and the unicode will be transformed into ASCII generating new characters.
For more info check:

Unicode Normalization

`\u` to `%`

Unicode characters are usually represented with the \u prefix. For example the char 㱋 is \u3c4b(check it here). If a backend transforms the prefix \u in %, the resulting string will be %3c4b, which URL decoded is: <4b. And, as you can see, a < char is injected.
You could use this technique to inject any kind of char if the backend is vulnerable.
Check https://unicode-explorer.com/ to find the chars you need.

This vuln actually comes from a vulnerability a researcher found, for a more in depth explanation check https://www.youtube.com/watch?v=aUsAHb0E7Cg

Emoji Injection

Back-ends something behaves weirdly when they receives emojis. That’s what happened in this writeup where the researcher managed to achieve a XSS with a payload such as: 💋img src=x onerror=alert(document.domain)//💛

In this case, the error was that the server after removing the malicious characters converted the UTF-8 string from Windows-1252 to UTF-8 (basically the input encoding and the convert from encoding mismatched). Then this does not give a proper < just a weird unicode one: ‹
``So they took this output and converted again now from UTF-8 ot ASCII. This normalized the ‹to< this is how the exploit could work on that system.
This is what happened:

<?php

$str = isset($_GET["str"]) ? htmlspecialchars($_GET["str"]) : "";

$str = iconv("Windows-1252", "UTF-8", $str);
$str = iconv("UTF-8", "ASCII//TRANSLIT", $str);

echo "String: " . $str;

Emoji lists:

Windows Best-Fit/Worst-fit

As explained in this great post, Windows has a feature called Best-Fit which will replace unicode characters that cannot be displayed in ASCII mode with a similar one. This can lead to unexpected behavior when the backend is expecting a specific character but it receives a different one.

It’s possible to find best-fit characters in https://worst.fit/mapping/.

As Windows will usually convert unicode strings to ascii strings as one of the last parts of the execution (usually going from a “W” suffixed API to an “A” suffixed API like GetEnvironmentVariableA and GetEnvironmentVariableW) this would allow atackers to bypass protections by sending unicode characters that will be converted lastly in ASCII characters that would perform nuexpected actions.

In the blog post there are proposed methods to bypass vulnerabilities fixed using a blacklist of chars, exploit path traversals using characters mapped to “/“ (0x2F) and characters mapped to “\“ (0x5C) or even bypassing shell escape protections like PHP’s escapeshellarg or Python’s subprocess.run using a list, this was done for example using fullwidth double quotes (U+FF02) instead of double quotes so at the end what looked like 1 argument was transformed in 2 arguments.

Note that for an app to be vulnerable it needs to use “W” Windows APIs but end calling an “A” Windows api so the “Best-fit” of the unicode string is created.

Several dicovered vulnerabilities won’t be fixed as people don’t agree who should be fixing this issue“.

Tip

Learn & practice AWS Hacking:HackTricks Training AWS Red Team Expert (ARTE)
Learn & practice GCP Hacking: HackTricks Training GCP Red Team Expert (GRTE)
Learn & practice Az Hacking: HackTricks Training Azure Red Team Expert (AzRTE)

Support HackTricks

Check the subscription plans!

Join the 💬 Discord group or the telegram group or follow us on Twitter 🐦 @hacktricks_live.

Share hacking tricks by submitting PRs to the HackTricks and HackTricks Cloud github repos.

HackTricks

Unicode Injection

Introduction

Unicode Normalization

\u to %

Emoji Injection

Windows Best-Fit/Worst-fit

`\u` to `%`