Homograph / Homoglyph Attacks in Phishing
Tip
Learn & practice AWS Hacking:
HackTricks Training AWS Red Team Expert (ARTE)
Learn & practice GCP Hacking:HackTricks Training GCP Red Team Expert (GRTE)
Learn & practice Az Hacking:HackTricks Training Azure Red Team Expert (AzRTE)
Support HackTricks
- Check the subscription plans!
- Join the 💬 Discord group or the telegram group or follow us on Twitter 🐦 @hacktricks_live.
- Share hacking tricks by submitting PRs to the HackTricks and HackTricks Cloud github repos.
Overview
A homograph (aka homoglyph) attack abuses the fact that many Unicode code points from non-Latin scripts are visually identical or extremely similar to ASCII characters. By replacing one or more Latin characters with their look-alike counterparts, an attacker can craft:
- Display names, subjects or message bodies that look legitimate to the human eye but bypass keyword-based detections.
- Domains, sub-domains or URL paths that fool victims into believing they are visiting a trusted site.
Because software compares strings by Unicode code point rather than by appearance, a single substituted character is enough to defeat naïve string comparisons (e.g., "Ρаypal.com", written with a Greek Rho and a Cyrillic a, vs. "Paypal.com").
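The comparison failure can be reproduced in a few lines. The following is a minimal sketch; the helper name `differing_chars` and the sample strings are illustrative assumptions, not part of any tool.

```python
import unicodedata

legit = "Paypal.com"
# Escapes make the substitution explicit:
# U+03A1 GREEK CAPITAL LETTER RHO, U+0430 CYRILLIC SMALL LETTER A
spoof = "\u03a1\u0430ypal.com"


def differing_chars(a, b):
    """Return (index, char_a, char_b) for each position where the strings differ."""
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


# The strings render almost identically but are not equal.
print(legit == spoof)
for i, x, y in differing_chars(legit, spoof):
    print(
        i,
        f"U+{ord(x):04X} {unicodedata.name(x)}",
        "vs",
        f"U+{ord(y):04X} {unicodedata.name(y)}",
    )
```

Printing the code points and character names is usually the quickest way to confirm a suspected substitution during triage.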
Typical Phishing Workflow
- Craft message content – Replace specific Latin letters in the impersonated brand / keyword with visually indistinguishable characters from another script (Greek, Cyrillic, Armenian, Cherokee, etc.).
- Register supporting infrastructure – Optionally register a homoglyph domain and obtain a TLS certificate (most CAs do no visual similarity checks).
- Send email / SMS – The message contains homoglyphs in one or more of the following locations:
  - Sender display name (e.g., Ηеlрdеѕk)
  - Subject line (Urgеnt Аctіon Rеquіrеd)
  - Hyperlink text or fully qualified domain name
- Redirect chain – Victim is bounced through seemingly benign websites or URL shorteners before landing on the malicious host that harvests credentials / delivers malware.
Unicode Ranges Commonly Abused
| Script | Range | Example glyph | Looks like |
|---|---|---|---|
| Greek | U+0370-03FF | Η (U+0397) | Latin H |
| Greek | U+0370-03FF | ρ (U+03C1) | Latin p |
| Cyrillic | U+0400-04FF | а (U+0430) | Latin a |
| Cyrillic | U+0400-04FF | е (U+0435) | Latin e |
| Armenian | U+0530-058F | օ (U+0585) | Latin o |
| Cherokee | U+13A0-13FF | Ꭲ (U+13A2) | Latin T |
Tip: Full Unicode charts are available at unicode.org.
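The table rows can be turned into a small lookup that folds look-alike characters back to ASCII before keyword matching. This is only a sketch: `CONFUSABLES` and `fold_confusables` are hypothetical names, and the mapping covers just the six glyphs listed above. A production system would more likely draw on a fuller source such as the Unicode confusables data.

```python
# Illustrative subset only: the six glyphs from the table above.
CONFUSABLES = {
    "\u0397": "H",  # GREEK CAPITAL LETTER ETA
    "\u03c1": "p",  # GREEK SMALL LETTER RHO
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u0585": "o",  # ARMENIAN SMALL LETTER OH
    "\u13a2": "T",  # CHEROKEE LETTER I
}
_TABLE = str.maketrans(CONFUSABLES)


def fold_confusables(text):
    """Replace known look-alike characters with their ASCII counterparts."""
    return text.translate(_TABLE)


sample = "\u0397\u0435lpd\u0435sk"  # Greek Eta + Cyrillic ie mixed with Latin
print(fold_confusables(sample))  # Helpdesk
```

Running keyword detections on the folded string as well as the raw one helps catch brand names that were disguised with substituted characters.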
Detection Techniques
1. Mixed-Script Inspection
Legitimate emails sent to an English-speaking organisation rarely mix characters from multiple scripts, so mixed-script strings are a strong phishing indicator. A simple but effective heuristic is to:
- Iterate each character of the inspected string.
- Map the code point to its script (the Unicode block, or the first word of the character name, is a workable approximation).
- Raise an alert if more than one script is present or if non-Latin scripts appear where they are not expected (display name, domain, subject, URL, etc.).
Python proof-of-concept:
import unicodedata as ud
from collections import Counter

SUSPECT_FIELDS = {
    "display_name": "Ηоmоgraph Illusion",  # example data
    "subject": "Finаnꮯiаl Տtatеmеnt",
    # Punycode host: the ASCII form is all Latin, so decode it to Unicode before inspecting
    "url": "https://xn--messageconnecton-2kb.blob.core.windows.net",
}

for field, value in SUSPECT_FIELDS.items():
    scripts = Counter()
    for ch in value:
        if not ch.isalpha():
            continue  # ignore digits, spaces and punctuation
        # The first word of the character name approximates the script,
        # e.g. 'LATIN', 'CYRILLIC', 'GREEK'
        scripts[ud.name(ch, "UNKNOWN").split(" ")[0]] += 1
    if len(scripts) > 1:
        print(f"[!] Mixed scripts in {field}: {dict(scripts)} -> {value}")
2. Punycode Normalisation (Domains)
Internationalised Domain Names (IDNs) are represented in ASCII using Punycode, marked by the xn-- prefix on each encoded label. Converting every hostname to punycode and then back to Unicode yields a normalised string that can be matched against a whitelist or run through similarity checks (e.g., Levenshtein distance).
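import idna
hostname = "Ρаypal.com"  # Greek Rho + Cyrillic a
try:
    # uts46=True applies UTS #46 mapping (e.g. case folding) before encoding
    puny = idna.encode(hostname, uts46=True).decode()
    print(puny)  # ASCII form whose first label starts with "xn--"
except idna.IDNAError as exc:
    print(f"Rejected by IDNA validation: {exc}")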
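The reverse direction (punycode back to Unicode, then a similarity check against trusted hosts) might look like the sketch below. It uses only the standard library `punycode` codec. The names `to_unicode`, `levenshtein`, `classify`, the `ALLOWLIST` contents and the distance threshold of 2 are all assumptions chosen for illustration.

```python
def to_unicode(hostname):
    """Decode every xn-- label of a hostname back to Unicode."""
    labels = []
    for label in hostname.lower().split("."):
        if label.startswith("xn--"):
            try:
                label = label[4:].encode("ascii").decode("punycode")
            except UnicodeError:
                pass  # leave malformed labels untouched
        labels.append(label)
    return ".".join(labels)


def levenshtein(a, b):
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def classify(hostname, allowlist, max_distance=2):
    """Return 'allowed', 'lookalike of <host>' or 'unknown'."""
    decoded = to_unicode(hostname)
    if decoded in allowlist:
        return "allowed"
    for trusted in allowlist:
        if levenshtein(decoded, trusted) <= max_distance:
            return f"lookalike of {trusted}"
    return "unknown"


ALLOWLIST = {"apple.com", "paypal.com"}  # hypothetical trusted hosts
# Build a spoofed host: Cyrillic 'а' (U+0430) in place of Latin 'a'.
spoofed = "xn--" + "\u0430pple".encode("punycode").decode("ascii") + ".com"
print(spoofed, "->", to_unicode(spoofed), "->", classify(spoofed, ALLOWLIST))
```

Comparing the decoded Unicode form rather than the raw xn-- string matters because the punycode encoding of a one-character substitution looks nothing like the original label.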
3. Homoglyph Dictionaries / Algorithms
Tools such as dnstwist (which includes a homoglyph fuzzer) or urlcrazy can enumerate visually similar domain permutations and are useful for proactive takedown / monitoring.
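To illustrate the idea behind dictionary-based enumeration, the sketch below expands a label into every combination of look-alike substitutions. It is not how dnstwist or urlcrazy are implemented; `HOMOGLYPHS` and `homoglyph_variants` are hypothetical names, and the dictionary holds only four lowercase look-alikes from the table earlier in this page.

```python
from itertools import product

# Tiny illustrative dictionary: Latin letter -> look-alike code points.
HOMOGLYPHS = {
    "a": ["\u0430"],  # CYRILLIC SMALL LETTER A
    "e": ["\u0435"],  # CYRILLIC SMALL LETTER IE
    "o": ["\u0585"],  # ARMENIAN SMALL LETTER OH
    "p": ["\u03c1"],  # GREEK SMALL LETTER RHO
}


def homoglyph_variants(label):
    """Return all variants of label with one or more characters substituted."""
    options = [[ch] + HOMOGLYPHS.get(ch, []) for ch in label]
    variants = {"".join(combo) for combo in product(*options)}
    variants.discard(label)  # drop the unmodified original
    return sorted(variants)


for variant in homoglyph_variants("pay"):
    # Print code points so the substitutions are visible.
    print(variant, [f"U+{ord(ch):04X}" for ch in variant])
```

The number of variants grows exponentially with the number of substitutable letters, so real tools typically cap or sample the permutations before resolving them in DNS.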
Prevention & Mitigation
- Enforce strict DMARC/DKIM/SPF policies – prevent spoofing from unauthorised domains.
- Implement the detection logic above in Secure Email Gateways and SIEM/XSOAR playbooks.
- Flag or quarantine messages where the domain shown in the display name differs from the actual sender domain.
- Educate users: copy-paste suspicious text into a Unicode inspector, hover links, never trust URL shorteners.
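The display-name check from the list above could be prototyped as follows. This is a rough sketch built on `email.utils.parseaddr`. The function name `display_name_mismatch`, the regular expression and the sample headers are assumptions, and a real gateway rule would need to handle encoded headers and more edge cases.

```python
import re
from email.utils import parseaddr

# Loose pattern for something that looks like a domain inside a display name.
# \w also matches non-Latin letters, so homoglyph domains are captured too.
DOMAIN_RE = re.compile(r"[\w-]+(?:\.[\w-]+)+")


def display_name_mismatch(from_header):
    """True if the display name mentions a domain other than the sender's."""
    name, addr = parseaddr(from_header)
    sender_domain = addr.rpartition("@")[2].lower()
    match = DOMAIN_RE.search(name)
    if not match:
        return False  # display name does not claim any domain
    claimed = match.group(0).lower()
    same = sender_domain == claimed or sender_domain.endswith("." + claimed)
    return not same


headers = [
    '"paypal.com Support" <billing@evil.example>',
    '"paypal.com Billing" <service@mail.paypal.com>',
    "Alice <alice@example.org>",
]
for header in headers:
    print(display_name_mismatch(header), header)
```

Because the comparison is an exact string match, a display name containing a homoglyph version of the real sender domain is flagged as a mismatch as well.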
Real-World Examples
- Display name: Сonfidеntiаl Ꭲiꮯkеt (Cyrillic С, е, а; Cherokee Ꭲ; Latin small capital ꮯ).
- Domain chain: bestseoservices.com ➜ municipal /templates directory ➜ kig.skyvaulyt.ru ➜ fake Microsoft login at mlcorsftpsswddprotcct.approaches.it.com protected by custom OTP CAPTCHA.
- Spotify impersonation: Sρօtifս sender with link hidden behind redirects.ca.
These samples originate from Unit 42 research (July 2025) and illustrate how homograph abuse is combined with URL redirection and CAPTCHA evasion to bypass automated analysis.
References
- The Homograph Illusion: Not Everything Is As It Seems
- Unicode Character Database
- dnstwist – domain permutation engine

