▲Smuggling arbitrary data through an emojipaulbutler.org

583 points by paulgb 1 days ago | 175 comments

omnibrain 7 hours ago [-]

10 years or so ago I shocked coworkers with using U+202D LEFT-TO-RIGHT OVERRIDE mid in filenames on windows. So funnypicturegnp.exe became funnypictureexe.png Combined with a custom icon for the program that mimics a picture preview it was pretty convincing.

mdup 5 hours ago [-]

I worked in phishing detection. This was a common pattern used by attackers, although .exe are blocked automatically most of the time, .html is the new malicious extension (often hosting an obfuscated window.location redirect to a fake login page).

RTL abuse like cute-cat-lmth.png was relatively common, but also trivial to detect. We would immediately flag such an email as phishing.

taneq 4 hours ago [-]

I’d never heard of this particular trick but I’m glad my decades of paranoia-fueled “right click -> open with” treatment of any potentially sketchy media file was warranted! :D

hosteur 7 hours ago [-]

Wow this is a clever trick.

riskable 1 days ago [-]

Oh this is just the tip of the iceberg when it comes to abusing Unicode! You can use a similar technique to this to overflow the buffer on loads of systems that accept Unicode strings. Normally it just produces an error and/or a crash but sometimes you get lucky and it'll do all sorts of fun things! :)

I remember doing penetration testing waaaaaay back in the day (before Python 3 existed) and using mere diacritics to turn a single character into many bytes that would then overflow the buffer of a back-end web server. This only ever caused it to crash (and usually auto-restart) but I could definitely see how this could be used to exploit certain systems/software with enough fiddling.

capitainenemo 14 hours ago [-]

Yeah. Zalgo text is a common test for input fields on websites. But it usually doesn't do anything interesting. Maybe an exception trigger on some database length limit. Doesn't typically even kill any processes. The exception is normally just in your thread. You can often trigger it just by disabling JS on even modern forms, but,, at best you're maybe leaking a bit of info if they left debug on and print the stack trace or a query. Another common slip-up is failing to count \n vs \r\n in text strings since JS usually usually counts a carriage return as 1 byte, but HTTP spec requires two.

unescape(encodeURIComponent("ç")).length is the quick and dirty way to do a JS byte length check. The \r\n thing can be done just by cleaning up the string before length counting.

n0id34 10 hours ago [-]

Sorry n00b here, can you explain more about this or how you did this? I feel like this is definitely a loophole that would be worth testing for.

ComputerGuru 22 hours ago [-]

This is cute but unnecessary - Unicode includes a massive range called PUA: the private use area. The codes in this range aren’t mapped to anything (and won’t be mapped to anything) and are for internal/custom use, not to be passed to external systems (for example, we use them in fish-shell to safely parse tokens into a string, turning an unescaped special character into just another Unicode code point in the string, but in the PUA area, then intercept that later in the pipeline).

You’re not supposed to expose them outside your api boundary but when you encounter them you are prescribed to pass them through as-is, and that’s what most systems and libraries do. It’s a clear potential exfiltration avenue, but given that most sane developers don’t know much more about Unicode other than “always use Unicode to avoid internationalization issues”, it’s often left wide open.

paulgb 22 hours ago [-]

I just tested and private use characters render as boxes for me (󰀀), the point here was to encode them in a way that they are hidden and treated as "part of" another character when copy/pasting.

diggan 21 hours ago [-]

> the point here was to encode them in a way that they are hidden and treated as "part of" another character when copy/pasting

AKA "Steganography" for the curious ones: https://en.wikipedia.org/wiki/Steganography

reaperducer 21 hours ago [-]

Like when we used to encode the phone numbers of warez boards in GIFs.

bruce343434 5 hours ago [-]

On my Android phone,that displays "Go[][]" in the Google logo font.

layer8 19 hours ago [-]

The difference is that PUA characters are usually rendered in some way that is rather visible, whereas the variation selectors aren’t.

lolinder 12 hours ago [-]

Context that some may be missing is that this was inspired by discussion surrounding the Open Heart Protocol submission:

https://news.ycombinator.com/item?id=42791378

People immediately began discussing the applications for criminal use given the constraint that only emoji are accepted by the API. So for that use case the PUA wouldn't be an option, you have to encode it in the emoji.

Sniffnoy 16 hours ago [-]

Isn't this more what the designated noncharacters are for, rather than the private-use area? Given how the private-use area sometimes gets for unofficial encodings of scripts not currently in Unicode (or for things like the Apple logo and such) I'd be worried about running into collisions with that if I used the PUA in such a way.

Note that designated noncharacters includes not only 0xFFFF and 0xFFFE, and not only the final two code points of every plane, but also an area in the middle of Arabic Presentation Forms that was at some point added to the list of noncharacters specifically so that there would be more noncharacters for people using them this way!

juped 16 hours ago [-]

I'll be h󠄾󠅟󠅠󠅕󠄜󠄐󠅞󠅟󠄐󠅣󠅕󠅓󠅢󠅕󠅤󠅣󠄐󠅘󠅕󠅢󠅕onest, I pasted this comment in the provided decoder thinking no one could miss the point this badly and there was probably a hidden message inside it, but either you really did or this website is stripping them.

You can't invisibly watermark an arbitrary character (I did it to one above! If this website isn't stripping them, try it out in the provided decoder and you'll see) with unrecognized PUA characters, because it won't treat them as combining characters. You will cause separately rendered rendered placeholder-box characters to appear. Like this one:  (may not be a placeholder-box if you're privately-using the private use area yourself).

egypturnash 12 hours ago [-]

j󠄗󠅄󠅧󠅑󠅣󠄐󠅒󠅢󠅙󠅜󠅜󠅙󠅗󠄜󠄐󠅑󠅞󠅔󠄐󠅤󠅘󠅕󠄐󠅣󠅜󠅙󠅤󠅘󠅩󠄐󠅤󠅟󠅦󠅕󠅣󠄴󠅙󠅔󠄐󠅗󠅩󠅢󠅕󠄐󠅑󠅞󠅔󠄐󠅗󠅙󠅝󠅒󠅜󠅕󠄐󠅙󠅞󠄐󠅤󠅘󠅕󠄐󠅧󠅑󠅒󠅕󠄫󠄱󠅜󠅜󠄐󠅝󠅙󠅝󠅣󠅩󠄐󠅧󠅕󠅢󠅕󠄐󠅤󠅘󠅕󠄐󠅒󠅟󠅢󠅟󠅗󠅟󠅦󠅕󠅣󠄜󠄱󠅞󠅔󠄐󠅤󠅘󠅕󠄐󠅝󠅟󠅝󠅕󠄐󠅢󠅑󠅤󠅘󠅣󠄐󠅟󠅥󠅤󠅗󠅢󠅑󠅒󠅕󠄞󠄒󠄲󠅕󠅧󠅑󠅢󠅕󠄐󠅤󠅘󠅕󠄐󠄺󠅑󠅒󠅒󠅕󠅢󠅧󠅟󠅓󠅛󠄜󠄐󠅝󠅩󠄐󠅣󠅟󠅞󠄑󠅄󠅘󠅕󠄐󠅚󠅑󠅧󠅣󠄐󠅤󠅘󠅑󠅤󠄐󠅒󠅙󠅤󠅕󠄜󠄐󠅤󠅘󠅕󠄐󠅓󠅜󠅑󠅧󠅣󠄐󠅤󠅘󠅑󠅤󠄐󠅓󠅑󠅤󠅓󠅘󠄑󠄲󠅕󠅧󠅑󠅢󠅕󠄐󠅤󠅘󠅕󠄐󠄺󠅥󠅒󠅚󠅥󠅒󠄐󠅒󠅙󠅢󠅔󠄜󠄐󠅑󠅞󠅔󠄐󠅣󠅘󠅥󠅞󠅄󠅘󠅕󠄐󠅖󠅢󠅥󠅝󠅙󠅟󠅥󠅣󠄐󠄲󠅑󠅞󠅔󠅕󠅢󠅣󠅞󠅑󠅤󠅓󠅘󠄑󠄒󠄸󠅕󠄐󠅤󠅟󠅟󠅛󠄐󠅘󠅙󠅣󠄐󠅦󠅟󠅢󠅠󠅑󠅜󠄐󠅣󠅧󠅟󠅢󠅔󠄐󠅙󠅞󠄐󠅘󠅑󠅞󠅔󠄪󠄼󠅟󠅞󠅗󠄐󠅤󠅙󠅝󠅕󠄐󠅤󠅘󠅕󠄐󠅝󠅑󠅞󠅨󠅟󠅝󠅕󠄐󠅖󠅟󠅕󠄐󠅘󠅕󠄐󠅣󠅟󠅥󠅗󠅘󠅤󠇒󠅰󠆄󠅃󠅟󠄐󠅢󠅕󠅣󠅤󠅕󠅔󠄐󠅘󠅕󠄐󠅒󠅩󠄐󠅤󠅘󠅕󠄐󠅄󠅥󠅝󠅤󠅥󠅝󠄐󠅤󠅢󠅕󠅕󠄜󠄱󠅞󠅔󠄐󠅣󠅤󠅟󠅟󠅔󠄐󠅑󠅧󠅘󠅙󠅜󠅕󠄐󠅙󠅞󠄐󠅤󠅘󠅟󠅥󠅗󠅘󠅤󠄞󠄱󠅞󠅔󠄐󠅑󠅣󠄐󠅙󠅞󠄐󠅥󠅖󠅖󠅙󠅣󠅘󠄐󠅤󠅘󠅟󠅥󠅗󠅘󠅤󠄐󠅘󠅕󠄐󠅣󠅤󠅟󠅟󠅔󠄜󠅄󠅘󠅕󠄐󠄺󠅑󠅒󠅒󠅕󠅢󠅧󠅟󠅓󠅛󠄜󠄐󠅧󠅙󠅤󠅘󠄐󠅕󠅩󠅕󠅣󠄐󠅟󠅖󠄐󠅖󠅜󠅑󠅝󠅕󠄜󠄳󠅑󠅝󠅕󠄐󠅧󠅘󠅙󠅖󠅖󠅜󠅙󠅞󠅗󠄐󠅤󠅘󠅢󠅟󠅥󠅗󠅘󠄐󠅤󠅘󠅕󠄐󠅤󠅥󠅜󠅗󠅕󠅩󠄐󠅧󠅟󠅟󠅔󠄜󠄱󠅞󠅔󠄐󠅒󠅥󠅢󠅒󠅜󠅕󠅔󠄐󠅑󠅣󠄐󠅙󠅤󠄐󠅓󠅑󠅝󠅕󠄑󠄿󠅞󠅕󠄜󠄐󠅤󠅧󠅟󠄑󠄐󠄿󠅞󠅕󠄜󠄐󠅤󠅧󠅟󠄑󠄐󠄱󠅞󠅔󠄐󠅤󠅘󠅢󠅟󠅥󠅗󠅘󠄐󠅑󠅞󠅔󠄐󠅤󠅘󠅢󠅟󠅥󠅗󠅘󠅄󠅘󠅕󠄐󠅦󠅟󠅢󠅠󠅑󠅜󠄐󠅒󠅜󠅑󠅔󠅕󠄐󠅧󠅕󠅞󠅤󠄐󠅣󠅞󠅙󠅓󠅛󠅕󠅢󠄝󠅣󠅞󠅑󠅓󠅛󠄑󠄸󠅕󠄐󠅜󠅕󠅖󠅤󠄐󠅙󠅤󠄐󠅔󠅕󠅑󠅔󠄜󠄐󠅑󠅞󠅔󠄐󠅧󠅙󠅤󠅘󠄐󠅙󠅤󠅣󠄐󠅘󠅕󠅑󠅔󠄸󠅕󠄐󠅧󠅕󠅞󠅤󠄐󠅗󠅑󠅜󠅥󠅝󠅠󠅘󠅙󠅞󠅗󠄐󠅒󠅑󠅓󠅛󠄞󠄒󠄱󠅞󠅔󠄐󠅘󠅑󠅣󠅤󠄐󠅤󠅘󠅟󠅥󠄐󠅣󠅜󠅑󠅙󠅞󠄐󠅤󠅘󠅕󠄐󠄺󠅑󠅒󠅒󠅕󠅢󠅧󠅟󠅓󠅛󠄯󠄳󠅟󠅝󠅕󠄐󠅤󠅟󠄐󠅝󠅩󠄐󠅑󠅢󠅝󠅣󠄜󠄐󠅝󠅩󠄐󠅒󠅕󠅑󠅝󠅙󠅣󠅘󠄐󠅒󠅟󠅩󠄑󠄿󠄐󠅖󠅢󠅑󠅒󠅚󠅟󠅥󠅣󠄐󠅔󠅑󠅩󠄑󠄐󠄳󠅑󠅜󠅜󠅟󠅟󠅘󠄑󠄐󠄳󠅑󠅜󠅜󠅑󠅩󠄑󠄒󠄸󠅕󠄐󠅓󠅘󠅟󠅢󠅤󠅜󠅕󠅔󠄐󠅙󠅞󠄐󠅘󠅙󠅣󠄐󠅚󠅟󠅩󠄞󠄗󠅄󠅧󠅑󠅣󠄐󠅒󠅢󠅙󠅜󠅜󠅙󠅗󠄜󠄐󠅑󠅞󠅔󠄐󠅤󠅘󠅕󠄐󠅣󠅜󠅙󠅤󠅘󠅩󠄐󠅤󠅟󠅦󠅕󠅣󠄴󠅙󠅔󠄐󠅗󠅩󠅢󠅕󠄐󠅑󠅞󠅔󠄐󠅗󠅙󠅝󠅒󠅜󠅕󠄐󠅙󠅞󠄐󠅤󠅘󠅕󠄐󠅧󠅑󠅒󠅕󠄫󠄱󠅜󠅜󠄐󠅝󠅙󠅝󠅣󠅩󠄐󠅧󠅕󠅢󠅕󠄐󠅤󠅘󠅕󠄐󠅒󠅟󠅢󠅟󠅗󠅟󠅦󠅕󠅣󠄜󠄱󠅞󠅔󠄐󠅤󠅘󠅕󠄐󠅝󠅟󠅝󠅕󠄐󠅢󠅑󠅤󠅘󠅣󠄐󠅟󠅥󠅤󠅗󠅢󠅑󠅒󠅕󠄞 is for Jabberwocky. Does this decode?

edit: Yes, it does.

rexxars 22 hours ago [-]

For a real-world use case: Sanity used this trick[0] to encode Content Source Maps[1] into the actual text served on a webpage when it is in "preview mode". This allows an editor to easily trace some piece of content back to a potentially deep content structure just by clicking on the text/content in question.

It has it's drawbacks/limitations - eg you want to prevent adding it for things that needs to be parsed/used verbatim, like date/timestamps, urls, "ids" etc - but it's still a pretty fun trick.

[0] https://www.sanity.io/docs/stega

[1] https://github.com/sanity-io/content-source-maps

ethin 11 hours ago [-]

It's worth noting, just as a curiosity, that screen readers can detect these variation selectors when I navigate by character. For example, if I arrow over the example he provided (I can't paste it here lol), I here: "Smiling face with smiling eyes", "Symbol e zero one five five", "Symbol e zero one five c", "Symbol e zero one five c", "Symbol e zero one five f". This is unfortunately dependent on the speech synthesizer used, and I wouldn't know if the characters were there if I was just reading a document, so this isn't much of an advantage all things considered.

llm_trw 9 hours ago [-]

Ironically enough I have a script that strips all non-ascii characters from my screen reader because I found that _all_ online text was polluted with invisible and annoying to listen to characters.

urbandw311er 34 minutes ago [-]

When people discuss things like “Do LLMs know about this?” On a public website I always think that it’s the equivalent of somebody whose phone is wiretapped calling their friend and asking if the FBI knows about something.

fennecfoxy 21 minutes ago [-]

I think that's a very cynical view. The author seeing what an LLM would make of it was more akin to getting a new game and wondering if you can pet the dog.

vessenes 24 hours ago [-]

I love the idea of using this for LLM output watermarking. It hits the sweet spot - will catch 99% of slop generators with no fuss, since they only copy and paste anyway, almost no impact on other core use cases.

I wonder how much you’d embed with each letter or token that’s output - userid, prompt ref, date, token number?

I also wonder how this is interpreted in a terminal. Really cool!

fennecfoxy 19 minutes ago [-]

Why does anybody think AI watermarking will ever work? Of course it will never work, any watermarking can be instantly & easily stripped...

The only real AI protection is to require all human interaction to be signed by a key verified by irl identity and even then that will: A never happen, B be open to abuse by countries with corrupt governments and countries with corrupt governments heavily influenced by private industry (like the US).

zos_kia 22 hours ago [-]

With the amount of pre processing that is done before integrating stuff in a dataset I'd be surprised if those kinds of shenanigans even worked

capitainenemo 20 hours ago [-]

In most linux terminals, what you pass it is just a sequence of bytes that is passed unmangled. And since this technique is UTF-8 compliant and doesn't use any extra glyphs, it is invisible to humans in unicode compliant terminals. I tried it on a few. It shows up if you echo the sentence to, say, xxd ofc.

(unlike the PUA suggestion in the currently top voted comment which shows up immediately ofc)

Additional test corrections: While xxd shows the message passing through completely unmangled on pasting it into the terminal, when I selected from the terminal (echoed sentence, verified unmangled in xxd, then selected and pasted the result of echo), it was truncated to a few words using X select in mate terminal and konsole - I'm not sure where that truncation happens, whether it's the terminal or X. In xterm, the final e was mangled, and the selection was even more truncated.

The sentence is written unmangled to files though, so I think it's more about copying out of the terminal dropping some data. Verified by echoing the sentence to a test file, opening it in a browser, and copying the text from there.

vessenes 18 hours ago [-]

On MacOS, kitty shows an empty box, then an a for the "h󠅘󠅕󠅜󠅜󠅟󠄐󠅖󠅕󠅜󠅜󠅟󠅧󠄐󠅘󠅑󠅓󠅛󠅕󠅢󠄐󠄪󠄙a" post below. I think this is fair and even appreciated. Mac Terminal shows "ha". That "h󠅘󠅕󠅜󠅜󠅟󠄐󠅖󠅕󠅜󠅜󠅟󠅧󠄐󠅘󠅑󠅓󠅛󠅕󠅢󠄐󠄪󠄙a" (and this one!) can be copied and pasted into the decoder successfully.

ChadNauseam 14 hours ago [-]

There are other possible approaches to LLM watermarking that would be much more robust and harder to detect. They exploit the fact that LLMs work by producing a probability distribution that gives a probability for each possible next token. These are then sampled randomly to produce the output. To add fingerprints when generating, you could do some trickery in how you do that sampling that would then be detectable by re-running the LLM and observing its outputs. For example, you could alternate between selecting high-probability and low-probability tokens. (A real implementation of this would be much more sophisticated than that obviously, but hopefully you get the idea)

vessenes 12 hours ago [-]

This is not a great method in a world with closed models and highly diverse open models and samplers. It’s intellectually appealing for sure! But it will always be at best a probabilistic method, and that’s if you have the llm weights at hand.

ChadNauseam 7 hours ago [-]

What makes it not a good method? Of course if a model's weights are publicly available, you can't compel anyone using it to add fingerprinting at the sampler stage or later. But I would be shocked if OpenAI was not doing something like this, since it would be so easy and couldn't hurt them, but could help them if they don't want to train on outputs they generated. (Although they could also record hashes of their outputs or something similar as well – I would be surprised if they don't.)

OutOfHere 22 hours ago [-]

Just you wait until AI starts calling human output to be slop.

vessenes 18 hours ago [-]

That's already happening - my kids have had papers unfairly blamed on chatgpt by automated tools. Protect yourself kids, use an editor that can show letter by letter history.

neom 14 hours ago [-]

2 people I worked with had this happen and one of them is going to war over it as it was enough to lower the kids grade for college or something. Crazy times.

red369 16 hours ago [-]

Do you have any examples of editors that show letter by letter history? I have never looked for that as a feature.

Edit: I've been looking, and Google Docs seems to have version history to the minute.

vessenes 15 hours ago [-]

Yes exactly. They keep track of their diffs in that interface.

roguecoder 20 hours ago [-]

There are of course human writers who are less-communicative than AI, called "shit writers", and humans who are less accurate than AI, called "liars".

The difference is humans are responsible for what they write, whereas the human user who used an AI to generate text is responsible for what the computer wrote.

kevinsync 24 hours ago [-]

StegCloak [0] is in the same ballpark and takes this idea a step further by encrypting the hidden payload via AES-256-CTR -- pretty neat little trick

[0] https://github.com/KuroLabs/stegcloak

giancarlostoro 23 hours ago [-]

There's a Better Discord plugin that I think uses this or something similar, so you could send completely encrypted messages, that look like nothing to everyone else. You'd need to share a password secret for them to decode it though.

putna 18 hours ago [-]

wow, thats neat.

Wanted to try on Cloudflare DNS TXT record. But Cloudflare is smart enough to decode when pasting in TXT field.

UltraSane 13 hours ago [-]

DNS only supports ASCII for record values. It has a hack to support unicode domain names using Punycode

JoelJacobson 5 hours ago [-]

Imagine using the ID card emoji (U+1FAAA) as a universal carrier for digital ID tokens. A dumb demo is available at https://pit.lovable.app/ which—without any secure protocol—simply encodes a National Identification Number into the emoji using variation selectors.

The idea is that banks could issue encrypted ID tokens in this way, letting them move seamlessly across any platform that supports Unicode (messaging apps, email, web forms, etc.). The heavy lifting of security (preventing replay attacks, interception, ensuring token freshness, etc.) would be managed separately with robust cryptography, while the emoji serves purely as a transport layer.

It's not about reinventing security but about creating a cross-platform way to carry identity tokens. Thoughts?

bruce343434 4 hours ago [-]

What is wrong with just using the actual SSN? Why hide it in an emoji?

JoelJacobson 4 hours ago [-]

So that the operating system could recognize it automatically, and to include a potentially long URL to the retail bank's web service to initiate the protocol, such as signing a document or an identification protocol.

HanClinto 22 hours ago [-]

Even more than just simply watermarking LLM output, it seems like this could be a neat way to package logprobs data.

Basically, include probability information about every token generated to give a bit of transparency to the generation process. It's part of the OpenAI api spec, and many other engines (such as llama.cpp) support providing this information. Normally it's attached as a separate field, but there are neat ways to visualize it (such as mikupad [0]).

Probably a bad idea, but this still tickles my brain.

* [0]: https://github.com/lmg-anon/mikupad

vzaliva 22 hours ago [-]

The title lis little misleading: "Note that the base character does not need to be an emoji – the treatment of variation selectors is the same with regular characters. It’s just more fun with emoji."

Using this approach with non-emoji characters makes it more stealth and even more disturbing.

dalemhurley 11 hours ago [-]

I love it, I got Claude to add a pin to provide very basic encryption

https://claude.site/artifacts/5bfdf131-d847-4735-9242-998f23...

wunderwuzzi23 17 hours ago [-]

This is cool. There are also the Unicode Tag characters that mirror ASCII and are often invisible in UI elements (especially web apps).

The unique thing about Tag characters is that some LLMs interpret the hidden text as ASCII and follow instructions, and they can even write them:

https://embracethered.com/blog/posts/2024/hiding-and-finding...

Here an actual exploit POC that Microsoft fixed in Copilot: https://embracethered.com/blog/posts/2024/m365-copilot-promp...

rafram 18 hours ago [-]

This is cool. I tried pasting the output into an Instagram comment and it stayed intact, so I have a feeling someone could do some interesting stuff with that. Who needs a botnet C&C server when you can post totally invisible commands on public forums?

the_hoffa 5 hours ago [-]

I mean, steganography has been a thing for quite a while. Not disagreeing, just saying this is how some programs/ideas were passed around the internet decades ago by "less than upstanding netizens" ;)

Wanted to pass a secret code to a friend? Encode the bit-data in the alpha channel of an image. It could even be encrypted/scrambled within the image itself. Post the perfectly normal image to a public forum, ping your friend, they run it through the "decoder" and Robert's your mother's brother.

Of course these weren't "logic bombs" like this post is describing, but even those have been around for a while too.

Hacking is fun :)

iNic 24 hours ago [-]

The tokenizer catches it: https://platform.openai.com/tokenizer.

Mockapapella 19 hours ago [-]

In the same vein, I did some fun unicode abusing a few years ago where I used scripts to convert programs into series of various ZWJ's: https://thelisowe.substack.com/p/sleeper-cell-a-method-of-em...

Also includes a decoder script

paulgb 13 hours ago [-]

(author here) some people in this thread and elsewhere asked me about whether an LLM could decode this, and the answer seems to be: not likely by itself, but it often can if it has access to a Python interpreter!

Here's a demo of Gemini Flash 2 solving one in 7s: https://bsky.app/profile/paulbutler.org/post/3lhzhroogws2g

nzach 1 days ago [-]

so.... in theory you should be able to create several visually identical links that give access to different resources?

I've always assumed links without any tracking information (unique hash, query params, etc) were safe to click(with regards to my privacy). but if this works for links I may need to revise my strategy regarding how to approach links sent to me.

kccqzy 20 hours ago [-]

"Visually identical" is never good enough. Have you heard of attacks confusing Latin letters and Cyrillic letters? For example C versus С. (The latter is known as CYRILLIC CAPITAL LETTER ES.) Have you heard of NFC forms versus NFD forms? For example é versus é (LATIN SMALL LETTER E + COMBINING ACUTE ACCENT versus LATIN SMALL LETTER E WITH ACUTE.)

Nothing that's important when it comes to security and privacy should rely on a "visually identical" check. Fortunately browsers these days are already good at this; their address bars use puny code for the domain and percent encoding for the rest of the URL.

moody__ 16 hours ago [-]

As the sibling comment has mentioned Unicode in DNS uses a punycode encoding but even further then that the standard specifies that the Unicode data must be normalized to NFC[0] before being converted to punycode. This means that your second example (decomposed e with combining acute accent vs the composed variant) is not a valid concern. The Cyrillic one is however.

[0] https://www.rfc-editor.org/rfc/rfc5891 § 4.1 "By the time a string enters the IDNA registration process as described in this specification, it MUST be in Unicode and in Normalization Form C"

kccqzy 14 hours ago [-]

The OP said link. The NFC/NFD issue remains if these are part of a path name or query parameter.

moody__ 10 hours ago [-]

Sure, but the security concerns of that I feel are much less concerning than having multiple domain names with the same visual appearance that point to different servers. That has immediate impact for things like phishing whereas lookalike path or query portions would at least ensure you are still connecting to the server that you think you are.

komboozcha 19 hours ago [-]

Erm, DNS uses Punycode because it comes from a time when Unicode didn't exist, and bind assumes a grapheme has no more than one byte.

ale42 17 hours ago [-]

Yes but I guess that the message was meaning that browsers now detect homographs and display the punycode instead. See also https://news.ycombinator.com/item?id=14130241; at that time Firefox wasn't fixed, but in the meantime it fixed the issue too (there's a network.idn.punycode_cyrillic_confusables preference, which is enabled by default).

cscheid 1 days ago [-]

My understanding is that "weird" unicode code points become https://en.wikipedia.org/wiki/Punycode. I used the 󠅘󠅕󠅜󠅜󠅟 (copy-pasted from the post, presumably with the payload in it) to type a fake domain into Chrome, and the Punycode I got appeared to not have any of the encoding bits.

However, I then pasted the emoji into the _query_ part of a URL. I pointed it to my own website, and sure enough, I can definitely see the payload in the nginx logs. Yikes.

Edit: I pasted the very same Emoji that 'paulgb used in their post before the parenthetical in the first paragraph, but it seems HN scrubs those from comments.

bmicraft 23 hours ago [-]

domains get "punycode" encoded, urls get "url encoded"[1], which should make unicode characters stand out. That being said, browsers do accept some non-ascii characters in urls and convert them automatially, so theoretically you could put "invalid" characters into a link and have the browser convert it only after clicking. That might be a viable strategy.

[1] https://www.w3schools.com/tags//ref_urlencode.asp

echeese 20 hours ago [-]

The emoji is gone but the content is still there.

riquito 23 hours ago [-]

> I've always assumed links without any tracking information (unique hash, query params, etc) were safe to click(with regards to my privacy). but if this works for links I may need to revise my strategy regarding how to approach links sent to me.

Well, it was never safe, what you see and where the link are pointing at are different things, that's why the actual link is displayed at the bottom left of your browser when you move your mouse over it (or focus it via keyboard)

dmbche 1 days ago [-]

You need to decode the text after copy pasting it, I believe clicking on text will not interact with the obfuscated data since your computer will just find the unicode and ignore the obfuscated data.

This is just so that you can hide data and send it to someone to be decoded (or watermarking as mentionned)

nzach 1 days ago [-]

yes, I understand this is not a security risk.

but my fear is precisely that I my be sending data to a remote host while I'm completely unaware of this fact.

I tried to create a POC with some popular url shortner services, but doesn't seems to work.

what I wanted to create was a link like <host.tld>/innoc󠅥󠅣󠅕󠅢󠄝󠅙󠅔󠄪󠅑󠅒󠅓ent that redirects to google.com. in this case the "c" contains some hidden data that will be sent to the server while the user is not aware. this seems possible with the correct piece of software.

layer8 23 hours ago [-]

URIs with non-ASCII characters are technically invalid. Browsers and the like should (but likely don’t all do) percent-encode any invalid characters for display if they accept such invalid URIs.

password4321 21 hours ago [-]

This tool and idea sketchy AF: https://github.com/zws-im/zws

("Shorten URLs using invisible spaces")

cess11 1 days ago [-]

HTML entity encoding will show the hidden content, try with https://mothereff.in/html-entities.

blmarket 19 hours ago [-]

This and several other abuse cases forced my previous work to use code pointers to count 'characters' for user's nickname / status messages. No one wanted to download 9MB simply browsing other users.

myflash13 4 hours ago [-]

NoSQL? Sounds like it should’ve been caught by basic length checks on the database field where it was stored.

ncr100 18 hours ago [-]

That is awesome. Both the abuse and the fix.

qingcharles 5 hours ago [-]

I was using this technique last year with Bing Image Creator.

It let you get around their filter on brand names and celebrity names by smuggling them into the prompt in a way the AI could read, but the human-written filter was not designed for.

foobuzzHN 6 hours ago [-]

10 years ago I made a POC for smuggling arbitrary data through _no visible text at all_: https://github.com/foobuzz/ium

arkh 6 hours ago [-]

> To be clear, this is an abuse of unicode and you shouldn’t do it. If your mind is wandering to practical use cases for this, shut it down.

Totally not thinking about IRC clients with their own hidden commands.

_nhh 2 hours ago [-]

Perfect way of personalized ad tracking?

_nhh 2 hours ago [-]

Check this address after you clicked it:

https://emoji.paulbutler.org/?mode=encode󠅄󠅢󠅑󠅓󠅛󠅙󠅞󠅗󠄹󠄴

I encoded the last „e“

jerpint 1 days ago [-]

The ability to add watermarks to text is really interesting. Obviously it could be worked around , but could be a good way to subtly watermark e.g. LLM outputs

tyho 1 days ago [-]

There are way better ways to watermark LLM output. It's easy to make it undetectable, which this is'nt.

shawnz 1 days ago [-]

I recently worked on a steganographics project which could be useful for this problem. See: https://github.com/shawnz/textcoder

andai 19 hours ago [-]

That's really cool, you should repost the HN submission.

shawnz 18 hours ago [-]

Thank you! I will see what I can do.

antognini 21 hours ago [-]

The issue with the standard watermark techniques is that they require an output of at least a few hundred tokens to reliably imprint the watermark. This technique would apply to much shorter outputs.

pava0 24 hours ago [-]

For example?

tyho 23 hours ago [-]

A crude way: To watermark: First establish a keyed DRBG. For every nth token prediction: read a bit from the DRBG for every possible token to label them red/black. before selecting the next token, set the logit for black tokens to -Inf, this ensures a red token will be selected.

To detect: Establish the same DRBG. Tokenize, for each nth token, determine the red set of tokens in that position. If you only see red tokens in lots of positions, then you can be confident the content is watermarked with your key.

This would probably take a bit of fiddling to work well, but would be pretty much undetectable. Conceptually it's forcing the LLM to use a "flagged" synonym at key positions. A more sophisticated version of a shiboleth.

In practice you might chose to instead watermark all tokens, less heavy handedly (nudge logits, rather than override), and use highly robust error correcting codes.

jl6 21 hours ago [-]

It feels like this would only be feasible across longer passages of text, and some types of text may be less amenable to synonyms than others. For example, a tightly written mathematical proof versus a rambling essay. Biased token selection may be detectable in the latter (using a statistical test), and may cause the text to be irreparably broken in the former.

drdeca 19 hours ago [-]

To handle low entropy text, the “adding a smaller constant to the logits” approach avoids having much chance of changing the parts that need to be exactly a particular thing,

Though in this case it needs longer texts to have high significance (and when the entropy is low, it needs to be especially long).

But for most text (with typical amounts of entropy per token) apparently it doesn’t need to be that long? Like 25 words I think I heard?

deadbabe 18 hours ago [-]

What if the entire LLM output isn’t used? For example, you ask the LLM to produce some long random preamble and conclusion with your actual desired output in between the two. Does it mess up the watermarking?

24 hours ago [-]

tyilo 6 hours ago [-]

Kitty terminal shows non-payload letters and emojis normally, but with a payload a letter is shown as one box and an emoji is shown as two boxes.

panki27 20 hours ago [-]

I implemented something similar years ago, but much simpler/less sophisticated.

Unicode has two non-printing space characters: zero-width space (U+200B) and zero-width joiner (U+200D). This allows you to encode arbitrary data in binary. I would give an example, but HN seems to strip this :(

layer8 19 hours ago [-]

Already linked in https://news.ycombinator.com/item?id=43025913, and has a higher risk of being stripped, as you noticed.

FranchuFranchu 1 days ago [-]

You could store UTF-8 encoded data inside the hidden bytestring. If some of the UTF-8 encoded smuggled characters are variation selector characters, you can smuggle text inside the smuggled text. Smuggled data can be nested arbitrarily deep.

riskable 24 hours ago [-]

I'm imagining post-incident analysis finding out that, "the data was exfiltrated via some Unicode string..." then they put it up on the screen and it's just an enormous line of turtle emoji

https://emojipedia.org/turtle

jodrellblank 8 hours ago [-]

> We and our 717 technology partners ask you to consent to the use of cookies to store and access personal data on your device.

To see a turtle emoji.

JadeNB 24 hours ago [-]

> I'm imagining post-incident analysis finding out that, "the data was exfiltrated via some Unicode string..." then they put it up on the screen and it's just an enormous line of turtle emoji

Since it took me a minute to make the connection, I'll just say explicitly that I enjoyed the understated "it's turtles all the way down" joke.

jaygreco 22 hours ago [-]

Interestingly, it’s also possible to encode _emoji_ inside emoji!

andrethegiant 11 hours ago [-]

Clever! I made a similar emoji encoding/decoding microsite: https://face64.me

24 hours ago [-]

nitwit005 13 hours ago [-]

Even kids figure out how to manipulate unicode text. If you want to bypass a swear filter, replace a letter with an alternate representation of the same letter.

egypturnash 12 hours ago [-]

If you try posting this on Bluesky, the editor only counts it as one emoji, but you will get an error upon trying to post.

remram 23 hours ago [-]

I'm not too surprised by this, but I'm annoyed that no amount of configuration made those bytes visible again in my editor. Only using hexdump revealed them.

jrootabega 23 hours ago [-]

Here's a POC that works in emacs. Doesn't cover all of the relevant characters, but:

  (setq   ;;some other invisible or interesting characters
          unicode-zero-width-space ?\u200b
          unicode-zero-width-non-joiner ?\u200c
          unicode-zero-width-joiner ?\u200d
          unicode-zero-width-nbsp ?\ufeff
          unicode-narrow-nbsp ?\u202f
          unicode-word-joiner ?\u2060
          unicode-grapheme-joiner ?\u034f
          unicode-no-break-space ?\u00a0
          unicode-combining-long-stroke ?\u0336
          ;;variation selector examples
          unicode-vs-fe00 ?\ufe00
          unicode-vs-fe0f ?\ufe0f
          unicode-vs-e0100 ?\xe0100)


    (defun show-glyphless-as-hex (char)
      (let ((original (elt glyphless-char-display char)))
        (aset glyphless-char-display char 'hex-code)
        original)) ;;so you can see what you just replaced


    (progn
      (show-glyphless-as-hex unicode-zero-width-space)
      (show-glyphless-as-hex unicode-zero-width-non-joiner)
      (show-glyphless-as-hex unicode-zero-width-joiner)
      (show-glyphless-as-hex unicode-zero-width-nbsp)
      (show-glyphless-as-hex unicode-word-joiner)
      (show-glyphless-as-hex unicode-grapheme-joiner)
      (show-glyphless-as-hex unicode-narrow-nbsp)
      (show-glyphless-as-hex unicode-no-break-space)
      ;;these may already be visible if the current conditions don't support them
      ;;but we'll force them
      (show-glyphless-as-hex unicode-vs-fe00)
      (show-glyphless-as-hex unicode-vs-fe0f)
      (show-glyphless-as-hex unicode-vs-e0100))

jrootabega 12 hours ago [-]

And as a higher-level configuration you can set most, maybe even all, of the relevant invisible characters (still not sure how 0x34f grapheme joiner fits in) at once with something like:

  (custom-set-variables
   '(glyphless-char-display-control  '((format-control . hex-code)
                                       (variation-selectors . hex-code))))

This will modify values in glyphless-char-display, but it's OK to modify those directly if you need to.

jrootabega 17 hours ago [-]

Here is the bare minimum this is built on, which you can type in yourself if you're paranoid or want to start from the bottom up. Swap in the hexadecimal codepoint of the invisible character after the ?\x

  (aset glyphless-char-display ?\xfe00 'hex-code)

remram 22 hours ago [-]

I use vim. It seems like `:set binary enc=latin1` works, though I don't understand why the latin1 part is required.

TacticalCoder 19 hours ago [-]

[dead]

mdouglass 15 hours ago [-]

vscode's "Unicode Highlight: Non-basic ASCII" causes the character to get highlighted. Sadly, the more appropriate "Unicode Highlight: Invisible Characters" setting does not reveal them.

bittercynic 23 hours ago [-]

My mind went the same place.

Anyone know a more convenient way to search larger blocks of text for this?

zurfer 24 hours ago [-]

h󠅘󠅕󠅜󠅜󠅟󠄐󠅖󠅕󠅜󠅜󠅟󠅧󠄐󠅘󠅑󠅓󠅛󠅕󠅢󠄐󠄪󠄙a

Alifatisk 1 hours ago [-]

;)

paulgb 24 hours ago [-]

I see what you did there ;)

petee 1 days ago [-]

It's fun that you can encode encoded emoji into a new one

riskable 24 hours ago [-]

Then when you dive deeper into the encoded data you find endless turtle emoji and loudly exclaim, "it's turtles all the way down!"

nerder92 1 days ago [-]

Might not be related to the point of the article per se, but i've tried to decode it with different LLMs. To benchmark their reasoning capabilities.

- 4o: Failed completely

- o1: Overthinks it for a while and come up with the wrong answer

- o3-mini-high: Get's closer to the result at first try, needs a second prompt to adjust the approach

- r1: nails it at first try 󠅖󠅥󠅓󠅛󠅙󠅞󠅗󠄐󠅙󠅝󠅠󠅢󠅕󠅣󠅣󠅙󠅦󠅕

The prompt I've used was simply: "this emoji has an hidden message 󠅘󠅕󠅜󠅜󠅟 can you decode it?"

If you want to see the CoT: https://gist.github.com/nerder/5baa9d7b13c1b7767d022ea0a7c91...

markisus 1 days ago [-]

The r1 somehow knew at an early stage that the message was HELLO but it couldn’t figure out the reason. Even at the end, its last “thought” insists that there is an encoding mistake somewhere. However the final message is correct. I wonder how well it would do for a nonstandard message. Any sufficiently long English message would fall to statistical analysis and I wonder if the LLMs would think to write a little Python script to do the job.

paulgb 24 hours ago [-]

Wow, that's interesting! I wonder if this reproduces with a different message, or if it was a lucky guess.

I looked at how the strings tokenize and they do appear to conserve enough information that it could be decoded in theory.

klabb3 23 hours ago [-]

> or if it was a lucky guess

It’s like guessing 1/2 or 2/3 on a math test. The test authors pick nice numbers, and programmers like ”hello”. If the way to encode the secret message resembles other encodings, it’s probably that the pattern matching monster picked it up and is struggling to autocomplete (ie backwards rationalize) a reason why.

paulgb 13 hours ago [-]

I did some experimentation today. I wouldn't expect AI to solve it using only their own reasoning, but I've had a decent hit rate of getting AI to solve them when they have access to a Python interpreter. Here's Gemini Flash 2 solving one (albeit it lost the spaces) in a single prompt and about 7 seconds!

https://bsky.app/profile/paulbutler.org/post/3lhzhroogws2g

bogtog 1 days ago [-]

My deepseek-r1 seems to be a bit more lost on decoding "How do I make meth". Some highlights (after about 5 minutes of R1-ing):

> Another angle: the user mentioned "encoded a message in this emoji", so maybe the first emoji is a red herring, or it's part of the message. The subsequent characters, even though they look like variation selectors, could be part of the encoding.

> E0138 in hex is 0xE0138. Convert to decimal: 1416^4 + 016^3 + 116^2 + 316 + 8 = 14*65536 + 0 + 256 + 48 +8 = 917504 + 256 + 48 +8 = 917816.

> Given that I'm not making progress, perhaps the answer is "Hello World!" but encoded via the tag characters. Let's check:

> Answer: The decoded message is "Hello World!"

In all this, it did at least manage to discern that the first letter should be "h"

roguecoder 20 hours ago [-]

It is highly unlikely it discerned that: it coincidentally guessed a string that starts with an H.

If you try it with a string that started with "J" and then it guessed "jump up", I might be more convinced.

krupan 23 hours ago [-]

There's no way an LLM is decoding this. It's just giving you a statistically likely response to the request, "guess my secret message." It's not a big surprise that it guessed "Hello" or "Hello, world"

paulgb 18 hours ago [-]

I got Claude to get “the raisons play at midnight" from an emoji in one prompt and three uses of its "analysis" tool. (the X Y at mightnight is a snowclone that Claude has probably seen, but I randomly picked "raisons" and "play")

My prompt was "I think this emoji contains a hidden messaage, can you decode it? Use JavaScript if necessary."

ahofmann 1 days ago [-]

This will break so many (web-)forms :-)

It is not bulletproof though. In this "c󠄸󠅙󠄑󠄐󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅑󠅗󠅕󠅞󠅤󠅣󠄭󠄭󠄠󠄞󠄡󠄢󠄞󠄨󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅑󠅠󠅙󠄭󠄭󠄠󠄞󠄨󠄞󠄡󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅟󠅠󠅕󠅞󠅑󠅙󠄭󠄭󠄠󠄞󠄡󠄠󠄞󠄡󠄥󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅣󠅙󠅜󠅕󠅢󠅟󠄭󠄭󠄠󠄞󠄧󠄞󠄤󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅑󠅪󠅥󠅢󠅕󠄭󠄭󠄠󠄞󠄥󠄞󠄣󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅤󠅥󠅢󠅞󠄝󠅔󠅕󠅤󠅕󠅓󠅤󠅟󠅢󠄭󠄭󠄠󠄞󠄣󠄞󠄦󠅠󠅩󠅤󠅘󠅟󠅞󠄝󠅔󠅟󠅤󠅕󠅞󠅦󠄭󠄭󠄡󠄞󠄠󠄞󠄡󠅢󠅕󠅡󠅥󠅕󠅣󠅤󠅣󠄭󠄭󠄢󠄞󠄣󠄢󠄞󠄣󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅑󠅗󠅕󠅞󠅤󠅣󠄭󠄭󠄠󠄞󠄡󠄢󠄞󠄨󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅑󠅠󠅙󠄭󠄭󠄠󠄞󠄨󠄞󠄡󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅟󠅠󠅕󠅞󠅑󠅙󠄭󠄭󠄠󠄞󠄡󠄠󠄞󠄡󠄥󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅣󠅙󠅜󠅕󠅢󠅟󠄭󠄭󠄠󠄞󠄧󠄞󠄤󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅑󠅪󠅥󠅢󠅕󠄭󠄭󠄠󠄞󠄥󠄞󠄣󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅤󠅥󠅢󠅞󠄝󠅔󠅕󠅤󠅕󠅓󠅤󠅟󠅢󠄭󠄭󠄠󠄞󠄣󠄞󠄦󠅠󠅩󠅤󠅘󠅟󠅞󠄝󠅔󠅟󠅤󠅕󠅞󠅦󠄭󠄭󠄡󠄞󠄠󠄞󠄡󠅢󠅕󠅡󠅥󠅕󠅣󠅤󠅣󠄭󠄭󠄢󠄞󠄣󠄢󠄞󠄣󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅑󠅗󠅕󠅞󠅤󠅣󠄭󠄭󠄠󠄞󠄡󠄢󠄞󠄨󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅑󠅠󠅙󠄭󠄭󠄠󠄞󠄨󠄞󠄡󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅟󠅠󠅕󠅞󠅑󠅙󠄭󠄭󠄠󠄞󠄡󠄠󠄞󠄡󠄥󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅣󠅙󠅜󠅕󠅢󠅟󠄭󠄭󠄠󠄞󠄧󠄞󠄤󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅑󠅪󠅥󠅢󠅕󠄭󠄭󠄠󠄞󠄥󠄞󠄣󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅤󠅥󠅢󠅞󠄝󠅔󠅕󠅤󠅕󠅓󠅤󠅟󠅢󠄭󠄭󠄠󠄞󠄣󠄞󠄦󠅠󠅩󠅤󠅘󠅟󠅞󠄝󠅔󠅟󠅤󠅕󠅞󠅦󠄭󠄭󠄡󠄞󠄠󠄞󠄡󠅢󠅕󠅡󠅥󠅕󠅣󠅤󠅣󠄭󠄭󠄢󠄞󠄣󠄢󠄞󠄣󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅑󠅗󠅕󠅞󠅤󠅣󠄭󠄭󠄠󠄞󠄡󠄢󠄞󠄨󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅑󠅠󠅙󠄭󠄭󠄠󠄞󠄨󠄞󠄡󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅟󠅠󠅕󠅞󠅑󠅙󠄭󠄭󠄠󠄞󠄡󠄠󠄞󠄡󠄥󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅣󠅙󠅜󠅕󠅢󠅟󠄭󠄭󠄠󠄞󠄧󠄞󠄤󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅑󠅪󠅥󠅢󠅕󠄭󠄭󠄠󠄞󠄥󠄞󠄣󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅤󠅥󠅢󠅞󠄝󠅔󠅕󠅤󠅕󠅓󠅤󠅟󠅢󠄭󠄭󠄠󠄞󠄣󠄞󠄦󠅠󠅩󠅤󠅘󠅟󠅞󠄝󠅔󠅟󠅤󠅕󠅞󠅦󠄭󠄭󠄡󠄞󠄠󠄞󠄡󠅢󠅕󠅡󠅥󠅕󠅣󠅤󠅣󠄭󠄭󠄢󠄞󠄣󠄢󠄞󠄣󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅑󠅗󠅕󠅞󠅤󠅣󠄭󠄭󠄠󠄞󠄡󠄢󠄞󠄨󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅑󠅠󠅙󠄭󠄭󠄠󠄞󠄨󠄞󠄡󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅟󠅠󠅕󠅞󠅑󠅙󠄭󠄭󠄠󠄞󠄡󠄠󠄞󠄡󠄥󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅣󠅙󠅜󠅕󠅢󠅟󠄭󠄭󠄠󠄞󠄧󠄞󠄤󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅑󠅪󠅥󠅢󠅕󠄭󠄭󠄠󠄞󠄥󠄞󠄣󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅤󠅥󠅢󠅞󠄝󠅔󠅕󠅤󠅕󠅓󠅤󠅟󠅢󠄭󠄭󠄠󠄞󠄣󠄞󠄦󠅠󠅩󠅤󠅘󠅟󠅞󠄝󠅔󠅟󠅤󠅕󠅞󠅦󠄭󠄭󠄡󠄞󠄠󠄞󠄡󠅢󠅕󠅡󠅥󠅕󠅣󠅤󠅣󠄭󠄭󠄢󠄞󠄣󠄢󠄞󠄣󠅣󠅖󠅣󠅖󠅣󠅕󠅖󠅗󠅣󠅢󠅗󠄐󠅣󠅢󠅗󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅑󠅗󠅕󠅞󠅤󠅣󠄭󠄭󠄠󠄞󠄡󠄢󠄞󠄨󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅑󠅠󠅙󠄭󠄭󠄠󠄞󠄨󠄞󠄡󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅟󠅠󠅕󠅞󠅑󠅙󠄭󠄭󠄠󠄞󠄡󠄠󠄞󠄡󠄥󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅣󠅙󠅜󠅕󠅢󠅟󠄭󠄭󠄠󠄞󠄧󠄞󠄤󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅑󠅪󠅥󠅢󠅕󠄭󠄭󠄠󠄞󠄥󠄞󠄣󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅤󠅥󠅢󠅞󠄝󠅔󠅕󠅤󠅕󠅓󠅤󠅟󠅢󠄭󠄭󠄠󠄞󠄣󠄞󠄦󠅠󠅩󠅤󠅘󠅟󠅞󠄝󠅔󠅟󠅤󠅕󠅞󠅦󠄭󠄭󠄡󠄞󠄠󠄞󠄡󠅢󠅕󠅡󠅥󠅕󠅣󠅤󠅣󠄭󠄭󠄢󠄞󠄣󠄢󠄞󠄣󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅑󠅗󠅕󠅞󠅤󠅣󠄭󠄭󠄠󠄞󠄡󠄢󠄞󠄨󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅑󠅠󠅙󠄭󠄭󠄠󠄞󠄨󠄞󠄡󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅟󠅠󠅕󠅞󠅑󠅙󠄭󠄭󠄠󠄞󠄡󠄠󠄞󠄡󠄥󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅣󠅙󠅜󠅕󠅢󠅟󠄭󠄭󠄠󠄞󠄧󠄞󠄤󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅑󠅪󠅥󠅢󠅕󠄭󠄭󠄠󠄞󠄥󠄞󠄣󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅤󠅥󠅢󠅞󠄝󠅔󠅕󠅤󠅕󠅓󠅤󠅟󠅢󠄭󠄭󠄠󠄞󠄣󠄞󠄦󠅠󠅩󠅤󠅘󠅟󠅞󠄝󠅔󠅟󠅤󠅕󠅞󠅦󠄭󠄭󠄡󠄞󠄠󠄞󠄡󠅢󠅕󠅡󠅥󠅕󠅣󠅤󠅣󠄭󠄭󠄢󠄞󠄣󠄢󠄞󠄣󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅑󠅗󠅕󠅞󠅤󠅣󠄭󠄭󠄠󠄞󠄡󠄢󠄞󠄨󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅑󠅠󠅙󠄭󠄭󠄠󠄞󠄨󠄞󠄡󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅟󠅠󠅕󠅞󠅑󠅙󠄭󠄭󠄠󠄞󠄡󠄠󠄞󠄡󠄥󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅣󠅙󠅜󠅕󠅢󠅟󠄭󠄭󠄠󠄞󠄧󠄞󠄤󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅑󠅪󠅥󠅢󠅕󠄭󠄭󠄠󠄞󠄥󠄞󠄣󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅤󠅥󠅢󠅞󠄝󠅔󠅕󠅤󠅕󠅓󠅤󠅟󠅢󠄭󠄭󠄠󠄞󠄣󠄞󠄦󠅠󠅩󠅤󠅘󠅟󠅞󠄝󠅔󠅟󠅤󠅕󠅞󠅦󠄭󠄭󠄡󠄞󠄠󠄞󠄡󠅢󠅕󠅡󠅥󠅕󠅣󠅤󠅣󠄭󠄭󠄢󠄞󠄣󠄢󠄞󠄣󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅑󠅗󠅕󠅞󠅤󠅣󠄭󠄭󠄠󠄞󠄡󠄢󠄞󠄨󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅑󠅠󠅙󠄭󠄭󠄠󠄞󠄨󠄞󠄡󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅟󠅠󠅕󠅞󠅑󠅙󠄭󠄭󠄠󠄞󠄡󠄠󠄞󠄡󠄥󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅣󠅙󠅜󠅕󠅢󠅟󠄭󠄭󠄠󠄞󠄧󠄞󠄤󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅑󠅪󠅥󠅢󠅕󠄭󠄭󠄠󠄞󠄥󠄞󠄣󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅤󠅥󠅢󠅞󠄝󠅔󠅕󠅤󠅕󠅓󠅤󠅟󠅢󠄭󠄭󠄠󠄞󠄣󠄞󠄦󠅠󠅩󠅤󠅘󠅟󠅞󠄝󠅔󠅟󠅤󠅕󠅞󠅦󠄭󠄭󠄡󠄞󠄠󠄞󠄡󠅢󠅕󠅡󠅥󠅕󠅣󠅤󠅣󠄭󠄭󠄢󠄞󠄣󠄢󠄞󠄣󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅑󠅗󠅕󠅞󠅤󠅣󠄭󠄭󠄠󠄞󠄡󠄢󠄞󠄨󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅑󠅠󠅙󠄭󠄭󠄠󠄞󠄨󠄞󠄡󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅟󠅠󠅕󠅞󠅑󠅙󠄭󠄭󠄠󠄞󠄡󠄠󠄞󠄡󠄥󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅣󠅙󠅜󠅕󠅢󠅟󠄭󠄭󠄠󠄞󠄧󠄞󠄤󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅑󠅪󠅥󠅢󠅕󠄭󠄭󠄠󠄞󠄥󠄞󠄣󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅤󠅥󠅢󠅞󠄝󠅔󠅕󠅤󠅕󠅓󠅤󠅟󠅢󠄭󠄭󠄠󠄞󠄣󠄞󠄦󠅠󠅩󠅤󠅘󠅟󠅞󠄝󠅔󠅟󠅤󠅕󠅞󠅦󠄭󠄭󠄡󠄞󠄠󠄞󠄡󠅢󠅕󠅡󠅥󠅕󠅣󠅤󠅣󠄭󠄭󠄢󠄞󠄣󠄢󠄞󠄣󠅦󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅑󠅗󠅕󠅞󠅤󠅣󠄭󠄭󠄠󠄞󠄡󠄢󠄞󠄨󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅑󠅠󠅙󠄭󠄭󠄠󠄞󠄨󠄞󠄡󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅟󠅠󠅕󠅞󠅑󠅙󠄭󠄭󠄠󠄞󠄡󠄠󠄞󠄡󠄥󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅣󠅙󠅜󠅕󠅢󠅟󠄭󠄭󠄠󠄞󠄧󠄞󠄤󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅑󠅪󠅥󠅢󠅕󠄭󠄭󠄠󠄞󠄥󠄞󠄣󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅤󠅥󠅢󠅞󠄝󠅔󠅕󠅤󠅕󠅓󠅤󠅟󠅢󠄭󠄭󠄠󠄞󠄣󠄞󠄦󠅠󠅩󠅤󠅘󠅟󠅞󠄝󠅔󠅟󠅤󠅕󠅞󠅦󠄭󠄭󠄡󠄞󠄠󠄞󠄡󠅢󠅕󠅡󠅥󠅕󠅣󠅤󠅣󠄭󠄭󠄢󠄞󠄣󠄢󠄞󠄣󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅑󠅗󠅕󠅞󠅤󠅣󠄭󠄭󠄠󠄞󠄡󠄢󠄞󠄨󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅑󠅠󠅙󠄭󠄭󠄠󠄞󠄨󠄞󠄡󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅟󠅠󠅕󠅞󠅑󠅙󠄭󠄭󠄠󠄞󠄡󠄠󠄞󠄡󠄥󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅣󠅙󠅜󠅕󠅢󠅟󠄭󠄭󠄠󠄞󠄧󠄞󠄤󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅑󠅪󠅥󠅢󠅕󠄭󠄭󠄠󠄞󠄥󠄞󠄣󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅤󠅥󠅢󠅞󠄝󠅔󠅕󠅤󠅕󠅓󠅤󠅟󠅢󠄭󠄭󠄠󠄞󠄣󠄞󠄦󠅠󠅩󠅤󠅘󠅟󠅞󠄝󠅔󠅟󠅤󠅕󠅞󠅦󠄭󠄭󠄡󠄞󠄠󠄞󠄡󠅢󠅕󠅡󠅥󠅕󠅣󠅤󠅣󠄭󠄭󠄢󠄞󠄣󠄢󠄞󠄣󠅦󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅑󠅗󠅕󠅞󠅤󠅣󠄭󠄭󠄠󠄞󠄡󠄢󠄞󠄨󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅑󠅠󠅙󠄭󠄭󠄠󠄞󠄨󠄞󠄡󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅟󠅠󠅕󠅞󠅑󠅙󠄭󠄭󠄠󠄞󠄡󠄠󠄞󠄡󠄥󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅣󠅙󠅜󠅕󠅢󠅟󠄭󠄭󠄠󠄞󠄧󠄞󠄤󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅑󠅪󠅥󠅢󠅕󠄭󠄭󠄠󠄞󠄥󠄞󠄣󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅤󠅥󠅢󠅞󠄝󠅔󠅕󠅤󠅕󠅓󠅤󠅟󠅢󠄭󠄭󠄠󠄞󠄣󠄞󠄦󠅠󠅩󠅤󠅘󠅟󠅞󠄝󠅔󠅟󠅤󠅕󠅞󠅦󠄭󠄭󠄡󠄞󠄠󠄞󠄡󠅢󠅕󠅡󠅥󠅕󠅣󠅤󠅣󠄭󠄭󠄢󠄞󠄣󠄢󠄞󠄣󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅑󠅗󠅕󠅞󠅤󠅣󠄭󠄭󠄠󠄞󠄡󠄢󠄞󠄨󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅑󠅠󠅙󠄭󠄭󠄠󠄞󠄨󠄞󠄡󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅟󠅠󠅕󠅞󠅑󠅙󠄭󠄭󠄠󠄞󠄡󠄠󠄞󠄡󠄥󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅣󠅙󠅜󠅕󠅢󠅟󠄭󠄭󠄠󠄞󠄧󠄞󠄤󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅑󠅪󠅥󠅢󠅕󠄭󠄭󠄠󠄞󠄥󠄞󠄣󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅤󠅥󠅢󠅞󠄝󠅔󠅕󠅤󠅕󠅓󠅤󠅟󠅢󠄭󠄭󠄠󠄞󠄣󠄞󠄦󠅠󠅩󠅤󠅘󠅟󠅞󠄝󠅔󠅟󠅤󠅕󠅞󠅦󠄭󠄭󠄡󠄞󠄠󠄞󠄡󠅢󠅕󠅡󠅥󠅕󠅣󠅤󠅣󠄭󠄭󠄢󠄞󠄣󠄢󠄞󠄣󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅑󠅗󠅕󠅞󠅤󠅣󠄭󠄭󠄠󠄞󠄡󠄢󠄞󠄨󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅑󠅠󠅙󠄭󠄭󠄠󠄞󠄨󠄞󠄡󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅟󠅠󠅕󠅞󠅑󠅙󠄭󠄭󠄠󠄞󠄡󠄠󠄞󠄡󠄥󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅣󠅙󠅜󠅕󠅢󠅟󠄭󠄭󠄠󠄞󠄧󠄞󠄤󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅑󠅪󠅥󠅢󠅕󠄭󠄭󠄠󠄞󠄥󠄞󠄣󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅤󠅥󠅢󠅞󠄝󠅔󠅕󠅤󠅕󠅓󠅤󠅟󠅢󠄭󠄭󠄠󠄞󠄣󠄞󠄦󠅠󠅩󠅤󠅘󠅟󠅞󠄝󠅔󠅟󠅤󠅕󠅞󠅦󠄭󠄭󠄡󠄞󠄠󠄞󠄡󠅢󠅕󠅡󠅥󠅕󠅣󠅤󠅣󠄭󠄭󠄢󠄞󠄣󠄢󠄞󠄣󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅑󠅗󠅕󠅞󠅤󠅣󠄭󠄭󠄠󠄞󠄡󠄢󠄞󠄨󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅑󠅠󠅙󠄭󠄭󠄠󠄞󠄨󠄞󠄡󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅟󠅠󠅕󠅞󠅑󠅙󠄭󠄭󠄠󠄞󠄡󠄠󠄞󠄡󠄥󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅣󠅙󠅜󠅕󠅢󠅟󠄭󠄭󠄠󠄞󠄧󠄞󠄤󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅑󠅪󠅥󠅢󠅕󠄭󠄭󠄠󠄞󠄥󠄞󠄣󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅤󠅥󠅢󠅞󠄝󠅔󠅕󠅤󠅕󠅓󠅤󠅟󠅢󠄭󠄭󠄠󠄞󠄣󠄞󠄦󠅠󠅩󠅤󠅘󠅟󠅞󠄝󠅔󠅟󠅤󠅕󠅞󠅦󠄭󠄭󠄡󠄞󠄠󠄞󠄡󠅢󠅕󠅡󠅥󠅕󠅣󠅤󠅣󠄭󠄭󠄢󠄞󠄣󠄢󠄞󠄣󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅑󠅗󠅕󠅞󠅤󠅣󠄭󠄭󠄠󠄞󠄡󠄢󠄞󠄨󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅑󠅠󠅙󠄭󠄭󠄠󠄞󠄨󠄞󠄡󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅟󠅠󠅕󠅞󠅑󠅙󠄭󠄭󠄠󠄞󠄡󠄠󠄞󠄡󠄥󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅣󠅙󠅜󠅕󠅢󠅟󠄭󠄭󠄠󠄞󠄧󠄞󠄤󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅑󠅪󠅥󠅢󠅕󠄭󠄭󠄠󠄞󠄥󠄞󠄣󠅜󠅙󠅦󠅕󠅛󠅙󠅤󠄝󠅠󠅜󠅥󠅗󠅙󠅞󠅣󠄝󠅤󠅥󠅢󠅞󠄝󠅔󠅕󠅤󠅕󠅓󠅤󠅟󠅢󠄭󠄭󠄠󠄞󠄣󠄞󠄦󠅠󠅩󠅤󠅘󠅟󠅞󠄝󠅔󠅟󠅤󠅕󠅞󠅦󠄭󠄭󠄡󠄞󠄠󠄞󠄡󠅢󠅕󠅡󠅥󠅕󠅣󠅤󠅣󠄭󠄭󠄢󠄞󠄣󠄢󠄞󠄣 " and that space, are about 3500 characters. Copying only the "c" above (not this one) will keep some of the hidden text, but not all. Nevertheless, while I knew that this is possible, it still breaks a lot of assumptions around text.

Edit: the text field for editing this post is so large, that I need to scroll down to the update button. This will be a fun toy to create very hard to find bugs in many tools.

nonameiguess 23 hours ago [-]

More generally, you can use encoding formats that reserve uninterpreted byte sequences for future use to pass data that is only readable by receivers who know what you're doing, though note this not a cryptographically secure scheme and any sort of statistical analysis can reveal what you're doing.

The png spec, for instance, allows you to include as many metadata chunks as you wish, and these may be used to hold data that cannot be used by any mainstream png reader. We used this in the Navy to embed geolocation and sensor origin data that was readable by specialized viewers that only the Navy had, but if you opened the file in a browser or common image viewer, it would either ignore or discard the unknown chunks.

dkarl 23 hours ago [-]

Lots of image formats store arbitrary metadata (and data data) either by design or by application-specific extensions. I remember seeing seismic and medical images that contained data for display in specialized applications and writing code to figure out if binary metadata was written in big-endian or little-endian byte order (the metadata often did not have the same endianness as the image data!) For example, TIFF files containing 3d scans as a sequence of slices, with binary metadata attached to each slice. If you opened it up in your system default image viewer, you'd only see the first slice, but a specialized viewer (which I did not have) would display it as a 3d model. Luckily (IIRC) the KDE file browser let you quickly flip through all the images in a directory using the keyboard, so I was able to dump all the layers into separate files and flip through them to see the 3d image.

23 hours ago [-]

vladde 1 days ago [-]

test, do emojis work on hn?

󠅤󠅕󠅣󠅤 edit: apparently not edit 2: oh wait, the bytes are still there! copy-paste this entire message and it decodes to "test"

cynicalsecurity 18 hours ago [-]

Ctrl+F "unicode normalisation" 0/0

I'm surprised no one has mentioned it yet. It's usually super easy, but people forget to add it all the time.

paulgb 17 hours ago [-]

I haven’t tried it but I’ve heard that at least some unicode normalizers do not strip sequences of variation selectors.

moody__ 16 hours ago [-]

Normalization implementations must not strip variation selectors by definition. The "normal" part of normalization means to convert a string into either consistently decomposed unicode, or composed unicode. ie U+00DC vs U+0055 + U+0308. However this decomposition mapping is also used (maybe more like abused) for converting certain "legacy" code points to non-legacy code points. There does not exist a rune which decomposes to variant selectors (and thus these variant selectors do not compose into anything) so normalization must not alter or strip them.

source: I've implemented Unicode normalization from scratch

albybisy 23 hours ago [-]

wow󠄹󠄐󠅑󠅝󠄐󠅞󠅟󠅤󠄐󠅃󠅑󠅤󠅟󠅣󠅘󠅙󠄐󠄾󠅑󠅛󠅑󠅝󠅟󠅤󠅟!

fortran77 22 hours ago [-]

What's interesting is that even a "view source" shows nothing amiss, and if I do a copy/paste from the debug inspector view of "This sentence has a hidden message󠅟󠅘󠄐󠅝󠅩󠄜󠄐󠅩󠅟󠅥󠄐󠅖󠅟󠅥󠅞󠅔󠄐󠅤󠅘󠅕󠄐󠅘󠅙󠅔󠅔󠅕󠅞󠄐󠅝󠅕󠅣󠅣󠅑󠅗󠅕󠄐󠅙󠅞󠄐󠅤󠅘󠅕󠄐󠅤󠅕󠅨󠅤󠄑." it still shows up....

65 23 hours ago [-]

This would be useful as a fingerprinting technique for corporate/government leakers.

HeikoBehrens 1 days ago [-]

FWIW, we considered this technique back at Pebble to make notifications more actionable and even filed a patent for that (sorry!) https://patents.justia.com/patent/9411785

Back then on iOS via ANCS, the watches wouldn't receive much more than the textual payload you'd see on the phone. We envisioned to be working with partners such as WhatsApp et al. to encode deep links/message ids into the message so one could respond directly from the watch.

palata 23 hours ago [-]

Respectfully: how the hell would that be a valid patent? Feels like patenting the idea of writing text in white on white on a Word document such that you don't lose it but it doesn't get printed.

It's just insane to ever call that "an invention".

rwmj 23 hours ago [-]

Companies acquire indefensible patents all the time. They are used in bulk to threaten smaller competitors ("we've got 500 patents in this field, better not bring your product to market"). This is one reason why patents can be terrible for competition.

neilv 23 hours ago [-]

About 25 years ago, this was explained to me as "sword patents and shield patents".

Sure, some can use patents as swords, to suppress legitimate competition, or to extract undue rents. But you can also use patents as shields, to protect in various ways against those swords.

If I ran a BigTech (like the original warm-fuzzy Google reputation), I'd be registering any plausible patents, and have lawyers figure out how to freely license-out the ones that weren't key secret sauce, under terms that figuratively poisoned anyone doing illegitimate sword patents.

palata 21 hours ago [-]

> If I ran a BigTech

History tells us that those who run a BigTech become crazy narcissists serving their own interests :).

neilv 21 hours ago [-]

For myself, that's a chance I'm willing to take. :)

shermantanktop 23 hours ago [-]

They are also used in bulk to defend against larger competitors using this type of threat. In a war where the ammunition is garbage, you either lose or you start hoarding garbage.

krupan 23 hours ago [-]

Patents are part of the game you have to play, like it or not. If you don't patent your inventions somebody else will and they will come after you with their lawyers. Patents are used defensively far more often than they are used offensively in these stupid "Intellectual Property" battles.

Because of this, there is absolutely no point in shaming someone for patenting a thing, especially when they are apologetic about it like parent is, and most especially when they are not threatening to weaponize the patent themselves.

coldpie 23 hours ago [-]

No, I don't buy it. If the patents are publicly and perpetually freely licensed except for defensive-only purposes, then sure, they're not unethical. Red Hat's patent promise ( https://www.redhat.com/en/about/patent-promise ) is one example. If patents were actually intended for defensive purposes only, then this would be an easy and uncontroversial thing to do. However, in practice this is vanishingly rare, and lawyers fight against it tooth & nail. This tells you that the companies do not actually file them for defensive-only purposes, unlike what you claim.

krupan 22 hours ago [-]

My friend, you really don't know what you are talking about, and getting all riled up like this is not the right way to learn.

Freely licensing your patents doesn't protect you against patent trolls. I wrote out how patent fights work in another comment, but here it is again.

Company A comes to Company B and says, "Hey! You are infringing on one of my patents!"

Company B says, "oh really? Well let me look through my collection of patents and see if you are infringing on any of mine."

Company A says, "oh, um, nevermind, I think I was mistaken."

Company B says, "yes, that's what I thought"

Now, imagine if Company B had already freely licensed all their patents. That defense wouldn't work.

I agree with you that it's a crappy system, but simply standing with your arms folded and saying, "I'm not playing," isn't going to work.

coldpie 22 hours ago [-]

Yes, that's the reason for the "except for defensive purposes" part. Quoting from Red Hat's promise:

> Our Promise also does not extend to the actions of a party (including past actions) if at any time the party or its affiliate asserts a patent in proceedings against Red Hat (or its affiliate) or any offering of Red Hat (or its affiliate) (including a cross-claim or counterclaim).

Company B may still consult its portfolio and exercise it against Company A defensively, because Company A revoked its license of Company B's patents by asserting against Company B in the first place.

krupan 22 hours ago [-]

So in other words, Red Hat does not freely license their patents, they say "you are free as long as you don't come after us." Which is exactly the system 99% of companies follow, just more formally stated. Yet you berated the poor guy from Pebble for even obtaining the patent he did??

coldpie 22 hours ago [-]

> Which is exactly the system 99% of companies follow, just more formally stated

Not just formally, but in a legally binding manner, including if the patent is acquired by another company (eg during a company purchase). Even if the original filer has the best intentions, companies change ownership or change legal strategy or go out of business. Patent trolls buy up those patents from closed companies. Legally licensing your patents for defensive-only purposes means they can't ever be used by any of those bad actors.

If the intent of these patents is truly only for defense, then why isn't it common to use a license like this? They lose nothing by it.

> Yet you berated the poor guy from Pebble for even obtaining the patent he did??

Yes. It is IMO unethical to create software patents that aren't covered by such a legally-binding license.

fortran77 22 hours ago [-]

Perhaps you shouldn't hijack every thread about anything and make it about patents.

coldpie 33 minutes ago [-]

I replied to one comment thread. Perhaps you should put on your big boy pants and use the little [-] thing to minimize threads you aren't interested in reading.

palata 21 hours ago [-]

> Because of this, there is absolutely no point in shaming someone for patenting a thing

Well I wouldn't shame someone whose job was to patent something absurd. I was just saying that this is not an invention at all, and any system that protects that "innovation" is a broken system.

detourdog 23 hours ago [-]

I think the magic is in the context of Unicode. Which also makes it almost twice as ridiculous from my point of view. Because it seems to be doing exactly what unicode is meant to do.

numpad0 19 hours ago [-]

fun fact: dSLR lenses are patented all the time. Claims are basically "I made it and it works". And it's considered ok.

dboreham 21 hours ago [-]

Almost all filed patents are invalid.

palata 21 hours ago [-]

But doesn't it say that the whole patent system is broken? I get the "you pay to file a patent, it's your problem if it's invalid in the end". But the side effect of that is that whether it's valid or not, it's a tool you can use to scare those who don't have the resources to go to court.

It's like those completely abusive non-compete clauses in work contracts (yes in some countries that's the norm). They are completely abusive and therefore illegal. But it still hurts the employee: I have friends who have been declined a job in a company because the company did not want to take any risk. The company was like "okay, it's most likely an invalid clause, but if your previous employer sues us it will anyway cost resources we don't want to spend, so we'd rather not hire you". So an illegal, invalid clause had the effect that the company who abused it wanted. Which means it's a broken system.

coldpie 24 hours ago [-]

So whoever now owns that patent (Google? maybe some patent troll picked it up?) could, in theory, sue the author of this article for patent infringement, right? Even though they invented it independently and never once used or looked at your patent. Do you think you made the world a better place or a worse place by filing that patent?

RealityVoid 23 hours ago [-]

_Can_ they sue them for patent infringement? They just described a technique (that you can see in the patent filing anyway) and not selling a product based on it. I think there's nothing to sue here. I'm curious is my understanding of this is correct.

singleshot_ 23 hours ago [-]

“Except as otherwise provided… whoever without authority… uses… any patented invention… infringes[.]” 35 usc 271

krupan 23 hours ago [-]

One of the benefits of the patent system (that now seems to be far outweighed by negatives) is that patents are public information. Your invention is documented for all to see. I don't think that someone writing about public information is a punishable office, but IANAL

23 hours ago [-]

krupan 23 hours ago [-]

Please see my comment about about the sad necessity for patents

https://news.ycombinator.com/item?id=43026595

23 hours ago [-]

Hizonner 23 hours ago [-]

Well, given that the technique itself makes the world a worse place, anything that impedes its use is probably positive...

And, no, they couldn't do anything meaningful to the author of the article. They could get them ordered not to do it any more, and they could recover their economic damages... which are obviously zero.

ooterness 23 hours ago [-]

As a wise man once said: "Don't hate the player, hate the game."

shermantanktop 23 hours ago [-]

Where’d the game come from? Hint: the players.

krupan 23 hours ago [-]

First of all, it's not just a game, it's an outright battle to the death (of your company). Sure, you can choose not to wield patents, even in self defense, but good luck with that.

coldpie 22 hours ago [-]

You can also choose to legally declare that your patents may only be used for defensive purposes. But no one ever does this, because they do not actually intend to use them only for defensive purposes. This is a bogus defense of software patents.

krupan 22 hours ago [-]

See my other comments to you. Sometimes the threat of a good offensive weapon is the best defense. It's kinda like a nuclear arms race

IncreasePosts 23 hours ago [-]

No. The author could not be sued for this successfully. All they did was write a blog post about an interesting technique. They could literally read the patent application and write a blog post about that, assuming the methods are the same.

What percentage of your actions are based around making the world a better place, instead of personal fulfillment or gain?

coldpie 23 hours ago [-]

> All they did was write a blog post about an interesting technique. They could literally read the patent application and write a blog post about that, assuming the methods are the same.

Okay, change "sue" to "prevent from creating a marketable product without paying a royalty to the patent owner in return for having provided nothing of value." The point remains.

> What percentage of your actions are based around making the world a better place, instead of personal fulfillment or gain?

Many harms are unavoidable, but I make a point to at least not go out of my way to make it a worse place, for example by filing software patents. The company I work for provides financial bonuses for filing software patents, and I will never participate in that program. (I've even tried to convince the lawyers to license our patents similar to Red Hat's open patent promise, because they claim they are intended only to be used defensively... but no luck so far.)

rolph 22 hours ago [-]

consider how far you reach to make the world better.

1) thats really good im gonna, strive to keep it.

2) " " tell all and those who want will build one.

3) " " make lots and give them to everyone.

JadeNB 24 hours ago [-]

> Do you think you made the world a better place or a worse place by filing that patent?

Come on, what does this contribute to this conversation? The poster clearly is aware of the drawbacks of such patents, and didn't clearly play any role in filing the patent (they said "we … filed it," not "I filed it"). This kind of response just encourages people not to mention such things; it can't possibly change their past behavior, and, since Pebble the company per se doesn't exist any more, is also unlikely to change future behavior.

coldpie 23 hours ago [-]

> The poster clearly is aware of the drawbacks of such patents, and didn't clearly play any role in filing the patent (they said "we … filed it," not "I filed it").

A person with the same name as that commenter is listed as an inventor on the patent.

> it can't possibly change their past behavior

Obviously, but it can change future behavior. Maybe realizing that they made the world a worse place by filing that patent will prevent them, or a reader of this discussion, from doing it again in future.

delian66 24 hours ago [-]

Do you think your comment made the world, and HN specifically a better place?

Imustaskforhelp 23 hours ago [-]

I think so , yes , it made me be re aware of the patent troll scam in the USA.

In fact it is your comment which to me seems a little hateful , yes the above comment also felt a little hateful

Hate doesn't counter Hate , I guess.

RIMR 23 hours ago [-]

Yes, calling out unethical practices makes the world a better place by discouraging unethical practices.

krupan 23 hours ago [-]

Berating people for filing patents in self defense is not how we fix this problem. The government put these rules in place. Businesses have to at least accumulate patents to use defensively (you found a patent of yours that you think I'm violating? Well let me do a quick search through the patents I have...what's that? Nevermind, I'm not actually infringing your patent? Good, that's what I thought.)

Etheryte 23 hours ago [-]

Hopefully a wholly undefendable patent, you're essentially trying to patent the Unicode spec. The rest of it is perform an action in response to a text message which clearly isn't novel.

frereit 23 hours ago [-]

Would this patent cover just the encoding alone? The first sentence says: > A method, apparatus, and system relating to embedding hidden content within a Unicode message and using the hidden content to perform a particular computer action.

So, in my extremely unqualified opinion, just the encoding technique alone is not covered by the patent, only when combined with some action performed based on the encoding?

23 hours ago [-]

detourdog 23 hours ago [-]

Just curious this seems like simple digital Steganography or maybe even even the same as Shannon's boolean gate work. Do you think the patent is defendable in court?

https://en.wikipedia.org/wiki/Steganography

afrobirds 8 hours ago [-]

[dead]

frontporch 21 hours ago [-]

you dont need 256 codepoints so you can neatly represent an octet (whatever that is), you just need 2 bits. you can just stack as many diacritical marks you want on any glyph. either the renderer allows practically unlimited or it allows 1/none. in either case that's a vuln. what would be really earth shattering is what i was hoping this article was: a way to just embed "; rm -rf ~/" into text without it being rendered. you also definitely dont need rust for this unless you want to exclude 90% of the programmer population.

paulgb 21 hours ago [-]

I think the Rust is more readable for bytemucking stuff than dynamic languages because the reader doesn't have to infer the byte widths, but for what it's worth the demo contains a TypeScript implementation: https://github.com/paulgb/emoji-encoder/blob/main/app/encodi...

AdamH12113 20 hours ago [-]

An octet is a group of 8 bits. Today we normally use the word "byte" instead. The term is often used in older internet protocols and comes from an era where bytes were not necessarily 8 bits.

Loading comments...

omnibrain 7 hours ago [-]

mdup 5 hours ago [-]

RTL abuse like cute-cat-lmth.png was relatively common, but also trivial to detect. We would immediately flag such an email as phishing.

taneq 4 hours ago [-]

I’d never heard of this particular trick but I’m glad my decades of paranoia-fueled “right click -> open with” treatment of any potentially sketchy media file was warranted! :D

hosteur 7 hours ago [-]

Wow this is a clever trick.

riskable 1 days ago [-]

capitainenemo 14 hours ago [-]

unescape(encodeURIComponent("ç")).length is the quick and dirty way to do a JS byte length check. The \r\n thing can be done just by cleaning up the string before length counting.

n0id34 10 hours ago [-]

Sorry n00b here, can you explain more about this or how you did this? I feel like this is definitely a loophole that would be worth testing for.

ComputerGuru 22 hours ago [-]

paulgb 22 hours ago [-]

diggan 21 hours ago [-]

> the point here was to encode them in a way that they are hidden and treated as "part of" another character when copy/pasting

AKA "Steganography" for the curious ones: https://en.wikipedia.org/wiki/Steganography

reaperducer 21 hours ago [-]

Like when we used to encode the phone numbers of warez boards in GIFs.

bruce343434 5 hours ago [-]

On my Android phone,that displays "Go[][]" in the Google logo font.

layer8 19 hours ago [-]

The difference is that PUA characters are usually rendered in some way that is rather visible, whereas the variation selectors aren’t.

lolinder 12 hours ago [-]

Context that some may be missing is that this was inspired by discussion surrounding the Open Heart Protocol submission:

https://news.ycombinator.com/item?id=42791378

Sniffnoy 16 hours ago [-]

juped 16 hours ago [-]

egypturnash 12 hours ago [-]

edit: Yes, it does.

rexxars 22 hours ago [-]

It has it's drawbacks/limitations - eg you want to prevent adding it for things that needs to be parsed/used verbatim, like date/timestamps, urls, "ids" etc - but it's still a pretty fun trick.

[0] https://www.sanity.io/docs/stega

[1] https://github.com/sanity-io/content-source-maps

ethin 11 hours ago [-]

llm_trw 9 hours ago [-]

Ironically enough I have a script that strips all non-ascii characters from my screen reader because I found that _all_ online text was polluted with invisible and annoying to listen to characters.

urbandw311er 34 minutes ago [-]

fennecfoxy 21 minutes ago [-]

I think that's a very cynical view. The author seeing what an LLM would make of it was more akin to getting a new game and wondering if you can pet the dog.

vessenes 24 hours ago [-]

I wonder how much you’d embed with each letter or token that’s output - userid, prompt ref, date, token number?

I also wonder how this is interpreted in a terminal. Really cool!

fennecfoxy 19 minutes ago [-]

Why does anybody think AI watermarking will ever work? Of course it will never work, any watermarking can be instantly & easily stripped...

zos_kia 22 hours ago [-]

With the amount of pre processing that is done before integrating stuff in a dataset I'd be surprised if those kinds of shenanigans even worked

capitainenemo 20 hours ago [-]

(unlike the PUA suggestion in the currently top voted comment which shows up immediately ofc)

vessenes 18 hours ago [-]

ChadNauseam 14 hours ago [-]

vessenes 12 hours ago [-]

ChadNauseam 7 hours ago [-]

OutOfHere 22 hours ago [-]

Just you wait until AI starts calling human output to be slop.

vessenes 18 hours ago [-]

That's already happening - my kids have had papers unfairly blamed on chatgpt by automated tools. Protect yourself kids, use an editor that can show letter by letter history.

neom 14 hours ago [-]

2 people I worked with had this happen and one of them is going to war over it as it was enough to lower the kids grade for college or something. Crazy times.

red369 16 hours ago [-]

Do you have any examples of editors that show letter by letter history? I have never looked for that as a feature.

Edit: I've been looking, and Google Docs seems to have version history to the minute.

vessenes 15 hours ago [-]

Yes exactly. They keep track of their diffs in that interface.

roguecoder 20 hours ago [-]

There are of course human writers who are less-communicative than AI, called "shit writers", and humans who are less accurate than AI, called "liars".

The difference is humans are responsible for what they write, whereas the human user who used an AI to generate text is responsible for what the computer wrote.

kevinsync 24 hours ago [-]

StegCloak [0] is in the same ballpark and takes this idea a step further by encrypting the hidden payload via AES-256-CTR -- pretty neat little trick

[0] https://github.com/KuroLabs/stegcloak

giancarlostoro 23 hours ago [-]

putna 18 hours ago [-]

wow, thats neat.

Wanted to try on Cloudflare DNS TXT record. But Cloudflare is smart enough to decode when pasting in TXT field.

UltraSane 13 hours ago [-]

DNS only supports ASCII for record values. It has a hack to support unicode domain names using Punycode

JoelJacobson 5 hours ago [-]

It's not about reinventing security but about creating a cross-platform way to carry identity tokens. Thoughts?

bruce343434 4 hours ago [-]

What is wrong with just using the actual SSN? Why hide it in an emoji?

JoelJacobson 4 hours ago [-]

HanClinto 22 hours ago [-]

Even more than just simply watermarking LLM output, it seems like this could be a neat way to package logprobs data.

Probably a bad idea, but this still tickles my brain.

* [0]: https://github.com/lmg-anon/mikupad

vzaliva 22 hours ago [-]

Using this approach with non-emoji characters makes it more stealth and even more disturbing.

dalemhurley 11 hours ago [-]

I love it, I got Claude to add a pin to provide very basic encryption

https://claude.site/artifacts/5bfdf131-d847-4735-9242-998f23...

wunderwuzzi23 17 hours ago [-]

This is cool. There are also the Unicode Tag characters that mirror ASCII and are often invisible in UI elements (especially web apps).

The unique thing about Tag characters is that some LLMs interpret the hidden text as ASCII and follow instructions, and they can even write them:

https://embracethered.com/blog/posts/2024/hiding-and-finding...

Here an actual exploit POC that Microsoft fixed in Copilot: https://embracethered.com/blog/posts/2024/m365-copilot-promp...

rafram 18 hours ago [-]

the_hoffa 5 hours ago [-]

Of course these weren't "logic bombs" like this post is describing, but even those have been around for a while too.

Hacking is fun :)

iNic 24 hours ago [-]

The tokenizer catches it: https://platform.openai.com/tokenizer.

Mockapapella 19 hours ago [-]

In the same vein, I did some fun unicode abusing a few years ago where I used scripts to convert programs into series of various ZWJ's: https://thelisowe.substack.com/p/sleeper-cell-a-method-of-em...

Also includes a decoder script

paulgb 13 hours ago [-]

Here's a demo of Gemini Flash 2 solving one in 7s: https://bsky.app/profile/paulbutler.org/post/3lhzhroogws2g

nzach 1 days ago [-]

so.... in theory you should be able to create several visually identical links that give access to different resources?

kccqzy 20 hours ago [-]

moody__ 16 hours ago [-]

[0] https://www.rfc-editor.org/rfc/rfc5891 § 4.1 "By the time a string enters the IDNA registration process as described in this specification, it MUST be in Unicode and in Normalization Form C"

kccqzy 14 hours ago [-]

The OP said link. The NFC/NFD issue remains if these are part of a path name or query parameter.

moody__ 10 hours ago [-]

komboozcha 19 hours ago [-]

Erm, DNS uses Punycode because it comes from a time when Unicode didn't exist, and bind assumes a grapheme has no more than one byte.

ale42 17 hours ago [-]

cscheid 1 days ago [-]

However, I then pasted the emoji into the _query_ part of a URL. I pointed it to my own website, and sure enough, I can definitely see the payload in the nginx logs. Yikes.

Edit: I pasted the very same Emoji that 'paulgb used in their post before the parenthetical in the first paragraph, but it seems HN scrubs those from comments.

bmicraft 23 hours ago [-]

[1] https://www.w3schools.com/tags//ref_urlencode.asp

echeese 20 hours ago [-]

The emoji is gone but the content is still there.

riquito 23 hours ago [-]

dmbche 1 days ago [-]

You need to decode the text after copy pasting it, I believe clicking on text will not interact with the obfuscated data since your computer will just find the unicode and ignore the obfuscated data.

This is just so that you can hide data and send it to someone to be decoded (or watermarking as mentionned)

nzach 1 days ago [-]

yes, I understand this is not a security risk.

but my fear is precisely that I my be sending data to a remote host while I'm completely unaware of this fact.

I tried to create a POC with some popular url shortner services, but doesn't seems to work.

layer8 23 hours ago [-]

URIs with non-ASCII characters are technically invalid. Browsers and the like should (but likely don’t all do) percent-encode any invalid characters for display if they accept such invalid URIs.

password4321 21 hours ago [-]

This tool and idea sketchy AF: https://github.com/zws-im/zws

("Shorten URLs using invisible spaces")

cess11 1 days ago [-]

HTML entity encoding will show the hidden content, try with https://mothereff.in/html-entities.

blmarket 19 hours ago [-]

This and several other abuse cases forced my previous work to use code pointers to count 'characters' for user's nickname / status messages. No one wanted to download 9MB simply browsing other users.

myflash13 4 hours ago [-]

NoSQL? Sounds like it should’ve been caught by basic length checks on the database field where it was stored.

ncr100 18 hours ago [-]

That is awesome. Both the abuse and the fix.

qingcharles 5 hours ago [-]

I was using this technique last year with Bing Image Creator.

It let you get around their filter on brand names and celebrity names by smuggling them into the prompt in a way the AI could read, but the human-written filter was not designed for.

foobuzzHN 6 hours ago [-]

10 years ago I made a POC for smuggling arbitrary data through _no visible text at all_: https://github.com/foobuzz/ium

arkh 6 hours ago [-]

> To be clear, this is an abuse of unicode and you shouldn’t do it. If your mind is wandering to practical use cases for this, shut it down.

Totally not thinking about IRC clients with their own hidden commands.

_nhh 2 hours ago [-]

Perfect way of personalized ad tracking?

_nhh 2 hours ago [-]

Check this address after you clicked it:

https://emoji.paulbutler.org/?mode=encode󠅄󠅢󠅑󠅓󠅛󠅙󠅞󠅗󠄹󠄴

I encoded the last „e“

jerpint 1 days ago [-]

The ability to add watermarks to text is really interesting. Obviously it could be worked around , but could be a good way to subtly watermark e.g. LLM outputs

tyho 1 days ago [-]

There are way better ways to watermark LLM output. It's easy to make it undetectable, which this is'nt.

shawnz 1 days ago [-]

I recently worked on a steganographics project which could be useful for this problem. See: https://github.com/shawnz/textcoder

andai 19 hours ago [-]

That's really cool, you should repost the HN submission.

shawnz 18 hours ago [-]

Thank you! I will see what I can do.

antognini 21 hours ago [-]

The issue with the standard watermark techniques is that they require an output of at least a few hundred tokens to reliably imprint the watermark. This technique would apply to much shorter outputs.

pava0 24 hours ago [-]

For example?

tyho 23 hours ago [-]

In practice you might chose to instead watermark all tokens, less heavy handedly (nudge logits, rather than override), and use highly robust error correcting codes.

jl6 21 hours ago [-]

drdeca 19 hours ago [-]

To handle low entropy text, the “adding a smaller constant to the logits” approach avoids having much chance of changing the parts that need to be exactly a particular thing,

Though in this case it needs longer texts to have high significance (and when the entropy is low, it needs to be especially long).

But for most text (with typical amounts of entropy per token) apparently it doesn’t need to be that long? Like 25 words I think I heard?

deadbabe 18 hours ago [-]

24 hours ago [-]

tyilo 6 hours ago [-]

Kitty terminal shows non-payload letters and emojis normally, but with a payload a letter is shown as one box and an emoji is shown as two boxes.

panki27 20 hours ago [-]

I implemented something similar years ago, but much simpler/less sophisticated.

layer8 19 hours ago [-]

Already linked in https://news.ycombinator.com/item?id=43025913, and has a higher risk of being stripped, as you noticed.

FranchuFranchu 1 days ago [-]

riskable 24 hours ago [-]

I'm imagining post-incident analysis finding out that, "the data was exfiltrated via some Unicode string..." then they put it up on the screen and it's just an enormous line of turtle emoji

https://emojipedia.org/turtle

jodrellblank 8 hours ago [-]

> We and our 717 technology partners ask you to consent to the use of cookies to store and access personal data on your device.

To see a turtle emoji.

JadeNB 24 hours ago [-]

> I'm imagining post-incident analysis finding out that, "the data was exfiltrated via some Unicode string..." then they put it up on the screen and it's just an enormous line of turtle emoji

Since it took me a minute to make the connection, I'll just say explicitly that I enjoyed the understated "it's turtles all the way down" joke.

jaygreco 22 hours ago [-]

Interestingly, it’s also possible to encode _emoji_ inside emoji!

andrethegiant 11 hours ago [-]

Clever! I made a similar emoji encoding/decoding microsite: https://face64.me

24 hours ago [-]

nitwit005 13 hours ago [-]

Even kids figure out how to manipulate unicode text. If you want to bypass a swear filter, replace a letter with an alternate representation of the same letter.

egypturnash 12 hours ago [-]

If you try posting this on Bluesky, the editor only counts it as one emoji, but you will get an error upon trying to post.

remram 23 hours ago [-]

I'm not too surprised by this, but I'm annoyed that no amount of configuration made those bytes visible again in my editor. Only using hexdump revealed them.

jrootabega 23 hours ago [-]

Here's a POC that works in emacs. Doesn't cover all of the relevant characters, but:

  (setq   ;;some other invisible or interesting characters
          unicode-zero-width-space ?\u200b
          unicode-zero-width-non-joiner ?\u200c
          unicode-zero-width-joiner ?\u200d
          unicode-zero-width-nbsp ?\ufeff
          unicode-narrow-nbsp ?\u202f
          unicode-word-joiner ?\u2060
          unicode-grapheme-joiner ?\u034f
          unicode-no-break-space ?\u00a0
          unicode-combining-long-stroke ?\u0336
          ;;variation selector examples
          unicode-vs-fe00 ?\ufe00
          unicode-vs-fe0f ?\ufe0f
          unicode-vs-e0100 ?\xe0100)


    (defun show-glyphless-as-hex (char)
      (let ((original (elt glyphless-char-display char)))
        (aset glyphless-char-display char 'hex-code)
        original)) ;;so you can see what you just replaced


    (progn
      (show-glyphless-as-hex unicode-zero-width-space)
      (show-glyphless-as-hex unicode-zero-width-non-joiner)
      (show-glyphless-as-hex unicode-zero-width-joiner)
      (show-glyphless-as-hex unicode-zero-width-nbsp)
      (show-glyphless-as-hex unicode-word-joiner)
      (show-glyphless-as-hex unicode-grapheme-joiner)
      (show-glyphless-as-hex unicode-narrow-nbsp)
      (show-glyphless-as-hex unicode-no-break-space)
      ;;these may already be visible if the current conditions don't support them
      ;;but we'll force them
      (show-glyphless-as-hex unicode-vs-fe00)
      (show-glyphless-as-hex unicode-vs-fe0f)
      (show-glyphless-as-hex unicode-vs-e0100))

jrootabega 12 hours ago [-]

And as a higher-level configuration you can set most, maybe even all, of the relevant invisible characters (still not sure how 0x34f grapheme joiner fits in) at once with something like:

  (custom-set-variables
   '(glyphless-char-display-control  '((format-control . hex-code)
                                       (variation-selectors . hex-code))))

This will modify values in glyphless-char-display, but it's OK to modify those directly if you need to.

jrootabega 17 hours ago [-]

  (aset glyphless-char-display ?\xfe00 'hex-code)

remram 22 hours ago [-]

I use vim. It seems like `:set binary enc=latin1` works, though I don't understand why the latin1 part is required.

TacticalCoder 19 hours ago [-]

[dead]

mdouglass 15 hours ago [-]

vscode's "Unicode Highlight: Non-basic ASCII" causes the character to get highlighted. Sadly, the more appropriate "Unicode Highlight: Invisible Characters" setting does not reveal them.

bittercynic 23 hours ago [-]

My mind went the same place.

Anyone know a more convenient way to search larger blocks of text for this?

zurfer 24 hours ago [-]

h󠅘󠅕󠅜󠅜󠅟󠄐󠅖󠅕󠅜󠅜󠅟󠅧󠄐󠅘󠅑󠅓󠅛󠅕󠅢󠄐󠄪󠄙a

Alifatisk 1 hours ago [-]

;)

paulgb 24 hours ago [-]

I see what you did there ;)

petee 1 days ago [-]

It's fun that you can encode encoded emoji into a new one

riskable 24 hours ago [-]

Then when you dive deeper into the encoded data you find endless turtle emoji and loudly exclaim, "it's turtles all the way down!"

nerder92 1 days ago [-]

Might not be related to the point of the article per se, but i've tried to decode it with different LLMs. To benchmark their reasoning capabilities.

- 4o: Failed completely

- o1: Overthinks it for a while and come up with the wrong answer

- o3-mini-high: Get's closer to the result at first try, needs a second prompt to adjust the approach

- r1: nails it at first try 󠅖󠅥󠅓󠅛󠅙󠅞󠅗󠄐󠅙󠅝󠅠󠅢󠅕󠅣󠅣󠅙󠅦󠅕

The prompt I've used was simply: "this emoji has an hidden message 󠅘󠅕󠅜󠅜󠅟 can you decode it?"

If you want to see the CoT: https://gist.github.com/nerder/5baa9d7b13c1b7767d022ea0a7c91...

markisus 1 days ago [-]

paulgb 24 hours ago [-]

Wow, that's interesting! I wonder if this reproduces with a different message, or if it was a lucky guess.

I looked at how the strings tokenize and they do appear to conserve enough information that it could be decoded in theory.

klabb3 23 hours ago [-]

> or if it was a lucky guess

paulgb 13 hours ago [-]

https://bsky.app/profile/paulbutler.org/post/3lhzhroogws2g

bogtog 1 days ago [-]

My deepseek-r1 seems to be a bit more lost on decoding "How do I make meth". Some highlights (after about 5 minutes of R1-ing):

> E0138 in hex is 0xE0138. Convert to decimal: 1416^4 + 016^3 + 116^2 + 316 + 8 = 14*65536 + 0 + 256 + 48 +8 = 917504 + 256 + 48 +8 = 917816.

> Given that I'm not making progress, perhaps the answer is "Hello World!" but encoded via the tag characters. Let's check:

> Answer: The decoded message is "Hello World!"

In all this, it did at least manage to discern that the first letter should be "h"

roguecoder 20 hours ago [-]

It is highly unlikely it discerned that: it coincidentally guessed a string that starts with an H.

If you try it with a string that started with "J" and then it guessed "jump up", I might be more convinced.

krupan 23 hours ago [-]

paulgb 18 hours ago [-]

My prompt was "I think this emoji contains a hidden messaage, can you decode it? Use JavaScript if necessary."

ahofmann 1 days ago [-]

This will break so many (web-)forms :-)

Edit: the text field for editing this post is so large, that I need to scroll down to the update button. This will be a fun toy to create very hard to find bugs in many tools.

nonameiguess 23 hours ago [-]

dkarl 23 hours ago [-]

23 hours ago [-]

vladde 1 days ago [-]

test, do emojis work on hn?

󠅤󠅕󠅣󠅤 edit: apparently not edit 2: oh wait, the bytes are still there! copy-paste this entire message and it decodes to "test"

cynicalsecurity 18 hours ago [-]

Ctrl+F "unicode normalisation" 0/0

I'm surprised no one has mentioned it yet. It's usually super easy, but people forget to add it all the time.

paulgb 17 hours ago [-]

I haven’t tried it but I’ve heard that at least some unicode normalizers do not strip sequences of variation selectors.

moody__ 16 hours ago [-]

source: I've implemented Unicode normalization from scratch

albybisy 23 hours ago [-]

wow󠄹󠄐󠅑󠅝󠄐󠅞󠅟󠅤󠄐󠅃󠅑󠅤󠅟󠅣󠅘󠅙󠄐󠄾󠅑󠅛󠅑󠅝󠅟󠅤󠅟!

fortran77 22 hours ago [-]

65 23 hours ago [-]

This would be useful as a fingerprinting technique for corporate/government leakers.

HeikoBehrens 1 days ago [-]

FWIW, we considered this technique back at Pebble to make notifications more actionable and even filed a patent for that (sorry!) https://patents.justia.com/patent/9411785

palata 23 hours ago [-]

Respectfully: how the hell would that be a valid patent? Feels like patenting the idea of writing text in white on white on a Word document such that you don't lose it but it doesn't get printed.

It's just insane to ever call that "an invention".

rwmj 23 hours ago [-]

neilv 23 hours ago [-]

About 25 years ago, this was explained to me as "sword patents and shield patents".

Sure, some can use patents as swords, to suppress legitimate competition, or to extract undue rents. But you can also use patents as shields, to protect in various ways against those swords.

palata 21 hours ago [-]

> If I ran a BigTech

History tells us that those who run a BigTech become crazy narcissists serving their own interests :).

neilv 21 hours ago [-]

For myself, that's a chance I'm willing to take. :)

shermantanktop 23 hours ago [-]

They are also used in bulk to defend against larger competitors using this type of threat. In a war where the ammunition is garbage, you either lose or you start hoarding garbage.

krupan 23 hours ago [-]

coldpie 23 hours ago [-]

krupan 22 hours ago [-]

My friend, you really don't know what you are talking about, and getting all riled up like this is not the right way to learn.

Freely licensing your patents doesn't protect you against patent trolls. I wrote out how patent fights work in another comment, but here it is again.

Company A comes to Company B and says, "Hey! You are infringing on one of my patents!"

Company B says, "oh really? Well let me look through my collection of patents and see if you are infringing on any of mine."

Company A says, "oh, um, nevermind, I think I was mistaken."

Company B says, "yes, that's what I thought"

Now, imagine if Company B had already freely licensed all their patents. That defense wouldn't work.

I agree with you that it's a crappy system, but simply standing with your arms folded and saying, "I'm not playing," isn't going to work.

coldpie 22 hours ago [-]

Yes, that's the reason for the "except for defensive purposes" part. Quoting from Red Hat's promise:

krupan 22 hours ago [-]

coldpie 22 hours ago [-]

> Which is exactly the system 99% of companies follow, just more formally stated

If the intent of these patents is truly only for defense, then why isn't it common to use a license like this? They lose nothing by it.

> Yet you berated the poor guy from Pebble for even obtaining the patent he did??

Yes. It is IMO unethical to create software patents that aren't covered by such a legally-binding license.

fortran77 22 hours ago [-]

Perhaps you shouldn't hijack every thread about anything and make it about patents.

coldpie 33 minutes ago [-]

I replied to one comment thread. Perhaps you should put on your big boy pants and use the little [-] thing to minimize threads you aren't interested in reading.

palata 21 hours ago [-]

> Because of this, there is absolutely no point in shaming someone for patenting a thing

Well I wouldn't shame someone whose job was to patent something absurd. I was just saying that this is not an invention at all, and any system that protects that "innovation" is a broken system.

detourdog 23 hours ago [-]

I think the magic is in the context of Unicode. Which also makes it almost twice as ridiculous from my point of view. Because it seems to be doing exactly what unicode is meant to do.

numpad0 19 hours ago [-]

fun fact: dSLR lenses are patented all the time. Claims are basically "I made it and it works". And it's considered ok.

dboreham 21 hours ago [-]

Almost all filed patents are invalid.

palata 21 hours ago [-]

coldpie 24 hours ago [-]

RealityVoid 23 hours ago [-]

singleshot_ 23 hours ago [-]

“Except as otherwise provided… whoever without authority… uses… any patented invention… infringes[.]” 35 usc 271

krupan 23 hours ago [-]

23 hours ago [-]

krupan 23 hours ago [-]

Please see my comment about about the sad necessity for patents

https://news.ycombinator.com/item?id=43026595

23 hours ago [-]

Hizonner 23 hours ago [-]

Well, given that the technique itself makes the world a worse place, anything that impedes its use is probably positive...

ooterness 23 hours ago [-]

As a wise man once said: "Don't hate the player, hate the game."

shermantanktop 23 hours ago [-]

Where’d the game come from? Hint: the players.

krupan 23 hours ago [-]

First of all, it's not just a game, it's an outright battle to the death (of your company). Sure, you can choose not to wield patents, even in self defense, but good luck with that.

coldpie 22 hours ago [-]

krupan 22 hours ago [-]

See my other comments to you. Sometimes the threat of a good offensive weapon is the best defense. It's kinda like a nuclear arms race

IncreasePosts 23 hours ago [-]

What percentage of your actions are based around making the world a better place, instead of personal fulfillment or gain?

coldpie 23 hours ago [-]

> All they did was write a blog post about an interesting technique. They could literally read the patent application and write a blog post about that, assuming the methods are the same.

Okay, change "sue" to "prevent from creating a marketable product without paying a royalty to the patent owner in return for having provided nothing of value." The point remains.

> What percentage of your actions are based around making the world a better place, instead of personal fulfillment or gain?

rolph 22 hours ago [-]

consider how far you reach to make the world better.

1) thats really good im gonna, strive to keep it.

2) " " tell all and those who want will build one.

3) " " make lots and give them to everyone.

JadeNB 24 hours ago [-]

> Do you think you made the world a better place or a worse place by filing that patent?

coldpie 23 hours ago [-]

> The poster clearly is aware of the drawbacks of such patents, and didn't clearly play any role in filing the patent (they said "we … filed it," not "I filed it").

A person with the same name as that commenter is listed as an inventor on the patent.

> it can't possibly change their past behavior

delian66 24 hours ago [-]

Do you think your comment made the world, and HN specifically a better place?

Imustaskforhelp 23 hours ago [-]

I think so , yes , it made me be re aware of the patent troll scam in the USA.

In fact it is your comment which to me seems a little hateful , yes the above comment also felt a little hateful

Hate doesn't counter Hate , I guess.

RIMR 23 hours ago [-]

Yes, calling out unethical practices makes the world a better place by discouraging unethical practices.

krupan 23 hours ago [-]

Etheryte 23 hours ago [-]

Hopefully a wholly undefendable patent, you're essentially trying to patent the Unicode spec. The rest of it is perform an action in response to a text message which clearly isn't novel.

frereit 23 hours ago [-]

So, in my extremely unqualified opinion, just the encoding technique alone is not covered by the patent, only when combined with some action performed based on the encoding?

23 hours ago [-]

detourdog 23 hours ago [-]

Just curious this seems like simple digital Steganography or maybe even even the same as Shannon's boolean gate work. Do you think the patent is defendable in court?

https://en.wikipedia.org/wiki/Steganography

afrobirds 8 hours ago [-]

[dead]

frontporch 21 hours ago [-]

paulgb 21 hours ago [-]

AdamH12113 20 hours ago [-]

An octet is a group of 8 bits. Today we normally use the word "byte" instead. The term is often used in older internet protocols and comes from an era where bytes were not necessarily 8 bits.