
TIL most CJK codepoints take 3 bytes in UTF-8, 50% more than the 2 bytes they take in UTF-16: everyday kana and kanji sit in the U+0800–U+FFFF range, which UTF-8 encodes as 3 bytes but UTF-16 fits in a single 16-bit code unit. No wonder UTF-8 has poor Japanese adoption.
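A minimal sketch of where the 3-vs-2 comes from, written in Python purely for illustration. The helpers `utf8_len` and `utf16_len` are hypothetical names, and the sample characters are arbitrary. Each helper applies the standard byte-length rules by codepoint range and cross-checks itself against Python's built-in codecs (`utf-16-le` is used so no BOM inflates the count):

```python
def utf8_len(cp: int) -> int:
    """Bytes UTF-8 needs for one codepoint, by range."""
    if cp < 0x80:
        return 1  # 0xxxxxxx (ASCII)
    if cp < 0x800:
        return 2  # 110xxxxx 10xxxxxx
    if cp < 0x10000:
        return 3  # 1110xxxx 10xxxxxx 10xxxxxx (most kana/kanji land here)
    return 4      # supplementary planes, e.g. CJK Extension B

def utf16_len(cp: int) -> int:
    """Bytes UTF-16 needs: one 16-bit unit in the BMP, a surrogate pair above it."""
    return 2 if cp < 0x10000 else 4

for ch in "aéあ漢𠀋":
    cp = ord(ch)
    # Sanity-check the hand-written rules against Python's codecs
    assert utf8_len(cp) == len(ch.encode("utf-8"))
    assert utf16_len(cp) == len(ch.encode("utf-16-le"))
    print(f"U+{cp:04X} {ch}: UTF-8={utf8_len(cp)}, UTF-16={utf16_len(cp)} bytes")
```

Note that rarer characters outside the BMP (like 𠀋 above) cost 4 bytes in both encodings, so the gap only applies to the common range.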

(To be clear, Shift-JIS is still the winner among encodings for Japanese text; I'm only comparing against UTF-16 because I'm working in a context that uses wide chars.)
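A quick, non-authoritative way to compare all three encodings on real text, using only Python's built-in codecs. The sample string and the `encoded_sizes` helper are hypothetical, chosen only for illustration. For pure kana/kanji, Shift-JIS and UTF-16 tie at 2 bytes per character. Shift-JIS pulls ahead once ASCII is mixed in, because it stores ASCII in 1 byte where UTF-16 still spends 2:

```python
# Hypothetical sample: kanji/kana mixed with ASCII digits and a Latin letter
SAMPLE = "東京タワーは333m"

def encoded_sizes(text: str) -> dict[str, int]:
    """Return the byte length of `text` in each encoding compared in the post."""
    return {
        "utf-8": len(text.encode("utf-8")),
        # utf-16-le: fixed byte order, no BOM, so only code units are counted
        "utf-16": len(text.encode("utf-16-le")),
        "shift_jis": len(text.encode("shift_jis")),
    }

for name, size in encoded_sizes(SAMPLE).items():
    print(f"{name:>9}: {size} bytes")
```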

Awoo Space
