@Azure isn't 1,114,112 (1.1 million) characters enough?

@sfner That's a worthwhile question! At the moment Unicode is roughly 10% full. Though to keep it to that level some Dubious Things have been done.
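For anyone who wants to check the numbers: the 1,114,112 figure is 17 planes of 65,536 code points each. The sketch below counts how many of those the Unicode database bundled with your Python build treats as assigned. Leaving out private-use and surrogate code points is an assumption of this sketch, and the result depends on which Unicode version your Python ships with.

```python
import unicodedata

PLANES = 17
CODE_POINTS_PER_PLANE = 0x10000          # 65,536
TOTAL = PLANES * CODE_POINTS_PER_PLANE   # 1,114,112

# General categories excluded from the count (an assumption of this sketch):
# Cn = unassigned, Co = private use, Cs = surrogate.
NOT_COUNTED = {"Cn", "Co", "Cs"}

assigned = sum(
    1
    for cp in range(TOTAL)
    if unicodedata.category(chr(cp)) not in NOT_COUNTED
)
fraction_assigned = assigned / TOTAL

# Prints the Unicode version, the assigned count, and the share in percent.
print(unicodedata.unidata_version, assigned, f"{fraction_assigned:.1%}")
```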

Since the logographic repertoires of Chinese, Japanese, and Korean have a common origin, most of their characters were smashed together in what is called Han unification. This was justified as a way of keeping the number of code points down. Without it, they might have taken more than a hundred thousand code points on their own.

I'd argue that would have been preferable, since Han unification causes serious problems for mixed-language texts, and handling those is sort of Unicode's reason for being. I would like to see the reasons for hacks like this go away.

We also have auxiliary conscripts that aren't in Unicode. Like Blissymbolics. They may be one day. And given that people are cranking out new emoji at a pretty quick pace and continuing to make conscripts, I'd rather keep the option of wild expansion open to clearly legitimate inclusionist impulses.

@Azure @sfner 🍬 Could someone make an encoding that uses *all* the codepoints and drop UTF-16 compatibility, or would that cause impossible encoding problems? 🍬

@lyrabon It would just make platforms that use UTF-16 not work any more. Older Windows systems (I think they use UTF-8 by default now), some JavaScript engines, etc.
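To see why UTF-16 pins the limit at U+10FFFF, here is a minimal sketch of surrogate-pair arithmetic. The function name `to_surrogates` is made up for illustration; the constants are the standard surrogate range starts.

```python
# Sketch of UTF-16 surrogate-pair encoding for code points above U+FFFF.
HIGH_START = 0xD800   # first high (lead) surrogate
LOW_START = 0xDC00    # first low (trail) surrogate

def to_surrogates(cp):
    """Split a supplementary code point into a (high, low) surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not representable as a UTF-16 surrogate pair")
    offset = cp - 0x10000                  # 20 bits of payload
    high = HIGH_START + (offset >> 10)     # top 10 bits
    low = LOW_START + (offset & 0x3FF)     # bottom 10 bits
    return high, low

# 1024 high surrogates x 1024 low surrogates give 2**20 supplementary
# code points, on top of the 2**16 in the Basic Multilingual Plane.
MAX_CODE_POINTS = 2**20 + 2**16

pair = to_surrogates(0x1F36C)   # U+1F36C CANDY
print([hex(unit) for unit in pair], MAX_CODE_POINTS)
```

A pair only carries 20 bits, so anything past U+10FFFF has no UTF-16 representation. That is the compatibility an "all the codepoints" encoding would have to give up.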


@Azure that sounds like a good concern; but I guess now it's too late, as they might have already closed the path for an incremental and backwards-compatible transition.

@lyrabon I remember reading about diacritics on Latin letters: "á" and "<a><combining_acute>" are two binary representations of the same text. Needless to say, that is not very interoperable. Maybe UTF-8 retaining ASCII backwards compatibility was a mistake.
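A small sketch of the two representations, using Python's standard `unicodedata` module. The variable names are just for illustration. Unicode normalization (NFC/NFD) is the usual way to make the two forms compare equal.

```python
import unicodedata

precomposed = "\u00e1"    # á as a single code point
decomposed = "a\u0301"    # "a" followed by COMBINING ACUTE ACCENT

# The raw strings differ even though they render the same.
same_raw = precomposed == decomposed

# Normalizing to a common form makes them comparable.
nfc_equal = unicodedata.normalize("NFC", decomposed) == precomposed
nfd_equal = unicodedata.normalize("NFD", precomposed) == decomposed

# The byte lengths in UTF-8 differ as well.
utf8_lengths = (len(precomposed.encode("utf-8")), len(decomposed.encode("utf-8")))

print(same_raw, nfc_equal, nfd_equal, utf8_lengths)
```

Note that this duplication lives at the code point level, so it shows up in UTF-16 and UTF-32 just as it does in UTF-8.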

@sfner @Azure a long time ago, people thought the same about 640KB of RAM
