this post was submitted on 02 Nov 2023
328 points (92.5% liked)
tumblr
3372 readers
178 users here now
Welcome to /c/tumblr, a place for all your tumblr screenshots and news.
Our Rules:
-
Keep it civil. We're all people here. Be respectful to one another.
-
No sexism, racism, homophobia, transphobia or any other flavor of bigotry. I should not need to explain this one.
-
Must be tumblr related. This one is kind of a given.
-
Try not to repost anything posted within the past month. Beyond that, go for it. Not everyone is on every site all the time.
-
No unnecessary negativity. Just because you don't like a thing doesn't mean that you need to spend the entire comment section complaining about said thing. Just downvote and move on.
Sister Communities:
-
/c/[email protected] - Star Trek chat, memes and shitposts
-
/c/[email protected] - General memes
founded 1 year ago
MODERATORS
you are viewing a single comment's thread
view the rest of the comments
view the rest of the comments
A Unicode character can be up to 4 bytes, so 2^32 or 4,294,967,296 potential unique characters. And it'd be easy enough to adjust the standard to allow for an extra byte(s) if necessary -- it's been done before.
This is incorrect. While in UTF-32 a character (actually a code point) requires 4 bytes, and in UTF-8 up to 4 bytes, the Unicode standard is limited to 17*2^16 code points. (edit: apparently because that is the limit of UTF-16. 4 Byte UTF-8 can encode 2^21 code points, but it is not technically limited to four bytes, so in total is a ble to encode 2^31 code points)
Unicode is the standard that says "the thing we call captial A is the 65th character", literally defining a mapping from numbers to concepts.
UTF-8 or UTF-32 are a way to encode a list of numbers in a more (UTF-8) or less (UTF-32) efficient way.