Tangible Bytes

A Web Developer’s Blog

Unicode

A couple of things about UTF-8 have eluded me for a while …

I knew that the first part of Unicode, the 128 ASCII characters (the bit people agreed on), is encoded the same in ASCII and UTF-8.

I knew that the rest of Unicode needs 2, 3 or 4 bytes.

But I wasn’t clear on how you can tell how many bytes need to be read for each character.

Mostly I didn’t need to know, because the computer does it all for me. But those bits of vagueness can catch you out, so I went down the rabbit hole, and it turns out to be fairly short.
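
As a rough sketch of the short answer: in UTF-8 the high bits of the first byte of each character say how long the sequence is. A byte starting `0` stands alone (ASCII), `110` starts a 2-byte sequence, `1110` a 3-byte one and `11110` a 4-byte one, while every following byte starts `10`. The function names below are made up for illustration, and the code only looks at the first byte; it does not check that the continuation bytes are valid.

```python
def utf8_sequence_length(lead_byte: int) -> int:
    """Return the length in bytes of the UTF-8 sequence that starts with lead_byte."""
    if lead_byte & 0b1000_0000 == 0b0000_0000:  # 0xxxxxxx: plain ASCII
        return 1
    if lead_byte & 0b1110_0000 == 0b1100_0000:  # 110xxxxx: 2-byte sequence
        return 2
    if lead_byte & 0b1111_0000 == 0b1110_0000:  # 1110xxxx: 3-byte sequence
        return 3
    if lead_byte & 0b1111_1000 == 0b1111_0000:  # 11110xxx: 4-byte sequence
        return 4
    # 10xxxxxx is a continuation byte; anything else is not a valid lead byte.
    raise ValueError(f"not a valid UTF-8 lead byte: {lead_byte:#04x}")


def split_characters(data: bytes) -> list[bytes]:
    """Split UTF-8 bytes into one chunk per character (hypothetical helper)."""
    chunks = []
    i = 0
    while i < len(data):
        n = utf8_sequence_length(data[i])
        chunks.append(data[i:i + n])
        i += n
    return chunks


if __name__ == "__main__":
    for chunk in split_characters("aé€😀".encode("utf-8")):
        print(len(chunk), chunk.hex(" "), chunk.decode("utf-8"))
```

In everyday code you would simply call `bytes.decode("utf-8")` and let Python do this for you; the sketch is only meant to show that the length is readable from the first byte.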
