Can You Reverse a Python String Containing a Flag Emoji? | by David Amos | Jan, 2022

  • Some character encodings exclude characters that are included in other character encodings.
  • Two different character encodings may decode the same integers into two different strings.
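Both points can be checked from Python. The snippet below is only an illustrative sketch: the byte values and the choice of `latin-1` as the second encoding are assumptions made for the example, not something taken from the quoted sources.

```python
# The same two integers, decoded under two different encodings.
raw = bytes([195, 169])

as_utf8 = raw.decode("utf-8")      # one character: é
as_latin1 = raw.decode("latin-1")  # two characters: Ã©

print(as_utf8, as_latin1)

# ASCII has no representation for a regional indicator symbol,
# so asking Python to encode one as ASCII raises an error.
try:
    "\U0001F1FA".encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

print(ascii_ok)  # False
```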

The Wikipedia article on characters mentions two different character encodings:

“Two examples of usual encodings are ASCII and the UTF-8 encoding for Unicode.”

– Wikipedia, “Character (computing)”

OK, so ASCII and UTF-8 are two specific types of character encoding.

“ASCII was the most common character encoding on the World Wide Web until December 2007, when UTF-8 encoding surpassed it; UTF-8 is backward compatible with ASCII.”

– Wikipedia, “ASCII”
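One way to see what "backward compatible" means in practice is to encode ASCII-only text with both encodings and compare the results. This is a small sketch; the sample string is arbitrary.

```python
text = "Hello, world!"

ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")

# For text made only of ASCII characters, both encodings yield the same bytes.
print(ascii_bytes == utf8_bytes)  # True

# So bytes written by an ASCII encoder can be read back by a UTF-8 decoder.
print(ascii_bytes.decode("utf-8") == text)  # True
```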

UTF-8 isn’t just the dominant character encoding for the web. It’s also the primary character encoding for Linux and macOS and is the default for Python code.

The character “🇺” is encoded as a sequence of four integers in UTF-8, while the character “P” is encoded as a single integer. Why is that?
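The difference is easy to reproduce. In this sketch, the variable names are just for illustration; `str.encode()` returns a `bytes` object, and wrapping it in `list()` shows the individual integers.

```python
flag_letter = "🇺"   # REGIONAL INDICATOR SYMBOL LETTER U
plain_letter = "P"

flag_bytes = list(flag_letter.encode("utf-8"))
plain_bytes = list(plain_letter.encode("utf-8"))

print(flag_bytes)   # [240, 159, 135, 186]
print(plain_bytes)  # [80]
```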

“UTF-8 is capable of encoding all 1,112,064 valid character code points in Unicode using one to four one-byte (8-bit) code units. Code points with lower numerical values, which tend to occur more frequently, are encoded using fewer bytes.”

– Wikipedia, “UTF-8”

Well, that makes it sound like UTF-8 isn’t encoding characters as integers, but rather as something called a Unicode code point. And each code point can apparently be encoded using anywhere from one to four bytes.
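To make the one-to-four-byte range concrete, here is a rough sketch that prints the code point of a few characters alongside the number of bytes UTF-8 uses for each. The four sample characters are chosen for illustration only, and `ord()` is used here simply as a way to display a character's code point number.

```python
# P, é, €, and the regional indicator symbol letter U, written as escapes
# so that each entry is guaranteed to be a single code point.
samples = ["P", "\u00e9", "\u20ac", "\U0001F1FA"]

for ch in samples:
    code_point = ord(ch)
    encoded = ch.encode("utf-8")
    print(f"U+{code_point:04X} -> {len(encoded)} byte(s)")

# Lower code point values need fewer bytes.
byte_lengths = [len(ch.encode("utf-8")) for ch in samples]
print(byte_lengths)  # [1, 2, 3, 4]
```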

  1. What is a Unicode code point?
