Leetcode: Improving String Performance in Swift

String issues are one of the most common issues in Leetcode. By solving a task on Leetcode, you can also compare your solution with other solutions in terms of performance and memory usage. I have experience solving problems with strings, and my task execution time is often 100% better than other users’ solutions. Let’s see how we can speed up solving the strings problem.

Strings in Swift

In many programming languages, a string is an array of characters with direct access to the string/array elements. In this case, you can access a substring or a character immediately at O(1) time. However, Swift is designed differently. Let’s take a more in-depth look at what lies behind the interface.

The String architecture implements the BidirectionalCollection protocol, which makes it possible to access the index at O(1) time. To find out this index, you need to calculate it separately, as a result, it takes O(n) time to get to any element. To understand why it works this way, we need to dig deeper – into coding.

The encoding determines how each Unicode character is stored in memory. There are more than 130,000 characters; Each must be distinguished in some way from the other and stored in memory. The primary storage unit in computers is bytes (8 bits). It can’t store more than 256 different values, so one byte isn’t enough to hold the text in Unicode encoding (remember, we have that many possible characters).

Unicode characters contain many of the basic encoding codes UTF-8, UTF-16, and UTF-32. They each have their own characteristics, but we are interested in UTF-8 because Swift uses it.

UTF-8

For each character in UTF-8, 1 to 4 bytes of memory are allocated. The number of bytes is determined by the character number in the Unicode table. The higher the number in the Unicode table, the greater the number of bytes required to encode the character, up to a maximum of 4 bytes. For example, the Latin letter “A” is encoded in the value of bit 00101101 or in the decimal fraction 56, which requires one byte to represent it in memory. And the Cyrillic character “б” is encoded in the value of bit 0110110101010111010, to which 2 bytes are already allocated. The world’s most common symbols are included in the first 128 Unicode characters – mainly the Latin alphabet, Arabic numbers, and basic punctuation.

Therefore, all these characters fit into one byte. But there is one problem here. Swift internally uses UTF-8, and does not know in advance which characters will be in the string. Swift considers all lines to be UTF-8, and each character in the string can be between 1 and 4 bytes.

Let’s look at an example. Imagine we have the string “Hi Павел” (“Hi Pavel” in Latin). The first two characters are Latin letters, a space, and 5 Cyrillic letters.

It is easy to indicate the first letter:

  1. In the beginning, we know the place in advance in memory where it is
  2. we read it
  3. Based on the bitmask, we can determine that this is one of the first 128 Unicode characters

We now know that the first character occupies one byte, so we must move on one byte and read the next. Let’s go further. How to get to the second code?

But the first character can be any other character and it only takes up 1 byte of memory, but 2, 3 or 4. Swift didn’t know beforehand that the first character was a letter from the Latin alphabet. Therefore, we need to read it first and determine how many bytes it takes up. To get to the next character, we must move the corresponding number of bytes.

Moreover, to denote the third character, we must specify how much the second character occupies in memory. Once we get to the Cyrillic characters, each of them has already taken up 2 bytes of memory. To access the following characters, Swift will shift by 2 bytes.

Again, there is no way to pre-determine how many bytes you need to move in order to access a character. You need to go through all the characters and choose how many bytes each of them occupies. Indexes do all of this work in Swift.

As we know, Swift has a certain set of functions for calculating indexes that work with strings, for example, index (after 🙂And index (before 🙂And distance: after, etc. (These functions inherit from Collection And bi-directional array protocols).

These functions do all the hard work of calculating the position in the string. It turns out that to denote a character in the middle of a line, we must first calculate its index, and sequentially calculate the index of the previous character. Only then will we find out exactly where in memory the required symbol is stored.

Leetcode Features

As we have seen, these calculations are very complex and costly in terms of performance. In Leetcode, the most preferred part of the assignment condition is the input data constraints. The standard limitation for strings is the characters used, limited to the Latin alphabet, numbers, and punctuation.

Based on this additional information, we can improve the performance of our algorithms. We don’t have a way to tell Swift that it’s UTF-8 with single-byte characters, but we can trick Swift a bit and convert the strings into an array of bytes (array ). As we know, arrays in Swift implement the extension RandomAccessCollection The protocol, which provides access to an array element in O(1) time. We don’t need to pre-compute the index to access an array element. We can access any element, for example, St[4], and Swift will not need to calculate the memory change that needs to be made. Swift can move four items at once and get the required byte instantly.

Once you have the input, you need to convert it from a string to a file Letter An array or even an array of bytes. Then you treat them as an array with direct access to the elements. This is how most Leetcode solutions speed up for string related tasks. It will greatly speed up your decision.

Array conversion options:

let str = "Hello world"

Array<UInt8>(str.utf8) // array of bytes
Array(str) // array of Characters

If it is required while working with strings, to access any element, something like this:

let startSubstringIndex = greeting.index(greeting.startIndex, offsetBy: 2)
greeting[startSubstringIndex..<greeting.index(startSubstringIndex, offsetBy: 3)]

… This construction, besides the complexity of the structure itself, is also slow, unlike this one, which is more concise and faster:

// let charArray = Array(greeting)
charArray[2..<5]

You can use this little hack to solve problems on Leetcode and buy some time from the scan system.

.

Leave a Comment