Strings are not efficient. Do your string manipulations effectively.

Strings are not efficient. Do your string manipulations effectively.

ยท

5 min read

Welcome back, If you are following the blog already. Otherwise, I am more than happy to welcome you here. I assume we all have written code at some point in our career or our learning journey. You might be a hardcore programmer or a developer who casually codes. I got you covered.

Let's assume that I mean Competitive Programming when I say code.

String based problems

To start with, let's not go for practical implementation in our development life. I would like to keep things simple by considering simple coding problems that involve strings and string manipulations. There is no getting around string problems. It is the core of programming.

String manipulation

Let's talk about string manipulation quickly. It is the process of making some changes in the string we have. For example, let's use python here

aRandomString = "Hello World"

print(aRandomString[0]) #get one char of the word
print(aRandomString[0:1]) #get one char of the word (same as above)
print(aRandomString[0:3]) #get the first three char
print(aRandomString[:3]) #get the first three char
print(aRandomString[-3:]) #get the last three char
print(aRandomString[3:]) #get all but the three first char
print(aRandomString[:-3]) #get all but the three last character

This snippet includes all string slicing options in python. Moreover, there are a lot of methods that can manipulate string values like

aRandomString = "Hello World"
aRandomString.replace("World", "Readers")
print(aRandomString) # 'Hello Readers'

The time complexity we don't care about

While all these methods are super handy and easy to use, we are responsible for the time those methods take under the hood. All those string slicing operations we saw are O(n^2). This is huge when we consider a large string. The larger the input, the more significant is the time difference.

That's how exponents work ๐Ÿ˜…

Why are strings inefficient?

Language discussions

The answer lies in the basics. The Implementation of strings in most languages is immutable.

Not for C programmers tho. C strings are effectively an array of characters.

#include <stdio.h>

int main() {
    char aRandomString[5] = "abcde"; 
    printf("%s\n",aRandomString); // abcde
    aRandomString[2] = 'x';
    printf("%s",aRandomString);    // abxde
    return 0;
}

Let's get back to the string implementation in an advanced language like python.

aRandomString = "abcde"
print(aRandomString)    
# abcde 
aRandomString[2] = 'x' 
print(aRandomString)    
# TypeError: 'str' object does not support item assignment

String manipulation in Python

A simple way to alter the string is one of the 2 methods below

# Method 1
aRandomString = "abcde"
print(aRandomString)
# aRandomString[2] = 'x'  
aRandomString = aRandomString[:2] + 'x' + aRandomString[3:]
print(aRandomString)

# Method 2 
stringAsAList = list(aRandomString) 
stringAsAList[2] = 'y'  # List is mutable
aRandomString = ''.join(stringAsAList) 
print(aRandomString)

What's the difference between the 2 methods above?

  • One is altering the string directly by slicing the string in half and sandwiching a character between them.
  • The other one is simply converting a string to a list, as the list is mutable and then we convert the list back to a string.

In the first option, we are technically creating a new string to make any changes in them. But when we are using lists to manipulate the values, it's considerably more efficient.

The more complex the operations are, the more significant the efficiency gap will be.

Let's explore them with an iterative problem.

Here is a video where I explain the difference between modifying strings directly or by using the trick of lists. Consider checking it out.

A simple program to explain

Here is a function that repeatedly adds a char to the end of the string and another function that performs the same operation, but with an array.

We are using a module time to figure out the runtime the function takes.

from time import time

def basedOnString(n):
    string = ""
    for i in range(n):
        string += 'a'

def basedOnArray(n):
    arr = []
    for i in range(n):
        arr.append(1)

a = time()
basedOnString(10000000)
b = time()
basedOnArray(10000000)
c = time()

print("String time = ", b-a)
print("Array time = ", c-b)

String time = 4.817650079727173

Array time = 0.7630009651184082

As you can see, the array does the job significantly faster.

Why Lists?

Whenever we modify a string, the modification is achieved by creating a new string and pointing it to the same variable. This does the job, but at what cost? This is why performing string operations by converting them to lists are comparatively efficient.

Getting back to where we started...

This is exactly why using lists to perform string manipulation is better.

The answer, before you ask the question

Will the array be efficient if we convert a list back to a string?

aString = ''.join(aListToBeConvertedToAString)

Yes!

The time taken after adding a join function to the basedOnArray() function is,

String time = 4.817650079727173

Array time = 0.8269970417022705 (Still faster)

Conclusion

I didn't teach you something out of the world here. We all know lists are dynamic in nature (ie. mutable) and strings are not.

But, the motivation of the article is to just make these little things obvious to the people and encourage them to use these tips to make their programs faster and to flex their understanding with their programming friends or on a serious note, in a coding interview scenario.

You can subscribe to my newsletter here and follow me on Twitter here. Both these actions are highly appreciated.

Feel free to check my youtube channel here. I will be posting tons of content here and they are queued.

ย