5 scenarios where beginners usually misuse Python


This post was originally published by Dardan Xhymshiti at Towards Data Science

Utilising Python Better

Python is the go-to programming language for many beginners nowadays. The easy-to-learn syntax, the vast number of libraries and the rich community are the main reasons why Python's popularity is skyrocketing.

When I landed in Python six years ago with a background full of Java, I often found myself writing Python code with Java in mind. As a new starter, I didn't fully utilise Python's benefits, and in some situations I was even misusing it.

Even now, I still see new starters jumping in and coding in Python without first taking some time to read about best practices and recommendations. To help with this, I've listed below five scenarios where Python is misused, along with recommendations for utilising it better.


#1 When Using Lists

List and Tuple

A list allows storing elements of any data type without a size limit. Although this flexibility makes the list the go-to data collection, there are some best practices for when to use it and when not to.

A list should be used to store elements of the same nature (the same data type and meaning).

Python doesn't programmatically restrict this, but storing items of a single nature in a list makes a developer's life easier: developers can easily predict what items the list will hold in the future and confidently write scripts based on that assumption.

Consider the list_of_things below. This list doesn't have a singular nature of items. A developer cannot tell whether it contains house parts, dimensions or other things, so all the different items have to be handled separately.

list_of_things = ['Door', 2, 'Window', True, [2.3, 1.4]]

Consider the list_of_fruits and list_of_scores below. From the first couple of items, you can easily infer that the first list will always contain fruit names and the second one will contain score values.

list_of_fruits = ['apple', 'orange', 'pear', 'cherry', 'banana']
list_of_scores = [80, 98, 50, 55, 100]
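Because every item in list_of_scores is a number with the same meaning, the whole list can be processed uniformly. A minimal sketch of that idea:

```python
list_of_scores = [80, 98, 50, 55, 100]

# Every item is a score, so a numeric operation applies to all of them
average = sum(list_of_scores) / len(list_of_scores)
print(average)  # 76.6
```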

On the other hand, a tuple is more appropriate for storing items with different meanings or data types. Since tuples are immutable, you cannot add items to an existing tuple without creating a new object, so a tuple naturally represents a fixed-size record.
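A small sketch of that distinction, using hypothetical record fields (part name, width in metres, installed flag):

```python
# A fixed-size record of mixed types fits a tuple
door = ('Door', 2.3, True)

# Unpacking documents what each position means
name, width, installed = door
print(name, width, installed)
```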


#2 When Concatenating Strings Iteratively

String Concatenation

You've probably heard that in Python everything is an object, and objects can be either mutable or immutable. An immutable object requires a new object to be created whenever you update the value assigned to it, whereas a mutable object doesn't.

Let's say you want to generate the whole alphabet in a single string. Since strings are immutable, every concatenation with the "+" operator creates a new string object.

one_line_alphabet = ''
for letter_index in range(ord('a'), ord('z') + 1):
    one_line_alphabet += chr(letter_index)

A preferred way to concatenate strings is to use the join function, which reduces the computation time by roughly a factor of three. In a test that I performed, concatenating 1 million string values iteratively took 0.135s, while using the join() function took only 0.044s.

small_letters = [chr(i) for i in range(ord('a'), ord('z')+1)]
single_line_alphabet = ''.join(small_letters)

So whenever you have to concatenate a list of strings, use the join function. For just a few strings, you won't really see the performance difference; in that case, use .format instead of the plus operator for readability. For example:

name = 'John'
surname = 'Doe'
full_name = '{name} {surname}'.format(name=name, surname=surname)
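The timing claim above can be reproduced with the timeit module. A sketch with illustrative helper functions (exact numbers will vary by machine and Python version):

```python
import timeit

def concat_plus(n=100_000):
    s = ''
    for _ in range(n):
        s += 'x'  # may create a new string object on each iteration
    return s

def concat_join(n=100_000):
    return ''.join('x' for _ in range(n))  # builds the result once

print('plus:', timeit.timeit(concat_plus, number=1))
print('join:', timeit.timeit(concat_join, number=1))
```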

#3 When Reading & Writing Files


To read or write files in Python, you first need to open the file through the built-in open function. You open the file, read or write content, and close the file. A couple of issues might arise when you do this; forgetting to close the file and not handling exceptions are some of them.

Forgetting to close the file when you finish the job causes issues later. For example, if you forget to close a file after writing to it, the written content may never be flushed to disk, and the open file keeps resources allocated on your machine. And if exceptions are not handled and an error occurs while processing the file, the file is left open.

f = open(file='file.txt', mode='r')
lines = f.readlines()
...
f.close()

Using the with keyword is recommended whenever you open files. with invokes a context manager that wraps your code and guarantees clean-up: whatever might fail in the with body while you read or write the file, the file is always closed for you (the exception itself still propagates, so you can handle it where it makes sense).

with open('file.txt') as f:
    read_data = f.read()
    ...

When you skip with, you have to handle everything on your own: closing the file and handling exceptions become your explicit responsibility. Instead, make your life easier and let with manage the situation.
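What with guarantees here can be sketched with an explicit try/finally (the sample file is created first so the snippet runs on its own):

```python
# Create a small sample file so the snippet is self-contained
with open('file.txt', 'w') as f:
    f.write('hello\n')

# Roughly what `with open(...)` guarantees, written out by hand:
f = open('file.txt')
try:
    read_data = f.read()
finally:
    f.close()  # runs even if f.read() raises

print(read_data)
```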


#4 When Skipping Generators

Keeping all values in a List versus Generating them one-by-one

In many scenarios, you need to generate a list of values that you will later use in your script. Say, for example, that you need to generate all three-number combinations of the first 100 numbers.

combinations = []
value = 100
for i in range(value):
    for j in range(value):
        for k in range(value):
            combinations.append((i, j, k))

When the execution completes, the combinations list contains one million tuples, each with three int values, and they all reside in memory until deleted. Checking the object size with the getsizeof function from the sys module reports about 8.29MB (for the list object itself, not counting the tuples it references).

Instead of using a list to store all the values in memory, you can create a generator that yields one combination at a time whenever you ask for it. This reduces memory consumption and lets the first results arrive immediately, instead of only after the whole list is built.

def generate_combinations_of_three(value):
    for i in range(value):
        for j in range(value):
            for k in range(value):
                yield (i, j, k)

gen = generate_combinations_of_three(100)
next(gen)  # yields (0, 0, 0)
next(gen)  # yields (0, 0, 1)
...

So, whenever possible, use generators. Always remember that memory capacity is limited, and optimise memory usage as much as possible, especially when developing scalable solutions.
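The memory difference is easy to inspect with sys.getsizeof (sizes are CPython-specific and approximate):

```python
import sys

value = 100

# Materialise every combination up front in a list
combinations = [(i, j, k)
                for i in range(value)
                for j in range(value)
                for k in range(value)]

def generate_combinations_of_three(value):
    for i in range(value):
        for j in range(value):
            for k in range(value):
                yield (i, j, k)

gen = generate_combinations_of_three(value)

# The list holds a million references; the generator object stays tiny
print(sys.getsizeof(combinations))  # several megabytes
print(sys.getsizeof(gen))           # a couple of hundred bytes
```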


#5 When Using Comprehensions

List Comprehensions

A Pythonista is a programmer who follows the guidelines of The Zen of Python when coding in Python. If you are a new starter in Python, you tend to exaggerate certain points from this Zen and understate others.

This is most noticeable when you first get to know comprehensions — you tend to translate every loop into a comprehension. Say you have a three-dimensional matrix of numbers that you want to flatten.

matrix = [[[1, 2, 3],
           [4, 5, 6],
           [7, 8, 9]],
          [[10, 20, 30],
           [40, 50, 60],
           [70, 80, 90]]]

Using list comprehensions the flattening looks like:

flatten_list = [x for sub_matrix in matrix for row in sub_matrix for x in row]

Using for loops the flattening looks like:

flatten_list = []
for sub_matrix in matrix:
    for row in sub_matrix:
        for x in row:
            flatten_list.append(x)

Comprehensions are cool, but readable code is cooler. Don't set out to use comprehensions everywhere; even though they take less code to write, don't trade away readability.
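When the nesting gets deep, a named standard-library helper can be more readable than either version above. A sketch using itertools.chain.from_iterable, applied twice because the matrix is nested two levels deep:

```python
from itertools import chain

matrix = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
          [[10, 20, 30], [40, 50, 60], [70, 80, 90]]]

# Flatten one level, then flatten the result again
rows = chain.from_iterable(matrix)
flatten_list = list(chain.from_iterable(rows))
print(flatten_list)
```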


Conclusion

Whenever you jump into a new programming language, whether you are experienced or not, take your time to read about its best practices. Every language has some ingredients that make it special, so make sure to utilise them in the right place.

Python is all about getting things done faster and easier, but you shouldn’t overlook small decisions that might have a negative impact on the lifetime of your code. Always look for better and optimised solutions whenever possible.
