Optimize Performance and Memory Use¶
When you begin to use Python regularly in your work, you'll start noticing bottlenecks in your code. Some workflows may run at lightning speed, while others take hours of processing time to complete, or even crash.
Avoiding bloat is invaluable as you move toward using code for automation, bigger datasets, and APIs. Code efficiency means:
- Less chance of a slowdown or crash: the dreaded MemoryError.
- Quicker response time and fewer bottlenecks for the larger workflow.
- Better scaling.
- Often (but not always!) cleaner, more readable code.
Let's look at some ways you can reduce bloat in your code.
Access and store only what you need, no more.
- Storage: avoid a list where you could use a tuple
- Membership look-up: avoid a list (or tuple) where you could use a set (or dictionary)
- Iteration: avoid a function (or list comprehension) where you could use a generator (or generator expression)
- Profile: make time for performance checks by profiling your code for bottlenecks
Use fewer lists¶
If you have a collection of values, your first thought may be to store them in a list.
data_list = [17999712, 2015, 'Hawkins Road', 'Linden ', 'NC', 28356]
Lists are nice because they are very flexible. You can change the values in the list, including appending and removing values. But that flexibility comes at a cost. Lists are less efficient than tuples. For example, they use more memory.
import sys
data_tuple = tuple(data_list)
print(sys.getsizeof(data_list))
print(sys.getsizeof(data_tuple))
Note that sys.getsizeof doesn't include the size of the data in a container, just the size of the container itself. You can use it to compare data structures that hold the same data, but not to compare containers holding different data.
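To see what that means in practice, compare two one-element lists whose contents differ wildly in size (a quick illustrative sketch, separate from the address example above):
import sys
# The reported sizes match even though one element is a thousand times larger:
# getsizeof measures the list object itself, not its contents.
print(sys.getsizeof(["a"]))
print(sys.getsizeof(["a" * 1000]))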
Membership look-up: sequential vs. hashable¶
When you want to see if an element already exists in a collection of elements, neither lists nor tuples are the best choice.
- List and tuple lookup is sequential. The bigger the list, the longer look-up takes. This is called O(n) time complexity.
- Set and dictionary lookups use hashing, which means a lookup goes directly to the correct value. A lookup takes about the same amount of time no matter how much data there is. This is called O(1) time complexity.
For example, imagine an analyst has a dataset of 1 million addresses. They also have a smaller dataset of 10,000 zip codes. They want to know which of the zip codes are associated with at least one of the addresses.
One way to do that is with a list:
from random import randint
addresses_zips = [randint(10000, 99950) for _ in range(1_000_000)]
zips_of_interest = [randint(10000, 99950) for _ in range(10_000)]
zips_with_address_match_from_list = []
for address_zip in addresses_zips:
    if address_zip in zips_of_interest:
        zips_with_address_match_from_list.append(address_zip)
print(len(zips_with_address_match_from_list))
A faster way is to use a set.
zips_of_interest_set = set(zips_of_interest)
zips_with_address_match_from_set = []
for address_zip in addresses_zips:
    if address_zip in zips_of_interest_set:
        zips_with_address_match_from_set.append(address_zip)
print(len(zips_with_address_match_from_set))
zips_with_address_match_from_set == zips_with_address_match_from_list
Big takeaway: Lists are appropriate when you need a collection where you can change the values, but they aren't the best choice for everything. Use a tuple if you don't need to change the values. Use a dictionary or set if you need to check if a value is in the collection.
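For completeness, a membership check against a dictionary tests its keys, and it's O(1) on average just like a set. The small sketch below is illustrative only; zip_counts isn't part of the address example above.
zip_counts = {28356: 3, 55455: 1}
print(28356 in zip_counts)          # True: 28356 is a key
print(3 in zip_counts)              # False: 3 is only a value
print(3 in zip_counts.values())     # True, but .values() is scanned sequentially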
Use more generators¶
Regular functions and comprehensions typically store outputs into containers, like lists or dictionaries. This can take up unnecessary memory, especially when we're creating multi-step workflows with many intermediate outputs.
In contrast, generators only hold one data item in memory at a time. A generator is a type of iterator that produces results on-demand (lazily), maintaining its state between iterations.
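You can see the "maintains its state" behavior by stepping through a small generator manually with next. This is a minimal sketch; count_by_twos isn't used elsewhere in this lesson.
def count_by_twos():
    """Yield 0, 2, and 4, one value at a time."""
    for x in range(0, 6, 2):
        yield x
gen = count_by_twos()
print(next(gen))  # 0 -- execution pauses at the yield
print(next(gen))  # 2 -- resumes where it left off
print(next(gen))  # 4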
def massive_func():
    """A function that attempts to produce an infinitely long list of even numbers."""
    x_list = []
    x = 0
    while True:
        x_list.append(x)
        x += 2
    return x_list
# Calling this function will run out of space
for x in massive_func():
print(x)
def massive_gen():
    """A generator that produces an infinitely long stream of even numbers."""
    x = 0
    while True:
        yield x
        x += 2
# Calling this function will run out of time
for x in massive_gen():
print(x)
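If you only need a handful of values from an infinite generator, you can take them lazily. The sketch below uses itertools.islice, which isn't part of the original example:
from itertools import islice
# Take just the first five even numbers; only one value is in memory at a time.
for x in islice(massive_gen(), 5):
    print(x)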
What goes for functions also goes for list comprehensions. You can often use a generator expression in place of a list comprehension. We've already seen an example of a generator expression in the n-dimensional distance function:
coords = (1, 1, 1, 1)
sum(d ** 2 for d in coords)
Compare that example to one that uses a list comprehension:
coords = (1, 1, 1, 1)
sum([d ** 2 for d in coords])
The sum function operates by looping over an iterable and adding each value to a running total. In the first case, the iterable is a generator that produces a single value at a time.
In the second case, the list comprehension loops over coords to produce a list where every value is stored in memory. Then the sum function loops over that list.
An important limitation of generators is that because they produce a single value at a time and then forget about it, you cannot reuse them.
generator = (d ** 2 for d in coords)
sum(generator)
max(generator)
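Here the call to sum consumes every value the generator produces, so the follow-up call to max raises a ValueError: the generator is already exhausted and has nothing left to compare.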
Big takeaway: If you're only going to use a value once, you should probably use a generator. If you need to use it again, you probably need to store it in something like a tuple or list.
Profile, don't guess¶
Profiling is any technique used to measure the performance of your code, such as its speed or resource usage. There are dozens of tools available for profiling, but we'll focus on two:
- Check memory use: Use tracemalloc to check the memory usage of code.
- Spot-profile your code: Use the timeit notebook magic to perform some basic profiling by cell or by line.
To make profiling easier, the cell below defines functions for calculating a sum on a generator expression and on a list comprehension. Both functions will be called with a very large number of coordinates to make profile differences more obvious.
coords = (1, 1) * 1_000_000
def sum_generator(coords):
    return sum(d ** 2 for d in coords)

def sum_list_comprehension(coords):
    return sum([d ** 2 for d in coords])
Check memory use¶
The cells below use tracemalloc to capture information about memory usage for the two versions of the function.
You do need to restart the kernel between runs of these cells to ensure tracemalloc isn't counting memory still held from a previous cell run.
import tracemalloc
tracemalloc.start()
sum_generator(coords)
current, peak = tracemalloc.get_traced_memory()
print(peak)
import tracemalloc
tracemalloc.start()
sum_list_comprehension(coords)
current, peak = tracemalloc.get_traced_memory()
print(peak)
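If restarting the kernel is inconvenient, one alternative is to reset the peak between measurements. This is a sketch rather than part of the original lesson, and tracemalloc.reset_peak requires Python 3.9 or later:
import tracemalloc
tracemalloc.start()
sum_generator(coords)
_, gen_peak = tracemalloc.get_traced_memory()
tracemalloc.reset_peak()  # start the peak measurement over (Python 3.9+)
sum_list_comprehension(coords)
_, list_peak = tracemalloc.get_traced_memory()
tracemalloc.stop()
print(gen_peak, list_peak)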
Spot-check speed with %%timeit¶
The timeit module measures the execution time of a selection of code. Among the ways you'll see it used are "magic" commands in notebooks.
%%timeit is a form of cell magic. It measures the execution time of the entire notebook cell.
%%timeit
sum_generator(coords)
%%timeit
sum_list_comprehension(coords)
If you just want to check the timing for a single line, you can use the %timeit line magic. That's useful if you have some code that takes some time to run, but you don't want it affecting the timeit results. Compare the use of cell magic and line magic in the next two cells.
%%timeit
from time import sleep
sleep(1)
sum_list_comprehension(coords)
from time import sleep
sleep(1)
%timeit sum_list_comprehension(coords)
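Outside a notebook, you can get a similar measurement from the timeit module directly. A minimal sketch; the choice of number=10 runs is arbitrary:
import timeit
# Total seconds for 10 calls of each function.
print(timeit.timeit(lambda: sum_generator(coords), number=10))
print(timeit.timeit(lambda: sum_list_comprehension(coords), number=10))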
Big takeaway: You can use your knowledge of Python to make some predictions about where performance bottlenecks are occurring in your code. But you should check to be sure, because those bottlenecks frequently show up in unexpected places.
Exercises¶
The exercises below invite you to practice applying the different strategies outlined above. They follow the order of the concepts presented, and you'll need to at least run the code in #3 before you attempt #4 or #5, since they rely on the function definitions in #3. You can otherwise attempt them in any order. Start with the ones that seem most applicable to the work you need to do.
You can find example answers in the ExerciseAnswers.ipynb notebook.
1) Use the right data structure for immutable sequences¶
The code below creates a list containing all years in a research study timeframe, from 1900 to 2030.
The values in this collection will not need to be changed because the study will always use this timeframe.
import sys
def list_from_range(start, end):
    """Create a list from a range of values"""
    return list(range(start, end + 1))
start = 1900
end = 2030
studyYears = list_from_range(start, end)
print(studyYears)
print("Bytes used: ", sys.getsizeof(studyYears))
Write a different implementation using a different storage option and demonstrate that option uses less memory.
2) Use the right data structure for membership lookup¶
The code below assigns a collection of placenames to a list. Then, it checks whether a placename is in the list. If not, the placename is reported missing.
The list below repeats 3 placenames 1 million times, so a single membership check can require scanning up to 3 million entries.
placeNames_list = ["Kinshasa", "Duluth", "Uruguay"] * 1_000_000
# O(n) list look-up
if "Dinkytown" not in placeNames_list:
print("Missing.")
Write a different implementation using a storage option that allows quicker checks for membership at scale.
3) Use generators¶
The code below uses a generator to create vertices for triangles from a random selection. It also defines a function for calculating the area of a polygon from its vertices.
from itertools import cycle
from random import randint
class Random_Vertex:
    def __init__(self):
        self.x = randint(0, 100)
        self.y = randint(0, 100)

def generate_polygon_vertices(num_polygons, num_sides):
    """A generator that yields the vertices of one random polygon at a time."""
    for _ in range(num_polygons):
        # Store each polygon's vertices in a tuple so calculate_area can
        # traverse them more than once (a bare generator expression would be
        # exhausted partway through the area calculation).
        vertices = tuple(Random_Vertex() for _ in range(num_sides))
        yield vertices

def calculate_area(vertices):
    """Calculate a polygon's area from its vertices using the shoelace formula."""
    subtotals = []
    vertex_cycle = cycle(vertices)
    next(vertex_cycle)  # offset the cycle by one so each vertex is paired with the next vertex
    for vertex in vertices:
        next_vertex = next(vertex_cycle)
        subtotal = vertex.x * next_vertex.y - vertex.y * next_vertex.x
        subtotals.append(subtotal)
    area = abs(sum(subtotals) / 2)
    return area
The code below uses the code above to generate 1 million triangles. You want to find out the area of the largest triangle. The code below does this with a list comprehension, which holds all 1 million area values in memory.
triangles = generate_polygon_vertices(1_000_000, 3)
max([calculate_area(triangle) for triangle in triangles])
Rewrite the code above to use less memory.
Hint: The easiest fix is to replace the list comprehension with a generator expression. Harder would be writing your own generator using the yield statement.
4) Check memory use of lists vs. generators¶
Change both cells below to use tracemalloc to compare their memory use.
Hint: Because the notebook keeps many variables in memory, you will want to restart the notebook kernel between running these cells to get a valid comparison. That means you will need to re-run the cell that defines the generate_polygon_vertices generator and the calculate_area function.
# Using lists
triangles = generate_polygon_vertices(1_000, 3)
max([calculate_area(triangle) for triangle in triangles])
# Using a generator expression
triangles = generate_polygon_vertices(1_000, 3)
max(calculate_area(triangle) for triangle in triangles)
5) Compare execution speed of lists vs. generators¶
Change both cells below to use timeit to compare their execution time.
# Using a list
triangles = generate_polygon_vertices(1_000, 3)
max([calculate_area(triangle) for triangle in triangles])
# Using a generator expression
triangles = generate_polygon_vertices(1_000, 3)
max(calculate_area(triangle) for triangle in triangles)