Optimize Performance and Memory Use¶
When you begin to use Python regularly in your work, you'll start noticing bottlenecks in your code. Some workflows may run at lightning speed, while others take hours of processing time to complete, or even crash.
Avoiding bloat is invaluable as you move toward using code for automation, bigger data, and working with APIs. Code efficiency means:
- Less chance of a slowdown or crash: the dreaded MemoryError.
- Quicker response time and fewer bottlenecks for the larger workflow.
- Better scaling.
- Efficient code is often (but not always!) cleaner and more readable.
Let's look at some ways you can reduce bloat in your code.
tl;dr
Access and store only what you need, no more.
- Storage: avoid a list where you could use a tuple
- Membership look-up: avoid a list (or tuple) where you could use a set (or dictionary)
- Iteration: avoid a function (or list comprehension) where you could use a generator (or generator expression)
- Profile: make time for performance checks by profiling your code for bottlenecks
Storage: lists vs. tuples¶
If you have a collection of values, your first thought may be to store them in a list.
data_list = [17999712, 2015, 'Hawkins Road', 'Linden ', 'NC', 28356]
Lists are nice because they are very flexible. You can change the values in the list, including appending and removing values. But that flexibility comes at a cost. Lists are less efficient than tuples. For example, they use more memory.
import sys
data_tuple = (17999712, 2015, 'Hawkins Road', 'Linden ', 'NC', 28356)
print(sys.getsizeof(data_list))
print(sys.getsizeof(data_tuple))
104
88
If you aren't going to be changing the values in a collection, use a tuple instead of a list.
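Here's a minimal sketch of that trade-off, reusing data_list and data_tuple from above: the list can be edited in place, while the tuple refuses.
data_list[3] = 'Linden'       # Lists are mutable: this quietly fixes the stray trailing space.
try:
    data_tuple[3] = 'Linden'  # Tuples are immutable: item assignment raises a TypeError.
except TypeError as err:
    print(f"Can't modify a tuple: {err}")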
Membership look-up: sequential vs. hashable¶
However, when you want to check whether an element already exists in a collection, store that collection in a set or dictionary if possible.
- List and tuple look-up is sequential, going at the speed of O(n): linear time.
- With lists, Python scans the list item by item until it finds a match (or reaches the end).
- Worst case: it has to look at every element.
- Set and dictionary look-ups are hash-based: each element (or key) is mapped to a storage location by its hash. These go at the speed of O(1): constant time.
- No matter how big the collection is, the set only ever has to check 1 value.
- Sets are built on hash tables. Python computes the hash of the element and jumps straight to where it should be stored.
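Before the bigger example below, here's a minimal spot check of that difference using the timeit module (exact timings will vary by machine, but the gap should be obvious):
import timeit

big_list = list(range(100_000))
big_set = set(big_list)

# Worst case for the list: the value we ask about sits at the very end.
print(timeit.timeit("99_999 in big_list", globals=globals(), number=1_000))  # O(n) scan
print(timeit.timeit("99_999 in big_set", globals=globals(), number=1_000))   # O(1) hash look-up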
The example below shows that a set is over 100x faster than a list in calculating the first 10,000 values of Recaman's sequence.
def recaman_check(cur, i, visited):
    return (cur - i) < 0 or (cur - i) in visited
def recaman_list(n: int) -> list[int]:
    """
    return a list of the first n numbers of the Recaman series
    """
    visited_list = [0]
    current = 0
    for i in range(1, n):
        if recaman_check(current, i, visited_list):
            current += i
        else:
            current -= i
        visited_list.append(current)
    return visited_list
%%timeit
recaman_list(10000)
386 ms ± 36.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
def recaman_set(n: int) -> set[int]:
    """
    return a set of the first n numbers of the Recaman series
    """
    visited_set = {0}
    current = 0
    for i in range(1, n):
        if recaman_check(current, i, visited_set):
            current += i
        else:
            current -= i
        visited_set.add(current)
    return visited_set
%%timeit
recaman_set(10000)
2.06 ms ± 61.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
When you add an element to a set...
- Python calls hash() on the element (which invokes the element's __hash__() method) to get a hash value (an integer);
- That hash value determines where the element will be stored in the set's internal structure; and
- When checking if an element is in the set, Python uses the hash to quickly find it.
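You can peek at this machinery directly. A quick illustration: hash() returns the integer Python uses behind the scenes, and it also explains why mutable objects such as lists can't be added to a set.
visited = {0, 1, 3, 6}
print(hash(6))       # The integer used to place 6 in the set's hash table.
print(6 in visited)  # Found via its hash, not by scanning.
try:
    visited.add([2, 4])  # A list is mutable, so it has no stable hash...
except TypeError as err:
    print(err)           # ...and Python refuses: unhashable type: 'list'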
Iteration: functions vs. generators¶
We often use functions to operate on data, but generators can be more memory-efficient and faster for certain tasks.
Regular functions and comprehensions typically store outputs into containers, like lists or dictionaries. This can take up unnecessary memory, especially when we're creating multi-step workflows with many intermediate outputs.
In contrast, generators only hold one data item in memory at a time. A generator is a type of iterator that produces results on-demand (lazily), maintaining its state between iterations.
Under the hood, a generator's syntax is similar to a function's. Generally, you:
- define it with def (just as you would a function),
- provide the logic, and
- produce the result, either with a return statement (for functions) or a yield statement (for generators), as in the sketch below.
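Here's a minimal sketch of that difference. A generator doesn't run until you ask it for a value, and each next() call resumes exactly where the last yield paused:
def countdown(n):
    """Generator: yields n, n-1, ..., 1, pausing at each yield."""
    while n > 0:
        yield n
        n -= 1

gen = countdown(3)  # Nothing runs yet; we just get a generator object.
print(next(gen))    # 3 - the body runs until the first yield, then pauses.
print(next(gen))    # 2 - resumes right after the yield, with n already updated.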
Here's a quick way to see why a generator is superior for memory. Let's compare a regular function that produces new values endlessly, storing them in a list, to a generator that yields each value one at a time, discarding it from memory as it moves to the next iteration.
def massive_rf():
    """A regular function that produces even numbers, endlessly."""
    x_list = []
    x = 0
    while True:
        x_list.append(x)
        x += 2
# Run it:
massive_rf()
Woah! That did its best, but my notebook has now informed me that "Your session crashed after using all available RAM."
def massive_gen():
    """A generator that produces even numbers, endlessly."""
    x = 0
    while True:
        yield x
        x += 2
# Run it: (use keyboard interrupt when you want to move on.)
for x in massive_gen():
    print(x)
The generator was willing to keep going until I interrupted it because it did not store each result in memory as it proceeded.
Let's look at a more concrete scenario. Imagine you have a large dataset containing millions of employee records. You want to calculate the combined hourly rates of all employees on an annual salary.
# For the sake of simplicity, we'll represent the dataset with a small sample.
employeeDatabase = [
{'lastName': 'Knope', 'rate': 72000, 'pay_class': 'annual'},
{'lastName': 'Gergich', 'rate': 17, 'pay_class': 'hourly'},
{'lastName': 'Ludgate', 'rate': 60000, 'pay_class': 'annual'},
{'lastName': 'Swanson', 'rate': 'redacted', 'pay_class': 'redacted'},
{'lastName': 'Haverford', 'rate': 52000, 'pay_class': 'annual'}
]
You can use a function for this, but it means i) the entire input dataset will be held in memory, and ii) each result (each worker's hourly value) will be held in memory too.
def hourly_rate(payments):
    """Function that returns each salaried worker's hourly rate."""
    hourlyRates = []
    for worker in payments:
        if worker.get('pay_class') == 'annual':
            hourly = worker['rate'] / 2080
            hourlyRates.append(hourly)
    return hourlyRates
# Sum hourly rates for those receiving an annual salary.
salariesPerHour = sum(hourly_rate(employeeDatabase))
print(f"Total disbursements per hour for salaried employees: ${salariesPerHour:.2f}")
Total disbursements per hour for salaried employees: $88.46
If the input dataset is huge, this eats up a ton of space. Instead, what if we process data lazily, storing one row in memory at a time?
def hourly_rate_gen(payments):
    """Generator that yields each salaried worker's hourly rate."""
    for worker in payments:
        if worker.get('pay_class') == 'annual':
            hourly = worker['rate'] / 2080
            yield hourly
# Sum hourly rates for those receiving an annual salary.
salariesPerHour = sum(hourly_rate_gen(employeeDatabase))
print(f"Total disbursements per hour for salaried employees: ${salariesPerHour:.2f}")
Total disbursements per hour for salaried employees: $88.46
A return statement is your signal that every output being produced will be held in memory at the same time and provided (returned) all at once.
- If a function returns a list of 1 thousand items, all 1 thousand are stored in memory before the end of execution.
In a generator, the yield statement signals that outputs can be produced one at a time. When yield is executed, the generator pauses, retaining its state until the next value is requested.
- Lazy outputs: Each output that a generator produces is yielded, then discarded before the next output is yielded.
- Lazy inputs: A generator can also stream input data, but you have to write it that way. For example, for worker in payments above is a for loop that streams one element (one worker's information) from the employeeDatabase list at a time; the sketch below takes the same idea one step further.
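Here's a minimal sketch of a fully lazy stage, assuming a hypothetical employees.csv file with the same rate and pay_class columns used above: the generator reads one row at a time instead of loading the whole file.
import csv

def stream_hourly_rates(csv_path):
    """Yield each salaried worker's hourly rate, reading one CSV row at a time."""
    with open(csv_path, newline='') as f:
        for row in csv.DictReader(f):  # Only one row is held in memory at a time.
            if row.get('pay_class') == 'annual':
                yield float(row['rate']) / 2080

# total = sum(stream_hourly_rates('employees.csv'))  # Never materializes the full dataset.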
Tip: Generator pipelines are a powerful workflow for GIS and remote sensing. Use multiple generators to string tasks together lazily. These are hugely helpful for complex spatial analysis workflows, such as raster processing.
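For instance, here is a small, self-contained sketch of the pipeline idea. Plain lists stand in for raster blocks; in a real workflow each stage might read, mask, and summarize blocks of an actual raster.
def read_blocks(blocks):
    """Stage 1: stream blocks one at a time."""
    for block in blocks:
        yield block

def mask_nodata(blocks, nodata=-9999):
    """Stage 2: drop nodata cells from each block as it streams past."""
    for block in blocks:
        yield [cell for cell in block if cell != nodata]

def block_means(blocks):
    """Stage 3: reduce each block to a single summary value."""
    for block in blocks:
        yield sum(block) / len(block)

# Chain the stages; nothing is computed until the final loop pulls values through.
fake_blocks = [[1, 2, -9999, 3], [4, -9999, 5, 6]]
for mean in block_means(mask_nodata(read_blocks(fake_blocks))):
    print(mean)  # 2.0, then 5.0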
Iteration, continued: List comprehension vs. generator expression¶
Generator expressions (aka generator comprehensions) are concise, one-line generators. Generator expressions can be a handy replacement for list comprehensions.
Let's look at how the analysis above would appear in list comprehension format.
hourly = [worker['rate'] / 2080 for worker in employeeDatabase if worker.get('pay_class') == 'annual']
salariesPerHour = sum(hourly)
print(f"${salariesPerHour:.2f}")
$88.46
As with the function, the list comprehension constructs a list of n values. Then, we use sum() to add all values in the list together.
A generator expression looks almost identical to a list comprehension: simply swap out square brackets with parentheses.
hourly = (worker['rate'] / 2080 for worker in employeeDatabase if worker.get('pay_class') == 'annual')
salariesPerHour = sum(hourly)
print(f"${salariesPerHour:.2f}")
$88.46
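One behavior worth knowing before you swap brackets everywhere: unlike a list, a generator expression can only be consumed once. After sum() has pulled every value through it, the generator is exhausted, so reusing the same variable quietly gives you nothing.
hourly = (worker['rate'] / 2080 for worker in employeeDatabase if worker.get('pay_class') == 'annual')
print(sum(hourly))  # 88.46... - the generator is consumed here.
print(sum(hourly))  # 0 - nothing left to yield; rebuild the expression if you need it again.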
Profiling: finding bottlenecks¶
Profiling is any technique used to measure the performance of your code, in particular its speed. There are dozens of tools available for profiling. We'll use a few to:
- Check memory use: Use sys.getsizeof() to check the memory size of variables.
- Spot-profile your code: Use the timeit notebook magic to perform some basic profiling by cell or by line.
- Profile your script comprehensively: The cProfile module can break execution down call by call to determine the number of calls and the total time spent on each.
Check memory use with getsizeof()¶
Use this tool to quickly check how much memory a variable is taking up on your system.
import sys
tract1 = {
"area": 100,
"area_water": 20,
"population": 1000
}
print(f"Bytes: {sys.getsizeof(tract1)}")
Bytes: 184
print(f"Bytes: {sys.getsizeof(recaman_list(1000))}")
print(f"Bytes: {sys.getsizeof(recaman_set(1000))}")
Bytes: 8856
Bytes: 32984
"You said sets were better than lists!"
Remember, sets are preferred over lists for membership lookup because they are faster, not slimmer.
- If you care more about output size, make a list; it takes up less memory.
- If you care more about task speed, make a set.
Spot-check speed with %%timeit¶
The timeit module measures the execution time of a selection of code. Among the many ways you'll see it written are "magic" commands:
- %timeit is a form of line magic. Line magic arguments only extend to the end of the current line.
- %%timeit is a form of cell magic. It measures the execution time of the entire notebook cell.
With both of these commands, the notebook will test your code multiple times and print the average speed of those calls.
%%timeit
# Cell magic example
from typing import NamedTuple
class Tract(NamedTuple):
    population: int
    households: int
tract1 = Tract(1000, 500)
tract2 = Tract(2000, 800)
tract3 = Tract(5000, 3000)
tract1.households
113 µs ± 24.1 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# Line magic example
%timeit sum(hourly_rate(employeeDatabase))
%timeit sum(hourly_rate_gen(employeeDatabase))
859 ns ± 222 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
712 ns ± 10.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
timeit tip: Optionally, you can limit the number of calls and repetitions with:
- -n: number of times to execute the main statement (size of each sample to time)
- -r: number of times to repeat the timer (number of samples)
%timeit -n 1 -r 5 sum(hourly_rate_gen(employeeDatabase))
The slowest run took 5.31 times longer than the fastest. This could mean that an intermediate result is being cached.
2.66 µs ± 1.91 µs per loop (mean ± std. dev. of 5 runs, 1 loop each)
Profile with cProfile¶
Whereas timeit is a quick way to test speed, cProfile is useful as a comprehensive and holistic code profiler. Some perks of cProfile:
- Compare which lines take longest to execute
- See how often a function is executed
- Sort profiling results by time
- See the respective data the function interacts with
- Print detailed reports with multiple statistics
Let's take a look:
import cProfile
cProfile.run('recaman_list(10000)')
         20002 function calls in 0.406 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     9999    0.394    0.000    0.394    0.000 <ipython-input-4-1cfc8d8a116c>:1(recaman_check)
        1    0.011    0.011    0.406    0.406 <ipython-input-4-1cfc8d8a116c>:4(recaman_list)
        1    0.000    0.000    0.406    0.406 <string>:1(<module>)
        1    0.000    0.000    0.406    0.406 {built-in method builtins.exec}
     9999    0.001    0.000    0.001    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
Above the table, you are given the number of function calls and how long the code took overall.
The field cumtime is the cumulative time it took to call a given function, including all of its subfunctions.
cProfile.run('recaman_set(10000)')
         20002 function calls in 0.010 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     9999    0.003    0.000    0.003    0.000 <ipython-input-4-1cfc8d8a116c>:1(recaman_check)
        1    0.006    0.006    0.010    0.010 <ipython-input-7-c2c0edd0cc91>:1(recaman_set)
        1    0.000    0.000    0.010    0.010 <string>:1(<module>)
        1    0.000    0.000    0.010    0.010 {built-in method builtins.exec}
     9999    0.001    0.000    0.001    0.000 {method 'add' of 'set' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
These results show that the set-based function executed the same number of calls (20,002), but ran about 40x faster.
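One of the perks listed above is sorting: run() accepts a sort argument (such as 'cumtime' or 'tottime'), which helps once the table gets long.
# Sort the report by cumulative time so the most expensive calls appear first.
cProfile.run('recaman_list(10000)', sort='cumtime')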
cProfile tip: Use cProfile.Profile() as a context manager!
with cProfile.Profile() as pr:
    def recaman_check(cur, i, visited):
        return (cur - i) < 0 or (cur - i) in visited

    def recaman_set(n: int) -> set[int]:
        """
        return a set of the first n numbers of the Recaman series
        """
        visited_set = {0}
        current = 0
        for i in range(1, n):
            if recaman_check(current, i, visited_set):
                current += i
            else:
                current -= i
            visited_set.add(current)
        return visited_set

    recaman_set(1000)
    pr.print_stats('line')  # Order by line number.
         2008 function calls in 0.001 seconds

   Ordered by: line number

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      999    0.000    0.000    0.000    0.000 {method 'add' of 'set' objects}
        1    0.000    0.000    0.000    0.000 {built-in method builtins.hasattr}
        1    0.000    0.000    0.000    0.000 {built-in method builtins.isinstance}
        1    0.000    0.000    0.000    0.000 {built-in method builtins.len}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
      999    0.000    0.000    0.000    0.000 <ipython-input-22-3b8d66b9f081>:2(recaman_check)
        1    0.001    0.001    0.001    0.001 <ipython-input-22-3b8d66b9f081>:5(recaman_set)
        1    0.000    0.000    0.000    0.000 cProfile.py:41(print_stats)
        1    0.000    0.000    0.000    0.000 cProfile.py:51(create_stats)
        1    0.000    0.000    0.000    0.000 pstats.py:108(__init__)
        1    0.000    0.000    0.000    0.000 pstats.py:118(init)
        1    0.000    0.000    0.000    0.000 pstats.py:137(load_stats)
Exercises¶
Section 2 exercises summary
- Tuple-based storage
- Set-based look-up
- Generator expression
- Generator
- Compare differences in speed with timeit
  - %timeit line magic
  - %%timeit cell magic
- Check for speed bottlenecks in detail with cProfile
- Stretch goal: Raster generator
1) Tuple-based storage¶
The code below creates a list containing all years in a research study timeframe, from 1900 to 2030.
The values in this collection will not need to be changed because the study will always use this timeframe.
import sys
def listFromRange(r1, r2):
    """Create a list from a range of values"""
    return [item for item in range(r1, r2+1)]
start = 1900
end = 2030
studyYears = listFromRange(start, end)
print(studyYears)
print("Bytes used: ", sys.getsizeof(studyYears))
[range(1900, 2031)] Bytes used: 64
Your turn: For the same timeframe, write a different implementation using a storage option that takes up less memory.
2) Set-based look-up¶
The code below assigns a collection of placenames to a list. Then, it checks whether a placename is in the list. If not, the placename is reported missing.
If you have 1 million placenames to look up and 6 names in the list, that’s up to 6 million checks.
placeNames_list = ["Kinshasa", "Duluth", "Uruguay", "Doherty Residence", "Dinkytown", "Khazad-dum"]
# List look-up
if "Dinkytown" not in placeNames_list:
    print("Missing.")  # O(n) look-up
Your turn: Write a different implementation using a storage option that allows quicker checks for membership at scale.
3) Generator expression¶
You have a list of random strings which contain a combination of upper and lowercase letters. You have written a list comprehension, lowerCase, to rewrite all of these strings into lowercase.
import random
import string
# Input dataset: A list of random strings. Each string is 8 letters long.
randomStrings = [''.join(random.choices(string.ascii_letters, k=8)) for i in range(10)]
print(randomStrings)
# Convert all strings to lowercase
lowerCase = [x.lower() for x in randomStrings]
print(lowerCase)
['quCxGiYN', 'cUDXcQBk', 'yOVxBjyl', 'QKznqHJV', 'KwkxAbra', 'hjLXVdAh', 'lppRGHIB', 'VoDgKHws', 'mCzLrskq', 'ovkTIIYS']
['qucxgiyn', 'cudxcqbk', 'yovxbjyl', 'qkznqhjv', 'kwkxabra', 'hjlxvdah', 'lpprghib', 'vodgkhws', 'mczlrskq', 'ovktiiys']
Your turn: Write a different implementation that still prints all the lowercase results, but operates faster than a list comprehension (when used with a large dataset).
4) Generator¶
The following function compares the length of each input dataset to that of a primary list. If the input and primary lists are the same length, the function calculates their difference and returns the result.
# The list that each dataset will be compared to.
primary = [4, 7, 140, 55, 7, 91, 6]
# Input datasets
inputs = (
[0, 3, 40, 55, 6, 98, 4],
[5, 4, 3, 45, 1, 67, 2],
[7, 150, 0.5, 1]
)
def matchingStructure(inputsList, primList):
    """
    This function compares the length of each input collection to the primary
    list. For an input that matches in length, the element-wise difference from
    the primary list is calculated and appended to the results list.
    """
    results = []
    for item in inputsList:
        if len(item) == len(primList):
            difference = [b - a for a, b in zip(item, primList)]
            results.append(difference)
    return results
print(matchingStructure(inputs, primary))
[[4, 4, 100, 0, 1, -7, 2], [-1, 3, 137, 10, 6, 24, 4]]
Your turn: Write a different implementation that uses a generator instead of a function to compare lengths and calculate results.
5) Compare differences in speed using timeit¶
5.1) %timeit line magic¶
Using %timeit line magic, compare the time it takes each comprehension below to run.
[i for i in range(50) if i % 2 == 0]
(i for i in range(50) if i % 2 == 0)
5.2) %%timeit cell magic¶
Using %%timeit cell magic, calculate the time it takes this cell to run.
Set the command to execute the main statement only once and repeat the timer only once.
employeeDatabase = [
{'lastName': 'Knope', 'rate': 72000, 'pay_class': 'annual'},
{'lastName': 'Gergich', 'rate': 17, 'pay_class': 'hourly'},
{'lastName': 'Ludgate', 'rate': 60000, 'pay_class': 'annual'},
{'lastName': 'Swanson', 'rate': 'redacted', 'pay_class': 'redacted'},
{'lastName': 'Haverford', 'rate': 52000, 'pay_class': 'annual'}
]
def hourly_rate(payments):
    """Function that returns each salaried worker's hourly rate."""
    hourlyRates = []
    for worker in payments:
        if worker.get('pay_class') == 'annual':
            hourly = worker['rate'] / 2080
            hourlyRates.append(hourly)
    return hourlyRates
# Sum hourly rates for those receiving an annual salary.
salariesPerHour = sum(hourly_rate(employeeDatabase))
print(f"Total disbursements per hour for salaried employees: ${salariesPerHour:.2f}")
6) Check for speed bottlenecks in detail using cProfile¶
Using cProfile, answer these questions about the following lines of code:
- How long does everything in this cell take to execute?
- Which item takes the longest time to execute? Tip: Sort by cumtime to find hotspots more easily.
dataList = [x for x in range(1, 10_000_000)]
dataTuple = tuple(x for x in range(1, 10_000_000))
listFromList = []
listFromTuple = []
for item in dataList:
    new = item + 1
    listFromList.append(new)
for item in dataTuple:
    new = item + 1
    listFromTuple.append(new)
7) Stretch Goal: Raster generator¶
Let's say you have a raster of population counts at 50-meter resolution (people per 50 m x 50 m cell) across a country. That's a huge dataset! You want to resample the raster down to 1 square kilometer (people per 1 km²) to make it easier to work with.
To do this, you have written a function that creates a new raster of 1 km² grid cells. Each 1 km² cell contains the total population of all 50 m cells within it.
import numpy as np
# Starting dataset: 80x80 grid of people per 50 m cell.
highResPop = np.ones((80, 80)) * 5
Note: The example here uses arrays to represent the rasters for simplicity, and each 50 m cell contains exactly 5 people.
def densityKM(popArray):
    """
    Function that returns population density per km² cell from a
    50 m resolution population source.
    Input: 2D array of 50 m x 50 m cells
    Output: 2D array of 1 km x 1 km cells, covering the same area of interest.
    """
    group_size = 20  # Every 20x20 group of 50 m cells equals 1 km².
    rows, cols = popArray.shape
    # Aggregate
    kmArray = popArray.reshape(
        rows // group_size, group_size,
        cols // group_size, group_size
    )
    # Sum over each group
    kmDensity = kmArray.sum(axis=(1, 3))
    # Output
    return kmDensity
densityKM(highResPop)
array([[2000., 2000., 2000., 2000.], [2000., 2000., 2000., 2000.], [2000., 2000., 2000., 2000.], [2000., 2000., 2000., 2000.]])
Your turn: Write a different implementation using a generator. As an extra challenge, try to find a way to avoid storing your entire km² array in memory. (Instead, process one group of 20x20 cells at a time).