This chapter explains how Python handles sequential data access through the Iterator Protocol and introduces Generators: special functions that use the yield keyword to create iterators concisely. Because generators produce values one at a time, they let you work with very large datasets without loading everything into memory.
1. Iterables and Iterators
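Any object in Python that can be looped over (such as a list, tuple, or string) is called an iterable. An iterator is the object that actually performs the iteration: it keeps track of the current position and hands out one item at a time. Calling iter() on an iterable returns an iterator for it.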
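A quick way to see the distinction is to compare a list with an iterator obtained from it. This sketch uses only built-ins; the variable names are illustrative. A list can be looped over repeatedly because each call to iter() produces a fresh iterator, while a single iterator is used up after one pass.

```python
numbers = [10, 20, 30]  # an iterable, but not an iterator

# Each call to iter() on the list returns a new, independent iterator
first_pass = iter(numbers)
second_pass = iter(numbers)
print(first_pass is second_pass)  # Output: False

# A list has no __next__ method; its iterator does
print(hasattr(numbers, "__next__"))     # Output: False
print(hasattr(first_pass, "__next__"))  # Output: True

# An iterator is consumed as it is used: after one full pass it is empty
print(list(first_pass))  # Output: [10, 20, 30]
print(list(first_pass))  # Output: []

# The list itself is unchanged and can be iterated again
print(list(numbers))  # Output: [10, 20, 30]
```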
A. The Iterator Protocol
An object is an iterator if it implements two special Dunder Methods (P3.2):
- __iter__: Must return the iterator object itself.
- __next__: Returns the next item from the sequence. When there are no more items, it must raise the StopIteration exception, which tells the for loop to terminate.
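To see both methods in action, here is a minimal hand-written iterator. The class name Countdown and its attribute current are hypothetical, chosen for this sketch; it illustrates the protocol rather than a recommended pattern, since generators achieve the same result with far less code.

```python
class Countdown:
    """Hypothetical iterator that counts down from start to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # An iterator returns itself, so it can be used directly in a for loop
        return self

    def __next__(self):
        if self.current <= 0:
            # Signal that the sequence is exhausted
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


for number in Countdown(3):
    print(number)  # Prints 3, then 2, then 1 (one per line)
```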
```python
# The built-in iter() and next() functions call these Dunder Methods internally.
my_list = [1, 2, 3]

# 1. Get the iterator object from the iterable (list)
my_iterator = iter(my_list)

# 2. Get the next element, one call at a time
print(next(my_iterator))  # Output: 1
print(next(my_iterator))  # Output: 2
print(next(my_iterator))  # Output: 3
# A fourth call would raise StopIteration, because the iterator is exhausted.

# The 'for' loop handles the calling of next() and the StopIteration exception automatically.
```
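The loop below is a simplified sketch of what a for loop does with iter() and next(). It is an approximation for illustration only: CPython implements the for statement at the bytecode level rather than with an explicit try/except, but the observable behavior is the same.

```python
my_list = [1, 2, 3]

# Roughly what "for item in my_list: print(item)" does behind the scenes
iterator = iter(my_list)       # 1. ask the iterable for an iterator
while True:
    try:
        item = next(iterator)  # 2. request the next value
    except StopIteration:      # 3. no values left: leave the loop
        break
    print(item)                # 4. run the loop body
```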
2. Generators
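A Generator is a function that, instead of returning a single result using return, yields a sequence of results over time. Calling a generator function does not run its body; it returns a generator object, which is an iterator. Generators are the most common way to create iterators concisely, without writing a class that implements __iter__ and __next__ by hand.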
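A short check confirms that the object returned by a generator function already satisfies the Iterator Protocol. The function name countdown_gen is hypothetical, used only for this sketch.

```python
def countdown_gen(start):
    # Each yield hands one value to the caller; the loop state lives in local variables
    while start > 0:
        yield start
        start -= 1


gen = countdown_gen(3)
# The generator object provides __iter__ and __next__ automatically
print(iter(gen) is gen)          # Output: True
print(hasattr(gen, "__next__"))  # Output: True
print(list(gen))                 # Output: [3, 2, 1]
```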
A. The yield Keyword
Any function whose body contains yield is a generator function. Each time execution reaches a yield, the function pauses, hands the yielded value to the caller, and saves its state (its local variables and its current position in the code). When next() is called again, execution resumes immediately after that yield.
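The pause-and-resume behavior can be made visible by recording each step in a list. The names log and tracer are hypothetical, chosen for this sketch. The last call uses next() with a default value, which returns that default instead of raising StopIteration.

```python
log = []  # records the order in which the generator's code runs

def tracer():
    log.append("started")
    yield "first"
    log.append("resumed after first")
    yield "second"
    log.append("finished")

gen = tracer()
print(log)        # Output: []  (calling the function ran none of its body)

print(next(gen))  # Output: first
print(log)        # Output: ['started']

print(next(gen))  # Output: second
print(log)        # Output: ['started', 'resumed after first']

# The body runs to completion; next() returns the default instead of raising
print(next(gen, "exhausted"))  # Output: exhausted
print(log)        # Output: ['started', 'resumed after first', 'finished']
```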
B. Generator Function Example
```python
# A generator function
def countdown(n):
    print("Starting countdown...")
    while n > 0:
        yield n  # Pause, hand n to the caller, and save state
        n -= 1
    print("Countdown complete.")

# Create the generator object (none of the body has run yet)
timer = countdown(3)

# Execution starts and runs until the first yield
print(next(timer))  # Prints "Starting countdown..." and then 3

# Execution resumes from the point of the previous yield
print(next(timer))  # Output: 2
print(next(timer))  # Output: 1

# Calling next() again would print "Countdown complete." and then raise StopIteration,
# because the function body finishes without reaching another yield.
# print(next(timer))
```
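In practice, generators are usually consumed with a for loop, which handles StopIteration automatically. This sketch repeats the countdown function so that it runs on its own, and also shows that a generator object is single-use: once exhausted, iterating over it again produces nothing.

```python
def countdown(n):
    print("Starting countdown...")
    while n > 0:
        yield n
        n -= 1
    print("Countdown complete.")

timer = countdown(3)
for value in timer:
    print(value)
# Output:
# Starting countdown...
# 3
# 2
# 1
# Countdown complete.

# The generator is now exhausted, so nothing more is produced
print(list(timer))  # Output: []

# To iterate again, create a new generator object
print(list(countdown(2)))  # Prints both status messages, then [2, 1]
```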
3. Generator Expressions
Similar to List Comprehensions (P1.7), Generator Expressions provide a concise way to define simple generators, using parentheses () instead of square brackets [].
```python
# List Comprehension: creates the entire list in memory immediately
squares_list = [x**2 for x in range(1000000)]

# Generator Expression: creates a generator object; values are computed on demand
squares_gen = (x**2 for x in range(1000000))
```
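To make the memory difference concrete, sys.getsizeof can compare the two objects. The exact byte counts depend on the Python version and platform, so only rough magnitudes are noted. getsizeof measures the container itself (for the list, its array of references, not the integers it points to), so the list's real footprint is even larger than reported.

```python
import sys

squares_list = [x**2 for x in range(1_000_000)]
squares_gen = (x**2 for x in range(1_000_000))

print(sys.getsizeof(squares_list))  # several megabytes (exact value varies)
print(sys.getsizeof(squares_gen))   # a small, fixed size, independent of the range length

# Both produce the same values when fully consumed
print(sum(squares_list) == sum(squares_gen))  # Output: True
```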
```python
# Creates the generator object (does not calculate values yet)
my_gen = (x * 2 for x in range(5))

# Values are calculated one by one as they are requested
print(next(my_gen))  # Output: 0
print(next(my_gen))  # Output: 2
```
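Generator expressions are often passed straight to functions that consume an iterable, such as sum() or max(). When the generator expression is the only argument, the call's own parentheses are enough; with additional arguments it needs its own parentheses. The word list below is an arbitrary example.

```python
# Sole argument: no extra parentheses needed
total = sum(x * x for x in range(5))
print(total)  # Output: 30

# More than one argument: the generator expression must be parenthesized
largest = max((len(word) for word in ["iter", "next", "yield"]), default=0)
print(largest)  # Output: 5
```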
4. Why Use Generators? (Efficiency)
Generators offer significant memory benefits, and often performance benefits, particularly when dealing with large or infinite sequences:
- Lazy Evaluation: Values are generated on demand (lazily), one at a time. The generator does not store the entire sequence in memory.
- Memory Efficiency: When processing a billion records, a list comprehension would have to hold all of them in memory at once, which can exhaust the available RAM. A generator keeps only the current record (plus a small amount of internal state) in memory at a time.
- Infinite Sequences: Generators can represent sequences that are theoretically infinite (e.g., all Fibonacci numbers) because they never try to calculate or store all of them.
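The Fibonacci case mentioned above can be sketched as an infinite generator, combined with a lazy filtering stage and itertools.islice to take only a finite prefix. The function name fibonacci is hypothetical; calling list(fibonacci()) directly would never finish, so islice is what bounds the work.

```python
from itertools import islice

def fibonacci():
    # An infinite generator: it never raises StopIteration on its own
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

print(list(islice(fibonacci(), 10)))  # Output: [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]

# Chain lazy stages: no Fibonacci number is computed until islice requests it
even_fibs = (f for f in fibonacci() if f % 2 == 0)
first_five = list(islice(even_fibs, 5))
print(first_five)  # Output: [0, 2, 8, 34, 144]
```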
