Generators and Iterators: Lazy Evaluation in Python
Generators and iterators are fundamental to Python’s approach to handling sequences and data streams efficiently. They enable lazy evaluation, memory-efficient processing, and elegant solutions to complex iteration problems.
Key Concept: Iterators produce values on demand, while generators are a special kind of iterator that use yield to produce values lazily.
Understanding Iterators
What Makes Something Iterable?
# Objects that can be iterated over
iterable_examples = [
    [1, 2, 3],          # list
    "hello",            # string
    (1, 2, 3),          # tuple
    {1, 2, 3},          # set
    {'a': 1, 'b': 2},   # dict (iterates over keys)
    range(5),           # range object
]

# All these work in for loops
for item in [1, 2, 3]:
    print(item)

# Behind the scenes: iter() creates an iterator
numbers = [1, 2, 3]
iterator = iter(numbers)
print(next(iterator))  # 1
print(next(iterator))  # 2
print(next(iterator))  # 3
# print(next(iterator))  # StopIteration exception
What to Notice:
- iter() creates an iterator from any iterable
- next() gets the next value from an iterator
- StopIteration is raised when no more values are available
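In fact, a for loop is this protocol in disguise. A minimal sketch of roughly what the loop machinery does for you:

# Roughly what a for loop does under the hood
numbers = [1, 2, 3]
iterator = iter(numbers)       # the loop calls iter() once
while True:
    try:
        item = next(iterator)  # then next() on every pass
    except StopIteration:      # the loop ends silently on StopIteration
        break
    print(item)                # 1, 2, 3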
The Iterator Protocol
class NumberIterator:
    """Custom iterator that yields numbers from start to end"""

    def __init__(self, start, end):
        self.current = start
        self.end = end

    def __iter__(self):
        """Return the iterator object (self)"""
        return self

    def __next__(self):
        """Return the next value in the sequence"""
        if self.current >= self.end:
            raise StopIteration

        value = self.current
        self.current += 1
        return value

# Using the custom iterator
numbers = NumberIterator(1, 5)
for num in numbers:
    print(num)  # 1, 2, 3, 4

# Manual iteration
manual_iter = NumberIterator(10, 13)
print(next(manual_iter))  # 10
print(next(manual_iter))  # 11
print(next(manual_iter))  # 12
# next(manual_iter)  # StopIteration
What to Notice:
- Iterator classes implement __iter__() and __next__()
- __iter__() returns self for iterator objects
- __next__() produces the next value or raises StopIteration
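A related pattern worth knowing: container types usually keep the two roles separate. The iterable's __iter__() returns a fresh iterator object each time, so the container can be looped over repeatedly. A minimal sketch building on NumberIterator above (the NumberRange name is just illustrative):

class NumberRange:
    """Iterable whose __iter__() returns a fresh iterator each time"""

    def __init__(self, start, end):
        self.start = start
        self.end = end

    def __iter__(self):
        return NumberIterator(self.start, self.end)

# Unlike NumberIterator, a NumberRange survives repeated iteration
r = NumberRange(1, 4)
print(list(r))  # [1, 2, 3]
print(list(r))  # [1, 2, 3] - a fresh iterator is created each time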
Built-in Iterator Functions
# enumerate() - adds an index to any iterable
fruits = ['apple', 'banana', 'cherry']
for index, fruit in enumerate(fruits):
    print(f"{index}: {fruit}")

# zip() - combines multiple iterables
names = ['Alice', 'Bob', 'Charlie']
ages = [25, 30, 35]
for name, age in zip(names, ages):
    print(f"{name} is {age} years old")

# map() - applies a function to each element
numbers = [1, 2, 3, 4, 5]
squared = map(lambda x: x**2, numbers)
print(list(squared))  # [1, 4, 9, 16, 25]

# filter() - keeps elements that satisfy a condition
even_numbers = filter(lambda x: x % 2 == 0, numbers)
print(list(even_numbers))  # [2, 4]

# reversed() - reverses any sequence
print(list(reversed(numbers)))  # [5, 4, 3, 2, 1]
What to Notice:
- These functions return iterator objects (lazy evaluation)
- You need list() to see all values at once
- They can be chained together for complex processing, as the sketch below shows
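For instance, filter() and map() compose without building intermediate lists; nothing runs until list() finally consumes the chain (a small illustrative pipeline):

# Chained lazy pipeline - no intermediate lists are created
numbers = range(1, 11)
evens = filter(lambda x: x % 2 == 0, numbers)  # lazy
doubled = map(lambda x: x * 2, evens)          # still lazy
print(list(doubled))  # [4, 8, 12, 16, 20]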
Generator Functions
Basic Generator Syntax
def count_up_to(n):
    """Generator function that yields numbers from 0 to n-1"""
    count = 0
    while count < n:
        yield count
        count += 1

# Using the generator
counter = count_up_to(5)
print(type(counter))  # <class 'generator'>

for num in counter:
    print(num)  # 0, 1, 2, 3, 4

# Generators are one-time use
print(list(counter))  # [] - empty, already exhausted
What to Notice:
yield
makes a function a generator- Generators maintain state between calls
- They’re exhausted after one complete iteration
Generator State and Resumption
def fibonacci_generator():
    """Infinite Fibonacci sequence generator"""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

# Take the first 10 Fibonacci numbers
fib = fibonacci_generator()
first_ten = [next(fib) for _ in range(10)]
print(first_ten)  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]

# The generator maintains state - it continues from where it left off
next_five = [next(fib) for _ in range(5)]
print(next_five)  # [55, 89, 144, 233, 377]

def debug_generator():
    """Shows how generators maintain state"""
    print("Generator started")

    yield 1
    print("After first yield")

    yield 2
    print("After second yield")

    yield 3
    print("Generator finished")

# Step through execution
gen = debug_generator()
print("Created generator")
print(f"First: {next(gen)}")
print(f"Second: {next(gen)}")
print(f"Third: {next(gen)}")
# A fourth next(gen) would print "Generator finished" and then raise StopIteration
What to Notice:
- Code execution pauses at each yield
- Local variables keep their values between yields
- Execution resumes exactly where it left off
Generator Expressions
# Generator expression (like a list comprehension, but lazy)
numbers = range(10)
squares_gen = (x**2 for x in numbers)
print(type(squares_gen))  # <class 'generator'>

# Memory efficient - values are generated on demand
print(next(squares_gen))  # 0
print(next(squares_gen))  # 1
print(next(squares_gen))  # 4

# Comparing memory usage
import sys

# List comprehension - all values in memory
squares_list = [x**2 for x in range(1000000)]
print(f"List size: {sys.getsizeof(squares_list)} bytes")

# Generator expression - minimal memory
squares_gen = (x**2 for x in range(1000000))
print(f"Generator size: {sys.getsizeof(squares_gen)} bytes")

# Complex generator expressions
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
even_squares = (x**2 for x in data if x % 2 == 0)
print(list(even_squares))  # [4, 16, 36, 64, 100]
What to Notice:
- Generator expressions use parentheses instead of square brackets
- Much more memory efficient than list comprehensions when you don't need every value at once
- Can include conditions and complex expressions
- Note: sys.getsizeof() measures only the container object itself, so the list's true footprint (including its stored int objects) is larger still
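One more idiom worth knowing: when a generator expression is the sole argument to a function, the extra parentheses can be dropped, which pairs naturally with reducers like sum() and max():

# Generator expression as the sole argument - no extra parentheses needed
print(sum(x**2 for x in range(10)))  # 285

# Works with any function that consumes an iterable
words = ['hi', 'hello', 'hey']
print(max(len(word) for word in words))  # 5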
Advanced Generator Patterns
Sending Values to Generators
def accumulator():
    """Generator that accumulates sent values"""
    total = 0
    while True:
        value = yield total
        if value is not None:
            total += value

# Using send() to pass values to the generator
acc = accumulator()
next(acc)  # Prime the generator (advance to the first yield)

print(acc.send(10))  # 10
print(acc.send(5))   # 15
print(acc.send(3))   # 18

def logger_generator():
    """Generator that processes log messages"""
    messages = []
    while True:
        message = yield len(messages)
        if message:
            timestamp = f"[{len(messages) + 1}] "
            messages.append(timestamp + message)
            print(f"Logged: {timestamp + message}")

# Using the logger
logger = logger_generator()
next(logger)  # Prime the generator

logger.send("User logged in")
logger.send("Database connected")
logger.send("Process completed")
What to Notice:
- The send() method passes values into the generator
- The generator must be "primed" with next() before sending
- yield can both produce and receive values
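Generators also expose close() and throw(). Calling close() raises GeneratorExit at the paused yield, which gives the generator a chance to clean up. A brief sketch, reusing the accumulator idea above:

def accumulator_with_cleanup():
    """Accumulator that reports its final total when closed"""
    total = 0
    try:
        while True:
            value = yield total
            if value is not None:
                total += value
    except GeneratorExit:
        # Raised at the paused yield when close() is called
        print(f"Closing with final total: {total}")

acc = accumulator_with_cleanup()
next(acc)    # prime the generator
acc.send(10)
acc.send(5)
acc.close()  # prints: Closing with final total: 15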
Generator Delegation with yield from
def inner_generator():
    """Simple generator"""
    yield 1
    yield 2
    yield 3

def outer_generator():
    """Generator that delegates to another generator"""
    yield 0
    yield from inner_generator()  # Delegate to the inner generator
    yield 4

# Using yield from
for value in outer_generator():
    print(value)  # 0, 1, 2, 3, 4

def flatten_nested_lists(nested_list):
    """Flatten arbitrarily nested lists using yield from"""
    for item in nested_list:
        if isinstance(item, list):
            yield from flatten_nested_lists(item)  # Recursive delegation
        else:
            yield item

# Flattening a nested structure
nested = [1, [2, 3], [4, [5, 6]], 7]
flat = list(flatten_nested_lists(nested))
print(flat)  # [1, 2, 3, 4, 5, 6, 7]

def read_files(*filenames):
    """Read lines from multiple files using yield from"""
    for filename in filenames:
        try:
            with open(filename, 'r') as file:
                yield from file  # Delegate to the file iterator
        except FileNotFoundError:
            yield f"Error: {filename} not found\n"
What to Notice:
- yield from delegates iteration to another iterable
- Useful for flattening structures and chaining generators
- Handles the subgenerator's StopIteration automatically
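A detail that often goes unnoticed: yield from also evaluates to the subgenerator's return value (the value carried by its StopIteration). A small sketch:

def summing_subgenerator():
    """Yields values, then returns their total"""
    yield 1
    yield 2
    yield 3
    return 6  # becomes the value of the yield from expression

def delegating_generator():
    result = yield from summing_subgenerator()
    print(f"Subgenerator returned: {result}")

for value in delegating_generator():
    print(value)  # 1, 2, 3, then: Subgenerator returned: 6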
Practical Applications
Data Processing Pipeline
def read_csv_lines(filename):
    """Generator that reads a CSV file line by line"""
    with open(filename, 'r') as file:
        for line in file:
            yield line.strip()

def parse_csv_line(line):
    """Parse a single CSV line"""
    return line.split(',')

def filter_valid_records(records):
    """Filter out invalid records"""
    for record in records:
        if len(record) >= 3 and record[0]:  # Must have an ID and at least 3 fields
            yield record

def transform_record(record):
    """Transform the record format"""
    return {
        'id': record[0],
        'name': record[1],
        'value': float(record[2]) if record[2].replace('.', '').isdigit() else 0.0
    }

def process_csv_file(filename):
    """Complete processing pipeline using generators"""
    lines = read_csv_lines(filename)
    records = (parse_csv_line(line) for line in lines)
    valid_records = filter_valid_records(records)

    for record in valid_records:
        yield transform_record(record)

# Example usage (if the file exists)
# for processed_record in process_csv_file('data.csv'):
#     print(processed_record)
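To watch the pipeline run end to end, one option is to write a small temporary CSV first. A self-contained sketch; the sample rows are made up:

import os
import tempfile

# Hypothetical sample data to exercise the pipeline
sample = "id1,Alice,3.5\nid2,Bob,not_a_number\n,NoId,2.0\nid3,Carol,7\n"

with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

for processed_record in process_csv_file(path):
    print(processed_record)
# {'id': 'id1', 'name': 'Alice', 'value': 3.5}
# {'id': 'id2', 'name': 'Bob', 'value': 0.0}   <- non-numeric value falls back to 0.0
# {'id': 'id3', 'name': 'Carol', 'value': 7.0} <- the row with a missing ID was filtered out

os.remove(path)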
Infinite Sequences
def repeater(value):
    """Infinite generator that repeats a value"""
    while True:
        yield value

def cycle(iterable):
    """Infinite generator that cycles through an iterable"""
    saved = []
    for element in iterable:
        yield element
        saved.append(element)
    while saved:
        for element in saved:
            yield element

def counter(start=0, step=1):
    """Infinite counter generator"""
    current = start
    while True:
        yield current
        current += step

# Using infinite generators with itertools
import itertools

# Take the first 5 from an infinite sequence
colors = cycle(['red', 'green', 'blue'])
first_five_colors = list(itertools.islice(colors, 5))
print(first_five_colors)  # ['red', 'green', 'blue', 'red', 'green']

# Combine infinite generators
numbers = counter(1, 2)  # 1, 3, 5, 7, 9, ...
letters = cycle('ABC')   # A, B, C, A, B, C, ...
combined = zip(numbers, letters)
first_six = list(itertools.islice(combined, 6))
print(first_six)  # [(1, 'A'), (3, 'B'), (5, 'C'), (7, 'A'), (9, 'B'), (11, 'C')]
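The three generators above mirror tools the standard library already provides: itertools.repeat, itertools.cycle, and itertools.count. In practice, reaching for the built-ins is usually the better choice:

import itertools

# Standard-library equivalents of repeater, cycle, and counter
print(list(itertools.islice(itertools.count(1, 2), 5)))   # [1, 3, 5, 7, 9]
print(list(itertools.islice(itertools.cycle('ABC'), 5)))  # ['A', 'B', 'C', 'A', 'B']
print(list(itertools.repeat('x', 3)))                     # ['x', 'x', 'x']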
Memory-Efficient Data Processing
def large_file_processor(filename, chunk_size=1024):
    """Process large files in chunks"""
    with open(filename, 'rb') as file:
        while True:
            chunk = file.read(chunk_size)
            if not chunk:
                break
            yield chunk

def word_frequency_counter(text_generator):
    """Count word frequencies from a text generator"""
    word_counts = {}
    for text_chunk in text_generator:
        words = text_chunk.decode('utf-8', errors='ignore').split()
        for word in words:
            word = word.lower().strip('.,!?";')
            word_counts[word] = word_counts.get(word, 0) + 1
    return word_counts

def moving_average_generator(data, window_size):
    """Calculate a moving average using a generator"""
    window = []
    for value in data:
        window.append(value)
        if len(window) > window_size:
            window.pop(0)
        if len(window) == window_size:
            yield sum(window) / window_size

# Example: processing streaming data
streaming_data = [1, 4, 2, 8, 5, 7, 3, 6, 9, 2]
moving_avg = moving_average_generator(streaming_data, 3)
for avg in moving_avg:
    print(f"Average: {avg:.2f}")
Performance Considerations
Memory Efficiency Comparison
import sys
import time

def measure_memory_and_time(operation_name, operation):
    """Measure memory usage and execution time"""
    start_time = time.time()
    result = operation()
    end_time = time.time()

    memory_usage = sys.getsizeof(result)
    execution_time = end_time - start_time

    print(f"{operation_name}:")
    print(f"  Memory: {memory_usage} bytes")
    print(f"  Time: {execution_time:.6f} seconds")
    return result

# Compare list vs generator for large datasets
size = 100000

# List comprehension - all in memory
list_result = measure_memory_and_time(
    "List comprehension",
    lambda: [x**2 for x in range(size)]
)

# Generator expression - lazy evaluation
gen_result = measure_memory_and_time(
    "Generator expression",
    lambda: (x**2 for x in range(size))
)

# Processing comparison
def process_list(data):
    return sum(x for x in data if x % 2 == 0)

def process_generator(data_gen):
    return sum(x for x in data_gen if x % 2 == 0)

print("\nProcessing results:")
print(f"List result: {process_list(list_result[:1000])}")  # First 1000 elements
print(f"Generator result: {process_generator(x**2 for x in range(1000) if x % 2 == 0)}")
Generator Best Practices
def efficient_batch_processor(data, batch_size):
    """Process data in batches efficiently"""
    batch = []
    for item in data:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []

    # Yield remaining items
    if batch:
        yield batch

def cached_fibonacci():
    """Fibonacci generator with caching for efficiency"""
    cache = {0: 0, 1: 1}

    def fib(n):
        if n not in cache:
            cache[n] = fib(n - 1) + fib(n - 2)
        return cache[n]

    n = 0
    while True:
        yield fib(n)
        n += 1

def generator_pipeline(*generators):
    """Chain multiple generators together"""
    for generator in generators:
        yield from generator

# Example usage
def odds():
    yield from range(1, 10, 2)

def evens():
    yield from range(0, 10, 2)

def squares():
    yield from (x**2 for x in range(5))

# Chain the generators
all_numbers = generator_pipeline(odds(), evens(), squares())
print(list(all_numbers))  # [1, 3, 5, 7, 9, 0, 2, 4, 6, 8, 0, 1, 4, 9, 16]
Common Pitfalls and Solutions
Generator Exhaustion
# Problem: generators can only be iterated once
def simple_generator():
    yield 1
    yield 2
    yield 3

gen = simple_generator()
print(list(gen))  # [1, 2, 3]
print(list(gen))  # [] - Generator is exhausted!

# Solution 1: Create a generator factory
def generator_factory():
    def inner():
        yield 1
        yield 2
        yield 3
    return inner

gen_factory = generator_factory()
print(list(gen_factory()))  # [1, 2, 3]
print(list(gen_factory()))  # [1, 2, 3] - Works!

# Solution 2: Use itertools.tee for multiple iterators
import itertools

original_gen = simple_generator()
gen1, gen2 = itertools.tee(original_gen, 2)
print(list(gen1))  # [1, 2, 3]
print(list(gen2))  # [1, 2, 3]
# Note: tee buffers values internally, so iterators that drift far apart can use a lot of memory
Proper Resource Management
def file_reader_generator(filename):
    """Manual resource management in a generator"""
    file = None
    try:
        file = open(filename, 'r')
        for line in file:
            yield line.strip()
    finally:
        if file:
            file.close()

# Better approach: use a context manager inside the generator
def safe_file_reader(filename):
    """Safe file reading with a context manager"""
    with open(filename, 'r') as file:
        for line in file:
            yield line.strip()

# Best approach: wrap the generator itself in a context manager
from contextlib import contextmanager

@contextmanager
def managed_generator(generator_func, *args, **kwargs):
    """Context manager for generators"""
    gen = generator_func(*args, **kwargs)
    try:
        yield gen
    finally:
        gen.close()  # Properly close the generator
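A usage sketch for managed_generator, assuming a text file named data.txt exists. Even if the loop exits early, the finally block still closes the generator, and GeneratorExit in turn closes the file inside safe_file_reader:

# Hypothetical usage - data.txt is assumed to exist
with managed_generator(safe_file_reader, 'data.txt') as lines:
    for line in lines:
        print(line)
        if line.startswith('STOP'):
            break  # the generator is still closed properly on early exit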
Summary
Generators and iterators are powerful Python features that enable:
- Memory efficiency through lazy evaluation
- Clean, readable code for data processing pipelines
- Infinite sequences without memory concerns
- State maintenance between function calls
- Elegant solutions to complex iteration problems
Key Takeaways
- Iterators: Objects that implement __iter__() and __next__()
- Generators: Functions that use yield to produce values lazily
- Generator expressions: Memory-efficient alternative to list comprehensions
- yield from: Delegates iteration to other iterables
- One-time use: Generators are exhausted after complete iteration
- Memory efficiency: Ideal for large datasets and streaming data
When to Use Generators
- Processing large files or datasets
- Creating infinite sequences
- Building data processing pipelines
- Memory-constrained environments
- When you need lazy evaluation