Generators and Iterators: Lazy Evaluation in Python
Generators and iterators are fundamental to Python’s approach to handling sequences and data streams efficiently. They enable lazy evaluation, memory-efficient processing, and elegant solutions to complex iteration problems.
Key Concept: Iterators produce values on demand, while generators are a special kind of iterator that use yield to produce values lazily.
Understanding Iterators
What Makes Something Iterable?
# Objects that can be iterated over
iterable_examples = [
    [1, 2, 3],          # list
    "hello",            # string
    (1, 2, 3),          # tuple
    {1, 2, 3},          # set
    {'a': 1, 'b': 2},   # dict (iterates over keys)
    range(5),           # range object
]

# All these work in for loops
for item in [1, 2, 3]:
    print(item)

# Behind the scenes: iter() creates an iterator
numbers = [1, 2, 3]
iterator = iter(numbers)
print(next(iterator))  # 1
print(next(iterator))  # 2
print(next(iterator))  # 3
# print(next(iterator))  # StopIteration exception
What to Notice:
- iter() creates an iterator from any iterable
- next() gets the next value from an iterator
- StopIteration is raised when no more values are available
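In fact, a for loop is this protocol in disguise. A minimal sketch of roughly what the loop machinery does for you:

# Roughly what a for loop does under the hood
numbers = [1, 2, 3]
iterator = iter(numbers)       # the loop calls iter() once
while True:
    try:
        item = next(iterator)  # then next() on every pass
    except StopIteration:      # the loop ends silently on StopIteration
        break
    print(item)                # 1, 2, 3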
The Iterator Protocol
class NumberIterator:
    """Custom iterator that yields numbers from start to end"""

    def __init__(self, start, end):
        self.current = start
        self.end = end

    def __iter__(self):
        """Return the iterator object (self)"""
        return self

    def __next__(self):
        """Return the next value in the sequence"""
        if self.current >= self.end:
            raise StopIteration

        value = self.current
        self.current += 1
        return value

# Using the custom iterator
numbers = NumberIterator(1, 5)
for num in numbers:
    print(num)  # 1, 2, 3, 4

# Manual iteration
manual_iter = NumberIterator(10, 13)
print(next(manual_iter))  # 10
print(next(manual_iter))  # 11
print(next(manual_iter))  # 12
# next(manual_iter)  # StopIteration
What to Notice:
- Iterator classes implement __iter__() and __next__()
- __iter__() returns self for iterator objects
- __next__() produces the next value or raises StopIteration
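A related pattern worth knowing: container types usually keep the two roles separate. The iterable's __iter__() returns a fresh iterator object each time, so the container can be looped over repeatedly. A minimal sketch building on NumberIterator above (the NumberRange name is just illustrative):

class NumberRange:
    """Iterable whose __iter__() returns a fresh iterator each time"""

    def __init__(self, start, end):
        self.start = start
        self.end = end

    def __iter__(self):
        return NumberIterator(self.start, self.end)

# Unlike NumberIterator, a NumberRange survives repeated iteration
r = NumberRange(1, 4)
print(list(r))  # [1, 2, 3]
print(list(r))  # [1, 2, 3] - a fresh iterator is created each time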
Built-in Iterator Functions
# enumerate() - adds an index to any iterable
fruits = ['apple', 'banana', 'cherry']
for index, fruit in enumerate(fruits):
    print(f"{index}: {fruit}")

# zip() - combines multiple iterables
names = ['Alice', 'Bob', 'Charlie']
ages = [25, 30, 35]
for name, age in zip(names, ages):
    print(f"{name} is {age} years old")

# map() - applies a function to each element
numbers = [1, 2, 3, 4, 5]
squared = map(lambda x: x**2, numbers)
print(list(squared))  # [1, 4, 9, 16, 25]

# filter() - keeps elements that satisfy a condition
even_numbers = filter(lambda x: x % 2 == 0, numbers)
print(list(even_numbers))  # [2, 4]

# reversed() - reverses any sequence
print(list(reversed(numbers)))  # [5, 4, 3, 2, 1]
What to Notice:
- These functions return iterator objects (lazy evaluation)
- You need list() to see all values at once
- They can be chained together for complex processing, as the sketch below shows
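For instance, filter() and map() compose without building intermediate lists; nothing runs until list() finally consumes the chain (a small illustrative pipeline):

# Chained lazy pipeline - no intermediate lists are created
numbers = range(1, 11)
evens = filter(lambda x: x % 2 == 0, numbers)  # lazy
doubled = map(lambda x: x * 2, evens)          # still lazy
print(list(doubled))  # [4, 8, 12, 16, 20]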
Generator Functions
Basic Generator Syntax
def count_up_to(n):
    """Generator function that yields numbers from 0 to n-1"""
    count = 0
    while count < n:
        yield count
        count += 1

# Using the generator
counter = count_up_to(5)
print(type(counter))  # <class 'generator'>

for num in counter:
    print(num)  # 0, 1, 2, 3, 4

# Generators are one-time use
print(list(counter))  # [] - empty, already exhausted
What to Notice:
yield
makes a function a generator- Generators maintain state between calls
- They’re exhausted after one complete iteration
Generator State and Resumption
def fibonacci_generator():
    """Infinite Fibonacci sequence generator"""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

# Take the first 10 Fibonacci numbers
fib = fibonacci_generator()
first_ten = [next(fib) for _ in range(10)]
print(first_ten)  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]

# The generator maintains state - it continues from where it left off
next_five = [next(fib) for _ in range(5)]
print(next_five)  # [55, 89, 144, 233, 377]

def debug_generator():
    """Shows how generators maintain state"""
    print("Generator started")

    yield 1
    print("After first yield")

    yield 2
    print("After second yield")

    yield 3
    print("Generator finished")

# Step through execution
gen = debug_generator()
print("Created generator")
print(f"First: {next(gen)}")
print(f"Second: {next(gen)}")
print(f"Third: {next(gen)}")
# A fourth next(gen) would print "Generator finished" and then raise StopIteration
What to Notice:
- Code execution pauses at each yield
- Local variables keep their values between yields
- Execution resumes exactly where it left off
Generator Expressions
# Generator expression (like a list comprehension, but lazy)
numbers = range(10)
squares_gen = (x**2 for x in numbers)
print(type(squares_gen))  # <class 'generator'>

# Memory efficient - values are generated on demand
print(next(squares_gen))  # 0
print(next(squares_gen))  # 1
print(next(squares_gen))  # 4

# Comparing memory usage
import sys

# List comprehension - all values in memory
squares_list = [x**2 for x in range(1000000)]
print(f"List size: {sys.getsizeof(squares_list)} bytes")

# Generator expression - minimal memory
squares_gen = (x**2 for x in range(1000000))
print(f"Generator size: {sys.getsizeof(squares_gen)} bytes")

# Complex generator expressions
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
even_squares = (x**2 for x in data if x % 2 == 0)
print(list(even_squares))  # [4, 16, 36, 64, 100]
What to Notice:
- Generator expressions use parentheses instead of square brackets
- Much more memory efficient than list comprehensions when you don't need every value at once
- Can include conditions and complex expressions
- Note: sys.getsizeof() measures only the container object itself, so the list's true footprint (including its stored int objects) is larger still
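One more idiom worth knowing: when a generator expression is the sole argument to a function, the extra parentheses can be dropped, which pairs naturally with reducers like sum() and max():

# Generator expression as the sole argument - no extra parentheses needed
print(sum(x**2 for x in range(10)))  # 285

# Works with any function that consumes an iterable
words = ['hi', 'hello', 'hey']
print(max(len(word) for word in words))  # 5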
Advanced Generator Patterns
Sending Values to Generators
def accumulator():
    """Generator that accumulates sent values"""
    total = 0
    while True:
        value = yield total
        if value is not None:
            total += value

# Using send() to pass values to the generator
acc = accumulator()
next(acc)  # Prime the generator (advance to the first yield)

print(acc.send(10))  # 10
print(acc.send(5))   # 15
print(acc.send(3))   # 18

def logger_generator():
    """Generator that processes log messages"""
    messages = []
    while True:
        message = yield len(messages)
        if message:
            timestamp = f"[{len(messages) + 1}] "
            messages.append(timestamp + message)
            print(f"Logged: {timestamp + message}")

# Using the logger
logger = logger_generator()
next(logger)  # Prime the generator

logger.send("User logged in")
logger.send("Database connected")
logger.send("Process completed")
What to Notice:
- The send() method passes values into the generator
- The generator must be "primed" with next() before sending
- yield can both produce and receive values
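Generators also expose close() and throw(). Calling close() raises GeneratorExit at the paused yield, which gives the generator a chance to clean up. A brief sketch, reusing the accumulator idea above:

def accumulator_with_cleanup():
    """Accumulator that reports its final total when closed"""
    total = 0
    try:
        while True:
            value = yield total
            if value is not None:
                total += value
    except GeneratorExit:
        # Raised at the paused yield when close() is called
        print(f"Closing with final total: {total}")

acc = accumulator_with_cleanup()
next(acc)    # prime the generator
acc.send(10)
acc.send(5)
acc.close()  # prints: Closing with final total: 15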
Generator Delegation with yield from
def inner_generator():
    """Simple generator"""
    yield 1
    yield 2
    yield 3

def outer_generator():
    """Generator that delegates to another generator"""
    yield 0
    yield from inner_generator()  # Delegate to the inner generator
    yield 4

# Using yield from
for value in outer_generator():
    print(value)  # 0, 1, 2, 3, 4

def flatten_nested_lists(nested_list):
    """Flatten arbitrarily nested lists using yield from"""
    for item in nested_list:
        if isinstance(item, list):
            yield from flatten_nested_lists(item)  # Recursive delegation
        else:
            yield item

# Flattening a nested structure
nested = [1, [2, 3], [4, [5, 6]], 7]
flat = list(flatten_nested_lists(nested))
print(flat)  # [1, 2, 3, 4, 5, 6, 7]

def read_files(*filenames):
    """Read lines from multiple files using yield from"""
    for filename in filenames:
        try:
            with open(filename, 'r') as file:
                yield from file  # Delegate to the file iterator
        except FileNotFoundError:
            yield f"Error: {filename} not found\n"
What to Notice:
- yield from delegates iteration to another iterable
- Useful for flattening structures and chaining generators
- Handles the subgenerator's StopIteration automatically
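A detail that often goes unnoticed: yield from also evaluates to the subgenerator's return value (the value carried by its StopIteration). A small sketch:

def summing_subgenerator():
    """Yields values, then returns their total"""
    yield 1
    yield 2
    yield 3
    return 6  # becomes the value of the yield from expression

def delegating_generator():
    result = yield from summing_subgenerator()
    print(f"Subgenerator returned: {result}")

for value in delegating_generator():
    print(value)  # 1, 2, 3, then: Subgenerator returned: 6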
Practical Applications
Data Processing Pipeline
def read_csv_lines(filename):
    """Generator that reads a CSV file line by line"""
    with open(filename, 'r') as file:
        for line in file:
            yield line.strip()

def parse_csv_line(line):
    """Parse a single CSV line"""
    return line.split(',')

def filter_valid_records(records):
    """Filter out invalid records"""
    for record in records:
        if len(record) >= 3 and record[0]:  # Must have an ID and at least 3 fields
            yield record

def transform_record(record):
    """Transform the record format"""
    return {
        'id': record[0],
        'name': record[1],
        'value': float(record[2]) if record[2].replace('.', '').isdigit() else 0.0
    }

def process_csv_file(filename):
    """Complete processing pipeline using generators"""
    lines = read_csv_lines(filename)
    records = (parse_csv_line(line) for line in lines)
    valid_records = filter_valid_records(records)

    for record in valid_records:
        yield transform_record(record)

# Example usage (if the file exists)
# for processed_record in process_csv_file('data.csv'):
#     print(processed_record)
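To watch the pipeline run end to end, one option is to write a small temporary CSV first. A self-contained sketch; the sample rows are made up:

import os
import tempfile

# Hypothetical sample data to exercise the pipeline
sample = "id1,Alice,3.5\nid2,Bob,not_a_number\n,NoId,2.0\nid3,Carol,7\n"

with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

for processed_record in process_csv_file(path):
    print(processed_record)
# {'id': 'id1', 'name': 'Alice', 'value': 3.5}
# {'id': 'id2', 'name': 'Bob', 'value': 0.0}   <- non-numeric value falls back to 0.0
# {'id': 'id3', 'name': 'Carol', 'value': 7.0} <- the row with a missing ID was filtered out

os.remove(path)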
Infinite Sequences
def repeater(value):
    """Infinite generator that repeats a value"""
    while True:
        yield value

def cycle(iterable):
    """Infinite generator that cycles through an iterable"""
    saved = []
    for element in iterable:
        yield element
        saved.append(element)
    while saved:
        for element in saved:
            yield element

def counter(start=0, step=1):
    """Infinite counter generator"""
    current = start
    while True:
        yield current
        current += step

# Using infinite generators with itertools
import itertools

# Take the first 5 from an infinite sequence
colors = cycle(['red', 'green', 'blue'])
first_five_colors = list(itertools.islice(colors, 5))
print(first_five_colors)  # ['red', 'green', 'blue', 'red', 'green']

# Combine infinite generators
numbers = counter(1, 2)  # 1, 3, 5, 7, 9, ...
letters = cycle('ABC')   # A, B, C, A, B, C, ...
combined = zip(numbers, letters)
first_six = list(itertools.islice(combined, 6))
print(first_six)  # [(1, 'A'), (3, 'B'), (5, 'C'), (7, 'A'), (9, 'B'), (11, 'C')]
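The three generators above mirror tools the standard library already provides: itertools.repeat, itertools.cycle, and itertools.count. In practice, reaching for the built-ins is usually the better choice:

import itertools

# Standard-library equivalents of repeater, cycle, and counter
print(list(itertools.islice(itertools.count(1, 2), 5)))   # [1, 3, 5, 7, 9]
print(list(itertools.islice(itertools.cycle('ABC'), 5)))  # ['A', 'B', 'C', 'A', 'B']
print(list(itertools.repeat('x', 3)))                     # ['x', 'x', 'x']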
Memory-Efficient Data Processing
def large_file_processor(filename, chunk_size=1024):
    """Process large files in chunks"""
    with open(filename, 'rb') as file:
        while True:
            chunk = file.read(chunk_size)
            if not chunk:
                break
            yield chunk

def word_frequency_counter(text_generator):
    """Count word frequencies from a text generator"""
    word_counts = {}
    for text_chunk in text_generator:
        words = text_chunk.decode('utf-8', errors='ignore').split()
        for word in words:
            word = word.lower().strip('.,!?";')
            word_counts[word] = word_counts.get(word, 0) + 1
    return word_counts

def moving_average_generator(data, window_size):
    """Calculate a moving average using a generator"""
    window = []
    for value in data:
        window.append(value)
        if len(window) > window_size:
            window.pop(0)
        if len(window) == window_size:
            yield sum(window) / window_size

# Example: processing streaming data
streaming_data = [1, 4, 2, 8, 5, 7, 3, 6, 9, 2]
moving_avg = moving_average_generator(streaming_data, 3)
for avg in moving_avg:
    print(f"Average: {avg:.2f}")
Performance Considerations
Memory Efficiency Comparison
import sys
import time

def measure_memory_and_time(operation_name, operation):
    """Measure memory usage and execution time"""
    start_time = time.time()
    result = operation()
    end_time = time.time()

    memory_usage = sys.getsizeof(result)
    execution_time = end_time - start_time

    print(f"{operation_name}:")
    print(f"  Memory: {memory_usage} bytes")
    print(f"  Time: {execution_time:.6f} seconds")
    return result

# Compare list vs generator for large datasets
size = 100000

# List comprehension - all in memory
list_result = measure_memory_and_time(
    "List comprehension",
    lambda: [x**2 for x in range(size)]
)

# Generator expression - lazy evaluation
gen_result = measure_memory_and_time(
    "Generator expression",
    lambda: (x**2 for x in range(size))
)

# Processing comparison
def process_list(data):
    return sum(x for x in data if x % 2 == 0)

def process_generator(data_gen):
    return sum(x for x in data_gen if x % 2 == 0)

print("\nProcessing results:")
print(f"List result: {process_list(list_result[:1000])}")  # First 1000 elements
print(f"Generator result: {process_generator(x**2 for x in range(1000) if x % 2 == 0)}")
Generator Best Practices
def efficient_batch_processor(data, batch_size):
    """Process data in batches efficiently"""
    batch = []
    for item in data:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []

    # Yield remaining items
    if batch:
        yield batch

def cached_fibonacci():
    """Fibonacci generator with caching for efficiency"""
    cache = {0: 0, 1: 1}

    def fib(n):
        if n not in cache:
            cache[n] = fib(n - 1) + fib(n - 2)
        return cache[n]

    n = 0
    while True:
        yield fib(n)
        n += 1

def generator_pipeline(*generators):
    """Chain multiple generators together"""
    for generator in generators:
        yield from generator

# Example usage
def odds():
    yield from range(1, 10, 2)

def evens():
    yield from range(0, 10, 2)

def squares():
    yield from (x**2 for x in range(5))

# Chain the generators
all_numbers = generator_pipeline(odds(), evens(), squares())
print(list(all_numbers))  # [1, 3, 5, 7, 9, 0, 2, 4, 6, 8, 0, 1, 4, 9, 16]
Common Pitfalls and Solutions
Generator Exhaustion
# Problem: generators can only be iterated once
def simple_generator():
    yield 1
    yield 2
    yield 3

gen = simple_generator()
print(list(gen))  # [1, 2, 3]
print(list(gen))  # [] - Generator is exhausted!

# Solution 1: Create a generator factory
def generator_factory():
    def inner():
        yield 1
        yield 2
        yield 3
    return inner

gen_factory = generator_factory()
print(list(gen_factory()))  # [1, 2, 3]
print(list(gen_factory()))  # [1, 2, 3] - Works!

# Solution 2: Use itertools.tee for multiple iterators
import itertools

original_gen = simple_generator()
gen1, gen2 = itertools.tee(original_gen, 2)
print(list(gen1))  # [1, 2, 3]
print(list(gen2))  # [1, 2, 3]
# Note: tee buffers values internally, so iterators that drift far apart can use a lot of memory
Proper Resource Management
def file_reader_generator(filename):
    """Manual resource management in a generator"""
    file = None
    try:
        file = open(filename, 'r')
        for line in file:
            yield line.strip()
    finally:
        if file:
            file.close()

# Better approach: use a context manager inside the generator
def safe_file_reader(filename):
    """Safe file reading with a context manager"""
    with open(filename, 'r') as file:
        for line in file:
            yield line.strip()

# Best approach: wrap the generator itself in a context manager
from contextlib import contextmanager

@contextmanager
def managed_generator(generator_func, *args, **kwargs):
    """Context manager for generators"""
    gen = generator_func(*args, **kwargs)
    try:
        yield gen
    finally:
        gen.close()  # Properly close the generator
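A usage sketch for managed_generator, assuming a text file named data.txt exists. Even if the loop exits early, the finally block still closes the generator, and GeneratorExit in turn closes the file inside safe_file_reader:

# Hypothetical usage - data.txt is assumed to exist
with managed_generator(safe_file_reader, 'data.txt') as lines:
    for line in lines:
        print(line)
        if line.startswith('STOP'):
            break  # the generator is still closed properly on early exit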
Summary
Generators and iterators are powerful Python features that enable:
- Memory efficiency through lazy evaluation
- Clean, readable code for data processing pipelines
- Infinite sequences without memory concerns
- State maintenance between function calls
- Elegant solutions to complex iteration problems
Key Takeaways
- Iterators: Objects that implement __iter__() and __next__()
- Generators: Functions that use yield to produce values lazily
- Generator expressions: Memory-efficient alternative to list comprehensions
- yield from: Delegates iteration to other iterables
- One-time use: Generators are exhausted after complete iteration
- Memory efficiency: Ideal for large datasets and streaming data
When to Use Generators
- Processing large files or datasets
- Creating infinite sequences
- Building data processing pipelines
- Memory-constrained environments
- When you need lazy evaluation