Generators and Iterators: Lazy Evaluation in Python

Generators and iterators are fundamental to Python’s approach to handling sequences and data streams efficiently. They enable lazy evaluation, memory-efficient processing, and elegant solutions to complex iteration problems.

Key Concept: Iterators produce values on demand; a generator is a special kind of iterator, written with yield, that produces values lazily.

Understanding Iterators

What Makes Something Iterable?

# Objects that can be iterated over
iterable_examples = [
    [1, 2, 3],           # list
    "hello",             # string
    (1, 2, 3),           # tuple
    {1, 2, 3},           # set
    {'a': 1, 'b': 2},    # dict (iterates over keys)
    range(5),            # range object
]

# All these work in for loops
for item in [1, 2, 3]:
    print(item)

# Behind the scenes: iter() creates an iterator
numbers = [1, 2, 3]
iterator = iter(numbers)
print(next(iterator))  # 1
print(next(iterator))  # 2
print(next(iterator))  # 3
# print(next(iterator))  # StopIteration exception

What to Notice:

  • iter() creates an iterator from any iterable
  • next() gets the next value from an iterator
  • StopIteration is raised when no more values are available
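
To see what the for loop does with these pieces, here is roughly the equivalent manual loop (a simplified sketch; the interpreter's actual machinery differs in detail):

# A for loop is approximately this iter()/next() pattern
numbers = [1, 2, 3]
iterator = iter(numbers)       # ask the iterable for an iterator
while True:
    try:
        item = next(iterator)  # fetch the next value
    except StopIteration:      # signals the end of iteration
        break
    print(item)                # the loop body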

The Iterator Protocol

class NumberIterator:
    """Custom iterator that yields numbers from start to end"""

    def __init__(self, start, end):
        self.current = start
        self.end = end

    def __iter__(self):
        """Return the iterator object (self)"""
        return self

    def __next__(self):
        """Return the next value in the sequence"""
        if self.current >= self.end:
            raise StopIteration

        value = self.current
        self.current += 1
        return value

# Using the custom iterator
numbers = NumberIterator(1, 5)
for num in numbers:
    print(num)  # 1, 2, 3, 4

# Manual iteration
manual_iter = NumberIterator(10, 13)
print(next(manual_iter))  # 10
print(next(manual_iter))  # 11
print(next(manual_iter))  # 12
# next(manual_iter)       # StopIteration

What to Notice:

  • Iterator classes implement __iter__() and __next__()
  • __iter__() returns self for iterator objects
  • __next__() produces the next value or raises StopIteration
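
One subtlety: because __iter__() returns self, a NumberIterator instance is exhausted after a single pass. A minimal sketch of the alternative pattern, where a container hands out a fresh iterator on every call (NumberRange is an illustrative name, not a built-in):

class NumberRange:
    """Iterable that can be traversed multiple times"""
    def __init__(self, start, end):
        self.start = start
        self.end = end

    def __iter__(self):
        # Return a brand-new iterator each time iteration starts
        return NumberIterator(self.start, self.end)

nums = NumberRange(1, 4)
print(list(nums))  # [1, 2, 3]
print(list(nums))  # [1, 2, 3] - works again, unlike a bare iterator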

Built-in Iterator Functions

# enumerate() - adds index to any iterable
fruits = ['apple', 'banana', 'cherry']
for index, fruit in enumerate(fruits):
    print(f"{index}: {fruit}")

# zip() - combines multiple iterables
names = ['Alice', 'Bob', 'Charlie']
ages = [25, 30, 35]
for name, age in zip(names, ages):
    print(f"{name} is {age} years old")

# map() - applies function to each element
numbers = [1, 2, 3, 4, 5]
squared = map(lambda x: x**2, numbers)
print(list(squared))  # [1, 4, 9, 16, 25]

# filter() - filters elements based on condition
even_numbers = filter(lambda x: x % 2 == 0, numbers)
print(list(even_numbers))  # [2, 4]

# reversed() - reverses any sequence
print(list(reversed(numbers)))  # [5, 4, 3, 2, 1]

What to Notice:

  • These functions return iterator objects (lazy evaluation)
  • You need list() to see all values at once
  • They can be chained together for complex processing
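
Because each of these returns a lazy iterator, they compose without building intermediate lists. A small sketch of such a chain:

# map -> filter chain: nothing is computed until list() pulls values through
numbers = range(1, 11)
pipeline = filter(lambda x: x > 10, map(lambda x: x**2, numbers))
print(list(pipeline))  # [16, 25, 36, 49, 64, 81, 100]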

Generator Functions

Basic Generator Syntax

def count_up_to(n):
    """Generator function that yields numbers from 0 to n-1"""
    count = 0
    while count < n:
        yield count
        count += 1

# Using the generator
counter = count_up_to(5)
print(type(counter))  # <class 'generator'>

for num in counter:
    print(num)  # 0, 1, 2, 3, 4

# Generators are one-time use
print(list(counter))  # [] - empty, already exhausted

What to Notice:

  • yield makes a function a generator
  • Generators maintain state between calls
  • They’re exhausted after one complete iteration
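
For comparison, the NumberIterator class from earlier collapses to a few lines when written as a generator function (number_iterator is an illustrative name):

def number_iterator(start, end):
    """Generator equivalent of the NumberIterator class"""
    current = start
    while current < end:
        yield current
        current += 1

print(list(number_iterator(1, 5)))  # [1, 2, 3, 4]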

Generator State and Resumption

def fibonacci_generator():
    """Infinite Fibonacci sequence generator"""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

# Take first 10 Fibonacci numbers
fib = fibonacci_generator()
first_ten = [next(fib) for _ in range(10)]
print(first_ten)  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]

# Generator maintains state - continues from where it left off
next_five = [next(fib) for _ in range(5)]
print(next_five)  # [55, 89, 144, 233, 377]

def debug_generator():
    """Shows how generators maintain state"""
    print("Generator started")

    yield 1
    print("After first yield")

    yield 2
    print("After second yield")

    yield 3
    print("Generator finished")  # only runs if a fourth next() is attempted

# Step through execution
gen = debug_generator()
print("Created generator")
print(f"First: {next(gen)}")   # prints "Generator started", then 1
print(f"Second: {next(gen)}")  # prints "After first yield", then 2
print(f"Third: {next(gen)}")   # prints "After second yield", then 3
# A fourth next(gen) would print "Generator finished" and raise StopIteration

What to Notice:

  • Code execution pauses at each yield
  • Local variables maintain their values between yields
  • Execution resumes exactly where it left off
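
If you want to observe this pausing directly, the standard library's inspect module reports a generator's current state; a quick sketch:

import inspect

gen = count_up_to(3)
print(inspect.getgeneratorstate(gen))  # GEN_CREATED - not started yet
next(gen)
print(inspect.getgeneratorstate(gen))  # GEN_SUSPENDED - paused at a yield
list(gen)                              # run to completion
print(inspect.getgeneratorstate(gen))  # GEN_CLOSED - exhausted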

Generator Expressions

# Generator expression (like list comprehension but lazy)
numbers = range(10)
squares_gen = (x**2 for x in numbers)
print(type(squares_gen))  # <class 'generator'>

# Memory efficient - values generated on demand
print(next(squares_gen))  # 0
print(next(squares_gen))  # 1
print(next(squares_gen))  # 4

# Comparing memory usage
import sys

# List comprehension - all values in memory
squares_list = [x**2 for x in range(1000000)]
print(f"List size: {sys.getsizeof(squares_list)} bytes")

# Generator expression - minimal memory
squares_gen = (x**2 for x in range(1000000))
print(f"Generator size: {sys.getsizeof(squares_gen)} bytes")

# Complex generator expressions
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
even_squares = (x**2 for x in data if x % 2 == 0)
print(list(even_squares))  # [4, 16, 36, 64, 100]

What to Notice:

  • Generator expressions use parentheses instead of square brackets
  • Far more memory-efficient than list comprehensions when the full result would be large
  • Can include conditions and complex expressions
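
One handy syntax detail: when a generator expression is the only argument to a function call, the extra parentheses can be dropped:

# The expression's parentheses double as the call's parentheses
total = sum(x**2 for x in range(10))
print(total)  # 285
print(max(len(word) for word in ['hi', 'hello', 'hey']))  # 5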

Advanced Generator Patterns

Sending Values to Generators

def accumulator():
    """Generator that accumulates sent values"""
    total = 0
    while True:
        value = yield total
        if value is not None:
            total += value

# Using send() to pass values to generator
acc = accumulator()
next(acc)  # Prime the generator (advance to first yield)

print(acc.send(10))  # 10
print(acc.send(5))   # 15
print(acc.send(3))   # 18

def logger_generator():
    """Generator that processes log messages"""
    messages = []
    while True:
        message = yield len(messages)
        if message:
            timestamp = f"[{len(messages)+1}] "
            messages.append(timestamp + message)
            print(f"Logged: {timestamp + message}")

# Using the logger
logger = logger_generator()
next(logger)  # Prime the generator

logger.send("User logged in")
logger.send("Database connected")
logger.send("Process completed")

What to Notice:

  • send() method passes values into the generator
  • Generator must be “primed” with next() (or send(None)) before sending real values
  • yield can both produce and receive values
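
Two related methods round out this interface: close() stops a generator by raising GeneratorExit at the paused yield, and throw() raises an arbitrary exception there. A minimal sketch of close() (worker is an illustrative name):

def worker():
    try:
        while True:
            value = yield
            print(f"Processing {value}")
    except GeneratorExit:
        print("Cleaning up before shutdown")

w = worker()
next(w)           # prime the generator
w.send("job 1")   # Processing job 1
w.close()         # Cleaning up before shutdown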

Generator Delegation with yield from

def inner_generator():
    """Simple generator"""
    yield 1
    yield 2
    yield 3

def outer_generator():
    """Generator that delegates to another generator"""
    yield 0
    yield from inner_generator()  # Delegate to inner generator
    yield 4

# Using yield from
for value in outer_generator():
    print(value)  # 0, 1, 2, 3, 4

def flatten_nested_lists(nested_list):
    """Flatten arbitrarily nested lists using yield from"""
    for item in nested_list:
        if isinstance(item, list):
            yield from flatten_nested_lists(item)  # Recursive delegation
        else:
            yield item

# Flattening nested structure
nested = [1, [2, 3], [4, [5, 6]], 7]
flat = list(flatten_nested_lists(nested))
print(flat)  # [1, 2, 3, 4, 5, 6, 7]

def read_files(*filenames):
    """Read lines from multiple files using yield from"""
    for filename in filenames:
        try:
            with open(filename, 'r') as file:
                yield from file  # Delegate to file iterator
        except FileNotFoundError:
            yield f"Error: {filename} not found\n"

What to Notice:

  • yield from delegates iteration to another iterable
  • Useful for flattening structures and chaining generators
  • Handles StopIteration automatically
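
yield from also captures the subgenerator's return value, which plain iteration discards. A small sketch (averager and report are illustrative names):

def averager():
    """Subgenerator that returns a result when it finishes"""
    total, count = 0, 0
    for value in [10, 20, 30]:
        yield value
        total += value
        count += 1
    return total / count  # becomes the value of the yield from expression

def report():
    result = yield from averager()  # receives the return value
    print(f"Average was {result}")

print(list(report()))  # prints "Average was 20.0", then [10, 20, 30]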

Practical Applications

Data Processing Pipeline

def read_csv_lines(filename):
    """Generator to read CSV file line by line"""
    with open(filename, 'r') as file:
        for line in file:
            yield line.strip()

def parse_csv_line(line):
    """Parse a single CSV line"""
    return line.split(',')

def filter_valid_records(records):
    """Filter out invalid records"""
    for record in records:
        if len(record) >= 3 and record[0]:  # Must have ID and at least 3 fields
            yield record

def transform_record(record):
    """Transform record format"""
    try:
        value = float(record[2])
    except ValueError:  # non-numeric values default to 0.0
        value = 0.0
    return {
        'id': record[0],
        'name': record[1],
        'value': value,
    }

def process_csv_file(filename):
    """Complete processing pipeline using generators"""
    lines = read_csv_lines(filename)
    records = (parse_csv_line(line) for line in lines)
    valid_records = filter_valid_records(records)

    for record in valid_records:
        yield transform_record(record)

# Example usage (if file exists)
# for processed_record in process_csv_file('data.csv'):
#     print(processed_record)
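
To try the pipeline without a real dataset, you could write a small sample file first (a hypothetical smoke test; sample.csv is an illustrative name):

# Create a tiny CSV, then run it through the pipeline
with open('sample.csv', 'w') as f:
    f.write("1,Alice,10.5\n2,Bob,not_a_number\n,NoID,3.0\n3,Carol,7\n")

for processed in process_csv_file('sample.csv'):
    print(processed)
# {'id': '1', 'name': 'Alice', 'value': 10.5}
# {'id': '2', 'name': 'Bob', 'value': 0.0}
# {'id': '3', 'name': 'Carol', 'value': 7.0}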

Infinite Sequences

def repeater(value):
    """Infinite generator that repeats a value"""
    while True:
        yield value

def cycle(iterable):
    """Infinite generator that cycles through an iterable"""
    saved = []
    for element in iterable:
        yield element
        saved.append(element)
    while saved:
        for element in saved:
            yield element

def counter(start=0, step=1):
    """Infinite counter generator"""
    current = start
    while True:
        yield current
        current += step

# Using infinite generators with itertools
import itertools

# Take first 5 from infinite sequence
colors = cycle(['red', 'green', 'blue'])
first_five_colors = list(itertools.islice(colors, 5))
print(first_five_colors)  # ['red', 'green', 'blue', 'red', 'green']

# Combine infinite generators
numbers = counter(1, 2)  # 1, 3, 5, 7, 9, ...
letters = cycle('ABC')   # A, B, C, A, B, C, ...
combined = zip(numbers, letters)
first_six = list(itertools.islice(combined, 6))
print(first_six)  # [(1, 'A'), (3, 'B'), (5, 'C'), (7, 'A'), (9, 'B'), (11, 'C')]
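
Worth knowing: all three of these have standard-library equivalents, so production code usually reaches for itertools directly:

import itertools

# Built-in counterparts of repeater, counter, and cycle
print(list(itertools.islice(itertools.repeat('x'), 3)))  # ['x', 'x', 'x']
print(list(itertools.islice(itertools.count(1, 2), 4)))  # [1, 3, 5, 7]
print(list(itertools.islice(itertools.cycle('AB'), 4)))  # ['A', 'B', 'A', 'B']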

Memory-Efficient Data Processing

def large_file_processor(filename, chunk_size=1024):
    """Process large files in chunks"""
    with open(filename, 'rb') as file:
        while True:
            chunk = file.read(chunk_size)
            if not chunk:
                break
            yield chunk

def word_frequency_counter(text_generator):
    """Count word frequencies from a text generator"""
    # Note: words (and multi-byte characters) that straddle a chunk
    # boundary are miscounted in this simplified version
    word_counts = {}
    for text_chunk in text_generator:
        words = text_chunk.decode('utf-8', errors='ignore').split()
        for word in words:
            word = word.lower().strip('.,!?";')
            word_counts[word] = word_counts.get(word, 0) + 1
    return word_counts

def moving_average_generator(data, window_size):
    """Calculate moving average using generator"""
    window = []
    for value in data:
        window.append(value)
        if len(window) > window_size:
            window.pop(0)
        if len(window) == window_size:
            yield sum(window) / window_size

# Example: Processing streaming data
streaming_data = [1, 4, 2, 8, 5, 7, 3, 6, 9, 2]
moving_avg = moving_average_generator(streaming_data, 3)
for avg in moving_avg:
    print(f"Average: {avg:.2f}")
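
One performance note: window.pop(0) shifts every remaining element, so for large windows a collections.deque with maxlen evicts the oldest value in constant time. A sketch of that variant:

from collections import deque

def moving_average_deque(data, window_size):
    """Moving average with O(1) window eviction"""
    window = deque(maxlen=window_size)  # oldest value drops out automatically
    for value in data:
        window.append(value)
        if len(window) == window_size:
            yield sum(window) / window_size

print([round(avg, 2) for avg in moving_average_deque([1, 4, 2, 8, 5], 3)])
# [2.33, 4.67, 5.0]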

Performance Considerations

Memory Efficiency Comparison

import sys
import time

def measure_memory_and_time(operation_name, operation):
    """Measure memory usage and execution time"""
    start_time = time.time()
    result = operation()
    end_time = time.time()

    # getsizeof measures only the top-level object, not the elements
    # it references, so it understates the list's true footprint
    memory_usage = sys.getsizeof(result)
    execution_time = end_time - start_time

    print(f"{operation_name}:")
    print(f"  Memory: {memory_usage} bytes")
    print(f"  Time: {execution_time:.6f} seconds")
    return result

# Compare list vs generator for large datasets
size = 100000

# List comprehension - all in memory
list_result = measure_memory_and_time(
    "List comprehension",
    lambda: [x**2 for x in range(size)]
)

# Generator expression - lazy evaluation
# (created instantly; no squares are computed until it is consumed)
gen_result = measure_memory_and_time(
    "Generator expression",
    lambda: (x**2 for x in range(size))
)

# Processing comparison
def process_list(data):
    return sum(x for x in data if x % 2 == 0)

def process_generator(data_gen):
    return sum(x for x in data_gen if x % 2 == 0)

print("\nProcessing results:")
print(f"List result: {process_list(list_result[:1000])}")  # First 1000 elements
print(f"Generator result: {process_generator(x**2 for x in range(1000) if x % 2 == 0)}")

Generator Best Practices

def efficient_batch_processor(data, batch_size):
    """Process data in batches efficiently"""
    batch = []
    for item in data:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []

    # Yield remaining items
    if batch:
        yield batch

def cached_fibonacci():
    """Fibonacci generator with caching for efficiency"""
    cache = {0: 0, 1: 1}

    def fib(n):
        if n not in cache:
            cache[n] = fib(n-1) + fib(n-2)
        return cache[n]

    n = 0
    while True:
        yield fib(n)
        n += 1

def generator_pipeline(*generators):
    """Chain multiple generators together"""
    for generator in generators:
        yield from generator

# Example usage
def odds():
    yield from range(1, 10, 2)

def evens():
    yield from range(0, 10, 2)

def squares():
    yield from (x**2 for x in range(5))

# Chain generators
all_numbers = generator_pipeline(odds(), evens(), squares())
print(list(all_numbers))  # [1, 3, 5, 7, 9, 0, 2, 4, 6, 8, 0, 1, 4, 9, 16]
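
Note that generator_pipeline is essentially itertools.chain; the standard-library version is the usual choice:

import itertools

chained = itertools.chain(odds(), evens(), squares())
print(list(chained))  # [1, 3, 5, 7, 9, 0, 2, 4, 6, 8, 0, 1, 4, 9, 16]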

Common Pitfalls and Solutions

Generator Exhaustion

# Problem: Generators can only be iterated once
def simple_generator():
    yield 1
    yield 2
    yield 3

gen = simple_generator()
print(list(gen))  # [1, 2, 3]
print(list(gen))  # [] - Generator is exhausted!

# Solution 1: Create a fresh generator each time you need one
def generator_factory():
    def inner():
        yield 1
        yield 2
        yield 3
    return inner

gen_factory = generator_factory()
print(list(gen_factory()))  # [1, 2, 3]
print(list(gen_factory()))  # [1, 2, 3] - Works!
# (Simply calling simple_generator() again has the same effect)

# Solution 2: Use itertools.tee for multiple iterators
import itertools

original_gen = simple_generator()
gen1, gen2 = itertools.tee(original_gen, 2)
print(list(gen1))  # [1, 2, 3]
print(list(gen2))  # [1, 2, 3]
# Caveat: don't use original_gen after tee(), and tee buffers values
# until every copy has consumed them

Proper Resource Management

def file_reader_generator(filename):
    """Proper resource management in generators"""
    file = None
    try:
        file = open(filename, 'r')
        for line in file:
            yield line.strip()
    finally:
        if file:
            file.close()

# Better approach: Use context manager
def safe_file_reader(filename):
    """Safe file reading with context manager"""
    with open(filename, 'r') as file:
        for line in file:
            yield line.strip()

# Best approach: Generator with context manager pattern
from contextlib import contextmanager

@contextmanager
def managed_generator(generator_func, *args, **kwargs):
    """Context manager for generators"""
    gen = generator_func(*args, **kwargs)
    try:
        yield gen
    finally:
        gen.close()  # Properly close generator
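
Usage might look like this (data.txt is a hypothetical file; the with block guarantees the generator, and therefore the file it holds open, is closed even if you stop iterating early):

# Hypothetical usage - assumes a file named data.txt exists
with managed_generator(safe_file_reader, 'data.txt') as lines:
    for line in lines:
        if line.startswith('#'):
            break  # stopping early still triggers gen.close()
        print(line)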

Summary

Generators and iterators are powerful Python features that enable:

  • Memory efficiency through lazy evaluation
  • Clean, readable code for data processing pipelines
  • Infinite sequences without memory concerns
  • State maintenance between function calls
  • Elegant solutions to complex iteration problems

Key Takeaways

  1. Iterators: Objects that implement __iter__() and __next__()
  2. Generators: Functions that use yield to produce values lazily
  3. Generator expressions: Memory-efficient alternative to list comprehensions
  4. yield from: Delegates iteration to other iterables
  5. One-time use: Generators are exhausted after complete iteration
  6. Memory efficiency: Ideal for large datasets and streaming data

When to Use Generators

  • Processing large files or datasets
  • Creating infinite sequences
  • Building data processing pipelines
  • Memory-constrained environments
  • When you need lazy evaluation
