Python Comprehensions
Comprehensions are one of Python’s most elegant features, providing a concise way to create lists, dictionaries, and sets from existing iterables. They combine the power of loops and conditionals into readable, efficient expressions that embody Python’s philosophy of beautiful, expressive code.
Understanding List Comprehensions
List comprehensions provide a concise way to create lists by applying an expression to each item in an iterable, optionally filtering items with a condition.
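As a minimal sketch of the general shape described above, a comprehension of the form `[expression for item in iterable if condition]` builds the same list as an explicit loop. The sample list `raw_names` is invented purely for illustration.

```python
# Hypothetical sample data, used only for illustration
raw_names = ["  alice ", "", "BOB", "   ", "carol"]

# [expression          for item in iterable   if condition]
cleaned = [name.strip().title() for name in raw_names if name.strip()]

# The equivalent explicit loop
cleaned_loop = []
for name in raw_names:
    if name.strip():                               # condition
        cleaned_loop.append(name.strip().title())  # expression

print(cleaned)  # ['Alice', 'Bob', 'Carol']
```

Reading the comprehension left to right gives "what to produce, from where, and under which condition", which is the order the explicit loop spreads over three lines.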
Basic List Comprehensions
# Traditional approach
numbers = [1, 2, 3, 4, 5]
squares = []
for num in numbers:
    squares.append(num ** 2)
print(squares)  # [1, 4, 9, 16, 25]

# List comprehension approach
numbers = [1, 2, 3, 4, 5]
squares = [num ** 2 for num in numbers]
print(squares)  # [1, 4, 9, 16, 25]

# Working with strings
words = ['hello', 'world', 'python', 'comprehension']
lengths = [len(word) for word in words]
print(lengths)  # [5, 5, 6, 13]

uppercase_words = [word.upper() for word in words]
print(uppercase_words)  # ['HELLO', 'WORLD', 'PYTHON', 'COMPREHENSION']

# Mathematical operations
celsius_temps = [0, 20, 30, 40, 100]
fahrenheit_temps = [(temp * 9/5) + 32 for temp in celsius_temps]
print(fahrenheit_temps)  # [32.0, 68.0, 86.0, 104.0, 212.0]
Key Benefits:
- More concise and readable than traditional loops
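- Often faster than an equivalent for loop: in CPython, a comprehension appends each item with a dedicated bytecode instruction instead of looking up and calling list.append on every iteration
- Builds the list in a single expression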
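The speed claim is easy to check on your own machine. The following is a rough sketch using the standard-library `timeit` module; the function names `with_loop` and `with_comprehension` and the input size `N` are arbitrary choices for illustration, and the measured ratio depends on the Python version and hardware.

```python
import timeit

N = 10_000  # arbitrary input size chosen for this sketch


def with_loop():
    """Build the list of squares with an explicit loop and append()."""
    result = []
    for x in range(N):
        result.append(x * x)
    return result


def with_comprehension():
    """Build the same list with a list comprehension."""
    return [x * x for x in range(N)]


# Both approaches must produce identical results
assert with_loop() == with_comprehension()

# Run each function 50 times and report the total elapsed time
loop_time = timeit.timeit(with_loop, number=50)
comp_time = timeit.timeit(with_comprehension, number=50)
print(f"loop:          {loop_time:.4f} s")
print(f"comprehension: {comp_time:.4f} s")
```

`timeit` repeats each call many times, which gives steadier numbers than timing a single run.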
# Filter even numbers and square them
numbers = range(1, 11)
even_squares = [num ** 2 for num in numbers if num % 2 == 0]
print(even_squares)  # [4, 16, 36, 64, 100]

# Filter and transform strings
words = ['apple', 'banana', 'cherry', 'date', 'elderberry']
long_words_upper = [word.upper() for word in words if len(word) > 5]
print(long_words_upper)  # ['BANANA', 'CHERRY', 'ELDERBERRY']

# Multiple conditions
numbers = range(1, 21)
special_numbers = [num for num in numbers if num % 3 == 0 and num % 2 != 0]
print(special_numbers)  # [3, 9, 15] - odd multiples of 3

# Working with file processing
file_lines = [
    "# This is a comment",
    "import os",
    "",
    "def hello():",
    "    # Another comment",
    "    print('Hello')",
    "",
    "hello()"
]

# Extract non-empty, non-comment lines
code_lines = [
    line.strip() for line in file_lines
    if line.strip() and not line.strip().startswith('#')
]
print(code_lines)
# ['import os', 'def hello():', "print('Hello')", 'hello()']

# Cartesian product using nested comprehensions
colors = ['red', 'green', 'blue']
sizes = ['S', 'M', 'L']

# Traditional nested loops
products = []
for color in colors:
    for size in sizes:
        products.append(f"{color}-{size}")

# List comprehension with nested loops
products = [f"{color}-{size}" for color in colors for size in sizes]
print(products)
# ['red-S', 'red-M', 'red-L', 'green-S', 'green-M', 'green-L', 'blue-S', 'blue-M', 'blue-L']

# Creating a multiplication table
multiplication_table = [
    [i * j for j in range(1, 11)]
    for i in range(1, 11)
]

# Print the table nicely
for row in multiplication_table[:5]:  # Show first 5 rows
    print([f"{num:3d}" for num in row])

# Flattening nested lists
nested_list = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
flattened = [item for sublist in nested_list for item in sublist]
print(flattened)  # [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Complex example: Finding pairs of numbers that sum to a target
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
target_sum = 10

pairs = [
    (x, y) for x in numbers for y in numbers
    if x < y and x + y == target_sum
]
print(pairs)  # [(1, 9), (2, 8), (3, 7), (4, 6)]
Advanced List Comprehension Patterns
# Using conditional expressions (ternary operator) in comprehensions
numbers = [-2, -1, 0, 1, 2, 3, 4, 5]

# Convert negative numbers to 0, keep positive numbers
processed = [num if num > 0 else 0 for num in numbers]
print(processed)  # [0, 0, 0, 1, 2, 3, 4, 5]

# Categorize numbers
categories = [
    'negative' if num < 0 else 'zero' if num == 0 else 'positive'
    for num in numbers
]
print(categories)
# ['negative', 'negative', 'zero', 'positive', 'positive', 'positive', 'positive', 'positive']

# Grade calculator
scores = [95, 87, 76, 69, 58, 91, 82]
grades = [
    'A' if score >= 90 else
    'B' if score >= 80 else
    'C' if score >= 70 else
    'D' if score >= 60 else 'F'
    for score in scores
]
print(list(zip(scores, grades)))
# [(95, 'A'), (87, 'B'), (76, 'C'), (69, 'D'), (58, 'F'), (91, 'A'), (82, 'B')]

# Data cleaning with conditional expressions
raw_data = ['123', 'abc', '456', '', 'def', '789', None]
cleaned_numbers = [
    int(item) if item and item.isdigit() else 0
    for item in raw_data if item is not None
]
print(cleaned_numbers)  # [123, 0, 456, 0, 0, 789]

import math

# Applying different functions based on conditions
def process_number(n):
    """Apply different mathematical operations based on number properties."""
    if n < 0:
        return abs(n)  # Absolute value for negative numbers
    elif n == 0:
        return 1  # Special case for zero
    elif n % 2 == 0:
        return n ** 2  # Square even numbers
    else:
        return math.sqrt(n)  # Square root for odd positive numbers

numbers = [-4, -1, 0, 1, 2, 3, 4, 5, 6]
results = [process_number(n) for n in numbers]
print(results)
# [4, 1, 1, 1.0, 4, 1.7320508075688772, 16, 2.23606797749979, 36]

# Building tuples of derived values in comprehensions
data = ['apple', 'Banana', 'CHERRY', 'date']
normalized = [(word.lower(), len(word)) for word in data]
print(normalized)  # [('apple', 5), ('banana', 6), ('cherry', 6), ('date', 4)]

# Applying multiple transformations
text_data = [' Hello World ', 'PYTHON programming', 'data Science ']
processed = [
    text.strip().title().replace(' ', '_')
    for text in text_data
]
print(processed)  # ['Hello_World', 'Python_Programming', 'Data_Science']

# Working with dates
from datetime import datetime, timedelta

base_date = datetime(2024, 1, 1)
dates = [base_date + timedelta(days=i*7) for i in range(5)]  # Weekly dates
formatted_dates = [
    date.strftime('%Y-%m-%d (%A)')
    for date in dates
]
print(formatted_dates)
# ['2024-01-01 (Monday)', '2024-01-08 (Monday)', '2024-01-15 (Monday)', '2024-01-22 (Monday)', '2024-01-29 (Monday)']

# Safe operations with error handling
def safe_divide(a, b):
    """Safely divide two numbers, return None if division by zero."""
    try:
        return a / b
    except ZeroDivisionError:
        return None

numerators = [10, 20, 30, 40]
denominators = [2, 0, 3, 4]

# Handle errors gracefully in comprehensions
results = [
    safe_divide(num, den)
    for num, den in zip(numerators, denominators)
]
print(results)  # [5.0, None, 10.0, 10.0]

# Filter out error results
valid_results = [
    result for result in results
    if result is not None
]
print(valid_results)  # [5.0, 10.0, 10.0]

# Convert strings to numbers with error handling
mixed_data = ['123', 'abc', '45.6', 'def', '789', '12.34']

def safe_float_convert(value):
    try:
        return float(value)
    except ValueError:
        return 0.0

numbers = [safe_float_convert(item) for item in mixed_data]
print(numbers)  # [123.0, 0.0, 45.6, 0.0, 789.0, 12.34]

# Extract valid numbers only
valid_numbers = [
    float(item) for item in mixed_data
    if item.replace('.', '').replace('-', '').isdigit()
]
print(valid_numbers)  # [123.0, 45.6, 789.0, 12.34]
Dictionary Comprehensions
Dictionary comprehensions provide a concise way to create dictionaries by transforming or filtering key-value pairs.
# Creating dictionaries from lists
numbers = [1, 2, 3, 4, 5]
squares_dict = {num: num ** 2 for num in numbers}
print(squares_dict)  # {1: 1, 2: 4, 3: 9, 4: 16, 5: 25}

# Creating dictionaries from two lists
names = ['Alice', 'Bob', 'Charlie', 'Diana']
ages = [25, 30, 35, 28]
people = {name: age for name, age in zip(names, ages)}
print(people)  # {'Alice': 25, 'Bob': 30, 'Charlie': 35, 'Diana': 28}

# Reversing key-value pairs
original = {'a': 1, 'b': 2, 'c': 3}
reversed_dict = {value: key for key, value in original.items()}
print(reversed_dict)  # {1: 'a', 2: 'b', 3: 'c'}

# Creating lookup tables
fruits = ['apple', 'banana', 'cherry', 'date']
fruit_lengths = {fruit: len(fruit) for fruit in fruits}
print(fruit_lengths)  # {'apple': 5, 'banana': 6, 'cherry': 6, 'date': 4}

# Word frequency counting (simple version)
text = "hello world hello python world"
words = text.split()
word_count = {word: words.count(word) for word in set(words)}
print(word_count)  # {'world': 2, 'hello': 2, 'python': 1} (key order may vary)
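The word-count comprehension above is labelled a simple version because `words.count(word)` rescans the whole list once per unique word. As a sketch of a single-pass alternative, the standard library's `collections.Counter` produces the same mapping:

```python
from collections import Counter

text = "hello world hello python world"
words = text.split()

# Counter tallies every word in one pass over the list
word_count = Counter(words)

print(word_count["hello"])  # 2
print(dict(word_count))     # {'hello': 2, 'world': 2, 'python': 1}
```

For a handful of words the difference is negligible; it starts to matter when the text is long and contains many distinct words.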

# Filter and transform data
student_scores = {
    'Alice': 95,
    'Bob': 67,
    'Charlie': 78,
    'Diana': 92,
    'Eve': 84,
    'Frank': 56
}

# Filter students with scores of 80 or above
high_achievers = {
    name: score for name, score in student_scores.items()
    if score >= 80
}
print(high_achievers)  # {'Alice': 95, 'Diana': 92, 'Eve': 84}

# Transform scores to letter grades
def score_to_grade(score):
    if score >= 90: return 'A'
    elif score >= 80: return 'B'
    elif score >= 70: return 'C'
    elif score >= 60: return 'D'
    else: return 'F'

student_grades = {
    name: score_to_grade(score)
    for name, score in student_scores.items()
}
print(student_grades)
# {'Alice': 'A', 'Bob': 'D', 'Charlie': 'C', 'Diana': 'A', 'Eve': 'B', 'Frank': 'F'}

# Environment variable processing
import os
env_vars = dict(os.environ)

# Filter environment variables containing 'PATH'
path_vars = {
    key: value for key, value in env_vars.items()
    if 'PATH' in key.upper()
}
print(list(path_vars.keys())[:3])  # Show first 3 PATH-related variables

# Configuration processing
raw_config = {
    'debug': 'true',
    'max_connections': '100',
    'timeout': '30.0',
    'host': 'localhost',
    'port': '8080'
}

# Convert string values to appropriate types
def convert_value(value):
    if value.lower() in ('true', 'false'):
        return value.lower() == 'true'
    elif value.isdigit():
        return int(value)
    elif '.' in value and value.replace('.', '').isdigit():
        return float(value)
    else:
        return value

typed_config = {
    key: convert_value(value)
    for key, value in raw_config.items()
}
print(typed_config)
# {'debug': True, 'max_connections': 100, 'timeout': 30.0, 'host': 'localhost', 'port': 8080}

# Working with nested data structures
employees = [
    {'name': 'Alice', 'department': 'Engineering', 'salary': 95000, 'years': 3},
    {'name': 'Bob', 'department': 'Marketing', 'salary': 67000, 'years': 1},
    {'name': 'Charlie', 'department': 'Engineering', 'salary': 78000, 'years': 2},
    {'name': 'Diana', 'department': 'Sales', 'salary': 82000, 'years': 4},
    {'name': 'Eve', 'department': 'Engineering', 'salary': 89000, 'years': 5}
]

# Create name to salary mapping for engineering department
eng_salaries = {
    emp['name']: emp['salary']
    for emp in employees
    if emp['department'] == 'Engineering'
}
print(eng_salaries)  # {'Alice': 95000, 'Charlie': 78000, 'Eve': 89000}

# Calculate salary per year of experience
salary_per_year = {
    emp['name']: round(emp['salary'] / emp['years'], 2)
    for emp in employees
}
print(salary_per_year)
# {'Alice': 31666.67, 'Bob': 67000.0, 'Charlie': 39000.0, 'Diana': 20500.0, 'Eve': 17800.0}

# Group employees by department
from collections import defaultdict

dept_employees = defaultdict(list)
for emp in employees:
    dept_employees[emp['department']].append(emp['name'])

# Convert to regular dict using comprehension
dept_dict = {dept: names for dept, names in dept_employees.items()}
print(dept_dict)
# {'Engineering': ['Alice', 'Charlie', 'Eve'], 'Marketing': ['Bob'], 'Sales': ['Diana']}

# Data aggregation
sales_data = [
    {'product': 'Widget A', 'region': 'North', 'sales': 1000},
    {'product': 'Widget B', 'region': 'North', 'sales': 1500},
    {'product': 'Widget A', 'region': 'South', 'sales': 800},
    {'product': 'Widget B', 'region': 'South', 'sales': 1200},
    {'product': 'Widget A', 'region': 'East', 'sales': 900},
    {'product': 'Widget B', 'region': 'East', 'sales': 1100}
]

# Total sales by product
product_totals = {}
for sale in sales_data:
    product = sale['product']
    product_totals[product] = product_totals.get(product, 0) + sale['sales']

# Using comprehension with aggregation helper
product_sales = {
    product: sum(sale['sales'] for sale in sales_data if sale['product'] == product)
    for product in {sale['product'] for sale in sales_data}
}
print(product_sales)  # {'Widget A': 2700, 'Widget B': 3800}
Set Comprehensions
Set comprehensions build sets, so duplicates are removed automatically and the result has no guaranteed order.
# Creating sets from lists with duplicates
numbers = [1, 2, 2, 3, 3, 3, 4, 4, 5]
unique_squares = {num ** 2 for num in numbers}
print(unique_squares)  # {1, 4, 9, 16, 25}

# Extract unique characters from strings
words = ['hello', 'world', 'python']
all_chars = {char for word in words for char in word}
print(sorted(all_chars))  # ['d', 'e', 'h', 'l', 'n', 'o', 'p', 'r', 't', 'w', 'y']

# Unique word lengths
sentences = [
    "The quick brown fox",
    "jumps over the lazy dog",
    "Python is awesome",
    "Set comprehensions are powerful"
]

unique_lengths = {
    len(word) for sentence in sentences
    for word in sentence.split()
}
print(sorted(unique_lengths))  # [2, 3, 4, 5, 6, 7, 8, 14]

# Extract file extensions
filenames = [
    'document.pdf', 'image.jpg', 'script.py',
    'data.csv', 'photo.jpg', 'code.py', 'text.txt'
]

extensions = {filename.split('.')[-1] for filename in filenames}
print(extensions)  # {'pdf', 'jpg', 'py', 'csv', 'txt'}

# Domain extraction from email addresses
emails = [
    'alice@gmail.com', 'bob@yahoo.com', 'charlie@gmail.com',
    'diana@hotmail.com', 'eve@yahoo.com', 'frank@outlook.com'
]

domains = {email.split('@')[1] for email in emails}
print(domains)  # {'gmail.com', 'yahoo.com', 'hotmail.com', 'outlook.com'}

# Clean and deduplicate data
raw_tags = [
    'Python', 'python', 'PYTHON', 'Java', 'java',
    'JavaScript', 'javascript', 'Go', 'go', 'Rust'
]

# Normalize and deduplicate tags
clean_tags = {tag.lower() for tag in raw_tags}
print(clean_tags)  # {'python', 'java', 'javascript', 'go', 'rust'}

# Extract valid identifiers from mixed data
identifiers = [
    'user123', 'admin', '123invalid', 'test_user',
    'user-name', 'valid_id', '_private', '9numbers'
]

# Valid Python identifiers (simplified check)
valid_ids = {
    id_name for id_name in identifiers
    if id_name.isidentifier()
}
print(valid_ids)  # {'user123', 'admin', 'test_user', 'valid_id', '_private'}

# Extract unique error types from log entries
log_entries = [
    "ERROR: Connection timeout",
    "WARNING: Low memory",
    "ERROR: File not found",
    "INFO: Process started",
    "ERROR: Connection timeout",
    "ERROR: Permission denied",
    "WARNING: Low memory"
]

error_types = {
    entry.split(':')[0] for entry in log_entries
    if entry.split(':')[0] in ['ERROR', 'WARNING']
}
print(error_types)  # {'ERROR', 'WARNING'}

# More specific: extract actual error messages
error_messages = {
    entry.split(':', 1)[1].strip() for entry in log_entries
    if entry.startswith('ERROR:')
}
print(error_messages)
# {'Connection timeout', 'File not found', 'Permission denied'}

# Process survey responses
responses = [
    'Yes', 'No', 'yes', 'YES', 'no', 'Maybe',
    'maybe', 'MAYBE', 'Definitely', 'Absolutely not'
]

# Normalize yes/no responses
normalized = {
    'yes' if response.lower() in ['yes', 'definitely']
    else 'no' if response.lower() in ['no', 'absolutely not']
    else 'maybe'
    for response in responses
}
print(normalized)  # {'yes', 'no', 'maybe'}

# Mathematical operations with set comprehensions
numbers_a = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
numbers_b = {5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15}

# Numbers present in both sets (equivalent to numbers_a & numbers_b)
common = {num for num in numbers_a if num in numbers_b}
print(sorted(common))  # [5, 6, 7, 8, 9, 10]

# Find numbers that are perfect squares
perfect_squares = {num for num in range(1, 101) if int(num ** 0.5) ** 2 == num}
print(sorted(perfect_squares)[:10])  # [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

# Prime numbers using trial division (not efficient for large numbers)
def is_prime(n):
    if n < 2:
        return False
    return all(n % i != 0 for i in range(2, int(n ** 0.5) + 1))

primes_under_50 = {num for num in range(2, 50) if is_prime(num)}
print(sorted(primes_under_50))
# [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]

# Fibonacci numbers
def fibonacci_set(limit):
    fibs = set()
    a, b = 0, 1
    while a <= limit:
        fibs.add(a)
        a, b = b, a + b
    return fibs

fib_set = fibonacci_set(100)
# The sequence contains 1 twice, but a set keeps only one copy
print(sorted(fib_set))  # [0, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]

# Digit analysis
numbers = [123, 456, 789, 321, 654, 987, 111, 222]

# All unique digits used
all_digits = {digit for num in numbers for digit in str(num)}
print(sorted(all_digits))  # ['1', '2', '3', '4', '5', '6', '7', '8', '9']

# Numbers containing only even digits
even_digit_numbers = {
    num for num in numbers
    if all(int(digit) % 2 == 0 for digit in str(num))
}
print(even_digit_numbers)  # {222} - 456 is excluded because 5 is odd

# Palindromic numbers
palindromes = {
    num for num in range(100, 1000)
    if str(num) == str(num)[::-1]
}
print(sorted(palindromes)[:10])  # [101, 111, 121, 131, 141, 151, 161, 171, 181, 191]
Performance Considerations
import time

# Performance comparison: List comprehension vs for loop
def test_performance():
    data_size = 1000000

    # Using list comprehension
    # perf_counter() is a monotonic, high-resolution clock meant for timing
    start_time = time.perf_counter()
    squares_comp = [x**2 for x in range(data_size)]
    comp_time = time.perf_counter() - start_time

    # Using traditional loop
    start_time = time.perf_counter()
    squares_loop = []
    for x in range(data_size):
        squares_loop.append(x**2)
    loop_time = time.perf_counter() - start_time

    print(f"List comprehension: {comp_time:.4f} seconds")
    print(f"Traditional loop: {loop_time:.4f} seconds")
    print(f"Speed ratio (loop / comprehension): {loop_time/comp_time:.2f}x")

# Memory efficiency comparison
def memory_efficient_processing():
    """Demonstrate memory-efficient data processing."""

    # Generator expression (memory efficient)
    large_data = range(1000000)

    # This doesn't create the entire list in memory at once
    processed_generator = (x**2 for x in large_data if x % 2 == 0)

    # Process in chunks
    chunk_size = 1000
    chunk = []
    for i, value in enumerate(processed_generator):
        chunk.append(value)
        if len(chunk) == chunk_size:
            # Process chunk (e.g., write to file, send to database)
            avg = sum(chunk) / len(chunk)
            print(f"Chunk {i//chunk_size + 1} average: {avg:.2f}")
            chunk = []

        # Only process first few chunks for demo
        if i > 5000:
            break

# Test performance
test_performance()
memory_efficient_processing()

# Guidelines for choosing the right comprehension type
import time

# Use LIST comprehensions when:
# 1. You need to maintain order
# 2. You need to access elements by index
# 3. You may have duplicate values
example_list = [word.lower() for word in ['Apple', 'Banana', 'apple', 'Cherry']]
print(f"List preserves duplicates: {example_list}")
# ['apple', 'banana', 'apple', 'cherry']

# Use SET comprehensions when:
# 1. You need unique values only
# 2. Order doesn't matter
# 3. You want fast membership testing
example_set = {word.lower() for word in ['Apple', 'Banana', 'apple', 'Cherry']}
print(f"Set removes duplicates: {example_set}")
# {'apple', 'banana', 'cherry'}

# Use DICT comprehensions when:
# 1. You need key-value mapping
# 2. You want O(1) average-case lookup by key
# 3. You're transforming or filtering existing dictionaries
words = ['Apple', 'Banana', 'Cherry']
example_dict = {word: len(word) for word in words}
print(f"Dict for mapping: {example_dict}")
# {'Apple': 5, 'Banana': 6, 'Cherry': 6}

# Performance implications
def performance_comparison():
    data = list(range(10000))

    # List comprehension - creates full list in memory
    start = time.perf_counter()
    result_list = [x**2 for x in data if x % 2 == 0]
    list_time = time.perf_counter() - start

    # Set comprehension - creates set (hash table)
    start = time.perf_counter()
    result_set = {x**2 for x in data if x % 2 == 0}
    set_time = time.perf_counter() - start

    # Dictionary comprehension - creates hash table with key-value pairs
    start = time.perf_counter()
    result_dict = {x: x**2 for x in data if x % 2 == 0}
    dict_time = time.perf_counter() - start

    print(f"List comprehension: {list_time:.6f} seconds")
    print(f"Set comprehension: {set_time:.6f} seconds")
    print(f"Dict comprehension: {dict_time:.6f} seconds")

    # Membership testing comparison
    test_value = 5000**2

    start = time.perf_counter()
    found_in_list = test_value in result_list
    list_search_time = time.perf_counter() - start

    start = time.perf_counter()
    found_in_set = test_value in result_set
    set_search_time = time.perf_counter() - start

    # Searching .values() is a linear scan; only key lookups are hashed
    start = time.perf_counter()
    found_in_dict = test_value in result_dict.values()
    dict_search_time = time.perf_counter() - start

    print(f"\nMembership testing:")
    print(f"List search: {list_search_time:.8f} seconds")
    print(f"Set search: {set_search_time:.8f} seconds")
    print(f"Dict values search: {dict_search_time:.8f} seconds")

performance_comparison()
Common Patterns and Best Practices
Data Processing Patterns
Data Transformation Pipeline
# Multi-step data processing using comprehensions
raw_data = [
    " John Doe, 25, Engineer ",
    " Jane Smith, 30, Designer ",
    " Bob Johnson, 35, Manager ",
    " Alice Brown, 28, Developer "
]

# Step 1: Clean and split data
cleaned_data = [
    line.strip().split(', ')
    for line in raw_data
    if line.strip()
]

# Step 2: Convert to dictionaries
people = [
    {
        'name': parts[0],
        'age': int(parts[1]),
        'job': parts[2]
    }
    for parts in cleaned_data
]

# Step 3: Filter and transform
senior_employees = {
    person['name']: person['job']
    for person in people
    if person['age'] >= 30
}

print(senior_employees)
# {'Jane Smith': 'Designer', 'Bob Johnson': 'Manager'}
Configuration Processing
# Processing configuration files
config_lines = [
    "# Database configuration",
    "db_host=localhost",
    "db_port=5432",
    "",
    "# API settings",
    "api_key=secret123",
    "api_timeout=30",
    "debug=true"
]

# Parse configuration
config = {
    key: value
    for line in config_lines
    if '=' in line
    for key, value in [line.split('=', 1)]
}

print(config)
# {'db_host': 'localhost', 'db_port': '5432', 'api_key': 'secret123', 'api_timeout': '30', 'debug': 'true'}
Error Handling in Comprehensions
Limitation: Comprehensions don’t support try/except statements directly. Use helper functions for error handling:
def safe_int_convert(value):
    try:
        return int(value)
    except ValueError:
        return None

# Good: Using helper function
mixed_data = ['123', 'abc', '456', 'def']
numbers = [safe_int_convert(x) for x in mixed_data]
valid_numbers = [x for x in numbers if x is not None]

# Alternative: Filter first, then convert
valid_numbers = [
    int(x) for x in mixed_data
    if x.isdigit()
]
Advanced Comprehension Techniques
# Using the walrus operator (:=) in comprehensions (Python 3.8+)
import re
import time

# Process text and capture match groups
text_lines = [
    "Error: File not found at line 42",
    "Warning: Memory usage at 85%",
    "Error: Connection timeout after 30s",
    "Info: Process completed successfully"
]

# Extract error details using walrus operator
# The separator may be "at" or "after", so both error lines match
error_pattern = r"Error: (.+) (?:at|after) (.+)"
error_details = [
    (match.group(1), match.group(2))
    for line in text_lines
    if (match := re.search(error_pattern, line))
]
print(error_details)
# [('File not found', 'line 42'), ('Connection timeout', '30s')]

# Calculate expensive operations once
def expensive_calculation(x):
    """Simulate an expensive calculation."""
    time.sleep(0.01)  # Simulate processing time
    return x ** 2 + x ** 0.5

numbers = [1, 4, 9, 16, 25]

# Without walrus operator (calls the function twice per kept item)
# results = [expensive_calculation(x) for x in numbers if expensive_calculation(x) > 10]

# With walrus operator (calculates once)
results = [
    calc for x in numbers
    if (calc := expensive_calculation(x)) > 10
]
print(results)  # [18.0, 84.0, 260.0, 630.0]

# Working with complex nested data
data = {
    'users': [
        {
            'id': 1,
            'name': 'Alice',
            'posts': [
                {'title': 'Python Tips', 'tags': ['python', 'programming']},
                {'title': 'Data Science', 'tags': ['data', 'science', 'python']}
            ]
        },
        {
            'id': 2,
            'name': 'Bob',
            'posts': [
                {'title': 'Web Development', 'tags': ['web', 'html', 'css']},
                {'title': 'JavaScript Guide', 'tags': ['javascript', 'web']}
            ]
        }
    ]
}

# Extract all unique tags from all posts
all_tags = {
    tag
    for user in data['users']
    for post in user['posts']
    for tag in post['tags']
}
print(sorted(all_tags))
# ['css', 'data', 'html', 'javascript', 'programming', 'python', 'science', 'web']

# Create user-to-tags mapping
user_tags = {
    user['name']: {
        tag
        for post in user['posts']
        for tag in post['tags']
    }
    for user in data['users']
}
print(user_tags)
# {'Alice': {'python', 'programming', 'data', 'science'}, 'Bob': {'web', 'html', 'css', 'javascript'}}

# Find users who have posted about specific topics
python_users = {
    user['name']
    for user in data['users']
    if any('python' in post['tags'] for post in user['posts'])
}
print(python_users)  # {'Alice'}

# Complex aggregation: count posts per tag
from collections import defaultdict

tag_counts = defaultdict(int)
for user in data['users']:
    for post in user['posts']:
        for tag in post['tags']:
            tag_counts[tag] += 1

# Convert to regular dict using comprehension
tag_frequency = {tag: count for tag, count in tag_counts.items()}
print(tag_frequency)
# {'python': 2, 'programming': 1, 'data': 1, 'science': 1, 'web': 2, 'html': 1, 'css': 1, 'javascript': 1}

# Using generator expressions for memory efficiency
import sys

# Memory-efficient processing of large datasets
def process_large_file(filename):
    """Process a large file line by line."""
    # Simulate file content
    lines = [f"line {i}: some data here" for i in range(1000)]

    # Memory-efficient: generator expression inside sum
    total_length = sum(len(line) for line in lines if 'data' in line)

    # Memory-efficient: process in chunks
    chunk_size = 100
    chunks = [
        lines[i:i + chunk_size]
        for i in range(0, len(lines), chunk_size)
    ]

    # Process each chunk
    chunk_stats = [
        {
            'chunk_id': i,
            'line_count': len(chunk),
            'avg_length': sum(len(line) for line in chunk) / len(chunk)
        }
        for i, chunk in enumerate(chunks)
    ]

    return chunk_stats[:3]  # Return first 3 chunks for demo

# Nested generator expressions
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Sum all elements using nested generator
total = sum(cell for row in matrix for cell in row)
print(f"Matrix sum: {total}")  # 45

# Find maximum in nested structure efficiently
max_value = max(cell for row in matrix for cell in row)
print(f"Maximum value: {max_value}")  # 9

# Memory comparison
def memory_comparison():
    size = 10000

    # List comprehension - creates full list
    list_comp = [x**2 for x in range(size)]
    list_size = sys.getsizeof(list_comp)

    # Generator expression - creates generator object
    gen_expr = (x**2 for x in range(size))
    gen_size = sys.getsizeof(gen_expr)

    print(f"List comprehension size: {list_size} bytes")
    print(f"Generator expression size: {gen_size} bytes")
    print(f"Memory savings: {list_size / gen_size:.1f}x")

process_result = process_large_file("dummy.txt")
print(f"Processed chunks: {len(process_result)}")
memory_comparison()
Common Pitfalls and Solutions
Common Mistakes:
- Modifying lists during iteration:
# Wrong: adding or removing items of `numbers` while looping over it
numbers = [1, 2, 3, 4, 5]
# Mutating a list mid-iteration can skip elements, so don't do it

# Right: Create a new list
even_numbers = [x for x in numbers if x % 2 == 0]
- Complex logic in comprehensions:
# Wrong: Too complex for a comprehension
complex_result = [
    x if x > 0 else -x if x < -10 else 0
    for x in numbers
    if some_complex_condition(x) and another_condition(x)
]

# Right: Use a helper function
def process_number(x):
    if not (some_complex_condition(x) and another_condition(x)):
        return None
    return x if x > 0 else -x if x < -10 else 0

simple_result = [
    result for x in numbers
    if (result := process_number(x)) is not None
]
- Side effects in comprehensions:
# Wrong: Side effects make code unclear
results = [print(x) or x**2 for x in numbers]  # Don't do this

# Right: Separate side effects from data creation
for x in numbers:
    print(x)
results = [x**2 for x in numbers]
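To make the first pitfall in this list concrete, here is a small sketch of what goes wrong when a list is mutated while it is being looped over, next to the comprehension that avoids the problem. The sample list and the names `broken` and `odd_numbers` are invented for illustration.

```python
# Hypothetical sample data
numbers = [1, 2, 2, 3, 4, 4, 5]

# Wrong: removing items while iterating shifts later elements left,
# so the loop skips the item that follows each removal.
broken = list(numbers)
for x in broken:
    if x % 2 == 0:
        broken.remove(x)
print(broken)  # [1, 2, 3, 4, 5] - some even numbers survive

# Right: build a new list and leave the original untouched
odd_numbers = [x for x in numbers if x % 2 != 0]
print(odd_numbers)  # [1, 3, 5]
```

The comprehension reads from `numbers` and writes to a separate list, so the iteration never sees a list that is changing underneath it.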
Summary
Python comprehensions are powerful tools that enable:
- Concise, readable code that expresses intent clearly
- Often better performance than equivalent loops that call append() repeatedly
- Functional programming patterns in Python
- Memory-efficient data processing with generator expressions
When to Use Comprehensions
✅ Use comprehensions for:
- Simple transformations and filtering
- Creating collections from existing iterables
- Data cleaning and normalization
- Mathematical operations on sequences
❌ Avoid comprehensions for:
- Complex logic that hurts readability
- Operations with side effects
- Very long or deeply nested expressions
- Cases where traditional loops are clearer
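As an illustration of the last two points, the sketch below computes the same result twice: once as a dict comprehension with a nested generator expression, and once as a plain loop with named intermediate values. The `orders` structure is invented for this example. Which version is clearer is a judgment call, but the loop leaves room for comments and for inspecting values while debugging.

```python
# Hypothetical order data, used only for illustration
orders = [
    {"customer": "alice", "items": [("pen", 2, 1.50), ("pad", 1, 4.00)]},
    {"customer": "bob", "items": []},
    {"customer": "carol", "items": [("ink", 3, 6.00)]},
]

# Dense: a generator expression nested inside a dict comprehension
totals_dense = {
    o["customer"]: sum(qty * price for _, qty, price in o["items"])
    for o in orders if o["items"]
}

# Clearer: the same logic as an explicit loop
totals_loop = {}
for order in orders:
    if not order["items"]:
        continue  # skip customers with empty orders
    order_total = 0.0
    for _name, qty, price in order["items"]:
        order_total += qty * price
    totals_loop[order["customer"]] = order_total

print(totals_loop)  # {'alice': 7.0, 'carol': 18.0}
```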
Master comprehensions to write more Pythonic, efficient, and elegant code!