Aggregation and Grouping
Here’s where SQL gets really powerful. You’ve learned to find individual rows with SELECT and WHERE. Now let’s learn to summarize entire datasets. Want to know how many employees you have? What your average salary is? Which department spends the most? That’s aggregation, and it’s going to change how you think about data.
The Five Essential Aggregation Functions
SQL gives you five workhorses for summarizing data:
COUNT - How Many?
1SELECT COUNT(*) FROM employees;
This tells you how many rows are in the employees table. COUNT(*) counts everything, including rows with NULL values.
1SELECT COUNT(middle_name) FROM employees;
This counts only rows where middle_name is NOT NULL. Big difference.
SUM - Add Them Up
1SELECT SUM(salary) FROM employees;
Total payroll across all employees. SUM ignores NULL values automatically.
AVG - What’s the Average?
1SELECT AVG(salary) FROM employees;
Average salary. Again, NULL values are ignored in the calculation.
MIN and MAX - Extremes
1SELECT MIN(salary), MAX(salary) FROM employees;
Lowest and highest salaries. You can combine multiple aggregations in one query.
GROUP BY: The Game Changer
Here’s where things get interesting. Aggregation functions alone give you one number for your entire table. GROUP BY lets you get one number for each group within your table.
1SELECT department, COUNT(*)
2FROM employees
3GROUP BY department;
This gives you something like:
| department | count |
|-------------|-------|
| IT | 15 |
| Sales | 23 |
| Marketing | 8 |
| Engineering | 12 |
You’re essentially saying: “Group all employees by department, then count how many are in each group.”
More GROUP BY Examples
Average salary by department:
1SELECT department, AVG(salary) as avg_salary
2FROM employees
3GROUP BY department;
Total sales by year:
1SELECT YEAR(sale_date) as year, SUM(amount) as total_sales
2FROM sales
3GROUP BY YEAR(sale_date);
Employee count and average salary by department:
1SELECT department,
2 COUNT(*) as employee_count,
3 AVG(salary) as avg_salary
4FROM employees
5GROUP BY department;
The Golden Rule of GROUP BY
If you use GROUP BY, every column in your SELECT must either be:
- In the GROUP BY clause, or
- Inside an aggregation function
1-- This works:
2SELECT department, COUNT(*)
3FROM employees
4GROUP BY department;
5
6-- This breaks:
7SELECT department, first_name, COUNT(*)
8FROM employees
9GROUP BY department;
10-- Error! first_name isn't grouped or aggregated
Why? Because if you’re grouping by department, which first_name would SQL show for the IT department? There are 15 people in IT - it can’t pick just one name.
HAVING vs. WHERE: Filtering Groups vs. Filtering Rows
This trips up everyone at first. WHERE filters rows before grouping. HAVING filters groups after grouping.
1-- WHERE: Filter employees first, then group
2SELECT department, AVG(salary)
3FROM employees
4WHERE hire_date > '2020-01-01' -- Only recent hires
5GROUP BY department;
6
7-- HAVING: Group first, then filter groups
8SELECT department, AVG(salary)
9FROM employees
10GROUP BY department
11HAVING AVG(salary) > 60000; -- Only departments with high avg salary
You can use both:
1SELECT department, AVG(salary) as avg_salary
2FROM employees
3WHERE status = 'Active' -- Only active employees
4GROUP BY department
5HAVING AVG(salary) > 50000 -- Only departments averaging over 50k
6ORDER BY avg_salary DESC;
Common Grouping Patterns
Top spenders by category:
1SELECT category, SUM(amount) as total_spent
2FROM purchases
3GROUP BY category
4ORDER BY total_spent DESC
5LIMIT 5;
Monthly sales trends:
1SELECT YEAR(order_date) as year,
2 MONTH(order_date) as month,
3 COUNT(*) as order_count,
4 SUM(total) as monthly_revenue
5FROM orders
6GROUP BY YEAR(order_date), MONTH(order_date)
7ORDER BY year, month;
Performance by employee:
1SELECT employee_id,
2 COUNT(*) as sales_count,
3 AVG(sale_amount) as avg_sale,
4 SUM(sale_amount) as total_sales
5FROM sales
6WHERE sale_date >= '2024-01-01'
7GROUP BY employee_id
8HAVING COUNT(*) >= 10 -- Only employees with 10+ sales
9ORDER BY total_sales DESC;
Real-World Example: Sales Analysis
Let’s say you’re analyzing sales performance:
1SELECT
2 region,
3 COUNT(*) as num_sales,
4 SUM(amount) as total_revenue,
5 AVG(amount) as avg_sale_size,
6 MIN(amount) as smallest_sale,
7 MAX(amount) as largest_sale
8FROM sales
9WHERE sale_date BETWEEN '2024-01-01' AND '2024-12-31'
10GROUP BY region
11HAVING COUNT(*) > 100 -- Regions with substantial activity
12ORDER BY total_revenue DESC;
This gives you a complete regional performance breakdown in one query.
Common Mistakes
Forgetting to group by non-aggregated columns:
1-- Wrong:
2SELECT department, first_name, COUNT(*)
3FROM employees
4GROUP BY department;
5
6-- Right:
7SELECT department, COUNT(*)
8FROM employees
9GROUP BY department;
Using WHERE instead of HAVING for group conditions:
1-- Wrong:
2SELECT department, AVG(salary)
3FROM employees
4WHERE AVG(salary) > 50000 -- Can't use aggregation in WHERE
5GROUP BY department;
6
7-- Right:
8SELECT department, AVG(salary)
9FROM employees
10GROUP BY department
11HAVING AVG(salary) > 50000;
Counting wrong:
1-- COUNT(*) counts all rows
2-- COUNT(column) counts non-NULL values in that column
3-- They're different if your column has NULLs
Why This Matters
Aggregation and grouping transform you from someone who looks at individual records to someone who understands data patterns. Instead of scrolling through thousands of employee records, you can instantly see:
- Which departments are growing or shrinking
- Where your revenue is coming from
- What your key performance metrics look like
- How different segments of your business are performing
This is the difference between data and information. Raw data tells you John Smith makes $65,000. Aggregated data tells you your engineering department’s average salary increased 8% this year.
Practice Ideas
- Find the total number of employees in each department
- Calculate average, minimum, and maximum salaries by job title
- Count how many orders each customer has placed
- Find departments with more than 10 employees
- Calculate monthly revenue trends for the past year
Master these patterns, and you’ll be able to answer most business questions that come your way. Every dashboard, every report, every “how are we doing?” question - they all come down to aggregation and grouping.