Aggregation and Grouping

Aggregation and Grouping

Here’s where SQL gets really powerful. You’ve learned to find individual rows with SELECT and WHERE. Now let’s learn to summarize entire datasets. Want to know how many employees you have? What your average salary is? Which department spends the most? That’s aggregation, and it’s going to change how you think about data.

The Five Essential Aggregation Functions

SQL gives you five workhorses for summarizing data:

COUNT - How Many?

1SELECT COUNT(*) FROM employees;

This tells you how many rows are in the employees table. COUNT(*) counts everything, including rows with NULL values.

1SELECT COUNT(middle_name) FROM employees;

This counts only rows where middle_name is NOT NULL. Big difference.

SUM - Add Them Up

1SELECT SUM(salary) FROM employees;

Total payroll across all employees. SUM ignores NULL values automatically.

AVG - What’s the Average?

1SELECT AVG(salary) FROM employees;

Average salary. Again, NULL values are ignored in the calculation.

MIN and MAX - Extremes

1SELECT MIN(salary), MAX(salary) FROM employees;

Lowest and highest salaries. You can combine multiple aggregations in one query.

GROUP BY: The Game Changer

Here’s where things get interesting. Aggregation functions alone give you one number for your entire table. GROUP BY lets you get one number for each group within your table.

1SELECT department, COUNT(*)
2FROM employees
3GROUP BY department;

This gives you something like:

| department  | count |
|-------------|-------|
| IT          | 15    |
| Sales       | 23    |
| Marketing   | 8     |
| Engineering | 12    |

You’re essentially saying: “Group all employees by department, then count how many are in each group.”

More GROUP BY Examples

Average salary by department:

1SELECT department, AVG(salary) as avg_salary
2FROM employees
3GROUP BY department;

Total sales by year:

1SELECT YEAR(sale_date) as year, SUM(amount) as total_sales
2FROM sales
3GROUP BY YEAR(sale_date);

Employee count and average salary by department:

1SELECT department, 
2       COUNT(*) as employee_count,
3       AVG(salary) as avg_salary
4FROM employees
5GROUP BY department;

The Golden Rule of GROUP BY

If you use GROUP BY, every column in your SELECT must either be:

  1. In the GROUP BY clause, or
  2. Inside an aggregation function
 1-- This works:
 2SELECT department, COUNT(*)
 3FROM employees
 4GROUP BY department;
 5
 6-- This breaks:
 7SELECT department, first_name, COUNT(*)
 8FROM employees
 9GROUP BY department;
10-- Error! first_name isn't grouped or aggregated

Why? Because if you’re grouping by department, which first_name would SQL show for the IT department? There are 15 people in IT - it can’t pick just one name.

HAVING vs. WHERE: Filtering Groups vs. Filtering Rows

This trips up everyone at first. WHERE filters rows before grouping. HAVING filters groups after grouping.

 1-- WHERE: Filter employees first, then group
 2SELECT department, AVG(salary)
 3FROM employees
 4WHERE hire_date > '2020-01-01'  -- Only recent hires
 5GROUP BY department;
 6
 7-- HAVING: Group first, then filter groups
 8SELECT department, AVG(salary)
 9FROM employees
10GROUP BY department
11HAVING AVG(salary) > 60000;  -- Only departments with high avg salary

You can use both:

1SELECT department, AVG(salary) as avg_salary
2FROM employees
3WHERE status = 'Active'  -- Only active employees
4GROUP BY department
5HAVING AVG(salary) > 50000  -- Only departments averaging over 50k
6ORDER BY avg_salary DESC;

Common Grouping Patterns

Top spenders by category:

1SELECT category, SUM(amount) as total_spent
2FROM purchases
3GROUP BY category
4ORDER BY total_spent DESC
5LIMIT 5;

Monthly sales trends:

1SELECT YEAR(order_date) as year, 
2       MONTH(order_date) as month,
3       COUNT(*) as order_count,
4       SUM(total) as monthly_revenue
5FROM orders
6GROUP BY YEAR(order_date), MONTH(order_date)
7ORDER BY year, month;

Performance by employee:

1SELECT employee_id, 
2       COUNT(*) as sales_count,
3       AVG(sale_amount) as avg_sale,
4       SUM(sale_amount) as total_sales
5FROM sales
6WHERE sale_date >= '2024-01-01'
7GROUP BY employee_id
8HAVING COUNT(*) >= 10  -- Only employees with 10+ sales
9ORDER BY total_sales DESC;

Real-World Example: Sales Analysis

Let’s say you’re analyzing sales performance:

 1SELECT 
 2    region,
 3    COUNT(*) as num_sales,
 4    SUM(amount) as total_revenue,
 5    AVG(amount) as avg_sale_size,
 6    MIN(amount) as smallest_sale,
 7    MAX(amount) as largest_sale
 8FROM sales
 9WHERE sale_date BETWEEN '2024-01-01' AND '2024-12-31'
10GROUP BY region
11HAVING COUNT(*) > 100  -- Regions with substantial activity
12ORDER BY total_revenue DESC;

This gives you a complete regional performance breakdown in one query.

Common Mistakes

Forgetting to group by non-aggregated columns:

1-- Wrong:
2SELECT department, first_name, COUNT(*)
3FROM employees
4GROUP BY department;
5
6-- Right:
7SELECT department, COUNT(*)
8FROM employees
9GROUP BY department;

Using WHERE instead of HAVING for group conditions:

 1-- Wrong:
 2SELECT department, AVG(salary)
 3FROM employees
 4WHERE AVG(salary) > 50000  -- Can't use aggregation in WHERE
 5GROUP BY department;
 6
 7-- Right:
 8SELECT department, AVG(salary)
 9FROM employees
10GROUP BY department
11HAVING AVG(salary) > 50000;

Counting wrong:

1-- COUNT(*) counts all rows
2-- COUNT(column) counts non-NULL values in that column
3-- They're different if your column has NULLs

Why This Matters

Aggregation and grouping transform you from someone who looks at individual records to someone who understands data patterns. Instead of scrolling through thousands of employee records, you can instantly see:

  • Which departments are growing or shrinking
  • Where your revenue is coming from
  • What your key performance metrics look like
  • How different segments of your business are performing

This is the difference between data and information. Raw data tells you John Smith makes $65,000. Aggregated data tells you your engineering department’s average salary increased 8% this year.

Practice Ideas

  1. Find the total number of employees in each department
  2. Calculate average, minimum, and maximum salaries by job title
  3. Count how many orders each customer has placed
  4. Find departments with more than 10 employees
  5. Calculate monthly revenue trends for the past year

Master these patterns, and you’ll be able to answer most business questions that come your way. Every dashboard, every report, every “how are we doing?” question - they all come down to aggregation and grouping.