Advanced Queries
Master JOINs, subqueries, window functions, and complex SQL operations
Table Relationships and JOINs
JOINs are used to combine data from multiple tables based on relationships between them. Understanding JOINs is crucial for working with relational databases effectively.
Before diving into JOINs, make sure you understand primary keys and foreign keys, as they form the foundation of table relationships.
Sample Data Structure
Let's work with these example tables throughout this guide:
-- Users table
CREATE TABLE users (
user_id INT PRIMARY KEY,
username VARCHAR(50),
email VARCHAR(100),
city VARCHAR(50)
);
-- Orders table
CREATE TABLE orders (
order_id INT PRIMARY KEY,
user_id INT,
order_date DATE,
total_amount DECIMAL(10,2),
FOREIGN KEY (user_id) REFERENCES users(user_id)
);
-- Products table
CREATE TABLE products (
product_id INT PRIMARY KEY,
product_name VARCHAR(100),
price DECIMAL(10,2),
category VARCHAR(50)
);
-- Order items table (many-to-many relationship)
CREATE TABLE order_items (
order_id INT,
product_id INT,
quantity INT,
unit_price DECIMAL(10,2),
PRIMARY KEY (order_id, product_id),
FOREIGN KEY (order_id) REFERENCES orders(order_id),
FOREIGN KEY (product_id) REFERENCES products(product_id)
);Types of JOINs
INNER JOIN
Returns only rows that have matching values in both tables.
-- Get users with their orders
SELECT u.username, u.email, o.order_date, o.total_amount
FROM users u
INNER JOIN orders o ON u.user_id = o.user_id;
-- Multiple table INNER JOIN
SELECT u.username, p.product_name, oi.quantity, oi.unit_price
FROM users u
INNER JOIN orders o ON u.user_id = o.user_id
INNER JOIN order_items oi ON o.order_id = oi.order_id
INNER JOIN products p ON oi.product_id = p.product_id;LEFT JOIN (LEFT OUTER JOIN)
Returns all rows from the left table and matching rows from the right table. If no match, NULL values for right table columns.
-- Get all users, including those without orders
SELECT u.username, u.email, o.order_date, o.total_amount
FROM users u
LEFT JOIN orders o ON u.user_id = o.user_id;
-- Count orders per user (including users with 0 orders)
SELECT u.username, COUNT(o.order_id) as order_count
FROM users u
LEFT JOIN orders o ON u.user_id = o.user_id
GROUP BY u.user_id, u.username;RIGHT JOIN (RIGHT OUTER JOIN)
Returns all rows from the right table and matching rows from the left table.
-- Get all orders, even if user data is missing
SELECT u.username, o.order_date, o.total_amount
FROM users u
RIGHT JOIN orders o ON u.user_id = o.user_id;FULL OUTER JOIN
Returns all rows when there's a match in either table. Not supported by all databases (e.g., MySQL).
-- Get all users and all orders (PostgreSQL, SQL Server)
SELECT u.username, o.order_date, o.total_amount
FROM users u
FULL OUTER JOIN orders o ON u.user_id = o.user_id;
-- MySQL alternative using UNION
SELECT u.username, o.order_date, o.total_amount
FROM users u
LEFT JOIN orders o ON u.user_id = o.user_id
UNION
SELECT u.username, o.order_date, o.total_amount
FROM users u
RIGHT JOIN orders o ON u.user_id = o.user_id;CROSS JOIN
Returns the Cartesian product of both tables (every row from first table combined with every row from second table).
-- Generate all possible user-product combinations
SELECT u.username, p.product_name
FROM users u
CROSS JOIN products p;
-- Useful for generating test data or reports
SELECT u.username, p.category
FROM users u
CROSS JOIN (SELECT DISTINCT category FROM products) p;Subqueries
Subqueries are queries nested inside other queries. They can be used in SELECT, FROM, WHERE, and HAVING clauses.
Subqueries in WHERE Clause
-- Find users who have placed orders
SELECT username, email
FROM users
WHERE user_id IN (
SELECT DISTINCT user_id
FROM orders
);
-- Find users with above-average order totals
SELECT username, email
FROM users
WHERE user_id IN (
SELECT user_id
FROM orders
WHERE total_amount > (
SELECT AVG(total_amount) FROM orders
)
);
-- Find products never ordered
SELECT product_name, price
FROM products
WHERE product_id NOT IN (
SELECT DISTINCT product_id
FROM order_items
WHERE product_id IS NOT NULL
);Subqueries in SELECT Clause
-- Get user info with order count
SELECT
username,
email,
(SELECT COUNT(*) FROM orders o WHERE o.user_id = u.user_id) as order_count,
(SELECT MAX(total_amount) FROM orders o WHERE o.user_id = u.user_id) as max_order_amount
FROM users u;Subqueries in FROM Clause (Derived Tables)
-- Use subquery as a temporary table
SELECT category, avg_price, product_count
FROM (
SELECT
category,
AVG(price) as avg_price,
COUNT(*) as product_count
FROM products
GROUP BY category
) as category_stats
WHERE avg_price > 50;Correlated Subqueries
Subqueries that reference columns from the outer query.
-- Find users whose latest order is above average
SELECT username, email
FROM users u
WHERE (
SELECT MAX(total_amount)
FROM orders o
WHERE o.user_id = u.user_id
) > (
SELECT AVG(total_amount) FROM orders
);
-- Find products with above-average price in their category
SELECT product_name, price, category
FROM products p1
WHERE price > (
SELECT AVG(price)
FROM products p2
WHERE p2.category = p1.category
);Common Table Expressions (CTEs)
CTEs provide a way to write more readable queries by defining temporary result sets.
-- Basic CTE
WITH user_order_stats AS (
SELECT
u.user_id,
u.username,
COUNT(o.order_id) as order_count,
COALESCE(SUM(o.total_amount), 0) as total_spent
FROM users u
LEFT JOIN orders o ON u.user_id = o.user_id
GROUP BY u.user_id, u.username
)
SELECT username, order_count, total_spent
FROM user_order_stats
WHERE total_spent > 100;
-- Multiple CTEs
WITH monthly_sales AS (
SELECT
DATE_FORMAT(order_date, '%Y-%m') as month,
SUM(total_amount) as monthly_total
FROM orders
GROUP BY DATE_FORMAT(order_date, '%Y-%m')
),
avg_monthly_sales AS (
SELECT AVG(monthly_total) as avg_monthly
FROM monthly_sales
)
SELECT m.month, m.monthly_total, a.avg_monthly
FROM monthly_sales m
CROSS JOIN avg_monthly_sales a
WHERE m.monthly_total > a.avg_monthly;Window Functions
Window functions perform calculations across related rows without grouping them into a single output row.
ROW_NUMBER, RANK, and DENSE_RANK
-- Number orders by date for each user
SELECT
username,
order_date,
total_amount,
ROW_NUMBER() OVER (PARTITION BY u.user_id ORDER BY order_date) as order_sequence,
RANK() OVER (ORDER BY total_amount DESC) as amount_rank,
DENSE_RANK() OVER (ORDER BY total_amount DESC) as dense_amount_rank
FROM orders o
JOIN users u ON o.user_id = u.user_id;Aggregate Window Functions
-- Running total of orders
SELECT
order_date,
total_amount,
SUM(total_amount) OVER (ORDER BY order_date) as running_total,
AVG(total_amount) OVER (ORDER BY order_date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) as moving_avg_3
FROM orders
ORDER BY order_date;
-- Compare each order to user's average
SELECT
u.username,
o.order_date,
o.total_amount,
AVG(o.total_amount) OVER (PARTITION BY u.user_id) as user_avg_order,
o.total_amount - AVG(o.total_amount) OVER (PARTITION BY u.user_id) as diff_from_avg
FROM orders o
JOIN users u ON o.user_id = u.user_id;LAG and LEAD Functions
-- Compare current order with previous order
SELECT
u.username,
o.order_date,
o.total_amount,
LAG(o.total_amount) OVER (PARTITION BY u.user_id ORDER BY o.order_date) as prev_order_amount,
LEAD(o.order_date) OVER (PARTITION BY u.user_id ORDER BY o.order_date) as next_order_date
FROM orders o
JOIN users u ON o.user_id = u.user_id;Advanced Query Patterns
CASE Statements
-- Categorize users by order activity
SELECT
username,
email,
COUNT(o.order_id) as order_count,
CASE
WHEN COUNT(o.order_id) = 0 THEN 'New'
WHEN COUNT(o.order_id) BETWEEN 1 AND 3 THEN 'Regular'
WHEN COUNT(o.order_id) BETWEEN 4 AND 10 THEN 'Frequent'
ELSE 'VIP'
END as customer_category
FROM users u
LEFT JOIN orders o ON u.user_id = o.user_id
GROUP BY u.user_id, u.username, u.email;
-- Conditional aggregation
SELECT
category,
COUNT(*) as total_products,
COUNT(CASE WHEN price > 100 THEN 1 END) as expensive_products,
COUNT(CASE WHEN price <= 20 THEN 1 END) as budget_products
FROM products
GROUP BY category;UNION and UNION ALL
-- Combine results from multiple queries
SELECT 'High Value' as customer_type, username, email
FROM users u
WHERE user_id IN (
SELECT user_id FROM orders GROUP BY user_id HAVING SUM(total_amount) > 500
)
UNION
SELECT 'Recent' as customer_type, username, email
FROM users u
WHERE user_id IN (
SELECT user_id FROM orders WHERE order_date >= DATE_SUB(CURDATE(), INTERVAL 30 DAY)
);
-- UNION ALL keeps duplicates (faster)
SELECT product_name, 'Expensive' as price_category FROM products WHERE price > 100
UNION ALL
SELECT product_name, 'Budget' as price_category FROM products WHERE price <= 20;EXISTS and NOT EXISTS
-- Find users who have placed orders (more efficient than IN for large datasets)
SELECT username, email
FROM users u
WHERE EXISTS (
SELECT 1 FROM orders o WHERE o.user_id = u.user_id
);
-- Find products that have never been ordered
SELECT product_name, price
FROM products p
WHERE NOT EXISTS (
SELECT 1 FROM order_items oi WHERE oi.product_id = p.product_id
);Query Optimization Tips
Using Indexes Effectively
-- Good: Uses index on order_date
SELECT * FROM orders WHERE order_date >= '2024-01-01';
-- Bad: Function on column prevents index usage
SELECT * FROM orders WHERE YEAR(order_date) = 2024;
-- Better: Rewrite to use index
SELECT * FROM orders WHERE order_date >= '2024-01-01' AND order_date < '2025-01-01';Avoiding Common Performance Issues
Performance Tips
- Use LIMIT to restrict large result sets
- Be careful with functions in WHERE clauses - they can prevent index usage
- Consider using EXISTS instead of IN for subqueries with large result sets
- Use appropriate JOINs instead of subqueries when possible
-- Use LIMIT for large datasets
SELECT * FROM orders ORDER BY order_date DESC LIMIT 100;
-- Use EXISTS instead of IN for better performance
SELECT username FROM users u
WHERE EXISTS (SELECT 1 FROM orders o WHERE o.user_id = u.user_id);
-- Use specific columns instead of SELECT *
SELECT username, email FROM users WHERE city = 'New York';Real-World Query Examples
Sales Report Query
WITH monthly_stats AS (
SELECT
DATE_FORMAT(o.order_date, '%Y-%m') as month,
COUNT(DISTINCT o.order_id) as total_orders,
COUNT(DISTINCT o.user_id) as unique_customers,
SUM(o.total_amount) as total_revenue,
AVG(o.total_amount) as avg_order_value
FROM orders o
GROUP BY DATE_FORMAT(o.order_date, '%Y-%m')
),
previous_month_stats AS (
SELECT
month,
LAG(total_revenue) OVER (ORDER BY month) as prev_month_revenue,
total_revenue
FROM monthly_stats
)
SELECT
ms.month,
ms.total_orders,
ms.unique_customers,
ms.total_revenue,
ms.avg_order_value,
CASE
WHEN pms.prev_month_revenue IS NOT NULL
THEN ROUND(((ms.total_revenue - pms.prev_month_revenue) / pms.prev_month_revenue) * 100, 2)
ELSE NULL
END as revenue_growth_percent
FROM monthly_stats ms
LEFT JOIN previous_month_stats pms ON ms.month = pms.month
ORDER BY ms.month;Customer Segmentation Query
WITH customer_metrics AS (
SELECT
u.user_id,
u.username,
u.email,
COUNT(o.order_id) as total_orders,
COALESCE(SUM(o.total_amount), 0) as lifetime_value,
COALESCE(AVG(o.total_amount), 0) as avg_order_value,
MAX(o.order_date) as last_order_date,
DATEDIFF(CURDATE(), MAX(o.order_date)) as days_since_last_order
FROM users u
LEFT JOIN orders o ON u.user_id = o.user_id
GROUP BY u.user_id, u.username, u.email
),
customer_segments AS (
SELECT
*,
CASE
WHEN total_orders = 0 THEN 'New Customer'
WHEN days_since_last_order > 365 THEN 'Inactive'
WHEN days_since_last_order > 90 THEN 'At Risk'
WHEN lifetime_value > 1000 AND total_orders >= 5 THEN 'VIP'
WHEN total_orders >= 3 THEN 'Loyal'
ELSE 'Regular'
END as customer_segment,
NTILE(5) OVER (ORDER BY lifetime_value DESC) as value_quintile
FROM customer_metrics
)
SELECT
customer_segment,
COUNT(*) as customer_count,
AVG(total_orders) as avg_orders_per_customer,
AVG(lifetime_value) as avg_lifetime_value,
AVG(avg_order_value) as avg_order_value
FROM customer_segments
GROUP BY customer_segment
ORDER BY avg_lifetime_value DESC;Product Performance Analysis
WITH product_performance AS (
SELECT
p.product_id,
p.product_name,
p.category,
p.price,
COALESCE(SUM(oi.quantity), 0) as total_quantity_sold,
COALESCE(SUM(oi.quantity * oi.unit_price), 0) as total_revenue,
COUNT(DISTINCT oi.order_id) as times_ordered,
AVG(oi.unit_price) as avg_selling_price
FROM products p
LEFT JOIN order_items oi ON p.product_id = oi.product_id
GROUP BY p.product_id, p.product_name, p.category, p.price
),
category_stats AS (
SELECT
category,
AVG(total_revenue) as category_avg_revenue,
AVG(total_quantity_sold) as category_avg_quantity
FROM product_performance
GROUP BY category
)
SELECT
pp.product_name,
pp.category,
pp.price,
pp.total_quantity_sold,
pp.total_revenue,
pp.times_ordered,
ROUND(pp.avg_selling_price, 2) as avg_selling_price,
CASE
WHEN pp.total_revenue > cs.category_avg_revenue THEN 'Above Average'
WHEN pp.total_revenue = 0 THEN 'Not Sold'
ELSE 'Below Average'
END as performance_category,
RANK() OVER (PARTITION BY pp.category ORDER BY pp.total_revenue DESC) as category_revenue_rank
FROM product_performance pp
JOIN category_stats cs ON pp.category = cs.category
ORDER BY pp.total_revenue DESC;Advanced JOIN Patterns
Self-Joins
-- Find users from the same city
SELECT
u1.username as user1,
u2.username as user2,
u1.city
FROM users u1
JOIN users u2 ON u1.city = u2.city AND u1.user_id < u2.user_id
ORDER BY u1.city, u1.username;
-- Compare each user's orders to other users' orders
SELECT
u1.username as customer,
u2.username as comparison_customer,
u1.total_spent,
u2.total_spent,
u1.total_spent - u2.total_spent as spending_difference
FROM (
SELECT u.user_id, u.username, COALESCE(SUM(o.total_amount), 0) as total_spent
FROM users u
LEFT JOIN orders o ON u.user_id = o.user_id
GROUP BY u.user_id, u.username
) u1
JOIN (
SELECT u.user_id, u.username, COALESCE(SUM(o.total_amount), 0) as total_spent
FROM users u
LEFT JOIN orders o ON u.user_id = o.user_id
GROUP BY u.user_id, u.username
) u2 ON u1.user_id != u2.user_id
WHERE u1.total_spent > u2.total_spent;Multiple Table Complex Joins
-- Complete order details with customer and product information
SELECT
o.order_id,
o.order_date,
u.username as customer_name,
u.email as customer_email,
u.city as customer_city,
p.product_name,
p.category as product_category,
oi.quantity,
oi.unit_price,
(oi.quantity * oi.unit_price) as line_total,
o.total_amount as order_total,
-- Calculate what percentage of order this line item represents
ROUND(
((oi.quantity * oi.unit_price) / o.total_amount) * 100,
2
) as percentage_of_order
FROM orders o
JOIN users u ON o.user_id = u.user_id
JOIN order_items oi ON o.order_id = oi.order_id
JOIN products p ON oi.product_id = p.product_id
WHERE o.order_date >= DATE_SUB(CURDATE(), INTERVAL 90 DAY)
ORDER BY o.order_date DESC, o.order_id, line_total DESC;Recursive Queries (CTEs)
Some databases support recursive CTEs for hierarchical data:
-- Example: Employee hierarchy (if you had an employees table with manager_id)
WITH RECURSIVE employee_hierarchy AS (
-- Base case: Top-level managers
SELECT
employee_id,
first_name,
last_name,
manager_id,
0 as level,
CAST(first_name + ' ' + last_name AS VARCHAR(1000)) as hierarchy_path
FROM employees
WHERE manager_id IS NULL
UNION ALL
-- Recursive case: Employees with managers
SELECT
e.employee_id,
e.first_name,
e.last_name,
e.manager_id,
eh.level + 1,
CAST(eh.hierarchy_path + ' -> ' + e.first_name + ' ' + e.last_name AS VARCHAR(1000))
FROM employees e
JOIN employee_hierarchy eh ON e.manager_id = eh.employee_id
)
SELECT
REPLICATE(' ', level) + first_name + ' ' + last_name as indented_name,
level,
hierarchy_path
FROM employee_hierarchy
ORDER BY hierarchy_path;Query Debugging and Analysis
Using EXPLAIN
-- Analyze query performance
EXPLAIN SELECT
u.username,
COUNT(o.order_id) as order_count
FROM users u
LEFT JOIN orders o ON u.user_id = o.user_id
GROUP BY u.user_id, u.username
HAVING COUNT(o.order_id) > 2;
-- With more detailed analysis (MySQL)
EXPLAIN FORMAT=JSON
SELECT * FROM orders WHERE order_date >= '2024-01-01';Query Performance Monitoring
Performance Monitoring
Always monitor query performance in production. Look for:
- Long-running queries (>1 second for OLTP)
- High CPU usage queries
- Queries that scan large numbers of rows
- Missing index warnings
Next Steps
You've now mastered advanced SQL query techniques including JOINs, subqueries, window functions, and complex analytical queries. These skills will help you handle sophisticated data analysis tasks and optimize query performance.
Last updated on