Advanced Queries

Master JOINs, subqueries, window functions, and complex SQL operations

Table Relationships and JOINs

JOINs are used to combine data from multiple tables based on relationships between them. Understanding JOINs is crucial for working with relational databases effectively.

Before diving into JOINs, make sure you understand primary keys and foreign keys, as they form the foundation of table relationships.

Sample Data Structure

Let's work with these example tables throughout this guide:

Sample Tables
-- Users table
CREATE TABLE users (
    user_id INT PRIMARY KEY,
    username VARCHAR(50),
    email VARCHAR(100),
    city VARCHAR(50)
);

-- Orders table  
CREATE TABLE orders (
    order_id INT PRIMARY KEY,
    user_id INT,
    order_date DATE,
    total_amount DECIMAL(10,2),
    FOREIGN KEY (user_id) REFERENCES users(user_id)
);

-- Products table
CREATE TABLE products (
    product_id INT PRIMARY KEY,
    product_name VARCHAR(100),
    price DECIMAL(10,2),
    category VARCHAR(50)
);

-- Order items table (many-to-many relationship)
CREATE TABLE order_items (
    order_id INT,
    product_id INT,
    quantity INT,
    unit_price DECIMAL(10,2),
    PRIMARY KEY (order_id, product_id),
    FOREIGN KEY (order_id) REFERENCES orders(order_id),
    FOREIGN KEY (product_id) REFERENCES products(product_id)
);

Types of JOINs

INNER JOIN

Returns only rows that have matching values in both tables.

INNER JOIN Examples
-- Get users with their orders
SELECT u.username, u.email, o.order_date, o.total_amount
FROM users u
INNER JOIN orders o ON u.user_id = o.user_id;

-- Multiple table INNER JOIN
SELECT u.username, p.product_name, oi.quantity, oi.unit_price
FROM users u
INNER JOIN orders o ON u.user_id = o.user_id
INNER JOIN order_items oi ON o.order_id = oi.order_id
INNER JOIN products p ON oi.product_id = p.product_id;

LEFT JOIN (LEFT OUTER JOIN)

Returns all rows from the left table and matching rows from the right table. If no match, NULL values for right table columns.

LEFT JOIN Examples
-- Get all users, including those without orders
SELECT u.username, u.email, o.order_date, o.total_amount
FROM users u
LEFT JOIN orders o ON u.user_id = o.user_id;

-- Count orders per user (including users with 0 orders)
SELECT u.username, COUNT(o.order_id) as order_count
FROM users u
LEFT JOIN orders o ON u.user_id = o.user_id
GROUP BY u.user_id, u.username;

RIGHT JOIN (RIGHT OUTER JOIN)

Returns all rows from the right table and matching rows from the left table.

RIGHT JOIN Example
-- Get all orders, even if user data is missing
SELECT u.username, o.order_date, o.total_amount
FROM users u
RIGHT JOIN orders o ON u.user_id = o.user_id;

FULL OUTER JOIN

Returns all rows when there's a match in either table. Not supported by all databases (e.g., MySQL).

FULL OUTER JOIN Example
-- Get all users and all orders (PostgreSQL, SQL Server)
SELECT u.username, o.order_date, o.total_amount
FROM users u
FULL OUTER JOIN orders o ON u.user_id = o.user_id;

-- MySQL alternative using UNION
SELECT u.username, o.order_date, o.total_amount
FROM users u
LEFT JOIN orders o ON u.user_id = o.user_id
UNION
SELECT u.username, o.order_date, o.total_amount
FROM users u
RIGHT JOIN orders o ON u.user_id = o.user_id;

CROSS JOIN

Returns the Cartesian product of both tables (every row from first table combined with every row from second table).

CROSS JOIN Example
-- Generate all possible user-product combinations
SELECT u.username, p.product_name
FROM users u
CROSS JOIN products p;

-- Useful for generating test data or reports
SELECT u.username, p.category
FROM users u
CROSS JOIN (SELECT DISTINCT category FROM products) p;

Subqueries

Subqueries are queries nested inside other queries. They can be used in SELECT, FROM, WHERE, and HAVING clauses.

Subqueries in WHERE Clause

WHERE Subqueries
-- Find users who have placed orders
SELECT username, email
FROM users
WHERE user_id IN (
    SELECT DISTINCT user_id 
    FROM orders
);

-- Find users with above-average order totals
SELECT username, email
FROM users
WHERE user_id IN (
    SELECT user_id
    FROM orders
    WHERE total_amount > (
        SELECT AVG(total_amount) FROM orders
    )
);

-- Find products never ordered
SELECT product_name, price
FROM products
WHERE product_id NOT IN (
    SELECT DISTINCT product_id 
    FROM order_items 
    WHERE product_id IS NOT NULL
);

Subqueries in SELECT Clause

SELECT Subqueries
-- Get user info with order count
SELECT 
    username,
    email,
    (SELECT COUNT(*) FROM orders o WHERE o.user_id = u.user_id) as order_count,
    (SELECT MAX(total_amount) FROM orders o WHERE o.user_id = u.user_id) as max_order_amount
FROM users u;

Subqueries in FROM Clause (Derived Tables)

FROM Subqueries
-- Use subquery as a temporary table
SELECT category, avg_price, product_count
FROM (
    SELECT 
        category,
        AVG(price) as avg_price,
        COUNT(*) as product_count
    FROM products
    GROUP BY category
) as category_stats
WHERE avg_price > 50;

Correlated Subqueries

Subqueries that reference columns from the outer query.

Correlated Subqueries
-- Find users whose latest order is above average
SELECT username, email
FROM users u
WHERE (
    SELECT MAX(total_amount)
    FROM orders o
    WHERE o.user_id = u.user_id
) > (
    SELECT AVG(total_amount) FROM orders
);

-- Find products with above-average price in their category
SELECT product_name, price, category
FROM products p1
WHERE price > (
    SELECT AVG(price)
    FROM products p2
    WHERE p2.category = p1.category
);

Common Table Expressions (CTEs)

CTEs provide a way to write more readable queries by defining temporary result sets.

CTE Examples
-- Basic CTE
WITH user_order_stats AS (
    SELECT 
        u.user_id,
        u.username,
        COUNT(o.order_id) as order_count,
        COALESCE(SUM(o.total_amount), 0) as total_spent
    FROM users u
    LEFT JOIN orders o ON u.user_id = o.user_id
    GROUP BY u.user_id, u.username
)
SELECT username, order_count, total_spent
FROM user_order_stats
WHERE total_spent > 100;

-- Multiple CTEs
WITH monthly_sales AS (
    SELECT 
        DATE_FORMAT(order_date, '%Y-%m') as month,
        SUM(total_amount) as monthly_total
    FROM orders
    GROUP BY DATE_FORMAT(order_date, '%Y-%m')
),
avg_monthly_sales AS (
    SELECT AVG(monthly_total) as avg_monthly
    FROM monthly_sales
)
SELECT m.month, m.monthly_total, a.avg_monthly
FROM monthly_sales m
CROSS JOIN avg_monthly_sales a
WHERE m.monthly_total > a.avg_monthly;

Window Functions

Window functions perform calculations across related rows without grouping them into a single output row.

ROW_NUMBER, RANK, and DENSE_RANK

Ranking Functions
-- Number orders by date for each user
SELECT 
    username,
    order_date,
    total_amount,
    ROW_NUMBER() OVER (PARTITION BY u.user_id ORDER BY order_date) as order_sequence,
    RANK() OVER (ORDER BY total_amount DESC) as amount_rank,
    DENSE_RANK() OVER (ORDER BY total_amount DESC) as dense_amount_rank
FROM orders o
JOIN users u ON o.user_id = u.user_id;

Aggregate Window Functions

Aggregate Window Functions
-- Running total of orders
SELECT 
    order_date,
    total_amount,
    SUM(total_amount) OVER (ORDER BY order_date) as running_total,
    AVG(total_amount) OVER (ORDER BY order_date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) as moving_avg_3
FROM orders
ORDER BY order_date;

-- Compare each order to user's average
SELECT 
    u.username,
    o.order_date,
    o.total_amount,
    AVG(o.total_amount) OVER (PARTITION BY u.user_id) as user_avg_order,
    o.total_amount - AVG(o.total_amount) OVER (PARTITION BY u.user_id) as diff_from_avg
FROM orders o
JOIN users u ON o.user_id = u.user_id;

LAG and LEAD Functions

LAG and LEAD Functions
-- Compare current order with previous order
SELECT 
    u.username,
    o.order_date,
    o.total_amount,
    LAG(o.total_amount) OVER (PARTITION BY u.user_id ORDER BY o.order_date) as prev_order_amount,
    LEAD(o.order_date) OVER (PARTITION BY u.user_id ORDER BY o.order_date) as next_order_date
FROM orders o
JOIN users u ON o.user_id = u.user_id;

Advanced Query Patterns

CASE Statements

CASE Statement Examples
-- Categorize users by order activity
SELECT 
    username,
    email,
    COUNT(o.order_id) as order_count,
    CASE 
        WHEN COUNT(o.order_id) = 0 THEN 'New'
        WHEN COUNT(o.order_id) BETWEEN 1 AND 3 THEN 'Regular'
        WHEN COUNT(o.order_id) BETWEEN 4 AND 10 THEN 'Frequent'
        ELSE 'VIP'
    END as customer_category
FROM users u
LEFT JOIN orders o ON u.user_id = o.user_id
GROUP BY u.user_id, u.username, u.email;

-- Conditional aggregation
SELECT 
    category,
    COUNT(*) as total_products,
    COUNT(CASE WHEN price > 100 THEN 1 END) as expensive_products,
    COUNT(CASE WHEN price <= 20 THEN 1 END) as budget_products
FROM products
GROUP BY category;

UNION and UNION ALL

UNION Examples
-- Combine results from multiple queries
SELECT 'High Value' as customer_type, username, email
FROM users u
WHERE user_id IN (
    SELECT user_id FROM orders GROUP BY user_id HAVING SUM(total_amount) > 500
)
UNION
SELECT 'Recent' as customer_type, username, email
FROM users u
WHERE user_id IN (
    SELECT user_id FROM orders WHERE order_date >= DATE_SUB(CURDATE(), INTERVAL 30 DAY)
);

-- UNION ALL keeps duplicates (faster)
SELECT product_name, 'Expensive' as price_category FROM products WHERE price > 100
UNION ALL
SELECT product_name, 'Budget' as price_category FROM products WHERE price <= 20;

EXISTS and NOT EXISTS

EXISTS Examples
-- Find users who have placed orders (more efficient than IN for large datasets)
SELECT username, email
FROM users u
WHERE EXISTS (
    SELECT 1 FROM orders o WHERE o.user_id = u.user_id
);

-- Find products that have never been ordered
SELECT product_name, price
FROM products p
WHERE NOT EXISTS (
    SELECT 1 FROM order_items oi WHERE oi.product_id = p.product_id
);

Query Optimization Tips

Using Indexes Effectively

Index-Friendly Queries
-- Good: Uses index on order_date
SELECT * FROM orders WHERE order_date >= '2024-01-01';

-- Bad: Function on column prevents index usage
SELECT * FROM orders WHERE YEAR(order_date) = 2024;

-- Better: Rewrite to use index
SELECT * FROM orders WHERE order_date >= '2024-01-01' AND order_date < '2025-01-01';

Avoiding Common Performance Issues

Performance Tips

  • Use LIMIT to restrict large result sets
  • Be careful with functions in WHERE clauses - they can prevent index usage
  • Consider using EXISTS instead of IN for subqueries with large result sets
  • Use appropriate JOINs instead of subqueries when possible
Performance Best Practices
-- Use LIMIT for large datasets
SELECT * FROM orders ORDER BY order_date DESC LIMIT 100;

-- Use EXISTS instead of IN for better performance
SELECT username FROM users u 
WHERE EXISTS (SELECT 1 FROM orders o WHERE o.user_id = u.user_id);

-- Use specific columns instead of SELECT *
SELECT username, email FROM users WHERE city = 'New York';

Real-World Query Examples

Sales Report Query

Monthly Sales Report
WITH monthly_stats AS (
    SELECT 
        DATE_FORMAT(o.order_date, '%Y-%m') as month,
        COUNT(DISTINCT o.order_id) as total_orders,
        COUNT(DISTINCT o.user_id) as unique_customers,
        SUM(o.total_amount) as total_revenue,
        AVG(o.total_amount) as avg_order_value
    FROM orders o
    GROUP BY DATE_FORMAT(o.order_date, '%Y-%m')
),
previous_month_stats AS (
    SELECT 
        month,
        LAG(total_revenue) OVER (ORDER BY month) as prev_month_revenue,
        total_revenue
    FROM monthly_stats
)
SELECT 
    ms.month,
    ms.total_orders,
    ms.unique_customers,
    ms.total_revenue,
    ms.avg_order_value,
    CASE 
        WHEN pms.prev_month_revenue IS NOT NULL 
        THEN ROUND(((ms.total_revenue - pms.prev_month_revenue) / pms.prev_month_revenue) * 100, 2)
        ELSE NULL 
    END as revenue_growth_percent
FROM monthly_stats ms
LEFT JOIN previous_month_stats pms ON ms.month = pms.month
ORDER BY ms.month;

Customer Segmentation Query

Customer Segmentation Analysis
WITH customer_metrics AS (
    SELECT 
        u.user_id,
        u.username,
        u.email,
        COUNT(o.order_id) as total_orders,
        COALESCE(SUM(o.total_amount), 0) as lifetime_value,
        COALESCE(AVG(o.total_amount), 0) as avg_order_value,
        MAX(o.order_date) as last_order_date,
        DATEDIFF(CURDATE(), MAX(o.order_date)) as days_since_last_order
    FROM users u
    LEFT JOIN orders o ON u.user_id = o.user_id
    GROUP BY u.user_id, u.username, u.email
),
customer_segments AS (
    SELECT 
        *,
        CASE 
            WHEN total_orders = 0 THEN 'New Customer'
            WHEN days_since_last_order > 365 THEN 'Inactive'
            WHEN days_since_last_order > 90 THEN 'At Risk'
            WHEN lifetime_value > 1000 AND total_orders >= 5 THEN 'VIP'
            WHEN total_orders >= 3 THEN 'Loyal'
            ELSE 'Regular'
        END as customer_segment,
        NTILE(5) OVER (ORDER BY lifetime_value DESC) as value_quintile
    FROM customer_metrics
)
SELECT 
    customer_segment,
    COUNT(*) as customer_count,
    AVG(total_orders) as avg_orders_per_customer,
    AVG(lifetime_value) as avg_lifetime_value,
    AVG(avg_order_value) as avg_order_value
FROM customer_segments
GROUP BY customer_segment
ORDER BY avg_lifetime_value DESC;

Product Performance Analysis

Product Performance Analysis
WITH product_performance AS (
    SELECT 
        p.product_id,
        p.product_name,
        p.category,
        p.price,
        COALESCE(SUM(oi.quantity), 0) as total_quantity_sold,
        COALESCE(SUM(oi.quantity * oi.unit_price), 0) as total_revenue,
        COUNT(DISTINCT oi.order_id) as times_ordered,
        AVG(oi.unit_price) as avg_selling_price
    FROM products p
    LEFT JOIN order_items oi ON p.product_id = oi.product_id
    GROUP BY p.product_id, p.product_name, p.category, p.price
),
category_stats AS (
    SELECT 
        category,
        AVG(total_revenue) as category_avg_revenue,
        AVG(total_quantity_sold) as category_avg_quantity
    FROM product_performance
    GROUP BY category
)
SELECT 
    pp.product_name,
    pp.category,
    pp.price,
    pp.total_quantity_sold,
    pp.total_revenue,
    pp.times_ordered,
    ROUND(pp.avg_selling_price, 2) as avg_selling_price,
    CASE 
        WHEN pp.total_revenue > cs.category_avg_revenue THEN 'Above Average'
        WHEN pp.total_revenue = 0 THEN 'Not Sold'
        ELSE 'Below Average'
    END as performance_category,
    RANK() OVER (PARTITION BY pp.category ORDER BY pp.total_revenue DESC) as category_revenue_rank
FROM product_performance pp
JOIN category_stats cs ON pp.category = cs.category
ORDER BY pp.total_revenue DESC;

Advanced JOIN Patterns

Self-Joins

Self-Join Examples
-- Find users from the same city
SELECT 
    u1.username as user1,
    u2.username as user2,
    u1.city
FROM users u1
JOIN users u2 ON u1.city = u2.city AND u1.user_id < u2.user_id
ORDER BY u1.city, u1.username;

-- Compare each user's orders to other users' orders
SELECT 
    u1.username as customer,
    u2.username as comparison_customer,
    u1.total_spent,
    u2.total_spent,
    u1.total_spent - u2.total_spent as spending_difference
FROM (
    SELECT u.user_id, u.username, COALESCE(SUM(o.total_amount), 0) as total_spent
    FROM users u
    LEFT JOIN orders o ON u.user_id = o.user_id
    GROUP BY u.user_id, u.username
) u1
JOIN (
    SELECT u.user_id, u.username, COALESCE(SUM(o.total_amount), 0) as total_spent
    FROM users u
    LEFT JOIN orders o ON u.user_id = o.user_id
    GROUP BY u.user_id, u.username
) u2 ON u1.user_id != u2.user_id
WHERE u1.total_spent > u2.total_spent;

Multiple Table Complex Joins

Complex Multi-Table Join
-- Complete order details with customer and product information
SELECT 
    o.order_id,
    o.order_date,
    u.username as customer_name,
    u.email as customer_email,
    u.city as customer_city,
    p.product_name,
    p.category as product_category,
    oi.quantity,
    oi.unit_price,
    (oi.quantity * oi.unit_price) as line_total,
    o.total_amount as order_total,
    -- Calculate what percentage of order this line item represents
    ROUND(
        ((oi.quantity * oi.unit_price) / o.total_amount) * 100, 
        2
    ) as percentage_of_order
FROM orders o
JOIN users u ON o.user_id = u.user_id
JOIN order_items oi ON o.order_id = oi.order_id
JOIN products p ON oi.product_id = p.product_id
WHERE o.order_date >= DATE_SUB(CURDATE(), INTERVAL 90 DAY)
ORDER BY o.order_date DESC, o.order_id, line_total DESC;

Recursive Queries (CTEs)

Some databases support recursive CTEs for hierarchical data:

Recursive CTE Example
-- Example: Employee hierarchy (if you had an employees table with manager_id)
WITH RECURSIVE employee_hierarchy AS (
    -- Base case: Top-level managers
    SELECT 
        employee_id,
        first_name,
        last_name,
        manager_id,
        0 as level,
        CAST(first_name + ' ' + last_name AS VARCHAR(1000)) as hierarchy_path
    FROM employees
    WHERE manager_id IS NULL
    
    UNION ALL
    
    -- Recursive case: Employees with managers
    SELECT 
        e.employee_id,
        e.first_name,
        e.last_name,
        e.manager_id,
        eh.level + 1,
        CAST(eh.hierarchy_path + ' -> ' + e.first_name + ' ' + e.last_name AS VARCHAR(1000))
    FROM employees e
    JOIN employee_hierarchy eh ON e.manager_id = eh.employee_id
)
SELECT 
    REPLICATE('  ', level) + first_name + ' ' + last_name as indented_name,
    level,
    hierarchy_path
FROM employee_hierarchy
ORDER BY hierarchy_path;

Query Debugging and Analysis

Using EXPLAIN

Query Execution Plan
-- Analyze query performance
EXPLAIN SELECT 
    u.username,
    COUNT(o.order_id) as order_count
FROM users u
LEFT JOIN orders o ON u.user_id = o.user_id
GROUP BY u.user_id, u.username
HAVING COUNT(o.order_id) > 2;

-- With more detailed analysis (MySQL)
EXPLAIN FORMAT=JSON
SELECT * FROM orders WHERE order_date >= '2024-01-01';

Query Performance Monitoring

Performance Monitoring

Always monitor query performance in production. Look for:

  • Long-running queries (>1 second for OLTP)
  • High CPU usage queries
  • Queries that scan large numbers of rows
  • Missing index warnings

Next Steps

You've now mastered advanced SQL query techniques including JOINs, subqueries, window functions, and complex analytical queries. These skills will help you handle sophisticated data analysis tasks and optimize query performance.

Last updated on