SQL for Product Managers and Business Analysts
The 4 queries that cover 90% of what PMs and BAs actually need — feature adoption, funnel drop-off, retention cohorts, and revenue attribution. Written in plain English first. SQL second. With AI doing the heavy lifting.
Why PMs and BAs should learn SQL (and what to skip)
You don't need to become a data engineer. You need to answer your own questions without opening a ticket. That's a different bar — and a much lower one.
The goal is not to write perfect SQL. The goal is to write correct enough SQL to validate a hypothesis, answer a leadership question, or build the context you need before handing off to engineering. Claude writes the first draft. You read it, verify one number by hand, and move on.
What you need to learn: SELECT, WHERE, GROUP BY, JOIN, aggregate functions, and CTEs. That's it. Everything else you ask Claude. The 4 queries below demonstrate all of these in realistic PM/BA contexts.
Before you write queries: read your schema
SQL works on your actual table names and column names — not generic ones. Before writing any query, run this in your database dashboard (Supabase, Metabase, DBeaver, etc.):
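The exact snippet depends on your database. For Postgres-compatible databases (Supabase included), a common version queries information_schema; adjust `table_schema` if your tables don't live in `public`:

-- List every table, column, and data type in the public schema
SELECT table_name, column_name, data_type
FROM information_schema.columns
WHERE table_schema = 'public'
ORDER BY table_name, ordinal_position;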
Paste the output into Claude: "Here is my schema. Now write me a query that [your question]." This one habit will make every query 10x more accurate.
The 4 queries
Who is using this feature?
Feature adoption · Usage segmentation · Rollout tracking
The situation
You shipped a feature 3 weeks ago. Leadership asks: "Who's using it?" You need a count by cohort (new vs. returning users, plan tier, signup date) — not a screenshot from Amplitude.
Plain English (give this to Claude first)
"In the last 30 days, how many distinct users clicked feature X, broken down by plan tier? Also show total clicks and the first and last date each tier used it."
SQL
SELECT
  u.plan_tier,
  COUNT(DISTINCT e.user_id) AS users_who_used_feature,
  COUNT(e.id) AS total_events,
  MIN(e.created_at)::date AS first_used,
  MAX(e.created_at)::date AS last_used
FROM events e
JOIN users u ON u.id = e.user_id
WHERE e.event_name = 'feature_x_clicked'
  AND e.created_at >= NOW() - INTERVAL '30 days'
GROUP BY u.plan_tier
ORDER BY users_who_used_feature DESC;
What this teaches you
SELECT, FROM, JOIN, WHERE, GROUP BY, COUNT(DISTINCT). These six building blocks appear in almost every PM query you'll ever write.
Where are users dropping off?
Funnel analysis · Onboarding drop-off · Checkout abandonment
The situation
You have a 5-step onboarding flow. 2,400 users start it. 310 finish. Where exactly are they leaving — and is it getting better or worse over time?
Plain English (give this to Claude first)
"Of everyone who hit onboarding step 1 in the last 30 days, how many reached each later step, how many completed, and what's the overall completion rate?"
SQL
WITH funnel AS (
SELECT
user_id,
MAX(CASE WHEN event_name = 'onboarding_step_1' THEN 1 ELSE 0 END) AS step_1,
MAX(CASE WHEN event_name = 'onboarding_step_2' THEN 1 ELSE 0 END) AS step_2,
MAX(CASE WHEN event_name = 'onboarding_step_3' THEN 1 ELSE 0 END) AS step_3,
MAX(CASE WHEN event_name = 'onboarding_step_4' THEN 1 ELSE 0 END) AS step_4,
MAX(CASE WHEN event_name = 'onboarding_completed' THEN 1 ELSE 0 END) AS completed
FROM events
WHERE created_at >= NOW() - INTERVAL '30 days'
GROUP BY user_id
)
SELECT
SUM(step_1) AS entered_step_1,
SUM(step_2) AS reached_step_2,
SUM(step_3) AS reached_step_3,
SUM(step_4) AS reached_step_4,
SUM(completed) AS completed_onboarding,
ROUND(SUM(completed)::numeric / NULLIF(SUM(step_1), 0) * 100, 1) AS completion_rate_pct
FROM funnel;
What this teaches you
CTEs (WITH), CASE WHEN, conditional aggregation, NULLIF to avoid divide-by-zero. These are the patterns that turn raw events into funnel charts.
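If NULLIF is new, here it is in isolation: NULLIF(x, 0) returns NULL when x is 0, and dividing by NULL yields NULL instead of an error. Using the numbers from the scenario above:

-- Without NULLIF the first line would raise divide-by-zero; with it, you get NULL
SELECT ROUND(310::numeric / NULLIF(0, 0) * 100, 1);     -- NULL, not a crash
SELECT ROUND(310::numeric / NULLIF(2400, 0) * 100, 1);  -- 12.9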
Are users coming back?
Retention analysis · Cohort tables · Churn prediction
The situation
Your sign-up numbers look healthy. But are those users still around in week 2? Week 4? A retention cohort shows the real picture — and it's almost always harder to hear than the signup chart.
Plain English (give this to Claude first)
"Group users into cohorts by signup week. For each cohort in the last 12 weeks, how many came back (had a session) exactly 1 week later, and how many 4 weeks later?"
SQL
WITH cohorts AS (
SELECT
user_id,
DATE_TRUNC('week', MIN(created_at)) AS cohort_week
FROM events
WHERE event_name = 'account_created'
GROUP BY user_id
),
activity AS (
SELECT
e.user_id,
DATE_TRUNC('week', e.created_at) AS activity_week
FROM events e
WHERE e.event_name = 'session_start'
GROUP BY e.user_id, DATE_TRUNC('week', e.created_at)
)
SELECT
c.cohort_week,
COUNT(DISTINCT c.user_id) AS cohort_size,
COUNT(DISTINCT CASE WHEN a.activity_week = c.cohort_week + INTERVAL '1 week'
THEN a.user_id END) AS week_1_retained,
COUNT(DISTINCT CASE WHEN a.activity_week = c.cohort_week + INTERVAL '4 weeks'
THEN a.user_id END) AS week_4_retained
FROM cohorts c
LEFT JOIN activity a ON a.user_id = c.user_id
WHERE c.cohort_week >= NOW() - INTERVAL '12 weeks'
GROUP BY c.cohort_week
ORDER BY c.cohort_week;
What this teaches you
DATE_TRUNC for time-based grouping, LEFT JOIN to include users with no activity, cohort analysis pattern. This query is the foundation of every retention deck you'll ever build.
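To see what DATE_TRUNC actually does, run it on a literal. In Postgres, 'week' truncates to the preceding Monday:

SELECT DATE_TRUNC('week', TIMESTAMP '2024-03-14 10:30:00');
-- 2024-03-11 00:00:00: every timestamp in the same ISO week collapses to this value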
What's driving revenue?
Revenue attribution · Plan upgrades · Expansion MRR
The situation
Finance wants to know which features correlate with upgrades. You want to know if the users who hit your key activation event are 3x more likely to pay — and you want to show the data, not guess.
Plain English (give this to Claude first)
"Take everyone who signed up in the last 90 days. Split them by whether they hit the key activation event. For each group, what's the paid conversion rate and average lifetime value?"
SQL
SELECT
CASE
WHEN activated.user_id IS NOT NULL THEN 'Activated'
ELSE 'Not Activated'
END AS activation_status,
COUNT(DISTINCT u.id) AS users,
COUNT(DISTINCT CASE WHEN u.plan_tier = 'paid'
THEN u.id END) AS paid_users,
ROUND(
COUNT(DISTINCT CASE WHEN u.plan_tier = 'paid' THEN u.id END)::numeric
/ NULLIF(COUNT(DISTINCT u.id), 0) * 100, 1
) AS conversion_rate_pct,
ROUND(AVG(u.lifetime_value_cents) / 100.0, 2) AS avg_ltv_usd
FROM users u
LEFT JOIN (
SELECT DISTINCT user_id
FROM events
WHERE event_name = 'key_activation_event'
AND created_at >= NOW() - INTERVAL '90 days'
) activated ON activated.user_id = u.id
WHERE u.created_at >= NOW() - INTERVAL '90 days'
GROUP BY activation_status;
What this teaches you
LEFT JOIN with a subquery, CASE WHEN for segmentation, AVG with arithmetic for unit conversion. This is the activation → revenue correlation every growth PM needs.
The 5-step workflow for writing SQL with AI
Don't start with the SQL. Start with the question. This workflow works for every query you'll ever write.
Write the question in plain English
Start here: "I want to know which plan tier has the highest feature adoption rate in the last 30 days." This is your spec. Don't touch SQL yet.
Describe your schema to Claude
Paste your table definitions. "I have a `users` table (id, plan_tier, created_at) and an `events` table (id, user_id, event_name, created_at). Write a query for my question above."
Ask Claude to explain before you run
"Before I run this, explain what each clause does in plain English. I want to understand it, not just copy it." This catches errors and teaches you the pattern.
Run it against a date-limited sample first
Add `LIMIT 100` and a tight date range before running against your full table. A runaway query on 10M rows can bog down the database for everyone else.
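For example, a throwaway exploratory pass on the hypothetical `events` table from earlier might look like this before you commit to the full 30-day run:

-- Exploratory pass: one day of data, hard row cap
SELECT user_id, event_name, created_at
FROM events
WHERE created_at >= NOW() - INTERVAL '1 day'
LIMIT 100;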
Verify one number by hand
Pick one row in the result. Count the raw events manually (or spot-check against a known source). One verified data point is worth 10 unverified ones.
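As a sketch: if the feature-adoption query reported 412 users on a `pro` tier, re-derive just that one cell with the narrowest possible query. (Both the number and the tier name here are stand-ins for your own result.)

-- Re-derive a single cell from query 1: distinct 'pro' users in the last 30 days
SELECT COUNT(DISTINCT e.user_id)
FROM events e
JOIN users u ON u.id = e.user_id
WHERE u.plan_tier = 'pro'  -- hypothetical tier name; use one from your own result
  AND e.event_name = 'feature_x_clicked'
  AND e.created_at >= NOW() - INTERVAL '30 days';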
3 mistakes that waste hours
Querying without a LIMIT
Always add LIMIT 100 when exploring. A full table scan on 50M rows will time out, burn compute credits, and frustrate your data team. Add the LIMIT first, verify the shape of the data, then remove it for the real run.
Using SELECT * in production queries
SELECT * pulls every column — including blobs, PII, and columns you don't need. Name your columns explicitly. It's better documentation and significantly faster on wide tables.
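A minimal before/after, using the hypothetical `users` table from earlier:

-- Instead of this:
SELECT * FROM users WHERE created_at >= NOW() - INTERVAL '30 days';

-- name exactly what you need:
SELECT id, plan_tier, created_at
FROM users
WHERE created_at >= NOW() - INTERVAL '30 days';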
Trusting the first number without a sanity check
AI-written SQL is usually correct. Usually. Before putting a number in a deck or a message to leadership, verify one row by hand against a source of truth (your BI tool, a raw export, a known data point). One verified number is worth 10 unverified ones.
The one skill that makes everything else faster
Learn to read a database schema like a PM reads a PRD — not to memorize it, but to understand the nouns and verbs of the system. Tables are nouns (users, orders, events). Joins are relationships between them. Once you see the schema as a data model instead of a technical artifact, every question you have maps naturally to a query structure.
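Here's that reading applied to the hypothetical two-table schema used throughout this piece. The foreign key is the relationship connecting the nouns:

-- Noun: users
CREATE TABLE users (
  id BIGINT PRIMARY KEY,
  plan_tier TEXT,
  created_at TIMESTAMPTZ
);

-- Noun: events; the REFERENCES line is the relationship ("a user performs events")
CREATE TABLE events (
  id BIGINT PRIMARY KEY,
  user_id BIGINT REFERENCES users(id),
  event_name TEXT,
  created_at TIMESTAMPTZ
);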
This skill — reading schemas, understanding data models, knowing what's possible — is what separates PMs who wait for dashboards from PMs who build them. Zero to Ship Module 12 covers the full data product stack: schema design, query patterns, and shipping interactive data apps to real URLs.
Ready to build the full data product?
Module 12 (Data Products) takes you from SQL queries to a deployed interactive data app — charts, filters, drill-downs, and a live URL you can share with stakeholders.