Module 12: Data Products

Tier 3: Advanced | Estimated time: 5-7 hours | Prerequisites: Foundations + at least 2 Intermediate

What You'll Get Out of This

Most product and BI roles revolve around data. This module connects your building skills to data workflows — querying, transforming, and displaying data in useful ways. You'll build data product prototypes using synthetic datasets, understand data architecture patterns, and learn how to hand off a working prototype to engineering for production integration.

Important note on data and AI tools: Before feeding any organizational data into AI-assisted tools like your AI tool, check your organization's data governance and AI usage policies. Different organizations have different rules about what data can be processed through AI tools. This course uses only synthetic and sample datasets. When you're ready to work with real data, have the policy conversation first.

Part 1: What's a Data Product?

A data product is anything that takes raw data and turns it into something people use to make decisions. Dashboards, reports, query tools, data catalogs, quality scorecards — all data products.

The pattern is always the same:

Data Source → Query → Transform → Display

Data Source: Where the data lives (database, CSV, API, warehouse)
Query: Asking for specific data (SQL, API calls, file reads)
Transform: Cleaning, aggregating, calculating (averages, totals, trends)
Display: Showing it in a useful way (tables, charts, cards, alerts)

Part 2: Data Architecture Concepts

You don't need to build a data warehouse. But you need to understand the patterns so you can speak the same language as your data engineering team and make informed product decisions.

The Layer Cake

Modern data architectures typically have these layers:

Sources → Raw data from operational systems (CRM, billing, analytics, support)

Staging (stg_) → Raw data copied into the warehouse with minimal transformation. Think of this as "the filing cabinet" — organized but unprocessed.

Intermediate (int_) → Cleaned and joined data. Deduplication, type casting, joining related tables. This is where data starts becoming useful.

Marts / Facts and Dimensions:

Fact tables (fct_) — Events that happened: orders placed, features shipped, requests submitted. Each row is an event with a timestamp.
Dimension tables (dim_) — Context about entities: customers, products, team members. Each row describes a thing.

Consumption → What end users see: dashboards, reports, data products.

Why This Matters for You

When building data products, you need to know:

Which layer to query (don't build on top of staging data — it's raw and unreliable)
How fact and dimension tables join (fact tables reference dimension tables via IDs)
What transformations have already been done (so you don't duplicate logic)
Who owns the data model (your data engineering team)

Part 3: Building with Sample Data

Since you'll prototype with synthetic data, let's create a realistic dataset.

Creating Synthetic Data

You can use the included dataset at examples/sample-data/feature-requests.json (50 realistic records ready to go — you'll create this file as part of the exercise, or find it in the examples folder), or create your own using a prompt like this:

Create a JSON file called sample-data.json with 50 records simulating 
feature requests for a data platform team.

Each record should have:
- id: auto-incrementing number
- title: realistic feature request title (e.g., "Add date range filter to usage dashboard")
- description: 1-2 sentence description
- submitter: realistic name
- submitter_team: one of ["Engineering", "Product", "Finance", "Operations", "Marketing"]
- priority: one of ["P0", "P1", "P2", "P3"] (weighted: P0 rare, P2/P3 common)
- status: one of ["New", "In Review", "Approved", "In Progress", "Shipped", "Declined"]
- submitted_date: ISO date within the last 6 months
- resolved_date: ISO date (null for unresolved, within 1-60 days of submitted for resolved)
- category: one of ["Dashboard", "Pipeline", "Data Quality", "Access Control", "Documentation"]
- effort_estimate: one of ["Small", "Medium", "Large", "X-Large"]

Make the data realistic — include patterns like:
- P0s are mostly resolved quickly
- Some old requests are still "In Review" (stuck)
- Engineering submits the most requests
- Dashboard and Pipeline are the most common categories

This gives you a rich dataset to build on without any real organizational data.

Building a Data Product Prototype

Build a Feature Request Analytics dashboard using the data from 
sample-data.json.

Components:
1. SUMMARY CARDS (top):
   - Total requests
   - Open requests (not Shipped or Declined)
   - Average days to resolution
   - Requests submitted this month

2. CHARTS:
   - Bar chart: Requests by category
   - Stacked bar chart: Status distribution by team
   - Line chart: Requests submitted per month (trend)
   - Donut chart: Priority distribution

3. DATA TABLE (below charts):
   - All requests, sortable and filterable
   - Filters: status, priority, team, category, date range
   - Click to expand full description
   - Color-coded priority badges

4. INSIGHTS SECTION:
   - "Stuck requests": In Review for >30 days
   - "Top requesters": Teams with most submissions
   - "Resolution speed": Average days by priority

All charts and cards should update when filters are applied.
Single HTML file, use Chart.js for visualizations.

Part 4: SQL in Context

SQL (Structured Query Language) is how you ask databases for data. You don't need to become a SQL expert, but you need to read it and write basic queries — because many data products include SQL, and understanding it helps you direct your AI tool effectively.

Reading SQL

SELECT
    category,
    COUNT(*) as request_count,
    AVG(DATEDIFF(day, submitted_date, resolved_date)) as avg_resolution_days
FROM feature_requests
WHERE status != 'Declined'
    AND submitted_date >= CURRENT_DATE - INTERVAL '1 year'
GROUP BY category
ORDER BY request_count DESC;

Translation: "Give me the count and average resolution time for each category, excluding declined requests from this year, sorted by most requests first."

The SQL Building Blocks

SELECT — What columns you want
FROM — What table to query
WHERE — Filter conditions
GROUP BY — Aggregate by a dimension
ORDER BY — Sort the results
JOIN — Combine data from multiple tables

Asking Your AI Tool to Write SQL

Write a SQL query that answers: "Which teams have the most P0 requests 
that are still unresolved, and what's the average age in days?"

The table is called feature_requests with columns: id, title, 
submitter_team, priority, status, submitted_date, resolved_date.

Unresolved means status is not 'Shipped' and not 'Declined'.
Age in days = today's date minus submitted_date.

Your AI tool writes the SQL. You review it against your understanding. Does the logic match the question?

Part 5: Production-Readiness (for Handoff)

Your prototype works with sample data. Getting it to production — connected to real data sources — is typically engineering work. But you can make the handoff smooth by documenting exactly what's needed.

How Connections Work

In production, the app would connect to a database using a connection string:

postgresql://username:password@host:port/database

Or a warehouse like Snowflake:

snowflake://account.region/warehouse/database/schema

(Exact format varies by driver and environment.)

These credentials live in environment variables (.env files), never in code, and never in Git.

The Handoff Document

When you hand a data product prototype to engineering, they need:

# Data Product Handoff: Feature Request Analytics

## What This Is
A dashboard prototype showing feature request analytics. Currently 
running on synthetic data. Needs production data connection.

## Data Requirements
- Source table: feature_requests (or equivalent)
- Required columns: [list]
- Refresh cadence: Daily (overnight refresh is fine)
- Data volume: ~500 records currently, expected to grow to ~5,000

## Queries
[Include the SQL queries your prototype uses, clearly commented]

## Access Control
- Who should see this: Product team leads, engineering managers
- Sensitive data: No PII. Submitter names are internal employees.

## Environment
- Currently: Single HTML file with Chart.js, data from JSON
- Production needs: Database connection, server for queries, 
  authentication layer

## Open Questions
- Should this use the existing BI tool (Tableau/Looker) or be standalone?
- Where should this be hosted?
- Who owns data model changes if the underlying schema evolves?

Lab: Build a Data Product Prototype

Create a synthetic dataset relevant to your role using the technique from Part 3
Build a dashboard with at least:
- 3 summary metrics
- 2 different chart types
- A filterable data table
- An insights or anomaly section
Write at least 2 SQL queries that answer specific business questions about your data (even if running against JSON — practice the SQL thinking)
Write a handoff document describing what engineering would need to connect this to production data
Commit everything including the synthetic data, the prototype, and the handoff doc

Critical Evaluation

For data products, the stakes are higher because people make decisions based on what you show them:

Are the numbers correct? Verify aggregations manually on a few records.
Does the "average" mean what you think it means? (Mean vs. median matters)
Are edge cases handled? What about null dates, zero values, or empty categories?
Would you make a business decision based on this dashboard's output?
Is the prototype misleading in any way? (Cherry-picked date ranges, incorrect denominators)

Go Deeper

Try these prompts to extend your data product skills:

"Add a 'Data Health' section that flags records with missing dates, duplicate titles, or null values"
"Write a SQL query that finds the requests stuck longest in 'In Review' status and explain what each clause does"
"Create a second view of this dashboard optimized for a mobile/tablet screen"
"Add a 'What If' simulator: if we approved all P1 requests, what would the status distribution look like?"

If You Get Stuck

Charts aren't showing data: Open the browser console (Right-click → Inspect → Console). Chart.js errors usually tell you exactly what's wrong — often a mismatch between the data format and what the chart expects. Paste the error into your AI tool.

Filters don't update charts: This is a state management issue (Module 9). The charts are likely reading from the original data instead of the filtered data. Tell your AI tool: "When I apply a filter, the table updates but the charts don't. Make the charts re-render using the same filtered dataset as the table."

SQL looks intimidating: Read it bottom-to-top: the last line (ORDER BY) is easiest, GROUP BY tells you what's being aggregated, WHERE tells you what's filtered, SELECT tells you what columns you'll see. Ask your AI tool: "Explain this SQL query line by line in plain English."

Not sure if your numbers are right: Pick 3-5 records from your sample data. Calculate the expected result by hand (on paper or in a calculator). Compare to what the dashboard shows. If they don't match, you've found a bug — tell your AI tool exactly which number is wrong and what it should be.

Try This

Build a "data quality scorecard" for your sample dataset: what percentage of records have all fields filled in? What's the distribution of time-to-resolution? Are there any obvious data anomalies (future dates, negative values, duplicates)? This exercise teaches you to look at data critically — a skill that transfers directly to production data work.

Checkpoint

Created a realistic synthetic dataset (30+ records)
Built a data product prototype with charts, table, and summary metrics
Filters work correctly across all components
Wrote at least 2 SQL queries and can explain them
Wrote a production handoff document
Verified calculations manually for at least 3 metrics
Can explain the staging → intermediate → mart data architecture pattern

Previous: ← Module 11: UI Design Principles for Builders Next: Module 13: Automations & Workflows →