Module 13: Automations & Workflows

Tier 3: Advanced | Estimated time: 5-6 hours | Prerequisites: Foundations + at least 2 Intermediate

What You'll Get Out of This

Every product person has tasks they do manually every week that follow the same pattern. Export a CSV, reformat it, email it. Copy data from one tool to another. Generate the same report with different date ranges. This module teaches you to identify those tasks, scope them as automations, and build Python scripts that eliminate them.

The goal is tangible time savings — not automation for its own sake.

Part 1: The Automation Decision Framework

Before building anything, decide if automation is worth it.

Automate When:

You do the task at least weekly
The steps are the same every time (or close)
Errors in the manual process are costly (missed data, wrong formatting)
The task takes 15+ minutes per occurrence
You can describe every step precisely

Don't Automate When:

You do the task twice a year (just do it manually)
Every instance is different and requires judgment
The tools involved change frequently
The automation would take longer to build and maintain than the manual effort saves
The task involves sensitive decisions that shouldn't be automated

The ROI Test

Time spent manually: ___ minutes × ___ times per month = ___ minutes/month
Estimated build time: ___ hours
Estimated maintenance: ___ minutes/month

Breakeven: ([build hours] × 60) ÷ ([manual minutes/month] - [maintenance minutes/month]) = ___ months

If breakeven is more than 3 months, question whether it's worth it. If breakeven is less than 1 month, build it immediately.
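
The ROI test can be sketched as a small function. This is a minimal illustration, assuming build time is converted to minutes and monthly maintenance is subtracted from the monthly savings; the function name and the example numbers are made up for this sketch.

```python
def breakeven_months(minutes_per_run, runs_per_month, build_hours,
                     maintenance_minutes_per_month=0):
    """Return months until the build time is repaid, or None if it never is."""
    manual_minutes = minutes_per_run * runs_per_month
    net_savings = manual_minutes - maintenance_minutes_per_month
    if net_savings <= 0:
        return None  # upkeep costs as much as the manual task
    return (build_hours * 60) / net_savings


# Made-up example: a 30-minute task done 8 times a month,
# 6 hours to build, 20 minutes of upkeep per month.
months = breakeven_months(30, 8, 6, 20)
print(f"Breakeven: {months:.1f} months")  # prints "Breakeven: 1.6 months"
```

A result of `None` means the automation never pays for itself at the current maintenance cost.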

Part 2: Python for Automation

Python is a strong default for automation scripts: it's readable, it has mature libraries for files, data, and HTTP, and AI coding tools generate it fluently.

File Processing

The most common automation: reading a file, transforming the data, and writing a new file.

Build a Python script called format_report.py that:
1. Reads a CSV file from an "input" folder
2. Filters to only rows where status is "Active"
3. Renames columns: "emp_name" → "Employee Name", "dept" → "Department"
4. Sorts by Department, then by Employee Name
5. Adds a "Generated On" column with today's date
6. Writes the result to an "output" folder as a formatted CSV
7. Prints a summary: "Processed X records, Y active, saved to [filename]"

Include error handling:
- If the input folder doesn't exist, create it and print a helpful message
- If the CSV has unexpected columns, print which columns are missing
- If the output folder doesn't exist, create it
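
For reference, here is roughly what a tool might produce from a prompt like this. It is a sketch, not the only reasonable answer. It assumes the input file is named report.csv, prefixes the output name with "formatted_", and uses the standard library's csv module so it runs without installing anything; all of those are choices made for this illustration.

```python
import csv
from datetime import date
from pathlib import Path

RENAMES = {"emp_name": "Employee Name", "dept": "Department"}
REQUIRED = {"emp_name", "dept", "status"}


def format_report(input_dir="input", output_dir="output", filename="report.csv"):
    """Filter, rename, sort, and stamp one CSV; return the output path or None."""
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    if not input_dir.is_dir():
        input_dir.mkdir(parents=True)
        print(f"Created {input_dir}/. Put {filename} there and run again.")
        return None

    with (input_dir / filename).open(newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        columns = list(reader.fieldnames or [])
        rows = list(reader)

    missing = REQUIRED - set(columns)
    if missing:
        print(f"Missing columns: {', '.join(sorted(missing))}")
        return None

    today = date.today().isoformat()
    # Keep Active rows only, renaming columns and adding the date stamp
    active = [
        {**{RENAMES.get(k, k): v for k, v in row.items()}, "Generated On": today}
        for row in rows
        if row["status"] == "Active"
    ]
    active.sort(key=lambda r: (r["Department"], r["Employee Name"]))

    output_dir.mkdir(parents=True, exist_ok=True)
    out_path = output_dir / f"formatted_{filename}"
    header = [RENAMES.get(c, c) for c in columns] + ["Generated On"]
    with out_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=header)
        writer.writeheader()
        writer.writerows(active)

    print(f"Processed {len(rows)} records, {len(active)} active, saved to {out_path}")
    return out_path
```

Compare what your AI tool generates against the numbered requirements one by one. In this sketch, a missing input file inside an existing folder still raises FileNotFoundError, which is the kind of gap worth catching in review.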

Report Generation

Build a Python script called weekly_summary.py that:
1. Reads all CSV files in a "weekly-data" folder
2. For each file, calculates:
   - Total records
   - Records by status (count and percentage)
   - Average processing time
3. Compiles results into a summary markdown file with:
   - Date range covered
   - A table comparing metrics across files
   - A "highlights" section noting any anomalies 
     (e.g., files with >20% error rate)
4. Saves the markdown file to "reports/weekly-summary-[date].md"
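
A possible shape for the result is sketched below. It assumes the columns id, status, processing_time, and created_date, that dates are already in YYYY-MM-DD form (so they sort correctly as text), and that failed records carry the status "Error". The status value and the function names are assumptions for this sketch, not something the prompt specifies.

```python
import csv
from collections import Counter
from datetime import date
from pathlib import Path


def summarize_file(path):
    """Return record count, status counts, mean time, and dates for one CSV."""
    with Path(path).open(newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    times = [float(r["processing_time"]) for r in rows if r["processing_time"]]
    return {
        "file": Path(path).name,
        "total": len(rows),
        "statuses": Counter(r["status"] for r in rows),
        "avg_time": sum(times) / len(times) if times else 0.0,
        "dates": [r["created_date"] for r in rows if r["created_date"]],
    }


def build_summary(data_dir="weekly-data", error_status="Error", threshold=0.20):
    """Compile per-file metrics into one markdown string."""
    stats = [summarize_file(p) for p in sorted(Path(data_dir).glob("*.csv"))]
    lines = ["# Weekly Summary", ""]
    all_dates = [d for s in stats for d in s["dates"]]
    if all_dates:
        lines += [f"Date range: {min(all_dates)} to {max(all_dates)}", ""]
    lines += ["| File | Records | Status breakdown | Avg processing time | Error rate |",
              "| --- | --- | --- | --- | --- |"]
    highlights = []
    for s in stats:
        total = s["total"]
        rate = s["statuses"][error_status] / total if total else 0.0
        breakdown = ", ".join(
            f"{name} {count} ({count / total:.0%})"
            for name, count in sorted(s["statuses"].items())
        )
        lines.append(
            f"| {s['file']} | {total} | {breakdown} | {s['avg_time']:.1f} | {rate:.0%} |"
        )
        if rate > threshold:
            highlights.append(
                f"- {s['file']}: error rate {rate:.0%} exceeds {threshold:.0%}"
            )
    lines += ["", "## Highlights", *(highlights or ["- No anomalies found"])]
    return "\n".join(lines) + "\n"


def write_summary(data_dir="weekly-data", reports_dir="reports"):
    """Write the summary to reports/weekly-summary-[date].md and return its path."""
    out_dir = Path(reports_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / f"weekly-summary-{date.today().isoformat()}.md"
    out_path.write_text(build_summary(data_dir), encoding="utf-8")
    return out_path
```

Separating `build_summary` (pure text generation) from `write_summary` (file output) makes the report easy to check before anything is saved.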

Data Formatting

Build a Python script called clean_export.py that:
1. Reads a messy CSV export (inconsistent date formats, extra whitespace, 
   mixed case in text fields)
2. Standardizes dates to YYYY-MM-DD format
3. Trims whitespace from all text fields
4. Normalizes text to Title Case for name fields
5. Removes completely empty rows
6. Validates email format for email columns (flag invalid ones, don't delete)
7. Writes a clean version and a separate "flagged_records.csv" with issues
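
The cleaning rules in steps 2 through 6 might look something like the sketch below. The column names (name, date, email), the list of accepted date formats, and the loose email pattern are all assumptions for illustration. Note that a value like 03/07/2024 is ambiguous between month-first and day-first; this sketch reads it month-first. Reading and writing the CSV files is left out to keep the focus on the rules.

```python
import re
from datetime import datetime

DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y")  # assumed input formats
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")  # deliberately loose check


def standardize_date(value):
    """Return the date as YYYY-MM-DD, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            continue
    return None


def clean_rows(rows, date_cols=("date",), name_cols=("name",), email_cols=("email",)):
    """Return (clean, flagged). Flagged rows are kept in clean, not deleted."""
    clean, flagged = [], []
    for row in rows:
        row = {k: (v or "").strip() for k, v in row.items()}  # trim whitespace
        if not any(row.values()):
            continue  # drop completely empty rows
        issues = []
        for col in name_cols:
            row[col] = row[col].title()
        for col in date_cols:
            if row[col]:
                parsed = standardize_date(row[col])
                if parsed:
                    row[col] = parsed
                else:
                    issues.append(f"unparseable {col}")
        for col in email_cols:
            if row[col] and not EMAIL_RE.match(row[col]):
                issues.append(f"invalid {col}")
        clean.append(row)
        if issues:
            flagged.append({**row, "issues": "; ".join(issues)})
    return clean, flagged
```

The flagged list is what would be written to flagged_records.csv, with the extra "issues" column explaining why each row needs a look.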

Part 3: API Integrations

APIs let your scripts interact with other tools — sending Slack messages, creating tickets, pulling data from services.

Sending Slack Messages

Build a Python script that sends a formatted Slack message using
a webhook URL.

The message should include:
- A header: "Weekly Metrics Update"
- 3 key metrics with emoji indicators (green for up, red for down)
- A link to the full report

The webhook URL should come from an environment variable (SLACK_WEBHOOK_URL), 
not hardcoded in the script.

Include error handling for network failures.
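
A minimal sketch of such a script follows. It uses the standard library's urllib rather than a third-party HTTP library, and sends a plain "text" payload with Slack's emoji shortcodes and link markup. The function names, the metric tuple layout, and the exact emoji are choices made for this sketch.

```python
import json
import os
import urllib.request


def build_message(metrics, report_url):
    """Build a webhook payload from (name, value, went_up) tuples."""
    lines = ["*Weekly Metrics Update*"]
    for name, value, went_up in metrics:
        emoji = ":large_green_circle:" if went_up else ":red_circle:"
        lines.append(f"{emoji} {name}: {value}")
    lines.append(f"<{report_url}|View the full report>")
    return {"text": "\n".join(lines)}


def send_to_slack(payload, timeout=10):
    """POST the payload to the webhook; return True only on success."""
    url = os.environ.get("SLACK_WEBHOOK_URL")
    if not url:
        print("SLACK_WEBHOOK_URL is not set; skipping Slack notification.")
        return False
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except (OSError, ValueError) as exc:
        # URLError, HTTPError, and timeouts are OSError subclasses;
        # ValueError covers a malformed webhook URL.
        print(f"Slack notification failed: {exc}")
        return False
```

Keeping `build_message` separate from `send_to_slack` means you can print and review the message text before anything is posted to a real channel.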

The Anatomy of an Automation

Every good automation follows this structure:

1. TRIGGER    → What starts it (manual run, schedule, file change)
2. INPUT      → What data it needs (file, API response, user input)
3. VALIDATE   → Check that the input is good before processing
4. PROCESS    → The actual transformation or work
5. OUTPUT     → Where the results go (file, API call, message)
6. NOTIFY     → Tell someone it finished (log, Slack message, email)
7. LOG        → Record what happened for debugging
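
In code, that structure can be as simple as one function that calls each stage in order. The sketch below is illustrative only: `run_automation` and its five arguments are hypothetical names, and each argument is a function you would write for your own task. The trigger is whatever calls `run_automation` (you, cron, or a scheduler).

```python
import logging

logger = logging.getLogger("automation")


def run_automation(load, validate, process, write, notify):
    """Run one pass through the stages; return True if it completed."""
    logger.info("Run started")
    data = load()                      # INPUT
    problems = validate(data)          # VALIDATE: returns a list of issues
    if problems:
        logger.error("Validation failed: %s", problems)
        notify(f"Automation stopped: {problems}")
        return False
    result = process(data)             # PROCESS
    write(result)                      # OUTPUT
    notify(f"Automation finished: {len(result)} records")      # NOTIFY
    logger.info("Run finished with %d records", len(result))   # LOG
    return True
```

Because validation runs before processing, bad input stops the run before any output is written, and someone is told why.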

Part 4: Error Handling and Logging

The difference between an automation that works once and one that works reliably is error handling.

Try/Except Pattern

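import logging
import pandas as pd

# Without this, INFO messages are dropped (the default level is WARNING)
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

try:
    data = pd.read_csv('input/report.csv')
    processed = data[data['status'] == 'Active']  # your transformation logic
    processed.to_csv('output/clean_report.csv', index=False)
    logger.info(f"Successfully processed {len(processed)} records")
except FileNotFoundError:
    logger.error("Input file not found. Place the CSV in the input/ folder.")
except KeyError as e:
    logger.error(f"Expected column is missing from the CSV: {e}")
except Exception:
    # logger.exception records the full traceback along with the message
    logger.exception("Unexpected error")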

Tell your AI tool: "Add comprehensive error handling. Every operation that could fail should be wrapped in try/except with a helpful error message."

Logging

Add logging to this script using Python's logging module:
- INFO level for successful operations ("Processed 50 records")
- WARNING level for non-critical issues ("3 records had missing dates, skipped")
- ERROR level for failures ("Could not read input file")

Log to both the console and a file called "logs/automation.log".
Include timestamps in the log format.
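
The setup this prompt asks for is only a few lines with the standard logging module. In the sketch below, the logger name "automation" and the function name `setup_logging` are hypothetical; the log file path and the three levels come from the prompt above.

```python
import logging
from pathlib import Path


def setup_logging(log_file="logs/automation.log", level=logging.INFO):
    """Send log records to both the console and a file, with timestamps."""
    Path(log_file).parent.mkdir(parents=True, exist_ok=True)
    formatter = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    logger = logging.getLogger("automation")
    logger.setLevel(level)
    logger.handlers.clear()  # avoid duplicate lines if called more than once
    for handler in (logging.StreamHandler(),
                    logging.FileHandler(log_file, encoding="utf-8")):
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger
```

Call `setup_logging()` once at the start of the script, then use `logger.info(...)`, `logger.warning(...)`, and `logger.error(...)` wherever the script does something worth recording.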

Part 5: Scheduling

Manual Trigger (Simplest)

Run the script when you need it:

python3 weekly_summary.py

Cron (Mac/Linux)

Schedule scripts to run automatically:

# Open cron editor
crontab -e

# Run every Monday at 9 AM
0 9 * * 1 cd /path/to/project && python3 weekly_summary.py >> logs/cron.log 2>&1

Windows users: Use Task Scheduler instead of cron. Open Task Scheduler, create a Basic Task, and set the trigger to weekly on Monday at 9 AM. For the action, choose "Start a program", enter python as the program, weekly_summary.py as the argument, and your project directory in the "Start in" field.

GitHub Actions (Platform-Independent)

Create .github/workflows/weekly-report.yml:

name: Weekly Report
on:
  schedule:
    - cron: '0 9 * * 1'  # Every Monday at 9 AM UTC
  workflow_dispatch:  # Also allow manual trigger

jobs:
  generate-report:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.12'
      - run: pip install -r requirements.txt
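      - run: python weekly_summary.py
      # The runner is discarded after the job, so save the report as an artifact
      - uses: actions/upload-artifact@v4
        with:
          name: weekly-summary
          path: reports/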

Tell your AI tool: "Create a GitHub Actions workflow that runs this script every Monday at 9 AM. Include the ability to trigger it manually."

Part 6: Documentation

Every automation needs a README. Future-you (in 6 months, having forgotten this project) will thank present-you.

# Weekly Summary Report Generator

## What it does
Reads CSV exports from the weekly-data folder, compiles metrics,
and generates a formatted markdown summary report.

## How to run
```bash
python3 weekly_summary.py
```

## Input
Place CSV files in the `weekly-data/` folder. Expected columns:
- id, status, processing_time, created_date

## Output
Generates `reports/weekly-summary-YYYY-MM-DD.md`

## Schedule
Runs automatically every Monday at 9 AM UTC via GitHub Actions.
Can also be triggered manually.

## Configuration
- SLACK_WEBHOOK_URL: Set in .env for Slack notifications
- LOG_LEVEL: Set in .env (default: INFO)

## Troubleshooting
- "File not found": Ensure CSVs are in weekly-data/
- "Missing columns": Check CSV headers match expected format
- "Slack notification failed": Verify webhook URL in .env

Lab: Build an Automation

Identify a real manual task you do at least weekly
Scope it: Write the trigger, input, process, output, and notification steps
Calculate ROI: How long does it take manually? How long to build?
Build it in Python with your AI tool's help
Add error handling and logging
Write a README
Run it on real (or realistic) data and verify the output
Commit to Git with blueprints

Critical Evaluation

Is this automation actually saving time, or did you automate something for fun?
What happens when the input format changes? Is the script robust?
Could someone else run this without your help (does the README cover it)?
What's the failure mode? If the script silently produces wrong output, how would you know?

Go Deeper

Try these prompts in your AI tool to extend your automation skills:

"Add a --dry-run flag that shows what the script would do without actually changing any files"
"Add a progress bar that shows how many records have been processed out of the total"
"Make this script accept the input filename as a command-line argument instead of hardcoding it"
"Add a summary email that sends the results to a specified address using SMTP"

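As a reference point for the first and third prompts, here is a minimal sketch using the standard argparse module. The argument name `input_file` and the printed messages are illustrative; the real processing call is left as a placeholder comment.

```python
import argparse


def parse_args(argv=None):
    """Parse the input filename and an optional --dry-run flag."""
    parser = argparse.ArgumentParser(description="Format a CSV report.")
    parser.add_argument("input_file", help="path to the CSV file to process")
    parser.add_argument("--dry-run", action="store_true",
                        help="show what would happen without writing files")
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    if args.dry_run:
        print(f"[dry run] Would process {args.input_file}; no files written.")
        return 0
    print(f"Processing {args.input_file}")
    # Call your real processing function here.
    return 0
```

With `main()` wired up as the script's entry point, you would run it as `python3 format_report.py input/report.csv --dry-run`.
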
If You Get Stuck

Script runs but produces empty output: Add print statements at each step: print(f"Read {len(data)} records"), print(f"After filtering: {len(filtered)} records"). Find where the data disappears. Common cause: the filter condition doesn't match the actual data values (e.g., checking for "active" when the data says "Active").
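
One way to spot that kind of mismatch is to count the raw values in the column before filtering. The helper names below are hypothetical, and the sketch assumes rows are dictionaries such as those produced by csv.DictReader.

```python
from collections import Counter


def describe_column(rows, column):
    """Count distinct raw values; repr() makes stray spaces and case visible."""
    return Counter(repr(row.get(column)) for row in rows)


def filter_status(rows, wanted="Active"):
    """Match the status while ignoring case and surrounding whitespace."""
    target = wanted.strip().lower()
    return [r for r in rows if (r.get("status") or "").strip().lower() == target]
```

Printing `describe_column(rows, "status")` shows entries like `'active '` next to `'Active'`, which tells you whether the filter or the data needs fixing.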

"ModuleNotFoundError": The library isn't installed. Run pip install [library-name] or pip3 install [library-name]. If you have a requirements.txt, run pip install -r requirements.txt.

Script worked yesterday but not today: Check if the input data changed format. Check if an API endpoint changed. Check if a file path moved. Add logging that records the input state so you can debug retroactively.

Not sure if the automation is worth it: Use the ROI calculation from Part 1. If you've spent more time building the automation than it would save in 3 months of manual work, consider whether the learning experience was the real value — and whether you should simplify the automation to just the highest-impact part.

Try This

Time yourself doing a manual task you do regularly. Then build an automation for it. Time the automation. Calculate the actual time savings — not the theoretical savings. Was it worth it? Write this up honestly, including the time spent building.

Checkpoint

Built at least one working automation
Automation has error handling (doesn't crash silently)
Automation has logging (you can see what it did)
Written a README documenting what it does and how to run it
Can estimate time saved per week
Can articulate whether the automation was worth building (ROI)
