Data Governance Workshop

A practical session to align your team on project structure, governance ownership, naming conventions, and the Amplitude features that keep your data clean at scale.

What You'll Build Today

By the end of this session, you'll walk away with four concrete decisions and artifacts:

Your project strategy
How you structure Amplitude projects — per platform, per business unit, aggregate, or Portfolios
Your governance model
A process framework matched to your team size, maturity, and org structure
Your taxonomy & instrumentation standards
Naming conventions, abstraction level, tracking plan setup, governance controls, and AI-assisted cleanup — the full operational layer
Your governance process table
Who owns what, how they do it, and when — ready to share with your team
Full agenda
Project Strategy — choose how to structure your Amplitude project(s)
What is Data Governance? — the pillars, the quality flywheel, and why it matters
Governance Model — pick the ownership and approval model that fits your org
Taxonomy Design — build your naming style guide and explore the abstraction spectrum
Tracking Plan — branches, schema settings, virtual data extensions, and transformations
Governance Controls — DAC tags, schema enforcement, and Observe monitoring
Data Assistant Agent — automate taxonomy cleanup and governance suggestions with AI
Workshop Activity — build your governance process table together
Next Steps — your prioritized action plan and recommended resources

What is Data Governance?

Data governance is the set of standards, processes, and ownership that ensures your analytics data stays accurate, trustworthy, and actionable over time.

📘 Education: Align teams on what to measure, why it matters, and how to name and classify events consistently across the org.

🔧 Instrumentation: Build the right tracking from the start — correct event schemas, property standards, and validation so data arrives clean.

🔄 Maintenance: Continuously review and clean your taxonomy — remove stale events, resolve duplicates, enforce ownership over time.

The Quality Flywheel
🏗️ Define standards: Style guides & tracking plan
🚀 Instrument cleanly: Branches & schema enforcement
🔍 Review & approve: Observe QA, merge requests
🧹 Maintain & prune: Data Assistant Agent, deprecation
📊 Trust the data: Better analysis outcomes

💡
Governance isn't a one-time project — it's an ongoing practice. The flywheel above shows how each step reinforces the next, compounding data quality over time.

Define Your Project Strategy

How you structure Amplitude projects determines your governance scope, cross-team visibility, and reporting flexibility. Choose the pattern that fits your organization.

🗂️ Choose a Project Strategy

Click a strategy to explore how it fits your organization. You can compare all options side-by-side using the toggle above.

📱 Per Platform: Best for platform-differentiated teams
🏢 Per Business Unit: Best for independent product lines
🌐 Aggregate Project: Best for unified analytics
📊 Portfolios: Best for cross-project rollups

💡
Your project strategy is a foundational decision — it shapes everything from your tracking plan scope to how you report cross-team metrics. Most teams start simple (Aggregate or Per Platform) and evolve to Portfolios as they scale.

Choose Your Governance Model

Answer four quick questions to get a recommended model — or select one directly below.

🎯 Model Configurator
How many teams send data to Amplitude? 1 team · 2–5 teams · 5+ teams
Who owns data governance today? Nobody owns it formally · A central data/analytics team · Each team manages their own · A cross-functional governance council
How mature is your governance process? Just starting — no formal process · Some standards, inconsistently applied · Established process, needs scaling
How are tracking changes approved? Anyone can add events · One person reviews all changes · Team lead approves for each squad · Cross-team committee reviews
🏆 Recommended Model

Program Owner
Approval Process
Effort Level

Resource Checklist for this Model

    💡
    These models aren't rigid — most teams blend elements. The goal is to match governance overhead to your team's capacity and data maturity.

    Build Your Style Guide

    Define your naming conventions once — the preview updates live as you configure your rules.

🔗 Connect to a Live Project (optional)

    Pull real events from your Amplitude project to auto-detect their naming conventions.


    Enter your Data tab URL for the project of interest to generate a direct link and a pre-configured bookmarklet.

1. Drag this to your bookmarks bar (one-time setup): 📊 Get Amplitude Events
2. Paste your Data tab URL above, then open your Data tab
3. Once Amplitude loads, click the 📊 Get Amplitude Events bookmark — your style guide will be configured automatically
    🎨 Style Rules

    Live Preview

    Properties on button_clicked
    🔍 Naming Validator

    Type an event name to check it against your style guide rules.
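
A check like this can be expressed in a few lines. The sketch below is illustrative, not Amplitude's implementation: the snake_case rule and the verb allow-list are example conventions you would swap for your own style guide.

```typescript
// Illustrative style-guide checker. The snake_case rule and the
// past-tense verb allow-list are example conventions, not Amplitude's.
const APPROVED_VERBS = ["clicked", "viewed", "started", "completed", "played"];

function validateEventName(name: string): string[] {
  const issues: string[] = [];
  // Rule 1: snake_case (lowercase words joined by single underscores)
  if (!/^[a-z][a-z0-9]*(_[a-z0-9]+)*$/.test(name)) {
    issues.push("not snake_case");
  }
  // Rule 2: object_action (final word must be an approved past-tense verb)
  const lastWord = name.split("_").pop() ?? "";
  if (!APPROVED_VERBS.includes(lastWord.toLowerCase())) {
    issues.push("missing approved past-tense verb");
  }
  return issues; // empty array = name passes the style guide
}
```

Under these rules, `button_clicked` passes while `ClickButton` fails both checks.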

    Explore how granularity decisions affect your event names and the properties needed to make them useful.

    ⚡ Too Granular ⚖️ Just Right 📦 Too Consolidated

    Tracking Plan & Branches

    The tracking plan is your source of truth. Branches let teams propose changes without breaking production data.

    📋 What Goes in the Tracking Plan

    Events

    • Event name, description, and expected event properties
    • Event ownership — who is responsible for each event
    • Event status: Planned → Live → Unexpected
    • Schema enforcement rules (project-wide)

    User Properties

    • User traits set via identify() calls — e.g. plan_type, country
    • Group properties for account-level analysis (company, org)
    • Persist across sessions — unlike event properties which are per-event

    Shared Resources

    • Property groups — reusable property sets applied across multiple events
⎇ Branch Workflow

1. Create a branch: Engineer proposes new events or property changes in an isolated branch
2. Instrument against branch: SDK validation runs against branch schema — catches issues before prod
3. Submit merge request: Auto-notifies reviewers in Slack — shows diff of what changed
4. Review & merge: Approver accepts or requests changes — clean merge into main

    Schema enforcement is configured project-wide — the same rules apply across all your sources. Observe then lets you filter violations by source to pinpoint which SDK or integration is sending bad data. These settings are the rules engine that powers Amplitude Observe.

    👁️
    How this connects to Observe: Your schema settings define what "good data" looks like. Amplitude Observe (found at Data → Events) continuously compares your live event stream against those rules — surfacing unexpected events, missing required properties, type mismatches, and volume anomalies in real time. No code needed.

    Unexpected Events

    Mark as Unexpected

    Event is ingested but flagged in Observe for review

    What Observe shows: Events appear in Amplitude tagged "Unexpected" with a distinct status indicator. They're queryable but clearly signal the event isn't in the approved tracking plan. Great for discovery without data loss — your team can review and add to plan directly from the Observe view.

    Reject at Ingestion

    Unplanned events are dropped at the edge — never ingested

    ⚠️ Caution: Data is permanently lost — Observe will not see it. Best for teams with mature tracking plans where any unapproved event is genuinely invalid. Requires high confidence in your tracking plan completeness.

    Unexpected Properties

    Allow

    Unexpected properties are ingested and visible

    Best for discovery phases. New properties sent from the SDK appear in Amplitude and can be retroactively added to the tracking plan. Observe will show them as part of the event stream.

    Mark as Unexpected

    Property is ingested but flagged for review

    Ingested and queryable, but tagged in Observe so data governors can review and approve or block. Good balance of control and data safety.

    Reject

    Unexpected property values are dropped

    ⚠️ Caution: Property values are permanently lost and won't appear in Observe. Best for properties with strict PII or compliance requirements.

    Property Type Validation

    Required Properties

    Mark properties as required in the tracking plan. Observe surfaces events where required properties are missing — the "% Seen" column turns red when a required field is absent, making gaps immediately visible.

    Type Enforcement

    Define expected type (string, number, boolean, array) per property. Observe flags type mismatches in red — preventing silent bugs where revenue arrives as a string.

    👁️ What Observe Surfaces (Data → Events)
Valid: Event matches your current schema exactly
Unexpected: Event not yet in the tracking plan — review and add, or reject
Invalid: Event deviates from schema — missing required props or wrong types
Out of Date: Event matches a previous version of the schema — SDK not yet updated
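
Conceptually, the first three statuses fall out of a comparison between each incoming event and the tracking plan. Here is a minimal sketch; the schema shape and function name are assumptions for illustration, not Amplitude's internals ("Out of Date" would additionally require prior schema versions, which this sketch omits).

```typescript
// Sketch of mapping an incoming event to an Observe-style status.
// The tracking-plan representation here is illustrative only.
type PropType = "string" | "number" | "boolean" | "array";
type TrackingPlan = Record<string, { required: Record<string, PropType> }>;

function classifyEvent(
  plan: TrackingPlan,
  name: string,
  props: Record<string, unknown>,
): "Valid" | "Unexpected" | "Invalid" {
  const schema = plan[name];
  if (!schema) return "Unexpected"; // event not yet in the tracking plan
  for (const [prop, expected] of Object.entries(schema.required)) {
    const value = props[prop];
    if (value === undefined) return "Invalid"; // missing required property
    const actual = Array.isArray(value) ? "array" : typeof value;
    if (actual !== expected) return "Invalid"; // type mismatch
  }
  return "Valid"; // matches the current schema exactly
}
```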

    Enrich your taxonomy without re-instrumentation — these three features create new data dimensions retroactively, no SDK changes required.

    Custom Events

    Combine multiple existing events with an OR clause into a single reusable metric. Useful when multiple actions represent the same user intent — e.g. "Play Song" OR "Search Song" as a unified engagement event.

    Key constraints

    • Appear with a [Custom] prefix in charts
    • Editing breaks charts that reference them
    • Event property queries only work if the property exists on all component events
    • Not supported in Redshift queries
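
The OR clause and the all-components property rule can be sketched as follows; the event names and properties are illustrative.

```typescript
// Sketch of custom-event semantics: the custom event matches any of its
// component events (OR), and a property is queryable only when every
// component event defines it. Names below are examples.
interface ComponentEvent { name: string; properties: string[]; }

const components: ComponentEvent[] = [
  { name: "Play Song", properties: ["song_id", "genre"] },
  { name: "Search Song", properties: ["song_id", "query"] },
];

const matchesCustomEvent = (eventName: string): boolean =>
  components.some((c) => c.name === eventName);

// Intersection of all component property sets = queryable properties
const queryableProps = components
  .map((c) => new Set(c.properties))
  .reduce((a, b) => new Set([...a].filter((p) => b.has(p))));
```

Here only `song_id` is queryable on the combined event, since `genre` and `query` each exist on just one component.
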
    🧮

    Derived Properties

    Compute new properties on the fly from existing event or user properties using formulas — no new data required. Calculated retroactively, so they update historical charts automatically.

    Supported functions

    • String: REGEXEXTRACT, CONCAT, SPLIT, LOWERCASE
    • Math: SUM, MULTIPLY, DIVIDE, MIN, MAX
    • Date: EVENT_HOUR_OF_DAY, DATE_TIME_FORMATTER
    • Conditional: IF, SWITCH, COALESCE
    • Max 10 property references per derived property
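
To make the formula model concrete, here is a TypeScript equivalent of a hypothetical derived property combining REGEXEXTRACT, LOWERCASE, and IF. In Amplitude the formula is written in the derived-property editor, not in code.

```typescript
// TypeScript equivalent of a hypothetical derived property:
//   IF(REGEXEXTRACT(page_url, "utm_source=(\w+)") != "",
//      LOWERCASE(REGEXEXTRACT(page_url, "utm_source=(\w+)")), "direct")
function derivedTrafficSource(pageUrl: string): string {
  const match = /utm_source=(\w+)/.exec(pageUrl); // REGEXEXTRACT
  return match ? match[1].toLowerCase() : "direct"; // IF + LOWERCASE
}
```

Because derived properties are computed at query time, this value would also appear on historical events.
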
    📡

    Channel Classifiers

    Categorize traffic by UTM parameters and referrer data into named marketing channels — computed on the fly, aligned to GA4's 29-category model by default. Retroactive definitions update all existing charts.

    Key details

    • Default classifier includes Paid/Organic Search, Social, Email, Display, LLM Search, and more
    • Define custom channels by building row-based rules (all conditions in a row must be true)
    • Use OR logic by adding separate rows with the same channel name
    • Max 149 rows and 1,000 total cells per classifier
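
The row model can be sketched like this: each row ANDs its conditions, and repeating a channel name across rows gives OR logic. The rules below are examples, not the default classifier.

```typescript
// Row-based channel rules: all conditions in a row must match (AND);
// rows sharing a channel name act as OR. Example rules only.
interface Row { channel: string; conditions: Record<string, RegExp>; }

const rows: Row[] = [
  { channel: "Paid Search", conditions: { utm_medium: /^(cpc|ppc)$/, utm_source: /google|bing/ } },
  { channel: "Organic Search", conditions: { referrer: /google\.|bing\./ } },
  { channel: "Email", conditions: { utm_medium: /^email$/ } },
];

function classifyChannel(params: Record<string, string>): string {
  for (const row of rows) {
    const allConditionsMatch = Object.entries(row.conditions).every(
      ([param, pattern]) => pattern.test(params[param] ?? ""),
    );
    if (allConditionsMatch) return row.channel; // first matching row wins
  }
  return "Direct"; // fallback when no row matches
}
```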

    Fix common instrumentation mistakes retroactively — no code changes, no re-deployment. Transformations apply at query time and leave your raw data untouched.

    💡
    When to use: Transformations are ideal for correcting data quality issues that are too costly or slow to fix in code — renaming inconsistent events, merging duplicates, or standardizing property values across historical data.
    🔀 Merge Events

    Consolidate multiple events that represent the same action into a single event name.

    // Before
    comment_reply_like
    comment_share
    // After merge
    comment (comment_type: "like" | "share")

    Optionally add a distinguishing property to preserve the original intent.

    🏷️ Merge Properties

    Combine properties with different names that represent the same dimension.

    // Before
    title
    TITLE
    item_title
    // After merge
    Title

    Works for both event properties and user properties.

    ✏️ Rename Property Values

    Correct misspellings or standardize inconsistent casing in property values.

    // Before
    paid_subscription: "true"
    paid_subscription: "TRUE"
    // After rename
    paid_subscription: "True"

    Reassigns specific values — does not affect all values of the property.

    🙈 Hide Property Values

    Remove unwanted or noisy values from charts and dropdowns without deleting raw data.

    // Hide test/internal values
    user_type: "(none)"
    user_type: "test_user"
    // Hidden from charts,
    // visible in event stream

    Raw data preserved — reversible at any time.
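
"Apply at query time" means a transformation is a lookup applied when reading, never a rewrite of stored events. Here is a sketch using the merge and rename examples above; the mapping shapes are assumptions, not Amplitude's internals.

```typescript
// Query-time view of transformations: raw events stay untouched; the
// mappings mirror the merge and rename examples in this section.
interface AnalyticsEvent { name: string; props: Record<string, string>; }

const mergeEvents: Record<string, { into: string; addProp?: [string, string] }> = {
  comment_reply_like: { into: "comment", addProp: ["comment_type", "like"] },
  comment_share: { into: "comment", addProp: ["comment_type", "share"] },
};
const renameValues: Record<string, Record<string, string>> = {
  paid_subscription: { TRUE: "True", true: "True" },
};

function applyAtQueryTime(raw: AnalyticsEvent): AnalyticsEvent {
  const merge = mergeEvents[raw.name];
  const props = { ...raw.props };
  if (merge?.addProp) props[merge.addProp[0]] = merge.addProp[1];
  for (const [prop, mapping] of Object.entries(renameValues)) {
    if (props[prop] !== undefined) props[prop] = mapping[props[prop]] ?? props[prop];
  }
  return { name: merge ? merge.into : raw.name, props };
}
```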

    ⚠️ Important Constraints
    • Only available on the main branch — not on feature branches
    • Default Amplitude user properties (e.g. platform, country) cannot be transformed
    • Transformed properties are not available for block/drop filters in Data Management
    • Transformations apply at query time — raw data in connected warehouses is unchanged
    • All transformations are non-permanent and can be edited or deleted anytime
    🔗
    Find transformations in Amplitude under Data → Transformations. Requires Manager or Admin role to create.

    Governance Controls

    Classify your properties, control who sees sensitive data, and manage event lifecycle.

    Property Classification
Property | Example Values | Classification | What Restricted Users See
user_email | alice@company.com | PII | Property hidden from charts and user streams
plan_mrr | $1,200 | Revenue | Values masked — "Restricted" shown in place of value
device_id | a1b2c3d4… | Sensitive | Not available in group-by or filters
plan_type | Pro, Growth, Enterprise | Standard | Fully visible to all users
experiment_variant | control, treatment_a | Standard | Fully visible to all users
    Event Lifecycle

    Hide vs Block vs Delete

    These are not the same operation — understand the consequences before acting.

    🙈 Hide

    • Event is removed from the event picker UI
    • Data is still ingested and stored
    • Existing charts still work
    • Reversible — unhide anytime
    • Best for: decluttering noise

    🚫 Block

    • New incoming data is dropped at edge
    • Historical data is preserved
    • Old charts using this event still render
    • Partially reversible (unblock re-enables)
    • Best for: stopping bad instrumentation

    🗑️ Delete

    • All historical data is permanently erased
    • New data is also dropped
    • All charts referencing event break
    • Irreversible
    • Best for: compliance / data minimization only

    Data Deprecation

    Automated Cleanup with Data Assistant

    Automated tasks run daily across your workspace — surfacing stale events so data governors can act without manual audits. Requires Admin/Manager permissions and must be enabled by Amplitude Support.

    🕐 Stale Events (90-day default)

    Events not queried in any chart for 90+ days are flagged as candidates. The threshold is configurable. Includes a mandatory 30-day notification window before deletion — event owners are notified via email and Slack so they can save events they still need.

    📅 Single-Day Events (test data)

    Events that fired only on a single calendar day (typically test instrumentation) are automatically identified and scheduled for deletion after the 30-day window. Threshold is configurable.
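
The stale-event check is essentially a date-threshold filter. A sketch with assumed field names follows; Amplitude tracks chart usage internally, so `lastQueried` here is purely illustrative.

```typescript
// Sketch of a stale-event scan: flag events with no chart queries inside
// the threshold window. Field names are assumptions for illustration.
interface EventUsage { name: string; lastQueried: Date | null; }

function findStaleEvents(
  events: EventUsage[],
  now: Date,
  thresholdDays = 90, // configurable; 90-day default
): string[] {
  const cutoff = now.getTime() - thresholdDays * 24 * 60 * 60 * 1000;
  return events
    .filter((e) => e.lastQueried === null || e.lastQueried.getTime() < cutoff)
    .map((e) => e.name);
}
```

Flagged events would then enter the 30-day notification window before any deletion.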

    ⚠️
    Recovery: Deleted events are restorable anytime via Data → Events → Deleted Events → Restore — this does not recover the data sent to Amplitude while the event was deleted.

    4-Step Deprecation Workflow

1. Identify: Data Assistant surfaces events unused for 90+ days or fired on a single day
2. Notify: Owners notified via email + Slack 30 days before deletion — they can save the event
3. Hide: Remove from picker, confirm no active charts depend on the event
4. Delete: Permanent removal — only for compliance or data minimization requirements

    AI-Powered Data Governance

    Amplitude's AI agents help you build, maintain, and scale your taxonomy — from first instrumentation through ongoing governance.

    Today — Data Assistant

    Available now. Continuously analyzes your event stream and surfaces issues for review.

🔍 Auto-Detect Issues: Surfaces stale events (90+ days without a query), duplicates, and events missing descriptions or owners.

Bulk Actions: Accept or reject suggested changes in bulk — categorize, tag, and clean your taxonomy in minutes, not days.

📊 Taxonomy Health Score: Proactive health scoring shows your taxonomy quality over time — trend it in your quarterly review.

    🤖 Review Suggestions in Data Assistant

    Data Assistant analyzes your live event stream and surfaces the most impactful recommendations — stale events, missing metadata, duplicates, and more — directly in your project.

    🚀 Try the Data Assistant Agent

    The Data Assistant Agent analyzes your taxonomy and creates a prioritized action plan — surfacing stale events, missing metadata, duplicates, and AI readiness improvements. Launch it directly in your project.


    Coming in 2026 — AI Governance Agents

    Three new agents that automate the full governance lifecycle — from initial setup through continuous self-maintenance.

🚀 Quickstart Agent (Setup & Instrumentation)

Scans your product or website during initial setup and suggests a complete tracking plan, then helps you instrument it — getting teams to clean data from day one without starting from a blank slate.

🤖 Data Assistant Agent (Ongoing Maintenance)

Surfaces the top recommended actions to improve data quality on an ongoing basis. Goes beyond today's Data Assistant with richer AI-driven prioritization across four areas:

• Updating descriptions and other missing metadata
• Deleting unused and test events
• Reducing duplicate events
• Improving AI readiness of your taxonomy

🔮 Self-Building Agent (Proactive Discovery)

Identifies new user paths and funnels in your product on a regular cadence, then proactively suggests new events and properties for your review — so your tracking plan evolves as your product does.

    Your Governance Process Table

    Fill in this table live during the workshop. This becomes your team's governance runbook — who does what, how, and when.

    📝
    Complete this table with your team now. When you're done, print or screenshot it — this is your governance process document.
Process Area | WHO is responsible? | HOW does it work? | WHEN does it happen?
📐 Taxonomy Schema | | |
➕ New Events | | |
🗑️ Event Removal | | |
🔧 Maintenance Issues | | |
🏷️ Style Guide | | |
🔐 Access Control | | |

    Review Cadence
Frequency | Activity | Owner | Tool
Weekly | Review Observe for unexpected events from recent releases | Engineer / PM | Observe
Bi-weekly | Review and approve pending branch merge requests | Data Steward | Branches
Quarterly | Full taxonomy health review — accept/reject Agent suggestions | Governance Owner | Data Assistant Agent
Annually | Style guide review + bulk event cleanup | Data Council | Bulk Edit + Agent

    Next Steps

    Your post-workshop action plan. Remove any items that aren't relevant to your team, then share or print this list.

    ✅ Your Post-Workshop Checklist
    • Share the completed governance process table with your team
    • Document your naming conventions from the style guide builder
    • Assign a governance owner or program lead
    • Set up your first branch in Amplitude Data
    • Configure schema enforcement settings for your project
    • Schedule a quarterly Data Assistant Agent review in your calendar