GSF: Maturity Assessment

GSF runs on dbt. That means the governance signal is already in the pipeline — every model documented or not, every test in place or missing, every data owner assigned or absent. We built a tool to read it.

Why Not Just Ask?

Every enterprise data governance program eventually produces a maturity assessment. These assessments follow roughly the same playbook: distribute a questionnaire to data owners, score the responses against a framework (DCAM, DAMA, an internal rubric), and arrive at a number between one and five that summarizes where the organization stands.

The problem is not the frameworks. The problem is the input.

Self-reported assessments are optimistic by design. A team that built its pipeline under deadline pressure will rate its documentation practices higher than the documentation warrants. A data owner who has not looked at test coverage in six months will still check the box for "data quality processes in place." These are not lies; they are the natural result of asking people to evaluate their own work, disconnected from what the pipeline actually contains.

For GSF, we wanted to know what the pipeline actually said — not what we thought it said.

The Signal Was Already There

Because GSF is built on dbt, its manifest.json already contained a machine-readable picture of the pipeline: every model, declared column, test, source, dependency, and metadata tag, written out automatically whenever dbt parses or builds the project.

A column either has a description or it does not. A model either has an owner tag or it does not. A field either has a test or it does not. These are not impressions — they are facts about the pipeline as it exists right now.

We built an assessment tool that reads those facts and converts them into a governance maturity score across five dimensions. No survey. No workshop. No consensus required.

The Five Dimensions

We scored GSF across five governance dimensions, each grounded in signals the manifest already contains:

Semantic

Are concepts documented? Column descriptions, business-facing definitions, and synonym declarations tell Cortex Analyst and downstream consumers what a field actually means. Without them, an AI query can produce a syntactically valid answer that is semantically wrong. A high Semantic score means the pipeline has done the work of making data interpretable — not just accessible.
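
A minimal sketch of how description coverage might be measured, assuming the usual manifest layout in which each entry under nodes carries a columns mapping with a description string. The function name and the sample manifest fragment are illustrative and are not taken from the GSF tool.

```python
def description_coverage(manifest: dict) -> float:
    """Share of declared model columns that have a non-empty description."""
    total = 0
    documented = 0
    for node in manifest.get("nodes", {}).values():
        if node.get("resource_type") != "model":
            continue
        for column in node.get("columns", {}).values():
            total += 1
            if (column.get("description") or "").strip():
                documented += 1
    return documented / total if total else 0.0


# Hypothetical two-model manifest fragment, for illustration only.
sample_manifest = {
    "nodes": {
        "model.gsf.orders": {
            "resource_type": "model",
            "columns": {
                "order_id": {"name": "order_id", "description": "Primary key."},
                "amount": {"name": "amount", "description": ""},
            },
        },
        "model.gsf.customers": {
            "resource_type": "model",
            "columns": {
                "customer_id": {"name": "customer_id", "description": "Customer key."},
                "region": {"name": "region", "description": "Sales region."},
            },
        },
    }
}

coverage = description_coverage(sample_manifest)
print(f"{coverage:.0%} of declared columns are documented")
```

Columns that were never declared in YAML do not appear in the manifest at all, so a ratio like this describes coverage over declared columns only.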

Quality

Are data quality tests in place? dbt’s built-in generic tests (not_null, unique, relationships, accepted_values) cover null checks, uniqueness constraints, referential integrity, and accepted-value validation. Test coverage at the column level is measurable directly from the manifest. A pipeline with no tests has no feedback loop, and no way to detect the silent failures that erode trust in reported numbers.

Lineage

Are sources and dependencies documented? A complete lineage graph — sources declared, models connected, transformations traceable — is the foundation for impact analysis, audit readiness, and incident response. The manifest DAG encodes this directly; gaps in source declarations are gaps in lineage.
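
One heuristic for spotting lineage gaps, sketched under the assumption that each model records its parents in depends_on.nodes: a model with no recorded parents is often reading a hard-coded table name that was never declared as a source, because dbt only records dependencies made through ref() or source(). The function name and sample data are illustrative.

```python
def models_without_upstream(manifest: dict) -> list:
    """Return model IDs whose manifest entry records no upstream dependency."""
    orphans = []
    for node_id, node in manifest.get("nodes", {}).items():
        if node.get("resource_type") != "model":
            continue
        parents = node.get("depends_on", {}).get("nodes", [])
        if not parents:
            orphans.append(node_id)
    return sorted(orphans)


# Hypothetical fragment: one staging model uses a declared source, one does not.
sample_manifest = {
    "sources": {
        "source.gsf.erp.ledger": {"resource_type": "source"},
    },
    "nodes": {
        "model.gsf.stg_ledger": {
            "resource_type": "model",
            "depends_on": {"nodes": ["source.gsf.erp.ledger"]},
        },
        "model.gsf.stg_payments": {
            "resource_type": "model",
            "depends_on": {"nodes": []},
        },
        "model.gsf.fct_cashflow": {
            "resource_type": "model",
            "depends_on": {
                "nodes": ["model.gsf.stg_ledger", "model.gsf.stg_payments"]
            },
        },
    },
}

print(models_without_upstream(sample_manifest))
```

This is a heuristic, not proof of a gap: a model built entirely from literals would also have no parents.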

Stewardship

Are data owners and stewards assigned? In dbt model YAML, meta.owner and meta.team entries assign accountability for the data. Without stewardship metadata, no one is formally responsible for a model when it breaks, a definition changes, or a compliance question arrives. The manifest reveals which models have accountable owners and which are orphaned.
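
A small sketch of how orphaned models could be listed, assuming the owner is stored under the meta key either on the node itself or inside its config block. The helper names and the sample manifest are hypothetical.

```python
def model_owner(node: dict):
    """Return the owner recorded in a node's meta, or None if absent."""
    config_meta = (node.get("config") or {}).get("meta") or {}
    node_meta = node.get("meta") or {}
    return node_meta.get("owner") or config_meta.get("owner")


def unowned_models(manifest: dict) -> list:
    """Return the IDs of models with no owner recorded."""
    return sorted(
        node_id
        for node_id, node in manifest.get("nodes", {}).items()
        if node.get("resource_type") == "model" and not model_owner(node)
    )


# Hypothetical fragment: two models have an owner, one does not.
sample_manifest = {
    "nodes": {
        "model.gsf.stg_ledger": {
            "resource_type": "model",
            "meta": {"owner": "finance-data", "team": "finance"},
        },
        "model.gsf.fct_cashflow": {
            "resource_type": "model",
            "meta": {},
            "config": {"meta": {"owner": "treasury"}},
        },
        "model.gsf.stg_payments": {
            "resource_type": "model",
            "meta": {},
            "config": {"meta": {}},
        },
    }
}

print(unowned_models(sample_manifest))
```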

Access

Are sensitivity classifications applied? PII tags, sensitivity level metadata, and access tier declarations in the manifest provide the machine-readable basis for access policy enforcement. A field flagged as PII can be automatically restricted, masked, or audited. A field with no classification cannot be governed — because governance tools do not know it needs to be.
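
To illustrate, the sketch below lists columns that carry no classification. It assumes classifications live in column-level meta under a key called sensitivity; that key name, the function name, and the sample values are assumptions for the example, since the naming convention GSF uses is not stated here.

```python
def unclassified_columns(manifest: dict, key: str = "sensitivity") -> list:
    """Return 'model_id.column' for every declared column lacking the meta key."""
    missing = []
    for node_id, node in manifest.get("nodes", {}).items():
        if node.get("resource_type") != "model":
            continue
        for name, column in node.get("columns", {}).items():
            if key not in (column.get("meta") or {}):
                missing.append(f"{node_id}.{name}")
    return sorted(missing)


# Hypothetical fragment: two classified columns and one unclassified column.
sample_manifest = {
    "nodes": {
        "model.gsf.customers": {
            "resource_type": "model",
            "columns": {
                "email": {"meta": {"sensitivity": "pii"}},
                "region": {"meta": {"sensitivity": "public"}},
                "signup_date": {"meta": {}},
            },
        }
    }
}

print(unclassified_columns(sample_manifest))
```

Listing the unclassified columns, not just counting them, gives a policy tool or reviewer something concrete to act on.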

How We Tested

We built a browser-based tool that parses a dbt manifest.json and scores it across the five dimensions, with no server-side processing and no data leaving the environment. For each dimension, the tool extracts the relevant signals, computes coverage ratios, and maps the result to a maturity scale from 1 to 5.
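
The exact mapping from coverage ratio to maturity score is not described here. One plausible version, sketched purely as an assumption, scales the ratio linearly onto the 1 to 5 range and rounds to the nearest half point. The function name and the rounding rule are hypothetical.

```python
def maturity_score(coverage_ratio: float) -> float:
    """Map a coverage ratio in [0, 1] onto a 1-to-5 scale in half-point steps.

    Assumed mapping: linear interpolation, then rounding to the nearest 0.5.
    """
    ratio = min(max(coverage_ratio, 0.0), 1.0)  # clamp out-of-range input
    raw = 1.0 + 4.0 * ratio
    return round(raw * 2) / 2


for ratio in (0.0, 0.5, 0.625, 1.0):
    print(ratio, maturity_score(ratio))
```

A real tool might instead use fixed bands per dimension or combine several signals before scoring; the linear form is only the simplest option to reason about.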

We ran GSF’s manifest through it. A score of 1 means a dimension is largely absent. A score of 5 means it is systematically applied. The output is a dimension-by-dimension breakdown with an overall maturity profile. The manifest said what it said.

What We Found

The GSF Semantic Pipeline was designed with governance as a first principle, not an afterthought. Four tiers of transformation (Bronze, Silver, Naive Gold, Semantic Gold) each add structure, documentation, and accountability. The assessment scores show where that design investment landed and where it has not yet reached.

Figure: GSF Semantic Pipeline governance maturity assessment results. Overall score: 2.3 out of 5 (Reactive). Dimension scores: Lineage 2.0, Quality 3.5, Access 1.0, Semantic 4.0, Stewardship 1.0.

Semantic scored highest at 4.0 — the pipeline was purpose-built to resolve the 11 ambiguities that arise when three legacy source systems describe the same financial reality differently. Quality at 3.5 reflects the test coverage applied across models. Lineage, Access, and Stewardship show where real governance gaps remain: source declarations are incomplete, PII classification is absent, and ownership metadata has not been systematically applied. Those are the same gaps that matter most when AI tools start querying the data.
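
The overall figure of 2.3 is consistent with an unweighted mean of the five dimension scores, though whether the tool computes it that way is an assumption. A short check, using the reported scores:

```python
from statistics import mean

# Dimension scores as reported in the assessment results.
scores = {
    "Lineage": 2.0,
    "Quality": 3.5,
    "Access": 1.0,
    "Semantic": 4.0,
    "Stewardship": 1.0,
}

# Unweighted mean: (2.0 + 3.5 + 1.0 + 4.0 + 1.0) / 5 = 11.5 / 5
overall = round(mean(scores.values()), 1)
print(overall)  # 2.3

# Dimensions ordered from weakest to strongest.
weakest_first = sorted(scores, key=scores.get)
print(weakest_first[:3])  # ['Access', 'Stewardship', 'Lineage']
```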

Key Takeaways

1. The manifest does not lie.

If a column has no description, the score reflects that — regardless of what the team said in a governance workshop. Running GSF’s manifest made the gap between design intent and actual coverage concrete and measurable.

2. Artifact-driven assessment scales.

Survey-based maturity reviews require coordination and periodic repetition. A manifest-driven score runs in seconds, updates with every pipeline change, and requires no human consensus. For GSF, that means governance is observable at any point in the development cycle.

3. Governance gaps are specific, not vague.

The five-dimension breakdown tells you exactly which dimensions are weak and which models are the problem. That specificity is what turns a maturity score into a prioritized backlog.

Built on the GSF dbt pipeline — part of the GSF Semantic Pipeline project.
