June 25, 2026

Clean Before You Compute: Why Payers Need a Data Foundation Before AI

By Kim Perry · 5 min read

Executive Summary

Most payer AI initiatives fail because of poor data quality, not poor algorithms
Between 60 and 80 percent of healthcare data exists as unstructured information
Risk stratification depends on complete and contextual clinical data
A clinical data mart should serve as a source of truth, not simply a storage location
Organizations that clean data before deploying AI achieve stronger outcomes and better ROI

Why Are So Many Healthcare AI Initiatives Falling Short?

Healthcare organizations continue to invest heavily in AI, but reported results lag well behind adoption:

According to McKinsey, more than 70 percent of healthcare organizations are piloting or deploying AI initiatives. Yet only 19 percent report strong results in improving clinical diagnoses, and only 38 percent report success with risk stratification efforts.
Source: Generative AI in healthcare: Adoption trends and what’s next

The issue is rarely a lack of ambition. The issue is data.

Most healthcare organizations attempt to build advanced models before addressing the quality and usability of the information feeding them. Without clean inputs, even sophisticated AI systems produce unreliable outputs.

What Happens When Foundational Data Is Weak?

Many payer organizations rely heavily on structured fields. Unfortunately, structured fields do not always tell the complete story. Important clinical context often remains hidden in:

Physician notes
Care management documentation
CCDs
Discharge summaries
Call center interactions
Social determinants of health assessments

Many systems treat structured fields as fact even when those fields are incomplete, outdated, or copied forward from previous encounters. The result is:

Missed risk indicators
Delayed interventions
Failed audits
Poor outreach targeting
Increased costs

Why Unstructured Data Matters for Risk Stratification

Risk stratification is one of the most important capabilities in value-based care. It depends on identifying members who require intervention before costly events occur.

The challenge is that many of the strongest indicators of risk are buried in unstructured documentation. Consider a member with heart failure. Claims data may reveal:

Diagnosis codes
Medication fills
Prior admissions

But clinical notes may reveal:

Medication side effects
Social isolation
Functional decline
Symptoms worsening over time

These factors often drive outcomes more than the diagnosis itself. When organizations cannot access this context, risk models become less effective.
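The heart failure example can be made concrete with a toy score. This is a minimal sketch, not a validated risk model: the flag names and weights below are hypothetical and chosen only to show how a claims-only view can understate risk when note-derived context is missing.

```python
# Illustrative weights only (assumptions, not clinically validated).
CLAIMS_WEIGHTS = {"hf_diagnosis": 2, "prior_admission": 2, "missed_refill": 1}
NOTE_WEIGHTS = {
    "medication_side_effects": 1,
    "social_isolation": 2,
    "functional_decline": 2,
    "worsening_symptoms": 2,
}

def risk_score(flags, weights):
    """Sum the weights of every flag that is set for the member."""
    return sum(w for name, w in weights.items() if flags.get(name))

# Hypothetical member: few claims signals, several signals found only in notes.
member = {
    "hf_diagnosis": True,
    "prior_admission": False,
    "missed_refill": False,
    "social_isolation": True,
    "functional_decline": True,
    "worsening_symptoms": True,
}

claims_only = risk_score(member, CLAIMS_WEIGHTS)
with_notes = claims_only + risk_score(member, NOTE_WEIGHTS)
print(claims_only, with_notes)
```

In this sketch the member looks low-risk on claims alone and considerably higher once the note-derived flags are counted, which is the gap the section describes.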

Comparison: Weak Data Foundation vs Strong Data Foundation

Capability	Weak Data Foundation	Strong Data Foundation
Risk Stratification	Incomplete	Context-rich
Care Management	Reactive	Proactive
AI Performance	Inconsistent	Reliable
Clinical Context	Limited	Comprehensive
Operational Efficiency	Lower	Higher
Quality Reporting	Incomplete	Accurate
Population Health Programs	Delayed	Timely
Return on Investment	Reduced	Improved

What Is a Clinical Data Mart?

Many organizations think of a data mart as a storage repository. That mindset limits its value.

A modern clinical data mart should function as a trusted source of truth. It should:

Validate structured data
Capture unstructured data
Normalize terminology
Support analytics
Enable AI workflows
Create reusable clinical intelligence

The goal is not simply storing data. The goal is making data usable.

The Payer Data Readiness Checklist

Before investing in new AI initiatives, payer organizations should answer “yes” to the following questions.

Data Visibility

Do we know what data we collect?
Do we know where that data originates?
Have we identified missing data sources?

Data Quality

Is data complete?
Is data accurate?
Is data standardized?
Are duplicate records identified?
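Two of these questions, completeness and duplicates, are straightforward to measure. The following is a rough sketch using only the Python standard library; the record layout and field names (`member_id`, `dob`, `dx_code`) are assumptions made for illustration, not a required schema.

```python
from collections import Counter

def completeness(records, required_fields):
    """Share of records in which every required field is present and non-empty."""
    if not records:
        return 0.0
    complete = sum(
        all(rec.get(f) not in (None, "") for f in required_fields)
        for rec in records
    )
    return complete / len(records)

def duplicate_keys(records, key_fields):
    """Return the key tuples that appear in more than one record."""
    counts = Counter(tuple(rec.get(f) for f in key_fields) for rec in records)
    return [key for key, n in counts.items() if n > 1]

# Hypothetical sample records.
members = [
    {"member_id": "M001", "dob": "1950-03-02", "dx_code": "I50.9"},
    {"member_id": "M002", "dob": "", "dx_code": "E11.9"},
    {"member_id": "M001", "dob": "1950-03-02", "dx_code": "I50.9"},
    {"member_id": "M003", "dob": "1948-11-20", "dx_code": None},
]

print(completeness(members, ["member_id", "dob", "dx_code"]))  # 0.5
print(duplicate_keys(members, ["member_id", "dob"]))  # [('M001', '1950-03-02')]
```

Accuracy and standardization are harder to check mechanically because they need a reference source or a terminology mapping, so they are left out of this sketch.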

Unstructured Data

Are physician notes accessible?
Are CCDs incorporated?
Are narrative clinical documents searchable?
Is unstructured data incorporated into analytics workflows?

AI Readiness

Are key clinical variables extracted before storage?
Are data quality metrics monitored?
Can outputs be traced back to source data?
Are governance policies documented?

Operational Readiness

Are success metrics defined?
Are business outcomes measurable?
Is ROI tracked?

Payer AI Decision Matrix

Question	If YES	If NO
Can you trust the data?	Proceed with AI initiatives	Improve data quality first
Can you access unstructured data?	Expand analytics capabilities	Address data accessibility gaps
Can you trace outputs to source information?	Scale AI use cases	Improve transparency
Can you measure outcomes?	Continue investment	Establish KPIs
Can your data support risk stratification?	Optimize population health programs	Strengthen data foundation
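The matrix can also be encoded as a simple lookup so that an assessment produces a list of next steps. This is one possible encoding, offered as a sketch; treating an unanswered question as "no" is an assumption of the example, not part of the matrix itself.

```python
# Each row: (question, next step if YES, next step if NO), taken from the matrix.
DECISION_MATRIX = [
    ("Can you trust the data?",
     "Proceed with AI initiatives", "Improve data quality first"),
    ("Can you access unstructured data?",
     "Expand analytics capabilities", "Address data accessibility gaps"),
    ("Can you trace outputs to source information?",
     "Scale AI use cases", "Improve transparency"),
    ("Can you measure outcomes?",
     "Continue investment", "Establish KPIs"),
    ("Can your data support risk stratification?",
     "Optimize population health programs", "Strengthen data foundation"),
]

def recommend(answers):
    """Map {question: bool} answers to next steps; unanswered counts as NO."""
    return [
        (question, if_yes if answers.get(question, False) else if_no)
        for question, if_yes, if_no in DECISION_MATRIX
    ]

# Hypothetical assessment: only the first question is answered "yes".
steps = recommend({"Can you trust the data?": True})
for question, step in steps:
    print(f"{question} -> {step}")
```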

How to Build a Smarter Data Pipeline

Kim Perry recommends a practical approach rather than a complete platform replacement. Organizations should:

Run a data scan to identify gaps and inconsistencies.
Remove unnecessary or low-value data.
Apply NLP to extract meaningful variables before storage.
Standardize terminology and formatting upstream.
Establish governance around quality, refresh rates, and auditability.
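The steps above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the abbreviation map and the keyword patterns are hypothetical stand-ins for a terminology service and a clinical NLP engine, and the note fields (`member_id`, `note_id`, `text`) are invented for the example. Governance is not modeled beyond keeping a pointer back to the source note.

```python
import re

# Hypothetical abbreviation map; a real pipeline would use a terminology service.
TERM_MAP = {"chf": "heart failure", "hf": "heart failure", "sob": "shortness of breath"}

# Hypothetical keyword patterns standing in for a clinical NLP engine.
RISK_PATTERNS = {
    "social_isolation": re.compile(r"\blives alone\b|\bsocially isolated\b"),
    "functional_decline": re.compile(r"\bunable to\b|\bdifficulty walking\b"),
}

def standardize(text):
    """Lowercase the note and expand known abbreviations, keeping punctuation."""
    tokens = re.findall(r"\w+|\W+", text.lower())
    return "".join(TERM_MAP.get(tok, tok) for tok in tokens)

def extract_variables(text):
    """Flag each pattern that appears in the standardized text."""
    return {name: bool(p.search(text)) for name, p in RISK_PATTERNS.items()}

def process(notes):
    """Drop empty notes, standardize the rest, and extract variables before storage."""
    rows = []
    for note in notes:
        text = note.get("text", "").strip()
        if not text:
            continue  # low-value record: nothing to extract
        clean = standardize(text)
        rows.append({
            "member_id": note["member_id"],
            "source_note_id": note["note_id"],  # kept so outputs stay traceable
            "text": clean,
            **extract_variables(clean),
        })
    return rows

# Hypothetical input notes.
notes = [
    {"member_id": "M001", "note_id": "N1", "text": "Pt with CHF, lives alone."},
    {"member_id": "M002", "note_id": "N2", "text": "   "},
]
rows = process(notes)
print(rows)
```

Keeping `source_note_id` on every stored row is what lets a downstream score be traced back to the document it came from.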

These are not advanced techniques. They are foundational practices. Yet many organizations skip them.

Why This Matters for Payers

Every major payer decision depends on data. Examples include:

Risk adjustment
Quality scores
Care management
Provider contracting
Outreach programs
Population health initiatives

When data quality improves, organizations can achieve:

Better risk scoring
Fewer readmissions
Improved HEDIS performance
Stronger Star Ratings
Faster access to actionable insights

The benefits extend beyond AI. They improve the entire organization.

Final Word: Clean Before You Compute

Everyone wants scalable AI. Few organizations want to do the foundational work first. The reality is simple.

AI cannot fix broken data, but clean data can dramatically improve the value AI delivers. Organizations that invest in data quality, governance, and clinical context today will be better positioned to succeed tomorrow.

About the Author

Kim Perry is Chief Growth Officer at emtelligent and President of Women’s Health Leadership TRUST. She has spent her career helping healthcare organizations improve outcomes through technology, operations, and data-driven decision-making. Her work focuses on helping payers, providers, and healthcare innovators unlock value from clinical data using AI and automation.

Based on original insights shared in Healthcare Business Today.
