September 12, 2025
The value of unstructured data for drug companies
by Tim O'Connell, 5-minute read
Unstructured data makes up the majority of the information in electronic health records (EHRs). Unlike structured data (e.g., medical codes and discrete measurement values), it doesn't reside in neat, standardised fields. Instead, it's found in clinician notes, diagnostic summaries, referral letters, and patient communications, often in free-text or image format. This data offers deep insight into a patient's medical history, comorbidities, disease progression, and treatment responses, which pharmaceutical researchers can use to accelerate the development of drugs that improve and save lives.
Historically, this type of information has been difficult and costly to extract at scale. Manual chart reviews are labour-intensive and error-prone, and early-generation software tools such as generic natural language processing (NLP) fall short when applied to the domain-specific language of healthcare. Without the ability to integrate and cross-reference unstructured content across disparate systems, pharmaceutical researchers have struggled to identify the subtle or longitudinal patterns that could inform drug development.
Fortunately, advancements in medical AI now allow pharmaceutical companies to leverage unstructured clinical data at scale.