Stop Following the Institutions, Follow Your Actual Patients: The Case for Patient-Authorized RWE

Author: Christine Hung, Novellia
Date Published: February 10, 2026

If you’re a biostatistician, epidemiologist, or RWE/HEOR leader in pharma, you know this feeling well:

Your medical director or a colleague messages you with what sounds like a straightforward question:

  • What’s the real-world duration of response?
  • What happens after discontinuation?
  • What’s the time to next treatment?
  • What’s the total cost of care?
  • What are the long-term outcomes?

You pull up an RWE dataset from a reputable data vendor. The cohort definition is clean. The structured fields look rich. The sample size is strong.

And then you start analyzing… and the story starts falling apart.

Patients disappear mid-treatment. Lines of therapy look artificially short. Outcomes get censored in ways that don’t match clinical reality. You see a start date but not an end. You see a medication order but not whether the patient actually took it. You see an academic center encounter but miss what happened before and after.

Not because the data is “bad.”

Because the patient’s care didn’t happen in one place.

In real-world evidence, we’ve become comfortable treating single-system electronic health records as our primary window into patient care. Think of major academic medical centers with deep patient networks and broad reach, or integrated care delivery networks owned by managed services organizations (MSOs) that run the same EHR system across millions of patient records. The promise is “complete” and “rich” data, captured through sophisticated clinical documentation and abstraction.

But here’s the big problem that every experienced RWE researcher knows: comprehensive documentation from a single-system data source, whether it’s a network of clinics all using the same EMR vendor or a large health system with broad coverage, doesn’t equal a comprehensive patient journey.

There’s a fundamental category of research questions that single-system EHR data, no matter how detailed or well-structured and abstracted, simply cannot answer. Not because the clinical documentation lacks quality, but because most patients don’t live their healthcare lives within a single system’s walls.

The Single-System Limitation

EHR-based RWE datasets capture rich clinical detail: lab values, imaging reports, clinician notes, medication orders. This depth is genuinely valuable.

But single-system EHR data comes with structural limitations that compound as you try to answer increasingly sophisticated questions about patient journeys.

The (Lack Of) Cross-Institutional Data Problem

Patients don’t receive care within a single system.

They get second opinions at academic centers while receiving treatment locally. They switch systems due to insurance changes, relocations, or physician referrals.

A patient might be diagnosed with early-stage breast cancer at a community oncology clinic, receive treatment planning at an academic center, undergo surgery at a specialty oncology facility, and go back to their local oncologist for ongoing monitoring, leaving relevant clinical information scattered across up to four separate EHR systems.

To make things even more fragmented, a decade later the same patient may be diagnosed with metastatic breast cancer and receive care for this more complex diagnosis within a completely separate health system from their initial early breast cancer treatment.

When this patient presents for first-line therapy for their metastatic diagnosis, their EHR might contain recent labs and imaging, but the detailed history of their initial diagnosis, surgery and radiation details, adjuvant treatment information, and response to their early-stage treatment is often reduced to a brief summary in a consultation note.

Dates are missing. Regimens are vague. Dosing is absent. Outside records may not have been obtained at all.

Then, once the patient progresses on first-line therapy, they may seek a second opinion or receive the remainder of their care at a more advanced academic medical center with experience in newer therapies. That means we miss details about long-term outcomes like survival, late toxicities, and quality of life exactly when understanding these outcomes matters most.

From a modeling perspective, this shows up in all the familiar ways: artificially short treatment durations, misleading discontinuation signals, and outcomes that are censored not by biology, but by fragmentation.
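To make the censoring point concrete, here is a minimal Python sketch with made-up numbers. It assumes five hypothetical patients whose true time on therapy is known, three of whom move to another health system mid-treatment. The field names and figures are illustrative assumptions, not data from any real cohort.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Patient:
    true_months_on_therapy: float          # what actually happened (hypothetical)
    left_system_at_month: Optional[float]  # None means the patient never left


def observed_months(p: Patient) -> float:
    """Months of therapy visible to a single-system EHR."""
    if p.left_system_at_month is None:
        return p.true_months_on_therapy
    return min(p.true_months_on_therapy, p.left_system_at_month)


def cut_off_by_fragmentation(p: Patient) -> bool:
    """True when the record ends because the patient moved, not because therapy ended."""
    return (
        p.left_system_at_month is not None
        and p.left_system_at_month < p.true_months_on_therapy
    )


# Illustrative cohort: values are invented for this example.
cohort = [
    Patient(18.0, None),
    Patient(24.0, 4.0),
    Patient(6.0, None),
    Patient(30.0, 5.0),
    Patient(12.0, 3.0),
]

true_mean = sum(p.true_months_on_therapy for p in cohort) / len(cohort)
naive_mean = sum(observed_months(p) for p in cohort) / len(cohort)
n_cut_off = sum(cut_off_by_fragmentation(p) for p in cohort)

print(f"True mean duration:   {true_mean:.1f} months")
print(f"Naive observed mean:  {naive_mean:.1f} months")
print(f"Records cut off by a system change: {n_cut_off} of {len(cohort)}")
```

In this toy cohort, treating the last visible record as the end of therapy cuts the apparent mean duration from 18.0 to 7.2 months. Proper censoring-aware methods help, but they cannot recover what happened after the patient left the system.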

The Patient Experience Gap

Single-system EHR data is typically collected through institutional business associate agreements (BAAs), which means there’s no direct relationship with patients and no mechanism to capture what happens between clinical encounters.

Even when clinical documentation is thorough, the patient’s lived experience remains a black box.

Are patients actually taking their oral medications as prescribed? What symptoms are they experiencing at home? How is treatment affecting their daily functioning?

Some patients journal diligently on their own, but this information rarely makes it back to providers, let alone into the EHR.

Without direct patient engagement, researchers have no way to gather patient-reported outcomes, track medication adherence, or understand the real-world burden of treatment from the patient’s perspective.

The Claims Linkage Challenge

Even when researchers attempt to augment single-system EHR data with claims data for a more complete utilization picture, they face the tokenization black hole.

Traditional linkage requires matching patient identifiers across databases, a process notorious for low match rates (often 40–60% at best) due to data quality issues and missing identifiers.

You start with a promising EHR cohort and lose half your sample in the linkage process, introducing bias before you even begin the analysis.
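A rough sketch of how that attrition can introduce bias, again in Python with invented numbers. The payer subgroups and their per-group match rates below are assumptions chosen for illustration; the only figure taken from the text above is that overall match rates often land in the 40–60% range.

```python
# Hypothetical EHR cohort split by payer type (illustrative counts only).
ehr_cohort = {"commercial": 600, "medicare": 300, "medicaid": 100}

# Assumed match rate per subgroup, in whole percent. Real rates vary by
# vendor and data quality; these values are placeholders.
match_rate_pct = {"commercial": 60, "medicare": 50, "medicaid": 30}


def link(cohort: dict, rate_pct: dict) -> dict:
    """Patients retained per subgroup after linkage (integer arithmetic)."""
    return {group: n * rate_pct[group] // 100 for group, n in cohort.items()}


def shares(cohort: dict) -> dict:
    """Each subgroup's share of the cohort."""
    total = sum(cohort.values())
    return {group: n / total for group, n in cohort.items()}


linked = link(ehr_cohort, match_rate_pct)
before, after = shares(ehr_cohort), shares(linked)
overall_pct = 100 * sum(linked.values()) / sum(ehr_cohort.values())

print(f"Overall match rate: {overall_pct:.0f}%")
for group in ehr_cohort:
    print(f"{group:>10}: {before[group]:.1%} of cohort before, {after[group]:.1%} after")
```

With these assumed rates, 540 of 1,000 patients survive linkage, and the Medicaid share of the cohort drops from 10% to under 6%. The linked sample is smaller and also no longer looks like the population you started with.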

When You Actually Need the Complete Story

Consider the research questions that pharmaceutical medical affairs and HEOR teams are asking right now:

Understanding Real-World Treatment Effectiveness

For patients with HER2-low metastatic breast cancer receiving a new antibody-drug conjugate (ADC) at an academic medical center, what’s the real-world duration of response?

A single-system EHR might capture the first few months of treatment, but if patients return to community oncologists for maintenance therapy, you’ve lost the signal exactly when durability of response becomes meaningful.

You can’t assess real-world effectiveness when you’re missing a large part of the patient’s “real world” journey.

Characterizing Treatment Sequences Across Settings

How do treatment patterns differ between academic centers and community practices? What factors drive sequencing decisions?

Single-system EHRs can only show you their own patterns, creating fundamental selection bias.

Patients who seek care at major cancer centers aren’t representative of the broader treatment landscape—yet that’s the only population you can study.

Quantifying Healthcare Utilization in Context

For total cost of care analyses, you need to capture all healthcare utilization, not just what happens at one institution or one network.

A patient might receive oncology care at an academic center but visit their local emergency room for complications, get imaging at an outpatient facility, and see specialists at different hospitals.

Single-system data captures only a fraction of the total cost picture.

Attempting to link to claims data through traditional tokenization means losing a substantial portion of your cohort before the analysis even begins.
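As a back-of-the-envelope illustration, the Python sketch below totals one hypothetical patient's encounters by site of care and shows what share of spend a single institution would see. The sites, services, and dollar amounts are all invented for the example.

```python
from collections import defaultdict

# (site of care, service, cost in USD): hypothetical values for one patient.
encounters = [
    ("academic_center", "oncology infusion", 8000),
    ("local_er", "ER visit for complication", 6000),
    ("outpatient_imaging", "CT scan", 2500),
    ("other_hospital", "specialist consult", 2500),
    ("academic_center", "oncology follow-up", 1000),
]

# The single system whose EHR we are assumed to have access to.
INDEX_SYSTEM = "academic_center"

cost_by_site = defaultdict(int)
for site, _service, cost in encounters:
    cost_by_site[site] += cost

total_cost = sum(cost_by_site.values())
visible_cost = cost_by_site[INDEX_SYSTEM]
visible_share = visible_cost / total_cost

print(f"Total cost of care:       ${total_cost:,}")
print(f"Visible to index system:  ${visible_cost:,} ({visible_share:.0%})")
```

Here the academic center's records account for $9,000 of $20,000, or 45% of the patient's total cost of care. The actual share will differ by patient and indication, but the structure of the gap is the same.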

Assessing Long-Term Safety and Outcomes

Understanding late effects of therapy, second cancers, long-term survival, and quality of life requires following patients across years and across the natural transitions in their care.

Single-system EHRs lose patients at precisely the moments that matter most for understanding long-term outcomes.

The Patient-Authorized Registry Alternative

This is where patient-authorized, longitudinal registries become essential.

These aren’t EHR databases from a single institution or a single EHR vendor. They’re purpose-built data collection platforms where patients digitally authorize comprehensive data capture across their entire healthcare journey, regardless of where care occurs.

The fundamental difference is simple: when patients control access to their full healthcare journey, the data becomes more complete.

In a patient-authorized registry:

  • Patients digitally authorize access and linkage to medical records across all their healthcare systems, including academic centers, community hospitals, specialist offices, and imaging facilities. You capture the complete clinical picture, not just one institution or one EHR system.
  • You maintain access to valuable historical data—records from before patients entered your registry, potentially going back 20 years—capturing diagnosis details, prior treatment history, and the clinical context that shapes current decisions.
  • You track patients longitudinally regardless of where they receive care, following them across system transitions, insurance changes, and geographic moves. The patient remains your anchor point, not the health system or EMR vendor.
  • Patients can directly authorize linkage to their insurance claims data digitally, bypassing the entire tokenization process. Instead of losing 40–60% of your cohort to match failures, you maintain near-complete linkage.
  • You can access unstructured medical records, enabling extraction from pathology reports, imaging narratives, clinician notes, and genetic testing results across all sites of care. This allows both capture of nuance invisible in structured fields and development of custom variables and endpoints aligned to sponsor objectives.
  • You can integrate patient-reported outcomes prospectively via the patient-facing app or portal, capturing quality of life, symptom burden, functional status, and adherence.
  • You can re-contact patients to fill gaps, clarify outcomes, or collect additional endpoints, enabling iterative data enrichment over time.

Emerging patient-authorized data platforms (like Novellia) are making this model operational at scale—not by replacing traditional RWE sources, but by filling the structural gaps that single-system EHR datasets can’t solve: cross-institution continuity, complete historical records, high-yield claims linkage, and the ability to incorporate patient-reported outcomes prospectively.

If you’re actively running studies where loss-to-follow-up, incomplete treatment history, or low-yield claims linkage is limiting what you can publish or submit, we’d love to compare notes. This is exactly the problem we’re solving at Novellia.

When it comes to RWE research, we strongly believe that putting patients at the center of everything is the way to go. We follow the patients, not the EHRs or the institutions, and we invite you to join us!
