What Data Curation Actually Means — and Why It Matters for Your Evaluations

Apr 20, 2026

Every upstream data provider claims their data is high quality. Most of them mean something different by it.

For some, quality means comprehensive coverage: every well in every state, every production record filed with the agency. Coverage matters, but it is not the same as curation. A dataset can be comprehensive and still be a mess to work with. Seventy versions of the same operator name. Mislocated wellbores. Lease-level production volumes with no well-level allocation. Target formation fields that are blank half the time and inconsistently entered the other half.

If you have spent time doing PDP valuations, type curve work, or A&D screening on raw state data, you already know the problem. Garbage in, garbage out. At Energy Domain, curation is not a feature we describe in a product brochure. It is the methodology work we do before the data ever reaches your model, and we document every step of it.

The Operator Name Problem

It sounds trivial until you have actually tried to aggregate production across a multi-state dataset. A single E&P company can appear under dozens of name variations across state filings: legal entity names, DBA names, predecessor companies, post-acquisition reporting lags, and simple data entry inconsistencies at the clerk or agency level.

Unresolved operator names mean fragmented data. If you are screening acreage by operator, tracking a company's activity across a basin, or building a comp set for a deal evaluation, records lost to name mismatches leave your analysis incomplete, and you may not know it.

Energy Domain maintains a canonical operator hierarchy that normalizes every variation to a single entity and aliases predecessor names, acquired entities, and reporting inconsistencies across all states. This is not a one-time data cleaning exercise. It is an ongoing maintenance commitment, because operators change names, get acquired, and spin off subsidiaries continuously. We maintain that hierarchy as a core part of the platform, updated as state filings and industry activity create new variations to resolve.
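To make the aliasing idea concrete, here is a minimal sketch of how name variations can resolve to one canonical entity. It is an illustration only, not Energy Domain's implementation: the `ALIASES` table, the company names in it, the suffix list, and the function names are all hypothetical.

```python
import re
from typing import Optional

# Hypothetical alias table: normalized variant -> canonical operator name.
ALIASES = {
    "acme oil and gas": "Acme Energy",
    "acme energy": "Acme Energy",
    "acme energy operating": "Acme Energy",
    "old basin petroleum": "Acme Energy",  # illustrative predecessor company
}

# Corporate suffixes stripped before lookup (illustrative, not exhaustive).
SUFFIXES = {"llc", "inc", "lp", "ltd", "co", "corp", "company", "corporation"}


def normalize(raw: str) -> str:
    """Lowercase, expand '&', drop punctuation and trailing corporate suffixes."""
    text = raw.lower().replace("&", " and ").replace(".", "")
    tokens = re.sub(r"[^a-z0-9 ]", " ", text).split()
    while tokens and tokens[-1] in SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def canonical_operator(raw: str) -> Optional[str]:
    """Return the canonical operator, or None so unresolved names can be reviewed."""
    return ALIASES.get(normalize(raw))


print(canonical_operator("ACME Oil & Gas, L.L.C."))   # Acme Energy
print(canonical_operator("Old Basin Petroleum Co."))  # Acme Energy
print(canonical_operator("Unknown Operator LLC"))     # None
```

Returning `None` for an unmatched name, rather than guessing, keeps unresolved variations visible so they can be added to the table. That mirrors the ongoing-maintenance point above.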

Confirmed Intervals: Where Wells Actually Land

State-reported target formation fields are notoriously unreliable. They are self-reported, often entered at the time of permit application before a well is drilled, and rarely updated to reflect where the well actually landed. In high-activity basins like the Permian, where the same lateral can pass through multiple benches, a self-reported formation field tells you almost nothing useful for type curve segmentation or spacing analysis.

Energy Domain's Confirmed Intervals are a proprietary derived dataset built from actual trajectory data. We digitized directional surveys programmatically using LLMs, then applied a comprehensive formation tops dataset to land every horizontal well at the micro-reservoir level. The result identifies the primary formation where 51% or more of a well's production originates.
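As a rough illustration of landing a well from trajectory data, the sketch below assigns each lateral segment to a formation by depth and reports the formation holding the majority share. It rests on several simplifying assumptions that are not from the methodology itself: lateral footage stands in for production contribution, formation tops are flat depths, and the names `formation_at`, `primary_formation`, and `landing_md` are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


def formation_at(tvd, tops):
    """tops: list of (top_tvd_ft, name) sorted shallow to deep."""
    depths = [depth for depth, _ in tops]
    i = bisect_right(depths, tvd) - 1
    return tops[i][1] if i >= 0 else None


def primary_formation(stations, tops, landing_md, threshold=0.51):
    """stations: (measured_depth_ft, tvd_ft) pairs in increasing measured depth.

    Returns (formation or None, share of lateral footage in the top formation).
    """
    footage = defaultdict(float)
    for (md0, tvd0), (md1, tvd1) in zip(stations, stations[1:]):
        if md1 <= landing_md:
            continue  # segment is above the landing point, not part of the lateral
        start = max(md0, landing_md)
        # Assumption: the segment midpoint depth decides its formation.
        name = formation_at((tvd0 + tvd1) / 2, tops)
        footage[name] += md1 - start
    total = sum(footage.values())
    if total == 0:
        return None, 0.0
    name, feet = max(footage.items(), key=lambda kv: kv[1])
    share = feet / total
    return (name if share >= threshold else None), share


tops = [(8000, "Wolfcamp A"), (8300, "Wolfcamp B")]
stations = [(8000, 7900), (9000, 8250), (10000, 8280), (11000, 8290), (12000, 8330)]
name, share = primary_formation(stations, tops, landing_md=9000)
print(name, round(share, 2))  # Wolfcamp A 0.67
```

In this toy example two of the three 1,000 ft lateral segments sit in Wolfcamp A, so it clears the 51% threshold. A well split evenly between benches would return no primary formation.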

In the Permian, that means you can filter by Wolfcamp A, Wolfcamp B, Wolfcamp C/D, Spraberry, Second Bone Spring, Third Bone Spring, and similar targets with specificity that the state-reported field cannot provide. Confidence scores are shown at the well level. Where trajectory data is dense and formation tops are well-constrained, the assignment is high confidence. Where data is sparser, we document the uncertainty rather than obscuring it. This is not a pass-through of the agency's formation field. It is derived, documented, and designed to hold up in a reserves model.

Allocated Production: From Lease to Well

Texas and Louisiana report production at the lease level. When multiple commingled wells produce to the same tank battery, only the lease total gets filed with the state. To get to well-level production, which is what you need for type curve analysis, decline curve modeling, PDP valuation, and reserves estimation, that lease total has to be allocated across the contributing wells.

Our allocation methodology rests on three principles. First, volume conservation: allocated well-level production always sums exactly to the reported lease total. No volume is created or lost; every barrel is accounted for. Second, test-based allocation: well tests provide direct measurements of individual well performance at specific points in time, and those measurements drive the proportional distribution. Third, state-appropriate methods: Texas and Louisiana provide different underlying data, so we apply different approaches. Louisiana's production grouping records include explicit temporal data. Texas does not, so we use a Well Date Finder algorithm that synthesizes completion timing, production peaks, and other signals to estimate when each well began contributing.
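The first two principles can be sketched in a few lines. The example below distributes one month's lease total across wells in proportion to their test rates, then hands out leftover whole barrels by largest remainder so the allocations sum exactly to the reported figure. It is a simplified illustration, not the production methodology: it assumes whole-barrel volumes and a single test rate per well, it takes the set of contributing wells as given (the timing step is out of scope), and the function and well names are hypothetical.

```python
from typing import Dict


def allocate(lease_total: int, test_rates: Dict[str, float]) -> Dict[str, int]:
    """Split a lease total (whole barrels) across wells in proportion to test rates."""
    total_rate = sum(test_rates.values())
    if total_rate <= 0:
        raise ValueError("need at least one positive well test rate")

    # Proportional share for each well, before rounding.
    raw = {well: lease_total * rate / total_rate for well, rate in test_rates.items()}

    # Round down, then give leftover barrels to the largest fractional remainders
    # so the allocations sum exactly to the lease total.
    alloc = {well: int(volume) for well, volume in raw.items()}
    leftover = lease_total - sum(alloc.values())
    by_remainder = sorted(raw, key=lambda well: raw[well] - alloc[well], reverse=True)
    for well in by_remainder[:leftover]:
        alloc[well] += 1
    return alloc


result = allocate(1000, {"Well 1": 50.0, "Well 2": 30.0, "Well 3": 20.0})
print(result)  # {'Well 1': 500, 'Well 2': 300, 'Well 3': 200}
assert sum(result.values()) == 1000  # volume conservation
```

Rounding each well independently can drift from the lease total by a barrel or two. The remainder pass is one simple way to keep the sum exact.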

Confidence is transparent. Single-well leases are definitive. Multi-well leases with dense test coverage are high confidence. Sparse test data or active drilling introduces more uncertainty, and we document it. Allocations re-run as new well test data arrives, so historical estimates improve continuously as the record builds.

Why This Matters in Practice

The downstream effects of uncurated data compound quickly. A mislocated wellbore throws off spacing calculations. An unresolved name drops wells from your comp set. A missing or incorrect formation assignment skews your type curve by mixing Wolfcamp A completions with Wolfcamp B. Lease-level production reported as well-level inflates individual well performance and distorts decline curve inputs.

Each error is individually small. In aggregate, across hundreds or thousands of wells in a typical A&D screening or portfolio review, they add up to a materially different picture of the asset. That difference has real consequences when capital is being deployed against the analysis.

Energy Domain's full data dictionary is available to all subscribers. It documents the methodology behind every derived field, including Confirmed Intervals, allocated production, and operator aliasing, so your engineering team knows exactly what assumptions are baked in before those numbers go into a model.

Reach out, or book a walkthrough at energydomain.com.