
The traditional approach to causal inference has relied heavily on Randomized Controlled Trials (RCTs). While well-designed experiments and A/B tests have proven to be highly effective, applying traditional machine learning methods to causal questions has always been challenging.
Even the most advanced machine learning and deep learning models excel at learning relationships, often complex and non-linear ones, but they are fundamentally designed to answer associational questions, not causal ones. A question such as “Is variable X related to outcome Y?” can be answered well. However, questions like “Is this relationship causal?” or “What confounders influence both X and Y?” are outside the scope of standard predictive models.
An oft-repeated aphorism goes, “correlation does not imply causation.” But what, then, does? While multiple definitions of causation exist, a commonly accepted one, popularized by Judea Pearl, is that X is a cause of Y if intervening to change X results in a change in Y, all else being equal.
There is extensive literature on how to define and identify causal relationships. In this article, I introduce EconML, an open-source Python library of models that use machine learning techniques to estimate causal effects from observational data under well-defined assumptions. I also present a real-world use case from one of our clients, where we applied EconML models to inform product color strategy.
EconML brings machine learning into causal inference by carefully separating prediction from causation. Instead of modeling outcomes alone, its methods explicitly model how treatments are assigned and how outcomes behave, and then use that structure to estimate what would have happened under alternative decisions.
In practice, EconML models control for confounders (variables that affect both the treatment and the outcome) and apply machine learning to estimate the causal impact of a treatment on an outcome, rather than merely learning correlations.
Consider a simple example. You notice that lakes tend to freeze on days when it snows. Looking at the data, snow and frozen lakes are strongly correlated. But does that mean snow causes lakes to freeze?
In reality, both snow and frozen lakes are driven by a third factor: temperature.
A traditional machine learning model might implicitly learn:
Snow → Frozen Lake
But the actual causal structure is:
Temperature → Snow
Temperature → Frozen Lake
EconML approaches this problem by explicitly controlling for temperature when estimating the effect of snow.
The model first learns:
- How likely it is to snow given the temperature
- How likely a lake is to freeze given the temperature
It then removes these effects and asks:
Holding temperature constant, does forcing snow to occur change whether the lake freezes?
The answer, unsurprisingly, is very little.
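The two-step logic above can be sketched in a few lines. This is a minimal illustration on simulated data, not EconML's implementation: the data-generating process, the coefficients, and the helper names `ols_slope` and `residualize` are all assumptions made for the example, and continuous "snow" and "freeze" indices with plain linear fits stand in for the flexible models a real analysis would use.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Simulated world (assumed for illustration): temperature drives both snow
# and freezing, and snow has NO direct effect on freezing.
temperature = rng.normal(0.0, 5.0, n)                    # degrees Celsius
snow = -0.8 * temperature + rng.normal(0.0, 2.0, n)      # snowfall index
freeze = -1.0 * temperature + rng.normal(0.0, 2.0, n)    # ice index


def ols_slope(x, y):
    """Slope of a one-variable least-squares fit of y on x (with intercept)."""
    x_centered = x - x.mean()
    return float(x_centered @ (y - y.mean()) / (x_centered @ x_centered))


def residualize(target, control):
    """Remove the part of `target` that a linear fit on `control` explains."""
    slope = ols_slope(control, target)
    intercept = target.mean() - slope * control.mean()
    return target - (intercept + slope * control)


# What a purely predictive model sees: snow "predicts" freezing.
naive_effect = ols_slope(snow, freeze)

# Step 1: strip the temperature-driven part out of both variables.
snow_resid = residualize(snow, temperature)
freeze_resid = residualize(freeze, temperature)

# Step 2: relate what is left of snow to what is left of freezing.
adjusted_effect = ols_slope(snow_resid, freeze_resid)

print(f"naive slope:    {naive_effect:.3f}")
print(f"adjusted slope: {adjusted_effect:.3f}")
```

Under these assumed coefficients the naive slope comes out close to 1, while the adjusted slope is close to 0, matching the intuition that snow adds very little once temperature is held fixed.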
This distinction between observing patterns and estimating the effect of an intervention is central to causal machine learning.
A leading consumer electronics company in the audio equipment space was interested in understanding the impact of introducing limited-edition (LE) colors on overall product sales. Specifically, they wanted to study:
- Incremental uplift from new color introductions
- Cannibalization effects on core products
- Sensitivity of sales to the number of colors offered
The company, with annual revenue between $3B and $5B, was considering a strategy that involved launching multiple limited-edition colors and needed data-driven insights to guide these decisions.
Among other approaches, we experimented with DRLearner, one of the causal models available in EconML, to address these questions. After carefully defining the treatment variable (number of limited-edition colors), the outcome variable (sales), and relevant confounders (such as seasonality, promotions, and product maturity), we conducted a sensitivity analysis to estimate the causal impact of color breadth.
Conditioning on observed confounders and varying only the treatment, we found that introducing more than three limited-edition colors at a time did not lead to additional incremental uplift in sales for certain product categories.
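To make the "doubly robust" idea behind DRLearner more concrete, here is a hand-rolled sketch on simulated data. It is not the client analysis and not EconML's code: the treatment is simplified to a binary launch/no-launch flag, the only confounder is a hypothetical four-level `season` variable, the true effect of 2.0 is chosen arbitrarily, and the propensity and outcome models are simple per-season averages rather than machine learning models.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20000
TRUE_EFFECT = 2.0  # assumed uplift from a launch, for illustration only

# Hypothetical confounder: season affects both launch decisions and sales.
season = rng.integers(0, 4, n)
p_launch = np.array([0.2, 0.4, 0.6, 0.8])[season]    # launches favor high season
treated = rng.random(n) < p_launch
base_sales = np.array([10.0, 12.0, 14.0, 16.0])[season]
sales = base_sales + TRUE_EFFECT * treated + rng.normal(0.0, 1.0, n)

# Naive comparison mixes the launch effect with the season effect.
naive = sales[treated].mean() - sales[~treated].mean()

# Nuisance models, estimated here as simple averages within each season:
#   propensity = P(launch | season), mu1 / mu0 = E[sales | season, launch / no launch]
propensity = np.empty(n)
mu1 = np.empty(n)
mu0 = np.empty(n)
for s in range(4):
    cell = season == s
    propensity[cell] = treated[cell].mean()
    mu1[cell] = sales[cell & treated].mean()
    mu0[cell] = sales[cell & ~treated].mean()

# Doubly robust (AIPW) score: outcome-model contrast plus
# inverse-propensity-weighted corrections of its residuals.
t = treated.astype(float)
dr_scores = (
    mu1 - mu0
    + t * (sales - mu1) / propensity
    - (1.0 - t) * (sales - mu0) / (1.0 - propensity)
)
dr_ate = dr_scores.mean()

print(f"naive difference:       {naive:.2f}")
print(f"doubly robust estimate: {dr_ate:.2f}")
```

In this setup the naive difference lands near 4 because launches cluster in high-sales seasons, while the doubly robust estimate recovers a value near the assumed effect of 2. A multi-level treatment such as "number of colors" would repeat the same contrast for each level against a baseline.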
Where data allowed, we were also able to assess relative color popularity and substitution patterns across products using these causal approaches.
This class of problems comes with important assumptions and limitations. Causal machine learning models require:
- Sufficient data across treatment contrasts
- Adequate overlap in propensity scores between treated and untreated observations
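A quick way to check both requirements before fitting any model is to count observations in every combination of confounder stratum and treatment level. The sketch below is an illustrative helper, not part of EconML: the function names, the toy arrays, and the `min_count` threshold are all assumptions.

```python
import numpy as np


def contrast_counts(strata, treatment, n_strata, n_levels):
    """Count observations in each (confounder stratum, treatment level) cell."""
    counts = np.zeros((n_strata, n_levels), dtype=int)
    np.add.at(counts, (strata, treatment), 1)
    return counts


def sparse_cells(counts, min_count=30):
    """List the (stratum, treatment level) cells with fewer than `min_count` rows."""
    return [(int(s), int(t)) for s, t in np.argwhere(counts < min_count)]


def empirical_propensity(counts):
    """Share of each treatment level within each stratum (rows sum to 1)."""
    return counts / counts.sum(axis=1, keepdims=True)


# Toy data: 2 strata (e.g. low / high season), 3 treatment levels (0, 1, 2 colors).
strata = np.array([0, 0, 0, 1, 1, 1])
treatment = np.array([0, 1, 2, 0, 0, 1])

counts = contrast_counts(strata, treatment, n_strata=2, n_levels=3)
missing = sparse_cells(counts, min_count=1)
shares = empirical_propensity(counts)

print(counts)
print("cells with no data:", missing)
```

Here stratum 1 never receives treatment level 2, so any estimate for that contrast would have to be extrapolated from other strata rather than learned from data.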
In cases where certain contrasts were absent or data was sparse, estimates relied heavily on extrapolation and were therefore less reliable. While DRLearner proved useful in parts of our analysis, it is worth emphasizing that there is no free lunch in causal modeling. Not every business question is best answered using EconML or causal machine learning methods, and forcing these approaches when the data does not support the required assumptions can lead to misleading conclusions.
In some settings, alternative approaches such as carefully designed experiments, structural models, or descriptive and constraint-based analyses may be more appropriate. The key is to match the method to the problem, rather than the other way around.
Interested in discussing this further? Let’s collaborate to unlock new opportunities in marketing attribution. At Acies Global, we work closely with our clients to identify high-ROI use cases, define problems sharply, apply appropriate algorithms and technology stacks, and deliver scalable solutions that support real-world decision-making.