Prescriptive modeling is the pinnacle of analytics value. It doesn’t focus on what happened, or even what will happen – it takes analytics further by telling us what we should do to change what will happen. To harness this extra prescriptive power, however, we must take on an additional assumption: a causal assumption. The naive practitioner may not be aware that moving from predictive to prescriptive comes with the baggage of this lurking assumption. I Googled ‘prescriptive analytics’ and searched the first ten articles for the word ‘causal.’ Not to my surprise (but to my disappointment), I didn’t get a single hit. I loosened the specificity of my word search by trying ‘assumption’ – this one did surprise me, not a single hit either! It’s clear to me that this is an under-taught component of prescriptive modeling. Let’s fix that!
When you use prescriptive modeling, you are making causal bets, whether you know it or not. And from what I’ve seen, this is a terribly under-emphasized point on the topic given its importance.
By the end of this article, you will have a clear understanding of why prescriptive modeling has causal assumptions and how you can identify whether your model/approach meets them. We’ll get there by covering the topics below:
- Brief overview of prescriptive modeling
- Why does prescriptive modeling have a causal assumption?
- How do we know if we have met the causal assumption?
What is Prescriptive Modeling?
Before we get too far, I want to say that this is not an article on prescriptive analytics – there is plenty of information about that elsewhere. This portion will be a quick overview to serve as a refresher for readers who are already at least somewhat familiar with the topic.
There is a widely known hierarchy of three analytics types: (1) descriptive analytics, (2) predictive analytics, and (3) prescriptive analytics.
Descriptive analytics looks at attributes and qualities in the data. It calculates trends, averages, medians, standard deviations, etc. Descriptive analytics doesn’t attempt to say anything more about the data than is empirically observable. Often, descriptive analytics are found in dashboards and reports. The value it provides is in informing the user of the key statistics in the data.
Predictive analytics goes a step beyond descriptive analytics. Instead of summarizing data, predictive analytics finds relationships within the data. It attempts to separate the noise from the signal in these relationships to find underlying, generalizable patterns. From these patterns, it can make predictions on unseen data. It goes further than descriptive analytics because it provides insights on unseen data, rather than just the data that are directly observed.
Prescriptive analytics goes an additional step beyond predictive analytics. Prescriptive analytics uses models created by predictive analytics to recommend smart or optimal actions. Often, prescriptive analytics will run simulations through predictive models and recommend the strategy with the most desirable outcome.
Let’s consider an example to better illustrate the difference between predictive and prescriptive analytics. Imagine you are a data scientist at a company that sells subscriptions to online publications. You have developed a model that predicts the probability that a customer will cancel their subscription in a given month. The model has multiple inputs, including promotions sent to the customer. So far, you’ve only engaged in predictive modeling. One day, you get the bright idea that you should input different discounts into your predictive model, observe the impact of the discounts on customer churn, and recommend the discounts that best balance the cost of the discount with the benefit of increased customer retention. With your shift in focus from prediction to intervention, you have graduated to prescriptive analytics!
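As a minimal sketch of that shift from prediction to intervention, the snippet here sweeps candidate discounts through a stand-in churn model and recommends the one with the highest expected revenue. Everything in it is hypothetical: `predict_churn_probability`, its coefficients, and the subscription price are invented for illustration and are not the model described above.

```python
import math

# Hypothetical stand-in for a trained churn model: monthly churn probability
# as a function of the discount (in percent) offered to the customer.
# The coefficients are made up for illustration only.
def predict_churn_probability(discount_pct: float) -> float:
    logit = -1.0 - 0.08 * discount_pct
    return 1.0 / (1.0 + math.exp(-logit))

MONTHLY_PRICE = 20.0  # assumed subscription price in dollars

def expected_revenue(discount_pct: float) -> float:
    """Expected revenue from one customer next month at a given discount."""
    retained = 1.0 - predict_churn_probability(discount_pct)
    discounted_price = MONTHLY_PRICE * (1.0 - discount_pct / 100.0)
    return retained * discounted_price

# Prescriptive step: simulate each candidate action and recommend the best one.
candidate_discounts = [0, 5, 10, 15, 20, 25, 30]
best_discount = max(candidate_discounts, key=expected_revenue)

for d in candidate_discounts:
    print(f"discount={d:>2}%  churn={predict_churn_probability(d):.3f}  "
          f"expected revenue=${expected_revenue(d):.2f}")
print("recommended discount:", best_discount)
```

The recommendation is only as trustworthy as the belief that the model’s discount coefficient reflects what would actually happen if we changed the discount, which is exactly the causal assumption this article is about.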
Below are examples of possible analyses for the customer churn model for each level of analytics:

Now that we’ve been refreshed on the three types of analytics, let’s get into the causal assumption that is unique to prescriptive analytics.
The Causal Assumption in Prescriptive Analytics
Moving from predictive to prescriptive analytics feels intuitive and natural. You have a model that predicts an important outcome using features, some of which are in your control. It makes sense to then simulate manipulating those features to drive towards a desired outcome. What doesn’t feel intuitive (at least to a junior modeler) is that doing so moves you into a dangerous space if your model hasn’t captured the causal relationships between the target variable and the features you intend to change.
We’ll first show the dangers with a simple example involving a rubber duck, leaves and a pool. We’ll then move on to real-world failures that have come from making causal bets when they weren’t warranted.
Leaves, a pool and a rubber duck
You enjoy spending time outside near your pool. As an astute observer of your environment, you notice that your favorite pool toy – a rubber duck – is usually in the same part of the pool as the leaves that fall from a nearby tree.

Eventually, you decide that it is time to clear the leaves out of the pool. There is a specific corner of the pool that is easiest to access, and you want all of the leaves to be in that area so you can more easily collect and discard them. Given the model you have created – the rubber duck is in the same area as the leaves – you decide that it would be very clever to move the toy to the corner and watch in delight as the leaves follow the duck. Then you will easily scoop them up and proceed with the rest of the day, enjoying your newly cleaned pool.
You make the change and feel like a fool as you stand in the corner of the pool, right over the rubber duck, net in hand, while the leaves stubbornly stay in place. You have made the terrible mistake of using prescriptive analytics when your model doesn’t pass the causal assumption!

Perplexed, you look into the pool again. You notice a slight disturbance in the water coming from the pool jets. You then decide to rethink your predictive modeling approach, using the angle of the jets to predict the location of the leaves instead of the rubber duck. With this new model, you estimate how you need to configure the jets to get the leaves to your favorite corner. You move the jets and this time you are successful! The leaves drift to the corner, you remove them and go on with your day a smarter data scientist!
This is a quirky example, but it does illustrate a few points well. Let me call them out.
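- The rubber duck is a classic spurious predictor. It is correlated with the leaves only because both are pushed around by the pool jets, which play the role of the ‘confounding’ variable here. The duck itself has no impact on the location of the leaves.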
- Both the rubber duck and the pool jet models made accurate predictions – if we simply wanted to know where the leaves were, they could be equivalently good.
- What breaks the rubber duck model has nothing to do with the model itself and everything to do with how you used the model. The causal assumption wasn’t warranted but you moved forward anyway!
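The points above can be made concrete with a toy simulation. This is only a sketch under invented assumptions: the function `simulate_pool`, the 0 to 10 position scale, and the noise levels are all hypothetical. In it, the jets drive both the duck and the leaves, so the duck predicts the leaves well, yet moving the duck by hand does nothing to the leaves.

```python
import random

random.seed(0)

def simulate_pool(n, duck_override=None):
    """Simulate positions along the pool (0 = easy corner, 10 = far end).

    The jets push both the duck and the leaves; the duck never moves leaves.
    If duck_override is given, we intervene and place the duck there by hand.
    """
    rows = []
    for _ in range(n):
        jet_push = random.uniform(0, 10)           # common cause
        leaves = jet_push + random.gauss(0, 0.5)    # driven by the jets only
        duck = jet_push + random.gauss(0, 0.5)      # also driven by the jets
        if duck_override is not None:
            duck = duck_override                    # intervention on the duck
        rows.append((jet_push, duck, leaves))
    return rows

def mean(values):
    return sum(values) / len(values)

def correlation(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Observational view: just watch the pool.
observed = simulate_pool(2000)
jets, ducks, leaves = zip(*observed)
print("corr(duck, leaves):", round(correlation(ducks, leaves), 3))
print("corr(jets, leaves):", round(correlation(jets, leaves), 3))

# Prescriptive view: drag the duck to the corner (position 0) and look again.
moved = simulate_pool(2000, duck_override=0.0)
moved_leaves = [row[2] for row in moved]
print("mean leaf position, observed:  ", round(mean(leaves), 2))
print("mean leaf position, duck moved:", round(mean(moved_leaves), 2))
```

Both correlations come out high, which is why either model predicts well, while the average leaf position is essentially unchanged after the duck is moved.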
I hope you enjoyed the whimsical example – let’s transition to talking about real-world examples.
Shark Tank Pitch
In case you haven’t seen it, Shark Tank is a show where entrepreneurs pitch their business idea to wealthy investors (called ‘sharks’) with the hopes of securing investment money.
I was recently watching a Shark Tank re-run (as one does) – one of the pitches in the episode (Season 10, Episode 15) was for a company called GoalSetter. GoalSetter is a company that allows parents to open ‘mini’ bank accounts in their child’s name that family and friends can make deposits into. The idea is that instead of giving toys or gift cards to children as presents, people can give deposit certificates and children can save up for things (‘goals’) they want to purchase.
I have no qualms with the business idea, but in the presentation, the entrepreneur made this claim:
…kids who have savings accounts in their name are six times more likely to go to college and four times more likely to own stocks by the time they are young adults…
Assuming this statistic is true, this statement, by itself, is all fine and well. We can look at the data and see that there is a relationship between a child having a bank account in their name and going to college and/or investing (descriptive). We could even develop a model that predicts whether a child will go to college or own stocks using a bank account in their name as a predictor (predictive). But this doesn’t tell us anything about causation! The investment pitch has this subtle prescriptive message – “give your kid a GoalSetter account and they will be more likely to go to college and own stocks.” While semantically similar to the quote above, these two statements are worlds apart! One is a statement of statistical fact that relies on no assumptions, and the other is a prescriptive statement that has a huge causal assumption! I hope that confounding variable alarms are ringing in your head right now. It seems much more likely that things like household income, financial literacy of parents and cultural influences would have a relationship with both the probability of opening a bank account in a child’s name and that child going to college. It doesn’t seem likely that giving a random kid a bank account in their name will increase their chances of going to college. This is like moving the duck in the pool and expecting the leaves to follow!
Reading Is Fundamental Program
In the 1960s, there was a government-funded program called ‘Reading Is Fundamental (RIF).’ Part of this program focused on putting books in the homes of low-income children. The goal was to increase literacy in those households. The strategy was partially based on the idea that homes with more books in them had more literate children. You might know where I’m going with this one based on the Shark Tank example we just discussed. Observing that homes with lots of books have more literate children is descriptive. There is nothing wrong with that. But, when you start making recommendations, you step out of descriptive space and leap into the prescriptive world – and as we’ve established, that comes with the causal assumption. Putting books in homes assumes that the books cause the literacy! Research by Susan Neuman found that putting books in homes was not sufficient for increasing literacy without additional resources [1].
Of course, giving books to kids who can’t afford them is a good thing – you don’t need a causal assumption to do good things 😊. But, if you have the specific goal of increasing literacy, you would be well-advised to assess the validity of the causal assumption behind your actions to realize your desired outcomes!
How do we know if we satisfy the causal assumption?
We’ve established that prescriptive modeling requires a causal assumption (so much that you are probably exhausted!). But how can we know if the assumption is met by our model? When thinking about causality and data, I find it helpful to separate my thoughts between experimental and observational data. Let’s go through how we can feel good (or maybe at least ‘okay’) about causal assumptions with these two types of data.
Experimental Data
If you have access to good experimental data for your prescriptive modeling, you are very lucky! Experimental data is the gold standard for establishing causal relationships. The details of why this is the case are out of scope of this article, but I will say that the randomized assignment of treatments in a well-designed experiment deals with confounders, so you don’t have to worry about them ruining your causal assumptions.
We can train predictive models on the output of a good experiment – i.e., good experimental data. In this case, the data-generating process meets causal identification conditions between the target variables and variables that were randomly assigned treatments. I want to emphasize that only variables that are randomly assigned in the experiment will qualify for the causal claim on the basis of the experiment alone. The causal effect of other variables (called covariates) may or may not be correctly captured. For example, imagine that we ran an experiment that randomly provided multiple plants with various levels of nitrogen, phosphorus and potassium and we measured the plant growth. From this experimental data, we created the model below:
plant growth = β₀ + β₁(nitrogen) + β₂(phosphorus) + β₃(potassium) + β₄(sun exposure) + ε
Because nitrogen, phosphorus and potassium were treatments that were randomly assigned in the experiment, we can conclude that betas 1 through 3 estimate a causal relationship with plant growth. Sun exposure was not randomly assigned, which prevents us from claiming a causal relationship by the power of experimental data. This is not to say that a causal claim may not be justified for covariates, but the claim will require additional assumptions that we will cover in the observational data section coming up.
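To illustrate the difference between randomized treatments and a non-randomized covariate, here is a small simulation sketch. All numbers are invented: the ‘true’ effects, the sample size, and an unmeasured `site` factor (say, a warmer bench near a window) that raises both sun exposure and growth are hypothetical assumptions, and the `ols` helper is a bare-bones least-squares routine written for this example rather than a library function.

```python
import random

random.seed(1)

def ols(X, y):
    """Least-squares coefficients via the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Build the augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [v / A[col][col] for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]

# Hypothetical "true" causal effects used to generate the data.
TRUE_EFFECTS = {"nitrogen": 1.5, "phosphorus": 0.8, "potassium": 0.5, "sun": 1.0}

X, y = [], []
for _ in range(3000):
    # Randomized treatments: assigned independently of everything else.
    nitrogen = random.uniform(0, 10)
    phosphorus = random.uniform(0, 10)
    potassium = random.uniform(0, 10)
    # Unmeasured site factor that affects both sun exposure and growth.
    site = random.gauss(0, 1)
    sun = 6 + site + random.gauss(0, 1)   # NOT randomized
    growth = (2.0
              + TRUE_EFFECTS["nitrogen"] * nitrogen
              + TRUE_EFFECTS["phosphorus"] * phosphorus
              + TRUE_EFFECTS["potassium"] * potassium
              + TRUE_EFFECTS["sun"] * sun
              + 2.0 * site
              + random.gauss(0, 1))
    X.append([1.0, nitrogen, phosphorus, potassium, sun])
    y.append(growth)

names = ["intercept", "nitrogen", "phosphorus", "potassium", "sun"]
estimates = dict(zip(names, ols(X, y)))
for name in names[1:]:
    print(f"{name:<10} true={TRUE_EFFECTS[name]:.2f}  estimated={estimates[name]:.2f}")
```

In this setup the three randomized nutrients are recovered closely, while the sun exposure coefficient absorbs part of the unmeasured site effect and overstates the true value.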
I’ve used the qualifier good when talking about experimental data multiple times now. What is a good experiment? I’ll go over two common issues I’ve seen that prevent an experiment from creating good data, but there is a lot more that can go wrong. You should read up on experimental design if you would like to go deeper.
Execution errors: This is one of the most common issues with experiments. A few years ago, I was assigned to a project where an experiment was run, but some data were mixed up regarding which subjects got which treatments – the data was not usable! If there were significant execution errors, you may not be able to draw valid causal conclusions from the experimental data.
Underpowered experiments: This can happen for multiple reasons – for example, there may not be enough signal coming from the treatment, or there may have been too few experimental units. Even with good execution, an underpowered study may fail to uncover real effects, which could prevent you from reaching the causal conclusion required for prescriptive modeling.
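The underpowered case can be sketched with a quick simulation. The setup is hypothetical: a two-arm experiment with a true effect of 0.3 standard deviations, and a simple |z| > 1.96 rule used as a normal approximation to a two-sided 5% test. It estimates how often the real effect is detected with few versus many experimental units.

```python
import random

random.seed(2)

def sample_mean(xs):
    return sum(xs) / len(xs)

def sample_variance(xs):
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def detects_effect(n_per_arm, true_effect):
    """Run one simulated two-arm experiment; return True if |z| > 1.96."""
    control = [random.gauss(0.0, 1.0) for _ in range(n_per_arm)]
    treated = [random.gauss(true_effect, 1.0) for _ in range(n_per_arm)]
    diff = sample_mean(treated) - sample_mean(control)
    se = (sample_variance(treated) / n_per_arm
          + sample_variance(control) / n_per_arm) ** 0.5
    return abs(diff / se) > 1.96  # normal approximation, two-sided 5% test

def estimated_power(n_per_arm, true_effect, n_sims=1000):
    """Share of simulated experiments in which the real effect is detected."""
    hits = sum(detects_effect(n_per_arm, true_effect) for _ in range(n_sims))
    return hits / n_sims

for n in (20, 400):
    print(f"n per arm = {n:>3}: estimated power = {estimated_power(n, 0.3):.2f}")
```

With the small sample, most simulated experiments miss an effect that is genuinely there, which is the sense in which an underpowered study can leave you without the causal conclusion you need.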
Observational Data
Satisfying the causal assumption with observational data is much more difficult, risky and controversial than with experimental data. The randomization that is a key part of creating experimental data is powerful because it removes the problems caused by all confounding variables – known and unknown, observed and unobserved. With observational data, we don’t have access to this extremely useful power.
Theoretically, if we can correctly control for all confounding variables, we can still make causal claims with observational data. While some may disagree with this statement, it is widely accepted in principle. The real challenge lies in the application.
To correctly control for a confounding variable, we need to (1) have high-quality data for the variable and (2) correctly model the relationship between the confounder and our target variable. Doing this for each known confounder is difficult, but it isn’t the worst part. The worst part is that you can never know with certainty that you have accounted for all confounders. Even with strong domain knowledge, the possibility that there is an unknown confounder “out there” remains. The best we can do is include every confounder we can think of and then rely on what is called the ‘no unmeasured confounder’ assumption to estimate causal relationships.
Modeling with observational data can still add a lot of value in prescriptive analytics, even though we can never know with certainty that we accounted for all confounding variables. With observational data, I think of the causal assumption as being met in degrees instead of in a binary fashion. As we account for more confounders, we capture the causal effect better and better. Even if we miss a few confounders, the model may still add value. As long as the confounders don’t have too large of an impact on the estimated causal relationships, we may be able to add more value making decisions with a slightly biased causal model than using the process we had before we used prescriptive modeling (e.g., rules or intuition-based decisions).
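The idea of meeting the causal assumption ‘in degrees’ can be sketched with simulated observational data. Everything here is an invented assumption: an `action` whose true effect on the outcome is 1.0, one strong and one weak confounder, linear relationships throughout, and a bare-bones `ols` helper written for this example. The estimate is computed while controlling for zero, one, and both confounders.

```python
import random

random.seed(3)

def ols(X, y):
    """Least-squares coefficients via the normal equations (X'X) b = X'y."""
    k = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [v / A[col][col] for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]

TRUE_EFFECT = 1.0  # hypothetical causal effect of the action on the outcome

rows = []
for _ in range(5000):
    strong = random.gauss(0, 1)   # confounder with a large influence
    weak = random.gauss(0, 1)     # confounder with a small influence
    # Observational data: the action is NOT randomized; confounders drive it.
    action = 0.8 * strong + 0.3 * weak + random.gauss(0, 1)
    outcome = (TRUE_EFFECT * action + 2.0 * strong + 0.3 * weak
               + random.gauss(0, 1))
    rows.append((action, strong, weak, outcome))

def action_effect(n_controls):
    """Estimated action effect when adjusting for the first n_controls confounders."""
    X = [[1.0, r[0], *r[1:1 + n_controls]] for r in rows]
    y = [r[3] for r in rows]
    return ols(X, y)[1]

naive, partial, full = (action_effect(n) for n in (0, 1, 2))
print(f"true effect:                  {TRUE_EFFECT:.2f}")
print(f"no confounders controlled:    {naive:.2f}")
print(f"strong confounder controlled: {partial:.2f}")
print(f"both confounders controlled:  {full:.2f}")
```

In this toy setup, missing only the weak confounder leaves a small bias, which is the kind of ‘good enough’ model that may still beat rules or intuition. In real data we would not know the true effect, so the size of the remaining bias has to be argued from domain knowledge.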
Having a pragmatic mindset with observational data can be important since (1) observational data is cheaper and much more common than experimental data and (2) if we insist on airtight causal conclusions (which we can’t get with observational data), we may be leaving value on the table by ruling out causal models that are ‘good enough’, though not perfect. You and your business partners have to decide how lenient to be about meeting the causal assumption; a model built on observational data could still add major value!
Wrapping it up
While prescriptive analytics is powerful and has the potential to add a lot of value, it relies on causal assumptions while descriptive and predictive analytics do not. It is important to understand the causal assumption and to meet it as well as possible.
Experimental data is the gold standard for estimating causal relationships. A model built on good experimental data is in a strong position to meet the causal assumptions required by prescriptive modeling.
Establishing causal relationships with observational data can be more difficult because of the possibility of unknown or unobserved confounding variables. We should balance rigor and pragmatism when using observational data for prescriptive modeling – rigor to think of and attempt to control for every confounder possible and pragmatism to understand that while the causal effects may not be perfectly captured, the model may add more value than the current decision-making process.
I hope that this article has helped you gain a better understanding of why prescriptive modeling relies on causal assumptions and how to address meeting those assumptions. Happy modeling!
[1] Neuman, S. B. (2017). Principled Adversaries: Literacy Research for Political Action. Teachers College Record, 119(6), 1–32.