In an earlier post, I described the challenges of estimating direct and indirect effects of a treatment on an outcome. In this post, I’ll consider some ways forward despite these difficulties. I believe that the trick is to slightly change the questions that we are asking. That is, I think that we should distinguish between mediation as a measurement problem (where most of the current work is focused) and mediation as a theoretical problem. I’ll argue that although we cannot precisely estimate direct and indirect effects, we can still collect evidence that supports a proposed causal chain.
Researchers frequently consider a system where a treatment (T) leads to an outcome (Y) in part through a mediator (M). Thus, the mediator can be considered a mechanism through which the treatment influences the outcome.
However, in this post, I’ll consider the more complicated, and more realistic, system where a treatment (T) leads to an outcome (Y) in part through a mediator (M) and possibly through other mediators (M’) and (M”).
I think that considering this more complex system with multiple possible mediators is critical because I cannot think of an interesting situation in the social sciences where we can be sure that there is only one possible mediator; our knowledge of social systems is just not that developed. Even if M’ and M” are not in fact mediators, the possibility that they might be mediators will open up any analysis about just the proposed mediator (M) to critique. Throughout this post, we will see that the possibility of multiple mediators really makes things complicated and difficult. And, by the way, I’ve implicitly assumed here that the mediators are independent. When the mediators affect each other, things get even harder.
When trying to establish a causal chain for one possible mediator, Spencer et al. argue that the design researchers should use depends on two factors: 1) whether it is easy or hard to measure the mediator and 2) whether it is easy or hard to manipulate the mediator. These two dimensions result in the following typology:

| | Measurement of mediator easy | Measurement of mediator hard |
| --- | --- | --- |
| Manipulation of mediator easy | Experimental-causal-chain design | Moderation-of-process design |
| Manipulation of mediator hard | Measurement-of-mediation design | No design |
Most standard lab experiments use a “measurement-of-mediation” design because it is hard to experimentally manipulate the mediator, but it is possible to measure the proposed mediator (as well as other possible mediators) by asking participants survey questions. On the other hand, field experiments and web experiments usually fall into the difficult region where both manipulation and measurement of the mediator are hard. Let’s go through the table in greater detail.
- When manipulation and measurement of the mediator are easy: Experimental-causal-chain design
In an experimental-causal-chain design, researchers actually perform two experiments: one to demonstrate the effect of the treatment on the mediator (T -> M) and a second to demonstrate the effect of the mediator on the outcome (M -> Y). An example of a causal-chain design comes from the work of Dov Eden and colleagues, who have studied the effect of self-fulfilling prophecies on the performance of trainees in the Israel Defense Forces (IDF). In particular, Eden was interested in the relationship between leader expectations and subordinate performance, and he believed that subordinate self-efficacy is a key mediator. That is, Eden et al. believed the model: leader expectation -> subordinate self-efficacy -> subordinate performance. Support for this model came from two different experiments. In one experiment, Davidson and Eden demonstrated that artificially increasing leader expectations resulted in increased subordinate self-efficacy. Further, in another experiment, Eden and Zuck demonstrated that artificially increasing trainee self-efficacy resulted in improved performance. Thus, by showing that leader expectation -> subordinate self-efficacy and that subordinate self-efficacy -> subordinate performance, these two experiments collectively provide support for the overall model.
While obviously appealing, experimental-causal-chain designs have a few important limitations. First, there could be many different mediators, and the proposed two experiments can shed no light on this possibility. Second, it is difficult to design a treatment that manipulates the mediator without also manipulating other related factors that might influence the outcome. For example, imagine if Eden and Zuck had manipulated self-efficacy by telling the trainees that they were exceptionally smart. This manipulation might affect the trainees’ sense of self-efficacy, but it might also increase their commitment to the organization, which could also impact performance. Thus, what might appear to be an effect of self-efficacy could actually be an effect of a related (and possibly unmeasured) factor. When there are multiple mediators, which there almost certainly are, approaches that focus on a single mediator can be misleading.
Further, when there is heterogeneity among participants, counter-intuitive problems can arise. That is, it may be the case that the average treatment effect of leader expectation on subordinate self-efficacy is positive and the average treatment effect of subordinate self-efficacy on subordinate performance is also positive, but the average treatment effect of leader expectation on subordinate performance is zero, or even negative. Failure to anticipate this possibility is quite common and has been called the product fallacy of indirect effects. To understand how this fallacy works, imagine that half of the trainees come from good homes and half come from broken homes. Further, for the trainees from good homes, imagine that leader expectation has no effect on self-efficacy (because they already have high self-efficacy), but that for these trainees other processes that lead to increased self-efficacy would lead to increased performance. On the other hand, for trainees from broken homes, leader expectation might have a positive effect on self-efficacy, but for these trainees, increased self-efficacy has no effect on performance (because they have other barriers to performance). If you were to average the effects for these two groups together, you would find leader expectation -> subordinate self-efficacy and subordinate self-efficacy -> subordinate performance, but you would not find that leader expectation -> subordinate performance. While this particular example may seem a bit contrived, Glynn argues that these problems can be quite common in the social sciences because treatment effect heterogeneity is common.
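To make the arithmetic of this fallacy concrete, here is a small simulation of the two-subgroup story above; all effect sizes are entirely made up for illustration. The treatment raises the mediator on average, manipulating the mediator raises the outcome on average, and yet the treatment has no effect at all on the outcome:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000  # trainees per subgroup; all numbers are hypothetical

group = np.repeat(["good", "broken"], n)

# Experiment 1: randomize the treatment T (leader expectations), then
# measure the mediator M (self-efficacy) and the outcome Y (performance).
T = rng.integers(0, 2, size=2 * n)
# "good homes": self-efficacy is high regardless of T; "broken homes": T raises it.
M = np.where(group == "good", 1.0, 0.5 * T) + rng.normal(0, 0.1, 2 * n)
# "good homes": self-efficacy drives performance; "broken homes": it does not.
Y = np.where(group == "good", 1.0 * M, 0.0) + rng.normal(0, 0.1, 2 * n)

# Experiment 2: randomize the mediator M directly and measure Y.
M2 = rng.integers(0, 2, size=2 * n)
Y2 = np.where(group == "good", 1.0 * M2, 0.0) + rng.normal(0, 0.1, 2 * n)

def ate(x, y):
    """Difference in means of y between the x == 1 and x == 0 arms."""
    return y[x == 1].mean() - y[x == 0].mean()

print(f"T -> M: {ate(T, M):+.3f}")    # positive (about +0.25, driven by one subgroup)
print(f"M -> Y: {ate(M2, Y2):+.3f}")  # positive (about +0.50, driven by the other)
print(f"T -> Y: {ate(T, Y):+.3f}")    # about zero: the two paths never line up
```

The two positive effects are real, but they live in different subgroups, so multiplying the average effects together badly overstates the indirect effect.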
Thus, even when the mediator can be measured and manipulated, which should be the easiest case, it still can be quite complex to demonstrate that a particular mechanism explains an observed effect. Before moving on to cases where the mediator is hard to measure or hard to manipulate (or both), it is worth pointing out that even performing half of a causal-chain design can be helpful. These designs, called mechanism experiments, are particularly useful in cases where researchers are interested in the effect of some policy (P) on some outcome (Y) but are not able to randomly assign the policy, possibly due to cost constraints. Instead, mechanism experiments measure the effect of the mechanism underlying the policy (M) on the outcome (Y). For example, imagine if researchers believed that a policy of inducing grocery stores to locate in “food deserts” (P) would decrease obesity (Y) by making fresh fruit and vegetables widely available (M). Unfortunately, this policy would be difficult to test because it would be expensive to induce grocery stores to move into under-served neighborhoods. Thus, it would be hard to estimate P -> Y. However, imagine instead that researchers provided home delivery of fruits and vegetables to residents in food deserts and then measured the effect of these deliveries on obesity. Thus, rather than trying to estimate P -> Y, researchers can try to estimate M -> Y. If these home deliveries did not decrease obesity, then it seems less plausible that grocery stores could have that effect. This particular example demonstrates one of the main benefits of mechanism experiments: they can be an efficient way to rule out (or nearly rule out) policies that are not likely to be effective. Thus, sometimes there are reasons to pursue part of a causal-chain design, even if one is only interested in the effect of the treatment on the outcome.
- When manipulation of the mediator is easy and measurement of the mediator is hard: Moderation-of-process designs
In some cases it is possible to manipulate the mediator but not measure it (possibly because researchers have no post-treatment contact with participants or because measuring the mediator might influence the behavior of participants). For example, Zanna and Cooper were interested in the role of aversive arousal as the process through which cognitive dissonance leads to attitude change. That is, they were interested in the following process: conflict between attitudes and behaviors -> aversive arousal -> attitude change. To develop evidence to support this causal chain, under the guise of conducting a memory study, the researchers had participants take a pill which they were led to believe would either 1) cause tenseness, 2) cause relaxation, or 3) have no effect on their mood. After participants took the pill, they were assigned to write a counterattitudinal essay, but some participants were assigned to a high-choice condition and some to a low-choice condition, which differed in how much freedom the participants believed they had about the topic of their essay. Previous work on cognitive dissonance had suggested that there is more attitude change in the high-choice condition than the low-choice condition. This 2 (high-choice vs. low-choice) by 3 (pill causes tension, pill causes relaxation, pill has no effect) design leads to three specific predictions:
- standard dissonance effect (i.e., more attitude change under the high-choice condition than the low-choice condition) when participants were told the pill had no effect.
- diminished dissonance effect when participants were told the pill would make them tense.
- enhanced dissonance effect when participants were told that the pill would make them relaxed.
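The predicted pattern can be sketched with a small simulation; the cell means below are invented purely to match the three predictions, not taken from Zanna and Cooper's data:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5_000  # participants per cell; all cell means are invented for illustration

# Hypothetical mean attitude change for the 2 (choice) x 3 (pill label) design.
means = {
    ("no effect", "low"): 0.1, ("no effect", "high"): 0.6,  # standard dissonance effect
    ("tense",     "low"): 0.1, ("tense",     "high"): 0.2,  # diminished effect
    ("relaxed",   "low"): 0.1, ("relaxed",   "high"): 1.0,  # enhanced effect
}
data = {cell: mu + rng.normal(0, 0.5, n) for cell, mu in means.items()}

def dissonance_effect(pill):
    """Attitude change in the high-choice minus the low-choice condition."""
    return data[(pill, "high")].mean() - data[(pill, "low")].mean()

for pill in ["no effect", "tense", "relaxed"]:
    print(f"pill '{pill}': dissonance effect = {dissonance_effect(pill):+.2f}")
```

The key comparison is the ordering of the three dissonance effects (tense < no effect < relaxed), which is what manipulating arousal while holding everything else fixed is predicted to produce.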
The data from the experiment supported each of these predictions. Together, then, the experiment shows that the relationship between attitude-behavior conflict and attitude change is influenced by arousal. That is, when you change arousal, you change the relationship between attitude-behavior conflict and attitude change. However, this experiment also illustrates the potential difficulty of distinguishing mediators and moderators. One could argue that these results show that arousal is a moderator, because the relationship between attitude-behavior conflict and attitude change is determined by the level of some third variable: arousal.
Aside from these questions of interpretation, what makes this particular experiment unusual is that it was possible to manipulate the proposed mediator (arousal) and nothing else through the use of the placebo pill. In many settings, it is not possible to perform a manipulation that is so focused on just one mediator, and so moderation-of-process designs are not common.
- When manipulation of the mediator is hard and measurement of the mediator is easy: Measurement-of-mediation designs
The previous cases we have considered involved experimental manipulation of the mediator, but, unfortunately, such experimental manipulations can be quite difficult, if not impossible. Instead what is far more common is for researchers to randomly deliver a treatment (T) and then measure the outcome (Y) and the proposed mediators (M, M’, M”, etc). Given this measurement-of-mediation design, researchers can perform statistical mediation analysis in which they try to estimate the effect of the treatment (T) on the outcome (Y) that is partially explained by the mediator (M).
By far the most common approach for measurement-of-mediation analysis is the Baron and Kenny procedure, which breaks down into a simple recipe. Baron and Kenny write:
“To test for mediation, one should estimate the three following regression equations: first, regressing the mediator on the independent variable; second, regressing the dependent variable on the independent variable; and third, regressing the dependent variable on both the independent variable and on the mediator.”
In the language that I’ve been using, this means fitting the following regression equations:

M = α₁ + a·T + e₁
Y = α₂ + c·T + e₂
Y = α₃ + c′·T + b·M + e₃
Returning to Baron and Kenny:
“To establish mediation, the following conditions must hold: First, the independent variable must affect the mediator in the first equation; second, the independent variable must be shown to affect the dependent variable in the second equation; and third, the mediator must affect the dependent variable in the third equation. If these conditions all hold in the predicted direction, then the effect of the independent variable on the dependent variable must be less in the third equation than in the second. Perfect mediation holds if the independent variable has no effect when the mediator is controlled.”
In the language of these regression equations (and neglecting any issues of sampling variability), there is support for mediation if the coefficient on T in the first equation (a) is nonzero, the coefficient on T in the second equation (c) is nonzero, the coefficient on M in the third equation (b) is nonzero, and the coefficient on T in the third equation (c′) is smaller in magnitude than c. Perfect mediation corresponds to c′ = 0.
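As a concrete sketch, here are the three regressions fit to simulated data in which the mediator genuinely transmits part of the treatment's effect; all coefficients are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

# Simulated data: M transmits part of T's effect on Y.
# (Coefficients 0.5, 0.3, and 0.6 are invented for this sketch.)
T = rng.integers(0, 2, size=n).astype(float)
M = 0.5 * T + rng.normal(0, 1, n)            # T -> M with coefficient 0.5
Y = 0.3 * T + 0.6 * M + rng.normal(0, 1, n)  # direct effect 0.3, M -> Y effect 0.6

def ols(y, *regressors):
    """OLS with an intercept; returns the slope coefficients (intercept dropped)."""
    X = np.column_stack([np.ones_like(y), *regressors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1:]

(a,) = ols(M, T)           # equation 1: regress the mediator on the treatment
(c,) = ols(Y, T)           # equation 2: regress the outcome on the treatment
c_prime, b = ols(Y, T, M)  # equation 3: regress the outcome on both

print(f"a = {a:.2f}, c = {c:.2f}, c' = {c_prime:.2f}, b = {b:.2f}")
# Baron and Kenny's conditions: a != 0, c != 0, b != 0, and |c'| < |c|.
```

In this simulation the total effect c (about 0.6) shrinks to the direct effect c′ (about 0.3) once M is controlled, which is exactly the pattern the Baron and Kenny recipe treats as evidence of mediation. But note that this pattern is guaranteed here only because the data were simulated from the assumed model; with real data, the M -> Y piece remains observational.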
It turns out that there are many problems with the automatic application of this recipe. Many of these problems stem from the fact that even though there was an experiment, estimates of the effect of the mediator (M) on the outcome (Y) are non-experimental because the mediator (M) was not manipulated. Thus, statistical mediation analysis can be thought of as an observational study within an experiment. And, as we know, making causal claims from observational studies is tricky. Of course, that is not to say that statistical mediation analysis is not helpful; we can learn a lot from observational studies. But, the formulaic application of statistical mediation analysis will not lead to strong evidence of mediation. There is a large literature on problems with and improvements to the Baron and Kenny procedure. Here are three articles that I think are particularly helpful:
Bullock et al. (2010) “Yes, but what is the mechanism? (Don’t expect an easy answer).” Journal of Personality and Social Psychology.
Fiedler et al. (2011) “What mediation analysis can (not) do.” Journal of Experimental Social Psychology.
Zhao et al. (2010) “Reconsidering Baron and Kenny: Myths and Truths about Mediation Analysis.” Journal of Consumer Research.
- When manipulation and measurement of the mediator are hard: No design
The majority of this post has focused on situations where it is possible to either manipulate or measure the mediator (or both). Unfortunately, for many field and web experiments neither is possible. Typically, it is difficult to measure the mediators because this information usually comes from surveys, and we usually cannot get participants in field and web experiments to complete these surveys. For example, in Pager’s field experiment on discrimination in the low-wage labor market, it was not possible to ask employers to fill out a survey about each candidate they considered. In field and web experiments, it is also difficult to manipulate the mediator and only the mediator. For example, in the earlier described experiment by Zanna and Cooper about cognitive dissonance, the researchers used a placebo pill with different labels to manipulate tension among their participants. Unfortunately, it is hard to see how something so precise could be employed in a field setting.
So, what can be done? I see two main options.
1) Combine field and lab experiments: Even though it is typically not possible to manipulate or measure the mediators in a field experiment, manipulation or measurement may be possible in a lab experiment. Therefore, a combination of lab and field experiments can be compelling. A wonderful example of this combination is the research by Correll et al. on the motherhood penalty, which combined a field experiment about hiring with a lab experiment that enabled a statistical mediation analysis. Each study individually had weaknesses: the field experiment offers no evidence about the mechanisms that lead to the motherhood penalty, and the lab experiment might have problems with external validity. Together, however, the two experiments definitely increase our understanding of the motherhood penalty.
2) Implicit mediation analysis: Another possibility when researchers cannot manipulate or measure mediators is to measure the effects of many related treatments (T, T’, T”), an approach that Gerber and Green call implicit mediation analysis. This approach is particularly useful when the treatment can be broken down into many distinct components that can be tried individually or in combination. An excellent example of this approach comes from efforts to understand the mechanisms behind the success of conditional cash transfer programs. These programs offer cash transfers to poor families in exchange for meeting certain conditions, such as regular school attendance. Several evaluation studies have found that conditional cash transfers have positive effects on children’s outcomes. But are the improved outcomes because of the cash or because of the conditions (e.g., the requirement that the children attend school)? In other words, what is the dominant mechanism that explains the success of conditional cash transfer programs? To partially address this question, Baird et al. ran an experiment with three conditions: conditional cash transfers, unconditional cash transfers, and no intervention. They found that those who received conditional cash transfers had better schooling outcomes than those who received unconditional cash transfers. Thus, they found that the conditional aspect of the program was important to its success. Implicit mediation analysis seems most useful when the treatment can easily be reduced to its constituent components, and it is not clear to me how many treatments have this characteristic.
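The logic of the three-arm comparison can be sketched with a small simulation; the effect sizes here are hypothetical, not Baird et al.'s estimates:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 30_000  # participants per arm; all effect sizes are hypothetical

# Three randomized arms, following the structure of the Baird et al. design:
# control, unconditional transfer (cash only), conditional transfer (cash + condition).
effects = {"control": 0.0, "unconditional": 0.2, "conditional": 0.5}
outcomes = {arm: mu + rng.normal(0, 1, n) for arm, mu in effects.items()}

# Decompose the full program's effect into its two components.
cash_effect = outcomes["unconditional"].mean() - outcomes["control"].mean()
condition_effect = outcomes["conditional"].mean() - outcomes["unconditional"].mean()

print(f"effect of cash alone:          {cash_effect:+.2f}")
print(f"added effect of the condition: {condition_effect:+.2f}")
```

Note that nothing here measures a mediator directly; the mechanism is probed implicitly, by comparing treatments that differ in exactly one component.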
To conclude, mediation analysis is important but difficult. In web experiments it is very unlikely that researchers can precisely manipulate the mediator of interest, and it is unlikely that they will be able to measure the mediators of interest (which often need to be collected in a survey). Therefore, I think the most likely ways forward are to combine web experiments with lab experiments and to use implicit mediation analysis, in which multiple slightly different treatments are deployed. Neither approach precisely enables the estimation of direct and indirect effects, but carefully designed approaches can advance our understanding of the pathways through which observed effects occur.