An economist studying the returns to education runs a wage regression and finds that an extra year of schooling raises hourly earnings by about three dollars. The number is precise, significant, and easy to report. But it hides a question the regression as written cannot answer: does that extra year pay the same for everyone? Perhaps the return is larger for workers in skilled occupations and smaller for those in routine jobs, or larger early in a career and flatter later on. A single coefficient forces the model to assume that the effect of education is identical across every group and every context, when the entire interest of the question may lie in how that effect changes. Regression with interaction terms relaxes this assumption, allowing the effect of one variable to depend on the value of another.
The idea is one of the most useful extensions of ordinary regression, and also one of the most misread. An interaction term is simple to add, just the product of two variables, but the coefficients it produces do not mean what they appear to mean if read with the habits of a standard regression. The main effects change interpretation, the constant shifts in role, and a coefficient that looks insignificant may be carrying a real effect that the model has relocated elsewhere. Getting the mechanics right is the difference between a model that reveals how context shapes an effect and one that quietly misleads.
What a Model Without Interaction Assumes
Start with a standard specification. Suppose wages depend on years of education and on whether a worker is in a skilled occupation, entered as a dummy variable equal to one for skilled and zero otherwise.
Additive Model, No Interaction
\[ \text{wage} = \beta_0 + \beta_1 \, \text{educ} + \beta_2 \, \text{skilled} + u \]
This model is more flexible than it first appears, but only in one direction. It allows skilled and unskilled workers to earn different wages, through \(\beta_2\), but it forces the slope on education to be the same for both. Graphically, it produces two parallel lines: one for skilled workers and one for unskilled workers, separated by a constant vertical distance of \(\beta_2\), both rising with the same slope \(\beta_1\). The model can say that one group earns more, but it cannot say that education pays off differently for one group than the other. That restriction is an assumption, not a finding, and it is imposed silently the moment the interaction is left out.
This is the same logic that runs through any multiple regression model. Each coefficient measures the effect of its variable holding the others fixed, and in the additive form those effects do not bend or combine. The model adds influences together. An interaction term is what lets them multiply.
Adding the Interaction Term
To let the return to education differ by occupation, include the product of education and the skilled dummy as an additional regressor.
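Model With Interaction
\[ \text{wage} = \beta_0 + \beta_1 \, \text{educ} + \beta_2 \, \text{skilled} + \beta_3 \, (\text{educ} \times \text{skilled}) + u \]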
The cleanest way to understand what each coefficient now measures is to write out the model separately for each group. For unskilled workers the dummy is zero, so the interaction term vanishes and the equation collapses to a simple line. For skilled workers the dummy is one, so both the level term and the interaction term switch on.
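Group-Specific Lines
\[ \text{Unskilled } (\text{skilled} = 0): \quad \text{wage} = \beta_0 + \beta_1 \, \text{educ} + u \]
\[ \text{Skilled } (\text{skilled} = 1): \quad \text{wage} = (\beta_0 + \beta_2) + (\beta_1 + \beta_3) \, \text{educ} + u \]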
Now the interpretation is exact. For unskilled workers the return to a year of education is \(\beta_1\). For skilled workers it is \(\beta_1 + \beta_3\). The interaction coefficient \(\beta_3\) is the difference between the two returns, the amount by which the education slope steepens or flattens when a worker is skilled. If \(\beta_3\) is positive, education pays more for skilled workers. If it is negative, it pays less. If it is zero, the two slopes are equal and the interaction was unnecessary.
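The group-by-group reading above can be checked numerically. The following is a minimal simulation sketch in Python with NumPy, not an analysis of real data: the sample size, the coefficients used to generate wages, and the noise level are all illustrative assumptions. It fits the pooled regression with the product term and compares \(\beta_1\) and \(\beta_1 + \beta_3\) with the slopes from two separate within-group regressions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2_000

# Hypothetical data-generating process (all numbers are illustrative assumptions).
educ = rng.uniform(8, 20, size=n)        # years of education
skilled = rng.integers(0, 2, size=n)     # 1 = skilled occupation, 0 = otherwise
noise = rng.normal(0, 2.0, size=n)
wage = 8.0 + 2.0 * educ - 3.0 * skilled + 1.5 * educ * skilled + noise

# Pooled regression with the interaction: columns are [1, educ, skilled, educ*skilled].
X = np.column_stack([np.ones(n), educ, skilled, educ * skilled])
b0, b1, b2, b3 = np.linalg.lstsq(X, wage, rcond=None)[0]

def group_slope(mask):
    """Slope from a separate simple regression of wage on educ within one group."""
    Xg = np.column_stack([np.ones(mask.sum()), educ[mask]])
    return np.linalg.lstsq(Xg, wage[mask], rcond=None)[0][1]

slope_unskilled = group_slope(skilled == 0)
slope_skilled = group_slope(skilled == 1)

print(f"beta_1           = {b1:.3f}  vs unskilled-only slope = {slope_unskilled:.3f}")
print(f"beta_1 + beta_3  = {b1 + b3:.3f}  vs skilled-only slope   = {slope_skilled:.3f}")
```

With a dummy that shifts both the intercept and the slope, the pooled model reproduces the two separate group regressions, which is why \(\beta_1\) and \(\beta_1 + \beta_3\) line up with the within-group slopes.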
Reading the Coefficients Correctly
The most common mistake with interaction terms is to read the main-effect coefficients the way they would be read in an additive model. Once an interaction is present, the coefficient on education, \(\beta_1\), is no longer the average return to education across everyone. It is the return to education specifically when the interacting variable equals zero, that is, for unskilled workers only. Likewise \(\beta_2\) is no longer a constant wage gap; it is the gap between skilled and unskilled workers specifically when education equals zero, which may be far outside the range of the data and therefore close to meaningless on its own.
| Coefficient | Naive reading (wrong) | Correct reading with interaction |
|---|---|---|
| \(\beta_0\) (constant) | Baseline wage | Wage when educ = 0 and skilled = 0 |
| \(\beta_1\) (education) | Average return to education | Return to education for unskilled workers only |
| \(\beta_2\) (skilled) | Wage gap between groups | Skilled-unskilled gap when educ = 0 |
| \(\beta_3\) (interaction) | Often ignored | Difference in the education slope between groups |
| Skilled return | Not recoverable from one coefficient | \(\beta_1 + \beta_3\), summed from two terms |
The practical consequence is that the significance of an interaction model cannot be judged from the main effects alone. A coefficient on education might appear small or insignificant simply because the return for the baseline group happens to be modest, while the return for the other group, captured by \(\beta_1 + \beta_3\), is large and important. Conversely, a researcher who drops the interaction because \(\beta_3\) looks small should test it formally rather than eyeball it, since the interaction’s contribution is about the difference in slopes, not about either slope on its own.
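The formal test mentioned above can be written out explicitly. As a sketch under the usual OLS assumptions, equality of the two slopes is tested with a t statistic on the interaction coefficient, while inference on the second group's return needs the covariance between the two estimates, because that return is a sum of coefficients:

\[ H_0: \beta_3 = 0, \qquad t = \frac{\hat{\beta}_3}{\operatorname{se}(\hat{\beta}_3)} \]
\[ \operatorname{se}(\hat{\beta}_1 + \hat{\beta}_3) = \sqrt{\operatorname{Var}(\hat{\beta}_1) + \operatorname{Var}(\hat{\beta}_3) + 2 \operatorname{Cov}(\hat{\beta}_1, \hat{\beta}_3)} \]

The covariance term does not appear in a standard coefficient table, so the standard error of the summed return cannot be read off from the two reported standard errors; it has to come from the estimated covariance matrix of the coefficients.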
Warning. Never drop the main effect of a variable while keeping its interaction term. A model that keeps the interaction but omits the skilled dummy forces the two group lines to share an intercept, so they meet at zero years of education; one that omits the education term instead forces the return for unskilled workers to be exactly zero. Either restriction is a constraint the data did not justify, and it distorts the other coefficients in the model. If the interaction is in, both main effects belong in too.
A Worked Example: Two Returns From One Regression
Consider a stylized example with internally consistent numbers. The regression of hourly wage on education, the skilled dummy, and their interaction produces the estimates below. Reading them correctly turns four coefficients into two complete wage equations.
| Term | Coefficient | Std. error | Interpretation |
|---|---|---|---|
| Constant \(\beta_0\) | 7.50 | (0.80) | Wage at zero education, unskilled |
| Education \(\beta_1\) | 2.10*** | (0.18) | Return per year, unskilled workers |
| Skilled \(\beta_2\) | -3.20 | (2.40) | Gap at zero education (outside data range) |
| Education × Skilled \(\beta_3\) | 1.40*** | (0.22) | Extra return per year for skilled workers |
| Skilled return \(\beta_1 + \beta_3\) | 3.50 | — | Return per year, skilled workers |
The story is now far richer than a single return to education. For unskilled workers each year of schooling adds 2.10 dollars to the hourly wage. For skilled workers each year adds 3.50 dollars, the baseline return plus the 1.40 interaction premium. The interaction coefficient is the one that carries the headline finding: education pays meaningfully more in skilled occupations, and the gap widens with every additional year of schooling. Note also the coefficient on the skilled dummy, which is negative and statistically insignificant. Read naively, that would suggest skilled workers earn less, which is absurd. Read correctly, it is the projected gap at zero years of education, an extrapolation far below any real worker in the sample, and it should not be interpreted in isolation at all.
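The arithmetic behind this reading can be reproduced in a few lines of Python. The sketch below uses the four coefficients from the table; the education levels of 12 and 16 years are illustrative choices that do not come from the table, and the helper names are hypothetical.

```python
# Coefficients copied from the worked-example table.
b0, b1, b2, b3 = 7.50, 2.10, -3.20, 1.40

def predicted_wage(educ, skilled):
    """Predicted hourly wage from the interaction model (skilled is 0 or 1)."""
    return b0 + b1 * educ + b2 * skilled + b3 * educ * skilled

def skill_gap(educ):
    """Skilled-minus-unskilled wage gap at a given level of education."""
    return b2 + b3 * educ

# Group-specific intercepts and slopes implied by the four coefficients.
unskilled_line = (b0, b1)            # (intercept, slope)
skilled_line = (b0 + b2, b1 + b3)    # (intercept, slope)

print(f"Unskilled: wage = {unskilled_line[0]:.2f} + {unskilled_line[1]:.2f} * educ")
print(f"Skilled:   wage = {skilled_line[0]:.2f} + {skilled_line[1]:.2f} * educ")

# Illustrative education levels (assumed, not taken from the table).
for years in (12, 16):
    print(f"educ={years}: unskilled={predicted_wage(years, 0):.2f}, "
          f"skilled={predicted_wage(years, 1):.2f}, gap={skill_gap(years):.2f}")

# Education level at which the two fitted lines cross (gap equals zero).
crossover = -b2 / b3
print(f"Lines cross at about {crossover:.2f} years of education")
```

The fitted lines cross at a little over two years of schooling, a level the text describes as far below any worker in the sample. Above that point the skilled line sits higher and pulls away by 1.40 dollars per additional year, which is how a negative dummy coefficient coexists with higher skilled wages everywhere in the relevant range.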
Interactions Between Two Continuous Variables
The education-by-occupation case interacts a continuous variable with a binary one, which is the easiest to visualize because it produces two clean lines. Interactions between two continuous variables work the same way algebraically but require a different reading. Suppose wage depends on education and experience, with their product included.
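Two Continuous Variables
\[ \text{wage} = \beta_0 + \beta_1 \, \text{educ} + \beta_2 \, \text{exper} + \beta_3 \, (\text{educ} \times \text{exper}) + u \]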
Here there are no longer two fixed groups, so the effect of education is best expressed as a derivative. Taking the partial derivative of wage with respect to education gives \(\beta_1 + \beta_3 \, \text{exper}\), which means the return to education is not a single number but a function of how much experience a worker has. This connects interactions directly to the logic of marginal analysis: the interaction term makes the marginal effect itself a moving quantity rather than a constant. To report such a model meaningfully, economists evaluate the marginal effect at representative values of the conditioning variable, for example the return to education at low, median, and high levels of experience, rather than quoting a single coefficient that no longer summarizes the relationship on its own.
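Evaluating the marginal effect at representative values is mechanical once the coefficients are in hand. The short Python sketch below illustrates the idea; the coefficient values and the low, median, and high experience levels are hypothetical numbers chosen for illustration, not estimates from this article.

```python
# Hypothetical coefficients for wage = b0 + b1*educ + b2*exper + b3*educ*exper.
# These numbers are illustrative assumptions, not estimates from the article.
b1 = 1.80    # coefficient on education
b3 = 0.04    # coefficient on education x experience

def marginal_effect_of_educ(exper):
    """Partial derivative of wage with respect to education at a given experience."""
    return b1 + b3 * exper

# Representative experience levels in years (assumed: low, median, high).
representative_exper = {"low": 2, "median": 10, "high": 25}

for label, exper in representative_exper.items():
    effect = marginal_effect_of_educ(exper)
    print(f"{label:>6} experience ({exper:>2} yrs): return to education = {effect:.2f}")
```

In this setup the coefficient on education by itself is only the return for a worker with zero years of experience, so reporting it alone would describe one point on the function rather than the function.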
When an Interaction Earns Its Place
Interactions are powerful, but they are not free. Each one adds a parameter, consumes degrees of freedom, and complicates interpretation, and a model stuffed with interactions can fit the sample beautifully while generalizing poorly. The decision to include an interaction should be driven by a substantive reason to expect that an effect depends on context, not by a search for any product term that happens to be significant. Testing many interactions and keeping only the ones that clear a threshold is a route to false discoveries, the same hazard that arises whenever a specification is chosen by fishing through the data.
There is also a connection to specification error worth keeping in view. Omitting a genuine interaction is a form of misspecification: the model imposes a constant effect where the true effect varies, and the estimated single slope becomes a blurred average of two different relationships. This is related to the broader problem of leaving out variables that belong in the model, discussed in the treatment of omitted variable bias, although an omitted interaction biases the description of an effect’s structure rather than necessarily biasing the average effect itself. The discipline is the same in both cases: the form of the model should match the economic question, and a constant-effect assumption should be a tested choice rather than an unexamined default.
Caveat. A significant interaction describes how an effect varies; it does not by itself establish that the variation is causal. If the conditioning variable is correlated with unobserved factors, the interaction can reflect those factors rather than a genuine change in the underlying effect. Interactions sharpen description, but they inherit whatever identification problems the underlying regression already has.
Where Interaction Terms Sit in the Toolkit
Interaction terms are the standard way to move a regression from “what is the effect of X” to “for whom, and under what conditions, is the effect of X larger or smaller.” That shift matters across applied economics, from the returns to education differing by background, to the effect of a price change differing by income, to the impact of a policy differing across regions. The mechanics rest on the foundations of fitting and interpreting a line, covered in the treatment of simple linear regression and extended to several variables in multiple regression.
The same product-term logic carries into nonlinear models, where it must be handled with extra care. In a logit or probit model, the coefficient on an interaction term does not even equal the interaction effect on the predicted probability, because the nonlinear link function makes the marginal effects depend on all the other variables at once. The broad lesson is consistent across these settings: an interaction is not a decorative addition to a regression but a statement that an effect is conditional, and once that statement is in the model, every coefficient has to be read in light of it.
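The point about nonlinear models can be seen in a small numerical sketch. The Python code below assumes a logit model with hypothetical coefficients in which the product-term coefficient is deliberately set to zero, then computes the cross-partial derivative of the predicted probability with respect to the two variables, both analytically and by finite differences. All names and values are illustrative assumptions.

```python
import math

# Hypothetical logit coefficients; b3 (the product term) is deliberately zero.
b0, b1, b2, b3 = -1.0, 0.8, 0.5, 0.0

def prob(x1, x2):
    """Predicted probability from a logit model with a product term."""
    index = b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2
    return 1.0 / (1.0 + math.exp(-index))

def cross_partial(x1, x2):
    """Analytic cross-partial derivative of the probability w.r.t. x1 and x2."""
    p = prob(x1, x2)
    d1 = p * (1 - p)                  # first derivative of the logistic function
    d2 = p * (1 - p) * (1 - 2 * p)    # second derivative of the logistic function
    return b3 * d1 + (b1 + b3 * x2) * (b2 + b3 * x1) * d2

def cross_partial_numeric(x1, x2, h=1e-4):
    """Central finite-difference check of the same cross-partial."""
    return (prob(x1 + h, x2 + h) - prob(x1 + h, x2 - h)
            - prob(x1 - h, x2 + h) + prob(x1 - h, x2 - h)) / (4 * h * h)

for point in [(0.0, 0.0), (1.0, 1.0), (3.0, 2.0)]:
    analytic = cross_partial(*point)
    numeric = cross_partial_numeric(*point)
    print(f"x1={point[0]}, x2={point[1]}: analytic={analytic:.4f}, numeric={numeric:.4f}")
```

Even with a zero coefficient on the product term, the effect of one variable on the probability still changes with the other, and its sign depends on where the point sits on the logistic curve. That is why the interaction effect in such models has to be computed on the probability scale rather than read from the coefficient.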
Conclusion
The value of a regression with interaction terms is that it replaces a single, context-free effect with one that can bend according to circumstance. By adding the product of two variables, the model allows the slope on one variable to depend on the value of another, turning a pair of parallel lines into lines that diverge or converge, and turning a constant marginal effect into a function of the conditioning variable. The wage example makes the payoff concrete: one regression delivers a return to education of 2.10 dollars for unskilled workers and 3.50 dollars for skilled workers, a distinction that a model without the interaction would have erased into a single blended number.
The cost of this flexibility is interpretive discipline. With an interaction present, a main-effect coefficient measures the effect of its variable only when the interacting variable is zero, the constant describes a point that may lie outside the data, and the effect of interest must often be assembled by summing coefficients or evaluating a derivative at chosen values. Both main effects must stay in the model whenever their interaction does, and an interaction should be included because theory expects an effect to vary, not because a product term happened to be significant. Handled with that care, interaction terms are among the most direct ways a regression can answer the question that single coefficients cannot: not just how large an effect is, but how it depends on context.
Frequently Asked Questions
What is an interaction term in regression?
An interaction term is the product of two explanatory variables added to a regression so that the effect of one variable is allowed to depend on the value of the other. Instead of forcing a single, constant effect, the model lets the slope on one variable change as the other variable changes. In a wage regression, an education-by-occupation interaction lets the return to education differ between skilled and unskilled workers.
How do you interpret an interaction coefficient?
The interaction coefficient measures the difference in the slope of one variable across values of the other. For a continuous-by-binary interaction, it is how much steeper or flatter the slope becomes for the group coded one. The full effect for that group is the main-effect coefficient plus the interaction coefficient. A positive interaction means the effect is larger in that group; a negative one means it is smaller.
Why does the main effect change meaning when an interaction is added?
Because the interaction term redistributes the relationship. With an interaction present, the coefficient on a variable is its effect specifically when the interacting variable equals zero, not its average effect across the whole sample. The constant likewise describes the predicted outcome when all interacting variables are zero, which can be a point far outside the observed data and therefore not meaningful on its own.
Should you keep the main effect if you include its interaction?
Yes. Both main effects should remain in the model whenever their interaction is included. Dropping the group dummy while keeping the interaction forces the group lines to share an intercept, so they meet where the continuous variable equals zero; dropping the continuous variable's main effect forces the slope for the baseline group to be zero. Either way the model imposes a restriction the data did not support and distorts the other coefficients. Removing a main effect is only appropriate when there is a strong theoretical reason to constrain the relationship that way, which is rare.
When should you include an interaction term?
Include an interaction when there is a substantive reason to expect that an effect depends on context, for example when theory or prior evidence suggests a policy works differently across groups or a price responds differently across income levels. Interactions should not be added by searching for any product term that turns out significant, since testing many interactions and keeping the lucky ones produces false discoveries. The reason to include one should come before the regression, not after it.