Difference in Differences and Beyond

Publication Date 2021/01/31
Pages/Weight/Size 188 × 257 × 20 mm
ISBN 9781635190267
Categories Economics & Business > Economics
Description
Preface

Difference in differences (DD) is one of the most popular approaches in economics and other disciplines of social science. An article (November 26, 2016) in The Economist, entitled “Economists are prone to fads, and the latest is machine learning”, analyzed the most frequently used techniques in economics. The analysis was based on key words in the abstracts of NBER working papers, and the most popular methods turned out to be DD, followed by regression discontinuity (RD), laboratory experiment, dynamic stochastic general equilibrium, randomized control trial, and machine-learning/big-data. According to the article, DD has been at the top since 2012, and its popularity has been increasing ever since, unlike some other methods such as dynamic stochastic general equilibrium whose popularity has been declining. DD has been gaining popularity not just in social science but also in natural science disciplines, as can be seen in Jena et al. (2015), Cataife and Pagano (2017), Uber et al. (2018) and McGrath et al. (2019), among many others.
There are various existing references for DD: Angrist and Krueger (1999), Shadish et al. (2002), Lee (2005, 2016a), Angrist and Pischke (2009), Morgan and Winship (2014), among many others. Most references are, however, either a little too long or too old. This short book introduces DD to readers equipped with basic graduate-level econometric knowledge and some exposure to panel data, and then examines recent advances in DD from a personal perspective. This book is an extended version of Lee and Sawada (2020), which in turn draws on Lee (2005, 2016a).
More specifically, first, details on DD identification and estimation using panel data and repeated cross-sections are provided for various DD cases such as constant/varying treatment effect and constant/varying treatment timing. Following these basics, topics such as ‘DD in reverse’, ‘fuzzy DD’, ‘synthetic control’, and ‘triple and generalized differences’ are examined. Throughout this book, many empirical examples appear, and long examples carry an explicit heading whereas short ones are buried in various parts without heading. There are parts with * attached, which are optional. It would be a good idea to skip those at the first reading because they are relatively more involved, and read them later when better motivated to learn DD. For readers in need of a quick refresher on treatment effect analysis, the appendix explains the basics and reviews various treatment effect estimators in cross-section context.
Difference in differences is often abbreviated as ‘DiD’ in the literature, but we use instead the simpler notation ‘DD’. Also, ‘difference in differences in differences’ is often abbreviated as ‘DiDiD’, but we use again a simpler notation ‘triple differences (TD)’. TD is a generalization of DD because it is a difference of two DD’s, where the difference can be taken between two groups (‘cross-section group-wise TD’) or between two time periods (‘time-wise TD’). There are other ways of generalizing DD, all of which may be called ‘generalized DD’.
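As a minimal illustration of the statement that TD is a difference of two DDs, the sketch below computes DD from four group-by-period mean responses and TD from two such DDs. The function names `dd` and `td` and all numbers are hypothetical, made up for this illustration only; they are not taken from the book, whose own notation may differ.

```python
def dd(treated_before, treated_after, control_before, control_after):
    """Difference in differences from four group-by-period mean responses."""
    return (treated_after - treated_before) - (control_after - control_before)


def td(dd_first, dd_second):
    """Triple differences: the difference of two DDs."""
    return dd_first - dd_second


# Hypothetical mean responses (illustrative values, not real data).
dd_a = dd(treated_before=10.0, treated_after=15.0,
          control_before=9.0, control_after=11.0)   # (15 - 10) - (11 - 9)
dd_b = dd(treated_before=8.0, treated_after=10.0,
          control_before=7.0, control_after=8.0)    # (10 - 8) - (8 - 7)

td_ab = td(dd_a, dd_b)
print(dd_a, dd_b, td_ab)
```

In this sketch the two DDs may come either from two groups (cross-section group-wise TD) or from two time periods (time-wise TD); the arithmetic is the same in both cases.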
One topic in DD that is not addressed in this book is the ‘DD inference issues’ involving ‘clustering/grouping’, where observations are related to one another by sharing the subject/individual index i (i.e., belonging to the same subject/individual), the time index t (i.e., belonging to the same period), or something else such as age or residential area. A treatment varying only at an aggregate level, not at the individual level, raises yet another DD inference problem. We do not address these inferential problems to keep this book’s technical difficulty at a reasonable level. Interested readers may refer to Bertrand et al. (2004), Lee (2016a), Brewer et al. (2018) and references therein, where the main message for DD inference seems to be “use at least panel generalized least squares estimator with a clustered variance estimator to account for serial correlations (and others).”
It is not too far-fetched to say that the main use of DD is in policy analysis, where finding effects of a policy/program/treatment on a response/outcome variable is the goal; e.g., effects of classroom size on test scores, and effects of minimum wage on employment. Although machine learning and big data analysis are gaining popularity these days, they are essentially for prediction and association, whereas policy analysis is for causal effects. The examples below illustrate well the difference between prediction and causal analysis, which gives a good reason to study causal analysis in the era of big data, and DD will always have its place in policy analysis, if nowhere else.
Suppose we use big data on individual physical attributes such as height, weight, waist-to-hip ratio, body symmetry, etc. to predict income, using ordinary least squares estimator (OLS) or one of the sophisticated machine-learning techniques; this kind of study has been done in ‘beauty economics’. The OLS result gives a prediction equation for income using those physical attributes. This is, however, not a causal analysis, because one may alter his/her body to maximize income according to the OLS result, but that would not make him/her rich. A big belly may be an outcome, not a cause, of having much money, and artificially making a belly big would not make the person rich.
High levels of HDL cholesterol are associated with low risks of cardiovascular diseases. However, randomized studies (e.g., Schwarz et al. 2012) showed that increasing only HDL with a medication does not change cardiovascular disease occurrences. This means that the association is not causal. Probably there is something else changing along with HDL cholesterol level, which affects cardiovascular disease occurrences.
The best prediction or association equation obtained using machine learning and big data should not be mistaken for something that reveals causal effects. Differently from prediction or association, if we find the causal effect of waist-to-hip ratio on marriage to be negative, then by reducing the waist-to-hip ratio, one can increase the probability of marriage. This is what ‘structural-form’ causal analysis can do, but ‘reduced-form’ prediction analysis cannot. Econometricians and statisticians may lose many jobs to artificial intelligence and machine learning in the future, but causal analysis will remain a “human, not humanoid” subject.
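The gap between prediction and causal effect in the big-belly example can be sketched with a small simulation. The data-generating process below is an assumption made purely for illustration (income causes belly size, never the reverse; all coefficients are invented), and the helper `ols_slope` is a hypothetical name, not something from the book.

```python
import random

random.seed(0)
n = 5000

# Assumed (hypothetical) structural model: income causes belly size.
income = [random.gauss(50.0, 10.0) for _ in range(n)]
belly = [0.5 * y + random.gauss(0.0, 2.0) for y in income]


def ols_slope(x, y):
    """Slope of the simple OLS regression of y on x (with intercept)."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Prediction: belly size predicts income well (clearly positive slope).
slope = ols_slope(belly, income)

# Intervention: enlarge every belly by 10 units. In the assumed model,
# income is generated without reference to belly size, so it stays the same.
belly_enlarged = [b + 10.0 for b in belly]
income_after = list(income)
effect = sum(income_after) / n - sum(income) / n

print(f"OLS slope of income on belly: {slope:.2f}")
print(f"Mean income change after enlarging bellies: {effect:.2f}")
```

Under these assumptions the regression slope is positive, so belly size is useful for predicting income, yet the intervention leaves income unchanged. A prediction equation alone cannot tell these two situations apart.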
This book and its source, Lee and Sawada (2020), grew out of the lectures that I have been giving in many institutes. I am grateful to the lecture audiences for their feedback at Asian Development Bank, Australian National University, Korea Development Institute, Korea Institute of Public Finance, Seoul National University, and University of Luxembourg, which led to greatly improving the book manuscript. Also, Wonjun Choi, Hyerim Kim, Goeun Lee and Sanghee Mun proofread the book manuscript and provided helpful comments. This research has been supported by a grant (NRF-2014S1A5B1014360) from the National Research Foundation of Korea.
Finally, a “warning” on using the index is warranted. Constructing an index is never a straightforward business. For example, ‘effect on the treated’ can make a single entry of its own, or may become a sub-entry ‘on the treated’ below the primary entry ‘effect’. Going further, ‘conditional effect on the treated’ may make a single entry of its own, or may become a sub-sub-entry below the sub-entry ‘the treated’ and the primary entry ‘effect’; it can also be a sub-entry ‘conditional’ below the primary entry ‘effect on the treated’. It would be nice to be coherent on this matter throughout the entire book, but this is not easy, as things look different at different times at different locations. So, when looking up some topic in the index, the advice is to try different options as illustrated just now.
Contents
Chapter 1: Introduction

1. Basics and Three Examples
1.1. Example 1: Plastic Surgery on Beauty
1.2. Example 2: Three Strike Law on Crime Rate
1.3. Example 3: Influx of Cheap Labor on Unemployment
2. Notation
2.1. Time Periods and Treatment
2.2. Potential Responses, Covariates and Others
3. From BA to DD and to TD
3.1. Before-After (BA)
3.2. Difference in Differences (DD)
3.3. Triple Differences (TD)
3.4. Explicitly Controlling Covariates*

Chapter 2: DD with Panel Data

1. Identification with Panel Data
1.1. Identification Basics
1.2. Remarks on Identification*
2. Graphical Demonstrations
2.1. Constant Effect and Constant Timing
2.2. Varying Effect and Constant Timing
3. Estimation with Panel Linear Models
3.1. Two Waves Only
3.2. Empirical Example: Highway Development Program
3.3. More than Two Waves
3.3.1 Level and Differenced Models
3.3.2 Time-Wise DD
3.4. Empirical Example: Tayo Bus Story
3.4.1 Tayo Bus Policy, Parallel Trends and Main Findings
3.4.2 Panel OLS for Tayo Bus Ridership*
3.4.3 Empirical Results for Tayo Bus Effects*
3.5. Treatment Endogeneity Allowed in DD*
4. Generalizations of Panel Linear Models
4.1. Duration-Dependent Effect and One-Shot Treatment
4.2. Treatment Dose/Intensity
5. DD versus Lagged Response Regressor
5.1. Neither is More General
5.2. Lagged Response and Qualification Dummy Relation
5.3. Common Factor and False Mean Reversion
6. No Obvious Qualification Variable*
6.1. Artificial Grouping and Time-Varying Effect
6.2. Estimation Details
6.3. DD to Remove Time and Individual Effects

Chapter 3: DD with Repeated Cross-Sections

1. Random Sampling Every Period
2. Identification with RCS
2.1. Covariates Implicit
2.2. Covariates Explicit*
2.3. Identification Viewed from Linear Model*
3. Estimation with RCS
3.1. Single Equation Derivation with RCS
3.1.1 Constant Effect and Constant Timing
3.1.2 Duration-Varying Effect and Varying Timing*
3.2. Empirical Examples: Health-Book and EITC
3.3. Remarks
3.4. Empirical Example: Ability Mixing
3.4.1 Background and Data
3.4.2 Empirical Results for Ability Mixing
3.4.3 Causality Problem in Quantile Regression
4. DD with Cross-Section Data
4.1. Interaction Effects in Cross-Section Data
4.2. Double-Score Regression Discontinuity (RD)*
4.2.1 Partial Effects
4.2.2 Four Potential Responses
5. DD with Limited Dependent Variables (LDV)
5.1. Effect for Latent Variable and Effect for LDV
5.2. Binary and Count Responses
5.2.1 DD with Probit
5.2.2 Ratio in Ratios (RR) for Counts
5.2.3 Odds-Ratio in Odds-Ratios (RRR) for Logit*
5.3. Empirical Example: Screen Door to Prevent Suicide
6. Fuzzy DD

Chapter 4: Topics for DD

1. DD in Reverse (DDR)
1.1. DDR Basics
1.2. DDR Identification
1.3. DDR Estimation with OLS
1.4. Empirical Example: Work Hour Reduction Law
2. Synthetic Control
2.1. Main Idea
2.2. Inference with Permutation Test
2.3. Applications and Remarks
3. Triple Differences (TD)
3.1. TD Basics
3.2. TD with Panel Data
3.2.1 TD Identification with Panel Data
3.2.2 TD Estimation with Panel Linear Model
3.2.3 Empirical Example: Tax-inclusive Price on Demand
3.3. TD with RCS
3.3.1 TD Identification with RCS
3.3.2 TD Estimation with RCS
3.3.3 Empirical Example: Mandated Benefit on Wage
4. Generalized Difference in Differences (GDD)
4.1. Motivation for GDD and Beyond
4.2. OLS for DD, GDD and QD
4.3. Empirical Example: Sulfa Drug on Mortality
4.3.1 Sulfa Drug Background
4.3.2 Empirical Results for Sulfa Drug Data
5. Panel Stayer DD for Time-Varying Qualification
5.1. Motivations
5.2. Effect on In-Stayer
5.3. Identification and Estimation with Panel Linear Model
5.3.1 Untreated Moving Effect versus Treatment Effects
5.3.2 Ashenfelter Dip and Path-Dependent Moving Effect*
5.4. Empirical Example: Pension Effect on Expenditure

Appendix for Treatment Effect Basics and Chapters

1. Reduced Form (RF) and Propensity Score (PS)
2. Treatment Effect Basics and Confounding
2.1. Overt Bias and Hidden Bias
2.2. Overt and Hidden Biases in Linear Models
2.3. Instrumental Residual Estimator*
3. Various Effects and Simpson’s Paradox
3.1. Effect on Treated
3.2. Effect on Untreated and Effect on Population
3.3. Simpson’s Paradox
4. Covariates to Control
5. Treatment Effect Estimators
5.1. Propensity Score Matching (PSM) Estimators
5.2. Regression Adjustment/Imputation Estimators
5.3. Propensity Score Residual Estimator (PSR)
5.4. Inverse Probability Weighting Estimators (IPW)
6. Work-Hour Reduction Law and Randomization
7. Endogeneity, IVE and Complier Effect
7.1. Linear Projection for Non-Causal Model*
7.2. Causal Model, Endogeneity, IVE and Wald Estimator
7.3. Wald Estimator for Effect on Complier*
7.4. Coherence Checks for Confounder Detection
8. Appendix for Chapter 2
8.1. Constant Effect and Varying Timing Simulation
8.2. Varying Effect and Varying Timing Simulation*
8.3. Details on Panel OLS for Differenced Model*
8.4. Trend, Seasonality and Effect Estimation with Panel Data
9. Appendix for Chapter 3
9.1. Matching Estimation for Marginal Effect with RCS*
9.2. Trend, Seasonality and Effect Estimation with RCS
9.3. Two versus Four Potential Responses*
10. Appendix for Chapter 4
10.1. TD Estimation with Differenced Panel Linear Model*
10.2. TD Identification with RCS Linear Model

References
Author
이명재
Professor Myoung-jae Lee is an econometrician and statistician at Korea University. He received his Ph.D. in economics from University of Wisconsin-Madison in 1989. Since then, he has held regular positions in various universities around the world, including Pennsylvania State University, Tilburg University, Singapore Management University, Chinese University of Hong Kong, and Australian National University. He has published more than 80 papers on economics, statistics, political science, sociology, transportation research, and medical science. His papers have appeared in many top-rated journals such as Econometrica, Journal of the Royal Statistical Society (Series B), Biometrika, Transportation Research (Part B), Political Analysis, and Sociological Methods & Research. Myoung-jae Lee has also published five single-authored books from Springer, Academic Press and Oxford University Press, including Micro-econometrics for policy, program, and treatment effects (2005), Micro-Econometrics (2010), and Matching, regression discontinuity, difference in differences, and beyond (2016).