% MatchIt/vignettes/summaryref.tex

\section{\texttt{summary()}: Numerical Summaries of Balance}

The \texttt{summary()} command returns numerical summaries of balance
diagnostics.

\subsubsection{Syntax}
\begin{verbatim}
summary(object, interactions = FALSE, addlvariables = NULL,
        standardize = FALSE, ...)
\end{verbatim}

\subsubsection{Arguments}
\begin{itemize}
\item \texttt{object}: the output from {\tt matchit()}.
\item \texttt{interactions}: an option to calculate summary statistics in
  \texttt{sum.all} and \texttt{sum.matched} for all covariates, their
  squares, and their two-way interactions when \texttt{interactions = TRUE},
  and for the covariates themselves only when \texttt{interactions = FALSE}
  (DEFAULT = {\tt FALSE}).
\item \texttt{addlvariables}: a data frame of additional variables on which
  to calculate the balance diagnostics, in addition to the variables included
  in the matching procedure (DEFAULT = {\tt NULL}). The data frame must
  contain the same units, in the same order, as the data set used for
  {\tt matchit()}.
\item \texttt{standardize}: a logical value indicating whether to standardize
  the balance measures, i.e., whether each difference in means should be
  divided by the standard deviation in the original treated group (DEFAULT =
  {\tt FALSE}).
\end{itemize}

\subsubsection{Output Values}

The output from the \texttt{summary()} command includes the following
elements, when applicable:
\begin{itemize}
\item The original assignment model call.
\item \texttt{sum.all}: a data frame with variable names (and interactions)
  as the row names and summary statistics on \emph{all observations} in the
  columns.
The columns in \texttt{sum.all} contain:
%\footnote{The output for full matching is
%  slightly different from that described here; see Section
%  \ref{subsubsec:full} for details.}
\begin{itemize}
\item means of all covariates $X$ for treated and control units, where
  \texttt{Means Treated} $= \mu_{X|T=1} = \frac{1}{n_1} \sum_{T=1} X_i$ and
  \texttt{Means Control} $= \mu_{X|T=0} = \frac{1}{n_0} \sum_{T=0} X_i$,
\item the standard deviation in the control group for all covariates $X$,
  where applicable,
  $$s_{x|T=0} = \sqrt{\frac{\sum_{i \in \{i: T_i=0\}} (X_i - \mu_{X|T=0})^2}{n_0-1}},$$
  and
\item balance statistics for the original data (before matching), which
  compare the treated and control covariate distributions. If {\tt
  standardize = FALSE}, balance measures will be presented on the original
  scale. Specifically, mean differences (\texttt{Mean Diff.}) as well as the
  median, mean, and maximum value of the differences in empirical quantile
  functions for each covariate will be given (\texttt{eQQ Med},
  \texttt{eQQ Mean}, and \texttt{eQQ Max}, respectively). If {\tt standardize
  = TRUE}, the balance measures will be standardized. Standardized mean
  differences (\texttt{Std.\ Mean Diff.}), defined as
  $\frac{\mu_{X|T=1} - \mu_{X|T=0}}{s_{x|T=1}}$, as well as the median, mean,
  and maximum value of the differences in empirical cumulative distribution
  functions for each covariate will be given (\texttt{eCDF Med},
  \texttt{eCDF Mean}, and \texttt{eCDF Max}, respectively).
\end{itemize}
\item \texttt{sum.matched}: a data frame that contains variable names as the
  row names and summary statistics on only the \emph{matched observations} in
  the columns. Specifically, the columns in \texttt{sum.matched} contain the
  following elements:
%\footnote{The values output for full matching are
%  slightly different from those described here; see Section
%  \ref{subsubsec:full} for details.}
\begin{itemize}
\item weighted means of all covariates $X$ (and their interactions) for
  matched treated and matched control units, where \texttt{Means Treated}
  $= \mu_{wX|T=1} = \frac{1}{n_1} \sum_{T=1} w_iX_i$ and
  \texttt{Means Control} $= \mu_{wX|T=0} = \frac{1}{n_0} \sum_{T=0} w_iX_i$,
\item weighted standard deviations in the matched control group for all
  covariates $X$, where applicable, with \texttt{SD}
  $= s_{wX} = \sqrt{\frac{1}{n} \sum_{i} (w_iX_i - \overline{X}^*)^2}$, where
  $\overline{X}^*$ is the weighted mean of $X$ in the matched control group,
  and
\item balance statistics for the matched data (after matching), which compare
  the treated and control covariate distributions. If {\tt standardize =
  FALSE}, balance measures will be presented on the original scale.
  Specifically, mean differences (\texttt{Mean Diff.}) as well as the median,
  mean, and maximum value of the differences in empirical quantile functions
  for each covariate will be given (\texttt{eQQ Med}, \texttt{eQQ Mean}, and
  \texttt{eQQ Max}, respectively). If {\tt standardize = TRUE}, the balance
  measures will be standardized. Standardized mean differences
  (\texttt{Std.\ Mean Diff.}), defined as
  $\frac{\mu_{wX|T=1} - \mu_{wX|T=0}}{s_{x|T=1}}$, as well as the median,
  mean, and maximum value of the differences in empirical cumulative
  distribution functions for each covariate will be given
  (\texttt{eCDF Med}, \texttt{eCDF Mean}, and \texttt{eCDF Max},
  respectively).
\end{itemize}
Throughout, $w$ represents the vector of \texttt{weights}.
\item \texttt{reduction}: the percent reduction achieved in each of the
  balance measures in \texttt{sum.all} and \texttt{sum.matched}, defined as
  $100(|a|-|b|)/|a|$, where $a$ is the value of the balance measure before
  matching and $b$ is its value after matching.
\item \texttt{nn}: the sample sizes in the full and matched samples and the
  number of discarded units, by treatment and control.
\item \texttt{q.table}: an array that contains the same information as
  \texttt{sum.matched} by subclass.
\item \texttt{qn}: the sample sizes in the full and matched samples and the
  number of discarded units, by subclass and by treatment and control.
\item \texttt{match.matrix}: the same object as contained in the output of
  {\tt matchit()}.
\end{itemize}

%%% Local Variables:
%%% mode: latex
%%% TeX-master: "matchit"
%%% End:

% MatchIt/vignettes/preprocess.tex

\section{Preprocessing via Matching}
\label{sec:matching}

\subsection{Quick Overview}

The main command \texttt{matchit()} implements the matching procedures. A
general syntax is:
\begin{verbatim}
> m.out <- matchit(treat ~ x1 + x2, data = mydata)
\end{verbatim}
where {\tt treat} is the dichotomous treatment variable, and {\tt x1} and
{\tt x2} are pre-treatment covariates, all of which are contained in the data
frame {\tt mydata}. The dependent variable (or variables) may be included in
\texttt{mydata} for convenience but is never used by \MatchIt\ or included in
the formula. This command creates the \MatchIt\ object called \texttt{m.out}.
Name the output object to see a quick summary of the results:
\begin{verbatim}
> m.out
\end{verbatim}

\subsection{Examples}

To run any of the examples below, you must first load the library and the
data:
\begin{verbatim}
> library(MatchIt)
> data(lalonde)
\end{verbatim}
Our example data set is a subset of the data from the job training program
analyzed in \citet{lalonde86} and \citet{DehWah99}. \MatchIt\ includes a
subsample of the original data consisting of the National Supported Work
Demonstration (NSW) treated group and the comparison sample from the Current
Population Survey (CPS).\footnote{This data set, \texttt{lalonde}, was
created using NSWRE74$\_$TREATED.TXT and CPS3$\_$CONTROLS.TXT from
http://www.columbia.edu/$\sim$rd247/nswdata.} The variables in this data set
include participation in the job training program (\texttt{treat}, which is
equal to 1 if the unit participated in the program, and 0 otherwise), age
({\tt age}), years of education ({\tt educ}), race (\texttt{black}, which is
equal to 1 if black, and 0 otherwise; \texttt{hispan}, which is equal to 1 if
Hispanic, and 0 otherwise), marital status (\texttt{married}, which is equal
to 1 if married, 0 otherwise), high school degree (\texttt{nodegree}, which
is equal to 1 if the unit has no degree, 0 otherwise), 1974 real earnings
(\texttt{re74}), 1975 real earnings (\texttt{re75}), and the main outcome
variable, 1978 real earnings (\texttt{re78}).

\subsubsection{Exact Matching}
\label{subsubsec:exact}

The simplest version of matching is exact. This technique matches \emph{each}
treated unit to \emph{all} possible control units with exactly the same
values on all the covariates, forming subclasses such that within each
subclass all units (treatment and control) have the same covariate values.
Exact matching is implemented in \MatchIt\ using \texttt{method = "exact"}.
Exact matching will be done on all covariates included on the right-hand side
of the \texttt{formula} specified in the \MatchIt\ call. There are no
additional options for exact matching. (Exact restrictions on a subset of
covariates can also be specified in nearest neighbor matching; see
Section~\ref{subsubsec:nearest}.) The following example can be run by typing
{\tt demo(exact)} at the R prompt,
\begin{verbatim}
> m.out <- matchit(treat ~ educ + black + hispan, data = lalonde,
                   method = "exact")
\end{verbatim}

\subsubsection{Subclassification}
\label{subsubsec:subclass}

When there are many covariates (or some covariates can take a large number of
values), finding sufficient exact matches will often be impossible. The goal
of subclassification is to form subclasses such that within each subclass the
distribution (rather than the exact values) of the covariates for the treated
and control groups is as similar as possible. Various subclassification
schemes exist, including one based on a scalar distance measure such as the
propensity score estimated using the \texttt{distance} option (see
Section~\ref{subsubsec:inputs-all}). Subclassification is implemented in
\MatchIt\ using \texttt{method = "subclass"}. The following example script
can be run by typing {\tt demo(subclass)} at the R prompt,
\begin{verbatim}
> m.out <- matchit(treat ~ re74 + re75 + educ + black + hispan + age,
                   data = lalonde, method = "subclass")
\end{verbatim}
The above syntax forms 6 subclasses (the default number), based on a distance
measure (the propensity score) estimated using logistic regression. By
default, each subclass will have approximately the same number of treated
units.

Subclassification may also be used in conjunction with nearest neighbor
matching, described below, by leaving the default of \texttt{method =
"nearest"} but adding the option \texttt{subclass}. When you choose this
option, \MatchIt\ selects matches using nearest neighbor matching, but after
the nearest neighbor matches are chosen it places them into subclasses and
adds a variable to the output object indicating subclass membership.

\subsubsection{Nearest Neighbor Matching}
\label{subsubsec:nearest}

Nearest neighbor matching selects the $r$ (default = 1) best control matches
for each individual in the treatment group (excluding those discarded using
the \texttt{discard} option). Matching is done using a distance measure
specified by the {\tt distance} option (default = logit). Matches are chosen
for each treated unit one at a time, in the order specified by the
\texttt{m.order} option (default = largest to smallest). At each matching
step we choose the control unit that is not yet matched but is closest to the
treated unit on the distance measure. Nearest neighbor matching is
implemented in \MatchIt\ using the \texttt{method = "nearest"} option. The
following example script can be run by typing {\tt demo(nearest)}:
\begin{verbatim}
> m.out <- matchit(treat ~ re74 + re75 + educ + black + hispan + age,
                   data = lalonde, method = "nearest")
\end{verbatim}

\subsubsection{Optimal Matching}
\label{subsubsec:optimal}

The default nearest neighbor matching method in \MatchIt\ is ``greedy''
matching, where the closest control match for each treated unit is chosen one
at a time, without trying to minimize a global distance measure. In contrast,
``optimal'' matching finds the matched samples with the smallest average
absolute distance across all the matched pairs.
\citet{GuRos93} find that greedy and optimal matching approaches generally
choose the same sets of controls for the overall matched samples, but optimal
matching does a better job of minimizing the distance within each pair. In
addition, optimal matching can be helpful when there are not many appropriate
control matches for the treated units. Optimal matching is performed with
\MatchIt\ by setting \texttt{method = "optimal"}, which automatically loads
an add-on package called \texttt{optmatch} \citep{Hansen04}. The following
example can also be run by typing {\tt demo(optimal)} at the R prompt. We
conduct 2:1 optimal ratio matching based on the propensity score from the
logistic regression.
\begin{verbatim}
> m.out <- matchit(treat ~ re74 + re75 + age + educ, data = lalonde,
                   method = "optimal", ratio = 2)
\end{verbatim}

\subsubsection{Full Matching}
\label{subsubsec:full}

Full matching is a particular type of subclassification that forms the
subclasses in an optimal way \citep{Rosenbaum02, Hansen04}. A fully matched
sample is composed of matched sets, where each matched set contains one
treated unit and one or more controls (or one control unit and one or more
treated units). As with subclassification, the only units not placed into a
subclass will be those discarded (if a \texttt{discard} option is specified)
because they are outside the range of common support. Full matching is
optimal in terms of minimizing a weighted average of the estimated distance
measure between each treated subject and each control subject within each
subclass. Full matching can be performed with \MatchIt\ by setting
\texttt{method = "full"}. Just as with optimal matching, we use the
\texttt{optmatch} package \citep{Hansen04}, which is loaded automatically
when needed. The following example with full matching (using the default
propensity score based on logistic regression) can also be run by typing
{\tt demo(full)} at the R prompt:
\begin{verbatim}
> m.out <- matchit(treat ~ age + educ + black + hispan + married +
                   nodegree + re74 + re75, data = lalonde, method = "full")
\end{verbatim}

\subsubsection{Genetic Matching}
\label{subsub:genetic}

Genetic matching automates the process of finding a good matching solution
\citep{DiaSek05}. The idea is to use a genetic search algorithm to find a set
of weights for each covariate such that a version of optimal balance is
achieved after matching. As currently implemented, matching is done with
replacement using the matching method of \citet{AbaImb07}, and balance is
determined by two univariate tests, paired t-tests for dichotomous variables
and a Kolmogorov-Smirnov test for multinomial and continuous variables, but
these options can be changed. Genetic matching can be performed with
\MatchIt\ by setting \texttt{method = "genetic"}, which automatically loads
the \texttt{Matching} package \citep{Sekhon04}.
The following example of genetic matching (using the estimated propensity
score based on logistic regression as one of the covariates) can also be run
by typing {\tt demo(genetic)}:
\begin{verbatim}
> m.out <- matchit(treat ~ age + educ + black + hispan + married +
                   nodegree + re74 + re75, data = lalonde,
                   method = "genetic")
\end{verbatim}

\subsubsection{Coarsened Exact Matching}
\label{subsub:cem}

Coarsened Exact Matching (CEM) is a Monotonic Imbalance Bounding (MIB)
matching method --- which means that the balance between the treated and
control groups is chosen by the user ex ante rather than discovered through
the usual laborious process of checking after the fact and repeatedly
reestimating, and that adjusting the imbalance on one variable has no effect
on the maximum imbalance of any other. CEM also strictly bounds, through ex
ante user choice, both the degree of model dependence and the average
treatment effect estimation error; eliminates the need for a separate
procedure to restrict data to common empirical support; meets the congruence
principle; is robust to measurement error; works well with multiple
imputation methods for missing data; and is extremely fast computationally,
even with very large data sets. CEM also works well for multicategory
treatments, determining blocks in experimental designs, and evaluating
extreme counterfactuals \citep{IacKinPor08}. CEM can be performed with
\MatchIt\ by setting \texttt{method = "cem"}, which automatically loads the
\texttt{cem} package. The following example of CEM (with automatic
coarsening) can also be run by typing \texttt{demo(cem)}:
\begin{verbatim}
m.out <- matchit(treat ~ age + educ + black + hispan + married +
                 nodegree + re74 + re75, data = lalonde, method = "cem")
\end{verbatim}

%%% Local Variables:
%%% mode: latex
%%% TeX-master: "matchit"
%%% End:

% MatchIt/vignettes/plotref.tex

\section{\texttt{plot()}: Graphical Summaries of Balance}

\subsection{Plot options for the matchit object}

The \texttt{plot()} command allows you to check the distributions of the
propensity score and of the covariates in the assignment model (as well as
their squares and interactions), within each subclass if specified.

\subsubsection{Syntax}
\begin{verbatim}
> plot(m.out, discrete.cutoff = 5, type = "QQ", numdraws = 5000,
       interactive = TRUE, which.xs = NULL, ...)
\end{verbatim}

\subsubsection{Arguments}
\begin{itemize}
\item {\tt type}: type of output graph. \texttt{type = "QQ"} (default)
  outputs empirical quantile-quantile plots of each covariate to check
  balance of marginal distributions. Alternatively, \texttt{type = "jitter"}
  outputs jitter plots of the propensity score for treated and control units.
  Finally, \texttt{type = "hist"} outputs histograms of the propensity score
  in the original treated and control groups and weighted histograms of the
  propensity score in the matched treated and control groups.
\item {\tt discrete.cutoff}: for quantile-quantile plots, discrete covariates
  that take 5 or fewer values are jittered for visibility. This may be
  changed by setting this argument to any other positive integer.
\item {\tt interactive}: if \texttt{TRUE} (default), users can identify
  individual units by clicking on the graph with the left mouse button, and
  (when applicable) choose subclasses to plot.
\item {\tt which.xs}: for quantile-quantile plots, specifies particular
  covariate names in a character vector to plot only a subset of the
  covariates.
\item {\tt subclass}: if \texttt{interactive = FALSE}, users can specify
  which subclass to plot.
\end{itemize}

\subsubsection{Output Values}
\begin{itemize}
\item Empirical quantile-quantile plot: this graph plots covariate values
  that fall in (approximately) the same quantile of the treated and control
  distributions. Control unit quantile values are plotted on the x-axis, and
  treated unit quantile values are plotted on the y-axis. If values fall
  below the 45 degree line, control units generally take lower values of the
  covariate. Data points that fall exactly on the 45 degree line indicate
  that the marginal distributions are identical.
\item Jitter plots: this graph plots jittered estimated propensity scores of
  treated and control units. Dark diamonds indicate matched units and grey
  diamonds indicate unmatched or discarded units. The area of each diamond is
  proportional to its weight. Vertical lines are plotted if subclassification
  is used.
\item Histograms: this graph plots histograms of the estimated propensity
  scores in the original treated and control groups and weighted histograms
  of the estimated propensity scores in the matched treated and control
  groups. Plots can be compared vertically to quickly check the balance
  before and after matching.
\end{itemize}

\subsection{Plot options for the matchit summary object}

You can also send a matchit summary object to the \texttt{plot()} command to
obtain a summary of the balance on each covariate before and after matching.
The \texttt{summary()} object must have been created using the option
\texttt{standardize = TRUE}. The idea for this plot came from the ``twang''
package by McCaffrey, Ridgeway, and Morral.

\subsubsection{Syntax}
\begin{verbatim}
> s.out <- summary(object, standardize = TRUE, ...)
> plot(s.out, ...)
\end{verbatim}

\subsubsection{Arguments}
\begin{itemize}
\item {\tt interactive}: if \texttt{TRUE} (default), users can identify
  individual variables by clicking on the graph with the left mouse button.
\end{itemize}

\subsubsection{Output Values}
\begin{itemize}
\item Line plot of the standardized differences in means before and after
  matching. The numbers plotted are those output by the \texttt{summary()}
  command in the \texttt{sum.all} and \texttt{sum.matched} objects.
\end{itemize}

%%% Local Variables:
%%% mode: latex
%%% TeX-master: "matchit"
%%% End:

% MatchIt/vignettes/overview.tex

\MatchIt\ is designed for causal inference with a dichotomous treatment
variable and a set of pretreatment control variables. Any number or type of
dependent variables can be used. (If you are interested in the causal effect
of more than one variable in your data set, run \MatchIt\ separately for each
one; it is unlikely in any event that any one parametric model will produce
valid causal inferences for more than one treatment variable at a time.)
\MatchIt\ can be used for other types of causal variables by dichotomizing
them, perhaps in multiple ways \citep[see also][]{ImaDyk04}. \MatchIt\ works
for experimental data, but is usually used for observational studies where
the treatment variable is not randomly assigned by the investigator, or where
the random assignment goes awry.

We adopt the same notation as in \citet*{HoImaKin07}. Unless otherwise noted,
let $i$ index the $n$ units in the data set, $n_1$ denote the number of
treated units, $n_0$ denote the number of control units (such that
$n=n_0+n_1$), and $x_i$ indicate a vector of pretreatment (or control)
variables for unit $i$.
Let $t_i=1$ when unit $i$ is assigned treatment, and $t_i=0$ when unit $i$ is
assigned control. (The labels ``treatment'' and ``control'' and the values 1
and 0, respectively, are arbitrary and can be switched for convenience,
except that some methods of matching are keyed to the definition of the
treated group.) Denote $y_i(1)$ as the potential outcome of unit $i$ under
treatment --- the value the outcome variable would take if $t_i$ were equal
to 1, whether or not $t_i$ in fact is 0 or 1 --- and $y_i(0)$ as the
potential outcome of unit $i$ under control --- the value the outcome
variable would take if $t_i$ were equal to 0, regardless of its value in
fact. The variables $y_i(1)$ and $y_i(0)$ are jointly unobservable, and for
each $i$ we observe only $y_i=t_iy_i(1)+(1-t_i)y_i(0)$, never the other
potential outcome. Also denote a fixed vector of exogenous, pretreatment
measured confounders as $X_i$. These variables are defined in the hope or
under the assumption that conditioning on them appropriately will make the
treatment assignment ignorable. Measures of balance should be computed with
respect to all of $X$, even if some methods of matching only use some
components.

\section{Preprocessing via Matching}

If $t_i$ and $X_i$ were independent, we would not need to control for $X_i$,
and any parametric analysis would effectively reduce to a difference in means
of $Y$ for the treated and control groups. The goal of matching is to
preprocess the data prior to the parametric analysis so that the actual
relationship between $t_i$ and $X_i$ is eliminated or reduced, without
introducing bias or increasing inefficiency too much. When matching we
select, duplicate, or selectively drop observations from our data, and we do
so without inducing bias as long as we use a rule that is a function only of
$t_i$ and $X_i$ and does not depend on the outcome variable $Y_i$. Many
methods that offer this preprocessing are included here, including exact,
subclassification, nearest neighbor, optimal, and genetic matching. For many
of these methods the propensity score --- defined as the probability of
receiving the treatment given the covariates --- is a key tool.

In order to avoid changing the quantity of interest, most \MatchIt\ routines
work by retaining all treated units and selecting (or weighting) control
units to include in the final data set; this enables one to estimate the
average treatment effect on the treated (the purpose of which is described in
Section \ref{s:qoi}). \MatchIt\ implements and evaluates the choice of the
rules for matching. Matching sometimes increases efficiency by eliminating
heterogeneity or deleting observations outside of an area where a model can
reasonably be used to extrapolate, but one needs to be careful not to lose
too many observations in matching, or efficiency will drop more than the
reduction in bias that is achieved.

The simplest way to obtain good matches (as defined above) is to use
one-to-one exact matching, which pairs each treated unit with one control
unit for which the values of $X_i$ are identical. However, with many
covariates and finite numbers of potential matches, sufficient exact matches
often cannot be found. Indeed, many of the other methods implemented in
\MatchIt\ attempt to balance the overall covariate distributions as much as
possible when sufficient one-to-one exact matches are not available.
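As a minimal sketch of this preprocessing workflow (using the
\texttt{lalonde} data shipped with \MatchIt; the covariates chosen here are
illustrative only):
\begin{verbatim}
> library(MatchIt)
> data(lalonde)
> # preprocess: match on pretreatment covariates only; the outcome
> # variable re78 plays no role at this stage
> m.out <- matchit(treat ~ age + educ + re74 + re75, data = lalonde)
> m.data <- match.data(m.out)  # matched data for the analysis stage
\end{verbatim}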
A key point in \citet*{HoImaKin07} is that matching methods by themselves are not methods of estimation: Every use of matching in the literature involves an analysis step following the matching procedure, but almost all analyses use a simple difference in means. This procedure is appropriate only if exact matching was conducted. In almost all other cases, some adjustment is required, and there is no reason to degrade your inferences by using an inferior method of analysis such as a difference in means even when improving your inferences via preprocessing. Thus, with \MatchIt, you can improve your analyses in two ways. \MatchIt\ analyses are ``doubly robust'' in that if \emph{either} the matching analysis \emph{or} the analysis model is correct (but not necessarily both) your inferences will be statistically consistent. In practice, the modeling choices you make at the analysis stage will be much less consequential if you match first. \section{Checking Balance} \label{sec:balance-sum} The goal of matching is to create a data set that looks closer to one that would result from a perfectly blocked (and possibly randomized) experiment. When we get close, we break the link between the treatment variable and the pretreatment controls, which makes the parametric form of the analysis model less relevant or irrelevant entirely. To break this link, we need the distribution of covariates to be the same within the matched treated and control groups. A crucial part of any matching procedure is, therefore, to assess how close the (empirical) covariate distributions are in the two groups, which is known as ``balance.'' Because the outcome variable is not used in the matching procedure, any number of matching methods can be tried and evaluated, and the one matching procedure that leads to the best balance can be chosen. \MatchIt\ provides a number of ways to assess the balance of covariates after matching, including numerical summaries such as the ``mean Diff.'' (difference in means) or the difference in means divided by the treated group standard deviation, and summaries based on quantile-quantile plots that compare the empirical distributions of each covariate. The widely used procedure of doing t-tests of the difference in means is highly misleading and should never be used to assess balance; see \citet{ImaKinStu08}. These balance diagnostics should be performed on all variables in $X$, even if some are excluded from one of the matching procedures. \section{Conducting Analyses after Matching}\label{s:qoi} The most common way that parametric analyses are used to compute quantities of interest (without matching) is by (statistically) holding constant some explanatory variables, changing others, and computing predicted or expected values and taking the difference or ratio, all by using the parametric functional form. In the case of causal inference, this would mean looking at the effect on the expected value of the outcome variable when changing $T$ from 0 to 1, while holding constant the pretreatment control variables $X$ at their means or medians. This, and indeed any other appropriate analysis procedure, would be a perfectly reasonable way to proceed with analysis after matching. If it is the chosen way to proceed, then either treated or control units may be deleted during the matching stage, since the same parametric structure is assumed to apply to all observations. 
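For instance, a minimal sketch of this procedure on the matched data from
above (hypothetical code; \texttt{lm()} stands in for whatever parametric
model is appropriate):
\begin{verbatim}
> # fit the analysis model on the matched data
> fit <- lm(re78 ~ treat + age + educ + re74 + re75, data = m.data)
> # hold covariates at their medians, change treat from 0 to 1,
> # and take the difference of the predicted values
> X.med <- data.frame(lapply(m.data[, c("age", "educ", "re74", "re75")],
+                            median))
> predict(fit, cbind(X.med, treat = 1)) -
+   predict(fit, cbind(X.med, treat = 0))
\end{verbatim}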
In other instances, researchers wish to reduce the assumptions inherent in
their statistical model and so want to allow for the possibility that the
treatment effect varies over observations. In this situation, one popular
quantity of interest is the \emph{average treatment effect on the treated}
(ATT). For example, for the treated group, the potential outcomes under
control, $Y_i(0)$, are missing, whereas the outcomes under treatment,
$Y_i(1)$, are observed, and the goal of the analysis is to impute the missing
outcomes, $Y_i(0)$, for observations with $T_i=1$. We do this via simulation
using a parametric statistical model such as regression, logit, or others (as
described below). Once those potential outcomes are imputed from the model,
the estimate of individual $i$'s treatment effect is
$Y_i(1)-\widehat{Y}_i(0)$, where $\widehat{Y}_i(0)$ is a predicted value of
the dependent variable for unit $i$ under the counterfactual condition where
$T_i=0$. The in-sample average treatment effect for the treated individuals
can then be obtained by averaging this difference over all observations $i$
where in fact $T_i=1$.

Most \MatchIt\ algorithms retain all treated units and choose some subset of,
or repeated units from, the control group, so that estimating the ATT is
straightforward. If one chooses options that allow matching with replacement,
or any solution that has different numbers of controls (or treateds) within
each subclass or stratum (such as full matching), then the parametric
analysis following matching must accommodate these procedures, such as by
using fixed effects or weights, as appropriate. (Similar procedures can also
be used to estimate various other quantities of interest, such as the average
treatment effect, by computing it for all observations, but then one must be
aware that the quantity of interest may change during the matching procedure
as some control units may be dropped.)

The imputation from the model can be done in at least two ways. Recall that
the model is used to impute \emph{the value that the outcome variable would
take among the treated units if those treated units were actually controls}.
Thus, one reasonable approach would be to fit a model to the matched data and
create simulated predicted values of the dependent variable for the treated
units with $T_i$ switched counterfactually from 1 to 0. An alternative
approach would be to fit a model without $T$ by using only the outcomes of
the matched control units (i.e., using only observations where $T_i=0$).
Then, given this fitted model, the missing outcomes $Y_i(0)$ are imputed for
the matched treated units by using the values of the explanatory variables
for the treated units. The first approach will usually have lower variance,
since all observations are used; the second may have less bias, since no
assumption of constant parameters across the models of the potential outcomes
under treatment and control is needed. See \citet*{HoImaKin07} for more
details.

Other quantities of interest can also be computed at the parametric stage,
following any procedures you would have followed in the absence of matching.
The advantage is that if matching is done well, your answers will be more
robust to many small changes in parametric specification.
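The following sketch illustrates the two imputation approaches (hypothetical
code built on the matched \texttt{lalonde} data from above; weights are
omitted for simplicity, and \texttt{lm()} again stands in for the chosen
parametric model):
\begin{verbatim}
> treated <- m.data[m.data$treat == 1, ]
> # Approach 1: fit on all matched units, then switch T from 1 to 0
> fit <- lm(re78 ~ treat + age + educ + re74 + re75, data = m.data)
> y0.hat <- predict(fit, transform(treated, treat = 0))
> mean(treated$re78 - y0.hat)  # in-sample ATT estimate
> # Approach 2: fit on the matched controls only, then predict
> # the counterfactual outcomes for the treated units
> fit0 <- lm(re78 ~ age + educ + re74 + re75,
+            data = m.data[m.data$treat == 0, ])
> mean(treated$re78 - predict(fit0, treated))
\end{verbatim}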
%%% Local Variables:
%%% mode: latex
%%% TeX-master: "matchit"
%%% End:

% MatchIt/vignettes/notation.tex

\documentclass[oneside,letterpaper,titlepage,12pt]{article}
%\usepackage[ae,hyper]{/usr/lib/R/share/texmf/Rd}
\usepackage{makeidx}
\usepackage{graphicx}
\usepackage{natbib}
\usepackage[reqno]{amsmath}
\usepackage{amssymb}
\usepackage{verbatim}
\usepackage{epsf}
\usepackage{url}
\usepackage{html}
\usepackage{dcolumn}
\usepackage{longtable}
\usepackage{vmargin}
\setpapersize{USletter}
\newcolumntype{.}{D{.}{.}{-1}}
\newcolumntype{d}[1]{D{.}{.}{#1}}
%\pagestyle{myheadings}
\htmladdtonavigation{
  \htmladdnormallink{%
    \htmladdimg{http://gking.harvard.edu/pics/home.gif}}
  {http://gking.harvard.edu/}}
\newcommand{\hlink}{\htmladdnormallink}
\bodytext{ BACKGROUND="http://gking.harvard.edu/pics/temple.setcounter"}
\setcounter{tocdepth}{3}
\parindent=0cm
\newcommand{\MatchIt}{\textsc{MatchIt}}

\begin{document}

\begin{center}
  Notation for \MatchIt \\
  Elizabeth Stuart \\
\end{center}

This document details the notation to be used in the matching paper to
accompany \MatchIt, as discussed in the conference call on April 27, 2004,
and emails on April 27--29, 2004. \\

We first provide a general idea of the notation and ideas. Formal notation
follows.
\begin{enumerate}
\item There exist in nature fixed values, $\theta_{1i}$ and $\theta_{0i}$,
  the potential outcomes under treatment and control, respectively. They, or
  more usually their difference, are our quantities of interest and hence the
  inferential target in this exercise. They are not generally known. There
  also exists in nature a vector of fixed values $X_i$ that are known and
  will play the role of covariates to condition on. Note that the use of
  $X_i$ in matching assumes that they are not affected by treatment
  assignment: $X_{1i}=X_{0i}=X_i$. If a researcher is interested in adjusting
  for a variable potentially affected by treatment assignment, then methods
  such as principal stratification should be used. [We can write a paragraph
  on that somewhere.]
\item Then (i.e., only after step 1) $T_i$ is created, preferably by the
  experimenter, who assigns its values randomly; but in some cases $T_i$ is
  created by the world, and we hope that it was created effectively randomly
  (or randomly conditional on the observed $X_i$). This is what is known as
  the assumption of unconfounded treatment assignment: that treatment
  assignment is independent of the potential outcomes given the covariates
  $X_i$.
\item Finally, the observed outcome variable $Y_i$ is calculated
  deterministically as $Y_i = \theta_{1i}T_i + \theta_{0i}(1-T_i)$. This is
  an accounting identity, not a regression equation with an error term; i.e.,
  it is just true, not really an assumption.
\end{enumerate}
The accounting identity in 3 implies that we observe $\theta_{1i}=Y_i$ (but
not $\theta_{0i}$) when $T_i=1$, and we observe $\theta_{0i}=Y_i$ (but not
$\theta_{1i}$) when $T_i=0$. Since the fundamental problem of causal
inference indicates that either $\theta_{0i}$ or $\theta_{1i}$ will be
unobserved for each $i$, we will need to infer their values, for which we
define $\tilde\theta_{1i} \sim p(\theta_{1i})$ and $\tilde\theta_{0i} \sim
p(\theta_{0i})$ as draws of the potential outcomes under treatment and
control, respectively, from their respective posterior predictive
distributions.\\

We now detail the notation utilized. Fixed, but sometimes unknown, values are
represented by Greek letters.
Draws from the posterior distribution of these unknown values are represented by a tilde over the variable of interest. Fixed, but always known values are represented by capital Roman letters. We first define notation for individual $i$: \begin{itemize} \item $\theta_{1i}=$ individual $i$'s potential outcome under treatment \item $\theta_{0i}=$ individual $i$'s potential outcome under control \item $T_i=$ individual $i$'s observed treatment assignment \begin{itemize} \item $T_i=1$ means individual $i$ receives treatment \item $T_i=0$ means individual $i$ receives control \end{itemize} \item $\theta_{0i}$ and $\theta_{1i}$ are considered fixed quantities. Which one is observed depends on the random variable $T_i$. \item $Y_i=T_i \theta_{1i} + (1-T_i) \theta_{0i}$ is individual $i$'s observed outcome \item Let $Y_{1i}=(\theta_{1i}|T_i=1)$. Similarly, $Y_{0i}=(\theta_{0i}|T_i=0)$. One of these is observed for individual $i$. [Do we actually want this notation? It's not quite right since the capital Roman letter would imply that $Y_{1i}$ is always observed, but the notation may be handy (see below).] \item Let $\tilde{\theta}_{0i}$ be a draw from the posterior distribution of $\theta_{0i}$: $\tilde{\theta}_{0i} \sim p(\theta_{0i})$. Similarly, let $\tilde{\theta}_{1i}$ be a draw from the posterior distribution of $\theta_{1i}$: $\tilde{\theta}_{1i} \sim p(\theta_{1i})$. \item Consider variables $X_i$. If $X_{0i}=X_{1i}=X_i$ then $X$ is a ``proper covariate'' in that it is not affected by treatment assignment. Only proper covariates should be used in the matching procedure. That $X_i$ is not affected by treatment assignment is always an assumption (sometimes more reasonable than other times)--we never observe $X_{0i}$ and $X_{1i}$ for individual $i$. \end{itemize} For individual $i$, the potentially observed values can be represented by the following 2x2 table: \begin{center} \begin{tabular}{cc|c|c|} \multicolumn{4}{c}{\hspace*{4cm} Actual treatment assignment} \\ \\ \multicolumn{2}{c}{} & \multicolumn{2}{c}{Control \hfill Treatment} \\ \cline{3-4} & & & \phantom{abcd} \\ & Control & $\theta_{0i}$, $X_i$ & $\tilde{\theta}_{0i}$, $X_i$ \\ Potential & & & \phantom{abcd} \\ \cline{3-4} outcomes under & & & \phantom{abcd} \\ & Treatment & $\tilde{\theta}_{1i}$, $X_i$ & $\theta_{1i}$, $X_i$ \\ & & & \phantom{abcd} \\ \cline{3-4} \end{tabular} \end{center} For each individual, we are in either Column 1 or Column 2 (depending on treatment assignment). Within each column, 1 number will be observed and 1 will be a drawn value from the posterior distribution (i.e., the true value for that cell is missing). Thus, for each individual we are in the situation of one of the following two vectors of potential outcomes: \begin{center} \begin{tabular}{rl} If $T_i=1$: & $(\tilde\theta_{0i}, \theta_{1i})$ \\ If $T_i=0$: & $(\theta_{0i}, \tilde\theta_{1i})$ \\ \end{tabular} \end{center} Now consider $n$ individuals observed, with $n_1$ in the treated group and $n_0$ in the control group ($n=n_0+n_1$). Then we have the following notation: \begin{itemize} \item $n_1=\sum_{i=1}^n T_i$, $n_0=\sum_{i=1}^n (1-T_i)$ \item ${\bf Y_0} = \{Y_{i}|T_i=0\}$. ${\bf Y_0}$ is of length $n_0$. \item ${\bf Y_1} = \{Y_{i}|T_i=1\}$. ${\bf Y_1}$ is of length $n_1$. \item ${\bf Y} = \{ {\bf Y_0}, {\bf Y_1}\}$. ${\bf Y}$ is of length $n$. 
\item $\overline{Y}_1 = \frac{\sum_{i=1}^n T_i \theta_{1i}}{\sum_{i=1}^n
  T_i}=\frac{\sum_{i=1}^{n_1} Y_{1i}}{n_1}$
\item $\overline{Y}_0 = \frac{\sum_{i=1}^n (1-T_i) \theta_{0i}}{\sum_{i=1}^n
  (1-T_i)}=\frac{\sum_{i=1}^{n_0} Y_{0i}}{n_0}$
\item Observed sample variances would be calculated in a similar way (using
  ${\bf Y_0}$ and ${\bf Y_1}$).
\end{itemize}
Throughout, we will use $p()$ to represent pdf's and $g()$ for functional
forms of models.

One point to make sure we mention in the write-up is to say what OLS assumes.
OLS with a treatment indicator does:
$$Y_i|X_i, T_i \sim N(\alpha + \beta_0 T_i + {\boldsymbol \beta} {\bf X_i},
\sigma^2)$$
which assumes that the models of the potential outcomes follow parallel lines
with equal residual variance:
$$\theta_{1i}|X_i \sim N(\alpha + \beta_0 + {\boldsymbol \beta} {\bf X_i},
\sigma^2)$$
$$\theta_{0i}|X_i \sim N(\alpha + {\boldsymbol \beta} {\bf X_i}, \sigma^2).$$
We can also have a graphical representation of this, perhaps with some
examples of outcomes that clearly don't have that nice parallel linear
relationship, and show that this would be a very special case of what we're
advocating.

\end{document}

% MatchIt/vignettes/mdataref.tex

\section{\texttt{match.data()}: Extracting the Matched Data Set}
\label{sec:match.data}

\subsection{Usage}

To extract the matched data set for subsequent analyses from the output
object (see Section~\ref{sec:analysis}), we provide the function
{\tt match.data()}. It is used as follows:
\begin{verbatim}
> m.data <- match.data(object, group = "all", distance = "distance",
                       weights = "weights", subclass = "subclass")
\end{verbatim}
The output of {\tt match.data()} is the original data frame with additional
information about the matching (i.e., the distance measure as well as the
resulting weights and subclasses) added, restricted to the units that were
matched.

\subsection{Arguments}

{\tt match.data()} takes the following inputs:
\begin{enumerate}
\item {\tt object} is the output object from {\tt matchit()}. This is a
  required input.
\item {\tt group} specifies for which matched group the user wants to extract
  the data. Available options are {\tt "all"} (all matched units),
  {\tt "treat"} (matched units in the treatment group), and {\tt "control"}
  (matched units in the control group). The default is {\tt "all"}.
\item {\tt distance} specifies the variable name used to store the distance
  measure. The default is {\tt "distance"}.
\item {\tt weights} specifies the variable name used to store the resulting
  weights from matching. The default is {\tt "weights"}. See
  Section~\ref{subsec:weights} for more details on the weights.
\item {\tt subclass} specifies the variable name used to store the subclass
  indicator. The default is {\tt "subclass"}.
\end{enumerate}

\subsection{Examples}

Here, we present examples of using {\tt match.data()}. Users can run these
commands by typing {\tt demo(match.data)} at the R prompt.
First, we load the Lalonde data,
\begin{verbatim}
> data(lalonde)
\end{verbatim}
The next line performs nearest neighbor matching based on the estimated
propensity score from the logistic regression,
\begin{verbatim}
> m.out1 <- matchit(treat ~ re74 + re75 + age + educ, data = lalonde,
+                   method = "nearest", distance = "logit")
\end{verbatim}
To obtain the matched data, type the following command,
\begin{verbatim}
> m.data1 <- match.data(m.out1)
\end{verbatim}
It is easy to summarize the resulting matched data,
\begin{verbatim}
> summary(m.data1)
\end{verbatim}
To obtain the matched data for the treatment or control group alone, specify
the option {\tt group} as follows,
\begin{verbatim}
> m.data2 <- match.data(m.out1, group = "treat")
> summary(m.data2)
> m.data3 <- match.data(m.out1, group = "control")
> summary(m.data3)
\end{verbatim}
We can also use the function to return the unmatched data:
\begin{verbatim}
> unmatched.data <-
+   lalonde[!row.names(lalonde) %in% row.names(match.data(m.out1)), ]
\end{verbatim}
We can also specify different names for the subclass indicator, the weight
variable, and the estimated distance measure. The following example first
performs subclassification, obtains the matched data with specified names for
those three variables, and then prints out the names of all variables in the
resulting matched data.
\begin{verbatim}
> m.out2 <- matchit(treat ~ re74 + re75 + age + educ, data = lalonde,
+                   method = "subclass")
> m.data4 <- match.data(m.out2, subclass = "block", weights = "w",
+                       distance = "pscore")
> names(m.data4)
\end{verbatim}

%%% Local Variables:
%%% mode: latex
%%% TeX-master: "matchit"
%%% End:

% MatchIt/vignettes/matchitref.tex

\subsubsection{Syntax}
\begin{verbatim}
> m.out <- matchit(formula, data, method = "nearest", verbose = FALSE, ...)
\end{verbatim}

\subsubsection{Arguments}

\paragraph{Arguments for All Matching Methods}
\begin{itemize}
\item \texttt{formula}: the formula used to calculate the distance measure
  for matching (e.g., the propensity score model). It takes the usual syntax
  of R formulas, {\tt treat \~\ x1 + x2}, where {\tt treat} is a binary
  treatment indicator, and {\tt x1} and {\tt x2} are the pre-treatment
  covariates. Both the treatment indicator and the pre-treatment covariates
  must be contained in the same data frame, which is specified as {\tt data}
  (see below). All of the usual R syntax for formulas works here. For
  example, {\tt x1:x2} represents the first-order interaction term between
  {\tt x1} and {\tt x2}, and {\tt I(x1 \^\ 2)} represents the square term of
  {\tt x1}. See {\tt help(formula)} for details.
\item \texttt{data}: the data frame containing the variables called in
  {\tt formula}.
\item \texttt{method}: the matching method (default = \texttt{"nearest"},
  nearest neighbor matching). Currently, \texttt{"exact"} (exact matching),
  \texttt{"full"} (full matching), \texttt{"nearest"} (nearest neighbor
  matching), \texttt{"optimal"} (optimal matching), \texttt{"subclass"}
  (subclassification), \texttt{"genetic"} (genetic matching), and
  \texttt{"cem"} (coarsened exact matching) are available. Note that within
  each of these matching methods, \MatchIt\ offers a variety of options. See
  below for more details.
\item \texttt{verbose}: a logical value indicating whether to print the
  status of the matching algorithm (default = \texttt{FALSE}).
\end{itemize}

\paragraph{Additional Arguments for Specification of Distance Measures}
\label{subsubsec:inputs-all}

The following arguments specify the distance measures that are used by the
matching methods. These arguments apply to all matching methods {\it except
exact matching}.
\begin{itemize}
\item \texttt{distance}: the method used to estimate the distance measure
  (default = {\tt "logit"}, logistic regression), or a numerical vector
  containing the user's own distance measure. Before using any of these
  techniques, it is best to understand their theoretical groundings and to
  evaluate the results. Most of these methods (such as logistic or probit
  regression) define the distance by first estimating the propensity score,
  defined as the probability of receiving treatment, conditional on the
  covariates. Available methods include:
\begin{itemize}
\item {\tt "mahalanobis"}: the Mahalanobis distance measure.
\item binomial generalized linear models with one of the following link
  functions:
\begin{itemize}
\item \texttt{"logit"}: logistic link
\item {\tt "linear.logit"}: logistic link with linear propensity
  score\footnote{The linear propensity scores are obtained by transforming
  the propensity score back onto the scale of the linear predictor.}
\item \texttt{"probit"}: probit link
\item {\tt "linear.probit"}: probit link with linear propensity score
\item {\tt "cloglog"}: complementary log-log link
\item {\tt "linear.cloglog"}: complementary log-log link with linear
  propensity score
\item {\tt "log"}: log link
\item {\tt "linear.log"}: log link with linear propensity score
\item {\tt "cauchit"}: Cauchy CDF link
\item {\tt "linear.cauchit"}: Cauchy CDF link with linear propensity score
\end{itemize}
\item one of the following generalized additive models (see {\tt help(gam)}
  for more options):
\begin{itemize}
\item \texttt{"GAMlogit"}: logistic link
\item {\tt "GAMlinear.logit"}: logistic link with linear propensity score
\item \texttt{"GAMprobit"}: probit link
\item {\tt "GAMlinear.probit"}: probit link with linear propensity score
\item {\tt "GAMcloglog"}: complementary log-log link
\item {\tt "GAMlinear.cloglog"}: complementary log-log link with linear
  propensity score
\item {\tt "GAMlog"}: log link
\item {\tt "GAMlinear.log"}: log link with linear propensity score
\item {\tt "GAMcauchit"}: Cauchy CDF link
\item {\tt "GAMlinear.cauchit"}: Cauchy CDF link with linear propensity score
\end{itemize}
\item \texttt{"nnet"}: neural network model. See {\tt help(nnet)} for more
  options.
\item \texttt{"rpart"}: classification trees. See {\tt help(rpart)} for more
  options.
\end{itemize}
\item \texttt{distance.options}: optional arguments for estimating the
  distance measure. The input to this argument should be a list. For example,
  if the distance measure is estimated with a logistic regression, users can
  increase the maximum number of IWLS iterations with
  \texttt{distance.options = list(maxit = 5000)}. Find additional options for
  generalized linear models using {\tt help(glm)} or {\tt help(family)}, for
  generalized additive models using {\tt help(gam)}, for neural network
  models using {\tt help(nnet)}, and for classification trees using
  {\tt help(rpart)}.
\item \texttt{discard}: specifies whether to discard units that fall outside
  some measure of support of the distance measure (default = \texttt{"none"},
  discard no units). Discarding units may change the quantity of interest
  being estimated by changing the observations left in the analysis.
Enter a logical vector indicating which units should be discarded, or choose
from the following options:
\begin{itemize}
\item \texttt{"none"}: no units will be discarded before matching. Use this
  option when the units to be matched are substantially similar, such as in
  the case of matching treatment and control units from a field experiment
  that was close to (but not fully) randomized (e.g., \citealt{Imai05}), when
  caliper matching will restrict the donor pool, or when you do not wish to
  change the quantity of interest and the parametric methods to be used
  post-matching can be trusted to extrapolate.
\item \texttt{"hull.both"}: all units that are not within the convex hull
  will be discarded. See \citet{KinZen06,KinZen07} for information about the
  convex hull in this context and as a measure of model dependence.
\item \texttt{"both"}: all units (treated and control) that are outside the
  support of the distance measure will be discarded.
\item \texttt{"hull.control"}: only control units that are not within the
  convex hull of the treated units will be discarded.
\item \texttt{"control"}: only control units outside the support of the
  distance measure of the treated units will be discarded. Use this option
  when the average treatment effect on the treated is of most interest and
  when you are unwilling to discard non-overlapping treatment units (which
  would change the quantity of interest).
\item \texttt{"hull.treat"}: only treated units that are not within the
  convex hull of the control units will be discarded.
\item \texttt{"treat"}: only treated units outside the support of the
  distance measure of the control units will be discarded. Use this option
  when the average treatment effect on the control units is of most interest
  and when you are unwilling to discard control units.
\end{itemize}
\item \texttt{reestimate}: if {\tt FALSE} (default), the model for the
  distance measure will not be re-estimated after units are discarded. The
  input must be a logical value. Re-estimation may be desirable for
  efficiency reasons, especially if many units were discarded and the
  post-discard samples are therefore quite different from the original
  samples.
\end{itemize}

\paragraph{Additional Arguments for Subclassification}
\label{subsubsec:inputs-subclass}
\begin{itemize}
\item \texttt{sub.by}: the criterion for subclassification. Choose from:
  \texttt{"treat"} (default), the number of treatment units;
  \texttt{"control"}, the number of control units; or \texttt{"all"}, the
  total number of units. Changing the default will likely also signal a
  change in your quantity of interest from the average treatment effect on
  the treated to other quantities.
\item \texttt{subclass}: either a scalar specifying the number of subclasses,
  or a vector of probabilities bounded between 0 and 1, which creates
  quantiles of the distance measure using the units in the group specified by
  \texttt{sub.by} (default = {\tt 6}).
\end{itemize}

\paragraph{Additional Arguments for Nearest Neighbor Matching}
\label{subsubsec:inputs-nearest}
\begin{itemize}
\item \texttt{m.order}: the order in which to match treatment units with
  control units.
\begin{itemize}
\item {\tt "largest"} (default): matches from the largest value of the
  distance measure to the smallest.
\item {\tt "smallest"}: matches from the smallest value of the distance
  measure to the largest.
\item {\tt "random"}: matches in random order.
\end{itemize} \item \texttt{replace}: logical value indicating whether each control unit can be matched to more than one treated unit (default = {\tt replace = FALSE}, each control unit is used at most once -- i.e., sampling without replacement). For matching with replacement, use \texttt{replace = TRUE}. After matching with replacement, the weights can be used to reflect the frequency with which each control unit was matched. \item \texttt{ratio}: the number of control units to match to each treated unit (default = {\tt 1}). If matching is done without replacement and there are fewer control units than {\tt ratio} times the number of eligible treated units (i.e., there are not enough control units for the specified method), then the higher ratios will have \texttt{NA} in place of the matching unit number in \texttt{match.matrix}. \item \texttt{exact}: variables on which to perform exact matching within the nearest neighbor matching (default = {\tt NULL}, no exact matching). If \texttt{exact} is specified, only matches that exactly match on the covariates in \texttt{exact} will be allowed. Within the matches that match on the variables in \texttt{exact}, the match with the closest distance measure will be chosen. \texttt{exact} should be entered as a vector of variable names (e.g., \texttt{exact = c("X1", "X2")}). \item \texttt{caliper}: the number of standard deviations of the distance measure within which to draw control units (default = {\tt 0}, no caliper matching). If a caliper is specified, a control unit within the caliper for a treated unit is randomly selected as the match for that treated unit. If \texttt{caliper != 0}, there are two additional options: \begin{itemize} \item \texttt{calclosest}: whether to take the nearest available match if no matches are available within the \texttt{caliper} (default = {\tt FALSE}). \item \texttt{mahvars}: variables on which to perform Mahalanobis-metric matching within each caliper (default = {\tt NULL}). Variables should be entered as a vector of variable names (e.g., \texttt{mahvars = c("X1", "X2")}). If \texttt{mahvars} is specified without \texttt{caliper}, the caliper is set to 0.25. \end{itemize} \item \texttt{subclass} and \texttt{sub.by}: See the options for subclassification for more details on these options. If a \texttt{subclass} is specified within \texttt{method = "nearest"}, the matched units will be placed into subclasses after the nearest neighbor matching is completed. \end{itemize} \paragraph{Additional Arguments for Optimal Matching} \label{subsubsec:inputs-optimal} \begin{itemize} \item {\tt ratio}: the number of control units to be matched to each treatment unit (default = {\tt 1}). \item {\tt ...}: additional inputs that can be passed to the {\tt fullmatch()} function in the {\tt optmatch} package. See {\tt help(fullmatch)} or \hlink{http://www.stat.lsa.umich.edu/\~{}bbh/optmatch.html}{http://www.stat.lsa.umich.edu/~bbh/optmatch.html} for details. \end{itemize} \paragraph{Additional Arguments for Full Matching} \label{subsubsec:inputs-full} See {\tt help(fullmatch)} (part of this information is copied below) or \hlink{http://www.stat.lsa.umich.edu/\~{}bbh/optmatch.html}{http://www.stat.lsa.umich.edu/~bbh/optmatch.html} for details. \begin{itemize} \item {\tt min.controls}: The minimum ratio of controls to treatments that is to be permitted within a matched set: should be nonnegative and finite. 
  If {\tt min.controls} is not a whole number, the reciprocal of a
  whole number, or zero, then it is rounded down to the nearest whole
  number or reciprocal of a whole number.
\item {\tt max.controls}: the maximum ratio of controls to treated
  units permitted within a matched set; it should be positive and
  numeric. If {\tt max.controls} is not a whole number, the
  reciprocal of a whole number, or {\tt Inf}, then it is rounded up
  to the nearest whole number or reciprocal of a whole number.
\item {\tt omit.fraction}: optionally, the fraction of control or
  treated units to be rejected. If {\tt omit.fraction} is a positive
  fraction less than one, then {\tt fullmatch()} leaves up to that
  fraction of the control reservoir unmatched. If {\tt omit.fraction}
  is a negative number greater than $-1$, then {\tt fullmatch()}
  leaves up to that fraction (in absolute value) of the treated group
  unmatched. Positive values are accepted only if {\tt max.controls}
  $\geq 1$; negative values, only if {\tt min.controls} $\leq 1$. If
  {\tt omit.fraction} is not specified, then only those treated and
  control units without permissible matches among the control and
  treated units, respectively, are omitted.
\item {\tt ...}: additional inputs that can be passed to the
  {\tt fullmatch()} function in the {\tt optmatch} package.
\end{itemize}

\paragraph{Additional Arguments for Genetic Matching}
\label{subsubsec:inputs-genetic}
The available options are listed below.
\begin{itemize}
\item {\tt ratio}: the number of control units to be matched to each
  treated unit (default = {\tt 1}).
\item {\tt ...}: additional minor inputs that can be passed to the
  {\tt GenMatch()} function in the {\tt Matching} package. See
  {\tt help(GenMatch)} or\\
  \hlink{http://sekhon.polisci.berkeley.edu/library/Matching/html/GenMatch.html}{http://sekhon.polisci.berkeley.edu/library/Matching/html/GenMatch.html}
  for details.
\end{itemize}

\paragraph{Additional Arguments for Coarsened Exact Matching}
\label{subsubsec:inputs-cem}
The available options are listed here:
\begin{itemize}
\item {\tt cutpoints}: a named list describing the cutpoints for each
  variable. Each list element is either a vector of cutpoints, a
  number of cutpoints, or a method for automatic bin construction.
\item {\tt k2k}: a logical value indicating whether to return a
  k-to-k matching.
\item {\tt verbose}: controls the level of verbosity.
\item {\tt dist}: a user-defined distance function.
\item {\tt ...}: additional minor inputs that can be passed to the
  {\tt cem()} function in the {\tt cem} package. See {\tt help(cem)}
  or \hlink{http://gking.harvard.edu/cem}{http://gking.harvard.edu/cem}
  for details.
\end{itemize}

\subsubsection{Output Values}
\label{sec:outputs}
Regardless of the type of matching performed, the \texttt{matchit}
output object contains the following elements:\footnote{When
  inapplicable or unnecessary, these elements may equal {\tt NULL}.
  For example, when exact matching, {\tt match.matrix = NULL}.}
\begin{itemize}
\item \texttt{call}: the original {\tt matchit()} call.
\item \texttt{formula}: the formula used to specify the model for
  estimating the distance measure.
\item \texttt{model}: the output of the model used to estimate the
  distance measure; \texttt{summary(m.out\$model)} will give the
  summary of the model, where \texttt{m.out} is the output object
  from \texttt{matchit()}.
\item \texttt{match.matrix}: an $n_1 \times$ \texttt{ratio} matrix
  where:
  \begin{itemize}
  \item the row names represent the names of the treated units (which
    match the row names of the data frame specified in \texttt{data}).
  \item each column stores the name(s) of the control unit(s) matched
    to the treated unit of that row. For example, when the
    \texttt{ratio} input for nearest neighbor or optimal matching is
    specified as 3, the three columns of \texttt{match.matrix}
    contain the three control units matched to each treated unit.
  \item \texttt{NA} indicates that the treated unit was not matched.
  \end{itemize}
\item \texttt{discarded}: a vector of length $n$ indicating whether
  each unit was ineligible for matching due to common support
  restrictions; it equals \texttt{TRUE} if unit $i$ was discarded and
  \texttt{FALSE} otherwise.
\item \texttt{distance}: a vector of length $n$ with the estimated
  distance measure for each unit.
\item \texttt{weights}: a vector of length $n$ with the weights
  assigned to each unit in the matching process. Unmatched units have
  weights equal to $0$. Matched treated units have weight $1$. Each
  matched control unit has weight proportional to the number of
  treated units to which it was matched, and the sum of the control
  weights is equal to the number of uniquely matched control units.
\item \texttt{subclass}: the subclass index on an ordinal scale from
  1 to the total number of subclasses, as specified in
  \texttt{subclass} (or the total number of subclasses from full or
  exact matching). Unmatched units have \texttt{NA}.
\item \texttt{q.cut}: the subclass cut-points that classify the
  distance measure.
\item \texttt{treat}: the treatment indicator from \texttt{data} (the
  left-hand side of \texttt{formula}).
\item \texttt{X}: the covariates used for estimating the distance
  measure (the right-hand side of \texttt{formula}). When applicable,
  \texttt{X} is augmented by covariates contained in \texttt{mahvars}
  and \texttt{exact}.
\item \texttt{nn}: a basic summary table of matched data (e.g., the
  number of matched units).
\end{itemize}
%%% Local Variables:
%%% mode: latex
%%% TeX-master: "matchit"
%%% End:

\section{Conducting Analyses after Matching}
\label{sec:analysis}

Any software package may be used for parametric analysis following
\MatchIt. This includes any of the relevant R packages, or other
statistical software after exporting the matched data sets with R
commands such as {\tt write.csv()} and {\tt write.table()} for ASCII
files, or {\tt write.dta()} in the {\tt foreign} package for a Stata
binary file. When variable numbers of treated and control units have
been matched to each other (e.g., through exact matching, full
matching, or k:1 matching with replacement), the weights created by
\MatchIt\ should be used (e.g., in a weighted regression) to ensure
that the matched treated and control groups are weighted to be
similar. Users should also remember that the weights created by
\MatchIt\ estimate the average treatment effect on the treated, with
the control units weighted to resemble the treated units. See below
for more detail on the weights.

With subclassification, estimates should be obtained within each
subclass and then aggregated across subclasses. When it is not
possible to calculate an effect within each subclass, the weights can
again be used to weight the matched units.

In this section, we show how to use
\hlink{Zelig}{http://gking.harvard.edu/zelig/} with \MatchIt; a
minimal base-R sketch of the weighted approach appears first,
immediately below.
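The sketch below assumes \texttt{m.out} is an existing
\texttt{matchit()} output object and borrows the outcome and
covariate names from the Lalonde examples later in this section; it
illustrates the idea rather than prescribing the analysis:
\begin{verbatim}
> ## matched data, including the weights column created by MatchIt
> m.data <- match.data(m.out)
> ## weighted least squares; per the discussion above, the weights
> ## target the average treatment effect on the treated
> fit <- lm(re78 ~ treat + age + educ + re74 + re75,
            data = m.data, weights = weights)
> summary(fit)
\end{verbatim}
The same weights column can be passed to any estimator that accepts
case weights.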
Zelig \citep{ImaKinLau06} is an R package that implements a large
variety of statistical models (using numerous existing R packages)
with a single easy-to-use interface, gives easily interpretable
results by simulating quantities of interest, provides numerical and
graphical summaries, and is easily extensible to include new methods.

\subsection{Quick Overview}

The general syntax is as follows. First, we use \texttt{match.data()}
to create the matched data from the \MatchIt\ output object
(\texttt{m.out}) by excluding unmatched units from the original data
and including information produced by the particular matching
procedure (primarily a new data set, but also any resulting
information such as weights, subclasses, or the distance measure):
\begin{verbatim}
> m.data <- match.data(m.out)
\end{verbatim}
where {\tt m.data} is the resulting matched data. Zelig analyses all
use three commands --- \texttt{zelig}, \texttt{setx}, and
\texttt{sim}. For example, the basic statistical analysis is
performed first:
\begin{verbatim}
> z.out <- zelig(Y ~ treat + x1 + x2, model = mymodel, data = m.data)
\end{verbatim}
where {\tt Y} is the outcome variable, {\tt mymodel} is the selected
model, and {\tt z.out} is the output object from {\tt zelig}. This
output object includes estimated coefficients, standard errors, and
other typical outputs from your chosen statistical model. Its
contents can be examined via \texttt{summary(z.out)} or
\texttt{plot(z.out)}, but the idea of Zelig is that these statistical
results are typically only intermediate quantities needed to compute
your ultimate quantities of interest, which in the case of matching
are usually causal inferences. To get these causal quantities, we use
Zelig's other two commands. Thus, we can set the explanatory
variables at their means (the default) and change the treatment
variable from a 0 to a 1:
\begin{verbatim}
> x.out <- setx(z.out, treat=0)
> x1.out <- setx(z.out, treat=1)
\end{verbatim}
and finally compute the resulting estimates of the causal effects and
examine a summary:
\begin{verbatim}
> s.out <- sim(z.out, x = x.out, x1 = x1.out)
> summary(s.out)
\end{verbatim}

\subsection{Examples}

We now give several examples using the Lalonde data. They are meant
to be read sequentially. You can run these example commands by typing
{\tt demo(analysis)}. Although we use the linear least squares model
in these examples, a wide range of other models are available in
Zelig (for the list of supported models, see
\hlink{\url{http://gking.harvard.edu/zelig/docs/Models_Zelig_Can.html}}{http://gking.harvard.edu/zelig/docs/Models_Zelig_Can.html}).
To load the Zelig package after installing it, type
\begin{verbatim}
> library(Zelig)
\end{verbatim}
\begin{description}
\item[Model-Based Estimates] In our first example, we conduct a
  standard parametric analysis and compute quantities of interest in
  the most common way. We begin with nearest neighbor matching with a
  logistic regression-based propensity score, discarding control
  units outside the convex hull of the treated units
  \citep{KinZen06,KinZen07}:
\begin{verbatim}
> m.out <- matchit(treat ~ age + educ + black + hispan + nodegree +
                   married + re74 + re75, method = "nearest",
                   discard = "hull.control", data = lalonde)
\end{verbatim}
  Then we check balance using the summary and plot procedures (shown
  for reference just below).
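  For a \texttt{matchit()} output object such as \texttt{m.out},
  those balance checks are simply:
\begin{verbatim}
> summary(m.out)   # numerical balance diagnostics
> plot(m.out)      # graphical balance diagnostics
\end{verbatim}
  Both commands, and their options, are documented in the reference
  manual chapter below.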
  When the best balance is achieved, we run the parametric analysis:
\begin{verbatim}
> z.out <- zelig(re78 ~ treat + age + educ + black + hispan +
                 nodegree + married + re74 + re75,
                 data = match.data(m.out), model = "ls")
\end{verbatim}
  and then set the explanatory variables at their means (the default)
  and change the treatment variable from a 0 to a 1:
\begin{verbatim}
> x.out <- setx(z.out, treat=0)
> x1.out <- setx(z.out, treat=1)
\end{verbatim}
  and finally compute the result and examine a summary:
\begin{verbatim}
> s.out <- sim(z.out, x = x.out, x1 = x1.out)
> summary(s.out)
\end{verbatim}
\item[Average Treatment Effect on the Treated] We illustrate now how
  to estimate the average treatment effect on the treated in a way
  that is quite robust. We do this by estimating the coefficients in
  the control group alone. We begin by conducting nearest neighbor
  matching with a logistic regression-based propensity score:
\begin{verbatim}
> m.out1 <- matchit(treat ~ age + educ + black + hispan + nodegree +
                    married + re74 + re75, method = "nearest",
                    data = lalonde)
\end{verbatim}
  Then we check balance using the summary and plot procedures (which
  we don't show here). We re-run the matching procedure until we
  achieve the best balance possible. (The running examples here are
  meant merely to illustrate, not to suggest that we've achieved the
  best balance.) Then we go to Zelig, and in this case choose to fit
  a linear least squares model to the control group only:
\begin{verbatim}
> z.out1 <- zelig(re78 ~ age + educ + black + hispan + nodegree +
                  married + re74 + re75,
                  data = match.data(m.out1, "control"), model = "ls")
\end{verbatim}
  where the {\tt "control"} option in {\tt match.data()} extracts
  only the matched control units and {\tt "ls"} specifies least
  squares regression. In a smaller data set, this example should
  probably be changed to include all the data in this estimation
  (using \texttt{data = match.data(m.out1)}) and to include the
  treatment indicator (which is excluded in the example since it is a
  constant in the control group). Next, we use the coefficients
  estimated in this way from the control group, and combine them with
  the values of the covariates set to the values of the treated
  units. We do this by choosing conditional prediction (which means
  using the observed values) in \texttt{setx()}. The {\tt sim()}
  command does the imputation:
\begin{verbatim}
> x.out1 <- setx(z.out1, data = match.data(m.out1, "treat"),
                 cond = TRUE)
> s.out1 <- sim(z.out1, x = x.out1)
\end{verbatim}
  Finally, we obtain a summary of the results by
\begin{verbatim}
> summary(s.out1)
\end{verbatim}
\item[Average Treatment Effect (Overall)] To estimate the average
  treatment effect, we continue with the previous example and fit the
  linear model to the {\it treatment group}:
\begin{verbatim}
> z.out2 <- zelig(re78 ~ age + educ + black + hispan + nodegree +
                  married + re74 + re75,
                  data = match.data(m.out1, "treat"), model = "ls")
\end{verbatim}
  We then conduct the same simulation procedure in order to impute
  the counterfactual outcome for the {\it control group}:
\begin{verbatim}
> x.out2 <- setx(z.out2, data = match.data(m.out1, "control"),
                 cond = TRUE)
> s.out2 <- sim(z.out2, x = x.out2)
\end{verbatim}
  In this calculation, Zelig computes the difference between the
  observed and the expected values. This means that the treatment
  effect for the control units is the effect of control (the observed
  control outcome minus the imputed outcome under treatment from the
  model).
  Hence, to combine the treatment effects, we simply reverse the
  signs of the estimated treatment effects for the controls:
\begin{verbatim}
> ate.all <- c(s.out1$qi$att.ev, -s.out2$qi$att.ev)
\end{verbatim}
  The point estimate, its standard error, and the $95\%$ confidence
  interval are given by
\begin{verbatim}
> mean(ate.all)
> sd(ate.all)
> quantile(ate.all, c(0.025, 0.975))
\end{verbatim}
\item[Subclassification] In subclassification, the average treatment
  effect estimates are obtained separately for each subclass, and
  then aggregated for an overall estimate. Estimating the treatment
  effects separately for each subclass, and then aggregating across
  subclasses, can increase the robustness of the ultimate results
  since the parametric analysis within each subclass requires only
  local rather than global assumptions. However, fewer observations
  are obviously available within each subclass, and so this option is
  normally chosen for larger data sets. We begin this example by
  conducting subclassification with four subclasses:
\begin{verbatim}
> m.out2 <- matchit(treat ~ age + educ + black + hispan + nodegree +
                    married + re74 + re75, data = lalonde,
                    method = "subclass", subclass = 4)
\end{verbatim}
  When balance is as good as we can get it, we then fit a linear
  regression within each subclass by controlling for the estimated
  propensity score (called \texttt{distance}) and other covariates.
  In most software, this would involve running four separate
  regressions and then combining the results. In Zelig, however, all
  we need to do is to use the {\tt by} option:
\begin{verbatim}
> z.out3 <- zelig(re78 ~ re74 + re75 + distance,
                  data = match.data(m.out2, "control"),
                  model = "ls", by = "subclass")
\end{verbatim}
  The same set of commands as in the first example is used to do the
  imputation of the counterfactual outcomes for the treated units:
\begin{verbatim}
> x.out3 <- setx(z.out3, data = match.data(m.out2, "treat"),
                 fn = NULL, cond = TRUE)
> s.out3 <- sim(z.out3, x = x.out3)
> summary(s.out3)
\end{verbatim}
  It is also possible to get the summary result for each subclass.
  For example, the following command summarizes the result for the
  second subclass:
\begin{verbatim}
> summary(s.out3, subset = 2)
\end{verbatim}
\item[How Adjustment After Exact Matching Has No Effect] Regression
  adjustment after one-to-one exact matching gives the same answer as
  a simple, unadjusted difference in means. General exact matching,
  as implemented in \MatchIt, allows one-to-many matches, so to see
  the same result we must weight when adjusting. In other words:
  weighted regression adjustment after general exact matching gives
  the same answer as a simple, unadjusted weighted difference in
  means.
  For example:
\begin{verbatim}
> m.out <- matchit(treat ~ educ + black + hispan, data = lalonde,
                   method = "exact")
> m.data <- match.data(m.out)
> ## weighted difference in means
> weighted.mean(m.data$re78[m.data$treat == 1],
                m.data$weights[m.data$treat == 1]) -
  weighted.mean(m.data$re78[m.data$treat == 0],
                m.data$weights[m.data$treat == 0])
[1] 807
> ## weighted least squares without covariates
> zelig(re78 ~ treat, data = m.data, model = "ls",
        weights = "weights")

Call:
zelig(formula = re78 ~ treat, model = "ls", data = m.data,
    weights = "weights")

Coefficients:
(Intercept)        treat
       5524          807

> ## weighted least squares with covariates
> zelig(re78 ~ treat + black + hispan + educ, data = m.data,
        model = "ls", weights = "weights")

Call:
zelig(formula = re78 ~ treat + black + hispan + educ, model = "ls",
    data = m.data, weights = "weights")

Coefficients:
(Intercept)        treat        black       hispan         educ
        314          807        -1882          258          657

\end{verbatim}
\end{description}
%%% Local Variables:
%%% mode: latex
%%% TeX-master: "matchit"
%%% End:

\documentclass[oneside,letterpaper,12pt]{book}
\usepackage{bibentry}
\usepackage{graphicx}
\usepackage{natbib}
\usepackage[reqno]{amsmath}
\usepackage{amssymb}
\usepackage{verbatim}
\usepackage{epsf}
\usepackage{url}
\usepackage{html}
\usepackage{dcolumn}
\usepackage{fullpage}
\bibpunct{(}{)}{;}{a}{}{,}
\newcolumntype{.}{D{.}{.}{-1}}
\newcolumntype{d}[1]{D{.}{.}{#1}}
%\pagestyle{myheadings}
\htmladdtonavigation{
  \htmladdnormallink{%
    \htmladdimg{http://gking.harvard.edu/pics/home.gif}}
  {http://gking.harvard.edu/}}
\newcommand{\hlink}{\htmladdnormallink}
%\bodytext{ BACKGROUND="http://gking.harvard.edu/pics/temple.jpg"}
\setcounter{tocdepth}{3}
\setcounter{secnumdepth}{4}
\newcommand{\MatchIt}{\textsc{MatchIt}}
\title{\MatchIt: Nonparametric Preprocessing for Parametric Causal
  Inference\thanks{We thank Olivia Lau for helpful suggestions about
    incorporating \MatchIt\, into Zelig.}}
\author{Daniel E. Ho,\thanks{Assistant Professor of Law \& Robert
    E.\ Paradise Faculty Scholar, Stanford Law School (559 Nathan
    Abbott Way, Stanford CA 94305; \texttt{http://dho.stanford.edu},
    \texttt{dho@law.stanford.edu}, (650) 723-9560).} \and %
  Kosuke Imai,\thanks{Assistant Professor, Department of Politics,
    Princeton University (Corwin Hall 041, Department of Politics,
    Princeton University, Princeton NJ 08544, USA;
    \texttt{http://imai.princeton.edu},
    \texttt{kimai@Princeton.Edu}).} \and %
  Gary King,\thanks{David Florence Professor of Government, Harvard
    University (Institute for Quantitative Social Science, 1737
    Cambridge Street, Harvard University, Cambridge MA 02138;
    \texttt{http://GKing.Harvard.Edu}, \texttt{King@Harvard.Edu},
    (617) 495-2027).} \and %
  Elizabeth A. Stuart\thanks{Assistant Professor, Departments of
    Mental Health and Biostatistics, Johns Hopkins Bloomberg School
    of Public Health (624 N Broadway, Room 804, Baltimore, MD 21205;
    \texttt{http://www.biostat.jhsph.edu/$\sim$estuart},
    \texttt{estuart@jhsph.edu}).}}
%\makeindex
\begin{document}
\maketitle
\begin{rawhtml}
[Also available is a downloadable PDF version of this entire
document]
\end{rawhtml}
\tableofcontents
\nobibliography*
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\clearpage
\chapter{Introduction}
\input{intro}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\chapter{Statistical Overview}
\input{overview}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\chapter{User's Guide to \MatchIt}
\label{methods}
\input{preprocess}
\input{balance}
\input{matchit2zelig}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\chapter{Reference Manual}
\label{chap:reference}
\section{\texttt{matchit()}: Implementation of Matching Methods}
\label{sec:matchit}
Use \texttt{matchit()} to implement a variety of matching procedures
including exact matching, nearest neighbor matching,
subclassification, optimal matching, genetic matching, and full
matching. The output of {\tt matchit()} can be analyzed via any
standard R package, by exporting the data for use in another program,
or most simply via \hlink{Zelig}{http://gking.harvard.edu/zelig} in R.
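As a quick orientation before the detailed reference material, a
typical call looks like the following minimal sketch (using the
\texttt{lalonde} data distributed with \MatchIt; the covariates shown
are merely illustrative):
\begin{verbatim}
> library(MatchIt)
> data(lalonde)
> m.out <- matchit(treat ~ age + educ + re74 + re75,
                   data = lalonde, method = "nearest")
> summary(m.out)   # check covariate balance
\end{verbatim}
The sections below document each argument and matching method in
detail.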
\input{matchitref}
\input{summaryref}
\input{plotref}
\input{mdataref}
\input{faq}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\chapter{What's New?}
\begin{itemize}
\item \textbf{2.4-20} (October 24, 2011): bug fix for GAM models (thanks to Felix Thoemmes)
\item \textbf{2.4-18} (April 26, 2011): JSS version, no change in code.
\item \textbf{2.4-17} (April 2, 2011): a minor documentation fix.
\item \textbf{2.4-16} (January 8, 2011): a bug fix for user defined distance.
\item \textbf{2.4-15} (December 11, 2010): a bug fix in the
mahalanobis matching.
\item \textbf{2.4-14} (August 12, 2010): a bug fix in {\tt
match.data()} so that it can be called within a function (thanks
to Ajay Shah, George Baah, and Ben Dominique); \MatchIt\ now does not
specify digits for printing (thanks to Chris Hane); a summary table
of matched data is now stored in the output (thanks to George Baah).
\item \textbf{2.4-11} (June 25, 2009): More flexible inputs in
plotting.
\item \textbf{2.4-10} (February 2, 2009): Minor documentation fixes
\item \textbf{2.4-8,2.4-9} (January 29, 2009): Minor documentation fixes
\item \textbf{2.4-7} (August 4, 2008): Fixed minor bug in subclassification
(thanks to Ben Domingue)
\item \textbf{2.4-6} (July 21, 2008): Improved summary object for
exact matching (thanks to Andrew Stokes)
\item \textbf{2.4-5} (July 20, 2008): Fixed a minor bug.
\item \textbf{2.4-4} (July 18, 2008): Fixed another bug with regard to the discard option (thanks to Ben Dominique).
\item \textbf{2.4-3} (July 18, 2008): Fixed a bug in full matching
regarding the discard option (thanks to Ben Dominique). Some updates
of documentation regarding coarsened exact matching (2.4-1 and
2.4-2).
\item \textbf{2.4} (June 12, 2008): Included coarsened exact matching;
documentation bug fixes (thanks to Will Lowe)
\item \textbf{2.3-1} (October 11, 2007): Stable release for R
2.6. Documentation improved. Some minor bug fixes and improvements.
\item \textbf{2.2-14} (September 2, 2007): Stable release for R 2.5.
Documentation improved for full matching (thanks to Langche Zeng).
\item \textbf{2.2-13} (April 10, 2007): Stable release for R
2.4. Additional fix to package dependencies. Bug fix for summary().
\item \textbf{2.2-12} (April 6, 2007): Stable release for R 2.4. Fix
to package dependencies.
\item \textbf{2.2-11} (July 13, 2006): Stable release for R 2.3.
Fix to ensure the summary() command works with character variables in a data frame (thanks to Daniel Gerlanc).
\item \textbf{2.2-10} (May 9, 2006): Stable release for R 2.3.
A bug fix in {\tt demo(analysis)} (thanks to Julia Gray).
\item \textbf{2.2-9} (May 3, 2006): Stable release for R 2.3.
A minor change to DESCRIPTION file.
\item \textbf{2.2-8} (May 1, 2006): Stable release for R 2.3.
Removed dependency on Zelig (thanks to Dave Kane).
\item \textbf{2.2-7} (April 11, 2006): Stable release for R 2.2.
Error message for missing values in the data frame added
(thanks to Olivia Lau).
\item \textbf{2.2-6} (April 4, 2006): Stable release for R 2.2.
Bug fixes related to {\tt reestimate} in {\tt matchit()} and {\tt
match.data()} (thanks to Ani Ruhil and Claire Aussems).
\item \textbf{2.2-5} (December 7, 2005): Stable release for R 2.2.
Changed URL of {\tt WhatIf} to CRAN.
\item \textbf{2.2-4} (December 3, 2005): Stable release for R 2.2.
User's own distance measure can be used with \MatchIt\, (thanks to
Nelson Lim).
\item \textbf{2.2-3} (November 18, 2005): Stable release for R 2.2.
The {\tt standardize} option added to full matching and
subclassification (thanks to Jeronimo Cortina).
\item \textbf{2.2-2} (November 9, 2005): Stable release for R 2.2.
{\tt optmatch} package now on CRAN. Changed URL for that package.
\item \textbf{2.2-1} (November 1, 2005): Stable release for R 2.2.
Balance measures based on the empirical CDF added as a new option,
{\tt standardize}, in {\tt summary()}.
\item \textbf{2.1-4} (October 14, 2005): Stable release for R 2.2.
Strictly empirical (no interpolation) quantile-quantile functions
and plots are now used.
\item \textbf{2.1-3} (September 27, 2005): Stable release for R 2.1.
Automated the installation of optional packages, fixed a coding
error in {\tt summary()}, and edited the documentation.
\item \textbf{2.1-2} (September 27, 2005): Stable release for R 2.1.
Minor changes to file names, the option {\tt "whichxs"} added to
{\tt plot()}, and major editing of the documentation.
\item \textbf{2.1-1} (September 16, 2005): Stable release for R
2.1. Genetic matching added.
\item \textbf{2.0-1} (August 29, 2005): Stable release for R 2.1.
Major revisions, including some syntax changes. Statistical tests are
no longer used for balance checking, which is now based on the
empirical covariate distributions (e.g., quantile-quantile plots).
\item \textbf{1.0-2} (August 10, 2005): Stable release for R
2.1. Minor bug fixes (Thanks to Bart Bonikowski).
\item \textbf{1.0-1} (January 3, 2005): Stable release for R 2.0. The
first official version of \MatchIt.
\end{itemize}
\clearpage
\bibliographystyle{asa}
\bibliography{gk,gkpubs}
\end{document}