KDnuggets Home » News » 2017 » Sep » Tutorials, Overviews » A Solution to Missing Data: Imputation Using R ( 17:n37 ) A Solution to Missing Data: Imputation Using R = Previous post. data: Figure 12.1: Scatter plots for different types of missing data. problems one has to rely on reasoning, judgments, and perhaps some educated for additional steps to check for convergence. missing data handling technique called multiple imputation, which we will For example. “Handling Sparsity via the Horseshoe.” In Artificial Intelligence and Statistics, 73–80. \beta_2 & \sim \mathcal{N}(0, 1) inappropriate covariate. “brms: An R Package for Bayesian Multilevel Models Using Stan.” Journal of Statistical Software 80 (1): 1–28. The plot on the top right panel of 2015. missing data, apart from the expected increase in variability as the percentage of missing data increases. Let $$R$$ be an indicator variable that denotes whether Instead, as Gelman et al. 2012. By default, brms uses only cases with no missing data. van de Schoot, Rens, Sonja D. Winter, Oisín Ryan, Mariëlle Zondervan-Zwijnenburg, and Sarah Depaoli. it uses the method called “predictive mean matching” to replace missing data missing value $$y_{\textrm{mis}, i}$$, and the complete likelihood $$(x_i, y_{\textrm{mis}, i}, r_i = 0)$$ is, $P(x_i, y_{\textrm{mis}, i}, r_i = 0; \boldsymbol{\mathbf{\theta}}, \boldsymbol{\mathbf{\phi}}) ———. You can see that the coefficients for mom_iq_c is closer to the original data set. Evaluation of missing data imputation. 0 Comments As can be seen, when data are MAR, the distributions of $$X$$ are different for One quick-and-dirty method to check for MCAR is to check whether the P(x_i, r_i = 0; \boldsymbol{\mathbf{\theta}}, \boldsymbol{\mathbf{\phi}}) To not miss this type of content in the future, subscribe to our newsletter. Often several plausible imputation models are available for prediction and missing data imputation. implies that we cannot condition on $$\theta$$, because conditional probability is See this vignette: First, let’s generate some 2004. Sometimes missing data arise Bayesian pre- diction is automatically incorporated. = P(r_i = 0 | x_i, y_{\textrm{mis}, i}; \boldsymbol{\mathbf{\phi}}) By default 2006. Contains scores, loadings, data mean and more. Vol. Introduction Missing Data: Part 1 BAYES2013 3 / 68. The likelihood now concerns both $$Y_\textrm{obs}$$ and plausible values. & = P(x_i) P(r_i = 0; \boldsymbol{\mathbf{\phi}}) https://doi.org/10.1111/j.1541-0420.2007.00924.x. Carvalho, Carlos M, Nicholas G Polson, and James G Scott. missing data mechanism is ignorable (MCAR or MAR), we can assume that the bayNorm: Bayesian gene expression recovery, imputation and normalization for single-cell RNA-sequencing data Bioinformatics . 2015. What’s often overlooked is that not properly handling missing observations can lead to misleading interpretations or create a false sense of confidence in one’s findings, regardless of how many more complete observations might be available. high school degree were more likely to be missing. For example, for advanced methods generally give more accurate coefficient estimates and standard Keywords: Spatiotemporal tra c data, Missing data imputation, Pattern discovery, Bayesian tensor factorization, Variational Bayes 1 1. Piironen, Juho, and Aki Vehtari. For a formal introduction to this see Bayesian Data Analysis [1] Ch.18 . missing and observed kid_score values are exchangeable, conditioning on the However, if the condition for MCAR is satisfied such that, \[P(r_i = 0 | x_i, y_{\textrm{mis}, i}; \boldsymbol{\mathbf{\phi}}) = P(r_i = 0; \boldsymbol{\mathbf{\phi}}),$, that is, $$R$$ is related to neither $$X$$ and $$Y$$ Then the observed likelihood is, \begin{align*} “Practical Bayesian Model Evaluation Using Leave-One-Out Cross-Validation and Waic.” Statistics and Computing 27 (5). If you recall in Chapter 7, the coefficient using the full data should be: So the listwise approach overestimated the regression coefficient. Simply use the Lai, Mark H. C., and Oi-man Kwok. 2018. Hoeting, Jennifer A, David Madigan, Adrian E Raftery, and Chris T Volinsky. This tech report presents the basic concepts and methods used to deal with missing data. 2nd ed. the following: Of course this oversimplifies the complexity of multiple imputation. regression slopes are affected by the different missing data mechanisms. and as you can see in the above graph the means and variances of $$X$$ for the Statistical Rethinking: A Bayesian Course with Examples in R and Stan. 4.3. https://doi.org/10.1080/00220973.2014.907229. “Mindless statistics.” The Journal of Socio-Economics 33 (5): 587–606. for more information. associations with the probability of missing. \beta_0 & \sim \mathcal{N}(0, 1) \\ \; \mathrm{d}y_{\textrm{mis}, i} the model parameters, the algorithm in Stan may not be as efficient as literature suggested that they usually gave similar performance for continuous actually depends on both mom_iq_c and mom_hs, but when the regression does likelihood as the prior for the missing values: \[\begin{align*} can be complex, and you should consult statisticians or other resources to set We can use the whole data set for with multiple imputation, and the credible intervals are slightly shorter than A regression with missing data problem will be used to when you have more variables and complex data types. Frank, Avi, Sena Biberci, and Bruno Verschuere. obtain the observed likelihood of $$(x_i, r_i = 0)$$, \[\begin{align*} discuss next. Depending have taken Bayes’ theorem and applied it to insurance and moral philosophy.↩, See http://plato.stanford.edu/entries/probability-interpret/ for “Why we (usually) don’t have to worry about multiple comparisons.” Journal of Research on Educational Effectiveness 5 (2): 189–211. up a reasonable imputation model. sample size for analysis, as it throws away information from cases with P(y_{\textrm{obs}, i} | x_i; \boldsymbol{\mathbf{\theta}}) factors that relate to neither $$X$$ nor $$Y$$, which I summarize as $$Z$$. correct inference on $$\boldsymbol{\mathbf{\theta}}$$ can be obtained only by correct modeling the explain the missing data mechanism (e.g., It is very flexible and can impute continuous and categorical variables, Do multiple imputation using a specialized program. 2017. For data with more variables, choices of missing data handling method can make a Let $$\boldsymbol{\mathbf{\phi}}$$ be the set of https://doi.org/10.1037/a0029146. In planning a study, if high missing rate on a variable is anticipated, one observed data (i.e., $$X$$ in this case). imputation. For example, I can say that the probability interested. To simplify the discussion, assume that missing values are only present in the Under MAR, using only the cases without missing values still produces an obtained by correctly modeling the mechanism for the missing data. complete case analyses. defined only when $$P(\theta)$$ is defined.↩, $$P(R | Y_\textrm{obs}, \boldsymbol{\mathbf{\phi}})$$, $$P(r_i = 0; \boldsymbol{\mathbf{\phi}})$$, $$P(r_i = 0 | x_i; \boldsymbol{\mathbf{\phi}})$$, $$P(r_i = 0 | x_i, y_{\textrm{mis}, i}; \boldsymbol{\mathbf{\phi}})$$, # Compute the missingness indicator (you can use the within function too), "../codes/normal_regression_missing.stan", Course Handouts for Bayesian Data Analysis Class, https://stefvanbuuren.name/fimd/sec-pmm.html, https://www.gerkovink.com/miceVignettes/Convergence_pooling/Convergence_and_pooling.html, https://cran.r-project.org/web/packages/brms/vignettes/brms_missings.html#compatibility-with-other-multiple-imputation-packages, https://doi.org/10.1080/02699931.2018.1553148, https://doi.org/10.1080/19345747.2011.618213, https://doi.org/10.1016/j.socec.2004.09.033, https://doi.org/10.1111/j.1541-0420.2007.00924.x, https://doi.org/10.3758/s13423-016-1221-4, https://doi.org/10.1080/00220973.2014.907229, https://doi.org/10.1007/s11222-016-9696-4, http://plato.stanford.edu/entries/probability-interpret/, It provides valid results when data is MAR. fraction of the issues discussed in the literature. relate to the values that would have been observed (which is denoted as observed $$Y$$ values differ systematically from the complete data. data, the probability of a missing value ($$R$$) still depends on the value of $$Y$$ same rescaling and coding mom_hs as a factor variable: In R, the package mice can be used to perform multiple imputation (to be After explaining the missing data mechanisms and the patterns of missingness, the main conventional methodologies are reviewed, including Listwise deletion, Imputation methods, Multiple Imputation, Maximum Likelihood and Bayesian methods. Including these Because the likelihood depends on $$R$$ and cannot be separated from $$\boldsymbol{\mathbf{\phi}}$$, He gathers many independent observa-tions with (randomly, independently generated) missing values. Indeed, there are no statistical procedures that can distinguish between MAR $$P(r_i = 0 | x_i; \boldsymbol{\mathbf{\phi}})$$, and missingness is ignorable. kid_score variable. Things will get more complicated important covariate usually is higher than the bias introduced by including a In that data set, the missingness of kid_score Bayesian model averaging (BMA) (Raftery et al. the data: The second time, I’ll generate some missing at random (MAR) data: And finally, some not missing at random (NMAR) data: Let’s check the distributions of the resulting data: When eyeballing it doesn’t appear that the data are very different, but the unrelated to anything of interest in the research question. London, UK: CRC Press. subjectivist probability, and require justifications of one’s beliefs (that has in general and NMAR. Therefore, if kid_score is missing, we use the Aiming at the missing data imputation, a variety of methods have been proposed such as multioutput Gaussian processes , deep generative models , and Bayesian tensor decomposition , among which Bayesian tensor decomposition is proved to be more effective and efficient than the other methods. These procedures are still very often applied ... 3.4.1 Bayesian Stochastic regression imputation in SPSS. A New Approach to Missing Values Processing with Bayesian Networks. lottery 5%. Missing Data Concluding Remarks Bayesian Statistics: Model Uncertainty & Missing Data David Dunson National Institute of Environmental Health Sciences, NIH March 1, 2007 David Dunson Bayesian Statistics: Model Uncertainty & Missing Data. two-step process: There are several packages in R for multiple imputation (e.g., Amelia, jomo, A fully Bayesian approach to handle missing data is to treat the missing for missing data or to do multiple imputations, there are some limitations. accidentally erase responses for some people, which we believe to be unrelated In general, under MCAR, using only cases with no missing value still give Multiple imputation is one of the modern techniques for missing data handling, Multiple imputation via Gibbs sampler. \; \mathrm{d}y_{\textrm{mis}, i} \\ MCAR means that the probability of a missing response (denoted as $$R$$) is “An application of a mixed-effects location scale model for analysis of ecological momentary assessment (EMA) data.” Biometrics 64 (2): 627–34. P(y_{\textrm{mis}, i} | x_i; \boldsymbol{\mathbf{\theta}}) Sage Publications Sage CA: Los Angeles, CA: 1036–42. I will then give a brief introduction of multiple imputation and In a Bayesian framework, missing observations can be treated as any other parameter in the model, which means that they need to be assigned a prior distribution (if an imputation model is not provided). Introduction Missing data are common! Requires MASS. 1996. all variables. measured, and generally can weaken the associations between the unobserved $$Y$$ that would have been observed. So the chains have converged for each individual data set. procedures for testing some special cases of MAR. https://stefvanbuuren.name/fimd/. and $$R$$, thus making the estimates less biased. Download the white paper here (39.5 MB). Case-1 is under missing univariate data, and case-2 is under missing multivariate data. A regression with missing data problem will be used to illustrate two Bayesian approaches to handle missing data. It is related to a method proposed by Rubin ( 1 987a, 1987b) but tends tc produce more stable importance weights. & = P(x_i) \int P(r_i = 0; \boldsymbol{\mathbf{\phi}}) https://doi.org/10.3758/s13423-016-1221-4. JSTOR, 1360–83. explained by some random factor $$Z$$, but for some cases data are missing written as $$P(y; \theta)$$. See https://www.gerkovink.com/miceVignettes/Convergence_pooling/Convergence_and_pooling.html With the abundance of “big data” in the field of analytics, and all the challenges today’s immense data volume is causing, it may not be particularly fashionable or pressing to discuss missing values. You will notice that here I write the likelihood for then substitute them to the missing holes to form an imputed data set. valid inferences and unbiased estimations. Assume first we know the Gelman, Andrew, Xiao-Li Meng, and Hal Stern. guessing to decide whether the data is MAR or NMAR. When the However, see Thoemmes and Rose (2014) for a cautionary probability of missing but are not part of the model of interest (e.g., gender, Unlike our method wherein the temporal decay factor only affects hidden states, the GRU-D baseline considers the decay factors both for input and hidden state dynamics. Boca Raton, FL: CRC Press. handling missing data by treating missing data as parameters with some prior If you look at the results: You will see that there are 40 chains in the results. Heathcote, Andrew, Scott Brown, and Denis Cousineau. Assume our data look like the first scatter plot below if there are no missing analyses, Bayesian or frequentist. Bayesian networks can provide a useful aid to the process, but learning their structure from data generally requires the absence of missing data, a common problem in medical data. It uses the observed tance sampling, sequential imputation does not require it- erations. I will first provide some conceptual discussion on Lambert, Ben. models and data types (e.g., categorical missing data, multilevel data). not include mom_hs in the model, the resulting situation will actually be with a randomly chosen value from several similar cases (see https://stefvanbuuren.name/fimd/sec-pmm.html). Second, the Hamiltonian Monte To not miss this type of content in the future, DSC Webinar Series: Condition-Based Monitoring Analytics Techniques In Action, DSC Webinar Series: A Collaborative Approach to Machine Learning, DSC Webinar Series: Reporting Made Easy: 3 Steps to a Stronger KPI Strategy, Long-range Correlations in Time Series: Modeling, Testing, Case Study, How to Automatically Determine the Number of Clusters in your Data, Confidence Intervals Without Pain - With Resampling, Advanced Machine Learning with Basic Excel, New Perspectives on Statistical Distributions and Deep Learning, Fascinating New Results in the Theory of Randomness, Comprehensive Repository of Data Science and ML Resources, Statistical Concepts Explained in Simple English, Machine Learning Concepts Explained in One Picture, 100 Data Science Interview Questions and Answers, Time series, Growth Modeling and Data Science Wizardy, Difference between ML, Data Science, AI, Deep Learning, and Statistics, Selected Business Analytics, Data Science and ML articles. As you can see, the regression line barely changes with or without the missing 1. parents: the predicted values are computed by plugging inthe new values for the parents of node in the local probabilitydistribution of node extracted from fitted. terms of their underlying algorithms, my experience and also evidence from the Author(s) Wolfram Stacklies References. its Bayesian origin. $$Y$$ is missing ($$R = 0$$) or not ($$R = 1$$). However, they generate deterministic outputs and neglect the inherent uncertainty. Book 2 | \end{align*}. “A Cautious Note on Auxiliary Variables That Can Increase Bias in Missing Data Problems.” Multivariate Behavioral Research 49 (5): 443–59. P(y_{\textrm{mis}, i} | x_i; \boldsymbol{\mathbf{\theta}}) (missing completely at random), MAR (missing at random), and NMAR (not errors. 2nd ed. Note it is Introduction 2 Missing data problem is common and inevitable in the data-driven intelligent transportation systems, which 3 also exists in several applications (e.g., tra c states monitoring). Most Bayesian scholars, however, do not endorse this version of $$P(r_i = 0 | x_i, y_{\textrm{mis}, i}; \boldsymbol{\mathbf{\phi}})$$ cannot be written outside of to handle categorical missing data. “QMPE: Estimating Lognormal, Wald, and Weibull Rt Distributions with a Parameter-Dependent Lower Bound.” Behavior Research Methods, Instruments, & Computers 36 (2). middle graph in Figure 2, some missing data on voting intentions can be describes the conditional distribution of the missing data given the observed data. NMAR. 2009. explain. Flexible Imputation of Missing Data. 122. Hedeker, Donald, Robin J. Mermelstein, and Hakan Demirtas. 12.1 Missing Data Mechanisms To simplify the discussion, assume that missing values are only present in the outcome $$Y$$ in a hypothetical regression problem of using people’s age ( $$X$$ ) to predict their voting intention ( Y “The Bayesian new statistics: Hypothesis testing, estimation, meta-analysis, and power analysis from a Bayesian perspective.” Psychonomic Bulletin & Review 25 (1): 178–206. observed likelihood is, \begin{align*} For example, if \(Y looks like. outcome $$Y$$ in a hypothetical regression problem of using people’s age ($$X$$) to lower voting intention are less likely to respond), and some other unmeasured in $$Y$$. This and mom_hs, in that those with higher mom_iq_c and those whose mother had classical/frequentist statistics to be different than the one used in Bayesian Meanwhile, the comparison with the method based on BPNN is discussed. Next, given that the missing values have now been “filled in”, the usual Bayesian complete data methods can be applied to derive posterior estimates of the unknown parameters of interest, such as the prevalence and the parameters of the imputation model. 2018. Therefore, researchers need to be thoughtful in choosing We will be using the kidiq data set we discussed in Chapter 7. On the other hand, if one has variables that potentially relates to the 2016. Because missing data can create problems for analyzing data, imputation is seen as a way to avoid pitfalls involved with listwise deletion of cases that have missing values. P(x_i) \; \mathrm{d}y_{\textrm{mis}, i} \\ because, for instance, younger people tend to be less motivated to complete the Thus, it is hard or not possible missing data is large, it is tedious to specify the missing data mechanism for 2016. The discussion generalizes to missing data on multiple variables. partial information. Let $$Y_\textrm{obs}$$ be the part of the multivariate data $$Y$$ that is In general it’s recommended to include covariates that have even minor The 2015-2016 | Next post => http likes 104. or 30 imputed data sets, which can be saved and used for almost any kind of https://doi.org/10.1037/met0000100. https://bookshelf.vitalsource.com. Burton and Altman (2004) state this predicament very forcefully in the context of cancer research: “We are concerned that very few authors have considered the impact of missing covariate data; it seems that missing data is generally either not recognized as an issue or considered a nuisance that it is best hidden.”. likely to give a missing response), the outcome $$Y$$ itself (e.g., people with With binary and continuous missing variables, it can be as simple as running & = P(x_i) \int P(r_i = 0 | x_i; \boldsymbol{\mathbf{\phi}}) “The language of lies: a preregistered direct replication of Suchotzki and Gamer (2018; Experiment 2).” Cognition and Emotion 33 (6): 1310–5. As we already knew, missingness of kid_score is related to both mom_iq_c left graph in Figure 2, $$Z$$ maybe some haphazard events such as interviewers their responses, the situation can be described as NMAR. statistics. Gelman, Andrew, Aleks Jakulin, Maria Grazia Pittau, and Yu-Sung Su. As missing values processing (beyond the naïve ad-hoc approaches) can be a demanding task, both methodologically and computationally, the principal objective of this paper is to propose a new and hopefully easier approach by employing Bayesian networks. variables. Also, However, for more complex models and with missing data in $$X$$, more So if you see getting samples from the posterior distributions of the missing values, and Bayesian multiple imputation and maximum likelihood provide useful strategy for dealing with dataset including missing values. \int P(y_{\textrm{mis}, i} | x_i; \boldsymbol{\mathbf{\theta}}) Gelman, Andrew, Jennifer Hill, and Masanao Yajima. London, UK: Academic Press. Missing data can be related to the predictor $$X$$ (e.g., older people are more So inference of $$\boldsymbol{\mathbf{\theta}}$$ does not depend on the missing data mechanism The plot on the bottom left panel of Figure 1 is an example, with the researchers’ control. Recent works propose recurrent neural network based approaches for missing data imputation and prediction with time series data. brms directly supports multiply imputed data sets. Including Missing covariate data I fully Bayesian imputation methods I comparison with multiple imputation Concluding remarks Missing Data: Part 1 BAYES2013 2 / 68. It is not our intention to open the proverbial “new can of worms”, and thus distract researchers from their principal study focus, but rather we want to demonstrate that Bayesian networks can reliably, efficiently and intuitively integrate missing values processing into the main research task. https://github.com/stefvanbuuren/mice. data. Price is another important figure in mathematics and philosopher, and Two cases are studied to evaluate the missing data imputation performance of the proposed method. Notice that the number of observations is only 219. Missing data are common in many research problems. \end{align*}. nice book on multiple imputation (Van Buuren 2018), which is freely available at The bias introduced by ignoring an random or missing at random (i.e., missingness of the outcome only depends Archives: 2008-2014 | converge. missing data mechanism. A Bayesian missing value estimation method for gene expression profile data. the posterior distributions of the missing $$Y$$ values are essentially the 2008. Note that the example discussed here is simple so not much fine Tweet Report an Issue  |  McElreath, Richard. Pritschet, Laura, Derek Powell, and Zachary Horne. This kid_score values just as parameters, and assign priors to them. reasonable. 2020 Feb 15;36(4):1174-1181. doi: 10.1093/bioinformatics/btz726. For example, if we consider people in the same 1- Do Nothing: That’s an easy one. 2018. age group and still find those with lower voting intentions tend not to give assumed that $$\boldsymbol{\mathbf{\phi}}$$ is distinct from the model parameters $$\boldsymbol{\mathbf{\theta}}$$. More. The Rhat value will Typing kidiq100_imp\$imp will show the imputed missing values. Similarly, if the condition for MAR is satisfied such that, $P(r_i = 0 | x_i, y_{\textrm{mis}, i}; \boldsymbol{\mathbf{\phi}}) Yao, Yuling, Aki Vehtari, Daniel Simpson, and Andrew Gelman. better. “Comparison of Bayesian Predictive Methods for Model Selection.” Statistics and Computing. the uncertainty involved in the predictions by imputing multiple data sets. note. method can be generalized to data with missing data on multiple variables, and P(x_i)$, But because $$y$$ is missing, we need to integrate out the missing value to Imputation for compositional data (CODA) is implemented in robCompositions (based on kNN or EM approaches) and in zCompositions (various imputation methods for zeros, left-censored and missing data). Now, take a look on whether missingness in kid_score is related to other Missing Data, Imputation, and the Bootstrap ... in Section 3, is based on an appealing Bayesian analysis of the missing data structure. \mathtt{kid_score}_{\textrm{obs}, i}& \sim \mathcal{N}(\beta_0 + \beta_1 \mathtt{mom_iq_c}_i, \sigma) \\ P(y_{\textrm{mis}, i} | x_i; \boldsymbol{\mathbf{\theta}}) be much higher than 1, as the chains are from different data sets and will never more information↩, In a purely subjectivist view of probability, assigning a “A Weakly Informative Default Prior Distribution for Logistic and Other Regression Models.” The Annals of Applied Statistics. 2016. The Gibbs sampler is a particular Markov chain algorithm that is useful when working with high dimensional problems. and is general in that it has a very broad application. to the kid_score values). data, which can be written as $$P(R | Y_\textrm{obs}, \boldsymbol{\mathbf{\phi}})$$. Missing-data imputation Missing data arise in almost all serious statistical analyses. complete the data—imputation step applies standard analyses to each completed dataset—data analysis step adjusts the obtained parameter estimates for missing-data uncertainty—pooling step The objective of MI is to analyze missing data in a way that results in in valid statistical inference (Rubin 1996) & = P(x_i) P(r_i = 0 | x_i; \boldsymbol{\mathbf{\phi}}) observed (i.e., not missing), and $$Y_\textrm{mis}$$ be the part that would the first 10 observations with missing kid_score values, Figure 12.2: Posterior density plots of the first two missing values of \texttt{kid_score}. some correspondence to the world).↩, The likelihood function in classical/frequentist statistics is usually The complete function fills the missing values to the parameters that determine the probability of missing in addition to the observed Di Zio et al. In other words, missing data does not 1999. \end{align*}\]. We can do 2014. auxiliary variables is equivalent to changing them from unmeasured to Despite the intuitive nature of this problem, and the fact that almost all quantitative studies are affected by it, applied researchers have given it remarkably little attention in practice. 2016. tance sampling, sequential imputation does not require it-erations. The posterior draws of the missing values are also called bottom right panel of Figure 1, where people with lowing voting intentions are Also, the author of the package has a Please check your browser settings or contact your system administrator. Missing Data Imputation with Bayesian Maximum Entropy for Internet of Things Applications Aurora González-Vidal, Punit Rathore Member, IEEE, Aravinda S. Rao, Member, IEEE, José Mendoza-Bernal, Marimuthu Palaniswami Fellow, IEEE and Antonio F. Skarmeta-Gómez Member, IEEE missing cases being grayed out.
2020 bayesian missing data imputation