Clusterwise Linear Regression Based Missing Value Imputation of Data Preprocessing

Napsu Karmitsa, Sona Taheri, Adil Bagirov, Pauliina Mäkinen, Clusterwise Linear Regression Based Missing Value Imputation of Data Preprocessing. TUCS Technical Reports 1193, University of Turku, 2018.

Abstract:

We introduce a new, accurate method for preprocessing incomplete data sets. We combine two well-known approaches to missing value imputation: linear regression and clustering. That is, we use clusterwise linear regression to predict suitable imputations.
A clusterwise linear regression problem consists of finding a number of linear functions each approximating a subset of the given data.
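As an illustration, one common way to write such a problem is given below. The notation is an assumption made here and is not taken from the report: the data are points $(x_i, y_i)$, $i = 1, \dots, m$, with $x_i \in \mathbb{R}^n$, and the $k$ linear functions have coefficients $a_j \in \mathbb{R}^n$ and intercepts $b_j \in \mathbb{R}$. Each point is charged the squared error of the linear function that fits it best.

```latex
\min_{(a_1, b_1), \dots, (a_k, b_k)} \;
\sum_{i=1}^{m} \; \min_{j = 1, \dots, k}
\left( a_j^{\top} x_i + b_j - y_i \right)^2
```

The inner minimum implicitly assigns each point to one of the $k$ subsets, so the clustering and the regression fits are determined together.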
The idea is to approximate missing values using only those data points that are similar to the incomplete data object, as is also done in clustering-based imputation. In addition, we use linear regression within the given cluster to obtain accurate predictions for the missing values, and we do this simultaneously with the clustering. The aim is an accurate and efficient method for preprocessing incomplete data sets.
The proposed algorithm is tested on small and large, artificial and real-world data sets and compared with other algorithms for missing data imputation. Numerical results demonstrate that the proposed algorithm produces the most accurate imputations in data sets with a clear structure and a small or moderate amount of missing values.
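The general idea in the abstract, imputing a missing value from a linear model fitted only to similar data points, can be sketched in a few lines of Python. This is a minimal illustration and not the algorithm of the report. It assumes that only one target column `y` has missing entries (marked as NaN), uses a simple alternating heuristic (fit a line per cluster, then reassign each point to its best-fitting line), and assigns incomplete rows to the cluster with the nearest feature centroid. The function names `clr_impute`, `fit_clusters`, `initial_labels`, `fit_line` and `predict` are hypothetical helpers introduced here.

```python
import numpy as np

def fit_line(X, y):
    """Least-squares fit of y ~ X @ w + b; the intercept is stored last."""
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(X, coef):
    return X @ coef[:-1] + coef[-1]

def initial_labels(X, k):
    """Deterministic start: farthest-first seeds, then nearest-seed assignment."""
    seeds = [X[0]]
    for _ in range(1, k):
        d = np.min([np.sum((X - s) ** 2, axis=1) for s in seeds], axis=0)
        seeds.append(X[np.argmax(d)])
    dist = np.stack([np.sum((X - s) ** 2, axis=1) for s in seeds], axis=1)
    return np.argmin(dist, axis=1)

def fit_clusters(X, y, labels, k):
    """Fit one linear model per cluster."""
    coefs = []
    for j in range(k):
        members = labels == j
        # Assumption: a cluster with too few points falls back to the global fit.
        enough = members.sum() > X.shape[1]
        coefs.append(fit_line(X[members], y[members]) if enough else fit_line(X, y))
    return coefs

def clr_impute(X, y, k=2, n_iter=50):
    """Fill NaN entries of y using per-cluster linear models (illustrative sketch)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    miss = np.isnan(y)
    Xc, yc = X[~miss], y[~miss]           # complete rows only

    labels = initial_labels(Xc, k)
    for _ in range(n_iter):
        coefs = fit_clusters(Xc, yc, labels, k)
        # Reassign each complete point to the model with the smallest squared residual.
        resid = np.stack([(predict(Xc, c) - yc) ** 2 for c in coefs], axis=1)
        new_labels = np.argmin(resid, axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    coefs = fit_clusters(Xc, yc, labels, k)

    # Incomplete rows have no residual, so use the nearest feature centroid instead.
    centroids = np.stack([
        Xc[labels == j].mean(axis=0) if np.any(labels == j) else Xc.mean(axis=0)
        for j in range(k)
    ])
    y_filled = y.copy()
    if miss.any():
        d = ((X[miss][:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        nearest = np.argmin(d, axis=1)
        preds = np.stack([predict(X[miss], c) for c in coefs], axis=1)
        y_filled[miss] = preds[np.arange(miss.sum()), nearest]
    return y_filled
```

Calling `clr_impute(X, y, k=2)` with a 2-D feature array `X` and a target vector `y` containing NaN returns a copy of `y` with the NaN entries replaced by predictions. Like other alternating heuristics, the sketch can stop at a local solution, and its result depends on the starting assignment.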

Files:

Full publication in PDF-format

BibTeX entry:

@TECHREPORT{tKaTaBaMx18a,
  title = {Clusterwise Linear Regression Based Missing Value Imputation of Data Preprocessing},
  author = {Karmitsa, Napsu and Taheri, Sona and Bagirov, Adil and Mäkinen, Pauliina},
  number = {1193},
  series = {TUCS Technical Reports},
  publisher = {University of Turku},
  year = {2018},
}

Belongs to TUCS Research Unit(s): Turku Optimization Group (TOpGroup)
