Evaluating Model-assisted Estimators: A Comparative Study in High-dimensional Survey Data

Rakesh Chhalotre

ICAR – Indian Agricultural Statistics Research Institute, New Delhi – 110012, India.

B. Samuel Naik

Banaras Hindu University (BHU), Varanasi, Uttar Pradesh, 221 005, India.

V C Karthik

ICAR – Indian Agricultural Statistics Research Institute, New Delhi – 110012, India.

Manoj Varma *

ICAR – Indian Agricultural Statistics Research Institute, New Delhi – 110012, India.

Akarsh Singh

ICAR – Indian Agricultural Statistics Research Institute, New Delhi – 110012, India.

Balan C

ICAR – Indian Agricultural Statistics Research Institute, New Delhi – 110012, India.

Ashish Gupta

ICAR – Indian Agricultural Statistics Research Institute, New Delhi – 110012, India.

*Author to whom correspondence should be addressed.


Abstract

Model-assisted estimators have gained significant attention because they make efficient use of auxiliary information in estimation. They rely on a working model that links the survey variable to the auxiliary variables; the model is fitted to the sample data, and its predictions are then incorporated into the estimator. This study explores several model-assisted estimators: Generalized Regression (GREG), Ridge regression, Lasso regression, CART (Classification and Regression Tree), Random Forest, Cubist and Principal Components Regression (PCR). The analysis used 2,000 samples of size 50 (n/N ≈ 10%) and a stepwise variable selection method to identify the most significant auxiliary variables, which were added to the model incrementally. Performance was assessed using relative bias (RB), relative root mean square error (RRMSE) and relative efficiency (RE). Our findings show that tree-based estimators such as CART and Random Forest and penalized regression estimators such as Ridge and Lasso remain robust as the number of auxiliary variables increases. Among all the estimators, Random Forest consistently yielded the lowest RRMSE, particularly with five auxiliary variables, demonstrating superior efficiency. Conversely, the GREG estimator performed poorly as the number of auxiliary variables increased. This study underscores the importance of choosing a model-assisted estimation procedure suited to the characteristics of the data and to the relationship between the survey and auxiliary variables in this high-dimensional dataset.
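The workflow summarized in the abstract (fit a working model on the sample, predict for the whole population, correct with design-weighted residuals, then score the estimator over repeated samples) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the study: the synthetic population of N = 500 units, simple random sampling without replacement, the linear working model, the function names, and the definition of RE as the ratio of the Horvitz-Thompson MSE to the estimator's MSE.

```python
import numpy as np

def model_assisted_total(y_s, X_s, X_U, pi):
    """GREG-type estimate of a population total with a linear working model.

    y_s, X_s: survey variable and auxiliaries for the sampled units
    X_U:      auxiliaries for every population unit
    pi:       inclusion probabilities of the sampled units
    """
    A_s = np.column_stack([np.ones(len(X_s)), X_s])  # add intercept
    A_U = np.column_stack([np.ones(len(X_U)), X_U])
    w = 1.0 / pi                                     # design weights
    sw = np.sqrt(w)
    # Design-weighted least squares fit of the working model on the sample
    beta, *_ = np.linalg.lstsq(A_s * sw[:, None], y_s * sw, rcond=None)
    # Sum of population predictions plus weighted sample residuals
    return (A_U @ beta).sum() + np.sum(w * (y_s - A_s @ beta))

def relative_bias(est, true):
    """Relative bias in percent over Monte Carlo replicates."""
    return 100.0 * (est.mean() - true) / true

def rrmse(est, true):
    """Relative root mean square error in percent."""
    return 100.0 * np.sqrt(np.mean((est - true) ** 2)) / true

def relative_efficiency(est, ref, true):
    """Assumed definition: MSE of the reference estimator over MSE of est."""
    return np.mean((ref - true) ** 2) / np.mean((est - true) ** 2)

# Hypothetical synthetic population (not the survey data used in the study)
rng = np.random.default_rng(0)
N, n, R = 500, 50, 2000            # n/N = 10%, 2,000 replicate samples
X = rng.normal(size=(N, 3))
y = 10 + X @ np.array([2.0, -1.0, 0.5]) + rng.normal(size=N)
t_true = y.sum()
pi = np.full(n, n / N)             # equal inclusion probabilities under SRSWOR

greg = np.empty(R)
ht = np.empty(R)
for r in range(R):
    s = rng.choice(N, size=n, replace=False)
    greg[r] = model_assisted_total(y[s], X[s], X, pi)
    ht[r] = np.sum(y[s] / pi)      # Horvitz-Thompson estimator, no model

print("RB (%):   ", relative_bias(greg, t_true))
print("RRMSE (%):", rrmse(greg, t_true))
print("RE vs HT: ", relative_efficiency(greg, ht, t_true))
```

Replacing the least squares fit with a penalized or tree-based learner changes only how the predictions are produced; the prediction-plus-weighted-residual form of the estimator stays the same.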

Keywords: Design consistency, GREG, CART, random forest, cubist, PCR, RB, RRMSE, RE


How to Cite

Chhalotre, Rakesh, B. Samuel Naik, V C Karthik, Manoj Varma, Akarsh Singh, Balan C, and Ashish Gupta. 2024. “Evaluating Model-Assisted Estimators: A Comparative Study in High-Dimensional Survey Data”. Journal of Scientific Research and Reports 30 (9):707-18. https://doi.org/10.9734/jsrr/2024/v30i92398.
