Abstract
The application of machine learning (ML) techniques in real estate practice has become popular recently. Specifically, ML techniques are often used to develop automated valuation models (AVMs), whose purpose is to provide a price estimate of a particular property at a specified time. The main objective is to minimise human intervention in the process of price estimation: once the user has provided a set of inputs, which in the present context is a set of property features, the AVM produces a price estimate without any further human involvement. Recent literature shows that such prediction processes perform much better than the traditional approach. However, the presence of missing values in the data remains a major challenge in developing AVMs using ML techniques. Therefore, this paper examines different approaches to handling missing values in the context of developing AVMs using various tree-based algorithms. This paper has two main objectives: (i) it examines the performance of the Gradient Boosting Machine (GBM) in the presence of missing values. Recent literature suggests that GBM can still provide accurate predictions in the presence of missing values, and this paper examines this claim in the context of AVMs; (ii) using GBM, it compares some common strategies for managing missing values as well as a special strategy that is only applicable to machine learning methods. This helps to identify any extra benefits of using machine learning techniques. It is worth noting that data can be missing in both the training and testing stages. Ideally, a model can be trained with missing values in the training set and can also make predictions when some of the inputs (features) are missing in the test set. The results show that the proposed GBMs predict more accurately than the traditional hedonic model across different forecast criteria. The results are also unaffected by the choice of missing-value strategy. This is consistent with the results of recent studies. In addition, the proposed implementation requires minimal human intervention in both the training and testing stages, even in the presence of missing values. Thus, machine learning methods appear to be more efficient, in addition to being more accurate, in predicting house prices.
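The abstract does not say which ML-specific missing-value strategy is used. As an illustration only, the sketch below assumes it resembles the learned default direction that some tree-boosting libraries (for example XGBoost) apply at each split: rows with a missing feature are sent left or right, whichever gives the lower training error. The `Stump`, `fit_stump` and `predict` names and the floor-area data are hypothetical and are not taken from the paper.

```python
from typing import NamedTuple, Optional, Sequence


class Stump(NamedTuple):
    """A single regression split with a learned direction for missing inputs."""
    threshold: float
    missing_goes_left: bool
    left_value: float
    right_value: float


def _sse(values: Sequence[float]) -> float:
    """Sum of squared errors around the mean (0.0 for an empty sequence)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def fit_stump(x: Sequence[Optional[float]], y: Sequence[float]) -> Stump:
    """Fit one split on a feature in which None marks a missing value."""
    observed = sorted({v for v in x if v is not None})
    if len(observed) < 2:
        raise ValueError("need at least two distinct observed feature values")
    # Candidate thresholds are midpoints between consecutive observed values.
    thresholds = [(a + b) / 2 for a, b in zip(observed, observed[1:])]
    best_loss = None
    best_stump = None
    for t in thresholds:
        # Try both default directions for the rows with a missing feature.
        for missing_left in (True, False):
            left, right = [], []
            for xi, yi in zip(x, y):
                go_left = missing_left if xi is None else xi <= t
                (left if go_left else right).append(yi)
            loss = _sse(left) + _sse(right)
            if best_loss is None or loss < best_loss:
                best_loss = loss
                best_stump = Stump(t, missing_left,
                                   sum(left) / len(left),
                                   sum(right) / len(right))
    return best_stump


def predict(stump: Stump, xi: Optional[float]) -> float:
    """Predict for one input; a missing input follows the learned direction."""
    go_left = stump.missing_goes_left if xi is None else xi <= stump.threshold
    return stump.left_value if go_left else stump.right_value


# Hypothetical toy data: floor area (some missing) and sale price.
area = [50.0, 60.0, None, 120.0, 130.0, None]
price = [200.0, 220.0, 210.0, 500.0, 520.0, 215.0]
stump = fit_stump(area, price)
print(stump)
print(predict(stump, None))
```

In this toy data the rows with a missing area have prices close to those of the small properties, so the fitted split routes missing values to the low-price side. No imputation step is needed at either training or prediction time, which is the kind of reduced human intervention the abstract describes.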
Original language | English |
---|---|
Title of host publication | Proceedings of the 25th International Congress on Modelling and Simulation, MODSIM 2023 |
Editors | Jai Vaze, Chris Chilcott, Lindsay Hutley, Susan M. Cuddy |
Publisher | Modelling and Simulation Society of Australia and New Zealand Inc. (MSSANZ) |
Pages | 123-129 |
Number of pages | 7 |
ISBN (Electronic) | 9780987214300 |
DOIs | |
Publication status | Published - 2023 |
Event | 25th International Congress on Modelling and Simulation, MODSIM 2023 - Darwin, Australia. Duration: 9 Jul 2023 → 14 Jul 2023 |
Conference
Conference | 25th International Congress on Modelling and Simulation, MODSIM 2023 |
---|---|
Country/Territory | Australia |
City | Darwin |
Period | 9/07/23 → 14/07/23 |
Bibliographical note
Funding Information: The authors would like to thank the School of Accounting, Economics and Finance, Curtin University and the Business School, University of Aberdeen for their financial support.
Keywords
- Automated Valuation Model (AVM)
- Gradient Boosting Machine (GBM)
- Machine learning
- Missing values