Large databases of legacy hydrocarbon reservoir and well data provide an opportunity to use modern data mining techniques to improve our understanding of the subsurface in the presence of uncertainty and to improve the predictability of reservoir properties. A data mining approach provides a way to screen dependencies in reservoir and fluid data and enables subsurface specialists to estimate absent properties in partial or incomplete datasets. This allows uncertainty to be managed and reduced. Machine learning improves reservoir characterisation because its methods can detect and model hidden dependencies in large multivariate datasets with noisy and missing data. This study presents a workflow applied to a large basin-scale reservoir characterisation database. The study aims to understand the dependencies between reservoir attributes so that predictions can be made to improve the data coverage. The machine learning workflow comprises the following steps: (i) exploratory data analysis; (ii) detection of outliers and data partitioning into groups showing similar trends using clustering; (iii) identification of dependencies within reservoir data in multivariate feature space with self-organising maps; and (iv) feature selection using supervised learning to identify relevant properties to use for predictions where data are absent. This workflow provides an opportunity to reduce the cost and increase the accuracy of hydrocarbon exploration and production in mature basins.
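As an illustration of step (ii) only, the sketch below partitions a handful of records into groups with a plain k-means and then flags records that lie unusually far from their group centre. It is a minimal sketch under stated assumptions, not the implementation used in the study: the porosity and log-permeability values are synthetic, the function names (`standardise`, `kmeans`, `flag_outliers`) are hypothetical, and the deterministic initialisation and the distance threshold `factor=3.0` are illustrative choices.

```python
import math
import statistics


def standardise(rows):
    """Z-score each column so that no single property dominates distances.

    Assumes every column has non-zero spread (a constant column would divide by zero).
    """
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.pstdev(c) for c in cols]
    return [tuple((v - m) / s for v, m, s in zip(r, means, sds)) for r in rows]


def assign(points, centroids):
    """Index of the nearest centroid (Euclidean distance) for each point."""
    return [min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points]


def kmeans(points, k, n_iter=100):
    """Plain k-means; initial centroids are evenly spaced picks from the sorted
    points, an assumption made here so that the result is reproducible."""
    ordered = sorted(points)
    n = len(points)
    centroids = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    for _ in range(n_iter):
        labels = assign(points, centroids)
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old centroid if a cluster happens to be empty.
            updated.append(tuple(statistics.fmean(c) for c in zip(*members))
                           if members else centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(points, centroids)


def flag_outliers(points, centroids, labels, factor=3.0):
    """Flag points lying more than `factor` times the median member distance
    from their own cluster centroid (threshold is an illustrative assumption)."""
    dists = [math.dist(p, centroids[lab]) for p, lab in zip(points, labels)]
    flags = []
    for d, lab in zip(dists, labels):
        typical = statistics.median(x for x, l in zip(dists, labels) if l == lab)
        flags.append(d > factor * typical)
    return flags


# Synthetic records: (porosity fraction, log10 permeability in mD).
records = [
    (0.08, 0.1), (0.09, 0.3), (0.10, 0.2), (0.11, 0.4), (0.12, 0.3),  # tight group
    (0.24, 2.4), (0.25, 2.6), (0.26, 2.5), (0.27, 2.7), (0.28, 2.6),  # porous group
    (0.30, 5.5),                                                      # suspect record
]
scaled = standardise(records)
centroids, labels = kmeans(scaled, k=2)
flags = flag_outliers(scaled, centroids, labels)
```

On this synthetic input the two groups separate cleanly and only the last record is flagged. In a real database the number of groups, the scaling, and the outlier threshold would all need to be chosen from the exploratory data analysis in step (i).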
Bibliographical note
Funding: This research was supported by Wood Mackenzie through funding of a Postdoctoral Research Associate position at Heriot-Watt University, and through access to data from two basins.
Acknowledgments: This work was supported by Wood Mackenzie through funding of a research collaboration with Heriot-Watt University. All the data were anonymised and supplied by Wood Mackenzie, and the authors are thankful for the opportunity to publish the outcomes of this research. The authors also thank Mikhail Kanevski of the University of Lausanne for the peer exchange on feature selection and the opportunities opened during his course on hands-on applications of machine learning. The authors acknowledge the use of Orange Data Mining and ML Office for the SOM application. We thank Susan Agar, who reviewed the paper most comprehensively and helped improve it, along with two anonymous reviewers.
Data Availability Statement
The data used in this study are held by Wood Mackenzie.
- subsurface characterisation
- big data
- unsupervised learning
- supervised learning
- multivariant analysis
- machine learning
- hydrocarbon exploration