Predicting oil field performance using machine learning programming: a comparative case study from the UK continental shelf

Ukari Osah; John Howell

doi:10.1144/petgeo2022-071

Corresponding author: Ukari Osah

Geology and Geophysics

Research output: Contribution to journal › Article › peer-review

Abstract

Predicting the performance of a subsurface oil field is a large, multivariate problem. Production is controlled and influenced by a wide array of geological and engineering parameters that overlap and interact in ways that are difficult to unravel in a predictive manner. Supervised machine learning is a statistical approach that uses empirical learnings from a training dataset to create models and make predictions about future outcomes. The goal of this study is to test a number of supervised machine learning methods on a dataset of oil fields from the United Kingdom continental shelf (UKCS), in order to assess (a) whether it is possible to predict future oil field performance and (b) which methods are the most effective. The study is based on a dataset of 60 fields with 5 controlling parameters (gross depositional environment, average permeability, net-to-gross, gas–oil ratio and total number of wells) and 2 outcome parameters (recovery factor and maximum field rate) for each. The choice of controlling parameters was based on a principal component analysis (PCA) of a larger dataset from a wider project database. Five machine learning algorithms were tested: linear regression, robust linear regression, linear kernel support vector regression, cubic kernel support vector regression and boosted trees regression. Overall, 83% of the data was used as a training dataset, while 17% was used to test the predictability of the algorithms. Results were compared using R-squared, mean square error, root mean square error and mean absolute error. Graphs of predicted responses versus true (actual) responses are also shown to give a visual illustration of model performance. The results show that certain methods perform better than others, depending on the outcome variable in question (recovery factor or maximum field rate). Support vector regression was the best method for both outcome variables: depending on the kernel function applied, it achieved a reliable level of predictability with low error rates. This demonstrates a strong potential for statistics-based prediction models of reservoir performance.
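As a rough illustration of the evaluation described in the abstract, the sketch below computes the four comparison metrics (R-squared, mean square error, root mean square error and mean absolute error) and the approximate train/test sizes implied by an 83%/17% split of 60 fields. It is not the authors' code: the function names `regression_metrics` and `split_sizes` are hypothetical, the rounding rule for the split is an assumption, and the numbers in the usage note are made up for demonstration only.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R-squared, MSE, RMSE and MAE for paired true/predicted values."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")

    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)        # residual sum of squares
    mse = ss_res / n                              # mean square error
    mae = sum(abs(r) for r in residuals) / n      # mean absolute error

    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    # R-squared is undefined when all true values are identical.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    return {"r2": r2, "mse": mse, "rmse": math.sqrt(mse), "mae": mae}


def split_sizes(n_samples, train_fraction):
    """Return (n_train, n_test), rounding the training share to a whole sample.

    Rounding to the nearest integer is an assumption made for this sketch;
    the abstract states only the percentages.
    """
    n_train = round(n_samples * train_fraction)
    return n_train, n_samples - n_train
```

Under these assumptions, `split_sizes(60, 0.83)` gives 50 training fields and 10 test fields, and `regression_metrics([1, 2, 3], [1, 2, 4])` gives an R-squared of 0.5 with MSE and MAE both equal to 1/3. With a test set this small, each metric is sensitive to individual fields, which is one reason to report several metrics side by side.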

Original language	English
Article number	petgeo2022-071
Journal	Petroleum Geoscience
Volume	29
Issue number	1
DOIs	https://doi.org/10.1144/petgeo2022-071
Publication status	Published - 2 Feb 2023

Bibliographical note

Author contributions: UO: data curation (lead), formal analysis (lead), funding acquisition (lead), methodology (equal), validation (lead), visualization (equal), writing – original draft (lead); JH: conceptualization (lead), methodology (equal), supervision (lead), visualization (equal), writing – review & editing (lead).

Funding: This work was funded by the Petroleum Technology Development Fund (PTDF/ED/OSS/PHD/OU/1188/17).

Access to Document

10.1144/petgeo2022-071 (Licence: CC BY)

Osah_etal_petgeo_Predicting_Oil_Field_VOR: final published version, 1.06 MB (Licence: CC BY, https://creativecommons.org/licenses/by/4.0/)

Cite this

@article{00cd55328291488f9bcbdaebfc3432f9,

title = "Predicting oil field performance using machine learning programming: a comparative case study from the {UK} continental shelf",

abstract = "Predicting the performance of a subsurface oil field is a large, multivariate problem. Production is controlled and influenced by a wide array of geological and engineering parameters that overlap and interact in ways that are difficult to unravel in a predictive manner. Supervised machine learning is a statistical approach that uses empirical learnings from a training dataset to create models and make predictions about future outcomes. The goal of this study is to test a number of supervised machine learning methods on a dataset of oil fields from the United Kingdom continental shelf (UKCS), in order to assess (a) whether it is possible to predict future oil field performance and (b) which methods are the most effective. The study is based on a dataset of 60 fields with 5 controlling parameters (gross depositional environment, average permeability, net-to-gross, gas–oil ratio and total number of wells) and 2 outcome parameters (recovery factor and maximum field rate) for each. The choice of controlling parameters was based on a principal component analysis (PCA) of a larger dataset from a wider project database. Five machine learning algorithms were tested: linear regression, robust linear regression, linear kernel support vector regression, cubic kernel support vector regression and boosted trees regression. Overall, 83% of the data was used as a training dataset, while 17% was used to test the predictability of the algorithms. Results were compared using R-squared, mean square error, root mean square error and mean absolute error. Graphs of predicted responses versus true (actual) responses are also shown to give a visual illustration of model performance. The results show that certain methods perform better than others, depending on the outcome variable in question (recovery factor or maximum field rate). Support vector regression was the best method for both outcome variables: depending on the kernel function applied, it achieved a reliable level of predictability with low error rates. This demonstrates a strong potential for statistics-based prediction models of reservoir performance.",

author = "Ukari Osah and John Howell",

note = "Author contributions: UO: data curation (lead), formal analysis (lead), funding acquisition (lead), methodology (equal), validation (lead), visualization (equal), writing – original draft (lead); JH: conceptualization (lead), methodology (equal), supervision (lead), visualization (equal), writing – review \& editing (lead). Funding: This work was funded by the Petroleum Technology Development Fund (PTDF/ED/OSS/PHD/OU/1188/17).",

year = "2023",

month = feb,

day = "2",

doi = "10.1144/petgeo2022-071",

language = "English",

volume = "29",

journal = "Petroleum Geoscience",

issn = "1354-0793",

publisher = "Geological Society of London",

number = "1",

}

TY - JOUR

T1 - Predicting oil field performance using machine learning programming

T2 - a comparative case study from the UK continental shelf

AU - Osah, Ukari

AU - Howell, John

N1 - Author contributions: UO: data curation (lead), formal analysis (lead), funding acquisition (lead), methodology (equal), validation (lead), visualization (equal), writing – original draft (lead); JH: conceptualization (lead), methodology (equal), supervision (lead), visualization (equal), writing – review & editing (lead). Funding: This work was funded by the Petroleum Technology Development Fund (PTDF/ED/OSS/PHD/OU/1188/17).

PY - 2023/2/2

Y1 - 2023/2/2

AB - Predicting the performance of a subsurface oil field is a large, multivariate problem. Production is controlled and influenced by a wide array of geological and engineering parameters that overlap and interact in ways that are difficult to unravel in a predictive manner. Supervised machine learning is a statistical approach that uses empirical learnings from a training dataset to create models and make predictions about future outcomes. The goal of this study is to test a number of supervised machine learning methods on a dataset of oil fields from the United Kingdom continental shelf (UKCS), in order to assess (a) whether it is possible to predict future oil field performance and (b) which methods are the most effective. The study is based on a dataset of 60 fields with 5 controlling parameters (gross depositional environment, average permeability, net-to-gross, gas–oil ratio and total number of wells) and 2 outcome parameters (recovery factor and maximum field rate) for each. The choice of controlling parameters was based on a principal component analysis (PCA) of a larger dataset from a wider project database. Five machine learning algorithms were tested: linear regression, robust linear regression, linear kernel support vector regression, cubic kernel support vector regression and boosted trees regression. Overall, 83% of the data was used as a training dataset, while 17% was used to test the predictability of the algorithms. Results were compared using R-squared, mean square error, root mean square error and mean absolute error. Graphs of predicted responses versus true (actual) responses are also shown to give a visual illustration of model performance. The results show that certain methods perform better than others, depending on the outcome variable in question (recovery factor or maximum field rate). Support vector regression was the best method for both outcome variables: depending on the kernel function applied, it achieved a reliable level of predictability with low error rates. This demonstrates a strong potential for statistics-based prediction models of reservoir performance.

UR - http://www.scopus.com/inward/record.url?scp=85149655764&partnerID=8YFLogxK

U2 - 10.1144/petgeo2022-071

DO - 10.1144/petgeo2022-071

M3 - Article

AN - SCOPUS:85149655764

SN - 1354-0793

VL - 29

JO - Petroleum Geoscience

JF - Petroleum Geoscience

IS - 1

M1 - petgeo2022-071

ER -
