471948 A Data-Mining Framework for Uncertainty Analysis in Pipeline Erosion Modeling

Tuesday, November 15, 2016: 1:15 PM
Union Square 22 (Hilton San Francisco Union Square)
Selen Cremaschi, Department of Chemical Engineering, Auburn University, Auburn, AL and Wei Dai, Auburn University, Auburn, AL

A data-mining framework for uncertainty analysis in pipeline erosion modeling

Wei Dai, Selen Cremaschi

Department of Chemical Engineering, Auburn University, Auburn, AL 36849



In many industrial operations, solid particles carried at high speed by the fluid cause serious wear. This type of wear is called erosion. Erosion in pipelines is defined as the removal of material from the solid surface due to solid particle impingement. This phenomenon, especially in multiphase flow systems, is very complex and depends on many factors, including fluid and solid characteristics, pipeline material properties, and the geometry of the flow lines. The safe and efficient design and operation of these pipelines require reliable estimates of erosion rates.

Given this complexity, most modeling work focuses on developing empirical or semi-mechanistic models to predict erosion rates. For example, Oka et al. (2005) developed their erosion model using particle impingement in air, with empirical constants based on particle properties and the hardness of the target materials. Their model is one of the most commonly cited in the literature. Another semi-mechanistic model, 1-D SPPS (Zhang, 2007), which is widely used by the oil and gas industry for predicting erosion rates, was developed with several empirically estimated parameters, such as the particle sharpness factor, the Brinell hardness, and the empirical constants in the impact angle function. These empirical parameters are calculated from experimental observations. However, the experimental data used in these calculations, and also for model validation and uncertainty quantification, are for the most part collected in small-diameter pipes (2 to 4 inches). These pipe sizes do not match field conditions, where pipe diameters generally exceed 8 inches. Hence, erosion model predictions are routinely extrapolated to conditions where neither experimental data nor operating experience is available, and estimating the uncertainty of erosion-rate predictions becomes crucial, especially for systems that are too costly to fail. Quantifying this uncertainty is especially important during the design phase of subsea applications, because the erosion-rate allowance, which is set using the erosion-rate predictions and their uncertainty, directly impacts the integrity of the facility.

The uncertainty in model predictions can, in general, stem from three sources: (1) uncertainty in experimental measurements of input conditions, (2) model form uncertainty (i.e., incomplete representation of the actual system due to lack of knowledge or imprecise experimental observations), and (3) model parameter uncertainty. Experimental data uncertainty usually consists of measurement errors due to both instrumental and human errors. The reliability of a model largely depends on the ability of its form to capture the erosion process in sufficient detail. The uncertainty in the model parameters results from an inability to accurately quantify the parameters of a model (Shrestha, 2009).

In this talk, a systematic framework is introduced to quantify erosion-rate prediction uncertainty for operating conditions where experimental data are not available, and for a set of newly-collected experimental data points. The framework incorporates the impacts of model form and parameter uncertainties to estimate prediction uncertainties, and combines data clustering and Gaussian Process Modeling approaches with Monte Carlo Simulation.

To estimate erosion-rate prediction uncertainty, we compiled an experimental database of erosion rate measurements from the literature. The database contains 586 data points in single-phase or multiphase carrier flows. Eighty percent of the data were collected for gas-dominated flows (i.e., gas only, annular, mist, and churn flow). The database covers a wide range of input conditions, resulting in significantly different erosion rate measurements. It encompasses data collected from six different flow regimes, with a wide range of material properties and production characteristics.

Data clustering is used to group operating conditions with similar characteristics and to identify internal data structures present within the database. Among the data clustering approaches available in the literature, k-prototype (Cheung, 2013) is selected as the most appropriate for our dataset because the dataset contains categorical variables. It calculates similarity based on both categorical and numerical attributes, and partitions the data points into clusters such that the similarities between objects in the same cluster are high while the similarities between objects in different clusters are low.
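The mixed-attribute idea can be illustrated with a minimal sketch of the assignment step. It assumes the classic k-prototypes dissimilarity (squared Euclidean distance on numerical attributes plus a weighted count of categorical mismatches), which is not necessarily the unified similarity metric of Cheung and Jia (2013). The attribute names, the data values, and the weight `gamma` are hypothetical.

```python
def mixed_dissimilarity(x_num, x_cat, proto_num, proto_cat, gamma=1.0):
    """Squared Euclidean distance on numeric parts plus gamma * categorical mismatches."""
    numeric = sum((a - b) ** 2 for a, b in zip(x_num, proto_num))
    mismatches = sum(a != b for a, b in zip(x_cat, proto_cat))
    return numeric + gamma * mismatches


def assign(points, prototypes, gamma=1.0):
    """Return, for each point, the index of its nearest prototype."""
    labels = []
    for x_num, x_cat in points:
        dists = [mixed_dissimilarity(x_num, x_cat, p_num, p_cat, gamma)
                 for p_num, p_cat in prototypes]
        labels.append(dists.index(min(dists)))
    return labels


# Hypothetical, already-scaled records:
# numeric = (velocity, particle size), categorical = (flow regime, pipe material)
points = [
    ((0.9, 0.2), ("annular", "steel")),
    ((0.8, 0.3), ("annular", "steel")),
    ((0.1, 0.7), ("slug", "aluminum")),
    ((0.2, 0.6), ("slug", "steel")),
]
prototypes = [
    ((0.85, 0.25), ("annular", "steel")),
    ((0.15, 0.65), ("slug", "aluminum")),
]
labels = assign(points, prototypes, gamma=0.5)
print(labels)  # [0, 0, 1, 1]
```

A full clustering run would alternate this assignment step with a prototype update (mean of the numerical attributes, mode of the categorical ones) until the labels stop changing.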

Gaussian Process Modeling (GPM, Rasmussen, 2006) is used to estimate the prediction uncertainty stemming from both model form and model parameter uncertainties. The GPM treats the erosion-rate model discrepancy, defined as the difference between experimental erosion rates and the corresponding erosion rate predictions, as a Gaussian random process. This process is represented by mean and covariance functions, assuming a multivariate normal distribution. The most likely values of the mean and covariance function parameters are determined by Maximum Likelihood Estimation (MLE) using experimental data. Once the GPM is trained on the available data set, the resulting hyper-parameters can be used for model interpolation or extrapolation analysis (Jiang, 2013). A GPM is built for each cluster identified in the data clustering step.
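As a rough illustration of this step, the sketch below fits a Gaussian process to a handful of model discrepancies and selects hyper-parameters by maximizing the marginal likelihood. It makes several simplifying assumptions that are not taken from the abstract: a single scalar input, a zero mean function, a squared-exponential covariance, a fixed noise level, and a coarse grid search in place of a proper optimizer. All data values are hypothetical.

```python
import math


def kernel(a, b, length, sigma_f):
    """Squared-exponential covariance between two scalar inputs."""
    return sigma_f ** 2 * math.exp(-0.5 * ((a - b) / length) ** 2)


def cholesky(A):
    """Lower-triangular L with L L^T = A, for symmetric positive-definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def solve_chol(L, b):
    """Solve (L L^T) x = b by forward, then backward substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def factorize(xs, ds, length, sigma_f, sigma_n):
    """Cholesky factor of K + sigma_n^2 I and weights alpha = (K + sigma_n^2 I)^-1 d."""
    K = [[kernel(a, b, length, sigma_f) + (sigma_n ** 2 if i == j else 0.0)
          for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    L = cholesky(K)
    return L, solve_chol(L, ds)


def neg_log_likelihood(xs, ds, length, sigma_f, sigma_n):
    """Negative log marginal likelihood of the discrepancies under the GP."""
    L, alpha = factorize(xs, ds, length, sigma_f, sigma_n)
    n = len(xs)
    return (0.5 * sum(d * a for d, a in zip(ds, alpha))
            + sum(math.log(L[i][i]) for i in range(n))
            + 0.5 * n * math.log(2.0 * math.pi))


def predict(xs, ds, x_new, length, sigma_f, sigma_n):
    """Posterior mean and variance of the discrepancy at x_new."""
    L, alpha = factorize(xs, ds, length, sigma_f, sigma_n)
    k_star = [kernel(x_new, a, length, sigma_f) for a in xs]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    var = sigma_f ** 2 - sum(k * w for k, w in zip(k_star, solve_chol(L, k_star)))
    return mean, var


# Hypothetical data: scaled input condition and discrepancy (measured - predicted)
xs = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
ds = [0.00, 0.48, 0.84, 1.00, 0.91, 0.60, 0.14]
sigma_n = 0.05  # assumed, fixed measurement-noise standard deviation

# Coarse grid search standing in for a gradient-based MLE
grid = [(l, sf) for l in (0.2, 0.5, 1.0, 2.0, 5.0) for sf in (0.5, 1.0, 2.0)]
best_length, best_sigma_f = min(
    grid, key=lambda p: neg_log_likelihood(xs, ds, p[0], p[1], sigma_n))

mean_near, var_near = predict(xs, ds, 1.25, best_length, best_sigma_f, sigma_n)
mean_far, var_far = predict(xs, ds, 100.0, best_length, best_sigma_f, sigma_n)
print(best_length, best_sigma_f, var_near, var_far)
```

The predictive variance is small between training points and reverts to the prior variance far from them, which is the behavior that makes a GP useful for flagging extrapolation beyond the experimental range.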

Finally, Monte Carlo simulation is used to study the influence of data uncertainty arising from limited repetition in the experiments. We previously developed a novel approach to estimate experimental data uncertainties in the absence of repeated experiments, and obtained kernel density estimates of the experimental uncertainty for four different measurement approaches (Dai, 2016). The impact of experimental data uncertainty on GPM predictions is assessed in a Monte Carlo framework in which the training of the GPM is repeated 1000 times with randomized initializations drawn from the kernel density. After 1000 replications, the distribution of model prediction uncertainty in each cluster is obtained. A box plot is used to show the spread of the model prediction uncertainties.
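The Monte Carlo loop can be sketched as follows, under stated assumptions: the experimental uncertainty is a relative error drawn from a Gaussian kernel density estimate, and a simple mean squared discrepancy stands in for retraining the GPM in each replication. The function names, the bandwidth, and all data values are hypothetical.

```python
import random
import statistics


def sample_kde(centers, bandwidth, rng):
    """Draw one value from a Gaussian kernel density estimate built on `centers`."""
    return rng.gauss(rng.choice(centers), bandwidth)


def mean_squared_discrepancy(measured, predicted):
    """Placeholder for the per-replication GPM training and scoring step."""
    return sum((m - p) ** 2 for m, p in zip(measured, predicted)) / len(measured)


def monte_carlo(measured, predicted, rel_errors, bandwidth, n_rep=1000, seed=0):
    """Perturb the measurements n_rep times and collect the resulting scores."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_rep):
        perturbed = [m * (1.0 + sample_kde(rel_errors, bandwidth, rng))
                     for m in measured]
        scores.append(mean_squared_discrepancy(perturbed, predicted))
    return scores


# Hypothetical erosion rates and relative measurement errors
measured = [2.1e-4, 3.4e-4, 1.2e-4, 5.0e-4]
predicted = [1.8e-4, 3.9e-4, 1.0e-4, 4.2e-4]
rel_errors = [-0.15, -0.05, 0.0, 0.04, 0.12]

scores = monte_carlo(measured, predicted, rel_errors, bandwidth=0.03)
q1, median, q3 = statistics.quantiles(scores, n=4)  # box-plot quartiles
print(q1, median, q3)
```

The three quartiles are the values a box plot would draw. In the framework described above, the scoring step would be replaced by training a GPM on the perturbed data for each cluster.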

The application of the developed framework is demonstrated on one of the well-known erosion models, 1-Dimensional Sand Production Pipe Saver (1-D SPPS, ECRC), which is used extensively in the oil and gas industry for erosion predictions. The data clustering approach divided the database into seven clusters. In our previous studies, we clustered the data based on flow regimes and built GPMs for these clusters (Dai, 2015). A comparison of clustering based on the k-prototype approach with clustering based on flow regimes is given in support of the data clustering approach. The mean square error (MSE) and area metric (AM) (Ferson, 2008) of the GPM predictions, obtained using four-fold cross-validation, are compared for both approaches. The smallest MSE and AM are 6.79×10^-9 and 4.39×10^-5 for the k-prototype approach, and 1.48×10^-8 and 6.44×10^-5 for flow regimes. The results favor clustering based on the k-prototype approach, which yields the smaller MSE and AM.
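The two comparison metrics and the fold split can be sketched as below. This is an illustrative reading, not the authors' implementation: the area metric is taken as the area between two empirical cumulative distribution functions, and the folds are contiguous index blocks. The helper names are hypothetical.

```python
from bisect import bisect_right


def mse(observed, predicted):
    """Mean square error between paired observations and predictions."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def ecdf(sorted_sample, x):
    """Empirical CDF of a sorted sample evaluated at x."""
    return bisect_right(sorted_sample, x) / len(sorted_sample)


def area_metric(pred_sample, obs_sample):
    """Area between the empirical CDFs of predictions and observations."""
    p, o = sorted(pred_sample), sorted(obs_sample)
    grid = sorted(set(p + o))
    area = 0.0
    # Both step functions are constant on each interval [left, right)
    for left, right in zip(grid, grid[1:]):
        area += abs(ecdf(p, left) - ecdf(o, left)) * (right - left)
    return area


def k_fold_indices(n, k=4):
    """Yield (train, test) index lists for k contiguous folds."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size


print(area_metric([0.0, 2.0], [1.0, 1.0]))  # 1.0
print([len(test) for _, test in k_fold_indices(10, k=4)])  # [3, 3, 2, 2]
```

In a cross-validation run, each fold's test points would be predicted by a GPM trained on the remaining folds, and the MSE and area metric would be computed from those held-out predictions.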


This work is supported by the Chevron Energy Technology Company. Discussions with and comments from Haijing Gao, Gene Kouba, and Janakiram Hariprasad of Chevron, and Brenton McLaury and Siamack Shirazi of E/CRC at the University of Tulsa, are gratefully acknowledged.


Cheung, Y.M. and Jia, H., 2013, Categorical-and-numerical-attribute data clustering based on a unified similarity metric without knowing cluster number. Pattern Recognition, 46, 2228-2238.

Dai, W. and Cremaschi, S., 2015, Quantifying Model Uncertainty in Scarce Data Regions – A Case Study of Particle Erosion in Pipelines, The 12th International Symposium on Process Systems Engineering and 25th European Symposium on Computer Aided Process Engineering, Copenhagen, Denmark.

Dai, W., Cremaschi, S., Islam, M.A., Nukala, R.T., Subramani, H.J., Kouba, G.E. and Gao, H.J., 2016, Uncertainty analysis of multiphase flow – Case studies from erosion, sand transport, liquid entrainment models, 10th North American Multiphase conference, Banff, Canada.

Ferson, S., Oberkampf, W. L., and Ginzburg, L., 2008, Model Validation and Predictive Capability for the Thermal Challenge Problem, Computer Methods in Applied Mechanics and Engineering, Vol. 197, No. 29-32, pp 2408-2430.

Jiang, Z., Chen, W., Fu, Y., and Yang, R., 2013, Reliability-Based Design Optimization with Model Bias and Data Uncertainty, SAE International.

Oka, Y. I., Okamura, K., and Yoshida, T., 2005, Practical estimation of erosion damage caused by solid particle impact: Part 1: Effects of impact parameters on a predictive equation, Wear, 259(1-6), pp. 95-101.

Norman, C. D., 2013, Correlation of porosity uncertainty to productive reservoir volume, Society of Petroleum Engineers.

Papadopoulos, C. E. and Yeung, H., 2001, Uncertainty estimation and Monte Carlo simulation method, Flow Measurement and Instrumentation, 12(4), pp. 291–298.

Rasmussen, C.E. and Williams, C.K. I., 2006, Gaussian Processes for Machine Learning, The MIT Press.

Shrestha, D.L., 2009, Uncertainty Analysis in Rainfall-Runoff Modelling: Application of Machine Learning Techniques, PhD. Dissertation, UNESCO-IHE, the Netherlands

Zhang, Y., Reuterfors, E.P., McLaury, B.S., Shirazi, S.A., and Rybicki, E.F., 2007, Comparison of Computed and Measured Particle Velocities and Erosion in Water and Air Flows, Wear, 263.

Session: Flow Assurance and Asset Integrity
Group/Topical: Upstream Engineering and Flow Assurance Forum