444315 Outlier Detection and Analysis in Batch and Continuous Processes

Monday, April 11, 2016: 3:30 PM
343B (Hilton Americas - Houston)
Peter Ryan, Response Process Consulting LLC, League City, TX

Author/Presenter:  Peter J. Ryan, Ph.D., P.E.

Author/Presenter email:  peter.ryan@responsepc.com

Company:  Response Process Consulting LLC


All manufacturing sectors, continuous or batch, gather process data for archiving and analysis purposes.  Often the amount of data gathered, and the quality of data, makes it difficult to use this resource effectively.  Major drawbacks in the quality of the available data include:

· large gaps in the process data

· noise and poor signal-to-noise ratios

· correlated data

· limited accuracy

· limited precision

New methods have been developed to handle these issues and to build process models from this resource.  Specifically, a new approach to handling missing data and reconstructing quality data from the observed (archived) data is presented.  Once the models are developed, outliers can be identified in the continuous or batch data, and relationships between the Key Process Indicators (KPIs) and the upstream process variables can be established.
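The abstract does not describe the new missing-data approach itself.  As a point of reference only, the sketch below shows a generic, widely used alternative: iterative PCA imputation, in which gaps are filled with a low-rank reconstruction until the filled values stop changing.  The function name `pca_impute`, its parameters and the synthetic data are assumptions made for illustration and are not taken from the presented work.

```python
import numpy as np

def pca_impute(X, n_components=2, n_iter=200, tol=1e-8):
    """Fill NaN entries of X by iterating a rank-limited PCA reconstruction.

    Generic iterative PCA imputation (illustrative only, not the author's method).
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Initial guess: column means of the observed values.
    col_means = np.nanmean(X, axis=0)
    filled = np.where(missing, col_means, X)
    for _ in range(n_iter):
        mean = filled.mean(axis=0)
        centered = filled - mean
        # Truncated SVD gives the PCA model of the currently filled matrix.
        U, s, Vt = np.linalg.svd(centered, full_matrices=False)
        recon = (U[:, :n_components] * s[:n_components]) @ Vt[:n_components] + mean
        # Only the missing entries are overwritten; observed data stay untouched.
        new_filled = np.where(missing, recon, X)
        change = np.max(np.abs(new_filled - filled))
        filled = new_filled
        if change < tol:
            break
    return filled

if __name__ == "__main__":
    t = np.arange(1.0, 11.0)
    X = np.outer(t, [1.0, 2.0, 3.0])   # synthetic, perfectly correlated variables
    X[3, 1] = np.nan                   # simulate a gap in the archive
    print(pca_impute(X, n_components=1)[3, 1])
```

The approach relies on the correlation between variables, which is also why correlated data, listed above as a drawback, can be turned into an advantage when reconstructing gaps.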



Often, steady-state or dynamic models are available to describe a process.  However, these first-principles models often do not have the granularity needed to describe product specifications such as color, turbidity or solvent loss due to entrainment in a separator.  Machine learning can be used to examine large historical process data sets and determine the leading controlled variables of a process.  The machine learning methods first fit the data to a specified model, and then use the model to explore the data space.  Unsupervised learning gives the fitting method full freedom to determine what is different in the data.  Supervised learning requires the fitting method to consider both the archived process data and measured quality data (acquired off-line).  An example of unsupervised modeling is Principal Component Analysis (PCA).  An example of supervised modeling is Partial Least Squares (PLS).  Both of these methods can be used to model the historical process data and find outliers by plotting the scores of the two principal components that capture the most variability in the data.  The score plots are examined for clusters that separate production meeting product specifications from production not meeting them.  The clusters that represent production runs not meeting product specifications can be further examined to discover the upstream process variables that cause the product to miss specification.  Besides visual inspection, statistical metrics such as the Squared Prediction Error (SPE) and Hotelling's T2 can be calculated to find the same results in the data.
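The two metrics named above can be sketched for a PCA model as follows.  This is a minimal NumPy illustration using the standard textbook definitions (T2 as the variance-scaled squared distance of the scores from the model center, SPE as the squared residual off the model plane); the function names `fit_pca` and `monitor`, the autoscaling step and the synthetic data are assumptions, not details from the presented work.

```python
import numpy as np

def fit_pca(X, n_components=2):
    """Fit a PCA model on autoscaled data (rows = observations)."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    std[std == 0] = 1.0                      # guard against constant columns
    Z = (X - mean) / std
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    P = Vt[:n_components].T                  # loadings: variables x components
    score_var = s[:n_components] ** 2 / (X.shape[0] - 1)
    return {"mean": mean, "std": std, "P": P, "score_var": score_var}

def monitor(model, X):
    """Return scores, Hotelling's T2 and SPE for each row of X."""
    Z = (np.asarray(X, dtype=float) - model["mean"]) / model["std"]
    T = Z @ model["P"]                       # scores (coordinates on the model plane)
    t2 = np.sum(T ** 2 / model["score_var"], axis=1)
    residual = Z - T @ model["P"].T          # part of the data off the model plane
    spe = np.sum(residual ** 2, axis=1)
    return T, t2, spe

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic data: five correlated variables driven by two latent factors.
    X = rng.normal(size=(100, 2)) @ rng.normal(size=(2, 5))
    X += 0.01 * rng.normal(size=X.shape)
    model = fit_pca(X, n_components=2)
    scores, t2, spe = monitor(model, X)
    print(scores.shape, t2.shape, spe.shape)
```

Plotting the two columns of `scores` against each other gives the score plot; control limits for `t2` and `spe` (for example at 95% confidence) would be added on top and are not computed here.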



Examples of modeling both continuous and batch processes are given.  The continuous example is a commodity chemical process where color is the product specification of interest.  The batch example is a nylon process where the relative viscosity is the product specification of interest.  While the analysis methods are the same, one significant difference between handling continuous and batch data is that the batch data must first be “unfolded” before a model can be developed.  Once the models are developed, clusters corresponding to in-spec and non-compliant product are found.  The non-compliant product clusters are further examined, and the relationships between the product specification (KPI) and the upstream process variables causing the non-compliance are discovered.  Figure 1 shows the scores plot of the continuous commodity chemical example.  Figure 2 shows the scores plot of the batch nylon example.  In both cases, clusters of production activity corresponding to both in-spec and non-compliant production are observed.  Focusing on the batch example, Figures 3 and 4 show the SPE and Hotelling's T2 charts of the initial process data.  The outliers observed visually in Figure 2 are also detected numerically by calculating the Hotelling's T2 metric.  The Hotelling's T2 metric finds points in the scores plot that are on the model plane but far away from the center-of-mass of the model.  The SPE metric finds points in the clusters that are away from the model plane.
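The abstract does not say which unfolding direction is used.  The sketch below assumes batch-wise unfolding, a common choice in multiway PCA/PLS, in which each batch becomes one row containing every variable at every time interval.  It also assumes the batches have already been aligned to the same number of time intervals; the function name and array layout are hypothetical.

```python
import numpy as np

def unfold_batchwise(batches):
    """Unfold (n_batches, n_variables, n_times) into (n_batches, n_times * n_variables).

    Columns are ordered time slice by time slice: all variables at interval 0,
    then all variables at interval 1, and so on.
    """
    data = np.asarray(batches, dtype=float)
    n_batches, n_vars, n_times = data.shape
    return data.transpose(0, 2, 1).reshape(n_batches, n_times * n_vars)

def column_to_time_and_variable(col, n_vars):
    """Map an unfolded column index back to (time interval, variable index)."""
    return divmod(col, n_vars)

if __name__ == "__main__":
    # Synthetic stand-in: 4 batches, 3 variables, 5 time intervals.
    batches = np.arange(4 * 3 * 5, dtype=float).reshape(4, 3, 5)
    X = unfold_batchwise(batches)
    print(X.shape)
    print(column_to_time_and_variable(7, n_vars=3))
```

After unfolding, each batch is a single observation, so the same PCA or PLS machinery used for continuous data can be applied to the unfolded matrix.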

Contribution charts are used to discover the relationships between the KPIs and the upstream process variables, as shown in Figure 5.  Note that in Figure 3 (SPE metric), batch 49 is far away from the model plane, even though its projection is in the cluster of points that represent batches with good product specification.  The SPE contribution chart (Figure 5) reveals that deviations in five process variables caused the batch to produce off-spec product.  Specifically, the batch did not meet the turning points in the prescribed trajectories when the batch reached the 58th time interval of its run.  An examination of the control system revealed that a control issue was responsible for missing the turning points.
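One common way to build an SPE contribution chart is to split the SPE of a single observation into the squared residual of each column, so that the bars sum to the SPE.  The sketch below assumes that definition and assumes a batch-wise unfolded row whose columns are ordered time slice by time slice; the function names, loadings and numbers are illustrative and do not reproduce Figure 5.

```python
import numpy as np

def spe_contributions(z, P):
    """Per-column SPE contributions for one scaled observation z.

    z: scaled observation, shape (n_columns,)
    P: orthonormal PCA loadings, shape (n_columns, n_components)
    The returned values sum to the SPE of the observation.
    """
    z = np.asarray(z, dtype=float)
    residual = z - P @ (P.T @ z)     # part of z not explained by the model
    return residual ** 2

def top_contributors(contrib, n_vars, k=5):
    """Return the k largest contributions as (time interval, variable, value).

    Assumes columns are ordered as all variables at interval 0, then interval 1, ...
    """
    order = np.argsort(contrib)[::-1][:k]
    return [(int(c // n_vars), int(c % n_vars), float(contrib[c])) for c in order]

if __name__ == "__main__":
    # Toy model: 2 variables x 2 time intervals, one component along the first column.
    P = np.array([[1.0], [0.0], [0.0], [0.0]])
    z = np.array([1.0, 0.1, 3.0, 0.2])
    contrib = spe_contributions(z, P)
    print(top_contributors(contrib, n_vars=2, k=2))
```

Mapping the largest bars back to a time interval and a variable is what allows a deviation to be traced to a specific point in the batch trajectory, as described for batch 49 above.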

These examples show how a very large set of process data of varying quality (missing data, low signal-to-noise ratio, correlation, etc.) can be reduced in dimensionality, and how the leading process variables can be identified and related to the downstream KPIs to improve product quality and consistency.  The model has the granularity that is typically missing in first-principles models.  The resource for this method of model building, historical process data, is readily available but seldom used for process optimization and analysis because, without methods to reduce the data-space dimensionality and cope with missing and correlated data, it presents too much raw, unconditioned information.

Figure 1: Model developed using continuous historical process data

Figure 2: Model developed using batch historical process data
Figure 3: Squared Prediction Error (SPE) of the batch data (nylon example)
Figure 4: Hotelling's T2 metric of the batch data (nylon example)
[Plot annotations recovered from the figures: Batch No. 49; Batches 50-55; 95% Confidence Limit]

Session: Data Management in Refineries II
Group/Topical: Topical 7: 19th Topical Conference on Refinery Processing