**Title:** Outlier Detection and Analysis in Batch and Continuous Processes

**Author/Presenter:** Peter J. Ryan, Ph.D., P.E.

**Author/Presenter email:** peter.ryan@responsepc.com

**Company:** Response Process Consulting LLC

**Motivation**

All manufacturing sectors, continuous or batch, gather process data for archiving and analysis purposes. Often the amount of data gathered, and the quality of data, makes it difficult to use this resource effectively. Major drawbacks in the quality of the available data include:

· large gaps in the process data

· noise, poor signal-to-noise ratios

· correlated data

· poor accuracy

· poor precision

New methods have been developed to handle these issues and to develop process models based on this resource. Specifically, a new approach to handling missing data and reconstructing quality data based on the observed (archived) data is presented. Once the models are developed, outliers can be identified in the continuous or batch data, and relationships between the Key Process Indicators (KPIs) and the upstream process variables can be established.


**Approach**

Often, steady state or
dynamic models are available to describe a process. However, these
first-principle models often do not have the granularity needed to describe
product specifications such as color, turbidity or solvent loss due to
entrainment in a separator. Machine learning can be used to examine large
historical process data sets and determine the leading controlled variables of
a process. The machine learning methods first fit the data to a specified model and then use the model to explore the data space. Unsupervised learning gives the fitting method full freedom to determine what is different in the data. Supervised learning requires the fitting method to consider both the archived process data and measured quality data (acquired off-line of the process). An example of unsupervised modeling is Principal Component Analysis (PCA); an example of supervised modeling is Partial Least Squares (PLS). Both methods can be used to model the historical process data and find outliers by plotting the scores of the two principal components that capture the most variability in the data. The score plots are examined for clusters that separate production meeting product specifications from production failing to meet them. The clusters representing out-of-specification production runs can then be examined further to discover which upstream process variables caused the product to miss specification. While visual inspection has been described, statistical metrics such as the Squared Prediction Error (SPE) and Hotelling's T^{2} can be calculated to find the same results in the data.
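As a minimal sketch of how these two metrics follow from a PCA model — using synthetic data and plain NumPy, with illustrative dimensions and an injected outlier rather than the actual process data — the scores, Hotelling's T^{2}, and SPE can be computed as:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "historical" data: 100 runs of 8 correlated process
# variables driven by 2 latent factors, plus measurement noise.
latent = rng.normal(size=(100, 2))
X = latent @ rng.normal(size=(2, 8)) + 0.1 * rng.normal(size=(100, 8))
X[49] += 5.0  # inject one deviating run as an outlier

# PCA via SVD of the mean-centered data.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 2                              # retain two principal components
scores = U[:, :k] * s[:k]          # points for the scores plot
loadings = Vt[:k]

# Hotelling's T^2: distance from the model center within the model plane.
eigvals = s[:k] ** 2 / (len(X) - 1)
T2 = np.sum(scores**2 / eigvals, axis=1)

# SPE: squared residual distance from the model plane.
residuals = Xc - scores @ loadings
SPE = np.sum(residuals**2, axis=1)

print("run with largest SPE:", int(np.argmax(SPE)))
```

Points exceeding a control limit on either statistic are flagged for the same follow-up a visual inspection of the scores plot would trigger.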


**Results**

Examples of modeling both
continuous and batch processes are given. The continuous example is of a
commodity chemical process where color is the product specification of
interest. The batch example is of a nylon process where the relative viscosity
is the product specification of interest. While the analysis methods are the
same, one significant difference between handling continuous and batch data is
that the batch data must first be “unfolded” before a model can be developed.
Once the models are developed, clusters corresponding to successful and
non-compliant products are found. The non-compliant product clusters are
further examined, and the relationships between the product specification (KPI)
and the upstream process variables causing the non-compliance are discovered.
Figure 1 shows the results of the scores plot of the continuous commodity
chemical example. Figure 2 is an example of the scores plot of the batch nylon
example. In both cases, clusters of production activity containing both in-spec and non-compliant production are observed. Focusing on the batch example, Figures 3 and 4 show the SPE and Hotelling's T^{2} charts of the initial process data. The outliers observed visually in Figure 2 are also detected numerically by the Hotelling's T^{2} metric, which finds points in the scores plot that lie on the model plane but far from the model's center of mass. The SPE metric finds points that lie away from the model plane.
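The "unfolding" step can be sketched in a few lines; the dimensions here (60 batches, 58 time intervals, 5 variables) are illustrative assumptions, not the actual nylon data set:

```python
import numpy as np

# Illustrative batch data cube: (batches, time intervals, variables).
n_batches, n_time, n_vars = 60, 58, 5
rng = np.random.default_rng(1)
batch_data = rng.normal(size=(n_batches, n_time, n_vars))

# Batch-wise unfolding: each row becomes one batch's entire trajectory,
# so the same PCA/PLS machinery used for continuous data applies.
unfolded = batch_data.reshape(n_batches, n_time * n_vars)

print(unfolded.shape)  # (60, 290)
```

Each batch is now a single observation, so a scores plot places one point per batch and SPE/T^{2} limits flag whole batches rather than individual samples.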

Contribution charts are used
to discover the relationships between the KPI's and the upstream process
variables, as shown in Figure 5. Note that in Figure 3 (SPE metric), batch 49
is far away from the model plane, even though its projection is in the cluster
of points that represent batches with good product specification. The SPE contribution
chart (Figure 5) reveals that deviations in five process variables caused the
batch to produce off-spec product. Specifically, the batch did not meet the turning points in the prescribed trajectories when it reached the 58^{th} time interval of its run. An examination of the control system revealed that a control issue was responsible for missing the turning points.
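An SPE contribution chart simply splits a flagged batch's squared residual column by column. A minimal sketch, on synthetic data where the deviating batch and columns are contrived for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic unfolded batch data: 50 batches x 6 (time, variable)
# columns, driven by 2 latent factors plus noise.
latent = rng.normal(size=(50, 2))
X = latent @ rng.normal(size=(2, 6)) + 0.1 * rng.normal(size=(50, 6))
X[49, [0, 2]] += 6.0  # hypothetical deviating batch, off in two columns

# Fit a 2-component PCA model and compute residuals off the model plane.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
loadings = Vt[:2]
residuals = Xc - (Xc @ loadings.T) @ loadings
SPE = np.sum(residuals**2, axis=1)

# SPE contributions for the flagged batch: its squared residual,
# split column by column; the tall bars point at the responsible
# process variables (and time intervals, after unfolding).
contrib = residuals[49] ** 2
print("columns ranked by contribution:", np.argsort(contrib)[::-1])
```

By construction the contributions sum to the batch's SPE, so the chart is an exact decomposition of the statistic that flagged the batch.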

These examples show how a very large set of process data of varying quality (missing data, low signal-to-noise ratio, correlation, etc.) can be reduced in dimensionality, and how the leading process variables can be identified and related to the downstream KPIs to improve product quality and consistency. The resulting model has the granularity that is typically missing in first-principles models. The resource for this method of model building, historical process data, is readily available but seldom used for process optimization and analysis: without methods to reduce the data-space dimensionality and to cope with missing and correlated data, it presents too much raw, unconditioned information.


See more of this Group/Topical: Topical 7: 19^{th} Topical Conference on Refinery Processing