468267 Robust Dynamic Principal Component Analysis Method for Modelling Process Data

Monday, November 14, 2016: 10:05 AM
Monterey I (Hotel Nikko San Francisco)
Alisha Deshpande, Chemical Engineering, University of Southern California, Los Angeles, CA, S. Joe Qin, Mork Family Department of Chemical Engineering and Materials Science, University of Southern California, Los Angeles, CA and Lisa A. Brenskelle, Process Automation, Chevron Energy Technology Co., Houston, TX

Advancing sensor and data-gathering technology has substantially increased the amount and frequency of data collected from processes, creating a valuable opportunity for data-driven modelling techniques in process monitoring. Reliable data-driven models enable online analysis of incoming data so that sensor and process faults can be detected and managed. There are numerous methods for building data-driven models, including the commonly used principal component analysis (PCA) and its extension, dynamic PCA (DPCA), which includes time-lagged variables to represent process dynamics. However, PCA and DPCA have several drawbacks, including an inability to handle missing or corrupted data. Robust PCA methods can address this problem [1].
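The core of DPCA is augmenting the data matrix with time-lagged copies of each variable before applying ordinary PCA. A minimal sketch of that augmentation step (the function name and lag convention are illustrative, not the authors' implementation):

```python
import numpy as np

def build_lagged_matrix(X, lags):
    """Augment X with `lags` time-lagged copies of every variable.

    X is (n_samples, n_vars); the result has shape
    (n_samples - lags, n_vars * (lags + 1)), with row t holding
    [x(t), x(t-1), ..., x(t-lags)]. Applying PCA to this
    augmented matrix is the essence of DPCA.
    """
    n = X.shape[0]
    blocks = [X[lags - k : n - k, :] for k in range(lags + 1)]
    return np.hstack(blocks)
```

The first `lags` samples are consumed to form the lagged columns, so the augmented matrix is slightly shorter than the original record.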

Two new algorithms were developed in this work by incorporating a recently developed robust PCA method into DPCA to create a new robust DPCA method. The new method recovers a fault-free data matrix even in the presence of gross sparse errors, and is therefore more robust to outliers than existing methods. The robust PCA method was incorporated into DPCA in two different ways (‘Method 1’ and ‘Method 2’), which were tested against each other and against traditional DPCA on both clean and faulty real process data.

This work contains two case studies. Case Study 1 used data from a field test that generated two data sets: a clean set representing normal operation and a corrupted set with known sensor faults. Case Study 2 used operation data from a single piece of process equipment during normal operation, ensuring that the tags were correlated and the data were largely free of faults. The two case studies allowed for two different approaches to analysis. Since Case Study 1 had a known clean data set, detection and false alarm rates could be calculated using the squared prediction error (SPE, or Q) statistic, along with the similarity between the normal data set and the one cleaned by the new algorithm. Case Study 2 had no clean basis for comparison, but because it consisted of normal operation data, a low detection rate was expected. Faults were then added to several variables to assess the effectiveness of the algorithm, and data reconstructions from the models built on ‘normal’ and ‘faulty’ data were compared.
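The SPE-based evaluation above can be sketched as follows, assuming a loading matrix P fitted on (possibly lag-augmented) training data; the function names and the percentile-based control limit are illustrative choices, not the authors' exact procedure:

```python
import numpy as np

def spe_statistic(X, P):
    """Q statistic: squared norm of the part of each row not explained by P.

    X is (n_samples, n_vars), already mean-centered with the training mean;
    P is (n_vars, n_components) with orthonormal columns.
    """
    resid = X - X @ P @ P.T  # subtract the projection onto the PC subspace
    return np.sum(resid**2, axis=1)

def alarm_rates(spe, limit, fault_mask):
    """Detection rate (faulty samples flagged) and false alarm rate
    (normal samples flagged), given a boolean mask of faulty samples."""
    alarms = spe > limit
    return alarms[fault_mask].mean(), alarms[~fault_mask].mean()
```

The control limit can be taken, for example, as a high percentile of the training SPE values; more formal chi-squared-based limits are also common in the PCA monitoring literature.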

Both methods showed a significant improvement in fault detection over DPCA models trained on faulty data, which failed to detect most of the faults. Method 1 also detected some faults that Method 2 missed. Furthermore, although Method 2 achieved high detection rates in all tests, it also consistently produced high false alarm rates, making its results unreliable. Method 1 was effective given proper tuning and sufficient training data; without either, both the detection and false alarm rates suffered.


[1] Candès, E. J., Li, X., Ma, Y., & Wright, J. (2011) “Robust Principal Component Analysis?” Journal of the ACM, 58(3), 1–37.

Extended Abstract: File Not Uploaded