263773 Multivariate SPC of Dynamic Processes Using Integrated Dynamic Principal Components Analysis and Missing Data Methods (DPCA-MD)
Classical univariate and multivariate SPC charts are usually based on the assumption of independent and identically distributed (iid) observations. For the multivariate case, the statistic most commonly used to monitor industrial processes is Hotelling’s T² [1], applied either to the original variables or to the latent scores obtained from a principal component analysis (PCA) decomposition. In the latter case it is usually complemented with the Q statistic (also known as the squared prediction error, SPE), which monitors the portion of variability not captured by the PCA model (PCA-MSPC, [2-4]). These SPC methodologies face important practical limitations, because they assume the variables to be independent over time, i.e., not autocorrelated. This hypothesis is often not met in practice, especially at the high sampling rates achieved with modern instrumentation and data acquisition systems. To address this issue, Ku et al. [5] proposed an SPC procedure based on dynamic principal component analysis (DPCA), an extension of PCA that includes time-shifted variables in order to accommodate and implicitly model the dynamic behavior of the variables within the same PCA framework. However, one can easily verify that the direct implementation of this method still leads to autocorrelated statistics, which raises problems in its application.
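The construction described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the implementation used in this work: the function names (`lagged_matrix`, `fit_pca`, `t2_and_q`), the single common number of lags, the number of retained components and the simulated two-variable AR(1) data are all assumptions made for the example.

```python
import numpy as np

def lagged_matrix(X, lags):
    """Stack [x(t), x(t-1), ..., x(t-lags)] side by side, one row per usable t."""
    n = X.shape[0]
    return np.hstack([X[lags - k : n - k] for k in range(lags + 1)])

def fit_pca(Z, n_components):
    """Fit PCA on autoscaled data; return scaling, loadings and score variances."""
    mean = Z.mean(axis=0)
    std = Z.std(axis=0, ddof=1)
    Zs = (Z - mean) / std
    _, s, Vt = np.linalg.svd(Zs, full_matrices=False)
    loadings = Vt[:n_components].T
    eigvals = s[:n_components] ** 2 / (Zs.shape[0] - 1)
    return mean, std, loadings, eigvals

def t2_and_q(Z, model):
    """Hotelling's T2 on the retained scores and Q (SPE) on the residuals."""
    mean, std, loadings, eigvals = model
    Zs = (Z - mean) / std
    scores = Zs @ loadings
    t2 = np.sum(scores ** 2 / eigvals, axis=1)
    residual = Zs - scores @ loadings.T
    q = np.sum(residual ** 2, axis=1)
    return t2, q

# Hypothetical autocorrelated data: two variables following an AR(1) process.
rng = np.random.default_rng(0)
n, m = 500, 2
X = np.zeros((n, m))
for t in range(1, n):
    X[t] = 0.7 * X[t - 1] + rng.standard_normal(m)

Z = lagged_matrix(X, lags=1)          # DPCA: PCA applied to the lagged matrix
model = fit_pca(Z, n_components=2)
t2, q = t2_and_q(Z, model)
```

Setting `lags=0` recovers static PCA, so the same functions cover both the PCA and DPCA charts.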
Therefore, we have developed and tested alternative procedures to cope with this problem. In this work, we present the results from the application of several statistics, some of them new, based on the PCA Hotelling’s T² and Q statistics. These statistics use a combination of DPCA, ARMA models and missing data estimation methods, which allows the dimensionality of the data to be reduced (by exploiting their correlation structure) while their dynamic behavior is captured at the same time, thereby handling the autocorrelation effects. Furthermore, a detailed procedure to select the number of lags for each variable in DPCA is also proposed, which improves on the methodology initially proposed by Ku et al. [5].
In a first stage, the performance of the control chart procedures was assessed by their Average Run Length (ARL) on several systems, such as the Wood and Berry column. From this analysis we selected the proposed methodologies that presented the best overall performance and subsequently compared them with the traditional PCA and DPCA approaches, when applied to the Tennessee Eastman process.
The Wood and Berry column model [6] represents an approximation of the dynamic behavior of a binary distillation column separating methanol from water. In this dynamic model, the distillate and bottom methanol weight fractions are expressed as functions of the reflux and reboiler steam flow rates. The system was subjected to a set of step perturbations in the sensor measurements, and the corresponding ARL was determined for each perturbation. The upper control limits (UCL) for all statistics were adjusted beforehand, by trial and error, in order to enforce an equal in-control Average Run Length (ARL0) of 370 for all of them.
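The UCL adjustment described above can be illustrated with a small sketch. It is not the procedure used in this work: the bisection search stands in for manual trial and error, the function names are hypothetical, and the in-control statistic is simulated here as iid exponential values purely as a placeholder.

```python
import random

def run_lengths(stat_values, ucl):
    """Number of samples up to and including each alarm, restarting after an alarm."""
    lengths, count = [], 0
    for value in stat_values:
        count += 1
        if value > ucl:
            lengths.append(count)
            count = 0
    return lengths

def average_run_length(stat_values, ucl):
    """Empirical ARL; infinite if the limit is never exceeded."""
    lengths = run_lengths(stat_values, ucl)
    return sum(lengths) / len(lengths) if lengths else float("inf")

def tune_ucl(stat_values, target_arl=370.0, iterations=40):
    """Bisection on the UCL until the in-control ARL reaches the target."""
    lo, hi = min(stat_values), max(stat_values)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if average_run_length(stat_values, mid) < target_arl:
            lo = mid      # too many alarms: raise the limit
        else:
            hi = mid      # target met: try a tighter limit
    return hi

# Placeholder in-control statistic (assumption: iid exponential values).
random.seed(0)
values = [random.expovariate(1.0) for _ in range(50000)]
ucl = tune_ucl(values)
```

For an iid statistic, an ARL0 of 370 corresponds to a false alarm probability of 1/370 per sample, the same rate as the classical three-sigma Shewhart limits. For autocorrelated statistics that equivalence no longer holds, which is why the limits have to be tuned on simulated in-control data.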
In this system, we observed no significant difference between the traditional static and the dynamic versions of PCA, and even with DPCA the resulting statistics still presented some autocorrelation. Among the newly proposed statistics, the one that incorporates an implicit prediction methodology, namely the missing data imputation (MD) approach to estimating future values, presented the best results. In fact, the DPCA-MD monitoring statistics not only improved control chart performance but also reduced the autocorrelation of the statistics, outperforming all the other statistics studied.
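The idea of treating the newest observation as missing and estimating it from the lagged ones can be sketched as follows. The specific estimator behind the DPCA-MD statistics is not detailed here, so conditional mean replacement, a standard missing data estimator, is used as an assumption; the univariate AR(1) data, the coefficient 0.8 and all function names are likewise illustrative.

```python
import numpy as np

def conditional_mean_estimate(S, observed_idx, missing_idx, z_observed):
    """Estimate missing columns from observed ones: z_obs @ inv(S_oo) @ S_om."""
    S_oo = S[np.ix_(observed_idx, observed_idx)]
    S_mo = S[np.ix_(missing_idx, observed_idx)]
    return np.linalg.solve(S_oo, z_observed.T).T @ S_mo.T

def lag1_autocorr(v):
    """Sample lag-1 autocorrelation of a 1-D series."""
    v = v - v.mean()
    return float(v[1:] @ v[:-1] / (v @ v))

# Hypothetical autocorrelated variable: AR(1) with coefficient 0.8.
rng = np.random.default_rng(0)
n = 5000
x = np.zeros(n)
for t in range(1, n):
    x[t] = 0.8 * x[t - 1] + rng.standard_normal()

# Lagged vector [x(t), x(t-1)]; column 0 is treated as missing at time t.
Z = np.column_stack([x[1:], x[:-1]])
Z = Z - Z.mean(axis=0)
S = np.cov(Z, rowvar=False)

x_hat = conditional_mean_estimate(S, [1], [0], Z[:, [1]])
residual = Z[:, 0] - x_hat[:, 0]      # one-step-ahead prediction error

raw_acf = lag1_autocorr(x)            # strongly autocorrelated
residual_acf = lag1_autocorr(residual)  # close to zero
```

Monitoring the prediction error instead of the raw variable removes most of the serial dependence in this toy case, which is the effect reported above for the DPCA-MD statistics.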
This analysis was also conducted on the multivariate AR(1) process presented by Ku et al. [5] and on a Continuous Stirred-Tank Reactor (CSTR) system with a heating jacket [7]. In all these systems, the DPCA-MD based statistics consistently presented superior performance and lower autocorrelation. Therefore, in a second stage, the DPCA-MD based statistics were applied to the Tennessee Eastman process and compared with the traditional PCA and DPCA approaches.
The Tennessee Eastman process was developed by Downs and Vogel [8] and has been widely used by the process monitoring community as a source of data for comparing various approaches. The simulation model has 41 measured variables (XMEAS) and 12 manipulated variables (XMV), and allows for 21 process upsets.
In this study we used the data provided by Braatz [9], in which the control system reported by Lyman and Georgakis [10] was implemented to generate the closed-loop simulated process data. Each data set contains 960 observations with a sampling interval of 3 min. The fault was introduced 8 hours after the start of the simulation. All the manipulated and measured variables were collected, with the exception of the agitation speed of the reactor’s stirrer, giving a total of 52 variables.
The fault-free data sets were used to construct the PCA, DPCA and DPCA-MD models and to determine their upper control limits (UCL). The UCLs were adjusted by trial and error so that all the monitoring statistics presented the same false alarm rate of 1%. All of the statistics studied on this system failed to detect faults 3 and 9 and had a low capability to detect faults 15 and 21. On the remaining 17 faults, the DPCA-MD statistics had the highest fault detection rates, while the PCA and DPCA statistics performed well on only 4 faults each. The DPCA-MD statistics also indicated a consistent out-of-control state during faulty periods.
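The two quantities used in this comparison, a UCL fixed at a 1% false alarm rate and the fault detection rate, can be computed as in the sketch below. The function names are hypothetical, the empirical quantile replaces the trial-and-error adjustment, and the mapping from the 8 h fault onset to a zero-based sample index assumes that one observation is recorded every 3 min from the start of the run.

```python
def fault_onset_index(onset_hours, sample_interval_min):
    """Zero-based index of the first observation recorded after the fault onset."""
    return int(onset_hours * 60 / sample_interval_min)

def empirical_ucl(in_control_values, false_alarm_rate):
    """Limit exceeded by at most the given fraction of in-control values."""
    ordered = sorted(in_control_values)
    allowed = int(false_alarm_rate * len(ordered) + 1e-9)  # exceedances allowed
    return ordered[len(ordered) - allowed - 1]

def detection_rate(values, ucl, onset):
    """Fraction of observations from the onset onwards that exceed the limit."""
    faulty = values[onset:]
    return sum(v > ucl for v in faulty) / len(faulty)

onset = fault_onset_index(8, 3)   # 160 for an 8 h onset sampled every 3 min
```

Because every chart is tuned to the same false alarm rate on fault-free data, differences in `detection_rate` between PCA, DPCA and DPCA-MD reflect sensitivity rather than a looser limit.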
The performance of the proposed method (DPCA-MD) was also assessed with a paired t-test. From this analysis, we concluded that, at a 5% significance level, the DPCA-MD statistics are significantly better than those from PCA and DPCA. Another advantage of the DPCA-MD statistics is their lower autocorrelation, which makes them more reliable and easier to implement in practice.
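A paired t-test of this kind compares two methods fault by fault, working on the differences of their detection rates. The sketch below computes only the test statistic and its degrees of freedom with the standard library; the function name is hypothetical and the detection rates are made-up placeholders, not results from this study.

```python
import math
import statistics

def paired_t_statistic(a, b):
    """t statistic and degrees of freedom for paired samples a and b."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)        # sample standard deviation
    return mean_diff / (sd_diff / math.sqrt(n)), n - 1

# Placeholder detection rates per fault (illustrative values only).
rates_method_a = [0.95, 0.80, 0.99, 0.70, 0.88]
rates_method_b = [0.90, 0.60, 0.97, 0.55, 0.85]
t_stat, dof = paired_t_statistic(rates_method_a, rates_method_b)
```

The statistic is then compared with the critical value of a Student's t distribution with `dof` degrees of freedom at the chosen significance level; `scipy.stats.ttest_rel` returns the corresponding p-value directly if SciPy is available.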
To sum up, in this work we addressed the problem of monitoring processes with correlated dynamic data. We studied several statistics based on a combination of methods (including PCA, DPCA, PLS, time series models and missing data methods) and concluded that those derived from missing data methodologies show, in general, better performance. However, we would like to point out that such statistics require a suitable method to estimate the number of lags needed to construct the DPCA model, an issue that was also addressed. The best results were achieved with the DPCA-MD statistics, which also presented lower autocorrelation and consistent out-of-control detection, making them a very interesting and viable alternative to the current statistics based strictly on PCA and DPCA.
1. Hotelling, H., The Generalization of Student's Ratio. The Annals of Mathematical Statistics, 1931. 2(3): p. 360-378.
2. Jackson, J.E., Quality Control Methods for Several Related Variables. Technometrics, 1959. 1(4): p. 359-377.
3. Kresta, J.V., J.F. MacGregor, and T.E. Marlin, Multivariate Statistical Monitoring of Process Operating Performance. The Canadian Journal of Chemical Engineering, 1991. 69: p. 35-47.
4. Reis, M.S. and P.M. Saraiva, Multivariate and Multiscale Data Analysis, in Statistical Practice in Business and Industry, S. Coleman, et al., Editors. 2008, Wiley: Chichester. p. 337-370.
5. Ku, W., R.H. Storer, and C. Georgakis, Disturbance detection and isolation by dynamic principal component analysis. Chemometrics and Intelligent Laboratory Systems, 1995. 30(1): p. 179-196.
6. Wood, R.K. and M.W. Berry, Terminal composition control of a binary distillation column. Chemical Engineering Science, 1973. 28(9): p. 1707-1717.
7. Santos, L., Simulação dinâmica de um sistema de constituído por um CSTR. 2009.
8. Downs, J.J. and E.F. Vogel, A plant-wide industrial process control problem. Computers and Chemical Engineering, 1993. 17(3): p. 245-255.
9. Braatz, R.D., Multiscale Systems Research Laboratory. 2002. Available from: http://brahms.scs.uiuc.edu.
10. Lyman, P.R. and C. Georgakis, Plant-wide control of the Tennessee Eastman problem. Computers and Chemical Engineering, 1995. 19(3): p. 321-331.