462425 Process Knowledge Discovery and Selecting Number of Non-Zero Loadings in Sparse Principal Component Analysis

Tuesday, November 15, 2016: 1:38 PM
Monterey I (Hotel Nikko San Francisco)
Shriram Gajjar, Chemical Engineering & Material Science, University of California, Davis, Davis, CA, Murat Kulahci, Department of Applied Mathematics and Computer Science, Technical University of Denmark, Lyngby, Denmark and Ahmet Palazoglu, Department of Chemical Engineering, University of California, Davis, Davis, CA

Process knowledge discovery and selecting number of non-zero loadings in sparse principal component analysis

Shriram Gajjar*, Murat Kulahci**, Ahmet Palazoglu*

*University of California, Davis, CA 95616, USA

(Tel: 530-752-8774; e-mail: anpalazoglu@ucdavis.edu).

**Technical University of Denmark, Lyngby, Denmark and Luleå University of Technology, Luleå, Sweden

(e-mail: muku@dtu.dk)

Background: Smart production technologies have dramatically intensified data generation and collection through networked, information-based technologies throughout the chemical industry and other manufacturing enterprises. Data are now generated and collected so quickly that humans must rely on computers to consume and process them. This, in turn, drives an ever-increasing pace in the development of algorithms and methods to improve process performance and facilitate process monitoring. First, these algorithms and methods should be able to extract significant information from large datasets. Second, they should provide accurate means to reduce process variability and boost performance. Third, they should allow discovery of the underlying process dynamics, which can substantially improve decision-making. Finally, steps can then be taken towards recommending preemptive actions, i.e., preventive decisions made before a failure occurs or is even observed.

Prior Work: Researchers have used principal component analysis (PCA) to capture meaningful information in a reduced-dimensional space. PCA-based monitoring methods are among the most widely used multivariate statistical methods (Cinar et al., 2007). Using PCA for dimension reduction has one specific drawback: each principal component (PC) is a linear combination of all m variables, and the loadings are typically all nonzero. Such nonzero loadings (NZL) make it difficult to interpret the derived PCs and may confound subsequent analyses. To address this challenge, Zou et al. (2006) proposed sparse principal component analysis (SPCA), in which sparse loadings are obtained by imposing lasso (elastic net) constraints on the coefficients (i.e., loadings) of the PCA model. SPCA is essentially the result of optimizing the trade-off between the variance captured by the PCs and the sparsity imposed on them. It allows the user to control the sparsity of the loadings and improves the ability to identify the important variables.
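The trade-off between captured variance and sparsity can be illustrated with a small sketch. It does not implement the elastic net formulation of Zou et al. (2006); it swaps in a simpler, heuristic technique, a truncated power iteration that keeps only the k largest-magnitude loadings at each step. The covariance matrix `COV` and all function names are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical covariance matrix for four process variables (illustrative
# values only): variables 0 and 1 are strongly correlated with each other.
COV = [
    [4.0, 3.0, 0.2, 0.1],
    [3.0, 4.0, 0.1, 0.2],
    [0.2, 0.1, 1.0, 0.3],
    [0.1, 0.2, 0.3, 1.0],
]

def matvec(C, v):
    """Multiply the square matrix C by the vector v."""
    return [sum(c * x for c, x in zip(row, v)) for row in C]

def normalize(v):
    """Scale v to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def truncate(v, k):
    """Keep the k largest-magnitude entries of v and zero the rest."""
    order = sorted(range(len(v)), key=lambda i: abs(v[i]), reverse=True)
    keep = set(order[:k])
    return [v[i] if i in keep else 0.0 for i in range(len(v))]

def leading_loading(C, k, n_iter=200):
    """Truncated power iteration: a heuristic stand-in for SPCA.

    With k equal to the number of variables, no entry is zeroed and this
    reduces to ordinary power iteration for the first PC.
    """
    v = normalize([1.0] * len(C))
    for _ in range(n_iter):
        v = normalize(truncate(matvec(C, v), k))
    return v

def explained_variance(C, v):
    """Variance captured along the unit-length loading vector v."""
    return sum(a * b for a, b in zip(v, matvec(C, v)))

dense = leading_loading(COV, k=len(COV))   # ordinary first PC
sparse = leading_loading(COV, k=2)         # first PC with two non-zero loadings

print("dense loadings :", [round(x, 3) for x in dense])
print("sparse loadings:", [round(x, 3) for x in sparse])
print("variance (dense, sparse):",
      round(explained_variance(COV, dense), 4),
      round(explained_variance(COV, sparse), 4))
```

In this toy case the dense PC assigns small but non-zero loadings to variables 2 and 3, while the sparse version sets them exactly to zero and gives up only a small share of the captured variance, which is the kind of interpretability gain the paragraph describes.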

Preliminary results: One of the challenges in using SPCA is deciding the penalty parameters or, equivalently, choosing the number of non-zero loadings (NNZL). We propose three approaches, viz. exhaustive search, forward selection and sensitivity analysis, that simplify the process of selecting penalty parameters and provide a more intuitive solution for understanding the physical meaning of variables monitored in chemical processes. In the exhaustive search approach, one goes through all possible combinations of NZL in each PC and then chooses a solution that meets the required criteria. The downside of this approach is that it is computationally intensive and, in scenarios with a large number of NZL, it is impractical or even infeasible to go through all combinations. In the forward selection approach, we impose constraints on each sparse principal component (SPC) as well as a lower limit on the total variance captured by the SPCs. By doing so, the search space for the optimum NNZL of each SPC is drastically reduced. Sensitivity analysis is a systematic examination of the NNZL of the SPCs. In this approach, the NNZL of one PC is varied while all other aspects are kept constant. The goal is to determine whether the PC can be made sparser without losing information. Thus, traditional PCA can be altered in such a way that the obtained loadings have a clear interpretation without significant loss of the information extracted by each PC in terms of explained variance. Such an approach would also assist in the application of PCA in process surveillance, as a better understanding of the impact of PC loadings can clearly facilitate process monitoring, i.e., fault detection and diagnosis. Furthermore, we discuss the advantages of SPCA for process knowledge discovery with a synthetic example and the Tennessee Eastman benchmark process.
The paper will highlight the substantially improved performance of process fault detection and diagnosis strategies using SPCA when compared with traditional approaches.  
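The sensitivity idea described above, varying the NNZL of one PC and checking how much explained variance is retained, can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' algorithm: it handles a single PC, uses a heuristic truncated power iteration in place of SPCA, and the covariance matrix `COV`, the function names and the thresholds are all hypothetical.

```python
import math

# Hypothetical covariance matrix for four process variables (illustrative only).
COV = [
    [4.0, 3.0, 1.5, 0.1],
    [3.0, 4.0, 1.5, 0.2],
    [1.5, 1.5, 2.0, 0.1],
    [0.1, 0.2, 0.1, 1.0],
]

def matvec(C, v):
    """Multiply the square matrix C by the vector v."""
    return [sum(c * x for c, x in zip(row, v)) for row in C]

def normalize(v):
    """Scale v to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def sparse_loading(C, k, n_iter=200):
    """Heuristic first PC with at most k non-zero loadings (truncated power iteration)."""
    v = normalize([1.0] * len(C))
    for _ in range(n_iter):
        w = matvec(C, v)
        order = sorted(range(len(w)), key=lambda i: abs(w[i]), reverse=True)
        keep = set(order[:k])
        v = normalize([w[i] if i in keep else 0.0 for i in range(len(w))])
    return v

def explained_variance(C, v):
    """Variance captured along the unit-length loading vector v."""
    return sum(a * b for a, b in zip(v, matvec(C, v)))

def sensitivity_sweep(C):
    """Return (k, retained fraction) for k = 1..m, relative to the dense PC."""
    m = len(C)
    full = explained_variance(C, sparse_loading(C, m))
    return [(k, explained_variance(C, sparse_loading(C, k)) / full)
            for k in range(1, m + 1)]

def select_nnzl(C, min_fraction):
    """Smallest NNZL whose retained variance fraction meets the lower limit."""
    for k, fraction in sensitivity_sweep(C):
        if fraction >= min_fraction:
            return k
    return len(C)

for k, fraction in sensitivity_sweep(COV):
    print(f"NNZL = {k}: retained variance fraction = {fraction:.3f}")
print("chosen NNZL at a 95% lower limit:", select_nnzl(COV, 0.95))
```

The lower limit on retained variance plays the role of the variance constraint mentioned for forward selection: a stricter limit forces more non-zero loadings, while a looser one allows a sparser and more interpretable PC. Extending the sketch to several SPCs would require a deflation step or a joint formulation, which is not shown here.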

References

Cinar, A., Palazoglu, A. and Kayihan, F. (2007) 'Multivariate Statistical Monitoring Techniques', in Chemical Process Performance Evaluation. Chemical Industries. CRC Press, pp. 37-71.

Zou, H., Hastie, T. and Tibshirani, R. (2006) 'Sparse Principal Component Analysis', Journal of Computational and Graphical Statistics, 15(2), pp. 265-286.


Session: Big Data Analytics in Chemical Engineering
Group/Topical: Computing and Systems Technology Division