282328 A Method of Estimating the Probability of Rare Events in Bayesian Networks with Application to Risk Assessment in Processes
Since Spiegelhalter and Lauritzen introduced a tractable inference technique for Bayesian networks (BNs) in 1988 [1], BNs have become very popular for modeling real-world problems. Although BN models can handle incomplete and qualitative data, they become unreliable when information is missing for some possible scenarios, such as those regarded as rare events. Several solutions have been proposed to address this problem. They are referred to as sampling techniques. These solutions, however, have several weaknesses. First, many of them are unsuitable for systems that lack an accurate quantitative model, and they underestimate the probability of rare events (e.g., logistic regression). Second, commonly used sampling techniques such as importance sampling become unstable when the system dimension increases significantly [2]. Third, some sampling techniques are useful only for constructing asymptotic probability distribution functions. Fourth, the majority of sampling techniques converge slowly when estimating the probability distributions of rare events.
In this work, we present a method for generating historical data and estimating probability distributions in cases where neither an accurate mathematical model nor adequate historical data is available. In BN modeling, the problem arises when (a) no observation exists for a possible combination of states of the parent variables (because that instantiation is rare) and (b) the underlying governing equation is unknown. In such situations, the entries of the conditional probability tables (CPTs) of child nodes (variables) cannot be estimated accurately, which leads to unreliable inference. We have developed a method that estimates unknown CPT entries by “extrapolating” the existing conditional probability distributions obtained from the available historical data. We have applied this method to a series of rare-event case studies, and the simulation results agree closely with the true values. The method is applicable to BNs with many nodes and complex interactions, and it converges considerably faster than conventional sampling techniques. One application is the risk assessment of rare events, which usually result from a sequence of rare causes and for which available databases usually lack adequate information. The application and performance of the method are demonstrated using a process case study.
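The missing-entry problem described in conditions (a) and (b) can be illustrated with a minimal sketch. It estimates a CPT by simple frequency counting and reports the parent configurations that never occur in the data. The variable names (`pump`, `valve`, `overflow`), the toy records, and the helper `estimate_cpt` are hypothetical and chosen only for illustration. The sketch shows where a count-based estimate breaks down; it does not implement the extrapolation method itself.

```python
from collections import Counter
from itertools import product


def estimate_cpt(records, parents, child, states):
    """Count-based (maximum-likelihood) CPT estimate.

    Returns a dict mapping each parent configuration to a distribution over
    the child's states, or to None when that configuration was never observed.
    """
    joint = Counter()          # counts of (parent configuration, child state)
    parent_counts = Counter()  # counts of each parent configuration
    for r in records:
        key = tuple(r[p] for p in parents)
        joint[key, r[child]] += 1
        parent_counts[key] += 1

    cpt = {}
    for key in product(*(states[p] for p in parents)):
        n = parent_counts[key]
        if n == 0:
            cpt[key] = None  # no data: the entry cannot be estimated by counting
        else:
            cpt[key] = {s: joint[key, s] / n for s in states[child]}
    return cpt


# Hypothetical toy data set: the joint failure of pump and valve is never observed.
states = {"pump": ["ok", "fail"], "valve": ["ok", "fail"], "overflow": ["no", "yes"]}
records = (
    [{"pump": "ok", "valve": "ok", "overflow": "no"}] * 95
    + [{"pump": "fail", "valve": "ok", "overflow": "no"}] * 3
    + [{"pump": "fail", "valve": "ok", "overflow": "yes"}] * 1
    + [{"pump": "ok", "valve": "fail", "overflow": "no"}] * 1
)

cpt = estimate_cpt(records, ["pump", "valve"], "overflow", states)
missing = [key for key, dist in cpt.items() if dist is None]
print("Unobserved parent configurations:", missing)
```

In this toy data set the configuration in which both parents fail has no observations, so its CPT entry is left undefined. That is the kind of entry the proposed method is intended to fill in from the neighboring, observed distributions.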
[1] Lauritzen S. L., Spiegelhalter D. J., “Local computations with probabilities on graphical structures and their application to expert systems (with discussion),” Journal of the Royal Statistical Society, Series B 50(2), 157–224 (1988).
[2] Gelman A., Meng X.-L., “Simulating normalizing constants: From importance sampling to bridge sampling to path sampling,” Statistical Science 13(2), 163–185 (1998).