Class CaseStatisticsAnalyzer
- java.lang.Object
-
- org.processmining.plugins.stochasticpetrinet.analyzer.CaseStatisticsAnalyzer
-
public class CaseStatisticsAnalyzer extends java.lang.Object
Provides statistical analysis for outliers using non-parametric density estimation, with foundations in:
Dit-Yan Yeung and C. Chow. Parzen-Window Network Intrusion Detectors. In Pattern Recognition, 2002. Proceedings. 16th International Conference on, volume 4, pages 385–388 vol.4, 2002.
Extensions are made to exploit dependencies in business processes.
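As background for the density estimation mentioned above, the following is a minimal sketch of a Parzen-window estimate with a Gaussian kernel. It is not taken from this class: the class name ParzenWindowSketch, the method density, the sample durations, and the bandwidth value are all assumptions chosen for illustration.

```java
class ParzenWindowSketch {

    /**
     * Gaussian-kernel Parzen-window density estimate at point x.
     * samples: observed values (hypothetical activity durations here)
     * h: window width (bandwidth), must be positive
     */
    static double density(double[] samples, double h, double x) {
        if (samples == null || samples.length == 0 || h <= 0) {
            throw new IllegalArgumentException("need at least one sample and h > 0");
        }
        double sum = 0.0;
        for (double s : samples) {
            double u = (x - s) / h;
            // standard normal kernel evaluated at the scaled distance
            sum += Math.exp(-0.5 * u * u) / Math.sqrt(2.0 * Math.PI);
        }
        return sum / (samples.length * h);
    }

    public static void main(String[] args) {
        // Illustrative durations only; not data from any real log.
        double[] durations = {4.8, 5.0, 5.1, 5.3, 4.9};
        System.out.println("density near the bulk: " + density(durations, 0.5, 5.0));
        System.out.println("density far away:      " + density(durations, 0.5, 12.0));
    }
}
```

A value whose estimated density is very low relative to the bulk of the observations is a candidate outlier; how this class sets its cutoffs is not described on this page.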
- Author:
- Andreas Rogge-Solti
-
-
Constructor Summary
Constructors
- CaseStatisticsAnalyzer()
- CaseStatisticsAnalyzer(StochasticNet stochasticNet, org.processmining.models.semantics.petrinet.Marking initialMarking, CaseStatisticsList statistics)
-
Method Summary
Modifier and Type, Method, Description
- double computePValueByApproximateIntegration(org.apache.commons.math3.distribution.RealDistribution dist, double x)
- double computePValueByApproximateIntegration(ReplayStep step)
- CaseStatisticsList getCaseStatistics()
- java.util.List<ReplayStep> getIndividualOutlierSteps(CaseStatistics selectedCaseStatistics)
- org.processmining.models.semantics.petrinet.Marking getInitialMarking()
- java.lang.Double getLogLikelihoodCutoff(TimedTransition tt)
- org.apache.commons.math3.distribution.RealDistribution getLogLikelihoodDistribution(TimedTransition transition)
- int getMaxActivityCount()
- double[] getModelDensities(ReplayStep x, org.apache.commons.math3.distribution.RealDistribution assumedErrorDistribution, double assumedErrorRate): Returns the likelihood ratio of the ReplayStep x stemming from an error distribution, or from the original distribution.
- CaseStatisticsList getOrderedList()
- int getOutlierCount(CaseStatistics cs)
- double getOutlierRate()
- double getPValueOfStepIntegral(ReplayStep step)
- java.util.List<ReplayStep> getRegularSteps(CaseStatistics selectedCaseStatistics)
- StochasticNet getStochasticNet()
- boolean isOutlierLikelyToBeAnError(ReplayStep step): Let X be this node's random duration variable having the value x.
- void setCaseStatistics(CaseStatisticsList caseStatistics)
- void setOutlierRate(double outlierRate)
- void updateLikelihoodCutoffs()
- void updateStatistics(double outlierRate)
-
-
-
Constructor Detail
-
CaseStatisticsAnalyzer
public CaseStatisticsAnalyzer()
-
CaseStatisticsAnalyzer
public CaseStatisticsAnalyzer(StochasticNet stochasticNet, org.processmining.models.semantics.petrinet.Marking initialMarking, CaseStatisticsList statistics)
-
-
Method Detail
-
getOrderedList
public CaseStatisticsList getOrderedList()
-
getOutlierRate
public double getOutlierRate()
-
setOutlierRate
public void setOutlierRate(double outlierRate)
-
getCaseStatistics
public CaseStatisticsList getCaseStatistics()
-
setCaseStatistics
public void setCaseStatistics(CaseStatisticsList caseStatistics)
-
getStochasticNet
public StochasticNet getStochasticNet()
-
getMaxActivityCount
public int getMaxActivityCount()
-
getOutlierCount
public int getOutlierCount(CaseStatistics cs)
-
getModelDensities
public double[] getModelDensities(ReplayStep x, org.apache.commons.math3.distribution.RealDistribution assumedErrorDistribution, double assumedErrorRate)
Returns the likelihood ratio of the ReplayStep x stemming from an error distribution, or from the original distribution. Assumes that there is only one child (a weighted average of scores could be used for multiple children).
The method assumes an error distribution that can shift the duration of this step and thereby also affect the duration of the next step. It compares the joint probability of the two durations x and y (y is the duration of the activity that follows x) in the original model, learned from historical observations, with the distribution that results from adding an error along the line y = -x. The latter is appropriate because, if x is a measurement error, it affects the duration of the child in the opposite direction. For example, when the end of x is mistakenly recorded too late, the duration of y is also affected (it is shorter than expected).
- Parameters:
- x - ReplayStep to compute the error score for
- assumedErrorDistribution - the RealDistribution that is assumed as noise in the data for measurement errors
- assumedErrorRate - the rate of error occurrence (must be between 0 inclusive and 1 exclusive)
- Returns:
- densities of the models:
- index 0 contains density of p(x,y) original model,
- index 1 contains density of p(x,y) error-model
- index 2 contains the weighted ratio for x,y according to the assumed error rate
- index 3 contains density of original p(x)
- index 4 contains density of error-model for p(x)
- index 5 contains the weighted ratio for x according to the assumed error rate
- index 6 contains density of original p(y)
- index 7 contains density of error-model for p(y)
- index 8 contains the weighted ratio for y according to the assumed error rate
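Because the result is a bare double[] whose meaning depends on position, callers may find it easier to read through named accessors. The sketch below is a hypothetical wrapper, not part of this API: the class ModelDensitiesView and its method names are assumptions, and only the index layout is taken from the list above.

```java
final class ModelDensitiesView {
    // Index layout as documented for the getModelDensities return value.
    static final int JOINT_ORIGINAL = 0, JOINT_ERROR = 1, JOINT_RATIO = 2;
    static final int X_ORIGINAL = 3, X_ERROR = 4, X_RATIO = 5;
    static final int Y_ORIGINAL = 6, Y_ERROR = 7, Y_RATIO = 8;

    private final double[] d;

    ModelDensitiesView(double[] densities) {
        if (densities == null || densities.length != 9) {
            throw new IllegalArgumentException("expected an array with 9 entries");
        }
        this.d = densities.clone(); // defensive copy
    }

    double jointOriginal() { return d[JOINT_ORIGINAL]; } // p(x,y), original model
    double jointError()    { return d[JOINT_ERROR]; }    // p(x,y), error model
    double jointRatio()    { return d[JOINT_RATIO]; }    // weighted ratio for x,y
    double xOriginal()     { return d[X_ORIGINAL]; }     // p(x), original model
    double xError()        { return d[X_ERROR]; }        // p(x), error model
    double xRatio()        { return d[X_RATIO]; }        // weighted ratio for x
    double yOriginal()     { return d[Y_ORIGINAL]; }     // p(y), original model
    double yError()        { return d[Y_ERROR]; }        // p(y), error model
    double yRatio()        { return d[Y_RATIO]; }        // weighted ratio for y

    public static void main(String[] args) {
        // Placeholder numbers; a real array would come from getModelDensities.
        double[] result = {0.10, 0.02, 0.5, 0.30, 0.05, 0.6, 0.25, 0.04, 0.7};
        ModelDensitiesView view = new ModelDensitiesView(result);
        System.out.println("joint weighted ratio: " + view.jointRatio());
    }
}
```

With the real library on the classpath, the array passed to the constructor would be the value returned by getModelDensities(x, assumedErrorDistribution, assumedErrorRate).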
-
isOutlierLikelyToBeAnError
public boolean isOutlierLikelyToBeAnError(ReplayStep step)
Let X be this node's random duration variable having the value x. Assume that step is an outlier by itself (i.e., p(X = x) is very low compared to the usual values). Let parents be a function assigning the parents to a random duration variable.
We compare the probability P(children | X = x) with the marginal probability P(children | parents(X)). If the marginal probability is higher than the one given X = x, we assume that it is a single (measurement) error in the log. Otherwise, we assume that X fits with the following events and is just a regular outlier.
- Parameters:
- step - ReplayStep
Example:
   U   V     <- parents (if there are more than one, it was a parallel split)
    \ /
     X       <- variable
    / \
   Y   Z     <- children (if there are more than one, the process forked into multiple parallel branches)
Here, we compute P(Y=y, Z=z | X=x) and compare it with the integral over X of P(Y=y, Z=z, X | U=u, V=v). That is, we compare
                    u   v
                     \ /
     x      with      X      <- and integrate over all the values of X
    / \              / \
   y   z            y   z
- Returns:
- boolean indicating whether the outlier is likely to be an error.
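The comparison described for isOutlierLikelyToBeAnError can be illustrated with a toy model. The sketch below is not this class's implementation: it assumes a single child Y, collapses the parents into one density for X, uses made-up normal densities, and approximates the integral with the trapezoidal rule. All names (OutlierErrorSketch, marginalChildDensity, likelyError) are hypothetical.

```java
import java.util.function.DoubleBinaryOperator;
import java.util.function.DoubleUnaryOperator;

class OutlierErrorSketch {

    static double normalPdf(double v, double mean, double sd) {
        double u = (v - mean) / sd;
        return Math.exp(-0.5 * u * u) / (sd * Math.sqrt(2.0 * Math.PI));
    }

    /** Trapezoidal approximation of the integral over X of p(y | X) * p(X | parents). */
    static double marginalChildDensity(DoubleUnaryOperator pXGivenParents,
                                       DoubleBinaryOperator pYGivenX,
                                       double y, double lo, double hi, int steps) {
        double h = (hi - lo) / steps;
        double sum = 0.0;
        for (int i = 0; i <= steps; i++) {
            double xi = lo + i * h;
            double f = pYGivenX.applyAsDouble(y, xi) * pXGivenParents.applyAsDouble(xi);
            sum += (i == 0 || i == steps) ? 0.5 * f : f; // half weight at the end points
        }
        return sum * h;
    }

    /** True if the child fits better when X is integrated out than when X = x is taken as given. */
    static boolean likelyError(DoubleUnaryOperator pXGivenParents,
                               DoubleBinaryOperator pYGivenX,
                               double x, double y, double lo, double hi, int steps) {
        double conditional = pYGivenX.applyAsDouble(y, x);
        double marginal = marginalChildDensity(pXGivenParents, pYGivenX, y, lo, hi, steps);
        return marginal > conditional;
    }

    public static void main(String[] args) {
        // Toy assumptions: X | parents ~ N(5, 1) and Y | X = x ~ N(2x, 1).
        DoubleUnaryOperator prior = v -> normalPdf(v, 5.0, 1.0);
        DoubleBinaryOperator cond = (yy, xx) -> normalPdf(yy, 2.0 * xx, 1.0);
        // x = 12 is an outlier; y = 10 matches typical X, so x looks like an error.
        System.out.println(likelyError(prior, cond, 12.0, 10.0, 0.0, 20.0, 2000));
        // y = 24 matches x = 12, so x looks like a regular outlier.
        System.out.println(likelyError(prior, cond, 12.0, 24.0, 0.0, 20.0, 2000));
    }
}
```

In the first call the child duration is typical for ordinary values of X, so the marginal density wins and the outlier is flagged as a probable error; in the second call the child duration agrees with the unusual x, so the step is treated as a regular outlier.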
-
getPValueOfStepIntegral
public double getPValueOfStepIntegral(ReplayStep step)
-
computePValueByApproximateIntegration
public double computePValueByApproximateIntegration(ReplayStep step)
-
computePValueByApproximateIntegration
public double computePValueByApproximateIntegration(org.apache.commons.math3.distribution.RealDistribution dist, double x)
-
getIndividualOutlierSteps
public java.util.List<ReplayStep> getIndividualOutlierSteps(CaseStatistics selectedCaseStatistics)
-
getRegularSteps
public java.util.List<ReplayStep> getRegularSteps(CaseStatistics selectedCaseStatistics)
-
getInitialMarking
public org.processmining.models.semantics.petrinet.Marking getInitialMarking()
-
getLogLikelihoodCutoff
public java.lang.Double getLogLikelihoodCutoff(TimedTransition tt)
-
getLogLikelihoodDistribution
public org.apache.commons.math3.distribution.RealDistribution getLogLikelihoodDistribution(TimedTransition transition)
-
updateStatistics
public void updateStatistics(double outlierRate)
-
updateLikelihoodCutoffs
public void updateLikelihoodCutoffs()
-
-