
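Applying the Gaussian Distribution to Anomaly Detection (Part 2)

(Original article; please credit the source when reposting.)

The article Applying the Gaussian Distribution to Anomaly Detection (Part 1) described how to use the Gaussian distribution to solve anomaly detection problems. This article implements in R the two models described there: a model built from multiple univariate Gaussian distributions and a model built from a single multivariate Gaussian distribution.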
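For reference, the two densities that the code below evaluates can be written as follows. The notation is assumed here and may differ from that of Part 1: n is the number of features, \mu_j and \sigma_j are the mean and standard deviation of feature j, and \mu and \Sigma are the mean vector and covariance matrix.

```latex
% Multiple univariate Gaussians: product of the per-feature densities
p(x) = \prod_{j=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma_j}
       \exp\!\left(-\frac{(x_j-\mu_j)^2}{2\sigma_j^2}\right)

% One multivariate Gaussian
p(x) = \frac{1}{(2\pi)^{n/2}\,|\Sigma|^{1/2}}
       \exp\!\left(-\tfrac{1}{2}(x-\mu)^{\top}\Sigma^{-1}(x-\mu)\right)
```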

1. Multiple univariate Gaussian distributions model

## Parameters:
##     x  -  a vector holding the data of the new sample.
##     X  -  a matrix holding the training samples' data (one sample per row).
##     parameterFile - path of the parameter file,
##                     which stores the parameters of the MultiUnivariate Norm model.
##     isTraining - flag: TRUE triggers training,
##                        FALSE skips training and loads the parameters from the file.
funMultiUnivariateNorm <- function(x, X = NULL, parameterFile = ".MultiUnivariateNorm", isTraining = FALSE)
{
    if (isTraining) {
        if (is.null(X)) {
            cat("X is NULL, MultiUnivariateNorm model can't be trained\n")
            return(invisible(NULL))  # a bare `return` without parentheses does not exit the function
        }

        vectrMean <- colMeans(X)
        vectrSD <- apply(X, 2, sd)  # per-feature standard deviation

        ## write the parameters to the file
        ##   1st line: the means, separated by one blank
        ##   2nd line: the SDs, separated by one blank
        matrixMeanSD <- rbind(vectrMean, vectrSD)
        # validation of parameterFile is left to write.table
        write.table(x=matrixMeanSD, file=parameterFile, row.names=FALSE, col.names=FALSE, sep=" ")
    } else {
        matrixMeanSD <- as.matrix(read.table(file=parameterFile))
        vectrMean <- matrixMeanSD[1,]
        vectrSD <- matrixMeanSD[2,]
    }

    vectrProbabilityNewSample <- dnorm(x, mean = vectrMean, sd = vectrSD, log = FALSE)
    prod(vectrProbabilityNewSample)  # joint probability density of the new sample
}
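The same calculation can be sketched on synthetic data to show how the returned density might be used to flag anomalies. Everything specific here is an assumption made for illustration: the generated data, the helper name scoreSample, and the threshold epsilon, which would normally be tuned on labelled validation data.

```r
# Synthetic training data (assumption): 200 normal samples with 2 features
set.seed(42)
X <- cbind(rnorm(200, mean = 10, sd = 2), rnorm(200, mean = -3, sd = 0.5))

# Fit one univariate Gaussian per feature
mu <- colMeans(X)
sigma <- apply(X, 2, sd)

# Joint density of a sample: product of the per-feature densities
# (scoreSample is a hypothetical helper, not part of the article's code)
scoreSample <- function(x) prod(dnorm(x, mean = mu, sd = sigma))

pTypical <- scoreSample(c(10, -3))  # close to the feature means
pOutlier <- scoreSample(c(25, 0))   # far from the feature means

# Hypothetical threshold: samples with density below epsilon are flagged
epsilon <- 1e-4
isAnomaly <- c(typical = pTypical < epsilon, outlier = pOutlier < epsilon)
print(isAnomaly)
```

With the function above, the equivalent calls would be funMultiUnivariateNorm(x, X, isTraining = TRUE) to train and score, followed by funMultiUnivariateNorm(x) to score later samples from the saved parameter file.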

 

2. Single multivariate Gaussian distribution model

## This function requires the mvtnorm package.
## Install it with install.packages("mvtnorm")
## and load it into the R workspace with library(mvtnorm).
##
## Parameters:
##     x  -  a vector: the data of one sample to be evaluated by the Multivariate Norm model; or
##           a matrix: each row is one sample to be evaluated by the Multivariate Norm model.
##     X  -  a matrix holding the training samples' data (one sample per row).
##     parameterFile - path of the parameter file,
##                     which stores the parameters of the Multivariate Norm model.
##     isTraining - flag: TRUE triggers training,
##                        FALSE skips training and loads the parameters from the file.
funMultivariateNorm <- function(x, X = NULL, parameterFile = ".MultivariateNorm", isTraining = FALSE)
{
    if (isTraining) {
        if (is.null(X)) {
            cat("X is NULL, MultivariateNorm model can't be trained\n")
            return(invisible(NULL))  # a bare `return` without parentheses does not exit the function
        }

        vectrMean <- colMeans(X)
        matrixSigma <- cov(X)
        ## write the parameters to the file
        ##   1st line: the means, separated by one blank
        ##   2nd to last line: the rows of the covariance matrix, separated by one blank
        matrixMeanCov <- rbind(vectrMean, matrixSigma)
        # validation of parameterFile is left to write.table
        write.table(x=matrixMeanCov, file=parameterFile, row.names=FALSE, col.names=FALSE, sep=" ")
    } else {
        matrixMeanCov <- as.matrix(read.table(file=parameterFile))
        vectrMean <- matrixMeanCov[1,]
        matrixSigma <- matrixMeanCov[-1, , drop = FALSE]  # every row after the first
    }

    dmvnorm(x, mean = vectrMean, sigma = matrixSigma, log = FALSE) # probability density of the new sample(s)
}
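To make clear what dmvnorm evaluates, the multivariate density can also be computed with base R alone. This is a minimal sketch, not the mvtnorm implementation: the helper name dmvnormBase, the synthetic correlated data, and the two test points are all assumptions chosen for illustration.

```r
# Multivariate Gaussian density using only base R
# (dmvnormBase is a hypothetical helper name)
dmvnormBase <- function(x, mean, sigma) {
    k <- length(mean)
    d2 <- mahalanobis(x, center = mean, cov = sigma)  # squared Mahalanobis distance
    logDet <- as.numeric(determinant(sigma, logarithm = TRUE)$modulus)
    exp(-0.5 * (k * log(2 * pi) + logDet + d2))
}

# Synthetic data with two positively correlated features (assumption)
set.seed(1)
z <- matrix(rnorm(400), ncol = 2)
X <- cbind(z[, 1], 0.8 * z[, 1] + 0.6 * z[, 2])

mu <- colMeans(X)
Sigma <- cov(X)

# One point that follows the correlation and one that violates it
pAlong <- dmvnormBase(c(1.5, 1.2), mu, Sigma)
pAgainst <- dmvnormBase(c(1.5, -1.2), mu, Sigma)
print(c(along = pAlong, against = pAgainst))

# With a diagonal covariance the result reduces to the product of
# univariate densities, i.e. the model of section 1
pDiag <- dmvnormBase(c(1, 2), mean = c(0, 0), sigma = diag(2))
pProd <- prod(dnorm(c(1, 2)))
```

The point that goes against the correlation gets a much lower density than the one that follows it, even though both lie at similar distances from the mean in each individual feature. That difference is what the multivariate model can capture and the per-feature model cannot.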
