In my previous post, I described text featurization using MicrosoftML.

In this post, I give a brief introduction to anomaly detection with MicrosoftML.

Note: As I mentioned in the previous post, MicrosoftML is currently available on Windows only (not on Linux, including Spark clusters). Sorry, but please wait for an update.

MicrosoftML provides a one-class support vector machine (OC-SVM) function named `rxOneClassSvm`, which is used for unbalanced binary classification. This function is an **unsupervised learner**, i.e., it doesn’t need to know about the possible anomalies in the training phase. (Only normal data is used for training; the data is mapped into a high-dimensional space and separated there by the optimal hyperplane.)

First, here is a brief example of this function.

```
library(MicrosoftML)

# train data with normal data
train_count <- 500
ndivall <- rnorm(train_count)
ndivnorm <- (ndivall - min(ndivall)) / (max(ndivall) - min(ndivall))
traindata <- data.frame(AvailableMemory = round(200 * ndivnorm, digits = 2))
ndivall <- rnorm(train_count)
ndivnorm <- (ndivall - min(ndivall)) / (max(ndivall) - min(ndivall))
traindata$DiskIO <- round(100 * ndivnorm, digits = 2)

# test data with some anomaly data
test_count <- 10
ndivall <- rnorm(test_count)
ndivnorm <- (ndivall - min(ndivall)) / (max(ndivall) - min(ndivall))
testdata <- data.frame(AvailableMemory = round(200 * ndivnorm, digits = 2))
ndivall <- rnorm(test_count)
ndivnorm <- (ndivall - min(ndivall)) / (max(ndivall) - min(ndivall))
testdata$DiskIO <- round(100 * ndivnorm, digits = 2)
testdata$AvailableMemory[c(3, 7)] <- c(100, 0)
testdata$DiskIO[c(3, 7)] <- c(150, 120)

# train by OC-SVM with normal data
model <- rxOneClassSvm(
  formula = ~AvailableMemory + DiskIO,
  data = traindata)

# predict
result <- rxPredict(
  model,
  data = testdata,
  extraVarsToWrite = c("AvailableMemory", "DiskIO"))
```

As you can see, rows #3 and #7 in the test data are the outliers.

The following illustrates the data map, with the normal data shown as blue dots and the outlier data as red dots.
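A plot of this kind can be sketched with base R graphics. The snippet below is only an illustration: it regenerates data in the same way as the example above, and the helper name `scaled_noise`, the seed, and the axis limits are my own assumptions rather than part of the original example.

```r
# Hypothetical helper: min-max scaled Gaussian noise, as in the example above
scaled_noise <- function(n, upper) {
  x <- rnorm(n)
  round(upper * (x - min(x)) / (max(x) - min(x)), digits = 2)
}

set.seed(1)  # arbitrary seed, only to make this sketch reproducible
normal <- data.frame(
  AvailableMemory = scaled_noise(500, 200),
  DiskIO = scaled_noise(500, 100))

# The two outlier rows (#3 and #7) from the test data
outlier <- data.frame(
  AvailableMemory = c(100, 0),
  DiskIO = c(150, 120))

# Normal data in blue, outliers in red
plot(normal$AvailableMemory, normal$DiskIO,
  col = "blue", pch = 16,
  xlim = c(0, 200), ylim = c(0, 160),
  xlab = "AvailableMemory", ylab = "DiskIO")
points(outlier$AvailableMemory, outlier$DiskIO, col = "red", pch = 16)
```

Because the normal `DiskIO` values are scaled into the range 0 to 100, both outliers sit above the cloud of blue dots.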

The following is the result. The outlier data in rows #3 and #7 are scored as follows.

Let’s look at a real scenario.

Here I use the “Breast Cancer Wisconsin Data Set” (see here). This data includes the patient id, the diagnosis result (M = malignant, B = benign), and many attributes computed from a digitized image of a breast mass (radius, texture, perimeter, etc.). This sample has many dimensions.

This dataset is already well-formed for analysis, but in a real application you must do some work before training, such as selecting appropriate attributes, vectorizing, cleaning data, eliminating dependencies, etc.

```
8510426, B, 13.54, 14.36, 87.46, ...
8510653, B, 13.08, 15.71, 85.63, ...
8510824, B, 9.504, 12.44, 60.34, ...
...
```
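As one small illustration of the “eliminating dependencies” step mentioned above, strongly correlated attributes (radius and perimeter, for example, are likely to move together) can be dropped before training. The following is a minimal base R sketch; the helper name `drop_correlated`, the 0.95 cutoff, and the toy data are hypothetical choices, not a recommendation for this dataset.

```r
# Hypothetical helper: drop every column whose absolute correlation with
# any earlier column exceeds the cutoff (a simple greedy rule)
drop_correlated <- function(df, cutoff = 0.95) {
  cm <- abs(cor(df))
  # keep only the lower triangle, so row i holds correlations with columns before i
  cm[upper.tri(cm, diag = TRUE)] <- 0
  keep <- !apply(cm, 1, function(row) any(row > cutoff))
  df[, keep, drop = FALSE]
}

# Toy data for illustration only: b is an exact linear function of a
toy <- data.frame(
  a = 1:10,
  b = 2 * (1:10) + 1,
  c = c(3, 1, 4, 1, 5, 9, 2, 6, 5, 3))
reduced <- drop_correlated(toy)
names(reduced)
```

In the walkthrough below I simply use all the attributes, so this step is optional there.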

Here I train and predict with the following steps.

- Split the original data into training data and test data.
- Create the trained model with `rxOneClassSvm` and the training data. We use all the attributes except for the patient id and the result (‘M’ or ‘B’) for training.
- Predict with the generated model on the test data, and evaluate the results. (Here I use the ROCR package.)

The programming example is as follows:

```
library("MicrosoftML")
library("ROCR")

# read data
# (forward slashes avoid backslash escapes such as "\t" in the path;
#  the file has no header row, so header = FALSE keeps the first record)
alldata <- read.csv(
  "C:/tmp/wdbc.data",
  header = FALSE,
  col.names = c(
    "patientid",
    "outcome",
    "radius_mean",
    "texture_mean",
    "perimeter_mean",
    "area_mean",
    "smoothness_mean",
    "compactness_mean",
    "concavity_mean",
    "concavepoints_mean",
    "symmetry_mean",
    "fractaldimension_mean",
    "radius_error",
    "texture_error",
    "perimeter_error",
    "area_error",
    "smoothness_error",
    "compactness_error",
    "concavity_error",
    "concavepoints_error",
    "symmetry_error",
    "fractaldimension_error",
    "radius_worst",
    "texture_worst",
    "perimeter_worst",
    "area_worst",
    "smoothness_worst",
    "compactness_worst",
    "concavity_worst",
    "concavepoints_worst",
    "symmetry_worst",
    "fractaldimension_worst"))

# split data
# (Note that all training data must be normal data)
traindata <- alldata[1:449, ]
traindata <- traindata[traindata$outcome == "B", ]
traindata <- traindata[, !(names(traindata) %in% c("patientid", "outcome"))]
testdata <- alldata[450:nrow(alldata), ]

# train by OC-SVM with normal data
model <- rxOneClassSvm(
  formula = ~ .,
  data = traindata)

# predict using the trained model
result <- rxPredict(
  model,
  data = testdata,
  extraVarsToWrite = c("outcome"))

# evaluate results (compare with the diagnosis results) and plot
pred <- prediction(
  predictions = result$Score,
  labels = result$outcome,
  label.ordering = c('B', 'M'))
roc.perf <- performance(
  pred,
  measure = "tpr",
  x.measure = "fpr")
plot(roc.perf)
```

The following is the result plotted by ROCR. The result seems to match the diagnosis results fairly well.

`rxOneClassSvm` uses the radial basis function (RBF) as the SVM kernel function by default. For more complex cases, you can specify other kernel functions (linear, polynomial, sigmoid) with appropriate parameters.
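To put a number on how well the scores match the diagnosis, the area under the ROC curve (AUC) is a common summary. The following is a minimal base R sketch using the rank-sum identity; the function name `auc_from_scores` and the toy scores are made up for illustration and are not output from the model above.

```r
# AUC via the rank-sum (Mann-Whitney) identity:
# the probability that a random positive scores higher than a random negative
auc_from_scores <- function(scores, labels, positive) {
  is_pos <- labels == positive
  n_pos <- sum(is_pos)
  n_neg <- sum(!is_pos)
  r <- rank(scores)  # average ranks are used for ties
  (sum(r[is_pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Made-up scores for illustration only
toy_scores <- c(-0.9, -0.4, -0.2, 0.1, 0.3, 0.8)
toy_labels <- c("B", "B", "M", "B", "M", "M")
toy_auc <- auc_from_scores(toy_scores, toy_labels, positive = "M")
toy_auc
```

With the objects from the example above, this would be called as `auc_from_scores(result$Score, result$outcome, positive = "M")`. ROCR can also report this value through `performance(pred, measure = "auc")`.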

```
model <- rxOneClassSvm(
  formula = ~TestAttr1 + TestAttr2,
  kernel = polynomialKernel(a = .2, deg = 2),
  data = traindata)
```
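For intuition about what these kernels compute, the following base R sketch writes out the standard textbook forms. I assume here that the MicrosoftML kernels follow the same shapes; the function names and default values below are my own and are not the package API.

```r
# Textbook kernel forms (illustrative helpers, not MicrosoftML functions)
linear_k <- function(x, y) sum(x * y)
poly_k <- function(x, y, a = 1, bias = 0, deg = 3) (a * sum(x * y) + bias)^deg
rbf_k <- function(x, y, gamma = 1) exp(-gamma * sum((x - y)^2))
sigmoid_k <- function(x, y, gamma = 1, coef0 = 0) tanh(gamma * sum(x * y) + coef0)

x <- c(1, 2)
y <- c(3, 1)

linear_k(x, y)                  # inner product: 1*3 + 2*1
poly_k(x, y, a = 0.2, deg = 2)  # (0.2 * 5 + 0)^2
rbf_k(x, y, gamma = 0.1)        # exp(-0.1 * squared distance)
rbf_k(x, x)                     # identical points give the maximum value
```

The RBF value depends only on the distance between the two points, which is why it suits the "normal data forms a compact region" picture, while the other kernels depend on the inner product.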

May I ask what the score means in the prediction result, and how to determine a threshold for it?

Hi, this sample code trains with the “rxOneClassSvm” function using the normal data (in this case, the benign data) and stores the trained model in the “model” variable. It then predicts (scores) real data, which includes both benign and malignant cases, with the “rxPredict” function and the created model.

Conceptually, a one-class support vector machine (OC-SVM) determines the threshold as follows. (Note that the real algorithm does not follow these exact steps or this mathematical procedure. This only illustrates the simple concept.)

1. Transform the data to higher dimensions (e.g., 2 dimensions -> 3 dimensions).

2. Cut the data (normal data and others) with an appropriate hyperplane.
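As a toy illustration of these two steps (and not of what rxOneClassSvm does internally), the following base R sketch lifts 2-dimensional points into 3 dimensions by adding the squared distance from the origin as a third coordinate. The mapping, the points, and the plane z = 1 are all made up for this example.

```r
# Four points in 2 dimensions: two near the origin ("normal"), two far away
pts <- cbind(
  x1 = c(0.1, -0.3, 2, -1.5),
  x2 = c(0.2, 0.1, 2, 2.5))

# Step 1: (x1, x2) -> (x1, x2, x1^2 + x2^2)
lifted <- cbind(pts, z = rowSums(pts^2))

# Step 2: in 3 dimensions the flat plane z = 1 separates the two groups,
# although no straight line around the origin could do so in 2 dimensions
is_normal <- lifted[, "z"] < 1
is_normal
```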

If you’re not familiar with SVM, I think the following video is a very good first resource. (If you need a more complex transformation, you can specify the kernel function in OC-SVM. The video also illustrates a simple case as an example.)

Thanks for your reply! I think I was not clear in my question. I wanted to ask whether the score from the rxPredict function represents the probability of a data entry belonging to the origin/anomaly class. And in unsupervised learning, is there a good way to determine the threshold for anomalies based on the Score produced by rxPredict?

Thanks a lot! I am really new to SVM

Sorry that this isn’t an exact answer to your question, but it depends on how the data is distributed, on the parameters, etc.

In this case, RBF is used as the default kernel function, and the kernel’s “gamma” parameter, which determines how tightly it fits, is equal to 0.1 by default. The “nu” parameter controls the support vectors (the data points on the margin boundary), and it also affects the scoring values. (Please see https://msdn.microsoft.com/en-us/microsoft-r/microsoftml/packagehelp/rxoneclasssvm )

It’s better to determine the exact threshold by optimizing the parameters and evaluating against real test data.
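To make that last point concrete: if some labeled test data is available, one common heuristic is to pick the cutoff that maximizes the true positive rate minus the false positive rate (Youden’s J). The following is a minimal base R sketch, assuming that a higher Score means “more anomalous” (which is how the ROC code in the post orders the labels). The function name `best_threshold` and the toy numbers are made up for illustration; this is not something rxPredict provides.

```r
# Pick the cutoff that maximizes TPR - FPR (Youden's J) on labeled data.
# Assumption: a higher score means "more anomalous".
best_threshold <- function(scores, is_anomaly) {
  candidates <- sort(unique(scores))
  j <- sapply(candidates, function(t) {
    flagged <- scores >= t
    tpr <- sum(flagged & is_anomaly) / sum(is_anomaly)
    fpr <- sum(flagged & !is_anomaly) / sum(!is_anomaly)
    tpr - fpr
  })
  candidates[which.max(j)]
}

# Made-up scores and labels for illustration only
toy_scores <- c(-0.9, -0.4, -0.2, 0.1, 0.3, 0.8, 1.2)
toy_anomaly <- c(FALSE, FALSE, TRUE, FALSE, TRUE, TRUE, TRUE)
cutoff <- best_threshold(toy_scores, toy_anomaly)
cutoff
```

With the objects from the post, this would be called as `best_threshold(result$Score, result$outcome == "M")`. The cutoff should then be checked on data that was not used to choose it.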

Hey Tsuyoshi, I think Ellie is also asking about:

1. What is the definition of the score? (Is score the distance from the boundary to the points?)

2. What is the difference between a negative score and a positive score? (Does a negative score mean that the data is normal or abnormal?)

I may be wrong, but this is what I got from the question being asked.

Thanks for your comment.

Unlike other models or open source OC-SVM functions, rxOneClassSvm unfortunately doesn’t return a predicted label or probability. (Usually rxPredict returns a “PredictedLabel” variable, but not in the case of rxOneClassSvm. For details, see https://msdn.microsoft.com/en-us/microsoft-r/microsoftml/packagehelp/rxpredict)

As with sentiment analysis, “Score” is just a degree used to derive results; it doesn’t come with a common threshold or concrete steps for getting a predicted label. It depends on the internal implementation, and I’m sorry, but I don’t know the detailed internal algorithm for this scoring either. (I can’t give any further comment on this “Score” result.)

Let me add to my comment… The output is just a degree value produced by inference, so we must additionally define the loss function and the reject option ourselves.

In this case, a false negative (misjudging a person who really has cancer) apparently carries a much larger loss than a false positive. Moreover, if the result is below a certain confidence level, it must be “rejected” and judged by a real human doctor.

Before applying this in a real case, we must define these kinds of decision options with mathematical approaches, using actual observations.
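As a rough sketch of what such decision options could look like in code, the following base R snippet defines a three-way decision with a reject band and a simple expected loss in which a false negative costs more than a false positive. It assumes that a higher Score means “more anomalous”; the function names, the cutoffs, the costs, and the toy numbers are all hypothetical and would have to come from actual observations.

```r
# Three-way decision: "normal", "anomaly", or "reject" (hand over to a human)
decide <- function(scores, lower, upper) {
  stopifnot(lower <= upper)
  ifelse(scores >= upper, "anomaly",
    ifelse(scores <= lower, "normal", "reject"))
}

# Average loss of a single cutoff when a miss is costlier than a false alarm
expected_loss <- function(scores, is_anomaly, cutoff, cost_fn = 10, cost_fp = 1) {
  flagged <- scores >= cutoff
  fn <- sum(!flagged & is_anomaly)  # anomalies that were not flagged
  fp <- sum(flagged & !is_anomaly)  # normal cases that were flagged
  (cost_fn * fn + cost_fp * fp) / length(scores)
}

# Made-up scores and labels for illustration only
toy_scores <- c(-0.9, -0.4, -0.2, 0.1, 0.3, 0.8)
toy_anomaly <- c(FALSE, FALSE, TRUE, FALSE, TRUE, TRUE)

decisions <- decide(toy_scores, lower = -0.3, upper = 0.2)
loss_strict <- expected_loss(toy_scores, toy_anomaly, cutoff = 0.2)
loss_lenient <- expected_loss(toy_scores, toy_anomaly, cutoff = -0.3)
```

In this toy data the lower cutoff gives the smaller loss, because it trades one false positive for one avoided false negative.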

Thank you for your reply Tsuyoshi
