Accord.NET_Naive Bayes Classifier


2024-11-09 00:32:02

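This series is about understanding and learning to use the machine learning algorithms in Accord.NET, so it concentrates on which problem each algorithm addresses and how the algorithm is used. The posts are therefore mostly code: the idea is to learn from the code and form a rough outline in your head. With that said, let's start on the Bayes classifier.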

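A naive Bayes classifier is a simple probabilistic classifier based on Bayes' theorem. Put simply, it is an independent feature model: given the class, whether a particular feature is present is assumed to be unrelated to any other feature.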
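Written out, the independence assumption lets the posterior factor into one term per feature. The notation here (a class c and feature values x_1, ..., x_n) is introduced only for illustration and does not come from the original post:

```latex
P(c \mid x_1, \dots, x_n)
  = \frac{P(c)\,\prod_{i=1}^{n} P(x_i \mid c)}
         {\sum_{c'} P(c')\,\prod_{i=1}^{n} P(x_i \mid c')},
\qquad
\hat{c} = \arg\max_{c}\; P(c)\prod_{i=1}^{n} P(x_i \mid c)
```

The denominator is the same for every class, so the predicted class can be found from the numerators alone.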

TestCase1

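The famous play-tennis example (Tom Mitchell (1998)). In this example, four conditions are used to predict whether someone wants to play tennis. The condition variables are all categorical, meaning that the possible values of each variable have no ordering or numeric relationship to one another.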

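First, the representation of the problem has to be simplified. Accord.Statistics.Filters.Codification turns the problem into a codebook of integer symbols, for example Sunny becomes 0, Overcast becomes 1 and Rain becomes 2. Doing the same for the other columns yields the inputs and outputs used for training.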
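The codebook idea described above can be sketched without Accord.NET. The helper below is hypothetical (the name CodebookSketch is made up for this illustration, and it is not how Codification is implemented); it simply assumes that each distinct string receives the next free integer in order of first appearance.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, NOT Accord.NET's Codification class: it assigns each
// distinct string the next free integer, in order of first appearance.
static class CodebookSketch
{
    public static Dictionary<string, int> Build(IEnumerable<string> values)
    {
        var map = new Dictionary<string, int>();
        foreach (string v in values)
        {
            if (!map.ContainsKey(v))
                map.Add(v, map.Count); // Count is read before the new entry is added
        }
        return map;
    }

    static void Main()
    {
        string[] outlook = { "Sunny", "Sunny", "Overcast", "Rain", "Rain" };
        Dictionary<string, int> codes = Build(outlook);

        // Encode the column by looking every value up in the map.
        int[] encoded = Array.ConvertAll(outlook, v => codes[v]);
        Console.WriteLine(string.Join(", ", encoded)); // 0, 0, 1, 2, 2
    }
}
```

With this first-appearance rule, "No" would be encoded as 0 and "Yes" as 1 in the PlayTennis column, because D1 is a "No" row.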

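Next, train a Bayes model to predict the last column, PlayTennis. "Outlook", "Temperature", "Humidity" and "Wind" are used as the conditions, giving four inputs and one output. Because the inputs are categorical, the number of possible values of each variable should be specified when the model is created. If the training inputs already cover every possible value of every variable, as they do in this example, the model does not have to be created up front: the learning algorithm's Learn method checks whether the model is null and, if it is, creates one from the inputs and outputs.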

Once the classifier has been obtained, use its Decide method to compute the output for a given input.

Now for the code:

// Namespaces assumed (not shown in the original post): System.Data, System.Linq,
// Accord.Math, Accord.Statistics.Filters, Accord.MachineLearning.Bayes
public void ComputeTest()
{
    #region doc_mitchell
    DataTable data = new DataTable("Mitchell's Tennis Example");

    data.Columns.Add("Day", "Outlook", "Temperature", "Humidity", "Wind", "PlayTennis");

    data.Rows.Add("D1", "Sunny", "Hot", "High", "Weak", "No");
    data.Rows.Add("D2", "Sunny", "Hot", "High", "Strong", "No");
    data.Rows.Add("D3", "Overcast", "Hot", "High", "Weak", "Yes");
    data.Rows.Add("D4", "Rain", "Mild", "High", "Weak", "Yes");
    data.Rows.Add("D5", "Rain", "Cool", "Normal", "Weak", "Yes");
    data.Rows.Add("D6", "Rain", "Cool", "Normal", "Strong", "No");
    data.Rows.Add("D7", "Overcast", "Cool", "Normal", "Strong", "Yes");
    data.Rows.Add("D8", "Sunny", "Mild", "High", "Weak", "No");
    data.Rows.Add("D9", "Sunny", "Cool", "Normal", "Weak", "Yes");
    data.Rows.Add("D10", "Rain", "Mild", "Normal", "Weak", "Yes");
    data.Rows.Add("D11", "Sunny", "Mild", "Normal", "Strong", "Yes");
    data.Rows.Add("D12", "Overcast", "Mild", "High", "Strong", "Yes");
    data.Rows.Add("D13", "Overcast", "Hot", "Normal", "Weak", "Yes");
    data.Rows.Add("D14", "Rain", "Mild", "High", "Strong", "No");
    #endregion

    #region doc_codebook
    // Create the codification codebook, which turns the
    // string variables into independent integer symbols
    Codification codebook = new Codification(data,
        "Outlook", "Temperature", "Humidity", "Wind", "PlayTennis");

    // Extract the input/output pairs used as the training set
    DataTable symbols = codebook.Apply(data);
    int[][] inputs = symbols.ToArray<int>("Outlook", "Temperature", "Humidity", "Wind");
    int[] outputs = symbols.ToArray<int>("PlayTennis");
    #endregion

    #region doc_learn
    // Create an instance of the naive Bayes learning algorithm
    var learner = new NaiveBayesLearning();

    // Learn a naive Bayes model from the training set
    NaiveBayes nb = learner.Learn(inputs, outputs);
    #endregion

    #region doc_test
    // Test one case: will someone play tennis when it is
    // sunny, cool, humid and windy?

    // First encode the conditions as symbols using the codebook
    int[] instance = codebook.Translate("Sunny", "Cool", "High", "Strong");

    // Get the answer as an integer symbol
    int c = nb.Decide(instance); // answer will be 0

    // Translate the integer answer back to "Yes"/"No" using the codebook
    string result = codebook.Translate("PlayTennis", c); // answer is "No"

    // The probability of each possible answer can also be extracted
    double[] probs = nb.Probabilities(instance); // { 0.795, 0.205 }
    #endregion

    Assert.AreEqual("No", result);
    Assert.AreEqual(0, c);
    Assert.AreEqual(0.795, probs[0], 1e-3);
    Assert.AreEqual(0.205, probs[1], 1e-3);
    Assert.AreEqual(1, probs.Sum(), 1e-10);
    Assert.IsFalse(double.IsNaN(probs[0]));
    Assert.AreEqual(2, probs.Length);
}
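To see where the 0.795 and 0.205 in the listing above come from, the same numbers can be reproduced by counting rows. This is a minimal sketch that assumes plain relative frequencies with no smoothing; the class TennisByHand and its Posterior method are names made up for this illustration and are not part of Accord.NET.

```csharp
using System;
using System.Linq;

// Hand calculation for the play-tennis table. Hypothetical helper, not Accord.NET code.
static class TennisByHand
{
    // Columns: Outlook, Temperature, Humidity, Wind, PlayTennis (rows D1..D14).
    static readonly string[][] Rows =
    {
        new[] { "Sunny", "Hot", "High", "Weak", "No" },
        new[] { "Sunny", "Hot", "High", "Strong", "No" },
        new[] { "Overcast", "Hot", "High", "Weak", "Yes" },
        new[] { "Rain", "Mild", "High", "Weak", "Yes" },
        new[] { "Rain", "Cool", "Normal", "Weak", "Yes" },
        new[] { "Rain", "Cool", "Normal", "Strong", "No" },
        new[] { "Overcast", "Cool", "Normal", "Strong", "Yes" },
        new[] { "Sunny", "Mild", "High", "Weak", "No" },
        new[] { "Sunny", "Cool", "Normal", "Weak", "Yes" },
        new[] { "Rain", "Mild", "Normal", "Weak", "Yes" },
        new[] { "Sunny", "Mild", "Normal", "Strong", "Yes" },
        new[] { "Overcast", "Mild", "High", "Strong", "Yes" },
        new[] { "Overcast", "Hot", "Normal", "Weak", "Yes" },
        new[] { "Rain", "Mild", "High", "Strong", "No" },
    };

    // Returns the normalized posterior of each label given the four feature values.
    public static double[] Posterior(string[] features, string[] labels)
    {
        double[] scores = new double[labels.Length];
        for (int k = 0; k < labels.Length; k++)
        {
            string[][] inClass = Rows.Where(r => r[4] == labels[k]).ToArray();
            double score = (double)inClass.Length / Rows.Length; // prior P(c)
            for (int j = 0; j < features.Length; j++)
            {
                int matches = inClass.Count(r => r[j] == features[j]);
                score *= (double)matches / inClass.Length;       // P(x_j | c)
            }
            scores[k] = score;
        }
        double total = scores.Sum();
        return scores.Select(s => s / total).ToArray();
    }

    static void Main()
    {
        double[] p = Posterior(
            new[] { "Sunny", "Cool", "High", "Strong" },
            new[] { "No", "Yes" });
        Console.WriteLine("P(No) = " + p[0] + ", P(Yes) = " + p[1]);
    }
}
```

For "No" the product is 5/14 · 3/5 · 1/5 · 4/5 · 3/5 and for "Yes" it is 9/14 · 2/9 · 3/9 · 3/9 · 3/9; after normalization these give roughly 0.795 and 0.205, in line with the assertions in the test.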

TestCase2

The following example sets more specific learning options for the discrete model.

// Namespaces assumed (not shown in the original post):
// Accord.MachineLearning.Bayes, Accord.Math.Optimization.Losses
public void laplace_smoothing_missing_sample()
{
    #region doc_laplace
    // The Laplace rule handles the case where some symbol of an
    // input variable never appears in the training set. In this
    // example the second input column can take the values 0, 1
    // and 2, but only 1 and 2 actually occur in the samples.

    int[][] inputs =
    {
        //      input         output
        new [] { 0, 1 }, //  0
        new [] { 0, 2 }, //  0
        new [] { 0, 1 }, //  0
        new [] { 1, 2 }, //  1
        new [] { 0, 2 }, //  1
        new [] { 0, 2 }, //  1
        new [] { 1, 1 }, //  2
        new [] { 0, 1 }, //  2
        new [] { 1, 1 }, //  2
    };

    int[] outputs = // the corresponding classes
    {
        0, 0, 0, 1, 1, 1, 2, 2, 2,
    };

    // The training set does not cover every value we expect to see,
    // so the model has to be specified explicitly: the first input
    // has two possible symbols and the second input has three.
    var bayes = new NaiveBayes(classes: 3, symbols: new[] { 2, 3 });

    // Pass the model to the learning algorithm when creating it
    var learning = new NaiveBayesLearning()
    {
        Model = bayes
    };

    // Use the Laplace rule
    learning.Options.InnerOption.UseLaplaceRule = true;

    // Train the naive Bayes model
    learning.Learn(inputs, outputs);

    // Predict the class when the second input is 0
    int answer = bayes.Decide(new int[] { 0, 0 });
    #endregion

    Assert.AreEqual(0, answer);

    double prob = bayes.Probability(new int[] { 0, 0 }, out answer);
    Assert.AreEqual(0, answer);
    //Assert.AreEqual(0.52173913043478259, prob, 1e-10);
    Assert.AreEqual(0.44444444444444453, prob, 1e-10);

    double error = new ZeroOneLoss(outputs)
    {
        Mean = true
    }.Loss(bayes.Decide(inputs));

    Assert.AreEqual(2 / 9.0, error);
}
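The 0.444 asserted in the listing above can be checked by hand. The sketch below assumes that the Laplace rule means add-one smoothing of each conditional estimate, that is (count + 1) / (class size + number of symbols); that formula and the LaplaceByHand helper are assumptions made for this illustration rather than a description of Accord.NET's internals.

```csharp
using System;
using System.Linq;

// Hand calculation with add-one (Laplace) smoothing. Hypothetical helper, not Accord.NET code.
static class LaplaceByHand
{
    static readonly int[][] Inputs =
    {
        new[] { 0, 1 }, new[] { 0, 2 }, new[] { 0, 1 }, // class 0
        new[] { 1, 2 }, new[] { 0, 2 }, new[] { 0, 2 }, // class 1
        new[] { 1, 1 }, new[] { 0, 1 }, new[] { 1, 1 }, // class 2
    };

    static readonly int[] Outputs = { 0, 0, 0, 1, 1, 1, 2, 2, 2 };

    // Smoothed estimate of P(x_feature = value | cls):
    // (count + 1) / (classCount + symbols), so unseen values keep a nonzero probability.
    static double Conditional(int feature, int value, int cls, int symbols)
    {
        int classCount = Outputs.Count(o => o == cls);
        int count = Enumerable.Range(0, Inputs.Length)
            .Count(i => Outputs[i] == cls && Inputs[i][feature] == value);
        return (count + 1.0) / (classCount + symbols);
    }

    // Returns the normalized posterior over all classes for the input x.
    public static double[] Posterior(int[] x, int classes, int[] symbols)
    {
        double[] scores = new double[classes];
        for (int c = 0; c < classes; c++)
        {
            double score = (double)Outputs.Count(o => o == c) / Outputs.Length; // prior
            for (int j = 0; j < x.Length; j++)
                score *= Conditional(j, x[j], c, symbols[j]);
            scores[c] = score;
        }
        double total = scores.Sum();
        return scores.Select(s => s / total).ToArray();
    }

    static void Main()
    {
        double[] p = Posterior(new[] { 0, 0 }, classes: 3, symbols: new[] { 2, 3 });
        Console.WriteLine(string.Join(" ", p));
    }
}
```

For the input { 0, 0 }, the second feature contributes 1/6 for every class, so the first feature decides: the smoothed values 4/5, 3/5 and 2/5 normalize to 4/9, 3/9 and 2/9. The largest, 4/9 ≈ 0.444, belongs to class 0, which agrees with the test's expected answer and probability. Without smoothing, every class would get probability zero for this input.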

TestCase3

The following example creates a multi-class classifier that takes integer inputs and builds a discrete Bayes model.

// Namespaces assumed (not shown in the original post):
// Accord.MachineLearning.Bayes, Accord.Math.Optimization.Losses
public void ComputeTest3()
{
    #region doc_multiclass
    // Classify the following data into three classes
    int[][] inputs =
    {
        //               input         output
        new int[] { 0, 1, 1, 0 }, //  0
        new int[] { 0, 1, 0, 0 }, //  0
        new int[] { 0, 0, 1, 0 }, //  0
        new int[] { 0, 1, 1, 0 }, //  0
        new int[] { 0, 1, 0, 0 }, //  0
        new int[] { 1, 0, 0, 0 }, //  1
        new int[] { 1, 0, 0, 0 }, //  1
        new int[] { 1, 0, 0, 1 }, //  1
        new int[] { 0, 0, 0, 1 }, //  1
        new int[] { 0, 0, 0, 1 }, //  1
        new int[] { 1, 1, 1, 1 }, //  2
        new int[] { 1, 0, 1, 1 }, //  2
        new int[] { 1, 1, 0, 1 }, //  2
        new int[] { 0, 1, 1, 1 }, //  2
        new int[] { 1, 1, 1, 1 }, //  2
    };

    int[] outputs = // the corresponding output classes
    {
        0, 0, 0, 0, 0,
        1, 1, 1, 1, 1,
        2, 2, 2, 2, 2,
    };

    // Create the learning algorithm
    var learner = new NaiveBayesLearning();

    // Train the model
    NaiveBayes nb = learner.Learn(inputs, outputs);

    // Test with the first sample, which belongs to class 0
    int answer = nb.Decide(new int[] { 0, 1, 1, 0 }); // should be 0
    #endregion

    double error = new ZeroOneLoss(outputs).Loss(nb.Decide(inputs));
    Assert.AreEqual(0, error);

    // Every training sample should be assigned its own class
    for (int i = 0; i < inputs.Length; i++)
    {
        int actual = nb.Decide(inputs[i]);
        int expected = outputs[i];
        Assert.AreEqual(expected, actual);
    }
}

TestCase4

The following example uses a Gaussian model and shows how to set more specific learning options.

// Namespaces assumed (not shown in the original post): Accord.Math,
// Accord.MachineLearning.Bayes, Accord.Math.Optimization.Losses,
// Accord.Statistics.Distributions.Univariate, Accord.Statistics.Distributions.Fitting
public void learn_test()
{
    #region doc_learn
    // Classify the following inputs into three classes
    double[][] inputs =
    {
        //               input           output
        new double[] { 0, 1, 1, 0 }, //  0
        new double[] { 0, 1, 0, 0 }, //  0
        new double[] { 0, 0, 1, 0 }, //  0
        new double[] { 0, 1, 1, 0 }, //  0
        new double[] { 0, 1, 0, 0 }, //  0
        new double[] { 1, 0, 0, 0 }, //  1
        new double[] { 1, 0, 0, 0 }, //  1
        new double[] { 1, 0, 0, 1 }, //  1
        new double[] { 0, 0, 0, 1 }, //  1
        new double[] { 0, 0, 0, 1 }, //  1
        new double[] { 1, 1, 1, 1 }, //  2
        new double[] { 1, 0, 1, 1 }, //  2
        new double[] { 1, 1, 0, 1 }, //  2
        new double[] { 0, 1, 1, 1 }, //  2
        new double[] { 1, 1, 1, 1 }, //  2
    };

    int[] outputs = // the corresponding output classes
    {
        0, 0, 0, 0, 0,
        1, 1, 1, 1, 1,
        2, 2, 2, 2, 2,
    };

    // Gaussian model: each input is modeled by a normal distribution
    var teacher = new NaiveBayesLearning<NormalDistribution>();

    // Set options for the component distributions
    teacher.Options.InnerOption = new NormalOptions
    {
        Regularization = 1e-5 // avoid zero variances
    };

    // Train the model
    NaiveBayes<NormalDistribution> bayes = teacher.Learn(inputs, outputs);

    // Predict the outputs for the training inputs
    int[] predicted = bayes.Decide(inputs);

    // Estimate the model error, which should be 0
    double error = new ZeroOneLoss(outputs).Loss(predicted);

    // Predict the class of one particular input
    int answer = bayes.Decide(new double[] { 1, 0, 0, 1 }); // should be 1
    #endregion

    Assert.AreEqual(0, error);
    Assert.AreEqual(1, answer);
    Assert.IsTrue(predicted.IsEqual(outputs));
}
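The role of the regularization value is easier to see in a hand-written version of the Gaussian model. In the data above several columns are constant within a class (for example the first column is always 0 in class 0), so their variance is zero and the normal density is undefined. The sketch below assumes that regularization simply adds a small constant to each per-class variance; how Accord.NET applies NormalOptions.Regularization internally is not stated in the post, and GaussianByHand is a made-up name.

```csharp
using System;
using System.Linq;

// Gaussian naive Bayes by hand. Hypothetical helper, not Accord.NET code.
static class GaussianByHand
{
    static readonly double[][] Inputs =
    {
        new double[] { 0, 1, 1, 0 }, new double[] { 0, 1, 0, 0 }, new double[] { 0, 0, 1, 0 },
        new double[] { 0, 1, 1, 0 }, new double[] { 0, 1, 0, 0 }, // class 0
        new double[] { 1, 0, 0, 0 }, new double[] { 1, 0, 0, 0 }, new double[] { 1, 0, 0, 1 },
        new double[] { 0, 0, 0, 1 }, new double[] { 0, 0, 0, 1 }, // class 1
        new double[] { 1, 1, 1, 1 }, new double[] { 1, 0, 1, 1 }, new double[] { 1, 1, 0, 1 },
        new double[] { 0, 1, 1, 1 }, new double[] { 1, 1, 1, 1 }, // class 2
    };

    static readonly int[] Outputs = { 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2 };

    // Log of the normal density with the given mean and variance.
    static double LogDensity(double x, double mean, double variance)
    {
        double d = x - mean;
        return -0.5 * Math.Log(2 * Math.PI * variance) - d * d / (2 * variance);
    }

    // Returns the class with the highest log prior plus summed log densities.
    public static int Decide(double[] x, double regularization)
    {
        int classes = Outputs.Max() + 1;
        int best = -1;
        double bestScore = double.NegativeInfinity;
        for (int c = 0; c < classes; c++)
        {
            double[][] rows = Inputs.Where((r, i) => Outputs[i] == c).ToArray();
            double score = Math.Log((double)rows.Length / Inputs.Length);
            for (int j = 0; j < x.Length; j++)
            {
                double mean = rows.Average(r => r[j]);
                // Assumption: the regularization constant is added to the variance.
                double variance = rows.Average(r => (r[j] - mean) * (r[j] - mean))
                                  + regularization;
                score += LogDensity(x[j], mean, variance);
            }
            if (score > bestScore) { bestScore = score; best = c; }
        }
        return best;
    }

    static void Main()
    {
        Console.WriteLine(Decide(new double[] { 1, 0, 0, 1 }, 1e-5)); // prints 1
    }
}
```

Working in log space avoids underflow when many small densities are multiplied, and the tiny variance floor makes a value that contradicts a constant column (such as a 1 in the first column for class 0) extremely unlikely instead of causing a division by zero.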
