
libsvm Source Code Annotation + Algorithm Description: svm_train

(I will try my best to make this note clear.) We mainly focus on solve_c_svc in this note.

Our goal (the sub-problem restricted to the working set $B$):
$$\min_{\mathbf{d}_B}\ \frac{1}{2}\mathbf{d}_B^T \nabla^2 f(\alpha^k)\,\mathbf{d}_B + \nabla f(\alpha^k)_B^T\,\mathbf{d}_B$$
$$\text{s.t.}\quad \mathbf{y}_B^T \mathbf{d}_B = 0$$
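For reference, the C-SVC dual objective being minimized is
$$f(\alpha) = \frac{1}{2}\alpha^T Q \alpha + \mathbf{p}^T \alpha, \qquad Q_{ij} = y_i y_j K(x_i, x_j), \quad \mathbf{p} = -\mathbf{1},$$
subject to $\mathbf{y}^T \alpha = 0$ and $0 \le \alpha_t \le C$; the problem above is its quadratic model restricted to the working set $B$.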

The core training function is solve_c_svc. It calls:

s.Solve(l, SVC_Q(*prob,*param,y), minus_ones, y, alpha, Cp, Cn, param->eps, si, param->shrinking);
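For orientation, the arguments map onto the dual problem as follows (paraphrasing solve_c_svc in svm.cpp; the comments are mine):

// l          - number of training points
// SVC_Q(...) - provides columns of Q, where Q_ij = y_i*y_j*K(x_i,x_j)
// minus_ones - the linear term p, a vector of -1's
// y          - labels in {+1,-1}
// alpha      - initialized to 0; receives the dual solution
// Cp, Cn     - penalty C for the positive / negative class
// param->eps - stopping tolerance
// si         - SolutionInfo output (rho, obj, upper bounds)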

Algorithm 2: an SMO-type decomposition method using WSS3.
After the initialization (trivial; we will discuss it later), the first step is to select the working set $B=\{i,j\}$. The chosen $i$ and $j$ are the pair that most violates the optimality condition, which makes them a natural choice; this is called the "maximal violating pair".
In WSS1 the pair satisfies:
$$i \in \arg\max_t \{\, -y_t \nabla f(\alpha^k)_t \mid t \in I_{up}(\alpha^k) \,\}$$
$$j \in \arg\min_t \{\, -y_t \nabla f(\alpha^k)_t \mid t \in I_{low}(\alpha^k) \,\}$$
where
$$I_{up}(\alpha) \equiv \{\, t \mid \alpha_t < C,\ y_t = 1 \ \text{or}\ \alpha_t > 0,\ y_t = -1 \,\}$$
$$I_{low}(\alpha) \equiv \{\, t \mid \alpha_t < C,\ y_t = -1 \ \text{or}\ \alpha_t > 0,\ y_t = 1 \,\}$$
In WSS3, $j$ is chosen differently:
$$j \in \arg\min_t \left\{\, -\frac{b_{it}^2}{\bar a_{it}} \ \middle|\ t \in I_{low}(\alpha^k),\ -y_t \nabla f(\alpha^k)_t < -y_i \nabla f(\alpha^k)_i \,\right\}$$
A detailed proof is given in Theorem 3 of Fan et al. (2005).
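For completeness, the quantities used above are defined in Fan et al. (2005) as
$$b_{it} \equiv -y_i \nabla f(\alpha^k)_i + y_t \nabla f(\alpha^k)_t > 0, \qquad a_{it} \equiv K_{ii} + K_{tt} - 2K_{it},$$
$$\bar a_{it} \equiv \begin{cases} a_{it}, & a_{it} > 0 \\ \tau, & \text{otherwise,} \end{cases}$$
where $\tau$ is a small positive constant (TAU in the code) guarding against a kernel matrix that is not positive definite.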

if(select_working_set(i,j)!=0)
{
    // reconstruct the whole gradient
    reconstruct_gradient();
    // reset active set size and check
    active_size = l;
    info("*");
    if(select_working_set(i,j)!=0)  //select_working_set: 1 optimal
        break;
    else
        counter = 1;    // do shrinking next iteration
}
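For context, select_working_set reports optimality when even the maximal violating pair violates the KKT conditions by less than eps. The check inside the libsvm source is essentially the following (a sketch, not a verbatim copy):

if(Gmax+Gmax2 < eps || Gmin_idx == -1)
    return 1;   // optimal within tolerance: m(alpha) - M(alpha) < eps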

Working set selection subroutine (WSS3):
Note that we only discuss the case $y = +1$; the other case is similar. Also note that $y$ does not appear explicitly in the code because it equals 1, which is already reflected in the signs.
Recall:
$$i \in \arg\max_t \{\, -y_t \nabla f(\alpha^k)_t \mid t \in I_{up}(\alpha^k) \,\}$$
where $I_{up}(\alpha) \equiv \{\, t \mid \alpha_t < C,\ y_t = 1 \ \text{or}\ \alpha_t > 0,\ y_t = -1 \,\}$.
We want to find a $t$ that maximizes $-y_t \nabla f(\alpha^k)_t$.
Select i:

// return i,j such that
// i: maximizes -y_i * grad(f)_i, i in I_up(\alpha)
// j: minimizes the decrease of obj value
//    (if quadratic coefficient <= 0, replace it with tau)
//    -y_j*grad(f)_j < -y_i*grad(f)_i, j in I_low(\alpha)

for(int t=0;t<active_size;t++)
    if(y[t]==+1)    
    {
        if(!is_upper_bound(t))
            if(-G[t] >= Gmax)
            {
                Gmax = -G[t];
                Gmax_idx = t;
            }
    }
    else
    {
        if(!is_lower_bound(t))
            if(G[t] >= Gmax)
            {
                Gmax = G[t];
                Gmax_idx = t;
            }
    }
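This loop matches the definition of $i$ because
$$-y_t \nabla f(\alpha^k)_t = \begin{cases} -G[t], & y_t = +1 \\ +G[t], & y_t = -1, \end{cases}$$
and $t \in I_{up}(\alpha^k)$ means "not at the upper bound" for $y_t = +1$ and "not at the lower bound" for $y_t = -1$.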

Select j:
quad_coef is $a_{ij}$ and grad_diff is $b_{ij}$. When $a_{ij} \le 0$ (the kernel matrix is not positive definite), it is replaced by the small constant $\tau$ (TAU in the code) so that the sub-problem stays strictly convex.

if (!is_lower_bound(j))
{
    double grad_diff=Gmax+G[j]; // y[t]==1, so grad_diff is b_ij
    if (G[j] >= Gmax2)
        Gmax2 = G[j];
    if (grad_diff > 0)
    {
        double obj_diff;
        double quad_coef = QD[i]+QD[j]-2.0*y[i]*Q_i[j];
        // here y[j]==1; quad_coef is "a" in Fan (2005): Q_ii+Q_tt-2*y_i*y_t*Q_it
        if (quad_coef > 0)
            obj_diff = -(grad_diff*grad_diff)/quad_coef;
        else
            obj_diff = -(grad_diff*grad_diff)/TAU;
        if (obj_diff <= obj_diff_min)
        {
            Gmin_idx=j;
            obj_diff_min = obj_diff;
        }
    }
}

After choosing $i$ and $j$ for the set $B=\{i,j\}$, we update $\alpha_i$ and $\alpha_j$:
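Before reading the code, here is the closed-form step it implements (readable directly from the two branches below), with $\bar a = Q_{ii}+Q_{jj}+2Q_{ij}$ in the first case and $\bar a = Q_{ii}+Q_{jj}-2Q_{ij}$ in the second (replaced by $\tau$ when non-positive):
$$y_i \ne y_j:\quad \delta = \frac{-G_i - G_j}{\bar a},\quad \alpha_i \leftarrow \alpha_i + \delta,\ \alpha_j \leftarrow \alpha_j + \delta \quad (\alpha_i - \alpha_j \text{ is preserved}),$$
$$y_i = y_j:\quad \delta = \frac{G_i - G_j}{\bar a},\quad \alpha_i \leftarrow \alpha_i - \delta,\ \alpha_j \leftarrow \alpha_j + \delta \quad (\alpha_i + \alpha_j \text{ is preserved}).$$
The result is then clipped back onto the box $[0, C_i] \times [0, C_j]$ along the equality constraint.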

const Qfloat *Q_i = Q.get_Q(i,active_size);
const Qfloat *Q_j = Q.get_Q(j,active_size);
double C_i = get_C(i);
double C_j = get_C(j);
double old_alpha_i = alpha[i];
double old_alpha_j = alpha[j];
if(y[i]!=y[j])
{
    double quad_coef = QD[i]+QD[j]+2*Q_i[j];
    if (quad_coef <= 0)
        quad_coef = TAU;
    double delta = (-G[i]-G[j])/quad_coef;
    double diff = alpha[i] - alpha[j];
    alpha[i] += delta;
    alpha[j] += delta;

    if(diff > 0)
    {
        if(alpha[j] < 0)
        {
            alpha[j] = 0;
            alpha[i] = diff;
        }
    }
    else
    {
        if(alpha[i] < 0)
        {
            alpha[i] = 0;
            alpha[j] = -diff;
        }
    }
    if(diff > C_i - C_j)
    {
        if(alpha[i] > C_i)
        {
            alpha[i] = C_i;
            alpha[j] = C_i - diff;
        }
    }
    else
    {
        if(alpha[j] > C_j)
        {
            alpha[j] = C_j;
            alpha[i] = C_j + diff;
        }
    }
}
else
{
    double quad_coef = QD[i]+QD[j]-2*Q_i[j];
    if (quad_coef <= 0)
        quad_coef = TAU;
    double delta = (G[i]-G[j])/quad_coef;
    double sum = alpha[i] + alpha[j];
    alpha[i] -= delta;
    alpha[j] += delta;
    if(sum > C_i)
    {
        if(alpha[i] > C_i)
        {
            alpha[i] = C_i;
            alpha[j] = sum - C_i;
        }
    }
    else
    {
        if(alpha[j] < 0)
        {
            alpha[j] = 0;
            alpha[i] = sum;
        }
    }
    if(sum > C_j)
    {
        if(alpha[j] > C_j)
        {
            alpha[j] = C_j;
            alpha[i] = sum - C_j;
        }
    }
    else
    {
        if(alpha[i] < 0)
        {
            alpha[i] = 0;
            alpha[j] = sum;
        }
    }
}

Then the gradient $G$, i.e. $\nabla f(\alpha^{k+1})$, is updated over the active set:

double delta_alpha_i = alpha[i] - old_alpha_i;
double delta_alpha_j = alpha[j] - old_alpha_j;

for(int k=0;k<active_size;k++)
{
    G[k] += Q_i[k]*delta_alpha_i + Q_j[k]*delta_alpha_j;
}

$$Q_{BN}\alpha_N^k + p_B = \nabla_B f(\alpha^k) - Q_{BB}\alpha_B^k$$
$$\nabla f(\alpha^{k+1}) = \nabla f(\alpha^k) + Q_{:,B}(\alpha_B^{k+1} - \alpha_B^k)$$
where $Q_{:,B}$ is the sub-matrix of $Q$ containing the columns in $B$. After the sub-problem is solved, $\nabla f(\alpha^{k+1})$ is acquired. Therefore, LIBSVM maintains the gradient throughout the decomposition method.
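The solver also maintains G_bar, the contribution to the gradient from variables currently at their upper bound, $\bar G_k = \sum_{t:\,\alpha_t = C_t} C_t Q_{kt}$; reconstruct_gradient() uses it to rebuild $G$ for variables that were shrunk away. Whenever the update pushes $\alpha_i$ or $\alpha_j$ across its upper bound, G_bar must be patched: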

bool ui = is_upper_bound(i);
bool uj = is_upper_bound(j);
update_alpha_status(i);
update_alpha_status(j);
int k;
if(ui != is_upper_bound(i))
{
    Q_i = Q.get_Q(i,l);
    if(ui)
        for(k=0;k<l;k++)
            G_bar[k] -= C_i * Q_i[k];
    else
        for(k=0;k<l;k++)
            G_bar[k] += C_i * Q_i[k];
}
if(uj != is_upper_bound(j))
{
    Q_j = Q.get_Q(j,l);
    if(uj)
        for(k=0;k<l;k++)
            G_bar[k] -= C_j * Q_j[k];
    else
        for(k=0;k<l;k++)
            G_bar[k] += C_j * Q_j[k];
}

Oops, we are almost there:

    // calculate rho

    si->rho = calculate_rho();

    // calculate objective value
    {
        double v = 0;
        int i;
        for(i=0;i<l;i++)
            v += alpha[i] * (G[i] + p[i]);

        si->obj = v/2;
    }

    // put back the solution
    {
        for(int i=0;i<l;i++)
            alpha_[active_set[i]] = alpha[i];
    }

    si->upper_bound_p = Cp;
    si->upper_bound_n = Cn;
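Why v/2 is the objective value: since the maintained gradient is $G = \nabla f(\alpha) = Q\alpha + \mathbf{p}$,
$$\sum_i \alpha_i (G_i + p_i) = \alpha^T Q \alpha + 2\mathbf{p}^T \alpha = 2f(\alpha),$$
so si->obj $= v/2 = f(\alpha)$.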


INPUT & OUTPUT

After training with svm-train, the file "*.model" is produced. In svm.cpp, where almost all of the useful functions are located, the function svm_save_model is responsible for this output. The file looks like:

svm_type c_svc
kernel_type rbf
gamma 0.25
nr_class 2
total_sv 3053
rho -0.495266
label 1 0
nr_sv 1996 1057
SV
0.2759747847329309 1:26.173 2:58.867 3:-0.1894697 4:125.1225
0.5042408472321596 1:57.07397 2:221.404 3:0.08607959 4:122.9114
0.5047343497502841 1:17.259 2:173.436 3:-0.1298053 4:125.0318
0.450868920775192 1:21.7794 2:124.9531 3:0.1538853 4:152.715
0.5049158714985423 1:91.33997 2:293.5699 3:0.1423918 4:160.5402
0.5048770796985661 1:55.375 2:179.222 3:0.1654953 4:111.2273
0.5047914121992919 1:29.562 2:191.357 3:0.09901439 4:103.4076

The first column is $y_i \alpha_i$, where $\alpha$ is the dual solution (each SV carries nr_class-1 coefficients).
The data that follows: each line represents one support vector.
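To make the format concrete, here is a minimal sketch (not the libsvm API; libsvm itself does this in svm_predict_values) of how these numbers produce a two-class prediction with the RBF kernel: the decision value is $\sum_i (y_i\alpha_i) K(x_i, x) - \rho$, where the first column supplies $y_i\alpha_i$ and rho comes from the header. The helper names rbf and decision_value are hypothetical.

#include <cmath>

struct svm_node { int index; double value; }; // same layout as libsvm's

// RBF kernel K(u,v) = exp(-gamma * ||u-v||^2) on sparse vectors
// (indices ascending, each vector terminated by index == -1)
double rbf(const svm_node *u, const svm_node *v, double gamma)
{
    double sum = 0;
    while (u->index != -1 && v->index != -1)
    {
        if (u->index == v->index)
        {
            double d = u->value - v->value;
            sum += d*d; ++u; ++v;
        }
        else if (u->index < v->index) { sum += u->value*u->value; ++u; }
        else                          { sum += v->value*v->value; ++v; }
    }
    for (; u->index != -1; ++u) sum += u->value*u->value;
    for (; v->index != -1; ++v) sum += v->value*v->value;
    return std::exp(-gamma * sum);
}

// decision value = sum_i sv_coef[i]*K(SV_i, x) - rho; positive means label[0]
double decision_value(svm_node *const *SV, const double *sv_coef, int total_sv,
                      double rho, double gamma, const svm_node *x)
{
    double dec = 0;
    for (int i = 0; i < total_sv; i++)
        dec += sv_coef[i] * rbf(SV[i], x, gamma);
    return dec - rho;
}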

    for(int i=0;i<l;i++)
    {
        for(int j=0;j<nr_class-1;j++)
            fprintf(fp, "%.16g ",sv_coef[j][i]); //the first column

        const svm_node *p = SV[i];

        if(param.kernel_type == PRECOMPUTED)
            fprintf(fp,"0:%d ",(int)(p->value));
        else
            while(p->index != -1)
            {
                fprintf(fp,"%d:%.8g ",p->index,p->value); // index: 1-4; sv value
                p++;
            }
        fprintf(fp, "\n");
    }       

One part of the svm_train function:

    decision_function f = svm_train_one(prob,param,0,0);
    //receive the f from svm_train_one

    int i;
    for(i=0;i<prob->l;i++)
        if(fabs(f.alpha[i]) > 0) ++nSV;

    int j = 0;
    for(i=0;i<prob->l;i++)
        if(fabs(f.alpha[i]) > 0)
        {
            model->SV[j] = prob->x[i];
            model->sv_coef[0][j] = f.alpha[i];
            model->sv_indices[j] = i+1;
            ++j;
        }

In svm_train, the function first allocates a new svm_model and finally returns it.

    svm_model *model = Malloc(svm_model,1);

In main(), model receives it, and its data is saved to the file:

    model = svm_train(&prob,&param);
    if(svm_save_model(model_file_name,model))

read_problem reads the problem from the input file. The key parts are:

void read_problem(const char *filename)
{
    //...
    prob.y = Malloc(double,prob.l);
    prob.x = Malloc(struct svm_node *,prob.l);
    x_space = Malloc(struct svm_node,elements);
    // elements is the total number of (index,value) pairs
    prob.x[i] = &x_space[j];
    // x_space is an array of svm_node (index,value); prob.x is an array of svm_node *.
    // e.g. prob.x[0][0].value is x_space[0].value
    // prob.y holds the labels
    //...
    x_space[j].index = (int) strtol(idx,&endptr,10);
    x_space[j].value = strtod(val,&endptr);
    //...
}

Two IMPORTANT structs, actually 2+2:

struct svm_node
{
    int index;
    double value;
};

struct svm_problem
{
    int l;
    double *y;
    struct svm_node **x;
};
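For illustration, a hypothetical way to fill these two structs by hand, mirroring what read_problem builds while parsing (a sketch; it uses the structs defined above):

#include <vector>

int main()
{
    // Two points: x0 = (1:0.5, 3:1.0) with y0 = +1, x1 = (2:2.0) with y1 = -1.
    // Each sparse vector is terminated by a node with index == -1.
    std::vector<svm_node> x_space = {
        {1, 0.5}, {3, 1.0}, {-1, 0},   // point 0
        {2, 2.0}, {-1, 0}              // point 1
    };
    double y[] = {+1, -1};
    svm_node *x[] = {&x_space[0], &x_space[3]};

    svm_problem prob;
    prob.l = 2;     // number of training points
    prob.y = y;     // labels
    prob.x = x;     // prob.x[i] points into the shared x_space array
    return 0;
}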

struct svm_parameter
{
    int svm_type;
    int kernel_type;
    int degree; /* for poly */
    double gamma;   /* for poly/rbf/sigmoid */
    double coef0;   /* for poly/sigmoid */

    /* these are for training only */
    double cache_size; /* in MB */
    double eps; /* stopping criteria */
    double C;   /* for C_SVC, EPSILON_SVR and NU_SVR */
    int nr_weight;      /* for C_SVC */
    int *weight_label;  /* for C_SVC */
    double* weight;     /* for C_SVC */
    double nu;  /* for NU_SVC, ONE_CLASS, and NU_SVR */
    double p;   /* for EPSILON_SVR */
    int shrinking;  /* use the shrinking heuristics */
    int probability; /* do probability estimates */
};

struct svm_model
{
    struct svm_parameter param; /* parameter */
    int nr_class;       /* number of classes, = 2 in regression/one class svm */
    int l;          /* total #SV */
    struct svm_node **SV;       /* SVs (SV[l]) */
    double **sv_coef;   /* coefficients for SVs in decision functions (sv_coef[k-1][l]) */
    double *rho;        /* constants in decision functions (rho[k*(k-1)/2]) */
    double *probA;      /* pairwise probability information */
    double *probB;
    int *sv_indices;        /* sv_indices[0,...,nSV-1] are values in [1,...,num_training_data] to indicate SVs in the training set */

    /* for classification only */

    int *label;     /* label of each class (label[k]) */
    int *nSV;       /* number of SVs for each class (nSV[k]) */
                /* nSV[0] + nSV[1] + ... + nSV[k-1] = l */
    /* XXX */
    int free_sv;        /* 1 if svm_model is created by svm_load_model*/
                /* 0 if svm_model is created by svm_train */
};

Citations:

[1] C.-C. Chang and C.-J. Lin, "LIBSVM: A Library for Support Vector Machines." http://www.csie.ntu.edu.tw/~cjlin/papers/libsvm.pdf
[2] R.-E. Fan, P.-H. Chen, and C.-J. Lin, "Working Set Selection Using Second Order Information for Training Support Vector Machines," Journal of Machine Learning Research 6, 2005.

