Apache Commons Codec: the language package

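         The language package of Apache Commons Codec is a fairly capable package of phonetic encoders such as Soundex and RefinedSoundex, used mainly for processing words by how they sound. That said, its support for Chinese characters is very poor. There is very little material on this package online, so I could only write some fairly shallow code myself; I will improve it later if I get the chance to work with it more.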

Let's start with some examples of the encoding.

 

Examples of Soundex Coding

     Name                Letters Coded       Coding

     Allricht            l, r, c             A-462

     Eberhard            b, r, r             E-166

     Engebrethson        n, g, b             E-521

     Heimbach            m, b, c             H-512

     Hanselmann          n, s, l             H-524

     Henzelmann          n, z, l             H-524

     Hildebrand          l, d, b             H-431

     Kavanagh            v, n, g             K-152

     Lind, Van           n, d                L-530

     Lukaschowsky        k, s, s             L-222

     McDonnell           c, d, n             M-235

     McGee               c                   M-200

     O'Brien             b, r, n             O-165

     Opnian              p, n, n             O-155

     Oppenheimer         p, n, m             O-155

     Swhgler             s, l, r             S-460

     Riedemanas          d, m, n             R-355

     Zita                t                   Z-300

     Zitzmeinn           t, z, m             Z-325
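The rules behind the table can be illustrated with a small, dependency-free sketch. It is not the Commons Codec implementation: the class name SoundexSketch and the CODES string are made up for this example, and it assumes the usual American Soundex rules (keep the first letter, code the following consonants, merge adjacent letters with the same code, let H and W not separate equal codes, pad with zeros to three digits). It returns the code without the hyphen used in the table, and it expects a bare surname, so "Lind, Van" would be passed as "Lind".

```java
final class SoundexSketch {
    // Digit for each letter A..Z; '0' marks letters that are not coded (vowels, H, W, Y).
    private static final String CODES = "01230120022455012623010202";

    public static String encode(String name) {
        StringBuilder out = new StringBuilder();
        char last = '0';
        for (char raw : name.toCharArray()) {
            char c = Character.toUpperCase(raw);
            if (c < 'A' || c > 'Z') {
                continue; // this sketch ignores anything outside A..Z (apostrophes, spaces, ...)
            }
            char code = CODES.charAt(c - 'A');
            if (out.length() == 0) {
                out.append(c);   // the first letter is kept as a letter
                last = code;     // but its code still suppresses an equal neighbour
            } else if (c != 'H' && c != 'W') {
                if (code != '0' && code != last) {
                    out.append(code);
                }
                last = code;     // a vowel resets 'last', so a repeated code is coded again
            }
            if (out.length() == 4) {
                break;           // one letter plus three digits
            }
        }
        if (out.length() == 0) {
            return "";
        }
        while (out.length() < 4) {
            out.append('0');     // pad short codes with zeros
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String[] names = {"Allricht", "Hanselmann", "Henzelmann", "McGee", "Swhgler"};
        for (String name : names) {
            System.out.println(name + " -> " + encode(name));
        }
    }
}
```

Hanselmann and Henzelmann both come out as H524, which is the point of the scheme: names that sound alike share a code. Commons Codec ships its own org.apache.commons.codec.language.Soundex class for the same purpose.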

 

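         In English-language settings, especially where pronunciation matters, many words sound alike. The following program uses the RefinedSoundex class from this package to compare such words: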

package test.ffm83.commons.codec;

import org.apache.commons.codec.language.RefinedSoundex;
import org.apache.commons.lang.StringUtils;

/**
 * Measures how alike two strings sound, using the Apache Commons Codec language package.
 * difference() returns a value from 0 up to the length of the shorter encoding:
 * 0 means unrelated; 4 or more suggests the two strings probably sound similar.
 * @author 范芳铭
 */
public class EasyLanguageDiff {
    private RefinedSoundex stringEncoder = this.createStringEncoder();

    public static void main(String[] args) throws Exception {
        EasyLanguageDiff diff = new EasyLanguageDiff();
        diff.getDifference();
        diff.getEncode();
    }

    protected RefinedSoundex createStringEncoder() {
        return new RefinedSoundex();
    }

    public RefinedSoundex getStringEncoder() {
        return this.stringEncoder;
    }

    public void getDifference() throws Exception {
        // Header text means "string difference"
        System.out.println(StringUtils.center("字符串差异", 50, "-"));
        // null, empty and blank inputs all yield 0
        System.out.println(this.getStringEncoder().difference(null, null));
        System.out.println(this.getStringEncoder().difference("", ""));
        System.out.println(this.getStringEncoder().difference(" ", " "));
        System.out.println(this.getStringEncoder().difference("Margaret", "Andrew"));
        System.out.println(this.getStringEncoder().difference("Smith", "Smythe"));
        System.out.println(this.getStringEncoder().difference("Ann", "Andrew"));
        System.out.println(this.getStringEncoder().difference("Green", "Greene"));
        System.out.println(this.getStringEncoder().difference("Smithers", "Smythers"));
        System.out.println();
    }

    public void getEncode() throws Exception {
        // Header text means "language encoding"
        System.out.println(StringUtils.center("语言编码", 50, "-"));
        // Encoding is case-insensitive: both calls print the same code
        System.out.println(this.getStringEncoder().encode("testing"));
        System.out.println(this.getStringEncoder().encode("TESTING"));

        // A new instance and the shared US_ENGLISH instance give the same result
        System.out.println(this.getStringEncoder().encode("dogs"));
        System.out.println(RefinedSoundex.US_ENGLISH.encode("dogs"));
        // Chinese characters are not supported at present; passing them in throws an exception right away
        //System.out.println(this.getStringEncoder().encode("范芳铭"));
        System.out.println();
    }
}

 

The output is as follows:

----------------------字符串差异-----------------------

0

0

0

1

6

3

5

8

 

-----------------------语言编码-----------------------

T6036084

T6036084

D6043

D6043
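To see where these numbers come from, here is a minimal, dependency-free sketch of the idea rather than the library code itself. The class name RefinedSoundexSketch is made up, and the MAPPING string is assumed to match the US English mapping used by RefinedSoundex. The encoding keeps the first letter, then appends one digit per letter (including the first), dropping a digit when it repeats the previous one. The difference is simply the number of positions at which the two encodings hold the same character, which is why it can never exceed the length of the shorter encoding. Unlike the library version described above, this sketch silently skips anything outside A..Z.

```java
final class RefinedSoundexSketch {
    // Assumed US English refined mapping for A..Z; '0' covers vowels and H, W, Y.
    private static final String MAPPING = "01360240043788015936020505";

    public static String encode(String word) {
        if (word == null) {
            return null;
        }
        StringBuilder out = new StringBuilder();
        char last = '*'; // sentinel that matches no digit
        for (char raw : word.toCharArray()) {
            char c = Character.toUpperCase(raw);
            if (c < 'A' || c > 'Z') {
                continue; // sketch-only behaviour: ignore non A..Z characters
            }
            if (out.length() == 0) {
                out.append(c); // the first letter is kept, and is also coded below
            }
            char code = MAPPING.charAt(c - 'A');
            if (code != last) {
                out.append(code);
                last = code;
            }
        }
        return out.toString();
    }

    public static int difference(String a, String b) {
        String ea = encode(a);
        String eb = encode(b);
        if (ea == null || eb == null) {
            return 0;
        }
        int limit = Math.min(ea.length(), eb.length());
        int same = 0;
        for (int i = 0; i < limit; i++) {
            if (ea.charAt(i) == eb.charAt(i)) {
                same++; // count positions where both encodings agree
            }
        }
        return same;
    }

    public static void main(String[] args) {
        System.out.println(encode("testing"));
        System.out.println(encode("Smith") + " vs " + encode("Smythe"));
        System.out.println(difference("Smith", "Smythe"));
        System.out.println(difference("Margaret", "Andrew"));
    }
}
```

Under these assumptions "Smith" and "Smythe" both encode to S38060, so all six positions match and the difference is 6, while "Margaret" and "Andrew" agree at only one position.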
