Chinese Word Segmentation in R
This post segments Chinese text in two ways: with Rwordseg and with jiebaR.
R environment variables:
R_Path:
C:\Program Files\R\R-3.1.2
Path: %R_Path%
I. Chinese word segmentation with the Rwordseg package
(1) Configure the Java environment variables:
JAVA_HOME:
C:\Program Files\Java\jdk1.8.0_31
Path:
%JAVA_HOME%\bin;%JAVA_HOME%\jre\bin
CLASSPATH:
%JAVA_HOME%\lib\dt.jar;%JAVA_HOME%\lib\tools.jar
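The Java settings above can be sanity-checked from inside R before rJava is installed. The following is a minimal sketch in base R only; the helper name check_java_home is hypothetical and not part of any package, and it only checks that the directory layout looks plausible, not that the JDK actually works.

```r
# Hypothetical helper (not from any package): report whether JAVA_HOME is set
# and points to an existing directory that has a bin subdirectory.
check_java_home <- function(java_home = Sys.getenv("JAVA_HOME")) {
  # file.info() is used instead of dir.exists() so that this also runs on
  # older R versions such as 3.1.x.
  is_dir <- function(path) isTRUE(file.info(path)$isdir)

  if (!nzchar(java_home)) {
    return("JAVA_HOME is not set")
  }
  if (!is_dir(java_home)) {
    return(paste("JAVA_HOME does not exist:", java_home))
  }
  if (!is_dir(file.path(java_home, "bin"))) {
    return(paste("No bin directory under:", java_home))
  }
  "JAVA_HOME looks usable"
}

print(check_java_home())
```

If the message says JAVA_HOME is not set even though it was just configured, restarting R (or the IDE) is usually needed so that the new environment variables are picked up.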
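(2) Download the Rwordseg package to a local disk. The current version is available at https://r-forge.r-project.org/R/?group_id=1054
1. Install rJava:
> install.packages("rJava")
2. Add the following paths to the Path environment variable:
- %JAVA_HOME%\jre\bin
- %JAVA_HOME%\jre\bin\server
- %R_Path%\library\rJava\jri
3. Install the downloaded package, replacing the placeholder with the folder that contains the downloaded file:
> install.packages("path/to/downloaded/folder/Rwordseg_0.2-1.zip", repos=NULL, type="source")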
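Whether the three directories from step 2 actually ended up on Path can be checked from R. The sketch below uses base R only; the helper name missing_path_entries is hypothetical, and it assumes the JAVA_HOME and R_Path environment variables are defined as described in this post.

```r
# Hypothetical helper: return the required directories that do not appear
# in a PATH-style string.
missing_path_entries <- function(required, path = Sys.getenv("PATH"),
                                 sep = .Platform$path.sep) {
  entries <- strsplit(path, sep, fixed = TRUE)[[1]]
  normalize <- function(x) {
    x <- gsub("\\", "/", x, fixed = TRUE)  # unify Windows and R separators
    x <- sub("/+$", "", x)                 # drop trailing separators
    tolower(x)                             # Windows paths are case-insensitive
  }
  required[!(normalize(required) %in% normalize(entries))]
}

# Assumes JAVA_HOME and R_Path are set as environment variables.
java_home <- Sys.getenv("JAVA_HOME")
r_path <- Sys.getenv("R_Path")
required <- c(
  file.path(java_home, "jre", "bin"),
  file.path(java_home, "jre", "bin", "server"),
  file.path(r_path, "library", "rJava", "jri")
)
print(missing_path_entries(required))
```

An empty result means all three directories were found. The comparison is purely textual, so entries written with short 8.3 names or unexpanded variables will be reported as missing even if they point to the same place.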
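(3) Run the following commands:
> library("rJava")
> library("Rwordseg")
> words = "环卫工因在寒风中烤火取暖被辞退"  # "A sanitation worker was dismissed for warming up by a fire in the cold wind"
> segment.options(isNameRecognition = TRUE)  # turn on person-name recognition
> segmentCN(words)
Result:
[1] "环卫" "工" "因" "在" "寒风" "中" "烤火" "取暖" "被" "辞退"
With words = "我的名字是R语言" ("My name is R language") instead, the result is:
[1] "我" "的" "名字" "是" "R语言"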
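In the first result, "环卫工" (sanitation worker) is split into "环卫" and "工". One possible way to influence this is to register the term with Rwordseg's insertWords() before segmenting. The sketch below assumes Rwordseg and its Java setup are installed as described; the wrapper name segment_with_terms is hypothetical, and whether the term is then kept whole depends on the installed dictionary and package version, so no output is shown.

```r
# Hypothetical wrapper: segment `text` with Rwordseg after registering extra
# dictionary terms. Returns NULL when Rwordseg (or its Java setup) is missing.
segment_with_terms <- function(text, terms = character(0)) {
  if (!requireNamespace("Rwordseg", quietly = TRUE)) {
    message("Rwordseg is not available; skipping")
    return(NULL)
  }
  if (length(terms) > 0) {
    # Add the terms to the dictionary used by the segmenter.
    Rwordseg::insertWords(terms)
  }
  Rwordseg::segmentCN(text)
}

words <- "环卫工因在寒风中烤火取暖被辞退"
print(segment_with_terms(words, terms = "环卫工"))
```

Rwordseg also provides deleteWords() for removing a term again, which is useful when comparing results with and without a custom entry.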
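II. Chinese word segmentation with the jiebaR package
(1) Run the following commands:
> install.packages("jiebaR")  # install the jiebaR package
> library("jiebaRD")  # load the jiebaRD dictionary package
> library("jiebaR")
> words = "环卫工因在寒风中烤火取暖被辞退"
> test = worker()  # create a segmentation engine with default settings
> test <= words  # jiebaR overloads <= to run the engine on the text; equivalent to segment(words, test)
(2) Output:
[1] "环卫工" "因在" "寒风" "中" "烤火" "取暖" "被" "辞退"
With words = "我的名字是R语言" instead, the result is:
[1] "我" "的" "名字" "是" "R" "语言"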
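Here jiebaR splits "R语言" into "R" and "语言", whereas Rwordseg kept it whole. A possible way to adjust this is to add the term as a user word with new_user_word() and to call segment() explicitly instead of the <= operator. The following is a sketch that assumes jiebaR is installed; the wrapper name segment_with_user_words is hypothetical, and the resulting tokens depend on the dictionary and jiebaR version, so no output is shown.

```r
# Hypothetical wrapper: segment `text` with a default jiebaR engine after
# adding optional user words. Returns NULL when jiebaR is not installed.
segment_with_user_words <- function(text, user_words = character(0)) {
  if (!requireNamespace("jiebaR", quietly = TRUE)) {
    message("jiebaR is not installed; skipping")
    return(NULL)
  }
  engine <- jiebaR::worker()  # engine with default settings
  if (length(user_words) > 0) {
    # Register the words with this engine; tags default to "n" (noun).
    jiebaR::new_user_word(engine, user_words)
  }
  # Explicit form of `engine <= text`.
  jiebaR::segment(text, engine)
}

print(segment_with_user_words("我的名字是R语言", user_words = "R语言"))
```

Wrapping the calls in a function keeps each engine separate, so user words added for one experiment do not leak into another.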