How does LibShortText handle Chinese documents?

2024-07-31 16:46:39

LibShortText is another major work from Professor Chih-Jen Lin, following libsvm and liblinear. Its main features are:

It is more efficient than general text-mining packages. On a typical computer, processing and training 10 million short texts takes only around half an hour.
The fast training and testing is built upon the linear classifier LIBLINEAR.
Default options often work well without tedious tuning.
An interactive tool for error analysis is included. Based on the property that each short text contains few words, LibShortText provides details in predicting each text.

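So how can a tool like this be used for Chinese text?
I tried its automatic unigram feature generation on Chinese text and found that the Chinese characters were not counted as unigrams at all.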
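
For reference, here is a minimal sketch of what unigram counting over Chinese text could look like. It is independent of LibShortText and is not the library's tokenizer: the helper names `char_unigrams` and `unigram_counts` are hypothetical, and the regex assumes that each character in the basic CJK Unified Ideographs range (U+4E00 to U+9FFF) should be its own token, while each run of ASCII letters or digits stays one token.

```python
import re
from collections import Counter

# Hypothetical helpers, not part of LibShortText.
# Assumption: one token per CJK ideograph (U+4E00..U+9FFF),
# one token per run of lowercase ASCII letters/digits.
_TOKEN_RE = re.compile(r"[\u4e00-\u9fff]|[a-z0-9]+")


def char_unigrams(text):
    """Return the list of unigram tokens found in text."""
    return _TOKEN_RE.findall(text.lower())


def unigram_counts(text):
    """Count how often each unigram token occurs in text."""
    return Counter(char_unigrams(text))


if __name__ == "__main__":
    sample = "LibShortText 处理中文"
    # Plain whitespace splitting keeps the Chinese run as a single token.
    print(sample.split())
    # Character-level splitting yields one token per Chinese character.
    print(char_unigrams(sample))
    print(unigram_counts("中文中"))
```

Character-level unigrams are used here only because they need no word segmenter. Word-level features for Chinese would require a separate segmentation step, which this sketch does not attempt.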

So I emailed the author, who replied:

Unfortunately I don't think our code can now support Chinese documents.
Chih-Jen
