c++实现之 -- 汉语词语的简单处理

首页 > 代码库 > c++实现之 -- 汉语词语的简单处理

c++实现之 -- 汉语词语的简单处理

2024-08-06 11:59:54 216人阅读

好了，我们现在已经会怎样读入了，然后就是研究一下如何存储等一些细节上的的问题了。

首先，比较函数是不能传入char*的地址的，但是可以接受一个string类。

然而，如果是两个比较长的string类，要进行比较的话，时间复杂度会上升至O(min（length）)，非常不合算。于是采用双哈希的办法，用h1、h2两个哈希值来表示特定字符串，冲突概率可以下降至基本忽略不计。不难发现双哈希的单词比较复杂度是O(2)的，大大减少了时间复杂度。

然后，就是采用什么容器进行存储。一般有两种：（不妨设哈希的使用的素数分别为p1和p2）

第一种是二维数组，第一维表示h1，第二维表示h2。为了节省空间第二维用vector进行存储，于是插入和查询的时间复杂度都是O(log(p2))。

第二种嘛，直接丢到map里，插入、查询的时间复杂度都是O(log(cnt)) （其中cnt表示不同单词个数）

于是我直接用了第二种，因为实现起来简单，而且复杂度基本相同。（因为vector常数大）

 1 #include <cstdio> 2 #include <iostream> 3 #include <string> 4 #include <cstring> 5 #include <algorithm> 6 #include <map> 7  8 #define TF second 9 using namespace std;10 const int mod1 = 19997;11 const int mod2 = 30001;12 const int bin = 1 << 9;13 14 struct Word {15     string st;16     int h1, h2;17     inline bool operator < (const Word &x) const {18         return h1 == x.h1 ? h2 < x.h2 : h1 < x.h1;19     }20 21     #define x (int) st[i]22     #define Weight 300123     inline void calc_hash() {24         int len = st.length(), tmp, i;25         for (i = tmp = 0; i < len; ++i)26             ((tmp *= Weight) += (x < 0 ? x + bin : x)) %= mod1;27         h1 = tmp;28         for (i = tmp = 0; i < len; ++i)29             ((tmp *= Weight) += (x < 0 ? x + bin : x)) %= mod2;30         h2 = tmp;31     }32     #undef x33     #undef Weight34 };35 typedef map <string, int> map_for_words;36 typedef map_for_words :: iterator iter_for_words;37 38 map_for_words data;39 Word w;40 41 int main() {42     freopen("test.in", "r", stdin);43     ios::sync_with_stdio(false);44     while (cin >> w.st) {45         w.calc_hash();46         data[w.st] += 1;47     }48     iter_for_words it;49     for (it = data.begin(); it != data.end(); ++it)50         cout << it -> first << ‘ ‘ << it -> TF << endl;51     return 0;52 }

View Code

效果（貌似还可以的说）：

输入：

输出：

（不要问我这界面怎么那么搞笑。。。这是终端的说）

c++实现之 -- 汉语词语的简单处理

声明：以上内容来自用户投稿及互联网公开渠道收集整理发布，本网站不拥有所有权，未作人工编辑处理，也不承担相关法律责任，若内容有误或涉及侵权可进行投诉：投诉/举报工作人员会在5个工作日内联系你，一经查实，本站将立刻删除涉嫌侵权内容。

联系
我们

首页 > 代码库 > c++实现之 -- 汉语词语的简单处理

c++实现之 -- 汉语词语的简单处理

看完仍有疑问？有类似问题直接问程序猿