
Software Engineering Week 2 Assignment: Word Frequency Count

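Requirements:

Count how many times each word occurs in a file, sort the results by count and then by dictionary order, and write them to a specified output file. The name of the file to analyze is given on the command line.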

 

Division of work:

Coding & testing: 张文杰

Blog writing: 朱昱青

 

Approach:

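1. The main function opens the input and output files to obtain their file pointers, then calls count() with those pointers as arguments to perform the word frequency count.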
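Step 1 could be sketched roughly as follows. This is only an illustration, not the code from the repository: the helper name `runCount`, the `CountFn` type, and the exact signature of count() are assumptions, since the post only says that count() receives the two file pointers.

```cpp
#include <cstdio>

// Hypothetical signature: the post only states that count() takes the file pointers.
using CountFn = void (*)(std::FILE* in, std::FILE* out);

// Opens both files, hands the pointers to count, and closes them again.
// Returns 0 on success and 1 if either file cannot be opened.
int runCount(const char* inputPath, const char* outputPath, CountFn count) {
    std::FILE* in = std::fopen(inputPath, "r");
    if (in == nullptr) {
        std::fprintf(stderr, "cannot open input file: %s\n", inputPath);
        return 1;
    }
    std::FILE* out = std::fopen(outputPath, "w");
    if (out == nullptr) {
        std::fprintf(stderr, "cannot open output file: %s\n", outputPath);
        std::fclose(in);  // do not leak the input handle on failure
        return 1;
    }
    count(in, out);
    std::fclose(in);
    std::fclose(out);
    return 0;
}
```

A main function would then check that `argc` is at least 2 and call something like `runCount(argv[1], "result.txt", count)`. Passing count() in as a parameter is a choice made here so that the sketch stands on its own.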

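2. Inside count(), a while loop repeatedly extracts candidate strings from the file, that is, runs of consecutive letters and digits delimited by separators, and then checks whether each candidate is actually a word. If it is, the function checks whether the word has been seen before. If it has, that word's count is incremented and the new spelling is compared in dictionary order with the spelling stored in the array; if the new spelling is smaller, it replaces the stored one (for example, Word replaces word). If the word has not been seen before, a new entry is added to the result array. This repeats until the whole file has been read. The array is then sorted by count and dictionary order and written to the output file in the required format (in this program, result.txt).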

 

Source code:

https://github.com/EverBlue1997/MyFirstRepository/blob/master/CountWords.cpp

 

 

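Testing:

1. First, an empty file was used as input. The output file was also empty, and no errors occurred.

2. Next, a passage of ordinary English text was used: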

As for the youth who had tried to steal the white horse that the King owned, he was bound hand and foot and taken into the castle of the King. There he was thrown down beside the trestles of the great table, and the hot wax from the candles that lighted the supper board dripped down upon him. And it was told to him that at the morrow’s sunrise he would be slain with the sword.

Then the King called upon one to finish the story that was being told when the neigh of the white horse was heard in the stable. The story could not be finished for him, however, because the one who had been telling it was now outside, guarding the iron door of the stable with a sword in his hand. And King Manus, sitting at the supper board, could not eat nor refresh himself because there was no one at hand to finish the story for him.

  Test result:

<that>: 4
<King>: 4
<hand>: 3
<story>: 3
<white>: 2
<There>: 2
<down>: 2
<supper>: 2
<board>: 2
<upon>: 2
<told>: 2
<with>: 2
<sword>: 2
<finish>: 2
<horse>: 2
<stable>: 2
<could>: 2
<because>: 2
<trestles>: 1
<great>: 1
<table>: 1
<from>: 1
<candles>: 1
<lighted>: 1
<owned>: 1
<bound>: 1
<dripped>: 1
<tried>: 1
<foot>: 1
<morrow>: 1
<sunrise>: 1
<would>: 1
<slain>: 1
<taken>: 1
<into>: 1
<Then>: 1
<called>: 1
<castle>: 1
<steal>: 1
<being>: 1
<when>: 1
<neigh>: 1
<heard>: 1
<thrown>: 1
<youth>: 1
<finished>: 1
<however>: 1
<beside>: 1
<been>: 1
<telling>: 1
<outside>: 1
<guarding>: 1
<iron>: 1
<door>: 1
<Manus>: 1
<sitting>: 1
<refresh>: 1
<himself>: 1

After checking the output, no errors were found.

3. A special string was used as a test:

word123#123word@Word123

  Test result:

<Word123>: 2
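This result can be traced by hand. Splitting the string at the separators # and @ gives three candidates: word123, 123word, and Word123. Assuming a word must start with a letter (an assumption, since the post does not state the rule), 123word is discarded. The other two differ only in case and are counted as one word, and because upper-case letters come before lower-case letters in ASCII, Word123 is the spelling that gets printed. The splitting step might look like this, where `splitCandidates` is a hypothetical helper:

```cpp
#include <cctype>
#include <string>
#include <vector>

// Splits a line into maximal runs of letters and digits;
// every other character is treated as a separator.
std::vector<std::string> splitCandidates(const std::string& line) {
    std::vector<std::string> tokens;
    std::string current;
    for (char c : line) {
        if (std::isalnum(static_cast<unsigned char>(c))) {
            current += c;
        } else if (!current.empty()) {
            tokens.push_back(current);
            current.clear();
        }
    }
    if (!current.empty()) tokens.push_back(current);  // trailing candidate
    return tokens;
}
```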

So far, testing has not revealed any errors in the program.
