Deduplicating Text in Linux

1. Syntax

 uniq [OPTION]... [INPUT [OUTPUT]]

2. Options

       -c, --count
              prefix lines by the number of occurrences
       -d, --repeated
              only print duplicate lines
       -D, --all-repeated[=delimit-method]
              print all duplicate lines
              delimit-method={none(default),prepend,separate}
              Delimiting is done with blank lines
       -f, --skip-fields=N
              avoid comparing the first N fields
       -i, --ignore-case
              ignore differences in case when comparing
       -s, --skip-chars=N
              avoid comparing the first N characters
       -u, --unique
              only print unique lines
       -z, --zero-terminated
              end lines with 0 byte, not newline
       -w, --check-chars=N
              compare no more than N characters in lines
       --help
              display this help and exit
       --version
              output version information and exit
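As a rough illustration of a few of the options listed above, the sketch below runs -d, -u and -i on small made-up inputs. The sample lines and the variable names (words, dups, singles, folded) are assumptions for illustration only and do not come from the original article.

```shell
# Hypothetical sample input: already sorted, so equal lines are adjacent.
words=$(printf 'a\na\nb\nc\nc\nc\n')

# -d prints one copy of each line that occurs more than once.
dups=$(printf '%s\n' "$words" | uniq -d)

# -u prints only the lines that occur exactly once.
singles=$(printf '%s\n' "$words" | uniq -u)

# -i ignores case when comparing; the first line of each group is kept.
folded=$(printf 'Python\npython\nPYTHON\n' | uniq -i)

printf 'duplicated:\n%s\n' "$dups"
printf 'unique:\n%s\n' "$singles"
printf 'case-folded:\n%s\n' "$folded"
```

With this input, -d should report a and c, -u should report b, and -i should collapse the three spellings into the first one.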

3. Examples

unique.txt

    hellopython
    hellopython
    python
    bbs.pythontab.com
    python
    pythontab.com
    python
    hello.pythontab.com
    hellopythontab
    hellopythontab

(1) Run uniq unique.txt

    hellopython
    python
    bbs.pythontab.com
    python
    pythontab.com
    python
    hello.pythontab.com
    hellopythontab

(2) Does the output above look wrong? Now run uniq -c unique.txt

    2 hellopython
    1 python
    1 bbs.pythontab.com
    1 python
    1 pythontab.com
    1 python
    1 hello.pythontab.com
    2 hellopythontab
    1

This is still not the result we want: uniq detects duplicates only by comparing adjacent lines.
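The adjacency rule can be seen in isolation with a three-line input. This is a minimal sketch; the sample text and the variable names (sample, adjacent_only, sorted_first) are made up for illustration.

```shell
# Hypothetical sample: "python" appears twice, but not on adjacent lines.
sample=$(printf 'python\nhello\npython\n')

# uniq alone removes nothing here, because the duplicates are not adjacent.
adjacent_only=$(printf '%s\n' "$sample" | uniq)

# Sorting first brings equal lines together, so uniq can collapse them.
sorted_first=$(printf '%s\n' "$sample" | sort | uniq)

printf '%s\n' "$adjacent_only"
echo '---'
printf '%s\n' "$sorted_first"
```

The first result still contains both copies of python, while the second contains each line once.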

(3) Now run sort unique.txt | uniq -c

    1
    1 bbs.pythontab.com
    2 hellopython
    2 hellopythontab
    3 python
    1 pythontab.com
    1 hello.pythontab.com
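A common follow-up to the sort | uniq -c pipeline is to order the counts themselves, or to skip the counts entirely with sort -u. The sketch below is one possible way to do both. It uses a temporary file holding a handful of lines similar to the sample, and the variable names (tmp, top, distinct) are assumptions for illustration.

```shell
# Hypothetical sample data written to a temporary file.
tmp=$(mktemp)
printf 'hellopython\nhellopython\npython\nbbs.pythontab.com\npython\npythontab.com\npython\n' > "$tmp"

# Count each distinct line, then sort numerically by count, highest first,
# and keep only the most frequent line.
top=$(sort "$tmp" | uniq -c | sort -rn | head -n 1)

# sort -u prints each distinct line once when the counts are not needed;
# wc -l counts them and tr strips any padding that wc may add.
distinct=$(sort -u "$tmp" | wc -l | tr -d ' ')

echo "$top"
echo "$distinct"

rm -f "$tmp"
```

Here the most frequent line is python with a count of 3, and there are 4 distinct lines. The exact padding in front of the count depends on the uniq implementation.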
