Regular Expressions

2024-09-14 02:31:54, 216 views

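grep and egrep

Syntax: grep [-cinvABC] 'word' filename

-c: print the number of matching lines

-i: ignore case

-n: print the line number along with each matching line

-v: print the lines that do not match

-A: followed by a number (with or without a space); for example, -A2 prints each matching line plus the two lines after it

-B: followed by a number; for example, -B2 prints each matching line plus the two lines before it

-C: followed by a number; for example, -C2 prints each matching line plus two lines before and two lines after it

[root@localhost ~]# grep -A2 'halt' /etc/passwd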

halt:x:7:0:halt:/sbin:/sbin/halt

mail:x:8:12:mail:/var/spool/mail:/sbin/nologin

uucp:x:10:14:uucp:/var/spool/uucp:/sbin/nologin

This prints the line containing 'halt' together with the two lines after it.

[root@localhost ~]# grep -B2 'halt' /etc/passwd

sync:x:5:0:sync:/sbin:/bin/sync

shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown

halt:x:7:0:halt:/sbin:/sbin/halt

This prints the line containing 'halt' together with the two lines before it.

[root@localhost ~]# grep -C2 'halt' /etc/passwd

sync:x:5:0:sync:/sbin:/bin/sync

shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown

halt:x:7:0:halt:/sbin:/sbin/halt

mail:x:8:12:mail:/var/spool/mail:/sbin/nologin

uucp:x:10:14:uucp:/var/spool/uucp:/sbin/nologin

This prints the line containing 'halt' together with two lines before and two lines after it.
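The context options can be tried without touching /etc/passwd. The following is a minimal sketch on a throwaway file; the file contents are made up for illustration, and it assumes a grep that supports -A, -B, and -C (GNU and BSD grep both do).

```shell
#!/bin/sh
# Hypothetical sample data standing in for a few /etc/passwd lines.
sample=$(mktemp)
printf '%s\n' 'sync:x:5' 'shutdown:x:6' 'halt:x:7' 'mail:x:8' 'uucp:x:10' > "$sample"

after=$(grep -A1 'halt' "$sample")     # matching line plus 1 line after it
before=$(grep -B1 'halt' "$sample")    # matching line plus 1 line before it
around=$(grep -C1 'halt' "$sample")    # 1 line of context on each side
count=$(grep -c 'x' "$sample")         # number of matching lines
numbered=$(grep -n 'mail' "$sample")   # match prefixed with its line number

printf '%s\n' "$around"
rm -f "$sample"
```

Capturing each result in a variable makes it easy to compare the effect of the options side by side.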

Here are a few typical examples to give you a better feel for grep.

Select the lines containing a keyword and print their line numbers

[root@localhost ~]# grep -n 'root' /etc/passwd

1:root:x:0:0:root:/root:/bin/bash

11:operator:x:11:0:operator:/root:/sbin/nologin

Select the lines that do not contain a keyword and print their line numbers

[root@localhost ~]# grep -nv 'nologin' /etc/passwd

1:root:x:0:0:root:/root:/bin/bash

6:sync:x:5:0:sync:/sbin:/bin/sync

7:shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown

8:halt:x:7:0:halt:/sbin:/sbin/halt

26:test:x:511:511::/home/test:/bin/bash

27:test1:x:512:511::/home/test1:/bin/bash

Select all lines that contain a digit

[root@localhost ~]# grep '[0-9]' /etc/inittab

# upstart works, see init(5), init(8), and initctl(8).

# 0 - halt (Do NOT set initdefault to this)

# 1 - Single user mode

# 2 - Multiuser, without NFS (The same as 3, if you do not have networking)

# 3 - Full multiuser mode

# 4 - unused

# 5 - X11

# 6 - reboot (Do NOT set initdefault to this)

id:3:initdefault:

Select all lines that contain no digits

[root@localhost ~]# grep -v '[0-9]' /etc/inittab

# inittab is only used by upstart for the default runlevel.

# ADDING OTHER CONFIGURATION HERE WILL HAVE NO EFFECT ON YOUR SYSTEM.

# System initialization is started by /etc/init/rcS.conf

# Individual runlevels are started by /etc/init/rc.conf

# Ctrl-Alt-Delete is handled by /etc/init/control-alt-delete.conf

# Terminal gettys are handled by /etc/init/tty.conf and /etc/init/serial.conf,

# with configuration in /etc/sysconfig/init.

# For information on how to write upstart event handlers, or how

# Default runlevel. The runlevels used are:

Drop every line that starts with '#'

[root@localhost ~]# grep -v '^#' /etc/inittab

id:3:initdefault:

Drop all empty lines and all lines that start with '#'

[root@localhost ~]# grep -v '^#' /etc/crontab | grep -v '^$'

SHELL=/bin/bash

PATH=/sbin:/bin:/usr/sbin:/usr/bin

MAILTO=root

HOME=/

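In a regular expression, "^" marks the start of a line and "$" marks the end, so an empty line can be written as "^$". How, then, do you print the lines that do not start with a letter?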

[root@localhost ~]# vim test.txt

[root@localhost ~]# cat test.txt

123

abc

456

abc2323

#laksdjf

Alllllllll

I first wrote a few strings into test.txt to experiment with.

[root@localhost ~]# grep '^[^a-zA-Z]' test.txt

123

456

#laksdjf

[root@localhost ~]# grep '[^a-zA-Z]' test.txt

123

456

abc2323

#laksdjf

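As mentioned earlier, '[ ]' matches one character from a set. For digits, use [0-9]. You can also write something like [15], which matches a single 1 or a single 5; note that it does not mean the string 15. To match digits as well as upper- and lower-case letters, write [0-9a-zA-Z]. There is one more form: [^chars] matches any character other than those listed inside the brackets. That is why '^[^a-zA-Z]' selects the lines whose first character is not a letter, while '[^a-zA-Z]' selects every line that contains a non-letter character anywhere.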
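The difference between the anchored and unanchored bracket expressions is easy to check by counting matches. This is a small sketch that recreates the test.txt contents from above in a temporary file; LC_ALL=C is set as a precaution so that the ranges mean exactly the ASCII letters.

```shell
#!/bin/sh
export LC_ALL=C   # byte-wise ranges: [a-zA-Z] is exactly the ASCII letters
sample=$(mktemp)
printf '%s\n' '123' 'abc' '456' 'abc2323' '#laksdjf' 'Alllllllll' > "$sample"

starts_nonletter=$(grep -c '^[^a-zA-Z]' "$sample")   # first character is not a letter
has_nonletter=$(grep -c '[^a-zA-Z]' "$sample")       # a non-letter appears anywhere
one_or_five=$(grep -c '[15]' "$sample")              # contains a 1 or a 5

echo "$starts_nonletter $has_nonletter $one_or_five"
rm -f "$sample"
```

The line abc2323 is the one that separates the two patterns: it starts with a letter but still contains non-letters.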

Matching any single character and repeated characters

[root@localhost ~]# grep 'r..o' /etc/passwd

operator:x:11:0:operator:/root:/sbin/nologin

gopher:x:13:30:gopher:/var/gopher:/sbin/nologin

vcsa:x:69:69:virtual console memory owner:/dev:/sbin/nologin

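. matches any single character; the example above selects the lines in which r and o are separated by exactly two arbitrary characters. * means zero or more of the preceding character.

[root@localhost ~]# grep 'ooo*' /etc/passwd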

root:x:0:0:root:/root:/bin/bash

lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin

mail:x:8:12:mail:/var/spool/mail:/sbin/nologin

uucp:x:10:14:uucp:/var/spool/uucp:/sbin/nologin

operator:x:11:0:operator:/root:/sbin/nologin

postfix:x:89:89::/var/spool/postfix:/sbin/nologin

'ooo*' matches oo, ooo, oooo, and so on: two o's followed by zero or more further o's. Can you now guess what the combination '.*' means?

[root@localhost ~]# grep '.*' /etc/passwd | wc -l

[root@localhost ~]# wc -l /etc/passwd

27 /etc/passwd

'.*' matches zero or more of any character, so it matches every line, empty lines included.

Specifying how many times a character repeats

[root@localhost ~]# grep 'o\{2\}' /etc/passwd

root:x:0:0:root:/root:/bin/bash

lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin

mail:x:8:12:mail:/var/spool/mail:/sbin/nologin

uucp:x:10:14:uucp:/var/spool/uucp:/sbin/nologin

operator:x:11:0:operator:/root:/sbin/nologin

postfix:x:89:89::/var/spool/postfix:/sbin/nologin

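This uses { } with a number inside, which gives how many times the preceding character must repeat. The example above selects the lines that contain two consecutive o's, that is 'oo' (a line with a longer run such as 'ooo' also matches, because it contains 'oo'). Note that in grep both braces need the escape character '\' in front of them. { } can also express a range, written '\{n1,n2\}' with n1<n2, meaning n1 to n2 repetitions of the preceding character; n2 may be left empty, which means n1 or more repetitions.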
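A short sketch of the interval forms on made-up one-word lines. The anchors ^ and $ are added here only to show the difference between "exactly n" and "contains at least n"; the last command uses grep -E, where the braces are written without backslashes.

```shell
#!/bin/sh
sample=$(mktemp)
printf '%s\n' 'rot' 'root' 'rooot' 'roooot' > "$sample"

exactly_two=$(grep -c '^ro\{2\}t$' "$sample")      # root
two_to_three=$(grep -c '^ro\{2,3\}t$' "$sample")   # root, rooot
three_plus=$(grep -c 'o\{3,\}' "$sample")          # rooot, roooot
unanchored=$(grep -c 'o\{2\}' "$sample")           # root, rooot, roooot: any "oo" counts
ere_form=$(grep -Ec '^ro{2,3}t$' "$sample")        # same as two_to_three, ERE syntax

echo "$exactly_two $two_to_three $three_plus $unanchored $ere_form"
rm -f "$sample"
```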

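That covers grep. I also use egrep a lot. Put simply, egrep is the extended version of grep: it understands extended regular expressions, so it can handle patterns that plain grep cannot, and everything grep can do, egrep can do as well. (egrep is equivalent to grep -E, and recent GNU grep releases recommend the grep -E spelling.) If this feels like too much, a passing familiarity with egrep is enough, because grep already covers day-to-day work. Below are a few usages where egrep differs from grep. To make experimenting easier, I edited test.txt to contain the following: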

rot:x:0:0:/rot:/bin/bash

operator:x:11:0:operator:/root:/sbin/nologin

operator:x:11:0:operator:/rooot:/sbin/nologin

roooot:x:0:0:/rooooot:/bin/bash

1111111111111111111111111111111

aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

Matching one or more of the preceding character

[root@localhost ~]# egrep 'o+' test.txt

rot:x:0:0:/rot:/bin/bash

operator:x:11:0:operator:/root:/sbin/nologin

operator:x:11:0:operator:/rooot:/sbin/nologin

roooot:x:0:0:/rooooot:/bin/bash

[root@localhost ~]# egrep 'oo+' test.txt

operator:x:11:0:operator:/root:/sbin/nologin

operator:x:11:0:operator:/rooot:/sbin/nologin

roooot:x:0:0:/rooooot:/bin/bash

[root@localhost ~]# egrep 'ooo+' test.txt

operator:x:11:0:operator:/rooot:/sbin/nologin

roooot:x:0:0:/rooooot:/bin/bash

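Unlike plain grep, egrep treats '+' as an operator meaning one or more of the preceding character.

Matching zero or one of the preceding character

[root@localhost ~]# egrep 'o?' test.txt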

rot:x:0:0:/rot:/bin/bash

operator:x:11:0:operator:/root:/sbin/nologin

operator:x:11:0:operator:/rooot:/sbin/nologin

roooot:x:0:0:/rooooot:/bin/bash

1111111111111111111111111111111

aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

[root@localhost ~]# egrep 'ooo?' test.txt

operator:x:11:0:operator:/root:/sbin/nologin

operator:x:11:0:operator:/rooot:/sbin/nologin

roooot:x:0:0:/rooooot:/bin/bash

[root@localhost ~]# egrep 'oooo?' test.txt

operator:x:11:0:operator:/rooot:/sbin/nologin

roooot:x:0:0:/rooooot:/bin/bash

Matching string 1 or string 2

[root@localhost ~]# egrep 'aaa|111|ooo' test.txt

operator:x:11:0:operator:/rooot:/sbin/nologin

roooot:x:0:0:/rooooot:/bin/bash

1111111111111111111111111111111

aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

Using ( ) in egrep

[root@localhost ~]# egrep 'r(oo)|(at)o' test.txt

operator:x:11:0:operator:/root:/sbin/nologin

operator:x:11:0:operator:/rooot:/sbin/nologin

roooot:x:0:0:/rooooot:/bin/bash

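( ) groups characters into a unit. In the pattern above the two alternatives are 'r(oo)' and '(at)o', so a line matches if it contains 'roo' or 'ato'. A group can also be repeated: (oo)+ means one or more repetitions of 'oo'.

[root@localhost ~]# egrep '(oo)+' test.txt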

operator:x:11:0:operator:/root:/sbin/nologin

operator:x:11:0:operator:/rooot:/sbin/nologin

roooot:x:0:0:/rooooot:/bin/bash
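The extended operators can be compared on a small made-up word list. This sketch uses grep -E, which accepts the same patterns as egrep; the anchors in two of the patterns are an addition for illustration, so that the repetition count of the group becomes visible.

```shell
#!/bin/sh
sample=$(mktemp)
printf '%s\n' 'rot' 'root' 'rooot' 'roooot' '111' 'aaa' > "$sample"

one_or_more=$(grep -Ec 'o+' "$sample")       # rot, root, rooot, roooot
optional=$(grep -Ec '^roo?t$' "$sample")     # rot, root: the second o is optional
either=$(grep -Ec 'aaa|111' "$sample")       # 111, aaa
grouped=$(grep -Ec '^r(oo)+t$' "$sample")    # root, roooot: "oo" repeated as a unit

echo "$one_or_more $optional $either $grouped"
rm -f "$sample"
```

Note that rooot does not match the grouped pattern, because three o's cannot be split into whole 'oo' pairs.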

-----------------------------------------------------------------------------------------------

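Using sed

grep on its own is limited: it can find text, but it cannot replace what it finds. vim can both find and replace, but only inside the editor; it does not print the result to the screen. sed, and awk which is covered further down, can print the replaced text to the screen and offer much more besides. Both process their input as a stream, one line at a time.

Printing a line

sed -n 'n'p filename, where the n inside the single quotes is a number giving the line to print:

[root@localhost ~]# sed -n '2'p /etc/passwd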

bin:x:1:1:bin:/bin:/sbin/nologin

To print every line, use sed -n '1,$'p filename

[root@localhost ~]# sed -n '1,$'p test.txt

rot:x:0:0:/rot:/bin/bash

operator:x:11:0:operator:/root:/sbin/nologin

operator:x:11:0:operator:/rooot:/sbin/nologin

roooot:x:0:0:/rooooot:/bin/bash

1111111111111111111111111111111

aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

A range of lines can be given as well:

[root@localhost ~]# sed -n '1,3'p test.txt

rot:x:0:0:/rot:/bin/bash

operator:x:11:0:operator:/root:/sbin/nologin

operator:x:11:0:operator:/rooot:/sbin/nologin

Printing the lines that contain a string

[root@localhost ~]# sed -n '/root/'p test.txt

operator:x:11:0:operator:/root:/sbin/nologin

The special characters used in grep, such as ^ $ . *, work in sed too

[root@localhost ~]# sed -n '/^1/'p test.txt

1111111111111111111111111111111

[root@localhost ~]# sed -n '/in$/'p test.txt

operator:x:11:0:operator:/root:/sbin/nologin

operator:x:11:0:operator:/rooot:/sbin/nologin

[root@localhost ~]# sed -n '/r..o/'p test.txt

operator:x:11:0:operator:/root:/sbin/nologin

operator:x:11:0:operator:/rooot:/sbin/nologin

roooot:x:0:0:/rooooot:/bin/bash

[root@localhost ~]# sed -n '/ooo*/'p test.txt

operator:x:11:0:operator:/root:/sbin/nologin

operator:x:11:0:operator:/rooot:/sbin/nologin

roooot:x:0:0:/rooooot:/bin/bash

-e allows several commands in one invocation

[root@localhost ~]# sed -e '1'p -e '/111/'p -n test.txt

rot:x:0:0:/rot:/bin/bash

1111111111111111111111111111111

Deleting one or more lines

[root@localhost ~]# sed '1'd test.txt

operator:x:11:0:operator:/root:/sbin/nologin

operator:x:11:0:operator:/rooot:/sbin/nologin

roooot:x:0:0:/rooooot:/bin/bash

1111111111111111111111111111111

aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

[root@localhost ~]# sed '1,3'd test.txt

roooot:x:0:0:/rooooot:/bin/bash

1111111111111111111111111111111

aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

[root@localhost ~]# sed '/oot/'d test.txt

rot:x:0:0:/rot:/bin/bash

1111111111111111111111111111111

aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

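The 'd' command deletes. It can delete a single line or a range of lines, delete every line that matches a pattern, and delete from a given line through the last line of the file.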
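The print and delete forms described so far can be tried together on a throwaway file. This is a minimal sketch with made-up contents; the commands are written with the p or d inside the quotes, which is equivalent to the 'n'p spelling used above.

```shell
#!/bin/sh
sample=$(mktemp)
printf '%s\n' 'rot' 'root' 'rooot' '111' 'aaa' > "$sample"

second=$(sed -n '2p' "$sample")                 # print only line 2
first_two=$(sed -n '1,2p' "$sample")            # print a range of lines
with_oot=$(sed -n '/oot/p' "$sample")           # print lines matching a pattern
no_oot=$(sed '/oot/d' "$sample")                # delete lines matching a pattern
head_only=$(sed '3,$d' "$sample")               # delete from line 3 through the last line
multi=$(sed -n -e '1p' -e '/111/p' "$sample")   # two commands with -e

printf '%s\n' "$no_oot"
rm -f "$sample"
```

None of these commands changes the file itself; sed writes the result to standard output.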

Replacing characters or strings

[root@localhost ~]# sed '1,2s/ot/to/g' test.txt

rto:x:0:0:/rto:/bin/bash

operator:x:11:0:operator:/roto:/sbin/nologin

operator:x:11:0:operator:/rooot:/sbin/nologin

roooot:x:0:0:/rooooot:/bin/bash

1111111111111111111111111111111

aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

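In the example above, 's' is the substitute command and 'g' replaces every occurrence on the line; without 'g', only the first occurrence on each line is replaced. The delimiter does not have to be '/'; other characters such as '#' or '@' work just as well.

[root@localhost ~]# sed 's#ot#to#g' test.txt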

rto:x:0:0:/rto:/bin/bash

operator:x:11:0:operator:/roto:/sbin/nologin

operator:x:11:0:operator:/rooto:/sbin/nologin

roooto:x:0:0:/rooooto:/bin/bash

1111111111111111111111111111111

aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

[root@localhost ~]# sed 's@ot@to@g' test.txt

rto:x:0:0:/rto:/bin/bash

operator:x:11:0:operator:/roto:/sbin/nologin

operator:x:11:0:operator:/rooto:/sbin/nologin

roooto:x:0:0:/rooooto:/bin/bash

1111111111111111111111111111111

aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

Now think about it: how would you delete all the digits or all the letters in a file?

[root@localhost ~]# sed 's/[0-9]//g' test.txt

rot:x:::/rot:/bin/bash

operator:x:::operator:/root:/sbin/nologin

operator:x:::operator:/rooot:/sbin/nologin

roooot:x:::/rooooot:/bin/bash

aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

[0-9] matches any digit. You could equally write [a-zA-Z] or even [0-9a-zA-Z].

[root@localhost ~]# sed 's/[a-zA-Z]//g' test.txt

::0:0:/://

::11:0::/://

::0:0:/://

1111111111111111111111111111111

Swapping the positions of two strings

[root@localhost ~]# sed 's/\(rot\)\(.*\)\(bash\)/\3\2\1/' test.txt

bash:x:0:0:/rot:/bin/rot

operator:x:11:0:operator:/root:/sbin/nologin

operator:x:11:0:operator:/rooot:/sbin/nologin

roooot:x:0:0:/rooooot:/bin/bash

1111111111111111111111111111111

aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

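This one needs some explanation. In the example above, \( \) wraps each piece to be rearranged so that it is treated as a unit. sed uses basic regular expressions by default, where a bare parenthesis is just a literal character, so each parenthesis needs the escape character '\' in front of it to form a group. In the replacement, the groups are referred to as '\1', '\2', and '\3'. Besides swapping two strings, I also often add fixed content before or after a line; in the next command, '&' in the replacement stands for the whole matched text.

[root@localhost ~]# sed 's/^.*$/123&/' test.txt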

123rot:x:0:0:/rot:/bin/bash

123operator:x:11:0:operator:/root:/sbin/nologin

123operator:x:11:0:operator:/rooot:/sbin/nologin

123roooot:x:0:0:/rooooot:/bin/bash

1231111111111111111111111111111111

123aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
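Both substitutions can be reproduced on a single line fed through a pipe. The sketch below also shows the -E form, which is an addition not used in the text above: with -E, sed switches to extended regular expressions and the groups are written without backslashes. The ' # checked' suffix is an arbitrary illustration.

```shell
#!/bin/sh
line='rot:x:0:0:/rot:/bin/bash'

# Basic regular expressions: groups are \( \), referenced as \1 \2 \3.
swapped=$(printf '%s\n' "$line" | sed 's/\(rot\)\(.*\)\(bash\)/\3\2\1/')

# Extended regular expressions (-E): plain ( ) form the groups.
swapped_ere=$(printf '%s\n' "$line" | sed -E 's/(rot)(.*)(bash)/\3\2\1/')

# & stands for the whole match, which here is the entire line.
prefixed=$(printf '%s\n' "$line" | sed 's/^.*$/123&/')

# Matching only the end of the line appends text instead.
suffixed=$(printf '%s\n' "$line" | sed 's/$/ # checked/')

printf '%s\n' "$swapped" "$prefixed" "$suffixed"
```

Because .* is greedy, the first group matches the leftmost rot and the third group matches the last bash on the line.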

Modifying the file in place

[root@localhost ~]# sed -i 's/ot/to/g' test.txt

[root@localhost ~]# cat test.txt

rto:x:0:0:/rto:/bin/bash

operator:x:11:0:operator:/roto:/sbin/nologin

operator:x:11:0:operator:/rooto:/sbin/nologin

roooto:x:0:0:/rooooto:/bin/bash

1111111111111111111111111111111

aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

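This changes the contents of test.txt directly. Because the command modifies the file in place, make a copy of the file beforehand in case the substitution goes wrong.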
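One way to get that safety copy is to let sed make it. As a sketch, assuming GNU or BSD sed: a suffix attached directly to -i makes sed keep the original under that extra extension. The directory and file contents below are made up for the demonstration.

```shell
#!/bin/sh
work=$(mktemp -d)
printf '%s\n' 'rot:/bin/bash' 'root:/sbin/nologin' > "$work/test.txt"

# -i.bak edits test.txt in place and keeps the original as test.txt.bak
sed -i.bak 's/ot/to/g' "$work/test.txt"

edited=$(cat "$work/test.txt")
backup=$(cat "$work/test.txt.bak")

printf '%s\n' "$edited"
rm -rf "$work"
```

If the result is wrong, the .bak file can simply be moved back over the edited one.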

-----------------------------------------------------------------------------------------------------------------------

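Using awk

As mentioned above, awk, like sed, works through a file one line at a time. awk is more powerful than sed: it can do what sed does and also things sed cannot. awk is a complex tool, and whole books are devoted to it, but I do not think you need to learn it in that depth; being able to handle everyday administration tasks is enough. Why burden yourself with more than that? It is not used all that often, and even if you learned a lot now, you would forget it after a long stretch without practice. So I only cover the more common uses of awk here; if you are interested, dig deeper on your own.

Extracting a field from a file

[root@localhost ~]# head -n2 /etc/passwd | awk -F ':' '{print $1}'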

root

bin

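To explain: the -F option sets the field separator; without -F, fields are separated by spaces or tabs. print is the action that prints fields. $1 is the first field, $2 the second, and so on; the special $0 stands for the whole line.

[root@localhost ~]# head -n2 test.txt | awk -F':' '{print $0}'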

rto:x:0:0:/rto:/bin/bash

operator:x:11:0:operator:/roto:/sbin/nologin

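Note the awk syntax: -F is followed by the separator in single quotes, and the print action must be enclosed in { }, otherwise awk reports an error. print can also output custom text, which must be enclosed in double quotes.

[root@localhost ~]# head -n2 test.txt | awk -F':' '{print $1"#"$2"#"$3"#"$4}'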

rto#x#0#0

operator#x#11#0
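The same '#'-joined output can be produced in two ways. The sketch below repeats the string-concatenation form from above and adds a variant that sets awk's output field separator with -v OFS, which is not used in the text but avoids repeating the literal between every field.

```shell
#!/bin/sh
line='rto:x:0:0:/rto:/bin/bash'

first=$(printf '%s\n' "$line" | awk -F ':' '{print $1}')
joined=$(printf '%s\n' "$line" | awk -F ':' '{print $1"#"$2"#"$3"#"$4}')
# Commas in print insert OFS between the fields.
joined_ofs=$(printf '%s\n' "$line" | awk -F ':' -v OFS='#' '{print $1, $2, $3, $4}')
whole=$(printf '%s\n' "$line" | awk -F ':' '{print $0}')

printf '%s\n' "$first" "$joined" "$joined_ofs" "$whole"
```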

Matching characters or strings

[root@localhost ~]# awk '/oo/' test.txt

operator:x:11:0:operator:/rooto:/sbin/nologin

roooto:x:0:0:/rooooto:/bin/bash

Much like sed, but awk offers more powerful matching.

[root@localhost ~]# awk -F ':' '$1 ~/oo/' test.txt

roooto:x:0:0:/rooooto:/bin/bash

A single field can be matched against a pattern; '~' here means "matches". Read on:

[root@localhost ~]# awk -F ':' '/root/ {print $1,$3} /test/ {print $1,$3}' /etc/passwd

root 0

operator 11

test 511

test1 512

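awk can apply several patterns in one pass, as in the example above, which matches root and then test; it can also print only selected fields of the matching lines.

Conditional operators

[root@localhost ~]# awk -F ':' '$3=="0"' /etc/passwd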

root:x:0:0:root:/root:/bin/bash

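awk supports comparison operators. '==' means equal, which you can think of as an exact match; there are also '>', '>=', '<', '<=', '!=', and so on. Note that when comparing against a number, putting the number in double quotes makes awk treat it as a string rather than a number; without the quotes it is treated as a number.

[root@localhost ~]# awk -F ':' '$3>="500"' /etc/passwd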

shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown

halt:x:7:0:halt:/sbin:/sbin/halt

mail:x:8:12:mail:/var/spool/mail:/sbin/nologin

nobody:x:99:99:Nobody:/:/sbin/nologin

dbus:x:81:81:System message bus:/:/sbin/nologin

vcsa:x:69:69:virtual console memory owner:/dev:/sbin/nologin

haldaemon:x:68:68:HAL daemon:/:/sbin/nologin

postfix:x:89:89::/var/spool/postfix:/sbin/nologin

sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin

tcpdump:x:72:72::/:/sbin/nologin

user11:x:510:502:user11,user11‘s office,12345678,123456789:/home/user11:/sbin/nologin

test:x:511:511::/home/test:/bin/bash

test1:x:512:511::/home/test1:/bin/bash

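In the example above I meant to print the lines whose uid is greater than or equal to 500, but the result is not what we expected. Because "500" is quoted, awk compares $3 with it as strings, character by character, in the same way as the sort ordering described in the previous chapter. Writing the number without quotes, '$3>=500', gives the numeric comparison.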
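The effect of the quotes is easiest to see with both forms run on the same data. This is a small sketch with a made-up three-field file in which the third field plays the role of the uid.

```shell
#!/bin/sh
sample=$(mktemp)
printf '%s\n' 'halt:x:7' 'nobody:x:99' 'user11:x:510' 'test:x:511' > "$sample"

# Quoted "500": string comparison, so "7" and "99" sort after "500".
as_string=$(awk -F ':' '$3 >= "500" {print $1}' "$sample")

# Unquoted 500: numeric comparison, only uids of 500 and above pass.
as_number=$(awk -F ':' '$3 >= 500 {print $1}' "$sample")

printf 'string: %s\n' $as_string
printf 'number: %s\n' $as_number
rm -f "$sample"
```

The string form lets halt and nobody through because their first character is greater than 5; the numeric form keeps only user11 and test.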

[root@localhost ~]# awk -F ':' '$7!="/sbin/nologin"' /etc/passwd

root:x:0:0:root:/root:/bin/bash

sync:x:5:0:sync:/sbin:/bin/sync

shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown

halt:x:7:0:halt:/sbin:/sbin/halt

test:x:511:511::/home/test:/bin/bash

test1:x:512:511::/home/test1:/bin/bash

!= means not equal. Besides comparing one field against a fixed string, you can also compare two fields with each other.

[root@localhost ~]# awk -F ':' '$3<$4' /etc/passwd

adm:x:3:4:adm:/var/adm:/sbin/nologin

lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin

mail:x:8:12:mail:/var/spool/mail:/sbin/nologin

uucp:x:10:14:uucp:/var/spool/uucp:/sbin/nologin

games:x:12:100:games:/usr/games:/sbin/nologin

gopher:x:13:30:gopher:/var/gopher:/sbin/nologin

ftp:x:14:50:FTP User:/var/ftp:/sbin/nologin

You can also use && and || for "and" and "or".

[root@localhost ~]# awk -F ':' '$3>"5" && $3<"7"' /etc/passwd

shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown

vcsa:x:69:69:virtual console memory owner:/dev:/sbin/nologin

haldaemon:x:68:68:HAL daemon:/:/sbin/nologin

user11:x:510:502:user11,user11‘s office,12345678,123456789:/home/user11:/sbin/nologin

test:x:511:511::/home/test:/bin/bash

test1:x:512:511::/home/test1:/bin/bash

"Or" works the same way:

[root@localhost ~]# awk -F ':' '$3>"5" || $7=="/bin/bash"' /etc/passwd

root:x:0:0:root:/root:/bin/bash

shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown

halt:x:7:0:halt:/sbin:/sbin/halt

mail:x:8:12:mail:/var/spool/mail:/sbin/nologin

nobody:x:99:99:Nobody:/:/sbin/nologin

dbus:x:81:81:System message bus:/:/sbin/nologin

vcsa:x:69:69:virtual console memory owner:/dev:/sbin/nologin

haldaemon:x:68:68:HAL daemon:/:/sbin/nologin

postfix:x:89:89::/var/spool/postfix:/sbin/nologin

sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin

tcpdump:x:72:72::/:/sbin/nologin

user11:x:510:502:user11,user11‘s office,12345678,123456789:/home/user11:/sbin/nologin

test:x:511:511::/home/test:/bin/bash

test1:x:512:511::/home/test1:/bin/bash

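awk built-in variables

Commonly used awk variables:

NF: the number of fields in the line after splitting on the separator

NR: the number of the current line

[root@localhost ~]# head -n3 /etc/passwd | awk -F ':' '{print NF}'

[root@localhost ~]# head -n3 /etc/passwd | awk -F ':' '{print $NF}'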

/bin/bash

/sbin/nologin

NF is the number of fields, $NF is the value of the last field, and NR is the line number.

[root@localhost ~]# head -n3 /etc/passwd | awk -F ':' '{print NR}'

The line number can be used as a condition:

[root@localhost ~]# awk 'NR>20' /etc/passwd

postfix:x:89:89::/var/spool/postfix:/sbin/nologin

abrt:x:173:173::/etc/abrt:/sbin/nologin

sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin

tcpdump:x:72:72::/:/sbin/nologin

user11:x:510:502:user11,user11‘s office,12345678,123456789:/home/user11:/sbin/nologin

test:x:511:511::/home/test:/bin/bash

test1:x:512:511::/home/test1:/bin/bash

It can also be combined with a field match:

[root@localhost ~]# awk -F ':' 'NR>20 && $1 ~ /ssh/' /etc/passwd

sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
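To see what NF, $NF, and NR produce, here is a sketch on three passwd-style lines written to a temporary file. The lines are illustrative stand-ins, and the NR>1 condition replaces the NR>20 used above only because the sample is short.

```shell
#!/bin/sh
sample=$(mktemp)
printf '%s\n' 'root:x:0:0:root:/root:/bin/bash' \
              'bin:x:1:1:bin:/bin:/sbin/nologin' \
              'sshd:x:74:74:SSH:/var/empty/sshd:/sbin/nologin' > "$sample"

field_counts=$(awk -F ':' '{print NF}' "$sample")      # number of fields per line
last_fields=$(awk -F ':' '{print $NF}' "$sample")      # value of the last field
numbered=$(awk -F ':' '{print NR": "$1}' "$sample")    # line number plus first field
late_ssh=$(awk -F ':' 'NR>1 && $1 ~ /ssh/ {print $1}' "$sample")

printf '%s\n' "$numbered"
rm -f "$sample"
```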

Arithmetic in awk

awk can change the value of a field:

[root@localhost ~]# head -n 3 /etc/passwd | awk -F ':' '$1="root"'

root x 0 0 root /root /bin/bash

root x 1 1 bin /bin /sbin/nologin

root x 2 2 daemon /sbin /sbin/nologin

awk can also do arithmetic on field values:

[root@localhost ~]# head -n2 /etc/passwd

root:x:0:0:root:/root:/bin/bash

bin:x:1:1:bin:/bin:/sbin/nologin

[root@localhost ~]# head -n2 /etc/passwd | awk -F ':' '{$7=$3+$4}'

[root@localhost ~]# head -n2 /etc/passwd | awk -F ':' '{$7=$3+$4; print $0}'

root x 0 0 root /root 0

bin x 1 1 bin /bin 2

And of course it can total up a field:

[root@localhost ~]# awk -F ':' '{(tot=tot+$3)}; END {print tot}' /etc/passwd

2891

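Note the END here: its block runs after all the lines have been processed. This is awk-specific syntax. In fact, both awk and sed programs can be written as script files with their own syntax, and awk supports if tests and for loops; I just do not think everyday administration work calls for statements that complex.

[root@localhost ~]# awk -F ':' '{if ($1=="root") print $0}' /etc/passwd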

root:x:0:0:root:/root:/bin/bash
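The field assignment, the running total, and the if test can be checked on a few short lines. This sketch uses made-up four-field data; setting OFS to ':' is an addition so that the rebuilt line keeps its colons instead of switching to spaces as in the output shown earlier.

```shell
#!/bin/sh
sample=$(mktemp)
printf '%s\n' 'root:x:0:0' 'bin:x:1:1' 'daemon:x:2:2' > "$sample"

# Assigning to a field rebuilds $0 using OFS (a space unless set otherwise).
sums=$(awk -F ':' -v OFS=':' '{$5 = $3 + $4; print $0}' "$sample")

# tot accumulates across lines; END runs once after the last line.
total=$(awk -F ':' '{tot += $3} END {print tot}' "$sample")

only_root=$(awk -F ':' '{if ($1 == "root") print $0}' "$sample")

printf '%s\n' "$sums" "$total" "$only_root"
rm -f "$sample"
```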

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

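Variables

Command history

The root user's command history is stored in /root/.bash_history

By default 1000 commands are kept; check with echo $HISTSIZE

!! is the previous command; !$ is the last argument of the previous command; !950 runs command number 950 from the history

alias sets an alias for a command; unalias removes it

env lists the environment variables

set prints more than env, because it also includes shell variables that are not exported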

-------------------------------------------------------------------------------------------------------

* wildcard (matches any number of characters)

? wildcard (matches exactly one character)

# comment

\ escape character

-------------------------------------------------------------------------------------------------------

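The sort command is very useful on Linux. It sorts the lines of its input and writes the result to standard output. sort can read from named files or from stdin.

Syntax: sort (options) (arguments)

Options

-k: sort on the given field

-u: sort and drop duplicate lines

-b: ignore leading blanks on each line

-c: check whether the file is already sorted

-d: consider only letters, digits, and blanks when sorting, ignoring other characters

-f: treat lowercase letters as uppercase when sorting

-i: consider only ASCII characters in the octal range 040 to 176 when sorting, ignoring other characters

-m: merge files that are already sorted

-M: sort by the three-letter month abbreviation at the start of the key

-n: sort by numeric value

-o <output file>: write the sorted result to the specified file

-r: sort in reverse order

-t <separator>: use the given character as the field separator

+<start field>-<end field>: sort on the given fields, from the start field up to but not including the end field (an obsolete form; use -k instead)

Arguments: the list of files to sort.

Examples: sort treats each line of the file or text as a unit and compares lines character by character from the first character on, by ASCII value, then prints them in ascending order.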

[root@mail text]# cat sort.txt

aaa:10:1.1

ccc:30:3.3

ddd:40:4.4

bbb:20:2.2

eee:50:5.5

[root@mail text]# sort sort.txt

aaa:10:1.1

bbb:20:2.2

ccc:30:3.3

ddd:40:4.4

eee:50:5.5

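To drop duplicate lines, use the -u option or uniq (uniq only removes adjacent duplicates, so it is normally run on sorted input):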

[root@mail text]# cat sort.txt

aaa:10:1.1

ccc:30:3.3

ddd:40:4.4

bbb:20:2.2

eee:50:5.5

[root@mail text]# sort -u sort.txt

aaa:10:1.1

bbb:20:2.2

ccc:30:3.3

ddd:40:4.4

eee:50:5.5

[root@mail text]# uniq sort.txt

aaa:10:1.1

ccc:30:3.3

ddd:40:4.4

bbb:20:2.2

eee:50:5.5
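The sample file above contains no repeated lines, so all three commands leave its set of lines unchanged. To see the difference between sort -u and uniq, the following sketch uses a made-up file in which the duplicates are not adjacent.

```shell
#!/bin/sh
export LC_ALL=C   # fixed byte-wise ordering for a predictable result
sample=$(mktemp)
printf '%s\n' 'bbb:20' 'aaa:10' 'bbb:20' 'aaa:10' > "$sample"

sorted_unique=$(sort -u "$sample")                   # sorts, then drops duplicates
uniq_only=$(uniq "$sample" | wc -l | tr -d ' ')      # duplicates are not adjacent: all lines stay
sort_then_uniq=$(sort "$sample" | uniq)              # same result as sort -u

printf '%s\n' "$sorted_unique"
rm -f "$sample"
```

Running uniq directly on the unsorted file removes nothing, which is why it is usually placed after sort in a pipeline.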

Using sort's -n, -r, -k, and -t options:

[root@mail text]# cat sort.txt

AAA:BB:CC

aaa:30:1.6

ccc:50:3.3

ddd:20:4.2

bbb:10:2.5

eee:40:5.4

eee:60:5.1

# Sort the BB column numerically in ascending order:

[root@mail text]# sort -nk 2 -t: sort.txt

AAA:BB:CC

bbb:10:2.5

ddd:20:4.2

aaa:30:1.6

eee:40:5.4

ccc:50:3.3

eee:60:5.1

# Sort the CC column numerically in descending order:

[root@mail text]# sort -nrk 3 -t: sort.txt

eee:40:5.4

eee:60:5.1

ddd:20:4.2

ccc:50:3.3

bbb:10:2.5

aaa:30:1.6

AAA:BB:CC

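# -n sorts numerically, -r reverses the order, -k selects the field to sort on, and -t sets the field separator to a colon

The exact syntax of the -k option:

FStart.CStart Modifier,FEnd.CEnd Modifier

-------Start--------,-------End--------

FStart.CStart Options , FEnd.CEnd Options

The comma splits this syntax into two parts, Start and End. The Start part itself has three components. The Modifier component is the kind of option letter discussed earlier, such as n and r. The focus here is on FStart and CStart. FStart is the field to use, and CStart is the character position within that field at which the sort key begins. CStart can be omitted, in which case the key starts at the beginning of the field. Likewise, the End part can be given as FEnd.CEnd; if you omit .CEnd, or set CEnd to 0 (zero), the key ends at the last character of field FEnd.

Sort by company name starting from its second letter:

$ sort -t ' ' -k 1.2 facebook.txt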

baidu 100 5000

sohu 100 4500

google 110 5000

guge 50 3000

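-k 1.2 sorts on the string that starts at the second character of the first field; because no End part is given, the key runs to the end of the line. baidu comes first because its second letter is a. sohu and google both have o as the second letter, but the h of sohu sorts before the second o of google, so they take second and third place. guge has to settle for fourth.

Sort on only the second letter of the company name and, where those are equal, by salary in descending order:

$ sort -t ' ' -k 1.2,1.2 -nrk 3,3 facebook.txt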

baidu 100 5000

google 110 5000

sohu 100 4500

guge 50 3000

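Because only the second letter should be compared, we write -k 1.2,1.2, which means "sort on the second letter only". (If you wonder why plain -k 1.2 will not do: it omits the End part, so the key would run from the second letter to the end of the line.) For the salary we likewise use -k 3,3, the most precise form, meaning "sort on this field only"; if you left out the trailing 3, the key would become "everything from the third field to the end of the line".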
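The two key specifications can be compared on a small file. The contents of facebook.txt are not shown in the text, so the four lines below (name, headcount, salary) are an assumption chosen to match the output above. The second command attaches the modifiers to the key itself (-k 3,3nr), which is a common way to make a modifier apply to one key only.

```shell
#!/bin/sh
export LC_ALL=C   # byte-wise comparison for a predictable order
sample=$(mktemp)
# Hypothetical stand-in for facebook.txt: name, headcount, salary
printf '%s\n' 'google 110 5000' 'baidu 100 5000' 'guge 50 3000' 'sohu 100 4500' > "$sample"

# Key starts at character 2 of field 1 and runs to the end of the line.
from_second=$(sort -t ' ' -k 1.2 "$sample" | cut -d ' ' -f 1)

# First key: only character 2 of field 1. Second key: field 3, numeric, descending.
second_then_salary=$(sort -t ' ' -k 1.2,1.2 -k 3,3nr "$sample" | cut -d ' ' -f 1)

printf '%s\n' "$from_second" '--' "$second_then_salary"
rm -f "$sample"
```

With the open-ended key, sohu sorts before google; once the first key is limited to one character, the tie between them is decided by salary instead.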

-------------------------------------------------------------------------------------------------------

cut command

cut prints selected parts of each line, which lets you keep the fields you want and drop the rest.

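Do not confuse cut with cat. It is cat that is used to display the contents of files, much like a type command.

Note: cat has two functions. First, it displays file contents: it reads the files named by its file arguments in order and writes their contents to standard output. Second, it concatenates two or more files: cat f1 f2 > f3 joins the contents of f1 and f2 and, through the output redirection operator ">", stores them in f3. With large files the text scrolls past too quickly to read, so the output is usually paged with a command such as more.

To control scrolling, press Ctrl+S to pause it and Ctrl+Q to resume. Pressing Ctrl+C (interrupt) terminates the command and returns to the shell prompt.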

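Syntax: cut (options) (arguments)

Options

-b: print only the given byte range of each line

-c: print only the given character range of each line

-d: set the field delimiter; the default delimiter is TAB

-f: print the given fields

-n: used together with -b; do not split multibyte characters

--complement: select the complement of the chosen bytes, characters, or fields

--output-delimiter=<delimiter>: set the field delimiter used in the output

--help: show the help text

--version: show the version information

Arguments: the files to filter.

Examples: suppose there is a student report with the columns No, Name, Mark, and Percent, separated by tabs: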

[root@localhost text]# cat test.txt

No Name Mark Percent

01 tom 69 91

02 jack 71 87

03 alex 68 98

Use the -f option to extract fields:

[root@localhost text]# cut -f 1 test.txt

No

01

02

03

[root@localhost text]# cut -f2,3 test.txt

Name Mark

tom 69

jack 71

alex 68

Use the --complement option to extract every column except the selected ones

(print all columns except the second)

[root@localhost text]# cut -f2 --complement test.txt

No Mark Percent

01 69 91

02 71 87

03 68 98

Use the -d option to set the field delimiter:

[root@localhost text]# cat test2.txt

No;Name;Mark;Percent

01;tom;69;91

02;jack;71;87

03;alex;68;98

[root@localhost text]# cut -f2 -d";" test2.txt

Name

tom

jack

alex

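Specifying character or byte ranges: cut can treat a run of characters as a column. The range notation is:

N-: from the Nth byte, character, or field to the end of the line

N-M: from the Nth through the Mth (inclusive) byte, character, or field

-M: from the first through the Mth (inclusive) byte, character, or field

Combine this notation with one of the following options to choose what the range counts:

-b: bytes

-c: characters

-f: fields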

Example

[root@localhost text]# cat test.txt

abcdefghijklmnopqrstuvwxyz

abcdefghijklmnopqrstuvwxyz

abcdefghijklmnopqrstuvwxyz

abcdefghijklmnopqrstuvwxyz

abcdefghijklmnopqrstuvwxyz

Print the first through the third character:

[root@localhost text]# cut -c1-3 test.txt

abc

abc

abc

abc

abc

Print the first two characters:

[root@localhost text]# cut -c-2 test.txt

ab

ab

ab

ab

ab

Print from the fifth character to the end:

[root@localhost text]# cut -c5- test.txt

efghijklmnopqrstuvwxyz

efghijklmnopqrstuvwxyz

efghijklmnopqrstuvwxyz

efghijklmnopqrstuvwxyz

efghijklmnopqrstuvwxyz
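The field and character ranges can be tried on single lines fed through a pipe. This sketch sticks to the -d, -f, and -c options; the sample row is taken from the test2.txt example and the short alphabet string is made up.

```shell
#!/bin/sh
row='01;tom;69;91'
letters='abcdefghij'

name=$(printf '%s\n' "$row" | cut -d ';' -f 2)          # one field
name_mark=$(printf '%s\n' "$row" | cut -d ';' -f 2,3)   # two fields, delimiter kept
from_third=$(printf '%s\n' "$row" | cut -d ';' -f 3-)   # field 3 to the end
first_three=$(printf '%s\n' "$letters" | cut -c1-3)     # characters 1 through 3
up_to_two=$(printf '%s\n' "$letters" | cut -c-2)        # characters 1 through 2
from_fifth=$(printf '%s\n' "$letters" | cut -c5-)       # character 5 to the end

printf '%s\n' "$name" "$name_mark" "$from_third" "$first_three" "$up_to_two" "$from_fifth"
```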

------------------------------------------------------------------------------------------------------

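wc command

wc counts. It reports the number of bytes, words, or lines in a file. If no file name is given, or the file name is "-", wc reads from standard input.

Syntax: wc (options) (arguments)

Options

-c or --bytes: print only the byte count

-l or --lines: print only the line count

-w or --words: print only the word count

-m or --chars: print only the character count

Arguments: the list of files to count.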
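A quick sketch of the three most common counts on a two-line temporary file. Reading the file through stdin keeps the file name out of the output, and tr removes the padding that some wc implementations put in front of the number; both are choices made for this example.

```shell
#!/bin/sh
sample=$(mktemp)
printf 'one two\nthree\n' > "$sample"

lines=$(wc -l < "$sample" | tr -d ' ')   # newline-terminated lines
words=$(wc -w < "$sample" | tr -d ' ')   # whitespace-separated words
bytes=$(wc -c < "$sample" | tr -d ' ')   # bytes, newlines included

echo "$lines $words $bytes"
rm -f "$sample"
```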
