
Regular Expressions 2: Regular Expressions in Practice, Part 1

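Regular expressions:

Simply put, a regular expression is a set of rules and methods for processing strings. It works on text one line at a time, and special symbols let us quickly filter or replace specific strings.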
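Here is a minimal sketch of the two basic jobs, filtering and replacing. It runs on a few hypothetical log lines (made-up sample data, not from a real system). grep keeps only the lines that match, and sed rewrites the matched text on every line.

```bash
# Hypothetical sample log; each line is handled independently.
log='2023-01-01 INFO service started
2023-01-01 ERROR disk full
2023-01-02 INFO service stopped'

# Filter: keep only the lines that contain ERROR.
errors=$(printf '%s\n' "$log" | grep 'ERROR')
echo "$errors"

# Replace: on every line, rewrite INFO as NOTICE.
renamed=$(printf '%s\n' "$log" | sed 's/INFO/NOTICE/')
echo "$renamed"
```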

 

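Operations work produces huge volumes of access logs, error logs and other big data. Regular expressions are how we quickly filter out exactly the content we need.

The "three musketeers" awk, sed and grep (egrep) only work at their most efficient when combined with regular expressions.

To use the three musketeers well, we first have to master regular expressions.

In Linux, "regular expressions" mainly means the ones used by awk, sed and grep (egrep).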

 

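Basic regular expressions (BRE)

A regular expression is simply a set of special characters, each given a specific meaning.

1) ^word  matches lines that begin with word.

2) word$  matches lines that end with word.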

Example:

Contents of the file oldboy.log:

I am oldboy linux teacher.

I like chinese chess,table tennis.

My blog is http://oldboy.blog.51cto.com

My qq is 49000448

my god,my name is not oldbey,but OLDBOY.

 

Filter lines that begin with I:

[root@weibochoutu_1 test]# grep "^I" oldboy.log

I am oldboy linux teacher.

I like chinese chess,table tennis.

 

Filter lines that begin with M:

[root@weibochoutu_1 test]# grep  "^M" oldboy.log

My blog is http://oldboy.blog.51cto.com

My qq is 49000448

-i: ignore case

[root@weibochoutu_1 test]# grep -i  "^M" oldboy.log

My blog is http://oldboy.blog.51cto.com

My qq is 49000448

my god,my name is not oldbey,but OLDBOY.

 

Filter lines that end with M (because of -i, a lowercase m also counts):

[root@weibochoutu_1 test]# grep -i  "M$" oldboy.log

My blog is http://oldboy.blog.51cto.com
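The two anchors can also be combined. This is a minimal sketch on a made-up three-line text, not the oldboy.log file. ^$ matches an empty line. grep -v inverts the match to drop such lines. Putting ^ and $ together forces the pattern to cover the whole line.

```bash
# Hypothetical sample text with an empty line in the middle.
text='My blog is http://oldboy.blog.51cto.com

my god'

# ^$: nothing between start and end of line, i.e. an empty line.
empty_count=$(printf '%s\n' "$text" | grep -c '^$')
# -v inverts the match, so empty lines are dropped.
non_empty=$(printf '%s\n' "$text" | grep -v '^$')
# ^...$ together: the whole line must be exactly "my god".
exact=$(printf '%s\n' "$text" | grep '^my god$')

echo "$empty_count"
echo "$non_empty"
echo "$exact"
```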

 

 

3) .  matches exactly one character, whatever it is.

Example 1:

Show lines containing blog, with . standing in for the o:

[root@weibochoutu_1 test]# grep   "bl.g" oldboy.log

My blog is http://oldboy.blog.51cto.com

Example 2:

[root@weibochoutu_1 test]# echo "is blog not boog" >> oldboy.log

[root@weibochoutu_1 test]# echo "not boog" >> oldboy.log

[root@weibochoutu_1 test]# cat oldboy.log

I am oldboy linux teacher.

I like chinese chess,table tennis.

My blog is http://oldboy.blog.51cto.com

My qq is 49000448

my god,my name is not oldbey,but OLDBOY.

is blog not boog

not boog

[root@weibochoutu_1 test]# grep   "b.og" oldboy.log

My blog is http://oldboy.blog.51cto.com

is blog not boog

not boog
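To confirm that . stands for exactly one character, here is a sketch on hypothetical words. The pattern is anchored with ^ and $, so the whole line must be b, then one character, then og. That rules out both the shorter bog and the longer blaog.

```bash
# Hypothetical word list.
words='blog
boog
bog
blaog'

# ^b.og$: the whole line must be b, exactly one character, then og.
matches=$(printf '%s\n' "$words" | grep '^b.og$')
echo "$matches"
```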

 

 

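4) \  escape character: it strips a special character of its special meaning and turns it back into an ordinary literal character.

Example: \. matches a literal period.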
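This minimal sketch compares an unescaped period with an escaped one on two made-up lines. Only the escaped pattern refuses to accept the X as a stand-in for the period.

```bash
# Hypothetical lines: only the first has a real period after blog.
lines='oldboy.blog.51cto.com
oldboy.blogX51cto.com'

# Unescaped: . matches any one character, so the X line matches too.
loose=$(printf '%s\n' "$lines" | grep -c 'blog.51cto')
# Escaped: \. matches only a literal period.
strict=$(printf '%s\n' "$lines" | grep -c 'blog\.51cto')
echo "$loose $strict"
```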

5) *  matches zero or more occurrences of the preceding character.

Example: O* can match nothing at all, or any number of O's.

 

Example:

[root@weibochoutu_1 test]# cat oldboy.log

I am oldboy linux teacher.

I like chinese chess,table tennis.

My blog is http://oldboy.blog.51cto.com

My qq is 49000448

my god,my name is not oldbey,but OLDBOY.

is blog not boog

not boog

[root@weibochoutu_1 test]# grep "490*448" oldboy.log

My qq is 49000448

 

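After appending the line 4900000448,49448 to oldboy.log, check the file and run the same command again:

[root@weibochoutu_1 test]# cat oldboy.log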

I am oldboy linux teacher.

I like chinese chess,table tennis.

My blog is http://oldboy.blog.51cto.com

My qq is 49000448

my god,my name is not oldbey,but OLDBOY.

is blog not boog

not boog

4900000448,49448

[root@weibochoutu_1 test]# grep "490*448" oldboy.log

My qq is 49000448

4900000448,49448
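49448 matches too, because * also allows zero occurrences of 0. A common BRE idiom for "one or more" is to write the character once and then star it, as in 4900*448. Here is a sketch comparing the two patterns on hypothetical numbers:

```bash
# Hypothetical sample numbers.
nums='49448
49000448
4900000448'

# 0* accepts zero or more 0s, so all three lines match.
any_zeros=$(printf '%s\n' "$nums" | grep -c '490*448')
# 00* needs one 0 plus zero or more further 0s, so 49448 drops out.
some_zeros=$(printf '%s\n' "$nums" | grep -c '4900*448')
echo "$any_zeros $some_zeros"
```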

 

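6) .*  matches any run of characters, including an empty one. ^.*  matches any number of arbitrary characters at the start of a line, so every line matches.

Example: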

[root@weibochoutu_1 test]# cat oldboy.log

I am oldboy linux teacher.

I like chinese chess,table tennis.

My blog is http://oldboy.blog.51cto.com

My qq is 49000448

my god,my name is not oldbey,but OLDBOY.

is blog not boog

not boog

4900000448,49448

 

 

[root@weibochoutu_1 test]# grep ".*" oldboy.log

I am oldboy linux teacher.

I like chinese chess,table tennis.

My blog is http://oldboy.blog.51cto.com

My qq is 49000448

my god,my name is not oldbey,but OLDBOY.

is blog not boog

not boog

4900000448,49448
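.* is greedy: it matches as much text as it can, which Methods 5 and 6 below rely on. This sketch runs on a made-up string and sets .* against a negated bracket expression (item 8), which stops at the first colon:

```bash
# Hypothetical colon-separated string.
s='a:b:c'

# ^.*: grabs as much as possible, up to the LAST colon, leaving only "c".
greedy=$(echo "$s" | sed 's/^.*://')
# ^[^:]*: cannot cross a colon, so it stops at the FIRST one, leaving "b:c".
first=$(echo "$s" | sed 's/^[^:]*://')
echo "$greedy"
echo "$first"
```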

 

 

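7) [ ]  bracket expression: matches any single character from the set listed between the brackets. For example, [lo] matches l or o.

Example: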

[root@weibochoutu_1 test]# cat oldboy.log

I am oldboy linux teacher.

I like chinese chess,table tennis.

My blog is http://oldboy.blog.51cto.com

My qq is 49000448

my god,my name is not oldbey,but OLDBOY.

is blog not boog

not boog

4900000448,49448

 

[root@weibochoutu_1 test]# grep "b[lo]og" oldboy.log

My blog is http://oldboy.blog.51cto.com

is blog not boog

not boog
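Bracket expressions also accept ranges such as [0-9]. Here is a sketch on hypothetical input. It uses grep -o, which prints only the matched part of each line rather than the whole line.

```bash
# [lo] is a set: exactly one character, either l or o.
set_hits=$(printf 'blog\nboog\nbxog\n' | grep -c 'b[lo]og')

# [0-9] is a range: any one digit; [0-9][0-9]* means one or more digits.
# -o prints only the matched part of the line, not the whole line.
digits=$(echo 'My qq is 49000448' | grep -o '[0-9][0-9]*')

echo "$set_hits"
echo "$digits"
```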

 

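8) [^...]  negated bracket expression: matches any one character that is NOT listed after the ^.

Example:

[^word] matches one character other than w, o, r or d. grep prints a line as long as it contains at least one such character. That means [^qq] does not filter out lines containing qq, and below every line is printed, including My qq is 49000448. To exclude lines that contain a string, use grep -v instead.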

[root@localhost ~]# grep "[^qq]" oldboy.log --color

I am oldboy linux teacher.

I like chinese chess,table tennis.

My blog is http://oldboy.blog.51cto.com

My qq is 49000448

my god,my name is not oldbey,but OLDBOY.

is blog not boog

not boog

4900000448,49448
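This sketch uses hypothetical lines to show which lines [^q] actually drops (only those made entirely of q's), with grep -v alongside for comparison:

```bash
# Hypothetical sample lines.
lines='qq
qqq
my qq
hello'

# [^q] needs one character that is not q somewhere on the line.
# "qq" and "qqq" contain only q's, so they are the only lines dropped.
has_non_q=$(printf '%s\n' "$lines" | grep '[^q]')
# -v inverts the match: keep only lines that do NOT contain qq.
no_qq=$(printf '%s\n' "$lines" | grep -v 'qq')

echo "$has_non_q"
echo "$no_qq"
```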

 

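9) a\{n,m\}  matches the preceding character repeated n to m times. (With egrep, drop the backslashes: a{n,m}.)

Example: a\{n,m\} matches a repeated n to m times.

a\{,m\}  matches a at most m times. (No longer supported on CentOS 6.)

\{n,\}  matches the preceding character at least n times. (With egrep, drop the backslashes.)

\{n\}  matches the preceding character exactly n times. (With egrep, drop the backslashes.)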
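This is a minimal sketch of the backslash rule. It uses grep -E, the modern equivalent of egrep, for the ERE form. In a BRE, unescaped braces are ordinary characters, so the third pattern looks for a literal {2,3} and finds nothing. The numbers are hypothetical sample data, and the bare-brace behavior shown is assumed to be that of GNU grep.

```bash
# Hypothetical sample numbers.
nums='49448
49000448
4900000448'

# BRE (plain grep): the interval braces must be escaped.
bre=$(printf '%s\n' "$nums" | grep -c '490\{2,3\}')
# ERE (grep -E, the modern spelling of egrep): the braces are written bare.
ere=$(printf '%s\n' "$nums" | grep -E -c '490{2,3}')
# In a BRE, bare braces are literal characters, so no line matches.
literal=$(printf '%s\n' "$nums" | grep -c '490{2,3}')

echo "$bre $ere $literal"
```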

 

Example:

Match 0 repeated 2 to 3 times:

[root@localhost ~]# cat oldboy.log

I am oldboy linux teacher.

I like chinese chess,table tennis.

My blog is http://oldboy.blog.51cto.com

My qq is 49000448

my god,my name is not oldbey,but OLDBOY.

is blog not boog

not boog

4900000448,49448

[root@localhost ~]# grep "490\{2,3\}" oldboy.log --color

My qq is 49000448

4900000448,49448 
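4900000448 appears in the output because grep only needs some part of the line to match, and its leading 4900 already satisfies 490\{2,3\}. Adding the text that must follow the zeros pins down the count. Here is a sketch on hypothetical numbers:

```bash
# Hypothetical sample numbers: 3 zeros and 5 zeros.
nums='49000448
4900000448'

# Nothing follows the interval, so any line containing 4900 matches.
loose=$(printf '%s\n' "$nums" | grep -c '490\{2,3\}')
# Requiring 448 right after the zeros limits the run to 2 or 3 zeros.
strict=$(printf '%s\n' "$nums" | grep '490\{2,3\}448')

echo "$loose"
echo "$strict"
```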

Example:

[root@localhost ~]# cat oldboy.log

I am oldboy linux teacher.

I like chinese chess,table tennis.

My blog is http://oldboy.blog.51cto.com

My qq is 49000448

my god,my name is not oldbey,but OLDBOY.

is blog not boog

not boog

4900000448,49448

[root@localhost ~]# grep "490\{2,\}" oldboy.log --color   # match 0 at least twice

My qq is 49000448

4900000448,49448

 

Example:

[root@localhost ~]# cat oldboy.log

I am oldboy linux teacher.

I like chinese chess,table tennis.

My blog is http://oldboy.blog.51cto.com

My qq is 49000448

my god,my name is not oldbey,but OLDBOY.

is blog not boog

not boog

4900000448,49448

[root@localhost ~]# grep "490\{2,\}448" oldboy.log --color   # match 0 at least twice, followed by 448

My qq is 49000448

4900000448,49448 

Extended regular expressions (ERE): rarely needed here, so we skip them for now.

 

 

Where can you find help on regular expressions in Linux?

[root@localhost ~]# man grep 

 

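A worked example

Extract the IP address. The commands below rely on the older net-tools ifconfig output, where the second line reads inet addr:192.168.1.106  Bcast:...  Mask:...

Method 1: [root@localhost ~]# ifconfig eth1|grep "inet addr"|cut -d ":" -f2|cut -d " " -f1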

192.168.1.106

 

Method 2:

[root@localhost ~]# ifconfig eth1|grep "inet addr"|awk -F ":" '{print $2}'|awk '{print $1}'

192.168.1.106

 

Method 3: [root@localhost ~]# ifconfig eth1|sed -n '2p'|awk -F ":" '{print $2}'|awk '{print $1}'

192.168.1.106

 

Method 4: [root@localhost ~]# ifconfig eth1|awk -F "[: ]+" 'NR==2 {print $4}'

192.168.1.106

Method 5 (with sed): [root@localhost ~]# ifconfig eth1|sed -n '/inet addr/p'|sed 's#^.*addr:##g'|sed 's#  B.*##g'

192.168.1.106

Method 6 (a single sed command):

[root@localhost ~]# ifconfig eth1|sed -n 's#^.*addr:\(.*\)  Bcast.*$#\1#gp'

192.168.1.106
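To try Methods 4 and 6 without a live eth1 interface, this sketch feeds them a hypothetical sample in the older net-tools ifconfig layout. The MAC address and the lines themselves are made up, not real output from this machine.

```bash
# Hypothetical sample mimicking the older net-tools ifconfig layout.
sample='eth1      Link encap:Ethernet  HWaddr 00:0C:29:AA:BB:CC
          inet addr:192.168.1.106  Bcast:192.168.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1'

# Method 4: split on runs of colons/spaces. The leading spaces make $1 empty,
# so on line 2 the fields are "", inet, addr, then the address in $4.
m4=$(printf '%s\n' "$sample" | awk -F '[: ]+' 'NR==2 {print $4}')

# Method 6: \(.*\) captures the text between "addr:" and "  Bcast";
# -n with the p flag prints only the line where the substitution happened.
m6=$(printf '%s\n' "$sample" | sed -n 's#^.*addr:\(.*\)  Bcast.*$#\1#gp')

echo "$m4"
echo "$m6"
```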

 

Sed (regex) matching tips are as follows:

 

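I think there are many more ways to do this; what really counts is plenty of hands-on practice!

sed and awk are the key tools, so learn them well!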
