
Processing log file data with Python

Below are the problem statement and requirements for this Python data-processing exercise:
The attachment is a log file that shows the running status of a set-top box. Each line in the file follows the format “LineNumber + Time + ProcessName + (ProcessID) + Logs”, and the logs are currently displayed in time order. Please write one script in Python that supports the following features:

  1. Sort the logs in alphabetical order of process name, e.g.: halserver, processman, etc.
  2. Filter the logs by process name so that the output shows only the logs of interest, e.g. “procman”, and hides the rest.
  3. Count the number of log lines for each process.
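The line format described above can be split into fields with a single regular expression. The following is a minimal sketch, not the script used later in this post: the sample line, the separators between the fields, and the names `LINE_RE` and `parse_line` are all assumptions, because the real log file is not reproduced here.

```python
import re

# Assumed layout: "<line number> <HH:MM:SS.mmm> <process name> (<pid>) <message>".
# The exact separators used in the real file are an assumption.
LINE_RE = re.compile(
    r'^\s*(?P<lineno>\d+)\s+'
    r'(?P<time>\d\d:\d\d:\d\d\.\d{3})\s+'
    r'(?P<process>[A-Za-z][\w.\-]*)\s*'
    r'\((?P<pid>\d+)\)\s*'
    r'(?P<message>.*)$'
)


def parse_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = LINE_RE.match(line)
    return match.groupdict() if match else None


# Hypothetical sample line, made up for illustration only.
sample = '17 06:43:01.123 procman (1234) service started'
fields = parse_line(sample)
print(fields['process'], fields['pid'])
```

Once the process name is available as a separate field, all three tasks (sorting, filtering, counting) reduce to ordinary list operations on that field.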

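This is the log text file produced by the set-top box. Part of it, as first opened, looked like this:
[Screenshot: part of the log file]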

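At first glance it looks very messy. In fact, it should not be opened with Microsoft's text editor (Notepad); after opening it in Notepad++ instead, the structure is much clearer:
[Screenshot: part of the log file opened in Notepad++]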
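Why the file looked messy in Notepad is not stated here. One common cause, offered only as an assumption, is that the file uses Unix `\n` line endings, which older versions of Windows Notepad do not display as line breaks. A minimal sketch for checking the line endings of a byte string follows; `count_line_endings` is a hypothetical helper name and the sample bytes are made up.

```python
def count_line_endings(data):
    """Count CRLF, bare LF, and bare CR line endings in a bytes object."""
    crlf = data.count(b'\r\n')
    lf = data.count(b'\n') - crlf   # LF that is not part of a CRLF pair
    cr = data.count(b'\r') - crlf   # CR that is not part of a CRLF pair
    return {'crlf': crlf, 'lf': lf, 'cr': cr}


# Made-up bytes for illustration. For a real file, read it in binary mode:
#     with open('stblog.txt', 'rb') as f:
#         data = f.read()
sample = b'line one\nline two\r\nline three\n'
print(count_line_endings(sample))
```

If the `lf` count dominates, the file was probably written on a Unix-like system, which would be consistent with a set-top box, though that is a guess.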

The code is given below.
Code for task 1:

# coding=utf-8
# Python 3
import re

# Last digit of the line number, one separator character, HH:MM:SS.mmm,
# one whitespace character, then the process name (may contain '.' and '-').
pattern = re.compile(r'\d\D\d\d:\d\d:\d\d\.\d{3}\s([a-z.\-]+)')

with open('stblog.txt', 'r') as f1:
    lines = f1.readlines()

def process_name(line):
    # Return the process name, or '' if the line does not match the pattern.
    match = pattern.search(line)
    if match is None:
        print('regex failed:', line.rstrip())  # check that the regex works
        return ''
    return match.group(1)

# sorted() is stable, so lines of the same process keep their time order.
sorted_lines = sorted(lines, key=process_name)

with open('cc1.txt', 'w') as f2:
    f2.writelines(sorted_lines)
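Sorting by process name while keeping the time order inside each process can be checked on a few in-memory lines. This is an illustration under assumptions rather than the script itself: the sample lines are made up, and `NAME_RE` and `sort_by_process` are hypothetical names. It relies on the documented fact that Python's `sorted` is stable.

```python
import re

# Same idea as the pattern used in this post: a time stamp, then the process name.
NAME_RE = re.compile(r'\d\d:\d\d:\d\d\.\d{3}\s+([a-z.\-]+)')


def sort_by_process(lines):
    """Sort lines by process name; lines without a match sort first."""
    def key(line):
        match = NAME_RE.search(line)
        return match.group(1) if match else ''
    # sorted() is stable: lines with equal keys keep their original order.
    return sorted(lines, key=key)


# Hypothetical log lines, listed in time order.
sample = [
    '1 06:43:01.100 procman (12) start\n',
    '2 06:43:01.200 halserver (7) init\n',
    '3 06:43:01.300 procman (12) ready\n',
]
result = sort_by_process(sample)
print(''.join(result), end='')
```

A stable sort matters here because an exchange-based sort that swaps distant elements can reorder lines of the same process, which would scramble their time stamps.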

Code for tasks 2 and 3:

# coding=utf-8
# Python 3
import re

# Last digit of the line number, one separator character, HH:MM:SS.mmm,
# one whitespace character, then the process name (may contain '.' and '-').
pattern = re.compile(r'\d\D\d\d:\d\d:\d\d\.\d{3}\s([a-z.\-]+)')

with open('stblog.txt', 'r') as f1:
    lines = f1.readlines()

# Ask which process to keep, e.g. procman
wanted = input('Please input the process you are interested in: ').strip()

matched = []  # lines that belong to the requested process
for line in lines:
    match = pattern.search(line)
    if match is None:
        print('regex failed:', line.rstrip())  # check that the regex works
    elif match.group(1) == wanted:
        matched.append(line)

print(len(matched))  # number of log lines for the requested process

# Write the matching lines to cc2.txt
with open('cc2.txt', 'w') as f2:
    f2.writelines(matched)
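Task 3 asks for the number of log lines for each process, while the script above counts only the one process that was typed in. A per-process tally could be sketched with `collections.Counter` from the standard library. The sample lines are made up, and `NAME_RE` and `count_by_process` are hypothetical names chosen for this illustration.

```python
import re
from collections import Counter

# Same idea as the pattern used in this post: a time stamp, then the process name.
NAME_RE = re.compile(r'\d\d:\d\d:\d\d\.\d{3}\s+([a-z.\-]+)')


def count_by_process(lines):
    """Return a Counter mapping each process name to its number of log lines."""
    counts = Counter()
    for line in lines:
        match = NAME_RE.search(line)
        if match:  # lines that do not match the format are skipped
            counts[match.group(1)] += 1
    return counts


# Hypothetical log lines, made up for illustration only.
sample = [
    '1 06:43:01.100 procman (12) start\n',
    '2 06:43:01.200 halserver (7) init\n',
    '3 06:43:01.300 procman (12) ready\n',
]
counts = count_by_process(sample)
for name in sorted(counts):
    print(name, counts[name])
```

For the real file, the same function could be called on the list returned by `readlines()`.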