
Basic natural language processing tasks, illustrated with spaCy function calls

# coding=utf-8
import spacy

# Model name as pinned in the original post (spaCy 1.x naming);
# recent spaCy releases typically load it as "en_core_web_md".
nlp = spacy.load("en_core_web_md-1.2.1")

docx = nlp(
    u"The ways to process documents are so varied and application- and "
    u"language-dependent that I decided to not constrain them by any interface. "
    u"Instead, a document is represented by the features extracted from it, "
    u'not by its "surface" string form: how you get to the features is up to you. '
    u"Below I describe one common, general-purpose approach (called bag-of-words), "
    u"but keep in mind that different application domains call for different "
    u"features, and, as always, it's garbage in, garbage out..."
)

# Feature tests

# 1. Tokenization
print("################# tokenization")
for token in docx:
    print(token)

# 2. Part-of-speech tagging (string label, then its integer ID)
print("################# part of speech tagging")
for token in docx:
    print(token, token.pos_, token.pos)

# 3. Named entity recognition (string label, then its integer ID)
print("################# Named Entity Recognition")
for ent in docx.ents:
    print(ent, ent.label_, ent.label)

# 4. Lemmatization (lemma string, then its integer ID)
print("################# Lemmatize")
for token in docx:
    print(token, token.lemma_, token.lemma)

# 5. Noun phrase extraction
print("################# Noun Phrase Extraction")
for chunk in docx.noun_chunks:
    print(chunk)

# 6. Sentence segmentation
print("################# Sentence segmentation")
for sent in docx.sents:
    print(sent)
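The script above prints each result as it goes. If the results are needed later, they can be collected into one structure instead. The following is a minimal sketch of that idea, not part of spaCy: `Annotations` and `collect_annotations` are hypothetical names introduced here, and the function assumes it receives a parsed document such as `docx`, using only the attributes already seen in the script plus the `.text` attribute of tokens and spans.

```python
from collections import namedtuple

# Hypothetical container (not a spaCy type): one field per task in the script.
Annotations = namedtuple(
    "Annotations",
    ["tokens", "pos_tags", "entities", "lemmas", "noun_phrases", "sentences"],
)


def collect_annotations(doc):
    """Gather the six kinds of output from a parsed document.

    `doc` is assumed to behave like the object returned by nlp(...):
    iterable over tokens, with .ents, .noun_chunks and .sents.
    """
    return Annotations(
        tokens=[token.text for token in doc],
        pos_tags=[(token.text, token.pos_) for token in doc],
        entities=[(ent.text, ent.label_) for ent in doc.ents],
        lemmas=[(token.text, token.lemma_) for token in doc],
        noun_phrases=[chunk.text for chunk in doc.noun_chunks],
        sentences=[sent.text for sent in doc.sents],
    )


# Usage, assuming spaCy and the model are installed and docx is defined:
#     result = collect_annotations(docx)
#     print(result.entities)
```

Returning plain strings and tuples rather than spaCy objects keeps the result easy to serialize or compare, at the cost of dropping the integer IDs that the script also prints.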

 
