Basic natural language processing tasks, illustrated with spaCy function calls
# coding=utf-8
import spacy

# Model name as used in the original post; change it to match the model you have installed.
nlp = spacy.load('en_core_web_md-1.2.1')
docx = nlp(
    u'The ways to process documents are so varied and application- and '
    u'language-dependent that I decided to not constrain them by any interface. '
    u'Instead, a document is represented by the features extracted from it, '
    u'not by its "surface" string form: how you get to the features is up to you. '
    u'Below I describe one common, general-purpose approach (called bag-of-words), '
    u'but keep in mind that different application domains call for different '
    u'features, and, as always, it’s garbage in, garbage out...'
)

# Feature tests

# 1. Tokenization
print('#################tokenization')
for token in docx:
    print(token)

# 2. Part-of-speech tagging (string label, then integer ID)
print('#################part of speech tagging')
for token in docx:
    print(token, token.pos_, token.pos)

# 3. Named entity recognition (string label, then integer ID)
print('################# Named Entity Recognition')
for ent in docx.ents:
    print(ent, ent.label_, ent.label)

# 4. Lemmatization (lemma string, then integer ID)
print('#################Lemmatize')
for token in docx:
    print(token, token.lemma_, token.lemma)

# 5. Noun phrase extraction
print('#################Noun Phrase Extraction')
for np in docx.noun_chunks:
    print(np)

# 6. Sentence segmentation
print('#################Sentence segmentation')
for sent in docx.sents:
    print(sent)
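The script above prints each annotation layer as it goes. When the results are needed later in a program, it can be more convenient to collect them into one structure. The following is only a minimal sketch: `summarize_doc`, `StubDoc`, and `make_stub_doc` are hypothetical names introduced here, not part of spaCy. The helper relies on the attributes used above (`pos_`, `lemma_`, `label_`, `ents`, `noun_chunks`, `sents`) and additionally assumes that tokens and spans expose a `text` attribute. The stub document is there only so the sketch can run without a model installed.

```python
from types import SimpleNamespace


def summarize_doc(doc):
    """Collect the six annotation layers of a parsed document into a dict.

    `doc` can be any object that iterates over tokens and exposes
    `ents`, `noun_chunks`, and `sents` (assumed to carry a `text` attribute).
    """
    return {
        "tokens": [t.text for t in doc],
        "pos": [(t.text, t.pos_) for t in doc],
        "lemmas": [(t.text, t.lemma_) for t in doc],
        "entities": [(e.text, e.label_) for e in doc.ents],
        "noun_chunks": [c.text for c in doc.noun_chunks],
        "sentences": [s.text for s in doc.sents],
    }


class StubDoc(list):
    """Hypothetical stand-in for a parsed document, used only for the demo."""


def make_stub_doc():
    """Build a tiny hand-annotated document without loading any model."""
    def tok(text, pos, lemma):
        return SimpleNamespace(text=text, pos_=pos, lemma_=lemma)

    doc = StubDoc([tok("Cats", "NOUN", "cat"), tok("sleep", "VERB", "sleep")])
    doc.ents = []  # no named entities in this toy sentence
    doc.noun_chunks = [SimpleNamespace(text="Cats")]
    doc.sents = [SimpleNamespace(text="Cats sleep")]
    return doc


if __name__ == "__main__":
    for layer, values in summarize_doc(make_stub_doc()).items():
        print(layer, values)
```

With a model loaded as in the script above, the same helper would be called as `summarize_doc(docx)`; the actual labels then depend on the model in use.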