pythonstopword

  • Author: 夕阳下奔跑的姨妈
  • Source: 51数据库
  • 2020-04-21

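1. How to remove stop words with NLTK or plain Python

The snippets below assume `from nltk.corpus import stopwords` (plus `import nltk` for the third one) and that the stop-word corpus has been downloaded with nltk.download('stopwords').

1. Filter with a list comprehension:

    stop_words = set(stopwords.words('english'))  # build the set once; lookups are fast
    filtered_words = [w for w in word_list if w not in stop_words]

2. Assuming you have a list of words (word_list) from which you want to remove stop words, you can do this:

    stop_words = set(stopwords.words('english'))
    filtered_word_list = word_list[:]  # make a copy of word_list
    for word in word_list:  # iterate over the original list
        if word in stop_words:
            filtered_word_list.remove(word)  # remove one occurrence of the stop word from the copy

3. You can also take a set difference. Note that this discards word order and duplicate words:

    list(set(nltk.regexp_tokenize(sentence, pattern, gaps=True)) - set(nltk.corpus.stopwords.words('english')))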
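
The trade-off between the approaches above can be seen in a minimal sketch. It assumes a tiny hard-coded stop-word set (`STOP_WORDS`, a hypothetical stand-in for `stopwords.words('english')`, so the snippet runs without the NLTK corpus) and a hypothetical helper `remove_stop_words`. Lowercasing each word before the lookup is an added assumption, not part of the answers above.

```python
# Hypothetical stand-in for set(stopwords.words('english')); not the real NLTK list.
STOP_WORDS = {"the", "a", "is", "on", "and"}


def remove_stop_words(words, stop_words=STOP_WORDS):
    """Return the words that are not stop words, keeping order and duplicates."""
    return [w for w in words if w.lower() not in stop_words]


word_list = "The cat is on the mat and the cat sleeps".split()

ordered = remove_stop_words(word_list)
unordered = set(word_list) - STOP_WORDS  # set difference: no order, no duplicates

print(ordered)            # ['cat', 'mat', 'cat', 'sleeps']
print(sorted(unordered))  # ['The', 'cat', 'mat', 'sleeps']
```

The set difference keeps 'The' because the comparison is case-sensitive, and it reports 'cat' only once. The list comprehension is usually the safer choice when the filtered text still has to be read as a sentence.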

2. How to remove stop words when segmenting text with jieba in Python

    import jieba

    # Build the stop-word collection (one word per line in the file)
    def stopwordslist(filepath):
        with open(filepath, 'r', encoding='utf-8') as f:
            return {line.strip() for line in f}  # a set makes membership tests fast

    # Segment a sentence and drop the stop words
    def seg_sentence(sentence, stopwords):
        sentence_seged = jieba.cut(sentence.strip())
        outstr = ''
        for word in sentence_seged:
            if word not in stopwords and word != '\t':
                outstr += word
                outstr += " "
        return outstr

    # Path to the stop-word file; load it once instead of once per sentence
    stopwords = stopwordslist('./test/stopwords.txt')

    # Write the output as UTF-8 too, so Chinese text is saved correctly on every platform
    with open('./test/input.txt', 'r', encoding='utf-8') as inputs, \
            open('./test/output.txt', 'w', encoding='utf-8') as outputs:
        for line in inputs:
            line_seg = seg_sentence(line, stopwords)  # the return value is a string
            outputs.write(line_seg + '\n')
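
The filtering step itself does not depend on jieba, so it can be checked in isolation. The following is a minimal sketch, assuming two hypothetical helpers (`load_stopwords` and `filter_tokens`) and a hand-written token list standing in for the output of `jieba.cut`. The stop-word file is a temporary file created only for the demonstration, and unlike the code above the result has no trailing space.

```python
import os
import tempfile


def load_stopwords(filepath):
    """Read one stop word per line into a set, skipping blank lines."""
    with open(filepath, "r", encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}


def filter_tokens(tokens, stopwords):
    """Join the tokens that are neither stop words nor whitespace with single spaces."""
    return " ".join(t for t in tokens if t not in stopwords and t.strip())


# Temporary stop-word file, created only for this demonstration.
with tempfile.NamedTemporaryFile("w", encoding="utf-8", suffix=".txt", delete=False) as tmp:
    tmp.write("的\n了\n\n")
    stopword_path = tmp.name

stopwords = load_stopwords(stopword_path)
os.remove(stopword_path)

# Hand-written tokens standing in for jieba.cut output (the real segmentation may differ).
tokens = ["我", "爱", "北京", "的", "天安门", "了", "\t"]
result = filter_tokens(tokens, stopwords)
print(result)  # 我 爱 北京 天安门
```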

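3. How to accept Enter (a newline) as a character in Python 2

A newline is already a character.

By default, raw_input ends the input as soon as Enter is pressed. To accept multi-line input, change the stop condition so that input ends only when an empty line is entered.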


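    stopword = ''  # sentinel: an empty line ends the input
    text = ''      # renamed from str to avoid shadowing the built-in type
    for line in iter(raw_input, stopword):  # keep reading until an empty line is entered
        text += line + '\n'  # raw_input strips the newline, so add it back
    # print(text)  # for testing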
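
In Python 3, `raw_input` was renamed to `input`, and the same two-argument `iter(callable, sentinel)` pattern still applies. The following is a minimal sketch, assuming a hypothetical helper `read_until_blank` whose `read_line` parameter exists only so the function can be exercised without a terminal.

```python
def read_until_blank(read_line=input, stopword=""):
    """Collect lines from read_line() until it returns the stopword (an empty line)."""
    text = ""
    # iter(callable, sentinel) calls read_line() repeatedly and stops
    # as soon as the returned value equals the sentinel.
    for line in iter(read_line, stopword):
        text += line + "\n"  # input() strips the trailing newline, so add it back
    return text


# Simulated keyboard input: two lines, then an empty line that ends the input.
fake_lines = iter(["first line", "second line", "", "never read"])
result = read_until_blank(lambda: next(fake_lines))
print(repr(result))  # 'first line\nsecond line\n'
```

Calling `read_until_blank()` with no arguments reads from the keyboard instead.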

