import jieba

# Read the full text of the novel; the with-block closes the file afterwards
with open("西游记.txt", "r", encoding='utf-8') as f:
    txt = f.read()

# Segment the text into a list of words
words = jieba.lcut(txt)

counts = {}
for word in words:
    if len(word) == 1:
        # Skip single-character tokens (mostly punctuation and particles)
        continue
    # Merge the different names of the same character into one canonical name
    elif word in ("大圣", "老孙", "行者", "孙大圣", "孙行者", "猴王", "悟空", "齐天大圣", "猴子"):
        rword = "孙悟空"
    elif word in ("师父", "三藏", "圣僧"):
        rword = "唐僧"
    elif word in ("呆子", "八戒", "老猪"):
        rword = "猪八戒"
    elif word == "沙和尚":
        rword = "沙僧"
    elif word in ("妖精", "妖魔", "妖道"):
        rword = "妖怪"
    elif word == "佛祖":
        rword = "如来"
    elif word == "三太子":
        rword = "白马"
    else:
        rword = word
    counts[rword] = counts.get(rword, 0) + 1

# Sort by frequency, highest first, and print the top 20
items = list(counts.items())
items.sort(key=lambda x: x[1], reverse=True)
for word, count in items[:20]:
    print("{0:<10}{1:>5}".format(word, count))
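The alias merging in the script above can also be written with a lookup table instead of a long elif chain. The following is only a minimal sketch under a few assumptions: the names `ALIASES` and `count_words` are hypothetical and not from the original post, the alias table is a shortened subset of the names above, and the input is an already segmented word list (for example the result of `jieba.lcut`), so the sketch runs without the novel's text file.

```python
from collections import Counter

# Hypothetical alias table (a subset of the names used in the script):
# each variant name maps to one canonical character name.
ALIASES = {
    "大圣": "孙悟空", "老孙": "孙悟空", "行者": "孙悟空",
    "师父": "唐僧", "三藏": "唐僧",
    "呆子": "猪八戒", "八戒": "猪八戒",
}


def count_words(words, aliases=ALIASES, min_len=2):
    """Count tokens of at least min_len characters, merging aliases."""
    counts = Counter()
    for word in words:
        if len(word) < min_len:
            # Same rule as the script: skip single-character tokens
            continue
        # Fall back to the word itself when it has no alias
        counts[aliases.get(word, word)] += 1
    return counts


# A tiny hand-made word list standing in for the output of jieba.lcut(txt)
sample = ["行者", "道", "师父", "八戒", "大圣", "三藏", "行者"]
top = count_words(sample).most_common(2)
for word, count in top:
    print("{0:<10}{1:>5}".format(word, count))
```

With a table like this, adding another alias means adding one dictionary entry, not another branch, and `Counter.most_common(n)` replaces the manual sort and also copes with fewer than `n` distinct words.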
Tags: elif, rword, word, items, related, counts, txt, Journey to the West (西游记), word segmentation. Source: https://www.cnblogs.com/clef-xc/p/15550104.html