An STL-Based Dictionary Builder: An Attempt at Simulating a Search Engine Algorithm

2020-01-23 14:52:28

Tags: repeat, 16, STL, search engine, set, article, line, include, dictionary


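This exercise comes from the UVA problem Searching the Web: https://vjudge.net/problem/UVA-1597

Following the problem statement, I build a dictionary from the words of articles supplied in a specific format, so that later lookups are fast.

To build the dictionary, I nest an STL set inside an STL map, which gives a data structure that indexes every word in the articles by article and line: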

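#include <cctype>
#include <cstdio>
#include <iostream>
#include <limits>
#include <map>
#include <set>
#include <sstream>
#include <string>
using namespace std;

// One occurrence of a word: the article index and the global line index.
struct newpair
{
    int article;
    int line;
    // std::set needs a strict weak ordering. Compare both fields so that
    // entries differing in either one are treated as distinct.
    bool operator<(const newpair& b) const
    {
        if (article != b.article) return article < b.article;
        return line < b.line;
    }
};
typedef map<string, set<newpair> > BIGMAP;
typedef set<newpair>::const_iterator SET_pair_ITER;
typedef BIGMAP::const_iterator BIGMAP_iter;

BIGMAP maper;      // word -> set of (article, line) occurrences
string psd[1600];  // input lines, separator lines included
int maxline;       // number of lines read

// Print every word followed by all of its occurrences.
int checkmaper()
{
    for (BIGMAP_iter it = maper.begin(); it != maper.end(); ++it)
    {
        cout << it->first;                     // the word (string)
        const set<newpair>& cyc = it->second;  // reference avoids copying the set
        for (SET_pair_ITER iter = cyc.begin(); iter != cyc.end(); ++iter)
            cout << "  article " << iter->article << " line " << iter->line << endl;
    }
    return 0;
}

// Record that word `aim` occurs in article `articlenum` on line `linenum`.
void buildmaper(const string& aim, int articlenum, int linenum)
{
    newpair m;
    m.article = articlenum;
    m.line = linenum;
    maper[aim].insert(m);
}

// Read n articles, each terminated by a line of asterisks; return the line count.
int readin()
{
    int n;
    cin >> n;
    // Discard the rest of the first line. Reading a char with >> here would
    // skip the newline and swallow the first letter of the text instead.
    cin.ignore(numeric_limits<streamsize>::max(), '\n');
    int cur = 0;
    for (int i = 0; i < n && cur < 1600; cur++)
    {
        if (!getline(cin, psd[cur])) break;  // stop on premature end of input
        if (psd[cur].find("***") != string::npos) { i++; continue; }  // the next article
        // Lowercase letters and turn everything else into a space.
        for (string::iterator it = psd[cur].begin(); it != psd[cur].end(); ++it)
        {
            unsigned char ch = static_cast<unsigned char>(*it);
            *it = isalpha(ch) ? static_cast<char>(tolower(ch)) : ' ';
        }
        stringstream ss(psd[cur]);
        string chr;
        while (ss >> chr) buildmaper(chr, i, cur);
    }
    return cur;
}

int main()
{
    if (!freopen("input.txt", "r", stdin)) return 1;
    if (!freopen("ans.txt", "w", stdout)) return 1;
    maxline = readin();
    checkmaper();
    return 0;
}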

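The code above draws on a fair amount of C++ knowledge and a few lower-level details, listed here:

1. Common stringstream operations

2. The basic STL containers map and set

3. Operator overloading in a struct

4. Iterator operations

5. The basic principle behind implementing map and set with a red-black tree

For the details of each, please refer to my other blog posts and to the code above.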

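The one place in the code above where a bug is easy to introduce is the use of set. A set keeps its elements sorted and also uses the same comparison to decide whether two elements are duplicates, so the newpair struct must overload operator< as a const member function.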

A sample run follows.

The input is:

4
one   repeat  repeat  repeat
A manufacturer, importer, or seller of
digital media devices may not (1) sell,
or offer for sale, in interstate commerce,
or (2) cause to be transported in, or in a
manner affecting, interstate commerce,
a digital media device unless the device
includes and utilizes standard security
technologies that adhere to the security
system standards.
**********
one two   repeat  repeat  repeat   repeat
Of course, Lisa did not necessarily
intend to read his books. She might
want the computer only to write her
midterm. But Dan knew she came from
a middle-class family and could hardly
afford the tuition, let alone her reading
fees. Books might be the only way she
could graduate
**********
one two three   repeat   repeat  repeat  repeat   repeat
Research in analysis (i.e., the evaluation
of the strengths and weaknesses of
computer system) is essential to the
development of effective security, both
for works protected by copyright law
and for information in general. Such
research can progress only through the
open publication and exchange of
complete scientific results
**********
one two three   four   repeat  repeat   repeat  repeat  repeat   repeat
I am very very very happy!
What about you?
**********

The output is:

a  article 0 line 1
  article 0 line 4
  article 0 line 6
  article 1 line 16
about  article 3 line 34
adhere  article 0 line 8
affecting  article 0 line 5
afford  article 1 line 17
alone  article 1 line 17
am  article 3 line 33
analysis  article 2 line 22
and  article 0 line 7
  article 1 line 16
  article 2 line 23
  article 2 line 27
  article 2 line 29
be  article 0 line 4
  article 1 line 18
books  article 1 line 13
  article 1 line 18
both  article 2 line 25
but  article 1 line 15
by  article 2 line 26
came  article 1 line 15
can  article 2 line 28
cause  article 0 line 4
class  article 1 line 16
commerce  article 0 line 3
  article 0 line 5
complete  article 2 line 30
computer  article 1 line 14
  article 2 line 24
copyright  article 2 line 26
could  article 1 line 16
  article 1 line 19
course  article 1 line 12
dan  article 1 line 15
development  article 2 line 25
device  article 0 line 6
devices  article 0 line 2
did  article 1 line 12
digital  article 0 line 2
  article 0 line 6
e  article 2 line 22
effective  article 2 line 25
essential  article 2 line 24
evaluation  article 2 line 22
exchange  article 2 line 29
family  article 1 line 16
fees  article 1 line 18
for  article 0 line 3
  article 2 line 26
  article 2 line 27
four  article 3 line 32
from  article 1 line 15
general  article 2 line 27
graduate  article 1 line 19
happy  article 3 line 33
hardly  article 1 line 16
her  article 1 line 14
  article 1 line 17

(The rest of the output is omitted.)

OK

Source: https://www.cnblogs.com/savennist/p/12230612.html