首页 > 代码库 > Index Generation
Index Generation
Index Generation
Time Limit: 1000MS | Memory Limit: 10000K | |
Total Submissions: 230 | Accepted: 89 |
Description
Most nonfiction and reference books have an index to help readers find references to specific terms or concepts in the text. Here is a sample index.
larch, 4, 237, 238, 414
+ Monty Python and, 64, 65, 66
+ planting of, 17
Lenny Kravitz, 50
+ going his way, 53
lumbago, 107
mango
+ Chris Kattan, 380
+ storage of, 87, 90
+ use in Nethack, 500, 501
+ Vitamin C content, 192
Each index entry contains a primary entry followed by zero or more secondary entries, which begin with a ‘+‘. Entries will normally be followed by a list of page references, but a primary entry might not be if at least one secondary entry is present (as is the case with mango, above). Primary entries are sorted, and secondary entries following a primary entry are also sorted. Sorting is case-insensitive. Page references for an entry are in ascending order and do not include duplicates. (A duplicate could occur if there are two or more identical entries on the same page.)
Your task is to read a document that has index information embedded within it and produce the index. Documents consist of one or more lines of ASCII text. The page number starts at 1, and the character ‘&‘ indicates the start of a new page (which adds 1 to the current page number). Index entries are indicated by a marker, which in its most elaborate form has the following syntax:
{text%primary$secondary}
Here text is the text to be indexed, primary is an alternative primary entry, and secondary is a secondary entry. Both ‘%primary‘ and ‘$secondary‘ are optional, but if both are present they must appear in the order given. If primary is present then it is used as the primary entry, and if not then text is used as the primary entry. If secondary is present then the marker adds a page reference for that secondary entry; otherwise it adds a page reference for the primary entry. A single marker cannot add a page reference for both a primary and secondary entry. Here are examples of each of the four possible types of marker, which correspond to four of the entries in the sample index above.
... his {lumbago} was acting up, so ...
... {Lenny%Lenny Kravitz} lit up the crowd with his version of ...
... Monty Python often used the {larch$Monty Python and} in ...
... when storing {mangos%mango$storage of}, be sure to ...
larch, 4, 237, 238, 414
+ Monty Python and, 64, 65, 66
+ planting of, 17
Lenny Kravitz, 50
+ going his way, 53
lumbago, 107
mango
+ Chris Kattan, 380
+ storage of, 87, 90
+ use in Nethack, 500, 501
+ Vitamin C content, 192
Each index entry contains a primary entry followed by zero or more secondary entries, which begin with a ‘+‘. Entries will normally be followed by a list of page references, but a primary entry might not be if at least one secondary entry is present (as is the case with mango, above). Primary entries are sorted, and secondary entries following a primary entry are also sorted. Sorting is case-insensitive. Page references for an entry are in ascending order and do not include duplicates. (A duplicate could occur if there are two or more identical entries on the same page.)
Your task is to read a document that has index information embedded within it and produce the index. Documents consist of one or more lines of ASCII text. The page number starts at 1, and the character ‘&‘ indicates the start of a new page (which adds 1 to the current page number). Index entries are indicated by a marker, which in its most elaborate form has the following syntax:
{text%primary$secondary}
Here text is the text to be indexed, primary is an alternative primary entry, and secondary is a secondary entry. Both ‘%primary‘ and ‘$secondary‘ are optional, but if both are present they must appear in the order given. If primary is present then it is used as the primary entry, and if not then text is used as the primary entry. If secondary is present then the marker adds a page reference for that secondary entry; otherwise it adds a page reference for the primary entry. A single marker cannot add a page reference for both a primary and secondary entry. Here are examples of each of the four possible types of marker, which correspond to four of the entries in the sample index above.
... his {lumbago} was acting up, so ...
... {Lenny%Lenny Kravitz} lit up the crowd with his version of ...
... Monty Python often used the {larch$Monty Python and} in ...
... when storing {mangos%mango$storage of}, be sure to ...
Input
The input consists of one or more documents, followed by a line containing only ‘**‘ that signals the end of the input. Documents are implictly numbered starting with 1. Each document consists of one or more lines of text followed by a line containing only ‘*‘. Each line of text will be at most 79 characters long, not counting end-of-line characters. For document i, output the line ‘DOCUMENT i‘ followed by the sorted index using the exact output format shown in the examples.
Output
Note:
A document will contain at most 100 markers, with at most 20 primary entries.
A primary entry will have at most 5 secondary entries.
An entry will have at most 10 unique page references (not including duplicates).
The character ‘&‘ will not appear anywhere within a marker, and will appear at most 500 times within a document.
The character ‘*‘ is used only to signal the end of a document or the end of the input.
The characters ‘{‘, ‘}‘, ‘%‘, and ‘$‘ will only be used to define markers, and will not appear in any text or entries.
A marker may span one or more lines. Every end-of-line within a marker must be converted to a single space.
A space within a marker (including a converted end-of-line) is normally included in the text/entry, just like any other character. However, any space that immediately follows ‘{‘, immediately precedes ‘}‘, or is immediately adjacent to ‘%‘ or ‘$‘ must be ignored.
The total length of a marker, measured from the opening ‘{‘ to the closing ‘}‘, and in which all embedded end-of-lines are converted to spaces, will be at most 79 characters.
A document will contain at most 100 markers, with at most 20 primary entries.
A primary entry will have at most 5 secondary entries.
An entry will have at most 10 unique page references (not including duplicates).
The character ‘&‘ will not appear anywhere within a marker, and will appear at most 500 times within a document.
The character ‘*‘ is used only to signal the end of a document or the end of the input.
The characters ‘{‘, ‘}‘, ‘%‘, and ‘$‘ will only be used to define markers, and will not appear in any text or entries.
A marker may span one or more lines. Every end-of-line within a marker must be converted to a single space.
A space within a marker (including a converted end-of-line) is normally included in the text/entry, just like any other character. However, any space that immediately follows ‘{‘, immediately precedes ‘}‘, or is immediately adjacent to ‘%‘ or ‘$‘ must be ignored.
The total length of a marker, measured from the opening ‘{‘ to the closing ‘}‘, and in which all embedded end-of-lines are converted to spaces, will be at most 79 characters.
Sample Input
Call me Ishmael.*One {fish $unary}, two {fish$ binary},&red {fish $ scarlet}, blue {fish$azure}. & By { Dr. Seuss }.*This is a {simple } & & { document} that &{simply %simple$adverb} & {illustrates %vision} &&&&& one {simple-minded% simple} {Judge}‘s {vision} for what a {document } might { look % vision} like.***
Sample Output
DOCUMENT 1DOCUMENT 2Dr. Seuss, 3fish+ azure, 2+ binary, 1+ scarlet, 2+ unary, 1DOCUMENT 3document, 3, 10Judge, 10simple, 1, 10+ adverb, 4vision, 5, 10
1 /* 2 Name: Shangli_Cloud 3 Copyright: Shangli_Cloud 4 Author: Shangli_Cloud 5 Date: 10/10/14 08:15 6 Description: 7 字符串处理题, 8 在读取过程中,遇到’&‘就PAGe++; 9 我们要处理的对象为{}, 10 对象有 primary,secondary,page属性,所以结构体存储。 11 当遇到‘{‘是增加entry,遇标记,处理。 12 排序,三个因素。 13 我们遇标记处理,定义一个next_token()标记函数, 14 定义一个next_char()函数为next_token()函数服务。 15 注意,当我们遇到某些字符时,需要读取下一个字符,之后再用next_char()的会又会 16 读取下一个字符,这个字符就跳过了。 17 所以增加一个变量 lookahead表示是否是有效字符的开始。 18 */ 19 #include"iostream" 20 #include"cstdio" 21 #include"cstring" 22 #include"string" 23 #include"algorithm" 24 #include"set" 25 #include"map" 26 #include"stack" 27 #include"queue" 28 #include"vector" 29 #include"cstdlib" 30 #include"ctime" 31 using namespace std; 32 const int EndOfDocument=-1; 33 const int EndOfFile=-2; 34 char ch; 35 char token; 36 int page; 37 bool lookahead; 38 struct Entry 39 { 40 string primary; 41 string secondary; 42 int page; 43 Entry(string p,string s):primary(p),secondary(s),page(::page){}; 44 } ; 45 vector<Entry> entry; 46 int string_compare(const string &s,const string &t) 47 { 48 int m=s.length(); 49 int n=t.length(); 50 int k=m<n?m:n; 51 for(int i=0;i<k;i++) 52 { 53 int a=toupper(s[i]); 54 int b=toupper(t[i]); 55 if(a!=b) 56 return a-b; 57 } 58 return m==n?0:m<n?-1:1; 59 } 60 bool less_than(const Entry &s,const Entry &t) 61 { 62 int cmp=string_compare(s.primary,t.primary); 63 if(cmp<0) 64 return true; 65 if(cmp>0) 66 return false; 67 cmp=string_compare(s.secondary,t.secondary); 68 if(cmp<0) 69 return true; 70 if(cmp>0) 71 return false; 72 return s.page<t.page; 73 } 74 inline char next_char() 75 { 76 if(lookahead) 77 lookahead=false; 78 else 79 ch=cin.get(); 80 return ch; 81 } 82 83 84 char next_token () 85 { 86 switch (next_char ()) 87 { 88 case ‘*‘: 89 token = (next_char () == ‘*‘) ? EndOfFile : EndOfDocument; 90 break; 91 case ‘ ‘: 92 case ‘\n‘: 93 next_char (); 94 if (ch == ‘%‘ || ch == ‘$‘ || ch == ‘}‘) 95 token = ch; 96 else { 97 token = ‘ ‘; 98 lookahead = true; 99 break;100 }101 case ‘{‘: case ‘%‘: case ‘$‘:102 token = ch;103 lookahead = ! isspace (next_char ());104 break;105 default:106 token = ch;107 }108 return token;109 }110 111 112 inline bool is_delimiter(char t)113 {114 return t==‘%‘||t==‘$‘||t==‘}‘;115 }116 117 118 void add_entry ()119 {120 string primary, secondary;121 while (! is_delimiter (next_token ()))122 primary += token;123 if (token == ‘%‘)124 {125 primary.erase (); //primary="";126 while (! is_delimiter (next_token ()))127 primary += token; 128 }129 if (token == ‘$‘)130 while (! is_delimiter (next_token ()))131 secondary += token;132 entry.push_back (Entry (primary, secondary));133 }134 135 136 int main ()137 {138 for (int document = 1; ; ++document)139 {140 if (next_token () == EndOfFile) break;141 cout << "DOCUMENT " << document;142 page = 1;143 entry.clear ();144 entry.push_back (Entry ("", ""));145 //cout<<"----"<<entry.size()<<endl;146 do {147 if (token == ‘&‘)148 ++page;149 else if (token == ‘{‘)150 add_entry ();151 } while (next_token () != EndOfDocument);152 sort (entry.begin (), entry.end (), less_than);153 for (int i = 1; i < entry.size(); ++i)154 {155 if (entry[i].primary == entry[i-1].primary)156 if (entry[i].secondary == entry[i-1].secondary)157 {158 if (entry[i].page != entry[i-1].page)159 cout << ", " << entry[i].page;160 }161 else162 cout<<"\n+ "<<entry[i].secondary<<", "<<entry[i].page; 163 else164 {165 cout << ‘\n‘ << entry[i].primary;166 if (entry[i].secondary == "")167 cout << ", " << entry[i].page;168 else169 cout<<"\n+ "<<entry[i].secondary<<", "<<entry[i].page;170 }171 }172 cout << endl;173 }174 return 0;175 }
Index Generation
声明:以上内容来自用户投稿及互联网公开渠道收集整理发布,本网站不拥有所有权,未作人工编辑处理,也不承担相关法律责任,若内容有误或涉及侵权可进行投诉: 投诉/举报 工作人员会在5个工作日内联系你,一经查实,本站将立刻删除涉嫌侵权内容。