- int validUTF8ChineseWord(const char*);
- unsigned int freq;
- …

2022/8/14 COSCUP 2022, NTU

Suffix (Array): previous results

  Revision   Words    Process time (sec)   Memory used
  r13        82,364   1,857                21,060K
  r15        99,072   73                   6,832K
  r22        64,044   10                   6,836K
  r32        48,679   4                    6,736K

- Testing corpus
  - 15,000 lines
  - UTF-8 size: 913,034KB

Program structure (nowadays)

- Struct node to store word_freq:

      struct WORD_FREQ {
          char *word;
          …

(no $) beneath it.

Prefix searching

- Idea: every subtree of the PAT tree contains all the sistrings with a given prefix.
- Search: time is proportional to the query length; descend until the prefix is exhausted or an external node is reached.

[Figure: search for the prefix “10100” and its answer]

Longest Repetition Searching

- The match between two different positions of a text that is the longest match in the entire text, e.g.:

[Figure: PAT tree of the example text]

  Text          01100100010111
  sistring 1    01100100010111
  sistring 2    1100100010111
  sistring 3    100100010111
  sistring 4    00100010111
  sistring 5    0100010111
  sistring 6    100010111
  sistring 7    00010111
  sistring 8    0010111

- The tallest internal node gives a pair of sistrings that match for the greatest number of characters.

“Most Significant” or “Most Frequent” Matching

- The most frequently occurring strings within the text database, e.g., the most frequent trigram.
- To find the most frequent trigram, find the largest subtree at a distance of 3 characters from the root.

[Figure: the same PAT tree, with the tallest internal node marked]
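The slides show the prototype int validUTF8ChineseWord(const char*) and a struct WORD_FREQ with word and freq fields, but not the function body. The sketch below is one possible implementation under an assumed meaning: a "valid Chinese word" is a non-empty string made only of well-formed 3-byte UTF-8 characters from the CJK Unified Ideographs block (U+4E00 to U+9FFF). The talk's program may accept a different range (for example, extension blocks or full-width punctuation).

```c
#include <stddef.h>

/* Word/frequency pair, field names as on the "Program structure" slide. */
struct WORD_FREQ {
    char *word;          /* NUL-terminated UTF-8 word */
    unsigned int freq;   /* number of occurrences in the corpus */
};

/* Assumed semantics (not confirmed by the talk): return 1 if s is non-empty
 * and every character is a well-formed 3-byte UTF-8 sequence encoding a code
 * point in U+4E00..U+9FFF; otherwise return 0. */
int validUTF8ChineseWord(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;

    if (p == NULL || *p == '\0')
        return 0;
    while (*p != '\0') {
        /* Lead byte 1110xxxx, then two continuation bytes 10xxxxxx.
         * Short-circuiting stops before reading past a NUL terminator. */
        if ((p[0] & 0xF0) != 0xE0 || (p[1] & 0xC0) != 0x80 ||
            (p[2] & 0xC0) != 0x80)
            return 0;
        unsigned int cp = ((unsigned int)(p[0] & 0x0F) << 12)
                        | ((unsigned int)(p[1] & 0x3F) << 6)
                        | (unsigned int)(p[2] & 0x3F);
        /* The range check also rejects overlong forms and surrogates. */
        if (cp < 0x4E00 || cp > 0x9FFF)
            return 0;
        p += 3;
    }
    return 1;
}
```

With this reading, "中文" passes, while ASCII text, an empty string, or a truncated multi-byte sequence is rejected.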
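The prefix-searching slide uses the PAT-tree view: all sistrings that share a prefix sit in one subtree. In a suffix array (the structure named on the results slide) the same property appears as one contiguous range of sorted sistrings, which two binary searches can locate. A minimal sketch, assuming byte-level comparison and a naive qsort-based construction; build_suffix_array and prefix_range are hypothetical names, not functions from the talk's code.

```c
#include <stdlib.h>
#include <string.h>

/* Text used by the qsort comparator (qsort takes no context argument,
 * so this sketch is not thread-safe). */
static const char *g_text;

static int cmp_sistring(const void *a, const void *b)
{
    return strcmp(g_text + *(const size_t *)a, g_text + *(const size_t *)b);
}

/* Suffix array: sa[k] is the 0-based start of the k-th smallest sistring.
 * Naive construction; adequate for a demo, far too slow for a large corpus. */
size_t *build_suffix_array(const char *text, size_t n)
{
    size_t *sa = malloc((n ? n : 1) * sizeof *sa);
    if (sa == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        sa[i] = i;
    g_text = text;
    qsort(sa, n, sizeof *sa, cmp_sistring);
    return sa;
}

/* Sistrings starting with prefix occupy sa[*lo .. *lo + count - 1];
 * returns count (0 if the prefix does not occur). */
size_t prefix_range(const char *text, const size_t *sa, size_t n,
                    const char *prefix, size_t *lo)
{
    size_t m = strlen(prefix), a = 0, b = n;

    /* Lower bound: first sistring whose first m bytes are >= prefix. */
    while (a < b) {
        size_t mid = a + (b - a) / 2;
        if (strncmp(text + sa[mid], prefix, m) < 0) a = mid + 1; else b = mid;
    }
    *lo = a;
    /* Upper bound: first sistring whose first m bytes are > prefix. */
    b = n;
    while (a < b) {
        size_t mid = a + (b - a) / 2;
        if (strncmp(text + sa[mid], prefix, m) <= 0) a = mid + 1; else b = mid;
    }
    return a - *lo;
}
```

Each binary-search step compares at most m bytes, so a lookup costs O(m log n) here rather than the descent proportional to the query length that the PAT tree gives. On the slide's text 01100100010111, this sketch finds 2 sistrings beginning with 100 and none beginning with 10100.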
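For longest repetition searching, the tallest internal node of the PAT tree corresponds, in a suffix array, to the largest longest-common-prefix (LCP) value between two adjacent sorted sistrings: the best-matching pair is always adjacent in sorted order. The sketch below applies that idea over all sistrings of a text; longest_repetition is a hypothetical name and the construction is the same naive qsort approach, assumed for brevity.

```c
#include <stdlib.h>
#include <string.h>

static const char *lr_text;   /* text seen by the qsort comparator */

static int lr_cmp(const void *a, const void *b)
{
    return strcmp(lr_text + *(const size_t *)a, lr_text + *(const size_t *)b);
}

/* Length of the common prefix of two NUL-terminated strings. */
static size_t lcp(const char *x, const char *y)
{
    size_t k = 0;
    while (x[k] != '\0' && x[k] == y[k])
        k++;
    return k;
}

/* Returns the length of the longest string occurring at two different
 * positions, and stores those 0-based start positions in *pos1 and *pos2. */
size_t longest_repetition(const char *text, size_t *pos1, size_t *pos2)
{
    size_t n = strlen(text), best = 0;

    *pos1 = *pos2 = 0;
    if (n < 2)
        return 0;
    size_t *sa = malloc(n * sizeof *sa);
    if (sa == NULL)
        return 0;
    for (size_t i = 0; i < n; i++)
        sa[i] = i;
    lr_text = text;
    qsort(sa, n, sizeof *sa, lr_cmp);
    /* Scan adjacent pairs; the maximum LCP plays the "tallest node" role. */
    for (size_t k = 1; k < n; k++) {
        size_t l = lcp(text + sa[k - 1], text + sa[k]);
        if (l > best) {
            best = l;
            *pos1 = sa[k - 1];
            *pos2 = sa[k];
        }
    }
    free(sa);
    return best;
}
```

For the slide's text 01100100010111, this sketch reports length 4: the string 0010 starting at 0-based positions 3 and 7, which are sistrings 4 and 8 in the slide's 1-based numbering.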
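The "largest subtree at a distance of 3 characters from the root" has a direct suffix-array counterpart: the longest run of adjacent sorted sistrings that share their first 3 bytes. A hedged sketch generalized to q bytes; most_frequent_qgram is a hypothetical name, and breaking ties in favor of the lexicographically smallest q-gram is a choice made only in this sketch.

```c
#include <stdlib.h>
#include <string.h>

static const char *mf_text;   /* text seen by the qsort comparator */

static int mf_cmp(const void *a, const void *b)
{
    return strcmp(mf_text + *(const size_t *)a, mf_text + *(const size_t *)b);
}

/* Finds the most frequent q-byte substring. Sistrings sharing their first
 * q bytes are adjacent in sorted order (one subtree at depth q), so the
 * answer is the longest such run. Writes the q-gram, NUL-terminated, into
 * out (which must hold q + 1 bytes) and returns its frequency. */
size_t most_frequent_qgram(const char *text, size_t q, char *out)
{
    size_t n = strlen(text), best = 0, run = 0, run_start = 0;

    out[0] = '\0';
    if (q == 0 || n < q)
        return 0;
    size_t *sa = malloc(n * sizeof *sa);
    if (sa == NULL)
        return 0;
    for (size_t i = 0; i < n; i++)
        sa[i] = i;
    mf_text = text;
    qsort(sa, n, sizeof *sa, mf_cmp);
    for (size_t k = 0; k < n; k++) {
        if (n - sa[k] < q) {          /* sistring shorter than q: ends a run */
            run = 0;
            continue;
        }
        if (run > 0 && strncmp(text + sa[run_start], text + sa[k], q) == 0) {
            run++;                    /* same first q bytes: same subtree */
        } else {
            run = 1;                  /* start a new run */
            run_start = k;
        }
        if (run > best) {             /* strict '>' keeps the earliest tie */
            best = run;
            memcpy(out, text + sa[k], q);
            out[q] = '\0';
        }
    }
    free(sa);
    return best;
}
```

Here q counts bytes, not UTF-8 characters. On the slide's 14-character text, four trigrams (001, 010, 011, 100) each occur twice, so the sketch returns 001 with frequency 2 under its tie-breaking rule.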