[Main text]
"Though my body lacks the paired wings of the colorful phoenix, our hearts are linked as one, like the magic rhinoceros horn."

WebGather: as of Dec. 1, 2020
- million scale
- Index million web pages
- More than 200,000 web pages every day
- Ten days to update all data
- three PCs
- collect all the web pages in China
- keep pace with the rapid growth of Chinese web information

WebGather: Design goals for a distributed web crawling system for WebGather
238 × 40,000 = 9,520,000

WebGather: architecture
[Diagram; component labels: Client, log database, User behavior, Gather Database, Indexer, Retrieve Database, Client, Retriever, Gatherer, WWW]

WebGather: architecture of gather subsys