• As buckets are flushed, they are written sequentially to the change segment (CS), one page at a time

MDBL
• No partitions in the CS
• Allows frequently updated blocks to have the maximum space
• merge() all blocks when the CS is full
  - Potentially expensive
  - Very infrequent
• Queries are supported by pointers
  - As blocks are staged onto the CS, their pages are recorded for later retrieval
  - Prefetch

11/30/2020

Expectations
• MB will incur more cleans than MDB or MDBL
  - Frequent merge() operations will incur block erasures
• MDB and MDBL will incur slightly higher query times
  - Due to the added CS
• MDB and MDBL will have superior I/O performance
  - Most operations are page-level
  - Fewer erasures → lower latency

Experimental Setup (Application)
• TF-IDF
  - Term Frequency-Inverse Document Frequency
  - Word importance is highest for infrequent words
  - Requires a counting hash table
  - Useful in many data mining and IR applications (document classification and search)

Experimental Setup (Datasets)
• 100,000 random Wikipedia articles
  - 136M keywords → entries
• MemeTracker (Aug 2020 dump)
  - 402M total entries
  - 17M unique

Experimental Setup (Method)
• 1M random queries were issued during the insertion phase
  - 10 random workloads; queries need not be in the table
• Measured query performance, I/O time, and cleans
• Used three SSD configurations
  - One Single-Level Cell (SLC) vs. two Multi-Level Cell (MLC) configurations
  - MLC is more popular: cheaper per GB, but shorter lifetime
  - SLC has a lower internal error rate and faster response times
  (See the paper for the specific configurations)
• DiskSim and the Microsoft SSD Plugin
  - Used for benchmarking and fine-tuning our SSD

Results (Average Query Time)
[Figure: average query time when varying the in-memory buffer size, as a percentage of the data]
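The MDBL change-segment behaviour from the earlier slide (sequential, page-at-a-time flushes into an unpartitioned CS, per-block page pointers for queries, and a merge() of all blocks when the CS fills) can be modeled roughly as below. This is a hypothetical sketch, not the authors' implementation: the class name `ChangeSegmentSketch`, the assumption that one flushed bucket fills exactly one page, and representing pages and blocks as Python dicts of counts are all illustrative assumptions.

```python
from collections import defaultdict


class ChangeSegmentSketch:
    """Hypothetical in-memory model of an MDBL-style change segment (CS).

    Assumptions (not from the slides): each flushed bucket fills one page,
    a page is a dict of key -> count delta, and each data block is a dict
    standing in for its merged on-flash contents.
    """

    def __init__(self, capacity_pages):
        self.capacity_pages = capacity_pages
        self.pages = []                      # CS pages, appended sequentially
        self.page_ptrs = defaultdict(list)   # block id -> indices of its staged CS pages
        self.blocks = defaultdict(dict)      # block id -> merged base data
        self.merges = 0                      # merge() calls; on real flash each rewrites blocks and erases the CS

    def flush_bucket(self, block_id, bucket):
        """Append one flushed bucket as the next CS page and record a pointer to it."""
        if len(self.pages) == self.capacity_pages:
            self.merge()                     # CS full: fold everything back first
        self.pages.append(dict(bucket))
        # No partitions: any block may claim any number of CS pages.
        self.page_ptrs[block_id].append(len(self.pages) - 1)

    def merge(self):
        """Fold every staged page into its block, then clear the CS."""
        for block_id, ptrs in self.page_ptrs.items():
            base = self.blocks[block_id]
            for p in ptrs:
                for key, delta in self.pages[p].items():
                    base[key] = base.get(key, 0) + delta
        self.pages.clear()
        self.page_ptrs.clear()
        self.merges += 1

    def query(self, block_id, key):
        """Combine the block's base value with every staged page recorded for it."""
        total = self.blocks.get(block_id, {}).get(key, 0)
        # The recorded pointers tell us exactly which CS pages to read (or prefetch).
        for p in self.page_ptrs.get(block_id, []):
            total += self.pages[p].get(key, 0)
        return total
```

For example, with a two-page CS, two flushes for block 0 fill it, and a third flush (for block 1) triggers a single merge() before being staged. Queries stay correct throughout because they combine merged data with the recorded pages, which is why the extra CS reads show up as slightly higher query times in the expectations.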
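The TF-IDF application on the setup slide relies on counting hash tables. A minimal sketch using Python's `collections.Counter` as the counting table is shown below. The scoring variant (raw term count times log(N / df)) is one common choice assumed here; the slides do not say which TF-IDF variant was used, and `tfidf` is a hypothetical helper name.

```python
import math
from collections import Counter


def tfidf(docs):
    """Score each term of each document as raw-count TF times log(N / df) IDF.

    docs: list of token lists. Returns one {term: score} dict per document.
    """
    n_docs = len(docs)
    df = Counter()                  # counting hash table: term -> number of docs containing it
    term_counts = []
    for tokens in docs:
        tf = Counter(tokens)        # counting hash table: term -> occurrences in this doc
        term_counts.append(tf)
        df.update(tf.keys())        # each distinct term counted once per document
    return [
        {t: c * math.log(n_docs / df[t]) for t, c in tf.items()}
        for tf in term_counts
    ]


scores = tfidf([["flash", "hash", "flash"], ["hash", "table"]])
```

Here "hash" occurs in every document, so its IDF is log(1) = 0 and it scores 0, while the rarer "flash" and "table" get positive weights; this matches the slide's point that infrequent words carry the most importance.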