[Main text]
Hive) and traditional relational databases: how to combine them effectively
– Practical, feasible application scenarios for big data in regional health information platforms

[Figure labels: Public Health; Hospital; Primary care (Grassroots); Health Information DW; EHR Data Services; Registries Data Services; Longitudinal Record Services; Health Information Access Layer; Care Coordination; Clinical decision support …; Data Analytic RD …; RHIN; Ancillary Data Services]

Big data solution for regional and primary-care health information systems (Hadoop*)

[Figure labels, grouped by component:]
• Integrated user application interface (residents, doctors, health administrators)
• Presentation layer (reports, views)
• Distributed data service system
  – Data mining (Mahout)
  – Distributed batch processing framework (Map/Reduce)
  – Language and compilation (Hive)
  – Real-time database (HBase)
  – Distributed file system (HDFS)
  – Coordination service (ZooKeeper)
  – Structured data collector (Sqoop)
  – Log data collector (Flume)
• Regional health information access layer (HIAL)
• Hospital information systems
• Primary-care information system: medical services, drug management, New Rural Cooperative Medical Insurance
• Health record data storage, public health, operations management
• Cloud-based regional primary-care service system: multi-tenant applications
• Infrastructure virtualization: server virtualization, network virtualization, storage virtualization

[Figure labels: Sequencing (3 Billion Base Pairs); Data Processing; Cloud Storage; Visualization; Millions of Variants; Interpretation; Analytics; Millions of Variants; Millions of Patients; Commercializing; Targeted Therapeutics; Companion Diagnostics; Actionable Biomarkers]

Case study: NEXTBIO genomic data analysis
• The cost to sequence a genome has fallen by 800x in the last 4 years
• Each genome has ~4 million variants
• Genomics data is growing in both the public and private domains
• Data is available from a variety of sources
  – Structured, semi-structured, unstructured
• New aggregated data is growing exponentially

Case study: NEXTBIO patient correlation data
• Novel discoveries: Biomarkers; Disease Mechanism; Drug Indications; Clinical Trial Parameters; Patient Care Options
• Large content repository of public and private genomic data, combined with a proprietary and patented correlation engine

Case study: NEXTBIO
NextBio and Intel collaboration directions

Technical challenges:
• Immutable data – write once, never change, read many times
• Traditional Bloom filters work
• Hadoop* HBase* is well suited
• Row counts as the number of genomes grows:
  – 1 genome → 10 million rows
  – 100 genomes → 1 billion rows
  – 1M genomes → 10 trillion rows
  – 100M genomes → 1 quadrillion (1,000,000,000,000,000) rows
• The app can dynamically partition HBase as data size grows

Intel's optimizations for Hadoop:
• Optimized Hadoop stack in open source
• Stabilize HBase to provide reliable, scalable deployment
• Optimize and support scale-out as data size grows dramatically
• Exploring cluster auto-tuning, security compliance, etc.

Case