2019 Academic Frontier Lecture (46): Enabling High-performance Sampling for Big Data Processing


Speaker: Prof. Jun Wang, University of Central Florida, USA

Title: Enabling High-performance Sampling for Big Data Processing

Abstract:

In this talk, we demonstrate how to perform sampling in today's big data processing platforms, enabling approximations that are both efficient and accurate on arbitrary sub-datasets of a large dataset. Because caching offline samples for every sub-dataset incurs prohibitive storage overhead, existing offline-sample-based systems deliver high-accuracy results for only a limited number of sub-datasets, such as the popular ones. On the other hand, current online-sample-based approximation systems, which generate samples at runtime, do not take into account the uneven storage distribution of a sub-dataset. They work well when a sub-dataset is uniformly distributed, but suffer from low sampling efficiency and poor estimation accuracy when it is unevenly distributed.
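To make the uneven-distribution problem concrete, here is a minimal Python sketch of distribution-unaware online sampling. It assumes a dataset is a list of partitions, each a list of (key, value) records, and that a sub-dataset is every record sharing one key. The estimator reads k partitions chosen uniformly at random and scales the sampled sum by n/k. The layout, function names, and sizes are hypothetical illustrations, not taken from the talk.

```python
import random
from typing import Hashable, List, Tuple

Record = Tuple[Hashable, float]  # (sub-dataset key, numeric value); hypothetical schema


def uniform_partition_estimate(partitions: List[List[Record]], key: Hashable,
                               k: int, rng: random.Random) -> float:
    """Estimate SUM(value) over records whose key == `key` by reading k
    partitions chosen uniformly at random (ignores the storage distribution)."""
    n = len(partitions)
    chosen = rng.sample(range(n), k)
    sampled = sum(v for i in chosen for kk, v in partitions[i] if kk == key)
    # Every partition is read with probability k/n, so scaling by n/k is unbiased.
    return sampled * n / k


def skewed_partitions(n: int = 100, hot: int = 5, per_hot: int = 100) -> List[List[Record]]:
    """Hypothetical layout: the 'rare' sub-dataset lives in only `hot` partitions."""
    parts = []
    for i in range(n):
        part: List[Record] = [("common", 1.0)] * 500
        if i < hot:
            part += [("rare", 1.0)] * per_hot
        parts.append(part)
    return parts


if __name__ == "__main__":
    rng = random.Random(42)
    parts = skewed_partitions()
    estimates = [uniform_partition_estimate(parts, "rare", 10, rng) for _ in range(2000)]
    zero_share = sum(e == 0 for e in estimates) / len(estimates)
    mean = sum(estimates) / len(estimates)
    print(f"true=500, mean={mean:.1f}, runs returning 0: {zero_share:.0%}")
```

With 10 of 100 partitions read and the sub-dataset confined to 5 of them, the chance that a run misses all 5 is C(95,10)/C(100,10) ≈ 0.58, so most runs return 0 even though the estimator is unbiased on average.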

To address this problem, we develop a distribution-aware method called Sapprox. Our idea is to collect the occurrences of a sub-dataset in each logical partition of a dataset (its storage distribution) in the distributed system, and to use this information to guide online sampling. We have implemented Sapprox in the Hadoop ecosystem as an example system and open-sourced it on GitHub. Our comprehensive experimental results show that Sapprox achieves a speedup of up to a factor of 20 over precise execution.
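One way to exploit a storage distribution during online sampling is probability-proportional-to-size cluster sampling: draw partitions with probability proportional to how often the sub-dataset occurs in them, then reweight each draw with the Hansen–Hurwitz estimator. The Python sketch below assumes per-partition occurrence counts are available (computed directly here for the demo, whereas the abstract describes collecting them in the system). This is a swapped-in textbook technique illustrating the idea, not Sapprox's actual algorithm; all names and the data layout are hypothetical.

```python
import random
from typing import Dict, Hashable, List, Tuple

Record = Tuple[Hashable, float]  # (sub-dataset key, numeric value); hypothetical schema


def occurrence_counts(partitions: List[List[Record]], key: Hashable) -> Dict[int, int]:
    """Per-partition occurrence counts of `key` (the storage distribution).
    Computed by a full scan here only for the demo."""
    counts: Dict[int, int] = {}
    for i, part in enumerate(partitions):
        c = sum(1 for k, _ in part if k == key)
        if c:  # partitions that never contain the key are left out entirely
            counts[i] = c
    return counts


def distribution_aware_estimate(partitions: List[List[Record]], key: Hashable,
                                counts: Dict[int, int], m: int,
                                rng: random.Random) -> float:
    """Estimate SUM(value) for `key` by drawing m partitions with replacement,
    each with probability proportional to its occurrence count."""
    ids = list(counts)
    total = sum(counts.values())
    draws = rng.choices(ids, weights=[counts[i] for i in ids], k=m)
    estimate = 0.0
    for i in draws:
        p_i = counts[i] / total
        y_i = sum(v for k, v in partitions[i] if k == key)
        estimate += y_i / p_i  # Hansen-Hurwitz: weight each draw by 1/p_i
    return estimate / m


def build_demo(n: int = 100) -> List[List[Record]]:
    """Hypothetical skewed layout: 'rare' records sit in 5 partitions of varying size."""
    sizes = {0: 200, 1: 100, 2: 50, 3: 30, 4: 20}
    parts = []
    for i in range(n):
        part: List[Record] = [("common", 1.0)] * 500
        part += [("rare", 1.0 + j % 7) for j in range(sizes.get(i, 0))]
        parts.append(part)
    return parts


if __name__ == "__main__":
    parts = build_demo()
    counts = occurrence_counts(parts, "rare")
    true_sum = sum(v for p in parts for k, v in p if k == "rare")
    rng = random.Random(42)
    est = [distribution_aware_estimate(parts, "rare", counts, 5, rng) for _ in range(5)]
    print(f"true={true_sum}, estimates={[round(e, 1) for e in est]}")
```

Partitions with a zero count are never read, and each draw is reweighted by 1/p_i, so on this skewed demo layout every estimate stays within about 3% of the true sum of 1578.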

 

Date and venue: October 26, 2019, Information Building, Room 235

Host: School of Information Engineering

 

  


2019.10.17

Category: Academic Exchange