Intelligent Analysis of Wide-Field, Short-Timescale Astronomical Big Data
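Time-domain astronomy studies how celestial objects change over time, and wide-field, short-timescale surveys have become a new frontier within it. Their high-cadence data sampling and very large sky coverage make them a powerful tool for time-domain observation. Compared with earlier survey modes, continuously observing a fixed field at short timescales is well suited to discovering short-lived flaring sources (short-timescale scientific events).
However, because wide-field, short-timescale observation depends on dense, high-frequency data acquisition over the observed field, it places new demands on data analysis techniques: the analysis must be intelligent enough to pick out rare short-timescale scientific events from big data efficiently and in real time, and to provide interpretable, well-founded verification. This project therefore studies data analysis for wide-field, short-timescale surveys in three areas: (1) a real-time annotation framework for large-scale scientific data streams; (2) an intelligent scientific discovery mechanism based on interactive feedback; and (3) domain-oriented interpretable machine learning models.
The project aims to advance intelligent analysis of wide-field, short-timescale astronomical data in these three areas: to achieve breakthroughs in the key techniques, and to integrate them into a working platform and system for intelligent analysis of astronomical big data that supports real observation scenarios.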
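Project Title
· National Natural Science Foundation of China (NSFC) General Program project "Intelligent Analysis of Wide-Field, Short-Timescale Astronomical Big Data" (62172423), January 1, 2022 to December 31, 2025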
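Project Description
Figure 1. Research framework for intelligent analysis of wide-field, short-timescale astronomical big data
For wide-field, short-timescale surveys, this project proposes the research framework shown in Figure 1. It has three parts: real-time annotation of large-scale scientific data streams, intelligent discovery based on interactive feedback, and interpretable machine learning models. The goal is to serve wide-field, short-timescale scientific discovery by intelligently finding short-timescale scientific events in the large-scale, multi-source block-stream data collected by wide-field, short-timescale observing facilities. Based on the framework in Figure 1, the main research content is summarized as follows: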
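(1) A real-time annotation framework for large-scale scientific data streams. This addresses real-time processing of scientific data streams under the real-time constraints of astronomical scenarios: a distributed real-time annotation framework is designed, a real-time annotation module is embedded in each microservice, and an integrated verification module is used to speed up verification after a scientific event alert.
(2) An intelligent discovery mechanism based on online-offline interactive feedback. A closed-loop feedback mechanism between real-time and offline processing is proposed: scientific events are mined from historical data to build a global feature model that strengthens real-time event discovery, and data augmentation lets real-time data continuously improve the model, addressing low model accuracy.
(3) Domain-oriented interpretable machine learning models. Drawing on multimodal astronomical data, the project provides explanations for the scientific event alerts raised by the machine learning models used in the system, addressing the uncertainty of model predictions from three angles: visualization, semantic description, and quantification of logical relationships.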
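As a rough illustration of the real-time annotation step in item (1), the sketch below labels each incoming brightness sample of a source as "normal" or "candidate". It is only a minimal sketch under stated assumptions: the rolling z-score rule, the function name `annotate`, the tuple layout `(source_id, time, flux)`, and the parameters `threshold`, `window`, and `min_history` are all hypothetical and are not taken from the project's actual framework.

```python
from collections import deque
from statistics import mean, pstdev


def annotate(stream, threshold=5.0, window=50, min_history=10):
    """Yield (source_id, time, flux, label) for each sample in `stream`.

    Assumption: a sample is a "candidate" when it deviates from the
    source's rolling mean by more than `threshold` standard deviations.
    This is a stand-in rule, not the project's annotation model.
    """
    histories = {}  # source_id -> recent "normal" flux values
    for source_id, time, flux in stream:
        history = histories.setdefault(source_id, deque(maxlen=window))
        label = "normal"
        if len(history) >= min_history:
            mu = mean(history)
            sigma = pstdev(history)
            # Guard against a perfectly flat history (sigma == 0).
            if sigma > 0 and abs(flux - mu) > threshold * sigma:
                label = "candidate"
        if label == "normal":
            # Keep suspected flares out of the baseline statistics.
            history.append(flux)
        yield source_id, time, flux, label
```

Because `annotate` is a generator, it processes samples one at a time and can be wrapped around any iterable data source; in a distributed setting each worker could run its own instance over its share of the sources. Candidates produced here would still need the verification and feedback steps described in items (1) and (2).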
Project Work
· Micro Analysis to Enable Energy-Efficient Database Systems
The CPU has been identified as the energy bottleneck of database systems, and existing approaches only let database systems trade performance for energy. Our work shows, however, that cutting the energy cost of database systems without losing CPU performance is feasible. The L1 data cache (L1D cache) consumes 39%-67% of total CPU energy, making it the energy bottleneck of database systems. This indicates that the typical CPU architecture is poorly suited to energy-efficient database systems. We developed a proof-of-concept system with a customized CPU architecture; experimental results show that our solution achieves a peak energy saving of 60% while also improving performance.
· SciDetector: Scientific Event Discovery by Tracking Variable Source Data Streaming
With the development of astronomical telescopes, the volume of astronomical data keeps growing, raising new requirements and challenges for processing and storing it. Research on optical transient sources has become a major topic in astronomy. A transient source is a sudden, short-lived, aperiodic astronomical phenomenon, such as a supernova, a gamma-ray burst, or a gravitational microlensing event, with a duration ranging from seconds to years. Because the study of transient sources is of great importance to astronomy and to understanding physical phenomena in the universe, their observation and study have become a focus of the astronomical community.
· Automating Characterization Deployment in Distributed Data Stream Management Systems
Dynamically predicting deployment configurations of a stream processing system (SPS) that sustain throughput while keeping resource usage low is a great challenge. We present OrientStream, a framework for automating characterization deployment in distributed data stream management systems (DDSMS) using incremental machine learning techniques. OrientStream introduces a four-level feature extraction mechanism (data level, query-plan level, operator level, and cluster level), uses different query workloads as training sets to predict the resource usage of the DDSMS, selects the optimal resource configuration from candidate settings, and then migrates operator state through dynamic reconfiguration. The framework is validated on the open-source SPS Storm.
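The configuration-selection step described for OrientStream can be sketched as follows. This is only an illustration under stated assumptions: `Config`, `Prediction`, `select_config`, and `toy_predict` are hypothetical names, the "cheapest feasible candidate" rule is one plausible reading of "optimal", and `toy_predict` is a hand-written formula standing in for the incrementally trained model, which is not shown here.

```python
from typing import Callable, Iterable, NamedTuple, Optional


class Config(NamedTuple):
    """A candidate deployment setting (hypothetical fields)."""
    workers: int
    parallelism: int


class Prediction(NamedTuple):
    """Predicted behaviour of a candidate (hypothetical fields)."""
    throughput: float  # tuples per second
    cpu_cores: float   # predicted CPU usage


def select_config(
    candidates: Iterable[Config],
    predict: Callable[[Config], Prediction],
    min_throughput: float,
) -> Optional[Config]:
    """Return the candidate with the lowest predicted CPU usage among
    those whose predicted throughput meets `min_throughput`, or None
    if no candidate is predicted to be fast enough."""
    scored = [(predict(c), c) for c in candidates]
    feasible = [(p, c) for p, c in scored if p.throughput >= min_throughput]
    if not feasible:
        return None
    return min(feasible, key=lambda pc: pc[0].cpu_cores)[1]


def toy_predict(config: Config) -> Prediction:
    """Placeholder predictor with made-up coefficients; a real system
    would use a model trained on observed workloads instead."""
    slots = config.workers * config.parallelism
    return Prediction(throughput=1000.0 * slots, cpu_cores=0.5 * slots)
```

For example, `select_config([Config(1, 1), Config(2, 2), Config(4, 2)], toy_predict, 3500.0)` picks `Config(2, 2)` under the toy formula, because it is the cheapest setting predicted to reach the required throughput. Keeping the predictor as a plain callable lets the selection logic stay unchanged when the model is retrained.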
Project Outcomes
Chunkai Wang; Xiaofeng Meng*; Online Join Method for Skewed Data Streams (应对倾斜数据流在线连接方法), Journal of Software (软件学报), 2017, 29(3):869-882. Grant acknowledged fourth.
Xiaofeng Meng*; Chaohong Ma; Chen Yang; A Survey of Machine Learning-Based Database Systems (机器学习化数据库系统研究综述), Journal of Computer Research and Development (计算机研究与发展), 2019, 56(9):1803-1820. Grant acknowledged fourth.
Chen Yang; Xiaofeng Meng*; Zhihui Du; Cloud based Real-Time and Low Latency Scientific Event Analysis, 2018 IEEE International Conference on Big Data, Seattle, WA, USA, 2018-12-10 to 2018-12-13. Grant acknowledged third.
Chen Yang; Xiaofeng Meng*; Zhihui Du; JiaMing Qiu; Kenan Liang; Yongjie Du; Zhiqiang Duan; Xiaobin Ma; AstroServ Distributed Database for Serving Large-Scale Full Life-Cycle Astronomical Data, International Conference on Big Scientific Data Management (BigSDM), Beijing, China, 2018-11-30 to 2018-12-01. Grant acknowledged third.
Yongjie Du; Xiaofeng Meng*; Chen Yang; Zhiqiang Duan; Real-Time Query Enabled by Variable Precision in Astronomy, International Conference on Big Scientific Data Management (BigSDM), Beijing, China, 2019-11-30 to 2019-12-01. Grant acknowledged third.
Zhiqiang Duan; Chen Yang*; Xiaofeng Meng; Yongjie Du; Continuous Cross Identification in Large-scale Dynamic Astronomical Data Flow, International Conference on Big Scientific Data Management (BigSDM), Beijing, China, 2019-11-30 to 2019-12-01. Grant acknowledged third.
Zhiqiang Duan; Chen Yang; Xiaofeng Meng*; Yongjie Du; Xukang Zhang; Jiaming Qiu; Xiaobin Ma; Zhihui Du; Baoning Niu; Chao Wu; SciDetector: Scientific Event Discovery by Tracking Variable Source Data Streaming, 2019 IEEE 35th International Conference on Data Engineering (ICDE), Macao, 2019-4-8 to 2019-4-11. Grant acknowledged second.
