
Search portal for past WAMDM lab seminars (SeminarDB)


W427:
2024.2.22 Venue: Room 101, 理工配楼
W427-1: All lab members
Title: Semester plans and vacation review
W428:
2024.2.29 Venue: Room 101, 理工配楼
W428-1: 彭迎涛 (web group)
Title: Denoise Alignment with Large Language Model for Recommendation
Abstract: Recently, there has been a groundbreaking application of large language models (LLMs) in recommender systems, allowing rich textual information to be integrated into traditional ID-based recommenders. However, effective use in graph recommender systems still requires addressing challenges such as aligning structural features with textual features and handling noise. To tackle these challenges, we propose a denoise alignment framework (DALR) that aligns structural representations with textual representations and mitigates the effects of noise. Specifically, we propose a modeling framework that integrates graph-structure representations with LLMs to capture intricate user-item interactions. We also design an alignment paradigm that enhances representation performance by aligning semantic information from LLMs with structural features from graphs. Additionally, we introduce a contrastive learning component to eliminate noise and improve model performance.
Concepts: denoise alignment; large language model; graph neural network
References:
[1] Wei W, Ren X, Tang J, et al. LLMRec: Large language models with graph augmentation for recommendation[J]. arXiv preprint arXiv:2311.00423, 2023.
[2] Ren X, Wei W, Xia L, et al. Representation learning with large language models for recommendation[J]. arXiv preprint arXiv:2310.15950, 2023.
W429:
2024.3.7 Venue: Room 101, 理工配楼
W429-1: 刘立新 (privacy group)
Title: Beyond Parameter Protection: Enhancing Privacy through Anonymous Authentication in Blockchain-based Federated Learning
Abstract: Blockchain-based federated learning dynamically elects a "committee" from the blockchain to perform parameter aggregation, avoiding reliance on a third party and the single point of failure. To protect the per-round intermediate parameters on chain, existing methods apply differential privacy or secure multi-party computation, which either degrades model accuracy or pays a heavy cryptographic cost to preserve model robustness. We instead address the problem through anonymous identity authentication: in each training round a user authenticates under a fresh identity, so the user's parameters and identity are unlinkable. The committee cannot link one user's parameters across rounds, and can therefore aggregate parameters in plaintext while improving model robustness.
Concepts: Authentication; Verifiable Random Function
References:
[1] Xiaohan Yuan, Jiqiang Liu, Bin Wang, Wei Wang, Bin Wang, Tao Li, Xiaobo Ma, Witold Pedrycz. FedComm: A Privacy-Enhanced and Efficient Authentication Protocol for Federated Learning in Vehicular Ad-hoc Networks. IEEE Transactions on Information Forensics and Security, 2024, 19: 777-792.
[2] 王恺祺, 洪睿琦, 毛云龙, 仲盛. 基于区块链构建安全去中心化的联邦学习方案[J]. 中国科学: 信息科学, 2024, 54(2): 316-334.
[3] Z. Zhou, C. Xu, M. Wang, X. Kuang, Y. Zhuang, and S. Yu. "A multi-shuffler framework to establish mutual confidence for secure federated learning," IEEE Trans. Dependable Secure Comput., 2022, 20(5): 4230-4244.
W429-2: 张旭康 (cloud group)
Title: G-IQRE: GPU-extended Intra-Query Runtime Elasticity
Abstract: Achieving a satisfactory query rate with as few compute resources as possible is a hot topic in cloud database research. The resources a query uses are determined by its degree of parallelism (DOP), yet a query's optimal DOP is hard to determine directly, since it depends on cluster configuration, query structure, and workload changes; users therefore cannot statically decide how many compute resources to provision. Intra-query runtime elasticity (IQRE) addresses this by changing a query's DOP at runtime to adjust its speed, dynamically settling on a suitable amount of resources. Previous IQRE work assumed homogeneous compute nodes; we now extend IQRE to heterogeneous nodes, namely GPU nodes, which can greatly accelerate query execution. As AI develops, users often run AI and image-processing workloads alongside data analytics and purchase or rent large numbers of GPU servers. G-IQRE tries to exploit these heterogeneous resources to accelerate queries while using query elasticity to minimize interference with other workloads. This talk reports the current progress of G-IQRE.
Concepts: GPU; cloud-native; elasticity
References:
[1] Zhang, H., Liu, Y., & Yan, J. (2023). Cost-Intelligent Data Analytics in the Cloud. ArXiv, abs/2308.09569.
W430:
2024.3.14 Venue: Room 101, 理工配楼
W430-1: 蒋希文 (cloud group)
Title: Causality-Inspired Spatial-Temporal Explanations for Dynamic Graph Neural Networks
Abstract: Dynamic graph neural networks (DyGNNs) are a hot topic in dynamic graph research, but their low transparency makes it hard to extract a sound rationale from their results. Although many studies address the explainability of graph neural networks (GNNs), explaining DyGNNs remains a critical challenge because of the complex spatial-temporal correlations in dynamic graphs. This paper proposes a generative model based on a structural causal model (SCM) that explores the rationale behind DyGNN predictions by identifying trivial, static, and dynamic causal relationships. Doing so requires solving two key tasks: (1) disentangling complex causal relationships, and (2) fitting the spatial-temporal explanations of DyGNNs into the SCM architecture. To meet these challenges, the method combines a contrastive learning module that separates causal from non-causal relationships with a dynamic correlation module that separates dynamic from static causal relationships. It further develops a dynamic-VGAE-based framework that generates causal and dynamic masks for spatial explainability, and identifies dynamic relationships along the temporal dimension via causal discovery for temporal explainability. The method achieves superior performance on synthetic and real-world datasets.
Concepts: dynamic graph neural networks; model explainability; causal inference
References:
[1] Kesen Zhao, Liang Zhang. Causality-Inspired Spatial-Temporal Explanations for Dynamic Graph Neural Networks. ICLR 2024.
[2] Rex Ying, Jure Leskovec. GNNExplainer: Generating Explanations for Graph Neural Networks. NIPS 2019.
W430-2: 吴弘博 (cloud group)
Title: Cross-Domain Data Fusion in Urban Region Representation
Abstract: Urban computing integrates extensive and diverse datasets from multiple sources and modalities, also known as cross-domain data fusion, reflecting the recognition that a single data source or modality is often insufficient for urban tasks. Recently, learning urban region representations from multimodal data has become increasingly popular as a way to understand how various socioeconomic characteristics are distributed across a city. This talk surveys urban region representation methods from the perspective of cross-modal data fusion.
Concepts: urban representation learning; data fusion; multi-modal data
References:
[1] Fu, Y., Wang, P., Du, J., Wu, L., Li, X., 2019. Efficient region embedding with multi-view spatial networks: A perspective of locality-constrained spatial autocorrelations, in: Proceedings of the AAAI Conference on Artificial Intelligence, pp. 906-913.
[2] Zhang, M., Li, T., Li, Y., Hui, P., 2021. Multi-view joint graph representation learning for urban region embedding, in: Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, Yokohama, Japan, pp. 4431-4437.
[3] Zhang, L., Long, C., Cong, G., 2023. Region embedding with intra and inter-view contrastive learning. IEEE Transactions on Knowledge and Data Engineering 35, 9031-9036. doi:10.1109/TKDE.2022.3220874.
[4] Li, Yi, et al. "Urban region representation learning with OpenStreetMap building footprints." Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 2023.
W431:
2024.3.21 Venue: Room 101, 理工配楼
W431-1: 郝新丽 (cloud group)
Title: Are Transformers Effective for Time Series Forecasting?
Abstract: Transformers have been widely applied to time-series analysis and have become a standard architecture. Recent work, however, argues that their ability on long-term forecasting tasks is overstated and that they can even be beaten by simple linear models, while another line of work counters that Transformers are not overrated but merely misused. This talk introduces these two opposing views.
Concepts: permutation invariance; iterated multi-step (IMS) forecasting; direct multi-step (DMS) forecasting
References:
[1] Zeng A, Chen M, Zhang L, et al. Are transformers effective for time series forecasting?[C]//Proceedings of the AAAI Conference on Artificial Intelligence. 2023, 37(9): 11121-11128.
[2] Liu Y, Hu T, Zhang H, et al. iTransformer: Inverted transformers are effective for time series forecasting[J]. arXiv preprint arXiv:2310.06625, 2023.
[3] Das A, Kong W, Leach A, et al. Long-term forecasting with TiDE: Time-series dense encoder[J]. arXiv preprint arXiv:2304.08424, 2023.
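The IMS/DMS distinction in the concepts list can be illustrated with a toy drift model (my own construction, not from the cited papers): IMS applies a one-step model recursively, feeding each prediction back as input, while DMS maps the lookback window straight to every horizon at once. For this purely linear toy the two coincide; with a learned, imperfect one-step model, IMS accumulates error while DMS does not.

```python
def ims_forecast(history, horizon, one_step):
    # iterated multi-step: apply a one-step model recursively,
    # appending each prediction to the window before the next step
    window = list(history)
    preds = []
    for _ in range(horizon):
        nxt = one_step(window)
        preds.append(nxt)
        window.append(nxt)
    return preds

def dms_forecast(history, horizon, direct):
    # direct multi-step: one model maps the window to all horizons at once
    return direct(history, horizon)

# toy linear "models": continue the last observed difference (drift)
one_step = lambda w: w[-1] + (w[-1] - w[-2])
direct = lambda w, h: [w[-1] + (i + 1) * (w[-1] - w[-2]) for i in range(h)]

hist = [1.0, 2.0, 3.0]
ims = ims_forecast(hist, 3, one_step)
dms = dms_forecast(hist, 3, direct)
```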
W431-2: 但唐朋 (cloud group)
Title: Personalized Shortest-Path Queries on Time-Dependent Road Networks
Abstract: This talk presents our research plan for personalized shortest-path queries over time-dependent road networks. To answer this new query type, we design a two-stage query framework. In the first stage, a rule hierarchy tree filters the set of candidate paths matching the user's personalized query and computes the frequently visited objects in the road network. In the second stage, we analyze and optimize existing query patterns and index structures, and use an improved label index to compute the final shortest path.
Concepts: route planning; tree decomposition
References:
[1] Dian Ouyang, Dong Wen, Lu Qin, Lijun Chang, Xuemin Lin, Ying Zhang: When hierarchy meets 2-hop-labeling: efficient shortest distance and path queries on road networks. VLDB J. 32(6): 1263-1287 (2023).
[2] Dian Ouyang, Long Yuan, Lu Qin, Lijun Chang, Ying Zhang, Xuemin Lin: Efficient Shortest Path Index Maintenance on Dynamic Road Networks with Theoretical Guarantees. Proc. VLDB Endow. 13(5): 602-615 (2020).
W432:
2024.3.28 Venue: Room 101, 理工配楼
W432-1: 李梓童 (privacy group)
Title: Unlearning in Graph Neural Networks
Abstract: This talk introduces machine unlearning methods for graph neural networks. Unlike conventional neural networks, GNNs must account for the connections between nodes, so unlearning methods designed for conventional networks need adaptation before they apply to GNNs. Two GNN unlearning methods are covered: (1) GraphEraser, which follows the shard-train-aggregate paradigm of conventional unlearning but redesigns the partitioning and aggregation algorithms for graph data; and (2) GIF, which follows the classical approach of estimating the unlearned model via a Taylor expansion, while additionally accounting for the other nodes affected by the deleted subgraph.
Concepts: machine unlearning; GNN; sharding
References:
[1] Chen M, Zhang Z, Wang T, et al. Graph unlearning[C]//Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security. 2022: 499-513.
[2] Wu J, Yang Y, Qian Y, et al. GIF: A general graph unlearning strategy via influence function[C]//Proceedings of the ACM Web Conference 2023. 2023: 651-661.
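GraphEraser's shard-train-aggregate idea can be caricatured in a few lines. This is only a sketch: the round-robin partitioner and majority-label "models" below are invented stand-ins for the paper's graph-aware partitioning and real GNN shard models; the point is that deleting a node only retrains its own shard.

```python
def shard_nodes(nodes, num_shards):
    # balanced round-robin split, standing in for graph-aware partitioning
    shards = [[] for _ in range(num_shards)]
    for i, node in enumerate(nodes):
        shards[i % num_shards].append(node)
    return shards

def train(shard):
    # toy per-shard "model": remembers the majority label in its shard
    labels = [label for _, label in shard]
    return max(set(labels), key=labels.count) if labels else None

def aggregate(models):
    # majority vote over the shard models
    votes = [m for m in models if m is not None]
    return max(set(votes), key=votes.count)

def unlearn(shards, models, node):
    # delete the node and retrain only its shard; other shards are untouched
    for i, shard in enumerate(shards):
        if node in shard:
            shard.remove(node)
            models[i] = train(shard)
    return models

nodes = [("a", 1), ("b", 1), ("c", 0), ("d", 1), ("e", 1), ("f", 0)]
shards = shard_nodes(nodes, 3)
models = [train(s) for s in shards]
pred_before = aggregate(models)
models = unlearn(shards, models, ("a", 1))   # forget node "a"
pred_after = aggregate(models)
```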
W432-2: 李晨阳 (cloud group)
Title: Explore the Model's Predictions on Data through Data Modeling
Abstract: What biases does a machine-learning model exhibit? Which correlations does it exploit? On which subsets does it perform well (or poorly)? Recent work in machine learning shows that the answers lie in the learning algorithm and the training data it consumes. However, it is usually hard to understand how algorithm and data combine to produce a model's predictions. This talk therefore introduces datamodeling, a conceptual framework for analyzing model behavior in terms of the training data.
Concepts: counterfactuals; surrogate modeling
References:
[1] Ilyas A, Park S M, Engstrom L, et al. Datamodels: Predicting predictions from training data[J]. arXiv preprint arXiv:2202.00622, 2022.
[2] Yoon J, Arik S, Pfister T. Data valuation using reinforcement learning[C]//International Conference on Machine Learning. PMLR, 2020: 10842-10851.
W433:
2024.4.11 Venue: Room 101, 理工配楼
W433-1: 李维 (cloud group)
Title: Early Classification of Time Series in the Real World
Abstract: Time-sensitive domains must make decisions early, and it can be worth sacrificing some time-series classification accuracy in favor of earlier predictions. Despite dozens of papers on early classification of time series, it is not clear that any of them could ever work in a real-world setting; the problem lies not with the algorithms per se but with the problem description. Much existing research focuses on balancing earliness against accuracy through better optimization, while ignoring the limitations of the problem definition itself. What should the new definition be, and is it valuable? I will answer these questions in this presentation.
Concepts: early classification; time series; problem definition
References:
[1] R. Wu et al., "When is Early Classification of Time Series Meaningful?" IEEE Transactions on Knowledge and Data Engineering, 2021.
[2] C. Sun et al., "A Ranking-Based Cross-Entropy Loss for Early Classification of Time Series," IEEE Transactions on Neural Networks and Learning Systems, 2024.
W434:
2024.4.18 Venue: Room 101, 理工配楼
W434-1: 徐冰冰 (privacy group)
Title: Align on the Fly: Adapting Chatbot Behavior to Established Norms
Abstract: The trend in alignment is to align large language models with the ever-changing, complex, and diverse human values (e.g., social norms) found across time and locations. This challenges existing alignment techniques, such as supervised fine-tuning, which internalize values within model parameters. Today I will share a paper that proposes On-the-fly Preference Optimization (OPO), a real-time alignment method that works in a streaming way. It keeps established rules in an external memory and uses them to constrain the LLM's behavior without further training, allowing convenient updates and customization of human values.
Concepts: AI alignment; human values
References:
[1] Xu C, Chern S, Chern E, et al. Align on the fly: Adapting chatbot behavior to established norms[J]. arXiv preprint arXiv:2312.15907, 2023.
[2] Forbes M, Hwang J D, Shwartz V, et al. Social chemistry 101: Learning to reason about social and moral norms[J]. arXiv preprint arXiv:2011.00620, 2020.
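The retrieve-then-constrain loop behind OPO might be sketched roughly as follows. Everything concrete here is a placeholder of my own: the keyword matcher stands in for the paper's external-memory retriever, and the rule texts are invented; the point is only that the rules steer the model in-context, so they can be updated without retraining.

```python
# (key, rule text) pairs standing in for OPO's external rule memory
RULES = [
    ("privacy", "Do not reveal personal data without consent."),
    ("harm", "Refuse requests that could cause physical harm."),
]

def retrieve(query, rules):
    # naive keyword matcher standing in for a real retriever
    q = query.lower()
    return [text for key, text in rules if key in q]

def build_prompt(query, rules):
    # prepend only the retrieved rules, so behavior is constrained
    # in-context rather than baked into model parameters
    lines = [f"Rule: {r}" for r in retrieve(query, rules)]
    lines.append(f"User: {query}")
    lines.append("Assistant:")
    return "\n".join(lines)

prompt = build_prompt("Can I post my friend's address? It's a privacy question.", RULES)
```

Updating a norm then amounts to editing `RULES`, with no further training pass.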
W434-2: 许婧楠 (privacy group)
Title: Auditing Private Prediction
Abstract: Most current privacy-auditing methods target differentially private algorithms that add noise during model training, such as DP-SGD and DP-FedSGD. A family of algorithms built on the private prediction framework instead adds noise only at prediction time, via a noisy-argmax mechanism. These methods typically satisfy the RDP definition, and RDP cannot be expressed directly from a hypothesis-testing perspective. This talk introduces Auditing Private Prediction, which audits privacy algorithms that add noise only at prediction time and uses the 2-cut form of the Rényi divergence to obtain empirical privacy lower bounds for them.
Concepts: private prediction framework; Rényi divergence; 2-cut of the Rényi divergence
References:
[1] Chadha K, Jagielski M, Papernot N, et al. Auditing Private Prediction[J]. arXiv preprint arXiv:2402.09403, 2024.
[2] Wang J, Schuster R, Shumailov I, et al. In differential privacy, there is truth: On vote leakage in ensemble private learning[J]. arXiv preprint arXiv:2209.10732, 2022.
W435:
2024.4.24 Venue: Room 101, 理工配楼
W435-1: 彭迎涛 (web group)
Title: LLM-based Agents for Recommenders
Abstract: Large language models (LLMs) are emerging as a promising way to strengthen recommender systems, and both prompting-based and fine-tuning-based approaches have been widely studied. However, lacking task-specific feedback, most existing methods struggle to find prompts that elicit optimal reasoning from LLMs, which leads to unsatisfactory recommendations. Other work tries to fine-tune LLMs with domain-specific knowledge, but faces limitations such as high computational cost and dependence on open-source backbones. In this talk we survey research on LLM-based agents for recommender systems, aiming to explore ways to address these problems.
Concepts: LLM-based agent; large language model; recommender system
References:
[1] Wu L, Zheng Z, Qiu Z, et al. A survey on large language models for recommendation[J]. arXiv preprint arXiv:2305.19860, 2023.
[2] Xi Z, Chen W, Guo X, et al. The rise and potential of large language model based agents: A survey[J]. arXiv preprint arXiv:2309.07864, 2023.
W436:
2024.5.9 Venue: Room 101, 理工配楼
W436-1: 但唐朋 (cloud group)
Title: Can Large Language Models Be Good Graph Structure Learners?
Abstract: Graph Structure Learning (GSL) focuses on capturing intrinsic dependencies and interactions among nodes in graph-structured data by generating novel graph structures. Graph Neural Networks (GNNs) have emerged as promising GSL solutions, utilizing recursive message passing to encode node-wise inter-dependencies. However, many existing GSL methods heavily depend on explicit graph structural information as supervision signals, leaving them susceptible to challenges such as data noise and sparsity. In this work, we propose GraphEdit, an approach that leverages large language models (LLMs) to learn complex node relationships in graph-structured data. By enhancing the reasoning capabilities of LLMs through instruction-tuning over graph structures, we aim to overcome the limitations associated with explicit graph structural information and enhance the reliability of graph structure learning. Our approach not only effectively denoises noisy connections but also identifies node-wise dependencies from a global perspective, providing a comprehensive understanding of the graph structure. We conduct extensive experiments on multiple benchmark datasets to demonstrate the effectiveness and robustness of GraphEdit across various settings.
Concepts: LLMs; graph representation; graph learning
References:
[1] Zirui Guo, Lianghao Xia, Yanhua Yu, Yuling Wang, Zixuan Yang, Wei Wei, Liang Pang, Tat-Seng Chua, Chao Huang. GraphEdit: Large Language Models for Graph Structure Learning. arXiv:2402.15183.
[2] Bowen Jin, Chulin Xie, Jiawei Zhang, Kashob Kumar Roy, Yu Zhang, Suhang Wang, Yu Meng, Jiawei Han. Graph Chain-of-Thought: Augmenting Large Language Models by Reasoning on Graphs. arXiv:2404.07103.
W436-2: 张旭康 (cloud group)
Title: A Survey of Cost-Oriented Data Analytics in the Cloud
Abstract: More and more users rely on cloud databases and cloud analytics frameworks to store and analyze data. Under this trend, cost-control questions, such as how to choose a configuration suited to one's workload and how to size storage and compute resources, have become a research hotspot. After surveying recent work on cost control and performance-cost trade-offs in cloud data analytics, this talk reviews the literature along three axes: resource configuration, resource scaling, and performance-cost trade-offs. It then analyzes the open problems on the economic-cost side of cloud data analytics and offers an outlook on future trends.
Concepts: cloud-native; big data analytics; cost
References:
[1] Daniela Florescu, et al. Rethinking cost and performance of database systems. SIGMOD Rec. 38(1): 43-48 (2009).
[2] Huanchen Zhang, et al. Cost-Intelligent Data Analytics in the Cloud. CIDR 2024.
W437:
2024.5.16 Venue: Room 101, 理工配楼
W437-1: 蒋希文 (cloud group)
Title: An Overview of Mixture-of-Experts
Abstract: As artificial intelligence has advanced, the Mixture of Experts (MoE) architecture has achieved notable results in many fields. MoE is a divide-and-conquer neural architecture: a complex problem is decomposed into subproblems, each handled by an independent model called an expert, which may be any type of network (fully connected, convolutional, recurrent, etc.). The core of MoE is how to combine the experts' outputs into a final prediction. This is usually done with a gating mechanism, which selects the most suitable experts for a given input and combines their outputs with learned weights to produce the final result. This talk introduces the background and principles of MoE, then uses Switch Transformers to illustrate the current state of MoE research.
Concepts: mixture of experts; large-model training
References:
[1] Jacobs, Robert A., et al. "Adaptive mixtures of local experts." Neural Computation 3.1 (1991): 79-87.
[2] Fedus, William, Barret Zoph, and Noam Shazeer. "Switch transformers: Scaling to trillion parameter models with simple and efficient sparsity." Journal of Machine Learning Research 23.120 (2022): 1-39.
[3] Shazeer, Noam, et al. "Outrageously large neural networks: The sparsely-gated mixture-of-experts layer." arXiv preprint arXiv:1701.06538 (2017).
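The gating mechanism described above can be sketched in a few lines. This toy (the function names, the two hand-built "experts", and the 2-dimensional input are all illustrative, not from the talk or any library) computes softmax gate scores from the input and then weight-averages the expert outputs; Switch-style routing would instead keep only the top-1 expert.

```python
import math

def softmax(xs):
    # numerically stable softmax over a list of scores
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_forward(x, experts, gate_weights):
    # gate: one linear score per expert, turned into mixture weights
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in gate_weights]
    probs = softmax(scores)
    # combine: weighted sum of every expert's output, dimension by dimension
    outputs = [expert(x) for expert in experts]
    return [sum(p * o[i] for p, o in zip(probs, outputs))
            for i in range(len(outputs[0]))]

# two toy "experts": one doubles the input, one negates it
experts = [lambda x: [2 * v for v in x], lambda x: [-v for v in x]]
gate_weights = [[1.0, 0.0], [0.0, 1.0]]   # one score row per expert
y = moe_forward([1.0, 0.0], experts, gate_weights)
```

For the input `[1.0, 0.0]` the gate favors the doubling expert, so the first output component lands between the two experts' answers but closer to 2.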
W437-2: 吴弘博 (cloud group)
Title: ST-LoRA: Low-rank Adaptation for Spatio-Temporal Forecasting
Abstract: Spatio-temporal forecasting is crucial for real-world dynamic systems, using historical data from different locations to predict future changes. Existing methods usually prioritize intricate neural networks that capture the data's complex dependencies, yet these methods show no consistent accuracy gains and often ignore node heterogeneity, which hinders customizing prediction modules for different region nodes. This talk introduces ST-LoRA, a novel low-rank adaptation framework that serves as an off-the-shelf plugin for existing spatio-temporal forecasting models and mitigates these issues through node-level adjustments.
Concepts: spatio-temporal forecasting; low-rank adaptation; spatial and temporal heterogeneity
References:
[1] Hu, Edward J., et al. "LoRA: Low-rank adaptation of large language models." arXiv preprint arXiv:2106.09685 (2021).
[2] Ruan, Weilin, et al. "Low-rank Adaptation for Spatio-Temporal Forecasting." arXiv preprint arXiv:2404.07919 (2024).
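The LoRA idea that ST-LoRA builds on is compact enough to show directly: the frozen weight W is augmented with a low-rank product, y = Wx + (alpha/r)·BAx, where only A and B are trained and B starts at zero so the adapted model initially matches the base. A minimal sketch with toy matrices of my own choosing:

```python
def matvec(M, v):
    # plain matrix-vector product over nested lists
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def lora_forward(x, W, A, B, alpha=1.0, r=1):
    # y = W x + (alpha / r) * B (A x); W stays frozen, only A and B train
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))   # rank-r update B @ A
    return [b + (alpha / r) * d for b, d in zip(base, delta)]

W = [[1.0, 0.0], [0.0, 1.0]]   # frozen base weight (identity, for clarity)
A = [[1.0, 1.0]]               # r x d_in, rank r = 1
B = [[0.0], [0.0]]             # d_out x r, initialized to zero
x = [3.0, 4.0]

y0 = lora_forward(x, W, A, B)  # equals the base output before any tuning
B = [[0.5], [0.5]]             # hypothetical values after adaptation
y1 = lora_forward(x, W, A, B)
```

ST-LoRA's twist, per the abstract, is attaching such low-rank modules at the node level, so each region node gets its own cheap adjustment on top of a shared backbone.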
W438:
2024.5.23 Venue: Room 101, 理工配楼
W438-1: 李梓童 (privacy group)
Title: Reinforcement Unlearning
Abstract: Reinforcement learning (RL) is a branch of machine learning whose goal is to learn a policy under which an agent, interacting with an environment through a sequence of actions, maximizes cumulative reward or achieves a specific goal. Starting from the basic concepts of unlearning and reinforcement learning, this talk introduces two methods for unlearning in the RL setting: a decremental-reinforcement-learning-based method and an environment-poisoning-based method, which act on the loss function and the state-transition function, respectively, to make the agent forget.
Concepts: reinforcement learning; machine unlearning
References:
[1] Ye, D., Zhu, T., Zhu, C., Wang, D., Shen, S., & Zhou, W. (2023). Reinforcement Unlearning. arXiv preprint arXiv:2312.15910.
W439:
2024.5.30 Venue: Room 101, 理工配楼
W439-1: 郝新丽 (cloud group)
Title: ICDE 2024 Conference Report
Abstract: This report first summarizes the papers accepted at this year's ICDE, then introduces several hot and frontier topics across four areas: AI training resilience, Tree of Thought, LLMs for data management, and LLMs for time series.
Concepts: resilience; Chain of Thought; Tree of Thought
References:
[1] Yao S, Yu D, Zhao J, et al. Tree of thoughts: Deliberate problem solving with large language models[J]. Advances in Neural Information Processing Systems, 2024, 36.
[2] Applications and Challenges for Large Language Models: From Data Management Perspective[C]//2024 IEEE 40th International Conference on Data Engineering (ICDE).
W439-2: 刘立新 (privacy group)
Title: Blockchain-Driven Data Sharing for the Internet of Things
Abstract: The spread of the Internet of Things underpins smart cities and smart healthcare, delivering more efficient, higher-quality, and personalized services for daily life and production. This depends on dense deployments of IoT devices and on complex IoT data-sharing flows. At the same time, because these data contain a great deal of personal private information, securely sharing IoT data raises new challenges. This talk presents some of our thoughts on blockchain-driven IoT data-sharing techniques.
Concepts: ID-based signature; authentication; broadcast encryption
References:
[1] Wang F, Cui J, Zhang Q, et al. Blockchain-Based Secure Cross-Domain Data Sharing for Edge-Assisted Industrial Internet of Things[J]. IEEE Transactions on Information Forensics and Security, 2024.
[2] Lai J, Susilo W, Deng R H, et al. SDSS: Sequential Data Sharing System in IoT[J]. IEEE Transactions on Information Forensics and Security, 2023.
[3] Lin C, He D, Zeadally S, et al. Blockchain-based data sharing system for sensing-as-a-service in smart cities[J]. ACM Transactions on Internet Technology (TOIT), 2021, 21(2): 1-21.
[4] Shafagh H, Burkhalter L, Ratnasamy S, et al. Droplet: Decentralized authorization and access control for encrypted data streams[C]//29th USENIX Security Symposium (USENIX Security 20). 2020: 2469-2486.
W440:
2024.6.6 Venue: Room 101, 理工配楼
W440-1: 李晨阳 (cloud group)
Title: Expressive Constraints over Time Series
Abstract: Traditional constraint-based methods have contributed greatly to data-quality problems in relational data. Time-series data generated by thousands of sensors suffers from data-quality problems as well, but the existing constraints for relational data do not apply to it. Several studies are therefore devoted to proposing expressive constraints that support time-series data and solve its data problems. This presentation focuses on such expressive constraints over time-series data.
Concepts: row constraints; column constraints; evolutionary algorithm
References:
[1] A. Fariha, A. Tiwari, A. Radhakrishna, et al, "Conformance constraint discovery: Measuring trust in data-driven systems," in SIGMOD. ACM, 2021.
[2] S. Song, F. Gao, A. Zhang, et al, "Stream data cleaning under speed and acceleration constraints," ACM TDS, 2021.
[3] X. Ding, G. Li, H. Wang, et al, "Time series data cleaning under expressive constraints on both rows and columns," in ICDE, 2024.
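One concrete instance of such a constraint, loosely in the spirit of reference [2]'s setting, is a speed constraint: consecutive sensor readings may not change faster than some bound s_max. A hypothetical check-and-repair sketch (the greedy forward clamp is my own simplification, not the paper's algorithm):

```python
def violations(ts, xs, smax):
    # indices whose change rate from the previous point exceeds s_max
    bad = []
    for i in range(len(xs) - 1):
        rate = abs(xs[i + 1] - xs[i]) / (ts[i + 1] - ts[i])
        if rate > smax:
            bad.append(i + 1)
    return bad

def repair(ts, xs, smax):
    # greedy forward repair: clamp each point into the window allowed
    # by the speed constraint relative to the previous repaired point
    ys = [xs[0]]
    for i in range(1, len(xs)):
        dt = ts[i] - ts[i - 1]
        lo, hi = ys[-1] - smax * dt, ys[-1] + smax * dt
        ys.append(min(max(xs[i], lo), hi))
    return ys

ts = [0, 1, 2, 3]
xs = [10.0, 10.5, 99.0, 11.5]   # 99.0 is a spike violating the constraint
bad = violations(ts, xs, 1.0)
fixed = repair(ts, xs, 1.0)
```

Row constraints of the kind the talk discusses generalize this idea across multiple columns and more expressive predicates.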
W440-2: 李维 (cloud group)
Title: From Recognition to Hallucination: How to Understand Label-Oriented Sequence Data via LLMs
Abstract: With the emergence of large language models (LLMs), transforming numeric-oriented sequences into understandable label-oriented sequences has become important. Numeric-oriented sequence data appears naturally in many real-world applications, such as smart-home sensors and medical observations; it is mostly complex, multidimensional, and shifts with the distribution of the input sequence, ranging from packet data in network-traffic analysis to time series in time-sensitive medical diagnosis. Adaptive classification models or manual annotation can transform these numeric-oriented sequences into label-oriented ones. However, when LLMs continuously consume natural language formed from label-oriented data, semantic drift occurs in long-text generation tasks. In this presentation, I will discuss a recent paper on semantic drift and summarize some thoughts on LLMs based on my existing work.
Concepts: label-oriented sequence data; semantic drift; LLMs
References:
[1] A. Spataru et al., "Know When To Stop: A Study of Semantic Drift in Text Generation." arXiv preprint arXiv:2404.05411 (2024).
[2] L. Huang et al., "A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions." arXiv preprint arXiv:2311.05232 (2023).
W441:
2024.6.13 Venue: Room 101, 理工配楼
W441-1: 徐冰冰 (privacy group)
Title: Value Alignment of LLMs
Abstract: Large language models are trained to engage in dialogue like humans, but not every conversation with a large language model aligns with human values. Motivated by human value principles, we propose Value Alignment, an approach that combines value principles with large language models through in-context learning to make LLM output more consistent with human value principles. Through this method we hope to explore how important different value principles are to LLMs. In this presentation, I will introduce my work on LLM value alignment.
Concepts: value alignment; LLMs; in-context learning
References:
[1] Ji, Jiaming, et al. "BeaverTails: Towards improved safety alignment of LLM via a human-preference dataset." Advances in Neural Information Processing Systems 36 (2024).
[2] Ganguli, Deep, et al. "The capacity for moral self-correction in large language models." arXiv preprint arXiv:2302.07459 (2023).
W441-2: 许婧楠 (privacy group)
Title: Privacy Auditing Methods for Federated Learning
Abstract: Most existing privacy audits focus on the DP-SGD algorithm in the centralized differential-privacy setting. In federated learning, DP-FedAvg and DP-FedSGD are the commonly used differentially private algorithms, and applying auditing methods designed for centralized differential privacy to the federated setting multiplies the auditing cost. This talk introduces two privacy-auditing methods for the federated learning setting and presents the challenges they face.
Concepts: federated learning; ineffectiveness of high-dimensional vectors; privacy auditing
References:
[1] Maddock S, Sablayrolles A, Stock P. CANIFE: Crafting Canaries for Empirical Privacy Measurement in Federated Learning[J]. arXiv preprint arXiv:2210.02912, 2022.
[2] Andrew G, Kairouz P, Oh S, et al. One-shot Empirical Privacy Estimation for Federated Learning[J]. arXiv preprint arXiv:2302.03098, 2023.
W442:
2024.6.20 Venue: Room 101, 理工配楼
W442-1: All lab members
Title: Semester review and vacation plans

Maintained by WAMDM Administrator | Copyright © 2007-2017 WAMDM, All rights reserved