AI News Daily - 2025/12/10

While scaling laws for Large Language Models (LLMs) traditionally focus on proxy metrics like pretraining loss, predicting downstream task performance has been considered unreliable. This paper challenges that view by proposing a direct framework to model the scaling of benchmark performance from the training budget. We find that for a fixed token-to-parameter ratio, a simple power law can accurately describe the scaling behavior of log accuracy on multiple popular downstream tasks. Our results show that the direct approach extrapolates better than the previously proposed two-stage procedure, which is prone to compounding errors. Furthermore, we introduce functional forms that predict accuracy across token-to-parameter ratios and account for inference compute under repeated sampling. We validate our findings on models with up to 17B parameters trained on up to 350B tokens across two dataset mixtures. To support reproducibility and encourage future research, we release the complete set of pretraining losses and downstream evaluation results.
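The direct fit described above can be illustrated with a minimal sketch. It assumes, purely for illustration, the form -log(accuracy) = A * C^(-alpha), where C is the training budget at a fixed token-to-parameter ratio; the abstract does not state the paper's exact parameterization. The function names, constants, and data points below are hypothetical and synthetic, not results from the paper.

```python
import math

def fit_power_law(compute, accuracy):
    """Fit -log(accuracy) = A * compute**(-alpha) by least squares in log-log space.

    Assumed functional form for illustration only; not taken from the paper.
    """
    xs = [math.log(c) for c in compute]
    ys = [math.log(-math.log(a)) for a in accuracy]  # requires 0 < accuracy < 1
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope  # (A, alpha)

def predict_accuracy(compute, A, alpha):
    """Invert the assumed law to get accuracy at a given training budget."""
    return math.exp(-A * compute ** (-alpha))

# Synthetic data generated from made-up constants (not real benchmark numbers).
TRUE_A, TRUE_ALPHA = 50.0, 0.1
budgets = [1e18, 1e19, 1e20, 1e21]  # hypothetical training budgets in FLOPs
accs = [predict_accuracy(c, TRUE_A, TRUE_ALPHA) for c in budgets]

A_hat, alpha_hat = fit_power_law(budgets, accs)
extrapolated = predict_accuracy(1e22, A_hat, alpha_hat)  # one decade beyond the fit range
```

Because the synthetic points are noise-free, the fit recovers the generating constants; with real evaluation results the quality of the extrapolation would depend on how well this assumed form matches the data.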

👨‍🔬 Jakub Krajewski, Amitis Shidani, Dan Busbridge, Sam Wiseman, Jason Ramapuram

Toward Faithful Retrieval-Augmented Generation with Sparse Autoencoders

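Academic paper ArXiv Importance: 7

Uses sparse autoencoders to disentangle internal LLM activations and builds RAGLens, a lightweight hallucination detector that accurately identifies unfaithful RAG outputs.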

👨‍🔬 Guangzhi Xiong, Zhenghao He, Bohan Liu, Sanchit Sinha, Aidong Zhang

No Labels, No Problem: Training Visual Reasoners with Multimodal Verifiers

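Academic paper ArXiv Importance: 7

Proposes an annotation-free training framework for visual reasoning that combines LLM and VLM verifiers, using reinforcement learning and hard-negative mining to improve reasoning and grounding.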

👨‍🔬 Damiano Marsili, Georgia Gkioxari

When Tables Leak: Attacking String Memorization in LLM-Based Tabular Data Generation

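Academic paper ArXiv Importance: 7

Exposes privacy risks in LLM-based tabular data generation and proposes LevAtt, a membership inference attack, showing that memorization of digit sequences leads to significant privacy leakage.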

👨‍🔬 Joshua Ward, Bochao Gu, Chi-Hua Wang, Guang Cheng

Fed-SE: Federated Self-Evolution for Privacy-Constrained Multi-Environment LLM Agents

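Academic paper ArXiv Importance: 6

Proposes Fed-SE, a federated self-evolution framework that uses a local-evolution, global-aggregation paradigm to achieve robust knowledge transfer for multi-environment LLM agents under privacy constraints.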

👨‍🔬 Xiang Chen, Yuling Shi, Qizhen Lan, Yuchao Qiu, Xiaodong Gu

Differentially Private Synthetic Data Generation Using Context-Aware GANs

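Academic paper ArXiv Importance: 6

Proposes ContextGAN, a context-aware differentially private GAN that integrates domain rules through a constraint matrix to generate synthetic data that both preserves privacy and satisfies domain constraints.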

👨‍🔬 Anantaa Kotal, Anupam Joshi

DAO-GP Drift Aware Online Non-Linear Regression Gaussian-Process

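Academic paper ArXiv Importance: 6

Proposes DAO-GP, a drift-aware online Gaussian process model with built-in drift detection and adaptation, enabling fully adaptive, hyperparameter-free nonlinear regression.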

👨‍🔬 Mohammad Abu-Shaira, Ajita Rattani, Weishi Shi

EcomBench: Towards Holistic Evaluation of Foundation Agents in E-commerce

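Academic paper ArXiv Importance: 6

Proposes EcomBench, an e-commerce benchmark built from real user needs that comprehensively evaluates agents' core capabilities, such as deep retrieval and multi-step reasoning, in realistic e-commerce settings.

👨‍🔬 Rui Min, Zile Qiao, Ze Xu, Jiawen Zhai, Wenyu Gao, Xuanzhong Chen, Haozhen Sun, Zhen Zhang, Xinyu Wang, Hong Zhou, Wenbiao Yin, Xuan Zhou, Yong Jiang, Haicheng Liu, Liang Ding, Ling Zou, Yi R. Fung, Yalong Li, Pengjun Xie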

Siamese-Driven Optimization for Low-Resolution Image Latent Embedding in Image Captioning

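Academic paper ArXiv Importance: 5

Proposes SOLI, which uses a Siamese network architecture to optimize latent embeddings of low-resolution images, enabling lightweight image captioning with lower computational overhead.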

👨‍🔬 Jing Jie Tan, Anissa Mokraoui, Ban-Hoe Kwan, Danny Wee-Kiat Ng, Yan-Chai Hum

📊 Today's Trend Summary

Ask HN: What's the pain using current AI algorithms?

Ask HN: Is the rate of progress in AI exponential?

Why Boring Businesses Outlast AI Hype Cycles

NLP, AI, ML, bots – a passing trend or much more? What's your take on this?

Ask HN: Anyone concerned about NYC Local Law 144?

Ask HN: What would you read to learn about "artificial intelligence"?

Ask HN: Dipping my toes with artificial intelligence and what to expect? (CS)

The AI Crackpot Index

Common Lisp + Machine Learning Internship at Google (Mountain View, CA)

Bioinformatician

Show HN: Startup Raising capital through Book Sales

The Next Bill Gates or Albert Einstein in AI “Chris Clark” – Yourobot

Astra: General Interactive World Model with Autoregressive Denoising

Same Content, Different Answers: Cross-Modal Inconsistency in MLLMs

SAQ: Stabilizer-Aware Quantum Error Correction Decoder

Revisiting the Scaling Properties of Downstream Metrics in Large Language Model Training

Toward Faithful Retrieval-Augmented Generation with Sparse Autoencoders

No Labels, No Problem: Training Visual Reasoners with Multimodal Verifiers

When Tables Leak: Attacking String Memorization in LLM-Based Tabular Data Generation

Fed-SE: Federated Self-Evolution for Privacy-Constrained Multi-Environment LLM Agents

Differentially Private Synthetic Data Generation Using Context-Aware GANs

DAO-GP Drift Aware Online Non-Linear Regression Gaussian-Process

EcomBench: Towards Holistic Evaluation of Foundation Agents in E-commerce

Siamese-Driven Optimization for Low-Resolution Image Latent Embedding in Image Captioning
