AI News Daily - 2026/5/13

👨‍🔬 Rishabh Tiwari, Kusha Sareen, Lakshya A Agrawal, Joseph E. Gonzalez, Matei Zaharia, Kurt Keutzer, Inderjit S Dhillon, Rishabh Agarwal, Devvrit Khatri

Reward Hacking in Rubric-Based Reinforcement Learning

Academic Paper ArXiv Importance: 8

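Studies reward hacking in rubric-based RL, finding that even strong verifiers cannot fully eliminate it and that flaws in rubric design need attention.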

👨‍🔬 Anas Mahmoud, MohammadHossein Rezaei, Zihao Wang, Anisha Gunjal, Bing Liu, Yunzhong He

Solve the Loop: Attractor Models for Language and Reasoning

Academic Paper ArXiv Importance: 8

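Introduces attractor models, which solve for fixed points using implicit differentiation to enable efficient iterative reasoning; small models outperform larger ones.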

👨‍🔬 Jacob Fein-Ashley, Paria Rashidinejad

KV-Fold: One-Step KV-Cache Recurrence for Long-Context Inference

Academic Paper ArXiv Importance: 7

Proposes KV-Fold, a training-free method for long-context inference via KV-cache recurrence, reporting 100% retrieval accuracy.

👨‍🔬 Alireza Nadali, Patrick Cooper, Ashutosh Trivedi, Alvaro Velasquez

ToolCUA: Towards Optimal GUI-Tool Path Orchestration for Computer Use Agents

Academic Paper ArXiv Importance: 7

Proposes ToolCUA, which uses staged training to learn optimal paths that mix GUI actions and tool calls, improving accuracy by 66%.

👨‍🔬 Xuhao Hu, Xi Zhang, Haiyang Xu, Kyle Qiao, Jingyi Yang, Xuanjing Huang, Jing Shao, Ming Yan, Jieping Ye

OmniNFT: Modality-wise Omni Diffusion Reinforcement for Joint Audio-Video Generation

Academic Paper ArXiv Importance: 6

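Proposes OmniNFT, a modality-aware RL framework that improves joint audio-video generation by addressing inconsistency across multiple objectives and gradient imbalance.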

👨‍🔬 Guohui Zhang, XiaoXiao Ma, Jie Huang, Hang Xu, Hu Yu, Siming Fu, Yuming Li, Zeyue Xue, Lin Song, Haoyang Huang, Nan Duan, Feng Zhao

The Algorithmic Caricature: Auditing LLM-Generated Political Discourse Across Crisis Events

Academic Paper ArXiv Importance: 6

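Audits LLM-generated political discourse, finding it more negative in sentiment and more structurally uniform than human discourse, but lacking population-level authenticity.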

👨‍🔬 Gunjan, Sidahmed Benabderrahmane, Talal Rahwan

A Causal Language Modeling Detour Improves Encoder Continued Pretraining

Academic Paper ArXiv Importance: 5

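Temporarily switching to causal language modeling during encoder continued pretraining improves downstream task performance, with the largest effect on the lower layers.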

👨‍🔬 Rian Touchent, Eric de la Clergerie

Towards Affordable Energy: A Gymnasium Environment for Electric Utility Demand-Response Programs

Academic Paper ArXiv Importance: 4

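Releases DR-Gym, an environment for training electric-utility demand-response policies with reinforcement learning, simulating extreme electricity prices and building demand.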

👨‍🔬 Jose E. Aguilar Escamilla, Lingdong Zhou, Xiangqi Zhu, Huazheng Wang

Enabling AI-Native Mobility in 6G: A Real-World Dataset for Handover, Beam Management, and Timing Advance

Academic Paper ArXiv Importance: 4

Provides a real-world 6G mobility dataset with handover, beam management, and timing advance measurements to support AI model training.

👨‍🔬 Mannam Veera Narayana, Rohit Singh, Deepa M. R, Radha Krishna Ganti


📊 Today's Trend Summary

MIT Non-AI License

The AI Crackpot Index

Ask HN: Anyone concerned about NYC Local Law 144?

Ask HN: Is the rate of progress in AI exponential?

Why Boring Businesses Outlast AI Hype Cycles

Ask HN: What's the pain using current AI algorithms?

Ask HN: What would you read to learn about 'artificial intelligence'?

NLP, AI, ML, bots – a passing trend or much more?

Show HN: Startup Raising capital through Book Sales

Common Lisp + Machine Learning Internship at Google

The Next Bill Gates or Albert Einstein in AI 'Chris Clark'

Bioinformatician

Beyond GRPO and On-Policy Distillation: An Empirical Sparse-to-Dense Reward Principle for Language-Model Post-Training

AlphaGRPO: Unlocking Self-Reflective Multimodal Generation in UMMs via Decompositional Verifiable Reward

Learning, Fast and Slow: Towards LLMs That Adapt Continually