AI News Daily - 2025/7/12

👨‍🔬 Haochen Wang, Xiangtai Li, Zilong Huang, Anran Wang, Jiacong Wang, Tao Zhang, Jiani Zheng, Sule Bai, Zijian Kang, Jiashi Feng, Zhuochen Wang, Zhaoxiang Zhang

MIRIX: Multi-Agent Memory System for LLM-Based Agents

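Academic paper · ArXiv · Importance: 9

Proposes MIRIX, a multi-agent memory system that improves the memory and reasoning capabilities of LLM agents.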

👨‍🔬 Yu Wang, Xi Chen

PyVision: Agentic Vision with Dynamic Tooling

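Academic paper · ArXiv · Importance: 8

Introduces the PyVision framework, which lets MLLMs autonomously generate and execute Python tools to improve visual reasoning.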

👨‍🔬 Shitian Zhao, Haoquan Zhang, Shaoheng Lin, Ming Li, Qilong Wu, Kaipeng Zhang, Chen Wei

Multi-Granular Spatio-Temporal Token Merging for Training-Free Acceleration of Video LLMs

Academic paper · ArXiv · Importance: 8

Video large language models (LLMs) achieve strong video understanding by leveraging a large number of spatio-temporal tokens, but suffer from quadratic computational scaling with token count. To address this, we propose a training-free spatio-temporal token merging method, named STTM. Our key insight is to exploit local spatial and temporal redundancy in video data, which has been overlooked in prior work. STTM first transforms each frame into multi-granular spatial tokens using a coarse-to-fine search over a quadtree structure, then performs directed pairwise merging across the temporal dimension. This decomposed merging approach outperforms existing token reduction methods across six video QA benchmarks. Notably, STTM achieves a 2× speed-up with only a 0.5% accuracy drop under a 50% token budget, and a 3× speed-up with just a 2% drop under a 30% budget. Moreover, STTM is query-agnostic, allowing KV cache reuse across different questions for the same video. The project page is available at https://www.jshyun.me/projects/sttm.

👨‍🔬 Jeongseok Hyun, Sukjun Hwang, Su Ho Han, Taeoh Kim, Inwoong Lee, Dongyoon Wee, Joon-Young Lee, Seon Joo Kim, Minho Shim
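
The coarse-to-fine quadtree search mentioned in the STTM abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the cosine-similarity-to-block-mean merge criterion, and the `threshold` value are all assumptions chosen for illustration, and the temporal merging step is omitted. A square block of tokens collapses into one averaged token when its tokens are sufficiently similar; otherwise the block is split into four quadrants and each is searched again.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def mean_vec(vectors):
    """Element-wise mean of a list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def quadtree_merge(grid, top, left, size, threshold):
    """Hypothetical coarse-to-fine merge over a square block of a token grid.

    `size` is assumed to be a power of two. Returns a list of
    (top, left, size, token) tuples. A block becomes one averaged token when
    every token in it has cosine similarity >= threshold to the block mean
    (an assumed criterion); otherwise it is split into four quadrants.
    """
    block = [grid[r][c]
             for r in range(top, top + size)
             for c in range(left, left + size)]
    center = mean_vec(block)
    if size == 1 or all(cosine(t, center) >= threshold for t in block):
        return [(top, left, size, center)]
    half = size // 2
    merged = []
    for dr in (0, half):
        for dc in (0, half):
            merged += quadtree_merge(grid, top + dr, left + dc, half, threshold)
    return merged


# Toy 4x4 "frame" of 2-D tokens: three uniform quadrants, one varied quadrant.
a, b = [1.0, 0.0], [0.0, 1.0]
grid = [
    [a, a, b, b],
    [a, a, b, b],
    [a, a, [1.0, 0.0], [0.0, 1.0]],
    [a, a, [-1.0, 0.0], [0.0, -1.0]],
]
merged = quadtree_merge(grid, 0, 0, 4, threshold=0.9)
print(len(merged), "tokens instead of", 16)  # 7 tokens instead of 16
```

In this toy frame the three uniform quadrants each collapse to a single coarse token, while the varied quadrant stays at full resolution, which is the multi-granular behavior the abstract describes.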

EXPO: Stable Reinforcement Learning with Expressive Policies

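Academic paper · ArXiv · Importance: 8

Proposes the EXPO algorithm, which improves the sample efficiency of reinforcement learning through expressive policy optimization.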

👨‍🔬 Perry Dong, Qiyang Li, Dorsa Sadigh, Chelsea Finn

Scaling RL to Long Videos

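Academic paper · ArXiv · Importance: 8

Introduces the LongVILA framework, which extends vision-language models to long-video reasoning and improves efficiency.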

👨‍🔬 Yukang Chen, Wei Huang, Baifeng Shi, Qinghao Hu, Hanrong Ye, Ligeng Zhu, Zhijian Liu, Pavlo Molchanov, Jan Kautz, Xiaojuan Qi, Sifei Liu, Hongxu Yin, Yao Lu, Song Han

Single-pass Adaptive Image Tokenization for Minimum Program Search

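Academic paper · ArXiv · Importance: 7

Proposes KARL, a single-pass adaptive tokenizer that predicts the appropriate number of tokens for an image, improving efficiency.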

👨‍🔬 Shivam Duggal, Sanghyun Byun, William T. Freeman, Antonio Torralba, Phillip Isola

Multigranular Evaluation for Brain Visual Decoding

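Academic paper · ArXiv · Importance: 7

Introduces the BASIC framework for multigranular evaluation of the structural fidelity and semantic alignment of brain visual decoding methods.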

👨‍🔬 Weihao Xia, Cengiz Oztireli

Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling

Academic paper · ArXiv · Importance: 7

Proposes Geometry Forcing, a method that strengthens the 3D consistency of video diffusion models.

👨‍🔬 Haoyu Wu, Diankun Wu, Tianyu He, Junliang Guo, Yang Ye, Yueqi Duan, Jiang Bian

Reinforcement Learning with Action Chunking

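Academic paper · ArXiv · Importance: 7

Proposes Q-chunking, which uses action chunking to make reinforcement learning more efficient on long-horizon, sparse-reward tasks.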

👨‍🔬 Qiyang Li, Zhiyuan Zhou, Sergey Levine

Performance and Practical Considerations of Large and Small Language Models in Clinical Decision Support in Rheumatology

Academic paper · ArXiv · Importance: 6

The evaluation shows that SLMs combined with RAG outperform LLMs in clinical decision support for rheumatology.

👨‍🔬 Sabine Felde, Rüdiger Buchkremer, Gamal Chehab, Christian Thielscher, Jörg HW Distler, Matthias Schneider, Jutta G. Richter

Why is Your Language Model a Poor Implicit Reward Model?

Academic paper · ArXiv · Importance: 6

Investigates the causes of the generalization gap of language models used as implicit reward models.

👨‍🔬 Noam Razin, Yong Lin, Jiarui Yao, Sanjeev Arora

📊 Today's Trend Summary

50% Cheaper GPUs for cloud-computing / Saving devs 50% compared to AWS

Ask HN: Is the rate of progress in AI exponential?

Ask HN: What's the pain using current AI algorithms?

NLP, AI, ML, bots – a passing trend or much more? What's your take on this?

Common Lisp + Machine Learning Internship at Google (Mountain View, CA)

Ask HN: Dipping my toes with artificial intelligence and what to expect? (CS)

The AI Crackpot Index

The Next Bill Gates or Albert Einstein in AI “Chris Clark” – Yourobot

Ask HN: Thoughts on grad school? (CS PhD)

Ask HN: Anyone concerned about NYC Local Law 144?

Bioinformatician

Show HN: Startup Raising capital through Book Sales

neuml/txtai

lukasmasuch/best-of-ml-python

apple/turicreate

DLR-RM/stable-baselines3

NVIDIA/FastPhotoStyle

openai/spinningup

khangich/machine-learning-interview

kjw0612/awesome-deep-vision

lengstrom/fast-style-transfer

karpathy/convnetjs

rushter/MLAlgorithms

srush/GPU-Puzzles

Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology

MIRIX: Multi-Agent Memory System for LLM-Based Agents

PyVision: Agentic Vision with Dynamic Tooling

Multi-Granular Spatio-Temporal Token Merging for Training-Free Acceleration of Video LLMs

EXPO: Stable Reinforcement Learning with Expressive Policies

Scaling RL to Long Videos

Single-pass Adaptive Image Tokenization for Minimum Program Search

Multigranular Evaluation for Brain Visual Decoding

Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling

Reinforcement Learning with Action Chunking

Performance and Practical Considerations of Large and Small Language Models in Clinical Decision Support in Rheumatology

Why is Your Language Model a Poor Implicit Reward Model?
