AI News Daily - 2026/3/10

👨‍🔬 Akshay Gulati, Kanha Singhania, Tushar Banga, Parth Arora, Anshul Verma, Vaibhav Kumar Singh, Agyapal Digra, Jayant Singh Bisht, Danish Sharma, Varun Singla, Shubh Garg

OfficeQA Pro: An Enterprise Benchmark for End-to-End Grounded Reasoning

Academic paper ArXiv Importance: 8

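Introduces the OfficeQA Pro benchmark, which evaluates AI agents on end-to-end reasoning over large, heterogeneous document corpora. Frontier LLMs average only 34.1% accuracy when given direct access to the documents, which suggests enterprise-grade reasoning remains challenging.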

👨‍🔬 Krista Opsahl-Ong, Arnav Singhvi, Jasmine Collins, Ivan Zhou, Cindy Wang, Ashutosh Baheti, Owen Oertell, Jacob Portes, Sam Havens, Erich Elsen, Michael Bendersky, Matei Zaharia, Xing Chen

Split Federated Learning Architectures for High-Accuracy and Low-Delay Model Training

Academic paper ArXiv Importance: 7

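Proposes a joint optimization method for hierarchical federated learning architectures with model splitting. The method accounts for how the choice of split layers and the assignment of clients affect accuracy, delay, and overhead. On public datasets it reports 3% higher accuracy, 20% lower delay, and 50% lower overhead.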

👨‍🔬 Yiannis Papageorgiou, Yannis Thomas, Ramin Khalili, Iordanis Koutsopoulos

CoCo: Code as CoT for Text-to-Image Preview and Rare Concept Generation

Academic paper ArXiv Importance: 7

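Proposes the CoCo framework, which expresses the chain-of-thought (CoT) reasoning process as executable code, renders a deterministic draft image from it, and then refines that draft. It significantly outperforms direct generation and other CoT methods on multiple benchmarks.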

👨‍🔬 Haodong Li, Chunmei Qing, Huanyu Zhang, Dongzhi Jiang, Yihang Zou, Hongbo Peng, Dingming Li, Yuhong Dai, ZePeng Lin, Juanxi Tian, Yi Zhou, Siqi Dai, Jingwei Wu

UNBOX: Unveiling Black-box visual models with Natural-language

Academic paper ArXiv Importance: 7

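Proposes the UNBOX framework, which operates under strict black-box constraints (no access to data, gradients, or backpropagation) and uses LLMs and diffusion models to generate interpretable text descriptors that reveal the concepts a model has implicitly learned and its potential biases.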

👨‍🔬 Simone Carnemolla, Chiara Russo, Simone Palazzo, Quentin Bouniot, Daniela Giordano, Zeynep Akata, Matteo Pennisi, Concetto Spampinato

A Multi-Objective Optimization Approach for Sustainable AI-Driven Entrepreneurship in Resilient Economies

Academic paper ArXiv Importance: 6

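Proposes the EcoAI-Resilience framework, which uses multi-objective optimization to maximize the sustainability benefits of AI deployment while minimizing environmental cost and strengthening economic resilience. Experiments show it significantly outperforms baseline methods.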

👨‍🔬 Anas ALsobeh, Raneem Alkurdi

PostTrainBench: Can LLM Agents Automate LLM Post-Training?

Academic paper ArXiv Importance: 6

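Introduces the PostTrainBench benchmark, which evaluates how well LLM agents can automate LLM post-training under a limited compute budget. Frontier agents make progress but still trail the official instruction-tuned models overall, and failure modes such as reward hacking are observed.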

👨‍🔬 Ben Rank, Hardik Bhatnagar, Ameya Prabhu, Shira Eisenberg, Karina Nguyen, Matthias Bethge, Maksym Andriushchenko

Weakly Supervised Teacher-Student Framework with Progressive Pseudo-mask Refinement for Gland Segmentation

Academic paper ArXiv Importance: 6

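Proposes a weakly supervised teacher-student framework that uses sparse pathologist annotations and a teacher network stabilized by an exponential moving average (EMA) to generate refined pseudo-masks for gland segmentation in colorectal cancer tissue, achieving high accuracy and good generalization.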

👨‍🔬 Hikmat Khan, Wei Chen, Muhammad Khalid Khan Niazi
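
For readers unfamiliar with the EMA-stabilized teacher mentioned in the summary, the generic update rule can be sketched as follows. This is a minimal illustration of the standard exponential-moving-average technique, not the authors' implementation: the function name `ema_update`, the flat list-of-floats weight representation, and the default `decay=0.99` are all assumptions made for the example.

```python
def ema_update(teacher, student, decay=0.99):
    """Blend student weights into teacher weights with an exponential moving average.

    Hypothetical helper for illustration: weights are plain lists of floats,
    and decay=0.99 is an assumed default, not a value from the paper.
    """
    if len(teacher) != len(student):
        raise ValueError("teacher and student must have the same number of weights")
    if not 0.0 <= decay <= 1.0:
        raise ValueError("decay must lie in [0, 1]")
    # Each teacher weight moves a fraction (1 - decay) of the way toward the student.
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]


# Toy usage: after one step the teacher has moved 10% of the way to the student.
teacher_weights = [0.0, 1.0]
student_weights = [1.0, 0.0]
teacher_weights = ema_update(teacher_weights, student_weights, decay=0.9)
```

A decay close to 1 makes the teacher change slowly, which is why an EMA teacher is commonly used to smooth out noisy student updates before its predictions are reused as pseudo-labels.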

Benchmarking Language Modeling for Lossless Compression of Full-Fidelity Audio

Academic paper ArXiv Importance: 5

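Benchmarks language models (LMs) for lossless audio compression and proposes Trilobyte, a byte-level tokenization scheme that makes LM-based lossless compression of 24-bit audio tractable for the first time. On 8-bit and 16-bit audio, the approach outperforms FLAC.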

👨‍🔬 Phillip Long, Zachary Novack, Chris Donahue
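
To make the idea of byte-level tokenization concrete: a vocabulary with one token per 24-bit sample value would need 2^24 = 16,777,216 entries, whereas splitting each sample into bytes needs only 256. The sketch below shows one generic way to do that split losslessly. It is not the Trilobyte scheme itself, whose details the summary does not give; the function names and the most-significant-byte-first ordering are assumptions for illustration.

```python
def sample_to_bytes(sample):
    """Split one signed 24-bit sample into three byte tokens (0..255 each).

    Hypothetical helper: most-significant byte first is an assumed ordering.
    """
    if not -(1 << 23) <= sample < (1 << 23):
        raise ValueError("sample is outside the signed 24-bit range")
    # Masking maps negative values to their two's-complement 24-bit pattern.
    return list((sample & 0xFFFFFF).to_bytes(3, "big"))


def bytes_to_sample(tokens):
    """Invert sample_to_bytes, restoring the signed 24-bit sample exactly."""
    if len(tokens) != 3:
        raise ValueError("expected exactly three byte tokens")
    value = int.from_bytes(bytes(tokens), "big")
    # Values with the top bit set represent negative samples.
    return value - (1 << 24) if value >= (1 << 23) else value


# Round trip over a few samples, including both ends of the 24-bit range.
samples = [0, 1, -1, 12345, -(1 << 23), (1 << 23) - 1]
tokens = [sample_to_bytes(s) for s in samples]
restored = [bytes_to_sample(t) for t in tokens]
```

The trade-off is sequence length: each sample becomes three tokens instead of one, in exchange for a vocabulary small enough for a language model to predict over.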

A New Lower Bound for the Random Offerer Mechanism in Bilateral Trade using AI-Guided Evolutionary Search

Academic paper ArXiv Importance: 5

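Uses AlphaEvolve, an AI-guided evolutionary search framework, to discover new worst-case instances in bilateral trade, raising the lower bound on the approximation ratio of the Random Offerer mechanism relative to optimal efficiency to 2.0749.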

👨‍🔬 Yang Cai, Vineet Gupta, Zun Li, Aranyak Mehta
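
As background for the ratio quoted in the summary, the sketch below evaluates a small discrete bilateral-trade instance under a textbook reading of the Random Offerer mechanism: with probability 1/2 the seller makes a take-it-or-leave-it price offer, otherwise the buyer does, and each offerer picks the price that maximizes their own expected utility. Several things here are assumptions rather than details from the paper: the benchmark is taken to be first-best expected gains from trade, ties between equally good prices are broken by list order, offers are restricted to the other side's support, and the toy instance is invented for illustration. It does not reproduce the 2.0749 instance.

```python
from fractions import Fraction


def prob_at_least(dist, x):
    return sum(p for v, p in dist if v >= x)


def prob_at_most(dist, x):
    return sum(p for v, p in dist if v <= x)


def first_best_gft(seller, buyer):
    """Expected gains from trade when every beneficial trade happens."""
    return sum(ps * pb * (b - s) for s, ps in seller for b, pb in buyer if b > s)


def seller_offer_gft(seller, buyer):
    """Expected gains from trade when the seller posts her utility-maximizing price."""
    total = Fraction(0)
    for s, ps in seller:
        prices = [b for b, _ in buyer if b > s]  # only profitable offers
        if not prices:
            continue  # no profitable price exists, so no trade
        price = max(prices, key=lambda p: (p - s) * prob_at_least(buyer, p))
        total += ps * sum(pb * (b - s) for b, pb in buyer if b >= price)
    return total


def buyer_offer_gft(seller, buyer):
    """Expected gains from trade when the buyer posts his utility-maximizing price."""
    total = Fraction(0)
    for b, pb in buyer:
        prices = [s for s, _ in seller if s < b]  # only profitable offers
        if not prices:
            continue
        price = max(prices, key=lambda q: (b - q) * prob_at_most(seller, q))
        total += pb * sum(ps * (b - s) for s, ps in seller if s <= price)
    return total


def random_offerer_ratio(seller, buyer):
    """First-best gains from trade divided by Random Offerer gains from trade."""
    mechanism = (seller_offer_gft(seller, buyer) + buyer_offer_gft(seller, buyer)) / 2
    return first_best_gft(seller, buyer) / mechanism


# Invented toy instance: each distribution is a list of (value, probability) pairs.
seller_dist = [(0, Fraction(1))]
buyer_dist = [(1, Fraction(1, 2)), (4, Fraction(1, 2))]
ratio = random_offerer_ratio(seller_dist, buyer_dist)  # Fraction(10, 9)
```

In this toy instance the seller prefers the high price 4 (expected utility 2) over the safe price 1 (expected utility 1), so trade with the low-value buyer is lost whenever the seller is the offerer. A search for worst-case instances looks for distributions that make this kind of loss as large as possible.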

📊 Today's Trend Summary

Why Boring Businesses Outlast AI Hype Cycles

Ask HN: What's the pain using current AI algorithms?

Ask HN: Anyone concerned about NYC Local Law 144?

NLP, AI, ML, bots – a passing trend or much more? What's your take on this?

Ask HN: Is the rate of progress in AI exponential?

MIT Non-AI License

The AI Crackpot Index

Ask HN: What would you read to learn about "artificial intelligence"?

The Next Bill Gates or Albert Einstein in AI “Chris Clark” – Yourobot

Common Lisp + Machine Learning Internship at Google (Mountain View, CA)

Show HN: Startup Raising capital through Book Sales

Bioinformatician

Agentic Critical Training

Scale Space Diffusion

Evaluating Financial Intelligence in Large Language Models: Benchmarking SuperInvesting AI with LLM Engines

OfficeQA Pro: An Enterprise Benchmark for End-to-End Grounded Reasoning

Split Federated Learning Architectures for High-Accuracy and Low-Delay Model Training

CoCo: Code as CoT for Text-to-Image Preview and Rare Concept Generation

UNBOX: Unveiling Black-box visual models with Natural-language

A Multi-Objective Optimization Approach for Sustainable AI-Driven Entrepreneurship in Resilient Economies

PostTrainBench: Can LLM Agents Automate LLM Post-Training?

Weakly Supervised Teacher-Student Framework with Progressive Pseudo-mask Refinement for Gland Segmentation

Benchmarking Language Modeling for Lossless Compression of Full-Fidelity Audio

A New Lower Bound for the Random Offerer Mechanism in Bilateral Trade using AI-Guided Evolutionary Search
