Industry News
Hacker News
Importance: 9
A discussion of current pain points in using AI algorithms.
Industry News
Hacker News
Importance: 8
A discussion of whether the pace of AI progress is exponential.
Industry News
Hacker News
Importance: 7
A question about concerns over New York City Local Law 144.
Industry News
Hacker News
Importance: 7
A discussion of whether NLP, AI, ML, and robotics are a passing trend or a deeper transformation.
Industry News
Hacker News
Importance: 6
Google offers internships in Common Lisp and machine learning.
Industry News
Hacker News
Importance: 6
A GPU cloud computing offering billed as 50% cheaper than AWS.
Industry News
Hacker News
Importance: 5
An introduction to Chris Clark, a potential leading figure in AI.
Industry News
Hacker News
Importance: 5
Career opportunities for bioinformaticians.
Industry News
Hacker News
Importance: 4
A beginner asks what to expect when getting into AI.
Industry News
Hacker News
Importance: 4
A discussion of views on graduate education in computer science.
Industry News
Hacker News
Importance: 3
An index of non-mainstream views in the AI field.
Industry News
Hacker News
Importance: 2
A startup raises funds through book sales.
Open-Source Projects
GitHub
Importance: 9
VITS: a conditional variational autoencoder with adversarial learning for end-to-end text-to-speech.
⭐ 7587 stars
Open-Source Projects
GitHub
Importance: 8
DoWhy is a Python library for causal inference that supports explicit modeling and testing of causal assumptions.
⭐ 7634 stars
Open-Source Projects
GitHub
Importance: 8
BertViz: visualize attention in NLP models (BERT, GPT2, BART, etc.).
⭐ 7581 stars
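Open-Source Projects
GitHub
Importance: 8
High-quality single-file implementations of deep reinforcement learning algorithms with research-friendly features (PPO, DQN, C51, etc.).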
⭐ 7577 stars
Open-Source Projects
GitHub
Importance: 7
This repository accompanies the book "Grokking Deep Learning".
⭐ 7619 stars
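Open-Source Projects
GitHub
Importance: 7
BoxMOT: pluggable state-of-the-art multi-object tracking modules for segmentation, object detection, and pose estimation models.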
⭐ 7579 stars
Open-Source Projects
GitHub
Importance: 7
A Stanford NLP Python library for tokenization, sentence segmentation, NER, and parsing of many human languages.
⭐ 7551 stars
Open-Source Projects
GitHub
Importance: 6
This is the code for "Learn Machine Learning in 3 Months" by Siraj Raval on YouTube.
⭐ 7633 stars
Open-Source Projects
GitHub
Importance: 6
Pocket Flow: a 100-line LLM framework. Let agents build agents!
⭐ 7561 stars
Open-Source Projects
GitHub
Importance: 5
A detailed, tailored resource guide for undergraduates or anyone who wants to go deep into the field of AI.
⭐ 7551 stars
Academic Papers
ArXiv
Importance: 9
Building a general goal-oriented agent via a simulative reasoning architecture with an LLM-based world model.
👨🔬 Mingkai Deng, Jinyu Hou, Yilin Shen, Hongxia Jin, Graham Neubig, Zhiting Hu, Eric Xing
Academic Papers
ArXiv
Importance: 9
LLMs have demonstrated strong mathematical reasoning abilities by leveraging
reinforcement learning with long chain-of-thought, yet they continue to
struggle with theorem proving due to the lack of clear supervision signals when
solely using natural language. Dedicated domain-specific languages like Lean
provide clear supervision via formal verification of proofs, enabling effective
training through reinforcement learning. In this work, we propose
Seed-Prover, a lemma-style whole-proof reasoning model. Seed-Prover
can iteratively refine its proof based on Lean feedback, proved lemmas, and
self-summarization. To solve IMO-level contest problems, we design three
test-time inference strategies that enable both deep and broad reasoning.
Seed-Prover proves 78.1% of formalized past IMO problems, saturates MiniF2F,
and achieves over 50% on PutnamBench, outperforming the previous
state-of-the-art by a large margin. To address the lack of geometry support in
Lean, we introduce a geometry reasoning engine, Seed-Geometry, which
outperforms previous formal geometry engines. We use these two systems to
participate in IMO 2025 and fully prove 5 out of 6 problems. This work
represents a significant advancement in automated mathematical reasoning,
demonstrating the effectiveness of formal verification with long
chain-of-thought reasoning.
👨🔬 Luoxin Chen, Jinming Gu, Liankai Huang, Wenhao Huang, Zhicheng Jiang, Allan Jie, Xiaoran Jin, Xing Jin, Chenggang Li, Kaijing Ma, Cheng Ren, Jiawei Shen, Wenlei Shi, Tong Sun, He Sun, Jiahui Wang, Siran Wang, Zhihong Wang, Chenrui Wei, Shufa Wei, Yonghui Wu, Yuchen Wu, Yihang Xia, Huajian Xin, Fan Yang, Huaiyuan Ying, Hongyi Yuan, Zheng Yuan, Tianyang Zhan, Chi Zhang, Yue Zhang, Ge Zhang, Tianyun Zhao, Jianqiu Zhao, Yichi Zhou, Thomas Hanwen Zhu
Academic Papers
ArXiv
Importance: 8
Evaluating the generalization of concept bottleneck models via synthetic attribute substitutions.
👨🔬 Jessica Bader, Leander Girrbach, Stephan Alaniz, Zeynep Akata
Academic Papers
ArXiv
Importance: 8
Achieving robust cognitive autonomy in robots navigating complex,
unpredictable environments remains a fundamental challenge in robotics. This
paper presents Underwater Robot Self-Organizing Autonomy (UROSA), a
groundbreaking architecture leveraging distributed Large Language Model AI
agents integrated within the Robot Operating System 2 (ROS 2) framework to
enable advanced cognitive capabilities in Autonomous Underwater Vehicles. UROSA
decentralises cognition into specialised AI agents responsible for multimodal
perception, adaptive reasoning, dynamic mission planning, and real-time
decision-making. Central innovations include flexible agents dynamically
adapting their roles, retrieval-augmented generation utilising vector databases
for efficient knowledge management, reinforcement learning-driven behavioural
optimisation, and autonomous on-the-fly ROS 2 node generation for runtime
functional extensibility. Extensive empirical validation demonstrates UROSA's
promising adaptability and reliability through realistic underwater missions in
simulation and real-world deployments, showing significant advantages over
traditional rule-based architectures in handling unforeseen scenarios,
environmental uncertainties, and novel mission objectives. This work not only
advances underwater autonomy but also establishes a scalable, safe, and
versatile cognitive robotics framework capable of generalising to a diverse
array of real-world applications.
👨🔬 Markus Buchholz, Ignacio Carlucho, Michele Grimaldi, Yvan R. Petillot
Academic Papers
ArXiv
Importance: 7
Improving the perception of GUI grounding models in computer-use agents.
👨🔬 Miaosen Zhang, Ziqiang Xu, Jialiang Zhu, Qi Dai, Kai Qiu, Yifan Yang, Chong Luo, Tianyi Chen, Justin Wagle, Tim Franklin, Baining Guo
Academic Papers
ArXiv
Importance: 7
We propose CoT-Self-Instruct, a synthetic data generation method that
instructs LLMs to first reason and plan via Chain-of-Thought (CoT) based on the
given seed tasks, and then to generate a new synthetic prompt of similar
quality and complexity for use in LLM training, followed by filtering for
high-quality data with automatic metrics. In verifiable reasoning, our
synthetic data significantly outperforms existing training datasets, such as
s1k and OpenMathReasoning, across MATH500, AMC23, AIME24 and GPQA-Diamond. For
non-verifiable instruction-following tasks, our method surpasses the
performance of human or standard self-instruct prompts on both AlpacaEval 2.0
and Arena-Hard.
👨🔬 Ping Yu, Jack Lanchantin, Tianlu Wang, Weizhe Yuan, Olga Golovneva, Ilia Kulikov, Sainbayar Sukhbaatar, Jason Weston, Jing Xu
Academic Papers
ArXiv
Importance: 7
While Reinforcement Learning (RL) has achieved remarkable success in language
modeling, its triumph hasn't yet fully translated to visuomotor agents. A
primary challenge in RL models is their tendency to overfit specific tasks or
environments, thereby hindering the acquisition of generalizable behaviors
across diverse settings. This paper provides a preliminary answer to this
challenge by demonstrating that RL-finetuned visuomotor agents in Minecraft can
achieve zero-shot generalization to unseen worlds. Specifically, we explore
RL's potential to enhance generalizable spatial reasoning and interaction
capabilities in 3D worlds. To address challenges in multi-task RL
representation, we analyze and establish cross-view goal specification as a
unified multi-task goal space for visuomotor policies. Furthermore, to overcome
the significant bottleneck of manual task design, we propose automated task
synthesis within the highly customizable Minecraft environment for large-scale
multi-task RL training, and we construct an efficient distributed RL framework
to support this. Experimental results show RL significantly boosts interaction
success rates by 4× and enables zero-shot generalization of spatial
reasoning across diverse environments, including real-world settings. Our
findings underscore the immense potential of RL training in 3D simulated
environments, especially those amenable to large-scale task generation, for
significantly advancing visuomotor agents' spatial reasoning.
👨🔬 Shaofei Cai, Zhancun Mu, Haiwen Xia, Bowei Zhang, Anji Liu, Yitao Liang
Academic Papers
ArXiv
Importance: 6
The widespread availability of off-the-shelf machine learning models poses a
challenge: which model, of the many available candidates, should be chosen for
a given data analysis task? This question of model selection is traditionally
answered by collecting and annotating a validation dataset -- a costly and
time-intensive process. We propose a method for active model selection, using
predictions from candidate models to prioritize the labeling of test data
points that efficiently differentiate the best candidate. Our method, CODA,
performs consensus-driven active model selection by modeling relationships
between classifiers, categories, and data points within a probabilistic
framework. The framework uses the consensus and disagreement between models in
the candidate pool to guide the label acquisition process, and Bayesian
inference to update beliefs about which model is best as more information is
collected. We validate our approach by curating a collection of 26 benchmark
tasks capturing a range of model selection scenarios. CODA outperforms existing
methods for active model selection significantly, reducing the annotation
effort required to discover the best model by upwards of 70% compared to the
previous state-of-the-art. Code and data are available at
https://github.com/justinkay/coda.
👨🔬 Justin Kay, Grant Van Horn, Subhransu Maji, Daniel Sheldon, Sara Beery
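The abstract above describes using consensus and disagreement among candidate models to decide which points to label. A minimal sketch of that general idea follows. It is not CODA itself: it swaps the paper's probabilistic model and Bayesian updates for a plain majority-vote disagreement score, and the function names (`disagreement`, `pick_next_to_label`, `accuracy_on_labeled`) are hypothetical.

```python
from collections import Counter


def disagreement(preds_for_point):
    """Fraction of models that deviate from the majority vote on one point."""
    counts = Counter(preds_for_point)
    majority_size = counts.most_common(1)[0][1]
    return 1.0 - majority_size / len(preds_for_point)


def pick_next_to_label(predictions, labeled):
    """Return the index of the unlabeled point the models disagree on most.

    predictions: one list of predicted labels per model, all the same length.
    labeled: set of point indices that already have a ground-truth label.
    """
    n_points = len(predictions[0])
    candidates = [i for i in range(n_points) if i not in labeled]
    return max(candidates, key=lambda i: disagreement([m[i] for m in predictions]))


def accuracy_on_labeled(predictions, labels):
    """Per-model accuracy on the labeled points (labels: index -> true label)."""
    return [
        sum(m[i] == y for i, y in labels.items()) / len(labels)
        for m in predictions
    ]


if __name__ == "__main__":
    # Three hypothetical models, three data points.
    preds = [[0, 1, 1], [0, 1, 0], [0, 1, 2]]
    nxt = pick_next_to_label(preds, labeled=set())
    print("label point", nxt, "first")
```

Points on which every model agrees cannot separate the candidates, so the sketch spends the labeling budget where predictions differ; a full method would also weigh how informative each label is about which model is best.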
Academic Papers
ArXiv
Importance: 6
High-fidelity 3D video reconstruction is essential for enabling real-time
rendering of dynamic scenes with realistic motion in virtual and augmented
reality (VR/AR). The deformation field paradigm of 3D Gaussian splatting has
achieved near-photorealistic results in video reconstruction due to the great
representation capability of deep deformation networks. However, in videos with
complex motion and significant scale variations, deformation networks often
overfit to irregular Gaussian trajectories, leading to suboptimal visual
quality. Moreover, the gradient-based densification strategy designed for
static scene reconstruction proves inadequate to address the absence of dynamic
content. In light of these challenges, we propose a flow-empowered velocity
field modeling scheme tailored for Gaussian video reconstruction, dubbed
FlowGaussian-VR. It consists of two core components: a velocity field rendering
(VFR) pipeline which enables optical flow-based optimization, and a
flow-assisted adaptive densification (FAD) strategy that adjusts the number and
size of Gaussians in dynamic regions. We validate our model's effectiveness on
multi-view dynamic reconstruction and novel view synthesis with multiple
real-world datasets containing challenging motion scenarios, demonstrating not
only notable visual improvements (over 2.5 dB gain in PSNR) and less blurry
artifacts in dynamic textures, but also regularized and trackable per-Gaussian
trajectories.
👨🔬 Zhenyang Li, Xiaoyang Bai, Tongchen Zhang, Pengfei Shen, Weiwei Xu, Yifan Peng
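As a reading aid for the PSNR gain reported above, here is a short worked relation. It assumes the standard PSNR definition, which the abstract does not spell out; $I_{\max}$ (peak pixel value) and $\mathrm{MSE}$ (mean squared error) are the usual symbols, not names taken from the paper.

```latex
\mathrm{PSNR} = 10 \log_{10}\!\left(\frac{I_{\max}^{2}}{\mathrm{MSE}}\right)
\quad\Longrightarrow\quad
\Delta\mathrm{PSNR} = 10 \log_{10}\!\left(\frac{\mathrm{MSE}_{\text{old}}}{\mathrm{MSE}_{\text{new}}}\right),
\qquad
\Delta\mathrm{PSNR} = 2.5\ \text{dB}
\;\Rightarrow\;
\frac{\mathrm{MSE}_{\text{old}}}{\mathrm{MSE}_{\text{new}}} = 10^{0.25} \approx 1.78
```

Under that definition, a 2.5 dB gain corresponds to a mean squared error roughly 44% lower than the baseline.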
Academic Papers
ArXiv
Importance: 5
Knowledge graphs (KGs) often contain sufficient information to support the
inference of new facts. Identifying logical rules not only improves the
completeness of a knowledge graph but also enables the detection of potential
errors, reveals subtle data patterns, and enhances the overall capacity for
reasoning and interpretation. However, the complexity of such rules, combined
with the unique labeling conventions of each KG, can make them difficult for
humans to understand. In this paper, we explore the potential of large language
models to generate natural language explanations for logical rules.
Specifically, we extract logical rules using the AMIE 3.5.1 rule discovery
algorithm from the benchmark dataset FB15k-237 and two large-scale datasets,
FB-CVT-REV and FB+CVT-REV. We examine various prompting strategies, including
zero- and few-shot prompting, the inclusion of variable entity types, and
chain-of-thought reasoning. We conduct a comprehensive human evaluation of the
generated explanations based on correctness, clarity, and hallucination, and
also assess the use of large language models as automatic judges. Our results
demonstrate promising performance in terms of explanation correctness and
clarity, although several challenges remain for future research. All scripts
and data used in this study are publicly available at
https://github.com/idirlab/KGRule2NL.
👨🔬 Nasim Shirvani-Mahdavi, Devin Wingfield, Amin Ghasemi, Chengkai Li
Academic Papers
ArXiv
Importance: 5
Evaluating AI agents within complex, interactive environments that mirror
real-world challenges is critical for understanding their practical
capabilities. While existing agent benchmarks effectively assess skills like
tool use or performance on structured tasks, they often do not fully capture an
agent's ability to operate autonomously in exploratory environments that demand
sustained, self-directed reasoning over a long and growing context. To spur the
development of agents capable of more robust intrinsic reasoning over long
horizons, we introduce TextQuests, a benchmark based on the Infocom suite of
interactive fiction games. These text-based adventures, which can take human
players over 30 hours and require hundreds of precise actions to solve, serve
as an effective proxy for evaluating AI agents on focused, stateful tasks. The
benchmark is specifically designed to assess an LLM agent's capacity for
self-contained problem-solving by precluding the use of external tools, thereby
focusing on intrinsic long-context reasoning capabilities in an exploratory
environment characterized by the need for trial-and-error learning and
sustained problem-solving within a single interactive session. We release
TextQuests at https://textquests.ai.
👨🔬 Long Phan, Mantas Mazeika, Andy Zou, Dan Hendrycks
Academic Papers
ArXiv
Importance: 4
We provide a comprehensive examination of agent-based approaches that codify
the principles and linkages underlying multi-agent systems, simulations, and
information systems. Based on two decades of study, this paper confirms a
framework intended as a formal specification for geosimulation platforms. Our
findings show that large language models (LLMs) can be effectively incorporated
as agent components if they follow a structured architecture specific to
fundamental agent activities such as perception, memory, planning, and action.
This integration is precisely consistent with the architecture that we
formalize, providing a solid platform for next-generation geosimulation
systems.
👨🔬 Virginia Padilla, Jacinto Dávila