Industry News
Hacker News
Importance: 8
Asks whether the pace of AI progress is exponential.
Industry News
Hacker News
Importance: 7
Discusses pain points in using current AI algorithms.
Industry News
Hacker News
Importance: 7
Debates whether NLP, AI, ML, and robotics are a passing fad or a deeper transformation.
Industry News
Hacker News
Importance: 6
Google is offering internships in Common Lisp and machine learning.
Industry News
Hacker News
Importance: 6
Newcomers discuss what to expect when entering the AI field.
Industry News
Hacker News
Importance: 6
Offers GPU cloud computing 50% cheaper than AWS.
Industry News
Hacker News
Importance: 5
The AI Crackpot Index: a look at fringe viewpoints in AI.
Industry News
Hacker News
Importance: 5
Asks who will be the next Bill Gates or Einstein of AI.
Industry News
Hacker News
Importance: 5
Discusses views on graduate education in computer science.
Industry News
Hacker News
Importance: 4
Discusses concerns about New York City Local Law 144.
Industry News
Hacker News
Importance: 4
Career opportunities for bioinformaticians.
Industry News
Hacker News
Importance: 3
A startup raising funding through book sales.
Open Source Project
GitHub
Importance: 10
20+ high-performance LLMs with recipes for pretraining, fine-tuning, and deployment
⭐ 12452 stars
Open Source Project
GitHub
Importance: 9
An open-source simulator for autonomous driving research
⭐ 12693 stars
Open Source Project
GitHub
Importance: 9
A collection of CVPR papers, code, and write-ups
⭐ 12500 stars
Open Source Project
GitHub
Importance: 8
A free MLOps course from DataTalks.Club
⭐ 12966 stars
Open Source Project
GitHub
Importance: 8
An offline speech recognition API for Android, iOS, and other platforms
⭐ 12636 stars
Open Source Project
GitHub
Importance: 7
Scalable embedding, reasoning, and ranking of images and sentences with CLIP
⭐ 12697 stars
Open Source Project
GitHub
Importance: 7
Software that turns photos into paintings, horses into zebras, and more
⭐ 12690 stars
Open Source Project
GitHub
Importance: 7
PyTorch tutorials and fun projects, including neural dialogue and style transfer
⭐ 12536 stars
Open Source Project
GitHub
Importance: 7
A machine learning server for developers and ML engineers
⭐ 12528 stars
Open Source Project
GitHub
Importance: 7
A modern library for 3D data processing
⭐ 12521 stars
Open Source Project
GitHub
Importance: 6
A tensor library for machine learning
⭐ 12793 stars
Open Source Project
GitHub
Importance: 6
Curated resources on deep learning, reinforcement learning, and more
⭐ 12607 stars
Academic Paper
ArXiv
Importance: 9
Proposes MemoryAgentBench, a benchmark for evaluating the memory capabilities of LLM agents across four core competencies.
👨🔬 Yuanzhe Hu, Yu Wang, Julian McAuley
Academic Paper
ArXiv
Importance: 9
The rapid advancements of AI agents have ignited the long-held ambition of
leveraging them to accelerate scientific discovery. Achieving this goal
requires a deep understanding of the frontiers of human knowledge. As such,
Humanity's Last Exam (HLE) provides an exceptionally challenging touchstone for
evaluating scientific AI agents. In this work, we aim to construct the
foundational architecture for general-purpose agents and validate the
capabilities through leading performance on HLE. To achieve this, we introduce
X-Master, a tool-augmented reasoning agent designed to emulate human
researchers by interacting flexibly with external tools during its reasoning
process. This agent, guided by the conceptualization of code as an interaction
language, can flexibly leverage built-in Python libraries and our customized
tools to augment the reasoning. We further scale its capabilities through
X-Masters, a scattered-and-stacked agentic workflow that systematically
enhances breadth and depth of reasoning. Our open-source solution, X-Masters,
sets a new state-of-the-art record on HLE with a score of 32.1%, surpassing
OpenAI's and Google's Deep Research (26.6% and 26.9%) and becoming the first to
exceed the 30% threshold. This work allows us to gain a deeper understanding of
complex task-solving and accumulates valuable experience that can inform future
advancements, guiding subsequent model training.
👨🔬 Jingyi Chai, Shuo Tang, Rui Ye, Yuwen Du, Xinyu Zhu, Mengcheng Zhou, Yanfeng Wang, Weinan E, Siheng Chen
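The "code as an interaction language" idea above means the agent's tool calls are ordinary Python snippets executed during reasoning, with the printed results fed back into the agent's context. A toy sketch of one such turn with a stubbed model output (all names here are illustrative; X-Masters' real harness is more elaborate):

```python
import contextlib
import io


def execute_snippet(code, namespace):
    """Run a model-emitted Python snippet and capture its stdout so the
    printed result can be appended back into the reasoning context."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(code, namespace)
    return buffer.getvalue().strip()


# Stub of one reasoning turn: instead of a structured tool call, the
# "model" emits code that uses a built-in library to compute something.
namespace = {}
observation = execute_snippet("import math\nprint(math.isqrt(10**12))", namespace)
# observation is "1000000"; the agent would append it to its context
# and continue reasoning with the result in hand.
```

Sharing `namespace` across turns is what lets later snippets build on variables defined by earlier ones, which is the "flexible interaction" the abstract describes.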
Academic Paper
ArXiv
Importance: 8
A systematic study of joint motion prediction methods, evaluating prediction accuracy, multimodality, and inference efficiency.
👨🔬 Fabian Konstantinidis, Ariel Dallari Guerreiro, Raphael Trumpp, Moritz Sackmann, Ulrich Hofmann, Marco Caccamo, Christoph Stiller
Academic Paper
ArXiv
Importance: 8
Introduces two action-space reduction strategies that improve training efficiency and policy performance for reinforcement learning in autonomous driving.
👨🔬 Elahe Delavari, Feeza Khan Khanzada, Jaerock Kwon
Academic Paper
ArXiv
Importance: 8
Unified segmentation of 3D point clouds is crucial for scene understanding,
but is hindered by its sparse structure, limited annotations, and the challenge
of distinguishing fine-grained object classes in complex environments. Existing
methods often struggle to capture rich semantic and contextual information due
to limited supervision and a lack of diverse multimodal cues, leading to
suboptimal differentiation of classes and instances. To address these
challenges, we propose VDG-Uni3DSeg, a novel framework that integrates
pre-trained vision-language models (e.g., CLIP) and large language models
(LLMs) to enhance 3D segmentation. By leveraging LLM-generated textual
descriptions and reference images from the internet, our method incorporates
rich multimodal cues, facilitating fine-grained class and instance separation.
We further design a Semantic-Visual Contrastive Loss to align point features
with multimodal queries and a Spatial Enhanced Module to model scene-wide
relationships efficiently. Operating within a closed-set paradigm that utilizes
multimodal knowledge generated offline, VDG-Uni3DSeg achieves state-of-the-art
results in semantic, instance, and panoptic segmentation, offering a scalable
and practical solution for 3D understanding. Our code is available at
https://github.com/Hanzy1996/VDG-Uni3DSeg.
👨🔬 Zongyan Han, Mohamed El Amine Boudjoghra, Jiahua Dong, Jinhong Wang, Rao Muhammad Anwer
Academic Paper
ArXiv
Importance: 8
Artificial intelligence (AI) has significant potential in healthcare
applications, but its training and deployment face challenges due to
healthcare's diverse data, complex tasks, and the need to preserve privacy.
Foundation models that perform well on medical tasks and require less
task-specific tuning data are critical to accelerate the development of
healthcare AI applications. We introduce MedGemma, a collection of medical
vision-language foundation models based on Gemma 3 4B and 27B. MedGemma
demonstrates advanced medical understanding and reasoning on images and text,
significantly exceeding the performance of similar-sized generative models and
approaching the performance of task-specific models, while maintaining the
general capabilities of the Gemma 3 base models. For out-of-distribution tasks,
MedGemma achieves 2.6-10% improvement on medical multimodal question answering,
15.5-18.1% improvement on chest X-ray finding classification, and 10.8%
improvement on agentic evaluations compared to the base models. Fine-tuning
MedGemma further improves performance in subdomains, reducing errors in
electronic health record information retrieval by 50% and reaching comparable
performance to existing specialized state-of-the-art methods for pneumothorax
classification and histopathology patch classification. We additionally
introduce MedSigLIP, a medically-tuned vision encoder derived from SigLIP.
MedSigLIP powers the visual understanding capabilities of MedGemma and as an
encoder achieves comparable or better performance than specialized medical
image encoders. Taken together, the MedGemma collection provides a strong
foundation of medical image and text capabilities, with potential to
significantly accelerate medical research and development of downstream
applications. The MedGemma collection, including tutorials and model weights,
can be found at https://goo.gle/medgemma.
👨🔬 Andrew Sellergren, Sahar Kazemzadeh, Tiam Jaroensri, Atilla Kiraly, Madeleine Traverse, Timo Kohlberger, Shawn Xu, Fayaz Jamil, Cían Hughes, Charles Lau, Justin Chen, Fereshteh Mahvar, Liron Yatziv, Tiffany Chen, Bram Sterling, Stefanie Anna Baby, Susanna Maria Baby, Jeremy Lai, Samuel Schmidgall, Lu Yang, Kejia Chen, Per Bjornsson, Shashir Reddy, Ryan Brush, Kenneth Philbrick, Howard Hu, Howard Yang, Richa Tiwari, Sunny Jansen, Preeti Singh, Yun Liu, Shekoofeh Azizi, Aishwarya Kamath, Johan Ferret, Shreya Pathak, Nino Vieillard, Ramona Merhej, Sarah Perrin, Tatiana Matejovicova, Alexandre Ramé, Morgane Riviere, Louis Rouillard, Thomas Mesnard, Geoffrey Cideron, Jean-bastien Grill, Sabela Ramos, Edouard Yvinec, Michelle Casbon, Elena Buchatskaya, Jean-Baptiste Alayrac, Dmitry Lepikhin, Vlad Feinberg, Sebastian Borgeaud, Alek Andreev, Cassidy Hardin, Robert Dadashi, Léonard Hussenot, Armand Joulin, Olivier Bachem, Yossi Matias, Katherine Chou, Avinatan Hassidim, Kavi Goel, Clement Farabet, Joelle Barral, Tris Warkentin, Jonathon Shlens, David Fleet, Victor Cotruta, Omar Sanseviero, Gus Martins, Phoebe Kirk, Anand Rao, Shravya Shetty, David F. Steiner, Can Kirmizibayrak, Rory Pilgrim, Daniel Golden, Lin Yang
Academic Paper
ArXiv
Importance: 8
The rapid advancement of Embodied AI has led to an increasing demand for
large-scale, high-quality real-world data. However, collecting such embodied
data remains costly and inefficient. As a result, simulation environments have
become a crucial surrogate for training robot policies. Yet, the significant
Real2Sim2Real gap remains a critical bottleneck, particularly in terms of
physical dynamics and visual appearance. To address this challenge, we propose
EmbodieDreamer, a novel framework that reduces the Real2Sim2Real gap from both
the physics and appearance perspectives. Specifically, we propose PhysAligner,
a differentiable physics module designed to reduce the Real2Sim physical gap.
It jointly optimizes robot-specific parameters such as control gains and
friction coefficients to better align simulated dynamics with real-world
observations. In addition, we introduce VisAligner, which incorporates a
conditional video diffusion model to bridge the Sim2Real appearance gap by
translating low-fidelity simulated renderings into photorealistic videos
conditioned on simulation states, enabling high-fidelity visual transfer.
Extensive experiments validate the effectiveness of EmbodieDreamer. The
proposed PhysAligner reduces physical parameter estimation error by 3.74%
compared to simulated annealing methods while improving optimization speed by
89.91%. Moreover, training robot policies in the generated photorealistic
environment leads to a 29.17% improvement in the average task success rate
across real-world tasks after reinforcement learning. Code, model and data will
be publicly available.
👨🔬 Boyuan Wang, Xinpan Meng, Xiaofeng Wang, Zheng Zhu, Angen Ye, Yang Wang, Zhiqin Yang, Chaojun Ni, Guan Huang, Xingang Wang
Academic Paper
ArXiv
Importance: 8
The proliferation of AI-driven systems presents a fundamental challenge to
Human-Computer Interaction (HCI) and Computer-Supported Cooperative Work
(CSCW), often diminishing user agency and failing to account for value
pluralism. Current approaches to value alignment, which rely on centralized,
top-down definitions, lack the mechanisms for meaningful contestability. This
leaves users and communities unable to challenge or shape the values embedded
in the systems that govern their digital lives, creating a crisis of legitimacy
and trust. This paper introduces Community-Defined AI Value Pluralism (CDAVP),
a socio-technical framework that addresses this gap. It reframes the design
problem from achieving a single aligned state to infrastructuring a dynamic
ecosystem for value deliberation and application. At its core, CDAVP enables
diverse, self-organizing communities to define and maintain explicit value
profiles - rich, machine-readable representations that can encompass not only
preferences but also community-specific rights and duties. These profiles are
then contextually activated by the end-user, who retains ultimate control
(agency) over which values guide the AI's behavior. AI applications, in turn,
are designed to transparently interpret these profiles and moderate conflicts,
adhering to a set of non-negotiable, democratically-legitimated meta-rules. The
designer's role shifts from crafting static interfaces to becoming an architect
of participatory ecosystems. We argue that infrastructuring for pluralism is a
necessary pathway toward achieving robust algorithmic accountability and
genuinely contestable, human-centric AI.
👨🔬 Andreas Mayer
Academic Paper
ArXiv
Importance: 7
While chain-of-thought (CoT) monitoring is an appealing AI safety defense,
recent work on "unfaithfulness" has cast doubt on its reliability. These
findings highlight an important failure mode, particularly when CoT acts as a
post-hoc rationalization in applications like auditing for bias. However, for
the distinct problem of runtime monitoring to prevent severe harm, we argue the
key property is not faithfulness but monitorability. To this end, we introduce
a conceptual framework distinguishing CoT-as-rationalization from
CoT-as-computation. We expect that certain classes of severe harm will require
complex, multi-step reasoning that necessitates CoT-as-computation. Replicating
the experimental setups of prior work, we increase the difficulty of the bad
behavior to enforce this necessity condition; this forces the model to expose
its reasoning, making it monitorable. We then present methodology guidelines to
stress-test CoT monitoring against deliberate evasion. Applying these
guidelines, we find that models can learn to obscure their intentions, but only
when given significant help, such as detailed human-written strategies or
iterative optimization against the monitor. We conclude that, while not
infallible, CoT monitoring offers a substantial layer of defense that requires
active protection and continued stress-testing.
👨🔬 Scott Emmons, Erik Jenner, David K. Elson, Rif A. Saurous, Senthooran Rajamanoharan, Heng Chen, Irhum Shafkat, Rohin Shah
Academic Paper
ArXiv
Importance: 7
In collaborative tasks, being able to adapt to your teammates is a necessary
requirement for success. When teammates are heterogeneous, such as in
human-agent teams, agents need to be able to observe, recognize, and adapt to
their human partners in real time. This becomes particularly challenging in
tasks with time pressure and complex strategic spaces where the dynamics can
change rapidly. In this work, we introduce TALENTS, a strategy-conditioned
cooperator framework that learns to represent, categorize, and adapt to a range
of partner strategies, enabling ad-hoc teamwork. Our approach utilizes a
variational autoencoder to learn a latent strategy space from trajectory data.
This latent space represents the underlying strategies that agents employ.
Subsequently, the system identifies different types of strategy by clustering
the data. Finally, a cooperator agent is trained to generate partners for each
type of strategy, conditioned on these clusters. In order to adapt to
previously unseen partners, we leverage a fixed-share regret minimization
algorithm that infers and adjusts the estimated partner strategy dynamically.
We assess our approach in a customized version of the Overcooked environment,
posing a challenging cooperative cooking task that demands strong coordination
across a wide range of possible strategies. Using an online user study, we show
that our agent outperforms current baselines when working with unfamiliar human
partners.
👨🔬 Benjamin Li, Shuyang Shi, Lucia Romero, Huao Li, Yaqi Xie, Woojun Kim, Stefanos Nikolaidis, Michael Lewis, Katia Sycara, Simon Stepputtis
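The adaptation step above names a fixed-share regret-minimization algorithm for tracking the partner's strategy. A minimal sketch of the standard fixed-share update (Herbster–Warmuth style) over candidate partner-strategy clusters; the function name, parameters, and loss definition are illustrative assumptions, not taken from the paper:

```python
import math


def fixed_share_update(weights, losses, eta=0.5, alpha=0.05):
    """One fixed-share step: an exponential-weights update on the
    per-expert losses, followed by uniformly redistributing a small
    fraction alpha of the probability mass. The share step is what
    lets the learner re-adapt quickly when the partner switches
    strategies mid-task."""
    n = len(weights)
    # Exponential-weights update and renormalization.
    v = [w * math.exp(-eta * loss) for w, loss in zip(weights, losses)]
    total = sum(v)
    v = [x / total for x in v]
    # Fixed-share mixing: keep (1 - alpha) of each weight, share the rest.
    return [(1 - alpha) * x + alpha / n for x in v]


# Example: four strategy clusters; the observed partner action is most
# likely under cluster 0, so that cluster incurs the lowest loss.
weights = fixed_share_update([0.25, 0.25, 0.25, 0.25],
                             losses=[0.1, 1.0, 1.2, 0.9])
```

The per-cluster loss could be, say, the negative log-likelihood of the partner's latest action under each cluster's behavior model; repeating the update every step keeps the estimated strategy current while never letting any cluster's weight collapse to zero.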
Academic Paper
ArXiv
Importance: 7
Deep learning models have demonstrated exceptional performance across a wide
range of computer vision tasks. However, their performance often degrades
significantly when faced with distribution shifts, such as domain or dataset
changes. Test-Time Training (TTT) has emerged as an effective method to enhance
model robustness by incorporating an auxiliary unsupervised task during
training and leveraging it for model updates at test time. In this work, we
introduce CTA (Cross-Task Alignment), a novel approach for improving TTT.
Unlike existing TTT methods, CTA does not require a specialized model
architecture and instead takes inspiration from the success of multi-modal
contrastive learning to align a supervised encoder with a self-supervised one.
This process enforces alignment between the learned representations of both
models, thereby mitigating the risk of gradient interference, preserving the
intrinsic robustness of self-supervised learning and enabling more semantically
meaningful updates at test-time. Experimental results demonstrate substantial
improvements in robustness and generalization over the state-of-the-art on
several benchmark datasets.
👨🔬 Samuel Barbeau, Pedram Fekri, David Osowiechi, Ali Bahri, Moslem Yazdanpanah, Masih Aminbeidokhti, Christian Desrosiers
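The cross-model alignment idea can be illustrated with a symmetric InfoNCE-style objective between the supervised and self-supervised embeddings of the same batch, where matching rows are positives. This is a generic contrastive-alignment sketch, not the paper's exact CTA objective:

```python
import numpy as np


def contrastive_alignment_loss(sup_emb, ssl_emb, temperature=0.1):
    """Symmetric InfoNCE between two encoders' embeddings of the same
    batch: row i of sup_emb and row i of ssl_emb form a positive pair,
    and all other rows serve as in-batch negatives."""
    a = sup_emb / np.linalg.norm(sup_emb, axis=1, keepdims=True)
    b = ssl_emb / np.linalg.norm(ssl_emb, axis=1, keepdims=True)
    logits = a @ b.T / temperature  # cosine-similarity logits
    n = logits.shape[0]

    def cross_entropy(l):
        # Numerically stable log-softmax; positives sit on the diagonal.
        l = l - l.max(axis=1, keepdims=True)
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(n), np.arange(n)].mean()

    # Symmetrize over the "sup -> ssl" and "ssl -> sup" directions.
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))
```

Minimizing a loss of this shape pulls the supervised encoder's representation toward the self-supervised one, which is the alignment the abstract credits with mitigating gradient interference at test time.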
Academic Paper
ArXiv
Importance: 7
Existing language model benchmarks provide contradictory model rankings, even
for benchmarks that aim to capture similar skills. This dilemma of conflicting
rankings hampers model selection, clouds model comparisons, and adds confusion
to a growing ecosystem of competing models. Recent work attributed ranking
disagreement to the phenomenon of training on the test task: As released,
different models exhibit a different level of preparation for any given test
task. A candidate solution to the problem is train-before-test: Give each model
the same benchmark-specific finetuning before evaluation. Our primary
contribution is a broad empirical evaluation of train-before-test across 24
benchmarks and 61 models. We show that train-before-test significantly improves
ranking agreement consistently across all benchmarks. Whereas rankings have
little external validity to start with, they enjoy a significant degree of
external validity when applying train-before-test: Model rankings transfer
gracefully from one benchmark to the other. Even within the same model family,
train-before-test reduces strong ranking disagreement to near-perfect
agreement. In addition, train-before-test reduces the model-score matrix to
essentially rank one, revealing new insights into the latent factors of
benchmark performance. Our work supports the recommendation to make
train-before-test a default component of LLM benchmarking.
👨🔬 Guanhua Zhang, Ricardo Dominguez-Olmedo, Moritz Hardt
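Ranking agreement of the kind measured here is typically summarized with a rank correlation such as Kendall's tau between two benchmarks' model scores. A small self-contained sketch; the choice of statistic is illustrative and the paper's exact agreement metric may differ:

```python
def kendall_tau(scores_a, scores_b):
    """Kendall rank correlation between two score lists over the same
    models: +1 means identical rankings, -1 means fully reversed.
    Tied pairs contribute zero to the numerator (tau-a style)."""
    def sign(x):
        return (x > 0) - (x < 0)

    n = len(scores_a)
    num = 0
    for i in range(n):
        for j in range(i + 1, n):
            # Concordant pairs add +1, discordant pairs add -1.
            num += sign(scores_a[i] - scores_a[j]) * sign(scores_b[i] - scores_b[j])
    return num / (n * (n - 1) / 2)


# Two benchmarks that order four models identically agree perfectly.
assert kendall_tau([0.9, 0.7, 0.5, 0.2], [88, 75, 60, 41]) == 1.0
```

The paper's claim, in these terms, is that correlations like this one, computed between rankings from different benchmarks, move toward 1 once every model receives the same benchmark-specific finetuning before evaluation.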