行业动态
Hacker News
重要度: 9
探讨务实企业如何在AI炒作周期中持久发展
行业动态
Hacker News
重要度: 8
讨论当前AI算法使用中的痛点和挑战
行业动态
Hacker News
重要度: 8
探讨AI发展速度是否呈指数级增长
行业动态
Hacker News
重要度: 7
AI领域不靠谱言论索引,识别行业泡沫
行业动态
Hacker News
重要度: 7
NLP, AI, ML, bots – a passing trend or much more? What's your take on this?
行业动态
Hacker News
重要度: 7
讨论纽约地方法律144对AI行业的影响和担忧
行业动态
Hacker News
重要度: 6
谷歌Common Lisp与机器学习实习机会
行业动态
Hacker News
重要度: 6
寻求AI学习资源推荐
行业动态
Hacker News
重要度: 6
计算机科学背景初学者询问AI入门预期
行业动态
Hacker News
重要度: 5
生物信息学职位机会
行业动态
Hacker News
重要度: 4
初创公司通过图书销售筹集资金
行业动态
Hacker News
重要度: 3
介绍AI领域的潜在领军人物Chris Clark
学术论文
ArXiv
重要度: 9
提出潜在去噪扩散桥模型,实现任意模态间转换,无需对齐维度,在多种跨模态任务中表现优异。
👨🔬 Nimrod Berman, Omkar Joglekar, Eitan Kosman, Dotan Di Castro, Omri Azencot
学术论文
ArXiv
重要度: 8
A fundamental challenge in robot navigation lies in learning policies that
generalize across diverse environments while conforming to the unique physical
constraints and capabilities of a specific embodiment (e.g., quadrupeds can
walk up stairs, but rovers cannot). We propose VAMOS, a hierarchical VLA that
decouples semantic planning from embodiment grounding: a generalist planner
learns from diverse, open-world data, while a specialist affordance model
learns the robot's physical constraints and capabilities in safe, low-cost
simulation. We enabled this separation by carefully designing an interface that
lets a high-level planner propose candidate paths directly in image space that
the affordance model then evaluates and re-ranks. Our real-world experiments
show that VAMOS achieves higher success rates in both indoor and complex
outdoor navigation than state-of-the-art model-based and end-to-end learning
methods. We also show that our hierarchical design enables cross-embodied
navigation across legged and wheeled robots and is easily steerable using
natural language. Real-world ablations confirm that the specialist model is key
to embodiment grounding, enabling a single high-level planner to be deployed
across physically distinct wheeled and legged robots. Finally, this model
significantly enhances single-robot reliability, achieving 3X higher success
rates by rejecting physically infeasible plans. Website:
https://vamos-vla.github.io/
👨🔬 Mateo Guaman Castro, Sidharth Rajagopal, Daniel Gorbatov, Matt Schmittle, Rohan Baijal, Octi Zhang, Rosario Scalise, Sidharth Talia, Emma Romig, Celso de Melo, Byron Boots, Abhishek Gupta
学术论文
ArXiv
重要度: 8
This paper presents GSWorld, a robust, photo-realistic simulator for robotics
manipulation that combines 3D Gaussian Splatting with physics engines. Our
framework advocates "closing the loop" of developing manipulation policies with
reproducible evaluation of policies learned from real-robot data and sim2real
policy training without using real robots. To enable photo-realistic rendering
of diverse scenes, we propose a new asset format, which we term GSDF (Gaussian
Scene Description File), that infuses Gaussian-on-Mesh representation with
robot URDF and other objects. With a streamlined reconstruction pipeline, we
curate a database of GSDF that contains 3 robot embodiments for single-arm and
bimanual manipulation, as well as more than 40 objects. Combining GSDF with
physics engines, we demonstrate several immediate interesting applications: (1)
learning zero-shot sim2real pixel-to-action manipulation policy with
photo-realistic rendering, (2) automated high-quality DAgger data collection
for adapting policies to deployment environments, (3) reproducible benchmarking
of real-robot manipulation policies in simulation, (4) simulation data
collection by virtual teleoperation, and (5) zero-shot sim2real visual
reinforcement learning. Website: https://3dgsworld.github.io/.
👨🔬 Guangqi Jiang, Haoran Chang, Ri-Zhao Qiu, Yutong Liang, Mazeyu Ji, Jiyue Zhu, Zhao Dong, Xueyan Zou, Xiaolong Wang
学术论文
ArXiv
重要度: 7
Large Vision-Language Models (VLMs) have achieved remarkable progress in
multimodal understanding, yet they struggle when reasoning over
information-intensive images that densely interleave textual annotations with
fine-grained graphical elements. The main challenges lie in precisely
localizing critical cues in dense layouts and multi-hop reasoning to integrate
dispersed evidence. We propose Speculative Verdict (SV), a training-free
framework inspired by speculative decoding that combines multiple lightweight
draft experts with a large verdict model. In the draft stage, small VLMs act as
draft experts to generate reasoning paths that provide diverse localization
candidates; in the verdict stage, a strong VLM synthesizes these paths to
produce the final answer, minimizing computational cost while recovering
correct answers. To further improve efficiency and accuracy, SV introduces a
consensus expert selection mechanism that forwards only high-agreement
reasoning paths to the verdict. Empirically, SV achieves consistent gains on
challenging information-intensive and high-resolution visual question answering
benchmarks, including InfographicVQA, ChartMuseum, ChartQAPro, and HR-Bench 4K.
By synthesizing correct insights from multiple partially accurate reasoning
paths, SV achieves both error correction and cost-efficiency compared to large
proprietary models or training pipelines. Code is available at
https://github.com/Tinaliu0123/speculative-verdict
👨🔬 Yuhan Liu, Lianhui Qin, Shengjie Wang
学术论文
ArXiv
重要度: 7
Machine learning has facilitated significant advancements across various
robotics domains, including navigation, locomotion, and manipulation. Many such
achievements have been driven by the extensive use of simulation as a critical
tool for training and testing robotic systems prior to their deployment in
real-world environments. However, simulations consist of abstractions and
approximations that inevitably introduce discrepancies between simulated and
real environments, known as the reality gap. These discrepancies significantly
hinder the successful transfer of systems from simulation to the real world.
Closing this gap remains one of the most pressing challenges in robotics.
Recent advances in sim-to-real transfer have demonstrated promising results
across various platforms, including locomotion, navigation, and manipulation.
By leveraging techniques such as domain randomization, real-to-sim transfer,
state and action abstractions, and sim-real co-training, many works have
overcome the reality gap. However, challenges persist, and a deeper
understanding of the reality gap's root causes and solutions is necessary. In
this survey, we present a comprehensive overview of the sim-to-real landscape,
highlighting the causes, solutions, and evaluation metrics for the reality gap
and sim-to-real transfer.
👨🔬 Elie Aljalbout, Jiaxu Xing, Angel Romero, Iretiayo Akinola, Caelan Reed Garrett, Eric Heiden, Abhishek Gupta, Tucker Hermans, Yashraj Narang, Dieter Fox, Davide Scaramuzza, Fabio Ramos
学术论文
ArXiv
重要度: 7
Recently, Sharma et al. suggested a method called Layer-SElective-Rank
reduction (LASER) which demonstrated that pruning high-order components of
carefully chosen LLM's weight matrices can boost downstream accuracy -- without
any gradient-based fine-tuning. Yet LASER's exhaustive, per-matrix search (each
requiring full-dataset forward passes) makes it impractical for rapid
deployment. We demonstrate that this overhead can be removed and find that: (i)
Only a small, carefully chosen subset of matrices needs to be inspected --
eliminating the layer-by-layer sweep, (ii) The gradient of each matrix's
singular values pinpoints which matrices merit reduction, (iii) Increasing the
factorization search space by allowing matrices rows to cluster around multiple
subspaces and then decomposing each cluster separately further reduces
overfitting on the original training data and further lifts accuracy by up to
24.6 percentage points, and finally, (iv) we discover that evaluating on just
100 samples rather than the full training data -- both for computing the
indicative gradients and for measuring the final accuracy -- suffices to
further reduce the search time; we explain that as adaptation to downstream
tasks is dominated by prompting style, not dataset size. As a result, we show
that combining these findings yields a fast and robust adaptation algorithm for
downstream tasks. Overall, with a single gradient step on 100 examples and a
quick scan of the top candidate layers and factorization techniques, we can
adapt LLMs to new datasets -- entirely without fine-tuning.
👨🔬 Shiva Sreeram, Alaa Maalouf, Pratyusha Sharma, Daniela Rus
学术论文
ArXiv
重要度: 7
Recent work by \citet{hendrycks2025agidefinition} formalized
\textit{Artificial General Intelligence} (AGI) as the arithmetic mean of
proficiencies across cognitive domains derived from the Cattell--Horn--Carroll
(CHC) model of human cognition. While elegant, this definition assumes
\textit{compensability} -- that exceptional ability in some domains can offset
failure in others. True general intelligence, however, should reflect
\textit{coherent sufficiency}: balanced competence across all essential
domains. We propose a coherence-aware measure of AGI based on the integral of
generalized means over a continuum of compensability exponents. This
formulation spans arithmetic, geometric, and harmonic regimes, and the
resulting \textit{area under the curve} (AUC) quantifies robustness under
varying compensability assumptions. Unlike the arithmetic mean, which rewards
specialization, the AUC penalizes imbalance and captures inter-domain
dependency. Applied to published CHC-based domain scores for GPT-4 and GPT-5,
the coherence-adjusted AUC reveals that both systems remain far from general
competence despite high arithmetic scores (e.g., GPT-5 at~24\%). Integrating
the generalized mean thus yields a principled, interpretable, and stricter
foundation for measuring genuine progress toward AGI.
👨🔬 Fares Fourati
学术论文
ArXiv
重要度: 6
With the widespread use of large language models (LLMs), many researchers
have turned their attention to detecting text generated by them. However, there
is no consistent or precise definition of their target, namely "LLM-generated
text". Differences in usage scenarios and the diversity of LLMs further
increase the difficulty of detection. What is commonly regarded as the
detecting target usually represents only a subset of the text that LLMs can
potentially produce. Human edits to LLM outputs, together with the subtle
influences that LLMs exert on their users, are blurring the line between
LLM-generated and human-written text. Existing benchmarks and evaluation
approaches do not adequately address the various conditions in real-world
detector applications. Hence, the numerical results of detectors are often
misunderstood, and their significance is diminishing. Therefore, detectors
remain useful under specific conditions, but their results should be
interpreted only as references rather than decisive indicators.
👨🔬 Mingmeng Geng, Thierry Poibeau
学术论文
ArXiv
重要度: 6
With the rapid growth of research in AI and robotics now producing over
10,000 papers annually it has become increasingly difficult for researchers to
stay up to date. Fast evolving trends, the rise of interdisciplinary work, and
the need to explore domains beyond one's expertise all contribute to this
challenge. To address these issues, we propose a generalizable pipeline capable
of systematically analyzing any research area: identifying emerging trends,
uncovering cross domain opportunities, and offering concrete starting points
for new inquiry. In this work, we present Real Deep Research (RDR) a
comprehensive framework applied to the domains of AI and robotics, with a
particular focus on foundation models and robotics advancements. We also
briefly extend our analysis to other areas of science. The main paper details
the construction of the RDR pipeline, while the appendix provides extensive
results across each analyzed topic. We hope this work sheds light for
researchers working in the field of AI and beyond.
👨🔬 Xueyan Zou, Jianglong Ye, Hao Zhang, Xiaoyu Xiang, Mingyu Ding, Zhaojing Yang, Yong Jae Lee, Zhuowen Tu, Sifei Liu, Xiaolong Wang
学术论文
ArXiv
重要度: 6
A common strategy to reduce the computational costs of using long contexts in
retrieval-augmented generation (RAG) with large language models (LLMs) is soft
context compression, where the input sequence is transformed into a shorter
continuous representation. We develop a lightweight and simple mean-pooling
approach that consistently outperforms the widely used compression-tokens
architecture, and study training the same compressor to output multiple
compression ratios. We conduct extensive experiments across in-domain and
out-of-domain QA datasets, as well as across model families, scales, and
compression ratios. Overall, our simple mean-pooling approach achieves the
strongest performance, with a relatively small drop when training for multiple
compression ratios. More broadly though, across architectures and training
regimes the trade-offs are more nuanced, illustrating the complex landscape of
compression methods.
👨🔬 Yair Feldman, Yoav Artzi
学术论文
ArXiv
重要度: 5
Deep learning has emerged as a transformative methodology in modern
cosmology, providing powerful tools to extract meaningful physical information
from complex astronomical datasets. This paper implements a novel Bayesian
graph deep learning framework for estimating key cosmological parameters in a
primordial magnetic field (PMF) cosmology directly from simulated Cosmic
Microwave Background (CMB) maps. Our methodology utilizes DeepSphere, a
spherical convolutional neural network architecture specifically designed to
respect the spherical geometry of CMB data through HEALPix pixelization. To
advance beyond deterministic point estimates and enable robust uncertainty
quantification, we integrate Bayesian Neural Networks (BNNs) into the
framework, capturing aleatoric and epistemic uncertainties that reflect the
model confidence in its predictions. The proposed approach demonstrates
exceptional performance, achieving $R^{2}$ scores exceeding 0.89 for the
magnetic parameter estimation. We further obtain well-calibrated uncertainty
estimates through post-hoc training techniques including Variance Scaling and
GPNormal. This integrated DeepSphere-BNNs framework not only delivers accurate
parameter estimation from CMB maps with PMF contributions but also provides
reliable uncertainty quantification, providing the necessary tools for robust
cosmological inference in the era of precision cosmology.
👨🔬 Juan Alejandro Pinto Castro, Héctor J. Hortúa, Jorge Enrique García-Farieta, Roger Anderson Hurtado
学术论文
ArXiv
重要度: 5
Current methods for evaluating large language models (LLMs) typically focus
on high-level tasks such as text generation, without targeting a particular AI
application. This approach is not sufficient for evaluating LLMs for
Responsible AI dimensions like fairness, since protected attributes that are
highly relevant in one application may be less relevant in another. In this
work, we construct a dataset that is driven by a real-world application
(generate a plain-text product description, given a list of product features),
parameterized by fairness attributes intersected with gendered adjectives and
product categories, yielding a rich set of labeled prompts. We show how to use
the data to identify quality, veracity, safety, and fairness gaps in LLMs,
contributing a proposal for LLM evaluation paired with a concrete resource for
the research community.
👨🔬 Alicia Sagae, Chia-Jung Lee, Sandeep Avula, Brandon Dang, Vanessa Murdock