行业动态
Hacker News
重要度: 9
探讨基础业务模式比AI炒作周期更具持久性的原因
行业动态
Hacker News
重要度: 8
讨论当前AI算法使用中的痛点问题
行业动态
Hacker News
重要度: 8
AI领域过度炒作和荒谬言论的索引
行业动态
Hacker News
重要度: 7
Ask HN: Is the rate of progress in AI exponential?
行业动态
Hacker News
重要度: 7
讨论对纽约市第144号地方法律(AI监管)的担忧
行业动态
Hacker News
重要度: 7
探讨NLP、AI、ML和机器人技术是短期趋势还是长期变革
行业动态
Hacker News
重要度: 6
征求学习人工智能的推荐阅读材料
行业动态
Hacker News
重要度: 6
计算机科学专业学生初涉AI领域的期望和建议
行业动态
Hacker News
重要度: 5
谷歌山景城Common Lisp与机器学习实习机会
行业动态
Hacker News
重要度: 4
介绍被誉为AI界下一个比尔·盖茨或爱因斯坦的Chris Clark
行业动态
Hacker News
重要度: 3
展示通过图书销售筹集资金的初创公司
行业动态
Hacker News
重要度: 3
生物信息学相关职位或讨论
学术论文
ArXiv
重要度: 5
The efficiency of large language models (LLMs) is fundamentally limited by
their sequential, token-by-token generation process. We argue that overcoming
this bottleneck requires a new design axis for LLM scaling: increasing the
semantic bandwidth of each generative step. To this end, we introduce
Continuous Autoregressive Language Models (CALM), a paradigm shift from
discrete next-token prediction to continuous next-vector prediction. CALM uses
a high-fidelity autoencoder to compress a chunk of K tokens into a single
continuous vector, from which the original tokens can be reconstructed with
over 99.9\% accuracy. This allows us to model language as a sequence of
continuous vectors instead of discrete tokens, which reduces the number of
generative steps by a factor of K. The paradigm shift necessitates a new
modeling toolkit; therefore, we develop a comprehensive likelihood-free
framework that enables robust training, evaluation, and controllable sampling
in the continuous domain. Experiments show that CALM significantly improves the
performance-compute trade-off, achieving the performance of strong discrete
baselines at a significantly lower computational cost. More importantly, these
findings establish next-vector prediction as a powerful and scalable pathway
towards ultra-efficient language models. Code:
https://github.com/shaochenze/calm. Project:
https://shaochenze.github.io/blog/2025/CALM.
学术论文
ArXiv
重要度: 5
Recent advances in vision-language models (VLMs) have enabled impressive
multimodal reasoning, yet most medical applications remain limited to 2D
imaging. In this work, we extend VLMs to 3D positron emission tomography and
computed tomography (PET/CT), a domain characterized by large volumetric data,
small and dispersed lesions, and lengthy radiology reports. We introduce a
large-scale dataset comprising over 11,000 lesion-level descriptions paired
with 3D segmentations from more than 5,000 PET/CT exams, extracted via a hybrid
rule-based and large language model (LLM) pipeline. Building upon this dataset,
we propose PETAR-4B, a 3D mask-aware vision-language model that integrates PET,
CT, and lesion contours for spatially grounded report generation. PETAR bridges
global contextual reasoning with fine-grained lesion awareness, producing
clinically coherent and localized findings. Comprehensive automated and human
evaluations demonstrate that PETAR substantially improves PET/CT report
generation quality, advancing 3D medical vision-language understanding.
学术论文
ArXiv
重要度: 5
Structure-based drug design (SBDD), which maps target proteins to candidate
molecular ligands, is a fundamental task in drug discovery. Effectively
aligning protein structural representations with molecular representations, and
ensuring alignment between generated drugs and their pharmacological
properties, remains a critical challenge. To address these challenges, we
propose MolChord, which integrates two key techniques: (1) to align protein and
molecule structures with their textual descriptions and sequential
representations (e.g., FASTA for proteins and SMILES for molecules), we
leverage NatureLM, an autoregressive model unifying text, small molecules, and
proteins, as the molecule generator, alongside a diffusion-based structure
encoder; and (2) to guide molecules toward desired properties, we curate a
property-aware dataset by integrating preference data and refine the alignment
process using Direct Preference Optimization (DPO). Experimental results on
CrossDocked2020 demonstrate that our approach achieves state-of-the-art
performance on key evaluation metrics, highlighting its potential as a
practical tool for SBDD.
学术论文
ArXiv
重要度: 5
In the rapidly evolving field of multi-agent reinforcement learning (MARL),
understanding the dynamics of open systems is crucial. Openness in MARL refers
to the dynam-ic nature of agent populations, tasks, and agent types with-in a
system. Specifically, there are three types of openness as reported in (Eck et
al. 2023) [2]: agent openness, where agents can enter or leave the system at
any time; task openness, where new tasks emerge, and existing ones evolve or
disappear; and type openness, where the capabil-ities and behaviors of agents
change over time. This report provides a conceptual and empirical review,
focusing on the interplay between openness and the credit assignment problem
(CAP). CAP involves determining the contribution of individual agents to the
overall system performance, a task that becomes increasingly complex in open
environ-ments. Traditional credit assignment (CA) methods often assume static
agent populations, fixed and pre-defined tasks, and stationary types, making
them inadequate for open systems. We first conduct a conceptual analysis,
in-troducing new sub-categories of openness to detail how events like agent
turnover or task cancellation break the assumptions of environmental
stationarity and fixed team composition that underpin existing CAP methods. We
then present an empirical study using representative temporal and structural
algorithms in an open environment. The results demonstrate that openness
directly causes credit misattribution, evidenced by unstable loss functions and
significant performance degradation.
学术论文
ArXiv
重要度: 5
Feature-attribution methods (e.g., SHAP, LIME) explain individual predictions
but often miss higher-order structure: sets of features that act in concert. We
propose Modules of Influence (MoI), a framework that (i) constructs a model
explanation graph from per-instance attributions, (ii) applies community
detection to find feature modules that jointly affect predictions, and (iii)
quantifies how these modules relate to bias, redundancy, and causality
patterns. Across synthetic and real datasets, MoI uncovers correlated feature
groups, improves model debugging via module-level ablations, and localizes bias
exposure to specific modules. We release stability and synergy metrics, a
reference implementation, and evaluation protocols to benchmark module
discovery in XAI.
学术论文
ArXiv
重要度: 5
Modern deep neural networks (DNNs) are typically trained with a global
cross-entropy loss in a supervised end-to-end manner: neurons need to store
their outgoing weights; training alternates between a forward pass
(computation) and a top-down backward pass (learning) which is biologically
implausible. Alternatively, greedy layer-wise training eliminates the need for
cross-entropy loss and backpropagation. By avoiding the computation of
intermediate gradients and the storage of intermediate outputs, it reduces
memory usage and helps mitigate issues such as vanishing or exploding
gradients. However, most existing layer-wise training approaches have been
evaluated only on relatively small datasets with simple deep architectures. In
this paper, we first systematically analyze the training dynamics of popular
convolutional neural networks (CNNs) trained by stochastic gradient descent
(SGD) through an information-theoretic lens. Our findings reveal that networks
converge layer-by-layer from bottom to top and that the flow of information
adheres to a Markov information bottleneck principle. Building on these
observations, we propose a novel layer-wise training approach based on the
recently developed deterministic information bottleneck (DIB) and the
matrix-based R\'enyi's $\alpha$-order entropy functional. Specifically, each
layer is trained jointly with an auxiliary classifier that connects directly to
the output layer, enabling the learning of minimal sufficient task-relevant
representations. We empirically validate the effectiveness of our training
procedure on CIFAR-10 and CIFAR-100 using modern deep CNNs and further
demonstrate its applicability to a practical task involving traffic sign
recognition. Our approach not only outperforms existing layer-wise training
baselines but also achieves performance comparable to SGD.
学术论文
ArXiv
重要度: 5
Semantic segmentation of blood vessels is an important task in medical image
analysis, but its progress is often hindered by the scarcity of large annotated
datasets and the poor generalization of models across different imaging
modalities. A key aspect is the tendency of Convolutional Neural Networks
(CNNs) to learn texture-based features, which limits their performance when
applied to new domains with different visual characteristics. We hypothesize
that leveraging geometric priors of vessel shapes, such as their tubular and
branching nature, can lead to more robust and data-efficient models. To
investigate this, we introduce VessShape, a methodology for generating
large-scale 2D synthetic datasets designed to instill a shape bias in
segmentation models. VessShape images contain procedurally generated tubular
geometries combined with a wide variety of foreground and background textures,
encouraging models to learn shape cues rather than textures. We demonstrate
that a model pre-trained on VessShape images achieves strong few-shot
segmentation performance on two real-world datasets from different domains,
requiring only four to ten samples for fine-tuning. Furthermore, the model
exhibits notable zero-shot capabilities, effectively segmenting vessels in
unseen domains without any target-specific training. Our results indicate that
pre-training with a strong shape bias can be an effective strategy to overcome
data scarcity and improve model generalization in blood vessel segmentation.
学术论文
ArXiv
重要度: 5
Graphic layout generation is a growing research area focusing on generating
aesthetically pleasing layouts ranging from poster designs to documents. While
recent research has explored ways to incorporate user constraints to guide the
layout generation, these constraints often require complex specifications which
reduce usability. We introduce an innovative approach exploiting user-provided
sketches as intuitive constraints and we demonstrate empirically the
effectiveness of this new guidance method, establishing the sketch-to-layout
problem as a promising research direction, which is currently under-explored.
To tackle the sketch-to-layout problem, we propose a multimodal
transformer-based solution using the sketch and the content assets as inputs to
produce high quality layouts. Since collecting sketch training data from human
annotators to train our model is very costly, we introduce a novel and
efficient method to synthetically generate training sketches at scale. We train
and evaluate our model on three publicly available datasets: PubLayNet,
DocLayNet and SlidesVQA, demonstrating that it outperforms state-of-the-art
constraint-based methods, while offering a more intuitive design experience. In
order to facilitate future sketch-to-layout research, we release O(200k)
synthetically-generated sketches for the public datasets above. The datasets
are available at https://github.com/google-deepmind/sketch_to_layout.
学术论文
ArXiv
重要度: 5
Large Language Model (LLM) agents have recently shown strong potential in
domains such as automated coding, deep research, and graphical user interface
manipulation. However, training them to succeed on long-horizon,
domain-specialized tasks remains challenging. Current methods primarily fall
into two categories. The first relies on dense human annotations through
behavior cloning, which is prohibitively expensive for long-horizon tasks that
can take days or months. The second depends on outcome-driven sampling, which
often collapses due to the rarity of valid positive trajectories on
domain-specialized tasks. We introduce Apollo, a sampling framework that
integrates asynchronous human guidance with action-level data filtering.
Instead of requiring annotators to shadow every step, Apollo allows them to
intervene only when the agent drifts from a promising trajectory, by providing
prior knowledge, strategic advice, etc. This lightweight design makes it
possible to sustain interactions for over 30 hours and produces valuable
trajectories at a lower cost. Apollo then applies supervision control to filter
out sub-optimal actions and prevent error propagation. Together, these
components enable reliable and effective data collection in long-horizon
environments. To demonstrate the effectiveness of Apollo, we evaluate it using
InnovatorBench. Our experiments show that when applied to train the GLM-4.5
model on InnovatorBench, Apollo achieves more than a 50% improvement over the
untrained baseline and a 28% improvement over a variant trained without human
interaction. These results highlight the critical role of human-in-the-loop
sampling and the robustness of Apollo's design in handling long-horizon,
domain-specialized tasks.
学术论文
ArXiv
重要度: 5
Open-weight bio-foundation models present a dual-use dilemma. While holding
great promise for accelerating scientific research and drug development, they
could also enable bad actors to develop more deadly bioweapons. To mitigate the
risk posed by these models, current approaches focus on filtering biohazardous
data during pre-training. However, the effectiveness of such an approach
remains unclear, particularly against determined actors who might fine-tune
these models for malicious use. To address this gap, we propose \eval, a
framework to evaluate the robustness of procedures that are intended to reduce
the dual-use capabilities of bio-foundation models. \eval assesses models'
virus understanding through three lenses, including sequence modeling,
mutational effects prediction, and virulence prediction. Our results show that
current filtering practices may not be particularly effective: Excluded
knowledge can be rapidly recovered in some cases via fine-tuning, and exhibits
broader generalizability in sequence modeling. Furthermore, dual-use signals
may already reside in the pretrained representations, and can be elicited via
simple linear probing. These findings highlight the challenges of data
filtering as a standalone procedure, underscoring the need for further research
into robust safety and security strategies for open-weight bio-foundation
models.
学术论文
ArXiv
重要度: 5
While AI agents have long been discussed and studied in computer science,
today's Agentic AI systems are something new. We consider other definitions of
Agentic AI and propose a new realist definition. Agentic AI is a software
delivery mechanism, comparable to software as a service (SaaS), which puts an
application to work autonomously in a complex enterprise setting. Recent
advances in large language models (LLMs) as foundation models have driven
excitement in Agentic AI. We note, however, that Agentic AI systems are
primarily applications, not foundations, and so their success depends on
validation by end users and principal stakeholders. The tools and techniques
needed by the principal users to validate their applications are quite
different from the tools and techniques used to evaluate foundation models.
Ironically, with good validation measures in place, in many cases the
foundation models can be replaced with much simpler, faster, and more
interpretable models that handle core logic. When it comes to Agentic AI,
validity is what you need. LLMs are one option that might achieve it.
学术论文
ArXiv
重要度: 5
Multimodal large language models (MLLMs) have advanced embodied agents by
enabling direct perception, reasoning, and planning task-oriented actions from
visual inputs. However, such vision driven embodied agents open a new attack
surface: visual backdoor attacks, where the agent behaves normally until a
visual trigger appears in the scene, then persistently executes an
attacker-specified multi-step policy. We introduce BEAT, the first framework to
inject such visual backdoors into MLLM-based embodied agents using objects in
the environments as triggers. Unlike textual triggers, object triggers exhibit
wide variation across viewpoints and lighting, making them difficult to implant
reliably. BEAT addresses this challenge by (1) constructing a training set that
spans diverse scenes, tasks, and trigger placements to expose agents to trigger
variability, and (2) introducing a two-stage training scheme that first applies
supervised fine-tuning (SFT) and then our novel Contrastive Trigger Learning
(CTL). CTL formulates trigger discrimination as preference learning between
trigger-present and trigger-free inputs, explicitly sharpening the decision
boundaries to ensure precise backdoor activation. Across various embodied agent
benchmarks and MLLMs, BEAT achieves attack success rates up to 80%, while
maintaining strong benign task performance, and generalizes reliably to
out-of-distribution trigger placements. Notably, compared to naive SFT, CTL
boosts backdoor activation accuracy up to 39% under limited backdoor data.
These findings expose a critical yet unexplored security risk in MLLM-based
embodied agents, underscoring the need for robust defenses before real-world
deployment.