Industry News
Hacker News
Importance: 9
The next Bill Gates or Einstein of AI
Industry News
Hacker News
Importance: 8
A discussion of whether the pace of AI progress is exponential
Industry News
Hacker News
Importance: 8
Cloud GPUs at 50% lower cost, saving developers 50% compared with AWS
Industry News
Hacker News
Importance: 7
A discussion of pain points in current AI algorithms
Industry News
Hacker News
Importance: 7
NLP, AI, ML, robotics: passing fad or something more?
Industry News
Hacker News
Importance: 6
Google offers a Common Lisp and machine learning internship
Industry News
Hacker News
Importance: 6
A first look at artificial intelligence and what to expect from it
Industry News
Hacker News
Importance: 5
A discussion of the "craziness index" in the AI field
Industry News
Hacker News
Importance: 5
Reflections on graduate school (CS PhD)
Industry News
Hacker News
Importance: 4
A discussion of concerns about New York City Local Law 144
Industry News
Hacker News
Importance: 3
A startup raises funding through book sales
Industry News
Hacker News
Importance: 2
A discussion among bioinformaticians
Academic Paper
ArXiv
Importance: 10
A survey of self-evolving agents: on the path to artificial superintelligence.
👨🔬 Huan-ang Gao, Jiayi Geng, Wenyue Hua, Mengkang Hu, Xinzhe Juan, Hongzhang Liu, Shilong Liu, Jiahao Qiu, Xuan Qi, Yiran Wu, Hongru Wang, Han Xiao, Yuhang Zhou, Shaokun Zhang, Jiayi Zhang, Jinyu Xiang, Yixiong Fang, Qiwen Zhao, Dongrui Liu, Qihan Ren, Cheng Qian, Zhenghailong Wang, Minda Hu, Huazheng Wang, Qingyun Wu, Heng Ji, Mengdi Wang
Academic Paper
ArXiv
Importance: 9
GenoMAS: A multi-agent framework for scientific discovery through code-driven gene expression analysis.
👨🔬 Haoyang Liu, Yijiang Li, Haohan Wang
Academic Paper
ArXiv
Importance: 9
Deep Neural Networks (DNNs) deliver impressive performance but their
black-box nature limits deployment in high-stakes domains requiring
transparency. We introduce Compositional Function Networks (CFNs), a novel
framework that builds inherently interpretable models by composing elementary
mathematical functions with clear semantics. Unlike existing interpretable
approaches that are limited to simple additive structures, CFNs support diverse
compositional patterns -- sequential, parallel, and conditional -- enabling
complex feature interactions while maintaining transparency. A key innovation
is that CFNs are fully differentiable, allowing efficient training through
standard gradient descent. We demonstrate CFNs' versatility across multiple
domains, from symbolic regression to image classification with deep
hierarchical networks. Our empirical evaluation shows CFNs achieve competitive
performance against black-box models (96.24% accuracy on CIFAR-10) while
outperforming state-of-the-art interpretable models like Explainable Boosting
Machines. By combining the hierarchical expressiveness and efficient training
of deep learning with the intrinsic interpretability of well-defined
mathematical functions, CFNs offer a powerful framework for applications where
both performance and accountability are paramount.
👨🔬 Fang Li
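
A minimal sketch of the compositional idea, inferred only from the abstract above; the class names and function choices are hypothetical, not the authors' code. Elementary functions with transparent parameters are combined in parallel (additively) and in sequence, and the whole model trains with standard gradient descent:

    import torch
    import torch.nn as nn

    class Elementary(nn.Module):
        """Interpretable elementary function: y = a * f(w * x + b)."""
        def __init__(self, dim, fn):
            super().__init__()
            self.w = nn.Parameter(torch.randn(dim))
            self.b = nn.Parameter(torch.zeros(dim))
            self.a = nn.Parameter(torch.ones(dim))
            self.fn = fn  # e.g. torch.sin, torch.tanh, torch.exp

        def forward(self, x):
            return self.a * self.fn(self.w * x + self.b)

    class ParallelSum(nn.Module):
        """Parallel composition: an additive mixture of elementary functions."""
        def __init__(self, branches):
            super().__init__()
            self.branches = nn.ModuleList(branches)

        def forward(self, x):
            return sum(branch(x) for branch in self.branches)

    # Sequential composition of parallel blocks; every parameter has a clear
    # semantic role, and the model is differentiable end to end.
    dim = 4
    model = nn.Sequential(
        ParallelSum([Elementary(dim, torch.sin), Elementary(dim, torch.tanh)]),
        ParallelSum([Elementary(dim, torch.exp), Elementary(dim, torch.tanh)]),
    )
    x = torch.randn(8, dim)
    model(x).sum().backward()  # standard gradient descent applies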
Academic Paper
ArXiv
Importance: 9
While frontier large language models (LLMs) continue to push capability
boundaries, their deployment remains confined to GPU-powered cloud
infrastructure. We challenge this paradigm with SmallThinker, a family of LLMs
natively designed - not adapted - for the unique constraints of local devices:
weak computational power, limited memory, and slow storage. Unlike traditional
approaches that mainly compress existing models built for clouds, we architect
SmallThinker from the ground up to thrive within these limitations. Our
innovation lies in a deployment-aware architecture that transforms constraints
into design principles. First, we introduce a two-level sparse structure
combining fine-grained Mixture-of-Experts (MoE) with sparse feed-forward
networks, drastically reducing computational demands without sacrificing model
capacity. Second, to conquer the I/O bottleneck of slow storage, we design a
pre-attention router that enables our co-designed inference engine to prefetch
expert parameters from storage while computing attention, effectively hiding
storage latency that would otherwise cripple on-device inference. Third, for
memory efficiency, we utilize a NoPE-RoPE hybrid sparse attention mechanism to
slash KV cache requirements. We release SmallThinker-4B-A0.6B and
SmallThinker-21B-A3B, which achieve state-of-the-art performance scores and
even outperform larger LLMs. Remarkably, our co-designed system mostly
eliminates the need for expensive GPU hardware: with Q4_0 quantization, both
models exceed 20 tokens/s on ordinary consumer CPUs, while consuming only 1GB
and 8GB of memory respectively. SmallThinker is publicly available at
hf.co/PowerInfer/SmallThinker-4BA0.6B-Instruct and
hf.co/PowerInfer/SmallThinker-21BA3B-Instruct.
👨🔬 Yixin Song, Zhenliang Xue, Dongliang Wei, Feiyang Chen, Jianxiang Gao, Junchen Liu, Hangyu Liang, Guangshuo Qin, Chengrong Tian, Bo Wen, Longyu Zhao, Xinrui Zheng, Zeyu Mi, Haibo Chen
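
The pre-attention routing trick can be pictured as overlapping I/O with compute. This toy sketch (an assumption drawn from the abstract, not the released PowerInfer engine) routes to experts before attention runs, so expert weights can be fetched from slow storage concurrently:

    import time
    import threading

    def route(h):
        # Stand-in router: choose expert ids from the PRE-attention hidden state.
        return [hash(h) % 8, (hash(h) + 3) % 8]

    def fetch_experts(expert_ids, cache):
        time.sleep(0.05)  # simulate a slow-storage read
        cache.update({e: f"weights[{e}]" for e in expert_ids})

    def attention(h):
        time.sleep(0.05)  # simulate attention compute
        return h + "+attn"

    def layer(h):
        experts = route(h)              # router fires BEFORE attention
        cache = {}
        io = threading.Thread(target=fetch_experts, args=(experts, cache))
        io.start()                      # expert fetch overlaps attention
        h = attention(h)
        io.join()                       # weights have arrived; latency is hidden
        return h, cache

    print(layer("token"))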
Academic Paper
ArXiv
Importance: 8
MIRAGE-Bench: Hallucinations of LLM agents and where to find them.
👨🔬 Weichen Zhang, Yiyou Sun, Pohao Huang, Jiayue Pu, Heyue Lin, Dawn Song
Academic Paper
ArXiv
Importance: 8
In real-world machine learning deployments, models must be continually
updated, composed, and, when required, selectively undone. However, existing
approaches to model merging and continual learning often suffer from task
interference, catastrophic forgetting, or lack of reversibility. We propose
Modular Delta Merging with Orthogonal Constraints (MDM-OC), a novel framework
that enables scalable, interference-free, and reversible composition of
fine-tuned models. Each task-specific model is encoded as a delta from a shared
base and projected into an orthogonal subspace to eliminate conflict. These
projected deltas are then merged via gradient-based optimization to form a
unified model that retains performance across tasks. Our approach supports
continual integration of new models, structured unmerging for compliance needs
such as GDPR requirements, and model stability via elastic weight consolidation and
synthetic replay. Extensive experiments on vision and natural language
processing benchmarks demonstrate that MDM-OC outperforms prior baselines in
accuracy, backward transfer, and unmerge fidelity, while remaining
memory-efficient and computationally tractable. This framework offers a
principled solution for modular and compliant AI system design.
👨🔬 Haris Khan, Shumaila Asif, Sadia Asif
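
The delta-merging recipe invites a small linear-algebra sketch. This numpy toy is inferred from the abstract (the paper's gradient-based merge is simplified here to a plain sum, and weights are treated as one flat vector): each task delta is projected onto the orthogonal complement of earlier deltas, merged additively, and removable by subtraction:

    import numpy as np

    def project_out(delta, basis):
        """Remove components of delta lying in the span of earlier deltas."""
        for b in basis:
            delta = delta - (delta @ b) / (b @ b) * b
        return delta

    base = np.random.randn(16)                 # shared base (flattened weights)
    task_models = [base + 0.1 * np.random.randn(16) for _ in range(3)]

    merged, basis, stored = base.copy(), [], {}
    for t, w in enumerate(task_models):
        d = project_out(w - base, basis)       # orthogonalize against prior tasks
        merged += d                            # interference-free merge
        basis.append(d)
        stored[t] = d                          # retained so unmerging is exact

    merged -= stored[1]                        # reversible: undo task 1 (e.g. GDPR)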
Academic Paper
ArXiv
Importance: 8
Large visual-language models (LVLMs) integrate aligned large language models
(LLMs) with visual modules to process multimodal inputs. However, the safety
mechanisms developed for text-based LLMs do not naturally extend to visual
modalities, leaving LVLMs vulnerable to harmful image inputs. To address this
cross-modal safety gap, we introduce security tensors - trainable input vectors
applied during inference through either the textual or visual modality. These
tensors transfer textual safety alignment to visual processing without
modifying the model's parameters. They are optimized using a curated dataset
containing (i) malicious image-text pairs requiring rejection, (ii) contrastive
benign pairs whose text is structurally similar to malicious queries, guiding
the model to rely on the visual modality, and (iii)
general benign samples preserving model functionality. Experimental results
demonstrate that both textual and visual security tensors significantly enhance
LVLMs' ability to reject diverse harmful visual inputs while maintaining
near-identical performance on benign tasks. Further analysis of hidden-layer
representations reveals that security tensors successfully
activate the language module's textual "safety layers" for visual inputs,
thereby effectively extending text-based safety to the visual modality.
👨🔬 Shen Li, Liuyi Yao, Wujia Niu, Lan Zhang, Yaliang Li
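
Since the tensors are trainable input vectors applied at inference with the model frozen, they resemble soft prompts. A hypothetical sketch under that assumption (shapes, names, and the loss are illustrative only):

    import torch
    import torch.nn as nn

    hidden, n_sec = 64, 8
    frozen_lm = nn.TransformerEncoder(
        nn.TransformerEncoderLayer(d_model=hidden, nhead=4, batch_first=True),
        num_layers=2,
    )
    for p in frozen_lm.parameters():
        p.requires_grad_(False)               # model parameters stay untouched

    security_tensor = nn.Parameter(torch.randn(1, n_sec, hidden) * 0.02)

    def forward_with_security(input_embeds):
        sec = security_tensor.expand(input_embeds.size(0), -1, -1)
        return frozen_lm(torch.cat([sec, input_embeds], dim=1))

    embeds = torch.randn(2, 10, hidden)       # stand-in multimodal embeddings
    loss = forward_with_security(embeds).pow(2).mean()  # stand-in safety loss
    loss.backward()                           # only the security tensor gets grads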
Academic Paper
ArXiv
Importance: 7
Intelligent expansion techniques for ASP-based interactive configuration.
👨🔬 Lucia Balážová, Richard Comploi-Taupe, Susana Hahn, Nicolas Rühling, Gottfried Schenner
Academic Paper
ArXiv
Importance: 7
This study investigates the mechanisms and factors influencing memorization
in fine-tuned large language models (LLMs), with a focus on the medical domain
due to its privacy-sensitive nature. We examine how different aspects of the
fine-tuning process affect a model's propensity to memorize training data,
using the PHEE dataset of pharmacovigilance events. Our research employs two main approaches: a membership inference attack to
detect memorized data, and a generation task with prompted prefixes to assess
verbatim reproduction. We analyze the impact of adapting different weight
matrices in the transformer architecture, the relationship between perplexity
and memorization, and the effect of increasing the rank in low-rank adaptation
(LoRA) fine-tuning. Key findings include: (1) Value and Output matrices contribute more
significantly to memorization compared to Query and Key matrices; (2) Lower
perplexity in the fine-tuned model correlates with increased memorization; (3)
Higher LoRA ranks lead to increased memorization, but with diminishing returns
at higher ranks. These results provide insights into the trade-offs between model performance
and privacy risks in fine-tuned LLMs. Our findings have implications for
developing more effective and responsible strategies for adapting large
language models while managing data privacy concerns.
👨🔬 Danil Savine, Muni Sreenivas Pydi, Jamal Atif, Olivier Cappé
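
One common form of the membership inference signal the abstract mentions is perplexity-based: sequences the fine-tuned model finds unusually unsurprising are flagged as likely training members. A toy sketch (the model and threshold are stand-ins, not the paper's setup):

    import math
    import torch
    import torch.nn.functional as F

    vocab, hidden = 100, 32
    toy_lm = torch.nn.Sequential(             # stand-in for a fine-tuned LLM
        torch.nn.Embedding(vocab, hidden),
        torch.nn.Linear(hidden, vocab),
    )

    def perplexity(token_ids):
        """Perplexity of a sequence under the toy next-token model."""
        logits = toy_lm(token_ids[:-1])
        nll = F.cross_entropy(logits, token_ids[1:])
        return math.exp(nll.item())

    threshold = 95.0                          # calibrated on known non-members
    for seq in [torch.randint(0, vocab, (20,)) for _ in range(5)]:
        ppl = perplexity(seq)
        print(f"ppl={ppl:.1f}  flagged_as_member={ppl < threshold}")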
Academic Paper
ArXiv
Importance: 7
Existing methods for estimating personalized treatment effects typically rely
on structured covariates, limiting their applicability to unstructured data.
Yet, leveraging unstructured data for causal inference has considerable
application potential, for instance in healthcare, where clinical notes or
medical images are abundant. To this end, we first introduce an approximate
'plug-in' method trained directly on the neural representations of unstructured
data. However, when these fail to capture all confounding information, the
method may be subject to confounding bias. We therefore introduce two
theoretically grounded estimators that leverage structured measurements of the
confounders during training, but allow estimating personalized treatment
effects purely from unstructured inputs, while avoiding confounding bias. When
these structured measurements are only available for a non-representative
subset of the data, these estimators may suffer from sampling bias. To address
this, we further introduce a regression-based correction that accounts for the
non-uniform sampling, assuming the sampling mechanism is known or can be
well-estimated. Our experiments on two benchmark datasets show that the plug-in
method, directly trainable on large unstructured datasets, achieves strong
empirical performance across all settings, despite its simplicity.
👨🔬 Henri Arno, Thomas Demeester
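
The plug-in idea can be sketched as two outcome heads fitted on top of neural representations, with the personalized effect read off as their difference. A rough, hypothetical illustration (synthetic data; not the authors' estimators):

    import torch
    import torch.nn as nn

    rep_dim = 128                        # e.g. embeddings of clinical notes
    mu0 = nn.Linear(rep_dim, 1)          # outcome head under control (T=0)
    mu1 = nn.Linear(rep_dim, 1)          # outcome head under treatment (T=1)
    opt = torch.optim.Adam(
        list(mu0.parameters()) + list(mu1.parameters()), lr=1e-2)

    reps = torch.randn(256, rep_dim)     # stand-in encoder representations
    treat = torch.randint(0, 2, (256, 1)).float()
    y = reps[:, :1] + 0.5 * treat        # synthetic outcomes, true effect 0.5

    for _ in range(200):                 # fit each head on its observed arm
        pred = torch.where(treat.bool(), mu1(reps), mu0(reps))
        loss = nn.functional.mse_loss(pred, y)
        opt.zero_grad()
        loss.backward()
        opt.step()

    with torch.no_grad():                # plug-in estimate: tau(x) = mu1 - mu0
        print(mu1(reps).mean() - mu0(reps).mean())  # should approach ~0.5

Note this simple version inherits any confounding left in the representations, which is exactly the gap the paper's structured-measurement estimators address.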
Academic Paper
ArXiv
Importance: 7
Domain shift poses a fundamental challenge in time series analysis, where
models trained on a source domain often fail dramatically when applied to a
target domain with a different yet similar distribution. While current
unsupervised domain adaptation (UDA) methods attempt to align cross-domain
feature distributions, they typically treat features as indivisible entities,
ignoring the intrinsic composition that governs domain adaptation. We introduce
DARSD, a novel UDA framework with theoretical explainability that explicitly
realizes UDA tasks from the perspective of representation space decomposition.
Our core insight is that effective domain adaptation requires not just
alignment, but principled disentanglement of transferable knowledge from mixed
representations. DARSD consists of three synergistic components: (I) An
adversarial learnable common invariant basis that projects original features
into a domain-invariant subspace while preserving semantic content; (II) A
prototypical pseudo-labeling mechanism that dynamically separates target
features based on confidence, curbing error accumulation; (III) A hybrid
contrastive optimization strategy that simultaneously enforces feature
clustering and consistency while mitigating emerging distribution gaps.
Comprehensive experiments conducted on four benchmark datasets (WISDM, HAR,
HHAR, and MFD) demonstrate DARSD's superiority against 12 UDA algorithms,
achieving optimal performance in 35 out of 53 cross-domain scenarios.
👨🔬 Rongyao Cai, Ming Jin, Qingsong Wen, Kexin Zhang
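
One ingredient, the adversarially learned common invariant basis, can be illustrated as projecting source and target features onto a learnable subspace that a domain discriminator cannot exploit. A simplified, hypothetical sketch (not the DARSD training loop):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    feat_dim, sub_dim = 64, 16
    basis = nn.Parameter(torch.randn(feat_dim, sub_dim))  # learnable common basis
    domain_disc = nn.Linear(sub_dim, 1)                   # adversary

    def project(features):
        """Least-squares coordinates of features in span(basis)."""
        gram = basis.T @ basis
        return torch.linalg.solve(gram, basis.T @ features.T).T

    src = torch.randn(32, feat_dim)           # source-domain features
    tgt = torch.randn(32, feat_dim) + 1.0     # shifted target-domain features
    z = torch.cat([project(src), project(tgt)])
    labels = torch.cat([torch.zeros(32, 1), torch.ones(32, 1)])
    disc_loss = F.binary_cross_entropy_with_logits(domain_disc(z), labels)
    # The discriminator minimizes disc_loss; the basis is trained to maximize it
    # (e.g. via gradient reversal), pushing both domains toward one subspace.
    disc_loss.backward()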
Academic Paper
ArXiv
Importance: 6
Recent advances in diffusion-based video generation have enabled
photo-realistic short clips, but current methods still struggle to achieve
multi-modal consistency when jointly generating whole-body motion and natural
speech. Current approaches lack comprehensive evaluation frameworks that assess
both visual and audio quality, and there are insufficient benchmarks for
region-specific performance analysis. To address these gaps, we introduce the
Joint Whole-Body Talking Avatar and Speech Generation Version I (JWB-DH-V1),
comprising a large-scale multi-modal dataset with 10,000 unique identities
across 2 million video samples, and an evaluation protocol for assessing joint
audio-video generation of whole-body animatable avatars. Our evaluation of SOTA
models reveals consistent disparities between face/hand-centric and
whole-body performance, indicating essential areas for future research.
The dataset and evaluation tools are publicly available at
https://github.com/deepreasonings/WholeBodyBenchmark.
👨🔬 Xinhan Di, Kristin Qi, Pengqian Yu