AI News Daily - 2025/6/27

👨‍🔬 Boyu Gou, Zanming Huang, Yuting Ning, Yu Gu, Michael Lin, Weijian Qi, Andrei Kopanev, Botao Yu, Bernal Jiménez Gutiérrez, Yiheng Shu, Chan Hee Song, Jiaman Wu, Shijie Chen, Hanane Nour Moussa, Tianshu Zhang, Jian Xie, Yifei Li, Tianci Xue, Zeyi Liao, Kai Zhang, Boyuan Zheng, Zhaowei Cai, Viktor Rozgic, Morteza Ziyadi, Huan Sun, Yu Su

TITAN: Query-Token based Domain Adaptive Adversarial Learning

Academic paper ArXiv Importance: 8

TITAN uses query-token-based adversarial learning to improve the performance of source-free domain adaptive object detection.

👨‍🔬 Tajamul Ashraf, Janibul Bashir

HalluSegBench: Counterfactual Visual Reasoning for Segmentation Hallucination Evaluation

Academic paper ArXiv Importance: 7

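Introduces HalluSegBench, the first benchmark for evaluating hallucinations in visually grounded segmentation through counterfactual visual reasoning.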

👨‍🔬 Xinzhuo Li, Adheesh Juvekar, Xingyou Liu, Muntasir Wahed, Kiet A. Nguyen, Ismini Lourentzou

PsyLite Technical Report

Academic paper ArXiv Importance: 7

PsyLite, a lightweight psychological counseling model, improves dialogue safety and professionalism through two-stage training.

👨‍🔬 Fangjun Ding, Renyu Zhang, Xinyu Feng, Chengye Xie, Zheng Zhang, Yanting Zhang

Potemkin Understanding in Large Language Models

Academic paper ArXiv Importance: 7

Large language models (LLMs) are regularly evaluated using benchmark datasets. But what justifies making inferences about an LLM's capabilities based on its answers to a curated set of questions? This paper first introduces a formal framework to address this question. The key is to note that the benchmarks used to test LLMs -- such as AP exams -- are also those used to test people. However, this raises an implication: these benchmarks are only valid tests if LLMs misunderstand concepts in ways that mirror human misunderstandings. Otherwise, success on benchmarks only demonstrates potemkin understanding: the illusion of understanding driven by answers irreconcilable with how any human would interpret a concept. We present two procedures for quantifying the existence of potemkins: one using a specially designed benchmark in three domains, the other using a general procedure that provides a lower-bound on their prevalence. We find that potemkins are ubiquitous across models, tasks, and domains. We also find that these failures reflect not just incorrect understanding, but deeper internal incoherence in concept representations.

👨‍🔬 Marina Mancoridis, Bec Weeks, Keyon Vafa, Sendhil Mullainathan

Process mining-driven modeling and simulation to enhance fault diagnosis in cyber-physical systems

Academic paper ArXiv Importance: 7

Combines process mining and stochastic simulation to improve fault diagnosis in cyber-physical systems.

👨‍🔬 Francesco Vitale, Nicola Dall'Ora, Sebastiano Gaiardelli, Enrico Fraccaroli, Nicola Mazzocca, Franco Fummi

"What's Up, Doc?": Analyzing How Users Seek Health Information in Large-Scale Conversational AI Datasets

Academic paper ArXiv Importance: 6

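Analyzes how users seek health information through conversational AI, revealing the limitations of existing models and directions for improvement.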

👨‍🔬 Akshay Paruchuri, Maryam Aziz, Rohit Vartak, Ayman Ali, Best Uchehara, Xin Liu, Ishan Chatterjee, Monica Agrawal

Ad-Hoc Human-AI Coordination Challenge

Academic paper ArXiv Importance: 6

Proposes the AH2AC2 challenge to advance research on human-AI coordination under uncertain information.

👨‍🔬 Tin Dizdarević, Ravi Hammond, Tobias Gessler, Anisoara Calinescu, Jonathan Cook, Matteo Gallici, Andrei Lupu, Jakob Nicolaus Foerster

skLEP: A Slovak General Language Understanding Benchmark

Academic paper ArXiv Importance: 5

Introduces skLEP, the first benchmark for comprehensively evaluating Slovak natural language understanding models.

👨‍🔬 Marek Šuppa, Andrej Ridzik, Daniel Hládek, Tomáš Javůrek, Viktória Ondrejová, Kristína Sásiková, Martin Tamajka, Marián Šimko

🤖 AI News Daily

📊 Today's Trend Summary

The Next Bill Gates or Albert Einstein in AI “Chris Clark” – Yourobot

Ask HN: Is the rate of progress in AI exponential?

Ask HN: What's the pain using current AI algorithms?

NLP, AI, ML, bots – a passing trend or much more? What's your take on this?

Common Lisp + Machine Learning Internship at Google (Mountain View, CA)

Ask HN: Dipping my toes with artificial intelligence and what to expect? (CS)

The AI Crackpot Index

Ask HN: Thoughts on grad school? (CS PhD)

Ask HN: Anyone concerned about NYC Local Law 144?

50% Cheaper GPUs for cloud-computing / Saving devs 50% compared to AWS

Show HN: Startup Raising capital through Book Sales

Bioinformatician

microsoft/JARVIS

qdrant/qdrant

PaddlePaddle/Paddle

lucidrains/vit-pytorch

shap/shap

junyanz/pytorch-CycleGAN-and-pix2pix

ashishpatel26/500-AI-Machine-learning-Deep-learning-Computer-vision-NLP-Projects-with-code

modular/modular

mxgmn/WaveFunctionCollapse

fastai/fastbook

trekhleb/homemade-machine-learning

HumanSignal/labelImg

mTSBench: Benchmarking Multivariate Time Series Anomaly Detection and Model Selection at Scale

Whole-Body Conditioned Egocentric Video Prediction

WorldVLA: Towards Autoregressive Action World Model

Mind2Web 2: Evaluating Agentic Search with Agent-as-a-Judge


📅 Daily Digest Archive