| Main Authors: | Huang, Yue, Jiang, Zhengzhe, Luo, Xiaonan, Guo, Kehan, Zhuang, Haomin, Zhou, Yujun, Yuan, Zhengqing, Sun, Xiaoqi, Schleinitz, Jules, Wang, Yanbo, Zhang, Shuhao, Surve, Mihir, Chawla, Nitesh V, Wiest, Olaf, Zhang, Xiangliang |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2509.16543 |
Similar Items
Artificial Intelligence in Spectroscopy: Advancing Chemistry from Prediction to Generation and Beyond
by: Guo, Kehan, et al.
Published: (2025)
AutoLLMResearch: Training Research Agents for Automating LLM Experiment Configuration - Learning from Cheap, Optimizing Expensive
by: Guo, Taicheng, et al.
Published: (2026)
Reliable Control-Point Selection for Steering Reasoning in Large Language Models
by: Zhuang, Haomin, et al.
Published: (2026)
ReactionTeam: Teaming Experts for Divergent Thinking Beyond Typical Reaction Patterns
by: Guo, Taicheng, et al.
Published: (2023)
ChemHGNN: A Hierarchical Hypergraph Neural Network for Reaction Virtual Screening and Discovery
by: Huang, Xiaobao, et al.
Published: (2025)
AdaReasoner: Adaptive Reasoning Enables More Flexible Thinking in Large Language Models
by: Wang, Xiangqi, et al.
Published: (2025)
Are we making much progress? Revisiting chemical reaction yield prediction from an imbalanced regression perspective
by: Ma, Yihong, et al.
Published: (2024)
Defending Jailbreak Prompts via In-Context Adversarial Game
by: Zhou, Yujun, et al.
Published: (2024)
Dual Optimal: Make Your LLM Peer-like with Dignity
by: Wang, Xiangqi, et al.
Published: (2026)
Social Science Meets LLMs: How Reliable Are Large Language Models in Social Simulations?
by: Huang, Yue, et al.
Published: (2024)
Large Language Model based Multi-Agents: A Survey of Progress and Challenges
by: Guo, Taicheng, et al.
Published: (2024)
PolicyLLM: Towards Excellent Comprehension of Public Policy for Large Language Models
by: Bao, Han, et al.
Published: (2026)
Beyond Single-Value Metrics: Evaluating and Enhancing LLM Unlearning with Cognitive Diagnosis
by: Lang, Yicheng, et al.
Published: (2025)
Causally-Enhanced Reinforcement Policy Optimization
by: Wang, Xiangqi, et al.
Published: (2025)
AIRGuard: Guarding Agent Actions with Runtime Authority Control
by: Qin, Suliu, et al.
Published: (2026)
AgentClick: A Skill-Based Human-in-the-Loop Review Layer for Terminal AI Agents
by: Zhuang, Haomin, et al.
Published: (2026)
SEUF: Is Unlearning One Expert Enough for Mixture-of-Experts LLMs?
by: Zhuang, Haomin, et al.
Published: (2024)
MolX: Enhancing Large Language Models for Molecular Understanding With A Multi-Modal Extension
by: Le, Khiem, et al.
Published: (2024)
Exploring Multi-Temperature Strategies for Token- and Rollout-Level Control in RLVR
by: Zhuang, Haomin, et al.
Published: (2025)
Capability-Oriented Training Induced Alignment Risk
by: Zhou, Yujun, et al.
Published: (2026)
Dissecting Logical Reasoning in LLMs: A Fine-Grained Evaluation and Supervision Study
by: Zhou, Yujun, et al.
Published: (2025)
UGMAE: A Unified Framework for Graph Masked Autoencoders
by: Tian, Yijun, et al.
Published: (2024)
AgentTrap: Measuring Runtime Trust Failures in Third-Party Agent Skills
by: Zhuang, Haomin, et al.
Published: (2026)
SenseMath: Do LLMs Have Number Sense? Evaluating Shortcut Use, Judgment, and Generation
by: Zhuang, Haomin, et al.
Published: (2026)
ProbeLLM: Automating Principled Diagnosis of LLM Failures
by: Huang, Yue, et al.
Published: (2026)
Emergent Social Intelligence Risks in Generative Multi-Agent Systems
by: Huang, Yue, et al.
Published: (2026)
BenchmarkCards: Standardized Documentation for Large Language Model Benchmarks
by: Sokol, Anna, et al.
Published: (2024)
Guardian-as-an-Advisor: Advancing Next-Generation Guardian Models for Trustworthy LLMs
by: Huang, Yue, et al.
Published: (2026)
Synthetic Interaction Data for Scalable Personalization in Large Language Models
by: Ma, Yuchen, et al.
Published: (2026)
Quasiparticle Interference Kernel Extraction with Variational Autoencoders via Latent Alignment
by: Ji, Yingshuai, et al.
Published: (2025)
LabSafety Bench: Benchmarking LLMs on Safety Issues in Scientific Labs
by: Zhou, Yujun, et al.
Published: (2024)
Prioritization First, Principles Second: An Adaptive Interpretation of Helpful, Honest, and Harmless Principles
by: Huang, Yue, et al.
Published: (2025)
Justice or Prejudice? Quantifying Biases in LLM-as-a-Judge
by: Ye, Jiayi, et al.
Published: (2024)
Better Datasets Start From RefineLab: Automatic Optimization for High-Quality Dataset Refinement
by: Luo, Xiaonan, et al.
Published: (2025)
ScholarChemQA: Unveiling the Power of Language Models in Chemical Research Question Answering
by: Chen, Xiuying, et al.
Published: (2024)
SocialMaze: A Benchmark for Evaluating Social Reasoning in Large Language Models
by: Xu, Zixiang, et al.
Published: (2025)
SceMQA: A Scientific College Entrance Level Multimodal Question Answering Benchmark
by: Liang, Zhenwen, et al.
Published: (2024)
SkillGen: Verified Inference-Time Agent Skill Synthesis
by: Ma, Yuchen, et al.
Published: (2026)
Fast Explanations via Policy Gradient-Optimized Explainer
by: Pan, Deng, et al.
Published: (2024)
Conformalized Selective Regression
by: Sokol, Anna, et al.
Published: (2024)