Library Catalog

Bibliographic Details
Main Authors:	Huang, Yue; Jiang, Zhengzhe; Luo, Xiaonan; Guo, Kehan; Zhuang, Haomin; Zhou, Yujun; Yuan, Zhengqing; Sun, Xiaoqi; Schleinitz, Jules; Wang, Yanbo; Zhang, Shuhao; Surve, Mihir; Chawla, Nitesh V; Wiest, Olaf; Zhang, Xiangliang
Format:	Preprint
Published:	2025
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2509.16543

Similar Items

Artificial Intelligence in Spectroscopy: Advancing Chemistry from Prediction to Generation and Beyond
by: Guo, Kehan, et al.
Published: (2025)

AutoLLMResearch: Training Research Agents for Automating LLM Experiment Configuration - Learning from Cheap, Optimizing Expensive
by: Guo, Taicheng, et al.
Published: (2026)

Reliable Control-Point Selection for Steering Reasoning in Large Language Models
by: Zhuang, Haomin, et al.
Published: (2026)

ReactionTeam: Teaming Experts for Divergent Thinking Beyond Typical Reaction Patterns
by: Guo, Taicheng, et al.
Published: (2023)

ChemHGNN: A Hierarchical Hypergraph Neural Network for Reaction Virtual Screening and Discovery
by: Huang, Xiaobao, et al.
Published: (2025)

AdaReasoner: Adaptive Reasoning Enables More Flexible Thinking in Large Language Models
by: Wang, Xiangqi, et al.
Published: (2025)

Are we making much progress? Revisiting chemical reaction yield prediction from an imbalanced regression perspective
by: Ma, Yihong, et al.
Published: (2024)

Defending Jailbreak Prompts via In-Context Adversarial Game
by: Zhou, Yujun, et al.
Published: (2024)

Dual Optimal: Make Your LLM Peer-like with Dignity
by: Wang, Xiangqi, et al.
Published: (2026)

Social Science Meets LLMs: How Reliable Are Large Language Models in Social Simulations?
by: Huang, Yue, et al.
Published: (2024)

Large Language Model based Multi-Agents: A Survey of Progress and Challenges
by: Guo, Taicheng, et al.
Published: (2024)

PolicyLLM: Towards Excellent Comprehension of Public Policy for Large Language Models
by: Bao, Han, et al.
Published: (2026)

Beyond Single-Value Metrics: Evaluating and Enhancing LLM Unlearning with Cognitive Diagnosis
by: Lang, Yicheng, et al.
Published: (2025)

Causally-Enhanced Reinforcement Policy Optimization
by: Wang, Xiangqi, et al.
Published: (2025)

AIRGuard: Guarding Agent Actions with Runtime Authority Control
by: Qin, Suliu, et al.
Published: (2026)

AgentClick: A Skill-Based Human-in-the-Loop Review Layer for Terminal AI Agents
by: Zhuang, Haomin, et al.
Published: (2026)

SEUF: Is Unlearning One Expert Enough for Mixture-of-Experts LLMs?
by: Zhuang, Haomin, et al.
Published: (2024)

MolX: Enhancing Large Language Models for Molecular Understanding With A Multi-Modal Extension
by: Le, Khiem, et al.
Published: (2024)

Exploring Multi-Temperature Strategies for Token- and Rollout-Level Control in RLVR
by: Zhuang, Haomin, et al.
Published: (2025)

Capability-Oriented Training Induced Alignment Risk
by: Zhou, Yujun, et al.
Published: (2026)

Dissecting Logical Reasoning in LLMs: A Fine-Grained Evaluation and Supervision Study
by: Zhou, Yujun, et al.
Published: (2025)

UGMAE: A Unified Framework for Graph Masked Autoencoders
by: Tian, Yijun, et al.
Published: (2024)

AgentTrap: Measuring Runtime Trust Failures in Third-Party Agent Skills
by: Zhuang, Haomin, et al.
Published: (2026)

SenseMath: Do LLMs Have Number Sense? Evaluating Shortcut Use, Judgment, and Generation
by: Zhuang, Haomin, et al.
Published: (2026)

ProbeLLM: Automating Principled Diagnosis of LLM Failures
by: Huang, Yue, et al.
Published: (2026)

Emergent Social Intelligence Risks in Generative Multi-Agent Systems
by: Huang, Yue, et al.
Published: (2026)

BenchmarkCards: Standardized Documentation for Large Language Model Benchmarks
by: Sokol, Anna, et al.
Published: (2024)

Guardian-as-an-Advisor: Advancing Next-Generation Guardian Models for Trustworthy LLMs
by: Huang, Yue, et al.
Published: (2026)

Synthetic Interaction Data for Scalable Personalization in Large Language Models
by: Ma, Yuchen, et al.
Published: (2026)

Quasiparticle Interference Kernel Extraction with Variational Autoencoders via Latent Alignment
by: Ji, Yingshuai, et al.
Published: (2025)

LabSafety Bench: Benchmarking LLMs on Safety Issues in Scientific Labs
by: Zhou, Yujun, et al.
Published: (2024)

Prioritization First, Principles Second: An Adaptive Interpretation of Helpful, Honest, and Harmless Principles
by: Huang, Yue, et al.
Published: (2025)

Justice or Prejudice? Quantifying Biases in LLM-as-a-Judge
by: Ye, Jiayi, et al.
Published: (2024)

Better Datasets Start From RefineLab: Automatic Optimization for High-Quality Dataset Refinement
by: Luo, Xiaonan, et al.
Published: (2025)

ScholarChemQA: Unveiling the Power of Language Models in Chemical Research Question Answering
by: Chen, Xiuying, et al.
Published: (2024)

SocialMaze: A Benchmark for Evaluating Social Reasoning in Large Language Models
by: Xu, Zixiang, et al.
Published: (2025)

SceMQA: A Scientific College Entrance Level Multimodal Question Answering Benchmark
by: Liang, Zhenwen, et al.
Published: (2024)

SkillGen: Verified Inference-Time Agent Skill Synthesis
by: Ma, Yuchen, et al.
Published: (2026)

Fast Explanations via Policy Gradient-Optimized Explainer
by: Pan, Deng, et al.
Published: (2024)

Conformalized Selective Regression
by: Sokol, Anna, et al.
Published: (2024)