:: Library Catalog

Bibliographic Details
Main Authors:	Zhu, Erle; Liu, Yadi; Zhang, Zhe; Li, Xujun; Zhou, Jin; Yu, Xinjie; Huang, Minlie; Wang, Hongning
Format:	Preprint
Published:	2025
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2501.10768

Similar Items

SkillEvolver: Skill Learning as a Meta-Skill
by: Zhang, Genrui, et al.
Published: (2026)

MAPS: Multi-Agent Personality Shaping for Collaborative Reasoning
by: Zhang, Jian, et al.
Published: (2025)

Grounding LLMs in Scientific Discovery via Embodied Actions
by: Zhang, Bo, et al.
Published: (2026)

Guiding not Forcing: Enhancing the Transferability of Jailbreaking Attacks on LLMs via Removing Superfluous Constraints
by: Yang, Junxiao, et al.
Published: (2025)

Trust-Region Adaptive Policy Optimization
by: Su, Mingyu, et al.
Published: (2025)

LogicGame: Benchmarking Rule-Based Reasoning Abilities of Large Language Models
by: Gui, Jiayi, et al.
Published: (2024)

PhysUniBench: A Multi-Modal Physics Reasoning Benchmark at Undergraduate Level
by: Wang, Lintao, et al.
Published: (2025)

BARREL: Boundary-Aware Reasoning for Factual and Reliable LRMs
by: Yang, Junxiao, et al.
Published: (2025)

Geometric Mixture-of-Experts with Curvature-Guided Adaptive Routing for Graph Representation Learning
by: Cao, Haifang, et al.
Published: (2026)

MAPS: Multi-Fidelity AI-Augmented Photonic Simulation and Inverse Design Infrastructure
by: Ma, Pingchuan, et al.
Published: (2025)

Geo-Expert: Towards Expert-Level Geological Reasoning via Parameter-Efficient Fine-Tuning
by: Guo, Chenyou, et al.
Published: (2026)

Hierarchical Attacks for Multi-Modal Multi-Agent Reasoning
by: Zhou, Hao, et al.
Published: (2026)

AutoDetect: Towards a Unified Framework for Automated Weakness Detection in Large Language Models
by: Cheng, Jiale, et al.
Published: (2024)

SPaR: Self-Play with Tree-Search Refinement to Improve Instruction-Following in Large Language Models
by: Cheng, Jiale, et al.
Published: (2024)

LongSafety: Evaluating Long-Context Safety of Large Language Models
by: Lu, Yida, et al.
Published: (2025)

Large Images are Gaussians: High-Quality Large Image Representation with Levels of 2D Gaussian Splatting
by: Zhu, Lingting, et al.
Published: (2025)

Stop Before You Fail: Operational Capability Boundaries for Mitigating Unproductive Reasoning in Large Reasoning Models
by: Zhang, Qingjie, et al.
Published: (2025)

HumanAesExpert: Advancing a Multi-Modality Foundation Model for Human Image Aesthetic Assessment
by: Liao, Zhichao, et al.
Published: (2025)

SeePhys Pro: Diagnosing Modality Transfer and Blind-Training Effects in Multimodal RLVR for Physics Reasoning
by: Xiang, Kun, et al.
Published: (2026)

Benchmarking Retrieval-Augmented Generation in Multi-Modal Contexts
by: Liu, Zhenghao, et al.
Published: (2025)

UrbanMoE: A Sparse Multi-Modal Mixture-of-Experts Framework for Multi-Task Urban Region Profiling
by: Liu, Pingping, et al.
Published: (2026)

Data-Efficient RLVR via Off-Policy Influence Guidance
by: Zhu, Erle, et al.
Published: (2025)

FinSearchComp: Towards a Realistic, Expert-Level Evaluation of Financial Search and Reasoning
by: Hu, Liang, et al.
Published: (2025)

MedXpertQA: Benchmarking Expert-Level Medical Reasoning and Understanding
by: Zuo, Yuxin, et al.
Published: (2025)

Knowledge Graph-Driven Expert-Level Reasoning for Neuroscience
by: Stephen, Jake, et al.
Published: (2026)

Benchmarking Complex Instruction-Following with Multiple Constraints Composition
by: Wen, Bosi, et al.
Published: (2024)

Agentic Active Omni-Modal Perception for Multi-Hop Audio-Visual Reasoning
by: Xu, Ke, et al.
Published: (2026)

DPRM: A Dual Implicit Process Reward Model in Multi-Hop Question Answering
by: Wang, Xinyi, et al.
Published: (2025)

DSBench: How Far Are Data Science Agents from Becoming Data Science Experts?
by: Jing, Liqiang, et al.
Published: (2024)

IntentionESC: An Intention-Centered Framework for Enhancing Emotional Support in Dialogue Systems
by: Zhang, Xinjie, et al.
Published: (2025)

CritiqueLLM: Towards an Informative Critique Generation Model for Evaluation of Large Language Model Generation
by: Ke, Pei, et al.
Published: (2023)

Multi-Modal Time Series Prediction via Mixture of Modulated Experts
by: Zhang, Lige, et al.
Published: (2026)

Hierarchical Time-Aware Mixture of Experts for Multi-Modal Sequential Recommendation
by: Zhang, Shengzhe, et al.
Published: (2025)

LASA: Language-Agnostic Semantic Alignment at the Semantic Bottleneck for LLM Safety
by: Yang, Junxiao, et al.
Published: (2026)

When Does Persona Prompting Actually Help? A Retrieval and Metric Analysis of Expert Role Injection in LLMs
by: Xiao, Shuai, et al.
Published: (2026)

ProReason: Multi-Modal Proactive Reasoning with Decoupled Eyesight and Wisdom
by: Zhou, Jingqi, et al.
Published: (2024)

Asymmetric Generative Recommendation via Multi-Expert Projection and Multi-Faceted Hierarchical Quantization
by: Huang, Bin, et al.
Published: (2026)

MSQA: Benchmarking LLMs on Graduate-Level Materials Science Reasoning and Knowledge
by: Cheung, Jerry Junyang, et al.
Published: (2025)

How Far Are VLMs from Privacy Awareness in the Physical World? An Empirical Study
by: Wang, Junran, et al.
Published: (2026)

RE-MCDF: Closed-Loop Multi-Expert LLM Reasoning for Knowledge-Grounded Clinical Diagnosis
by: Shen, Shaowei, et al.
Published: (2026)