| Main Authors: | Yan, Weixiang, Liu, Haitian, Wu, Tengxiao, Chen, Qian, Wang, Wen, Chai, Haoyuan, Wang, Jiayi, Zhao, Weishan, Zhang, Yixin, Zhang, Renjun, Zhu, Li, Zhao, Xuandong |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2406.13890 |
Similar Items
CodeScope: An Execution-based Multilingual Multitask Multidimensional Benchmark for Evaluating LLMs on Code Understanding and Generation
by: Yan, Weixiang, et al.
Published: (2023)
CodeHalu: Investigating Code Hallucinations in LLMs via Execution-based Verification
by: Tian, Yuchen, et al.
Published: (2024)
xbench: Tracking Agents Productivity Scaling with Profession-Aligned Real-World Evaluations
by: Chen, Kaiyuan, et al.
Published: (2025)
A Case of AL Amyloidosis With Hepatomegaly as the Main Clinical Manifestation
by: Shuchen Dong, et al.
Published: (2025)
Self-Sovereign Agent
by: Qu, Wenjie, et al.
Published: (2026)
Genetic and Clinical Landscape of Chinese Frontotemporal Dementia: Dominance of TBK1 and OPTN Mutations
by: Haitian Nan
Published: (2025)
CureAgent: A Training-Free Executor-Analyst Framework for Clinical Reasoning
by: Xie, Ting-Ting, et al.
Published: (2025)
Evolving Interactive Diagnostic Agents in a Virtual Clinical Environment
by: Qiu, Pengcheng, et al.
Published: (2025)
Agent Skills for Large Language Models: Architecture, Acquisition, Security, and the Path Forward
by: Xu, Renjun, et al.
Published: (2026)
AgentSynth: Scalable Task Generation for Generalist Computer-Use Agents
by: Xie, Jingxu, et al.
Published: (2025)
Clinical significance and immune microenvironment association of cuproptosis‐related genes in pan‐cancer
by: Xinyu Ge, et al.
Published: (2025)
PulseMind: A Multi-Modal Medical Model for Real-World Clinical Diagnosis
by: Xu, Jiao, et al.
Published: (2026)
Methodology and Real-World Applications of Dynamic Uncertain Causality Graph for Clinical Diagnosis with Explainability and Invariance
by: Zhang, Zhan, et al.
Published: (2024)
DV-World: Benchmarking Data Visualization Agents in Real-World Scenarios
by: Meng, Jinxiang, et al.
Published: (2026)
GumbelSoft: Diversified Language Model Watermarking via the GumbelMax-trick
by: Fu, Jiayi, et al.
Published: (2024)
An Electrochemical Oxidation and Intercalation Strategy for Iodide Removal Using LDHs
by: Xiaomeng Guo, et al.
Published: (2024)
Reward Shaping to Mitigate Reward Hacking in RLHF
by: Fu, Jiayi, et al.
Published: (2025)
Advancing Precise Outline-Conditioned Text Generation with Task Duality and Explicit Outline Control
by: Li, Yunzhe, et al.
Published: (2023)
ENPMR-Bench: Benchmarking Proactive Memory Retrieval for Emotional Support Agents
by: Fu, Xing, et al.
Published: (2026)
Position: LLM Watermarking Should Align Stakeholders' Incentives for Practical Adoption
by: Liu, Yepeng, et al.
Published: (2025)
Amide Proton Transfer‐Weighted MRI, Associations with Clinical Severity and Prognosis in Ischemic Strokes
by: Le Zhou, et al.
Published: (2024)
M3PD Dataset: Dual-view Photoplethysmography (PPG) Using Front-and-rear Cameras of Smartphones in Lab and Clinical Settings
by: Tang, Jiankai, et al.
Published: (2025)
WorldPlay: Towards Long-Term Geometric Consistency for Real-Time Interactive World Modeling
by: Sun, Wenqiang, et al.
Published: (2025)
The Role of Departmental Secretariats
by: Podger, Andrew
Published: (2013)
Managing for Departmental Success
by: Stefan Niewiesk, et al.
Published: (2025)
Real-Time Reasoning Agents in Evolving Environments
by: Wen, Yule, et al.
Published: (2025)
MARS: Multi-Agent Robotic System with Multimodal Large Language Models for Assistive Intelligence
by: Gao, Renjun
Published: (2025)
Permute-and-Flip: An optimally stable and watermarkable decoder for LLMs
by: Zhao, Xuandong, et al.
Published: (2024)
Beyond Literal Descriptions: Understanding and Locating Open-World Objects Aligned with Human Intentions
by: Wang, Wenxuan, et al.
Published: (2024)
RecipeGen: A Step-Aligned Multimodal Benchmark for Real-World Recipe Generation
by: Zhang, Ruoxuan, et al.
Published: (2025)
Real‐World Analysis of Treatment Patterns in Limited‐Stage Small Cell Lung Cancer: Implications for Clinical Practice
by: Siyuan Yu, et al.
Published: (2025)
RealBench: A Repo-Level Code Generation Benchmark Aligned with Real-World Software Development Practices
by: Li, Jia, et al.
Published: (2026)
Clinical glycoproteomics: methods and diseases
by: Yujia Wang, et al.
Published: (2024)
ReEvalMed: Rethinking Medical Report Evaluation by Aligning Metrics with Real-World Clinical Judgment
by: Li, Ruochen, et al.
Published: (2025)
APEX: Empowering LLMs with Physics-Based Task Planning for Real-time Insight
by: Huang, Wanjing, et al.
Published: (2025)
ClinicalAgents: Multi-Agent Orchestration for Clinical Decision Making with Dual-Memory
by: Ge, Zhuohan, et al.
Published: (2026)
Large Language Model Agents Are Not Always Faithful Self-Evolvers
by: Zhao, Weixiang, et al.
Published: (2026)
Making Bias Non-Predictive: Training Robust LLM Reasoning via Reinforcement Learning
by: Wang, Qian, et al.
Published: (2026)
An Efficient CRISPR/Cas Cooperative Shearing Platform for Clinical Diagnostics Applications
by: Junhong Zhao, et al.
Published: (2024)