| Main Authors: | Duan, Haodong, Fang, Xinyu, Yang, Junming, Zhao, Xiangyu, Qiao, Yuxuan, Li, Mo, Agarwal, Amit, Chen, Zhe, Chen, Lin, Liu, Yuan, Ma, Yubo, Sun, Hailong, Zhang, Yifan, Lu, Shiyin, Wong, Tack Hwa, Wang, Weiyun, Zhou, Peiheng, Li, Xiaozhe, Fu, Chaoyou, Cui, Junbo, Chen, Jixuan, Song, Enxin, Mao, Song, Ding, Shengyuan, Liang, Tianhao, Zhang, Zicheng, Dong, Xiaoyi, Zang, Yuhang, Zhang, Pan, Wang, Jiaqi, Lin, Dahua, Chen, Kai |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2407.11691 |
Similar Items
OPT-BENCH: Evaluating LLM Agent on Large-Scale Search Spaces Optimization Problems
by: Li, Xiaozhe, et al.
Published: (2025)
OPT-BENCH: Evaluating the Iterative Self-Optimization of LLM Agents in Large-Scale Search Spaces
by: Li, Xiaozhe, et al.
Published: (2026)
UniDial-EvalKit: A Unified Toolkit for Evaluating Multi-Faceted Conversational Abilities
by: Jia, Qi, et al.
Published: (2026)
Prism: A Framework for Decoupling and Assessing the Capabilities of VLMs
by: Qiao, Yuxuan, et al.
Published: (2024)
Creation-MMBench: Assessing Context-Aware Creative Intelligence in MLLM
by: Fang, Xinyu, et al.
Published: (2025)
Ada-LEval: Evaluating long-context LLMs with length-adaptable benchmarks
by: Wang, Chonghua, et al.
Published: (2024)
Extracellular vesicles—Potential link between periodontal disease and diabetic complications
by: Shengyuan Huang, et al.
Published: (2024)
NP-Engine: Empowering Optimization Reasoning in Large Language Models with Verifiable Synthetic NP Problems
by: Li, Xiaozhe, et al.
Published: (2025)
ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs
by: Zhuo, Jingming, et al.
Published: (2024)
MM-HELIX: Boosting Multimodal Long-Chain Reflective Reasoning with Holistic Platform and Adaptive Hybrid Policy Optimization
by: Zhao, Xiangyu, et al.
Published: (2025)
OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference
by: Zhao, Xiangyu, et al.
Published: (2025)
Forge: Quality-Aware Reinforcement Learning for NP-Hard Optimization in LLMs
by: Li, Xiaozhe, et al.
Published: (2026)
Beyond Mode Collapse: Distribution Matching for Diverse Reasoning
by: Li, Xiaozhe, et al.
Published: (2026)
SciEvalKit: An Open-source Evaluation Toolkit for Scientific General Intelligence
by: Wang, Yiheng, et al.
Published: (2025)
InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model
by: Zang, Yuhang, et al.
Published: (2025)
SPARK: Synergistic Policy And Reward Co-Evolving Framework
by: Liu, Ziyu, et al.
Published: (2025)
A Rare Case of Disseminated Extrapulmonary Tuberculosis Diagnosed by Endoscopic Ultrasonography‐Guided Fine‐Needle Biopsy: A Case Report
by: Yi‐Lin Lin, et al.
Published: (2026)
MM-IFEngine: Towards Multimodal Instruction Following
by: Ding, Shengyuan, et al.
Published: (2025)
Timely Machine: Awareness of Time Makes Test-Time Scaling Agentic
by: Ma, Yichuan, et al.
Published: (2026)
Are We on the Right Way for Evaluating Large Vision-Language Models?
by: Chen, Lin, et al.
Published: (2024)
Abductive Inference in Retrieval-Augmented Language Models: Generating and Validating Missing Premises
by: Lin, Shiyin
Published: (2025)
LLM-Driven Adaptive Source-Sink Identification and False Positive Mitigation for Static Analysis
by: Lin, Shiyin
Published: (2025)
Hybrid Fuzzing with LLM-Guided Input Mutation and Semantic Feedback
by: Lin, Shiyin
Published: (2025)
Trust Your Critic: Robust Reward Modeling and Reinforcement Learning for Faithful Image Editing and Generation
by: Zhao, Xiangyu, et al.
Published: (2026)
Information Density Principle for MLLM Benchmarks
by: Li, Chunyi, et al.
Published: (2025)
Sea–Land Segmentation Dataset Sources
by: Zhang, Jixuan
Published: (2025)
OmniEvalKit: A Modular, Lightweight Toolbox for Evaluating Large Language Model and its Omni-Extensions
by: Zhang, Yi-Kai, et al.
Published: (2024)
MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding
by: Fang, Xinyu, et al.
Published: (2024)
From Pixels to Graphs: Open-Vocabulary Scene Graph Generation with Vision-Language Models
by: Li, Rongjie, et al.
Published: (2024)
ViSAudio: End-to-End Video-Driven Binaural Spatial Audio Generation
by: Zhang, Mengchen, et al.
Published: (2025)
Mask-DPO: Generalizable Fine-grained Factuality Alignment of LLMs
by: Gu, Yuzhe, et al.
Published: (2025)
Decentralized model reference adaptive control for interconnected systems with time‐varying delays and unknown dead‐zone inputs
by: Chen Yang, et al.
Published: (2024)
MathBench: Evaluating the Theory and Application Proficiency of LLMs with a Hierarchical Mathematics Benchmark
by: Liu, Hongwei, et al.
Published: (2024)
Diabetic Wound Repair: From Mechanism to Therapeutic Opportunities
by: Renyuan Wang, et al.
Published: (2025)
MemCoT: Test-Time Scaling through Memory-Driven Chain-of-Thought
by: Lei, Haodong, et al.
Published: (2026)
High‐Throughput Sorting and Single‐Cell Mechanotyping by Hydrodynamic Sorting‐Mechanotyping Cytometry
by: Yao Chen, et al.
Published: (2024)
Toward Stable Value Alignment: Introducing Independent Modules for Consistent Value Guidance
by: Chen, Wenhao, et al.
Published: (2026)
What and When to Distill: Selective Hindsight Distillation for Multi-Turn Agents
by: Li, Xiaozhe, et al.
Published: (2026)
From Pets to Robots: MojiKit as a Data-Informed Toolkit for Affective HRI Design
by: He, Liwen, et al.
Published: (2026)
NepTrain and NepTrainKit: Automated Active Learning and Visualization Toolkit for Neuroevolution Potentials
by: Chen, Chengbing, et al.
Published: (2025)