| Main Authors: | Wu, Siwei, Peng, Zhongyuan, Du, Xinrun, Zheng, Tuney, Liu, Minghao, Wu, Jialong, Ma, Jiachen, Li, Yizhi, Yang, Jian, Zhou, Wangchunshu, Lin, Qunshu, Zhao, Junbo, Zhang, Zhaoxiang, Huang, Wenhao, Zhang, Ge, Lin, Chenghua, Liu, J. H. |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2410.13639 |
Similar Items
Overview of the NLPCC 2024 Shared Task on Chinese Metaphor Generation
by: Qu, Xingwei, et al.
Published: (2024)
Reverse-Engineered Reasoning for Open-Ended Generation
by: Wang, Haozhe, et al.
Published: (2025)
DocMMIR: A Framework for Document Multi-modal Information Retrieval
by: Li, Zirui, et al.
Published: (2025)
MMRA: A Benchmark for Evaluating Multi-Granularity and Multi-Image Relational Association Capabilities in Large Visual Language Models
by: Wu, Siwei, et al.
Published: (2024)
Overview of the NLPCC 2025 Shared Task: Gender Bias Mitigation Challenge
by: Li, Yizhi, et al.
Published: (2025)
Scaling Test-time Compute for LLM Agents
by: Zhu, King, et al.
Published: (2025)
MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series
by: Zhang, Ge, et al.
Published: (2024)
LIME: Less Is More for MLLM Evaluation
by: Zhu, King, et al.
Published: (2024)
Encyclo-K: Evaluating LLMs with Dynamically Composed Knowledge Statements
by: Liang, Yiming, et al.
Published: (2025)
AutoKaggle: A Multi-Agent Framework for Autonomous Data Science Competitions
by: Li, Ziming, et al.
Published: (2024)
KOR-Bench: Benchmarking Language Models on Knowledge-Orthogonal Reasoning Tasks
by: Ma, Kaijing, et al.
Published: (2024)
Kun: Answer Polishment for Chinese Self-Alignment with Instruction Back-Translation
by: Zheng, Tianyu, et al.
Published: (2024)
MMTE: Corpus and Metrics for Evaluating Machine Translation Quality of Metaphorical Language
by: Wang, Shun, et al.
Published: (2024)
COIG-P: A High-Quality and Large-Scale Chinese Preference Dataset for Alignment with Human Values
by: P Team, et al.
Published: (2025)
I-SHEEP: Self-Alignment of LLM from Scratch through an Iterative Self-Enhancement Paradigm
by: Liang, Yiming, et al.
Published: (2024)
Why Tropical Cyclones Over Oceanic Cyclonic Eddies Can Be Intensified in Global Basins
by: Lingwei Wu, et al.
Published: (2026)
SciMMIR: Benchmarking Scientific Multi-modal Information Retrieval
by: Wu, Siwei, et al.
Published: (2024)
On The Planning Abilities of OpenAI's o1 Models: Feasibility, Optimality, and Generalizability
by: Wang, Kevin, et al.
Published: (2024)
TreePO: Bridging the Gap of Policy Optimization and Efficacy and Inference Efficiency with Heuristic Tree-based Modeling
by: Li, Yizhi, et al.
Published: (2025)
Sci-Reasoning: A Dataset Decoding AI Innovation Patterns
by: Liu, Jiachen, et al.
Published: (2026)
MAmmoTH2: Scaling Instructions from the Web
by: Yue, Xiang, et al.
Published: (2024)
OmniBench: Towards The Future of Universal Omni-Language Models
by: Li, Yizhi, et al.
Published: (2024)
Evaluation of OpenAI o1: Opportunities and Challenges of AGI
by: Zhong, Tianyang, et al.
Published: (2024)
FormalMATH: Benchmarking Formal Mathematical Reasoning of Large Language Models
by: Yu, Zhouliang, et al.
Published: (2025)
SCALER:Synthetic Scalable Adaptive Learning Environment for Reasoning
by: Xu, Caijun, et al.
Published: (2026)
LongIns: A Challenging Long-context Instruction-based Exam for LLMs
by: Gavin, Shawn, et al.
Published: (2024)
CodeCriticBench: A Holistic Code Critique Benchmark for Large Language Models
by: Zhang, Alexander, et al.
Published: (2025)
Objaverse++: Curated 3D Object Dataset with Quality Annotations
by: Lin, Chendi, et al.
Published: (2025)
Quantization for OpenAI's Whisper Models: A Comparative Analysis
by: Andreyev, Allison
Published: (2025)
MIO: A Foundation Model on Multimodal Tokens
by: Wang, Zekun, et al.
Published: (2024)
A Self-Evolving Framework for Efficient Terminal Agents via Observational Context Compression
by: Ren, Jincheng, et al.
Published: (2026)
Large-Scale Terminal Agentic Trajectory Generation from Dockerized Environments
by: Wu, Siwei, et al.
Published: (2026)
ContrastScore: Towards Higher Quality, Less Biased, More Efficient Evaluation Metrics with Contrastive Evaluation
by: Wang, Xiao, et al.
Published: (2025)
VideoScore2: Think before You Score in Generative Video Evaluation
by: He, Xuan, et al.
Published: (2025)
Does Table Source Matter? Benchmarking and Improving Multimodal Scientific Table Understanding and Reasoning
by: Yang, Bohao, et al.
Published: (2025)
LongEval: A Comprehensive Analysis of Long-Text Generation Through a Plan-based Paradigm
by: Wu, Siwei, et al.
Published: (2025)
OpenAI for OpenAPI: Automated generation of REST API specification via LLMs
by: Chen, Hao, et al.
Published: (2026)
ConsistI2V: Enhancing Visual Consistency for Image-to-Video Generation
by: Ren, Weiming, et al.
Published: (2024)
A Survey on Latent Reasoning
by: Zhu, Rui-Jie, et al.
Published: (2025)
MSNav: Zero-Shot Vision-and-Language Navigation with Dynamic Memory and LLM Spatial Reasoning
by: Liu, Chenghao, et al.
Published: (2025)