| Main Authors: | Ying, Kaining, Meng, Fanqing, Wang, Jin, Li, Zhiqian, Lin, Han, Yang, Yue, Zhang, Hao, Zhang, Wenbo, Lin, Yuqi, Liu, Shuo, Lei, Jiayi, Lu, Quanfeng, Chen, Runjian, Xu, Peng, Zhang, Renrui, Zhang, Haozhe, Gao, Peng, Wang, Yali, Qiao, Yu, Luo, Ping, Zhang, Kaipeng, Shao, Wenqi |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2404.16006 |
Similar Items
ChartAssisstant: A Universal Chart Multimodal Language Model via Chart-to-Table Pre-training and Multitask Instruction Tuning
by: Meng, Fanqing, et al.
Published: (2024)
ConvBench: A Multi-Turn Conversation Evaluation Benchmark with Hierarchical Capability for Large Vision-Language Models
by: Liu, Shuo, et al.
Published: (2024)
PhyBench: A Physical Commonsense Benchmark for Evaluating Text-to-Image Models
by: Meng, Fanqing, et al.
Published: (2024)
Position: Towards Implicit Prompt For Text-To-Image Models
by: Yang, Yue, et al.
Published: (2024)
GUIOdyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices
by: Lu, Quanfeng, et al.
Published: (2024)
Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation
by: Meng, Fanqing, et al.
Published: (2024)
TinyLVLM-eHub: Towards Comprehensive and Efficient Evaluation for Large Vision-Language Models
by: Shao, Wenqi, et al.
Published: (2023)
OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models
by: Shao, Wenqi, et al.
Published: (2023)
MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models
by: Meng, Fanqing, et al.
Published: (2024)
MM-Eureka: Exploring the Frontiers of Multimodal Reasoning with Rule-based Reinforcement Learning
by: Meng, Fanqing, et al.
Published: (2025)
EfficientQAT: Efficient Quantization-Aware Training for Large Language Models
by: Chen, Mengzhao, et al.
Published: (2024)
OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation
by: Zhou, Pengfei, et al.
Published: (2024)
MPBench: A Comprehensive Multimodal Reasoning Benchmark for Process Errors Identification
by: Xu, Zhaopan, et al.
Published: (2025)
BESA: Pruning Large Language Models with Blockwise Parameter-Efficient Sparsity Allocation
by: Xu, Peng, et al.
Published: (2024)
SAMRefiner: Taming Segment Anything Model for Universal Mask Refinement
by: Lin, Yuqi, et al.
Published: (2025)
JiSAM: Alleviate Labeling Burden and Corner Case Problems in Autonomous Driving via Minimal Real-World Data
by: Chen, Runjian, et al.
Published: (2025)
TREND: Unsupervised 3D Representation Learning via Temporal Forecasting for LiDAR Perception
by: Chen, Runjian, et al.
Published: (2024)
TP-Eval: Tap Multimodal LLMs' Potential in Evaluation by Customizing Prompts
by: Xie, Yuxuan, et al.
Published: (2024)
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models
by: Liu, Dongyang, et al.
Published: (2024)
TIR-Bench: A Comprehensive Benchmark for Agentic Thinking-with-Images Reasoning
by: Li, Ming, et al.
Published: (2025)
Data Adaptive Traceback for Vision-Language Foundation Models in Image Classification
by: Peng, Wenshuo, et al.
Published: (2024)
Temporal Overlapping Prediction: A Self-supervised Pre-training Method for LiDAR Moving Object Segmentation
by: Miao, Ziliang, et al.
Published: (2025)
BenchCAD: A Comprehensive, Industry-Standard Benchmark for Programmatic CAD
by: Zhang, Haozhe, et al.
Published: (2026)
OmniMedVQA: A New Large-Scale Comprehensive Evaluation Benchmark for Medical LVLM
by: Hu, Yutao, et al.
Published: (2024)
CLAP: Unsupervised 3D Representation Learning for Fusion 3D Perception via Curvature Sampling and Prototype Learning
by: Chen, Runjian, et al.
Published: (2024)
MDK12-Bench: A Comprehensive Evaluation of Multimodal Large Language Models on Multidisciplinary Exams
by: Zhou, Pengfei, et al.
Published: (2025)
T3M: Text Guided 3D Human Motion Synthesis from Speech
by: Peng, Wenshuo, et al.
Published: (2024)
Efficient High-Resolution Visual Representation Learning with State Space Model for Human Pose Estimation
by: Zhang, Hao, et al.
Published: (2024)
Dynamic Multimodal Evaluation with Flexible Complexity by Vision-Language Bootstrapping
by: Yang, Yue, et al.
Published: (2024)
DiffAgent: Fast and Accurate Text-to-Image API Selection with Large Language Model
by: Zhao, Lirui, et al.
Published: (2024)
B-AVIBench: Towards Evaluating the Robustness of Large Vision-Language Model on Black-box Adversarial Visual-Instructions
by: Zhang, Hao, et al.
Published: (2024)
Diffree: Text-Guided Shape Free Object Inpainting with Diffusion Model
by: Zhao, Lirui, et al.
Published: (2024)
Open-Vocabulary Animal Keypoint Detection with Semantic-feature Matching
by: Zhang, Hao, et al.
Published: (2023)
DHG-Bench: A Comprehensive Benchmark for Deep Hypergraph Learning
by: Li, Fan, et al.
Published: (2025)
Motif Counting in Complex Networks: A Comprehensive Survey
by: Yin, Haozhe, et al.
Published: (2025)
SearchLVLMs: A Plug-and-Play Framework for Augmenting Large Vision-Language Models by Searching Up-to-Date Internet Knowledge
by: Li, Chuanhao, et al.
Published: (2024)
Quantum Visual Word Sense Disambiguation: Unraveling Ambiguities Through Quantum Inference Model
by: Qiao, Wenbo, et al.
Published: (2025)
UniFork: Exploring Modality Alignment for Unified Multimodal Understanding and Generation
by: Li, Teng, et al.
Published: (2025)
OneLLM: One Framework to Align All Modalities with Language
by: Han, Jiaming, et al.
Published: (2023)
CPGD: Toward Stable Rule-based Reinforcement Learning for Language Models
by: Liu, Zongkai, et al.
Published: (2025)