| Main Authors: | Chen, Kaibing, Shen, Dong, Zhong, Hanwen, Zhong, Huasong, Xia, Kui, Xu, Di, Yuan, Wei, Hu, Yifei, Wen, Bin, Zhang, Tianke, Liu, Changyi, Fan, Dewen, Xiao, Huihui, Wu, Jiahong, Yang, Fan, Li, Size, Zhang, Di |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2407.14177 |
Similar Items
TaskGalaxy: Scaling Multi-modal Instruction Fine-tuning with Tens of Thousands Vision Task Types
by: Chen, Jiankang, et al.
Published: (2025)
Transforming Vision Transformer: Towards Efficient Multi-Task Asynchronous Learning
by: Zhong, Hanwen, et al.
Published: (2025)
InstructEngine: Instruction-driven Text-to-Image Alignment
by: Lu, Xingyu, et al.
Published: (2025)
R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning
by: Zhang, Yi-Fan, et al.
Published: (2025)
Joint Reward Modeling: Internalizing Chain-of-Thought for Efficient Visual Reward Models
by: Yang, Yankai, et al.
Published: (2026)
VCap: Hypergeometric Rewards for Weak-to-Strong Visual Captioning
by: Lu, Xingyu, et al.
Published: (2026)
Thyme: Think Beyond Images
by: Zhang, Yi-Fan, et al.
Published: (2025)
EVLM: Self-Reflective Multimodal Reasoning for Cross-Dimensional Visual Editing
by: Khalid, Umar, et al.
Published: (2024)
UniCode$^2$: Cascaded Large-scale Codebooks for Unified Multimodal Understanding and Generation
by: Chen, Yanzhe, et al.
Published: (2025)
AdaLog: Post-Training Quantization for Vision Transformers with Adaptive Logarithm Quantizer
by: Wu, Zhuguanyu, et al.
Published: (2024)
ARM2: Adaptive Reasoning Model with Vision Understanding and Executable Code
by: Xie, Jian, et al.
Published: (2025)
VideoTemp-o3: Harmonizing Temporal Grounding and Video Understanding in Agentic Thinking-with-Videos
by: Liu, Wenqi, et al.
Published: (2026)
Kwai-STaR: Transform LLMs into State-Transition Reasoners
by: Lu, Xingyu, et al.
Published: (2024)
VLM as Policy: Common-Law Content Moderation Framework for Short Video Platform
by: Lu, Xingyu, et al.
Published: (2025)
Why Distillation can Outperform Zero-RL: The Role of Flexible Reasoning
by: Hu, Xiao, et al.
Published: (2025)
ContextRL: Enhancing MLLM's Knowledge Discovery Efficiency with Context-Augmented RL
by: Lu, Xingyu, et al.
Published: (2026)
TripleSurv: Triplet Time-adaptive Coordinate Loss for Survival Analysis
by: Zhang, Liwen, et al.
Published: (2024)
Solving Token Gradient Conflict in Mixture-of-Experts for Large Vision-Language Model
by: Yang, Longrong, et al.
Published: (2024)
Keypoint-Integrated Instruction-Following Data Generation for Enhanced Human Pose and Action Understanding in Multimodal Models
by: Zhang, Dewen, et al.
Published: (2024)
Complete universal scaling of first-order phase transitions in the two-dimensional Ising model
by: Zhang, Yuxiang, et al.
Published: (2025)
Learning Spatial Decay for Vision Transformers
by: Mao, Yuxin, et al.
Published: (2025)
SpatialReward: Bridging the Perception Gap in Online RL for Image Editing via Explicit Spatial Reasoning
by: Long, Yancheng, et al.
Published: (2026)
Fully Spiking Neural Networks for Unified Frame-Event Object Tracking
by: Yang, Jingjun, et al.
Published: (2025)
Physics-Informed Visual MARFE Prediction on the HL-3 Tokamak
by: Dong, Qianyun, et al.
Published: (2025)
Nucleation and growth manifest universal scaling, surely
by: Zhong, Fan
Published: (2024)
Complete universal scaling in first-order phase transitions
by: Zhong, Fan
Published: (2024)
Is there Kibble-Zurek scaling of topological defects in first-order phase transitions?
by: Zhong, Fan
Published: (2025)
iMOVE: Instance-Motion-Aware Video Understanding
by: Li, Jiaze, et al.
Published: (2025)
VITON-DRR: Details Retention Virtual Try-on via Non-rigid Registration
by: Li, Ben, et al.
Published: (2025)
CacheFL: Privacy-Preserving and Efficient Federated Cache Model Fine-Tuning for Vision-Language Models
by: Yi, Mengjun, et al.
Published: (2025)
Corporate ESG Washing and ESG Rating Divergence: Evidence From China
by: Hanwen Chen, et al.
Published: (2025)
Recursive Visual Imagination and Adaptive Linguistic Grounding for Vision Language Navigation
by: Chen, Bolei, et al.
Published: (2025)
Self‐Adaptive Dielectrics with Tunable Nonlinear Electrical Conductivity via Virus‐Like Structures Composed of Metal Particles
by: Daoming Zhang, et al.
Published: (2025)
Harmonizing Visual Representations for Unified Multimodal Understanding and Generation
by: Wu, Size, et al.
Published: (2025)
Understanding tectonics from fluvial topography by using the stochastic‐threshold incision model: Theory and application to the Dadu River basin, eastern Tibetan Plateau
by: Yizhou Wang, et al.
Published: (2024)
APVR: Hour-Level Long Video Understanding with Adaptive Pivot Visual Information Retrieval
by: Gao, Hong, et al.
Published: (2025)
LLaVA-Pose: Enhancing Human Pose and Action Understanding via Keypoint-Integrated Instruction Tuning
by: Zhang, Dewen, et al.
Published: (2025)
AutoAssert 1: A LoRA Fine-Tuned LLM Model for Efficient Automated Assertion Generation
by: Zhong, Yi, et al.
Published: (2025)
VCU-Bridge: Hierarchical Visual Connotation Understanding via Semantic Bridging
by: Zhong, Ming, et al.
Published: (2025)
Quasibound and quasinormal modes of a thick brane in Rastall gravity
by: Tan, Qin, et al.
Published: (2024)