Library Catalog

Bibliographic Details
Main Authors:	Chen, Kaibing, Shen, Dong, Zhong, Hanwen, Zhong, Huasong, Xia, Kui, Xu, Di, Yuan, Wei, Hu, Yifei, Wen, Bin, Zhang, Tianke, Liu, Changyi, Fan, Dewen, Xiao, Huihui, Wu, Jiahong, Yang, Fan, Li, Size, Zhang, Di
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2407.14177

Similar Items

TaskGalaxy: Scaling Multi-modal Instruction Fine-tuning with Tens of Thousands Vision Task Types
by: Chen, Jiankang, et al.
Published: (2025)

Transforming Vision Transformer: Towards Efficient Multi-Task Asynchronous Learning
by: Zhong, Hanwen, et al.
Published: (2025)

InstructEngine: Instruction-driven Text-to-Image Alignment
by: Lu, Xingyu, et al.
Published: (2025)

R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning
by: Zhang, Yi-Fan, et al.
Published: (2025)

Joint Reward Modeling: Internalizing Chain-of-Thought for Efficient Visual Reward Models
by: Yang, Yankai, et al.
Published: (2026)

VCap: Hypergeometric Rewards for Weak-to-Strong Visual Captioning
by: Lu, Xingyu, et al.
Published: (2026)

Thyme: Think Beyond Images
by: Zhang, Yi-Fan, et al.
Published: (2025)

EVLM: Self-Reflective Multimodal Reasoning for Cross-Dimensional Visual Editing
by: Khalid, Umar, et al.
Published: (2024)

UniCode$^2$: Cascaded Large-scale Codebooks for Unified Multimodal Understanding and Generation
by: Chen, Yanzhe, et al.
Published: (2025)

AdaLog: Post-Training Quantization for Vision Transformers with Adaptive Logarithm Quantizer
by: Wu, Zhuguanyu, et al.
Published: (2024)

ARM2: Adaptive Reasoning Model with Vision Understanding and Executable Code
by: Xie, Jian, et al.
Published: (2025)

VideoTemp-o3: Harmonizing Temporal Grounding and Video Understanding in Agentic Thinking-with-Videos
by: Liu, Wenqi, et al.
Published: (2026)

Kwai-STaR: Transform LLMs into State-Transition Reasoners
by: Lu, Xingyu, et al.
Published: (2024)

VLM as Policy: Common-Law Content Moderation Framework for Short Video Platform
by: Lu, Xingyu, et al.
Published: (2025)

Why Distillation can Outperform Zero-RL: The Role of Flexible Reasoning
by: Hu, Xiao, et al.
Published: (2025)

ContextRL: Enhancing MLLM's Knowledge Discovery Efficiency with Context-Augmented RL
by: Lu, Xingyu, et al.
Published: (2026)

TripleSurv: Triplet Time-adaptive Coordinate Loss for Survival Analysis
by: Zhang, Liwen, et al.
Published: (2024)

Solving Token Gradient Conflict in Mixture-of-Experts for Large Vision-Language Model
by: Yang, Longrong, et al.
Published: (2024)

Keypoint-Integrated Instruction-Following Data Generation for Enhanced Human Pose and Action Understanding in Multimodal Models
by: Zhang, Dewen, et al.
Published: (2024)

Complete universal scaling of first-order phase transitions in the two-dimensional Ising model
by: Zhang, Yuxiang, et al.
Published: (2025)

Learning Spatial Decay for Vision Transformers
by: Mao, Yuxin, et al.
Published: (2025)

SpatialReward: Bridging the Perception Gap in Online RL for Image Editing via Explicit Spatial Reasoning
by: Long, Yancheng, et al.
Published: (2026)

Fully Spiking Neural Networks for Unified Frame-Event Object Tracking
by: Yang, Jingjun, et al.
Published: (2025)

Physics-Informed Visual MARFE Prediction on the HL-3 Tokamak
by: Dong, Qianyun, et al.
Published: (2025)

Nucleation and growth manifest universal scaling, surely
by: Zhong, Fan
Published: (2024)

Complete universal scaling in first-order phase transitions
by: Zhong, Fan
Published: (2024)

Is there Kibble-Zurek scaling of topological defects in first-order phase transitions?
by: Zhong, Fan
Published: (2025)

iMOVE: Instance-Motion-Aware Video Understanding
by: Li, Jiaze, et al.
Published: (2025)

VITON-DRR: Details Retention Virtual Try-on via Non-rigid Registration
by: Li, Ben, et al.
Published: (2025)

CacheFL: Privacy-Preserving and Efficient Federated Cache Model Fine-Tuning for Vision-Language Models
by: Yi, Mengjun, et al.
Published: (2025)

Corporate ESG Washing and ESG Rating Divergence: Evidence From China
by: Chen, Hanwen, et al.
Published: (2025)

Recursive Visual Imagination and Adaptive Linguistic Grounding for Vision Language Navigation
by: Chen, Bolei, et al.
Published: (2025)

Self‐Adaptive Dielectrics with Tunable Nonlinear Electrical Conductivity via Virus‐Like Structures Composed of Metal Particles
by: Zhang, Daoming, et al.
Published: (2025)

Harmonizing Visual Representations for Unified Multimodal Understanding and Generation
by: Wu, Size, et al.
Published: (2025)

Understanding tectonics from fluvial topography by using the stochastic‐threshold incision model: Theory and application to the Dadu River basin, eastern Tibetan Plateau
by: Wang, Yizhou, et al.
Published: (2024)

APVR: Hour-Level Long Video Understanding with Adaptive Pivot Visual Information Retrieval
by: Gao, Hong, et al.
Published: (2025)

LLaVA-Pose: Enhancing Human Pose and Action Understanding via Keypoint-Integrated Instruction Tuning
by: Zhang, Dewen, et al.
Published: (2025)

AutoAssert 1: A LoRA Fine-Tuned LLM Model for Efficient Automated Assertion Generation
by: Zhong, Yi, et al.
Published: (2025)

VCU-Bridge: Hierarchical Visual Connotation Understanding via Semantic Bridging
by: Zhong, Ming, et al.
Published: (2025)

Quasibound and quasinormal modes of a thick brane in Rastall gravity
by: Tan, Qin, et al.
Published: (2024)