Library Catalog

Bibliographic Details
Main Authors:	Liao, Ning; Zhang, Shaofeng; Xia, Renqiu; Cao, Min; Qiao, Yu; Yan, Junchi
Format:	Preprint
Published:	2023
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2310.06594

Similar Items

M-Tuning: Prompt Tuning with Mitigated Label Bias in Open-Set Scenarios
by: Liao, Ning, et al.
Published: (2023)

PCP-MAE: Learning to Predict Centers for Point Masked Autoencoders
by: Zhang, Xiangdong, et al.
Published: (2024)

Towards More Diverse and Challenging Pre-training for Point Cloud Learning: Self-Supervised Cross Reconstruction with Decoupled Views
by: Zhang, Xiangdong, et al.
Published: (2025)

Learning Adaptive and Temporally Causal Video Tokenization in a 1D Latent Space
by: Li, Yan, et al.
Published: (2025)

Exploiting Unlabeled Data with Multiple Expert Teachers for Open Vocabulary Aerial Object Detection and Its Orientation Adaptation
by: Li, Yan, et al.
Published: (2024)

StructChart: On the Schema, Metric, and Augmentation for Visual Chart Understanding
by: Xia, Renqiu, et al.
Published: (2023)

ChartX & ChartVLM: A Versatile Benchmark and Foundation Model for Complicated Chart Reasoning
by: Xia, Renqiu, et al.
Published: (2024)

Training-Free Adaptive Diffusion with Bounded Difference Approximation Strategy
by: Ye, Hancheng, et al.
Published: (2024)

DriveVGGT: Calibration-Constrained Visual Geometry Transformers for Multi-Camera Autonomous Driving
by: Jia, Xiaosong, et al.
Published: (2025)

EvoTok: A Unified Image Tokenizer via Residual Latent Evolution for Visual Understanding and Generation
by: Li, Yan, et al.
Published: (2026)

ReSimAD: Zero-Shot 3D Domain Transfer for Autonomous Driving with Source Reconstruction and Target Simulation
by: Zhang, Bo, et al.
Published: (2023)

VideoREPA: Learning Physics for Video Generation through Relational Alignment with Foundation Models
by: Zhang, Xiangdong, et al.
Published: (2025)

GeoBench: Rethinking Multimodal Geometric Problem-Solving via Hierarchical Evaluation
by: Feng, Yuan, et al.
Published: (2025)

GeoX: Geometric Problem Solving Through Unified Formalized Vision-Language Pre-training
by: Xia, Renqiu, et al.
Published: (2024)

Dual-Branch Center-Surrounding Contrast: Rethinking Contrastive Learning for 3D Point Clouds
by: Zhang, Shaofeng, et al.
Published: (2025)

TextRefiner: Internal Visual Feature as Efficient Refiner for Vision-Language Models Prompt Tuning
by: Xie, Jingjing, et al.
Published: (2024)

SPOT: Scalable 3D Pre-training via Occupancy Prediction for Learning Transferable 3D Representations
by: Yan, Xiangchao, et al.
Published: (2023)

Repulsor: Accelerating Generative Modeling with a Contrastive Memory Bank
by: Zhang, Shaofeng, et al.
Published: (2025)

Once for Both: Single Stage of Importance and Sparsity Search for Vision Transformer Compression
by: Ye, Hancheng, et al.
Published: (2024)

AVION: Aerial Vision-Language Instruction from Offline Teacher to Prompt-Tuned Network
by: Hu, Yu, et al.
Published: (2026)

MedHallTune: An Instruction-Tuning Benchmark for Mitigating Medical Hallucination in Vision-Language Models
by: Yan, Qiao, et al.
Published: (2025)

Continual LLaVA: Continual Instruction Tuning in Large Vision-Language Models
by: Cao, Meng, et al.
Published: (2024)

MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Diversity
by: Liu, Yangzhou, et al.
Published: (2024)

Can Vision Language Models Assess Graphic Design Aesthetics? A Benchmark, Evaluation, and Dataset Perspective
by: An, Arctanx, et al.
Published: (2026)

FineRMoE: Dimension Expansion for Finer-Grained Expert with Its Upcycling Approach
by: Liao, Ning, et al.
Published: (2026)

Resolving Representation Ambiguity in Feedforward Novel View Synthesis Transformer via Semantic-Spatial Decoupling
by: Wu, Yihang, et al.
Published: (2026)

Reflective Instruction Tuning: Mitigating Hallucinations in Large Vision-Language Models
by: Zhang, Jinrui, et al.
Published: (2024)

Boosting Order-Preserving and Transferability for Neural Architecture Search: a Joint Architecture Refined Search and Fine-tuning Approach
by: Zhang, Beichen, et al.
Published: (2024)

Instruction-Free Tuning of Large Vision Language Models for Medical Instruction Following
by: Kang, Myeongkyun, et al.
Published: (2026)

InstructVLA: Vision-Language-Action Instruction Tuning from Understanding to Manipulation
by: Yang, Shuai, et al.
Published: (2025)

Continuous-Multiple Image Outpainting in One-Step via Positional Query and A Diffusion-based Approach
by: Zhang, Shaofeng, et al.
Published: (2024)

VCM: Vision Concept Modeling Based on Implicit Contrastive Learning with Vision-Language Instruction Fine-Tuning
by: Luo, Run, et al.
Published: (2025)

Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing
by: Zhao, Xiangyu, et al.
Published: (2025)

Mixture of Cluster-conditional LoRA Experts for Vision-language Instruction Tuning
by: Gou, Yunhao, et al.
Published: (2023)

DocGenome: An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Language Models
by: Xia, Renqiu, et al.
Published: (2024)

Streaming Video Instruction Tuning
by: Xia, Jiaer, et al.
Published: (2025)

Image Over Text: Transforming Formula Recognition Evaluation with Character Detection Matching
by: Wang, Bin, et al.
Published: (2024)

PointAlign: Feature-Level Alignment Regularization for 3D Vision-Language Models
by: Su, Yuanhao, et al.
Published: (2026)

B-AVIBench: Towards Evaluating the Robustness of Large Vision-Language Model on Black-box Adversarial Visual-Instructions
by: Zhang, Hao, et al.
Published: (2024)

AntifakePrompt: Prompt-Tuned Vision-Language Models are Fake Image Detectors
by: Chang, You-Ming, et al.
Published: (2023)