:: Library Catalog

Bibliographic Details
Main Authors:	Yang, Yang; Wang, Wenhai; Chen, Zhe; Dai, Jifeng; Zheng, Liang
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2403.13803

Similar Items

MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer
by: Tian, Changyao, et al.
Published: (2024)

CoMemo: LVLMs Need Image Context with Image Memory
by: Liu, Shi, et al.
Published: (2025)

PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models
by: Yang, Chenyu, et al.
Published: (2024)

Aligning Object Detector Bounding Boxes with Human Preference
by: Strafforello, Ombretta, et al.
Published: (2024)

The All-Seeing Project V2: Towards General Relation Comprehension of the Open World
by: Wang, Weiyun, et al.
Published: (2024)

Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures
by: Duan, Yuchen, et al.
Published: (2024)

Distortion-Aware Adversarial Attacks on Bounding Boxes of Object Detectors
by: Phuc, Pham, et al.
Published: (2024)

Adversarial Bounding Boxes Generation (ABBG) Attack against Visual Object Trackers
by: Nokabadi, Fatemeh Nourilenjan, et al.
Published: (2024)

VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models
by: Xu, Weiye, et al.
Published: (2025)

HoVLE: Unleashing the Power of Monolithic Vision-Language Models with Holistic Vision-Language Embedding
by: Tao, Chenxin, et al.
Published: (2024)

Out-of-Bounding-Box Triggers: A Stealthy Approach to Cheat Object Detectors
by: Lin, Tao, et al.
Published: (2024)

Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization
by: Wang, Weiyun, et al.
Published: (2024)

GenExam: A Multidisciplinary Text-to-Image Exam
by: Wang, Zhaokai, et al.
Published: (2025)

MICDrop: Masking Image and Depth Features via Complementary Dropout for Domain-Adaptive Semantic Segmentation
by: Yang, Linyan, et al.
Published: (2024)

MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding
by: Cao, Yue, et al.
Published: (2024)

MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Diversity
by: Liu, Yangzhou, et al.
Published: (2024)

InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
by: Chen, Zhe, et al.
Published: (2023)

AnyCap Project: A Unified Framework, Dataset, and Benchmark for Controllable Omni-modal Captioning
by: Ren, Yiming, et al.
Published: (2025)

Docopilot: Improving Multimodal Models for Document-Level Understanding
by: Duan, Yuchen, et al.
Published: (2025)

Adaptive Dropout: Unleashing Dropout across Layers for Generalizable Image Super-Resolution
by: Xu, Hang, et al.
Published: (2025)

Significance and Stability Analysis of Gene-Environment Interaction using RGxEStat
by: Qin, Meng'en, et al.
Published: (2026)

VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks
by: Wu, Jiannan, et al.
Published: (2024)

Object Detectors in the Open Environment: Challenges, Solutions, and Outlook
by: Liang, Siyuan, et al.
Published: (2024)

EchoInk-R1: Exploring Audio-Visual Reasoning in Multimodal LLMs via Reinforcement Learning
by: Xing, Zhenghao, et al.
Published: (2025)

Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning
by: Yang, Chenyu, et al.
Published: (2024)

FSSD: Feature Fusion Single Shot Multibox Detector
by: Li, Zuoxin, et al.
Published: (2017)

Point or Line? Using Line-based Representation for Panoptic Symbol Spotting in CAD Drawings
by: Wei, Xingguang, et al.
Published: (2025)

SFFR: Spatial-Frequency Feature Reconstruction for Multispectral Aerial Object Detection
by: Zuo, Xin, et al.
Published: (2025)

Beyond Dropout: Robust Convolutional Neural Networks Based on Local Feature Masking
by: Gong, Yunpeng, et al.
Published: (2024)

Transferable Dual-Domain Feature Importance Attack against AI-Generated Image Detector
by: Zhu, Weiheng, et al.
Published: (2025)

Mono-InternVL-1.5: Towards Cheaper and Faster Monolithic Multimodal Large Language Models
by: Luo, Gen, et al.
Published: (2025)

DriveMLM: Aligning Multi-Modal Large Language Models with Behavioral Planning States for Autonomous Driving
by: Cui, Erfei, et al.
Published: (2023)

Mini-InternVL: A Flexible-Transfer Pocket Multimodal Model with 5% Parameters and 90% Performance
by: Gao, Zhangwei, et al.
Published: (2024)

Theoretically Achieving Continuous Representation of Oriented Bounding Boxes
by: Xiao, Zi-Kai, et al.
Published: (2024)

MGPC: Multimodal Network for Generalizable Point Cloud Completion With Modality Dropout and Progressive Decoding
by: Liu, Jiangyuan, et al.
Published: (2026)

VisualPRM: An Effective Process Reward Model for Multimodal Reasoning
by: Wang, Weiyun, et al.
Published: (2025)

OpenBox: Annotate Any Bounding Boxes in 3D
by: Lee, In-Jae, et al.
Published: (2025)

BoxSplitGen: A Generative Model for 3D Part Bounding Boxes in Varying Granularity
by: Koo, Juil, et al.
Published: (2026)

Bounding-box Watermarking: Defense against Model Extraction Attacks on Object Detectors
by: Koda, Satoru, et al.
Published: (2024)

Dropout the High-rate Downsampling: A Novel Design Paradigm for UHD Image Restoration
by: Wu, Chen, et al.
Published: (2024)