:: Library Catalog

Bibliographic Details
Main Authors:	Zhang, Hong; Lyu, Yixuan; Yu, Qian; Liu, Hanyang; Ma, Huimin; Yuan, Ding; Yang, Yifan
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2408.12086

Similar Items

RealCamo: Boosting Real Camouflage Synthesis with Layout Controls and Textual-Visual Guidance
by: Chen, Chunyuan, et al.
Published: (2025)

Holistic Visual-Textual Sentiment Analysis with Prior Models
by: Chen, Junyu, et al.
Published: (2022)

ViTA-PAR: Visual and Textual Attribute Alignment with Attribute Prompting for Pedestrian Attribute Recognition
by: Park, Minjeong, et al.
Published: (2025)

Efficient Gaussian Splatting for Monocular Dynamic Scene Rendering via Sparse Time-Variant Attribute Modeling
by: Kong, Hanyang, et al.
Published: (2025)

Advancing Textual Prompt Learning with Anchored Attributes
by: Li, Zheng, et al.
Published: (2024)

Mutual Information guided Visual Contrastive Learning
by: Chen, Hanyang, et al.
Published: (2025)

CNCA: Toward Customizable and Natural Generation of Adversarial Camouflage for Vehicle Detectors
by: Lyu, Linye, et al.
Published: (2024)

Toward Robust and Accurate Adversarial Camouflage Generation against Vehicle Detectors
by: Zhou, Jiawei, et al.
Published: (2024)

Knowledge Rectification for Camouflaged Object Detection: Unlocking Insights from Low-Quality Data
by: Guan, Juwei, et al.
Published: (2025)

Render-of-Thought: Rendering Textual Chain-of-Thought as Images for Visual Latent Reasoning
by: Wang, Yifan, et al.
Published: (2026)

Exploring Boundary-Aware Spatial-Frequency Fusion for Camouflaged Object Detection
by: Yu, Song, et al.
Published: (2026)

Bridging Visual Affective Gap: Borrowing Textual Knowledge by Learning from Noisy Image-Text Pairs
by: Wu, Daiqing, et al.
Published: (2025)

Generating Attribute-Aware Human Motions from Textual Prompt
by: Wang, Xinghan, et al.
Published: (2025)

Medverse: A Universal Model for Full-Resolution 3D Medical Image Segmentation, Transformation and Enhancement
by: Hu, Jiesi, et al.
Published: (2025)

EndoGaussian: Real-time Gaussian Splatting for Dynamic Endoscopic Scene Reconstruction
by: Liu, Yifan, et al.
Published: (2024)

RAUCA: A Novel Physical Adversarial Attack on Vehicle Detectors via Robust and Accurate Camouflage Generation
by: Zhou, Jiawei, et al.
Published: (2024)

Prompt-driven Transferable Adversarial Attack on Person Re-Identification with Attribute-aware Textual Inversion
by: Bian, Yuan, et al.
Published: (2025)

Unlocking Textual and Visual Wisdom: Open-Vocabulary 3D Object Detection Enhanced by Comprehensive Guidance from Text and Image
by: Jiao, Pengkun, et al.
Published: (2024)

Frequency Domain Unlocks New Perspectives for Abdominal Medical Image Segmentation
by: Han, Kai, et al.
Published: (2025)

Visual Textualization for Image Prompted Object Detection
by: Wu, Yongjian, et al.
Published: (2025)

Cross-Modal Projection in Multimodal LLMs Doesn't Really Project Visual Attributes to Textual Space
by: Verma, Gaurav, et al.
Published: (2024)

Green Video Camouflaged Object Detection
by: Wang, Xinyu, et al.
Published: (2025)

FocusDiffuser: Perceiving Local Disparities for Camouflaged Object Detection
by: Zhao, Jianwei, et al.
Published: (2024)

Probing CLIP's Comprehension of 360-Degree Textual and Visual Semantics
by: Wang, Hai, et al.
Published: (2026)

Uncertainty-Masked Bernoulli Diffusion for Camouflaged Object Detection Refinement
by: Shen, Yuqi, et al.
Published: (2025)

Expose Camouflage in the Water: Underwater Camouflaged Instance Segmentation and Dataset
by: Wang, Chuhong, et al.
Published: (2025)

VSCode: General Visual Salient and Camouflaged Object Detection with 2D Prompt Learning
by: Luo, Ziyang, et al.
Published: (2023)

TopV-Nav: Unlocking the Top-View Spatial Reasoning Potential of MLLM for Zero-shot Object Navigation
by: Zhong, Linqing, et al.
Published: (2024)

Embedding Textual Information in Images Using Quinary Pixel Combinations
by: Kandala, A V Uday Kiran
Published: (2026)

Text-guided Controllable Diffusion for Realistic Camouflage Images Generation
by: Qian, Yuhang, et al.
Published: (2025)

ReplicateAnyScene: Zero-Shot Video-to-3D Composition via Textual-Visual-Spatial Alignment
by: Dong, Mingyu, et al.
Published: (2026)

Omni-Attribute: Open-vocabulary Attribute Encoder for Visual Concept Personalization
by: Chen, Tsai-Shien, et al.
Published: (2025)

RealCustom++: Representing Images as Real Textual Word for Real-Time Customization
by: Mao, Zhendong, et al.
Published: (2024)

Bridge the Gap Between Visual and Linguistic Comprehension for Generalized Zero-shot Semantic Segmentation
by: Guo, Xiaoqing, et al.
Published: (2025)

Unlocking Multilingual Reasoning Capability of LLMs and LVLMs through Representation Engineering
by: Li, Qiming, et al.
Published: (2025)

Autonomous Imagination: Closed-Loop Decomposition of Visual-to-Textual Conversion in Visual Reasoning for Multimodal Large Language Models
by: Liu, Jingming, et al.
Published: (2024)

Textualize Visual Prompt for Image Editing via Diffusion Bridge
by: Xu, Pengcheng, et al.
Published: (2025)

Camouflaged Image Synthesis Is All You Need to Boost Camouflaged Detection
by: Zhang, Haichao, et al.
Published: (2023)

An Empirical Study and Analysis of Text-to-Image Generation Using Large Language Model-Powered Textual Representation
by: Tan, Zhiyu, et al.
Published: (2024)

A Revisit to the Decoder for Camouflaged Object Detection
by: Ko, Seung Woo, et al.
Published: (2025)