Library Catalog

Bibliographic Details
Main Authors:	Hu, Zizhao; Zhou, Xiaolin; Rostami, Mohammad
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2412.07049

Similar Items

Lateralization MLP: A Simple Brain-inspired Architecture for Diffusion
by: Hu, Zizhao, et al.
Published: (2024)

An Intermediate Fusion ViT Enables Efficient Text-Image Alignment in Diffusion Models
by: Hu, Zizhao, et al.
Published: (2024)

Unsupervised Domain Adaptation Using Compact Internal Representations
by: Rostami, Mohammad
Published: (2024)

A New Class Biorthogonal Spline Wavelet for Image Edge Detection
by: Zhou, Dujuan, et al.
Published: (2024)

Continuous Unsupervised Domain Adaptation Using Stabilized Representations and Experience Replay
by: Rostami, Mohammad
Published: (2024)

Online Continual Domain Adaptation for Semantic Image Segmentation Using Internal Representations
by: Stan, Serban, et al.
Published: (2024)

Dynamic Transformer Architecture for Continual Learning of Multimodal Tasks
by: Cai, Yuliang, et al.
Published: (2024)

Cross-Domain Distribution Alignment for Segmentation of Private Unannotated 3D Medical Images
by: Sun, Ruitong, et al.
Published: (2024)

Relating Events and Frames Based on Self-Supervised Learning and Uncorrelated Conditioning for Unsupervised Domain Adaptation
by: Rostami, Mohammad, et al.
Published: (2024)

Cross-domain Multi-modal Few-shot Object Detection via Rich Text
by: Shangguan, Zeyu, et al.
Published: (2024)

CluMo: Cluster-based Modality Fusion Prompt for Continual Learning in Visual Question Answering
by: Cai, Yuliang, et al.
Published: (2024)

DynRsl-VLM: Enhancing Autonomous Driving Perception with Dynamic Resolution Vision-Language Models
by: Zhou, Xirui, et al.
Published: (2025)

Efficient Audio-Visual Speech Separation with Discrete Lip Semantics and Multi-Scale Global-Local Attention
by: Li, Kai, et al.
Published: (2025)

From Open Vocabulary to Open World: Teaching Vision Language Models to Detect Novel Objects
by: Li, Zizhao, et al.
Published: (2024)

Attention Is not Everything: Efficient Alternatives for Vision
by: Kazi, Nur Mohammad, et al.
Published: (2026)

TNG-CLIP: Training-Time Negation Data Generation for Negation Awareness of CLIP
by: Cai, Yuliang, et al.
Published: (2025)

Cross-domain Few-shot Object Detection with Multi-modal Textual Enrichment
by: Shangguan, Zeyu, et al.
Published: (2025)

SRMambaV2: Biomimetic Attention for Sparse Point Cloud Upsampling in Autonomous Driving
by: Chen, Chuang, et al.
Published: (2025)

HDBFormer: Efficient RGB-D Semantic Segmentation with A Heterogeneous Dual-Branch Framework
by: Wei, Shuobin, et al.
Published: (2025)

Receptive Field Expanded Look-Up Tables for Vision Inference: Advancing from Low-level to High-level Tasks
by: Zhang, Xi, et al.
Published: (2025)

HEART-VIT: Hessian-Guided Efficient Dynamic Attention and Token Pruning in Vision Transformer
by: Uddin, Mohammad Helal, et al.
Published: (2025)

Unsupervised Federated Domain Adaptation for Segmentation of MRI Images
by: Nananukul, Navapat, et al.
Published: (2024)

Semi-Supervised Masked Autoencoders: Unlocking Vision Transformer Potential with Limited Data
by: Faysal, Atik, et al.
Published: (2026)

A Vision-Centric Approach for Static Map Element Annotation
by: Zhang, Jiaxin, et al.
Published: (2023)

MiVE: Multiscale Vision-language features for reference-guided video Editing
by: Wang, Tong, et al.
Published: (2026)

Out-of-distribution detection in 3D applications: a review
by: Li, Zizhao, et al.
Published: (2025)

ScalableMap: Scalable Map Learning for Online Long-Range Vectorized HD Map Construction
by: Yu, Jingyi, et al.
Published: (2023)

SG-LDM: Semantic-Guided LiDAR Generation via Latent-Aligned Diffusion
by: Xiang, Zhengkang, et al.
Published: (2025)

Curvature Diversity-Driven Deformation and Domain Alignment for Point Cloud
by: Wu, Mengxi, et al.
Published: (2024)

FGNet: Leveraging Feature-Guided Attention to Refine SAM2 for 3D EM Neuron Segmentation
by: Li, Zhenghua, et al.
Published: (2025)

Unsupervised Monocular Road Segmentation for Autonomous Driving via Scene Geometry
by: Rostami, Sara Hatami, et al.
Published: (2025)

Representative Attention For Vision Transformers
by: Li, Yuntong, et al.
Published: (2026)

Vision Transformers with Hierarchical Attention
by: Liu, Yun, et al.
Published: (2021)

CAMAv2: A Vision-Centric Approach for Static Map Element Annotation
by: Chen, Shiyuan, et al.
Published: (2024)

Static for Dynamic: Towards a Deeper Understanding of Dynamic Facial Expressions Using Static Expression Data
by: Chen, Yin, et al.
Published: (2024)

Pay Attention to the Keys: Visual Piano Transcription Using Transformers
by: Zivanovic, Uros, et al.
Published: (2024)

Beyond Static Frames: Temporal Aggregate-and-Restore Vision Transformer for Human Pose Estimation
by: Fang, Hongwei, et al.
Published: (2026)

Learning Weakly Supervised Audio-Visual Violence Detection in Hyperbolic Space
by: Peng, Xiaogang, et al.
Published: (2023)

Structured Initialization for Attention in Vision Transformers
by: Zheng, Jianqiao, et al.
Published: (2024)

Vision Transformers are Circulant Attention Learners
by: Han, Dongchen, et al.
Published: (2025)