:: Library Catalog

Bibliographic Details
Main Authors:	Huang, Xian-Hong, Su, Hui-Kai, Sun, Chi-Chia, Hsieh, Jun-Wei
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2511.05474

Similar Items

RepSFNet: A Single Fusion Network with Structural Reparameterization for Crowd Counting
by: Achmadiah, Mas Nurul, et al.
Published: (2026)

Fast-COS: A Fast One-Stage Object Detector Based on Reparameterized Attention Vision Transformer for Autonomous Driving
by: Setyawan, Novendra, et al.
Published: (2025)

TinyFormer: Preserving Tiny Objects in YOLO-DETR Hybrid Real-time Detectors
by: Hsieh, Jun-Wei, et al.
Published: (2026)

RS-TinyNet: Stage-wise Feature Fusion Network for Detecting Tiny Objects in Remote Sensing Images
by: Jiang, Xiaozheng, et al.
Published: (2025)

MOSA: Music Motion with Semantic Annotation Dataset for Cross-Modal Music Processing
by: Huang, Yu-Fen, et al.
Published: (2024)

COMO: Cross-Mamba Interaction and Offset-Guided Fusion for Multimodal Object Detection
by: Liu, Chang, et al.
Published: (2024)

AUV-Fusion: Cross-Modal Adversarial Fusion of User Interactions and Visual Perturbations Against VARS
by: Ling, Hai, et al.
Published: (2025)

COXNet: Cross-Layer Fusion with Adaptive Alignment and Scale Integration for RGBT Tiny Object Detection
by: Peng, Peiran, et al.
Published: (2025)

DQ-DETR: DETR with Dynamic Query for Tiny Object Detection
by: Huang, Yi-Xin, et al.
Published: (2024)

A DeNoising FPN With Transformer R-CNN for Tiny Object Detection
by: Liu, Hou-I, et al.
Published: (2024)

SONAR: Semantic-Object Navigation with Aggregated Reasoning through a Cross-Modal Inference Paradigm
by: Wang, Yao, et al.
Published: (2025)

InterFusion: Text-Driven Generation of 3D Human-Object Interaction
by: Dai, Sisi, et al.
Published: (2024)

Improving Audio-Visual Speech Recognition by Lip-Subword Correlation Based Visual Pre-training and Cross-Modal Fusion Encoder
by: Dai, Yusheng, et al.
Published: (2023)

Integrating Object Detection Modality into Visual Language Model for Enhanced Autonomous Driving Agent
by: He, Linfeng, et al.
Published: (2024)

Interacting Null Sources in Different Geometries
by: Hsieh, Chia-Li
Published: (2024)

MicroViTv2: Beyond the FLOPS for Edge Energy-Friendly Vision Transformers
by: Setyawan, Novendra, et al.
Published: (2026)

FaceLiVTv2: An Improved Hybrid Architecture for Efficient Mobile Face Recognition
by: Setyawan, Novendra, et al.
Published: (2026)

FaceLiVT: Face Recognition using Linear Vision Transformer with Structural Reparameterization For Mobile Device
by: Setyawan, Novendra, et al.
Published: (2025)

MicroViT: A Vision Transformer with Low Complexity Self Attention for Edge Device
by: Setyawan, Novendra, et al.
Published: (2025)

Contrast-Guided Cross-Modal Distillation for Thermal Object Detection
by: Kim, SiWoo, et al.
Published: (2025)

Energy-Efficient Fast Object Detection on Edge Devices for IoT Systems
by: Achmadiah, Mas Nurul, et al.
Published: (2026)

Dual-Domain Homogeneous Fusion with Cross-Modal Mamba and Progressive Decoder for 3D Object Detection
by: Hu, Xuzhong, et al.
Published: (2025)

Thermal-Det: Language-Guided Cross-Modal Distillation for Open-Vocabulary Thermal Object Detection
by: Ranasinghe, Yasiru, et al.
Published: (2026)

HGSFusion: Radar-Camera Fusion with Hybrid Generation and Synchronization for 3D Object Detection
by: Gu, Zijian, et al.
Published: (2024)

MANTA: A Large-Scale Multi-View and Visual-Text Anomaly Detection Dataset for Tiny Objects
by: Fan, Lei, et al.
Published: (2024)

RCTDistill: Cross-Modal Knowledge Distillation Framework for Radar-Camera 3D Object Detection with Temporal Fusion
by: Bang, Geonho, et al.
Published: (2025)

Visual Decision‐Making in Early Childhood Nutrition: Taiwanese Parents' Infant Formula Choices via Eye‐Tracking and Hierarchical Decision Modeling
by: Hsieh, Chia-Yen
Published: (2026)

High-Precision Transformer-Based Visual Servoing for Humanoid Robots in Aligning Tiny Objects
by: Xue, Jialong, et al.
Published: (2025)

Seg the HAB: Language-Guided Geospatial Algae Bloom Reasoning and Segmentation
by: Hsieh, Patterson, et al.
Published: (2025)

STMI: Segmentation-Guided Token Modulation with Cross-Modal Hypergraph Interaction for Multi-Modal Object Re-Identification
by: Xu, Xingguo, et al.
Published: (2026)

Cross-Modal Bottleneck Fusion For Noise Robust Audio-Visual Speech Recognition
by: Ok, Seaone, et al.
Published: (2026)

Similarity Distance-Based Label Assignment for Tiny Object Detection
by: Shi, Shuohao, et al.
Published: (2024)

Bridging the Scale Gap: Balanced Tiny and General Object Detection in Remote Sensing Imagery
by: Zhao, Zhicheng, et al.
Published: (2025)

UFO-DETR: Frequency-Guided End-to-End Detector for UAV Tiny Objects
by: Chen, Yuankai, et al.
Published: (2026)

ParFormer: A Vision Transformer with Parallel Mixer and Sparse Channel Attention Patch Embedding
by: Setyawan, Novendra, et al.
Published: (2024)

Cross-Modal Purification and Fusion for Small-Object RGB-D Transmission-Line Defect Detection
by: Cui, Jiaming, et al.
Published: (2026)

HiddenObject: Modality-Agnostic Fusion for Multimodal Hidden Object Detection
by: Song, Harris, et al.
Published: (2025)

Cross-modal Offset-guided Dynamic Alignment and Fusion for Weakly Aligned UAV Object Detection
by: Zongzhen, Liu, et al.
Published: (2025)

VIFO: Visual Feature Empowered Multivariate Time Series Forecasting with Cross-Modal Fusion
by: Wang, Yanlong, et al.
Published: (2025)

Monocular Depth Estimation and Segmentation for Transparent Object with Iterative Semantic and Geometric Fusion
by: Liu, Jiangyuan, et al.
Published: (2025)