| Main Authors: | Kim, Taeheon, Chung, Sangyun, Yu, Youngjoon, Ro, Yong Man |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2411.17995 |
Similar Items
MSCoTDet: Language-driven Multi-modal Fusion for Improved Multispectral Pedestrian Detection
by: Kim, Taeheon, et al.
Published: (2024)
Causal Mode Multiplexer: A Novel Framework for Unbiased Multispectral Pedestrian Detection
by: Kim, Taeheon, et al.
Published: (2024)
SPARK: Multi-Vision Sensor Perception and Reasoning Benchmark for Large-scale Vision-Language Models
by: Yu, Youngjoon, et al.
Published: (2024)
Enhanced Vision-Language Models for Diverse Sensor Understanding: Cost-Efficient Optimization and Benchmarking
by: Chung, Sangyun, et al.
Published: (2024)
Robust Pedestrian Detection via Constructing Versatile Pedestrian Knowledge Bank
by: Park, Sungjune, et al.
Published: (2024)
Integrating Language-Derived Appearance Elements with Visual Cues in Pedestrian Detection
by: Park, Sungjune, et al.
Published: (2023)
Phantom of Latent for Large Language and Vision Models
by: Lee, Byung-Kwan, et al.
Published: (2024)
TroL: Traversal of Layers for Large Language and Vision Models
by: Lee, Byung-Kwan, et al.
Published: (2024)
Strip-Fusion: Spatiotemporal Fusion for Multispectral Pedestrian Detection
by: Kanu-Asiegbu, Asiegbu Miracle, et al.
Published: (2026)
Multispectral Pedestrian Detection with Sparsely Annotated Label
by: Lee, Chan, et al.
Published: (2025)
AMFD: Distillation via Adaptive Multimodal Fusion for Multispectral Pedestrian Detection
by: Chen, Zizhao, et al.
Published: (2024)
GCAgent: Long-Video Understanding via Schematic and Narrative Episodic Memory
by: Yeo, Jeong Hun, et al.
Published: (2025)
Remote Sensing Large Vision-Language Model: Semantic-augmented Multi-level Alignment and Semantic-aware Expert Modeling
by: Park, Sungjune, et al.
Published: (2025)
What if...?: Thinking Counterfactual Keywords Helps to Mitigate Hallucination in Large Multi-modal Models
by: Kim, Junho, et al.
Published: (2024)
CODE: Contrasting Self-generated Description to Combat Hallucination in Large Multi-modal Models
by: Kim, Junho, et al.
Published: (2024)
Language-guided Learning for Object Detection Tackling Multiple Variations in Aerial Images
by: Park, Sungjune, et al.
Published: (2025)
MS-DETR: Multispectral Pedestrian Detection Transformer with Loosely Coupled Fusion and Modality-Balanced Optimization
by: Xing, Yinghui, et al.
Published: (2023)
Deep Understanding of Sign Language for Sign to Subtitle Alignment
by: Jang, Youngjoon, et al.
Published: (2025)
WCCNet: Wavelet-context Cooperative Network for Efficient Multispectral Pedestrian Detection
by: Wang, Xingjian, et al.
Published: (2023)
Lost in Translation, Found in Embeddings: Sign Language Translation and Alignment
by: Jang, Youngjoon, et al.
Published: (2025)
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models
by: Lee, Byung-Kwan, et al.
Published: (2024)
CoLLaVO: Crayon Large Language and Vision mOdel
by: Lee, Byung-Kwan, et al.
Published: (2024)
MoAI: Mixture of All Intelligence for Large Language and Vision Models
by: Lee, Byung-Kwan, et al.
Published: (2024)
Cross-modal Offset-guided Dynamic Alignment and Fusion for Weakly Aligned UAV Object Detection
by: Zongzhen, Liu, et al.
Published: (2025)
Robust Egocentric Visual Attention Prediction Through Language-guided Scene Context-aware Learning
by: Park, Sungjune, et al.
Published: (2026)
TFDet: Target-Aware Fusion for RGB-T Pedestrian Detection
by: Zhang, Xue, et al.
Published: (2023)
CSAKD: Knowledge Distillation with Cross Self-Attention for Hyperspectral and Multispectral Image Fusion
by: Hsu, Chih-Chung, et al.
Published: (2024)
DIP-R1: Deep Inspection and Perception with RL Looking Through and Understanding Complex Scenes
by: Park, Sungjune, et al.
Published: (2025)
Multispectral Detection Transformer with Infrared-Centric Feature Fusion
by: Hwang, Seongmin, et al.
Published: (2025)
Multispectral State-Space Feature Fusion: Bridging Shared and Cross-Parametric Interactions for Object Detection
by: Shen, Jifeng, et al.
Published: (2025)
AVCD: Mitigating Hallucinations in Audio-Visual Large Language Models through Contrastive Decoding
by: Jung, Chaeyoung, et al.
Published: (2025)
Empathetic Response in Audio-Visual Conversations Using Emotion Preference Optimization and MambaCompressor
by: Kim, Yeonju, et al.
Published: (2024)
SALOVA: Segment-Augmented Long Video Assistant for Targeted Retrieval and Routing in Long-Form Video Analysis
by: Kim, Junho, et al.
Published: (2024)
ReFoCUS: Reinforcement-guided Frame Optimization for Contextual Understanding
by: Lee, Hosu, et al.
Published: (2025)
Pedestrian Crossing Intent Prediction via Psychological Features and Transformer Fusion
by: Ashayer, Sima, et al.
Published: (2026)
Rethinking Early-Fusion Strategies for Improved Multispectral Object Detection
by: Zhang, Xue, et al.
Published: (2024)
Robust Grounding with MLLMs Against Occlusion and Small Objects via Language-Guided Semantic Cues
by: Park, Beomchan, et al.
Published: (2026)
Cross-modal Full-mode Fine-grained Alignment for Text-to-Image Person Retrieval
by: Yin, Hao, et al.
Published: (2025)
Fusion-Mamba for Cross-modality Object Detection
by: Dong, Wenhao, et al.
Published: (2024)
Fourier-enhanced Implicit Neural Fusion Network for Multispectral and Hyperspectral Image Fusion
by: Liang, Yu-Jie, et al.
Published: (2024)