| Main Authors: | Li, Long, Liu, Nian, Zhang, Dingwen, Li, Zhongyu, Khan, Salman, Anwer, Rao, Cholakkal, Hisham, Han, Junwei, Khan, Fahad Shahbaz |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2409.01021 |
Similar Items
TAViS: Text-bridged Audio-Visual Segmentation with Foundation Models
by: Luo, Ziyang, et al.
Published: (2025)
AI in Agriculture: A Survey of Deep Learning Techniques for Crops, Fisheries and Livestock
by: Nawaz, Umair, et al.
Published: (2025)
DEFT: Decompositional Efficient Fine-Tuning for Text-to-Image Models
by: Kumar, Komal, et al.
Published: (2025)
Semi-supervised Open-World Object Detection
by: Mullappilly, Sahal Shaji, et al.
Published: (2024)
MedROV: Towards Real-Time Open-Vocabulary Detection Across Diverse Medical Imaging Modalities
by: Sheikh, Tooba Tehreem, et al.
Published: (2025)
Tracking Meets Large Multimodal Models for Driving Scenario Understanding
by: Ishaq, Ayesha, et al.
Published: (2025)
Open3DTrack: Towards Open-Vocabulary 3D Multi-Object Tracking
by: Ishaq, Ayesha, et al.
Published: (2024)
CDChat: A Large Multimodal Model for Remote Sensing Change Description
by: Noman, Mubashir, et al.
Published: (2024)
ELGC-Net: Efficient Local-Global Context Aggregation for Remote Sensing Change Detection
by: Noman, Mubashir, et al.
Published: (2024)
AIN: The Arabic INclusive Large Multimodal Model
by: Heakl, Ahmed, et al.
Published: (2025)
Salient Mask-Guided Vision Transformer for Fine-Grained Classification
by: Demidov, Dmitry, et al.
Published: (2023)
Open-YOLO 3D: Towards Fast and Accurate Open-Vocabulary 3D Instance Segmentation
by: Boudjoghra, Mohamed El Amine, et al.
Published: (2024)
VSCode: General Visual Salient and Camouflaged Object Detection with 2D Prompt Learning
by: Luo, Ziyang, et al.
Published: (2023)
CoME-VL: Scaling Complementary Multi-Encoder Vision-Language Learning
by: Deria, Ankan, et al.
Published: (2026)
Rethinking Transformers Pre-training for Multi-Spectral Satellite Imagery
by: Noman, Mubashir, et al.
Published: (2024)
XrayGPT: Chest Radiographs Summarization using Medical Vision-Language Models
by: Thawakar, Omkar, et al.
Published: (2023)
Time Travel: A Comprehensive Benchmark to Evaluate LMMs on Historical and Cultural Artifacts
by: Ghaboura, Sara, et al.
Published: (2025)
MediX-R1: Open Ended Medical Reinforcement Learning
by: Mullappilly, Sahal Shaji, et al.
Published: (2026)
EvoLMM: Self-Evolving Large Multimodal Models with Continuous Rewards
by: Thawakar, Omkar, et al.
Published: (2025)
Multi-modal Generation via Cross-Modal In-Context Learning
by: Kumar, Amandeep, et al.
Published: (2024)
LLM Post-Training: A Deep Dive into Reasoning Large Language Models
by: Kumar, Komal, et al.
Published: (2025)
Efficient 3D-Aware Facial Image Editing via Attribute-Specific Prompt Learning
by: Kumar, Amandeep, et al.
Published: (2024)
BiMediX: Bilingual Medical Mixture of Experts LLM
by: Pieri, Sara, et al.
Published: (2024)
AURORA: Augmented Understanding via Structured Reasoning and Reinforcement Learning for Reference Audio-Visual Segmentation
by: Luo, Ziyang, et al.
Published: (2025)
MAviS: A Multimodal Conversational Assistant For Avian Species
by: Kryklyvets, Yevheniia, et al.
Published: (2026)
CLIMB-3D: Continual Learning for Imbalanced 3D Instance Segmentation
by: Thengane, Vishal, et al.
Published: (2025)
Paper Circle: An Open-source Multi-agent Research Discovery and Analysis Framework
by: Kumar, Komal, et al.
Published: (2026)
How Good are Foundation Models in Step-by-Step Embodied Reasoning?
by: Dissanayake, Dinura, et al.
Published: (2025)
Enhancing Novel Object Detection via Cooperative Foundational Models
by: Bharadwaj, Rohit, et al.
Published: (2023)
Composed Object Retrieval: Object-level Retrieval via Composed Expressions
by: Wang, Tong, et al.
Published: (2025)
BiMediX2: Bio-Medical EXpert LMM for Diverse Medical Modalities
by: Mullappilly, Sahal Shaji, et al.
Published: (2024)
BAPLe: Backdoor Attacks on Medical Foundational Models using Prompt Learning
by: Hanif, Asif, et al.
Published: (2024)
Agent-X: Evaluating Deep Multimodal Reasoning in Vision-Centric Agentic Tasks
by: Ashraf, Tajamul, et al.
Published: (2025)
DriveLMM-o1: A Step-by-Step Reasoning Dataset and Large Multimodal Model for Driving Scenario Understanding
by: Ishaq, Ayesha, et al.
Published: (2025)
GLaMM: Pixel Grounding Large Multimodal Model
by: Rasheed, Hanoona, et al.
Published: (2023)
Composed Video Retrieval via Enriched Context and Discriminative Embeddings
by: Thawakar, Omkar, et al.
Published: (2024)
Beyond Simple Edits: Composed Video Retrieval with Dense Modifications
by: Thawakar, Omkar, et al.
Published: (2025)
ARB: A Comprehensive Arabic Multimodal Reasoning Benchmark
by: Ghaboura, Sara, et al.
Published: (2025)
MATRIX: Multimodal Agent Tuning for Robust Tool-Use Reasoning
by: Ashraf, Tajamul, et al.
Published: (2025)
Modulate Your Spectrum in Self-Supervised Learning
by: Weng, Xi, et al.
Published: (2023)