| Main Authors: | Khan, Omer Jauhar, Khan, Sudais, Anwar, Hafeez, Khan, Shahzeb, Arifeen, Shams Ul, Ullah, Farman |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2510.23117 |
Similar Items
CLIP-Joint-Detect: End-to-End Joint Training of Object Detectors with Contrastive Vision-Language Supervision
by: Raoufi, Behnam, et al.
Published: (2025)
Using Deep Learning to Generate Semantically Correct Hindi Captions
by: Khan, Wasim Akram, et al.
Published: (2026)
TAG-Head: Time-Aligned Graph Head for Plug-and-Play Fine-grained Action Recognition
by: Hassan, Imtiaz Ul, et al.
Published: (2026)
Botany Meets Robotics in Alpine Scree Monitoring
by: De Benedittis, Davide, et al.
Published: (2025)
Have We Mastered Scale in Deep Monocular Visual SLAM? The ScaleMaster Dataset and Benchmark
by: Ju, Hyoseok, et al.
Published: (2026)
From eye to AI: studying rodent social behavior in the era of machine Learning
by: Chindemi, Giuseppe, et al.
Published: (2025)
From Dead Pixels to Editable Slides: Infographic Reconstruction into Native Google Slides via Vision-Language Region Understanding
by: Gonzalez, Leonardo
Published: (2026)
Lifelong Learning in Vision-Language Models: Enhanced EWC with Cross-Modal Knowledge Retention
by: Durrani, Hamza Ahmed, et al.
Published: (2026)
Semi supervised GAN for smart microscopy, fast and data efficient cell cycle classification
by: Manick, Rajeev, et al.
Published: (2026)
Infrastructure-Centric World Models: Bridging Temporal Depth and Spatial Breadth for Roadside Perception
by: Meng, Siyuan, et al.
Published: (2026)
Akasha 2: Hamiltonian State Space Duality and Visual-Language Joint Embedding Predictive Architecture
by: Meziani, Yani
Published: (2026)
Sign Language Recognition and Translation for Low-Resource Languages: Challenges and Pathways Forward
by: Alishzade, Nigar, et al.
Published: (2026)
Intermitotic timing and motility patterns in the cell division of the diatom Seminavis robusta
by: Ziebarth, Jonas, et al.
Published: (2026)
Seeing The Words: Evaluating AI-generated Biblical Art
by: Makimei, Hidde, et al.
Published: (2025)
Digital analysis of early color photographs taken using regular color screen processes
by: Hubička, Jan, et al.
Published: (2023)
OmniAcc: Personalized Accessibility Assistant Using Generative AI
by: Karki, Siddhant, et al.
Published: (2025)
Caption-Driven Explainability: Probing CNNs for Bias via CLIP
by: Koller, Patrick, et al.
Published: (2025)
Estimating optical vegetation indices and biophysical variables for temperate forests with Sentinel-1 SAR data using machine learning techniques: A case study for Czechia
by: Paluba, Daniel, et al.
Published: (2023)
SVGS-DSGAT: An IoT-Enabled Innovation in Underwater Robotic Object Detection Technology
by: Wu, Dongli, et al.
Published: (2025)
Explaining What Machines See: XAI Strategies in Deep Object Detection Models
by: Seyedmomeni, FatemehSadat, et al.
Published: (2025)
From Gaze to Insight: Bridging Human Visual Attention and Vision Language Model Explanation for Weakly-Supervised Medical Image Segmentation
by: Chen, Jingkun, et al.
Published: (2025)
Domain-Adaptive Pretraining Improves Primate Behavior Recognition
by: Mueller, Felix B., et al.
Published: (2025)
Visible Iris Area as a Quality Metric for Reliable Iris Recognition Under Pupil Dilation and Eyelid Occlusion
by: Pessaud, Jack, et al.
Published: (2025)
Leum-VL Technical Report
by: He, Yuxuan, et al.
Published: (2026)
DNRSelect: Active Best View Selection for Deferred Neural Rendering
by: Wu, Dongli, et al.
Published: (2025)
FCBV-Net: Category-Level Robotic Garment Smoothing via Feature-Conditioned Bimanual Value Prediction
by: Daba, Mohammed, et al.
Published: (2025)
Context in object detection: a systematic literature review
by: Jamali, Mahtab, et al.
Published: (2025)
Cyclic Contrastive Knowledge Transfer for Open-Vocabulary Object Detection
by: Zhang, Chuhan, et al.
Published: (2025)
A Sensorimotor Vision Transformer
by: Gadzicki, Konrad, et al.
Published: (2025)
Mask-Conditioned Voxel Diffusion for Joint Geometry and Color Inpainting
by: Sumuk, Aarya
Published: (2026)
Physical Knot Classification Beyond Accuracy: A Benchmark and Diagnostic Study
by: Nie, Shiheng, et al.
Published: (2026)
Pedestrian Detection in Low-Light Conditions: A Comprehensive Survey
by: Ghari, Bahareh, et al.
Published: (2024)
Human-Centric Perception for Child Sexual Abuse Imagery
by: Laranjeira, Camila, et al.
Published: (2026)
PhysVideoGenerator: Towards Physically Aware Video Generation via Latent Physics Guidance
by: Satish, Siddarth Nilol Kundur, et al.
Published: (2026)
FlowIBR: Leveraging Pre-Training for Efficient Neural Image-Based Rendering of Dynamic Scenes
by: Büsching, Marcel, et al.
Published: (2023)
IMASHRIMP: Automatic White Shrimp (Penaeus vannamei) Biometrical Analysis from Laboratory Images Using Computer Vision and Deep Learning
by: González, Abiam Remache, et al.
Published: (2025)
OCC-MLLM-CoT-Alpha: Towards Multi-stage Occlusion Recognition Based on Large Language Models via 3D-Aware Supervision and Chain-of-Thoughts Guidance
by: Wang, Chaoyi, et al.
Published: (2025)
NOAH: Benchmarking Narrative Prior driven Hallucination and Omission in Video Large Language Models
by: Lee, Kyuho, et al.
Published: (2025)
Reducing Object Hallucination in LVLMs via Emphasizing Image-negative Tokens
by: Shen, Meng, et al.
Published: (2026)
MAPS: A Synthetic Dataset for Probing Vision Models in a Controlled 3D Scene Space
by: Galella, Santiago, et al.
Published: (2026)