| Main Authors: | Sarkar, Ayushman, Idris, Mohd Yamani Idna, Yu, Zhenyu |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2508.10523 |
Similar Items
DeCorStory: Gram-Schmidt Prompt Embedding Decorrelation for Consistent Storytelling
by: Sarkar, Ayushman, et al.
Published: (2026)
SatelliteCalculator: A Multi-Task Vision Foundation Model for Quantitative Remote Sensing Inversion
by: Yu, Zhenyu, et al.
Published: (2025)
StoryState: Agent-Based State Control for Consistent and Editable Storybooks
by: Sarkar, Ayushman, et al.
Published: (2026)
ReDiStory: Region-Disentangled Diffusion for Consistent Visual Story Generation
by: Sarkar, Ayushman, et al.
Published: (2026)
DC4CR: When Cloud Removal Meets Diffusion Control in Remote Sensing
by: Yu, Zhenyu, et al.
Published: (2025)
Improved implicit diffusion model with knowledge distillation to estimate the spatial distribution density of carbon stock in remote sensing imagery
by: Yu, Zhenyu, et al.
Published: (2024)
Rainy: Unlocking Satellite Calibration for Deep Learning in Precipitation
by: Yu, Zhenyu, et al.
Published: (2025)
From Physics to Foundation Models: A Review of AI-Driven Quantitative Remote Sensing Inversion
by: Yu, Zhenyu, et al.
Published: (2025)
SatelliteFormula: Multi-Modal Symbolic Regression from Remote Sensing Imagery for Physics Discovery
by: Yu, Zhenyu, et al.
Published: (2025)
ForgetMe: Evaluating Selective Forgetting in Generative Models
by: Yu, Zhenyu, et al.
Published: (2025)
A Diffusion-Based Framework for Terrain-Aware Remote Sensing Image Reconstruction
by: Yu, Zhenyu, et al.
Published: (2025)
DanceText: A Training-Free Layered Framework for Controllable Multilingual Text Transformation in Images
by: Yu, Zhenyu, et al.
Published: (2025)
A Layered Self-Supervised Knowledge Distillation Framework for Efficient Multimodal Learning on the Edge
by: Dahri, Tarique, et al.
Published: (2025)
Perceptual Taxonomy: Evaluating and Guiding Hierarchical Scene Reasoning in Vision-Language Models
by: Lee, Jonathan, et al.
Published: (2025)
Characterizing Disparity Between Edge Models and High-Accuracy Base Models for Vision Tasks
by: Wang, Zhenyu, et al.
Published: (2024)
Taxonomy-Aware Evaluation of Vision-Language Models
by: Snæbjarnarson, Vésteinn, et al.
Published: (2025)
Road Rage Reasoning with Vision-language Models (VLMs): Task Definition and Evaluation Dataset
by: Weng, Yibing, et al.
Published: (2025)
Text-to-Image Representativity Fairness Evaluation Framework
by: Yamani, Asma, et al.
Published: (2024)
CapsuleNet: A Deep Learning Model To Classify GI Diseases Using EfficientNet-b7
by: Das, Aniket, et al.
Published: (2024)
DISK: Dynamic Inference SKipping for World Models
by: Naman, Anugunj, et al.
Published: (2026)
MicarVLMoE: A Modern Gated Cross-Aligned Vision-Language Mixture of Experts Model for Medical Image Captioning and Report Generation
by: Izhar, Amaan, et al.
Published: (2025)
Learning by Correction: Efficient Tuning Task for Zero-Shot Generative Vision-Language Reasoning
by: Li, Rongjie, et al.
Published: (2024)
Bombardier Beetle Optimizer: A Novel Bio-Inspired Algorithm for Global Optimization
by: Shehadeh, Hisham A., et al.
Published: (2025)
A Weighted Vision Transformer-Based Multi-Task Learning Framework for Predicting ADAS-Cog Scores
by: Hamid, Nur Amirah Abd, et al.
Published: (2025)
RoboMamba: Efficient Vision-Language-Action Model for Robotic Reasoning and Manipulation
by: Liu, Jiaming, et al.
Published: (2024)
DriveRX: A Vision-Language Reasoning Model for Cross-Task Autonomous Driving
by: Diao, Muxi, et al.
Published: (2025)
LAW & ORDER: Adaptive Spatial Weighting for Medical Diffusion and Segmentation
by: Naman, Anugunj, et al.
Published: (2026)
Synthetic Designed Experiments for Diagnosing Vision Model Failure
by: Sarkar, Krisanu
Published: (2026)
CaST-POI: Candidate-Conditioned Spatiotemporal Modeling for Next POI Recommendation
by: Yu, Zhenyu, et al.
Published: (2026)
MVT: Mask-Grounded Vision-Language Models for Taxonomy-Aligned Land-Cover Tagging
by: Chen, Siyi, et al.
Published: (2025)
VCRBench: Exploring Long-form Causal Reasoning Capabilities of Large Video Language Models
by: Sarkar, Pritam, et al.
Published: (2025)
Leveraging Vision Language Models for Specialized Agricultural Tasks
by: Arshad, Muhammad Arbab, et al.
Published: (2024)
Vision-Language Models for Vision Tasks: A Survey
by: Zhang, Jingyi, et al.
Published: (2023)
Self-Rewarding Vision-Language Model via Reasoning Decomposition
by: Li, Zongxia, et al.
Published: (2025)
ColonNet: A Hybrid Of DenseNet121 And U-NET Model For Detection And Segmentation Of GI Bleeding
by: Singh, Ayushman, et al.
Published: (2024)
Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment
by: Yan, Ziang, et al.
Published: (2024)
Methodology to Deploy CNN-Based Computer Vision Models on Immersive Wearable Devices
by: Malek, Kaveh, et al.
Published: (2024)
Olympus: A Universal Task Router for Computer Vision Tasks
by: Lin, Yuanze, et al.
Published: (2024)
A Review of Transformer-Based Models for Computer Vision Tasks: Capturing Global Context and Spatial Relationships
by: Pereira, Gracile Astlin, et al.
Published: (2024)
UMIT: Unifying Medical Imaging Tasks via Vision-Language Models
by: Yu, Haiyang, et al.
Published: (2025)