| Main Authors: | Zhang, Song; Chen, Yanlong; Li, Yilin; Chen, Yining; Yi, Zili; Zhang, Xiaowei; Li, Yawei |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2605.07562 |
Similar Items
Gated Relational Alignment via Confidence-based Distillation for Efficient VLMs
by: Chen, Yanlong, et al.
Published: (2026)
GSD: View-Guided Gaussian Splatting Diffusion for 3D Reconstruction
by: Mu, Yuxuan, et al.
Published: (2024)
ApET: Approximation-Error Guided Token Compression for Efficient VLMs
by: Ma, Qiankun, et al.
Published: (2026)
Homogeneous Tokenizer Matters: Homogeneous Visual Tokenizer for Remote Sensing Image Understanding
by: Shao, Run, et al.
Published: (2024)
Beyond Next-Token Alignment: Distilling Multimodal Large Language Models via Token Interactions
by: Chen, Lin, et al.
Published: (2026)
Multi-view Remote Sensing Image Segmentation With SAM priors
by: Qi, Zipeng, et al.
Published: (2024)
Evaluating Remote Sensing Image Captions Beyond Metric Biases
by: Chen, Ziyun, et al.
Published: (2026)
Remote Sensing ChatGPT: Solving Remote Sensing Tasks with ChatGPT and Visual Models
by: Guo, Haonan, et al.
Published: (2024)
WaveFormer: A Lightweight Transformer Model for sEMG-based Gesture Recognition
by: Chen, Yanlong, et al.
Published: (2025)
Beyond Training: Dynamic Token Merging for Zero-Shot Video Understanding
by: Zhang, Yiming, et al.
Published: (2024)
MMM-RS: A Multi-modal, Multi-GSD, Multi-scene Remote Sensing Dataset and Benchmark for Text-to-Image Generation
by: Luo, Jialin, et al.
Published: (2024)
Beyond Text-Visual Attention: Exploiting Visual Cues for Effective Token Pruning in VLMs
by: Zhang, Qizhe, et al.
Published: (2024)
RSCC: A Large-Scale Remote Sensing Change Caption Dataset for Disaster Events
by: Chen, Zhenyuan, et al.
Published: (2025)
SADER: Structure-Aware Diffusion Framework with DEterministic Resampling for Multi-Temporal Remote Sensing Cloud Removal
by: Zhang, Yifan, et al.
Published: (2026)
OmniStyle: Filtering High Quality Style Transfer Data at Scale
by: Wang, Ye, et al.
Published: (2025)
Continual Vision-Language Learning for Remote Sensing: Benchmarking and Analysis
by: Weng, Xingxing, et al.
Published: (2026)
RSEdit: Text-Guided Image Editing for Remote Sensing
by: Zhenyuan, Chen, et al.
Published: (2026)
P-MSDiff: Parallel Multi-Scale Diffusion for Remote Sensing Image Segmentation
by: Zhang, Qi, et al.
Published: (2024)
MapGlue: Multimodal Remote Sensing Image Matching
by: Wu, Peihao, et al.
Published: (2025)
RemoteTrimmer: Adaptive Structural Pruning for Remote Sensing Image Classification
by: Zou, Guangwenjie, et al.
Published: (2024)
Multimodal Remote Sensing Scene Classification Using VLMs and Dual-Cross Attention Networks
by: Cai, Jinjin, et al.
Published: (2024)
Beyond Visual Fidelity: Benchmarking Super-Resolution Models for Large-Scale Remote Sensing Imagery via Downstream Task Integration
by: Li, Zhili, et al.
Published: (2026)
Transcending the Limit of Local Window: Advanced Super-Resolution Transformer with Adaptive Token Dictionary
by: Zhang, Leheng, et al.
Published: (2024)
SOMA-1M: A Large-Scale SAR-Optical Multi-resolution Alignment Dataset for Multi-Task Remote Sensing
by: Wu, Peihao, et al.
Published: (2026)
Learning to Aggregate Multi-Scale Context for Instance Segmentation in Remote Sensing Images
by: Liu, Ye, et al.
Published: (2021)
MergeTok: Unified Continuous and Discrete Visual Tokenization via Token Merging
by: Zhang, Luyuan, et al.
Published: (2026)
Beyond Attention or Similarity: Maximizing Conditional Diversity for Token Pruning in MLLMs
by: Zhang, Qizhe, et al.
Published: (2025)
YOLO-SPCI: Enhancing Remote Sensing Object Detection via Selective-Perspective-Class Integration
by: Wang, Xinyuan, et al.
Published: (2025)
ATD: Improved Transformer with Adaptive Token Dictionary for Image Restoration
by: Zhang, Leheng, et al.
Published: (2026)
Frequency-Assisted Mamba for Remote Sensing Image Super-Resolution
by: Xiao, Yi, et al.
Published: (2024)
InfRS: Incremental Few-Shot Object Detection in Remote Sensing Images
by: Li, Wuzhou, et al.
Published: (2024)
How Much Information Can a Vision Token Hold? A Scaling Law for Recognition Limits in VLMs
by: Zhuang, Shuxin, et al.
Published: (2026)
FUSAR-KLIP: Towards Multimodal Foundation Models for Remote Sensing
by: Yang, Yi, et al.
Published: (2025)
TEFormer: Texture-Aware and Edge-Guided Transformer for Semantic Segmentation of Urban Remote Sensing Images
by: Zhou, Guoyu, et al.
Published: (2025)
Lightweight Change Detection in Heterogeneous Remote Sensing Images with Online All-Integer Pruning Training
by: Zhang, Chengyang, et al.
Published: (2024)
TASAM: Terrain-and-Aware Segment Anything Model for Temporal-Scale Remote Sensing Segmentation
by: Wang, Tianyang, et al.
Published: (2025)
OSMDA: OpenStreetMap-based Domain Adaptation for Remote Sensing VLMs
by: Ailuro, Stefan Maria, et al.
Published: (2026)
Beyond Shortcuts: Mitigating Visual Illusions in Frozen VLMs via Qualitative Reasoning
by: Guo, Hao, et al.
Published: (2026)
SkySense V2: A Unified Foundation Model for Multi-modal Remote Sensing
by: Zhang, Yingying, et al.
Published: (2025)
Beyond Open Vocabulary: Multimodal Prompting for Object Detection in Remote Sensing Images
by: Yang, Shuai, et al.
Published: (2026)