| Main Authors: | Wang, Zhenyu; Nirjon, Shahriar |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2407.10016 |
Similar Items
mmJoints: Expanding Joint Representations Beyond (x,y,z) in mmWave-Based 3D Pose Estimation
by: Wang, Zhenyu, et al.
Published: (2025)
mmWEAVER: Environment-Specific mmWave Signal Synthesis from a Photo and Activity Description
by: Monjur, Mahathir, et al.
Published: (2025)
Evolving Prompt Adaptation for Vision-Language Models
by: Zhang, Enming, et al.
Published: (2026)
Collaborative Edge-to-Server Inference for Vision-Language Models
by: Song, Soochang, et al.
Published: (2025)
NuWa: Deriving Lightweight Task-Specific Vision Transformers for Edge Devices
by: Wei, Ziteng, et al.
Published: (2025)
More Thought, Less Accuracy? On the Dual Nature of Reasoning in Vision-Language Models
by: Tian, Xinyu, et al.
Published: (2025)
LVLM-Aided Alignment of Task-Specific Vision Models
by: Koebler, Alexander, et al.
Published: (2025)
A Unified Debiasing Approach for Vision-Language Models across Modalities and Tasks
by: Jung, Hoin, et al.
Published: (2024)
Aligned Vector Quantization for Edge-Cloud Collabrative Vision-Language Models
by: Liu, Xiao, et al.
Published: (2024)
Refusal as Silence: Gendered Disparities in Vision-Language Model Responses
by: Luo, Sha, et al.
Published: (2024)
mmCounter: Static People Counting in Dense Indoor Scenarios Using mmWave Radar
by: Toha, Tarik Reza, et al.
Published: (2025)
Can Vision Models Truly Forget? Mirage: Representation-Level Certification of Visual Unlearning
by: Yu, Zhenyu, et al.
Published: (2026)
Federated Learning of Low-Rank One-Shot Image Detection Models in Edge Devices with Scalable Accuracy and Compute Complexity
by: Hannaan, Abdul, et al.
Published: (2025)
Vision-Language Models for Edge Networks: A Comprehensive Survey
by: Sharshar, Ahmed, et al.
Published: (2025)
Enhancing Vehicle Make and Model Recognition with 3D Attention Modules
by: Semiromizadeh, Narges, et al.
Published: (2025)
Vision Language Model-Empowered Contract Theory for AIGC Task Allocation in Teleoperation
by: Zhan, Zijun, et al.
Published: (2024)
GazeVLM: A Vision-Language Model for Multi-Task Gaze Understanding
by: Mathew, Athul M., et al.
Published: (2025)
Exploring Disparity-Accuracy Trade-offs in Face Recognition Systems: The Role of Datasets, Architectures, and Loss Functions
by: Jaiswal, Siddharth D, et al.
Published: (2025)
CombatVLA: An Efficient Vision-Language-Action Model for Combat Tasks in 3D Action Role-Playing Games
by: Chen, Peng, et al.
Published: (2025)
Jack of All Tasks, Master of Many: Designing General-purpose Coarse-to-Fine Vision-Language Model
by: Pramanick, Shraman, et al.
Published: (2023)
Recurrent Reasoning with Vision-Language Models for Estimating Long-Horizon Embodied Task Progress
by: Zhang, Yuelin, et al.
Published: (2026)
GeoRSMLLM: A Multimodal Large Language Model for Vision-Language Tasks in Geoscience and Remote Sensing
by: Zhang, Zilun, et al.
Published: (2025)
Beyond Generation: Multi-Hop Reasoning for Factual Accuracy in Vision-Language Models
by: Hossain, Shamima
Published: (2025)
EdgeSync: Accelerating Edge-Model Updates for Data Drift through Adaptive Continuous Learning
by: Donga, Runchu, et al.
Published: (2025)
Edge Reliability Gap in Vision-Language Models: Quantifying Failure Modes of Compressed VLMs Under Visual Corruption
by: Erol, Mehmet Kaan
Published: (2026)
Fast ODE-based Sampling for Diffusion Models in Around 5 Steps
by: Zhou, Zhenyu, et al.
Published: (2023)
Enhancing Vision-Language Models for Autonomous Driving through Task-Specific Prompting and Spatial Reasoning
by: Wu, Aodi, et al.
Published: (2025)
Towards Accurate UAV Image Perception: Guiding Vision-Language Models with Stronger Task Prompts
by: Guo, Mingning, et al.
Published: (2025)
Edge-AI for Agriculture: Lightweight Vision Models for Disease Detection in Resource-Limited Settings
by: Joshi, Harsh
Published: (2024)
Context-Aware Temporal Embedding of Objects in Video Data
by: Farhan, Ahnaf, et al.
Published: (2024)
Selective Visual Prompting in Vision Mamba
by: Yao, Yifeng, et al.
Published: (2024)
Dynamic Weight Adjustment for Knowledge Distillation: Leveraging Vision Transformer for High-Accuracy Lung Cancer Detection and Real-Time Deployment
by: Khan, Saif Ur Rehman, et al.
Published: (2025)
Cognition-Inspired Dual-Stream Semantic Enhancement for Vision-Based Dynamic Emotion Modeling
by: Wang, Huanzhen, et al.
Published: (2026)
TAP-SLF: Parameter-Efficient Adaptation of Vision Foundation Models for Multi-Task Ultrasound Image Analysis
by: Wan, Hui, et al.
Published: (2026)
On-Demand Multi-Task Sparsity for Efficient Large-Model Deployment on Edge Devices
by: Huang, Lianming, et al.
Published: (2025)
CultureVLM: Characterizing and Improving Cultural Understanding of Vision-Language Models for over 100 Countries
by: Liu, Shudong, et al.
Published: (2025)
VLM4VLA: Revisiting Vision-Language-Models in Vision-Language-Action Models
by: Zhang, Jianke, et al.
Published: (2026)
Rethinking the Embodied Gap in Vision-and-Language Navigation: A Holistic Study of Physical and Visual Disparities
by: Wang, Liuyi, et al.
Published: (2025)
Object-Centric Vision Token Pruning for Vision Language Models
by: Li, Guangyuan, et al.
Published: (2025)
Improved Belief-Attention in Vision Task
by: Zhang, Guoqiang
Published: (2026)