| Main Authors: | Zhang, Tongtong, Wei, Xian, Li, Yuanxiang |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2404.00544 |
Similar Items
A Nerf-Based Color Consistency Method for Remote Sensing Images
by: Zuo, Zongcheng, et al.
Published: (2024)
psPRF: Pansharpening Planar Neural Radiance Field for Generalized 3D Reconstruction Satellite Imagery
by: Zhang, Tongtong, et al.
Published: (2024)
Online LiDAR-Camera Extrinsic Parameters Self-checking
by: Wei, Pengjin, et al.
Published: (2022)
Proximal Vision Transformer: Enhancing Feature Representation through Two-Stage Manifold Geometry
by: Yun, Haoyu, et al.
Published: (2025)
Surface Vision Mamba: Leveraging Bidirectional State Space Model for Efficient Spherical Manifold Representation
by: He, Rongzhao, et al.
Published: (2025)
Superpixel Semantics Representation and Pre-training for Vision-Language Task
by: Zhang, Siyu, et al.
Published: (2023)
Environment-Driven Online LiDAR-Camera Extrinsic Calibration
by: Huang, Zhiwei, et al.
Published: (2025)
MedUnifier: Unifying Vision-and-Language Pre-training on Medical Data with Vision Generation Task using Discrete Visual Representations
by: Zhang, Ziyang, et al.
Published: (2025)
Improved Belief-Attention in Vision Task
by: Zhang, Guoqiang
Published: (2026)
ZSPAPrune: Zero-Shot Prompt-Aware Token Pruning for Vision-Language Models
by: Zhang, Pu, et al.
Published: (2025)
Generalized Single-Image-Based Morphing Attack Detection Using Deep Representations from Vision Transformer
by: Zhang, Haoyu, et al.
Published: (2025)
PMMD: A pose-guided multi-view multi-modal diffusion for person generation
by: Shang, Ziyu, et al.
Published: (2025)
NuWa: Deriving Lightweight Task-Specific Vision Transformers for Edge Devices
by: Wei, Ziteng, et al.
Published: (2025)
Deep Active Learning with Manifold-preserving Trajectory Sampling
by: Ji, Yingrui, et al.
Published: (2024)
LLaVA-OneVision: Easy Visual Task Transfer
by: Li, Bo, et al.
Published: (2024)
UniGaussian: Driving Scene Reconstruction from Multiple Camera Models via Unified Gaussian Representations
by: Ren, Yuan, et al.
Published: (2024)
Dynamic Universal Approximation Theory: The Basic Theory for Deep Learning-Based Computer Vision Models
by: Wang, Wei, et al.
Published: (2024)
Compressing Vision Transformers in Geospatial Transfer Learning with Manifold-Constrained Optimization
by: Snyder, Thomas, et al.
Published: (2026)
UNIC: Learning Unified Multimodal Extrinsic Contact Estimation
by: Xu, Zhengtong, et al.
Published: (2026)
Efficient Terrain Stochastic Differential Equations for Multipurpose Digital Elevation Model Restoration
by: Zhang, Tongtong, et al.
Published: (2024)
Two-Stream Interactive Joint Learning of Scene Parsing and Geometric Vision Tasks
by: Tang, Guanfeng, et al.
Published: (2026)
Masked Modeling for Self-supervised Representation Learning on Vision and Beyond
by: Li, Siyuan, et al.
Published: (2023)
Recurrent Reasoning with Vision-Language Models for Estimating Long-Horizon Embodied Task Progress
by: Zhang, Yuelin, et al.
Published: (2026)
Olympus: A Universal Task Router for Computer Vision Tasks
by: Lin, Yuanze, et al.
Published: (2024)
Human-Like Coarse Object Representations in Vision Models
by: Gizdov, Andrey, et al.
Published: (2026)
TaskGalaxy: Scaling Multi-modal Instruction Fine-tuning with Tens of Thousands Vision Task Types
by: Chen, Jiankang, et al.
Published: (2025)
GTMA: Dynamic Representation Optimization for OOD Vision-Language Models
by: Zhang, Jensen, et al.
Published: (2025)
Research on the Application of Computer Vision Based on Deep Learning in Autonomous Driving Technology
by: Zhang, Jingyu, et al.
Published: (2024)
Global Geometry Is Not Enough for Vision Representations
by: Chung, Jiwan, et al.
Published: (2026)
Behavior-Grounded Lane Representation Learning for Multi-Task Traffic Digital Twins
by: Tamaru, Rei, et al.
Published: (2026)
SurgeryV2: Bridging the Gap Between Model Merging and Multi-Task Learning with Deep Representation Surgery
by: Yang, Enneng, et al.
Published: (2024)
Online, Target-Free LiDAR-Camera Extrinsic Calibration via Cross-Modal Mask Matching
by: Huang, Zhiwei, et al.
Published: (2024)
Representation Separation for Semantic Segmentation with Vision Transformers
by: Hong, Yuanduo, et al.
Published: (2022)
The Geometry of Representational Failures in Vision Language Models
by: Savietto, Daniele, et al.
Published: (2026)
SpatialFly: Geometry-Guided Representation Alignment for UAV Vision-and-Language Navigation in Urban Environments
by: Jiang, Wen, et al.
Published: (2026)
PyVision-RL: Forging Open Agentic Vision Models via RL
by: Zhao, Shitian, et al.
Published: (2026)
Learning Emergent Modular Representations in Multi-modality Medical Vision Foundation Models
by: He, Yuting, et al.
Published: (2026)
PhyVLLM: Physics-Guided Video Language Model with Motion-Appearance Disentanglement
by: Zhan, Yu-Wei, et al.
Published: (2025)
LVLM-Aided Alignment of Task-Specific Vision Models
by: Koebler, Alexander, et al.
Published: (2025)
Demystifying KAN for Vision Tasks: The RepKAN Approach
by: Cheon, Minjong
Published: (2026)