| Main Authors: | Bi, Chongke, Gao, Xin, Fu, Baofeng, Zhao, Yuheng, Chen, Siming, Zhao, Ying, Yang, Lu |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2506.23257 |
Similar Items
Viewpoint Recommendation for Point Cloud Labeling through Interaction Cost Modeling
by: Zhang, Yu, et al.
Published: (2026)
CineBrain: A Large-Scale Multi-Modal Brain Dataset During Naturalistic Audiovisual Narrative Processing
by: Gao, Jianxiong, et al.
Published: (2025)
NeRF-AD: Neural Radiance Field with Attention-based Disentanglement for Talking Face Synthesis
by: Bi, Chongke, et al.
Published: (2024)
CD-TVD: Contrastive Diffusion for 3D Super-Resolution with Scarce High-Resolution Time-Varying Data
by: Bi, Chongke, et al.
Published: (2025)
NeRF-3DTalker: Neural Radiance Field with 3D Prior Aided Audio Disentanglement for Talking Head Synthesis
by: Liu, Xiaoxing, et al.
Published: (2025)
Making Your Dreams A Reality: Decoding the Dreams into a Coherent Video Story from fMRI Signals
by: Fu, Yanwei, et al.
Published: (2025)
Latency-aware Road Anomaly Segmentation in Videos: A Photorealistic Dataset and New Metrics
by: Tian, Beiwen, et al.
Published: (2024)
Multi-Scale VMamba: Hierarchy in Hierarchy Visual State Space Model
by: Shi, Yuheng, et al.
Published: (2024)
SlowPerception: Physical-World Latency Attack against Visual Perception in Autonomous Driving
by: Ma, Chen, et al.
Published: (2024)
LIPT: Latency-aware Image Processing Transformer
by: Qiao, Junbo, et al.
Published: (2024)
Optimization of Layer Skipping and Frequency Scaling for Convolutional Neural Networks under Latency Constraint
by: Chan, Minh David Thao, et al.
Published: (2025)
Attribute Distribution Modeling and Semantic-Visual Alignment for Generative Zero-shot Learning
by: Pu, Haojie, et al.
Published: (2026)
VisualTrans: A Benchmark for Real-World Visual Transformation Reasoning
by: Ji, Yuheng, et al.
Published: (2025)
Tango: Taming Visual Signals for Efficient Video Large Language Models
by: Yin, Shukang, et al.
Published: (2026)
Representation Learning for Point Cloud Understanding
by: Yan, Siming
Published: (2025)
ReIDMamba: Learning Discriminative Features with Visual State Space Model for Person Re-Identification
by: Gu, Hongyang, et al.
Published: (2025)
TempFlow-GRPO: When Timing Matters for GRPO in Flow Models
by: He, Xiaoxuan, et al.
Published: (2025)
Efficient Motion Prompt Learning for Robust Visual Tracking
by: Zhao, Jie, et al.
Published: (2025)
MosaicDoc: A Large-Scale Bilingual Benchmark for Visually Rich Document Understanding
by: Chen, Ketong, et al.
Published: (2025)
GlobalPointer: Large-Scale Plane Adjustment with Bi-Convex Relaxation
by: Liao, Bangyan, et al.
Published: (2024)
UltraHR-100K: Enhancing UHR Image Synthesis with A Large-Scale High-Quality Dataset
by: Zhao, Chen, et al.
Published: (2025)
Geometry-Aware Feature Matching for Large-Scale Structure from Motion
by: Chen, Gonglin, et al.
Published: (2024)
LLM-Optic: Unveiling the Capabilities of Large Language Models for Universal Visual Grounding
by: Zhao, Haoyu, et al.
Published: (2024)
The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio
by: Leng, Sicong, et al.
Published: (2024)
A Remote Sensing Image Change Detection Method Integrating Layer Exchange and Channel-Spatial Differences
by: Dong, Sijun, et al.
Published: (2025)
Disentangling Content from Style to Overcome Shortcut Learning: A Hybrid Generative-Discriminative Learning Framework
by: Fu, Siming, et al.
Published: (2025)
ViGoR: Improving Visual Grounding of Large Vision Language Models with Fine-Grained Reward Modeling
by: Yan, Siming, et al.
Published: (2024)
LACO: Adaptive Latent Communication for Collaborative Driving
by: Chen, Tianhao, et al.
Published: (2026)
TartanGround: A Large-Scale Dataset for Ground Robot Perception and Navigation
by: Patel, Manthan, et al.
Published: (2025)
Do we Really Need Visual Instructions? Towards Visual Instruction-Free Fine-tuning for Large Vision-Language Models
by: Liu, Zikang, et al.
Published: (2025)
SRRT: Exploring Search Region Regulation for Visual Object Tracking
by: Zhu, Jiawen, et al.
Published: (2022)
Pyramid Diffusion for Fine 3D Large Scene Generation
by: Liu, Yuheng, et al.
Published: (2023)
Efficient Large Multi-modal Models via Visual Context Compression
by: Chen, Jieneng, et al.
Published: (2024)
Deep Probabilistic Unfolding for Quantized Compressive Sensing
by: Qu, Gang, et al.
Published: (2026)
Physics-guided Deep Unfolding Network for Enhanced Kronecker Compressive sensing
by: Qu, Gang, et al.
Published: (2025)
PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models
by: Yang, Chenyu, et al.
Published: (2024)
OmniForcing: Unleashing Real-time Joint Audio-Visual Generation
by: Su, Yaofeng, et al.
Published: (2026)
Diffusion-based Visual Anagram as Multi-task Learning
by: Xu, Zhiyuan, et al.
Published: (2024)
SynthVerse: A Large-Scale Diverse Synthetic Dataset for Point Tracking
by: Zhao, Weiguang, et al.
Published: (2026)
VisualThink-VLA: Visual Intermediate Reasoning for Effective and Low-Latency Vision-Language-Action Policies
by: Gao, Mingjian, et al.
Published: (2026)