| Main Authors: | Luo, Yang, Chen, Zhineng, Zhou, Peng, Wu, Zuxuan, Gao, Xieping, Jiang, Yu-Gang |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2404.00680 |
Similar Items
Out of Length Text Recognition with Sub-String Matching
by: Du, Yongkun, et al.
Published: (2024)
LRANet++: Low-Rank Approximation Network for Accurate and Efficient Text Spotting
by: Su, Yuchen, et al.
Published: (2025)
Distilling Knowledge from Heterogeneous Architectures for Semantic Segmentation
by: Huang, Yanglin, et al.
Published: (2025)
DreamMesh: Jointly Manipulating and Texturing Triangle Meshes for Text-to-3D Generation
by: Yang, Haibo, et al.
Published: (2024)
Explicit Relational Reasoning Network for Scene Text Detection
by: Su, Yuchen, et al.
Published: (2024)
Learning Accurate Segmentation Purely from Self-Supervision
by: You, Zuyao, et al.
Published: (2026)
CaTok: Taming Mean Flows for One-Dimensional Causal Image Tokenization
by: Chen, Yitong, et al.
Published: (2026)
OmniTokenizer: A Joint Image-Video Tokenizer for Visual Generation
by: Wang, Junke, et al.
Published: (2024)
LRANet: Towards Accurate and Efficient Scene Text Detection with Low-Rank Approximation Network
by: Su, Yuchen, et al.
Published: (2023)
CoMP: Continual Multimodal Pre-training for Vision Foundation Models
by: Chen, Yitong, et al.
Published: (2025)
FreeEnhance: Tuning-Free Image Enhancement via Content-Consistent Noising-and-Denoising Process
by: Luo, Yang, et al.
Published: (2024)
Adaptive Retention & Correction: Test-Time Training for Continual Learning
by: Chen, Haoran, et al.
Published: (2024)
Decoder Pre-Training with only Text for Scene Text Recognition
by: Zhao, Shuai, et al.
Published: (2024)
PromptFusion: Decoupling Stability and Plasticity for Continual Learning
by: Chen, Haoran, et al.
Published: (2023)
Beyond Degradation Redundancy: Contrastive Prompt Learning for All-in-One Image Restoration
by: Wu, Gang, et al.
Published: (2025)
Multi-Prompt Alignment for Multi-Source Unsupervised Domain Adaptation
by: Chen, Haoran, et al.
Published: (2022)
Achieving More with Less: Additive Prompt Tuning for Rehearsal-Free Class-Incremental Learning
by: Chen, Haoran, et al.
Published: (2025)
Patch Ranking: Efficient CLIP by Learning to Rank Local Patches
by: Wu, Cheng-En, et al.
Published: (2024)
GeoGS3D: Single-view 3D Reconstruction via Geometric-aware Diffusion Model and Gaussian Splatting
by: Feng, Qijun, et al.
Published: (2024)
Attention Itself Could Retrieve. RetrieveVGGT: Training-Free Long Context Streaming 3D Reconstruction via Query-Key Similarity Retrieval
by: Zou, Zichen, et al.
Published: (2026)
Daily-Omni: Towards Audio-Visual Reasoning with Temporal Alignment across Modalities
by: Zhou, Ziwei, et al.
Published: (2025)
SVTRv2: CTC Beats Encoder-Decoder Models in Scene Text Recognition
by: Du, Yongkun, et al.
Published: (2024)
Instruction-Guided Scene Text Recognition
by: Du, Yongkun, et al.
Published: (2024)
GenRec: Unifying Video Generation and Recognition with Diffusion Models
by: Weng, Zejia, et al.
Published: (2024)
Efficient Redundancy Reduction for Open-Vocabulary Semantic Segmentation
by: Chen, Lin, et al.
Published: (2025)
Multi-Prompt Progressive Alignment for Multi-Source Unsupervised Domain Adaptation
by: Chen, Haoran, et al.
Published: (2025)
UniGen-1.5: Enhancing Image Generation and Editing through Reward Unification in Reinforcement Learning
by: Tian, Rui, et al.
Published: (2025)
Compositional Text-to-Image Generation Via Region-aware Bimodal Direct Preference Optimization
by: Liu, Zhuohan, et al.
Published: (2026)
AdvQDet: Detecting Query-Based Adversarial Attacks with Adversarial Contrastive Prompt Tuning
by: Wang, Xin, et al.
Published: (2024)
CreatiLayout: Siamese Multimodal Diffusion Transformer for Creative Layout-to-Image Generation
by: Zhang, Hui, et al.
Published: (2024)
OmniVid: A Generative Framework for Universal Video Understanding
by: Wang, Junke, et al.
Published: (2024)
OmniTracker: Unifying Object Tracking by Tracking-with-Detection
by: Wang, Junke, et al.
Published: (2023)
DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effective for LMMs
by: Meng, Lingchen, et al.
Published: (2024)
Comprehensive Multi-Modal Prototypes are Simple and Effective Classifiers for Vast-Vocabulary Object Detection
by: Chen, Yitong, et al.
Published: (2024)
ArcFlow: Unleashing 2-Step Text-to-Image Generation via High-Precision Non-Linear Flow Distillation
by: Yang, Zihan, et al.
Published: (2026)
ForgerySleuth: Empowering Multimodal Large Language Models for Image Manipulation Detection
by: Sun, Zhihao, et al.
Published: (2024)
BadPatch: Diffusion-Based Generation of Physical Adversarial Patches
by: Wang, Zhixiang, et al.
Published: (2024)
DiffusionAD: Norm-guided One-step Denoising Diffusion for Anomaly Detection
by: Zhang, Hui, et al.
Published: (2023)
MDiff4STR: Mask Diffusion Model for Scene Text Recognition
by: Du, Yongkun, et al.
Published: (2025)
WeEdit: A Dataset, Benchmark and Glyph-Guided Framework for Text-centric Image Editing
by: Zhang, Hui, et al.
Published: (2026)