| Main Authors: | Wang, Jiawei, Zhang, Shunchi, Hu, Kai, Ma, Chixiang, Zhong, Zhuoyao, Sun, Lei, Huo, Qiang |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2401.09232 |
Similar Items
Detect-Order-Construct: A Tree Construction based Approach for Hierarchical Document Structure Analysis
by: Wang, Jiawei, et al.
Published: (2024)
DLAFormer: An End-to-End Transformer For Document Layout Analysis
by: Wang, Jiawei, et al.
Published: (2024)
UniHDSA: A Unified Relation Prediction Approach for Hierarchical Document Structure Analysis
by: Wang, Jiawei, et al.
Published: (2025)
Aligning Information Capacity Between Vision and Language via Dense-to-Sparse Feature Distillation for Image-Text Matching
by: Liu, Yang, et al.
Published: (2025)
Explicit Relational Reasoning Network for Scene Text Detection
by: Su, Yuchen, et al.
Published: (2024)
Towards Accurate Post-Training Quantization of Vision Transformers via Error Reduction
by: Zhong, Yunshan, et al.
Published: (2024)
Mr. DETR++: Instructive Multi-Route Training for Detection Transformers with Mixture-of-Experts
by: Zhang, Chang-Bin, et al.
Published: (2024)
Text-Guided Mixup Towards Long-Tailed Image Categorization
by: Franklin, Richard, et al.
Published: (2024)
A Dynamic Transformer Network for Vehicle Detection
by: Tian, Chunwei, et al.
Published: (2025)
PCSR: Pseudo-label Consistency-Guided Sample Refinement for Noisy Correspondence Learning
by: Liu, Zhuoyao, et al.
Published: (2025)
Image-to-Image Translation with Diffusion Transformers and CLIP-Based Image Conditioning
by: Zhu, Qiang, et al.
Published: (2025)
YOLO-FDA: Integrating Hierarchical Attention and Detail Enhancement for Surface Defect Detection
by: Hu, Jiawei
Published: (2025)
Rethinking Structure Preservation in Text-Guided Image Editing with Visual Autoregressive Models
by: Xia, Tao, et al.
Published: (2026)
Dynamic and Compressive Adaptation of Transformers From Images to Videos
by: Zhang, Guozhen, et al.
Published: (2024)
Aggregated Text Transformer for Scene Text Detection
by: Zhou, Zhao, et al.
Published: (2022)
Constrained Dynamic Gaussian Splatting
by: Zheng, Zihan, et al.
Published: (2026)
Controllable-LPMoE: Adapting to Challenging Object Segmentation via Dynamic Local Priors from Mixture-of-Experts
by: Sun, Yanguang, et al.
Published: (2025)
OCFER-Net: Recognizing Facial Expression in Online Learning System
by: Huo, Yi, et al.
Published: (2025)
RTGen: Generating Region-Text Pairs for Open-Vocabulary Object Detection
by: Chen, Fangyi, et al.
Published: (2024)
FreeText: Training-Free Text Rendering in Diffusion Transformers via Attention Localization and Spectral Glyph Injection
by: Zhang, Ruiqiang, et al.
Published: (2026)
Lane Departure Accident Prevention in Foggy Conditions: A Prior-Guided Dynamic Feature Fusion Transformer Framework for Real-Time Lane Detection
by: Zhang, Ronghui, et al.
Published: (2025)
CharGen: High Accurate Character-Level Visual Text Generation Model with MultiModal Encoder
by: Ma, Lichen, et al.
Published: (2024)
TextBlockV2: Towards Precise-Detection-Free Scene Text Spotting with Pre-trained Language Model
by: Lyu, Jiahao, et al.
Published: (2024)
VideoDirector: Precise Video Editing via Text-to-Video Models
by: Wang, Yukun, et al.
Published: (2024)
Text-DiFuse: An Interactive Multi-Modal Image Fusion Framework based on Text-modulated Diffusion Model
by: Zhang, Hao, et al.
Published: (2024)
MANTA: A Large-Scale Multi-View and Visual-Text Anomaly Detection Dataset for Tiny Objects
by: Fan, Lei, et al.
Published: (2024)
SemGeoMo: Dynamic Contextual Human Motion Generation with Semantic and Geometric Guidance
by: Cong, Peishan, et al.
Published: (2025)
MultiColor: Image Colorization by Learning from Multiple Color Spaces
by: Du, Xiangcheng, et al.
Published: (2024)
Layer-wise Instance Binding for Regional and Occlusion Control in Text-to-Image Diffusion Transformers
by: Chen, Ruidong, et al.
Published: (2026)
Exploiting Inter-sample and Inter-feature Relations in Dataset Distillation
by: Deng, Wenxiao, et al.
Published: (2024)
Dynamic Attention Analysis for Backdoor Detection in Text-to-Image Diffusion Models
by: Wang, Zhongqi, et al.
Published: (2025)
FLUX-Text: A Simple and Advanced Diffusion Transformer Baseline for Scene Text Editing
by: Lan, Rui, et al.
Published: (2025)
Length Matters: Length-Aware Transformer for Temporal Sentence Grounding
by: Wang, Yifan, et al.
Published: (2025)
Relational Contrastive Learning and Masked Image Modeling for Scene Text Recognition
by: Lin, Tiancheng, et al.
Published: (2024)
Block-level Text Spotting with LLMs
by: Bannur, Ganesh, et al.
Published: (2024)
Task-Aware Dynamic Transformer for Efficient Arbitrary-Scale Image Super-Resolution
by: Xu, Tianyi, et al.
Published: (2024)
Thinking With Videos: Multimodal Tool-Augmented Reinforcement Learning for Long Video Reasoning
by: Zhang, Haoji, et al.
Published: (2025)
Dynamic Diffusion Transformer
by: Zhao, Wangbo, et al.
Published: (2024)
Contrastive Learning-Driven Traffic Sign Perception: Multi-Modal Fusion of Text and Vision
by: Lu, Qiang, et al.
Published: (2025)
PointMT: Efficient Point Cloud Analysis with Hybrid MLP-Transformer Architecture
by: Zheng, Qiang, et al.
Published: (2024)