| Main Authors: | Long, Rujiao; Xing, Hangdi; Yang, Zhibo; Zheng, Qi; Yu, Zhi; Yao, Cong; Huang, Fei |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2401.01522 |
Similar Items
HIP: Hierarchical Point Modeling and Pre-training for Visual Information Extraction
by: Long, Rujiao, et al.
Published: (2024)
WebRPG: Automatic Web Rendering Parameters Generation for Visual Presentation
by: Shao, Zirui, et al.
Published: (2024)
OmniParser: A Unified Framework for Text Spotting, Key Information Extraction and Table Recognition
by: Wan, Jianqiang, et al.
Published: (2024)
A Simple yet Effective Layout Token in Large Language Models for Document Understanding
by: Zhu, Zhaoqing, et al.
Published: (2025)
Robust Fine-tuning for Pre-trained 3D Point Cloud Models
by: Zhang, Zhibo, et al.
Published: (2024)
LORE: Lagrangian-Optimized Robust Embeddings for Visual Encoders
by: Khodabandeh, Borna, et al.
Published: (2025)
Revisiting Continual Semantic Segmentation with Pre-trained Vision Models
by: Zhang, Duzhen, et al.
Published: (2025)
Platypus: A Generalized Specialist Model for Reading Text in Various Forms
by: Wang, Peng, et al.
Published: (2024)
SepFormer: Coarse-to-fine Separator Regression Network for Table Structure Recognition
by: Nguyen, Nam Quan, et al.
Published: (2025)
HierCode: A Lightweight Hierarchical Codebook for Zero-shot Chinese Text Recognition
by: Zhang, Yuyi, et al.
Published: (2024)
LORE: Latent Optimization for Precise Semantic Control in Rectified Flow-based Image Editing
by: Ouyang, Liangyang, et al.
Published: (2025)
Fake It Right: Injecting Anatomical Logic into Synthetic Supervised Pre-training for Medical Segmentation
by: Tang, Jiaqi, et al.
Published: (2026)
Let ViT Speak: Generative Language-Image Pre-training
by: Fang, Yan, et al.
Published: (2026)
Long-Tailed Recognition on Binary Networks by Calibrating A Pre-trained Model
by: Kim, Jihun, et al.
Published: (2024)
Micro-Expression Recognition by Motion Feature Extraction based on Pre-training
by: Li, Ruolin, et al.
Published: (2024)
MAA: Meticulous Adversarial Attack against Vision-Language Pre-trained Models
by: Zhang, Peng-Fei, et al.
Published: (2025)
Deep Radar Inverse Sensor Models for Dynamic Occupancy Grid Maps
by: Wei, Zihang, et al.
Published: (2023)
Self-Supervised Pre-Training for Table Structure Recognition Transformer
by: Peng, ShengYun, et al.
Published: (2024)
LayoutLLM: Layout Instruction Tuning with Large Language Models for Document Understanding
by: Luo, Chuwei, et al.
Published: (2024)
Visual Text Generation in the Wild
by: Zhu, Yuanzhi, et al.
Published: (2024)
CoF: Coarse to Fine-Grained Image Understanding for Multi-modal Large Language Models
by: Wang, Yeyuan, et al.
Published: (2024)
Universal Adversarial Perturbations for Vision-Language Pre-trained Models
by: Zhang, Peng-Fei, et al.
Published: (2024)
Stylized Structural Patterns for Improved Neural Network Pre-training
by: Salehi, Farnood, et al.
Published: (2025)
Pre-training for Action Recognition with Automatically Generated Fractal Datasets
by: Svyezhentsev, Davyd, et al.
Published: (2024)
IPAD: Iterative, Parallel, and Diffusion-based Network for Scene Text Recognition
by: Yang, Xiaomeng, et al.
Published: (2023)
A Foundation Model for DAS Signal Recognition and Visual Prompt Tuning of the Pre-trained Model for Downstream Tasks
by: Gui, Kun, et al.
Published: (2025)
Multi-modal Multi-task Pre-training for Improved Point Cloud Understanding
by: Liu, Liwen, et al.
Published: (2025)
Enhancing Pre-trained Representation Classifiability can Boost its Interpretability
by: Shen, Shufan, et al.
Published: (2025)
Sparse Reasoning is Enough: Biological-Inspired Framework for Video Anomaly Detection with Large Pre-trained Models
by: Huang, He, et al.
Published: (2025)
ASPO: Adaptive Sentence-Level Preference Optimization for Fine-Grained Multimodal Reasoning
by: Wang, Yeyuan, et al.
Published: (2025)
Accelerating Pre-training of Multimodal LLMs via Chain-of-Sight
by: Huang, Ziyuan, et al.
Published: (2024)
Logos as a Well-Tempered Pre-train for Sign Language Recognition
by: Ovodov, Ilya, et al.
Published: (2025)
AdFair-CLIP: Adversarial Fair Contrastive Language-Image Pre-training for Chest X-rays
by: Yi, Chenlang, et al.
Published: (2025)
GLID: Pre-training a Generalist Encoder-Decoder Vision Model
by: Liu, Jihao, et al.
Published: (2024)
Downstream Transfer Attack: Adversarial Attacks on Downstream Models with Pre-trained Vision Transformers
by: Zheng, Weijie, et al.
Published: (2024)
ProcTag: Process Tagging for Assessing the Efficacy of Document Instruction Data
by: Shen, Yufan, et al.
Published: (2024)
SR-Stereo & DAPE: Stepwise Regression and Pre-trained Edges for Practical Stereo Matching
by: Xiao, Weiqing, et al.
Published: (2024)
Towards Seamless Adaptation of Pre-trained Models for Visual Place Recognition
by: Lu, Feng, et al.
Published: (2024)
Self-Supervised Pre-training with Symmetric Superimposition Modeling for Scene Text Recognition
by: Gao, Zuan, et al.
Published: (2024)
BRAVEn: Improving Self-Supervised Pre-training for Visual and Auditory Speech Recognition
by: Haliassos, Alexandros, et al.
Published: (2024)