| Main Author: | Jung, Hyun-Ki |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2602.19503 |
Similar Items
YOLO-Drone: An Efficient Object Detection Approach Using the GhostHead Network for Drone Images
by: Jung, Hyun-Ki
Published: (2025)
Controllable and Efficient Multi-Class Pathology Nuclei Data Augmentation using Text-Conditioned Diffusion Models
by: Oh, Hyun-Jic, et al.
Published: (2024)
IIR-VLM: In-Context Instance-level Recognition for Large Vision-Language Models
by: Shi, Liang, et al.
Published: (2026)
VIRES: Video Instance Repainting via Sketch and Text Guided Generation
by: Weng, Shuchen, et al.
Published: (2024)
Text Embedding Knows How to Quantize Text-Guided Diffusion Models
by: Lee, Hongjae, et al.
Published: (2025)
VisionTrap: Vision-Augmented Trajectory Prediction Guided by Textual Descriptions
by: Moon, Seokha, et al.
Published: (2024)
Instruction-Guided Scene Text Recognition
by: Du, Yongkun, et al.
Published: (2024)
IAM: Enhancing RGB-D Instance Segmentation with New Benchmarks
by: Jung, Aecheon, et al.
Published: (2025)
ReaMIL: Reasoning- and Evidence-Aware Multiple Instance Learning for Whole-Slide Histopathology
by: Jung, Hyun Do, et al.
Published: (2026)
Retrieval-Enhanced Contrastive Vision-Text Models
by: Iscen, Ahmet, et al.
Published: (2023)
Text-Guided Multi-Instance Learning for Scoliosis Screening via Gait Video Analysis
by: Li, Haiqing, et al.
Published: (2025)
GALAR-TemporalNet v2: Anatomy-Guided Dual-Branch Temporal Classification with Bidirectional Mamba and Dual-Graph GCN for Video Capsule Endoscopy -- after competition results
by: Won, Jiye, et al.
Published: (2026)
HTR-VT: Handwritten Text Recognition with Vision Transformer
by: Li, Yuting, et al.
Published: (2024)
Instance Brownian Bridge as Texts for Open-vocabulary Video Instance Segmentation
by: Cheng, Zesen, et al.
Published: (2024)
SIT-FER: Integration of Semantic-, Instance-, Text-level Information for Semi-supervised Facial Expression Recognition
by: Ding, Sixian, et al.
Published: (2025)
Facial Emotion Learning with Text-Guided Multiview Fusion via Vision-Language Model for 3D/4D Facial Expression Recognition
by: Behzad, Muzammil
Published: (2025)
Continual Multiple Instance Learning with Enhanced Localization for Histopathological Whole Slide Image Analysis
by: Lee, Byung Hyun, et al.
Published: (2025)
HotSpotter - Patterned Species Instance Recognition
by: Crall, Jonathan P., et al.
Published: (2025)
Enhancing Table Recognition with Vision LLMs: A Benchmark and Neighbor-Guided Toolchain Reasoner
by: Zhou, Yitong, et al.
Published: (2024)
UNIT: Unifying Image and Text Recognition in One Vision Encoder
by: Zhu, Yi, et al.
Published: (2024)
Bridging Text and Vision: A Multi-View Text-Vision Registration Approach for Cross-Modal Place Recognition
by: Shang, Tianyi, et al.
Published: (2025)
Tag2Text: Guiding Vision-Language Model via Image Tagging
by: Huang, Xinyu, et al.
Published: (2023)
Integrated Image-Text Based on Semi-supervised Learning for Small Sample Instance Segmentation
by: Chi, Ruting, et al.
Published: (2024)
Read or Ignore? A Unified Benchmark for Typographic-Attack Robustness and Text Recognition in Vision-Language Models
by: Waseda, Futa, et al.
Published: (2025)
Co-synthesis of Histopathology Nuclei Image-Label Pairs using a Context-Conditioned Joint Diffusion Model
by: Min, Seonghui, et al.
Published: (2024)
P3T: Prototypical Point-level Prompt Tuning with Enhanced Generalization for 3D Vision-Language Models
by: Jung, Geunyoung, et al.
Published: (2026)
Class-Aware Mask-Guided Feature Refinement for Scene Text Recognition
by: Yang, Mingkun, et al.
Published: (2024)
SVIPTR: Fast and Efficient Scene Text Recognition with Vision Permutable Extractor
by: Cheng, Xianfu, et al.
Published: (2024)
Quality Text, Robust Vision: The Role of Language in Enhancing Visual Robustness of Vision-Language Models
by: Waseda, Futa, et al.
Published: (2025)
Depth-Guided Semi-Supervised Instance Segmentation
by: Chen, Xin, et al.
Published: (2024)
CLIP4STR: A Simple Baseline for Scene Text Recognition with Pre-trained Vision-Language Model
by: Zhao, Shuai, et al.
Published: (2023)
3DVLA: Enhancing Vision-Language-Action Models via 3D Spatial and Instance Understanding
by: Xia, Zhongyu, et al.
Published: (2026)
RaDL: Relation-aware Disentangled Learning for Multi-Instance Text-to-Image Generation
by: Park, Geon, et al.
Published: (2025)
Efficient and Accurate Scene Text Recognition with Cascaded-Transformers
by: Ozkan, Savas, et al.
Published: (2025)
JointDiT: Enhancing RGB-Depth Joint Modeling with Diffusion Transformers
by: Byung-Ki, Kwon, et al.
Published: (2025)
Unified Multi-Foundation-Model Slide Representation for Pan-Cancer Recognition and Text-Guided Tumor Localization
by: Wang, Tianyang, et al.
Published: (2026)
UniTabNet: Bridging Vision and Language Models for Enhanced Table Structure Recognition
by: Zhang, Zhenrong, et al.
Published: (2024)
GMT: Guided Mask Transformer for Leaf Instance Segmentation
by: Chen, Feng, et al.
Published: (2024)
VISAGE: Video Instance Segmentation with Appearance-Guided Enhancement
by: Kim, Hanjung, et al.
Published: (2023)
Text-Enhanced Zero-Shot Action Recognition: A training-free approach
by: Bosetti, Massimo, et al.
Published: (2024)