:: Library Catalog

Bibliographic Details
Main Authors:	Kim, Seungkwon; Park, GyuTae; Kim, Sangyeon; Nam, Seung-Hun
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2503.02399

Similar Items

A Framework for Portrait Stylization with Skin-Tone Awareness and Nudity Identification
by: Kim, Seungkwon, et al.
Published: (2024)

VLMs Trace Without Tracking: Diagnosing Failures in Visual Path Following
by: Hong, Hyesoo, et al.
Published: (2026)

SAFIRE: Segment Any Forged Image Region
by: Kwon, Myung-Joon, et al.
Published: (2024)

GCAgent: Long-Video Understanding via Schematic and Narrative Episodic Memory
by: Yeo, Jeong Hun, et al.
Published: (2025)

Hierarchical Knowledge Graphs for Story Understanding in Visual Narratives
by: Chen, Yi-Chun
Published: (2025)

From EduVisBench to EduVisAgent: A Benchmark and Multi-Agent Framework for Reasoning-Driven Pedagogical Visualization
by: Ji, Haonian, et al.
Published: (2025)

VisDoT: Enhancing Visual Reasoning through Human-Like Interpretation Grounding and Decomposition of Thought
by: Lee, Eunsoo, et al.
Published: (2026)

FlashAdventure: A Benchmark for GUI Agents Solving Full Story Arcs in Diverse Adventure Games
by: Ahn, Jaewoo, et al.
Published: (2025)

SurgVisAgent: Multimodal Agentic Model for Versatile Surgical Visual Enhancement
by: Lei, Zeyu, et al.
Published: (2025)

VisBrowse-Bench: Benchmarking Visual-Native Search for Multimodal Browsing Agents
by: Zhang, Zhengbo, et al.
Published: (2026)

Preserving Old Memories in Vivid Detail: Human-Interactive Photo Restoration Framework
by: Back, Seung-Yeon, et al.
Published: (2024)

Topology-Preserving Polygon Augmentation for Segmentation in Structured Visual Domains
by: Laudari, Sudip, et al.
Published: (2026)

CoVis: A Collaborative Framework for Fine-grained Graphic Visual Understanding
by: Deng, Xiaoyu, et al.
Published: (2024)

Hand-object reconstruction via interaction-aware graph attention mechanism
by: Woo, Taeyun, et al.
Published: (2024)

VizECGNet: Visual ECG Image Network for Cardiovascular Diseases Classification with Multi-Modal Training and Knowledge Distillation
by: Nam, Ju-Hyeon, et al.
Published: (2024)

MedErr-CT: A Visual Question Answering Benchmark for Identifying and Correcting Errors in CT Reports
by: Kyung, Sunggu, et al.
Published: (2025)

Bi-MCQ: Reformulating Vision-Language Alignment for Negation Understanding
by: Kim, Tae Hun, et al.
Published: (2026)

Leveraging Textual Compositional Reasoning for Robust Change Captioning
by: Park, Kyu Ri, et al.
Published: (2025)

VisRef: Visual Refocusing while Thinking Improves Test-Time Scaling in Multi-Modal Large Reasoning Models
by: Ghosal, Soumya Suvra, et al.
Published: (2026)

Efficient Policy Adaptation with Contrastive Prompt Ensemble for Embodied Agents
by: Choi, Wonje, et al.
Published: (2024)

ViTA-PAR: Visual and Textual Attribute Alignment with Attribute Prompting for Pedestrian Attribute Recognition
by: Park, Minjeong, et al.
Published: (2025)

Diversity Over Frequency: Rethinking Tool Use in Visual Chain-of-Thought Agents
by: Kim, Dong-Hee, et al.
Published: (2026)

Scratching Visual Transformer's Back with Uniform Attention
by: Hyeon-Woo, Nam, et al.
Published: (2022)

Safety-Guided Flow (SGF): A Unified Framework for Negative Guidance in Safe Generation
by: Kim, Mingyu, et al.
Published: (2026)

LAN: Learning to Adapt Noise for Image Denoising
by: Kim, Changjin, et al.
Published: (2024)

Learning Context-Conditioned Predicate Semantics via Prototype Feedback
by: Jung, NamGyu, et al.
Published: (2026)

VisNec: Measuring and Leveraging Visual Necessity for Multimodal Instruction Tuning
by: Dong, Mingkang, et al.
Published: (2026)

VisJudge-Bench: Aesthetics and Quality Assessment of Visualizations
by: Xie, Yupeng, et al.
Published: (2025)

3DPhysVideo: Consistency-Guided Flow SDE for Video Generation via 3D Scene Reconstruction and Physical Simulation
by: Kim, Hwidong, et al.
Published: (2026)

VideoRFSplat: Direct Scene-Level Text-to-3D Gaussian Splatting Generation with Flexible Pose and Multi-View Joint Modeling
by: Go, Hyojun, et al.
Published: (2025)

ConVis: Contrastive Decoding with Hallucination Visualization for Mitigating Hallucinations in Multimodal Large Language Models
by: Park, Yeji, et al.
Published: (2024)

Real-Time Person Image Synthesis Using a Flow Matching Model
by: Jeong, Jiwoo, et al.
Published: (2025)

STORYANCHORS: Generating Consistent Multi-Scene Story Frames for Long-Form Narratives
by: Wang, Bo, et al.
Published: (2025)

Directing the Narrative: A Finetuning Method for Controlling Coherence and Style in Story Generation
by: Zhang, Jianzhang, et al.
Published: (2026)

Multi-view Pyramid Transformer: Look Coarser to See Broader
by: Kang, Gyeongjin, et al.
Published: (2025)

COTTA: Context-Aware Transfer Adaptation for Trajectory Prediction in Autonomous Driving
by: Park, Seohyoung, et al.
Published: (2026)

VLM's Eye Examination: Instruct and Inspect Visual Competency of Vision Language Models
by: Hyeon-Woo, Nam, et al.
Published: (2024)

CLAY: Conditional Visual Similarity Modulation in Vision-Language Embedding Space
by: Lim, Sohwi, et al.
Published: (2026)

TCP-SSM: Efficient Vision State Space Models with Token-Conditioned Poles
by: Shoouri, Sara, et al.
Published: (2026)

Multi-hypotheses Conditioned Point Cloud Diffusion for 3D Human Reconstruction from Occluded Images
by: Kim, Donghwan, et al.
Published: (2024)