| Main Authors: | Kim, Seungkwon, Park, GyuTae, Kim, Sangyeon, Nam, Seung-Hun |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2503.02399 |
Similar Items
A Framework for Portrait Stylization with Skin-Tone Awareness and Nudity Identification
by: Kim, Seungkwon, et al.
Published: (2024)
VLMs Trace Without Tracking: Diagnosing Failures in Visual Path Following
by: Hong, Hyesoo, et al.
Published: (2026)
SAFIRE: Segment Any Forged Image Region
by: Kwon, Myung-Joon, et al.
Published: (2024)
GCAgent: Long-Video Understanding via Schematic and Narrative Episodic Memory
by: Yeo, Jeong Hun, et al.
Published: (2025)
Hierarchical Knowledge Graphs for Story Understanding in Visual Narratives
by: Chen, Yi-Chun
Published: (2025)
From EduVisBench to EduVisAgent: A Benchmark and Multi-Agent Framework for Reasoning-Driven Pedagogical Visualization
by: Ji, Haonian, et al.
Published: (2025)
VisDoT : Enhancing Visual Reasoning through Human-Like Interpretation Grounding and Decomposition of Thought
by: Lee, Eunsoo, et al.
Published: (2026)
FlashAdventure: A Benchmark for GUI Agents Solving Full Story Arcs in Diverse Adventure Games
by: Ahn, Jaewoo, et al.
Published: (2025)
SurgVisAgent: Multimodal Agentic Model for Versatile Surgical Visual Enhancement
by: Lei, Zeyu, et al.
Published: (2025)
VisBrowse-Bench: Benchmarking Visual-Native Search for Multimodal Browsing Agents
by: Zhang, Zhengbo, et al.
Published: (2026)
Preserving Old Memories in Vivid Detail: Human-Interactive Photo Restoration Framework
by: Back, Seung-Yeon, et al.
Published: (2024)
Topology-Preserving Polygon Augmentation for Segmentation in Structured Visual Domains
by: Laudari, Sudip, et al.
Published: (2026)
CoVis: A Collaborative Framework for Fine-grained Graphic Visual Understanding
by: Deng, Xiaoyu, et al.
Published: (2024)
Hand-object reconstruction via interaction-aware graph attention mechanism
by: Woo, Taeyun, et al.
Published: (2024)
VizECGNet: Visual ECG Image Network for Cardiovascular Diseases Classification with Multi-Modal Training and Knowledge Distillation
by: Nam, Ju-Hyeon, et al.
Published: (2024)
MedErr-CT: A Visual Question Answering Benchmark for Identifying and Correcting Errors in CT Reports
by: Kyung, Sunggu, et al.
Published: (2025)
Bi-MCQ: Reformulating Vision-Language Alignment for Negation Understanding
by: Kim, Tae Hun, et al.
Published: (2026)
Leveraging Textual Compositional Reasoning for Robust Change Captioning
by: Park, Kyu Ri, et al.
Published: (2025)
VisRef: Visual Refocusing while Thinking Improves Test-Time Scaling in Multi-Modal Large Reasoning Models
by: Ghosal, Soumya Suvra, et al.
Published: (2026)
Efficient Policy Adaptation with Contrastive Prompt Ensemble for Embodied Agents
by: Choi, Wonje, et al.
Published: (2024)
ViTA-PAR: Visual and Textual Attribute Alignment with Attribute Prompting for Pedestrian Attribute Recognition
by: Park, Minjeong, et al.
Published: (2025)
Diversity Over Frequency: Rethinking Tool Use in Visual Chain-of-Thought Agents
by: Kim, Dong-Hee, et al.
Published: (2026)
Scratching Visual Transformer's Back with Uniform Attention
by: Hyeon-Woo, Nam, et al.
Published: (2022)
Safety-Guided Flow (SGF): A Unified Framework for Negative Guidance in Safe Generation
by: Kim, Mingyu, et al.
Published: (2026)
LAN: Learning to Adapt Noise for Image Denoising
by: Kim, Changjin, et al.
Published: (2024)
Learning Context-Conditioned Predicate Semantics via Prototype Feedback
by: Jung, NamGyu, et al.
Published: (2026)
VisNec: Measuring and Leveraging Visual Necessity for Multimodal Instruction Tuning
by: Dong, Mingkang, et al.
Published: (2026)
VisJudge-Bench: Aesthetics and Quality Assessment of Visualizations
by: Xie, Yupeng, et al.
Published: (2025)
3DPhysVideo: Consistency-Guided Flow SDE for Video Generation via 3D Scene Reconstruction and Physical Simulation
by: Kim, Hwidong, et al.
Published: (2026)
VideoRFSplat: Direct Scene-Level Text-to-3D Gaussian Splatting Generation with Flexible Pose and Multi-View Joint Modeling
by: Go, Hyojun, et al.
Published: (2025)
ConVis: Contrastive Decoding with Hallucination Visualization for Mitigating Hallucinations in Multimodal Large Language Models
by: Park, Yeji, et al.
Published: (2024)
Real-Time Person Image Synthesis Using a Flow Matching Model
by: Jeong, Jiwoo, et al.
Published: (2025)
STORYANCHORS: Generating Consistent Multi-Scene Story Frames for Long-Form Narratives
by: Wang, Bo, et al.
Published: (2025)
Directing the Narrative: A Finetuning Method for Controlling Coherence and Style in Story Generation
by: Zhang, Jianzhang, et al.
Published: (2026)
Multi-view Pyramid Transformer: Look Coarser to See Broader
by: Kang, Gyeongjin, et al.
Published: (2025)
COTTA: Context-Aware Transfer Adaptation for Trajectory Prediction in Autonomous Driving
by: Park, Seohyoung, et al.
Published: (2026)
VLM's Eye Examination: Instruct and Inspect Visual Competency of Vision Language Models
by: Hyeon-Woo, Nam, et al.
Published: (2024)
CLAY: Conditional Visual Similarity Modulation in Vision-Language Embedding Space
by: Lim, Sohwi, et al.
Published: (2026)
TCP-SSM: Efficient Vision State Space Models with Token-Conditioned Poles
by: Shoouri, Sara, et al.
Published: (2026)
Multi-hypotheses Conditioned Point Cloud Diffusion for 3D Human Reconstruction from Occluded Images
by: Kim, Donghwan, et al.
Published: (2024)