:: Library Catalog

Bibliographic Details
Main Authors:	Park, Jun-Hyung; Park, Hyuntae; Kang, Youjin; Jeon, Eojin; Lee, SangKeun
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2408.08021

Similar Items

Zero-shot Commonsense Reasoning over Machine Imagination
by: Park, Hyuntae, et al.
Published: (2024)

Enhancing Zero-shot Commonsense Reasoning by Integrating Visual Knowledge via Machine Imagination
by: Park, Hyuntae, et al.
Published: (2026)

Bridging the Gap Between Molecule and Textual Descriptions via Substructure-aware Alignment
by: Park, Hyuntae, et al.
Published: (2025)

Handling Korean Out-of-Vocabulary Words with Phoneme Representation Learning
by: Kim, Nayeon, et al.
Published: (2025)

VizECGNet: Visual ECG Image Network for Cardiovascular Diseases Classification with Multi-Modal Training and Knowledge Distillation
by: Nam, Ju-Hyeon, et al.
Published: (2024)

Semantic Diversity-aware Prototype-based Learning for Unbiased Scene Graph Generation
by: Jeon, Jaehyeong, et al.
Published: (2024)

CAT: Contrastive Adapter Training for Personalized Image Generation
by: Park, Jae Wan, et al.
Published: (2024)

DIVE: Taming DINO for Subject-Driven Video Editing
by: Huang, Yi, et al.
Published: (2024)

VISTA: Visual Integrated System for Tailored Automation in Math Problem Generation Using LLM
by: Lee, Jeongwoo, et al.
Published: (2024)

A$^2$LC: Active and Automated Label Correction for Semantic Segmentation
by: Jeon, Youjin, et al.
Published: (2025)

Gather-Scatter Mamba: Accelerating Propagation with Efficient State Space Model
by: Ko, Hyun-kyu, et al.
Published: (2025)

Foundation Model-Driven Framework for Human-Object Interaction Prediction with Segmentation Mask Integration
by: Park, Juhan, et al.
Published: (2025)

Watermarking for Factuality: Guiding Vision-Language Models Toward Truth via Tri-layer Contrastive Decoding
by: Back, Kyungryul, et al.
Published: (2025)

ESC: Erasing Space Concept for Knowledge Deletion
by: Lee, Tae-Young, et al.
Published: (2025)

VCD: A Dataset for Visual Commonsense Discovery in Images
by: Shen, Xiangqing, et al.
Published: (2024)

Learning to Correction: Explainable Feedback Generation for Visual Commonsense Reasoning Distractor
by: Chen, Jiali, et al.
Published: (2024)

MoECLIP: Patch-Specialized Experts for Zero-shot Anomaly Detection
by: Park, Jun Yeong, et al.
Published: (2026)

OLKAVS: An Open Large-Scale Korean Audio-Visual Speech Dataset
by: Park, Jeongkyun, et al.
Published: (2023)

Versatile Incremental Learning: Towards Class and Domain-Agnostic Incremental Learning
by: Park, Min-Yeong, et al.
Published: (2024)

ViCor: Bridging Visual Understanding and Commonsense Reasoning with Large Language Models
by: Zhou, Kaiwen, et al.
Published: (2023)

A Study of Commonsense Reasoning over Visual Object Properties
by: Kolari, Abhishek, et al.
Published: (2025)

PointSplit: Towards On-device 3D Object Detection with Heterogeneous Low-power Accelerators
by: Park, Keondo, et al.
Published: (2025)

Incorporating Domain Knowledge into Materials Tokenization
by: Oh, Yerim, et al.
Published: (2025)

Rebalancing Reference Frame Dominance to Improve Motion in Image-to-Video Models
by: Jeon, Wooseok, et al.
Published: (2026)

Unlocking Robust Semantic Segmentation Performance via Label-only Elastic Deformations against Implicit Label Noise
by: Kim, Yechan, et al.
Published: (2025)

Commonsense-T2I Challenge: Can Text-to-Image Generation Models Understand Commonsense?
by: Fu, Xingyu, et al.
Published: (2024)

Activating Visual Context and Commonsense Reasoning through Masked Prediction in VLMs
by: Yu, Jiaao, et al.
Published: (2025)

SCO-VIST: Social Interaction Commonsense Knowledge-based Visual Storytelling
by: Wang, Eileen, et al.
Published: (2024)

See and Fix the Flaws: Enabling VLMs and Diffusion Models to Comprehend Visual Artifacts via Agentic Data Synthesis
by: Park, Jaehyun, et al.
Published: (2026)

Inference-Time Diffusion Model Distillation
by: Park, Geon Yeong, et al.
Published: (2024)

Open-Set Domain Adaptation for Semantic Segmentation
by: Choe, Seun-An, et al.
Published: (2024)

Investigating Long-term Training for Remote Sensing Object Detection
by: Park, JongHyun, et al.
Published: (2024)

Towards Better Visualizing the Decision Basis of Networks via Unfold and Conquer Attribution Guidance
by: Hong, Jung-Ho, et al.
Published: (2023)

ViTA-PAR: Visual and Textual Attribute Alignment with Attribute Prompting for Pedestrian Attribute Recognition
by: Park, Minjeong, et al.
Published: (2025)

Unexplored Faces of Robustness and Out-of-Distribution: Covariate Shifts in Environment and Sensor Domains
by: Baek, Eunsu, et al.
Published: (2024)

MedErr-CT: A Visual Question Answering Benchmark for Identifying and Correcting Errors in CT Reports
by: Kyung, Sunggu, et al.
Published: (2025)

MELT: Materials-aware Continued Pre-training for Language Model Adaptation to Materials Science
by: Kim, Junho, et al.
Published: (2024)

FLEUR: An Explainable Reference-Free Evaluation Metric for Image Captioning Using a Large Multimodal Model
by: Lee, Yebin, et al.
Published: (2024)

No Thing, Nothing: Highlighting Safety-Critical Classes for Robust LiDAR Semantic Segmentation in Adverse Weather
by: Park, Junsung, et al.
Published: (2025)

AttentionHand: Text-driven Controllable Hand Image Generation for 3D Hand Reconstruction in the Wild
by: Park, Junho, et al.
Published: (2024)