| Main Authors: | Park, Jun-Hyung, Park, Hyuntae, Kang, Youjin, Jeon, Eojin, Lee, SangKeun |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2408.08021 |
Similar Items
Zero-shot Commonsense Reasoning over Machine Imagination
by: Park, Hyuntae, et al.
Published: (2024)
Enhancing Zero-shot Commonsense Reasoning by Integrating Visual Knowledge via Machine Imagination
by: Park, Hyuntae, et al.
Published: (2026)
Bridging the Gap Between Molecule and Textual Descriptions via Substructure-aware Alignment
by: Park, Hyuntae, et al.
Published: (2025)
Handling Korean Out-of-Vocabulary Words with Phoneme Representation Learning
by: Kim, Nayeon, et al.
Published: (2025)
VizECGNet: Visual ECG Image Network for Cardiovascular Diseases Classification with Multi-Modal Training and Knowledge Distillation
by: Nam, Ju-Hyeon, et al.
Published: (2024)
Semantic Diversity-aware Prototype-based Learning for Unbiased Scene Graph Generation
by: Jeon, Jaehyeong, et al.
Published: (2024)
CAT: Contrastive Adapter Training for Personalized Image Generation
by: Park, Jae Wan, et al.
Published: (2024)
DIVE: Taming DINO for Subject-Driven Video Editing
by: Huang, Yi, et al.
Published: (2024)
VISTA: Visual Integrated System for Tailored Automation in Math Problem Generation Using LLM
by: Lee, Jeongwoo, et al.
Published: (2024)
A$^2$LC: Active and Automated Label Correction for Semantic Segmentation
by: Jeon, Youjin, et al.
Published: (2025)
Gather-Scatter Mamba: Accelerating Propagation with Efficient State Space Model
by: Ko, Hyun-kyu, et al.
Published: (2025)
Foundation Model-Driven Framework for Human-Object Interaction Prediction with Segmentation Mask Integration
by: Park, Juhan, et al.
Published: (2025)
Watermarking for Factuality: Guiding Vision-Language Models Toward Truth via Tri-layer Contrastive Decoding
by: Back, Kyungryul, et al.
Published: (2025)
ESC: Erasing Space Concept for Knowledge Deletion
by: Lee, Tae-Young, et al.
Published: (2025)
VCD: A Dataset for Visual Commonsense Discovery in Images
by: Shen, Xiangqing, et al.
Published: (2024)
Learning to Correction: Explainable Feedback Generation for Visual Commonsense Reasoning Distractor
by: Chen, Jiali, et al.
Published: (2024)
MoECLIP: Patch-Specialized Experts for Zero-shot Anomaly Detection
by: Park, Jun Yeong, et al.
Published: (2026)
OLKAVS: An Open Large-Scale Korean Audio-Visual Speech Dataset
by: Park, Jeongkyun, et al.
Published: (2023)
Versatile Incremental Learning: Towards Class and Domain-Agnostic Incremental Learning
by: Park, Min-Yeong, et al.
Published: (2024)
ViCor: Bridging Visual Understanding and Commonsense Reasoning with Large Language Models
by: Zhou, Kaiwen, et al.
Published: (2023)
A Study of Commonsense Reasoning over Visual Object Properties
by: Kolari, Abhishek, et al.
Published: (2025)
PointSplit: Towards On-device 3D Object Detection with Heterogeneous Low-power Accelerators
by: Park, Keondo, et al.
Published: (2025)
Incorporating Domain Knowledge into Materials Tokenization
by: Oh, Yerim, et al.
Published: (2025)
Rebalancing Reference Frame Dominance to Improve Motion in Image-to-Video Models
by: Jeon, Wooseok, et al.
Published: (2026)
Unlocking Robust Semantic Segmentation Performance via Label-only Elastic Deformations against Implicit Label Noise
by: Kim, Yechan, et al.
Published: (2025)
Commonsense-T2I Challenge: Can Text-to-Image Generation Models Understand Commonsense?
by: Fu, Xingyu, et al.
Published: (2024)
Activating Visual Context and Commonsense Reasoning through Masked Prediction in VLMs
by: Yu, Jiaao, et al.
Published: (2025)
SCO-VIST: Social Interaction Commonsense Knowledge-based Visual Storytelling
by: Wang, Eileen, et al.
Published: (2024)
See and Fix the Flaws: Enabling VLMs and Diffusion Models to Comprehend Visual Artifacts via Agentic Data Synthesis
by: Park, Jaehyun, et al.
Published: (2026)
Inference-Time Diffusion Model Distillation
by: Park, Geon Yeong, et al.
Published: (2024)
Open-Set Domain Adaptation for Semantic Segmentation
by: Choe, Seun-An, et al.
Published: (2024)
Investigating Long-term Training for Remote Sensing Object Detection
by: Park, JongHyun, et al.
Published: (2024)
Towards Better Visualizing the Decision Basis of Networks via Unfold and Conquer Attribution Guidance
by: Hong, Jung-Ho, et al.
Published: (2023)
ViTA-PAR: Visual and Textual Attribute Alignment with Attribute Prompting for Pedestrian Attribute Recognition
by: Park, Minjeong, et al.
Published: (2025)
Unexplored Faces of Robustness and Out-of-Distribution: Covariate Shifts in Environment and Sensor Domains
by: Baek, Eunsu, et al.
Published: (2024)
MedErr-CT: A Visual Question Answering Benchmark for Identifying and Correcting Errors in CT Reports
by: Kyung, Sunggu, et al.
Published: (2025)
MELT: Materials-aware Continued Pre-training for Language Model Adaptation to Materials Science
by: Kim, Junho, et al.
Published: (2024)
FLEUR: An Explainable Reference-Free Evaluation Metric for Image Captioning Using a Large Multimodal Model
by: Lee, Yebin, et al.
Published: (2024)
No Thing, Nothing: Highlighting Safety-Critical Classes for Robust LiDAR Semantic Segmentation in Adverse Weather
by: Park, Junsung, et al.
Published: (2025)
AttentionHand: Text-driven Controllable Hand Image Generation for 3D Hand Reconstruction in the Wild
by: Park, Junho, et al.
Published: (2024)