:: Library Catalog

Bibliographic Details
Main Authors:	Uchida, Kengo; Shibuya, Takashi; Takida, Yuhta; Murata, Naoki; Tanke, Julian; Takahashi, Shusuke; Mitsufuji, Yuki
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2406.01867

Similar Items

BigVSAN: Enhancing GAN-based Neural Vocoders with Slicing Adversarial Network
by: Shibuya, Takashi, et al.
Published: (2023)

Dyadic Mamba: Long-term Dyadic Human Motion Synthesis
by: Tanke, Julian, et al.
Published: (2025)

Forging and Removing Latent-Noise Diffusion Watermarks Using a Single Image
by: Jain, Anubhav, et al.
Published: (2025)

Efficiency without Compromise: CLIP-aided Text-to-Image GANs with Increased Diversity
by: Kobayashi, Yuya, et al.
Published: (2025)

SAN: Inducing Metrizability of GAN with Discriminative Normalized Linear Layer
by: Takida, Yuhta, et al.
Published: (2023)

Distill, Forget, Repeat: A Framework for Continual Unlearning in Text-to-Image Diffusion Models
by: George, Naveen, et al.
Published: (2025)

Transformed Low-rank Adaptation via Tensor Decomposition and Its Applications to Text-to-image Models
by: Tao, Zerui, et al.
Published: (2025)

Improving Vector-Quantized Image Modeling with Latent Consistency-Matching Diffusion
by: Nguyen, Bac, et al.
Published: (2024)

HQ-VAE: Hierarchical Discrete Representation Learning with Variational Bayes
by: Takida, Yuhta, et al.
Published: (2023)

GUDA: Counterfactual Group-wise Training Data Attribution for Diffusion Models via Unlearning
by: Murata, Naoki, et al.
Published: (2026)

Zero- and Few-shot Sound Event Localization and Detection
by: Shimada, Kazuki, et al.
Published: (2023)

TraSCE: Trajectory Steering for Concept Erasure
by: Jain, Anubhav, et al.
Published: (2024)

Classifier-Free Guidance inside the Attraction Basin May Cause Memorization
by: Jain, Anubhav, et al.
Published: (2024)

SONA: Learning Conditional, Unconditional, and Mismatching-Aware Discriminator
by: Takida, Yuhta, et al.
Published: (2025)

Improved Object-Centric Diffusion Learning with Registers and Contrastive Alignment
by: Nguyen, Bac, et al.
Published: (2026)

Diffusion-based Signal Refiner for Speech Enhancement and Separation
by: Hirano, Masato, et al.
Published: (2023)

Weighted Point Set Embedding for Multimodal Contrastive Learning Toward Optimal Similarity Metric
by: Uesaka, Toshimitsu, et al.
Published: (2024)

G2D2: Gradient-Guided Discrete Diffusion for Inverse Problem Solving
by: Murata, Naoki, et al.
Published: (2024)

SAVGBench: Benchmarking Spatially Aligned Audio-Video Generation
by: Shimada, Kazuki, et al.
Published: (2024)

Noise Scheduling as Information-Guided Allocation in Diffusion Training
by: Raya, Gabriel, et al.
Published: (2026)

SoundCTM: Unifying Score-based and Consistency Models for Full-band Text-to-Sound Generation
by: Saito, Koichi, et al.
Published: (2024)

Denoising Multi-Beta VAE: Representation Learning for Disentanglement and Generation
by: Uppal, Anshuk, et al.
Published: (2025)

PaGoDA: Progressive Growing of a One-Step Generator from a Low-Resolution Diffusion Teacher
by: Kim, Dongjun, et al.
Published: (2024)

Distillation of Discrete Diffusion through Dimensional Correlations
by: Hayakawa, Satoshi, et al.
Published: (2024)

Demystifying MaskGIT Sampler and Beyond: Adaptive Order Selection in Masked Diffusion
by: Hayakawa, Satoshi, et al.
Published: (2025)

VCT: Training Consistency Models with Variational Noise Coupling
by: Silvestri, Gianluigi, et al.
Published: (2025)

Theoretical Refinement of CLIP by Utilizing Linear Structure of Optimal Similarity
by: Yoshida, Naoki, et al.
Published: (2025)

A Unified View of Score-Based and Drifting Models
by: Lai, Chieh-Hsin, et al.
Published: (2026)

Consistency Trajectory Models: Learning Probability Flow ODE Trajectory of Diffusion
by: Kim, Dongjun, et al.
Published: (2023)

TITAN-Guide: Taming Inference-Time AligNment for Guided Text-to-Video Diffusion Models
by: Simon, Christian, et al.
Published: (2025)

DiffRoll: Diffusion-based Generative Music Transcription with Unsupervised Pretraining Capability
by: Cheuk, Kin Wai, et al.
Published: (2022)

Jump Your Steps: Optimizing Sampling Schedule of Discrete Diffusion Models
by: Park, Yong-Hyun, et al.
Published: (2024)

MMAudioSep: Taming Video-to-Audio Generative Model Towards Video/Text-Queried Sound Separation
by: Takahashi, Akira, et al.
Published: (2025)

Schrödinger Bridge Consistency Trajectory Models for Speech Enhancement
by: Nishigori, Shuichiro, et al.
Published: (2025)

Coherent Audio-Visual Editing via Conditional Audio Generation Following Video Edits
by: Ishii, Masato, et al.
Published: (2025)

Visual Echoes: A Simple Unified Transformer for Audio-Visual Generation
by: Yang, Shiqi, et al.
Published: (2024)

Large-Scale Training Data Attribution for Music Generative Models via Unlearning
by: Choi, Woosung, et al.
Published: (2025)

Diffusion-Based Speech Enhancement with Joint Generative and Predictive Decoders
by: Shi, Hao, et al.
Published: (2023)

Improving Classifier-Free Guidance in Masked Diffusion: Low-Dim Theoretical Insights with High-Dim Impact
by: Rojas, Kevin, et al.
Published: (2025)

A Simple but Strong Baseline for Sounding Video Generation: Effective Adaptation of Audio and Video Diffusion Models for Joint Generation
by: Ishii, Masato, et al.
Published: (2024)