| Main Authors: | Liu, Xiaohao, Xia, Xiaobo, Huang, Zhuo, Ng, See-Kiong, Chua, Tat-Seng |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2412.18277 |
Similar Items
Principled Multimodal Representation Learning
by: Liu, Xiaohao, et al.
Published: (2025)
Calibrated Multimodal Representation Learning with Missing Modalities
by: Liu, Xiaohao, et al.
Published: (2025)
Continual Multimodal Contrastive Learning
by: Liu, Xiaohao, et al.
Published: (2025)
Image Can Bring Your Memory Back: A Novel Multi-Modal Guided Attack against Image Generation Model Unlearning
by: Liu, Renyang, et al.
Published: (2025)
FreeAct: Freeing Activations for LLM Quantization
by: Liu, Xiaohao, et al.
Published: (2026)
Extending Visual Dynamics for Video-to-Music Generation
by: Liu, Xiaohao, et al.
Published: (2025)
Omnimodal Dataset Distillation via High-order Proxy Alignment
by: Gao, Yuxuan, et al.
Published: (2026)
Data-Free Federated Class Incremental Learning with Diffusion-Based Generative Memory
by: Wang, Naibo, et al.
Published: (2024)
One-Shot Sequential Federated Learning for Non-IID Data by Enhancing Local Model Diversity
by: Wang, Naibo, et al.
Published: (2024)
TTOM: Test-Time Optimization and Memorization for Compositional Video Generation
by: Qu, Leigang, et al.
Published: (2025)
SafeRedir: Prompt Embedding Redirection for Robust Unlearning in Image Generation Models
by: Liu, Renyang, et al.
Published: (2026)
SILMM: Self-Improving Large Multimodal Models for Compositional Text-to-Image Generation
by: Qu, Leigang, et al.
Published: (2024)
Active Zero: Self-Evolving Vision-Language Models through Active Environment Exploration
by: He, Jinghan, et al.
Published: (2026)
Lingua-SafetyBench: A Benchmark for Safety Evaluation of Multilingual Vision-Language Models
by: Shi, Enyi, et al.
Published: (2026)
TIGeR: Unifying Text-to-Image Generation and Retrieval with Large Multimodal Models
by: Qu, Leigang, et al.
Published: (2024)
Don't Just Say "I don't know"! Self-aligning Large Language Models for Responding to Unknown Questions with Explanations
by: Deng, Yang, et al.
Published: (2024)
ITS3D: Inference-Time Scaling for Text-Guided 3D Diffusion Models
by: Zhou, Zhenglin, et al.
Published: (2025)
Universal Scene Graph Generation
by: Wu, Shengqiong, et al.
Published: (2025)
Cocktail: Mixing Multi-Modality Controls for Text-Conditional Image Generation
by: Hu, Minghui, et al.
Published: (2023)
STEP: Enhancing Video-LLMs' Compositional Reasoning by Spatio-Temporal Graph-guided Self-Training
by: Qiu, Haiyi, et al.
Published: (2024)
Towards Unified Benchmark and Models for Multi-Modal Perceptual Metrics
by: Ghazanfari, Sara, et al.
Published: (2024)
Towards Natural Language-Guided Drones: GeoText-1652 Benchmark with Spatial Relation Matching
by: Chu, Meng, et al.
Published: (2023)
Turing Patterns for Multimedia: Reaction-Diffusion Multi-Modal Fusion for Language-Guided Video Moment Retrieval
by: Fang, Xiang, et al.
Published: (2026)
AnchorFlow: Training-Free 3D Editing via Latent Anchor-Aligned Flows
by: Zhou, Zhenglin, et al.
Published: (2025)
AUHead: Realistic Emotional Talking Head Generation via Action Units Control
by: Lyu, Jiayi, et al.
Published: (2026)
Compose Your Aesthetics: Empowering Text-to-Image Models with the Principles of Art
by: Jin, Zhe, et al.
Published: (2025)
FutureOmni: Evaluating Future Forecasting from Omni-Modal Context for Multimodal LLMs
by: Chen, Qian, et al.
Published: (2026)
SEIS: Subspace-based Equivariance and Invariance Scores for Neural Representations
by: Lin, Huahua, et al.
Published: (2026)
Disentangling Masked Autoencoders for Unsupervised Domain Generalization
by: Zhang, An, et al.
Published: (2024)
NExT-OMNI: Towards Any-to-Any Omnimodal Foundation Models with Discrete Flow Matching
by: Luo, Run, et al.
Published: (2025)
DPL: Decoupled Prototype Learning for Enhancing Robustness of Vision-Language Transformers to Missing Modalities
by: Lu, Jueqing, et al.
Published: (2025)
Feature-based Graph Attention Networks Improve Online Continual Learning
by: Sim, Adjovi, et al.
Published: (2025)
UtilGen: Utility-Centric Generative Data Augmentation with Dual-Level Task Adaptation
by: Guo, Jiyu, et al.
Published: (2025)
LogiCode: an LLM-Driven Framework for Logical Anomaly Detection
by: Zhang, Yiheng, et al.
Published: (2024)
Hierarchical Multi-Graphs Learning for Robust Group Re-Identification
by: Liu, Ruiqi, et al.
Published: (2024)
MMP: Towards Robust Multi-Modal Learning with Masked Modality Projection
by: Nezakati, Niki, et al.
Published: (2024)
Zero-1-to-A: Zero-Shot One Image to Animatable Head Avatars Using Video Diffusion
by: Zhou, Zhenglin, et al.
Published: (2025)
RoboOmni: Proactive Robot Manipulation in Omni-modal Context
by: Wang, Siyin, et al.
Published: (2025)
ALScope: A Unified Toolkit for Deep Active Learning
by: Wu, Chenkai, et al.
Published: (2025)
VisionTS++: Cross-Modal Time Series Foundation Model with Continual Pre-trained Vision Backbones
by: Shen, Lefei, et al.
Published: (2025)