| Main Authors: | Yang, Nianzu, Li, Pandeng, Zhao, Liming, Li, Yang, Xie, Chen-Wei, Tang, Yehui, Lu, Xudong, Liu, Zhihang, Zheng, Yun, Liu, Yu, Yan, Junchi |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2503.03708 |
Similar Items
Hybrid-Level Instruction Injection for Video Token Compression in Multi-modal Large Language Models
by: Liu, Zhihang, et al.
Published: (2025)
CAPability: A Comprehensive Visual Caption Benchmark for Evaluating Both Correctness and Thoroughness
by: Liu, Zhihang, et al.
Published: (2025)
Molecule Generation for Drug Design: a Graph Learning Perspective
by: Yang, Nianzu, et al.
Published: (2022)
EasyDGL: Encode, Train and Interpret for Continuous-time Dynamic Graph Learning
by: Chen, Chao, et al.
Published: (2023)
GenAgent: Scaling Text-to-Image Generation via Agentic Multimodal Reasoning
by: Jiang, Kaixun, et al.
Published: (2026)
ShowTable: Unlocking Creative Table Visualization with Collaborative Reflection and Refinement
by: Liu, Zhihang, et al.
Published: (2025)
Rethinking Classifier Re-Training in Long-Tailed Recognition: A Simple Logits Retargeting Approach
by: Lu, Han, et al.
Published: (2024)
Video-XL-Pro: Reconstructive Token Compression for Extremely Long Video Understanding
by: Liu, Xiangrui, et al.
Published: (2025)
Improved Video VAE for Latent Video Diffusion Model
by: Wu, Pingyu, et al.
Published: (2024)
UFO: A Unified Approach to Fine-grained Visual Perception via Open-ended Language Interface
by: Tang, Hao, et al.
Published: (2025)
MorphGrower: A Synchronized Layer-by-layer Growing Approach for Plausible Neuronal Morphology Generation
by: Yang, Nianzu, et al.
Published: (2024)
Bridging Semantic and Kinematic Conditions with Diffusion-based Discrete Motion Tokenizer
by: Gu, Chenyang, et al.
Published: (2026)
Motion Control for Enhanced Complex Action Video Generation
by: Zhou, Qiang, et al.
Published: (2024)
UniLiP: Adapting CLIP for Unified Multimodal Understanding, Generation and Editing
by: Tang, Hao, et al.
Published: (2025)
TerDiT: Ternary Diffusion Models with Transformers
by: Lu, Xudong, et al.
Published: (2024)
Valuation of Exotic Options and Counterparty Games Based on Conditional Diffusion
by: Zhao, Helin, et al.
Published: (2025)
QuEST: Low-bit Diffusion Model Quantization via Efficient Selective Finetuning
by: Wang, Haoxuan, et al.
Published: (2024)
The behavior of rich-club coefficient in scale-free networks
by: Liu, Zhihang, et al.
Published: (2023)
NeuroCEDT
by: Nianzu, Qu
Published: (2025)
Learning Adaptive and Temporally Causal Video Tokenization in a 1D Latent Space
by: Li, Yan, et al.
Published: (2025)
AIBench: Evaluating Visual-Logical Consistency in Academic Illustration Generation
by: Liao, Zhaohe, et al.
Published: (2026)
Diffusion-based Synthetic Data Generation for Visible-Infrared Person Re-Identification
by: Dai, Wenbo, et al.
Published: (2025)
The Chern Sectional Curvature of a Hermitian Manifold
by: Cao, Pandeng, et al.
Published: (2022)
An Embeddable Implicit IUVD Representation for Part-based 3D Human Surface Reconstruction
by: Li, Baoxing, et al.
Published: (2024)
Rethinking Data Mixture for Large Language Models: A Comprehensive Survey and New Perspectives
by: Liu, Yajiao, et al.
Published: (2025)
EvoTok: A Unified Image Tokenizer via Residual Latent Evolution for Visual Understanding and Generation
by: Li, Yan, et al.
Published: (2026)
Multi-Granularity Semantic Revision for Large Language Model Distillation
by: Liu, Xiaoyu, et al.
Published: (2024)
Diffusion as Reasoning: Enhancing Object Navigation via Diffusion Model Conditioned on LLM-based Object-Room Knowledge
by: Ji, Yiming, et al.
Published: (2024)
Saliency-driven Dynamic Token Pruning for Large Language Models
by: Tao, Yao, et al.
Published: (2025)
Rethinking Security of Diffusion-based Generative Steganography
by: Zhu, Jihao, et al.
Published: (2026)
Boundary Matters: A Bi-Level Active Finetuning Framework
by: Lu, Han, et al.
Published: (2024)
RNDiff: Rainfall nowcasting with Condition Diffusion Model
by: Ling, Xudong, et al.
Published: (2024)
Conditional Diffusion Models are Minimax-Optimal and Manifold-Adaptive for Conditional Distribution Estimation
by: Tang, Rong, et al.
Published: (2024)
Local Conditional Controlling for Text-to-Image Diffusion Models
by: Zhao, Yibo, et al.
Published: (2023)
VGDFR: Diffusion-based Video Generation with Dynamic Latent Frame Rate
by: Yuan, Zhihang, et al.
Published: (2025)
DATE: Dynamic Absolute Time Enhancement for Long Video Understanding
by: Yuan, Chao, et al.
Published: (2025)
Data-Chain Backdoor: Do You Trust Diffusion Models as Generative Data Supplier?
by: Lu, Junchi, et al.
Published: (2025)
From What to Why: A Multi-Agent System for Evidence-based Chemical Reaction Condition Reasoning
by: Yang, Cheng, et al.
Published: (2025)
UltraImage: Rethinking Resolution Extrapolation in Image Diffusion Transformers
by: Zhao, Min, et al.
Published: (2025)
ExLM: Rethinking the Impact of [MASK] Tokens in Masked Language Models
by: Zheng, Kangjie, et al.
Published: (2025)