:: Library Catalog

Bibliographic Details
Main Authors:	Chen, Sixiang; Bai, Jinbin; Zhao, Zhuoran; Ye, Tian; Shi, Qingyu; Zhou, Donghao; Chai, Wenhao; Lin, Xin; Wu, Jianzong; Tang, Chao; Xu, Shilin; Zhang, Tao; Yuan, Haobo; Zhou, Yikang; Chow, Wei; Li, Linfeng; Li, Xiangtai; Zhu, Lei; Qi, Lu
Type:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2504.05979

Similar Items

Muddit: Liberating Generation Beyond Text-to-Image with a Unified Discrete Diffusion Model
by: Shi, Qingyu, et al.
Published: (2025)

Decouple and Track: Benchmarking and Improving Video Diffusion Transformers for Motion Transfer
by: Shi, Qingyu, et al.
Published: (2025)

DreamRelation: Bridging Customization and Relation Generation
by: Shi, Qingyu, et al.
Published: (2024)

EditMGT: Unleashing Potentials of Masked Generative Transformers in Image Editing
by: Chow, Wei, et al.
Published: (2025)

Mixed-R1: Unified Reward Perspective For Reasoning Capability in Multimodal Large Language Models
by: Xu, Shilin, et al.
Published: (2025)

Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis
by: Bai, Jinbin, et al.
Published: (2024)

Masked Generative Transformer Is What You Need for Image Editing
by: Chow, Wei, et al.
Published: (2026)

HumanEdit: A High-Quality Human-Rewarded Dataset for Instruction-based Image Editing
by: Bai, Jinbin, et al.
Published: (2024)

The 1st Solution for 7th LSVOS RVOS Track: SaSaSa2VA
by: Niu, Quanzhu, et al.
Published: (2025)

Dense360: Dense Understanding from Omnidirectional Panoramas
by: Zhou, Yikang, et al.
Published: (2025)

Are They the Same? Exploring Visual Correspondence Shortcomings of Multimodal LLMs
by: Zhou, Yikang, et al.
Published: (2025)

DVIS-DAQ: Improving Video Segmentation via Dynamic Anchor Queries
by: Zhou, Yikang, et al.
Published: (2024)

Conditional Panoramic Image Generation via Masked Autoregressive Modeling
by: Wang, Chaoyang, et al.
Published: (2025)

RecTok: Reconstruction Distillation along Rectified Flow
by: Shi, Qingyu, et al.
Published: (2025)

LLAVADI: What Matters For Multimodal Large Language Models Distillation
by: Xu, Shilin, et al.
Published: (2024)

DenseWorld-1M: Towards Detailed Dense Grounded Caption in the Real World
by: Li, Xiangtai, et al.
Published: (2025)

Towards Open Vocabulary Learning: A Survey
by: Wu, Jianzong, et al.
Published: (2023)

4th PVUW MeViS 3rd Place Report: Sa2VA
by: Yuan, Haobo, et al.
Published: (2025)

DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation
by: Wu, Jianzong, et al.
Published: (2024)

Prism: Efficient Test-Time Scaling via Hierarchical Search and Self-Verification for Discrete Diffusion Language Models
by: Bai, Jinbin, et al.
Published: (2026)

RMP-SAM: Towards Real-Time Multi-Purpose Segment Anything
by: Xu, Shilin, et al.
Published: (2024)

Point Cloud Mamba: Point Cloud Learning via State Space Model
by: Zhang, Tao, et al.
Published: (2024)

Threshold-Guided Optimization for Visual Generative Models
by: Bai, Jinbin, et al.
Published: (2026)

LLMs are Bug Replicators: An Empirical Study on LLMs' Capability in Completing Bug-prone Code
by: Guo, Liwei, et al.
Published: (2025)

SAMTok: Representing Any Mask with Two Words
by: Zhou, Yikang, et al.
Published: (2026)

MotionBooth: Motion-Aware Customized Text-to-Video Generation
by: Wu, Jianzong, et al.
Published: (2024)

Open-Vocabulary SAM: Segment and Recognize Twenty-thousand Classes Interactively
by: Yuan, Haobo, et al.
Published: (2024)

PanopticPartFormer++: A Unified and Decoupled View for Panoptic Part Segmentation
by: Li, Xiangtai, et al.
Published: (2023)

Integrating View Conditions for Image Synthesis
by: Bai, Jinbin, et al.
Published: (2023)

Towards Customized Multimodal Role-Play
by: Tang, Chao, et al.
Published: (2026)

Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos
by: Yuan, Haobo, et al.
Published: (2025)

AGLLDiff: Guiding Diffusion Models Towards Unsupervised Training-free Real-world Low-light Image Enhancement
by: Lin, Yunlong, et al.
Published: (2024)

UniHPR: Unified Human Pose Representation via Singular Value Contrastive Learning
by: Jiang, Zhongyu, et al.
Published: (2025)

MagicTailor: Component-Controllable Personalization in Text-to-Image Diffusion Models
by: Zhou, Donghao, et al.
Published: (2024)

Auto Cherry-Picker: Learning from High-quality Generative Data Driven by Language
by: Chen, Yicheng, et al.
Published: (2024)

Mamba or RWKV: Exploring High-Quality and High-Efficiency Segment Anything Model
by: Yuan, Haobo, et al.
Published: (2024)

From Masks to Worlds: A Hitchhiker's Guide to World Models
by: Bai, Jinbin, et al.
Published: (2025)

MERIT: Multilingual Semantic Retrieval with Interleaved Multi-Condition Query
by: Chow, Wei, et al.
Published: (2025)

SaSaSaSa2VA: 2nd Place of the 5th PVUW MeViS-Text Track
by: Gong, Dengxian, et al.
Published: (2026)

EncGPT: A Multi-Agent Workflow for Dynamic Encryption Algorithms
by: Li, Donghe, et al.
Published: (2025)