| Main Author: | Furfaro, Fabien |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2409.15512 |
Similar Items
PixelBytes: Catching Unified Representation for Multimodal Generation
by: Furfaro, Fabien
Published: (2024)
Unified Multimodal Understanding via Byte-Pair Visual Encoding
by: Zhang, Wanpeng, et al.
Published: (2025)
From Pixels to Tokens: Byte-Pair Encoding on Quantized Visual Modalities
by: Zhang, Wanpeng, et al.
Published: (2024)
UniPixel: Unified Object Referring and Segmentation for Pixel-Level Visual Reasoning
by: Liu, Ye, et al.
Published: (2025)
Catch-Up Mix: Catch-Up Class for Struggling Filters in CNN
by: Kang, Minsoo, et al.
Published: (2024)
UniEval: Unified Holistic Evaluation for Unified Multimodal Understanding and Generation
by: Li, Yi, et al.
Published: (2025)
GLaMM: Pixel Grounding Large Multimodal Model
by: Rasheed, Hanoona, et al.
Published: (2023)
Pixel-Grounded Retrieval for Knowledgeable Large Multimodal Models
by: Kim, Jeonghwan, et al.
Published: (2026)
Semantic Generative Tuning for Unified Multimodal Models
by: Yu, Songsong, et al.
Published: (2026)
The Thinking Pixel: Recursive Sparse Reasoning in Multimodal Diffusion Latents
by: Sun, Yuwei, et al.
Published: (2026)
Enhancing Multimodal Unified Representations for Cross Modal Generalization
by: Huang, Hai, et al.
Published: (2024)
TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation
by: Qu, Liao, et al.
Published: (2024)
Discrete Diffusion Models with MLLMs for Unified Medical Multimodal Generation
by: Mao, Jiawei, et al.
Published: (2025)
Steering Visual Generation in Unified Multimodal Models with Understanding Supervision
by: Liu, Zeyu, et al.
Published: (2026)
REGen: Multimodal Retrieval-Embedded Generation for Long-to-Short Video Editing
by: Xu, Weihan, et al.
Published: (2025)
Towards a Multimodal Large Language Model with Pixel-Level Insight for Biomedicine
by: Huang, Xiaoshuang, et al.
Published: (2024)
OmniCam: Unified Multimodal Video Generation via Camera Control
by: Yang, Xiaoda, et al.
Published: (2025)
NextFlow: Unified Sequential Modeling Activates Multimodal Understanding and Generation
by: Zhang, Huichao, et al.
Published: (2026)
HaploOmni: Unified Single Transformer for Multimodal Video Understanding and Generation
by: Xiao, Yicheng, et al.
Published: (2025)
Archon: A Unified Multimodal Model for Holistic Digital Human Generation
by: Bao, Chong, et al.
Published: (2026)
Bridging Pixels and Words: Mask-Aware Local Semantic Fusion for Multimodal Media Verification
by: Chen, Zizhao, et al.
Published: (2026)
PixelGen: Improving Pixel Diffusion with Perceptual Supervision
by: Ma, Zehong, et al.
Published: (2026)
UniToken: Harmonizing Multimodal Understanding and Generation through Unified Visual Encoding
by: Jiao, Yang, et al.
Published: (2025)
UniCMs: A Unified Consistency Model For Efficient Multimodal Generation and Understanding
by: Xu, Chenkai, et al.
Published: (2025)
Ming-Flash-Omni: A Sparse, Unified Architecture for Multimodal Perception and Generation
by: AI, Inclusion, et al.
Published: (2025)
PixelMan: Consistent Object Editing with Diffusion Models via Pixel Manipulation and Generation
by: Jiang, Liyao, et al.
Published: (2024)
Humor in Pixels: Benchmarking Large Multimodal Models Understanding of Online Comics
by: Ryan, Yuriel, et al.
Published: (2025)
PixelArena: A benchmark for Pixel-Precision Visual Intelligence
by: Liang, Feng, et al.
Published: (2025)
Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation
by: Wu, Chengyue, et al.
Published: (2024)
Pixel-Aligned Multi-View Generation with Depth Guided Decoder
by: Tang, Zhenggang, et al.
Published: (2024)
L2P: Unlocking Latent Potential for Pixel Generation
by: Chen, Zhennan, et al.
Published: (2026)
UniPose: A Unified Multimodal Framework for Human Pose Comprehension, Generation and Editing
by: Li, Yiheng, et al.
Published: (2024)
UniFashion: A Unified Vision-Language Model for Multimodal Fashion Retrieval and Generation
by: Zhao, Xiangyu, et al.
Published: (2024)
Lumina-OmniLV: A Unified Multimodal Framework for General Low-Level Vision
by: Pu, Yuandong, et al.
Published: (2025)
Nexus-Gen: Unified Image Understanding, Generation, and Editing via Prefilled Autoregression in Shared Embedding Space
by: Zhang, Hong, et al.
Published: (2025)
SafePLUG: Empowering Multimodal LLMs with Pixel-Level Insight and Temporal Grounding for Traffic Accident Understanding
by: Sheng, Zihao, et al.
Published: (2025)
UrbanGraphEmbeddings: Learning and Evaluating Spatially Grounded Multimodal Embeddings for Urban Science
by: Zhang, Jie, et al.
Published: (2026)
Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs
by: Wang, Haochen, et al.
Published: (2025)
Understanding and Harnessing Sparsity in Unified Multimodal Models
by: He, Shwai, et al.
Published: (2025)
ImAgent: A Unified Multimodal Agent Framework for Test-Time Scalable Image Generation
by: Wang, Kaishen, et al.
Published: (2025)