| Main Authors: | Wang, Kaishen; Xia, Xun; Liu, Jian; Yi, Zhang; He, Tao |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2406.13392 |
Similar Items
HalluRNN: Mitigating Hallucinations via Recurrent Cross-Layer Reasoning in Large Vision-Language Models
by: Yu, Le, et al.
Published: (2025)
Enhancing Feature Fusion of U-like Networks with Dynamic Skip Connections
by: Cao, Yue, et al.
Published: (2025)
FuseUNet: A Multi-Scale Feature Fusion Method for U-like Networks
by: He, Quansong, et al.
Published: (2025)
Unsafe by Reciprocity: How Generation-Understanding Coupling Undermines Safety in Unified Multimodal Models
by: Wang, Kaishen, et al.
Published: (2026)
$Δ$VLA: Prior-Guided Vision-Language-Action Models via World Knowledge Variation
by: Zhu, Yijie, et al.
Published: (2026)
Layered 3D Human Generation via Semantic-Aware Diffusion Model
by: Wang, Yi, et al.
Published: (2023)
Optimizing Vision-Language Consistency via Cross-Layer Regional Attention Alignment
by: Wang, Yifan, et al.
Published: (2025)
DreamLayer: Simultaneous Multi-Layer Generation via Diffusion Mode
by: Huang, Junjia, et al.
Published: (2025)
Multi-Layer Dense Attention Decoder for Polyp Segmentation
by: Patel, Krushi, et al.
Published: (2024)
AU-LLM: Micro-Expression Action Unit Detection via Enhanced LLM-Based Feature Fusion
by: Liu, Zhishu, et al.
Published: (2025)
BrainMCLIP: Brain Image Decoding with Multi-Layer feature Fusion of CLIP
by: Xia, Tian, et al.
Published: (2025)
HumanCoser: Layered 3D Human Generation via Semantic-Aware Diffusion Model
by: Wang, Yi, et al.
Published: (2024)
Advancing Prompt Learning through an External Layer
by: Cui, Fangming, et al.
Published: (2024)
LayerAnimate: Layer-level Control for Animation
by: Yang, Yuxue, et al.
Published: (2025)
LayerComposer: Multi-Human Personalized Generation via Layered Canvas
by: Qian, Guocheng Gordon, et al.
Published: (2025)
RevealLayer: Disentangling Hidden and Visible Layers via Occlusion-Aware Image Decomposition
by: Wang, Binhao, et al.
Published: (2026)
LAMM-ViT: AI Face Detection via Layer-Aware Modulation of Region-Guided Attention
by: Zhang, Jiangling, et al.
Published: (2025)
PhysLayer: Language-Guided Layered Animation with Depth-Aware Physics
by: Xie, Tianyidan, et al.
Published: (2026)
Qwen-Image-Layered: Towards Inherent Editability via Layer Decomposition
by: Yin, Shengming, et al.
Published: (2025)
Q Cache: Visual Attention is Valuable in Less than Half of Decode Layers for Multimodal Large Language Model
by: Zhuang, Jiedong, et al.
Published: (2026)
Radiology Report Generation with Layer-Wise Anatomical Attention
by: Muñiz-De-León, Emmanuel D., et al.
Published: (2025)
MUFASA: A Multi-Layer Framework for Slot Attention
by: Bock, Sebastian, et al.
Published: (2026)
Reasoning Resides in Layers: Restoring Temporal Reasoning in Video-Language Models with Layer-Selective Merging
by: Fu, Zihang, et al.
Published: (2026)
Pruning Self-attentions into Convolutional Layers in Single Path
by: He, Haoyu, et al.
Published: (2021)
Hierarchical and Step-Layer-Wise Tuning of Attention Specialty for Multi-Instance Synthesis in Diffusion Transformers
by: Zhang, Chunyang, et al.
Published: (2025)
Move Anything with Layered Scene Diffusion
by: Ren, Jiawei, et al.
Published: (2024)
LayerEdit: Disentangled Multi-Object Editing via Conflict-Aware Multi-Layer Learning
by: Fu, Fengyi, et al.
Published: (2025)
AU-TTT: Vision Test-Time Training model for Facial Action Unit Detection
by: Xing, Bohao, et al.
Published: (2025)
AULLM++: Structural Reasoning with Large Language Models for Micro-Expression Recognition
by: Liu, Zhishu, et al.
Published: (2026)
Split-Layer: Enhancing Implicit Neural Representation by Maximizing the Dimensionality of Feature Space
by: Cai, Zhicheng, et al.
Published: (2025)
MagicQuillV2: Precise and Interactive Image Editing with Layered Visual Cues
by: Liu, Zichen, et al.
Published: (2025)
LayerTracer: Cognitive-Aligned Layered SVG Synthesis via Diffusion Transformer
by: Song, Yiren, et al.
Published: (2025)
LayeringDiff: Layered Image Synthesis via Generation, then Disassembly with Generative Knowledge
by: Kang, Kyoungkook, et al.
Published: (2025)
LayerFusion: Harmonized Multi-Layer Text-to-Image Generation with Generative Priors
by: Dalva, Yusuf, et al.
Published: (2024)
Over++: Generative Video Compositing for Layer Interaction Effects
by: Qi, Luchao, et al.
Published: (2025)
LiWi: Layering in the Wild
by: He, Yu, et al.
Published: (2026)
Period-LLM: Extending the Periodic Capability of Multimodal Large Language Model
by: Zhang, Yuting, et al.
Published: (2025)
Mitigating Hallucination in Multimodal LLMs with Layer Contrastive Decoding
by: Tong, Bingkui, et al.
Published: (2025)
SpatialStack: Layered Geometry-Language Fusion for 3D VLM Spatial Reasoning
by: Zhang, Jian, et al.
Published: (2026)
ImAgent: A Unified Multimodal Agent Framework for Test-Time Scalable Image Generation
by: Wang, Kaishen, et al.
Published: (2025)