| Main Authors: | Huang, Ching-Kai, Lin, Wen-Chieh, Lee, Yan-Cen |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.01298 |
Similar Items
Aerial View River Landform Video segmentation: A Weakly Supervised Context-aware Temporal Consistency Distillation Approach
by: Chen, Chi-Han, et al.
Published: (2025)
Mask Consistency Regularization in Object Removal
by: Yuan, Hua, et al.
Published: (2025)
EraseLoRA: MLLM-Driven Foreground Exclusion and Background Subtype Aggregation for Dataset-Free Object Removal
by: Jo, Sanghyun, et al.
Published: (2025)
Elysium: Exploring Object-level Perception in Videos via MLLM
by: Wang, Han, et al.
Published: (2024)
POEM: Precise Object-level Editing via MLLM control
by: Schouten, Marco, et al.
Published: (2025)
Multi-modal Motion Prediction using Temporal Ensembling with Learning-based Aggregation
by: Hong, Kai-Yin, et al.
Published: (2024)
ClickRemoval: An Interactive Open-Source Tool for Object Removal in Diffusion Models
by: Zhang, Ledun, et al.
Published: (2026)
Object Remover Performance Evaluation Methods using Class-wise Object Removal Images
by: Oh, Changsuk, et al.
Published: (2024)
Boosting MLLM Reasoning with Text-Debiased Hint-GRPO
by: Huang, Qihan, et al.
Published: (2025)
CodeDance: A Dynamic Tool-integrated MLLM for Executable Visual Reasoning
by: Song, Qi, et al.
Published: (2025)
CAD-MLLM: Unifying Multimodality-Conditioned CAD Generation With MLLM
by: Xu, Jingwei, et al.
Published: (2024)
Creation-MMBench: Assessing Context-Aware Creative Intelligence in MLLM
by: Fang, Xinyu, et al.
Published: (2025)
Simultaneous Detection and Interaction Reasoning for Object-Centric Action Recognition
by: Li, Xunsong, et al.
Published: (2024)
DRIFT: Transferring Reasoning Priors for Efficient MLLM Fine-Tuning
by: Huang, Chao, et al.
Published: (2025)
Grounded Visual Factualization: Factual Anchor-Based Finetuning for Enhancing MLLM Factual Consistency
by: Morbiato, Filippo, et al.
Published: (2025)
GScream: Learning 3D Geometry and Feature Consistent Gaussian Splatting for Object Removal
by: Wang, Yuxin, et al.
Published: (2024)
Syn-GRPO: Self-Evolving Data Synthesis for MLLM Perception Reasoning
by: Huang, Qihan, et al.
Published: (2025)
GenHOI: Towards Object-Consistent Hand-Object Interaction with Temporally Balanced and Spatially Selective Object Injection
by: Huang, Xuan, et al.
Published: (2026)
Exploring Hierarchical Consistency and Unbiased Objectness for Open-Vocabulary Object Detection
by: Lee, Sanghoon, et al.
Published: (2026)
Occlusion-Aware Temporally Consistent Amodal Completion for 3D Human-Object Interaction Reconstruction
by: Doh, Hyungjun, et al.
Published: (2025)
Completing Visual Objects via Bridging Generation and Segmentation
by: Li, Xiang, et al.
Published: (2023)
Reasoning Guided Embeddings: Leveraging MLLM Reasoning for Improved Multimodal Retrieval
by: Liu, Chunxu, et al.
Published: (2025)
Confronting Ambiguity in 6D Object Pose Estimation via Score-Based Diffusion on SE(3)
by: Hsiao, Tsu-Ching, et al.
Published: (2023)
Shadow Removal Refinement via Material-Consistent Shadow Edges
by: Hu, Shilin, et al.
Published: (2024)
Recovering Partially Corrupted Objects via Sketch-Guided Bidirectional Feature Interaction
by: Zhang, Yongle, et al.
Published: (2025)
Bootstrapping MLLM for Weakly-Supervised Class-Agnostic Object Counting
by: Zhang, Xiaowen, et al.
Published: (2026)
ReasonX: MLLM-Guided Intrinsic Image Decomposition
by: Dirik, Alara, et al.
Published: (2025)
MIRAGE: Assessing Hallucination in Multimodal Reasoning Chains of MLLM
by: Dong, Bowen, et al.
Published: (2025)
Static-Dynamic Class-level Perception Consistency in Video Semantic Segmentation
by: Cen, Zhigang, et al.
Published: (2024)
CoInteract: Physically-Consistent Human-Object Interaction Video Synthesis via Spatially-Structured Co-Generation
by: Luo, Xiangyang, et al.
Published: (2026)
DeflareMamba: Hierarchical Vision Mamba for Contextually Consistent Lens Flare Removal
by: Huang, Yihang, et al.
Published: (2025)
Traffic-MLLM: Curiosity-Regularized Supervised Learning for Traffic Scenario Case-Based Reasoning
by: Xiu, Waikit, et al.
Published: (2025)
Slot-MLLM: Object-Centric Visual Tokenization for Multimodal LLM
by: Chi, Donghwan, et al.
Published: (2025)
RTV-Bench: Benchmarking MLLM Continuous Perception, Understanding and Reasoning through Real-Time Video
by: Xun, Shuhang, et al.
Published: (2025)
MASRA: MLLM-Assisted Semantic-Relational Consistent Alignment for Video Temporal Grounding
by: Ran, Ran, et al.
Published: (2026)
HOMER: Homography-Based Efficient Multi-view 3D Object Removal
by: Ni, Jingcheng, et al.
Published: (2025)
Survey on AI-Generated Media Detection: From Non-MLLM to MLLM
by: Zou, Yueying, et al.
Published: (2025)
Place-it-R1: Unlocking Environment-aware Reasoning Potential of MLLM for Video Object Insertion
by: Gu, Bohai, et al.
Published: (2026)
Look-Back: Implicit Visual Re-focusing in MLLM Reasoning
by: Yang, Shuo, et al.
Published: (2025)
GeoRemover: Removing Objects and Their Causal Visual Artifacts
by: Zhu, Zixin, et al.
Published: (2025)