| Main Authors: | Yang, Pei; Ci, Hai; Shou, Mike Zheng |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2503.09241 |
Similar Items
macOSWorld: A Multilingual Interactive Benchmark for GUI Agents
by: Yang, Pei, et al.
Published: (2025)
H2R-Grounder: A Paired-Data-Free Paradigm for Translating Human Interaction Videos into Physically Grounded Robot Videos
by: Ci, Hai, et al.
Published: (2025)
Steganalysis on Digital Watermarking: Is Your Defense Truly Impervious?
by: Yang, Pei, et al.
Published: (2024)
OptMark: Robust Multi-bit Diffusion Watermarking via Inference Time Optimization
by: Xing, Jiazheng, et al.
Published: (2025)
The Dawn of GUI Agent: A Preliminary Case Study with Claude 3.5 Computer Use
by: Hu, Siyuan, et al.
Published: (2024)
AUTO-Explorer: Automated Data Collection for GUI Agent
by: Guo, Xiangwu, et al.
Published: (2025)
Image Watermarks are Removable Using Controllable Regeneration from Clean Noise
by: Liu, Yepeng, et al.
Published: (2024)
ProcessPainter: Learn Painting Process from Sequence Data
by: Song, Yiren, et al.
Published: (2024)
X-Humanoid: Robotize Human Videos to Generate Humanoid Videos at Scale
by: Yang, Pei, et al.
Published: (2025)
IDProtector: An Adversarial Noise Encoder to Protect Against ID-Preserving Image Generation
by: Song, Yiren, et al.
Published: (2024)
RingID: Rethinking Tree-Ring Watermarking for Enhanced Multi-Key Identification
by: Ci, Hai, et al.
Published: (2024)
Anti-Reference: Universal and Immediate Defense Against Reference-Based Generation
by: Song, Yiren, et al.
Published: (2024)
VideoMind: A Chain-of-LoRA Agent for Temporal-Grounded Video Reasoning
by: Liu, Ye, et al.
Published: (2025)
DeepDefense: Layer-Wise Gradient-Feature Alignment for Building Robust Neural Networks
by: Lin, Ci, et al.
Published: (2025)
An Empirical Study of Agent Developer Practices in AI Agent Frameworks
by: Wang, Yanlin, et al.
Published: (2025)
DiffSeg30k: A Multi-Turn Diffusion Editing Benchmark for Localized AIGC Detection
by: Ci, Hai, et al.
Published: (2025)
WMAdapter: Adding WaterMark Control to Latent Diffusion Models
by: Ci, Hai, et al.
Published: (2024)
UENR-600K: A Large-Scale Physically Grounded Dataset for Nighttime Video Deraining
by: Yang, Pei, et al.
Published: (2026)
Impossible Videos
by: Bai, Zechen, et al.
Published: (2025)
Genixer: Empowering Multimodal Large Language Models as a Powerful Data Generator
by: Zhao, Henry Hengyuan, et al.
Published: (2023)
An Empirical Study of Agent Skills for Healthcare: Practice, Gaps, and Governance
by: Xu, Gelei, et al.
Published: (2026)
Token Economics for LLM Agents: A Dual-View Study from Computing and Economics
by: Chen, Yuxi, et al.
Published: (2026)
InterFeedback: Unveiling Interactive Intelligence of Large Multimodal Models via Human Feedback
by: Zhao, Henry Hengyuan, et al.
Published: (2025)
Sparkle: Realizing Lively Instruction-Guided Video Background Replacement via Decoupled Guidance
by: Zeng, Ziyun, et al.
Published: (2026)
GameWorld: Towards Standardized and Verifiable Evaluation of Multimodal Game Agents
by: Ouyang, Mingyu, et al.
Published: (2026)
CityNet: A Comprehensive Multi-Modal Urban Dataset for Advanced Research in Urban Computing
by: Zheng, Zhengfei, et al.
Published: (2021)
ShowUI-$π$: Flow-based Generative Models as GUI Dexterous Hands
by: Hu, Siyuan, et al.
Published: (2025)
An Empirical Study of Multi-Agent Collaboration for Automated Research
by: Shen, Yang, et al.
Published: (2026)
OAgents: An Empirical Study of Building Effective Agents
by: Zhu, He, et al.
Published: (2025)
MachaGrasp: Morphology-Aware Cross-Embodiment Dexterous Hand Articulation Generation for Grasping
by: Zhang, Heng, et al.
Published: (2025)
Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models
by: Wang, Jiaqi, et al.
Published: (2025)
Draw-In-Mind: Rebalancing Designer-Painter Roles in Unified Multimodal Models Benefits Image Editing
by: Zeng, Ziyun, et al.
Published: (2025)
Paper2Video: Automatic Video Generation from Scientific Papers
by: Zhu, Zeyu, et al.
Published: (2025)
Context-Length Robustness in Question Answering Models: A Comparative Empirical Study
by: Dhara, Trishita, et al.
Published: (2026)
Can RL Improve Generalization of LLM Agents? An Empirical Study
by: Xi, Zhiheng, et al.
Published: (2026)
Contextualized Privacy Defense for LLM Agents
by: Wen, Yule, et al.
Published: (2026)
Code2Video: A Code-centric Paradigm for Educational Video Generation
by: Chen, Yanzhe, et al.
Published: (2025)
Catching the Infection Before It Spreads: Foresight-Guided Defense in Multi-Agent Systems
by: Ma, Yue, et al.
Published: (2026)
Empirical Computation
by: Tang, Eric, et al.
Published: (2025)
Olaf-World: Orienting Latent Actions for Video World Modeling
by: Jiang, Yuxin, et al.
Published: (2026)