| Main Authors: | Tao, Xingjian, Wang, Yiwei, Cai, Yujun, Yang, Zhicheng, Tang, Jing |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2506.15425 |
Similar Items
Are LLMs Really Not Knowledgeable? Mining the Submerged Knowledge in LLMs' Memory
by: Tao, Xingjian, et al.
Published: (2024)
ViewFusion: Structured Spatial Thinking Chains for Multi-View Reasoning
by: Tao, Xingjian, et al.
Published: (2026)
Mitigating Coordinate Prediction Bias from Positional Encoding Failures
by: Tao, Xingjian, et al.
Published: (2025)
Symbolic or Numerical? Understanding Physics Problem Solving in Reasoning LLMs
by: Dan, Nifu, et al.
Published: (2025)
SemVink: Advancing VLMs' Semantic Understanding of Optical Illusions via Visual Global Thinking
by: Li, Sifan, et al.
Published: (2025)
Generalist Scanner Meets Specialist Locator: A Synergistic Coarse-to-Fine Framework for Robust GUI Grounding
by: Li, Zhecheng, et al.
Published: (2025)
Blockwise SFT for Diffusion Language Models: Reconciling Bidirectional Attention and Autoregressive Decoding
by: Sun, Bowen, et al.
Published: (2025)
Unveiling the Potential of Diffusion Large Language Model in Controllable Generation
by: Xiong, Zhen, et al.
Published: (2025)
Mapping the Minds of LLMs: A Graph-Based Analysis of Reasoning LLM
by: Xiong, Zhen, et al.
Published: (2025)
GeoSVG-RL: Geometry-Aware Reinforcement Learning for Layout-Constrained Text-to-SVG Diagram Generation
by: Li, Sifan, et al.
Published: (2026)
Self-Manager: Parallel Agent Loop for Long-form Deep Research
by: Xu, Yilong, et al.
Published: (2026)
Structured Attention Matters to Multimodal LLMs in Document Understanding
by: Liu, Chang, et al.
Published: (2025)
MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents
by: Wang, Xuehui, et al.
Published: (2025)
Do "New Snow Tablets" Contain Snow? Large Language Models Over-Rely on Names to Identify Ingredients of Chinese Drugs
by: Li, Sifan, et al.
Published: (2025)
History-Aware Reasoning for GUI Agents
by: Wang, Ziwei, et al.
Published: (2025)
Thinking with Sound: Audio Chain-of-Thought Enables Multimodal Reasoning in Large Audio-Language Models
by: Xiong, Zhen, et al.
Published: (2025)
MobileGUI-RL: Advancing Mobile GUI Agent through Reinforcement Learning in Online Environment
by: Shi, Yucheng, et al.
Published: (2025)
GUI-G1: Understanding R1-Zero-Like Training for Visual Grounding in GUI Agents
by: Zhou, Yuqi, et al.
Published: (2025)
DeskVision: Large Scale Desktop Region Captioning for Advanced GUI Agents
by: Xu, Yibin, et al.
Published: (2025)
Primacy Effect of ChatGPT
by: Wang, Yiwei, et al.
Published: (2023)
How Fragile is Relation Extraction under Entity Replacements?
by: Wang, Yiwei, et al.
Published: (2023)
Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction
by: Xu, Yiheng, et al.
Published: (2024)
GUI-CIDER: Mid-training GUI Agents via Causal Internalization and Density-aware Exemplar Reselection
by: Wu, Zheng, et al.
Published: (2026)
OptiSQL: Executable SQL Generation from Optical Tokens
by: Li, Sifan, et al.
Published: (2026)
Con-ReCall: Detecting Pre-training Data in LLMs via Contrastive Decoding
by: Wang, Cheng, et al.
Published: (2024)
GUI-World: A Video Benchmark and Dataset for Multimodal GUI-oriented Understanding
by: Chen, Dongping, et al.
Published: (2024)
GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents
by: Wu, Qianhui, et al.
Published: (2025)
Energy-Calibrated VAE with Test Time Free Lunch
by: Luo, Yihong, et al.
Published: (2023)
DiMo-GUI: Advancing Test-time Scaling in GUI Grounding via Modality-Aware Visual Reasoning
by: Wu, Hang, et al.
Published: (2025)
Beyond Prompting: Efficient and Robust Contextual Biasing for Speech LLMs via Logit-Space Integration (LOGIC)
by: Wang, Peidong
Published: (2026)
DRS: Deep Question Reformulation With Structured Output
by: Li, Zhecheng, et al.
Published: (2024)
Texture or Semantics? Vision-Language Models Get Lost in Font Recognition
by: Li, Zhecheng, et al.
Published: (2025)
$A^2R^2$: Advancing Img2LaTeX Conversion via Visual Reasoning with Attention-Guided Refinement
by: Li, Zhecheng, et al.
Published: (2025)
Answer-Centric or Reasoning-Driven? Uncovering the Latent Memory Anchor in LLMs
by: Wu, Yang, et al.
Published: (2025)
ClawGUI: A Unified Framework for Training, Evaluating, and Deploying GUI Agents
by: Tang, Fei, et al.
Published: (2026)
Enhancing LLM Character-Level Manipulation via Divide and Conquer
by: Xiong, Zhen, et al.
Published: (2025)
DALD: Improving Logits-based Detector without Logits from Black-box LLMs
by: Zeng, Cong, et al.
Published: (2024)
LogitsCoder: Towards Efficient Chain-of-Thought Path Search via Logits Preference Decoding for Code Generation
by: Chen, Jizheng, et al.
Published: (2026)
Stabilizing Policy Optimization via Logits Convexity
by: Chen, Hongzhan, et al.
Published: (2026)
GUI-KV: Efficient GUI Agents via KV Cache with Spatio-Temporal Awareness
by: Huang, Kung-Hsiang, et al.
Published: (2025)