:: Library Catalog

Bibliographic Details
Main Authors:	Tao, Xingjian; Wang, Yiwei; Cai, Yujun; Yang, Zhicheng; Tang, Jing
Format:	Preprint
Published:	2025
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2506.15425

Similar Items

Are LLMs Really Not Knowledgeable? Mining the Submerged Knowledge in LLMs' Memory
by: Tao, Xingjian, et al.
Published: (2024)

ViewFusion: Structured Spatial Thinking Chains for Multi-View Reasoning
by: Tao, Xingjian, et al.
Published: (2026)

Mitigating Coordinate Prediction Bias from Positional Encoding Failures
by: Tao, Xingjian, et al.
Published: (2025)

Symbolic or Numerical? Understanding Physics Problem Solving in Reasoning LLMs
by: Dan, Nifu, et al.
Published: (2025)

SemVink: Advancing VLMs' Semantic Understanding of Optical Illusions via Visual Global Thinking
by: Li, Sifan, et al.
Published: (2025)

Generalist Scanner Meets Specialist Locator: A Synergistic Coarse-to-Fine Framework for Robust GUI Grounding
by: Li, Zhecheng, et al.
Published: (2025)

Blockwise SFT for Diffusion Language Models: Reconciling Bidirectional Attention and Autoregressive Decoding
by: Sun, Bowen, et al.
Published: (2025)

Unveiling the Potential of Diffusion Large Language Model in Controllable Generation
by: Xiong, Zhen, et al.
Published: (2025)

Mapping the Minds of LLMs: A Graph-Based Analysis of Reasoning LLM
by: Xiong, Zhen, et al.
Published: (2025)

GeoSVG-RL: Geometry-Aware Reinforcement Learning for Layout-Constrained Text-to-SVG Diagram Generation
by: Li, Sifan, et al.
Published: (2026)

Self-Manager: Parallel Agent Loop for Long-form Deep Research
by: Xu, Yilong, et al.
Published: (2026)

Structured Attention Matters to Multimodal LLMs in Document Understanding
by: Liu, Chang, et al.
Published: (2025)

MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents
by: Wang, Xuehui, et al.
Published: (2025)

Do "New Snow Tablets" Contain Snow? Large Language Models Over-Rely on Names to Identify Ingredients of Chinese Drugs
by: Li, Sifan, et al.
Published: (2025)

History-Aware Reasoning for GUI Agents
by: Wang, Ziwei, et al.
Published: (2025)

Thinking with Sound: Audio Chain-of-Thought Enables Multimodal Reasoning in Large Audio-Language Models
by: Xiong, Zhen, et al.
Published: (2025)

MobileGUI-RL: Advancing Mobile GUI Agent through Reinforcement Learning in Online Environment
by: Shi, Yucheng, et al.
Published: (2025)

GUI-G1: Understanding R1-Zero-Like Training for Visual Grounding in GUI Agents
by: Zhou, Yuqi, et al.
Published: (2025)

DeskVision: Large Scale Desktop Region Captioning for Advanced GUI Agents
by: Xu, Yibin, et al.
Published: (2025)

Primacy Effect of ChatGPT
by: Wang, Yiwei, et al.
Published: (2023)

How Fragile is Relation Extraction under Entity Replacements?
by: Wang, Yiwei, et al.
Published: (2023)

Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction
by: Xu, Yiheng, et al.
Published: (2024)

GUI-CIDER: Mid-training GUI Agents via Causal Internalization and Density-aware Exemplar Reselection
by: Wu, Zheng, et al.
Published: (2026)

OptiSQL: Executable SQL Generation from Optical Tokens
by: Li, Sifan, et al.
Published: (2026)

Con-ReCall: Detecting Pre-training Data in LLMs via Contrastive Decoding
by: Wang, Cheng, et al.
Published: (2024)

GUI-World: A Video Benchmark and Dataset for Multimodal GUI-oriented Understanding
by: Chen, Dongping, et al.
Published: (2024)

GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents
by: Wu, Qianhui, et al.
Published: (2025)

Energy-Calibrated VAE with Test Time Free Lunch
by: Luo, Yihong, et al.
Published: (2023)

DiMo-GUI: Advancing Test-time Scaling in GUI Grounding via Modality-Aware Visual Reasoning
by: Wu, Hang, et al.
Published: (2025)

Beyond Prompting: Efficient and Robust Contextual Biasing for Speech LLMs via Logit-Space Integration (LOGIC)
by: Wang, Peidong
Published: (2026)

DRS: Deep Question Reformulation With Structured Output
by: Li, Zhecheng, et al.
Published: (2024)

Texture or Semantics? Vision-Language Models Get Lost in Font Recognition
by: Li, Zhecheng, et al.
Published: (2025)

$A^2R^2$: Advancing Img2LaTeX Conversion via Visual Reasoning with Attention-Guided Refinement
by: Li, Zhecheng, et al.
Published: (2025)

Answer-Centric or Reasoning-Driven? Uncovering the Latent Memory Anchor in LLMs
by: Wu, Yang, et al.
Published: (2025)

ClawGUI: A Unified Framework for Training, Evaluating, and Deploying GUI Agents
by: Tang, Fei, et al.
Published: (2026)

Enhancing LLM Character-Level Manipulation via Divide and Conquer
by: Xiong, Zhen, et al.
Published: (2025)

DALD: Improving Logits-based Detector without Logits from Black-box LLMs
by: Zeng, Cong, et al.
Published: (2024)

LogitsCoder: Towards Efficient Chain-of-Thought Path Search via Logits Preference Decoding for Code Generation
by: Chen, Jizheng, et al.
Published: (2026)

Stabilizing Policy Optimization via Logits Convexity
by: Chen, Hongzhan, et al.
Published: (2026)

GUI-KV: Efficient GUI Agents via KV Cache with Spatio-Temporal Awareness
by: Huang, Kung-Hsiang, et al.
Published: (2025)