:: Library Catalog

Bibliographic Details
Main Authors:	Zhang, Daoan; Liu, Pai; Zhou, Xiaofei; Ge, Yuan; Lan, Guangchen; Bi, Jing; Brinton, Christopher; Hoque, Ehsan; Luo, Jiebo
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2512.09907

Similar Items

Bridging SFT and DPO for Diffusion Model Alignment with Self-Sampling Preference Optimization
by: Zhang, Daoan, et al.
Published: (2024)

Act to See, See to Act: Diffusion-Driven Perception-Action Interplay for Adaptive Policies
by: Wang, Jing, et al.
Published: (2025)

See, Symbolize, Act: Grounding VLMs with Spatial Representations for Better Gameplay
by: Baghel, Ashish, et al.
Published: (2026)

Iterative Critique-and-Routing Controller for Multi-Agent Systems with Heterogeneous LLMs
by: Fang, Wenzhi, et al.
Published: (2026)

MaPPO: Maximum a Posteriori Preference Optimization with Prior Knowledge
by: Lan, Guangchen, et al.
Published: (2025)

HAL: Inducing Human-likeness in LLMs with Alignment
by: Hasan, Masum, et al.
Published: (2026)

Asynchronous Federated Reinforcement Learning with Policy Gradient Updates: Algorithm Design and Convergence Analysis
by: Lan, Guangchen, et al.
Published: (2024)

Reinforcement Learning for Scalable and Trustworthy Intelligent Systems
by: Lan, Guangchen
Published: (2026)

Enhanced Real-Time Threat Detection in 5G Networks: A Self-Attention RNN Autoencoder Approach for Spectral Intrusion Analysis
by: Kouchaki, Mohammadreza, et al.
Published: (2024)

Act2See: Emergent Active Visual Perception for Video Reasoning
by: Ma, Martin Q., et al.
Published: (2026)

WorldGenBench: A World-Knowledge-Integrated Benchmark for Reasoning-Driven Text-to-Image Generation
by: Zhang, Daoan, et al.
Published: (2025)

CoCoT: Contrastive Chain-of-Thought Prompting for Large Multimodal Models with Multiple Image Inputs
by: Zhang, Daoan, et al.
Published: (2024)

Sphinx: Benchmarking and Modeling for LLM-Driven Pull Request Review
by: Zhang, Daoan, et al.
Published: (2026)

A Versatile Multimodal Agent for Multimedia Content Generation
by: Zhang, Daoan, et al.
Published: (2026)

LAST: LeArning to Think in Space and Time for Generalist Vision-Language Models
by: Wang, Shuai, et al.
Published: (2025)

See, Think, Act: Online Shopper Behavior Simulation with VLM Agents
by: Zhang, Yimeng, et al.
Published: (2025)

Enough Coin Flips Can Make LLMs Act Bayesian
by: Gupta, Ritwik, et al.
Published: (2025)

ColorBench: Can VLMs See and Understand the Colorful World? A Comprehensive Benchmark for Color Perception, Reasoning, and Robustness
by: Liang, Yijun, et al.
Published: (2025)

Chain-of-Visual-Thought: Teaching VLMs to See and Think Better with Continuous Visual Tokens
by: Qin, Yiming, et al.
Published: (2025)

See Tomorrow, Act Today: Foresight-Driven Autonomous Driving
by: Zhang, Bozhou, et al.
Published: (2026)

Learning to See and Act: Task-Aware Virtual View Exploration for Robotic Manipulation
by: Bai, Yongjie, et al.
Published: (2025)

FINEMATCH: Aspect-based Fine-grained Image and Text Mismatch Detection and Correction
by: Hua, Hang, et al.
Published: (2024)

Pixels, Patterns, but No Poetry: To See The World like Humans
by: Gao, Hongcheng, et al.
Published: (2025)

Focusing by Contrastive Attention: Enhancing VLMs' Visual Reasoning
by: Ge, Yuyao, et al.
Published: (2025)

Contextual Integrity in LLMs via Reasoning and Reinforcement Learning
by: Lan, Guangchen, et al.
Published: (2025)

VideoASMR-Bench: Can AI-Generated ASMR Videos Fool VLMs and Humans?
by: Wang, Jiaqi, et al.
Published: (2025)

Can LLMs Emulate Human Belief Dynamics?
by: Proma, Adiba Mahbub, et al.
Published: (2026)

Think Twice to See More: Iterative Visual Reasoning in Medical VLMs
by: Chen, Kaitao, et al.
Published: (2025)

Can Language Models Act as Knowledge Bases at Scale?
by: He, Qiyuan, et al.
Published: (2024)

Can Large Language Models Act as Symbolic Reasoners?
by: Sullivan, Rob, et al.
Published: (2024)

ActNAS : Generating Efficient YOLO Models using Activation NAS
by: Sah, Sudhakar, et al.
Published: (2024)

Learning Brain Tumor Representation in 3D High-Resolution MR Images via Interpretable State Space Models
by: Hu, Qingqiao, et al.
Published: (2024)

XplainAct: Visualization for Personalized Intervention Insights
by: Zhang, Yanming, et al.
Published: (2025)

What's not in the CR: SUPPORT Act, Second Chance Act
by: Alison Knopf
Published: (2025)

Seeing but Not Believing: Probing the Disconnect Between Visual Attention and Answer Correctness in VLMs
by: Liu, Zhining, et al.
Published: (2025)

To See or To Please: Uncovering Visual Sycophancy and Split Beliefs in VLMs
by: Hong, Rui, et al.
Published: (2026)

See, Act, Adapt: Active Perception for Unsupervised Cross-Domain Visual Adaptation via Personalized VLM-Guided Agent
by: Tang, Tianci, et al.
Published: (2026)

The Americans with Disabilities Act and Its Effect on Public Libraries
by: Lewis, Christopher
Published: (1992)

The Act of Cataloging
by: Carson, Doris M.
Published: (1976)

Tensor Generalized Approximate Message Passing
by: Li, Yinchuan, et al.
Published: (2025)