Table of Contents: :: Library Catalog

Saved in:

Bibliographic Details
Main Author:	Das, Susmit
Format:	Preprint
Published:	2026
Subjects:	Machine Learning Computation and Language I.2.7; I.2.6
Online Access:	https://arxiv.org/abs/2601.05300
Tags:	Add Tag No Tags, Be the first to tag this record!

Table of Contents:

Reasoning-oriented language models typically expose explicit reasoning as a long, front-loaded chain of "thinking" tokens before the main output, either always enabled or externally toggled at inference time. Although this can help on arithmetic, coding, and other multi-step tasks, it is costly, weakens claim-level auditability, and does not allow the model to re-trigger explicit reasoning once presentation has begun. In dialogue, these limitations are compounded by weak sensitivity to temporal structure: unless time is explicitly stated in text, standard models treat replies separated by seconds and replies separated by weeks as equivalent. We introduce TIME (Temporally Intelligent Meta-reasoning Engine), a behavioral alignment framework that learns explicit reasoning as a context-triggered control policy rather than a fixed response mode. TIME augments dialogue with optional ISO 8601 <time> tags, tick events that represent silent time passage, and short <think> blocks that may appear anywhere in a response. Using a four-phase curriculum, including a small maximally diverse full-batch alignment stage, we train Qwen3 dense models to invoke brief, in-place reasoning bursts only when contextual cues warrant them, while keeping user-facing output compact. We also introduce TIMEBench, a diagnostic benchmark for evaluating reasoning from temporal cues in dialogue. Across 4B-32B scales, TIME improves TIMEBench scores over the corresponding base Qwen3 models in both thinking and no-thinking modes while reducing explicit reasoning tokens by roughly an order of magnitude. Beyond score improvements, TIME induces a distinct behavioral shift: explicit reasoning becomes more compact and more responsive to contextual cues. Code, training data, and benchmark artifacts are publicly available.

Similar Items