Library Catalog

Bibliographic Details
Main Authors:	Cloos, Nathan; Jens, Meagan; Naim, Michelangelo; Kuo, Yen-Ling; Cases, Ignacio; Barbu, Andrei; Cueva, Christopher J.
Format:	Preprint
Published:	2024
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2407.13729

Similar Items

Revealing Vision-Language Integration in the Brain with Multimodal Networks
by: Subramaniam, Vighnesh, et al.
Published: (2024)

A Framework for Standardizing Similarity Measures in a Rapidly Evolving Field
by: Cloos, Nathan, et al.
Published: (2024)

Pact: A Choreographic Language for Agentic Ecosystems
by: Gopinathan, Kiran, et al.
Published: (2026)

AutoRubric: Rubric-Based Generative Rewards for Faithful Multimodal Reasoning
by: Jia, Mengzhao, et al.
Published: (2025)

Using Multimodal Deep Neural Networks to Disentangle Language from Visual Aesthetics
by: Conwell, Colin, et al.
Published: (2024)

Emerging categories in scientific explanations
by: Magnifico, Giacomo, et al.
Published: (2025)

Can summarization approximate simplification? A gold standard comparison
by: Magnifico, Giacomo, et al.
Published: (2025)

Base Models Beat Aligned Models at Randomness and Creativity
by: West, Peter, et al.
Published: (2025)

Network of Theseus (like the ship)
by: Subramaniam, Vighnesh, et al.
Published: (2025)

Do LLMs Understand Romanian Driving Laws? A Study on Multimodal and Fine-Tuned Question Answering
by: Barbu, Eduard, et al.
Published: (2025)

Guardrails Beat Guidance: A Large-Scale Study of Rules, Skills, and Persistent Configuration for Coding Agents
by: Zhang, Xing, et al.
Published: (2026)

MedBench-IT: A Comprehensive Benchmark for Evaluating Large Language Models on Italian Medical Entrance Examinations
by: Lazzaroni, Ruggero Marino, et al.
Published: (2025)

SecureLLM: Using Compositionality to Build Provably Secure Language Models for Private, Sensitive, and Secret Data
by: Alabdulkareem, Abdulrahman, et al.
Published: (2024)

Improving Estonian Text Simplification through Pretrained Language Models and Custom Datasets
by: Barbu, Eduard, et al.
Published: (2025)

Breaking the Reviewer: Assessing the Vulnerability of Large Language Models in Automated Peer Review Under Textual Adversarial Attacks
by: Lin, Tzu-Ling, et al.
Published: (2025)

Knowledge Access Beats Model Size: Memory Augmented Routing for Persistent AI Agents
by: Liu, Xunzhuo, et al.
Published: (2026)

AI Alignment Breaks at the Edge
by: Bao, Han, et al.
Published: (2026)

Training the Untrainable: Introducing Inductive Bias via Representational Alignment
by: Subramaniam, Vighnesh, et al.
Published: (2024)

Frictional Agent Alignment Framework: Slow Down and Don't Break Things
by: Nath, Abhijnan, et al.
Published: (2025)

DeonticBench: A Benchmark for Reasoning over Rules
by: Dou, Guangyao, et al.
Published: (2026)

Beating the Style Detector: Three Hours of Agentic Research on the AI-Text Arms Race
by: Maier, Andreas, et al.
Published: (2026)

RuleArena: A Benchmark for Rule-Guided Reasoning with LLMs in Real-World Scenarios
by: Zhou, Ruiwen, et al.
Published: (2024)

How Does Beam Search improve Span-Level Confidence Estimation in Generative Sequence Labeling?
by: Hashimoto, Kazuma, et al.
Published: (2022)

VMMU: A Vietnamese Multitask Multimodal Understanding and Reasoning Benchmark
by: Dang, Vy Tuong, et al.
Published: (2025)

Data Repetition Beats Data Scaling in Long-CoT Supervised Fine-Tuning
by: Kopiczko, Dawid J., et al.
Published: (2026)

Conversational Agents and the Understanding of Human Language: Reflections on AI, LLMs, and Cognitive Science
by: Popescu-Belis, Andrei
Published: (2026)

Business as Rulesual: A Benchmark and Framework for Business Rule Flow Modeling with LLMs
by: Yang, Chen, et al.
Published: (2025)

From Feature-Based Models to Generative AI: Validity Evidence for Constructed Response Scoring
by: Casabianca, Jodi M., et al.
Published: (2026)

Levels of AI Agents: from Rules to Large Language Models
by: Huang, Yu
Published: (2024)

Taiwan Safety Benchmark and Breeze Guard: Toward Trustworthy AI for Taiwanese Mandarin
by: Hsu, Po-Chun, et al.
Published: (2026)

MuMA-ToM: Multi-modal Multi-Agent Theory of Mind
by: Shi, Haojun, et al.
Published: (2024)

MathEDU: Feedback Generation on Problem-Solving Processes for Mathematical Learning Support
by: Hsu, Wei-Ling, et al.
Published: (2025)

MOMENTS: A Comprehensive Multimodal Benchmark for Theory of Mind
by: Villa-Cueva, Emilio, et al.
Published: (2025)

Breaking the Ceiling of the LLM Community by Treating Token Generation as a Classification for Ensembling
by: Yu, Yao-Ching, et al.
Published: (2024)

Re-examining learning linear functions in context
by: Naim, Omar, et al.
Published: (2024)

PluRule: A Benchmark for Moderating Pluralistic Communities on Social Media
by: Kachwala, Zoher, et al.
Published: (2026)

ConceptKT: A Benchmark for Concept-Level Deficiency Prediction in Knowledge Tracing
by: Kang, Yu-Chen, et al.
Published: (2026)

Policy-as-Prompt: Turning AI Governance Rules into Guardrails for AI Agents
by: Kholkar, Gauri, et al.
Published: (2025)

Summarize the Past to Predict the Future: Natural Language Descriptions of Context Boost Multimodal Object Interaction Anticipation
by: Pasca, Razvan-George, et al.
Published: (2023)

SSA: Improving Performance With a Better Scoring Function
by: Naim, Omar, et al.
Published: (2025)