| Main Authors: | Cloos, Nathan, Jens, Meagan, Naim, Michelangelo, Kuo, Yen-Ling, Cases, Ignacio, Barbu, Andrei, Cueva, Christopher J. |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2407.13729 |
Similar Items
Revealing Vision-Language Integration in the Brain with Multimodal Networks
by: Subramaniam, Vighnesh, et al.
Published: (2024)
A Framework for Standardizing Similarity Measures in a Rapidly Evolving Field
by: Cloos, Nathan, et al.
Published: (2024)
Pact: A Choreographic Language for Agentic Ecosystems
by: Gopinathan, Kiran, et al.
Published: (2026)
AutoRubric: Rubric-Based Generative Rewards for Faithful Multimodal Reasoning
by: Jia, Mengzhao, et al.
Published: (2025)
Using Multimodal Deep Neural Networks to Disentangle Language from Visual Aesthetics
by: Conwell, Colin, et al.
Published: (2024)
Emerging categories in scientific explanations
by: Magnifico, Giacomo, et al.
Published: (2025)
Can summarization approximate simplification? A gold standard comparison
by: Magnifico, Giacomo, et al.
Published: (2025)
Base Models Beat Aligned Models at Randomness and Creativity
by: West, Peter, et al.
Published: (2025)
Network of Theseus (like the ship)
by: Subramaniam, Vighnesh, et al.
Published: (2025)
Do LLMs Understand Romanian Driving Laws? A Study on Multimodal and Fine-Tuned Question Answering
by: Barbu, Eduard, et al.
Published: (2025)
Guardrails Beat Guidance: A Large-Scale Study of Rules, Skills, and Persistent Configuration for Coding Agents
by: Zhang, Xing, et al.
Published: (2026)
MedBench-IT: A Comprehensive Benchmark for Evaluating Large Language Models on Italian Medical Entrance Examinations
by: Lazzaroni, Ruggero Marino, et al.
Published: (2025)
SecureLLM: Using Compositionality to Build Provably Secure Language Models for Private, Sensitive, and Secret Data
by: Alabdulkareem, Abdulrahman, et al.
Published: (2024)
Improving Estonian Text Simplification through Pretrained Language Models and Custom Datasets
by: Barbu, Eduard, et al.
Published: (2025)
Breaking the Reviewer: Assessing the Vulnerability of Large Language Models in Automated Peer Review Under Textual Adversarial Attacks
by: Lin, Tzu-Ling, et al.
Published: (2025)
Knowledge Access Beats Model Size: Memory Augmented Routing for Persistent AI Agents
by: Liu, Xunzhuo, et al.
Published: (2026)
AI Alignment Breaks at the Edge
by: Bao, Han, et al.
Published: (2026)
Training the Untrainable: Introducing Inductive Bias via Representational Alignment
by: Subramaniam, Vighnesh, et al.
Published: (2024)
Frictional Agent Alignment Framework: Slow Down and Don't Break Things
by: Nath, Abhijnan, et al.
Published: (2025)
DeonticBench: A Benchmark for Reasoning over Rules
by: Dou, Guangyao, et al.
Published: (2026)
Beating the Style Detector: Three Hours of Agentic Research on the AI-Text Arms Race
by: Maier, Andreas, et al.
Published: (2026)
RuleArena: A Benchmark for Rule-Guided Reasoning with LLMs in Real-World Scenarios
by: Zhou, Ruiwen, et al.
Published: (2024)
How Does Beam Search improve Span-Level Confidence Estimation in Generative Sequence Labeling?
by: Hashimoto, Kazuma, et al.
Published: (2022)
VMMU: A Vietnamese Multitask Multimodal Understanding and Reasoning Benchmark
by: Dang, Vy Tuong, et al.
Published: (2025)
Data Repetition Beats Data Scaling in Long-CoT Supervised Fine-Tuning
by: Kopiczko, Dawid J., et al.
Published: (2026)
Conversational Agents and the Understanding of Human Language: Reflections on AI, LLMs, and Cognitive Science
by: Popescu-Belis, Andrei
Published: (2026)
Business as Rulesual: A Benchmark and Framework for Business Rule Flow Modeling with LLMs
by: Yang, Chen, et al.
Published: (2025)
From Feature-Based Models to Generative AI: Validity Evidence for Constructed Response Scoring
by: Casabianca, Jodi M., et al.
Published: (2026)
Levels of AI Agents: from Rules to Large Language Models
by: Huang, Yu
Published: (2024)
Taiwan Safety Benchmark and Breeze Guard: Toward Trustworthy AI for Taiwanese Mandarin
by: Hsu, Po-Chun, et al.
Published: (2026)
MuMA-ToM: Multi-modal Multi-Agent Theory of Mind
by: Shi, Haojun, et al.
Published: (2024)
MathEDU: Feedback Generation on Problem-Solving Processes for Mathematical Learning Support
by: Hsu, Wei-Ling, et al.
Published: (2025)
MOMENTS: A Comprehensive Multimodal Benchmark for Theory of Mind
by: Villa-Cueva, Emilio, et al.
Published: (2025)
Breaking the Ceiling of the LLM Community by Treating Token Generation as a Classification for Ensembling
by: Yu, Yao-Ching, et al.
Published: (2024)
Re-examining learning linear functions in context
by: Naim, Omar, et al.
Published: (2024)
PluRule: A Benchmark for Moderating Pluralistic Communities on Social Media
by: Kachwala, Zoher, et al.
Published: (2026)
ConceptKT: A Benchmark for Concept-Level Deficiency Prediction in Knowledge Tracing
by: Kang, Yu-Chen, et al.
Published: (2026)
Policy-as-Prompt: Turning AI Governance Rules into Guardrails for AI Agents
by: Kholkar, Gauri, et al.
Published: (2025)
Summarize the Past to Predict the Future: Natural Language Descriptions of Context Boost Multimodal Object Interaction Anticipation
by: Pasca, Razvan-George, et al.
Published: (2023)
SSA: Improving Performance With a Better Scoring Function
by: Naim, Omar, et al.
Published: (2025)