:: Library Catalog

Bibliographic Details
Main Author:	Kamphuis, Michiel
Format:	Preprint
Published:	2024
Subjects:	Machine Learning; Artificial Intelligence; Computation and Language
Online Access:	https://arxiv.org/abs/2409.02114

Similar Items

EasyMath: A 0-shot Math Benchmark for SLMs
by: Karki, Drishya, et al.
Published: (2025)

Tina: Tiny Reasoning Models via LoRA
by: Wang, Shangshang, et al.
Published: (2025)

QED-Nano: Teaching a Tiny Model to Prove Hard Theorems
by: LM-Provers, et al.
Published: (2026)

Zero-shot data citation function classification using transformer-based large language models (LLMs)
by: Byers, Neil, et al.
Published: (2025)

Ayn: A Tiny yet Competitive Indian Legal Language Model Pretrained from Scratch
by: Niyogi, Mitodru, et al.
Published: (2024)

PanGu-$π$ Pro: Rethinking Optimization and Architecture for Tiny Language Models
by: Tang, Yehui, et al.
Published: (2024)

TinyV: Reducing False Negatives in Verification Improves RL for LLM Reasoning
by: Xu, Zhangchen, et al.
Published: (2025)

An exploration of features to improve the generalisability of fake news detection models
by: Hoy, Nathaniel, et al.
Published: (2025)

Base Models Look Human To AI Detectors
by: Xu, Yixuan Even, et al.
Published: (2026)

Towards Detecting Contextual Real-Time Toxicity for In-Game Chat
by: Yang, Zachary, et al.
Published: (2023)

M-QUEST -- Meme Question-Understanding Evaluation on Semantics and Toxicity
by: De Giorgis, Stefano, et al.
Published: (2026)

Toxicity Detection Should Measure Contextual Harm, Not Text-Intrinsic Badness
by: Berezin, Sergei, et al.
Published: (2025)

Physical models realizing the transformer architecture of large language models
by: Chen, Zeqian
Published: (2025)

Exploring the Plausibility of Hate and Counter Speech Detectors with Explainable AI
by: Böck, Adrian Jaques, et al.
Published: (2024)

Where Does Toxicity Live? Mechanistic Localization and Targeted Suppression in Language Models
by: Beniwal, Himanshu, et al.
Published: (2026)

Characterizing Large Language Model Geometry Helps Solve Toxicity Detection and Generation
by: Balestriero, Randall, et al.
Published: (2023)

An explainable transformer circuit for compositional generalization
by: Tang, Cheng, et al.
Published: (2025)

Accelerating Training Speed of Tiny Recursive Models with Curriculum Guided Adaptive Recursion
by: Qasim, Kaleem Ullah, et al.
Published: (2025)

Navigating the Shadows: Unveiling Effective Disturbances for Modern AI Content Detectors
by: Zhou, Ying, et al.
Published: (2024)

IPAD: Inverse Prompt for AI Detection - A Robust and Interpretable LLM-Generated Text Detector
by: Chen, Zheng, et al.
Published: (2025)

Language models show human-like content effects on reasoning tasks
by: Dasgupta, Ishita, et al.
Published: (2022)

ToxiGAN: Toxic Data Augmentation via LLM-Guided Directional Adversarial Generation
by: Li, Peiran, et al.
Published: (2026)

Zero-knowledge LLM hallucination detection and mitigation through fine-grained cross-model consistency
by: Goel, Aman, et al.
Published: (2025)

When can transformers reason with abstract symbols?
by: Boix-Adsera, Enric, et al.
Published: (2023)

Opir: Efficient Multi-Task Safety Classification for Toxicity, Jailbreaks, Hate Speech, and Harmful Content
by: Stepanov, Ihor, et al.
Published: (2026)

Your Finetuned Large Language Model is Already a Powerful Out-of-distribution Detector
by: Zhang, Andi, et al.
Published: (2024)

Amplifying, Not Learning: Fine-Tuned AI Text Detectors Amplify a Pretrained Direction
by: Smirnov, Alexander
Published: (2026)

Language Models Optimized to Fool Detectors Still Have a Distinct Style (And How to Change It)
by: Soto, Rafael Rivera, et al.
Published: (2025)

SUS backprop: linear backpropagation algorithm for long inputs in transformers
by: Pankov, Sergey, et al.
Published: (2025)

Towards Building a Robust Toxicity Predictor
by: Bespalov, Dmitriy, et al.
Published: (2024)

Toxicity Detection towards Adaptability to Changing Perturbations
by: Kang, Hankun, et al.
Published: (2024)

ProdRev: A DNN framework for empowering customers using generative pre-trained transformers
by: Gupta, Aakash, et al.
Published: (2025)

TaeBench: Improving Quality of Toxic Adversarial Examples
by: Zhu, Xuan, et al.
Published: (2024)

Preference Tuning For Toxicity Mitigation Generalizes Across Languages
by: Li, Xiaochen, et al.
Published: (2024)

PSK@EEUCA 2026: Fine-Tuning Large Language Models with Synthetic Data Augmentation for Multi-Class Toxicity Detection in Gaming Chat
by: Pulipaka, Srikar Kashyap
Published: (2026)

Classification is a RAG problem: A case study on hate speech detection
by: Willats, Richard, et al.
Published: (2025)

Towards detecting unanticipated bias in Large Language Models
by: Kruspe, Anna
Published: (2024)

Illuminate: A novel approach for depression detection with explainable analysis and proactive therapy using prompt engineering
by: Agrawal, Aryan
Published: (2024)

Prefill-Guided Thinking for zero-shot detection of AI-generated images
by: Kachwala, Zoher, et al.
Published: (2025)

Explained anomaly detection in text reviews: Can subjective scenarios be correctly evaluated?
by: Novoa-Paradela, David, et al.
Published: (2023)