| Main Author: | Kamphuis, Michiel |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2409.02114 |
Similar Items
EasyMath: A 0-shot Math Benchmark for SLMs
by: Karki, Drishya, et al.
Published: (2025)
Tina: Tiny Reasoning Models via LoRA
by: Wang, Shangshang, et al.
Published: (2025)
QED-Nano: Teaching a Tiny Model to Prove Hard Theorems
by: LM-Provers, et al.
Published: (2026)
Zero-shot data citation function classification using transformer-based large language models (LLMs)
by: Byers, Neil, et al.
Published: (2025)
Ayn: A Tiny yet Competitive Indian Legal Language Model Pretrained from Scratch
by: Niyogi, Mitodru, et al.
Published: (2024)
PanGu-$π$ Pro:Rethinking Optimization and Architecture for Tiny Language Models
by: Tang, Yehui, et al.
Published: (2024)
TinyV: Reducing False Negatives in Verification Improves RL for LLM Reasoning
by: Xu, Zhangchen, et al.
Published: (2025)
An exploration of features to improve the generalisability of fake news detection models
by: Hoy, Nathaniel, et al.
Published: (2025)
Base Models Look Human To AI Detectors
by: Xu, Yixuan Even, et al.
Published: (2026)
Towards Detecting Contextual Real-Time Toxicity for In-Game Chat
by: Yang, Zachary, et al.
Published: (2023)
M-QUEST -- Meme Question-Understanding Evaluation on Semantics and Toxicity
by: De Giorgis, Stefano, et al.
Published: (2026)
Toxicity Detection Should Measure Contextual Harm, Not Text-Intrinsic Badness
by: Berezin, Sergei, et al.
Published: (2025)
Physical models realizing the transformer architecture of large language models
by: Chen, Zeqian
Published: (2025)
Exploring the Plausibility of Hate and Counter Speech Detectors with Explainable AI
by: Böck, Adrian Jaques, et al.
Published: (2024)
Where Does Toxicity Live? Mechanistic Localization and Targeted Suppression in Language Models
by: Beniwal, Himanshu, et al.
Published: (2026)
Characterizing Large Language Model Geometry Helps Solve Toxicity Detection and Generation
by: Balestriero, Randall, et al.
Published: (2023)
An explainable transformer circuit for compositional generalization
by: Tang, Cheng, et al.
Published: (2025)
Accelerating Training Speed of Tiny Recursive Models with Curriculum Guided Adaptive Recursion
by: Qasim, Kaleem Ullah, et al.
Published: (2025)
Navigating the Shadows: Unveiling Effective Disturbances for Modern AI Content Detectors
by: Zhou, Ying, et al.
Published: (2024)
IPAD: Inverse Prompt for AI Detection - A Robust and Interpretable LLM-Generated Text Detector
by: Chen, Zheng, et al.
Published: (2025)
Language models show human-like content effects on reasoning tasks
by: Dasgupta, Ishita, et al.
Published: (2022)
ToxiGAN: Toxic Data Augmentation via LLM-Guided Directional Adversarial Generation
by: Li, Peiran, et al.
Published: (2026)
Zero-knowledge LLM hallucination detection and mitigation through fine-grained cross-model consistency
by: Goel, Aman, et al.
Published: (2025)
When can transformers reason with abstract symbols?
by: Boix-Adsera, Enric, et al.
Published: (2023)
Opir: Efficient Multi-Task Safety Classification for Toxicity, Jailbreaks, Hate Speech, and Harmful Content
by: Stepanov, Ihor, et al.
Published: (2026)
Your Finetuned Large Language Model is Already a Powerful Out-of-distribution Detector
by: Zhang, Andi, et al.
Published: (2024)
Amplifying, Not Learning: Fine-Tuned AI Text Detectors Amplify a Pretrained Direction
by: Smirnov, Alexander
Published: (2026)
Language Models Optimized to Fool Detectors Still Have a Distinct Style (And How to Change It)
by: Soto, Rafael Rivera, et al.
Published: (2025)
SUS backprop: linear backpropagation algorithm for long inputs in transformers
by: Pankov, Sergey, et al.
Published: (2025)
Towards Building a Robust Toxicity Predictor
by: Bespalov, Dmitriy, et al.
Published: (2024)
Toxicity Detection towards Adaptability to Changing Perturbations
by: Kang, Hankun, et al.
Published: (2024)
ProdRev: A DNN framework for empowering customers using generative pre-trained transformers
by: Gupta, Aakash, et al.
Published: (2025)
TaeBench: Improving Quality of Toxic Adversarial Examples
by: Zhu, Xuan, et al.
Published: (2024)
Preference Tuning For Toxicity Mitigation Generalizes Across Languages
by: Li, Xiaochen, et al.
Published: (2024)
PSK@EEUCA 2026: Fine-Tuning Large Language Models with Synthetic Data Augmentation for Multi-Class Toxicity Detection in Gaming Chat
by: Pulipaka, Srikar Kashyap
Published: (2026)
Classification is a RAG problem: A case study on hate speech detection
by: Willats, Richard, et al.
Published: (2025)
Towards detecting unanticipated bias in Large Language Models
by: Kruspe, Anna
Published: (2024)
Illuminate: A novel approach for depression detection with explainable analysis and proactive therapy using prompt engineering
by: Agrawal, Aryan
Published: (2024)
Prefill-Guided Thinking for zero-shot detection of AI-generated images
by: Kachwala, Zoher, et al.
Published: (2025)
Explained anomaly detection in text reviews: Can subjective scenarios be correctly evaluated?
by: Novoa-Paradela, David, et al.
Published: (2023)