:: Library Catalog

Bibliographic Details
Main Authors:	Fang, Qixiang; Oberski, Daniel L.; Nguyen, Dong
Format:	Preprint
Published:	2024
Subjects:	Computation and Language; Computers and Society
Online Access:	https://arxiv.org/abs/2404.01799

Similar Items

General-Purpose User Modeling with Behavioral Logs: A Snapchat Case Study
by: Fang, Qixiang, et al.
Published: (2023)

Improving Stance Detection by Leveraging Measurement Knowledge from Social Sciences: A Case Study of Dutch Political Tweets and Traditional Gender Role Division
by: Fang, Qixiang, et al.
Published: (2022)

A Methodological Guide on Using Large Language Models for Reproducible Text Annotation in the Social Sciences and Humanities with Python and R
by: Fang, Qixiang, et al.
Published: (2026)

Explainability in Practice: A Survey of Explainable NLP Across Various Domains
by: Mohammadi, Hadi, et al.
Published: (2025)

FinBen: A Holistic Financial Benchmark for Large Language Models
by: Xie, Qianqian, et al.
Published: (2024)

Human-in-the-Loop LLM Grading for Handwritten Mathematics Assessments
by: Vanhoyweghen, Arne, et al.
Published: (2026)

AceTone: Bridging Words and Colors for Conditional Image Grading
by: Ma, Tianren, et al.
Published: (2026)

Tracing Mathematical Proficiency Through Problem-Solving Processes
by: Park, Jungyang, et al.
Published: (2025)

Explainability-Based Token Replacement on LLM-Generated Text
by: Mohammadi, Hadi, et al.
Published: (2025)

MathBench: Evaluating the Theory and Application Proficiency of LLMs with a Hierarchical Mathematics Benchmark
by: Liu, Hongwei, et al.
Published: (2024)

Evaluating Telugu Proficiency in Large Language Models: A Comparative Analysis of ChatGPT and Gemini
by: Kishore, Katikela Sreeharsha, et al.
Published: (2024)

BenLLMEval: A Comprehensive Evaluation into the Potentials and Pitfalls of Large Language Models on Bengali NLP
by: Kabir, Mohsinul, et al.
Published: (2023)

Eyes on the Game: Deciphering Implicit Human Signals to Infer Human Proficiency, Trust, and Intent
by: Hulle, Nikhil, et al.
Published: (2024)

PATCH: Mitigating PII Leakage in Language Models with Privacy-Aware Targeted Circuit PatcHing
by: Hughes, Anthony, et al.
Published: (2025)

AI Meets Mathematics Education: A Case Study on Supporting an Instructor in a Large Mathematics Class with Context-Aware AI
by: Barghorn, Jérémy, et al.
Published: (2026)

De-mark: Watermark Removal in Large Language Models
by: Chen, Ruibo, et al.
Published: (2024)

BenCao: An Instruction-Tuned Large Language Model for Traditional Chinese Medicine
by: Xie, Jiacheng, et al.
Published: (2025)

Unleashing Large Language Models' Proficiency in Zero-shot Essay Scoring
by: Lee, Sanwoo, et al.
Published: (2024)

PATCH: a deep learning method to assess heterogeneity of artistic practice in historical paintings
by: Van Horn, Andrew, et al.
Published: (2025)

Automated Grading of Handwritten Mathematics Using Vision-Capable LLMs
by: Levine, Jacob, et al.
Published: (2026)

MultiFinBen: Benchmarking Large Language Models for Multilingual and Multimodal Financial Application
by: Peng, Xueqing, et al.
Published: (2025)

We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning?
by: Qiao, Runqi, et al.
Published: (2024)

Thinking with Images via Self-Calling Agent
by: Yang, Wenxi, et al.
Published: (2025)

BenHalluEval: A Multi-Task Hallucination Evaluation Framework for Large Language Models on Bengali
by: Adib, Shefayat E Shams, et al.
Published: (2026)

BenTo: Benchmark Task Reduction with In-Context Transferability
by: Zhao, Hongyu, et al.
Published: (2024)

Grading Scale Impact on LLM-as-a-Judge: Human-LLM Alignment Is Highest on 0-5 Grading Scale
by: Li, Weiyue, et al.
Published: (2026)

Classifying German Language Proficiency Levels Using Large Language Models
by: Ahlers, Elias-Leander, et al.
Published: (2025)

Can Large Language Models Automatically Score Proficiency of Written Essays?
by: Mansour, Watheq, et al.
Published: (2024)

Social Perceptions of English Spelling Variation on Twitter: A Comparative Analysis of Human and LLM Responses
by: Nguyen, Dong, et al.
Published: (2025)

A Case for Specialisation in Non-Human Entities
by: El-Mhamdi, El-Mahdi, et al.
Published: (2025)

Can LLMs Estimate Student Struggles? Human-AI Difficulty Alignment with Proficiency Simulation for Item Difficulty Prediction
by: Li, Ming, et al.
Published: (2025)

Does Continued Pretraining on a Learner Corpus Improve Automated Essay Scoring on English Proficiency Tests? Evidence from EFCAMDAT
by: Nguyen, Duy Anh
Published: (2026)

Testing Low-Resource Language Support in LLMs Using Language Proficiency Exams: the Case of Luxembourgish
by: Lothritz, Cedric, et al.
Published: (2025)

SciEx: Benchmarking Large Language Models on Scientific Exams with Human Expert Grading and Automatic Grading
by: Dinh, Tu Anh, et al.
Published: (2024)

EPPCMinerBen: A Novel Benchmark for Evaluating Large Language Models on Electronic Patient-Provider Communication via the Patient Portal
by: Fodeh, Samah, et al.
Published: (2026)

TimeSense: Making Large Language Models Proficient in Time-Series Analysis
by: Zhang, Zhirui, et al.
Published: (2025)

SciAssess: Benchmarking LLM Proficiency in Scientific Literature Analysis
by: Cai, Hengxing, et al.
Published: (2024)

Qayyem: A Real-time Platform for Scoring Proficiency of Arabic Essays
by: Elbahnasawi, Hoor, et al.
Published: (2026)

ProLex: A Benchmark for Language Proficiency-oriented Lexical Substitution
by: Zhang, Xuanming, et al.
Published: (2024)

SimGrade: Using Code Similarity Measures for More Accurate Human Grading
by: Johnson-Yu, Sonja, et al.
Published: (2024)