| Main Authors: | Fang, Qixiang, Oberski, Daniel L., Nguyen, Dong |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2404.01799 |
Similar Items
General-Purpose User Modeling with Behavioral Logs: A Snapchat Case Study
by: Fang, Qixiang, et al.
Published: (2023)
Improving Stance Detection by Leveraging Measurement Knowledge from Social Sciences: A Case Study of Dutch Political Tweets and Traditional Gender Role Division
by: Fang, Qixiang, et al.
Published: (2022)
A Methodological Guide on Using Large Language Models for Reproducible Text Annotation in the Social Sciences and Humanities with Python and R
by: Fang, Qixiang, et al.
Published: (2026)
Explainability in Practice: A Survey of Explainable NLP Across Various Domains
by: Mohammadi, Hadi, et al.
Published: (2025)
FinBen: A Holistic Financial Benchmark for Large Language Models
by: Xie, Qianqian, et al.
Published: (2024)
Human-in-the-Loop LLM Grading for Handwritten Mathematics Assessments
by: Vanhoyweghen, Arne, et al.
Published: (2026)
AceTone: Bridging Words and Colors for Conditional Image Grading
by: Ma, Tianren, et al.
Published: (2026)
Tracing Mathematical Proficiency Through Problem-Solving Processes
by: Park, Jungyang, et al.
Published: (2025)
Explainability-Based Token Replacement on LLM-Generated Text
by: Mohammadi, Hadi, et al.
Published: (2025)
MathBench: Evaluating the Theory and Application Proficiency of LLMs with a Hierarchical Mathematics Benchmark
by: Liu, Hongwei, et al.
Published: (2024)
Evaluating Telugu Proficiency in Large Language Models: A Comparative Analysis of ChatGPT and Gemini
by: Kishore, Katikela Sreeharsha, et al.
Published: (2024)
BenLLMEval: A Comprehensive Evaluation into the Potentials and Pitfalls of Large Language Models on Bengali NLP
by: Kabir, Mohsinul, et al.
Published: (2023)
Eyes on the Game: Deciphering Implicit Human Signals to Infer Human Proficiency, Trust, and Intent
by: Hulle, Nikhil, et al.
Published: (2024)
PATCH: Mitigating PII Leakage in Language Models with Privacy-Aware Targeted Circuit PatcHing
by: Hughes, Anthony, et al.
Published: (2025)
AI Meets Mathematics Education: A Case Study on Supporting an Instructor in a Large Mathematics Class with Context-Aware AI
by: Barghorn, Jérémy, et al.
Published: (2026)
De-mark: Watermark Removal in Large Language Models
by: Chen, Ruibo, et al.
Published: (2024)
BenCao: An Instruction-Tuned Large Language Model for Traditional Chinese Medicine
by: Xie, Jiacheng, et al.
Published: (2025)
Unleashing Large Language Models' Proficiency in Zero-shot Essay Scoring
by: Lee, Sanwoo, et al.
Published: (2024)
PATCH: a deep learning method to assess heterogeneity of artistic practice in historical paintings
by: Van Horn, Andrew, et al.
Published: (2025)
Automated Grading of Handwritten Mathematics Using Vision-Capable LLMs
by: Levine, Jacob, et al.
Published: (2026)
MultiFinBen: Benchmarking Large Language Models for Multilingual and Multimodal Financial Application
by: Peng, Xueqing, et al.
Published: (2025)
We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning?
by: Qiao, Runqi, et al.
Published: (2024)
Thinking with Images via Self-Calling Agent
by: Yang, Wenxi, et al.
Published: (2025)
BenHalluEval: A Multi-Task Hallucination Evaluation Framework for Large Language Models on Bengali
by: Adib, Shefayat E Shams, et al.
Published: (2026)
BenTo: Benchmark Task Reduction with In-Context Transferability
by: Zhao, Hongyu, et al.
Published: (2024)
Grading Scale Impact on LLM-as-a-Judge: Human-LLM Alignment Is Highest on 0-5 Grading Scale
by: Li, Weiyue, et al.
Published: (2026)
Classifying German Language Proficiency Levels Using Large Language Models
by: Ahlers, Elias-Leander, et al.
Published: (2025)
Can Large Language Models Automatically Score Proficiency of Written Essays?
by: Mansour, Watheq, et al.
Published: (2024)
Social Perceptions of English Spelling Variation on Twitter: A Comparative Analysis of Human and LLM Responses
by: Nguyen, Dong, et al.
Published: (2025)
A Case for Specialisation in Non-Human Entities
by: El-Mhamdi, El-Mahdi, et al.
Published: (2025)
Can LLMs Estimate Student Struggles? Human-AI Difficulty Alignment with Proficiency Simulation for Item Difficulty Prediction
by: Li, Ming, et al.
Published: (2025)
Does Continued Pretraining on a Learner Corpus Improve Automated Essay Scoring on English Proficiency Tests? Evidence from EFCAMDAT
by: Nguyen, Duy Anh
Published: (2026)
Testing Low-Resource Language Support in LLMs Using Language Proficiency Exams: the Case of Luxembourgish
by: Lothritz, Cedric, et al.
Published: (2025)
SciEx: Benchmarking Large Language Models on Scientific Exams with Human Expert Grading and Automatic Grading
by: Dinh, Tu Anh, et al.
Published: (2024)
EPPCMinerBen: A Novel Benchmark for Evaluating Large Language Models on Electronic Patient-Provider Communication via the Patient Portal
by: Fodeh, Samah, et al.
Published: (2026)
TimeSense: Making Large Language Models Proficient in Time-Series Analysis
by: Zhang, Zhirui, et al.
Published: (2025)
SciAssess: Benchmarking LLM Proficiency in Scientific Literature Analysis
by: Cai, Hengxing, et al.
Published: (2024)
Qayyem: A Real-time Platform for Scoring Proficiency of Arabic Essays
by: Elbahnasawi, Hoor, et al.
Published: (2026)
ProLex: A Benchmark for Language Proficiency-oriented Lexical Substitution
by: Zhang, Xuanming, et al.
Published: (2024)
SimGrade: Using Code Similarity Measures for More Accurate Human Grading
by: Johnson-Yu, Sonja, et al.
Published: (2024)