| Main Authors: | Singh, Amrita, Karaca, H. Suhan, Joshi, Aditya, Paik, Hye-young, Jiang, Jiaojiao |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2508.07849 |
Similar Items
A Survey of Classification Tasks and Approaches for Legal Contracts
by: Singh, Amrita, et al.
Published: (2025)
RACCOON: A Retrieval-Augmented Generation Approach for Location Coordinate Capture from News Articles
by: Lin, Jonathan, et al.
Published: (2025)
Metaphors are a Source of Cross-Domain Misalignment of Large Reasoning Models
by: Hu, Zhibo, et al.
Published: (2026)
Generalized Multilingual Text-to-Speech Generation with Language-Aware Style Adaptation
by: Lou, Haowei, et al.
Published: (2025)
MetaLogic: Robustness Evaluation of Text-to-Image Models via Logically Equivalent Prompts
by: Shen, Yifan, et al.
Published: (2025)
Generalists vs. Specialists: Evaluating Large Language Models for Urdu
by: Arif, Samee, et al.
Published: (2024)
uOttawa at LegalLens-2024: Transformer-based Classification Experiments
by: Meghdadi, Nima, et al.
Published: (2024)
Evaluating Dialect Robustness of Language Models via Conversation Understanding
by: Srirag, Dipankar, et al.
Published: (2024)
What am I missing here?: Evaluating Large Language Models for Masked Sentence Prediction
by: Wyatt, Charlie, et al.
Published: (2025)
LangLingual: A Personalised, Exercise-oriented English Language Learning Tool Leveraging Large Language Models
by: Gupta, Sammriddh, et al.
Published: (2025)
A Large-Scale Dataset and Citation Intent Classification in Turkish with LLMs
by: Karaca, Kemal Sami, et al.
Published: (2025)
mhGPT: A Lightweight Generative Pre-Trained Transformer for Mental Health Text Analysis
by: Kim, Dae-young, et al.
Published: (2024)
ACORD: An Expert-Annotated Retrieval Dataset for Legal Contract Drafting
by: Wang, Steven H., et al.
Published: (2025)
Far Out: Evaluating Language Models on Slang in Australian and Indian English
by: Dilsiz, Deniz Kaya, et al.
Published: (2026)
GRAM: Generative Recommendation via Semantic-aware Multi-granular Late Fusion
by: Lee, Sunkyung, et al.
Published: (2025)
Nek Minit: Harnessing Pragmatic Metacognitive Prompting for Explainable Sarcasm Detection of Australian and Indian English
by: Singh, Ishmanbir, et al.
Published: (2025)
Experiences from Creating a Benchmark for Sentiment Classification for Varieties of English
by: Srirag, Dipankar, et al.
Published: (2024)
Coding Agents with Multimodal Browsing are Generalist Problem Solvers
by: Soni, Aditya Bharat, et al.
Published: (2025)
"Is Hate Lost in Translation?": Evaluation of Multilingual LGBTQIA+ Hate Speech Detection
by: Chan, Fai Leui, et al.
Published: (2024)
BESSTIE: A Benchmark for Sentiment and Sarcasm Classification for Varieties of English
by: Srirag, Dipankar, et al.
Published: (2024)
Augmenting Human Evaluation with LLM Judges: How Many Human Reviews Do You Need?
by: Kim, Jane Paik
Published: (2026)
The Right Model for the Job: An Evaluation of Legal Multi-Label Classification Baselines
by: Forster, Martina, et al.
Published: (2024)
Text Role Classification in Scientific Charts Using Multimodal Transformers
by: Kim, Hye Jin, et al.
Published: (2024)
GLiClass: Generalist Lightweight Model for Sequence Classification Tasks
by: Stepanov, Ihor, et al.
Published: (2025)
Evaluating K-Fold Cross Validation for Transformer Based Symbolic Regression Models
by: Kislay, Kaustubh, et al.
Published: (2024)
Efficient Prompt Optimisation for Legal Text Classification with Proxy Prompt Evaluator
by: Lee, Hyunji, et al.
Published: (2025)
TeamUp: Semantic Project Matching and Team Formation for Learning at Scale
by: Gulwani, Dhruv, et al.
Published: (2026)
Carpe Diem: On the Evaluation of World Knowledge in Lifelong Language Models
by: Kim, Yujin, et al.
Published: (2023)
Reasoning Over Recall: Evaluating the Efficacy of Generalist Architectures vs. Specialized Fine-Tunes in RAG-Based Mental Health Dialogue Systems
by: Kafi, Md Abdullah Al, et al.
Published: (2026)
AI and the Law: Evaluating ChatGPT's Performance in Legal Classification
by: Weichbroth, Pawel
Published: (2025)
LLMs for Legal Subsumption in German Employment Contracts
by: Wardas, Oliver, et al.
Published: (2025)
LegalBench-BR: A Benchmark for Evaluating Large Language Models on Brazilian Legal Decision Classification
by: Neto, Pedro Barbosa de Carvalho
Published: (2026)
Ambiguity in LLMs is a concept missing problem
by: Hu, Zhibo, et al.
Published: (2025)
SciCustom: A Framework for Custom Evaluation of Scientific Capabilities in Large Language Models
by: Gu, Yiyang, et al.
Published: (2026)
PROMPTEVALS: A Dataset of Assertions and Guardrails for Custom Production Large Language Model Pipelines
by: Vir, Reya, et al.
Published: (2025)
Predicting the Target Word of Game-playing Conversations using a Low-Rank Dialect Adapter for Decoder Models
by: Srirag, Dipankar, et al.
Published: (2024)
LegalCiteBench: Evaluating Citation Reliability in Legal Language Models
by: Chen, Sijia, et al.
Published: (2026)
AGB-DE: A Corpus for the Automated Legal Assessment of Clauses in German Consumer Contracts
by: Braun, Daniel, et al.
Published: (2024)
MEVER: Multi-Modal and Explainable Claim Verification with Graph-based Evidence Retrieval
by: Zhang, Delvin Ce, et al.
Published: (2026)
CAIRNS: Balancing Readability and Scientific Accuracy in Climate Adaptation Question Answering
by: Kong, Liangji, et al.
Published: (2025)