| Main Authors: | Seki, Yohei, Shu, Hakusen, Lhuissier, Anaïs, Lee, Hanwool, Kang, Juyeon, Day, Min-Yuh, Chen, Chung-Chi |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2411.04473 |
Similar Items
DimStance: Multilingual Datasets for Dimensional Stance Analysis
by: Becker, Jonas, et al.
Published: (2026)
RomanLens: The Role Of Latent Romanization In Multilinguality In LLMs
by: Saji, Alan, et al.
Published: (2025)
DimABSA: Building Multilingual and Multidomain Datasets for Dimensional Aspect-Based Sentiment Analysis
by: Lee, Lung-Hao, et al.
Published: (2026)
2M-BELEBELE: Highly Multilingual Speech and American Sign Language Comprehension Dataset
by: Costa-jussà, Marta R., et al.
Published: (2024)
I run as fast as a rabbit, can you? A Multilingual Simile Dialogue Dataset
by: Ma, Longxuan, et al.
Published: (2023)
Generator-Guided Crowd Reaction Assessment
by: Ghosh, Sohom, et al.
Published: (2024)
Exploring the Maze of Multilingual Modeling
by: Nezhad, Sina Bagheri, et al.
Published: (2023)
Breaking Free Transformer Models: Task-specific Context Attribution Promises Improved Generalizability Without Fine-tuning Pre-trained LLMs
by: Tytarenko, Stepan, et al.
Published: (2024)
SeLeRoSa: Sentence-Level Romanian Satire Detection Dataset
by: Smădu, Răzvan-Alexandru, et al.
Published: (2025)
Towards Massive Multilingual Holistic Bias
by: Tan, Xiaoqing Ellen, et al.
Published: (2024)
What Drives Performance in Multilingual Language Models?
by: Nezhad, Sina Bagheri, et al.
Published: (2024)
idT5: Indonesian Version of Multilingual T5 Transformer
by: Fuadi, Mukhlish, et al.
Published: (2023)
Ensembling Multilingual Transformers for Robust Sentiment Analysis of Tweets
by: Bilehsavar, Meysam Shirdel, et al.
Published: (2025)
Socially Responsible Data for Large Multilingual Language Models
by: Smart, Andrew, et al.
Published: (2024)
CRISP: Persistent Concept Unlearning via Sparse Autoencoders
by: Ashuach, Tomer, et al.
Published: (2025)
Adapting Multilingual Models to Code-Mixed Tasks via Model Merging
by: Kodali, Prashant, et al.
Published: (2025)
Boosting Accuracy and Interpretability in Multilingual Hate Speech Detection Through Layer Freezing and Explainable AI
by: Bilehsavar, Meysam Shirdel, et al.
Published: (2026)
Tokenization and Morphology in Multilingual Language Models: A Comparative Analysis of mT5 and ByT5
by: Dang, Thao Anh, et al.
Published: (2024)
Predictive Simultaneous Interpretation: Harnessing Large Language Models for Democratizing Real-Time Multilingual Communication
by: Iida, Kurando, et al.
Published: (2024)
"AGI" team at SHROOM-CAP: Data-Centric Approach to Multilingual Hallucination Detection using XLM-RoBERTa
by: Rathva, Harsh, et al.
Published: (2025)
Locations of Characters in Narratives: Andersen and Persuasion Datasets
by: Ozyurt, Batuhan, et al.
Published: (2025)
Blocks Architecture (BloArk): Efficient, Cost-Effective, and Incremental Dataset Architecture for Wikipedia Revision History
by: Li, Lingxi, et al.
Published: (2024)
A Dataset for Metaphor Detection in Early Medieval Hebrew Poetry
by: Toker, Michael, et al.
Published: (2024)
PILA: A Historical-Linguistic Dataset of Proto-Italic and Latin
by: Bothwell, Stephen, et al.
Published: (2024)
MAWARITH: A Dataset and Benchmark for Legal Inheritance Reasoning with LLMs
by: Bouchekif, Abdessalam, et al.
Published: (2026)
LCFO: Long Context and Long Form Output Dataset and Benchmarking
by: Costa-jussà, Marta R., et al.
Published: (2024)
EMO-KNOW: A Large Scale Dataset on Emotion and Emotion-cause
by: Nguyen, Mia Huong, et al.
Published: (2024)
Scaling Synthetic Logical Reasoning Datasets with Context-Sensitive Declarative Grammars
by: Sileo, Damien
Published: (2024)
Text-Based Approaches to Item Difficulty Modeling in Large-Scale Assessments: A Systematic Review
by: Peters, Sydney, et al.
Published: (2025)
RTI-Bench: A Structured Dataset for Indian Right-to-Information Decision Analysis
by: Bose, Joy
Published: (2026)
Ensemble Language Models for Multilingual Sentiment Analysis
by: Hasan, Md Arid
Published: (2024)
Towards Red Teaming in Multimodal and Multilingual Translation
by: Ropers, Christophe, et al.
Published: (2024)
Tracking Semantic Change in Slovene: A Novel Dataset and Optimal Transport-Based Distance
by: Pranjić, Marko, et al.
Published: (2024)
The GDN-CC Dataset: Automatic Corpus Clarification for AI-enhanced Democratic Citizen Consultations
by: Lequeu, Pierre-Antoine, et al.
Published: (2026)
HalalBench: A Multilingual OCR Benchmark for Food Packaging Ingredient Extraction
by: Arief, Hasan
Published: (2026)
Neither Valid nor Reliable? Investigating the Use of LLMs as Judges
by: Chehbouni, Khaoula, et al.
Published: (2025)
Multilingual jailbreaking of LLMs using low-resource languages
by: Marx, Dylan, et al.
Published: (2026)
Self-Supervised Borrowing Detection on Multilingual Wordlists
by: Wientzek, Tim
Published: (2025)
On Initializing Transformers with Pre-trained Embeddings
by: Kim, Ha Young, et al.
Published: (2024)
Towards Fundamental Language Models: Does Linguistic Competence Scale with Model Size?
by: Collado-Montañez, Jaime, et al.
Published: (2025)