| Main authors: | Ball, Thomas; Chen, Shuo; Herley, Cormac |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online access: | https://arxiv.org/abs/2409.07638 |
Similar items
Even GPT-5.2 Can't Count to Five: The Case for Zero-Error Horizons in Trustworthy LLMs
by: Sato, Ryoma
Published: (2026)
A Logical Fallacy-Informed Framework for Argument Generation
by: Mouchel, Luca, et al.
Published: (2024)
Assessing the Impact of Prompting Methods on ChatGPT's Mathematical Capabilities
by: Chen, Yuhao, et al.
Published: (2023)
When Can Transformers Count to n?
by: Yehudai, Gilad, et al.
Published: (2024)
MAFALDA: A Benchmark and Comprehensive Study of Fallacy Detection and Classification
by: Helwe, Chadi, et al.
Published: (2023)
ModelGPT: Unleashing LLM's Capabilities for Tailored Model Generation
by: Tang, Zihao, et al.
Published: (2024)
MisSynth: Improving MISSCI Logical Fallacies Classification with Synthetic Data
by: Poliakov, Mykhailo, et al.
Published: (2025)
Can we trust the evaluation on ChatGPT?
by: Aiyappa, Rachith, et al.
Published: (2023)
Can GPT Redefine Medical Understanding? Evaluating GPT on Biomedical Machine Reading Comprehension
by: Vatsal, Shubham, et al.
Published: (2024)
Can LLMs Follow Simple Rules?
by: Mu, Norman, et al.
Published: (2023)
Can Post-Training Transform LLMs into Causal Reasoners?
by: Chen, Junqi, et al.
Published: (2026)
Quantifying the Capabilities of LLMs across Scale and Precision
by: Badshah, Sher, et al.
Published: (2024)
Non-Halting Queries: Exploiting Fixed Points in LLMs
by: Hammouri, Ghaith, et al.
Published: (2024)
Addressing the Ecological Fallacy in Larger LMs with Human Context
by: Soni, Nikita, et al.
Published: (2026)
How Much Can We Forget about Data Contamination?
by: Bordt, Sebastian, et al.
Published: (2024)
How Far Are We From AGI: Are LLMs All We Need?
by: Feng, Tao, et al.
Published: (2024)
HuatuoGPT-II, One-stage Training for Medical Adaption of LLMs
by: Chen, Junying, et al.
Published: (2023)
Evaluating GPT's Capability in Identifying Stages of Cognitive Impairment from Electronic Health Data
by: Leng, Yu, et al.
Published: (2025)
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs
by: Chen, Junying, et al.
Published: (2024)
Counting Clues: A Lightweight Probabilistic Baseline Can Match an LLM
by: Jia, Furong, et al.
Published: (2025)
Can LLMs Speak For Diverse People? Tuning LLMs via Debate to Generate Controllable Controversial Statements
by: Li, Ming, et al.
Published: (2024)
We're Different, We're the Same: Creative Homogeneity Across LLMs
by: Wenger, Emily, et al.
Published: (2025)
Can GRPO Help LLMs Transcend Their Pretraining Origin?
by: Ni, Kangqi, et al.
Published: (2025)
Scaling In-Context Online Learning Capability of LLMs via Cross-Episode Meta-RL
by: Lin, Xiaofeng, et al.
Published: (2026)
Misclassification in Automated Content Analysis Causes Bias in Regression. Can We Fix It? Yes We Can!
by: TeBlunthuis, Nathan, et al.
Published: (2023)
Large Language Models Are Better Logical Fallacy Reasoners with Counterargument, Explanation, and Goal-Aware Prompt Formulation
by: Jeong, Jiwon, et al.
Published: (2025)
LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report
by: Zhao, Justin, et al.
Published: (2024)
How Numerical Precision Affects Arithmetical Reasoning Capabilities of LLMs
by: Feng, Guhao, et al.
Published: (2024)
Unlocking Reasoning Capabilities in LLMs via Reinforcement Learning Exploration
by: Deng, Wenhao, et al.
Published: (2025)
Enhancing Delta Compression in LLMs via SVD-based Quantization Error Minimization
by: Xiong, Boya, et al.
Published: (2025)
Beyond Answers: Transferring Reasoning Capabilities to Smaller LLMs Using Multi-Teacher Knowledge Distillation
by: Tian, Yijun, et al.
Published: (2024)
Can We Predict Before Executing Machine Learning Agents?
by: Zheng, Jingsheng, et al.
Published: (2026)
False Fixed Points: Kantian Feedback, Stable Miscalibration, and Representational Compression in LLMs
by: Okutomi, Akira
Published: (2025)
Counterfactual Evaluation Reveals Hidden Capability Profiles in Clinical LLMs and Agents
by: Turk, Matt
Published: (2026)
Single layer tiny Co⁴ outpaces GPT-2 and GPT-BERT
by: Zain, Noor Ul, et al.
Published: (2025)
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
by: DeepSeek-AI, et al.
Published: (2025)
Aligning Tree-Search Policies with Fixed Token Budgets in Test-Time Scaling of LLMs
by: Miyamoto, Sora, et al.
Published: (2026)
Climbing the Ladder of Reasoning: What LLMs Can-and Still Can't-Solve after SFT?
by: Sun, Yiyou, et al.
Published: (2025)
RL-PLUS: Countering Capability Boundary Collapse of LLMs in Reinforcement Learning with Hybrid-policy Optimization
by: Dong, Yihong, et al.
Published: (2025)
BgGPT 1.0: Extending English-centric LLMs to other languages
by: Alexandrov, Anton, et al.
Published: (2024)