:: Library Catalog

Bibliographic Details
Main Authors:	Li, Long; He, Xuzheng; Wang, Haozhe; Wang, Linlin; He, Liang
Format:	Preprint
Published:	2024
Subjects:	Artificial Intelligence; Computation and Language; Programming Languages
Online Access:	https://arxiv.org/abs/2402.15729

Similar Items

Do Large Language Models Perform the Way People Expect? Measuring the Human Generalization Function
by: Vafa, Keyon, et al.
Published: (2024)

Do Language Models Exhibit the Same Cognitive Biases in Problem Solving as Human Learners?
by: Opedal, Andreas, et al.
Published: (2024)

CodeMind: Evaluating Large Language Models for Code Reasoning
by: Liu, Changshu, et al.
Published: (2024)

How Well Do Large Language Models Truly Ground?
by: Lee, Hyunji, et al.
Published: (2023)

Do Theory of Mind Benchmarks Need Explicit Human-like Reasoning in Language Models?
by: Lu, Yi-Long, et al.
Published: (2025)

Code Simulation Challenges for Large Language Models
by: La Malfa, Emanuele, et al.
Published: (2024)

How Do AI Agents Do Human Work? Comparing AI and Human Workflows Across Diverse Occupations
by: Wang, Zora Zhiruo, et al.
Published: (2025)

Do Large Code Models Understand Programming Concepts? Counterfactual Analysis for Code Predicates
by: Hooda, Ashish, et al.
Published: (2024)

Plausibility as Commonsense Reasoning: Humans Succeed, Large Language Models Do not
by: Karakaş, Sercan
Published: (2026)

How Likely Do LLMs with CoT Mimic Human Reasoning?
by: Bao, Guangsheng, et al.
Published: (2024)

Do Large Language Models Possess Sensitive to Sentiment?
by: Liu, Yang, et al.
Published: (2024)

How Do Your Code LLMs Perform? Empowering Code Instruction Tuning with High-Quality Data
by: Wang, Yejie, et al.
Published: (2024)

From Code to Correctness: Closing the Last Mile of Code Generation with Hierarchical Debugging
by: Shi, Yuling, et al.
Published: (2024)

How Do Language Models Compose Functions?
by: Khandelwal, Apoorv, et al.
Published: (2025)

Do Influence Functions Work on Large Language Models?
by: Li, Zhe, et al.
Published: (2024)

Case-Based or Rule-Based: How Do Transformers Do the Math?
by: Hu, Yi, et al.
Published: (2024)

How Well Do Deep Learning Models Capture Human Concepts? The Case of the Typicality Effect
by: Vemuri, Siddhartha K., et al.
Published: (2024)

IRCoder: Intermediate Representations Make Language Models Robust Multilingual Code Generators
by: Paul, Indraneil, et al.
Published: (2024)

Lost in the Pipeline: How Well Do Large Language Models Handle Data Preparation?
by: Spreafico, Matteo, et al.
Published: (2025)

DOCE: Finding the Sweet Spot for Execution-Based Code Generation
by: Li, Haau-Sing, et al.
Published: (2024)

Do Large Language Models Know How Much They Know?
by: Prato, Gabriele, et al.
Published: (2025)

Do Large Language Models Understand Logic or Just Mimick Context?
by: Yan, Junbing, et al.
Published: (2024)

To Code or not to Code? Adaptive Tool Integration for Math Language Models via Expectation-Maximization
by: Wang, Haozhe, et al.
Published: (2025)

Do Large Language Models Solve ARC Visual Analogies Like People Do?
by: Opiełka, Gustaw, et al.
Published: (2024)

(How) Do Language Models Track State?
by: Li, Belinda Z., et al.
Published: (2025)

CodeChain: Towards Modular Code Generation Through Chain of Self-revisions with Representative Sub-modules
by: Le, Hung, et al.
Published: (2023)

FormalProofBench: Can Models Write Graduate Level Math Proofs That Are Formally Verified?
by: Ravi, Nikil, et al.
Published: (2026)

How Do LLMs Use Their Depth?
by: Gupta, Akshat, et al.
Published: (2025)

Don't Vibe Code, Do Skele-Code: Interactive No-Code Notebooks for Subject Matter Experts to Build Lower-Cost Agentic Workflows
by: Gopalakrishnan, Sriram
Published: (2026)

AutoCode: LLMs as Problem Setters for Competitive Programming
by: Zhou, Shang, et al.
Published: (2025)

Code Repair with LLMs gives an Exploration-Exploitation Tradeoff
by: Tang, Hao, et al.
Published: (2024)

Do Large Language Models have Problem-Solving Capability under Incomplete Information Scenarios?
by: Chen, Yuyan, et al.
Published: (2024)

Do Efficient Transformers Really Save Computation?
by: Yang, Kai, et al.
Published: (2024)

XGrammar: Flexible and Efficient Structured Generation Engine for Large Language Models
by: Dong, Yixin, et al.
Published: (2024)

Do We Really Need Curated Malicious Data for Safety Alignment in Multi-modal Large Language Models?
by: Wang, Yanbo, et al.
Published: (2025)

ACE-$M^3$: Automatic Capability Evaluator for Multimodal Medical Models
by: Zhang, Xiechi, et al.
Published: (2024)

DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models
by: Chuang, Yung-Sung, et al.
Published: (2023)

Do Large Language Models Understand Word Senses?
by: Meconi, Domenico, et al.
Published: (2025)

Do Large Language Models Know What They Are Capable Of?
by: Barkan, Casey O., et al.
Published: (2025)

OverleafCopilot: Empowering Academic Writing in Overleaf with Large Language Models
by: Wen, Haomin, et al.
Published: (2024)