Library Catalog

Bibliographic Details
Main Authors:	Zou, Bo; Yang, Chao; Qiao, Yu; Quan, Chengbin; Zhao, Youjian
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence; Computation and Language
Online Access:	https://arxiv.org/abs/2404.00913

Similar Items

VideoDistill: Language-aware Vision Distillation for Video Question Answering
by: Zou, Bo, et al.
Published: (2024)

LLaMA-Reg: Using LLaMA 2 for Unsupervised Medical Image Registration
by: Ma, Mingrui, et al.
Published: (2024)

EMO-LLaMA: Enhancing Facial Emotion Understanding with Instruction Tuning
by: Xing, Bohao, et al.
Published: (2024)

What If We Recaption Billions of Web Images with LLaMA-3?
by: Li, Xianhang, et al.
Published: (2024)

VisionLLaMA: A Unified LLaMA Backbone for Vision Tasks
by: Chu, Xiangxiang, et al.
Published: (2024)

LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention
by: Zhang, Renrui, et al.
Published: (2023)

LLaMA-XR: A Novel Framework for Radiology Report Generation using LLaMA and QLoRA Fine Tuning
by: Jahangir, Md. Zihad Bin, et al.
Published: (2025)

LLaMA Pro: Progressive LLaMA with Block Expansion
by: Wu, Chengyue, et al.
Published: (2024)

Adapting LLaMA Decoder to Vision Transformer
by: Wang, Jiahao, et al.
Published: (2024)

LLaVA-Video: Video Instruction Tuning With Synthetic Data
by: Zhang, Yuanhan, et al.
Published: (2024)

Efficient LLaMA-3.2-Vision by Trimming Cross-attended Visual Features
by: Lee, Jewon, et al.
Published: (2025)

ECHO-LLaMA: Efficient Caching for High-Performance LLaMA Training
by: Dialameh, Maryam, et al.
Published: (2025)

LLaMA-Mesh: Unifying 3D Mesh Generation with Language Models
by: Wang, Zhengyi, et al.
Published: (2024)

LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training
by: Zhu, Tong, et al.
Published: (2024)

LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding
by: Zhang, Yanzhe, et al.
Published: (2023)

Vista-LLaMA: Reducing Hallucination in Video Language Models via Equal Distance to Visual Tokens
by: Ma, Fan, et al.
Published: (2023)

Multimodal Medical Disease Classification with LLaMA II
by: Gapp, Christian, et al.
Published: (2024)

LogLLaMA: Transformer-based log anomaly detection with LLaMA
by: Yang, Zhuoyi, et al.
Published: (2025)

VoCo-LLaMA: Towards Vision Compression with Large Language Models
by: Ye, Xubing, et al.
Published: (2024)

LLaMA-MoE v2: Exploring Sparsity of LLaMA from Perspective of Mixture-of-Experts with Post-Training
by: Qu, Xiaoye, et al.
Published: (2024)

Parameter-Efficient Fine-Tuning of LLaMA for the Clinical Domain
by: Gema, Aryo Pradipta, et al.
Published: (2023)

Dia-LLaMA: Towards Large Language Model-driven CT Report Generation
by: Chen, Zhixuan, et al.
Published: (2024)

LLaDA-V: Large Language Diffusion Models with Visual Instruction Tuning
by: You, Zebin, et al.
Published: (2025)

VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
by: Cheng, Zesen, et al.
Published: (2024)

Dr-LLaVA: Visual Instruction Tuning with Symbolic Clinical Grounding
by: Sun, Shenghuan, et al.
Published: (2024)

Teeth-SEG: An Efficient Instance Segmentation Framework for Orthodontic Treatment based on Anthropic Prior Knowledge
by: Zou, Bo, et al.
Published: (2024)

LLaVA Steering: Visual Instruction Tuning with 500x Fewer Parameters through Modality Linear Representation-Steering
by: Bi, Jinhe, et al.
Published: (2024)

EDVD-LLaMA: Explainable Deepfake Video Detection via Multimodal Large Language Model Reasoning
by: Sun, Haoran, et al.
Published: (2025)

Efficient and Effective Text Encoding for Chinese LLaMA and Alpaca
by: Cui, Yiming, et al.
Published: (2023)

Amharic LLaMA and LLaVA: Multimodal LLMs for Low Resource Languages
by: Andersland, Michael
Published: (2024)

BanglaLlama: LLaMA for Bangla Language
by: Zehady, Abdullah Khan, et al.
Published: (2024)

Quilt-LLaVA: Visual Instruction Tuning by Extracting Localized Narratives from Open-Source Histopathology Videos
by: Seyfioglu, Mehmet Saygin, et al.
Published: (2023)

High-Accuracy ECG Image Interpretation using Parameter-Efficient LoRA Fine-Tuning with Multimodal LLaMA 3.2
by: M, Nandakishor, et al.
Published: (2025)

LLaMAs Have Feelings Too: Unveiling Sentiment and Emotion Representations in LLaMA Models Through Probing
by: Di Palma, Dario, et al.
Published: (2025)

How Vocabulary Sharing Facilitates Multilingualism in LLaMA?
by: Yuan, Fei, et al.
Published: (2023)

LLaVA-MORE: A Comparative Study of LLMs and Visual Backbones for Enhanced Visual Instruction Tuning
by: Cocchi, Federico, et al.
Published: (2025)

LLaMA-Based Models for Aspect-Based Sentiment Analysis
by: Šmíd, Jakub, et al.
Published: (2025)

A High-Quality Text-Rich Image Instruction Tuning Dataset via Hybrid Instruction Generation
by: Zhou, Shijie, et al.
Published: (2024)

ViT3D Alignment of LLaMA3: 3D Medical Image Report Generation
by: Li, Siyou, et al.
Published: (2024)

Otter: A Multi-Modal Model with In-Context Instruction Tuning
by: Li, Bo, et al.
Published: (2023)