| Main Authors: | Zou, Bo, Yang, Chao, Qiao, Yu, Quan, Chengbin, Zhao, Youjian |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2404.00913 |
Similar Items
VideoDistill: Language-aware Vision Distillation for Video Question Answering
by: Zou, Bo, et al.
Published: (2024)
LLaMA-Reg: Using LLaMA 2 for Unsupervised Medical Image Registration
by: Ma, Mingrui, et al.
Published: (2024)
EMO-LLaMA: Enhancing Facial Emotion Understanding with Instruction Tuning
by: Xing, Bohao, et al.
Published: (2024)
What If We Recaption Billions of Web Images with LLaMA-3?
by: Li, Xianhang, et al.
Published: (2024)
VisionLLaMA: A Unified LLaMA Backbone for Vision Tasks
by: Chu, Xiangxiang, et al.
Published: (2024)
LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention
by: Zhang, Renrui, et al.
Published: (2023)
LLaMA-XR: A Novel Framework for Radiology Report Generation using LLaMA and QLoRA Fine Tuning
by: Jahangir, Md. Zihad Bin, et al.
Published: (2025)
LLaMA Pro: Progressive LLaMA with Block Expansion
by: Wu, Chengyue, et al.
Published: (2024)
Adapting LLaMA Decoder to Vision Transformer
by: Wang, Jiahao, et al.
Published: (2024)
LLaVA-Video: Video Instruction Tuning With Synthetic Data
by: Zhang, Yuanhan, et al.
Published: (2024)
Efficient LLaMA-3.2-Vision by Trimming Cross-attended Visual Features
by: Lee, Jewon, et al.
Published: (2025)
ECHO-LLaMA: Efficient Caching for High-Performance LLaMA Training
by: Dialameh, Maryam, et al.
Published: (2025)
LLaMA-Mesh: Unifying 3D Mesh Generation with Language Models
by: Wang, Zhengyi, et al.
Published: (2024)
LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training
by: Zhu, Tong, et al.
Published: (2024)
LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding
by: Zhang, Yanzhe, et al.
Published: (2023)
Vista-LLaMA: Reducing Hallucination in Video Language Models via Equal Distance to Visual Tokens
by: Ma, Fan, et al.
Published: (2023)
Multimodal Medical Disease Classification with LLaMA II
by: Gapp, Christian, et al.
Published: (2024)
LogLLaMA: Transformer-based log anomaly detection with LLaMA
by: Yang, Zhuoyi, et al.
Published: (2025)
VoCo-LLaMA: Towards Vision Compression with Large Language Models
by: Ye, Xubing, et al.
Published: (2024)
LLaMA-MoE v2: Exploring Sparsity of LLaMA from Perspective of Mixture-of-Experts with Post-Training
by: Qu, Xiaoye, et al.
Published: (2024)
Parameter-Efficient Fine-Tuning of LLaMA for the Clinical Domain
by: Gema, Aryo Pradipta, et al.
Published: (2023)
Dia-LLaMA: Towards Large Language Model-driven CT Report Generation
by: Chen, Zhixuan, et al.
Published: (2024)
LLaDA-V: Large Language Diffusion Models with Visual Instruction Tuning
by: You, Zebin, et al.
Published: (2025)
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
by: Cheng, Zesen, et al.
Published: (2024)
Dr-LLaVA: Visual Instruction Tuning with Symbolic Clinical Grounding
by: Sun, Shenghuan, et al.
Published: (2024)
Teeth-SEG: An Efficient Instance Segmentation Framework for Orthodontic Treatment based on Anthropic Prior Knowledge
by: Zou, Bo, et al.
Published: (2024)
LLaVA Steering: Visual Instruction Tuning with 500x Fewer Parameters through Modality Linear Representation-Steering
by: Bi, Jinhe, et al.
Published: (2024)
EDVD-LLaMA: Explainable Deepfake Video Detection via Multimodal Large Language Model Reasoning
by: Sun, Haoran, et al.
Published: (2025)
Efficient and Effective Text Encoding for Chinese LLaMA and Alpaca
by: Cui, Yiming, et al.
Published: (2023)
Amharic LLaMA and LLaVA: Multimodal LLMs for Low Resource Languages
by: Andersland, Michael
Published: (2024)
BanglaLlama: LLaMA for Bangla Language
by: Zehady, Abdullah Khan, et al.
Published: (2024)
Quilt-LLaVA: Visual Instruction Tuning by Extracting Localized Narratives from Open-Source Histopathology Videos
by: Seyfioglu, Mehmet Saygin, et al.
Published: (2023)
High-Accuracy ECG Image Interpretation using Parameter-Efficient LoRA Fine-Tuning with Multimodal LLaMA 3.2
by: M, Nandakishor, et al.
Published: (2025)
LLaMAs Have Feelings Too: Unveiling Sentiment and Emotion Representations in LLaMA Models Through Probing
by: Di Palma, Dario, et al.
Published: (2025)
How Vocabulary Sharing Facilitates Multilingualism in LLaMA?
by: Yuan, Fei, et al.
Published: (2023)
LLaVA-MORE: A Comparative Study of LLMs and Visual Backbones for Enhanced Visual Instruction Tuning
by: Cocchi, Federico, et al.
Published: (2025)
LLaMA-Based Models for Aspect-Based Sentiment Analysis
by: Šmíd, Jakub, et al.
Published: (2025)
A High-Quality Text-Rich Image Instruction Tuning Dataset via Hybrid Instruction Generation
by: Zhou, Shijie, et al.
Published: (2024)
ViT3D Alignment of LLaMA3: 3D Medical Image Report Generation
by: Li, Siyou, et al.
Published: (2024)
Otter: A Multi-Modal Model with In-Context Instruction Tuning
by: Li, Bo, et al.
Published: (2023)