APA Citation: Ge, C., Cheng, S., Wang, Z., Yuan, J., Gao, Y., Song, J., . . . Zheng, B. (2024). ConvLLaVA: Hierarchical Backbones as Visual Encoder for Large Multimodal Models.
Chicago Style (17th ed.) Citation: Ge, Chunjiang, Sijie Cheng, Ziming Wang, Jiale Yuan, Yuan Gao, Jun Song, Shiji Song, Gao Huang, and Bo Zheng. ConvLLaVA: Hierarchical Backbones as Visual Encoder for Large Multimodal Models. 2024.
MLA (9th ed.) Citation: Ge, Chunjiang, et al. ConvLLaVA: Hierarchical Backbones as Visual Encoder for Large Multimodal Models. 2024.