:: Library Catalog

Bibliographic Details
Main Author:	Hendria, Willy Fitra
Format:	Preprint
Published:	2023
Subjects:	Multimedia; Computation and Language; Computer Vision and Pattern Recognition; Machine Learning; Image and Video Processing
Online Access:	https://arxiv.org/abs/2306.11341

Similar Items

LiteGPT: Large Vision-Language Model for Joint Chest X-ray Localization and Classification Task
by: Le-Duc, Khai, et al.
Published: (2024)

StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text
by: Henschel, Roberto, et al.
Published: (2024)

CPLIP: Zero-Shot Learning for Histopathology with Comprehensive Vision-Language Alignment
by: Javed, Sajid, et al.
Published: (2024)

Deep Video Codec Control for Vision Models
by: Reich, Christoph, et al.
Published: (2023)

HPC: Hierarchical Progressive Coding Framework for Volumetric Video
by: Zheng, Zihan, et al.
Published: (2024)

TIACam: Text-Anchored Invariant Feature Learning with Auto-Augmentation for Camera-Robust Zero-Watermarking
by: Tanvir, Abdullah All, et al.
Published: (2026)

Comparing the Robustness of Modern No-Reference Image- and Video-Quality Metrics to Adversarial Attacks
by: Antsiferova, Anastasia, et al.
Published: (2023)

FineVQ: Fine-Grained User Generated Content Video Quality Assessment
by: Duan, Huiyu, et al.
Published: (2024)

Spatial Visibility and Temporal Dynamics: Revolutionizing Field of View Prediction in Adaptive Point Cloud Video Streaming
by: Li, Chen, et al.
Published: (2024)

Video Quality Enhancement Using Deep Learning-Based Prediction Models for Quantized DCT Coefficients in MPEG I-frames
by: Busson, Antonio J G, et al.
Published: (2020)

HiLight: Technical Report on the Motern AI Video Language Model
by: Wang, Zhiting, et al.
Published: (2024)

A Survey on Super Resolution for video Enhancement Using GAN
by: Maity, Ankush, et al.
Published: (2023)

A Near-Raw Talking-Head Video Dataset for Various Computer Vision Tasks
by: Naderi, Babak, et al.
Published: (2026)

Self-Supervised Compression and Artifact Correction for Streaming Underwater Imaging Sonar
by: Qian, Rongsheng, et al.
Published: (2025)

SegCompass: Exploring Interpretable Alignment with Sparse Autoencoders for Enhanced Reasoning Segmentation
by: Lu, Zhenyu, et al.
Published: (2026)

NAIMA: Semantics Aware RGB Guided Depth Super-Resolution
by: Nasir, Tayyab, et al.
Published: (2026)

Medical Image Analysis for Detection, Treatment and Planning of Disease using Artificial Intelligence Approaches
by: Yadav, Nand Lal, et al.
Published: (2024)

SCENE: Semantic-aware Codec Enhancement with Neural Embeddings
by: Lin, Han-Yu, et al.
Published: (2026)

Attention GhostUNet++: Enhanced Segmentation of Adipose Tissue and Liver in CT Images
by: Hayat, Mansoor, et al.
Published: (2025)

DeepFaceLab: Integrated, flexible and extensible face-swapping framework
by: Perov, Ivan, et al.
Published: (2020)

CFAT: Unleashing TriangularWindows for Image Super-resolution
by: Ray, Abhisek, et al.
Published: (2024)

Panoramic Image Inpainting With Gated Convolution And Contextual Reconstruction Loss
by: Yu, Li, et al.
Published: (2024)

Benchmarking Conventional and Learned Video Codecs with a Low-Delay Configuration
by: Teng, Siyue, et al.
Published: (2024)

Optimizing Multimodal LLMs for Egocentric Video Understanding: A Solution for the HD-EPIC VQA Challenge
by: Yang, Sicheng, et al.
Published: (2026)

LinMU: Multimodal Understanding Made Linear
by: Wang, Hongjie, et al.
Published: (2026)

CAMP-VQA: Caption-Embedded Multimodal Perception for No-Reference Quality Assessment of Compressed Video
by: Wang, Xinyi, et al.
Published: (2025)

Semantic-Aware Adaptive Video Streaming Using Latent Diffusion Models for Wireless Networks
by: Yan, Zijiang, et al.
Published: (2025)

qAttCNN - Self Attention Mechanism for Video QoE Prediction in Encrypted Traffic
by: Sidorov, Michael, et al.
Published: (2026)

Frequency-Spatial Interaction Driven Network for Low-Light Image Enhancement
by: Tao, Yunhong, et al.
Published: (2025)

HopaDIFF: Holistic-Partial Aware Fourier Conditioned Diffusion for Referring Human Action Segmentation in Multi-Person Scenarios
by: Peng, Kunyu, et al.
Published: (2025)

Perceptual Video Quality Assessment: A Survey
by: Min, Xiongkuo, et al.
Published: (2024)

ICME 2025 Grand Challenge on Video Super-Resolution for Video Conferencing
by: Naderi, Babak, et al.
Published: (2025)

Analysis of Video Quality Datasets via Design of Minimalistic Video Quality Models
by: Sun, Wei, et al.
Published: (2023)

T2IW: Joint Text to Image & Watermark Generation
by: Liu, An-An, et al.
Published: (2023)

Learning Perceptual Representations for Gaming NR-VQA with Multi-Task FR Signals
by: Chen, Yu-Chih, et al.
Published: (2026)

Safe-VAR: Safe Visual Autoregressive Model for Text-to-Image Generative Watermarking
by: Wang, Ziyi, et al.
Published: (2025)

R-Bench: Are your Large Multimodal Model Robust to Real-world Corruptions?
by: Li, Chunyi, et al.
Published: (2024)

Temporal Inconsistency Guidance for Super-resolution Video Quality Assessment
by: Li, Yixiao, et al.
Published: (2024)

Scalable Event-Based Video Streaming for Machines with MoQ
by: Freeman, Andrew C.
Published: (2025)

Object-Attribute-Relation Representation Based Video Semantic Communication
by: Du, Qiyuan, et al.
Published: (2024)