:: Library Catalog

Bibliographic Details
Main Authors:	Menn, Dennis; Liang, Feng; Marculescu, Diana
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2509.19589

Similar Items

Similarity Trajectories: Linking Sampling Process to Artifacts in Diffusion-Generated Images
by: Menn, Dennis, et al.
Published: (2024)

Video Compression Meets Video Generation: Latent Inter-Frame Pruning with Attention Recovery
by: Menn, Dennis, et al.
Published: (2026)

SupMAE: Supervised Masked Autoencoders Are Efficient Vision Learners
by: Liang, Feng, et al.
Published: (2022)

Latent Inter-Frame Pruning: A Training-Free Method Bridging Traditional Video Compression and Modern Diffusion Transformers for Efficient Generation
by: Menn, Dennis, et al.
Published: (2026)

Ada-VE: Training-Free Consistent Video Editing Using Adaptive Motion Prior
by: Mahmud, Tanvir, et al.
Published: (2024)

Q-Sched: Pushing the Boundaries of Few-Step Diffusion Models with Quantization-Aware Scheduling
by: Frumkin, Natalia, et al.
Published: (2025)

Jumping through Local Minima: Quantization in the Loss Landscape of Vision Transformers
by: Frumkin, Natalia, et al.
Published: (2023)

Looking Backward: Streaming Video-to-Video Translation with Feature Banks
by: Liang, Feng, et al.
Published: (2024)

PaPr: Training-Free One-Step Patch Pruning with Lightweight ConvNets for Faster Inference
by: Mahmud, Tanvir, et al.
Published: (2024)

T-VSL: Text-Guided Visual Sound Source Localization in Mixtures
by: Mahmud, Tanvir, et al.
Published: (2024)

DiffuMask: Synthesizing Images with Pixel-level Annotations for Semantic Segmentation Using Diffusion Models
by: Wu, Weijia, et al.
Published: (2023)

ViCaS: A Dataset for Combining Holistic and Pixel-level Video Understanding using Captions with Grounded Segmentation
by: Athar, Ali, et al.
Published: (2024)

MK-UNet: Multi-kernel Lightweight CNN for Medical Image Segmentation
by: Rahman, Md Mostafijur, et al.
Published: (2025)

LoMix: Learnable Weighted Multi-Scale Logits Mixing for Medical Image Segmentation
by: Rahman, Md Mostafijur, et al.
Published: (2025)

SpecDM: Hyperspectral Dataset Synthesis with Pixel-level Semantic Annotations
by: Liu, Wendi, et al.
Published: (2025)

Prominence-Aware Artifact Detection and Dataset for Image Super-Resolution
by: Molodetskikh, Ivan, et al.
Published: (2025)

Pixel-level Quality Assessment for Oriented Object Detection
by: Zhu, Yunhui, et al.
Published: (2025)

PixelArena: A benchmark for Pixel-Precision Visual Intelligence
by: Liang, Feng, et al.
Published: (2025)

Motion Artifact Removal in Pixel-Frequency Domain via Alternate Masks and Diffusion Model
by: Xu, Jiahua, et al.
Published: (2024)

Artifact-Bench: Evaluating MLLMs on Detecting and Assessing the Artifacts of AI-Generated Videos
by: Tang, Yuqi, et al.
Published: (2026)

PixelVLA: Advancing Pixel-level Understanding in Vision-Language-Action Model
by: Liang, Wenqi, et al.
Published: (2025)

IPAD-CLIP: Teaching CLIP to Detect Image Local Perceptual Artifacts
by: Wang, Juan, et al.
Published: (2026)

QuarterMap: Efficient Post-Training Token Pruning for Visual State Space Models
by: Chi, Tien-Yu, et al.
Published: (2025)

JPEG AI Image Compression Visual Artifacts: Detection Methods and Dataset
by: Tsereh, Daria, et al.
Published: (2024)

MA-AVT: Modality Alignment for Parameter-Efficient Audio-Visual Transformers
by: Mahmud, Tanvir, et al.
Published: (2024)

PixelLM: Pixel Reasoning with Large Multimodal Model
by: Ren, Zhongwei, et al.
Published: (2023)

Pixel-SAIL: Single Transformer For Pixel-Grounded Understanding
by: Zhang, Tao, et al.
Published: (2025)

Fuel Gauge: Estimating Chain-of-Thought Length Ahead of Time in Large Multimodal Models
by: Yang, Yuedong, et al.
Published: (2026)

Synthesize Boundaries: A Boundary-aware Self-consistent Framework for Weakly Supervised Salient Object Detection
by: Xu, Binwei, et al.
Published: (2022)

SynthForge: Synthesizing High-Quality Face Dataset with Controllable 3D Generative Models
by: Rawat, Abhay, et al.
Published: (2024)

OTR: Synthesizing Overlay Text Dataset for Text Removal
by: Zdenek, Jan, et al.
Published: (2025)

Simple Visual Artifact Detection in Sora-Generated Videos
by: Sugiyama, Misora, et al.
Published: (2025)

Detecting Human Artifacts from Text-to-Image Models
by: Wang, Kaihong, et al.
Published: (2024)

GeneVA: A Dataset of Human Annotations for Generative Text to Video Artifacts
by: Kang, Jenna, et al.
Published: (2025)

LEHA-CVQAD: Dataset To Enable Generalized Video Quality Assessment of Compression Artifacts
by: Gushchin, Aleksandr, et al.
Published: (2025)

Spot the Fake: Large Multimodal Model-Based Synthetic Image Detection with Artifact Explanation
by: Wen, Siwei, et al.
Published: (2025)

Beyond Semantic Features: Pixel-level Mapping for Generalized AI-Generated Image Detection
by: Zhou, Chenming, et al.
Published: (2025)

Scaling Graph Convolutions for Mobile Vision
by: Avery, William, et al.
Published: (2024)

Online-LoRA: Task-free Online Continual Learning via Low Rank Adaptation
by: Wei, Xiwen, et al.
Published: (2024)

FlowVid: Taming Imperfect Optical Flows for Consistent Video-to-Video Synthesis
by: Liang, Feng, et al.
Published: (2023)