:: Library Catalog

Bibliographic Details
Main Authors:	Dey, Nolan; Taylor, Eric; Wong, Alexander; Tripp, Bryan; Taylor, Graham W.
Format:	Preprint
Published:	2020
Subjects:	Machine Learning; Artificial Intelligence; Computer Vision and Pattern Recognition; I.2.10
Online Access:	https://arxiv.org/abs/2011.03043

Similar Items

Appearance-based gaze estimation enhanced with synthetic images using deep neural networks
by: Herashchenko, Dmytro, et al.
Published: (2023)

U-Net-Like Spiking Neural Networks for Single Image Dehazing
by: Li, Huibin, et al.
Published: (2025)

A self-supervised cyclic neural-analytic approach for novel view synthesis and 3D reconstruction
by: Costea, Dragos, et al.
Published: (2025)

NAC-TCN: Temporal Convolutional Networks with Causal Dilated Neighborhood Attention for Emotion Understanding
by: Mehta, Alexander, et al.
Published: (2023)

From Prompt to Production: Automating Brand-Safe Marketing Imagery with Text-to-Image Models
by: Atighehchian, Parmida, et al.
Published: (2026)

VibrantVS: A high-resolution multi-task transformer for forest canopy height estimation
by: Chang, Tony, et al.
Published: (2024)

Improving Handshape Representations for Sign Language Processing: A Graph Neural Network Approach
by: Carbo, Alessa, et al.
Published: (2025)

MeshPose: Unifying DensePose and 3D Body Mesh reconstruction
by: Lê, Eric-Tuan, et al.
Published: (2024)

CLIP-Joint-Detect: End-to-End Joint Training of Object Detectors with Contrastive Vision-Language Supervision
by: Raoufi, Behnam, et al.
Published: (2025)

Multi-scale attention-based instance segmentation for measuring crystals with large size variation
by: Neubauer, Theresa, et al.
Published: (2024)

NumeriKontrol: Adding Numeric Control to Diffusion Transformers for Instruction-based Image Editing
by: Xu, Zhenyu, et al.
Published: (2025)

VidNum-1.4K: A Comprehensive Benchmark for Video-based Numerical Reasoning
by: Cui, Shaoyang, et al.
Published: (2026)

Multi-Scale Spatial-Temporal Self-Attention Graph Convolutional Networks for Skeleton-based Action Recognition
by: Nakamura, Ikuo
Published: (2024)

OmniFall: From Staged Through Synthetic to Wild, A Unified Multi-Domain Dataset for Robust Fall Detection
by: Schneider, David, et al.
Published: (2025)

An Evaluation of a Visual Question Answering Strategy for Zero-shot Facial Expression Recognition in Still Images
by: Castrillón-Santana, Modesto, et al.
Published: (2025)

A Challenging Benchmark of Anime Style Recognition
by: Li, Haotang, et al.
Published: (2022)

Textual and Visual Guided Task Adaptation for Source-Free Cross-Domain Few-Shot Segmentation
by: Liu, Jianming, et al.
Published: (2025)

FUSE-Flow: Scalable Real-Time Multi-View Point Cloud Reconstruction Using Confidence
by: Sun, Chentian
Published: (2026)

YotoR-You Only Transform One Representation
by: Villa, José Ignacio Díaz, et al.
Published: (2024)

MSTA3D: Multi-scale Twin-attention for 3D Instance Segmentation
by: Tran, Duc Dang Trung, et al.
Published: (2024)

ERNet: Efficient Non-Rigid Registration Network for Point Sequences
by: He, Guangzhao, et al.
Published: (2025)

GMAC: Global Multi-View Constraint for Automatic Multi-Camera Extrinsic Calibration
by: Sun, Chentian
Published: (2026)

Escaping The Big Data Paradigm in Self-Supervised Representation Learning
by: García, Carlos Vélez, et al.
Published: (2025)

SemanticHuman-HD: High-Resolution Semantic Disentangled 3D Human Generation
by: Zheng, Peng, et al.
Published: (2024)

Co-Speech Gesture and Facial Expression Generation for Non-Photorealistic 3D Characters
by: Omine, Taisei, et al.
Published: (2025)

LEGO: LoRA-Enabled Generator-Oriented Framework for Synthetic Image Detection
by: Xiao, Yutong, et al.
Published: (2026)

Visual Enhanced Depth Scaling for Multimodal Latent Reasoning
by: Han, Yudong, et al.
Published: (2026)

Breaking the Resource Wall: Geometry-Guided Sequence Modeling for Efficient Semantic Segmentation
by: Chan, Sheng-Wei, et al.
Published: (2026)

UniQ: Unified Decoder with Task-specific Queries for Efficient Scene Graph Generation
by: Liao, Xinyao, et al.
Published: (2025)

Look, Listen, and Answer: Overcoming Biases for Audio-Visual Question Answering
by: Ma, Jie, et al.
Published: (2024)

MINT: Mitigating Hallucinations in Large Vision-Language Models via Token Reduction
by: Wang, Chao, et al.
Published: (2025)

A Comprehensive Review of YOLO Architectures in Computer Vision: From YOLOv1 to YOLOv8 and YOLO-NAS
by: Terven, Juan, et al.
Published: (2023)

SoccerLens: Grounded Soccer Video Understanding Beyond Accuracy
by: Elsharkawi, Ismael, et al.
Published: (2026)

SITransformer: Shared Information-Guided Transformer for Extreme Multimodal Summarization
by: Liu, Sicheng, et al.
Published: (2024)

Classifying Simulated Gait Impairments using Privacy-preserving Explainable Artificial Intelligence and Mobile Phone Videos
by: Reddy, Lauhitya, et al.
Published: (2024)

Hierarchical Feature-level Reverse Propagation for Post-Training Neural Networks
by: Ding, Ni, et al.
Published: (2025)

DynFocus: Dynamic Cooperative Network Empowers LLMs with Video Understanding
by: Han, Yudong, et al.
Published: (2024)

SPARK: Scalable Real-Time Point Cloud Aggregation with Multi-View Self-Calibration
by: Sun, Chentian
Published: (2026)

ViG-LRGC: Vision Graph Neural Networks with Learnable Reparameterized Graph Construction
by: Elsharkawi, Ismael, et al.
Published: (2025)

Category-Agnostic Neural Object Rigging
by: He, Guangzhao, et al.
Published: (2025)