Library Catalog

Bibliographic Details
Main Authors:	Sharma, Aaryam, Czarnecki, Chris, Chen, Yuhao, Xi, Pengcheng, Xu, Linlin, Wong, Alexander
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2405.08717

Similar Items

In The Wild Ellipse Parameter Estimation for Circular Dining Plates and Bowls
by: Pathiranage, Akil, et al.
Published: (2024)

Food Portion Estimation: From Pixels to Calories
by: Vinod, Gautham, et al.
Published: (2026)

FoodTrack: Estimating Handheld Food Portions with Egocentric Video
by: Wang, Ervin, et al.
Published: (2025)

Understanding the Limitations of Diffusion Concept Algebra Through Food
by: Zeng, E. Zhixuan, et al.
Published: (2024)

Food Portion Estimation via 3D Object Scaling
by: Vinod, Gautham, et al.
Published: (2024)

NutritionVerse: Empirical Study of Various Dietary Intake Estimation Approaches
by: Tai, Chi-en Amy, et al.
Published: (2023)

6D Pose Estimation on Spoons and Hands
by: Tan, Kevin, et al.
Published: (2025)

Improving Remote Sensing Classification using Topological Data Analysis and Convolutional Neural Networks
by: Sharma, Aaryam
Published: (2025)

Size Matters: Reconstructing Real-Scale 3D Models from Monocular Images for Food Portion Estimation
by: Vinod, Gautham, et al.
Published: (2026)

MetaFood3D: 3D Food Dataset with Nutrition Values
by: Chen, Yuhao, et al.
Published: (2024)

How Much 3D Do Video Foundation Models Encode?
by: Huang, Zixuan, et al.
Published: (2025)

NutritionVerse-Direct: Exploring Deep Neural Networks for Multitask Nutrition Prediction from Food Images
by: Keller, Matthew, et al.
Published: (2024)

Guess the Unified Model: How Much Can We Recover from Generated Images?
by: Cekinmez, Jasin, et al.
Published: (2026)

SSL-Interactions: Pretext Tasks for Interactive Trajectory Prediction
by: Bhattacharyya, Prarthana, et al.
Published: (2024)

Decoding Diffusion: A Scalable Framework for Unsupervised Analysis of Latent Space Biases and Representations Using Natural Language Prompts
by: Zeng, E. Zhixuan, et al.
Published: (2024)

DreamPose3D: Hallucinative Diffusion with Prompt Learning for 3D Human Pose Estimation
by: Bright, Jerrin, et al.
Published: (2025)

Domain-Guided Masked Autoencoders for Unique Player Identification
by: Balaji, Bavesh, et al.
Published: (2024)

Advancing Food Nutrition Estimation via Visual-Ingredient Feature Fusion
by: Qi, Huiyan, et al.
Published: (2025)

How Much To Guide: Revisiting Adaptive Guidance in Classifier-Free Guidance Text-to-Vision Diffusion Models
by: Zhang, Huixuan, et al.
Published: (2025)

When and How Much to Imagine: Adaptive Test-Time Scaling with World Models for Visual Spatial Reasoning
by: Yu, Shoubin, et al.
Published: (2026)

PortionNet: Distilling 3D Geometric Knowledge for Food Nutrition Estimation
by: Bright, Darrin, et al.
Published: (2025)

LensWalk: Agentic Video Understanding by Planning How You See in Videos
by: Li, Keliang, et al.
Published: (2026)

HAWAII: Hierarchical Visual Knowledge Transfer for Efficient Vision-Language Models
by: Wang, Yimu, et al.
Published: (2025)

Fake It Till You Make It: Using Synthetic Data and Domain Knowledge for Improved Text-Based Learning for LGE Detection
by: Jacob, Athira J, et al.
Published: (2025)

Vision-Based Approach for Food Weight Estimation from 2D Images
by: Wimalasiri, Chathura, et al.
Published: (2024)

Artificial Intelligence in the Food Industry: Food Waste Estimation based on Computer Vision, a Brief Case Study in a University Dining Hall
by: Rokhva, Shayan, et al.
Published: (2025)

Zero-Shot Monocular Motion Segmentation in the Wild by Combining Deep Learning with Geometric Motion Model Fusion
by: Huang, Yuxiang, et al.
Published: (2024)

How Much Is a Dataset Worth? Scaling Laws, the Vendi Score, and Matrix Spectral Functions
by: Bilmes, Jeff A., et al.
Published: (2026)

Composite Classifier-Free Guidance for Multi-Modal Conditioning in Wind Dynamics Super-Resolution
by: Schnell, Jacob, et al.
Published: (2025)

From Canteen Food to Daily Meals: Generalizing Food Recognition to More Practical Scenarios
by: Liu, Guoshan, et al.
Published: (2024)

Audio-Infused Automatic Image Colorization by Exploiting Audio Scene Semantics
by: Zhao, Pengcheng, et al.
Published: (2024)

An Explainable Hybrid AI Framework for Enhanced Tuberculosis and Symptom Detection
by: Patel, Neel, et al.
Published: (2025)

Annolid: Annotate, Segment, and Track Anything You Need
by: Yang, Chen, et al.
Published: (2024)

How Far Are Surgeons from Surgical World Models? A Pilot Study on Zero-shot Surgical Video Generation with Expert Assessment
by: Chen, Zhen, et al.
Published: (2025)

SAEC: Scene-Aware Enhanced Edge-Cloud Collaborative Industrial Vision Inspection with Multimodal LLM
by: Tian, Yuhao, et al.
Published: (2025)

Ideal Registration? Segmentation is All You Need
by: Chen, Xiang, et al.
Published: (2025)

Memory augment is All You Need for image restoration
by: Zhang, Xiao Feng, et al.
Published: (2023)

Boosting Semi-Supervised Medical Image Segmentation via Masked Image Consistency and Discrepancy Learning
by: Zhou, Pengcheng, et al.
Published: (2025)

Debiasing Central Fixation Confounds Reveals a Peripheral "Sweet Spot" for Human-like Scanpaths in Hard-Attention Vision
by: Pan, Pengcheng, et al.
Published: (2026)

Emergence of Fixational and Saccadic Movements in a Multi-Level Recurrent Attention Model for Vision
by: Pan, Pengcheng, et al.
Published: (2025)