cs.CV - Discuss Papers

LLM-wrapper: Black-Box Semantic-Aware Adaptation of Vision-Language Foundation M ...
Vision Language Models (VLMs) have shown impressive performance on numerous tasks, but their zero-shot capabilities can be limited compared to dedicated or fine-tuned models. Yet, fine-tuning VLMs comes with limitations as it re ...

Tumor aware recurrent inter-patient deformable image registration of computed to ...
Background: Voxel-based analysis (VBA) for population level radiotherapy (RT) outcomes modeling requires topology preserving inter-patient deformable image registration (DIR) that preserves tumors on moving images while avoiding ...

Finding the Subjective Truth: Collecting 2 Million Votes for Comprehensive Gen-A ...
Efficiently evaluating the performance of text-to-image models is difficult as it inherently requires subjective judgment and human preference, making it hard to compare different models and quantify the state of the art. Levera ...

ABHINAW: A method for Automatic Evaluation of Typography within AI-Generated Ima ...
In the fast-evolving field of Generative AI, platforms like MidJourney, DALL-E, and Stable Diffusion have transformed Text-to-Image (T2I) Generation. However, despite their impressive ability to create high-quality images, they ...

SpheriGait: Enriching Spatial Representation via Spherical Projection for LiDAR- ...
Gait recognition is a rapidly progressing technique for the remote identification of individuals. Prior research predominantly employing 2D sensors to gather gait data has achieved notable advancements; nonetheless, they have un ...

Distillation-free Scaling of Large SSMs for Images and Videos
State-space models (SSMs), exemplified by S4, have introduced a novel context modeling method by integrating state-space techniques into deep learning. However, they struggle with global context modeling due to their data-indepe ...

Physically-Based Photometric Bundle Adjustment in Non-Lambertian Environments
Photometric bundle adjustment (PBA) is widely used in estimating the camera pose and 3D geometry by assuming a Lambertian world. However, the assumption of photometric consistency is often violated since the non-diffuse reflecti ...

NT-ViT: Neural Transcoding Vision Transformers for EEG-to-fMRI Synthesis
This paper introduces the Neural Transcoding Vision Transformer (NT-ViT), a generative model designed to estimate high-resolution functional Magnetic Resonance Imaging (fMRI) samples from simultaneous Electroencephalography ...

RaggeDi: Diffusion-based State Estimation of Disordered Rags, Sheets, Towels and ...
Cloth state estimation is an important problem in robotics. It is essential for the robot to know the accurate state to manipulate cloth and execute tasks such as robotic dressing, stitching, and covering/uncovering human beings ...

End-to-End Probabilistic Geometry-Guided Regression for 6DoF Object Pose Estimat ...
6D object pose estimation is the problem of identifying the position and orientation of an object relative to a chosen coordinate system, which is a core technology for modern XR applications. State-of-the-art 6D object pose est ...
