
Sourajit Saha

PhD Student, Computer Science at University of Maryland, Baltimore County

Email: ssaha2@umbc.edu
Location: ITE 338, UMBC, Baltimore, MD 21250

Computer Vision | Interactive Video Retrieval | Visual Reasoning | Vision and Language

CV

Looking for a research internship (Winter 2025-2026, Summer 2026)

Interactive Video Retrieval, Search, and Understanding: Advancing interactive video retrieval via VLMs, scene-graph reasoning, VQA-based finetuning, and dialogue-driven systems for improved semantic understanding.
Visual Reasoning: Investigating spatial reasoning, counterfactual visual inference, and editing techniques to enhance model interpretability, adaptability, and causal understanding.
Reliable Vision Systems: Evaluating vision models by detecting hallucinations and measuring the fidelity and alignment of text-to-image (T2I) and text-to-video (T2V) outputs.

Bio

I am a Computer Science PhD student at the University of Maryland, Baltimore County (UMBC), working under the guidance of Tejas Gokhale in the UMBC Cognitive Vision Group. I work on interactive video retrieval and search, visual reasoning, and improving and assessing the reliability of vision systems. My research in interactive video retrieval and search spans four key areas:

  1. Enhancing few-shot and zero-shot video search and retrieval by leveraging the rapid progress in Vision-Language Models (VLMs).
  2. Developing Scene Graph-based Chain-of-Thought reasoning frameworks to enable structured and interpretable understanding, retrieval, and search across complex video content.
  3. Investigating Video Question Answering (VQA) as an auxiliary task for finetuning, with a focus on how the completeness of visual information affects downstream video understanding.
  4. Designing dialogue-driven interactive retrieval systems, where natural conversations guide iterative video exploration and search, improving user engagement and retrieval effectiveness.

News
Publications

Most recent publications on Google Scholar.

Improving Shift Invariance in Convolutional Neural Networks with Translation Invariant Polyphase Sampling. Sourajit Saha, Tejas Gokhale

WACV 2025, ICCV 2023 (OODCV workshop) paper video poster code


RFC-Net: Learning High Resolution Global Features for Medical Image Segmentation on a Computational Budget (Student Abstract). Sourajit Saha, Shaswati Saha, Md Osman Gani, Tim Oates, David Chapman

AAAI 2023 paper code


Mitigating Domain Shift in AI-Based TB Screening With Unsupervised Domain Adaptation. Nishanjan Ravin, Sourajit Saha, Alan Schweitzer, Ameena Elahi, Farouk Dako, Daniel Mollura, David Chapman

IEEE Access paper code


Pairwise Meta Learning Pipeline: Classifying COVID-19 abnormalities on chest radio-graphs. Sourajit Saha, Yaacov Yesha, Yelena Yesha, Aryya Gangopadhyay, David Chapman, Michael Morris, Babak Saboury, Phuong Nguyen

SPIE Medical Imaging Conference 2022 paper


A comprehensive set of novel residual blocks for deep learning architectures for diagnosis of retinal diseases from optical coherence tomography images. Sharif Amit Kamran, Sourajit Saha, Ali Shihab Sabbir, Alireza Tavakkoli

Springer Book Series, 2020 paper code


Optic-Net: A Novel Convolutional Neural Network for Diagnosis of Retinal Diseases from Optical Tomography Images. Sharif Amit Kamran, Sourajit Saha, Ali Shihab Sabbir, Alireza Tavakkoli

ICMLA 2019 paper code

Academic Service
Collaborators
Acknowledgement

Website theme inspirations: Aniruddha Saha, Martin Saveski, Aditi Partap.

Last updated 08/12/2025