Sourajit Saha | Homepage

Looking for Research Internship (Winter 2025-2026, Summer 2026)

♦ Interactive Video Retrieval, Search, and Understanding: Advancing interactive video retrieval via VLMs, scene-graph reasoning, VQA-based finetuning, and dialogue-driven systems for improved semantic understanding.
♦ Visual Reasoning: Investigating spatial reasoning, counterfactual visual inference, and editing techniques to enhance model interpretability, adaptability, and causal understanding.
♦ Reliable Vision Systems: Evaluating vision models by detecting hallucinations and measuring generative quality in T2I and T2V outputs for fidelity and alignment.

Bio

I am a Computer Science PhD student working under the guidance of Tejas Gokhale in the UMBC Cognitive Vision Group at the University of Maryland, Baltimore County (UMBC). I work on interactive video retrieval and search, visual reasoning, and improving and assessing the reliability of vision systems. My research in interactive video retrieval and search spans four key areas:

Enhancing few-shot and zero-shot video search and retrieval by leveraging the rapid progress in Vision-Language Models (VLMs).
Developing Scene Graph-based Chain-of-Thought reasoning frameworks to enable structured and interpretable understanding, retrieval, and search across complex video content.
Investigating Video Question Answering (VQA) systems as auxiliary tasks for finetuning, with a focus on how the completeness of visual information affects downstream video understanding.
Designing dialogue-driven interactive retrieval systems, where natural conversations guide iterative video exploration and search, improving user engagement and retrieval effectiveness.

News

Jul 2025: Serving on the program committee for AAAI 2026
Jul 2025: Received Lambda Research Grant Award at CVPR 2025
Mar 2025: Reviewing manuscripts for ICCV 2025 (Track: Vision, language, and reasoning)
Jan 2025: One paper accepted for Oral at WACV 2025
Dec 2024: Received UMBC GSA Travel Grant
Oct 2024: One paper accepted at WACV 2025 in Tucson, Arizona (Preprint)
Mar 2024: Accepted into SCALE 2024 at JHU to work on event-based visual content retrieval in summer 2024
Feb 2024: Guest lecture in CMSC 491/691: Computer Vision at UMBC
Feb 2024: Reviewing manuscripts for IJCAI 2024, ACM Transactions on Computing for Healthcare
Nov 2023: Guest lecture in CMSC 678: (Graduate) Machine Learning at UMBC
Oct 2023: Reviewing manuscripts for AAAI 2024 (student abstract)
Sep 2023: Joined the Cognitive Vision Group at UMBC, led by Tejas Gokhale
Aug 2023: One paper accepted at the OODCV workshop at ICCV 2023 in Paris, France


Publications

Most recent publications on Google Scholar.

Improving Shift Invariance in Convolutional Neural Networks with Translation Invariant Polyphase Sampling. Sourajit Saha, Tejas Gokhale

WACV 2025, ICCV 2023 (OODCV workshop) paper video poster code

@InProceedings{Saha_2025_WACV,
    author    = {Saha, Sourajit and Gokhale, Tejas},
    title     = {Improving Shift Invariance in Convolutional Neural Networks with Translation Invariant Polyphase Sampling},
    booktitle = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV)},
    month     = {February},
    year      = {2025},
    pages     = {620--629}
}

RFC-Net: Learning High Resolution Global Features for Medical Image Segmentation on a Computational Budget (Student Abstract). Sourajit Saha, Shaswati Saha, Md Osman Gani, Tim Oates, David Chapman

AAAI 2023 paper code

@inproceedings{saha2023rfc,
    title={RFC-net: learning high resolution global features for medical image segmentation on a computational budget (student abstract)},
    author={Saha, Sourajit and Saha, Shaswati and Gani, Md Osman and Oates, Tim and Chapman, David},
    booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
    volume={37},
    number={13},
    pages={16314--16315},
    year={2023}
}

Mitigating Domain Shift in AI-Based TB Screening With Unsupervised Domain Adaptation. Nishanjan Ravin, Sourajit Saha, Alan Schweitzer, Ameena Elahi, Farouk Dako, Daniel Mollura, David Chapman

IEEE Access paper code

@article{ravin2022mitigating,
    title={Mitigating domain shift in AI-based TB screening with unsupervised domain adaptation},
    author={Ravin, Nishanjan and Saha, Sourajit and Schweitzer, Alan and Elahi, Ameena and Dako, Farouk and Mollura, Daniel and Chapman, David},
    journal={IEEE Access},
    volume={10},
    pages={45997--46013},
    year={2022},
    publisher={IEEE}
}

Pairwise Meta Learning Pipeline: Classifying COVID-19 abnormalities on chest radio-graphs. Sourajit Saha, Yaacov Yesha, Yelena Yesha, Aryya Gangopadhyay, David Chapman, Michael Morris, Babak Saboury, Phuong Nguyen

SPIE Medical Imaging Conference 2022 Paper

@article{saha2022pairwise,
    title={Pairwise meta learning pipeline: classifying COVID-19 abnormalities on chest radio-graphs},
    author={Saha, Sourajit and Yesha, Yaacov and Yesha, Yelena and Gangopadhyay, Aryya and Chapman, David and Morris, Michael and Saboury, Babak and Nguyen, Phuong},
    journal={SPIE Medical Imaging 2022: Computer-Aided Diagnosis},
    volume={PC12033},
    pages={PC1203302},
    year={2022}
}

A comprehensive set of novel residual blocks for deep learning architectures for diagnosis of retinal diseases from optical coherence tomography images. Sharif Amit Kamran, Sourajit Saha, Ali Shihab Sabbir, Alireza Tavakkoli

Springer Book Series, 2020 paper code

@article{kamran2021comprehensive,
    title={A comprehensive set of novel residual blocks for deep learning architectures for diagnosis of retinal diseases from optical coherence tomography images},
    author={Kamran, Sharif Amit and Saha, Sourajit and Sabbir, Ali Shihab and Tavakkoli, Alireza},
    journal={Deep Learning Applications, Volume 2},
    pages={25--48},
    year={2021},
    publisher={Springer}
}

Optic-Net: A Novel Convolutional Neural Network for Diagnosis of Retinal Diseases from Optical Tomography Images. Sharif Amit Kamran, Sourajit Saha, Ali Shihab Sabbir, Alireza Tavakkoli

ICMLA 2019 paper code

@inproceedings{kamran2019optic,
    title={Optic-net: A novel convolutional neural network for diagnosis of retinal diseases from optical tomography images},
    author={Kamran, Sharif Amit and Saha, Sourajit and Sabbir, Ali Shihab and Tavakkoli, Alireza},
    booktitle={2019 18th IEEE International Conference On Machine Learning And Applications (ICMLA)},
    pages={964--971},
    year={2019},
    organization={IEEE}
}

Academic Service

Program Committee / Reviewer

Conferences: AAAI 2026, ICCV 2025, AAAI 2024 (student abstract), IJCAI 2024
Journals: APSIPA Transactions on Signal and Information Processing (Cambridge University Press), ACM Transactions on Computing for Healthcare, Computers and Electronics in Agriculture

Membership

CVF, AAAI, ACL

Teaching

UMBC (TA): CMSC 678: Machine Learning Fall 2024
UMBC (TA): CMSC 691: Computer Vision Spring 2024
UMBC (TA): CMSC 678: Machine Learning Fall 2023
UMBC (TA): CMSC 471: Introduction to Artificial Intelligence Spring 2023
UMBC (TA): CMSC 313: Assembly Language and Computer Organization Fall 2021
UMBC (TA): CMSC 341: Data Structures Spring 2021

Collaborators

Current Collaborators

Previous Collaborators

Acknowledgement

Website theme inspirations: Aniruddha Saha, Martin Saveski, Aditi Partap.


Last updated 08/12/2025