♦ T2I models: Image/concept editing, robustness evaluation
♦ Text-Image alignment
♦ In-context learning: Vision + Language
♦ Visual Commonsense Reasoning / Visual Question Answering
♦ Video Understanding
♦ Text-to-Video Retrieval
♦ Robust Visual Perceptrons: bridging data domains and distributions
Bio
I am currently pursuing a PhD in Computer Science at the University of Maryland, Baltimore County (UMBC). I am advised by Prof. Tejas Gokhale, and I work on improving reliability and robustness in computer vision systems at the UMBC Cognitive Vision Group.
My current research focuses on two aspects of visual perception: robustness and reasoning.
I spend my time evaluating and improving the ability of computer vision systems to adapt to new domains; e.g., a visual recognition module on a Tesla, developed and tested in sunny California,
should perform equally well under a change of domain, such as snowy Colorado. I investigate leveraging LLMs and the interpolation of images and latents in the spectral domain
to study domain generalization, domain adaptation, OOD detection, open-set recognition, and AI explainability.
Another aspect of my research focuses on the evaluation of generative AI, e.g. text-to-image models,
in terms of context alignment, spatial reasoning, commonsense reasoning, etc.
I use Vision + Language to address, evaluate, and improve robustness and applicability in text-to-image models.
I am broadly interested in investigating attention-based and knowledge-infused vision models for
autonomous driving and other sensory perception tasks.
My long-term goal is to explore mechanisms for infusing knowledge and visual grounding to build computer vision systems
that are robust, efficacious, and can interact with humans in meaningful ways.
Prior to joining UMBC, I was a Research Assistant at the Center for Cognitive Skill Enhancement (CCSE) at Independent University, Bangladesh (IUB).
I received a Master of Science in Computer Science from UMBC in 2023 and a Bachelor of Science in Computer Science from BRAC University in 2017.
In my free time I love composing EDM, cooking, and driving. If not computer science, I would have pursued stand-up comedy!
RFC-Net: Learning High Resolution Global Features for Medical Image Segmentation on a Computational Budget (Student Abstract). Sourajit Saha, Shaswati Saha, Md Osman Gani, Tim Oates, David Chapman
@inproceedings{saha2023rfc,
title={RFC-net: learning high resolution global features for medical image segmentation on a computational budget (student abstract)},
author={Saha, Sourajit and Saha, Shaswati and Gani, Md Osman and Oates, Tim and Chapman, David},
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
volume={37},
number={13},
pages={16314--16315},
year={2023}
}
Mitigating Domain Shift in AI-Based TB Screening With Unsupervised Domain Adaptation. Nishanjan Ravin, Sourajit Saha, Alan Schweitzer, Ameena Elahi, Farouk Dako, Daniel Mollura, David Chapman
@article{ravin2022mitigating,
title={Mitigating domain shift in AI-based TB screening with unsupervised domain adaptation},
author={Ravin, Nishanjan and Saha, Sourajit and Schweitzer, Alan and Elahi, Ameena and Dako, Farouk and Mollura, Daniel and Chapman, David},
journal={IEEE Access},
volume={10},
pages={45997--46013},
year={2022},
publisher={IEEE}
}
Pairwise Meta Learning Pipeline: Classifying COVID-19 abnormalities on chest radio-graphs. Sourajit Saha, Yaacov Yesha, Yelena Yesha, Aryya Gangopadhyay, David Chapman, Michael Morris, Babak Saboury, Phuong Nguyen
@article{saha2022pairwise,
title={Pairwise meta learning pipeline: classifying COVID-19 abnormalities on chest radio-graphs},
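author={Saha, Sourajit and Yesha, Yaacov and Yesha, Yelena and Gangopadhyay, Aryya and Chapman, David and Morris, Michael and Saboury, Babak and Nguyen, Phuong},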
journal={SPIE Medical Imaging 2022: Computer-Aided Diagnosis, Proceedings Volume PC12033, PC1203302},
year={2022}
}
A comprehensive set of novel residual blocks for deep learning architectures for diagnosis of retinal diseases from optical coherence tomography images. Sharif Amit Kamran, Sourajit Saha, Ali Shihab Sabbir, Alireza Tavakkoli
@article{kamran2021comprehensive,
title={A comprehensive set of novel residual blocks for deep learning architectures for diagnosis of retinal diseases from optical coherence tomography images},
author={Kamran, Sharif Amit and Saha, Sourajit and Sabbir, Ali Shihab and Tavakkoli, Alireza},
journal={Deep Learning Applications, Volume 2},
pages={25--48},
year={2021},
publisher={Springer}
}
Optic-Net: A Novel Convolutional Neural Network for Diagnosis of Retinal Diseases from Optical Tomography Images. Sharif Amit Kamran, Sourajit Saha, Ali Shihab Sabbir, Alireza Tavakkoli
@inproceedings{kamran2019optic,
title={Optic-net: A novel convolutional neural network for diagnosis of retinal diseases from optical tomography images},
author={Kamran, Sharif Amit and Saha, Sourajit and Sabbir, Ali Shihab and Tavakkoli, Alireza},
booktitle={2019 18th IEEE International Conference On Machine Learning And Applications (ICMLA)},
pages={964--971},
year={2019},
organization={IEEE}
}
A Lightning fast approach to classify Bangla Handwritten Characters and Numerals using newly structured Deep Neural Network. Sourajit Saha, Nisha Saha
@inproceedings{saha2018total,
title={Total recall: understanding traffic signs using deep convolutional neural network},
author={Saha, Sourajit and Kamran, Sharif Amit and Sabbir, Ali Shihab},
booktitle={2018 21st international conference of computer and information technology (ICCIT)},
pages={1--6},
year={2018},
organization={IEEE}
}