Backend and Machine Learning Engineer
Research Assistant @ UniKlinikum OvGU
“I don't code, I compose.”
An end-to-end data science project that ingests data from the source, stores it in an in-house, custom-built data lakehouse, and processes it with dbt. The processed data is then used to develop an ML model with distributed training, which is deployed and monitored in a streaming-based scenario.
Individual scientific project on early detection of fake news based on its propagation style on Twitter. Python, NetworkX, Ray, Docker, Kubernetes, etc. were used in this project.
Volunteering project with DSSG Berlin. The tasks include web scraping and a complete NLP analysis of the partner websites of DRK. Python, Streamlit, etc. are being used in this project.
Volunteering project with Statistics Without Borders (SWB): a comparative topic modeling study using Transformer-based (Top2Vec, BERTopic) and traditional (LDA) algorithms.
Legal document retrieval based on semantic similarity, using deep learning and NLP techniques.
A seminar project to determine the change in sentiment due to the COVID-19 pandemic. Python, Pandas, NumPy, Gensim, SciPy, etc. were heavily used in this project.
Scientific team project on the translation of natural language to SQL and on table summarization. PyTorch, HuggingFace, Transformers, Java, etc. were used in this project.
In this team project, we are investigating the fairness issues of a recommendation engine and will build a fair recommendation engine using an anonymized dataset of researchers and their publications, citations, etc.
In this scientific team project, part of the master’s coursework, we are working with proprietary 3D MRI images to develop a deep-learning-based registration system. As the project is still running and is covered by proprietary rights, the repository link is not publicly available.
In this team project, we used a dataset of 19th-century fiction from Project Gutenberg. Based on the research of Douglas Biber, we extracted a custom 23-dimensional feature vector describing writing style. The main challenge was the imbalanced nature of the dataset. We then compared the performance of multiple algorithms, including a neural network, in classifying the books.
This Java project was part of the Information Retrieval course. Given a collection of text/HTML documents, the task was to parse, index, and store them in Apache Lucene. Given a query string, the system then had to search the index and rank the results.
This project was done for a client. I used core Node.js to build an apartment search tool that recommends rental apartments based on a set of preference parameters set by the user. The UI is fairly basic and was written in plain HTML, CSS, and JS. The project is deployed with Docker.