Istiyak H. Siddiquee

Magnum Opus

An end-to-end data science project that pulls data from the source, stores it in a custom-built, in-house data lakehouse, and processes it with dbt. The processed data is then used to train an ML model with distributed training, and the model is deployed and monitored in a streaming scenario.

... ... ...

Propagation Style Analysis for Fake and Real News

Individual scientific project on early detection of fake news on Twitter based on propagation style. Python, NetworkX, Ray, Docker, Kubernetes, etc. were used in this project.
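
As an illustration of what "propagation style" can mean in practice, the sketch below computes a few structural features of a retweet cascade. It is a minimal example under stated assumptions: the feature set (size, depth, maximum breadth), the function name `cascade_features`, and the toy edge list are all hypothetical and not taken from the project, and plain Python is used instead of NetworkX to keep the example dependency-free.

```python
from collections import deque


def cascade_features(edges, root):
    """Return simple structural features of a propagation tree.

    edges: iterable of (parent, child) pairs, e.g. (tweet, retweet).
    root:  id of the original post.
    """
    children = {}
    for parent, child in edges:
        children.setdefault(parent, []).append(child)

    # Breadth-first traversal that records the depth of every reached node.
    depth_of = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in depth_of:  # ignore duplicate edges
                depth_of[child] = depth_of[node] + 1
                queue.append(child)

    # Count how many nodes sit on each level of the cascade.
    level_sizes = {}
    for depth in depth_of.values():
        level_sizes[depth] = level_sizes.get(depth, 0) + 1

    return {
        "size": len(depth_of),
        "depth": max(depth_of.values()),
        "max_breadth": max(level_sizes.values()),
    }


# Hypothetical cascade: t0 is the original tweet.
edges = [("t0", "t1"), ("t0", "t2"), ("t1", "t3"), ("t3", "t4")]
features = cascade_features(edges, "t0")
print(features)
```

Features like these could then be fed into a classifier; which features and which classifier work best is a separate question.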

Web Scraping and Topic Modeling for Deutsches Rotes Kreuz (DRK)

Volunteering project with DSSG Berlin. The project covers web scraping and a complete NLP analysis of DRK's partner websites. Python, Streamlit, etc. are being used in this project.

Topic Modeling of a Multilingual Survey Corpus of Women’s March Global (SWB)

Volunteering project with Statistics Without Borders (SWB). The task is a comparative topic modeling study using Transformer-based (Top2Vec, BERTopic) and traditional (LDA) algorithms.

COLIEE Dataset Challenge

Legal document retrieval based on semantic similarity, using deep learning and NLP techniques.
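
The retrieval step of such a system can be sketched as a cosine-similarity ranking over document embeddings. This is only a rough illustration: the three-dimensional vectors, the document ids, and the `retrieve` helper are hypothetical, and in a real setup the vectors would come from a trained neural encoder rather than being written by hand.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def retrieve(query_vec, doc_vecs, top_k=2):
    """Rank documents by similarity to the query and keep the top_k."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical embeddings; a real system would compute these with a model.
doc_vecs = {
    "case_a": [0.9, 0.1, 0.0],
    "case_b": [0.1, 0.9, 0.2],
    "case_c": [0.7, 0.3, 0.1],
}
query_vec = [1.0, 0.0, 0.0]
ranking = retrieve(query_vec, doc_vecs)
print(ranking)
```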

Evolution of Weekly Avg. Sentiment due to COVID-19

A seminar project to determine the change in sentiment caused by the COVID-19 pandemic. Python, Pandas, NumPy, Gensim, SciPy, etc. were heavily used in this project.
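
The weekly averaging in the title can be sketched as follows. It is a minimal example, assuming each post already has a date and a numeric sentiment score; the scores, the dates, and the `weekly_average` helper are made up for illustration, and the standard library is used here instead of Pandas so the snippet runs without extra packages.

```python
from collections import defaultdict
from datetime import date


def weekly_average(scored_posts):
    """Average sentiment per ISO week.

    scored_posts: iterable of (date, score) pairs.
    Returns a dict keyed by (iso_year, iso_week).
    """
    buckets = defaultdict(list)
    for day, score in scored_posts:
        iso = day.isocalendar()
        buckets[(iso[0], iso[1])].append(score)
    return {week: sum(vals) / len(vals) for week, vals in sorted(buckets.items())}


# Hypothetical sentiment scores in the range [-1, 1].
posts = [
    (date(2020, 3, 2), 0.5),
    (date(2020, 3, 4), 0.25),
    (date(2020, 3, 9), -0.5),
    (date(2020, 3, 13), -0.25),
]
weekly = weekly_average(posts)
print(weekly)
```

Grouping by ISO year and week keeps weeks that span a year boundary in a single bucket.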

NL2SQL and Table Summarization

Scientific team project on translating natural language to SQL and on table summarization. PyTorch, HuggingFace, Transformers, Java, etc. were used in this project.

Fair Recommendations in Research

In this team project, we are investigating the fairness issues of a recommendation engine and will build a fair recommendation engine using an anonymized dataset of researchers and their publications, citations, etc.

3D Medical Image Registration using Deep Learning

In this scientific team project, part of the master’s coursework, we are working with proprietary 3D MRI images to develop a deep-learning-based registration system. As the project is still running and is covered by proprietary rights, the repository link is not available for public viewing.

Style Based Genre Classification of Gutenberg Corpus

In this team project, we used a dataset of 19th-century fiction created by Project Gutenberg. Based on the research of Douglas Biber, we extracted a custom 23-dimensional feature vector describing writing style. The main challenge was the imbalanced nature of the dataset. We then compared the performance of multiple algorithms, including a neural network, in classifying the books.
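
One common way to deal with class imbalance is to weight each class inversely to its frequency, so that rare genres count more during training. The sketch below shows that heuristic; it is not necessarily what was done in this project, and the genre names and the `balanced_class_weights` helper are hypothetical. The formula matches the "balanced" class-weight heuristic used by scikit-learn: n_samples / (n_classes * class_count).

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


# Hypothetical, deliberately imbalanced genre labels.
labels = ["adventure"] * 6 + ["detective"] * 3 + ["ghost"] * 1
weights = balanced_class_weights(labels)
print(weights)
```

The rarest class receives the largest weight, which offsets its small share of the training examples.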

... ... ...

Information Retrieval System using Apache Lucene (November 2019)

This Java project was part of the Information Retrieval course. Given a repository of text and HTML documents, we had to parse, index, and store them with Apache Lucene. For a given query string, the system then performs a search and ranks the results.
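
The parse, index, search, and rank pipeline can be illustrated with a small inverted index and a simplified TF-IDF score. This is a conceptual sketch in Python, not the Lucene API and not the project's Java code; the scoring formula, the sample documents, and all function names are assumptions chosen for brevity.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Build an inverted index: term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, freq in Counter(tokenize(text)).items():
            index[term][doc_id] = freq
    return index


def search(index, n_docs, query, top_k=3):
    """Score documents with a simplified TF-IDF and return the best matches."""
    scores = defaultdict(float)
    for term in tokenize(query):
        postings = index.get(term, {})
        if not postings:
            continue
        idf = math.log(1 + n_docs / len(postings))  # rarer terms weigh more
        for doc_id, freq in postings.items():
            scores[doc_id] += freq * idf
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)[:top_k]


# Hypothetical mini-corpus.
docs = {
    "d1": "Lucene indexes text documents",
    "d2": "HTML documents are parsed before indexing",
    "d3": "Ranking orders search results",
}
index = build_index(docs)
results = search(index, len(docs), "parsed documents")
print(results)
```

Lucene itself adds analyzers, on-disk index segments, and more refined scoring on top of this basic idea.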

ApartmentApp

This project was done for a client. I used core Node.js to build an apartment search tool that recommends rental apartments based on a set of preference parameters set by the user. The UI is fairly basic and was built in plain HTML, CSS, and JS. The project is deployed with Docker.

... ... ...

The Hive - Vocabulary app for GRE

In this Android app, I used a dataset of 5,000 GRE vocabulary words stored in an SQLite database. To help learn them, I designed a user-friendly UI where the user can bookmark important words and review them as a deck of cards.

... ... ...