Projects | Jahnavi Shah

Analyzing Customer Demographics and Behavior to Strategize Targeted Marketing for Online Retail

Word Association Game Using NLP

Privacy-Preserving Image Processing for Face Recognition Algorithms

Most people's face data is not protected from third-party corporations. This project combines two privacy-preserving techniques, differential privacy and homomorphic encryption, when passing data to a convolutional neural network, so that the data cannot be accessed at any point during computation.

Key Targets :

Differential Privacy, Homomorphic Encryption, Machine Learning, Classification, Neural Networks, Data Analysis, Feature Engineering, Data Visualization

Key Libraries :

Pandas, Plotly, Convolutional Neural Networks, TensorFlow, OneHot Encoding, Label Encoding, Stochastic Gradient Descent, Differential-Privacy Adam Optimizer

Programming Language :

Python
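The differential-privacy side of this project can be illustrated with the gradient step that differentially private optimizers are generally built around: clip each per-example gradient, sum, add Gaussian noise, then average. The sketch below is plain Python rather than the TensorFlow code used in the project, and it does not cover homomorphic encryption. The function names, `max_norm`, and `noise_multiplier` are assumptions made for illustration.

```python
import math
import random


def clip_gradient(grad, max_norm):
    """Scale grad down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_average_gradient(per_example_grads, max_norm, noise_multiplier, rng):
    """Clip each per-example gradient, sum, add Gaussian noise, then average."""
    dim = len(per_example_grads[0])
    total = [0.0] * dim
    for grad in per_example_grads:
        for i, g in enumerate(clip_gradient(grad, max_norm)):
            total[i] += g
    # Noise scale is tied to the clipping bound (assumed parameterization).
    sigma = noise_multiplier * max_norm
    noisy = [t + rng.gauss(0.0, sigma) for t in total]
    return [n / len(per_example_grads) for n in noisy]


# Made-up gradients for three examples; a seeded RNG keeps the run repeatable.
rng = random.Random(0)
grads = [[3.0, 4.0], [0.3, 0.4], [-6.0, 8.0]]
private_grad = dp_average_gradient(grads, max_norm=1.0, noise_multiplier=1.1, rng=rng)
noiseless_grad = dp_average_gradient(grads, max_norm=1.0, noise_multiplier=0.0, rng=rng)
```

Clipping bounds how much any single face image can move the model, and the noise masks what remains of that contribution. Setting `noise_multiplier` to zero, as in `noiseless_grad`, removes the privacy protection and is shown only to make the clipping effect visible.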

Analyzing Customer Demographics and Behavior to Strategize Targeted Marketing for Online Retail

Devised targeted and broad marketing strategies for an online retail company's 85,000+ products across 23 categories, based on sales, overall perception, shipping capabilities, and general demand across the market.

Key Targets :

Data Analysis, Feature Engineering, Data Visualization

Key Libraries :

tidyverse, ggplot2, dplyr

Programming Language :

R Programming

Word Association Game Using NLP

Tackles the famous Italian game in which a single target word or phrase links multiple clues. Nouns were extracted from phrases and sentences gathered from numerous sources, and several novel algorithms and transformer models were implemented to solve the task, with rudimentary success.

Key Targets :

Machine Learning, Classification, Neural Networks, Data Analysis, Feature Engineering, Data Visualization

Key Libraries :

Pandas, Plotly, NLTK, Skip-Gram Classification, WordNet, Recurrent Neural Network, Long Short-Term Memory (LSTM), Bidirectional and Auto-Regressive Transformers (BART), T5 Architecture, Sentence Transformer, GPT-2

Programming Language :

Python
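One common way to approach a game like this is to embed every word as a vector and pick the candidate that is, on average, closest to all the clues. The sketch below shows that idea with cosine similarity. It is not the project's actual algorithm: the toy three-dimensional vectors and the `rank_candidates` helper are invented for illustration, and in practice the vectors would come from a trained model such as a sentence transformer.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_candidates(clues, candidates, vectors):
    """Order candidates by mean cosine similarity to the clue words, best first."""
    scored = []
    for word in candidates:
        sims = [cosine(vectors[word], vectors[clue]) for clue in clues]
        scored.append((sum(sims) / len(sims), word))
    return [word for _, word in sorted(scored, reverse=True)]


# Toy vectors, invented for illustration only.
toy_vectors = {
    "apple": [0.9, 0.1, 0.0],
    "pear": [0.8, 0.2, 0.1],
    "fruit": [0.85, 0.15, 0.05],
    "engine": [0.0, 0.1, 0.9],
}
ranking = rank_candidates(["apple", "pear"], ["fruit", "engine"], toy_vectors)
```

Averaging over the clues rewards a candidate that relates to every clue rather than matching one clue strongly and ignoring the rest.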

Analyzing Factors Affecting the S&P500 Index and Forecasting the Stock Price for a Company in the Highest Performing Sector

Collated over 10 years of S&P500 data for every company in the index's 11 sectors. Took a macro-to-micro approach: analyzed each sector's impact on the overall index, then the impact of the top-performing companies within each sector, and finally forecast Pfizer's stock price in the healthcare sector over a one-year horizon.

Key Targets :

Machine Learning, Regression, Data Analysis, Feature Engineering, Data Visualization, ARIMA, ARIMAX, SARIMA, Box Test, Chi-Square Test

Key Libraries :

tidyverse, ggplot2, dplyr

Programming Language :

R Programming
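The ARIMA family used here combines differencing with autoregressive and moving-average terms. As a rough illustration of the differencing and autoregressive parts only, the sketch below fits an AR(1) model to first differences by least squares and integrates the forecasts back to price levels, which corresponds loosely to an ARIMA(1,1,0) model without an intercept. It is written in Python rather than the project's R, it is not the model fitted in the project, and the input series is made up.

```python
def difference(series):
    """First differences: series[t] - series[t-1]."""
    return [b - a for a, b in zip(series, series[1:])]


def fit_ar1(values):
    """Least-squares estimate of phi in x[t] = phi * x[t-1] + noise (no intercept)."""
    num = sum(prev * cur for prev, cur in zip(values, values[1:]))
    den = sum(prev * prev for prev in values[:-1])
    return num / den if den else 0.0


def forecast(series, steps):
    """Model the differences as AR(1), then cumulate forecasts back to levels."""
    diffs = difference(series)
    phi = fit_ar1(diffs)
    level, last_diff = series[-1], diffs[-1]
    predictions = []
    for _ in range(steps):
        last_diff = phi * last_diff  # next predicted change
        level += last_diff           # undo the differencing
        predictions.append(level)
    return predictions


# Made-up series with a constant upward step of 1 per period.
toy_prices = [1, 2, 3, 4, 5]
toy_forecast = forecast(toy_prices, steps=2)
```

A full ARIMA fit would also estimate moving-average terms, usually by maximum likelihood, and would check the residuals for leftover autocorrelation before trusting the forecast.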

Analyzing and Predicting Factors Affecting Credit Card Approvals

For a group of 100,000 registered clients, inspected, engineered, and analyzed the factors that influence a bank's decision to grant or deny a line of credit. A predictive model was then built to reproduce that decision, and optimization was applied to handle outliers.

Key Targets :

Machine Learning, Regression, Statistical Inference, Data Analysis, Feature Engineering, Data Visualization

Key Libraries :

tidyverse, ggplot2, dplyr

Programming Language :

R Programming

Identification and Analysis of Factors Affecting the Job Market for Aspiring Data Scientists, and Predicting Possibilities of Garnering a Job

Analyzed over 100,000 potential candidate portfolios to identify factors that enable or inhibit the possibility of garnering a job in Data Science. Specific focus was paid to experience, projects, and collaborative nature.

Key Targets :

Machine Learning, Classification, Data Analysis, Feature Engineering, Data Visualization

Key Libraries :

Pandas, Plotly, Classification Algorithms (scikit-learn), SMOTE, TensorFlow & Keras, GBM Classifier

Programming Language :

Python
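SMOTE, listed among the libraries above, addresses class imbalance by creating synthetic minority-class samples: it picks a minority point, picks one of its nearest minority neighbors, and interpolates a new point on the line between them. The sketch below shows that core step in plain Python under simplifying assumptions (numeric features only, at least two minority points). The function names and toy data are hypothetical, and a real project would more likely use an existing library implementation.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_neighbors(point, others, k):
    """The k points in others that are closest to point."""
    return sorted(others, key=lambda o: sq_dist(point, o))[:k]


def smote(minority, n_synthetic, k, rng):
    """Create n_synthetic points by interpolating between minority neighbors."""
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        neighbor = rng.choice(nearest_neighbors(base, others, k))
        gap = rng.random()  # position along the segment from base to neighbor
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


# Toy minority class, invented for illustration.
minority_points = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
new_points = smote(minority_points, n_synthetic=5, k=2, rng=random.Random(1))
```

Because every synthetic point lies between two real minority points, the new samples stay inside the region the minority class already occupies instead of being exact duplicates.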

Analyzing Factors Influencing Heart Disease and Predicting Risk of Onset Among Patients

For 85,000 patients over the age of 60, analyzed health conditions and vital reports to understand factors that influenced the onset of any cardio-related ailment. Also predicted the risk of contracting such an ailment from the data.

Key Targets :

Machine Learning, Classification, Data Analysis, Feature Engineering, Data Visualization

Key Libraries :

Pandas, Plotly, Classification Algorithms, Label Encoding, OneHot Encoding, Boosting Algorithms

Programming Language :

Python

Generic Clustering System for Any Formatted Dataset

Devised a clustering algorithm based on K-Means and Agglomerative clustering techniques that could take in any data of a specific format and provide recommendations from clusters. Tested on Netflix data and University collection data. Further, devised a graph visualizing the clustering process via NetworkX.

Key Targets :

Machine Learning, Clustering, Feature Engineering, Data Analysis, Data Visualization

Key Libraries :

Pandas, Plotly, NLTK, OneHot Encoding, Label Encoding, K-Means Clustering, Agglomerative Clustering, Silhouette Score, Elbow Method, NetworkX

Programming Language :

Python
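The K-Means half of this system, together with the elbow method listed above, can be sketched in a few lines: assign each point to its nearest centroid, recompute the centroids, repeat, and compare the within-cluster sum of squares (inertia) across values of k. The version below is a minimal plain-Python illustration, not the project's implementation. Its naive initialization (the first k points) and the toy data are assumptions chosen to keep the example deterministic.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return [sum(col) / len(points) for col in zip(*points)]


def kmeans(points, k, iterations=20):
    """Lloyd's algorithm with a naive init: the first k points are the centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid for every point.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = mean(members)
    return labels, centroids


def inertia(points, labels, centroids):
    """Within-cluster sum of squared distances, the quantity the elbow method plots."""
    return sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))


# Two obvious groups, invented for illustration.
toy_points = [[0, 0], [0, 1], [10, 10], [10, 11]]
labels_2, centroids_2 = kmeans(toy_points, k=2)
labels_1, centroids_1 = kmeans(toy_points, k=1)
inertia_2 = inertia(toy_points, labels_2, centroids_2)
inertia_1 = inertia(toy_points, labels_1, centroids_1)
```

On this toy data the inertia drops sharply from k=1 to k=2, which is the kind of bend the elbow method looks for when choosing the number of clusters.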

Building a Twitter Database with an Optimized Search via Machine Learning and Sentiment Analysis

Developed an optimized algorithm to pull information from a relational database (PostgreSQL) and a non-relational database (MongoDB) that together contain over 1,000,000 tweets. Further developed an algorithm to improve search based on hashtags, users, and a custom method of ranking engagement, optimized using sentiment analysis.

Key Targets :

Database Management, Machine Learning, Data Analysis, Feature Engineering, Data Visualization, Sentiment Analysis

Key Libraries :

Pandas, Plotly, NLTK, TextBlob, PostgreSQL, MongoDB, SQL, MySQL

Programming Language :

Python
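The custom engagement ranking is not spelled out on this page, so the sketch below only illustrates the general shape such a search could take: filter tweets by hashtag, compute a weighted engagement score, and adjust it by a sentiment polarity in the range -1 to 1 (the range TextBlob reports). The `Tweet` fields, the weights, and the scoring formula are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Tweet:
    text: str
    likes: int
    retweets: int
    sentiment: float  # polarity in [-1, 1], assumed to be precomputed


def engagement_score(tweet, like_weight=1.0, retweet_weight=2.0, sentiment_weight=0.5):
    """Weighted engagement, scaled up or down by sentiment (hypothetical formula)."""
    base = like_weight * tweet.likes + retweet_weight * tweet.retweets
    return base * (1.0 + sentiment_weight * tweet.sentiment)


def search(tweets, hashtag):
    """Tweets whose text contains the hashtag, highest score first."""
    # Naive substring match: "#ml" would also match "#mlops".
    matches = [t for t in tweets if hashtag.lower() in t.text.lower()]
    return sorted(matches, key=engagement_score, reverse=True)


# Made-up tweets for illustration.
toy_tweets = [
    Tweet("#ml is great", likes=10, retweets=5, sentiment=0.8),
    Tweet("#ml is bad", likes=10, retweets=5, sentiment=-0.8),
    Tweet("#cats", likes=100, retweets=50, sentiment=0.0),
]
results = search(toy_tweets, "#ml")
```

With equal likes and retweets, the positive tweet outranks the negative one because sentiment acts as a multiplier on the raw engagement count.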
