Privacy-Preserving Image Processing for Face Recognition Algorithms
Most people's face data is not protected from third-party corporations. This project applies two privacy-preserving techniques, differential privacy and homomorphic encryption, while passing data to a convolutional neural network, so that the data cannot be accessed at any point during computation.
Key Targets :
Differential Privacy, Homomorphic Encryption, Machine Learning, Classification, Neural Networks, Data Analysis, Feature Engineering, Data Visualization
Key Libraries :
Pandas, Plotly, Convolutional Neural Networks, TensorFlow, OneHot Encoding, Label Encoding, Stochastic Gradient Descent, Differential-Privacy-Adam-Optimizer
Programming Language :
Python
Analyzing Customer Demographics and Behavior to Strategize Targeted Marketing for Online Retail
Devised targeted and broad marketing strategies for an online retail company's 85,000+ products across 23 categories, based on sales, overall perception, shipping capabilities, and general market demand.
Key Targets :
Data Analysis, Feature Engineering, Data Visualization
Key Libraries :
tidyverse, ggplot2, dplyr
Programming Language :
R Programming
Word Association Game Using NLP
Tackled the well-known Italian game in which a target word or phrase links multiple clues, extracting nouns from phrases and sentences gathered from numerous sources. Implemented several novel algorithms and transformer models for the task, with rudimentary success.
Key Targets :
Machine Learning, Classification, Neural Networks, Data Analysis, Feature Engineering, Data Visualization
Key Libraries :
Pandas, Plotly, NLTK, Skip-Gram Classification, WordNet, Recurrent Neural Network, Long Short-Term Memory (LSTM), Bidirectional and Auto-Regressive Transformers (BART), T5 Architecture, Sentence Transformer, GPT-2
Programming Language :
Python
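The idea of a target word linking several clues can be pictured with a minimal sketch, which is not the project's actual pipeline. It assumes each word already has an embedding vector (for example from a sentence-transformer model) and ranks candidate targets by their mean cosine similarity to the clues. The function names and the three-dimensional toy vectors below are hypothetical and exist only for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def rank_candidates(clue_vectors, candidate_vectors):
    """Sort candidate words by mean cosine similarity to all clues, best first."""
    scores = {
        word: sum(cosine(vec, clue) for clue in clue_vectors) / len(clue_vectors)
        for word, vec in candidate_vectors.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Toy, hand-made vectors (assumption): real embeddings would come from a model.
TOY_CLUES = [
    [1.0, 0.0, 0.2],   # "bee"
    [0.8, 0.1, 0.3],   # "sweet"
]
TOY_CANDIDATES = {
    "honey": [0.9, 0.05, 0.25],
    "engine": [0.0, 1.0, 0.1],
}

ranking = rank_candidates(TOY_CLUES, TOY_CANDIDATES)
```

Here `ranking[0][0]` is the candidate that sits closest to all clues at once; averaging over the clues is one simple design choice, and taking the minimum similarity instead would favor candidates that fit every clue.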
Analyzing Factors Affecting the S&P500 Index and Forecasting the Stock Price for a Company in the Highest Performing Sector
Collated over 10 years of S&P500 data for each company within its 11 sectors. Took a macro-to-micro approach: analyzed each sector's impact on the overall index, then the impact of top-performing companies within each sector, and finally forecast Pfizer's stock price within the healthcare sector over a one-year horizon.
Key Targets :
Machine Learning, Regression, Data Analysis, Feature Engineering, Data Visualization, ARIMA, ARIMAX, SARIMA, Box Test, Chi-Square Test
Key Libraries :
tidyverse, ggplot2, dplyr
Programming Language :
R Programming
Analyzing and Predicting Factors Affecting Credit Card Approvals
For a group of 100,000 registered clients, inspected, engineered, and analyzed the factors that influence a bank's decision to grant or deny a line of credit. Then built a predictive model to make the same decision and applied optimization to handle outliers.
Key Targets :
Machine Learning, Regression, Statistical Inference, Data Analysis, Feature Engineering, Data Visualization
Key Libraries :
tidyverse, ggplot2, dplyr
Programming Language :
R Programming
Identification and Analysis of Factors Affecting the Job Market for Aspiring Data Scientists, and Predicting Possibilities of Garnering a Job
Analyzed over 100,000 potential candidate portfolios to identify factors that enable or inhibit the possibility of garnering a job in Data Science. Paid specific attention to experience, projects, and collaborative nature.
Key Targets :
Machine Learning, Classification, Data Analysis, Feature Engineering, Data Visualization
Key Libraries :
Pandas, Plotly, Classification Algorithms (scikit-learn), SMOTE, TensorFlow & Keras, GBM Classifier
Programming Language :
Python
Analyzing Factors Influencing Heart Disease and Predicting Risk of Onset Among Patients
For 85,000 patients over the age of 60, analyzed health conditions and vital reports to understand the factors that influence the onset of cardiac ailments. Also predicted each patient's risk of developing such an ailment from the data.
Key Targets :
Machine Learning, Classification, Data Analysis, Feature Engineering, Data Visualization
Key Libraries :
Pandas, Plotly, Classification Algorithms, Label Encoding, OneHot Encoding, Boosting Algorithms
Programming Language :
Python
Generic Clustering System for Any Formatted Dataset
Devised a clustering algorithm based on K-Means and agglomerative clustering techniques that can take in any data of a specific format and provide recommendations from the resulting clusters. Tested on Netflix data and university collection data. Also built a graph visualization of the clustering process with NetworkX.
Key Targets :
Machine Learning, Clustering, Feature Engineering, Data Analysis, Data Visualization
Key Libraries :
Pandas, Plotly, NLTK, OneHot Encoding, Label Encoding, K-Means Clustering, Agglomerative Clustering, Silhouette Score, Elbow Method, NetworkX
Programming Language :
Python
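As a rough illustration of how a silhouette score can pick the number of clusters for K-Means, here is a minimal plain-Python sketch. It is not the project's code: the function names, the deterministic farthest-point initialization, and the toy points are all assumptions made for the example, and a real pipeline would more likely rely on a library implementation.

```python
import math

def farthest_point_init(points, k):
    """Deterministic seeding (assumption): start at the first point, then
    repeatedly add the point farthest from all centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers

def kmeans(points, k, max_iters=100):
    """Lloyd's algorithm; returns one cluster label per point."""
    centers = farthest_point_init(points, k)
    labels = []
    for _ in range(max_iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centers.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep the center of an empty cluster
        if new_centers == centers:
            break
        centers = new_centers
    return labels

def silhouette_score(points, labels):
    """Mean silhouette coefficient; singleton clusters contribute 0."""
    clusters = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)
    if len(clusters) < 2:
        return 0.0
    total = 0.0
    for p, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            continue
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_label, other in clusters.items()
            if other_label != label
        )
        total += (b - a) / max(a, b)
    return total / len(points)

def best_k(points, k_values):
    """Return the k whose K-Means labelling has the highest silhouette score."""
    return max(k_values, key=lambda k: silhouette_score(points, kmeans(points, k)))

# Toy data (assumption): three well-separated groups of four points each.
TOY_POINTS = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 10), (10, 11), (11, 10), (11, 11),
    (20, 0), (20, 1), (21, 0), (21, 1),
]
chosen_k = best_k(TOY_POINTS, [2, 3, 4])
```

The silhouette score rewards clusterings whose points are close to their own cluster and far from the nearest other one, so it peaks when k matches the natural grouping. The elbow method listed above addresses the same question by looking at how within-cluster variance drops as k grows.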
Building a Twitter Database with an Optimized Search via Machine Learning and Sentiment Analysis
Developed an optimized algorithm to pull information from relational (PostgreSQL) and non-relational (MongoDB) databases containing over 1,000,000 tweets. Further developed an algorithm to improve search based on hashtags, users, and a custom method of ranking engagement, optimized using sentiment analysis.
Key Targets :
Database Management, Machine Learning, Data Analysis, Feature Engineering, Data Visualization, Sentiment Analysis
Key Libraries :
Pandas, Plotly, NLTK, TextBlob, PostgreSQL, MongoDB, SQL, MySQL
Programming Language :
Python
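A hashtag search ranked by sentiment-adjusted engagement could look roughly like the sketch below. The project's custom ranking method is not described here, so the scoring formula, the `Tweet` fields, and the function names are all hypothetical. The sketch also assumes each tweet already carries a sentiment polarity in [-1, 1], the range TextBlob reports, and it uses an in-memory index in place of the PostgreSQL and MongoDB stores.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Tweet:
    tweet_id: int
    user: str
    text: str
    likes: int
    retweets: int
    polarity: float  # precomputed sentiment in [-1, 1] (assumption)

def normalize_tag(token):
    """Lowercase a hashtag and drop the leading '#' and trailing punctuation."""
    return token.lstrip("#").rstrip(".,!?:;").lower()

def build_hashtag_index(tweets):
    """Map each hashtag to the tweets that contain it (inverted index)."""
    index = defaultdict(list)
    for tweet in tweets:
        tags = {normalize_tag(tok) for tok in tweet.text.split() if tok.startswith("#")}
        for tag in tags:
            if tag:
                index[tag].append(tweet)
    return index

def engagement_score(tweet, retweet_weight=2.0, sentiment_weight=0.5):
    """Hypothetical score: weighted interactions, scaled up or down by sentiment."""
    raw = tweet.likes + retweet_weight * tweet.retweets
    return raw * (1.0 + sentiment_weight * tweet.polarity)

def search(index, hashtag, top_n=10):
    """Return up to top_n tweets for a hashtag, highest score first."""
    hits = index.get(normalize_tag(hashtag), [])
    return sorted(hits, key=engagement_score, reverse=True)[:top_n]

# Made-up sample tweets, for illustration only.
SAMPLE_TWEETS = [
    Tweet(1, "ana", "Loving #Python today!", likes=10, retweets=5, polarity=0.0),
    Tweet(2, "raj", "New #python release is great", likes=15, retweets=2, polarity=0.8),
    Tweet(3, "lee", "Traffic is awful #commute", likes=30, retweets=0, polarity=-1.0),
]
sample_index = build_hashtag_index(SAMPLE_TWEETS)
python_hits = [t.tweet_id for t in search(sample_index, "python")]
```

Building the index once keeps each lookup cheap, and the weights are left as parameters so the balance between raw interactions and sentiment can be tuned.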