Akanksha Singh

Hi! I am a data scientist with over 4 years of experience, including leadership roles in data science and analytics. I combine deep technical proficiency in SQL, Python, R, and machine learning techniques with strategic consulting skills. I have a proven track record across multiple industries such as sports, retail, and home improvement. Currently a Master's student in the Business Analytics program at Purdue University, I am always looking to upskill with the latest technology!

Academic Projects

Optimizing Manufacturing Supply Chain Forecasting

April 2024
Project Milestones

  • Problem Assessment: Improve demand forecasting accuracy for 13,000+ unique parts within the client's complex supply chain by integrating advanced machine learning and time-series predictive models.
  • Data Preparation: Conducted extensive exploratory data analysis to categorize materials based on sales volume and variance, establishing the basis for targeted forecasting models.
  • Model Building: Deployed multiple forecasting techniques including ARIMA, Prophet, Exponential Smoothing, and deep learning approaches tailored to the variability characteristics of different product segments.
  • Dynamic selection box: Created a dynamic model selection framework that chooses the best forecasting model for each material based on the lowest Weighted Mean Absolute Percentage Error (WMAPE).
  • Visualization tools: Developed a comprehensive Tableau dashboard that provides interactive visualization of forecasts at the material-month level.
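
A minimal sketch of the WMAPE-based model selection described above. It assumes each candidate model has already produced forecasts for the same holdout period as the actuals; the function names and the sample numbers are hypothetical and are not taken from the project code.

```python
from typing import Dict, Sequence, Tuple


def wmape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Weighted MAPE: total absolute error divided by total absolute actuals."""
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    total_actual = sum(abs(a) for a in actual)
    if total_actual == 0:
        raise ValueError("WMAPE is undefined when all actuals are zero")
    total_error = sum(abs(a - f) for a, f in zip(actual, forecast))
    return total_error / total_actual


def select_best_model(
    actual: Sequence[float],
    forecasts_by_model: Dict[str, Sequence[float]],
) -> Tuple[str, Dict[str, float]]:
    """Return the model name with the lowest WMAPE, plus all scores."""
    scores = {name: wmape(actual, fc) for name, fc in forecasts_by_model.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical holdout demand for one material and two candidate forecasts.
holdout = [100, 120, 80]
candidates = {
    "arima": [110, 115, 90],
    "prophet": [100, 100, 100],
}
best_model, all_scores = select_best_model(holdout, candidates)
print(best_model, all_scores)
```

Running the selection once per material yields a material-to-model mapping, so each part is forecast by whichever candidate scored best on its own history.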

Impact

Improved forecasting accuracy (MAPE) by at least 30%, directly contributing to better stock management by reducing both understock and overstock scenarios, thereby improving manufacturing efficiency and supplier relations.

Cryptocurrency Price Prediction & Portfolio Optimization

February 2024
Project Milestones

  • Problem Assessment: Build a model to predict the closing price of 10+ cryptocurrencies, with an automated framework to buy/sell them and optimize the portfolio to maximize returns.
  • Data Preparation: Collected and analyzed a comprehensive dataset with over 24 million rows covering various cryptocurrencies.
  • Data Preprocessing: Implemented data cleaning and feature engineering to prepare the dataset for modeling, focusing on standardization and multicollinearity reduction.
  • Model Development: Designed and trained advanced neural network models including Sequential Neural Networks, Bidirectional LSTM, and Stacked LSTM to capture complex patterns in time-series data.
  • Portfolio Optimization: Developed a dynamic trading strategy that uses model outputs to execute buy/sell decisions based on risk management components such as stop-loss and take-profit thresholds to balance the risk-return profile.
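
A minimal sketch of how a stop-loss / take-profit rule can turn a model's predicted close into a buy/sell decision. The threshold values, the `buy_threshold` parameter, and the function name are illustrative assumptions, not the settings used in the project.

```python
from typing import Optional


def trade_signal(
    predicted_close: float,
    current_price: float,
    entry_price: Optional[float] = None,
    stop_loss: float = 0.05,      # assumed: exit after a 5% loss
    take_profit: float = 0.10,    # assumed: exit after a 10% gain
    buy_threshold: float = 0.02,  # assumed: enter if predicted gain >= 2%
) -> str:
    """Return "buy", "sell", or "hold" for a single asset."""
    if entry_price is not None:
        # A position is open: risk management takes priority over the forecast.
        change = (current_price - entry_price) / entry_price
        if change <= -stop_loss or change >= take_profit:
            return "sell"
        return "hold"
    # No open position: enter only if the model predicts enough upside.
    expected_return = (predicted_close - current_price) / current_price
    return "buy" if expected_return >= buy_threshold else "hold"


print(trade_signal(predicted_close=105, current_price=100))
print(trade_signal(predicted_close=120, current_price=94, entry_price=100))
```

Checking the open position before consulting the forecast means a bad prediction cannot keep a losing trade alive past the stop-loss level.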

Impact

Enhanced decision-making tools significantly reduce the risks associated with high cryptocurrency market volatility.

NCAA Women's Basketball Ticket Sales Prediction

February 2024
Project Milestones

  • Problem Assessment: Develop predictive models using NCAA Division I Women’s Basketball customer data, along with external datasets, to accurately predict ticket purchases and identify the most effective sales channels (primary vs. secondary markets).
  • Data Preprocessing: Analyzed a dataset comprising 200,000 records with 25 features, enhancing the dataset by generating 10 new features to capture complex relationships and improve model accuracy.
  • Model Building: Utilized advanced machine learning algorithms such as CatBoost, XGBoost, and Random Forest, and experimented with ensemble techniques to enhance prediction accuracy.
  • Recommendations: Proposed the introduction of loyalty programs to encourage purchases through the primary market and enhance customer retention.

Impact

Achieved classification accuracy of 98.49% (Kaggle top 10) and placed among the top 3 teams at the university level in the Crossroads competition, 2024.

Enhancing Categorization Accuracy on Craigslist through Text and Image Classification

November 2023
Project Milestones

  • Problem Assessment: Tackled the challenge of over 30% misclassification in computer products and 15% in computer components on Craigslist.
  • Data Preparation: Employed advanced preprocessing techniques such as tokenization, lemmatization, and normalization to refine the text data and image data for effective analysis.
  • Image Recognition: Adopted the VGG16 model for meticulous image analysis, ensuring accurate and consistent categorization of visual content.
  • Textual Analysis: Leveraged a combination of LSTM, Random Forest, and Gradient Boosting algorithms for sophisticated text classification of product descriptions.
  • Model Development: Engineered an integrated model combining both text and image classification, enhancing the accuracy of product categorization and overall user experience on the platform.
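
One common way to combine a text classifier and an image classifier is late fusion: a weighted average of the class probabilities each model outputs. The sketch below illustrates that idea only; the project's actual integration method is not specified here, and the category labels, weights, and function name are hypothetical.

```python
from typing import Dict, Tuple


def fuse_predictions(
    text_probs: Dict[str, float],
    image_probs: Dict[str, float],
    text_weight: float = 0.5,  # assumed weighting between the two models
) -> Tuple[str, Dict[str, float]]:
    """Weighted average of per-class probabilities from two classifiers."""
    if not 0.0 <= text_weight <= 1.0:
        raise ValueError("text_weight must be between 0 and 1")
    if text_probs.keys() != image_probs.keys():
        raise ValueError("both models must score the same set of classes")
    fused = {
        label: text_weight * text_probs[label]
        + (1.0 - text_weight) * image_probs[label]
        for label in text_probs
    }
    best = max(fused, key=fused.get)
    return best, fused


# Hypothetical listing: the description is ambiguous, but the photo is not.
text_scores = {"computers": 0.6, "computer parts": 0.4}
image_scores = {"computers": 0.2, "computer parts": 0.8}
label, combined = fuse_predictions(text_scores, image_scores)
print(label, combined)
```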

Impact

The project significantly elevated search efficiency and user experience, markedly reducing misclassifications by 30%. This innovation contributed to a more intuitive, effective, and user-friendly marketplace.

Machine Learning driven data strategy for enhancing Airbnb ecosystem

December 2023
Project Milestones

  • Problem Assessment: Implement a robust and data-driven approach to optimize the performance and strategic decision-making for all stakeholders in the Airbnb ecosystem, including hosts, the platform itself, customers, and potential investors.
  • Predictive Host Classification: Utilized Logistic Regression to predict 'Superhost' status, analyzing factors such as average ratings, occupancy rates, and property descriptions.
  • Occupancy Rate Optimization: Deployed a Random Forest model to forecast occupancy rates for hosts newly classified as 'Superhosts,' aiding in their preparation for occupancy fluctuations.
  • Impact Validation: Applied a Difference-in-Differences model to validate the effect of 'Superhost' status on occupancy rates, ensuring the reliability of our predictions.
  • Market Competitiveness Analysis: Computed the Herfindahl Index to assess market concentration and competitiveness, offering valuable insights for potential investors focusing on Chicago neighborhoods.
  • Investor ROI Calculation: Developed a comprehensive ROI model for potential investors by scraping and correlating property price data with Airbnb's average nightly rates, categorized by zip codes.
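
A minimal sketch of two of the calculations named above: the Herfindahl Index as the sum of squared market shares, and a simple two-group, two-period difference-in-differences estimate. The input layout, function names, and sample numbers are assumptions for illustration; the project's regression-based specification may differ.

```python
from statistics import mean
from typing import Dict, Sequence


def herfindahl_index(listings_by_host: Dict[str, int]) -> float:
    """Sum of squared market shares; ranges from near 0 to 1 (monopoly)."""
    total = sum(listings_by_host.values())
    if total <= 0:
        raise ValueError("total listings must be positive")
    return sum((count / total) ** 2 for count in listings_by_host.values())


def diff_in_diff(
    pre_treated: Sequence[float],
    post_treated: Sequence[float],
    pre_control: Sequence[float],
    post_control: Sequence[float],
) -> float:
    """Change in the treated group minus change in the control group."""
    treated_change = mean(post_treated) - mean(pre_treated)
    control_change = mean(post_control) - mean(pre_control)
    return treated_change - control_change


# Hypothetical neighborhood where two hosts each hold half the listings.
print(herfindahl_index({"host_a": 50, "host_b": 50}))

# Hypothetical occupancy rates before and after hosts gained Superhost status.
effect = diff_in_diff(
    pre_treated=[0.5, 0.7],
    post_treated=[0.8, 0.8],
    pre_control=[0.5, 0.5],
    post_control=[0.6, 0.6],
)
print(round(effect, 3))
```

Subtracting the control group's change removes market-wide shifts in occupancy, so what remains is the part attributable to the status change under the usual parallel-trends assumption.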

Impact

Delivered holistic business insights, equipping stakeholders to navigate real-time challenges in a platform-based business model effectively. This strategy not only enhanced operational efficiency but also provided a clear roadmap for sustained growth and profitability in the dynamic Airbnb marketplace.

Bankruptcy prediction using economic indicators on SAS EM

December 2023
Project Milestones

  • Problem Assessment: Predicted bankruptcy probabilities using 64 economic indicators through ensemble modeling.
  • Data Handling: Implemented advanced outlier filtering, duplicate removal, logarithmic transformation, class-imbalance handling, and data partitioning.
  • Model Creation: Developed a dual gradient boosting model framework, with a focus on financial ratios like (Net profit + Depreciation) / Total liabilities.
  • Evaluation: Addressed overfitting and class imbalance, and validated model performance.
  • Learning: Acquired key insights for model enhancement, focusing on iteration optimization, tree depth, addressing multicollinearity, and refining data splitting methodologies.

Impact

The model achieved an accuracy of over 96% in predicting bankruptcy.

Predicting Consumer Tastes with Big Data for Gap Inc.

October 2023
Project Milestones

  • Problem Assessment: Led an initiative to shift traditional marketing paradigms towards a data-centric approach, harnessing the power of AI and machine learning for predictive analytics and market trend analysis.
  • Advanced Web Data Analytics: Utilized Beautiful Soup, the Azure API, and the Reddit API for sentiment analysis, coupled with competitive analysis and customer review mining from platforms like Reddit and the Gap website, to gather deep market insights.
  • Customized Brand Analytics: Conducted specialized analytics for brands such as Gap, Old Navy, and Banana Republic, employing sales data analysis to decode product ratings and customer preferences.
  • Trend Analysis: Extracted information about key trends from best sellers' pages on e-commerce stores such as Amazon and performed regression analysis to understand the impact of different style and product features on product sales.

Impact

This strategic approach not only revolutionized marketing tactics but also provided a robust framework for data-driven decision-making, leading to enhanced customer engagement and market positioning.

Professional Experience

Over four years of experience in data science across a range of industries.

Data Scientist

Oct 2019 - July 2023

Started as a Data Scientist and eventually advanced to the role of Apprentice Leader, managing a team of seven decision scientists and analysts, supporting data science initiatives for Fortune 500 companies.

Teaching Assistant

Jan 2024 - Apr 2024

Assisted Professor Dejoie Roy in the delivery and evaluation of lab submissions of 50+ undergraduate students for the Python course at the Daniels School of Business, Purdue University.

Kearney Student Lab

Jan 2024 - Apr 2024

Created a time-series prediction model for a top manufacturer in the United States, improving forecasting accuracy by at least 30% and cutting excess production by 25%, which in turn strengthened partnerships with suppliers.

Data Scientist

Oct 2023 - Present

Collaborated with organizations such as Accenture and Eli Lilly & Company through the Krenicki Research Center to enhance data-driven analysis and decision-making using Machine Learning techniques and Generative AI.

Core Competencies

Skills

  • Languages
    Python, SQL, R
  • Tools and Technologies
    Machine Learning, Deep Learning, Git, Airflow, MS Office, @Risk, Minitab, Snowflake, SAS EM, Tableau, Power BI, Jupyter, TensorFlow, Keras, LLMs
  • Business Fundamentals
    A/B Testing, Consumer Analytics, Digital Analytics, Data Quality Management, Stakeholder Management, Business Intelligence, Pricing Strategy

Certifications

  • AWS Certified Cloud Practitioner
  • Microsoft Azure Fundamentals
  • Operations Research with SAS Optimization
  • Machine Learning Scientist with Python (DataCamp)
  • Data Analyst with Python (DataCamp)
  • SQL Advanced (HackerRank)
  • Python Fundamentals (DataCamp)

Awards

  • Beta Gamma Sigma Scholar
  • INFORMS Conference Poster Competition 2024 - Social Media Award
  • Crossroads Analytics Competition, 2024 3rd Place [University Level]
  • Krannert Merit Scholarship
  • Mu Sigma Impact Award
  • Mu Sigma SPOT Award
  • SRM Merit Scholarship