Akshay Sehgal

Data Scientist with over nine years of experience, currently working as a Manager, Data Science (Senior AI Specialist) at AI Garage, Mastercard, where I research, design, train and deploy ML models powering enterprise-scale platforms and products. I have experience working closely with business leadership, researchers, full-stack developers, product teams and DevOps to map out and productionalise AI/ML models in cloud-based products & platforms for the Risk, Finance, Retail & Agritech domains. Some of my projects include deep-learning-based recommendation engines, geo-spatial route matching, distributed virtual assistants, document semantic matching, face recognition, fraud and risk scoring, anomaly detection in image data and natural language querying of databases.

I hold the current top 0.1% global ranking on Stack Overflow for Python, Numpy, Pandas, Sklearn and Tensorflow. I am also a veteran educator in the field of AI and Data Science and have been teaching advanced ML and deep learning for 3+ years, including as the lead faculty for PGD AI/ML @ IIIT-B+UpGrad.

Previously, I was heading the Data Science initiatives as a General Manager, Data Science at Reliance Industries where I managed and mentored a large team of 15+ data scientists. I have significant entrepreneurial experience as co-founder/Head of data products for a 40 Cr valuation startup in ML domain called iPredictt Data Labs. My career in the data science domain started off in Mu-Sigma, a pure play analytics firm where I was groomed to solve large scale business problems with data for Fortune 500 companies.

Outside the office, I explore metaphysics, epistemology, theoretical physics, amateur mathematics and graphic design. I have been a guitarist for over 12 years, and my latest hobby is cooking and baking.


Achievements

  • Top 0.1% ranking (current) on Stack Overflow, top 5% in Python globally. [0]
  • Lead Faculty with IIIT-B/UpGrad PGD AI/ML. [1]
  • Previous educator with INSAID, Digital Vidya, Edureka for Advanced ML & Deep Learning.
  • Frequent contributor on Digital Vidya, Code Gladiators, Kaggle (medalist). [2]
  • Participated as a speaker at multiple tech events across India. [3][4]
  • Have 3 technology patents under my name. [201721005644, 201621034521, 201621034522]
  • Have 7 patent pending papers in the domain of Deep Learning & ML
  • Have been interviewed multiple times as a leader in AI/ML industry. [5][6]
  • Delivered multiple lectures as an expert speaker. [7][8]
  • Was the youngest FLL in Hydrocarbons with consistent A+ performance, at Reliance Industries

Stack and Algorithms

  • Deep Learning - Graph neural networks, GCN, GraphSAGE, Temporal Graphs, Computer vision, Image segmentation, Language modelling, NLP, RNNs, LSTMs, Word2vec, GloVe, Transformers, Encoder-decoder with attention, Variational Auto-encoders, U-net, DeepFM, GANs, Genetic algorithms, Reinforcement learning, Deep belief networks, Self organizing maps, RBMs, Deep dream networks
  • Classic ML - Generalized linear models, Ensemble models (Stacknet, Xgboost, Catboost, Random Forests), Tree based models (Decision Trees), SVMs, K-means, Gaussian mixtures, Hierarchical DBSCAN, t-SNE, PCA, Matrix factorization, Probabilistic models, Network analysis, Markov models, Conditional random fields, Forecasting (ARIMA, S-ARIMAX)
  • Libraries - Numpy, Pandas, Collections, Itertools, Scikit learn, Tensorflow2, Keras, Pytorch, Scipy, Django, Flask, Multiprocessing, Pyspark, Numba, Selenium
  • Languages - Python3, R, SQL
  • Frameworks - Ngrok, Docker, AWS EC2, AWS Lambda (Zappa), Azure
  • Tools - Anaconda3, Jupyter Notebooks, Pycharm, Sublime, Homebrew, Tableau, PowerBI

“Craving” subgraph clustering for Food Recommendations

Manager, AI Garage, Mastercard

Building a general-purpose recommendation engine model utilizing graph neural networks to cluster the ingredient sub-graphs of the items currently in a user’s basket. Based on an exploitation/exploration strategy, the neural network’s goal is to identify what a customer is likely to order based on the ingredient composition of their basket, in order to promote cold-start recommendation as well as food exploration.

Tools used: Python3, DGL, Pytorch, Tensorflow2, Sub-graph embeddings, GAT, GCN, LSTM, Node2vec

July 2021 – Ongoing

Virtual Cold-Chain assistant for Produce & Farm Management

Manager, AI Garage, Mastercard

Collaborating with Data.org to build a computer-vision-powered virtual assistant for Indian farmers to optimize decisions on farm produce and gain access to sustainable off-grid cooling storage, reducing loss of produce and recovering operational costs. The model is part of a large-scale Mastercard initiative to help farmers estimate the most energy- and cost-efficient strategy to store/sell their produce. The architecture involves a multi-regression-based spatio-temporal graph model to forecast market rates for commodities, in conjunction with a computer-vision-based pipeline to identify and categorize the produce available with a farmer, flowing into a non-linear optimizer under constraints and a natural language generation system to make recommendations via an application interface.

Tools used: Python3, DGL, Pytorch, Tensorflow2, Temporal-GraphSAGE, Temporal-GAT, VGGNet, U-Net, Self-attention, encoder-decoder LSTM, Word2vec

June 2021 – Ongoing

Robust Fraud Detection using Graph Neural Networks

Manager, AI Garage, Mastercard

Developing a solution to identify compromised merchants and cards based on transaction data. Based on the graph connectivity of merchant and customer nodes, we model potential fraud by starting from known fraud nodes and using label propagation methods to pass a component of risk to adjacent nodes. We are also exploring adversarial attacks on the graph data, including topological attacks and node poisoning attacks, in order to create defence strategies that protect the fraud detection GNNs against such attacks.
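The risk-propagation idea described above can be illustrated with a small, dependency-free sketch. This is not the production GNN: it is a plain iterative label propagation over a toy card/merchant graph, and the function name `propagate_risk`, the `damping` factor and the sample node names are all assumptions made for illustration.

```python
from collections import defaultdict


def propagate_risk(edges, known_fraud, n_iter=20, damping=0.5):
    """Spread risk from known fraud nodes to their neighbours.

    edges: iterable of (node_a, node_b) pairs, e.g. (card, merchant).
    known_fraud: set of nodes already confirmed as fraudulent.
    Returns a dict mapping every node in the graph to a score in [0, 1].
    """
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    # Seeds start at 1.0, everything else at 0.0.
    risk = {node: 1.0 if node in known_fraud else 0.0 for node in neighbours}
    for _ in range(n_iter):
        updated = {}
        for node, adjacent in neighbours.items():
            if node in known_fraud:
                updated[node] = 1.0  # known fraud nodes stay clamped
            else:
                mean_adjacent = sum(risk[n] for n in adjacent) / len(adjacent)
                updated[node] = damping * mean_adjacent  # risk decays per hop
        risk = updated
    return risk


# Hypothetical toy graph: card_1 - merchant_A - card_2 - merchant_B - card_3
edges = [
    ("card_1", "merchant_A"),
    ("card_2", "merchant_A"),
    ("card_2", "merchant_B"),
    ("card_3", "merchant_B"),
]
risk = propagate_risk(edges, known_fraud={"card_1"})
```

In this toy graph the score falls off with distance from the known fraud node, so `merchant_A` ends up riskier than `merchant_B`. A learned model such as GraphSAGE or GAT would replace the fixed averaging rule with trained aggregation.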

Tools used: Python3, DGL, Pytorch, Tensorflow2, Multihead Attention, GraphSAGE, Graph attention networks, GAT, GCN, Deepwalk

Feb 2021 – Ongoing

Recommendation engine for Restaurant drive-thru’s

Manager, AI Garage, Mastercard

Developed a recommendation engine for multiple major US-based restaurant chains using deep factorization machines initialized with embedding representations, and designed a custom soft-switch equation to balance revenue maximization against conversion maximization. The model uses n-item recall for conversion maximization as an objective function and models the interactions between items, weather, time of the day and product metadata. The model is currently in production testing and has a potential of $2 margin gain over each transaction on average, based on a simulation study we performed. We have also submitted a paper on the problem to a top-tier conference; it uses a variation of graph neural networks and a self-designed hierarchical attention mechanism to predict multi-level recommendations.

Tools used: Python3, Tensorflow2, Matlab, DeepFM, DeepFFM, Skipgram encoder, Catboost, Random Forest, Multihead Attention, GraphSAGE, GCN

Mar 2020 – Feb 2021

CIKM Adversarial Challenge on Object Detection (Competition)

Manager, AI Garage, Mastercard

Competed in an AI challenge by Alibaba-Tsinghua on fooling an object detection algorithm using patched images. The competition involves over 1500 teams and our current ranking stands at 80. The competition marks an important step in improving defences against adversarial attacks on image classification and object detection algorithms. Landed in the top 10% of solutions.

Tools used: Python3, Pytorch, Yolo4, Fast-RCNN, Dpatch, Adversarial Robustness toolbox (ART)

Jul 2020 – Oct 2020

Proctoring video interviews using computer vision

General Manager, Data Science, Reliance Industries

Built a video analysis platform for recordings collected from python based conference tool Jitsi. The capabilities include face detection, face landmarks, face orientation, face recognition and matching, audio extraction, transcript extraction, audio pause detection, topic modelling and other minor features. The video frame-level signals act as input for unsupervised anomaly detection methods that detect out-of-the-ordinary behaviour, and the results are compared against a rule engine to flag areas of interest for the hiring manager.

Tools used: Python3, CV2, open-cv, Dlib, Haar classifiers, U-Net encoder-decoder based anomaly detection, Tensorflow2.

Mar 2019 – Jan 2020

Natural Language querying of databases

General Manager, Data Science, Reliance Industries

Building a Python framework which allows natural language querying on small-to-medium-scale databases by using seq2seq neural networks to translate a natural language question into a SQL query. The model is capable of predicting the search and condition columns, conditions and aggregations needed in the SQL query, which is then run on the given database. The result is passed through natural language generation to respond to the user with an answer to the query.
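As a rough illustration of the last step described above, the sketch below assumes the network has already predicted the "slots" (select column, aggregation, conditions) and only shows how they might be assembled into a parameterized query and run. The function `build_query`, the whitelists and the `employees` table are hypothetical, not part of the actual framework.

```python
import sqlite3

ALLOWED_AGGS = {None, "COUNT", "SUM", "AVG", "MIN", "MAX"}
ALLOWED_OPS = {"=", "<", ">", "<=", ">=", "!="}


def build_query(table, select_col, agg, conditions):
    """Assemble a parameterized SQL string from predicted slots.

    conditions: list of (column, operator, value) triples.
    Returns (sql, params); values are bound as parameters, never inlined.
    """
    names = [table, select_col] + [column for column, _, _ in conditions]
    if not all(name.isidentifier() for name in names):
        raise ValueError("table and column names must be plain identifiers")
    if agg not in ALLOWED_AGGS:
        raise ValueError("unsupported aggregation")

    target = f"{agg}({select_col})" if agg else select_col
    sql = f"SELECT {target} FROM {table}"
    clauses, params = [], []
    for column, op, value in conditions:
        if op not in ALLOWED_OPS:
            raise ValueError("unsupported operator")
        clauses.append(f"{column} {op} ?")
        params.append(value)
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


# Hypothetical database and predicted slots for a question such as
# "What is the average salary in IT?"
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, dept TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Asha", "HR", 50.0), ("Ravi", "IT", 70.0), ("Meera", "IT", 90.0)],
)
sql, params = build_query("employees", "salary", "AVG", [("dept", "=", "IT")])
result = conn.execute(sql, params).fetchone()[0]
```

Restricting the model's output to a small set of slots, rather than free-form SQL text, keeps every generated query syntactically valid and makes it easy to validate names against the schema before execution.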

Tools used: Python, NLP, Word embeddings, Seq2Sql, Seq2seq with attention, SimpleNLG, Xsql framework, Keras, Scikit-learn.

Dec 2018 – May 2019

JD-CV matching algorithm for candidate shortlisting

General Manager, Data Science, Reliance Industries

Building a CV sourcing and shortlisting platform that allows hiring managers to access a ranked list of profiles matching the requirement. These profiles are enriched using multiple data sources and are parsed to extract education, experience, skillsets, project and personal information. This is followed by document clustering to obtain the relevant domain cluster, and document similarity (ranking) algorithms to match the JD document to profiles. A reinforcement learning layer is being added to capture and personalise hiring manager preferences and behaviours while ensuring company standards and requirements are met.

Tools used: Python, NLP, Word embeddings, t-SNE, Doc2vec, PCA, Spacy, fuzzy matching, GMM, document classification using LSTMs, reinforcement learning, Keras.

August 2018 – Ongoing

Distributed Virtual Assistant Development Toolkit

General Manager, Data Science, Reliance Industries

Building a Python-based tool allowing non-technical users to design, train and deploy closed-domain virtual assistants using a GUI. The bots are then integrated into a meta-model that allows intermediate intent switching to an intent on another bot deployed on some other server. It also allows users to integrate APIs during any part of the conversation (for assisting the user by fetching data, validating user inputs against a database or completing a transaction on a service such as travel bookings, leave/regularization systems, HR queries etc). Integration with live systems and applications is ongoing.

Tools used: Python, NLP, NLG, RASA framework, entity extraction, Markov chains, LSTM based neural networks, Django, Docker, nginx, Keras, Scikit-learn.

June 2018 – Ongoing

Course Recommendation Engine for Reliance LMS

General Manager, Data Science, Reliance Industries

Productionalized a course recommendation engine for 30,000+ employees which integrates various businesses at the user end and various learning partners of Reliance at the content end. Utilized employee demographics and organisational data to create multiple recommendation systems, integrated via a multi-armed-bandit-based architecture to personalise each user’s experience. Techniques used include matrix decompositions, fuzzy logic, collaborative filtering, association models, context clustering and reinforcement learning.
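To illustrate how a bandit layer can arbitrate between several recommenders, here is a minimal epsilon-greedy sketch. The entry above does not say which bandit strategy was used, so epsilon-greedy is an assumption, as are the class name, the three recommender names and the simulated click-through rates.

```python
import random


class EpsilonGreedyBandit:
    """Picks one of several recommenders ("arms") per request."""

    def __init__(self, arms, epsilon=0.1, seed=0):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {arm: 0 for arm in self.arms}
        self.values = {arm: 0.0 for arm in self.arms}  # running mean reward

    def select(self):
        # Explore with probability epsilon, otherwise exploit the best arm so far.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)
        return max(self.arms, key=self.values.get)

    def update(self, arm, reward):
        # Incremental update of the mean reward observed for this arm.
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


# Hypothetical recommenders with made-up click-through rates (simulation only).
true_click_rates = {"popularity": 0.1, "content_based": 0.3, "collaborative": 0.6}
bandit = EpsilonGreedyBandit(true_click_rates, epsilon=0.1, seed=42)
environment = random.Random(7)
n_rounds = 5000
for _ in range(n_rounds):
    arm = bandit.select()
    reward = 1.0 if environment.random() < true_click_rates[arm] else 0.0
    bandit.update(arm, reward)
```

Over enough rounds the bandit routes most traffic to the recommender with the highest observed reward while still sampling the others occasionally, which is what allows per-user personalisation to keep adapting.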

Tools used: Python, Text analysis, NLP, Collaborative filtering, SVD, Search strategies, Multi-armed Bandits, Reinforcement Learning, Scikit-learn.

Oct 2017 – May 2018

Employee Car-pooling service using Geo-Spatial clustering

General Manager, Data Science, Reliance Industries

Designed an unsupervised model over the employee address database to create geospatial clusters based on density of residence across the map, and used polygon matching with dynamic programming to calculate the delta between driver & passenger routes. This was followed by a route optimization algorithm using network analysis of the graph of clusters, and then a match-making model for route matching which estimated polygon similarity between the optimal (estimated) routes of the passengers and the car driver. This model is currently being housed in a B2B employee services module called Share-a-Ride.

Tools used: Python, Google Enterprise API, Hierarchical DBScan, Dynamic Programming, Network centralities and route optimization, Polygonal similarity techniques.

Sep 2017 – Oct 2018

Expression & Empathy Detection

General Manager, Data Science, Reliance Industries

A two part module which involves expression detection using image processing applied over a live camera feed (interview) utilizing OpenCV and NLP based empathy detection algorithm over text data (emails, skype, communities) utilizing SVM over a 60 million tweet dataset categorized by types of empathy and emotions. This module is being housed in various upcoming systems which improve the quality of hire and employee services as part of the PMS 2.0 project.

Tools used: Python, NLP, Word2Vec, Naive bayes, SVM, OpenCV, Tensorflow.

Mar 2018 – June 2018

Viewer interest prediction on Rental Listings on Renthop (Kaggle)

Kaggler, ranked top 7% globally

The objective was to predict how popular an apartment rental listing is based on the listing content, such as text description, photos, number of bedrooms, price, etc. The data comes from renthop.com, an apartment listing website. I created an ensemble model using xgboost wrapped in a cross validator, stacked over KazAnova's StackNet with random forest and SVM. Features used included basic features, simple calculated features, constructed features over manager_id using tf-idf and clustered longitude-latitude positions, and finally the "magic feature". Model iterations were done with parameter tuning, followed by averaging and geometric mean of predictions. The evaluation metric was log loss, and my best model got me a top 7% global ranking on Kaggle.
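The blending and scoring steps mentioned above can be sketched in a few lines. This is a generic illustration of a geometric-mean blend and multi-class log loss, not the competition code; the two "models" and their probabilities are invented sample values.

```python
import math


def log_loss(y_true, y_prob, eps=1e-15):
    """Multi-class log loss: mean negative log-probability of the true class."""
    total = 0.0
    for label, probs in zip(y_true, y_prob):
        p = min(max(probs[label], eps), 1 - eps)  # clip to avoid log(0)
        total -= math.log(p)
    return total / len(y_true)


def geometric_blend(predictions):
    """Blend several models' class probabilities with a geometric mean.

    predictions: list of models, each a list of rows of class probabilities.
    Each blended row is renormalized so that it sums to 1.
    """
    blended = []
    for rows in zip(*predictions):  # the same sample across all models
        geo = [math.prod(vals) ** (1 / len(rows)) for vals in zip(*rows)]
        norm = sum(geo)
        blended.append([g / norm for g in geo])
    return blended


# Invented probabilities for two samples over three interest levels.
model_a = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]
model_b = [[0.5, 0.3, 0.2], [0.2, 0.2, 0.6]]
y_true = [0, 2]

blended = geometric_blend([model_a, model_b])
score = log_loss(y_true, blended)
```

Compared with a simple arithmetic average, the geometric mean penalizes classes that any single model considers very unlikely, which can help when the metric is log loss.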

Tools used: Python, NLTK, SVM, K-Means, Random Forest, XGBoost with Cross Validation, StackNet by KazAnova.

Mar 2017 – May 2017

Recruitment decision making tool called Careerletics Enterprise

Head of DS Products, iPredictt

Careerletics Enterprise is an intelligent platform for recruiters which assists them with pre-hire decision making and reduces the hiring lifecycle from a few weeks to a few minutes. It assists a recruiter by parsing resume data, quantifying candidate metrics, calculating relevance against a job description and ranking candidates by a metric called employability score. First, an exhaustive database linking industries & functions to skill sets, companies, job positions and colleges was created by using natural language processing over a database of half a million resume documents (without any specific template). This database was then utilized to identify qualification, skills and experience information from user resumes via a parser. This was coupled with a chatbot to collect missing candidate information directly. Next, a stacked model for filtering, relevance matching, and competitive ranking was developed. Candidates who were finally selected by the recruiter are captured and used as feedback for the self-learning algorithm to adjust parameter weights. The platform and algorithm are patented under iPredictt Data Science Labs.

Tools used: Python, NLTK, Expectation maximization, Gaussian mixture model, Gradient Boosting, PCA.

Jul 2016 – Aug 2017

Analysis of Political Affiliation and Sentiment over Social Media

Lead Data Scientist, iPredictt

The objective was to understand the sentiment of a popular Indian News Network with respect to different political parties over Twitter and Facebook and compare the sentiments of other competitor news networks against it. Tweepy & web scraping were used to pull data from Twitter and Facebook, followed by data cleaning, feature generation, and NLP treatment to generate a sentiment report. The areas of analysis included comparing political party affiliation, quantifying shared sentiment across newsgroups, detecting targeted negative propaganda over social media and forecasting topic-wise sentiment over Twitter.

Tools used: Python, Tweepy, NLTK, Topic Modelling, Sentiment Analysis.

Jan 2016 – May 2016

Optimize Ad Exchange networks for increasing campaign value

Lead Data Scientist, iPredictt

The objective was to create a platform for a 60cr turnover Mobile Ad Exchange startup to optimize ad campaign time and direction, which involves selecting the right publisher for the advertising campaign as a factor of time of the day, conversion rates, customer target category and network type. Variable importance was calculated via decision trees to categorize publisher efficiency and thus analyze trends better, while click probability for cookie ids was calculated by building a logistic model. The campaign statistics were visualized using charts and Sankey diagrams over an R-Shiny server.

Tools used: R, R-Shiny, Decision trees, Random Forest, Logistic regression.

Jul 2015 – Dec 2015

Supply Chain network optimization and planning

Senior Decision Scientist, Mu-Sigma

The client was a Fortune 50 multinational computer technology giant. The project objective was to analyze backlogs and develop a network flow optimization model for the Americas, EMEIA and Asia logistics teams to enhance the efficiency of their respective supply chains. A model was built on 3 years of backlog data with stage-wise & SKU-wise flows starting from Manufacturing to Fulfillment Centers/Customers. Missing data were imputed using decision trees, followed by linear programming to minimize the objective function of the number of backlogs in each network. The resulting model was visualized using Tableau and shared with 1,000+ stakeholders and executives from the Singapore, Austin, Hong Kong, London, Korea and India offices.

Tools used: SQL, R, Decision Trees, Linear Programming, Tableau.

Nov 2014 – Mar 2015

Theoretical Win prediction for customers of a Casino Giant

Senior Decision Scientist, Mu-Sigma

A Fortune 500 Casino Giant used certain business rules to calculate ADT (Accumulated daily theoretical win) for each of their customers to decide the category of their marketing spend, and these rules had an extremely low accuracy (32%). The objective was to build a regression model to predict ADT values for customers based on gambling spends, wins and other visit information. An ensemble model was created based on analysis of variation in the test variable (ADT). A certain segment of the customer population (which was primarily low spend customers) was tackled using generalized linear models, while the remaining segment (which comprised primarily high spend customers) was tackled using 11 separate Support Vector Machine classification models. The test variable for these was bucketed into spend categories instead of using a continuous ADT value. At 53%, the accuracy of this model was much higher than that of the rule-based baseline. The exercise was followed by creating a financial modeling simulator using these predictions to generate best and worst case profit/loss scenarios over variable marketing spends.

Tools used: Python, ANOVA, K-means clustering, Support vector machines, Monte-Carlo simulation.

Jun 2014 – Oct 2014

Driver analysis for market cannibalisation

Decision Scientist, Mu-Sigma

The 2nd largest toy manufacturer brand showed a quarter-on-quarter ROI decline of 20%, which amplified further during the latest holiday season. A clear understanding was required of the prime causes of this decline. A five-dimension deterministic model was created to analyze parameters calculated through web analytics. This model was then passed through regression analysis to generate estimates for each parameter as a proxy for its contribution to the sales decline. A major realization by the end of the exercise was that the decline was primarily due to cannibalization by a fresh brand they had launched themselves, but for a higher age category. This allowed them to take major decisions in time to stabilize the curve to around 8% decline in the coming quarter and also affected the launch dates of their upcoming brands.

Tools used: R, Deterministic modeling, Web analytics, Generalized regression models.

Dec 2013 – May 2014

Customer Segmentation and Targeting for retail products

Decision Scientist, Mu-Sigma

The client was the world's biggest home improvement retail company. The objective was to create customer segments based on their behavioral traits, spend patterns and volatility in purchase categories, which would allow the client to understand and target customers better. Customer segmentation based on transaction data was done using RFM segmentation, followed by item-based and user-based collaborative filters to create purchase category recommendations for customized targeting. This directly affected the client’s top line for specific departments such as gardening and home repair.
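For readers unfamiliar with RFM segmentation, the sketch below shows how recency, frequency and monetary values can be derived from transaction-level data. The original work was done in SQL, Excel and R; this Python version, the function name `rfm_table` and the sample transactions are illustrative assumptions only.

```python
from collections import defaultdict
from datetime import date


def rfm_table(transactions, today):
    """Compute (recency_days, frequency, monetary) per customer.

    transactions: iterable of (customer_id, purchase_date, amount).
    today: reference date used to measure recency.
    """
    last_seen = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for customer, day, amount in transactions:
        last_seen[customer] = max(day, last_seen.get(customer, day))
        frequency[customer] += 1
        monetary[customer] += amount
    return {
        customer: ((today - last_seen[customer]).days,
                   frequency[customer],
                   monetary[customer])
        for customer in last_seen
    }


# Invented sample transactions.
transactions = [
    ("c1", date(2013, 10, 1), 120.0),
    ("c1", date(2013, 10, 20), 80.0),
    ("c2", date(2013, 6, 5), 40.0),
]
table = rfm_table(transactions, today=date(2013, 11, 1))
```

Each of the three values is then typically binned (for example into quantiles) and the combined score is used to define the segments that feed the downstream targeting.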

Tools used: SQL, Excel, R, RFM Segmentation, Collaborative Filtering.

May 2013 – Nov 2013

Real Time in-store traffic analysis using Brickstream

Decision Scientist, Mu-Sigma

The client was the world's biggest home improvement retail company. They were on a pilot with Brickstream, a video analytics software which uses aisle camera footage to virtually create trip lines and dwell zones. Exhaustive reports were created on a weekly basis from the data collected by Brickstream-enabled cameras. Trip line analysis allowed the client to predict traffic hours in real time and accordingly align their store associates for the coming days/weeks, thereby improving resource management. Dwell analysis enabled the client to understand dwell times of customers at specific aisle positions, thereby enabling them to take decisions on shelf space management.

Tools used: Brickstream, SQL, R, Video Processing

Nov 2012 – Apr 2013

Naive Bayes and its different classifiers, How do they work?

Mumbai, 22nd Jan 2019
#DataScience #NaiveBayes #Notebook

In this notebook I try to explore the intuition behind the probabilistic modelling technique called Naive Bayes and its various classifiers. Understanding the implementation of each of these classifiers is important as they come with their own assumptions on top of the naive assumption of the algorithm itself. A deeper look at the implementation of these classifiers and the mathematics behind them can shed more light on the intuition behind this widely used algorithm, which forms a foundation for a large number of other more complex methods in supervised classification....
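As a taste of what one of these classifiers looks like in code, here is a minimal multinomial Naive Bayes with Laplace smoothing written from scratch. It is a teaching sketch rather than the notebook's own implementation; the class name, the toy documents and the labels are made up for illustration.

```python
import math
from collections import Counter, defaultdict


class MultinomialNB:
    """Multinomial Naive Bayes over token lists, with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, documents, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocabulary = set()
        for tokens, label in zip(documents, labels):
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        return self

    def predict(self, tokens):
        n_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(count / n_docs)  # log prior
            for token in tokens:
                if token not in self.vocabulary:
                    continue  # ignore words never seen in training
                # Naive assumption: tokens are independent given the class.
                likelihood = (self.word_counts[label][token] + self.alpha) / (
                    total_words + self.alpha * vocab_size
                )
                score += math.log(likelihood)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data (invented).
documents = [
    ["cheap", "pills", "offer"],
    ["meeting", "agenda", "notes"],
    ["cheap", "offer", "now"],
    ["project", "meeting", "notes"],
]
labels = ["spam", "ham", "spam", "ham"]
model = MultinomialNB().fit(documents, labels)
prediction = model.predict(["cheap", "offer"])
```

The Bernoulli and Gaussian variants differ only in how the per-feature likelihood is modelled; the log-prior plus sum-of-log-likelihoods structure stays the same.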



Handling missing data (like a boss!)

New Delhi, 28th Jun 2017
#DataScience #MissingData #Notebook

Missing data is the nemesis of every data scientist, especially if they are still new to the field. We are all fascinated by new algorithms and don't miss a single chance to apply them on every dataset we can get our hands on. But, alas, missing data becomes a major barrier to that dream unless it can be handled properly. I learnt my lesson long back and while I know people who are wizards at data handling (one function to rule them all), I fortunately/unfortunately prefer simple short steps to handle missing data so as to verify each step. With what little Python programming I can muster, I present to you the standard and advanced Missing Data handling techniques I have learnt over the years....
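In the spirit of "simple short steps", the sketch below shows one common standard technique: median imputation with an explicit missing-value indicator. It is an assumed example and not taken from the notebook; the function name `impute_median` and the sample rows are hypothetical.

```python
from statistics import median


def impute_median(rows, column):
    """Fill missing values (None) in one column with the column median.

    Returns new rows and leaves the input untouched, so that each step can
    be verified. A "<column>_was_missing" flag records which values were filled.
    """
    observed = [row[column] for row in rows if row[column] is not None]
    fill_value = median(observed)
    imputed = []
    for row in rows:
        new_row = dict(row)
        new_row[f"{column}_was_missing"] = row[column] is None
        if row[column] is None:
            new_row[column] = fill_value
        imputed.append(new_row)
    return imputed


# Invented sample data with one missing age.
rows = [{"age": 25}, {"age": None}, {"age": 35}, {"age": 45}]
clean_rows = impute_median(rows, "age")
```

Keeping the indicator column lets a downstream model learn whether missingness itself carries signal, instead of silently treating the filled value as observed.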



Visualizations for EDA in Python

New Delhi, 15th Apr 2017
#DataScience #Visualizations #Seaborn #Notebook

Exploring distributions, correlations and data variability is one of the most important tasks before feature engineering begins. One may consider this the first major project stage a Data Scientist needs to be able to perform. A thorough exploration has not only helped me understand the data at hand, but also form basic notions about the ballparks and behavior of given features. Ballparks help a data scientist avoid logical errors during feature engineering. In this notebook I detail the most effective way I found to generate charts in a standardized way for understanding data during the exploration phase of the project. These primarily utilize the grid method to create a canvas and then plot charts by categories. In a sense, it's similar to data grouping in a visual way....



Group by & Aggregate using Pandas

New Delhi, 25th Mar 2017
#DataScience #DataGrouping #Notebook

Data Grouping is probably the most used concept in the field of data analysis. Almost every scripting language builds its foundation over grouping data by categories of a multi-dimensional variable. A data scientist uses this for summarizing data for analysis as well as changing the level at which data can be useful for a model. For example, transaction-level data needs to be summarized into customer-level data before predicting customer spend. Use cases like these are where languages like SQL are very useful with their group by clauses. However, Python isn't too far behind. Pandas provides a large variety of methods which do so much more than the standard SQL grouping. This, combined with the aggregate methods, gives a Data Scientist a strong grasp over data handling....
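The transaction-to-customer roll-up mentioned above can be sketched as follows. To keep the example dependency-free it uses plain Python, with the roughly equivalent Pandas call shown in a comment; the field names and sample transactions are invented for illustration.

```python
from collections import defaultdict

# Invented transaction-level data.
transactions = [
    {"customer_id": "c1", "amount": 120.0},
    {"customer_id": "c2", "amount": 40.0},
    {"customer_id": "c1", "amount": 80.0},
]


def summarize_by_customer(rows):
    """Group transactions by customer and aggregate total, count and mean."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        totals[row["customer_id"]] += row["amount"]
        counts[row["customer_id"]] += 1
    return {
        customer: {
            "total": totals[customer],
            "count": counts[customer],
            "mean": totals[customer] / counts[customer],
        }
        for customer in totals
    }


summary = summarize_by_customer(transactions)

# With Pandas, the same roll-up would look roughly like:
#   pd.DataFrame(transactions).groupby("customer_id")["amount"].agg(["sum", "count", "mean"])
```

The result has one row per customer, which is the level at which a spend-prediction model would consume the data.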