GitHub - DataScienceWorks/AV-McKinseyHackathon-InsuranceRenewal-2018-July

McKinsey Hackathon - Insurance Renewal

[Note: The problem is solved only partially. That is part A is solved. Didn't know how to proceed with Part B of the problem statement.]

About McKinsey Analytics Online Hackathon

During our online Hackathon, you will get a glimpse at the sort of problems and challenges that our McKinsey data scientists solve on a daily basis. The winner will receive a NIPS conference ticket and 5,000 USD for personal expenses. The best participants may also be invited for interviews with McKinsey Analytics.

Problem Statement

Your client is an Insurance company and they need your help in building a model to predict the propensity to pay renewal premium and build an incentive plan for its agents to maximise the net revenue (i.e. renewals - incentives given to collect the renewals) collected from the policies post their issuance.

You have information about past transactions from the policy holders along with their demographics. The client has provided aggregated historical transactional data like number of premiums delayed by 3/ 6/ 12 months across all the products, number of premiums paid, customer sourcing channel and customer demographics like age, monthly income and area type.

In addition to the information above, the client has provided the following relationships:

Expected effort in hours put in by an agent for incentives provided; and
Expected increase in chances of renewal, given the effort from the agent.

Given the information, the client wants you to predict the propensity of renewal collection and create an incentive plan for agents (at policy level) to maximise the net revenues from these policies.

EVALUATION CRITERIA

Your solutions will be evaluated on 2 criteria:

The base probability of receiving a premium on a policy without considering any incentive
The monthly incentives you will provide on each policy to maximize the net revenue

Part A:

The probabilities predicted by the participants would be evaluated using AUC ROC score.

Part B:

The net revenue across all policies will be calculated in the following manner:

$p_{benchmark}$ is the renewal probability predicted using a benchmark model by the insurance company
∆p (% Improvement in renewal probability*$p_{benchmark}$) is the improvement in renewal probability calculated from the agent efforts in hours
‘Premium on policy’ is the premium paid by the policy holder for the policy in consideration
‘Incentive on policy’ is the incentive given to the agent for increasing the chance of renewal (estimated by the participant) for each policy

ProTip: p_benchmark is decided by the company. You can consider it as the best solution for the problem. More accurate your model, closer you are to the p_benchmark.

The following curve provide the relationship between extra effort in hours invested by the agent with Incentive to the agent and % improvement in renewal probability vs agent effort in hours.

Relationship b/w Extra efforts in hours invested by an agent and Incentive to agent. After a point more incentives does not convert to extra efforts.

Equation for the effort-incentives curve: Y = 10*(1-exp(-X/400))
Relationship between % improvement in renewal probability vs Agent effort in hours. The renewal probability cannot be improved beyond a certain level even with more efforts.

Equation for the % improvement in renewal prob vs effort curve: Y = 20*(1-exp(-X/5))

Note: The client has used sophisticated psychological research to arrive at these relationships and you can assume them to be true.

Leaderboard Ranking

Overall Ranking at the leaderboard would be done using the following equation:

Combined Score = w1AUC-ROC value + w2(net revenue collected from all policies)*lambda

Where -

w1 = 0.7

w2 = 0.3

lambda is a normalizing factor

Public and Private Split:

Public leaderboard is based on 40% of the policies, while private leaderboard will be evaluated on remaining 60% of policies in the test dataset.

Data

train.csv

It contains training data for customers along with renewal premium status (Renewed or Not?)

Variable	Definition
id	Unique ID of the policy
perc_premium_paid_by_cash_credit	Percentage of premium amount paid by cash or credit card
age_in_days	Age in days of policy holder
Income	Monthly Income of policy holder
Count_3-6_months_late	No of premiums late by 3 to 6 months
Count_6-12_months_late	No of premiums late by 6 to 12 months
Count_more_than_12_months_late	No of premiums late by more than 12 months
application_underwriting_score	Underwriting Score of the applicant at the time of application (No applications under the score of 90 are insured)
no_of_premiums_paid	Total premiums paid on time till now
sourcing_channel	Sourcing channel for application
residence_area_type	Area type of Residence (Urban/Rural)
premium	Monthly premium amount
renewal	Policy Renewed? (0 - not renewed, 1 - renewed

test.csv

Additionally test file contains premium which is required for the optimizing the incentives for each policy in the test set.

Variable	Definition
id	Unique ID of the policy
perc_premium_paid_by_cash_credit	Percentage of premium amount paid by cash or credit card
age_in_days	Age in days of policy holder
Income	Monthly Income of policy holder
Count_3-6_months_late	No of premiums late by 3 to 6 months
Count_6-12_months_late	No of premiums late by 6 to 12 months
Count_more_than_12_months_late	No of premiums late by more than 12 months
application_underwriting_score	Underwriting Score of the applicant at the time of application (No applications under the score of 90 are insured)
no_of_premiums_paid	Total premiums paid on time till now
sourcing_channel	Sourcing channel for application
residence_area_type	Area type of Residence (Urban/Rural)
premium	Monthly premium amount

sample_submission.csv

Please submit as per the given sample submission format only

Variable	Definition
id	Unique ID for the policy
renewal	Predicted Renewal Probability
incentives	Incentives for agent on policy

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
data		data
images		images
.gitignore		.gitignore
1-data-wrangling.ipynb		1-data-wrangling.ipynb
2-eda.ipynb		2-eda.ipynb
3-modeling-logistic-regression.ipynb		3-modeling-logistic-regression.ipynb
McKinsey Analytics Online Hackathon.pdf		McKinsey Analytics Online Hackathon.pdf
ReadMe.md		ReadMe.md
SolutionReadMe.md		SolutionReadMe.md
util.py		util.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

McKinsey Hackathon - Insurance Renewal

About McKinsey Analytics Online Hackathon

Problem Statement

EVALUATION CRITERIA

Part A:

Part B:

Leaderboard Ranking

Public and Private Split:

Data

train.csv

test.csv

sample_submission.csv

About

Releases

Packages

Languages

DataScienceWorks/AV-McKinseyHackathon-InsuranceRenewal-2018-July

Folders and files

Latest commit

History

Repository files navigation

McKinsey Hackathon - Insurance Renewal

About McKinsey Analytics Online Hackathon

Problem Statement

EVALUATION CRITERIA

Part A:

Part B:

Leaderboard Ranking

Public and Private Split:

Data

train.csv

test.csv

sample_submission.csv

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages