This dataset contains personal medical data for 1,338 patients, with several variables that have an effect on the cost of the medical services provided. The purpose of the analysis is to measure how variables such as age, gender, and region affect the cost of medical care.
The following variables are used in the analysis:
- Age: age of the primary beneficiary
  - Teen: 13-19
  - Adult: 20-39
  - Middle Aged Adult: 40-59
  - Senior Adult: 60+
- Sex: gender of the insurance contractor
  - Male
  - Female
- Body Mass Index (BMI): measure of body weight relative to height (kg/m²)
  - Underweight: below 18.5
  - Normal: 18.5 - 24.9
  - Overweight: 25.0 - 29.9
  - Obese: greater than or equal to 30.0
- Children: number of dependents
- Smoker: smoking status (smoker or non-smoker)
- Region: region of domicile of the patient in the United States
  - Northeast
  - Southeast
  - Southwest
  - Northwest
- Charges: individual medical costs billed by health insurance for services provided
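The age and BMI bands above can be expressed as two small lookup functions. This is a minimal sketch, not the method used in the analysis: the function names are hypothetical, ages below 13 return `None` because no band is defined for them, and BMI values that fall between the listed bands (for example 24.95) are assigned to the lower band, which is an assumption.

```python
from typing import Optional


def age_category(age: int) -> Optional[str]:
    """Map an age in years to the report's age bands (None if below 13)."""
    if age < 13:
        return None  # the report defines no band below 13
    if age <= 19:
        return "Teen"
    if age <= 39:
        return "Adult"
    if age <= 59:
        return "Middle Aged Adult"
    return "Senior Adult"


def bmi_category(bmi: float) -> str:
    """Map a BMI value to the report's BMI bands.

    Assumption: half-open intervals, so 24.9 < bmi < 25.0 counts as Normal
    and 29.9 < bmi < 30.0 counts as Overweight.
    """
    if bmi < 18.5:
        return "Underweight"
    if bmi < 25.0:
        return "Normal"
    if bmi < 30.0:
        return "Overweight"
    return "Obese"
```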
The analysis seeks to answer the following questions:
- Which region has the lowest cost of medical care?
- Which gender has the highest typical medical charges?
- Is there a correlation between the number of dependents a patient has and medical charges?
- Is there a correlation between age and medical charges?
- Do individuals with higher BMI have higher medical charges?
- What is the relationship between smoking and medical charges?
- What region has the lowest cost of medical care?
The overall average charge in the dataset is $13,270.42. However, charges are unevenly distributed: as the histogram above indicates, a right skew pulls the average upward. The median charge, $9,377.90, is a better indication of a typical medical charge.
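The effect of a right skew on the average can be illustrated with a few lines of Python. The `charges` list below is made up for illustration and is not taken from the dataset; only `reported_mean` and `reported_median` are the figures quoted above.

```python
import statistics

# Made-up charges (NOT from the dataset): mostly modest values plus one large one.
charges = [2_000, 4_000, 6_000, 8_000, 60_000]

mean_charge = statistics.mean(charges)      # pulled upward by the 60,000 value
median_charge = statistics.median(charges)  # middle value, unaffected by the outlier

# Relative gap between the mean and median reported in the text.
reported_mean = 13_270.42
reported_median = 9_377.90
relative_gap = (reported_mean - reported_median) / reported_median

print(f"toy mean={mean_charge}, toy median={median_charge}")
print(f"reported mean exceeds reported median by {relative_gap:.1%}")
```

With the reported figures, the mean sits roughly 41.5% above the median, which is the size of the distortion the skew introduces.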
The same is true for the distribution of charges within each of the four regions, so the median is again the better indication of the typical medical charge in each region. The Southwest region has the lowest median charge of all the regions in the dataset, $8,798.59. This may be partly explained by smoking: the Southwest region has the fewest smokers overall (58), the fewest female smokers (21), and the second-fewest male smokers (37). This matters because smokers incur charges that are, on average, nearly four times those of non-smokers.
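A per-group median such as the regional figure above could be computed along the following lines. This is a sketch under assumptions: the helper names are hypothetical, and the column names `region` and `charges` and the CSV layout are assumed rather than confirmed by the text.

```python
import csv
import statistics
from collections import defaultdict


def median_by_group(rows, group_key, value_key="charges"):
    """Return {group: median of value_key} for an iterable of dict-like rows."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(float(row[value_key]))
    return {name: statistics.median(values) for name, values in groups.items()}


def load_rows(path):
    """Read a CSV file with a header row into a list of dicts."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle))
```

For example, `median_by_group(load_rows("insurance.csv"), "region")` would return one median per region, where `insurance.csv` is a placeholder file name. Passing a different `group_key` gives the same summary for any other categorical column.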
- Which gender has the highest typical charges?
The charge distributions for both males and females are right-skewed, and outliers inflate the average charges.
The median charge for each sex is therefore a better indicator of the typical charge. As the bar chart above indicates, males have a median charge of $9,369.62 and females have a median charge of $9,412.96, so females have the slightly higher typical charge.
- Is there a correlation between the number of dependents a patient has and medical charges?
As the scatter plot above indicates, there is essentially no linear correlation between the number of dependents and charges. The correlation coefficient between the number of children a patient has and the charges incurred is 0.07, which is too close to zero to indicate a meaningful linear relationship. This is also evident in the line of best fit on the scatter plot above.
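For reference, a correlation coefficient like the one quoted above can be computed as follows. This is a plain implementation of the Pearson coefficient, assuming that is the coefficient the analysis used; the function name is hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)
```

Squaring the coefficient gives the share of variance a straight-line fit accounts for: 0.07 squared is about 0.005, so under a linear model the number of dependents accounts for roughly half a percent of the variation in charges.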
- Is there a correlation between age and medical charges?
There is a weak positive correlation between age and the charges incurred by patients, as indicated by the correlation coefficient of 0.30 and the line of best fit in the scatter plot above. The average charges for each age category show the same trend: Teens have an average medical charge of $8,407.35, Adults have an average charge of $10,603.65, Middle Aged Adults have an average medical charge of $15,431.97, and Seniors have an average medical charge of $21,248.02. There is a clear pattern of increasing cost as the age of the patient increases.
- Do individuals with higher BMI have higher medical costs?
Yes, individuals with higher BMIs have higher medical costs. As the line chart above indicates, the higher a patient’s BMI, the higher the average charges.
- What is the relationship between smoking and medical charges?
Smoking is associated with substantially higher medical charges. As the bar chart above indicates, smokers have an average charge of $32,050.23 and non-smokers have an average medical charge of $8,434.27. The average charge for smokers is nearly four times that of non-smokers.
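The "nearly four times" figure follows directly from the two averages quoted above. A short check of the arithmetic, using only those two reported values:

```python
# Averages reported in the text.
smoker_mean = 32_050.23
non_smoker_mean = 8_434.27

ratio = smoker_mean / non_smoker_mean       # how many times larger
difference = smoker_mean - non_smoker_mean  # absolute gap in dollars

print(f"ratio: {ratio:.2f}x, difference: ${difference:,.2f}")
# prints "ratio: 3.80x, difference: $23,615.96"
```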
As seen in the histogram above, the distribution of charges for smokers is bimodal. The two modes correspond to two BMI subgroups: patients with a BMI below 30 and patients with a BMI of 30 or above. Recall that a BMI of 30 or above is categorized as obese.
The histogram above shows a slight right skew, so the median is a better indication of the typical charge for a smoker with a BMI below 30. The median charge is $20,167.34.
The histogram above shows a roughly symmetric distribution of charges with very few outliers, so the average is a reasonable summary. The average charge for smokers with a BMI of 30 or above is $41,557.99.
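The split of smokers into two BMI subgroups can be sketched as below. The record layout (dicts with `smoker`, `bmi`, and `charges` keys, with `smoker` as a boolean) and the function name are assumptions for illustration. The sketch reports the median for the lower-BMI group and the mean for the higher-BMI group, mirroring the summaries chosen above.

```python
import statistics

OBESE_BMI = 30.0  # threshold for the obese category


def summarize_smokers_by_bmi(records):
    """Summarize smoker charges for the BMI < 30 and BMI >= 30 subgroups."""
    below = [r["charges"] for r in records
             if r["smoker"] and r["bmi"] < OBESE_BMI]
    at_or_above = [r["charges"] for r in records
                   if r["smoker"] and r["bmi"] >= OBESE_BMI]
    return {
        # Median for the right-skewed subgroup, mean for the symmetric one.
        "bmi_below_30_median": statistics.median(below) if below else None,
        "bmi_30_plus_mean": statistics.mean(at_or_above) if at_or_above else None,
    }
```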