Performing exploratory analysis through visualizations to showcase relationships between variables and present results to stakeholders.

willenny/PyBer_Analysis

PyBer Analysis

Overview

After being hired by PyBer, a Python-based ride-sharing app company, I was asked to perform an exploratory analysis of data in two CSV files. With the help of Omar and Sasha, I created several types of visualizations that tell a compelling story about the data, which I will present to the CEO of PyBer, V. Isualize. While working on the presentation, I wrote Python scripts in a Jupyter notebook, using the Pandas and Matplotlib libraries, to create a summary DataFrame and a variety of charts that showcase the relationship between the type of city and the number of drivers and riders, as well as the percentage of total fares, riders, and drivers by type of city.

Purpose

The purpose of this analysis was to highlight the differences between city types and how these types affect Total Rides, Total Drivers, Total Fares, Average Fare per Ride, and Average Fare per Driver. By grouping rides into the three city types, the tendencies become clearer, which allows better decisions to be made. The analysis and visualizations that are presented to V. Isualize will help PyBer improve access to ride-sharing services and determine affordability for underserved neighborhoods.

Resources

  • Data Source: city_data.csv, ride_data.csv
  • Software: Anaconda Prompt (Python Data), Jupyter Notebooks
  • Dependencies: Pandas, Matplotlib

Results

The following DataFrame gives a summary of the three city types: Urban, Suburban, and Rural:

PyBer_Summary_DataFrame

A quick look at the table tells you that urban cities tend to have more total rides, total drivers, and total fares, whereas rural cities have greater average fares per ride and per driver. This data is powerful, but visual representations make it possible to see a much more detailed breakdown of the data.
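A summary table of this kind can be sketched with a Pandas merge and groupby. The snippet below is a minimal illustration, not the notebook's actual code: the tiny DataFrames stand in for city_data.csv and ride_data.csv, and the column names (`city`, `type`, `driver_count`, `fare`) and the helper `summarize_by_city_type` are assumptions made for the example.

```python
import pandas as pd

# Hypothetical stand-ins for city_data.csv and ride_data.csv.
# Column names are assumptions; the real files may differ.
city_df = pd.DataFrame({
    "city": ["A", "B", "C"],
    "type": ["Urban", "Suburban", "Rural"],
    "driver_count": [40, 20, 5],
})
ride_df = pd.DataFrame({
    "city": ["A", "A", "A", "B", "B", "C"],
    "fare": [20.0, 25.0, 30.0, 30.0, 34.0, 40.0],
})


def summarize_by_city_type(city_df, ride_df):
    """Build a per-city-type summary from city and ride records."""
    # Attach each ride's city type so rides can be grouped by type.
    merged = ride_df.merge(city_df, on="city", how="left")
    total_rides = merged.groupby("type")["fare"].count()
    total_fares = merged.groupby("type")["fare"].sum()
    # Drivers are counted once per city, so sum them from the city table.
    total_drivers = city_df.groupby("type")["driver_count"].sum()
    return pd.DataFrame({
        "Total Rides": total_rides,
        "Total Drivers": total_drivers,
        "Total Fares": total_fares,
        "Average Fare per Ride": total_fares / total_rides,
        "Average Fare per Driver": total_fares / total_drivers,
    })


summary = summarize_by_city_type(city_df, ride_df)
print(summary)
```

Summing drivers from the city table rather than from the merged ride table avoids counting a city's drivers once per ride.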

Below are three side-by-side boxplots that summarize the distribution of ride count, ride fare, and driver count data. These boxplots give a more complete picture than the table and allow the viewer to make clear comparisons between the city types. We still see what was previously described, namely that urban cities tend to have more riders, drivers, and fares, but the boxplot also includes the minimum, 1st quartile, median, 3rd quartile, maximum, and outliers of each city type, giving the viewer a more well-rounded understanding of the data.

Figure 1

Fig2

Figure 2

Fig3

Figure 3

Fig4
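The statistics that a boxplot draws can also be computed directly. The sketch below is illustrative only: the values are made-up ride counts, `box_stats` is a hypothetical helper, and the 1.5 × IQR rule for flagging outliers is the common plotting convention, assumed here rather than confirmed from the notebook.

```python
import pandas as pd


def box_stats(values):
    """Return the five-number summary plus outliers for a list of values."""
    s = pd.Series(values, dtype="float64")
    q1, median, q3 = s.quantile([0.25, 0.5, 0.75])
    iqr = q3 - q1
    # Points beyond 1.5 * IQR from the quartiles are treated as outliers.
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = s[(s < lower) | (s > upper)].tolist()
    return {
        "min": s.min(),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": s.max(),
        "outliers": outliers,
    }


# Hypothetical ride counts per city for one city type.
stats = box_stats([10, 12, 14, 16, 18, 60])
print(stats)
```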

Pie charts are an effective way to show the portion a piece of data represents compared to the whole. In the summary DataFrame, finding the largest total rides, total fares, and total drivers was easy, but a pie chart quickly shows how much each of those represents compared to all rides, all fares, and all drivers. Below are three pie charts that represent the proportion each city type contributes to all cities, with respect to fares, rides, and drivers.

Figure 4

Fig5

Figure 5

Fig6

Figure 6

Fig7
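The slice sizes in a pie chart are simply each group's share of the total. A minimal sketch, using made-up totals and a hypothetical helper `percent_of_total` rather than the project's real figures:

```python
import pandas as pd


def percent_of_total(totals):
    """Convert per-group totals into percentages of the overall total."""
    return 100 * totals / totals.sum()


# Hypothetical total fares by city type (not the real PyBer numbers).
fares_by_type = pd.Series({"Rural": 10.0, "Suburban": 30.0, "Urban": 60.0})
fare_percents = percent_of_total(fares_by_type)
print(fare_percents)
```

The resulting Series could then be passed to Matplotlib's `plt.pie`, with the index supplying the labels.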

Boxplots and pie charts are strong graphs that show central tendencies and proportion to the whole, respectively, but a more sophisticated graph is a more efficient way to incorporate multiple variables into a visually appealing and purposeful chart.

Figure 7

Fig1

In the bubble chart in the figure above, we can see the tendencies that were described earlier, as well as each specific city and the overall trend between the city types, fares, and drivers. Each bubble represents a city within the data. Each city is plotted with Number of Rides on the x-axis and Average Fare on the y-axis, and the size of the bubble represents the Number of Drivers. It is now much clearer that rural cities have higher average fares and fewer rides, compared to urban cities, which have lower average fares and more rides. Also, because the graph uses different bubble sizes, we can see that rural cities tend to have fewer drivers (smaller bubbles) and urban cities tend to have more drivers (larger bubbles).
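A bubble chart of this shape can be sketched with Matplotlib's `scatter`, using the `s` argument for bubble size. The per-city numbers below are invented for illustration, and the scaling factor of 10 is an arbitrary assumption chosen only to make the bubbles visible.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical per-city values, grouped by city type.
cities = {
    "Urban": {"rides": [30, 35, 28], "avg_fare": [22.0, 25.0, 24.0], "drivers": [50, 60, 45]},
    "Suburban": {"rides": [18, 20, 15], "avg_fare": [30.0, 31.0, 29.0], "drivers": [20, 25, 15]},
    "Rural": {"rides": [5, 8, 6], "avg_fare": [38.0, 35.0, 40.0], "drivers": [3, 6, 4]},
}

fig, ax = plt.subplots()
for city_type, data in cities.items():
    ax.scatter(
        data["rides"],                        # x-axis: number of rides per city
        data["avg_fare"],                     # y-axis: average fare per city
        s=[10 * n for n in data["drivers"]],  # bubble area scales with driver count
        alpha=0.7,
        edgecolors="black",
        label=city_type,
    )
ax.set_xlabel("Number of Rides (per city)")
ax.set_ylabel("Average Fare ($)")
ax.legend(title="City Types")
```

Calling `fig.savefig(...)` afterwards would write the chart to an image file.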

Lastly, it can be beneficial to observe and compare data across a specific time frame. With a multi-line graph, comparing the total fares for each city type is quick and easy. This allows us to determine whether there are trends during specific times of the year for each city type or whether the city types perform similarly regardless of their density. In the figure below, we observe that on a weekly basis, urban cities have the highest total fares from the beginning of January to the end of April. Each city type had a steady increase with a first peak in the third week of February, so it may be beneficial to look into why all city types performed well during that specific week.

Figure 8

PyBer_fare_summary
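Weekly totals per city type, as plotted in the multi-line graph, can be sketched by pivoting on date and city type and then resampling by week. The rides, the year, and the column names below are assumptions made for illustration, and `weekly_fares` is a hypothetical helper.

```python
import pandas as pd

# Hypothetical rides; dates, year, and column names are illustrative only.
rides = pd.DataFrame({
    "date": pd.to_datetime(
        ["2019-01-01", "2019-01-03", "2019-01-08", "2019-01-02", "2019-01-09"]
    ),
    "type": ["Urban", "Urban", "Urban", "Rural", "Rural"],
    "fare": [10.0, 15.0, 20.0, 40.0, 35.0],
})


def weekly_fares(rides):
    """Sum fares per city type for each week (weeks end on Sunday)."""
    # One row per date, one column per city type, summed fares as values.
    pivot = rides.pivot_table(
        index="date", columns="type", values="fare", aggfunc="sum"
    )
    # "W" groups rows into weekly bins labeled by the Sunday that ends them.
    return pivot.resample("W").sum()


weekly = weekly_fares(rides)
print(weekly)
```

Calling `weekly.plot()` on the result would draw one line per city type.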

Summary

The data and its visualizations have allowed for many observations to be made. Based on these observations, I recommend the following:

  1. Create incentives for urban drivers to complete rides in nearby suburban cities, because suburban cities have a greater average fare per ride.
  2. Rural cities have the largest average fare per ride, yet their 78 drivers completed only 125 rides in four months. Create a marketing campaign to show rural drivers the potential income they could expect if they completed more rides.
  3. Instead of only focusing on January through April, it could be beneficial to expand the data being observed to the entire year. That way trends throughout the other months of the year can be found and comparisons can be made from year to year.
