This is a simple end-to-end data analytics solution built on AWS services. It covers everything from uploading a CSV file to an S3 bucket to visualizing the results in QuickSight. The dataset used in this project is the Data Science Job Salaries dataset from Kaggle (kaggle.com/datasets/ruchi798/data-science-job-salaries).
The main objective of this project is to identify the top 5 most popular data science salaries in the US by job title, experience level, and employment type, as well as the remote work ratio by job title.
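To make the objective concrete, here is a minimal local sketch of the "top 5 by job title" aggregation in plain Python. It can be used to sanity-check the numbers that Athena and QuickSight produce later. It rests on two interpretations that the project does not state. First, "popular" means the most frequent job titles. Second, "in the US" means `company_location == "US"`.

```python
import csv
import io
from collections import defaultdict


def top_us_job_titles(csv_text: str, n: int = 5) -> list[tuple[str, int, float, float]]:
    """Return (job_title, count, avg_salary_in_usd, avg_remote_ratio) for the n most
    frequent job titles among rows whose company_location is "US"."""
    counts: dict[str, int] = defaultdict(int)
    salary_totals: dict[str, float] = defaultdict(float)
    remote_totals: dict[str, float] = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Assumption: "in the US" means the company is located in the US.
        if row["company_location"] != "US":
            continue
        title = row["job_title"]
        counts[title] += 1
        salary_totals[title] += float(row["salary_in_usd"])
        remote_totals[title] += float(row["remote_ratio"])
    # Most frequent titles first; ties are broken alphabetically for a stable result.
    ranked = sorted(counts, key=lambda t: (-counts[t], t))[:n]
    return [
        (t, counts[t], salary_totals[t] / counts[t], remote_totals[t] / counts[t])
        for t in ranked
    ]
```

For example, you could call `top_us_job_titles(open("ds_salaries.csv", newline="").read())` on a local copy of the Kaggle file. The file name here is an assumption. Grouping by `experience_level` or `employment_type` instead of `job_title` works the same way.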
The dataset contains the following variables: work_year, experience_level, employment_type, job_title, salary, salary_currency, salary_in_usd, employee_residence, remote_ratio, company_location, and company_size. The data analysis process is as follows:
Step 1- First, we create an IAM user with permission to access S3. In the console search bar we type IAM, then go to Users > Add users.
1a- Set the user details and access type, then continue to the permissions page.
1b- We choose Attach existing policies directly, since we already have a policy set up.
1c- Review all the details and create the user.
1d- User successfully created
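The walkthrough attaches a pre-existing policy that is not shown. Below is a minimal least-privilege sketch of what such a policy could contain, generated as JSON from Python. It assumes the user only needs to list the two project buckets and to read and write objects in them. Athena and Glue actions would need their own statements, which this sketch does not cover.

```python
import json


def s3_access_policy(bucket_names: list[str]) -> dict:
    """Build an IAM policy document allowing list/read/write on the given buckets only."""
    bucket_arns = [f"arn:aws:s3:::{name}" for name in bucket_names]
    # Object-level actions apply to keys inside each bucket, hence the "/*" suffix.
    object_arns = [f"{arn}/*" for arn in bucket_arns]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ListProjectBuckets",
                "Effect": "Allow",
                "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
                "Resource": bucket_arns,
            },
            {
                "Sid": "ReadWriteProjectObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": object_arns,
            },
        ],
    }


if __name__ == "__main__":
    policy = s3_access_policy(
        ["data-science-salaries-bucket", "data-science-salaries-bucket-result"]
    )
    print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into the IAM policy editor. Bucket-level and object-level ARNs are kept in separate statements because `s3:ListBucket` applies to the bucket, while `s3:GetObject`/`s3:PutObject` apply to the objects inside it.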
Step 2- Two S3 buckets are created: data-science-salaries-bucket holds the raw file, while data-science-salaries-bucket-result holds the query results from Athena.
2a- The CSV file is uploaded to data-science-salaries-bucket.
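Step 2 can also be scripted. The sketch below takes a boto3 S3 client as a parameter. It uses the real `create_bucket` and `upload_file` calls, and the function itself is hypothetical. Bucket names are globally unique across AWS, so the names used in this project may already be taken in your account's namespace.

```python
from typing import Any


def create_buckets_and_upload(
    s3: Any, region: str, raw_bucket: str, result_bucket: str, local_csv: str, key: str
) -> str:
    """Create the raw and result buckets, upload the CSV, and return its s3:// URI.

    `s3` is expected to be a boto3 S3 client, e.g. boto3.client("s3", region_name=region).
    """
    for bucket in (raw_bucket, result_bucket):
        if region == "us-east-1":
            # us-east-1 does not accept an explicit LocationConstraint, so omit it there.
            s3.create_bucket(Bucket=bucket)
        else:
            s3.create_bucket(
                Bucket=bucket,
                CreateBucketConfiguration={"LocationConstraint": region},
            )
    # upload_file(Filename, Bucket, Key) streams the local file to S3.
    s3.upload_file(local_csv, raw_bucket, key)
    return f"s3://{raw_bucket}/{key}"
```

A possible call is `create_buckets_and_upload(boto3.client("s3", region_name="eu-west-1"), "eu-west-1", "data-science-salaries-bucket", "data-science-salaries-bucket-result", "ds_salaries.csv", "ds_salaries/ds_salaries.csv")`. The region and key are illustrative. Putting the file under its own prefix gives the Glue crawler a folder to turn into a table.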
Step 3- Next, we move on to Athena. Before we can create our table, we need to choose the bucket where query results will be written. In the Athena query editor, we select Settings > Manage > Browse S3 and choose data-science-salaries-bucket-result.
3a- In the Athena data catalog, we select Create table > AWS Glue crawler > Add crawler, so that the table schema is inferred from the data automatically.
3b- Crawler successfully created
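The crawler set up in 3a could also be created and run from code. This is a sketch, assuming a boto3 Glue client, an existing Glue database, and an IAM role the crawler can assume with read access to the raw bucket. None of these appear in the original walkthrough. `create_crawler`, `start_crawler`, and `get_crawler` are real Glue API calls, while the wrapper function is hypothetical.

```python
import time
from typing import Any, Callable


def run_crawler(
    glue: Any,
    name: str,
    role_arn: str,
    database: str,
    s3_path: str,
    poll_seconds: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Create a Glue crawler over s3_path, start it, and wait until it is READY again.

    Returns the last crawl status ("SUCCEEDED", "FAILED" or "CANCELLED"),
    or "UNKNOWN" if no crawl result is reported.
    """
    glue.create_crawler(
        Name=name,
        Role=role_arn,          # assumed IAM role with S3 read access for the crawler
        DatabaseName=database,  # assumed existing Glue database that receives the table
        Targets={"S3Targets": [{"Path": s3_path}]},
    )
    glue.start_crawler(Name=name)
    while True:
        # Wait before each status check.
        sleep(poll_seconds)
        crawler = glue.get_crawler(Name=name)["Crawler"]
        if crawler["State"] == "READY":
            # A finished crawler reports its outcome under LastCrawl.
            return crawler.get("LastCrawl", {}).get("Status", "UNKNOWN")
```

Here `glue` would be `boto3.client("glue")`. Passing `sleep` in as a parameter keeps the polling loop testable without real delays.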
Step 4- The data is queried in Athena, and the query results are written to data-science-salaries-bucket-result.
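Step 4 does not show the query itself. The sketch below is one plausible version. It interprets "popular" as the most frequent job titles and "in the US" as `company_location = 'US'`. The database name `salaries_db` and table name `ds_salaries` are hypothetical, since the Glue crawler typically names the table after the S3 folder it crawls. The helper runs the query through a boto3 Athena client and waits for it to finish. `start_query_execution` and `get_query_execution` are real Athena API calls.

```python
import time
from typing import Any, Callable

# Hypothetical table name; use whatever table the crawler created.
TOP5_US_SQL = """
SELECT job_title,
       COUNT(*)                     AS num_jobs,
       ROUND(AVG(salary_in_usd), 0) AS avg_salary_usd,
       ROUND(AVG(remote_ratio), 0)  AS avg_remote_ratio
FROM ds_salaries
WHERE company_location = 'US'
GROUP BY job_title
ORDER BY num_jobs DESC, avg_salary_usd DESC
LIMIT 5
"""


def run_query(
    athena: Any,
    sql: str,
    database: str,
    output_location: str,
    poll_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Start an Athena query, wait for a terminal state, and return its execution ID."""
    query_id = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        # Results land under this S3 prefix, e.g. the project's result bucket.
        ResultConfiguration={"OutputLocation": output_location},
    )["QueryExecutionId"]
    while True:
        state = athena.get_query_execution(QueryExecutionId=query_id)[
            "QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        sleep(poll_seconds)
    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query {query_id} finished with state {state}")
    return query_id
```

For instance, `run_query(boto3.client("athena"), TOP5_US_SQL, "salaries_db", "s3://data-science-salaries-bucket-result/")` would leave the result as `<QueryExecutionId>.csv` under that prefix. Passing `ResultConfiguration` explicitly makes the script independent of the console setting chosen in Step 3.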
Step 5- QuickSight now needs access to S3 to build the report, but before it can read the S3 bucket we have to grant it permission. We open the account menu in the top right and go to Manage QuickSight > Security & permissions > Manage, then select the S3 bucket.
5a- Next, we set up a new data source to access S3 from QuickSight: New analysis > New dataset > S3 > upload the JSON manifest file > Import to SPICE.
5b- After the data is imported into SPICE, we create a report in QuickSight. The goal is to identify the top 5 most popular data science salaries in the US by job title, experience level, and employment type, as well as the remote work ratio by job title.
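The manifest file uploaded in 5a is a small JSON document. It tells QuickSight which S3 files to read and how to parse them. A minimal sketch that generates one is shown below, assuming a single comma-separated CSV with a header row. Pointing at a specific result file rather than the whole result prefix avoids picking up the `.metadata` files that Athena writes alongside its results.

```python
import json


def quicksight_manifest(uris: list[str], contains_header: bool = True) -> str:
    """Return a QuickSight S3 manifest (as a JSON string) for the given CSV file URIs."""
    manifest = {
        # Explicit file URIs; QuickSight also accepts "URIPrefixes" for whole folders.
        "fileLocations": [{"URIs": uris}],
        "globalUploadSettings": {
            "format": "CSV",
            "delimiter": ",",
            "textqualifier": "\"",
            # QuickSight expects these flags as strings, not JSON booleans.
            "containsHeader": "true" if contains_header else "false",
        },
    }
    return json.dumps(manifest, indent=2)
```

Calling `quicksight_manifest(["s3://data-science-salaries-bucket-result/<query-id>.csv"])` and saving the output as a `.json` file gives the manifest to upload. Here `<query-id>` is a placeholder for the actual Athena query execution ID.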
AWS provides a suite of powerful tools for analyzing data effectively. By using Athena, Glue, IAM, and QuickSight, businesses can gain valuable insights into their data, make informed decisions, and optimize processes.