Hello, I am Anca! I am currently a graduate student pursuing a Master's Degree in Data Analytics, concentrating on statistical modeling. I have always been passionate about statistics and data, and about how they influence the world, business, and ultimately our actions. In my journey as a data analyst, I hope to bring positive changes that can help organizations and society develop.
During my studies, I have developed skills for identifying patterns and trends in complex datasets, using data mining and regression techniques. In my spare time, I enjoy early morning walks, creating clothes, and tending to my orchid collection. I also like to expand my knowledge and skills, and solving challenges gives me great satisfaction.
This repository displays past and current projects I am working on, showcasing my skills and progress in the Data Analytics field.
- Python
- R
- SQL
This project attempts to forecast the vehicle risk rating for an insurance company. Several data mining methods are applied and compared to see which one is the best fit for this dataset. Data validation, cleaning, and exploratory analysis were performed. The Jupyter Notebook uses helper functions to simplify the code and keep the notebook clean. Finally, three classification methods were used: SVM with different kernels, Decision Tree, and Random Forest.
Code/Report: Forecast the Car Insurance Risk Rating
Skills: classification, decision tree, support vector machines, random forest, hyperparameter tuning, PCA, data validation, data visualization, function creation
Results: The SVM with RBF kernel achieved the best accuracy of 67.68%, demonstrating that non-linear relationships between features and risk ratings are important. The Random Forest model, which aggregates the predictions of multiple decision trees, achieved a close accuracy of 67.78%. Further hyperparameter tuning and cross-validation could still enhance its performance. Moreover, the categorical features showed little importance in the feature importance plots and did not significantly improve the accuracy of the models, so it could be worth removing them to simplify the model and potentially improve generalization by reducing noise.
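The comparison described above can be sketched roughly as follows. This is only an illustration, assuming scikit-learn is available: it runs on synthetic data from `make_classification` (not the insurance dataset), and the function name `compare_classifiers`, the kernels chosen, and all hyperparameters are hypothetical rather than the settings used in the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def compare_classifiers(X, y, seed=0):
    """Fit each candidate on a train split and return test accuracy per model."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    # SVMs are sensitive to feature scale, so they are wrapped with a scaler.
    models = {
        "svm_linear": make_pipeline(StandardScaler(), SVC(kernel="linear")),
        "svm_rbf": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
        "svm_poly": make_pipeline(StandardScaler(), SVC(kernel="poly", degree=3)),
        "decision_tree": DecisionTreeClassifier(max_depth=5, random_state=seed),
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=seed),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, model.predict(X_test))
    return scores


# Synthetic stand-in data (an assumption, NOT the insurance dataset).
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=5, n_classes=3, random_state=0
)
scores = compare_classifiers(X, y)
for name, acc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {acc:.3f}")
```

Evaluating every model on the same held-out split keeps the accuracies comparable; cross-validation would give a more stable ranking when the scores are as close as the ones reported above.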
To find which mushrooms are safe to eat, the Decision Tree classification method is used. The dataset was first cleaned, exploratory data analysis was performed, the tree was built on the training dataset and then optimized, and finally the model was validated on the test set.
Report: Decision Tree - Mushrooms Selection
Code: Decision Tree - Mushrooms Selection Code
Skills: data cleaning, data visualization, classification, decision tree
Results: The most important features to look for when mushroom foraging are odor, spore print color, gill size, population, and habitat. The model performs exceptionally well; however, perfect classification accuracy and precision are uncommon in practice, especially for real-world datasets with inherent complexity and noise.
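To illustrate why a feature such as odor can dominate a decision tree, here is a small sketch of Gini impurity reduction for a categorical split. The rows are made up for illustration and are not taken from the project's dataset, the helper names `gini` and `gini_gain` are hypothetical, and the multiway split is a simplification (library implementations typically use binary splits on encoded features). The splitting criterion actually used in the project is not stated here.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0 means perfectly pure)."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())


def gini_gain(rows, feature, target="edible"):
    """Impurity reduction from splitting rows on every value of one feature."""
    parent = gini([row[target] for row in rows])
    groups = {}
    for row in rows:
        groups.setdefault(row[feature], []).append(row[target])
    # Weight each child's impurity by the share of rows that fall into it.
    weighted = sum(len(g) / len(rows) * gini(g) for g in groups.values())
    return parent - weighted


# Made-up toy rows for illustration only (an assumption, not project data).
toy_rows = [
    {"odor": "none", "gill_size": "broad", "edible": True},
    {"odor": "none", "gill_size": "narrow", "edible": True},
    {"odor": "foul", "gill_size": "broad", "edible": False},
    {"odor": "foul", "gill_size": "narrow", "edible": False},
    {"odor": "almond", "gill_size": "broad", "edible": True},
    {"odor": "fishy", "gill_size": "narrow", "edible": False},
]

for feature in ("odor", "gill_size"):
    print(feature, round(gini_gain(toy_rows, feature), 3))
```

In this toy sample, splitting on odor separates the classes completely, while gill size barely reduces impurity, so a tree would pick odor first.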
Two dashboards were created to inform the target audience (the mayor of Boston and all city council members) about the shootings occurring in Boston. All the elements were carefully placed to avoid clutter and focus attention where it is needed. One dashboard answers questions about where the shootings occur, and the second shows when the shootings are most frequent.
Dashboards: Where and What Type of Offense, When do the shootings occur?
Skills: Tableau, data visualization, creating map charts, dashboards, and plots
Results: Most shootings happen in Roxbury, Mattapan, and Dorchester, with a few hot spots in Jamaica Plain and South Boston. The safest district from this perspective is Charlestown. Most of the shootings happened when an officer went to investigate a property, so more caution is advised in those situations. There was a significant spike in 2021, with 876 shootings, a substantial increase from 2020. This rise is particularly surprising given that the onset of the Covid-19 pandemic in 2020 was expected to keep people indoors. Lastly, most shootings happen on weekends between 9 PM and 3 AM, as expected, and before the winter holidays (possible causes include holiday-related stress and the transition from warm to cold weather).
This dashboard presents different types of plots for the Seattle AirBnB rentals in 2016. First, three tables were joined using the listing ID, then the data was filtered to include only the information relevant to the project. The dashboard contains two bar charts, a map, a table, and a line plot.
Dashboard: AirBnB Occupancy in Seattle
Skills: Tableau, joining multiple tables, data visualization, creating map charts, dashboards, and plots
Results: Renting an apartment in certain zip codes can be more expensive on average because of the proximity to certain tourist attractions. Moreover, revenue increases slowly from February, with a peak near the winter holidays, as expected. This information is helpful for any investor who wants to rent out an apartment and get the largest return on investment. It also makes sense to buy a larger apartment because the average price increases with size and the competition is scarcer.
The Store Database project demonstrates SQL querying skills using three data tables. Data cleaning, data validation, and exploratory data analysis are performed using operations such as joining tables, grouping by certain criteria, and performing calculations.
Report: The Store Database: Report
Code: The Store Database: Code
Skills: joining, grouping, filtering data, data validation, SQL, indexing, database schema
Results: Based on the analysis, the Ben Franklin store achieved the highest revenue across all years, rice paper is the most sold product in terms of quantity, and Sultanas bring the highest revenue across all stores. In terms of profitability, corn brooms and Sultanas stand out with the highest profit margins, whereas hazelnut cream coffee yields the lowest margins. The sales trends over the years show a noticeable decline in December each year. Also, the best year in terms of revenue was 2018, topping $8.4 million. The largest transaction ever recorded was in 2019, when a customer bought 411 items in one sale.
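The kind of join-and-group query behind a "revenue per store" result can be sketched as below. This is a minimal example run through Python's built-in `sqlite3` module. The three-table schema (`stores`, `products`, `sales`), the column names, and the sample rows are all hypothetical stand-ins, not the project's actual schema or data.

```python
import sqlite3

# In-memory database with a hypothetical three-table schema (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE stores   (store_id INTEGER PRIMARY KEY, store_name TEXT);
    CREATE TABLE products (product_id INTEGER PRIMARY KEY, product_name TEXT, unit_price REAL);
    CREATE TABLE sales    (sale_id INTEGER PRIMARY KEY, store_id INTEGER, product_id INTEGER, quantity INTEGER);

    INSERT INTO stores VALUES (1, 'Store A'), (2, 'Store B');
    INSERT INTO products VALUES (10, 'Product X', 2.50), (11, 'Product Y', 4.00);
    INSERT INTO sales VALUES
        (100, 1, 10, 4),
        (101, 1, 11, 1),
        (102, 2, 10, 2),
        (103, 2, 11, 5);
    """
)

# Join the fact table to both lookup tables, then aggregate revenue per store.
revenue_by_store = conn.execute(
    """
    SELECT st.store_name, SUM(s.quantity * p.unit_price) AS revenue
    FROM sales AS s
    JOIN stores AS st ON st.store_id = s.store_id
    JOIN products AS p ON p.product_id = s.product_id
    GROUP BY st.store_name
    ORDER BY revenue DESC
    """
).fetchall()
print(revenue_by_store)  # [('Store B', 25.0), ('Store A', 14.0)]
```

Swapping the `GROUP BY` column for the product name, or adding a year column to the grouping, would give the per-product and per-year breakdowns in the same way.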
Northeastern University, Boston: Master of Professional Studies in Analytics, Statistical Modeling Concentration, 2023 - 2025
Bucharest University of Economic Studies, Bucharest, Romania: Bachelor's Degree in Business Administration, Tourism Concentration, 2013 - 2016
Google Data Analytics Specialization, Coursera, 2022