The goal of this project is to use data mining techniques to predict solar flares, powerful eruptions of energy from the sun that can significantly affect Earth and human technology. The project uses the UCI Solar Flare Dataset, which describes past solar activity: each record captures characteristics of one active region on the sun's surface, together with the number of C-, M-, and X-class flares that region produced in the following 24 hours.
The data mining techniques used in this project may include classification algorithms (such as logistic regression or decision trees) to predict whether an active region with a given set of characteristics will produce a flare. Additionally, clustering and association rule mining may be used to identify patterns and relationships in the data that can help improve the accuracy of the flare predictions.
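As an illustration of the association-rule idea, the sketch below computes the support and confidence of a single rule on a handful of made-up records. The records, the field names (`zurich_class`, `flare`), and the helper `rule_stats` are hypothetical and are not taken from the project notebook.

```python
def rule_stats(records, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent.

    antecedent and consequent are dicts mapping a field name to the value
    that field must have for the condition to hold.
    """
    def matches(record, condition):
        return all(record.get(key) == value for key, value in condition.items())

    n_total = len(records)
    n_antecedent = sum(1 for r in records if matches(r, antecedent))
    n_both = sum(
        1 for r in records if matches(r, antecedent) and matches(r, consequent)
    )

    # Support: share of all records where both sides of the rule hold.
    support = n_both / n_total if n_total else 0.0
    # Confidence: share of antecedent matches where the consequent also holds.
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


# Made-up toy records, for illustration only (not real dataset rows).
records = [
    {"zurich_class": "D", "flare": True},
    {"zurich_class": "D", "flare": False},
    {"zurich_class": "H", "flare": False},
    {"zurich_class": "E", "flare": True},
]

support, confidence = rule_stats(records, {"zurich_class": "D"}, {"flare": True})
print(support, confidence)
```

A full association-rule miner would enumerate many candidate rules and filter them by minimum support and confidence; this sketch only shows how the two measures are defined for one rule.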
The ultimate goal of this project is to develop a model that can accurately predict solar flares, which can be used to help protect satellites and other technology from damage, as well as to aid in the study of the sun's behavior.
The datasets are downloaded from the archive.ics.uci.edu website. The flare.data2 dataset has 13 columns and 1066 rows, not counting the header. The database contains 3 potential classes, one for the number of times each type of solar flare occurred in a 24-hour period. Each instance represents the captured features of 1 active region on the sun. The data are divided into two sections. The second section (flare.data2) has had much more error correction applied to it and has consequently been treated as more reliable. This information is taken from the archive.ics.uci.edu website.
- Code for class (modified Zurich class) (A,B,C,D,E,F,H)
- Code for largest spot size (X,R,S,A,H,K)
- Code for spot distribution (X,O,I,C)
- Activity (1 = reduced, 2 = unchanged)
- Evolution (1 = decay, 2 = no growth, 3 = growth)
- Previous 24 hour flare activity code (1 = nothing as big as an M1, 2 = one M1, 3 = more activity than one M1)
- Historically-complex (1 = yes, 2 = no)
- Did region become historically complex on this pass across the sun's disk (1 = yes, 2 = no)
- Area (1 = small, 2 = large)
- Area of the largest spot (1 = <=5, 2 = >5)
From these predictors, three classes of flares are predicted. They are represented by the last three columns.
- C-class flares production by this region in the following 24 hours (common flares)
- M-class flares production by this region in the following 24 hours (moderate flares)
- X-class flares production by this region in the following 24 hours (severe flares)
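The column layout above can be turned into named records with a small parser. The following is a minimal sketch, assuming the raw file is whitespace-separated and starts with a non-data header line. The column names, the helper `parse_flare_rows`, and the sample text are chosen here for illustration and do not come from the dataset or the notebook.

```python
# Illustrative column names, in the order the attributes are listed above.
COLUMNS = [
    "zurich_class", "largest_spot_size", "spot_distribution",
    "activity", "evolution", "prev_24h_activity",
    "historically_complex", "became_complex", "area", "largest_spot_area",
    "c_flares", "m_flares", "x_flares",
]

# Allowed letter codes for the three categorical attributes.
LETTER_CODES = {
    "zurich_class": set("ABCDEFH"),
    "largest_spot_size": set("XRSAHK"),
    "spot_distribution": set("XOIC"),
}


def parse_flare_rows(lines):
    """Parse whitespace-separated lines into dicts, skipping invalid lines."""
    rows = []
    for line in lines:
        fields = line.split()
        if len(fields) != len(COLUMNS):
            continue  # wrong field count: header or malformed line
        try:
            row = {
                name: value if name in LETTER_CODES else int(value)
                for name, value in zip(COLUMNS, fields)
            }
        except ValueError:
            continue  # a numeric column held a non-integer value
        if all(row[name] in allowed for name, allowed in LETTER_CODES.items()):
            rows.append(row)
    return rows


# Made-up sample in the assumed layout: one header line, then data lines.
sample = """*** header line (made up) ***
H A X 1 3 1 1 1 1 1 0 0 0
D R O 1 3 1 1 2 1 1 1 0 0
"""

rows = parse_flare_rows(sample.splitlines())
print(len(rows), rows[1]["zurich_class"], rows[1]["c_flares"])
```

Rows with an unknown letter code are dropped silently here. A real cleaning step might prefer to log or count them instead.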
- Download the UCI Solar Flare dataset from the link provided.
- Run the data cleaning process to get the dataset ready for data mining.
- Visualize the data to find relationships between the features and flare production.
- Apply different data mining techniques to the dataset to develop a model that can predict solar flares.
- Interpret the results and draw conclusions.
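As one possible preparation step between cleaning and modelling, the sketch below one-hot encodes a letter-coded attribute and collapses the three flare counts into a single binary target (flare or no flare). The binary target, the choice of features, and the names `one_hot` and `to_features_and_target` are assumptions made for illustration. The notebook may prepare the data differently.

```python
ZURICH_CLASSES = "ABCDEFH"  # modified Zurich class codes


def one_hot(value, categories):
    """Return a 0/1 list with a single 1 at the position of value."""
    return [1 if value == category else 0 for category in categories]


def to_features_and_target(row):
    """Build a numeric feature vector and a binary flare target from one row."""
    features = one_hot(row["zurich_class"], ZURICH_CLASSES)
    features += [row["activity"], row["evolution"]]
    # Target is 1 if the region produced any C-, M-, or X-class flare.
    total_flares = row["c_flares"] + row["m_flares"] + row["x_flares"]
    target = 1 if total_flares > 0 else 0
    return features, target


# Made-up row, for illustration only.
row = {
    "zurich_class": "D", "activity": 1, "evolution": 3,
    "c_flares": 1, "m_flares": 0, "x_flares": 0,
}
features, target = to_features_and_target(row)
print(features, target)
```

One-hot encoding avoids implying an order among the letter codes, which matters for models such as logistic regression. Decision trees can often work with simpler encodings.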
- SolarFlare.csv: The original dataset from UCI.
- SolarFlare_Clean.csv: The cleaned version of the dataset.
- SolarFlare_Analysis.ipynb: Jupyter notebook containing the data cleaning, data mining and model evaluation process.
The results of this project will be presented in the form of visualizations and performance metrics.
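The sketch below shows how common performance metrics for a binary flare / no-flare prediction could be computed from true and predicted labels. The labels are made up and the helper `binary_metrics` is hypothetical. Which metrics the project actually reports is not stated here. Precision and recall are included alongside accuracy on the assumption that regions producing a flare are much rarer than quiet ones, in which case accuracy alone can look good for a model that rarely predicts a flare.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up labels: 1 = flare, 0 = no flare.
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1]

metrics = binary_metrics(y_true, y_pred)
print(metrics)
```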
The conclusion will summarize the findings of the project and discuss the potential implications of the results.