This study area includes four wilderness areas located in the Roosevelt National Forest of northern Colorado. These areas represent forests with minimal human-caused disturbances, so existing forest cover types are more a result of ecological processes than of forest management practices.
Each observation is a 30 m x 30 m cell whose forest cover type was determined from US Forest Service (USFS) Region 2 Resource Information System (RIS) data. The independent variables were derived from data originally obtained from the US Geological Survey (USGS) and the USFS.
We have been given a total of 54 attributes (features), a mix of binary and quantitative, and we need to predict the forest cover type from these features.
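To make the feature layout concrete, here is a minimal plain-Python sketch of how one row of the data could be split into its parts. It assumes the column order given in the public UCI description of the Covertype dataset (10 quantitative columns, then 4 binary wilderness-area columns, then 40 binary soil-type columns, then the cover type label). The function name `split_row` and the sample row are hypothetical and made up for illustration; the notebook itself uses pandas rather than the `csv` module.

```python
import csv
import io

# Assumed column layout (from the UCI dataset description, not from this README):
# 10 quantitative + 4 wilderness-area flags + 40 soil-type flags = 54 features,
# followed by one cover type label.
N_QUANTITATIVE = 10
N_WILDERNESS = 4
N_SOIL = 40
N_FEATURES = N_QUANTITATIVE + N_WILDERNESS + N_SOIL  # 54


def split_row(row):
    """Split one CSV row (a list of strings) into feature groups and the label."""
    values = [int(v) for v in row]
    if len(values) != N_FEATURES + 1:
        raise ValueError(f"expected {N_FEATURES + 1} columns, got {len(values)}")
    q_end = N_QUANTITATIVE
    w_end = q_end + N_WILDERNESS
    return {
        "quantitative": values[:q_end],
        "wilderness": values[q_end:w_end],
        "soil": values[w_end:N_FEATURES],
        "cover_type": values[N_FEATURES],
    }


# A made-up row for illustration only (not copied from covtype.data).
sample_values = (
    [2500, 90, 10, 100, 5, 300, 220, 230, 140, 1000]  # quantitative columns
    + [1, 0, 0, 0]                                     # wilderness-area flags
    + [0] * 39 + [1]                                   # soil-type flags
    + [3]                                              # cover type label
)
sample_line = ",".join(str(v) for v in sample_values)

record = split_row(next(csv.reader(io.StringIO(sample_line))))
print(record["cover_type"], record["wilderness"], sum(record["soil"]))
```

To read the real file instead, the same function could be applied to each row yielded by `csv.reader(open("covtype.data"))`, assuming the file has no header row.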
Given elevation, hydrologic, soil, and sunlight data, can we predict what type of tree would be in a small patch of forest? This project attempts to predict the predominant type of tree in sections of wooded area. Understanding forest composition is a valuable aspect of managing the health and vitality of our wilderness areas. Classifying cover type can help further research regarding forest fire susceptibility and de/reforestation concerns. Forest cover type data is often collected by hand or computed using remote sensing techniques, e.g. satellite imagery. Such processes are both time- and resource-intensive. In this project, we aim to predict forest cover type using cartographic data and a variety of classification algorithms.
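As a rough illustration of what comparing "a variety of classification algorithms" can look like with scikit-learn, the sketch below fits two classifiers and reports their test accuracy. Everything here is an assumption made for illustration: the data is synthetic (generated with `make_classification` to have 54 features and 7 classes, not read from `covtype.data`), and the two models are stand-ins, not necessarily the ones used in `report.ipynb`.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data: 54 features, 7 classes.
X, y = make_classification(
    n_samples=600,
    n_features=54,
    n_informative=10,
    n_classes=7,
    n_clusters_per_class=1,
    random_state=0,
)

# Hold out a stratified test set so every class appears in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Illustrative model choices; the notebook may use different ones.
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

# Fit each model on the training split and score it on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

Accuracy on synthetic data says nothing about performance on the real dataset; the point is only the shape of the workflow (split, fit, predict, score).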
This project contains 3 files and 2 folders:
- report.ipynb: This is the main file where I have performed my work on the project.
- covtype.data: The forest cover type dataset. I have loaded this data in the notebook.
- proposal.pdf: My detailed explanation of the project.
- export/: Folder containing HTML and PDF versions of the notebook.
- plots/: Folder containing images of all the plots that are displayed in the report.ipynb file.
This part has been reviewed thoroughly in the proposal.pdf file.
This project requires Python 3.6 and the following Python libraries installed:
- Python 3.6.6 (Language Used for the project)
- NumPy (For Scientific Computing)
- Pandas (For Data Analysis)
- matplotlib (For Visualization)
- seaborn (For Visualization)
- scikit-learn (ML Library for Python)
You will also need to have software installed to run and execute a Jupyter Notebook.
If you do not have Python installed yet, it is highly recommended that you install the Anaconda distribution of Python, which already has the above packages and more included.
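Before opening the notebook, it can help to confirm that the libraries listed above are importable. The following is a small sketch using only the standard library; the helper name `missing_packages` is hypothetical, and note that scikit-learn is imported under the module name `sklearn`.

```python
import importlib.util

# Import names of the libraries listed above (scikit-learn imports as "sklearn").
REQUIRED = ["numpy", "pandas", "matplotlib", "seaborn", "sklearn"]


def missing_packages(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are available.")
```

This only checks that each package can be located, not which version is installed.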
In a terminal or command window, navigate to the top-level project directory Forest_Cover-Type (that contains this README) and run one of the following commands:
ipython notebook report.ipynb
or
jupyter notebook report.ipynb
or if you have 'Jupyter Lab' installed
jupyter lab
This will open the Jupyter/iPython Notebook software and project file in your browser.
- How to research and investigate a real-world problem of interest.
- How to accurately apply specific machine learning algorithms and techniques.
- How to properly analyze and visualize your data and results for validity.
- How to document and write a report of your work.
My Proposal was reviewed by a Udacity reviewer against the Capstone Proposal rubric. All criteria found in the rubric must meet specifications for me to pass.
My Project was reviewed by a Udacity reviewer against the Capstone Project rubric. All criteria found in the rubric must meet specifications for me to pass.
My Proposal Review by a Udacity Reviewer
My Project Review by a Udacity Reviewer