The goal is to prepare tidy data that can be used for later analysis.
- data.table v1.9.4
- dplyr v0.3.0.2
- Download the source data set: Human Activity Recognition Using Smartphones Dataset, Version 1.0.
- Unzip the package contents into this directory, resulting in a subdirectory called `UCI HAR Dataset`.
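Before running the analysis, it can help to confirm that the unzipped directory is in place. The following is a minimal sketch, not part of the original script. The file names in `expected_files` are assumptions about the archive layout, and `missing_files` is a hypothetical helper introduced only for illustration.

```r
# Sketch: check that the unzipped data set sits in the working directory.
# The file names below are assumptions about the archive layout.
data_dir <- "UCI HAR Dataset"

expected_files <- c(
  "features.txt",
  "activity_labels.txt",
  file.path("train", "X_train.txt"),
  file.path("train", "y_train.txt"),
  file.path("train", "subject_train.txt"),
  file.path("test", "X_test.txt"),
  file.path("test", "y_test.txt"),
  file.path("test", "subject_test.txt")
)

# Hypothetical helper: returns the expected files that are missing under `dir`.
missing_files <- function(dir, files = expected_files) {
  files[!file.exists(file.path(dir, files))]
}

missing <- missing_files(data_dir)
if (length(missing) > 0) {
  message("Missing from '", data_dir, "': ", paste(missing, collapse = ", "))
} else {
  message("All expected files found in '", data_dir, "'.")
}
```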
The source data set (Human Activity Recognition Using Smartphones Dataset) contains accelerometer and gyroscope sensor signals collected from experiments where 30 volunteers performed six different activities while wearing a smartphone on the waist.
Full description of the source data is provided with the source data set, including details on feature selection and calculation in `features_info.txt`.
- Reads and merges the training and the test set to create one data set with descriptive column names and activity labels applied,
- Extracts only the measurements on the mean and standard deviation for each measurement,
- Creates another data set with the average of each variable for each activity and each subject, and
- Writes the resultant data set into a text file, `averages.txt`.
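The steps above can be sketched on a small made-up table. This is an illustration under stated assumptions rather than the actual script: the toy values, the `tBodyAcc-max()-X` column, and the temporary output path are invented for the example, and only data.table is used.

```r
library(data.table)

# Toy stand-ins for the training and test sets (all values are made up).
train <- data.table(
  activity   = c("WALKING", "WALKING", "SITTING"),
  subject_id = c(1L, 1L, 3L),
  V1 = c(0.2, 0.4, 0.1),
  V2 = c(-0.9, -0.7, -0.5),
  V3 = c(0.5, 0.6, 0.3)
)
test <- data.table(
  activity   = c("WALKING", "WALKING"),
  subject_id = c(2L, 2L),
  V1 = c(0.1, 0.3),
  V2 = c(-0.8, -0.6),
  V3 = c(0.4, 0.5)
)

# Step 1: merge the two sets and apply descriptive column names.
merged <- rbind(train, test)
setnames(merged, c("V1", "V2", "V3"),
         c("tBodyAcc-mean()-X", "tBodyAcc-std()-X", "tBodyAcc-max()-X"))

# Step 2: keep the key columns plus the mean() and std() measurements.
keys <- c("activity", "subject_id")
keep <- grep("-(mean|std)\\(\\)", names(merged), value = TRUE)
selected <- merged[, c(keys, keep), with = FALSE]

# Step 3: average every remaining variable per activity and subject.
averages <- selected[, lapply(.SD, mean), by = keys]
averages <- averages[order(activity, subject_id)]

# Step 4: write the result with a header row and space-separated values.
# A temporary path is used here so the sketch does not overwrite averages.txt.
out_file <- file.path(tempdir(), "averages.txt")
write.table(averages, out_file, row.names = FALSE)
```

In this toy run the two WALKING rows for subject 1 collapse into a single record, and the `max()` column is dropped because it matches neither `mean()` nor `std()`.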
The first row of `averages.txt` is a header with column names. Each subsequent row contains one record of space-separated values.
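A file in this layout can be read back with base R. The sketch below writes a tiny made-up file in the same format to a temporary path and reads it again; the values are invented, and in practice the path would be `averages.txt`. Passing `check.names = FALSE` is a suggestion, since column names containing `-` and `()` are not syntactic R names and would otherwise be rewritten.

```r
# Write a tiny file in the same layout (header row, space-separated values).
# The values are made up for illustration.
path <- tempfile(fileext = ".txt")
writeLines(c(
  "activity subject_id fBodyAccMag-mean() fBodyAccMag-std()",
  "WALKING 1 -0.25 -0.5",
  "SITTING 2 -0.75 -0.875"
), path)

# check.names = FALSE keeps names such as "fBodyAccMag-mean()" intact;
# the default would convert them into syntactic R names.
averages <- read.table(path, header = TRUE, check.names = FALSE,
                       stringsAsFactors = FALSE)

names(averages)
averages[["fBodyAccMag-mean()"]]
```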
Each record in the resultant data set is a summary (average values) of the measurements for one activity performed by one test subject. For example, the averaged measurements for test subject 1 while walking form one record (row) in the file.
Each record contains:
An activity label string (WALKING, WALKING_UPSTAIRS, WALKING_DOWNSTAIRS, SITTING, STANDING, LAYING).
A numeric subject id (1-30).
Average standard deviation (std) and mean (mean) for the following 3-axial (X, Y, Z) gyroscope (Gyro) and accelerometer (Acc) features in the frequency (f) and time (t) domains:
- fBodyAcc
- fBodyAccJerk
- fBodyGyro
- tBodyAcc
- tBodyAccJerk
- tBodyGyro
- tBodyGyroJerk
- tGravityAcc
Average standard deviation (std) and mean (mean) for the following gyroscope (Gyro) and accelerometer (Acc) magnitude (Mag) features in the frequency (f) and time (t) domains:
- fBodyAccMag
- fBodyBodyAccJerkMag
- fBodyBodyGyroJerkMag
- fBodyBodyGyroMag
- tBodyAccJerkMag
- tBodyAccMag
- tBodyGyroJerkMag
- tBodyGyroMag
- tGravityAccMag
`features_info.txt` in the source data set describes these features in detail.
As per the source data set,
- accelerometer (Acc) feature values are in standard gravity units 'g'
- gyroscope (Gyro) feature values are in radians / second
- values are normalized and bounded within [-1,1]
- The key (group-by) columns are activity and subject_id
- Pattern for 3-axial feature column names:
<feature>-<mean() or std()>-<X,Y,Z>
- Pattern for other feature column names:
<feature>-<mean() or std()>
The columns containing average data for the 3-axial fBodyAcc feature are:
- fBodyAcc-mean()-X
- fBodyAcc-mean()-Y
- fBodyAcc-mean()-Z
- fBodyAcc-std()-X
- fBodyAcc-std()-Y
- fBodyAcc-std()-Z
The columns containing average data for the fBodyAccMag feature are:
- fBodyAccMag-mean()
- fBodyAccMag-std()
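
The two naming patterns can also be expanded programmatically from the feature lists given earlier. This is a small sketch; `axial_columns` and `magnitude_columns` are hypothetical helper names introduced here, not functions from the analysis script.

```r
# Feature names as listed in this document.
axial_features <- c("fBodyAcc", "fBodyAccJerk", "fBodyGyro", "tBodyAcc",
                    "tBodyAccJerk", "tBodyGyro", "tBodyGyroJerk", "tGravityAcc")
magnitude_features <- c("fBodyAccMag", "fBodyBodyAccJerkMag",
                        "fBodyBodyGyroJerkMag", "fBodyBodyGyroMag",
                        "tBodyAccJerkMag", "tBodyAccMag", "tBodyGyroJerkMag",
                        "tBodyGyroMag", "tGravityAccMag")

# Pattern: <feature>-<mean() or std()>-<X,Y,Z>
axial_columns <- function(feature) {
  stats <- rep(c("mean()", "std()"), each = 3)
  axes  <- rep(c("X", "Y", "Z"), times = 2)
  paste(feature, stats, axes, sep = "-")
}

# Pattern: <feature>-<mean() or std()>
magnitude_columns <- function(feature) {
  paste(feature, c("mean()", "std()"), sep = "-")
}

feature_columns <- c(
  unlist(lapply(axial_features, axial_columns)),
  unlist(lapply(magnitude_features, magnitude_columns))
)

axial_columns("fBodyAcc")
length(feature_columns)  # 8 * 6 + 9 * 2 = 66 measurement columns
```

Together with the two key columns, this gives 68 columns per record.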