Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Showing 6 changed files with 100 additions and 17 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,17 @@ | ||
input:
  concat:
    - dates:
        start: 2020-12-30 00:00:00
        end: 2021-01-01 12:00:00
        frequency: 12h

      source1:
        - args

    - dates:
        start: 2021-01-02 00:00:00
        end: 2021-01-03 12:00:00
        frequency: 12h

      source2:
        - args
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,3 +1,6 @@ | ||
######################### | ||
Handling missing values | ||
######################### | ||
|
||
.. literalinclude:: ../../tests/create/nan.yaml | ||
:language: yaml |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,53 @@ | ||
.. _dataset-operations: | ||
|
||
############ | ||
Operations | ||
############ | ||
|
||
****** | ||
join | ||
****** | ||
|
||
A join combines data from several sources. Each | ||
source is expected to provide different variables at the same dates. | ||
|
||
.. code-block:: yaml

   input:
     join:
       - source1
       - source2
       - ...

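
The join semantics described above can be sketched in plain Python. This is only an illustration, not the anemoi-datasets implementation: the ``join`` function, the ``{date: {variable: value}}`` layout and the variable names are hypothetical, and rejecting duplicate variables is an assumption of the sketch.

```python
def join(*sources):
    """Combine sources that provide different variables at the same dates.

    Each source is a hypothetical mapping {date: {variable: value}}.
    """
    if not sources:
        return {}
    dates = set(sources[0])
    joined = {date: {} for date in sources[0]}
    for source in sources:
        # Every source must cover exactly the same dates
        if set(source) != dates:
            raise ValueError("all sources must provide the same dates")
        for date, variables in source.items():
            # Assumption of this sketch: a variable may come from one source only
            duplicates = joined[date].keys() & variables.keys()
            if duplicates:
                raise ValueError(f"duplicate variables: {sorted(duplicates)}")
            joined[date].update(variables)
    return joined


source1 = {"2021-01-01T00": {"t2m": 280.0}}
source2 = {"2021-01-01T00": {"msl": 101325.0}}
joined = join(source1, source2)
```

Here ``joined`` holds both variables for the single shared date.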
******** | ||
concat | ||
******** | ||
|
||
Concatenation combines sets of operations that handle different | ||
dates. It is typically used to build a dataset that spans several | ||
years when several sources are involved, each providing a different | ||
period. | ||
|
||
.. literalinclude:: concat.yaml | ||
:language: yaml | ||
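
As a rough illustration of the idea, concatenation can be pictured as merging parts that each cover their own dates. The sketch below is not the anemoi-datasets implementation: the ``concat`` function and the ``{date: fields}`` layout are hypothetical, and refusing overlapping dates is an assumption of the sketch. The dates mirror those in ``concat.yaml``.

```python
from datetime import datetime


def concat(*parts):
    """Concatenate parts that each cover a different set of dates.

    Each part is a hypothetical mapping {date: fields}.
    """
    combined = {}
    for part in parts:
        # Assumption of this sketch: the periods must not overlap
        overlap = combined.keys() & part.keys()
        if overlap:
            raise ValueError(f"overlapping dates: {sorted(overlap)}")
        combined.update(part)
    # Return the dates in chronological order
    return dict(sorted(combined.items()))


first = {
    datetime(2020, 12, 30, 0): "fields from source1",
    datetime(2021, 1, 1, 12): "fields from source1",
}
second = {
    datetime(2021, 1, 2, 0): "fields from source2",
    datetime(2021, 1, 3, 12): "fields from source2",
}
dataset = concat(second, first)
```

The parts can be passed in any order; the result is sorted by date.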
|
||
|
||
****** | ||
pipe | ||
****** | ||
|
||
A pipe transforms fields using filters. The first step of a pipe is | ||
typically a source, a join or another pipe. The following steps are | ||
filters. | ||
|
||
|
||
.. code-block:: yaml

   input:
     pipe:
       - source
       - filter1
       - filter2
       - ...

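
Conceptually, a pipe behaves like function composition: the first step produces fields and each later step transforms them. The sketch below only illustrates that idea and is not the anemoi-datasets implementation; ``pipe``, ``example_source``, ``rename_t2m`` and ``keep_temperature`` are hypothetical names.

```python
def pipe(source, *filters):
    """Run a source, then apply each filter to the result in order."""
    fields = source()
    for apply_filter in filters:
        fields = apply_filter(fields)
    return fields


def example_source():
    # Hypothetical source returning a mapping {variable: value}
    return {"t2m": 280.0, "msl": 101325.0}


def rename_t2m(fields):
    # Hypothetical filter: rename "t2m" to "2t", leave other variables alone
    return {("2t" if name == "t2m" else name): value for name, value in fields.items()}


def keep_temperature(fields):
    # Hypothetical filter: keep only the "2t" variable
    return {name: value for name, value in fields.items() if name == "2t"}


result = pipe(example_source, rename_t2m, keep_temperature)
```

Because a pipe itself yields fields, it can in turn serve as the first step of another pipe, which matches the description above.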
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,25 +1,33 @@ | ||
.. _gathering_statistics: | ||
|
||
Gathering statistics | ||
==================== | ||
###################### | ||
Gathering statistics | ||
###################### | ||
|
||
*Anemoi* will collect statistics about each variable in the dataset as it is created. | ||
These statistics are intended to be used to normalise the data during training. | ||
*Anemoi* will collect statistics about each variable in the dataset as | ||
it is created. These statistics are intended to be used to normalise the | ||
data during training. | ||
|
||
By default, the statistics are not computed on the whole dataset, but on a subset of | ||
dates. The subset is defined using the following algorythm: | ||
By default, the statistics are not computed on the whole dataset, but | ||
on a subset of dates. The subset is defined using the following | ||
algorithm: | ||
|
||
- If the dataset covers more than 20 years, the last 3 years are excluded. | ||
- If the dataset covers more than 10 years, the last 2 years are excluded. | ||
- If the dataset covers more than 5 years, the last year is excluded. | ||
- Otherwise, 80% of the dataset is used. | ||
- If the dataset covers more than 20 years, the last 3 years are | ||
excluded. | ||
- If the dataset covers more than 10 years, the last 2 years are | ||
excluded. | ||
- If the dataset covers more than 5 years, the last year is | ||
excluded. | ||
- Otherwise, 80% of the dataset is used. | ||
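
The selection rule listed above can be sketched as a small Python function. This is a minimal sketch under stated assumptions, not the code used by *Anemoi*: the function names are hypothetical, the span is measured in years of 365.25 days, and "80% of the dataset" is taken to mean the first 80% of the time span.

```python
from datetime import datetime


def _minus_years(moment, years):
    """Shift a datetime back by whole calendar years."""
    try:
        return moment.replace(year=moment.year - years)
    except ValueError:
        # 29 February has no counterpart in a non-leap year
        return moment.replace(year=moment.year - years, day=28)


def default_statistics_period(start, end):
    """Return the (start, end) period used for statistics under the rule above."""
    # Assumption: the dataset span is measured in years of 365.25 days
    span_years = (end - start).days / 365.25
    if span_years > 20:
        return start, _minus_years(end, 3)
    if span_years > 10:
        return start, _minus_years(end, 2)
    if span_years > 5:
        return start, _minus_years(end, 1)
    # Assumption: "80% of the dataset" means the first 80% of the time span
    return start, start + (end - start) * 0.8


period = default_statistics_period(datetime(2000, 1, 1), datetime(2023, 1, 1))
```

For a dataset running from 2000 to 2023, the sketch excludes the last three years, so ``period`` ends on 1 January 2020.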
|
||
You can override this behaviour by setting the `statistics_start` and `statistics_end` parameters. | ||
You can override this behaviour by setting the `statistics_start` and | ||
`statistics_end` parameters. | ||
|
||
.. code-block:: yaml | ||
.. code:: yaml | ||
output: | ||
statistics_start: 2000 | ||
statistics_end: 2020 | ||
output: | ||
statistics_start: 2000 | ||
statistics_end: 2020 | ||
.. todo:: List the statistics that are computed | ||
.. | ||
.. todo:: List the statistics that are computed |