update
b8raoult committed Mar 24, 2024
1 parent 5ac9aaa commit d277ff5
Showing 6 changed files with 100 additions and 17 deletions.
17 changes: 17 additions & 0 deletions docs/building/concat.yaml
@@ -0,0 +1,17 @@
input:
  concat:
    - dates:
        start: 2020-12-30 00:00:00
        end: 2021-01-01 12:00:00
        frequency: 12h

      source1:
        - args

    - dates:
        start: 2021-01-02 00:00:00
        end: 2021-01-03 12:00:00
        frequency: 12h

      source2:
        - args
3 changes: 3 additions & 0 deletions docs/building/handling_missing_values.rst
@@ -1,3 +1,6 @@
#########################
Handling missing values
#########################

.. literalinclude:: ../../tests/create/nan.yaml
   :language: yaml
53 changes: 53 additions & 0 deletions docs/building/operations.rst
@@ -0,0 +1,53 @@
.. _dataset-operations:

############
Operations
############

******
join
******

The join is the process of combining data from several sources. Each
source is expected to provide different variables for the same dates.

.. code-block:: yaml

   input:
     join:
       - source1
       - source2
       - ...

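
The behaviour described above can be sketched in a few lines of Python.
This is only an illustration of the idea, not the *Anemoi*
implementation: the ``join`` function, the dictionary layout and the
variable names ``2t`` and ``msl`` are assumptions made for the example.

```python
# Hypothetical sketch: each source maps a date to a {variable: value} dict.
def join(*sources):
    """Merge the variables of several sources that cover the same dates."""
    dates = set(sources[0])
    for source in sources[1:]:
        if set(source) != dates:
            raise ValueError("all joined sources must cover the same dates")
    return {
        date: {
            name: value
            for source in sources
            for name, value in source[date].items()
        }
        for date in sorted(dates)
    }


# Two made-up sources providing different variables at the same dates.
source1 = {"2021-01-01T00": {"2t": 280.0}, "2021-01-01T12": {"2t": 285.0}}
source2 = {"2021-01-01T00": {"msl": 101300.0}, "2021-01-01T12": {"msl": 101250.0}}

joined = join(source1, source2)
```

Each date in ``joined`` now carries the variables of both sources; a
source with a different set of dates is rejected.
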
********
concat
********

The concatenation is the process of combining different sets of
operations that handle different dates. This is typically used to
build a dataset that spans several years when several sources are
involved, each providing a different period.

.. literalinclude:: concat.yaml
   :language: yaml
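
As a rough illustration of how the two ``dates`` blocks of
``concat.yaml`` fit together, the sketch below expands each period and
checks that they form one continuous 12-hourly sequence. The helper
names ``expand_dates`` and ``concat`` are hypothetical and are not part
of *Anemoi*.

```python
from datetime import datetime, timedelta


def expand_dates(start, end, frequency_hours):
    """List every date from start to end (inclusive) at the given step."""
    step = timedelta(hours=frequency_hours)
    dates = []
    current = start
    while current <= end:
        dates.append(current)
        current += step
    return dates


def concat(*periods):
    """Combine periods, refusing dates that appear in more than one period."""
    merged = [date for period in periods for date in period]
    if len(set(merged)) != len(merged):
        raise ValueError("concatenated periods must not share dates")
    return sorted(merged)


# The two periods from concat.yaml, each handled by its own source.
first = expand_dates(datetime(2020, 12, 30, 0), datetime(2021, 1, 1, 12), 12)
second = expand_dates(datetime(2021, 1, 2, 0), datetime(2021, 1, 3, 12), 12)

all_dates = concat(first, second)
```
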


******
pipe
******

The pipe is the process of transforming fields using filters. The
first step of a pipe is typically a source, a join or another pipe.
The following steps are filters.


.. code-block:: yaml

   input:
     pipe:
       - source
       - filter1
       - filter2
       - ...
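
A pipe can be pictured as function composition: the first step produces
fields and every following step transforms them. The sketch below is
only a mental model; ``pipe``, ``source``, ``kelvin_to_celsius`` and
``rename_2t`` are hypothetical names, not *Anemoi* sources or filters.

```python
from functools import reduce


def pipe(source, *filters):
    """Call the source, then pass its fields through each filter in order."""
    return reduce(lambda fields, apply_filter: apply_filter(fields), filters, source())


def source():
    # Hypothetical source: a single temperature field in kelvin.
    return {"2t": 280.0}


def kelvin_to_celsius(fields):
    # Hypothetical filter: convert every value from kelvin to Celsius.
    return {name: value - 273.15 for name, value in fields.items()}


def rename_2t(fields):
    # Hypothetical filter: rename the "2t" field, leave the others alone.
    return {
        ("t2m_celsius" if name == "2t" else name): value
        for name, value in fields.items()
    }


result = pipe(source, kelvin_to_celsius, rename_2t)
```

The order of the filters matters: each one receives the output of the
step before it.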
2 changes: 1 addition & 1 deletion docs/building/sources/mars.rst
@@ -17,7 +17,7 @@ the MARS language specification.
grid: [0.25, 0.25]
 Data from several level types must be requested in separate requests,
-with the `join` key.
+with the ``join`` command.

.. code:: yaml
40 changes: 24 additions & 16 deletions docs/building/statistics.rst
@@ -1,25 +1,33 @@
 .. _gathering_statistics:

-Gathering statistics
-====================
+######################
+ Gathering statistics
+######################

-*Anemoi* will collect statistics about each variables in the dataset as it is created.
-These statistics are intended to be used to normalise the data during training.
+*Anemoi* will collect statistics about each variable in the dataset as
+it is created. These statistics are intended to be used to normalise the
+data during training.

-By defaults, the statistics are not computed on the whole dataset, but on a subset of
-dates. The subset is defined using the following algorythm:
+By default, the statistics are not computed on the whole dataset, but
+on a subset of dates. The subset is defined using the following
+algorithm:

-- If the dataset covers more than 20 years, the last 3 years are excluded.
-- If the dataset covers more than 10 years, the last 2 years are excluded.
-- If the dataset covers more than 5 years, the last year is excluded.
-- Otherwise, 80% of the dataset is used.
+- If the dataset covers more than 20 years, the last 3 years are
+  excluded.
+- If the dataset covers more than 10 years, the last 2 years are
+  excluded.
+- If the dataset covers more than 5 years, the last year is
+  excluded.
+- Otherwise, 80% of the dataset is used.

-You can override this behaviour by setting the `statistics_dates` parameter.
+You can override this behaviour by setting the `statistics_dates`
+parameter.

-.. code-block:: yaml
+.. code:: yaml

-    output:
-        statistics_start: 2000
-        statistics_end: 2020
+   output:
+      statistics_start: 2000
+      statistics_end: 2020

-.. todo:: List the statistics that are computed
+..
+   .. todo:: List the statistics that are computed
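
The default subset rule listed in ``statistics.rst`` can be sketched as
follows. This is one reading of the documented rule, not the *Anemoi*
code: ``statistics_subset`` is a hypothetical name, and the sketch
assumes that the covered period is counted in calendar years, that
whole calendar years are excluded, and that "80% of the dataset" means
the first 80% of the dates.

```python
from datetime import datetime


def statistics_subset(dates):
    """Return the dates used for statistics under the documented default rule."""
    dates = sorted(dates)
    # Assumption: the covered period is counted in calendar years.
    n_years = dates[-1].year - dates[0].year + 1

    if n_years > 20:
        excluded_years = 3
    elif n_years > 10:
        excluded_years = 2
    elif n_years > 5:
        excluded_years = 1
    else:
        # Assumption: "80% of the dataset" is the first 80% of the dates.
        return dates[: int(len(dates) * 0.8)]

    last_kept_year = dates[-1].year - excluded_years
    return [date for date in dates if date.year <= last_kept_year]


# One date per year from 1990 to 2020: more than 20 years are covered.
long_dataset = [datetime(year, 1, 1) for year in range(1990, 2021)]
long_subset = statistics_subset(long_dataset)

# Ten dates within a single year: the 80% rule applies.
short_dataset = [datetime(2020, 1, day) for day in range(1, 11)]
short_subset = statistics_subset(short_dataset)
```
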
2 changes: 2 additions & 0 deletions docs/index.rst
@@ -47,6 +47,7 @@ models from existing recipes but with their own data.
**Building training datasets**

- :doc:`building/introduction`
- :doc:`building/operations`
- :doc:`building/sources`
- :doc:`building/filters`
- :doc:`building/statistics`
@@ -57,6 +58,7 @@ models from existing recipes but with their own data.
:caption: Building datasets

building/introduction
building/operations
building/sources
building/filters
building/naming_variables