update

ecmwf-lab · Mar 24, 2024 · d277ff5
1 parent 5ac9aaa
commit d277ff5
Showing 6 changed files with 100 additions and 17 deletions.
diff --git a/docs/building/concat.yaml b/docs/building/concat.yaml
@@ -0,0 +1,17 @@
+input:
+  concat:
+    - dates:
+        start: 2020-12-30 00:00:00
+        end: 2021-01-01 12:00:00
+        frequency: 12h
+
+      source1:
+        - args
+
+    - dates:
+        start: 2021-01-02 00:00:00
+        end: 2021-01-03 12:00:00
+        frequency: 12h
+
+      source2:
+        - args
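
To make the two ``dates`` blocks in ``concat.yaml`` concrete, the sketch below expands each block into its list of dates and chains them. It is only an illustration: ``expand_dates`` is a hypothetical helper, not part of the library, and it assumes that ``end`` is inclusive.

```python
from datetime import datetime, timedelta


def expand_dates(start, end, frequency_hours):
    """List every date from start to end at a fixed step.

    Hypothetical helper; treating `end` as inclusive is an assumption.
    """
    step = timedelta(hours=frequency_hours)
    dates = []
    current = start
    while current <= end:
        dates.append(current)
        current += step
    return dates


# The two periods declared in concat.yaml, both at a 12h frequency.
first = expand_dates(datetime(2020, 12, 30), datetime(2021, 1, 1, 12), 12)
second = expand_dates(datetime(2021, 1, 2), datetime(2021, 1, 3, 12), 12)

# Chaining the periods gives one continuous series of dates.
combined = first + second
```

Under these assumptions the two periods do not overlap, and the second starts exactly one step after the first ends, so the combined series has no gap.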
diff --git a/docs/building/handling_missing_values.rst b/docs/building/handling_missing_values.rst
@@ -1,3 +1,6 @@
 #########################
  Handling missing values
 #########################
+
+.. literalinclude:: ../../tests/create/nan.yaml
+   :language: yaml
diff --git a/docs/building/operations.rst b/docs/building/operations.rst
@@ -0,0 +1,53 @@
+.. _dataset-operations:
+
+############
+ Operations
+############
+
+******
+ join
+******
+
+A join combines data from several sources. Each source is expected to
+provide different variables for the same dates.
+
+.. code-block:: yaml
+
+    input:
+        join:
+            - source1
+            - source2
+            - ...
+
+
+********
+ concat
+********
+
+A concatenation combines sets of operations that handle different
+dates. It is typically used to build a dataset that spans several
+years when several sources are involved, each providing a different
+period.
+
+.. literalinclude:: concat.yaml
+    :language: yaml
+
+
+******
+ pipe
+******
+
+The pipe is the process of transforming fields using filters. The
+first step of a pipe is typically a source, a join or another pipe.
+The following steps are filters.
+
+
+.. code-block:: yaml
+
+    input:
+        pipe:
+            - source
+            - filter1
+            - filter2
+            - ...
+
diff --git a/docs/building/sources/mars.rst b/docs/building/sources/mars.rst
@@ -17,7 +17,7 @@ the MARS language specification.
        grid: [0.25, 0.25]
 
 Data from several level types must be requested in separate requests,
-with the `join` key.
+with the ``join`` command.
 
 .. code:: yaml
 

diff --git a/docs/building/statistics.rst b/docs/building/statistics.rst
@@ -1,25 +1,33 @@
 .. _gathering_statistics:
 
-Gathering statistics
-====================
+######################
+ Gathering statistics
+######################
 
-*Anemoi* will collect statistics about each variables in the dataset as it is created.
-These statistics are intended to be used to normalise the data during training.
+*Anemoi* will collect statistics about each variable in the dataset as
+it is created. These statistics are intended to be used to normalise the
+data during training.
 
-By defaults, the statistics are not computed on the whole dataset, but on a subset of
-dates. The subset is defined using the following algorythm:
+By default, the statistics are not computed on the whole dataset, but
+on a subset of dates. The subset is defined using the following
+algorithm:
 
-    - If the dataset covers more than 20 years, the last 3 years are excluded.
-    - If the dataset covers more than 10 years, the last 2 years are excluded.
-    - If the dataset covers more than 5 years, the last year is excluded.
-    - Otherwise, 80% of the dataset is used.
+   -  If the dataset covers more than 20 years, the last 3 years are
+      excluded.
+   -  If the dataset covers more than 10 years, the last 2 years are
+      excluded.
+   -  If the dataset covers more than 5 years, the last year is
+      excluded.
+   -  Otherwise, 80% of the dataset is used.
 
-You can override this behaviour by setting the `statistics_dates` parameter.
+You can override this behaviour by setting the ``statistics_start`` and
+``statistics_end`` parameters.
 
-.. code-block:: yaml
+.. code:: yaml
 
-    output:
-        statistics_start: 2000
-        statistics_end: 2020
+   output:
+       statistics_start: 2000
+       statistics_end: 2020
 
-.. todo:: List the statistics that are computed
+..
+   .. todo:: List the statistics that are computed
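
The default subset rule described in ``statistics.rst`` can be sketched as follows. This is a minimal illustration, not the library's implementation: ``statistics_subset`` is a hypothetical name, a year is approximated as 365.25 days, and "80% of the dataset" is assumed to mean the first 80% of the dates.

```python
from datetime import datetime, timedelta


def statistics_subset(dates):
    """Return the dates used for statistics under the documented rule.

    Hypothetical helper. Assumptions: a year is 365.25 days, and the
    80% fallback keeps the earliest 80% of the dates.
    """
    dates = sorted(dates)
    first, last = dates[0], dates[-1]
    year = timedelta(days=365.25)
    span = last - first

    # (minimum span in years, number of trailing years to exclude)
    for min_years, excluded in ((20, 3), (10, 2), (5, 1)):
        if span > min_years * year:
            cutoff = last - excluded * year
            return [d for d in dates if d <= cutoff]

    # Short datasets: keep the first 80% of the dates.
    return dates[: int(len(dates) * 0.8)]


# One date per year from 2000 to 2023: more than 20 years are covered,
# so the last 3 years are excluded.
yearly = [datetime(y, 1, 1) for y in range(2000, 2024)]
subset = statistics_subset(yearly)
```

The thresholds are checked from largest to smallest so that a long dataset matches the 20-year rule before the 10-year or 5-year ones.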
diff --git a/docs/index.rst b/docs/index.rst
@@ -47,6 +47,7 @@ models from existing recipes but with their own data.
 **Building training datasets**
 
 -  :doc:`building/introduction`
+-  :doc:`building/operations`
 -  :doc:`building/sources`
 -  :doc:`building/filters`
 -  :doc:`building/statistics`
@@ -57,6 +58,7 @@ models from existing recipes but with their own data.
    :caption: Building datasets
 
    building/introduction
+   building/operations
    building/sources
    building/filters
    building/naming_variables