When the Install and Upgrade Framework (IUF) or System Admin Toolkit (SAT) bootprep fails to configure an image, this procedure can be used to configure the image incrementally. This avoids re-running the build and configuration steps that have already succeeded, which would happen if IUF or SAT bootprep were re-run.
The following example assumes the user is creating a compute node image, but it can be adapted for other image types.
-
(ncn-mw#) Find the path to the IUF logs printed by the iuf command. It will look like this:
2023-11-01T14:06:36.128255Z INFO All logs will be stored in /etc/cray/upgrade/csm/iuf/cos-products-20231031/log/20231101140636
See install.log in this directory when subsequent steps reference the IUF logs.
-
(ncn-mw#) Find the prepare-managed-images SAT logs by looking for a line like the following in the IUF logs:
2023-11-01T14:58:22.874482Z INFO [prepare-managed-images] END sat-bootprep-run [Failed]
2023-11-01T14:58:22.874567Z DBG [prepare-managed-images] LOG FILE FOR sat-bootprep-run: argo/cos-products-20231031-gdm63-prepare-images-2rkds/cos-products-20231031-gdm63-prepare-images-2rkds-sat-wrapper-2664865581/main.log
The string starting with argo/ is the key within the config-data bucket of the S3 object which contains the log output from sat bootprep. Save this S3 key to an environment variable:
LOGS_S3_KEY="argo/cos-products-20231031-gdm63-prepare-images-2rkds/cos-products-20231031-gdm63-prepare-images-2rkds-sat-wrapper-2664865581/main.log"
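The S3 key can also be pulled out of the IUF log with standard text tools instead of being copied by hand. The following is a minimal sketch, assuming the log line format shown above; the function name extract_sat_log_key is hypothetical and is not part of IUF or SAT.

```shell
# Hypothetical helper: print the S3 key from the most recent
# "LOG FILE FOR sat-bootprep-run" line logged by the given IUF stage.
# $1 = stage name (for example, prepare-managed-images)
# $2 = path to the IUF install.log
extract_sat_log_key() {
    # -F treats the pattern as a fixed string, so the brackets are literal.
    grep -F "[$1] LOG FILE FOR sat-bootprep-run:" "$2" \
        | tail -n 1 \
        | sed 's/.*LOG FILE FOR sat-bootprep-run: *//'
}
```

For example, LOGS_S3_KEY="$(extract_sat_log_key prepare-managed-images <IUF log directory>/install.log)". If the stage was run more than once in the same log, only the last matching line is used, so confirm that it belongs to the failed run.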
-
(ncn-mw#) Download the workflow SAT logs.
cray artifacts get config-data "$LOGS_S3_KEY" main.log
-
(ncn-mw#) Set BASE_IMAGE_ID.
-
View the log and look for a line that includes Creation of image.
INFO: Creation of image ssi-uss-1.0.0-32-csm-1.5.x86_64-csm-1.5.0.beta37-10 succeeded: ID f5374c4c-8c5a-4b79-a5e5-aef778ed36cd
-
Set BASE_IMAGE_ID to the value of the image ID. It is important that this value is set to the image ID and not the image name.
BASE_IMAGE_ID="f5374c4c-8c5a-4b79-a5e5-aef778ed36cd"
SAT may have generated multiple images, in which case it is up to the user to determine which is the desired base image.
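When several images were created, listing every name and ID pair side by side can make the choice easier. The following is a minimal sketch, assuming the log line format shown in the example above; the function name list_created_images is hypothetical.

```shell
# Hypothetical helper: print "<image name> <image ID>" for every
# "Creation of image ... succeeded" line in a sat bootprep log.
# $1 = path to the downloaded main.log
list_created_images() {
    sed -n 's/.*Creation of image \(.*\) succeeded: ID \([0-9a-fA-F-]*\).*/\1 \2/p' "$1"
}
```

Running list_created_images main.log prints one line per image; the second field of the chosen line is the value for BASE_IMAGE_ID.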
-
-
(ncn-mw#) Find the update-managed-cfs-config SAT logs by looking for a line like the following in the IUF logs:
2023-11-03T17:31:50.121175Z DBG [update-managed-cfs-config] LOG FILE FOR sat-bootprep-run: argo/cos-products-20231031-sm5qz-update-cfs-config-hh522/cos-products-20231031-sm5qz-update-cfs-config-hh522-sat-wrapper-1367779725/main.log
Note that this may have been logged in a different session within the IUF activity, so a different IUF log directory may need to be examined.
The string starting with argo/ is the key within the config-data bucket of the S3 object which contains the log output from sat bootprep. Save this S3 key to an environment variable:
LOGS_S3_KEY="argo/cos-products-20231031-sm5qz-update-cfs-config-hh522/cos-products-20231031-sm5qz-update-cfs-config-hh522-sat-wrapper-1367779725/main.log"
-
(ncn-mw#) Download the workflow SAT logs.
cray artifacts get config-data "$LOGS_S3_KEY" main.log
-
(ncn-mw#) Set CFS_CONFIGURATION_NAME.
-
View the log and look at the end for the list of configurations that have been created.
{ "configurations": [ { "name": "ssi-compute-23.11.0-SSI-csm-1.5.0.beta37-10" } ] }
-
Set CFS_CONFIGURATION_NAME to the name of the Configuration Framework Service (CFS) configuration.
CFS_CONFIGURATION_NAME="ssi-compute-23.11.0-SSI-csm-1.5.0.beta37-10"
SAT may have generated multiple configurations, in which case it is up to the user to determine which is the desired configuration. However, if the default SAT bootprep file is being used, then the configuration name should include compute.
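To see every candidate configuration name at once, the "name" values can be filtered out of the log. The following is a rough sketch, assuming the JSON shape shown in the example above; the function name list_cfs_configuration_names is hypothetical, and any other "name" keys elsewhere in the log will also be printed, so review the output rather than trusting it blindly.

```shell
# Hypothetical helper: print the value of every "name" key found in a log.
# $1 = path to the downloaded main.log
list_cfs_configuration_names() {
    # Put each JSON object on its own line, then keep only the name values.
    tr '{}' '\n\n' < "$1" \
        | sed -n 's/.*"name": *"\([^"]*\)".*/\1/p'
}
```

With the default bootprep file, the output can be narrowed further with list_cfs_configuration_names main.log | grep compute.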
-
(ncn-mw#) Set SESSION_NAME.
This can be set to any value, but the following example steps assume this variable is set.
SESSION_NAME="example-partial-image-configuration"
-
(Optional) Find the failed configuration layer.
If configuration of the image failed somewhere beyond the first configuration layer, and there is some confidence that the successful layers will continue to be successful, then it is possible to save time by generating the first partial image with all of the successful layers. Find the index of the failed configuration layer either by counting the number of successful configuration layers in the logs, or by comparing the name of the failed layer to the output of the following command:
cray cfs v3 configurations describe $CFS_CONFIGURATION_NAME
Indices for the configuration layers start at 0.
-
(ncn-mw#) Set the configuration limit.
If this is the first CFS run of this procedure, this should be 0 to apply only the first layer.
CONFIGURATION_LIMIT=0
Optionally, this can instead be set to a comma-separated list of numbers starting at 0 in order to apply all of the layers that precede the failed layer. For example:
CONFIGURATION_LIMIT=0,1,2
If this is not the first CFS run in this procedure, then set this value to the next layer to be applied.
CONFIGURATION_LIMIT=<previous CONFIGURATION_LIMIT + 1>
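The next value can be computed from the previous one. The following is a minimal sketch; the function name next_configuration_limit is hypothetical, and it assumes that when the previous value was a list such as 0,1,2, the next layer is one past the last index in that list.

```shell
# Hypothetical helper: print the index of the next layer to apply.
# $1 = previous CONFIGURATION_LIMIT (a single index or a comma-separated list)
next_configuration_limit() {
    last_index="${1##*,}"      # keep only the text after the final comma
    echo $((last_index + 1))
}
```

For example, CONFIGURATION_LIMIT="$(next_configuration_limit "$CONFIGURATION_LIMIT")" turns 0 into 1 and 0,1,2 into 3.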
-
(ncn-mw#) Generate a partially configured image.
The following command will generate an image that only adds the configuration layers specified in CONFIGURATION_LIMIT.
cray cfs v3 sessions create --name $SESSION_NAME --configuration-name $CFS_CONFIGURATION_NAME --target-definition image --target-group Compute $BASE_IMAGE_ID --configuration-limit $CONFIGURATION_LIMIT
If this is the last layer of the configuration, it is optionally possible to specify the name of the resulting image by adding --target-image-map.
cray cfs v3 sessions create --name $SESSION_NAME --configuration-name $CFS_CONFIGURATION_NAME --target-definition image --target-group Compute $BASE_IMAGE_ID --configuration-limit $CONFIGURATION_LIMIT --target-image-map $BASE_IMAGE_ID <desired image name>
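Because this command is repeated once per layer, it may be convenient to wrap it in a shell function. The following is a minimal sketch that uses only the options shown above; the function name create_partial_image_session is hypothetical, and it reads the variables set earlier in this procedure.

```shell
# Hypothetical wrapper around the session creation commands shown above.
# Reads SESSION_NAME, CFS_CONFIGURATION_NAME, BASE_IMAGE_ID, and
# CONFIGURATION_LIMIT from the environment.
# $1 (optional) = desired name of the resulting image; pass it only when
#                 applying the last layer of the configuration.
create_partial_image_session() {
    image_name="${1:-}"
    # Build the argument list in the positional parameters.
    set -- cfs v3 sessions create \
        --name "$SESSION_NAME" \
        --configuration-name "$CFS_CONFIGURATION_NAME" \
        --target-definition image \
        --target-group Compute "$BASE_IMAGE_ID" \
        --configuration-limit "$CONFIGURATION_LIMIT"
    if [ -n "$image_name" ]; then
        set -- "$@" --target-image-map "$BASE_IMAGE_ID" "$image_name"
    fi
    cray "$@"
}
```

Quoting each variable keeps the command intact even if a value is unexpectedly empty or contains spaces.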
-
(ncn-mw#) Monitor CFS and retrieve the image ID of the partially generated image.
-
Monitor the CFS session with the following command until the status is complete. If succeeded is true, then move on to the next step; otherwise, debug the failure and re-run the session as necessary.
cray cfs v3 sessions describe $SESSION_NAME --format json | jq .status.session
Example output:
{ "completionTime": "2023-11-01T19:55:10", "job": "cfs-712dee37-2b80-498c-867c-42753716cad6", "startTime": "2023-11-01T19:53:09", "status": "complete", "succeeded": "true" }
-
When the session is complete, get the resulting Image Management Service (IMS) image ID.
cray cfs v3 sessions describe $SESSION_NAME --format json | jq .status.artifacts
Example output:
[ { "image_id": "<IMS IMAGE ID>", "result_id": "<RESULTANT IMS IMAGE ID>", "type": "ims_customized_image" } ]
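The monitoring step can be automated with a simple polling loop. The following is a minimal sketch, assuming the status fields shown in the example output above; the function name wait_for_cfs_session and the POLL_INTERVAL variable are hypothetical.

```shell
# Hypothetical helper: poll a CFS session until its status is "complete".
# $1 = session name
# Returns 0 if succeeded is "true", 1 if the session completed without
# succeeding, and 2 if no status could be read.
wait_for_cfs_session() {
    session="$1"
    while :; do
        session_status="$(cray cfs v3 sessions describe "$session" --format json \
            | jq -r '.status.session.status')"
        [ -n "$session_status" ] || return 2
        [ "$session_status" = "complete" ] && break
        sleep "${POLL_INTERVAL:-30}"    # seconds between polls (assumed default)
    done
    succeeded="$(cray cfs v3 sessions describe "$session" --format json \
        | jq -r '.status.session.succeeded')"
    [ "$succeeded" = "true" ]
}
```

For example, wait_for_cfs_session "$SESSION_NAME" && echo "layer applied". A nonzero return still requires debugging the failure as described above.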
-
(ncn-mw#) Update BASE_IMAGE_ID.
Set BASE_IMAGE_ID equal to the result_id from the last CFS session.
BASE_IMAGE_ID=<RESULTANT IMS IMAGE ID>
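The result_id can be read directly from the session instead of being copied by hand. The following is a minimal sketch, assuming the session produced exactly one artifact as in the example output above; the function name get_result_image_id is hypothetical.

```shell
# Hypothetical helper: print the result_id of the first artifact of a
# CFS session. Assumes a single artifact, as in the example output.
# $1 = session name
get_result_image_id() {
    cray cfs v3 sessions describe "$1" --format json \
        | jq -r '.status.artifacts[0].result_id'
}
```

For example, BASE_IMAGE_ID="$(get_result_image_id "$SESSION_NAME")". Check that the value is a non-empty image ID (jq prints null when the field is missing) before deleting the session in the next step.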
-
(ncn-mw#) Clean up the CFS session.
Delete the CFS session to prevent naming conflicts with configuration runs for future layers.
cray cfs v3 sessions delete $SESSION_NAME
-
Repeat these steps for each layer.
If this was not the last layer in the configuration, return to the beginning of this section. If this was the last layer in the configuration, move on to the next section.
-
Create a copy of the original SAT bootprep file.
Follow steps 1-5 of Obtaining IUF session variables and bootprep files in the SAT documentation.
-
Edit the Boot Orchestration Service (BOS) session templates in the SAT bootprep file.
For any BOS templates in the session_templates section that will now be booting with the image that was just created, replace image_ref with ims to specify an image from IMS by its ID:
Before:
session_templates:
- name: example-session-template
  image:
    image_ref: example-image
After:
session_templates:
- name: example-session-template
  image:
    ims:
      id: <image id from the final CFS run>
-
Remove image generation and configuration from the SAT bootprep file.
In the images section of the file, remove configuration of the already created image.
Example image configuration section that can be removed:
- name: "compute-{{base.name}}"
  ref_name: compute_image.aarch64
  base:
    image_ref: base_cos_image.aarch64
  configuration: "{{default.note}}compute-{{recipe.version}}{{default.suffix}}"
  configuration_group_names:
  - Compute
Optionally remove the creation of the base image. This can save a lot of time in the IUF/SAT run, but requires more changes to the bootprep file to ensure that no other image is building off the base image.
Example image creation section that can be removed:
- name: "{{default.note}}{{base.name}}{{default.suffix}}"
  ref_name: base_cos_image.aarch64
  base:
    product:
      name: cos
      type: recipe
      version: "{{cos.version}}"
      filter:
        arch: aarch64
If the base image creation is removed, ensure that no other images or session_templates reference the base image. If they do share a base image, as is frequently the case with the UAN image and Compute image, then the images or templates must be updated to reference the image ID of the base image that was already created. The base image ID was stored in BASE_IMAGE_ID earlier in this procedure.
Image customization section before:
- name: "uan-{{base.name}}"
  ref_name: uan_image.aarch64
  base:
    image_ref: base_cos_image.aarch64
  configuration: "{{default.note}}uan-{{recipe.version}}{{default.suffix}}"
  configuration_group_names:
  - Application
  - Application_UAN
Image customization section after:
- name: "uan-{{base.name}}"
  ref_name: uan_image.aarch64
  base:
    ims:
      id: <base image id>
      type: image
  configuration: "{{default.note}}uan-{{recipe.version}}{{default.suffix}}"
  configuration_group_names:
  - Application
  - Application_UAN
See the previous step for updating any templates.
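Before restarting IUF, it can help to confirm that nothing in the edited file still points at a removed image. The following is a minimal sketch that searches for remaining image_ref entries; the function name find_image_refs is hypothetical, and it assumes the reference is written on a single line as in the examples above.

```shell
# Hypothetical helper: print, with line numbers, every line of a bootprep
# file that still references the given ref_name through image_ref.
# $1 = path to the bootprep file
# $2 = ref_name of the removed image (for example, base_cos_image.aarch64)
# Exit status follows grep: 0 if a reference remains, 1 if none do.
find_image_refs() {
    # -F matches the text literally, so the dot in the ref_name is not a wildcard.
    grep -n -F "image_ref: $2" "$1"
}
```

For example, find_image_refs <bootprep file> base_cos_image.aarch64 should print nothing once every dependent image and session template has been updated.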
-
Restart IUF.
IUF can now be restarted at the prepare-images step using the new bootprep file. For more information, see Image Preparation.