diff --git a/tutorials/llm/nemo2-peft.ipynb b/tutorials/llm/nemo2-peft.ipynb
index 516e13dd37f9a..c98d7a12c100e 100644
--- a/tutorials/llm/nemo2-peft.ipynb
+++ b/tutorials/llm/nemo2-peft.ipynb
@@ -12,9 +12,9 @@
     "\n",
     "This optimization process is known as fine-tuning, which involves adjusting the weights of a pre-trained foundation model with custom data.\n",
     "\n",
-    "Considering that foundation models can be significantly large, a variant of fine-tuning has gained traction recently known as PEFT. PEFT encompasses several methods, including P-Tuning, LoRA, Adapters, IA3, etc. NeMo 2.0 currently supports Low-Rank Adaptation(LoRA) method.\n",
+    "Because foundation models can be very large, a variant of fine-tuning known as parameter-efficient fine-tuning (PEFT) has recently gained traction. PEFT encompasses several methods, including P-Tuning, LoRA, Adapters, and IA3. NeMo 2.0 currently supports the Low-Rank Adaptation (LoRA) method.\n",
     "\n",
-    "This playbook involves applying LoRA to the Llama3 using NeMo 2.0. \n",
+    "This playbook involves applying LoRA to Llama3 using NeMo 2.0. \n",
     "\n",
     "## NeMo 2.0\n",
     "\n",
@@ -24,14 +24,14 @@
     "\n",
     "- Modular Abstractions - By adopting PyTorch Lightning’s modular abstractions, NeMo 2.0 simplifies adaptation and experimentation. This modular approach allows developers to more easily modify and experiment with different components of their models.\n",
     "\n",
-    "- Scalability - NeMo 2.0 seamlessly scaling large-scale experiments across thousands of GPUs using NeMo-Run, a powerful tool designed to streamline the configuration, execution, and management of machine learning experiments across computing environments.\n",
+    "- Scalability - NeMo 2.0 seamlessly scales large-scale experiments across thousands of GPUs using NeMo-Run, a powerful tool designed to streamline the configuration, execution, and management of machine learning experiments across computing environments.\n",
     "\n",
     "By adopting PyTorch Lightning’s modular abstractions, NeMo 2.0 makes it easy for users to adapt the framework to their specific use cases and experiment with various configurations. This section offers an overview of the new features in NeMo 2.0 and includes a migration guide with step-by-step instructions for transitioning your models from NeMo 1.0 to NeMo 2.0.\n",
     "\n",
     "## NeMo-Run\n",
     "\n",
     "NeMo-Run is a powerful tool designed to streamline the configuration, execution, and management of machine learning experiments across various computing environments. To run your experiments with Nemo-Run, you need to follow the three steps:\n",
-    "1. configure your function\n",
+    "1. Configure your function\n",
     "\n",
     "2. Define your Executor\n",
     "\n",
@@ -45,7 +45,7 @@
     "\n",
     "2. [NeMo-Run Github repo](https://github.com/NVIDIA/NeMo-Run/)\n",
     "\n",
-    "3. NeMo Framework Training container: `nvcr.io/nvidia/nemo:dev`  #TODO: FIX CONTAINER\n",
+    "3. NeMo Framework Training container: `nvcr.io/nvidia/nemo:dev`\n",
     "\n",
     "\n",
     "\n",
@@ -63,7 +63,7 @@
     "\n",
     "1. Use the latest [NeMo Framework Training container](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo/tags) . Note that you must be logged in to the container registry to view this page.\n",
     "\n",
-    "2. This notebook uses the container: `nvcr.io/nvidia/nemo:dev`  #TODO: FIX CONTAINER  \n",
+    "2. This notebook uses the container: `nvcr.io/nvidia/nemo:dev`.\n",
     "\n",
     "\n",
     "## Hardware Requirements\n",
@@ -81,7 +81,7 @@
    "source": [
     "# Step 0: Go inside docker container\n",
     "\n",
-    "You can start and enter the dev container by:  #TODO: FIX CONTAINER\n",
+    "You can start and enter the dev container by running:\n",
     "```\n",
     "docker run --gpus device=1 --shm-size=2g --net=host --ulimit memlock=-1 --rm -it -v ${PWD}:/workspace -w /workspace -v ${PWD}/results:/results nvcr.io/nvidia/nemo:dev bash\n",
     "\n",
@@ -94,7 +94,7 @@
    "source": [
     "\n",
     "# Step 1: Import HuggingFace checkpoint\n",
-    "First request download permission from Meta and Hugging Face. Login through `huggingface-cli` using your Huggingface token before importing llama3 models. \n",
+    "First, request download permission from Meta and Hugging Face. Then log in through `huggingface-cli` using your Hugging Face token before importing Llama3 models. \n",
     "\n",
     "```\n",
     "$ huggingface-cli login\n",
@@ -119,69 +119,40 @@
       "  from .autonotebook import tqdm as notebook_tqdm\n",
       "/usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py:128: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.\n",
       "  warnings.warn(\n",
-      "[NeMo W 2024-10-22 23:54:34 nemo_logging:349] /opt/megatron-lm/megatron/core/tensor_parallel/layers.py:290: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.\n",
-      "      def forward(ctx, input, weight, bias, allreduce_dgrad):\n",
-      "    \n",
-      "[NeMo W 2024-10-22 23:54:34 nemo_logging:349] /opt/megatron-lm/megatron/core/tensor_parallel/layers.py:301: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.\n",
-      "      def backward(ctx, grad_output):\n",
-      "    \n",
-      "[NeMo W 2024-10-22 23:54:34 nemo_logging:349] /opt/megatron-lm/megatron/core/tensor_parallel/layers.py:393: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.\n",
-      "      def forward(\n",
-      "    \n",
-      "[NeMo W 2024-10-22 23:54:34 nemo_logging:349] /opt/megatron-lm/megatron/core/tensor_parallel/layers.py:433: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.\n",
-      "      def backward(ctx, grad_output):\n",
-      "    \n",
-      "[NeMo W 2024-10-22 23:54:34 nemo_logging:349] /opt/megatron-lm/megatron/core/dist_checkpointing/strategies/torch.py:17: DeprecationWarning: `torch.distributed._sharded_tensor` will be deprecated, use `torch.distributed._shard.sharded_tensor` instead\n",
-      "      from torch.distributed._sharded_tensor import ShardedTensor as TorchShardedTensor\n",
-      "    \n",
-      "[NeMo W 2024-10-22 23:54:35 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/mamba_ssm/ops/selective_scan_interface.py:164: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.\n",
-      "      def forward(ctx, xz, conv1d_weight, conv1d_bias, x_proj_weight, delta_proj_weight,\n",
-      "    \n",
-      "[NeMo W 2024-10-22 23:54:35 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/mamba_ssm/ops/selective_scan_interface.py:240: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.\n",
-      "      def backward(ctx, dout):\n",
-      "    \n",
-      "[NeMo W 2024-10-22 23:54:35 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/mamba_ssm/ops/triton/layer_norm.py:986: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.\n",
-      "      def forward(\n",
-      "    \n",
-      "[NeMo W 2024-10-22 23:54:35 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/mamba_ssm/ops/triton/layer_norm.py:1045: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.\n",
-      "      def backward(ctx, dout, *args):\n",
-      "    \n",
-      "[NeMo W 2024-10-22 23:54:35 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/mamba_ssm/distributed/tensor_parallel.py:26: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.\n",
-      "      def forward(ctx, x, weight, bias, process_group=None, sequence_parallel=True):\n",
-      "    \n",
-      "[NeMo W 2024-10-22 23:54:35 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/mamba_ssm/distributed/tensor_parallel.py:62: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.\n",
-      "      def backward(ctx, grad_output):\n",
-      "    \n",
-      "[NeMo W 2024-10-22 23:54:35 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/mamba_ssm/ops/triton/ssd_combined.py:758: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.\n",
-      "      def forward(ctx, zxbcdt, conv1d_weight, conv1d_bias, dt_bias, A, D, chunk_size, initial_states=None, seq_idx=None, dt_limit=(0.0, float(\"inf\")), return_final_states=False, activation=\"silu\",\n",
-      "    \n",
-      "[NeMo W 2024-10-22 23:54:35 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/mamba_ssm/ops/triton/ssd_combined.py:836: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.\n",
-      "      def backward(ctx, dout, *args):\n",
+      "[NeMo W 2024-11-15 09:43:51 nemo_logging:361] /usr/local/lib/python3.10/dist-packages/pyannote/core/notebook.py:134: MatplotlibDeprecationWarning: The get_cmap function was deprecated in Matplotlib 3.7 and will be removed in 3.11. Use ``matplotlib.colormaps[name]`` or ``matplotlib.colormaps.get_cmap()`` or ``pyplot.get_cmap()`` instead.\n",
+      "      cm = get_cmap(\"Set1\")\n",
       "    \n"
      ]
     },
     {
      "data": {
       "text/html": [
-       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #00ff00; text-decoration-color: #00ff00\">─ </span><span style=\"color: #800080; text-decoration-color: #800080; font-weight: bold\">Entering Experiment nemo.collections.llm.api.import_ckpt with id: nemo.collections.llm.api.import_ckpt_1729666…</span><span style=\"color: #00ff00; text-decoration-color: #00ff00\"> ─</span>\n",
+       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #00ff00; text-decoration-color: #00ff00\">─ </span><span style=\"color: #800080; text-decoration-color: #800080; font-weight: bold\">Entering Experiment nemo.collections.llm.api.import_ckpt with id: nemo.collections.llm.api.import_ckpt_1731692…</span><span style=\"color: #00ff00; text-decoration-color: #00ff00\"> ─</span>\n",
        "</pre>\n"
       ],
       "text/plain": [
-       "\u001b[92m─ \u001b[0m\u001b[1;35mEntering Experiment nemo.collections.llm.api.import_ckpt with id: nemo.collections.llm.api.import_ckpt_1729666…\u001b[0m\u001b[92m ─\u001b[0m\n"
+       "\u001b[92m─ \u001b[0m\u001b[1;35mEntering Experiment nemo.collections.llm.api.import_ckpt with id: nemo.collections.llm.api.import_ckpt_1731692…\u001b[0m\u001b[92m ─\u001b[0m\n"
       ]
      },
      "metadata": {},
      "output_type": "display_data"
     },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "Log directory is: /root/.nemo_run/experiments/nemo.collections.llm.api.import_ckpt/nemo.collections.llm.api.import_ckpt_1731692632/nemo.collections.llm.api.import_ckpt\n"
+     ]
+    },
     {
      "data": {
       "text/html": [
-       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #7fbfbf; text-decoration-color: #7fbfbf\">[23:54:35] </span><span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">Launching task nemo.collections.llm.api.import_ckpt for experiment </span>                    <a href=\"file:///opt/NeMo-Run/src/nemo_run/run/experiment.py\" target=\"_blank\"><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">experiment.py</span></a><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">:</span><a href=\"file:///opt/NeMo-Run/src/nemo_run/run/experiment.py#596\" target=\"_blank\"><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">596</span></a>\n",
+       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #7fbfbf; text-decoration-color: #7fbfbf\">[09:43:52] </span><span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">Launching job nemo.collections.llm.api.import_ckpt for experiment </span>                     <a href=\"file:///opt/NeMo-Run/src/nemo_run/run/experiment.py\" target=\"_blank\"><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">experiment.py</span></a><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">:</span><a href=\"file:///opt/NeMo-Run/src/nemo_run/run/experiment.py#660\" target=\"_blank\"><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">660</span></a>\n",
        "<span style=\"color: #7fbfbf; text-decoration-color: #7fbfbf\">           </span><span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">nemo.collections.llm.api.import_ckpt</span>                                                   <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">                 </span>\n",
        "</pre>\n"
       ],
       "text/plain": [
-       "\u001b[2;36m[23:54:35]\u001b[0m\u001b[2;36m \u001b[0m\u001b[1;36mLaunching task nemo.collections.llm.api.import_ckpt for experiment \u001b[0m                    \u001b]8;id=128430;file:///opt/NeMo-Run/src/nemo_run/run/experiment.py\u001b\\\u001b[2mexperiment.py\u001b[0m\u001b]8;;\u001b\\\u001b[2m:\u001b[0m\u001b]8;id=182739;file:///opt/NeMo-Run/src/nemo_run/run/experiment.py#596\u001b\\\u001b[2m596\u001b[0m\u001b]8;;\u001b\\\n",
+       "\u001b[2;36m[09:43:52]\u001b[0m\u001b[2;36m \u001b[0m\u001b[1;36mLaunching job nemo.collections.llm.api.import_ckpt for experiment \u001b[0m                     \u001b]8;id=6439;file:///opt/NeMo-Run/src/nemo_run/run/experiment.py\u001b\\\u001b[2mexperiment.py\u001b[0m\u001b]8;;\u001b\\\u001b[2m:\u001b[0m\u001b]8;id=636758;file:///opt/NeMo-Run/src/nemo_run/run/experiment.py#660\u001b\\\u001b[2m660\u001b[0m\u001b]8;;\u001b\\\n",
        "\u001b[2;36m           \u001b[0m\u001b[1;36mnemo.collections.llm.api.import_ckpt\u001b[0m                                                   \u001b[2m                 \u001b[0m\n"
       ]
      },
@@ -192,26 +163,26 @@
      "name": "stderr",
      "output_type": "stream",
      "text": [
-      "Log directory is: /root/.nemo_run/experiments/nemo.collections.llm.api.import_ckpt/nemo.collections.llm.api.import_ckpt_1729666475/nemo.collections.llm.api.import_ckpt\n",
-      "Launched app: local_persistent://nemo_run/nemo.collections.llm.api.import_ckpt-g20nvfvrft64t\n",
+      "Log directory is: /root/.nemo_run/experiments/nemo.collections.llm.api.import_ckpt/nemo.collections.llm.api.import_ckpt_1731692632/nemo.collections.llm.api.import_ckpt\n",
+      "Launched app: local_persistent://nemo_run/nemo.collections.llm.api.import_ckpt-jjdv0bm9tlj30\n",
       "AppStatus:\n",
       "    State: RUNNING\n",
       "    Num Restarts: 0\n",
       "    Roles: \n",
       "    Msg: <NONE>\n",
       "    Structured Error Msg: <NONE>\n",
-      "    UI URL: file:///root/.nemo_run/experiments/nemo.collections.llm.api.import_ckpt/nemo.collections.llm.api.import_ckpt_1729666475/nemo.collections.llm.api.import_ckpt/nemo_run/nemo.collections.llm.api.import_ckpt-g20nvfvrft64t\n",
+      "    UI URL: file:///root/.nemo_run/experiments/nemo.collections.llm.api.import_ckpt/nemo.collections.llm.api.import_ckpt_1731692632/nemo.collections.llm.api.import_ckpt/nemo_run/nemo.collections.llm.api.import_ckpt-jjdv0bm9tlj30\n",
       "    \n"
      ]
     },
     {
      "data": {
       "text/html": [
-       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #00ff00; text-decoration-color: #00ff00\">──────────────── </span><span style=\"color: #800080; text-decoration-color: #800080; font-weight: bold\">Waiting for Experiment nemo.collections.llm.api.import_ckpt_1729666475 to finish</span><span style=\"color: #00ff00; text-decoration-color: #00ff00\"> ─────────────────</span>\n",
+       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #00ff00; text-decoration-color: #00ff00\">──────────────── </span><span style=\"color: #800080; text-decoration-color: #800080; font-weight: bold\">Waiting for Experiment nemo.collections.llm.api.import_ckpt_1731692632 to finish</span><span style=\"color: #00ff00; text-decoration-color: #00ff00\"> ─────────────────</span>\n",
        "</pre>\n"
       ],
       "text/plain": [
-       "\u001b[92m──────────────── \u001b[0m\u001b[1;35mWaiting for Experiment nemo.collections.llm.api.import_ckpt_1729666475 to finish\u001b[0m\u001b[92m ─────────────────\u001b[0m\n"
+       "\u001b[92m──────────────── \u001b[0m\u001b[1;35mWaiting for Experiment nemo.collections.llm.api.import_ckpt_1731692632 to finish\u001b[0m\u001b[92m ─────────────────\u001b[0m\n"
       ]
      },
      "metadata": {},
@@ -233,11 +204,11 @@
     {
      "data": {
       "text/html": [
-       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #008000; text-decoration-color: #008000; font-weight: bold\">Experiment Status for</span> <span style=\"color: #ffaf00; text-decoration-color: #ffaf00; font-weight: bold\">nemo.collections.llm.api.import_ckpt_1729666475</span>\n",
+       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #008000; text-decoration-color: #008000; font-weight: bold\">Experiment Status for</span> <span style=\"color: #ffaf00; text-decoration-color: #ffaf00; font-weight: bold\">nemo.collections.llm.api.import_ckpt_1731692632</span>\n",
        "</pre>\n"
       ],
       "text/plain": [
-       "\u001b[1;32mExperiment Status for\u001b[0m \u001b[1;38;5;214mnemo.collections.llm.api.import_ckpt_1729666475\u001b[0m\n"
+       "\u001b[1;32mExperiment Status for\u001b[0m \u001b[1;38;5;214mnemo.collections.llm.api.import_ckpt_1731692632\u001b[0m\n"
       ]
      },
      "metadata": {},
@@ -250,8 +221,8 @@
        "<span style=\"color: #008000; text-decoration-color: #008000; font-weight: bold\">Task 0</span>: <span style=\"color: #ffaf00; text-decoration-color: #ffaf00; font-weight: bold\">nemo.collections.llm.api.import_ckpt</span>\n",
        "- <span style=\"color: #008000; text-decoration-color: #008000; font-weight: bold\">Status</span>: RUNNING\n",
        "- <span style=\"color: #008000; text-decoration-color: #008000; font-weight: bold\">Executor</span>: LocalExecutor\n",
-       "- <span style=\"color: #008000; text-decoration-color: #008000; font-weight: bold\">Job id</span>: nemo.collections.llm.api.import_ckpt-g20nvfvrft64t\n",
-       "- <span style=\"color: #008000; text-decoration-color: #008000; font-weight: bold\">Local Directory</span>: /root/.nemo_run/experiments/nemo.collections.llm.api.import_ckpt/nemo.collections.llm.api.import_ckpt_1729666475/nemo.collections.llm.api.import_ckpt\n",
+       "- <span style=\"color: #008000; text-decoration-color: #008000; font-weight: bold\">Job id</span>: nemo.collections.llm.api.import_ckpt-jjdv0bm9tlj30\n",
+       "- <span style=\"color: #008000; text-decoration-color: #008000; font-weight: bold\">Local Directory</span>: /root/.nemo_run/experiments/nemo.collections.llm.api.import_ckpt/nemo.collections.llm.api.import_ckpt_1731692632/nemo.collections.llm.api.import_ckpt\n",
        "</pre>\n"
       ],
       "text/plain": [
@@ -259,8 +230,8 @@
        "\u001b[1;32mTask 0\u001b[0m: \u001b[1;38;5;214mnemo.collections.llm.api.import_ckpt\u001b[0m\n",
        "- \u001b[1;32mStatus\u001b[0m: RUNNING\n",
        "- \u001b[1;32mExecutor\u001b[0m: LocalExecutor\n",
-       "- \u001b[1;32mJob id\u001b[0m: nemo.collections.llm.api.import_ckpt-g20nvfvrft64t\n",
-       "- \u001b[1;32mLocal Directory\u001b[0m: /root/.nemo_run/experiments/nemo.collections.llm.api.import_ckpt/nemo.collections.llm.api.import_ckpt_1729666475/nemo.collections.llm.api.import_ckpt\n"
+       "- \u001b[1;32mJob id\u001b[0m: nemo.collections.llm.api.import_ckpt-jjdv0bm9tlj30\n",
+       "- \u001b[1;32mLocal Directory\u001b[0m: /root/.nemo_run/experiments/nemo.collections.llm.api.import_ckpt/nemo.collections.llm.api.import_ckpt_1731692632/nemo.collections.llm.api.import_ckpt\n"
       ]
      },
      "metadata": {},
@@ -283,62 +254,66 @@
      "name": "stderr",
      "output_type": "stream",
      "text": [
-      "Waiting for job nemo.collections.llm.api.import_ckpt-g20nvfvrft64t to finish [log=True]...\n"
+      "Waiting for job nemo.collections.llm.api.import_ckpt-jjdv0bm9tlj30 to finish [log=True]...\n"
      ]
     },
     {
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "nemo.collections.llm.api.import_ckpt/0 [NeMo W 2024-10-22 23:54:38 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py:128: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.\n",
-      "nemo.collections.llm.api.import_ckpt/0       warnings.warn(\n",
-      "nemo.collections.llm.api.import_ckpt/0     \n",
-      "nemo.collections.llm.api.import_ckpt/0 [NeMo W 2024-10-22 23:54:39 nemo_logging:349] /opt/megatron-lm/megatron/core/tensor_parallel/layers.py:290: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.\n",
-      "nemo.collections.llm.api.import_ckpt/0       def forward(ctx, input, weight, bias, allreduce_dgrad):\n",
-      "nemo.collections.llm.api.import_ckpt/0     \n",
-      "nemo.collections.llm.api.import_ckpt/0 [NeMo W 2024-10-22 23:54:39 nemo_logging:349] /opt/megatron-lm/megatron/core/tensor_parallel/layers.py:301: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.\n",
-      "nemo.collections.llm.api.import_ckpt/0       def backward(ctx, grad_output):\n",
-      "nemo.collections.llm.api.import_ckpt/0     \n",
-      "nemo.collections.llm.api.import_ckpt/0 [NeMo W 2024-10-22 23:54:39 nemo_logging:349] /opt/megatron-lm/megatron/core/tensor_parallel/layers.py:393: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.\n",
-      "nemo.collections.llm.api.import_ckpt/0       def forward(\n",
-      "nemo.collections.llm.api.import_ckpt/0     \n",
-      "nemo.collections.llm.api.import_ckpt/0 [NeMo W 2024-10-22 23:54:39 nemo_logging:349] /opt/megatron-lm/megatron/core/tensor_parallel/layers.py:433: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.\n",
-      "nemo.collections.llm.api.import_ckpt/0       def backward(ctx, grad_output):\n",
-      "nemo.collections.llm.api.import_ckpt/0     \n",
-      "nemo.collections.llm.api.import_ckpt/0 [NeMo W 2024-10-22 23:54:39 nemo_logging:349] /opt/megatron-lm/megatron/core/dist_checkpointing/strategies/torch.py:17: DeprecationWarning: `torch.distributed._sharded_tensor` will be deprecated, use `torch.distributed._shard.sharded_tensor` instead\n",
-      "nemo.collections.llm.api.import_ckpt/0       from torch.distributed._sharded_tensor import ShardedTensor as TorchShardedTensor\n",
-      "nemo.collections.llm.api.import_ckpt/0     \n",
-      "nemo.collections.llm.api.import_ckpt/0 [NeMo W 2024-10-22 23:54:40 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/mamba_ssm/ops/selective_scan_interface.py:164: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.\n",
-      "nemo.collections.llm.api.import_ckpt/0       def forward(ctx, xz, conv1d_weight, conv1d_bias, x_proj_weight, delta_proj_weight,\n",
-      "nemo.collections.llm.api.import_ckpt/0     \n",
-      "nemo.collections.llm.api.import_ckpt/0 [NeMo W 2024-10-22 23:54:40 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/mamba_ssm/ops/selective_scan_interface.py:240: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.\n",
-      "nemo.collections.llm.api.import_ckpt/0       def backward(ctx, dout):\n",
-      "nemo.collections.llm.api.import_ckpt/0     \n",
-      "nemo.collections.llm.api.import_ckpt/0 [NeMo W 2024-10-22 23:54:40 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/mamba_ssm/ops/triton/layer_norm.py:986: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.\n",
-      "nemo.collections.llm.api.import_ckpt/0       def forward(\n",
-      "nemo.collections.llm.api.import_ckpt/0     \n",
-      "nemo.collections.llm.api.import_ckpt/0 [NeMo W 2024-10-22 23:54:40 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/mamba_ssm/ops/triton/layer_norm.py:1045: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.\n",
-      "nemo.collections.llm.api.import_ckpt/0       def backward(ctx, dout, *args):\n",
-      "nemo.collections.llm.api.import_ckpt/0     \n",
-      "nemo.collections.llm.api.import_ckpt/0 [NeMo W 2024-10-22 23:54:40 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/mamba_ssm/distributed/tensor_parallel.py:26: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.\n",
-      "nemo.collections.llm.api.import_ckpt/0       def forward(ctx, x, weight, bias, process_group=None, sequence_parallel=True):\n",
-      "nemo.collections.llm.api.import_ckpt/0     \n",
-      "nemo.collections.llm.api.import_ckpt/0 [NeMo W 2024-10-22 23:54:40 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/mamba_ssm/distributed/tensor_parallel.py:62: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.\n",
-      "nemo.collections.llm.api.import_ckpt/0       def backward(ctx, grad_output):\n",
-      "nemo.collections.llm.api.import_ckpt/0     \n",
-      "nemo.collections.llm.api.import_ckpt/0 [NeMo W 2024-10-22 23:54:40 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/mamba_ssm/ops/triton/ssd_combined.py:758: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.\n",
-      "nemo.collections.llm.api.import_ckpt/0       def forward(ctx, zxbcdt, conv1d_weight, conv1d_bias, dt_bias, A, D, chunk_size, initial_states=None, seq_idx=None, dt_limit=(0.0, float(\"inf\")), return_final_states=False, activation=\"silu\",\n",
-      "nemo.collections.llm.api.import_ckpt/0     \n",
-      "nemo.collections.llm.api.import_ckpt/0 [NeMo W 2024-10-22 23:54:40 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/mamba_ssm/ops/triton/ssd_combined.py:836: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.\n",
-      "nemo.collections.llm.api.import_ckpt/0       def backward(ctx, dout, *args):\n",
-      "nemo.collections.llm.api.import_ckpt/0     \n"
+      "mport_ckpt/0 [NeMo W 2024-11-15 09:43:59 nemo_logging:361] /usr/local/lib/python3.10/dist-packages/pyannote/core/notebook.py:134: MatplotlibDeprecationWarning: The get_cmap function was deprecated in Matplotlib 3.7 and will be removed in 3.11. Use ``matplotlib.colormaps[name]`` or ``matplotlib.colormaps.get_cmap()`` or ``pyplot.get_cmap()`` instead.\n",
+      "mport_ckpt/0       cm = get_cmap(\"Set1\")\n",
+      "mport_ckpt/0     \n",
+      "mport_ckpt/0 Downloading shards: 100%|██████████| 4/4 [00:00<00:00, 4830.76it/s]\n",
+      "mport_ckpt/0 Loading checkpoint shards: 100%|██████████| 4/4 [00:01<00:00,  3.13it/s]\n",
+      "mport_ckpt/0 [NeMo I 2024-11-15 09:44:02 megatron_strategy:310] Fixing mis-match between ddp-config & mcore-optimizer config\n",
+      "mport_ckpt/0 [NeMo I 2024-11-15 09:44:02 megatron_init:396] Rank 0 has data parallel group : [0]\n",
+      "mport_ckpt/0 [NeMo I 2024-11-15 09:44:02 megatron_init:402] Rank 0 has combined group of data parallel and context parallel : [0]\n",
+      "mport_ckpt/0 [NeMo I 2024-11-15 09:44:02 megatron_init:407] All data parallel group ranks with context parallel combined: [[0]]\n",
+      "mport_ckpt/0 [NeMo I 2024-11-15 09:44:02 megatron_init:410] Ranks 0 has data parallel rank: 0\n",
+      "mport_ckpt/0 [NeMo I 2024-11-15 09:44:02 megatron_init:418] Rank 0 has context parallel group: [0]\n",
+      "mport_ckpt/0 [NeMo I 2024-11-15 09:44:02 megatron_init:421] All context parallel group ranks: [[0]]\n",
+      "mport_ckpt/0 [NeMo I 2024-11-15 09:44:02 megatron_init:422] Ranks 0 has context parallel rank: 0\n",
+      "mport_ckpt/0 [NeMo I 2024-11-15 09:44:02 megatron_init:429] Rank 0 has model parallel group: [0]\n",
+      "mport_ckpt/0 [NeMo I 2024-11-15 09:44:02 megatron_init:430] All model parallel group ranks: [[0]]\n",
+      "mport_ckpt/0 [NeMo I 2024-11-15 09:44:02 megatron_init:439] Rank 0 has tensor model parallel group: [0]\n",
+      "mport_ckpt/0 [NeMo I 2024-11-15 09:44:02 megatron_init:443] All tensor model parallel group ranks: [[0]]\n",
+      "mport_ckpt/0 [NeMo I 2024-11-15 09:44:02 megatron_init:444] Rank 0 has tensor model parallel rank: 0\n",
+      "mport_ckpt/0 [NeMo I 2024-11-15 09:44:02 megatron_init:464] Rank 0 has pipeline model parallel group: [0]\n",
+      "mport_ckpt/0 [NeMo I 2024-11-15 09:44:02 megatron_init:476] Rank 0 has embedding group: [0]\n",
+      "mport_ckpt/0 [NeMo I 2024-11-15 09:44:02 megatron_init:482] All pipeline model parallel group ranks: [[0]]\n",
+      "mport_ckpt/0 [NeMo I 2024-11-15 09:44:02 megatron_init:483] Rank 0 has pipeline model parallel rank 0\n",
+      "mport_ckpt/0 [NeMo I 2024-11-15 09:44:02 megatron_init:484] All embedding group ranks: [[0]]\n",
+      "mport_ckpt/0 [NeMo I 2024-11-15 09:44:02 megatron_init:485] Rank 0 has embedding rank: 0\n",
+      "mport_ckpt/0 GPU available: True (cuda), used: False\n",
+      "mport_ckpt/0 TPU available: False, using: 0 TPU cores\n",
+      "mport_ckpt/0 HPU available: False, using: 0 HPUs\n",
+      "mport_ckpt/0 [NeMo W 2024-11-15 09:44:02 nemo_logging:361] /usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/setup.py:177: GPU available but not used. You can set it by doing `Trainer(accelerator='gpu')`.\n",
+      "mport_ckpt/0     \n",
+      "mport_ckpt/0 Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/1\n",
+      "mport_ckpt/0 ----------------------------------------------------------------------------------------------------\n",
+      "mport_ckpt/0 distributed_backend=gloo\n",
+      "mport_ckpt/0 All distributed processes registered. Starting with 1 processes\n",
+      "mport_ckpt/0 ----------------------------------------------------------------------------------------------------\n",
+      "mport_ckpt/0 \n",
+      "mport_ckpt/0 [NeMo I 2024-11-15 09:44:02 base:44] Padded vocab_size: 128256, original vocab_size: 128256, dummy tokens: 0.\n",
+      "mport_ckpt/0 [NeMo W 2024-11-15 09:44:02 nemo_logging:361] /usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/trainer.py:1090: `trainer.init_module` cannot fully support proper instantiation of your model with the `MegatronStrategy` strategy. Please instantiate your model inside the`LightningModule.configure_model` hook instead\n",
+      "mport_ckpt/0     \n",
+      "mport_ckpt/0 [NeMo W 2024-11-15 09:44:40 megatron_strategy:324] Could not copy Trainer's 'max_steps' to LR scheduler's 'max_steps'. If you are not using an LR scheduler, this warning can safely be ignored.\n",
+      "mport_ckpt/0 huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...\n",
+      "mport_ckpt/0 To disable this warning, you can either:\n",
+      "mport_ckpt/0 \t- Avoid using `tokenizers` before the fork if possible\n",
+      "mport_ckpt/0 \t- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)\n",
+      "mport_ckpt/0 Converted Llama model to Nemo, model saved to /root/.cache/nemo/models/meta-llama/Meta-Llama-3-8B in torch.bfloat16.\n",
+      "mport_ckpt/0 \u001b[32m $\u001b[0m\u001b[32mNEMO_MODELS_CACHE\u001b[0m\u001b[32m=\u001b[0m\u001b[32m/root/.cache/nemo/\u001b[0m\u001b[32mmodels\u001b[0m\u001b[32m \u001b[0m\n",
+      "mport_ckpt/0 \u001b[32m✓ Checkpoint imported to \u001b[0m\u001b[32m/root/.cache/nemo/models/meta-llama/\u001b[0m\u001b[32mMeta-Llama-3-8B\u001b[0m\n"
      ]
     },
     {
      "name": "stderr",
      "output_type": "stream",
      "text": [
-      "Job nemo.collections.llm.api.import_ckpt-g20nvfvrft64t finished: SUCCEEDED\n"
+      "Job nemo.collections.llm.api.import_ckpt-jjdv0bm9tlj30 finished: SUCCEEDED\n"
      ]
     },
     {
@@ -347,7 +322,7 @@
        "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"background-color: #272822\">                                                                                                                   </span>\n",
        "<span style=\"color: #959077; text-decoration-color: #959077; background-color: #272822\"># The experiment was run with the following tasks: ['nemo.collections.llm.api.import_ckpt']</span><span style=\"background-color: #272822\">                        </span>\n",
        "<span style=\"color: #959077; text-decoration-color: #959077; background-color: #272822\"># You can inspect and reconstruct this experiment at a later point in time using:</span><span style=\"background-color: #272822\">                                  </span>\n",
-       "<span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">experiment </span><span style=\"color: #ff4689; text-decoration-color: #ff4689; background-color: #272822\">=</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\"> run</span><span style=\"color: #ff4689; text-decoration-color: #ff4689; background-color: #272822\">.</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">Experiment</span><span style=\"color: #ff4689; text-decoration-color: #ff4689; background-color: #272822\">.</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">from_id(</span><span style=\"color: #e6db74; text-decoration-color: #e6db74; background-color: #272822\">\"nemo.collections.llm.api.import_ckpt_1729666475\"</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">)</span><span style=\"background-color: #272822\">                             </span>\n",
+       "<span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">experiment </span><span style=\"color: #ff4689; text-decoration-color: #ff4689; background-color: #272822\">=</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\"> run</span><span style=\"color: #ff4689; text-decoration-color: #ff4689; background-color: #272822\">.</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">Experiment</span><span style=\"color: #ff4689; text-decoration-color: #ff4689; background-color: #272822\">.</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">from_id(</span><span style=\"color: #e6db74; text-decoration-color: #e6db74; background-color: #272822\">\"nemo.collections.llm.api.import_ckpt_1731692632\"</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">)</span><span style=\"background-color: #272822\">                             </span>\n",
        "<span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">experiment</span><span style=\"color: #ff4689; text-decoration-color: #ff4689; background-color: #272822\">.</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">status() </span><span style=\"color: #959077; text-decoration-color: #959077; background-color: #272822\"># Gets the overall status</span><span style=\"background-color: #272822\">                                                                      </span>\n",
        "<span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">experiment</span><span style=\"color: #ff4689; text-decoration-color: #ff4689; background-color: #272822\">.</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">logs(</span><span style=\"color: #e6db74; text-decoration-color: #e6db74; background-color: #272822\">\"nemo.collections.llm.api.import_ckpt\"</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">) </span><span style=\"color: #959077; text-decoration-color: #959077; background-color: #272822\"># Gets the log for the provided task</span><span style=\"background-color: #272822\">                       </span>\n",
        "<span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">experiment</span><span style=\"color: #ff4689; text-decoration-color: #ff4689; background-color: #272822\">.</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">cancel(</span><span style=\"color: #e6db74; text-decoration-color: #e6db74; background-color: #272822\">\"nemo.collections.llm.api.import_ckpt\"</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">) </span><span style=\"color: #959077; text-decoration-color: #959077; background-color: #272822\"># Cancels the provided task if still running</span><span style=\"background-color: #272822\">             </span>\n",
@@ -358,7 +333,7 @@
        "\u001b[48;2;39;40;34m                                                                                                                   \u001b[0m\n",
        "\u001b[38;2;149;144;119;48;2;39;40;34m# The experiment was run with the following tasks: ['nemo.collections.llm.api.import_ckpt']\u001b[0m\u001b[48;2;39;40;34m                        \u001b[0m\n",
        "\u001b[38;2;149;144;119;48;2;39;40;34m# You can inspect and reconstruct this experiment at a later point in time using:\u001b[0m\u001b[48;2;39;40;34m                                  \u001b[0m\n",
-       "\u001b[38;2;248;248;242;48;2;39;40;34mexperiment\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;255;70;137;48;2;39;40;34m=\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mrun\u001b[0m\u001b[38;2;255;70;137;48;2;39;40;34m.\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mExperiment\u001b[0m\u001b[38;2;255;70;137;48;2;39;40;34m.\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mfrom_id\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m(\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m\"\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34mnemo.collections.llm.api.import_ckpt_1729666475\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m\"\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m)\u001b[0m\u001b[48;2;39;40;34m                             \u001b[0m\n",
+       "\u001b[38;2;248;248;242;48;2;39;40;34mexperiment\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;255;70;137;48;2;39;40;34m=\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mrun\u001b[0m\u001b[38;2;255;70;137;48;2;39;40;34m.\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mExperiment\u001b[0m\u001b[38;2;255;70;137;48;2;39;40;34m.\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mfrom_id\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m(\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m\"\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34mnemo.collections.llm.api.import_ckpt_1731692632\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m\"\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m)\u001b[0m\u001b[48;2;39;40;34m                             \u001b[0m\n",
        "\u001b[38;2;248;248;242;48;2;39;40;34mexperiment\u001b[0m\u001b[38;2;255;70;137;48;2;39;40;34m.\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mstatus\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m(\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m)\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;149;144;119;48;2;39;40;34m# Gets the overall status\u001b[0m\u001b[48;2;39;40;34m                                                                      \u001b[0m\n",
        "\u001b[38;2;248;248;242;48;2;39;40;34mexperiment\u001b[0m\u001b[38;2;255;70;137;48;2;39;40;34m.\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mlogs\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m(\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m\"\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34mnemo.collections.llm.api.import_ckpt\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m\"\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m)\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;149;144;119;48;2;39;40;34m# Gets the log for the provided task\u001b[0m\u001b[48;2;39;40;34m                       \u001b[0m\n",
        "\u001b[38;2;248;248;242;48;2;39;40;34mexperiment\u001b[0m\u001b[38;2;255;70;137;48;2;39;40;34m.\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mcancel\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m(\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m\"\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34mnemo.collections.llm.api.import_ckpt\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m\"\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m)\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;149;144;119;48;2;39;40;34m# Cancels the provided task if still running\u001b[0m\u001b[48;2;39;40;34m             \u001b[0m\n",
@@ -373,18 +348,18 @@
       "text/html": [
        "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"background-color: #272822\">                                                                                                                   </span>\n",
        "<span style=\"color: #959077; text-decoration-color: #959077; background-color: #272822\"># You can inspect this experiment at a later point in time using the CLI as well:</span><span style=\"background-color: #272822\">                                  </span>\n",
-       "<span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">nemo experiment status nemo.collections.llm.api.import_ckpt_1729666475</span><span style=\"background-color: #272822\">                                             </span>\n",
-       "<span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">nemo experiment logs nemo.collections.llm.api.import_ckpt_1729666475 </span><span style=\"color: #ae81ff; text-decoration-color: #ae81ff; background-color: #272822\">0</span><span style=\"background-color: #272822\">                                             </span>\n",
-       "<span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">nemo experiment cancel nemo.collections.llm.api.import_ckpt_1729666475 </span><span style=\"color: #ae81ff; text-decoration-color: #ae81ff; background-color: #272822\">0</span><span style=\"background-color: #272822\">                                           </span>\n",
+       "<span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">nemo experiment status nemo.collections.llm.api.import_ckpt_1731692632</span><span style=\"background-color: #272822\">                                             </span>\n",
+       "<span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">nemo experiment logs nemo.collections.llm.api.import_ckpt_1731692632 </span><span style=\"color: #ae81ff; text-decoration-color: #ae81ff; background-color: #272822\">0</span><span style=\"background-color: #272822\">                                             </span>\n",
+       "<span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">nemo experiment cancel nemo.collections.llm.api.import_ckpt_1731692632 </span><span style=\"color: #ae81ff; text-decoration-color: #ae81ff; background-color: #272822\">0</span><span style=\"background-color: #272822\">                                           </span>\n",
        "<span style=\"background-color: #272822\">                                                                                                                   </span>\n",
        "</pre>\n"
       ],
       "text/plain": [
        "\u001b[48;2;39;40;34m                                                                                                                   \u001b[0m\n",
        "\u001b[38;2;149;144;119;48;2;39;40;34m# You can inspect this experiment at a later point in time using the CLI as well:\u001b[0m\u001b[48;2;39;40;34m                                  \u001b[0m\n",
-       "\u001b[38;2;248;248;242;48;2;39;40;34mnemo\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mexperiment\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mstatus\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mnemo.collections.llm.api.import_ckpt_1729666475\u001b[0m\u001b[48;2;39;40;34m                                             \u001b[0m\n",
-       "\u001b[38;2;248;248;242;48;2;39;40;34mnemo\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mexperiment\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mlogs\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mnemo.collections.llm.api.import_ckpt_1729666475\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;174;129;255;48;2;39;40;34m0\u001b[0m\u001b[48;2;39;40;34m                                             \u001b[0m\n",
-       "\u001b[38;2;248;248;242;48;2;39;40;34mnemo\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mexperiment\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mcancel\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mnemo.collections.llm.api.import_ckpt_1729666475\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;174;129;255;48;2;39;40;34m0\u001b[0m\u001b[48;2;39;40;34m                                           \u001b[0m\n",
+       "\u001b[38;2;248;248;242;48;2;39;40;34mnemo\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mexperiment\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mstatus\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mnemo.collections.llm.api.import_ckpt_1731692632\u001b[0m\u001b[48;2;39;40;34m                                             \u001b[0m\n",
+       "\u001b[38;2;248;248;242;48;2;39;40;34mnemo\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mexperiment\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mlogs\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mnemo.collections.llm.api.import_ckpt_1731692632\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;174;129;255;48;2;39;40;34m0\u001b[0m\u001b[48;2;39;40;34m                                             \u001b[0m\n",
+       "\u001b[38;2;248;248;242;48;2;39;40;34mnemo\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mexperiment\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mcancel\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mnemo.collections.llm.api.import_ckpt_1731692632\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;174;129;255;48;2;39;40;34m0\u001b[0m\u001b[48;2;39;40;34m                                           \u001b[0m\n",
        "\u001b[48;2;39;40;34m                                                                                                                   \u001b[0m\n"
       ]
      },
@@ -451,7 +426,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "For how to use your own data to create your custom `DataModule` in order to perform PEFT, refer to [NeMo 2.0 SFT notebook](https://github.com/NVIDIA/NeMo/tree/main/tutorials/llm/nemo2-sft.ipynb). ##TODO: Verify this link works before publish"
+    "To learn how to use your own data to create a custom `DataModule` for PEFT, refer to the [NeMo 2.0 SFT notebook](https://github.com/NVIDIA/NeMo/tree/main/tutorials/llm/nemo2-sft.ipynb)."
    ]
   },
   {
@@ -483,7 +458,7 @@
     "    trainer = run.Config(\n",
     "        nl.Trainer,\n",
     "        devices=1,\n",
-    "        max_steps=40,\n",
+    "        max_steps=20,\n",
     "        accelerator=\"gpu\",\n",
     "        strategy=strategy,\n",
     "        plugins=bf16_mixed(),\n",
@@ -658,7 +633,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 13,
+   "execution_count": 9,
    "metadata": {
     "tags": []
    },
@@ -666,25 +641,32 @@
     {
      "data": {
       "text/html": [
-       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #00ff00; text-decoration-color: #00ff00\">─── </span><span style=\"color: #800080; text-decoration-color: #800080; font-weight: bold\">Entering Experiment nemo.collections.llm.api.finetune with id: nemo.collections.llm.api.finetune_1729667543</span><span style=\"color: #00ff00; text-decoration-color: #00ff00\"> ───</span>\n",
+       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #00ff00; text-decoration-color: #00ff00\">─── </span><span style=\"color: #800080; text-decoration-color: #800080; font-weight: bold\">Entering Experiment nemo.collections.llm.api.finetune with id: nemo.collections.llm.api.finetune_1731692700</span><span style=\"color: #00ff00; text-decoration-color: #00ff00\"> ───</span>\n",
        "</pre>\n"
       ],
       "text/plain": [
-       "\u001b[92m─── \u001b[0m\u001b[1;35mEntering Experiment nemo.collections.llm.api.finetune with id: nemo.collections.llm.api.finetune_1729667543\u001b[0m\u001b[92m ───\u001b[0m\n"
+       "\u001b[92m─── \u001b[0m\u001b[1;35mEntering Experiment nemo.collections.llm.api.finetune with id: nemo.collections.llm.api.finetune_1731692700\u001b[0m\u001b[92m ───\u001b[0m\n"
       ]
      },
      "metadata": {},
      "output_type": "display_data"
     },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "Log directory is: /root/.nemo_run/experiments/nemo.collections.llm.api.finetune/nemo.collections.llm.api.finetune_1731692700/nemo.collections.llm.api.finetune\n"
+     ]
+    },
     {
      "data": {
       "text/html": [
-       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #7fbfbf; text-decoration-color: #7fbfbf\">[00:12:23] </span><span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">Launching task nemo.collections.llm.api.finetune for experiment </span>                       <a href=\"file:///opt/NeMo-Run/src/nemo_run/run/experiment.py\" target=\"_blank\"><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">experiment.py</span></a><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">:</span><a href=\"file:///opt/NeMo-Run/src/nemo_run/run/experiment.py#596\" target=\"_blank\"><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">596</span></a>\n",
+       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #7fbfbf; text-decoration-color: #7fbfbf\">[09:45:00] </span><span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">Launching job nemo.collections.llm.api.finetune for experiment </span>                        <a href=\"file:///opt/NeMo-Run/src/nemo_run/run/experiment.py\" target=\"_blank\"><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">experiment.py</span></a><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">:</span><a href=\"file:///opt/NeMo-Run/src/nemo_run/run/experiment.py#660\" target=\"_blank\"><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">660</span></a>\n",
        "<span style=\"color: #7fbfbf; text-decoration-color: #7fbfbf\">           </span><span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">nemo.collections.llm.api.finetune</span>                                                      <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">                 </span>\n",
        "</pre>\n"
       ],
       "text/plain": [
-       "\u001b[2;36m[00:12:23]\u001b[0m\u001b[2;36m \u001b[0m\u001b[1;36mLaunching task nemo.collections.llm.api.finetune for experiment \u001b[0m                       \u001b]8;id=646240;file:///opt/NeMo-Run/src/nemo_run/run/experiment.py\u001b\\\u001b[2mexperiment.py\u001b[0m\u001b]8;;\u001b\\\u001b[2m:\u001b[0m\u001b]8;id=228286;file:///opt/NeMo-Run/src/nemo_run/run/experiment.py#596\u001b\\\u001b[2m596\u001b[0m\u001b]8;;\u001b\\\n",
+       "\u001b[2;36m[09:45:00]\u001b[0m\u001b[2;36m \u001b[0m\u001b[1;36mLaunching job nemo.collections.llm.api.finetune for experiment \u001b[0m                        \u001b]8;id=93593;file:///opt/NeMo-Run/src/nemo_run/run/experiment.py\u001b\\\u001b[2mexperiment.py\u001b[0m\u001b]8;;\u001b\\\u001b[2m:\u001b[0m\u001b]8;id=6694;file:///opt/NeMo-Run/src/nemo_run/run/experiment.py#660\u001b\\\u001b[2m660\u001b[0m\u001b]8;;\u001b\\\n",
        "\u001b[2;36m           \u001b[0m\u001b[1;36mnemo.collections.llm.api.finetune\u001b[0m                                                      \u001b[2m                 \u001b[0m\n"
       ]
      },
@@ -695,26 +677,26 @@
      "name": "stderr",
      "output_type": "stream",
      "text": [
-      "Log directory is: /root/.nemo_run/experiments/nemo.collections.llm.api.finetune/nemo.collections.llm.api.finetune_1729667543/nemo.collections.llm.api.finetune\n",
-      "Launched app: local_persistent://nemo_run/nemo.collections.llm.api.finetune-chz0jczhtwtxzc\n",
+      "Log directory is: /root/.nemo_run/experiments/nemo.collections.llm.api.finetune/nemo.collections.llm.api.finetune_1731692700/nemo.collections.llm.api.finetune\n",
+      "Launched app: local_persistent://nemo_run/nemo.collections.llm.api.finetune-wdj265kcplhnkd\n",
       "AppStatus:\n",
       "    State: RUNNING\n",
       "    Num Restarts: 0\n",
       "    Roles: \n",
       "    Msg: <NONE>\n",
       "    Structured Error Msg: <NONE>\n",
-      "    UI URL: file:///root/.nemo_run/experiments/nemo.collections.llm.api.finetune/nemo.collections.llm.api.finetune_1729667543/nemo.collections.llm.api.finetune/nemo_run/nemo.collections.llm.api.finetune-chz0jczhtwtxzc\n",
+      "    UI URL: file:///root/.nemo_run/experiments/nemo.collections.llm.api.finetune/nemo.collections.llm.api.finetune_1731692700/nemo.collections.llm.api.finetune/nemo_run/nemo.collections.llm.api.finetune-wdj265kcplhnkd\n",
       "    \n"
      ]
     },
     {
      "data": {
       "text/html": [
-       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #00ff00; text-decoration-color: #00ff00\">────────────────── </span><span style=\"color: #800080; text-decoration-color: #800080; font-weight: bold\">Waiting for Experiment nemo.collections.llm.api.finetune_1729667543 to finish</span><span style=\"color: #00ff00; text-decoration-color: #00ff00\"> ──────────────────</span>\n",
+       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #00ff00; text-decoration-color: #00ff00\">────────────────── </span><span style=\"color: #800080; text-decoration-color: #800080; font-weight: bold\">Waiting for Experiment nemo.collections.llm.api.finetune_1731692700 to finish</span><span style=\"color: #00ff00; text-decoration-color: #00ff00\"> ──────────────────</span>\n",
        "</pre>\n"
       ],
       "text/plain": [
-       "\u001b[92m────────────────── \u001b[0m\u001b[1;35mWaiting for Experiment nemo.collections.llm.api.finetune_1729667543 to finish\u001b[0m\u001b[92m ──────────────────\u001b[0m\n"
+       "\u001b[92m────────────────── \u001b[0m\u001b[1;35mWaiting for Experiment nemo.collections.llm.api.finetune_1731692700 to finish\u001b[0m\u001b[92m ──────────────────\u001b[0m\n"
       ]
      },
      "metadata": {},
@@ -736,11 +718,11 @@
     {
      "data": {
       "text/html": [
-       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #008000; text-decoration-color: #008000; font-weight: bold\">Experiment Status for</span> <span style=\"color: #ffaf00; text-decoration-color: #ffaf00; font-weight: bold\">nemo.collections.llm.api.finetune_1729667543</span>\n",
+       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #008000; text-decoration-color: #008000; font-weight: bold\">Experiment Status for</span> <span style=\"color: #ffaf00; text-decoration-color: #ffaf00; font-weight: bold\">nemo.collections.llm.api.finetune_1731692700</span>\n",
        "</pre>\n"
       ],
       "text/plain": [
-       "\u001b[1;32mExperiment Status for\u001b[0m \u001b[1;38;5;214mnemo.collections.llm.api.finetune_1729667543\u001b[0m\n"
+       "\u001b[1;32mExperiment Status for\u001b[0m \u001b[1;38;5;214mnemo.collections.llm.api.finetune_1731692700\u001b[0m\n"
       ]
      },
      "metadata": {},
@@ -753,8 +735,8 @@
        "<span style=\"color: #008000; text-decoration-color: #008000; font-weight: bold\">Task 0</span>: <span style=\"color: #ffaf00; text-decoration-color: #ffaf00; font-weight: bold\">nemo.collections.llm.api.finetune</span>\n",
        "- <span style=\"color: #008000; text-decoration-color: #008000; font-weight: bold\">Status</span>: RUNNING\n",
        "- <span style=\"color: #008000; text-decoration-color: #008000; font-weight: bold\">Executor</span>: LocalExecutor\n",
-       "- <span style=\"color: #008000; text-decoration-color: #008000; font-weight: bold\">Job id</span>: nemo.collections.llm.api.finetune-chz0jczhtwtxzc\n",
-       "- <span style=\"color: #008000; text-decoration-color: #008000; font-weight: bold\">Local Directory</span>: /root/.nemo_run/experiments/nemo.collections.llm.api.finetune/nemo.collections.llm.api.finetune_1729667543/nemo.collections.llm.api.finetune\n",
+       "- <span style=\"color: #008000; text-decoration-color: #008000; font-weight: bold\">Job id</span>: nemo.collections.llm.api.finetune-wdj265kcplhnkd\n",
+       "- <span style=\"color: #008000; text-decoration-color: #008000; font-weight: bold\">Local Directory</span>: /root/.nemo_run/experiments/nemo.collections.llm.api.finetune/nemo.collections.llm.api.finetune_1731692700/nemo.collections.llm.api.finetune\n",
        "</pre>\n"
       ],
       "text/plain": [
@@ -762,8 +744,8 @@
        "\u001b[1;32mTask 0\u001b[0m: \u001b[1;38;5;214mnemo.collections.llm.api.finetune\u001b[0m\n",
        "- \u001b[1;32mStatus\u001b[0m: RUNNING\n",
        "- \u001b[1;32mExecutor\u001b[0m: LocalExecutor\n",
-       "- \u001b[1;32mJob id\u001b[0m: nemo.collections.llm.api.finetune-chz0jczhtwtxzc\n",
-       "- \u001b[1;32mLocal Directory\u001b[0m: /root/.nemo_run/experiments/nemo.collections.llm.api.finetune/nemo.collections.llm.api.finetune_1729667543/nemo.collections.llm.api.finetune\n"
+       "- \u001b[1;32mJob id\u001b[0m: nemo.collections.llm.api.finetune-wdj265kcplhnkd\n",
+       "- \u001b[1;32mLocal Directory\u001b[0m: /root/.nemo_run/experiments/nemo.collections.llm.api.finetune/nemo.collections.llm.api.finetune_1731692700/nemo.collections.llm.api.finetune\n"
       ]
      },
      "metadata": {},
@@ -786,250 +768,628 @@
      "name": "stderr",
      "output_type": "stream",
      "text": [
-      "Waiting for job nemo.collections.llm.api.finetune-chz0jczhtwtxzc to finish [log=True]...\n"
+      "Waiting for job nemo.collections.llm.api.finetune-wdj265kcplhnkd to finish [log=True]...\n"
      ]
     },
     {
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "nemo.collections.llm.api.finetune/0 I1023 00:12:24.154000 140737350272832 torch/distributed/launcher/api.py:188] Starting elastic_operator with launch configs:\n",
-      "nemo.collections.llm.api.finetune/0 I1023 00:12:24.154000 140737350272832 torch/distributed/launcher/api.py:188]   entrypoint       : nemo_run.core.runners.fdl_runner\n",
-      "nemo.collections.llm.api.finetune/0 I1023 00:12:24.154000 140737350272832 torch/distributed/launcher/api.py:188]   min_nodes        : 1\n",
-      "nemo.collections.llm.api.finetune/0 I1023 00:12:24.154000 140737350272832 torch/distributed/launcher/api.py:188]   max_nodes        : 1\n",
-      "nemo.collections.llm.api.finetune/0 I1023 00:12:24.154000 140737350272832 torch/distributed/launcher/api.py:188]   nproc_per_node   : 1\n",
-      "nemo.collections.llm.api.finetune/0 I1023 00:12:24.154000 140737350272832 torch/distributed/launcher/api.py:188]   run_id           : 4202\n",
-      "nemo.collections.llm.api.finetune/0 I1023 00:12:24.154000 140737350272832 torch/distributed/launcher/api.py:188]   rdzv_backend     : c10d\n",
-      "nemo.collections.llm.api.finetune/0 I1023 00:12:24.154000 140737350272832 torch/distributed/launcher/api.py:188]   rdzv_endpoint    : localhost:0\n",
-      "nemo.collections.llm.api.finetune/0 I1023 00:12:24.154000 140737350272832 torch/distributed/launcher/api.py:188]   rdzv_configs     : {'timeout': 900}\n",
-      "nemo.collections.llm.api.finetune/0 I1023 00:12:24.154000 140737350272832 torch/distributed/launcher/api.py:188]   max_restarts     : 0\n",
-      "nemo.collections.llm.api.finetune/0 I1023 00:12:24.154000 140737350272832 torch/distributed/launcher/api.py:188]   monitor_interval : 0.1\n",
-      "nemo.collections.llm.api.finetune/0 I1023 00:12:24.154000 140737350272832 torch/distributed/launcher/api.py:188]   log_dir          : /root/.nemo_run/experiments/nemo.collections.llm.api.finetune/nemo.collections.llm.api.finetune_1729667543/nemo.collections.llm.api.finetune/nemo_run/nemo.collections.llm.api.finetune-chz0jczhtwtxzc/torchelastic/nemo.collections.llm.api.finetune\n",
-      "nemo.collections.llm.api.finetune/0 I1023 00:12:24.154000 140737350272832 torch/distributed/launcher/api.py:188]   metrics_cfg      : {}\n",
-      "nemo.collections.llm.api.finetune/0 I1023 00:12:24.154000 140737350272832 torch/distributed/launcher/api.py:188] \n",
-      "nemo.collections.llm.api.finetune/0 I1023 00:12:24.156000 140737350272832 torch/distributed/elastic/agent/server/api.py:825] [default] starting workers for entrypoint: python\n",
-      "nemo.collections.llm.api.finetune/0 I1023 00:12:24.157000 140737350272832 torch/distributed/elastic/agent/server/api.py:646] [default] Rendezvous'ing worker group\n",
-      "nemo.collections.llm.api.finetune/0 I1023 00:12:24.463000 140737350272832 torch/distributed/elastic/agent/server/api.py:512] [default] Rendezvous complete for workers. Result:\n",
-      "nemo.collections.llm.api.finetune/0 I1023 00:12:24.463000 140737350272832 torch/distributed/elastic/agent/server/api.py:512]   restart_count=0\n",
-      "nemo.collections.llm.api.finetune/0 I1023 00:12:24.463000 140737350272832 torch/distributed/elastic/agent/server/api.py:512]   master_addr=eos0003.eos.clusters.nvidia.com\n",
-      "nemo.collections.llm.api.finetune/0 I1023 00:12:24.463000 140737350272832 torch/distributed/elastic/agent/server/api.py:512]   master_port=34229\n",
-      "nemo.collections.llm.api.finetune/0 I1023 00:12:24.463000 140737350272832 torch/distributed/elastic/agent/server/api.py:512]   group_rank=0\n",
-      "nemo.collections.llm.api.finetune/0 I1023 00:12:24.463000 140737350272832 torch/distributed/elastic/agent/server/api.py:512]   group_world_size=1\n",
-      "nemo.collections.llm.api.finetune/0 I1023 00:12:24.463000 140737350272832 torch/distributed/elastic/agent/server/api.py:512]   local_ranks=[0]\n",
-      "nemo.collections.llm.api.finetune/0 I1023 00:12:24.463000 140737350272832 torch/distributed/elastic/agent/server/api.py:512]   role_ranks=[0]\n",
-      "nemo.collections.llm.api.finetune/0 I1023 00:12:24.463000 140737350272832 torch/distributed/elastic/agent/server/api.py:512]   global_ranks=[0]\n",
-      "nemo.collections.llm.api.finetune/0 I1023 00:12:24.463000 140737350272832 torch/distributed/elastic/agent/server/api.py:512]   role_world_sizes=[1]\n",
-      "nemo.collections.llm.api.finetune/0 I1023 00:12:24.463000 140737350272832 torch/distributed/elastic/agent/server/api.py:512]   global_world_sizes=[1]\n",
-      "nemo.collections.llm.api.finetune/0 I1023 00:12:24.463000 140737350272832 torch/distributed/elastic/agent/server/api.py:512] \n",
-      "nemo.collections.llm.api.finetune/0 I1023 00:12:24.463000 140737350272832 torch/distributed/elastic/agent/server/api.py:654] [default] Starting worker group\n",
-      "nemo.collections.llm.api.finetune/0 I1023 00:12:24.463000 140737350272832 torch/distributed/elastic/agent/server/local_elastic_agent.py:184] Environment variable 'TORCHELASTIC_ENABLE_FILE_TIMER' not found. Do not start FileTimerServer.\n",
-      "nemo.collections.llm.api.finetune/0 I1023 00:12:24.463000 140737350272832 torch/distributed/elastic/agent/server/local_elastic_agent.py:216] Environment variable 'TORCHELASTIC_HEALTH_CHECK_PORT' not found. Do not start health check.\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:[NeMo W 2024-10-23 00:12:27 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py:128: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:      warnings.warn(\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:    \n",
-      "nemo.collections.llm.api.finetune/0 [default0]:[NeMo W 2024-10-23 00:12:28 nemo_logging:349] /opt/megatron-lm/megatron/core/tensor_parallel/layers.py:290: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:      def forward(ctx, input, weight, bias, allreduce_dgrad):\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:    \n",
-      "nemo.collections.llm.api.finetune/0 [default0]:[NeMo W 2024-10-23 00:12:28 nemo_logging:349] /opt/megatron-lm/megatron/core/tensor_parallel/layers.py:301: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:      def backward(ctx, grad_output):\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:    \n",
-      "nemo.collections.llm.api.finetune/0 [default0]:[NeMo W 2024-10-23 00:12:28 nemo_logging:349] /opt/megatron-lm/megatron/core/tensor_parallel/layers.py:393: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:      def forward(\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:    \n",
-      "nemo.collections.llm.api.finetune/0 [default0]:[NeMo W 2024-10-23 00:12:28 nemo_logging:349] /opt/megatron-lm/megatron/core/tensor_parallel/layers.py:433: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:      def backward(ctx, grad_output):\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:    \n",
-      "nemo.collections.llm.api.finetune/0 [default0]:[NeMo W 2024-10-23 00:12:28 nemo_logging:349] /opt/megatron-lm/megatron/core/dist_checkpointing/strategies/torch.py:17: DeprecationWarning: `torch.distributed._sharded_tensor` will be deprecated, use `torch.distributed._shard.sharded_tensor` instead\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:      from torch.distributed._sharded_tensor import ShardedTensor as TorchShardedTensor\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:    \n",
-      "nemo.collections.llm.api.finetune/0 [default0]:[NeMo W 2024-10-23 00:12:29 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/mamba_ssm/ops/selective_scan_interface.py:164: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:      def forward(ctx, xz, conv1d_weight, conv1d_bias, x_proj_weight, delta_proj_weight,\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:    \n",
-      "nemo.collections.llm.api.finetune/0 [default0]:[NeMo W 2024-10-23 00:12:29 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/mamba_ssm/ops/selective_scan_interface.py:240: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:      def backward(ctx, dout):\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:    \n",
-      "nemo.collections.llm.api.finetune/0 [default0]:[NeMo W 2024-10-23 00:12:29 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/mamba_ssm/ops/triton/layer_norm.py:986: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:      def forward(\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:    \n",
-      "nemo.collections.llm.api.finetune/0 [default0]:[NeMo W 2024-10-23 00:12:29 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/mamba_ssm/ops/triton/layer_norm.py:1045: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:      def backward(ctx, dout, *args):\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:    \n",
-      "nemo.collections.llm.api.finetune/0 [default0]:[NeMo W 2024-10-23 00:12:29 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/mamba_ssm/distributed/tensor_parallel.py:26: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:      def forward(ctx, x, weight, bias, process_group=None, sequence_parallel=True):\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:    \n",
-      "nemo.collections.llm.api.finetune/0 [default0]:[NeMo W 2024-10-23 00:12:29 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/mamba_ssm/distributed/tensor_parallel.py:62: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:      def backward(ctx, grad_output):\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:    \n",
-      "nemo.collections.llm.api.finetune/0 [default0]:[NeMo W 2024-10-23 00:12:29 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/mamba_ssm/ops/triton/ssd_combined.py:758: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:      def forward(ctx, zxbcdt, conv1d_weight, conv1d_bias, dt_bias, A, D, chunk_size, initial_states=None, seq_idx=None, dt_limit=(0.0, float(\"inf\")), return_final_states=False, activation=\"silu\",\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:    \n",
-      "nemo.collections.llm.api.finetune/0 [default0]:[NeMo W 2024-10-23 00:12:29 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/mamba_ssm/ops/triton/ssd_combined.py:836: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:      def backward(ctx, dout, *args):\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:    \n",
-      "nemo.collections.llm.api.finetune/0 [default0]:[NeMo I 2024-10-23 00:12:29 api:497] Disabling try_restore_best_ckpt restoration for adapters\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:[NeMo I 2024-10-23 00:12:29 nemo_logger:145] Experiments will be logged at results/nemo2_peft\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:GPU available: True (cuda), used: True\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:TPU available: False, using: 0 TPU cores\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:HPU available: False, using: 0 HPUs\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:[NeMo W 2024-10-23 00:12:29 nemo_logger:123] No version folders would be created under the log folder as 'resume_if_exists' is enabled.\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:[NeMo W 2024-10-23 00:12:29 nemo_logger:173] \"update_logger_directory\" is True. Overwriting tensorboard logger \"save_dir\" to results\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:[NeMo W 2024-10-23 00:12:29 nemo_logger:189] The Trainer already contains a ModelCheckpoint callback. This will be overwritten.\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:[NeMo I 2024-10-23 00:12:29 megatron_strategy:294] Fixing mis-match between ddp-config & mcore-optimizer config\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:[NeMo W 2024-10-23 00:12:30 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/modelopt/torch/quantization/tensor_quant.py:168: FutureWarning: `torch.library.impl_abstract` was renamed to `torch.library.register_fake`. Please use that instead; we will remove `torch.library.impl_abstract` in a future version of PyTorch.\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:      quantize_op_abstract = torch.library.impl_abstract(\"tensorrt::quantize_op\")(\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:    \n",
-      "nemo.collections.llm.api.finetune/0 [default0]:[NeMo W 2024-10-23 00:12:31 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/pyannote/core/notebook.py:134: MatplotlibDeprecationWarning: The get_cmap function was deprecated in Matplotlib 3.7 and will be removed in 3.11. Use ``matplotlib.colormaps[name]`` or ``matplotlib.colormaps.get_cmap()`` or ``pyplot.get_cmap()`` instead.\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:      cm = get_cmap(\"Set1\")\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:    \n",
-      "nemo.collections.llm.api.finetune/0 [default0]:[NeMo W 2024-10-23 00:12:32 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/nvidia/dali/_autograph/pyct/gast_util.py:79: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:      if get_gast_version() < LooseVersion(\"0.5\"):\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:    \n",
-      "nemo.collections.llm.api.finetune/0 [default0]:[NeMo W 2024-10-23 00:12:32 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/setuptools/_distutils/version.py:337: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:      other = LooseVersion(other)\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:    \n",
-      "nemo.collections.llm.api.finetune/0 [default0]:[NeMo I 2024-10-23 00:12:32 megatron_init:314] Rank 0 has data parallel group : [0]\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:[NeMo I 2024-10-23 00:12:32 megatron_init:320] Rank 0 has combined group of data parallel and context parallel : [0]\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:[NeMo I 2024-10-23 00:12:32 megatron_init:325] All data parallel group ranks with context parallel combined: [[0]]\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:[NeMo I 2024-10-23 00:12:32 megatron_init:328] Ranks 0 has data parallel rank: 0\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:[NeMo I 2024-10-23 00:12:32 megatron_init:336] Rank 0 has context parallel group: [0]\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:[NeMo I 2024-10-23 00:12:32 megatron_init:339] All context parallel group ranks: [[0]]\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:[NeMo I 2024-10-23 00:12:32 megatron_init:340] Ranks 0 has context parallel rank: 0\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:[NeMo I 2024-10-23 00:12:32 megatron_init:347] Rank 0 has model parallel group: [0]\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:[NeMo I 2024-10-23 00:12:32 megatron_init:348] All model parallel group ranks: [[0]]\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:[NeMo I 2024-10-23 00:12:32 megatron_init:357] Rank 0 has tensor model parallel group: [0]\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:[NeMo I 2024-10-23 00:12:32 megatron_init:361] All tensor model parallel group ranks: [[0]]\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:[NeMo I 2024-10-23 00:12:32 megatron_init:362] Rank 0 has tensor model parallel rank: 0\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:[NeMo I 2024-10-23 00:12:32 megatron_init:382] Rank 0 has pipeline model parallel group: [0]\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:[NeMo I 2024-10-23 00:12:32 megatron_init:394] Rank 0 has embedding group: [0]\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:[NeMo I 2024-10-23 00:12:32 megatron_init:400] All pipeline model parallel group ranks: [[0]]\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:[NeMo I 2024-10-23 00:12:32 megatron_init:401] Rank 0 has pipeline model parallel rank 0\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:[NeMo I 2024-10-23 00:12:32 megatron_init:402] All embedding group ranks: [[0]]\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:[NeMo I 2024-10-23 00:12:32 megatron_init:403] Rank 0 has embedding rank: 0\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/1\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:----------------------------------------------------------------------------------------------------\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:distributed_backend=nccl\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:All distributed processes registered. Starting with 1 processes\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:----------------------------------------------------------------------------------------------------\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:[NeMo W 2024-10-23 00:12:33 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/tensorboard/compat/proto/tensor_shape_pb2.py:18: DeprecationWarning: Call to deprecated create function FileDescriptor(). Note: Create unlinked descriptors is going to go away. Please use get/find descriptors from generated code or query the descriptor_pool.\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:      DESCRIPTOR = _descriptor.FileDescriptor(\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:    \n",
-      "nemo.collections.llm.api.finetune/0 [default0]:[NeMo W 2024-10-23 00:12:33 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/tensorboard/compat/proto/tensor_shape_pb2.py:36: DeprecationWarning: Call to deprecated create function FieldDescriptor(). Note: Create unlinked descriptors is going to go away. Please use get/find descriptors from generated code or query the descriptor_pool.\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:      _descriptor.FieldDescriptor(\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:    \n",
-      "nemo.collections.llm.api.finetune/0 [default0]:[NeMo W 2024-10-23 00:12:33 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/tensorboard/compat/proto/tensor_shape_pb2.py:29: DeprecationWarning: Call to deprecated create function Descriptor(). Note: Create unlinked descriptors is going to go away. Please use get/find descriptors from generated code or query the descriptor_pool.\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:      _TENSORSHAPEPROTO_DIM = _descriptor.Descriptor(\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:    \n",
-      "nemo.collections.llm.api.finetune/0 [default0]:[NeMo W 2024-10-23 00:12:33 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/tensorboard/compat/proto/types_pb2.py:19: DeprecationWarning: Call to deprecated create function FileDescriptor(). Note: Create unlinked descriptors is going to go away. Please use get/find descriptors from generated code or query the descriptor_pool.\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:      DESCRIPTOR = _descriptor.FileDescriptor(\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:    \n",
-      "nemo.collections.llm.api.finetune/0 [default0]:[NeMo W 2024-10-23 00:12:33 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/tensorboard/compat/proto/types_pb2.py:33: DeprecationWarning: Call to deprecated create function EnumValueDescriptor(). Note: Create unlinked descriptors is going to go away. Please use get/find descriptors from generated code or query the descriptor_pool.\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:      _descriptor.EnumValueDescriptor(\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:    \n",
-      "nemo.collections.llm.api.finetune/0 [default0]:[NeMo W 2024-10-23 00:12:33 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/tensorboard/compat/proto/types_pb2.py:27: DeprecationWarning: Call to deprecated create function EnumDescriptor(). Note: Create unlinked descriptors is going to go away. Please use get/find descriptors from generated code or query the descriptor_pool.\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:      _DATATYPE = _descriptor.EnumDescriptor(\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:    \n",
-      "nemo.collections.llm.api.finetune/0 [default0]:[NeMo W 2024-10-23 00:12:33 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/tensorboard/compat/proto/resource_handle_pb2.py:20: DeprecationWarning: Call to deprecated create function FileDescriptor(). Note: Create unlinked descriptors is going to go away. Please use get/find descriptors from generated code or query the descriptor_pool.\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:      DESCRIPTOR = _descriptor.FileDescriptor(\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:    \n",
-      "nemo.collections.llm.api.finetune/0 [default0]:[NeMo W 2024-10-23 00:12:33 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/tensorboard/compat/proto/resource_handle_pb2.py:39: DeprecationWarning: Call to deprecated create function FieldDescriptor(). Note: Create unlinked descriptors is going to go away. Please use get/find descriptors from generated code or query the descriptor_pool.\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:      _descriptor.FieldDescriptor(\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:    \n",
-      "nemo.collections.llm.api.finetune/0 [default0]:[NeMo W 2024-10-23 00:12:33 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/tensorboard/compat/proto/resource_handle_pb2.py:32: DeprecationWarning: Call to deprecated create function Descriptor(). Note: Create unlinked descriptors is going to go away. Please use get/find descriptors from generated code or query the descriptor_pool.\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:      _RESOURCEHANDLEPROTO_DTYPEANDSHAPE = _descriptor.Descriptor(\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:    \n",
-      "nemo.collections.llm.api.finetune/0 [default0]:[NeMo W 2024-10-23 00:12:33 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/tensorboard/compat/proto/tensor_pb2.py:21: DeprecationWarning: Call to deprecated create function FileDescriptor(). Note: Create unlinked descriptors is going to go away. Please use get/find descriptors from generated code or query the descriptor_pool.\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:      DESCRIPTOR = _descriptor.FileDescriptor(\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:    \n",
-      "nemo.collections.llm.api.finetune/0 [default0]:[NeMo W 2024-10-23 00:12:33 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/tensorboard/compat/proto/tensor_pb2.py:40: DeprecationWarning: Call to deprecated create function FieldDescriptor(). Note: Create unlinked descriptors is going to go away. Please use get/find descriptors from generated code or query the descriptor_pool.\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:      _descriptor.FieldDescriptor(\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:    \n",
-      "nemo.collections.llm.api.finetune/0 [default0]:[NeMo W 2024-10-23 00:12:33 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/tensorboard/compat/proto/tensor_pb2.py:33: DeprecationWarning: Call to deprecated create function Descriptor(). Note: Create unlinked descriptors is going to go away. Please use get/find descriptors from generated code or query the descriptor_pool.\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:      _TENSORPROTO = _descriptor.Descriptor(\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:    \n",
-      "nemo.collections.llm.api.finetune/0 [default0]:[NeMo W 2024-10-23 00:12:33 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/tensorboard/compat/proto/summary_pb2.py:20: DeprecationWarning: Call to deprecated create function FileDescriptor(). Note: Create unlinked descriptors is going to go away. Please use get/find descriptors from generated code or query the descriptor_pool.\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:      DESCRIPTOR = _descriptor.FileDescriptor(\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:    \n",
-      "nemo.collections.llm.api.finetune/0 [default0]:[NeMo W 2024-10-23 00:12:33 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/tensorboard/compat/proto/summary_pb2.py:35: DeprecationWarning: Call to deprecated create function EnumValueDescriptor(). Note: Create unlinked descriptors is going to go away. Please use get/find descriptors from generated code or query the descriptor_pool.\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:      _descriptor.EnumValueDescriptor(\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:    \n",
-      "nemo.collections.llm.api.finetune/0 [default0]:[NeMo W 2024-10-23 00:12:33 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/tensorboard/compat/proto/summary_pb2.py:29: DeprecationWarning: Call to deprecated create function EnumDescriptor(). Note: Create unlinked descriptors is going to go away. Please use get/find descriptors from generated code or query the descriptor_pool.\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:      _DATACLASS = _descriptor.EnumDescriptor(\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:    \n",
-      "nemo.collections.llm.api.finetune/0 [default0]:[NeMo W 2024-10-23 00:12:33 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/tensorboard/compat/proto/summary_pb2.py:74: DeprecationWarning: Call to deprecated create function FieldDescriptor(). Note: Create unlinked descriptors is going to go away. Please use get/find descriptors from generated code or query the descriptor_pool.\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:      _descriptor.FieldDescriptor(\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:    \n",
-      "nemo.collections.llm.api.finetune/0 [default0]:[NeMo W 2024-10-23 00:12:33 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/tensorboard/compat/proto/summary_pb2.py:67: DeprecationWarning: Call to deprecated create function Descriptor(). Note: Create unlinked descriptors is going to go away. Please use get/find descriptors from generated code or query the descriptor_pool.\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:      _SUMMARYDESCRIPTION = _descriptor.Descriptor(\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:    \n",
-      "nemo.collections.llm.api.finetune/0 [default0]:[NeMo I 2024-10-23 00:12:33 model_transform:66] Setting up ModelTransform for stage: fit\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:[NeMo I 2024-10-23 00:12:33 model_transform:69] Found model_transform attribute on pl_module\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:[NeMo I 2024-10-23 00:12:33 model_transform:72] Set model_transform to: <function _call_counter.<locals>.wrapper at 0x7ff7325936d0>\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:[NeMo I 2024-10-23 00:12:33 base:44] Padded vocab_size: 128256, original vocab_size: 128256, dummy tokens: 0.\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:[NeMo W 2024-10-23 00:12:33 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/tensorboard/compat/tensorflow_stub/dtypes.py:326: DeprecationWarning: `np.bool8` is a deprecated alias for `np.bool_`.  (Deprecated NumPy 1.24)\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:      np.bool8: (False, True),\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:    \n",
-      "nemo.collections.llm.api.finetune/0 [default0]:[NeMo W 2024-10-23 00:12:33 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/pytorch_lightning/callbacks/model_checkpoint.py:654: Checkpoint directory results/nemo2_peft/checkpoints exists and is not empty.\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:    \n",
-      "nemo.collections.llm.api.finetune/0 [default0]:[NeMo I 2024-10-23 00:12:33 num_microbatches_calculator:218] setting number of microbatches to constant 8\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7]\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:  | Name   | Type     | Params | Mode \n",
-      "nemo.collections.llm.api.finetune/0 [default0]:--------------------------------------------\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:0 | module | GPTModel | 8.0 B  | train\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:--------------------------------------------\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:8.0 B     Trainable params\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:0         Non-trainable params\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:8.0 B     Total params\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:32,121.045Total estimated model params size (MB)\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:649       Modules in train mode\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:0         Modules in eval mode\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:Restoring states from the checkpoint path at results/nemo2_peft/checkpoints/nemo2_peft--reduced_train_loss=0.3957-epoch=0-last/weights\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:[NeMo W 2024-10-23 00:12:33 nemo_logging:349] /opt/megatron-lm/megatron/core/dist_checkpointing/strategies/torch.py:755: FutureWarning: `load_state_dict` is deprecated and will be removed in future versions. Please use `load` instead.\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:      checkpoint.load_state_dict(\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:    \n",
-      "nemo.collections.llm.api.finetune/0 [default0]:[NeMo W 2024-10-23 00:12:33 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/torch/distributed/checkpoint/planner_helpers.py:311: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor.\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:      device = getattr(value, \"device\", None)\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:    \n",
-      "nemo.collections.llm.api.finetune/0 [default0]:[NeMo I 2024-10-23 00:12:46 text_memmap_dataset:116] Building data files\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:[NeMo I 2024-10-23 00:12:46 text_memmap_dataset:528] Processing 1 data files using 1 workers\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:Restored all states from the checkpoint at results/nemo2_peft/checkpoints/nemo2_peft--reduced_train_loss=0.3957-epoch=0-last/weights\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:To disable this warning, you can either:\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:\t- Avoid using `tokenizers` before the fork if possible\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:\t- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:[NeMo I 2024-10-23 00:12:46 text_memmap_dataset:543] Time building 0 / 1 mem-mapped files: 0:00:00.074659\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:[rank: 0] Received SIGTERM: 15\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:[NeMo I 2024-10-23 00:12:46 text_memmap_dataset:528] Processing 1 data files using 1 workers\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:[NeMo I 2024-10-23 00:12:46 text_memmap_dataset:543] Time building 0 / 1 mem-mapped files: 0:00:00.064278\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:[NeMo I 2024-10-23 00:12:46 text_memmap_dataset:158] Loading data files\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:[NeMo I 2024-10-23 00:12:46 text_memmap_dataset:249] Loading /root/.cache/nemo/datasets/squad/training.jsonl\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:[NeMo I 2024-10-23 00:12:46 text_memmap_dataset:161] Time loading 1 mem-mapped files: 0:00:00.000478\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:[NeMo I 2024-10-23 00:12:46 text_memmap_dataset:165] Computing global indices\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:To disable this warning, you can either:\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:\t- Avoid using `tokenizers` before the fork if possible\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:\t- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:[rank: 0] Received SIGTERM: 15\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:[NeMo W 2024-10-23 00:12:46 nemo_logging:349] /lustre/fsw/coreai_dlalgo_llm/huiyingl/nemo2sftpeft/NeMo-sftpeft/nemo/collections/nlp/data/language_modeling/megatron/dataset_utils.py:1332: UserWarning: The torch.cuda.*DtypeTensor constructors are no longer recommended. It's best to use methods such as torch.tensor(data, dtype=*, device='cuda') to create tensors. (Triggered internally at /opt/pytorch/pytorch/torch/csrc/tensor/python_tensor.cpp:78.)\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:      counts = torch.cuda.LongTensor([1])\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:    \n",
-      "nemo.collections.llm.api.finetune/0 [default0]:[NeMo W 2024-10-23 00:12:46 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/connectors/data_connector.py:424: The 'train_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=223` in the `DataLoader` to improve performance.\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:    \n",
-      "nemo.collections.llm.api.finetune/0 [default0]:`Trainer.fit` stopped: `max_steps=40` reached.\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:[NeMo W 2024-10-23 00:12:55 nemo_logging:349] /usr/lib/python3.10/tempfile.py:999: ResourceWarning: Implicitly cleaning up <TemporaryDirectory '/tmp/tmpfnw3sxc9'>\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:      _warnings.warn(warn_message, ResourceWarning)\n",
-      "nemo.collections.llm.api.finetune/0 [default0]:    \n",
-      "nemo.collections.llm.api.finetune/0 I1023 00:12:57.607000 140737350272832 torch/distributed/elastic/agent/server/api.py:844] [default] worker group successfully finished. Waiting 300 seconds for other agents to finish.\n",
-      "nemo.collections.llm.api.finetune/0 I1023 00:12:57.607000 140737350272832 torch/distributed/elastic/agent/server/api.py:889] Local worker group finished (WorkerState.SUCCEEDED). Waiting 300 seconds for other agents to finish\n",
-      "nemo.collections.llm.api.finetune/0 I1023 00:12:57.607000 140737350272832 torch/distributed/elastic/agent/server/api.py:902] Done waiting for other agents. Elapsed: 0.0003485679626464844 seconds\n"
+      "i.finetune/0 I1115 09:45:01.671000 140737350272832 torch/distributed/launcher/api.py:188] Starting elastic_operator with launch configs:\n",
+      "i.finetune/0 I1115 09:45:01.671000 140737350272832 torch/distributed/launcher/api.py:188]   entrypoint       : nemo_run.core.runners.fdl_runner\n",
+      "i.finetune/0 I1115 09:45:01.671000 140737350272832 torch/distributed/launcher/api.py:188]   min_nodes        : 1\n",
+      "i.finetune/0 I1115 09:45:01.671000 140737350272832 torch/distributed/launcher/api.py:188]   max_nodes        : 1\n",
+      "i.finetune/0 I1115 09:45:01.671000 140737350272832 torch/distributed/launcher/api.py:188]   nproc_per_node   : 1\n",
+      "i.finetune/0 I1115 09:45:01.671000 140737350272832 torch/distributed/launcher/api.py:188]   run_id           : 658\n",
+      "i.finetune/0 I1115 09:45:01.671000 140737350272832 torch/distributed/launcher/api.py:188]   rdzv_backend     : c10d\n",
+      "i.finetune/0 I1115 09:45:01.671000 140737350272832 torch/distributed/launcher/api.py:188]   rdzv_endpoint    : localhost:0\n",
+      "i.finetune/0 I1115 09:45:01.671000 140737350272832 torch/distributed/launcher/api.py:188]   rdzv_configs     : {'timeout': 900}\n",
+      "i.finetune/0 I1115 09:45:01.671000 140737350272832 torch/distributed/launcher/api.py:188]   max_restarts     : 0\n",
+      "i.finetune/0 I1115 09:45:01.671000 140737350272832 torch/distributed/launcher/api.py:188]   monitor_interval : 0.1\n",
+      "i.finetune/0 I1115 09:45:01.671000 140737350272832 torch/distributed/launcher/api.py:188]   log_dir          : /root/.nemo_run/experiments/nemo.collections.llm.api.finetune/nemo.collections.llm.api.finetune_1731692700/nemo.collections.llm.api.finetune/nemo_run/nemo.collections.llm.api.finetune-wdj265kcplhnkd/torchelastic/nemo.collections.llm.api.finetune\n",
+      "i.finetune/0 I1115 09:45:01.671000 140737350272832 torch/distributed/launcher/api.py:188]   metrics_cfg      : {}\n",
+      "i.finetune/0 I1115 09:45:01.671000 140737350272832 torch/distributed/launcher/api.py:188] \n",
+      "i.finetune/0 I1115 09:45:01.673000 140737350272832 torch/distributed/elastic/agent/server/api.py:825] [default] starting workers for entrypoint: python\n",
+      "i.finetune/0 I1115 09:45:01.673000 140737350272832 torch/distributed/elastic/agent/server/api.py:646] [default] Rendezvous'ing worker group\n",
+      "i.finetune/0 I1115 09:45:01.732000 140737350272832 torch/distributed/elastic/agent/server/api.py:512] [default] Rendezvous complete for workers. Result:\n",
+      "i.finetune/0 I1115 09:45:01.732000 140737350272832 torch/distributed/elastic/agent/server/api.py:512]   restart_count=0\n",
+      "i.finetune/0 I1115 09:45:01.732000 140737350272832 torch/distributed/elastic/agent/server/api.py:512]   master_addr=eos0346.eos.clusters.nvidia.com\n",
+      "i.finetune/0 I1115 09:45:01.732000 140737350272832 torch/distributed/elastic/agent/server/api.py:512]   master_port=50753\n",
+      "i.finetune/0 I1115 09:45:01.732000 140737350272832 torch/distributed/elastic/agent/server/api.py:512]   group_rank=0\n",
+      "i.finetune/0 I1115 09:45:01.732000 140737350272832 torch/distributed/elastic/agent/server/api.py:512]   group_world_size=1\n",
+      "i.finetune/0 I1115 09:45:01.732000 140737350272832 torch/distributed/elastic/agent/server/api.py:512]   local_ranks=[0]\n",
+      "i.finetune/0 I1115 09:45:01.732000 140737350272832 torch/distributed/elastic/agent/server/api.py:512]   role_ranks=[0]\n",
+      "i.finetune/0 I1115 09:45:01.732000 140737350272832 torch/distributed/elastic/agent/server/api.py:512]   global_ranks=[0]\n",
+      "i.finetune/0 I1115 09:45:01.732000 140737350272832 torch/distributed/elastic/agent/server/api.py:512]   role_world_sizes=[1]\n",
+      "i.finetune/0 I1115 09:45:01.732000 140737350272832 torch/distributed/elastic/agent/server/api.py:512]   global_world_sizes=[1]\n",
+      "i.finetune/0 I1115 09:45:01.732000 140737350272832 torch/distributed/elastic/agent/server/api.py:512] \n",
+      "i.finetune/0 I1115 09:45:01.732000 140737350272832 torch/distributed/elastic/agent/server/api.py:654] [default] Starting worker group\n",
+      "i.finetune/0 I1115 09:45:01.732000 140737350272832 torch/distributed/elastic/agent/server/local_elastic_agent.py:184] Environment variable 'TORCHELASTIC_ENABLE_FILE_TIMER' not found. Do not start FileTimerServer.\n",
+      "i.finetune/0 I1115 09:45:01.732000 140737350272832 torch/distributed/elastic/agent/server/local_elastic_agent.py:216] Environment variable 'TORCHELASTIC_HEALTH_CHECK_PORT' not found. Do not start health check.\n",
+      "i.finetune/0 [default0]:[NeMo W 2024-11-15 09:45:08 nemo_logging:361] /usr/local/lib/python3.10/dist-packages/pyannote/core/notebook.py:134: MatplotlibDeprecationWarning: The get_cmap function was deprecated in Matplotlib 3.7 and will be removed in 3.11. Use ``matplotlib.colormaps[name]`` or ``matplotlib.colormaps.get_cmap()`` or ``pyplot.get_cmap()`` instead.\n",
+      "i.finetune/0 [default0]:      cm = get_cmap(\"Set1\")\n",
+      "i.finetune/0 [default0]:    \n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:09 api:734] Disabling try_restore_best_ckpt restoration for adapters\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:09 nemo_logger:145] Experiments will be logged at results/nemo2_peft\n",
+      "i.finetune/0 [default0]:GPU available: True (cuda), used: True\n",
+      "i.finetune/0 [default0]:TPU available: False, using: 0 TPU cores\n",
+      "i.finetune/0 [default0]:HPU available: False, using: 0 HPUs\n",
+      "i.finetune/0 [default0]:[NeMo W 2024-11-15 09:45:09 nemo_logger:123] No version folders would be created under the log folder as 'resume_if_exists' is enabled.\n",
+      "i.finetune/0 [default0]:[NeMo W 2024-11-15 09:45:09 nemo_logger:173] \"update_logger_directory\" is True. Overwriting tensorboard logger \"save_dir\" to results\n",
+      "i.finetune/0 [default0]:[NeMo W 2024-11-15 09:45:09 nemo_logger:189] The Trainer already contains a ModelCheckpoint callback. This will be overwritten.\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:09 megatron_strategy:310] Fixing mis-match between ddp-config & mcore-optimizer config\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:10 megatron_init:396] Rank 0 has data parallel group : [0]\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:10 megatron_init:402] Rank 0 has combined group of data parallel and context parallel : [0]\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:10 megatron_init:407] All data parallel group ranks with context parallel combined: [[0]]\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:10 megatron_init:410] Ranks 0 has data parallel rank: 0\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:10 megatron_init:418] Rank 0 has context parallel group: [0]\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:10 megatron_init:421] All context parallel group ranks: [[0]]\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:10 megatron_init:422] Ranks 0 has context parallel rank: 0\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:10 megatron_init:429] Rank 0 has model parallel group: [0]\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:10 megatron_init:430] All model parallel group ranks: [[0]]\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:10 megatron_init:439] Rank 0 has tensor model parallel group: [0]\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:10 megatron_init:443] All tensor model parallel group ranks: [[0]]\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:10 megatron_init:444] Rank 0 has tensor model parallel rank: 0\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:10 megatron_init:464] Rank 0 has pipeline model parallel group: [0]\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:10 megatron_init:476] Rank 0 has embedding group: [0]\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:10 megatron_init:482] All pipeline model parallel group ranks: [[0]]\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:10 megatron_init:483] Rank 0 has pipeline model parallel rank 0\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:10 megatron_init:484] All embedding group ranks: [[0]]\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:10 megatron_init:485] Rank 0 has embedding rank: 0\n",
+      "i.finetune/0 [default0]:Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/1\n",
+      "i.finetune/0 [default0]:----------------------------------------------------------------------------------------------------\n",
+      "i.finetune/0 [default0]:distributed_backend=nccl\n",
+      "i.finetune/0 [default0]:All distributed processes registered. Starting with 1 processes\n",
+      "i.finetune/0 [default0]:----------------------------------------------------------------------------------------------------\n",
+      "i.finetune/0 [default0]:\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:10 squad:87] Downloading SquadDataModule...\n",
+      "i.finetune/0 [default0]:\n",
+      "i.finetune/0 [default0]:Generating train split:   0%|          | 0/87599 [00:00<?, ? examples/s]i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:14 squad:106] Preprocessing SquadDataModule to jsonl format and splitting...\n",
+      "i.finetune/0 [default0]:\n",
+      "i.finetune/0 [default0]:Generating train split: 100%|██████████| 87599/87599 [00:00<00:00, 917065.61 examples/s]\n",
+      "i.finetune/0 [default0]:\n",
+      "i.finetune/0 [default0]:Generating validation split:   0%|          | 0/10570 [00:00<?, ? examples/s]\n",
+      "i.finetune/0 [default0]:Generating validation split: 100%|██████████| 10570/10570 [00:00<00:00, 854742.68 examples/s]\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:18 squad:137] training split saved to /root/.cache/nemo/datasets/squad/training.jsonl\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:19 squad:137] validation split saved to /root/.cache/nemo/datasets/squad/validation.jsonl\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:19 squad:137] test split saved to /root/.cache/nemo/datasets/squad/test.jsonl\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:20 model_transform:66] Setting up ModelTransform for stage: fit\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:20 model_transform:69] Found model_transform attribute on pl_module\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:20 model_transform:72] Set model_transform to: <function _call_counter.<locals>.wrapper at 0x7ff9ec5b0a60>\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:20 base:44] Padded vocab_size: 128256, original vocab_size: 128256, dummy tokens: 0.\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:20 num_microbatches_calculator:228] setting number of microbatches to constant 8\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:20 megatron_strategy:745] Doing selective restore from RestoreConfig(path='/root/.cache/nemo/models/meta-llama/Meta-Llama-3-8B', adapter_path=None, load_model_state=True, load_optim_state=False, load_artifacts=True)\n",
+      "i.finetune/0 [default0]:LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7]\n",
+      "i.finetune/0 [default0]:[NeMo W 2024-11-15 09:45:20 megatron_strategy:324] Could not copy Trainer's 'max_steps' to LR scheduler's 'max_steps'. If you are not using an LR scheduler, this warning can safely be ignored.\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:34 megatron_strategy:750] Restoring model weights from RestoreConfig(path='/root/.cache/nemo/models/meta-llama/Meta-Llama-3-8B', adapter_path=None, load_model_state=True, load_optim_state=False, load_artifacts=True)\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:34 megatron_strategy:757] Finished restoring from RestoreConfig(path='/root/.cache/nemo/models/meta-llama/Meta-Llama-3-8B', adapter_path=None, load_model_state=True, load_optim_state=False, load_artifacts=True), cleaning up.\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:34 text_memmap_dataset:116] Building data files\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:34 text_memmap_dataset:528] Processing 1 data files using 1 workers\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:34 text_memmap_dataset:494] Building indexing for fn = /root/.cache/nemo/datasets/squad/training.jsonl\n",
+      "i.finetune/0 [default0]:\n",
+      "i.finetune/0 [default0]:  | Name   | Type     | Params | Mode \n",
+      "i.finetune/0 [default0]:--------------------------------------------\n",
+      "i.finetune/0 [default0]:0 | module | GPTModel | 8.0 B  | train\n",
+      "i.finetune/0 [default0]:--------------------------------------------\n",
+      "i.finetune/0 [default0]:8.0 B     Trainable params\n",
+      "i.finetune/0 [default0]:0         Non-trainable params\n",
+      "i.finetune/0 [default0]:8.0 B     Total params\n",
+      "i.finetune/0 [default0]:32,121.045Total estimated model params size (MB)\n",
+      "i.finetune/0 [default0]:649       Modules in train mode\n",
+      "i.finetune/0 [default0]:0         Modules in eval mode\n",
+      "i.finetune/0 [default0]:huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...\n",
+      "i.finetune/0 [default0]:To disable this warning, you can either:\n",
+      "i.finetune/0 [default0]:\t- Avoid using `tokenizers` before the fork if possible\n",
+      "i.finetune/0 [default0]:\t- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:34 text_memmap_dataset:506] Saving idx file = /root/.cache/nemo/datasets/squad/training.jsonl.idx.npy\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:34 text_memmap_dataset:508] Saving metadata file = /root/.cache/nemo/datasets/squad/training.jsonl.idx.info\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:34 text_memmap_dataset:543] Time building 1 / 1 mem-mapped files: 0:00:00.133476\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:34 text_memmap_dataset:528] Processing 1 data files using 1 workers\n",
+      "i.finetune/0 [default0]:huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...\n",
+      "i.finetune/0 [default0]:To disable this warning, you can either:\n",
+      "i.finetune/0 [default0]:\t- Avoid using `tokenizers` before the fork if possible\n",
+      "i.finetune/0 [default0]:\t- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)\n",
+      "i.finetune/0 [default0]:[rank: 0] Received SIGTERM: 15\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:34 text_memmap_dataset:543] Time building 0 / 1 mem-mapped files: 0:00:00.080904\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:34 text_memmap_dataset:158] Loading data files\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:34 text_memmap_dataset:249] Loading /root/.cache/nemo/datasets/squad/training.jsonl\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:34 text_memmap_dataset:161] Time loading 1 mem-mapped files: 0:00:00.000532\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:34 text_memmap_dataset:165] Computing global indices\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:34 text_memmap_dataset:116] Building data files\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:34 text_memmap_dataset:528] Processing 1 data files using 1 workers\n",
+      "i.finetune/0 [default0]:[NeMo W 2024-11-15 09:45:34 nemo_logging:361] /usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/connectors/data_connector.py:424: The 'train_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=223` in the `DataLoader` to improve performance.\n",
+      "i.finetune/0 [default0]:    \n",
+      "i.finetune/0 [default0]:huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...\n",
+      "i.finetune/0 [default0]:To disable this warning, you can either:\n",
+      "i.finetune/0 [default0]:\t- Avoid using `tokenizers` before the fork if possible\n",
+      "i.finetune/0 [default0]:\t- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:34 text_memmap_dataset:494] Building indexing for fn = /root/.cache/nemo/datasets/squad/validation.jsonl\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:34 text_memmap_dataset:506] Saving idx file = /root/.cache/nemo/datasets/squad/validation.jsonl.idx.npy\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:34 text_memmap_dataset:508] Saving metadata file = /root/.cache/nemo/datasets/squad/validation.jsonl.idx.info\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 text_memmap_dataset:543] Time building 1 / 1 mem-mapped files: 0:00:00.088056\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 text_memmap_dataset:528] Processing 1 data files using 1 workers\n",
+      "i.finetune/0 [default0]:[rank: 0] Received SIGTERM: 15\n",
+      "i.finetune/0 [default0]:huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...\n",
+      "i.finetune/0 [default0]:To disable this warning, you can either:\n",
+      "i.finetune/0 [default0]:\t- Avoid using `tokenizers` before the fork if possible\n",
+      "i.finetune/0 [default0]:\t- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)\n",
+      "i.finetune/0 [default0]:[rank: 0] Received SIGTERM: 15\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 text_memmap_dataset:543] Time building 0 / 1 mem-mapped files: 0:00:00.082867\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 text_memmap_dataset:158] Loading data files\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 text_memmap_dataset:249] Loading /root/.cache/nemo/datasets/squad/validation.jsonl\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 text_memmap_dataset:161] Time loading 1 mem-mapped files: 0:00:00.000437\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 text_memmap_dataset:165] Computing global indices\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.0.self_attention.linear_proj\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.0.self_attention.linear_qkv\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.0.mlp.linear_fc1\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.0.mlp.linear_fc2\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.1.self_attention.linear_proj\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.1.self_attention.linear_qkv\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.1.mlp.linear_fc1\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.1.mlp.linear_fc2\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.2.self_attention.linear_proj\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.2.self_attention.linear_qkv\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.2.mlp.linear_fc1\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.2.mlp.linear_fc2\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.3.self_attention.linear_proj\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.3.self_attention.linear_qkv\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.3.mlp.linear_fc1\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.3.mlp.linear_fc2\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.4.self_attention.linear_proj\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.4.self_attention.linear_qkv\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.4.mlp.linear_fc1\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.4.mlp.linear_fc2\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.5.self_attention.linear_proj\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.5.self_attention.linear_qkv\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.5.mlp.linear_fc1\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.5.mlp.linear_fc2\n",
+      "i.finetune/0 [default0]:[NeMo W 2024-11-15 09:45:35 nemo_logging:361] /usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/connectors/data_connector.py:424: The 'val_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=223` in the `DataLoader` to improve performance.\n",
+      "i.finetune/0 [default0]:    \n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.6.self_attention.linear_proj\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.6.self_attention.linear_qkv\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.6.mlp.linear_fc1\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.6.mlp.linear_fc2\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.7.self_attention.linear_proj\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.7.self_attention.linear_qkv\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.7.mlp.linear_fc1\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.7.mlp.linear_fc2\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.8.self_attention.linear_proj\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.8.self_attention.linear_qkv\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.8.mlp.linear_fc1\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.8.mlp.linear_fc2\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.9.self_attention.linear_proj\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.9.self_attention.linear_qkv\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.9.mlp.linear_fc1\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.9.mlp.linear_fc2\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.10.self_attention.linear_proj\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.10.self_attention.linear_qkv\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.10.mlp.linear_fc1\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.10.mlp.linear_fc2\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.11.self_attention.linear_proj\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.11.self_attention.linear_qkv\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.11.mlp.linear_fc1\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.11.mlp.linear_fc2\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.12.self_attention.linear_proj\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.12.self_attention.linear_qkv\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.12.mlp.linear_fc1\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.12.mlp.linear_fc2\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.13.self_attention.linear_proj\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.13.self_attention.linear_qkv\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.13.mlp.linear_fc1\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.13.mlp.linear_fc2\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.14.self_attention.linear_proj\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.14.self_attention.linear_qkv\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.14.mlp.linear_fc1\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.14.mlp.linear_fc2\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.15.self_attention.linear_proj\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.15.self_attention.linear_qkv\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.15.mlp.linear_fc1\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.15.mlp.linear_fc2\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.16.self_attention.linear_proj\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.16.self_attention.linear_qkv\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.16.mlp.linear_fc1\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.16.mlp.linear_fc2\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.17.self_attention.linear_proj\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.17.self_attention.linear_qkv\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.17.mlp.linear_fc1\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.17.mlp.linear_fc2\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.18.self_attention.linear_proj\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.18.self_attention.linear_qkv\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.18.mlp.linear_fc1\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.18.mlp.linear_fc2\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.19.self_attention.linear_proj\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.19.self_attention.linear_qkv\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.19.mlp.linear_fc1\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.19.mlp.linear_fc2\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.20.self_attention.linear_proj\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.20.self_attention.linear_qkv\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.20.mlp.linear_fc1\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.20.mlp.linear_fc2\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.21.self_attention.linear_proj\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.21.self_attention.linear_qkv\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.21.mlp.linear_fc1\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.21.mlp.linear_fc2\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.22.self_attention.linear_proj\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.22.self_attention.linear_qkv\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.22.mlp.linear_fc1\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.22.mlp.linear_fc2\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.23.self_attention.linear_proj\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.23.self_attention.linear_qkv\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.23.mlp.linear_fc1\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.23.mlp.linear_fc2\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.24.self_attention.linear_proj\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.24.self_attention.linear_qkv\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.24.mlp.linear_fc1\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.24.mlp.linear_fc2\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.25.self_attention.linear_proj\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.25.self_attention.linear_qkv\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.25.mlp.linear_fc1\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.25.mlp.linear_fc2\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.26.self_attention.linear_proj\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.26.self_attention.linear_qkv\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.26.mlp.linear_fc1\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.26.mlp.linear_fc2\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.27.self_attention.linear_proj\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.27.self_attention.linear_qkv\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.27.mlp.linear_fc1\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.27.mlp.linear_fc2\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.28.self_attention.linear_proj\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.28.self_attention.linear_qkv\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.28.mlp.linear_fc1\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.28.mlp.linear_fc2\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.29.self_attention.linear_proj\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.29.self_attention.linear_qkv\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.29.mlp.linear_fc1\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.29.mlp.linear_fc2\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.30.self_attention.linear_proj\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.30.self_attention.linear_qkv\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.30.mlp.linear_fc1\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.30.mlp.linear_fc2\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.31.self_attention.linear_proj\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.31.self_attention.linear_qkv\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.31.mlp.linear_fc1\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.31.mlp.linear_fc2\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 model_transform:90] After applying model_transform:\n",
+      "i.finetune/0 [default0]:      | Name   | Type     | Params | Mode \n",
+      "i.finetune/0 [default0]:    --------------------------------------------\n",
+      "i.finetune/0 [default0]:    0 | module | GPTModel | 8.1 B  | train\n",
+      "i.finetune/0 [default0]:    --------------------------------------------\n",
+      "i.finetune/0 [default0]:    71.3 M    Trainable params\n",
+      "i.finetune/0 [default0]:    8.0 B     Non-trainable params\n",
+      "i.finetune/0 [default0]:    8.1 B     Total params\n",
+      "i.finetune/0 [default0]:    32,406.258Total estimated model params size (MB)\n",
+      "i.finetune/0 [default0]:    1289      Modules in train mode\n",
+      "i.finetune/0 [default0]:    0         Modules in eval mode\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 peft:177] Initializing model parallel\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 megatron_parallel:550]  > number of parameters on (tensor, pipeline) model parallel rank (0, 0): 8101564416\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 megatron_parallel:553]  > number of trainable parameters: 71303168 (0.88% of total)\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 utils:278] Setting up DistributedDataParallel with config DistributedDataParallelConfig(grad_reduce_in_fp32=True, overlap_grad_reduce=False, overlap_param_gather=False, align_param_gather=False, use_distributed_optimizer=True, check_for_nan_in_grad=True, bucket_size=None, average_in_collective=False, fp8_param_gather=False)\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 utils:299] Number of buckets for gradient all-reduce / reduce-scatter: 1\n",
+      "i.finetune/0 [default0]:    Params for bucket 1 (71303168 elements):\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.29.mlp.linear_fc1.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.22.mlp.linear_fc1.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.7.mlp.linear_fc1.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.0.mlp.linear_fc1.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.0.self_attention.linear_qkv.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.31.mlp.linear_fc2.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.28.mlp.linear_fc1.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.27.self_attention.linear_proj.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.24.mlp.linear_fc2.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.21.mlp.linear_fc1.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.20.self_attention.linear_proj.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.16.mlp.linear_fc2.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.14.mlp.linear_fc1.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.13.self_attention.linear_proj.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.9.mlp.linear_fc2.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.1.mlp.linear_fc1.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.3.self_attention.linear_proj.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.30.mlp.linear_fc2.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.29.self_attention.linear_qkv.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.23.mlp.linear_fc2.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.22.self_attention.linear_qkv.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.19.self_attention.linear_proj.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.15.mlp.linear_fc2.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.15.self_attention.linear_qkv.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.12.self_attention.linear_proj.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.8.mlp.linear_fc2.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.7.self_attention.linear_qkv.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.28.self_attention.linear_qkv.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.21.self_attention.linear_qkv.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.14.self_attention.linear_qkv.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.6.self_attention.linear_qkv.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.0.self_attention.linear_qkv.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.2.self_attention.linear_proj.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.30.mlp.linear_fc1.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.23.mlp.linear_fc1.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.15.mlp.linear_fc1.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.8.mlp.linear_fc1.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.1.mlp.linear_fc1.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.29.mlp.linear_fc1.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.28.self_attention.linear_proj.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.22.mlp.linear_fc1.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.21.self_attention.linear_proj.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.17.mlp.linear_fc2.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.14.self_attention.linear_proj.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.10.mlp.linear_fc2.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.7.mlp.linear_fc1.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.6.self_attention.linear_proj.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.3.mlp.linear_fc2.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.0.self_attention.linear_proj.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.1.mlp.linear_fc2.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.24.mlp.linear_fc2.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.31.mlp.linear_fc2.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.30.self_attention.linear_qkv.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.27.self_attention.linear_proj.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.23.self_attention.linear_qkv.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.20.self_attention.linear_proj.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.16.mlp.linear_fc2.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.13.self_attention.linear_proj.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.9.mlp.linear_fc2.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.8.self_attention.linear_qkv.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.29.self_attention.linear_qkv.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.22.self_attention.linear_qkv.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.15.self_attention.linear_qkv.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.7.self_attention.linear_qkv.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.31.mlp.linear_fc1.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.24.mlp.linear_fc1.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.16.mlp.linear_fc1.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.9.mlp.linear_fc1.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.2.mlp.linear_fc1.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.0.mlp.linear_fc2.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.2.self_attention.linear_qkv.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.30.mlp.linear_fc1.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.29.self_attention.linear_proj.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.25.mlp.linear_fc2.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.23.mlp.linear_fc1.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.22.self_attention.linear_proj.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.18.mlp.linear_fc2.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.15.self_attention.linear_proj.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.11.mlp.linear_fc2.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.8.mlp.linear_fc1.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.7.self_attention.linear_proj.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.6.mlp.linear_fc1.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.5.self_attention.linear_proj.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.0.self_attention.linear_proj.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.31.self_attention.linear_qkv.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.28.self_attention.linear_proj.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.24.self_attention.linear_qkv.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.21.self_attention.linear_proj.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.17.mlp.linear_fc2.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.16.self_attention.linear_qkv.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.14.self_attention.linear_proj.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.10.mlp.linear_fc2.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.9.self_attention.linear_qkv.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.6.self_attention.linear_proj.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.2.self_attention.linear_proj.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.30.self_attention.linear_qkv.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.23.self_attention.linear_qkv.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.8.self_attention.linear_qkv.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.1.self_attention.linear_qkv.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.4.mlp.linear_fc2.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.25.mlp.linear_fc1.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.17.mlp.linear_fc1.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.16.self_attention.linear_proj.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.15.mlp.linear_fc1.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.10.mlp.linear_fc1.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.3.mlp.linear_fc1.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.23.self_attention.linear_proj.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.31.mlp.linear_fc1.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.30.self_attention.linear_proj.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.26.mlp.linear_fc2.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.24.mlp.linear_fc1.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.19.mlp.linear_fc2.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.16.mlp.linear_fc1.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.12.mlp.linear_fc2.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.9.mlp.linear_fc1.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.8.self_attention.linear_proj.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.3.mlp.linear_fc2.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.5.self_attention.linear_qkv.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.25.self_attention.linear_qkv.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.29.self_attention.linear_proj.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.22.self_attention.linear_proj.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.18.mlp.linear_fc2.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.17.self_attention.linear_qkv.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.15.self_attention.linear_proj.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.11.mlp.linear_fc2.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.10.self_attention.linear_qkv.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.7.self_attention.linear_proj.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.4.mlp.linear_fc2.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.31.self_attention.linear_qkv.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.24.self_attention.linear_qkv.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.16.self_attention.linear_qkv.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.9.self_attention.linear_qkv.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.2.self_attention.linear_qkv.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.3.mlp.linear_fc1.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.5.mlp.linear_fc2.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.5.self_attention.linear_proj.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.18.mlp.linear_fc1.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.11.mlp.linear_fc1.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.4.mlp.linear_fc1.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.31.self_attention.linear_proj.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.27.mlp.linear_fc2.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.25.mlp.linear_fc1.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.24.self_attention.linear_proj.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.20.mlp.linear_fc2.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.17.mlp.linear_fc1.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.16.self_attention.linear_proj.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.13.mlp.linear_fc2.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.10.mlp.linear_fc1.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.9.self_attention.linear_proj.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.1.self_attention.linear_qkv.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.5.mlp.linear_fc1.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.23.self_attention.linear_proj.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.30.self_attention.linear_proj.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.26.mlp.linear_fc2.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.19.mlp.linear_fc2.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.18.self_attention.linear_qkv.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.12.mlp.linear_fc2.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.11.self_attention.linear_qkv.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.8.self_attention.linear_proj.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.4.self_attention.linear_qkv.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.1.self_attention.linear_proj.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.25.self_attention.linear_qkv.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.17.self_attention.linear_qkv.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.10.self_attention.linear_qkv.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.3.self_attention.linear_qkv.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.1.self_attention.linear_proj.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.26.mlp.linear_fc1.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.19.mlp.linear_fc1.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.12.mlp.linear_fc1.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.28.mlp.linear_fc2.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.25.self_attention.linear_proj.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.21.mlp.linear_fc2.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.18.mlp.linear_fc1.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.17.self_attention.linear_proj.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.14.mlp.linear_fc2.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.11.mlp.linear_fc1.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.10.self_attention.linear_proj.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.6.mlp.linear_fc2.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.4.mlp.linear_fc1.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.24.self_attention.linear_proj.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.31.self_attention.linear_proj.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.27.mlp.linear_fc2.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.26.self_attention.linear_qkv.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.26.self_attention.linear_proj.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.20.mlp.linear_fc2.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.19.self_attention.linear_qkv.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.13.mlp.linear_fc2.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.12.self_attention.linear_qkv.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.9.self_attention.linear_proj.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.5.mlp.linear_fc2.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.2.mlp.linear_fc2.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.18.self_attention.linear_qkv.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.11.self_attention.linear_qkv.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.4.self_attention.linear_qkv.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.27.mlp.linear_fc1.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.20.mlp.linear_fc1.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.13.mlp.linear_fc1.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.5.mlp.linear_fc1.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.2.mlp.linear_fc1.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.29.mlp.linear_fc2.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.26.mlp.linear_fc1.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.22.mlp.linear_fc2.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.19.mlp.linear_fc1.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.18.self_attention.linear_proj.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.12.mlp.linear_fc1.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.11.self_attention.linear_proj.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.7.mlp.linear_fc2.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.4.self_attention.linear_proj.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.0.mlp.linear_fc2.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.28.mlp.linear_fc2.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.27.self_attention.linear_qkv.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.25.self_attention.linear_proj.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.21.mlp.linear_fc2.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.20.self_attention.linear_qkv.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.17.self_attention.linear_proj.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.14.mlp.linear_fc2.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.13.self_attention.linear_qkv.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.10.self_attention.linear_proj.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.6.mlp.linear_fc2.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.3.self_attention.linear_proj.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.26.self_attention.linear_qkv.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.19.self_attention.linear_qkv.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.12.self_attention.linear_qkv.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.5.self_attention.linear_qkv.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.8.mlp.linear_fc2.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.28.mlp.linear_fc1.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.21.mlp.linear_fc1.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.14.mlp.linear_fc1.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.6.mlp.linear_fc1.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.23.mlp.linear_fc2.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.30.mlp.linear_fc2.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.27.mlp.linear_fc1.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.26.self_attention.linear_proj.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.25.mlp.linear_fc2.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.20.mlp.linear_fc1.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.19.self_attention.linear_proj.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.15.mlp.linear_fc2.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.13.mlp.linear_fc1.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.12.self_attention.linear_proj.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.2.mlp.linear_fc2.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.29.mlp.linear_fc2.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.28.self_attention.linear_qkv.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.22.mlp.linear_fc2.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.21.self_attention.linear_qkv.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.18.self_attention.linear_proj.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.14.self_attention.linear_qkv.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.11.self_attention.linear_proj.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.7.mlp.linear_fc2.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.6.self_attention.linear_qkv.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.4.self_attention.linear_proj.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.0.mlp.linear_fc1.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.27.self_attention.linear_qkv.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.20.self_attention.linear_qkv.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.13.self_attention.linear_qkv.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.1.mlp.linear_fc2.adapter.linear_out.weight\n",
+      "i.finetune/0 [default0]:    \tmodule.decoder.layers.3.self_attention.linear_qkv.adapter.linear_in.weight\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 peft:181] Setting up optimizers\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 utils:278] Setting up optimizer with config OptimizerConfig(optimizer='adam', lr=0.0001, min_lr=None, decoupled_lr=None, decoupled_min_lr=None, weight_decay=0.01, fp16=False, bf16=True, params_dtype=torch.bfloat16, loss_scale=None, initial_loss_scale=4294967296, min_loss_scale=1.0, loss_scale_window=1000, hysteresis=2, adam_beta1=0.9, adam_beta2=0.98, adam_eps=1e-08, sgd_momentum=0.9, use_distributed_optimizer=True, overlap_param_gather_with_optimizer_step=False, clip_grad=1.0, log_num_zeros_in_grad=False, barrier_with_L1_time=False, timers=None, config_logger_dir='')\n",
+      "i.finetune/0 [default0]:Training epoch 0, iteration 0/19 | lr: 0.0001 | global_batch_size: 8 | global_step: 0 | reduced_train_loss: 1.956\n",
+      "i.finetune/0 [default0]:Training epoch 0, iteration 1/19 | lr: 0.0001 | global_batch_size: 8 | global_step: 1 | reduced_train_loss: 1.509 | consumed_samples: 16\n",
+      "i.finetune/0 [default0]:[NeMo W 2024-11-15 09:45:48 nemo_logging:361] /usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/connectors/logger_connector/result.py:431: It is recommended to use `self.log('global_batch_size', ..., sync_dist=True)` when logging on epoch level in distributed setting to accumulate the metric across devices.\n",
+      "i.finetune/0 [default0]:    \n",
+      "i.finetune/0 [default0]:[NeMo W 2024-11-15 09:45:48 nemo_logging:361] /usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/connectors/logger_connector/result.py:431: It is recommended to use `self.log('val_loss', ..., sync_dist=True)` when logging on epoch level in distributed setting to accumulate the metric across devices.\n",
+      "i.finetune/0 [default0]:    \n",
+      "i.finetune/0 [default0]:Training epoch 0, iteration 2/19 | lr: 0.0001 | global_batch_size: 8 | global_step: 2 | reduced_train_loss: 0.3079 | consumed_samples: 24 | val_loss: 0.3142\n",
+      "i.finetune/0 [default0]:Training epoch 0, iteration 3/19 | lr: 0.0001 | global_batch_size: 8 | global_step: 3 | reduced_train_loss: 0.4225 | consumed_samples: 32 | val_loss: 0.3142\n",
+      "i.finetune/0 [default0]:Training epoch 0, iteration 4/19 | lr: 0.0001 | global_batch_size: 8 | global_step: 4 | reduced_train_loss: 0.2569 | consumed_samples: 40 | val_loss: 0.1524\n",
+      "i.finetune/0 [default0]:Training epoch 0, iteration 5/19 | lr: 0.0001 | global_batch_size: 8 | global_step: 5 | reduced_train_loss: 0.4586 | consumed_samples: 48 | val_loss: 0.1524\n",
+      "i.finetune/0 [default0]:Training epoch 0, iteration 6/19 | lr: 0.0001 | global_batch_size: 8 | global_step: 6 | reduced_train_loss: 0.4207 | consumed_samples: 56 | val_loss: 0.1952\n",
+      "i.finetune/0 [default0]:Training epoch 0, iteration 7/19 | lr: 0.0001 | global_batch_size: 8 | global_step: 7 | reduced_train_loss: 0.081 | consumed_samples: 64 | val_loss: 0.1952\n",
+      "i.finetune/0 [default0]:Training epoch 0, iteration 8/19 | lr: 0.0001 | global_batch_size: 8 | global_step: 8 | reduced_train_loss: 0.2103 | consumed_samples: 72 | val_loss: 0.1372\n",
+      "i.finetune/0 [default0]:Training epoch 0, iteration 9/19 | lr: 0.0001 | global_batch_size: 8 | global_step: 9 | reduced_train_loss: 0.3401 | consumed_samples: 80 | val_loss: 0.1372\n",
+      "i.finetune/0 [default0]:Epoch 0, global step 9: 'reduced_train_loss' reached 0.34012 (best 0.34012), saving model to 'results/nemo2_peft/checkpoints/nemo2_peft--reduced_train_loss=0.3401-epoch=0.ckpt' as top 1\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:46:00 model_checkpoint:497] Scheduled async checkpoint save for results/nemo2_peft/checkpoints/nemo2_peft--reduced_train_loss=0.3401-epoch=0.ckpt\n",
+      "i.finetune/0 [default0]:huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...\n",
+      "i.finetune/0 [default0]:To disable this warning, you can either:\n",
+      "i.finetune/0 [default0]:\t- Avoid using `tokenizers` before the fork if possible\n",
+      "i.finetune/0 [default0]:\t- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)\n",
+      "i.finetune/0 [default0]:huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...\n",
+      "i.finetune/0 [default0]:To disable this warning, you can either:\n",
+      "i.finetune/0 [default0]:\t- Avoid using `tokenizers` before the fork if possible\n",
+      "i.finetune/0 [default0]:\t- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:46:01 model_checkpoint:497] Scheduled async checkpoint save for results/nemo2_peft/checkpoints/nemo2_peft--reduced_train_loss=0.3401-epoch=0-last.ckpt\n",
+      "i.finetune/0 [default0]:Training epoch 0, iteration 10/19 | lr: 0.0001 | global_batch_size: 8 | global_step: 10 | reduced_train_loss: 0.2867 | consumed_samples: 88 | val_loss: 0.1337\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:46:04 model_checkpoint:522] Async checkpoint save for step 10 (results/nemo2_peft/checkpoints/nemo2_peft--reduced_train_loss=0.3401-epoch=0.ckpt) finalized successfully.\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:46:04 model_checkpoint:522] Async checkpoint save for step 10 (results/nemo2_peft/checkpoints/nemo2_peft--reduced_train_loss=0.3401-epoch=0-last.ckpt) finalized successfully.\n",
+      "i.finetune/0 [default0]:[rank0]:W1115 09:46:04.545000 140737350272832 torch/_dynamo/convert_frame.py:744] [4/8] torch._dynamo hit config.cache_size_limit (8)\n",
+      "i.finetune/0 [default0]:[rank0]:W1115 09:46:04.545000 140737350272832 torch/_dynamo/convert_frame.py:744] [4/8]    function: 'calculate_cross_entropy_loss' (/opt/megatron-lm/megatron/core/fusions/fused_cross_entropy.py:47)\n",
+      "i.finetune/0 [default0]:[rank0]:W1115 09:46:04.545000 140737350272832 torch/_dynamo/convert_frame.py:744] [4/8]    last reason: tensor 'L['exp_logits']' size mismatch at index 0. expected 304, actual 336\n",
+      "i.finetune/0 [default0]:[rank0]:W1115 09:46:04.545000 140737350272832 torch/_dynamo/convert_frame.py:744] [4/8] To log all recompilation reasons, use TORCH_LOGS=\"recompiles\".\n",
+      "i.finetune/0 [default0]:[rank0]:W1115 09:46:04.545000 140737350272832 torch/_dynamo/convert_frame.py:744] [4/8] To diagnose recompilation issues, see https://pytorch.org/docs/main/torch.compiler_troubleshooting.html.\n",
+      "i.finetune/0 [default0]:Training epoch 0, iteration 11/19 | lr: 0.0001 | global_batch_size: 8 | global_step: 11 | reduced_train_loss: 0.2758 | consumed_samples: 96 | val_loss: 0.1337\n",
+      "i.finetune/0 [default0]:Training epoch 0, iteration 12/19 | lr: 0.0001 | global_batch_size: 8 | global_step: 12 | reduced_train_loss: 0.206 | consumed_samples: 104 | val_loss: 0.1601\n",
+      "i.finetune/0 [default0]:Training epoch 0, iteration 13/19 | lr: 0.0001 | global_batch_size: 8 | global_step: 13 | reduced_train_loss: 0.1556 | consumed_samples: 112 | val_loss: 0.1601\n",
+      "i.finetune/0 [default0]:Training epoch 0, iteration 14/19 | lr: 0.0001 | global_batch_size: 8 | global_step: 14 | reduced_train_loss: 0.1831 | consumed_samples: 120 | val_loss: 0.1798\n",
+      "i.finetune/0 [default0]:Training epoch 0, iteration 15/19 | lr: 0.0001 | global_batch_size: 8 | global_step: 15 | reduced_train_loss: 0.1565 | consumed_samples: 128 | val_loss: 0.1798\n",
+      "i.finetune/0 [default0]:Training epoch 0, iteration 16/19 | lr: 0.0001 | global_batch_size: 8 | global_step: 16 | reduced_train_loss: 0.3776 | consumed_samples: 136 | val_loss: 0.2383\n",
+      "i.finetune/0 [default0]:Training epoch 0, iteration 17/19 | lr: 0.0001 | global_batch_size: 8 | global_step: 17 | reduced_train_loss: 0.483 | consumed_samples: 144 | val_loss: 0.2383\n",
+      "i.finetune/0 [default0]:Training epoch 0, iteration 18/19 | lr: 0.0001 | global_batch_size: 8 | global_step: 18 | reduced_train_loss: 0.188 | consumed_samples: 152 | val_loss: 0.2823\n",
+      "i.finetune/0 [default0]:Training epoch 0, iteration 19/19 | lr: 0.0001 | global_batch_size: 8 | global_step: 19 | reduced_train_loss: 0.2591 | consumed_samples: 160 | val_loss: 0.2823\n",
+      "i.finetune/0 [default0]:Epoch 0, global step 19: 'reduced_train_loss' reached 0.25909 (best 0.25909), saving model to 'results/nemo2_peft/checkpoints/nemo2_peft--reduced_train_loss=0.2591-epoch=0.ckpt' as top 1\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:46:17 model_checkpoint:497] Scheduled async checkpoint save for results/nemo2_peft/checkpoints/nemo2_peft--reduced_train_loss=0.2591-epoch=0.ckpt\n",
+      "i.finetune/0 [default0]:huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...\n",
+      "i.finetune/0 [default0]:To disable this warning, you can either:\n",
+      "i.finetune/0 [default0]:\t- Avoid using `tokenizers` before the fork if possible\n",
+      "i.finetune/0 [default0]:\t- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:46:17 model_checkpoint:497] Scheduled async checkpoint save for results/nemo2_peft/checkpoints/nemo2_peft--reduced_train_loss=0.2591-epoch=0-last.ckpt\n",
+      "i.finetune/0 [default0]:huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...\n",
+      "i.finetune/0 [default0]:To disable this warning, you can either:\n",
+      "i.finetune/0 [default0]:\t- Avoid using `tokenizers` before the fork if possible\n",
+      "i.finetune/0 [default0]:\t- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:46:19 model_checkpoint:522] Async checkpoint save for step 20 (results/nemo2_peft/checkpoints/nemo2_peft--reduced_train_loss=0.2591-epoch=0.ckpt) finalized successfully.\n",
+      "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:46:19 model_checkpoint:522] Async checkpoint save for step 20 (results/nemo2_peft/checkpoints/nemo2_peft--reduced_train_loss=0.2591-epoch=0-last.ckpt) finalized successfully.\n",
+      "i.finetune/0 [default0]:`Trainer.fit` stopped: `max_steps=20` reached.\n",
+      "i.finetune/0 I1115 09:46:34.257000 140737350272832 torch/distributed/elastic/agent/server/api.py:844] [default] worker group successfully finished. Waiting 300 seconds for other agents to finish.\n",
+      "i.finetune/0 I1115 09:46:34.257000 140737350272832 torch/distributed/elastic/agent/server/api.py:889] Local worker group finished (WorkerState.SUCCEEDED). Waiting 300 seconds for other agents to finish\n",
+      "i.finetune/0 I1115 09:46:34.258000 140737350272832 torch/distributed/elastic/agent/server/api.py:902] Done waiting for other agents. Elapsed: 0.00025081634521484375 seconds\n"
      ]
     },
     {
      "name": "stderr",
      "output_type": "stream",
      "text": [
-      "Job nemo.collections.llm.api.finetune-chz0jczhtwtxzc finished: SUCCEEDED\n"
+      "Job nemo.collections.llm.api.finetune-wdj265kcplhnkd finished: SUCCEEDED\n"
      ]
     },
     {
@@ -1038,7 +1398,7 @@
        "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"background-color: #272822\">                                                                                                                   </span>\n",
        "<span style=\"color: #959077; text-decoration-color: #959077; background-color: #272822\"># The experiment was run with the following tasks: ['nemo.collections.llm.api.finetune']</span><span style=\"background-color: #272822\">                           </span>\n",
        "<span style=\"color: #959077; text-decoration-color: #959077; background-color: #272822\"># You can inspect and reconstruct this experiment at a later point in time using:</span><span style=\"background-color: #272822\">                                  </span>\n",
-       "<span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">experiment </span><span style=\"color: #ff4689; text-decoration-color: #ff4689; background-color: #272822\">=</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\"> run</span><span style=\"color: #ff4689; text-decoration-color: #ff4689; background-color: #272822\">.</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">Experiment</span><span style=\"color: #ff4689; text-decoration-color: #ff4689; background-color: #272822\">.</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">from_id(</span><span style=\"color: #e6db74; text-decoration-color: #e6db74; background-color: #272822\">\"nemo.collections.llm.api.finetune_1729667543\"</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">)</span><span style=\"background-color: #272822\">                                </span>\n",
+       "<span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">experiment </span><span style=\"color: #ff4689; text-decoration-color: #ff4689; background-color: #272822\">=</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\"> run</span><span style=\"color: #ff4689; text-decoration-color: #ff4689; background-color: #272822\">.</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">Experiment</span><span style=\"color: #ff4689; text-decoration-color: #ff4689; background-color: #272822\">.</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">from_id(</span><span style=\"color: #e6db74; text-decoration-color: #e6db74; background-color: #272822\">\"nemo.collections.llm.api.finetune_1731692700\"</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">)</span><span style=\"background-color: #272822\">                                </span>\n",
        "<span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">experiment</span><span style=\"color: #ff4689; text-decoration-color: #ff4689; background-color: #272822\">.</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">status() </span><span style=\"color: #959077; text-decoration-color: #959077; background-color: #272822\"># Gets the overall status</span><span style=\"background-color: #272822\">                                                                      </span>\n",
        "<span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">experiment</span><span style=\"color: #ff4689; text-decoration-color: #ff4689; background-color: #272822\">.</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">logs(</span><span style=\"color: #e6db74; text-decoration-color: #e6db74; background-color: #272822\">\"nemo.collections.llm.api.finetune\"</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">) </span><span style=\"color: #959077; text-decoration-color: #959077; background-color: #272822\"># Gets the log for the provided task</span><span style=\"background-color: #272822\">                          </span>\n",
        "<span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">experiment</span><span style=\"color: #ff4689; text-decoration-color: #ff4689; background-color: #272822\">.</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">cancel(</span><span style=\"color: #e6db74; text-decoration-color: #e6db74; background-color: #272822\">\"nemo.collections.llm.api.finetune\"</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">) </span><span style=\"color: #959077; text-decoration-color: #959077; background-color: #272822\"># Cancels the provided task if still running</span><span style=\"background-color: #272822\">                </span>\n",
@@ -1049,7 +1409,7 @@
        "\u001b[48;2;39;40;34m                                                                                                                   \u001b[0m\n",
        "\u001b[38;2;149;144;119;48;2;39;40;34m# The experiment was run with the following tasks: ['nemo.collections.llm.api.finetune']\u001b[0m\u001b[48;2;39;40;34m                           \u001b[0m\n",
        "\u001b[38;2;149;144;119;48;2;39;40;34m# You can inspect and reconstruct this experiment at a later point in time using:\u001b[0m\u001b[48;2;39;40;34m                                  \u001b[0m\n",
-       "\u001b[38;2;248;248;242;48;2;39;40;34mexperiment\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;255;70;137;48;2;39;40;34m=\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mrun\u001b[0m\u001b[38;2;255;70;137;48;2;39;40;34m.\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mExperiment\u001b[0m\u001b[38;2;255;70;137;48;2;39;40;34m.\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mfrom_id\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m(\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m\"\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34mnemo.collections.llm.api.finetune_1729667543\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m\"\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m)\u001b[0m\u001b[48;2;39;40;34m                                \u001b[0m\n",
+       "\u001b[38;2;248;248;242;48;2;39;40;34mexperiment\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;255;70;137;48;2;39;40;34m=\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mrun\u001b[0m\u001b[38;2;255;70;137;48;2;39;40;34m.\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mExperiment\u001b[0m\u001b[38;2;255;70;137;48;2;39;40;34m.\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mfrom_id\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m(\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m\"\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34mnemo.collections.llm.api.finetune_1731692700\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m\"\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m)\u001b[0m\u001b[48;2;39;40;34m                                \u001b[0m\n",
        "\u001b[38;2;248;248;242;48;2;39;40;34mexperiment\u001b[0m\u001b[38;2;255;70;137;48;2;39;40;34m.\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mstatus\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m(\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m)\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;149;144;119;48;2;39;40;34m# Gets the overall status\u001b[0m\u001b[48;2;39;40;34m                                                                      \u001b[0m\n",
        "\u001b[38;2;248;248;242;48;2;39;40;34mexperiment\u001b[0m\u001b[38;2;255;70;137;48;2;39;40;34m.\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mlogs\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m(\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m\"\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34mnemo.collections.llm.api.finetune\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m\"\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m)\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;149;144;119;48;2;39;40;34m# Gets the log for the provided task\u001b[0m\u001b[48;2;39;40;34m                          \u001b[0m\n",
        "\u001b[38;2;248;248;242;48;2;39;40;34mexperiment\u001b[0m\u001b[38;2;255;70;137;48;2;39;40;34m.\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mcancel\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m(\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m\"\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34mnemo.collections.llm.api.finetune\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m\"\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m)\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;149;144;119;48;2;39;40;34m# Cancels the provided task if still running\u001b[0m\u001b[48;2;39;40;34m                \u001b[0m\n",
@@ -1064,18 +1424,18 @@
       "text/html": [
        "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"background-color: #272822\">                                                                                                                   </span>\n",
        "<span style=\"color: #959077; text-decoration-color: #959077; background-color: #272822\"># You can inspect this experiment at a later point in time using the CLI as well:</span><span style=\"background-color: #272822\">                                  </span>\n",
-       "<span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">nemo experiment status nemo.collections.llm.api.finetune_1729667543</span><span style=\"background-color: #272822\">                                                </span>\n",
-       "<span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">nemo experiment logs nemo.collections.llm.api.finetune_1729667543 </span><span style=\"color: #ae81ff; text-decoration-color: #ae81ff; background-color: #272822\">0</span><span style=\"background-color: #272822\">                                                </span>\n",
-       "<span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">nemo experiment cancel nemo.collections.llm.api.finetune_1729667543 </span><span style=\"color: #ae81ff; text-decoration-color: #ae81ff; background-color: #272822\">0</span><span style=\"background-color: #272822\">                                              </span>\n",
+       "<span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">nemo experiment status nemo.collections.llm.api.finetune_1731692700</span><span style=\"background-color: #272822\">                                                </span>\n",
+       "<span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">nemo experiment logs nemo.collections.llm.api.finetune_1731692700 </span><span style=\"color: #ae81ff; text-decoration-color: #ae81ff; background-color: #272822\">0</span><span style=\"background-color: #272822\">                                                </span>\n",
+       "<span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">nemo experiment cancel nemo.collections.llm.api.finetune_1731692700 </span><span style=\"color: #ae81ff; text-decoration-color: #ae81ff; background-color: #272822\">0</span><span style=\"background-color: #272822\">                                              </span>\n",
        "<span style=\"background-color: #272822\">                                                                                                                   </span>\n",
        "</pre>\n"
       ],
       "text/plain": [
        "\u001b[48;2;39;40;34m                                                                                                                   \u001b[0m\n",
        "\u001b[38;2;149;144;119;48;2;39;40;34m# You can inspect this experiment at a later point in time using the CLI as well:\u001b[0m\u001b[48;2;39;40;34m                                  \u001b[0m\n",
-       "\u001b[38;2;248;248;242;48;2;39;40;34mnemo\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mexperiment\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mstatus\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mnemo.collections.llm.api.finetune_1729667543\u001b[0m\u001b[48;2;39;40;34m                                                \u001b[0m\n",
-       "\u001b[38;2;248;248;242;48;2;39;40;34mnemo\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mexperiment\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mlogs\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mnemo.collections.llm.api.finetune_1729667543\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;174;129;255;48;2;39;40;34m0\u001b[0m\u001b[48;2;39;40;34m                                                \u001b[0m\n",
-       "\u001b[38;2;248;248;242;48;2;39;40;34mnemo\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mexperiment\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mcancel\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mnemo.collections.llm.api.finetune_1729667543\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;174;129;255;48;2;39;40;34m0\u001b[0m\u001b[48;2;39;40;34m                                              \u001b[0m\n",
+       "\u001b[38;2;248;248;242;48;2;39;40;34mnemo\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mexperiment\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mstatus\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mnemo.collections.llm.api.finetune_1731692700\u001b[0m\u001b[48;2;39;40;34m                                                \u001b[0m\n",
+       "\u001b[38;2;248;248;242;48;2;39;40;34mnemo\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mexperiment\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mlogs\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mnemo.collections.llm.api.finetune_1731692700\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;174;129;255;48;2;39;40;34m0\u001b[0m\u001b[48;2;39;40;34m                                                \u001b[0m\n",
+       "\u001b[38;2;248;248;242;48;2;39;40;34mnemo\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mexperiment\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mcancel\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mnemo.collections.llm.api.finetune_1731692700\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;174;129;255;48;2;39;40;34m0\u001b[0m\u001b[48;2;39;40;34m                                              \u001b[0m\n",
        "\u001b[48;2;39;40;34m                                                                                                                   \u001b[0m\n"
       ]
      },
@@ -1121,15 +1481,551 @@
    "source": [
     "## Step 4 Evaluation \n",
     "\n",
-    "We use the `llm.generate` API in NeMo 2.0 to generate results from the trained PEFT checkpoint. "
+    "We use the `llm.generate` API in NeMo 2.0 to generate results from the trained PEFT checkpoint. First, locate the last saved checkpoint in your experiment directory: `results/nemo2_peft/checkpoints`."
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 10,
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "We will load PEFT checkpoint from: results/nemo2_peft/checkpoints/nemo2_peft--reduced_train_loss=0.2591-epoch=0-last\n"
+     ]
+    }
+   ],
+   "source": [
+    "peft_ckpt_path = str(next(d for d in Path(\"./results/nemo2_peft/checkpoints/\").iterdir() if d.is_dir() and d.name.endswith(\"-last\")))\n",
+    "print(\"We will load PEFT checkpoint from:\", peft_ckpt_path)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
    "metadata": {},
-   "outputs": [],
-   "source": []
+   "source": [
+    "The SQuAD test set contains over 10,000 samples. For a quick demonstration, we will use the first 100 lines as example input."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 11,
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "{\"input\": \"Context: Super Bowl 50 was an American football game to determine the champion of the National Football League (NFL) for the 2015 season. The American Football Conference (AFC) champion Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers 24\\u201310 to earn their third Super Bowl title. The game was played on February 7, 2016, at Levi's Stadium in the San Francisco Bay Area at Santa Clara, California. As this was the 50th Super Bowl, the league emphasized the \\\"golden anniversary\\\" with various gold-themed initiatives, as well as temporarily suspending the tradition of naming each Super Bowl game with Roman numerals (under which the game would have been known as \\\"Super Bowl L\\\"), so that the logo could prominently feature the Arabic numerals 50. Question: Which NFL team represented the AFC at Super Bowl 50? Answer:\", \"output\": \"Denver Broncos\", \"original_answers\": [\"Denver Broncos\", \"Denver Broncos\", \"Denver Broncos\"]}\n",
+      "{\"input\": \"Context: Super Bowl 50 was an American football game to determine the champion of the National Football League (NFL) for the 2015 season. The American Football Conference (AFC) champion Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers 24\\u201310 to earn their third Super Bowl title. The game was played on February 7, 2016, at Levi's Stadium in the San Francisco Bay Area at Santa Clara, California. As this was the 50th Super Bowl, the league emphasized the \\\"golden anniversary\\\" with various gold-themed initiatives, as well as temporarily suspending the tradition of naming each Super Bowl game with Roman numerals (under which the game would have been known as \\\"Super Bowl L\\\"), so that the logo could prominently feature the Arabic numerals 50. Question: Which NFL team represented the NFC at Super Bowl 50? Answer:\", \"output\": \"Carolina Panthers\", \"original_answers\": [\"Carolina Panthers\", \"Carolina Panthers\", \"Carolina Panthers\"]}\n",
+      "{\"input\": \"Context: Super Bowl 50 was an American football game to determine the champion of the National Football League (NFL) for the 2015 season. The American Football Conference (AFC) champion Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers 24\\u201310 to earn their third Super Bowl title. The game was played on February 7, 2016, at Levi's Stadium in the San Francisco Bay Area at Santa Clara, California. As this was the 50th Super Bowl, the league emphasized the \\\"golden anniversary\\\" with various gold-themed initiatives, as well as temporarily suspending the tradition of naming each Super Bowl game with Roman numerals (under which the game would have been known as \\\"Super Bowl L\\\"), so that the logo could prominently feature the Arabic numerals 50. Question: Where did Super Bowl 50 take place? Answer:\", \"output\": \"Santa Clara, California\", \"original_answers\": [\"Santa Clara, California\", \"Levi's Stadium\", \"Levi's Stadium in the San Francisco Bay Area at Santa Clara, California.\"]}\n"
+     ]
+    }
+   ],
+   "source": [
+    "%%bash\n",
+    "head -n 100 /root/.cache/nemo/datasets/squad/test.jsonl > toy_testset.jsonl\n",
+    "head -n 3 /root/.cache/nemo/datasets/squad/test.jsonl"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "We will pass the string `toy_testset.jsonl` to the `input_dataset` parameter of `llm.generate`. To evaluate the entire test set, you can instead pass the SQuAD data module directly, using `input_dataset=squad()`. The input JSONL file should follow the format shown above, containing `input` and `output` fields (additional keys are optional)."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 12,
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #00ff00; text-decoration-color: #00ff00\">─── </span><span style=\"color: #800080; text-decoration-color: #800080; font-weight: bold\">Entering Experiment nemo.collections.llm.api.generate with id: nemo.collections.llm.api.generate_1731692795</span><span style=\"color: #00ff00; text-decoration-color: #00ff00\"> ───</span>\n",
+       "</pre>\n"
+      ],
+      "text/plain": [
+       "\u001b[92m─── \u001b[0m\u001b[1;35mEntering Experiment nemo.collections.llm.api.generate with id: nemo.collections.llm.api.generate_1731692795\u001b[0m\u001b[92m ───\u001b[0m\n"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "Log directory is: /root/.nemo_run/experiments/nemo.collections.llm.api.generate/nemo.collections.llm.api.generate_1731692795/nemo.collections.llm.api.generate\n"
+     ]
+    },
+    {
+     "data": {
+      "text/html": [
+       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #7fbfbf; text-decoration-color: #7fbfbf\">[09:46:35] </span><span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">Launching job nemo.collections.llm.api.generate for experiment </span>                        <a href=\"file:///opt/NeMo-Run/src/nemo_run/run/experiment.py\" target=\"_blank\"><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">experiment.py</span></a><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">:</span><a href=\"file:///opt/NeMo-Run/src/nemo_run/run/experiment.py#660\" target=\"_blank\"><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">660</span></a>\n",
+       "<span style=\"color: #7fbfbf; text-decoration-color: #7fbfbf\">           </span><span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">nemo.collections.llm.api.generate</span>                                                      <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">                 </span>\n",
+       "</pre>\n"
+      ],
+      "text/plain": [
+       "\u001b[2;36m[09:46:35]\u001b[0m\u001b[2;36m \u001b[0m\u001b[1;36mLaunching job nemo.collections.llm.api.generate for experiment \u001b[0m                        \u001b]8;id=926482;file:///opt/NeMo-Run/src/nemo_run/run/experiment.py\u001b\\\u001b[2mexperiment.py\u001b[0m\u001b]8;;\u001b\\\u001b[2m:\u001b[0m\u001b]8;id=513888;file:///opt/NeMo-Run/src/nemo_run/run/experiment.py#660\u001b\\\u001b[2m660\u001b[0m\u001b]8;;\u001b\\\n",
+       "\u001b[2;36m           \u001b[0m\u001b[1;36mnemo.collections.llm.api.generate\u001b[0m                                                      \u001b[2m                 \u001b[0m\n"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "Log directory is: /root/.nemo_run/experiments/nemo.collections.llm.api.generate/nemo.collections.llm.api.generate_1731692795/nemo.collections.llm.api.generate\n",
+      "Launched app: local_persistent://nemo_run/nemo.collections.llm.api.generate-zhfd0nk1lqmhm\n",
+      "AppStatus:\n",
+      "    State: RUNNING\n",
+      "    Num Restarts: 0\n",
+      "    Roles: \n",
+      "    Msg: <NONE>\n",
+      "    Structured Error Msg: <NONE>\n",
+      "    UI URL: file:///root/.nemo_run/experiments/nemo.collections.llm.api.generate/nemo.collections.llm.api.generate_1731692795/nemo.collections.llm.api.generate/nemo_run/nemo.collections.llm.api.generate-zhfd0nk1lqmhm\n",
+      "    \n"
+     ]
+    },
+    {
+     "data": {
+      "text/html": [
+       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #00ff00; text-decoration-color: #00ff00\">────────────────── </span><span style=\"color: #800080; text-decoration-color: #800080; font-weight: bold\">Waiting for Experiment nemo.collections.llm.api.generate_1731692795 to finish</span><span style=\"color: #00ff00; text-decoration-color: #00ff00\"> ──────────────────</span>\n",
+       "</pre>\n"
+      ],
+      "text/plain": [
+       "\u001b[92m────────────────── \u001b[0m\u001b[1;35mWaiting for Experiment nemo.collections.llm.api.generate_1731692795 to finish\u001b[0m\u001b[92m ──────────────────\u001b[0m\n"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "text/html": [
+       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">\n",
+       "</pre>\n"
+      ],
+      "text/plain": [
+       "\n"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "text/html": [
+       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #008000; text-decoration-color: #008000; font-weight: bold\">Experiment Status for</span> <span style=\"color: #ffaf00; text-decoration-color: #ffaf00; font-weight: bold\">nemo.collections.llm.api.generate_1731692795</span>\n",
+       "</pre>\n"
+      ],
+      "text/plain": [
+       "\u001b[1;32mExperiment Status for\u001b[0m \u001b[1;38;5;214mnemo.collections.llm.api.generate_1731692795\u001b[0m\n"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "text/html": [
+       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">\n",
+       "<span style=\"color: #008000; text-decoration-color: #008000; font-weight: bold\">Task 0</span>: <span style=\"color: #ffaf00; text-decoration-color: #ffaf00; font-weight: bold\">nemo.collections.llm.api.generate</span>\n",
+       "- <span style=\"color: #008000; text-decoration-color: #008000; font-weight: bold\">Status</span>: RUNNING\n",
+       "- <span style=\"color: #008000; text-decoration-color: #008000; font-weight: bold\">Executor</span>: LocalExecutor\n",
+       "- <span style=\"color: #008000; text-decoration-color: #008000; font-weight: bold\">Job id</span>: nemo.collections.llm.api.generate-zhfd0nk1lqmhm\n",
+       "- <span style=\"color: #008000; text-decoration-color: #008000; font-weight: bold\">Local Directory</span>: /root/.nemo_run/experiments/nemo.collections.llm.api.generate/nemo.collections.llm.api.generate_1731692795/nemo.collections.llm.api.generate\n",
+       "</pre>\n"
+      ],
+      "text/plain": [
+       "\n",
+       "\u001b[1;32mTask 0\u001b[0m: \u001b[1;38;5;214mnemo.collections.llm.api.generate\u001b[0m\n",
+       "- \u001b[1;32mStatus\u001b[0m: RUNNING\n",
+       "- \u001b[1;32mExecutor\u001b[0m: LocalExecutor\n",
+       "- \u001b[1;32mJob id\u001b[0m: nemo.collections.llm.api.generate-zhfd0nk1lqmhm\n",
+       "- \u001b[1;32mLocal Directory\u001b[0m: /root/.nemo_run/experiments/nemo.collections.llm.api.generate/nemo.collections.llm.api.generate_1731692795/nemo.collections.llm.api.generate\n"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "text/html": [
+       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">\n",
+       "</pre>\n"
+      ],
+      "text/plain": [
+       "\n"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "Waiting for job nemo.collections.llm.api.generate-zhfd0nk1lqmhm to finish [log=True]...\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "i.generate/0 I1115 09:46:36.054000 140737350272832 torch/distributed/launcher/api.py:188] Starting elastic_operator with launch configs:\n",
+      "i.generate/0 I1115 09:46:36.054000 140737350272832 torch/distributed/launcher/api.py:188]   entrypoint       : nemo_run.core.runners.fdl_runner\n",
+      "i.generate/0 I1115 09:46:36.054000 140737350272832 torch/distributed/launcher/api.py:188]   min_nodes        : 1\n",
+      "i.generate/0 I1115 09:46:36.054000 140737350272832 torch/distributed/launcher/api.py:188]   max_nodes        : 1\n",
+      "i.generate/0 I1115 09:46:36.054000 140737350272832 torch/distributed/launcher/api.py:188]   nproc_per_node   : 1\n",
+      "i.generate/0 I1115 09:46:36.054000 140737350272832 torch/distributed/launcher/api.py:188]   run_id           : 4470\n",
+      "i.generate/0 I1115 09:46:36.054000 140737350272832 torch/distributed/launcher/api.py:188]   rdzv_backend     : c10d\n",
+      "i.generate/0 I1115 09:46:36.054000 140737350272832 torch/distributed/launcher/api.py:188]   rdzv_endpoint    : localhost:0\n",
+      "i.generate/0 I1115 09:46:36.054000 140737350272832 torch/distributed/launcher/api.py:188]   rdzv_configs     : {'timeout': 900}\n",
+      "i.generate/0 I1115 09:46:36.054000 140737350272832 torch/distributed/launcher/api.py:188]   max_restarts     : 0\n",
+      "i.generate/0 I1115 09:46:36.054000 140737350272832 torch/distributed/launcher/api.py:188]   monitor_interval : 0.1\n",
+      "i.generate/0 I1115 09:46:36.054000 140737350272832 torch/distributed/launcher/api.py:188]   log_dir          : /root/.nemo_run/experiments/nemo.collections.llm.api.generate/nemo.collections.llm.api.generate_1731692795/nemo.collections.llm.api.generate/nemo_run/nemo.collections.llm.api.generate-zhfd0nk1lqmhm/torchelastic/nemo.collections.llm.api.generate\n",
+      "i.generate/0 I1115 09:46:36.054000 140737350272832 torch/distributed/launcher/api.py:188]   metrics_cfg      : {}\n",
+      "i.generate/0 I1115 09:46:36.054000 140737350272832 torch/distributed/launcher/api.py:188] \n",
+      "i.generate/0 I1115 09:46:36.056000 140737350272832 torch/distributed/elastic/agent/server/api.py:825] [default] starting workers for entrypoint: python\n",
+      "i.generate/0 I1115 09:46:36.056000 140737350272832 torch/distributed/elastic/agent/server/api.py:646] [default] Rendezvous'ing worker group\n",
+      "i.generate/0 I1115 09:46:36.187000 140737350272832 torch/distributed/elastic/agent/server/api.py:512] [default] Rendezvous complete for workers. Result:\n",
+      "i.generate/0 I1115 09:46:36.187000 140737350272832 torch/distributed/elastic/agent/server/api.py:512]   restart_count=0\n",
+      "i.generate/0 I1115 09:46:36.187000 140737350272832 torch/distributed/elastic/agent/server/api.py:512]   master_addr=eos0346.eos.clusters.nvidia.com\n",
+      "i.generate/0 I1115 09:46:36.187000 140737350272832 torch/distributed/elastic/agent/server/api.py:512]   master_port=53613\n",
+      "i.generate/0 I1115 09:46:36.187000 140737350272832 torch/distributed/elastic/agent/server/api.py:512]   group_rank=0\n",
+      "i.generate/0 I1115 09:46:36.187000 140737350272832 torch/distributed/elastic/agent/server/api.py:512]   group_world_size=1\n",
+      "i.generate/0 I1115 09:46:36.187000 140737350272832 torch/distributed/elastic/agent/server/api.py:512]   local_ranks=[0]\n",
+      "i.generate/0 I1115 09:46:36.187000 140737350272832 torch/distributed/elastic/agent/server/api.py:512]   role_ranks=[0]\n",
+      "i.generate/0 I1115 09:46:36.187000 140737350272832 torch/distributed/elastic/agent/server/api.py:512]   global_ranks=[0]\n",
+      "i.generate/0 I1115 09:46:36.187000 140737350272832 torch/distributed/elastic/agent/server/api.py:512]   role_world_sizes=[1]\n",
+      "i.generate/0 I1115 09:46:36.187000 140737350272832 torch/distributed/elastic/agent/server/api.py:512]   global_world_sizes=[1]\n",
+      "i.generate/0 I1115 09:46:36.187000 140737350272832 torch/distributed/elastic/agent/server/api.py:512] \n",
+      "i.generate/0 I1115 09:46:36.187000 140737350272832 torch/distributed/elastic/agent/server/api.py:654] [default] Starting worker group\n",
+      "i.generate/0 I1115 09:46:36.187000 140737350272832 torch/distributed/elastic/agent/server/local_elastic_agent.py:184] Environment variable 'TORCHELASTIC_ENABLE_FILE_TIMER' not found. Do not start FileTimerServer.\n",
+      "i.generate/0 I1115 09:46:36.187000 140737350272832 torch/distributed/elastic/agent/server/local_elastic_agent.py:216] Environment variable 'TORCHELASTIC_HEALTH_CHECK_PORT' not found. Do not start health check.\n",
+      "i.generate/0 [default0]:/usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py:128: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.\n",
+      "i.generate/0 [default0]:  warnings.warn(\n",
+      "i.generate/0 [default0]:[NeMo W 2024-11-15 09:46:42 nemo_logging:361] /usr/local/lib/python3.10/dist-packages/pyannote/core/notebook.py:134: MatplotlibDeprecationWarning: The get_cmap function was deprecated in Matplotlib 3.7 and will be removed in 3.11. Use ``matplotlib.colormaps[name]`` or ``matplotlib.colormaps.get_cmap()`` or ``pyplot.get_cmap()`` instead.\n",
+      "i.generate/0 [default0]:      cm = get_cmap(\"Set1\")\n",
+      "i.generate/0 [default0]:    \n",
+      "i.generate/0 [default0]:GPU available: True (cuda), used: True\n",
+      "i.generate/0 [default0]:TPU available: False, using: 0 TPU cores\n",
+      "i.generate/0 [default0]:HPU available: False, using: 0 HPUs\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:44 megatron_init:396] Rank 0 has data parallel group : [0]\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:44 megatron_init:402] Rank 0 has combined group of data parallel and context parallel : [0]\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:44 megatron_init:407] All data parallel group ranks with context parallel combined: [[0]]\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:44 megatron_init:410] Ranks 0 has data parallel rank: 0\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:44 megatron_init:418] Rank 0 has context parallel group: [0]\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:44 megatron_init:421] All context parallel group ranks: [[0]]\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:44 megatron_init:422] Ranks 0 has context parallel rank: 0\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:44 megatron_init:429] Rank 0 has model parallel group: [0]\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:44 megatron_init:430] All model parallel group ranks: [[0]]\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:44 megatron_init:439] Rank 0 has tensor model parallel group: [0]\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:44 megatron_init:443] All tensor model parallel group ranks: [[0]]\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:44 megatron_init:444] Rank 0 has tensor model parallel rank: 0\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:44 megatron_init:464] Rank 0 has pipeline model parallel group: [0]\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:44 megatron_init:476] Rank 0 has embedding group: [0]\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:44 megatron_init:482] All pipeline model parallel group ranks: [[0]]\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:44 megatron_init:483] Rank 0 has pipeline model parallel rank 0\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:44 megatron_init:484] All embedding group ranks: [[0]]\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:44 megatron_init:485] Rank 0 has embedding rank: 0\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:44 base:44] Padded vocab_size: 128256, original vocab_size: 128256, dummy tokens: 0.\n",
+      "i.generate/0 [default0]:Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/1\n",
+      "i.generate/0 [default0]:----------------------------------------------------------------------------------------------------\n",
+      "i.generate/0 [default0]:distributed_backend=nccl\n",
+      "i.generate/0 [default0]:All distributed processes registered. Starting with 1 processes\n",
+      "i.generate/0 [default0]:----------------------------------------------------------------------------------------------------\n",
+      "i.generate/0 [default0]:\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:44 megatron_parallel:550]  > number of parameters on (tensor, pipeline) model parallel rank (0, 0): 8030261248\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:45 megatron_strategy:745] Doing selective restore from RestoreConfig(path='/root/.cache/nemo/models/meta-llama/Meta-Llama-3-8B', adapter_path=None, load_model_state=True, load_optim_state=False, load_artifacts=True)\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 megatron_strategy:750] Restoring model weights from RestoreConfig(path='/root/.cache/nemo/models/meta-llama/Meta-Llama-3-8B', adapter_path=None, load_model_state=True, load_optim_state=False, load_artifacts=True)\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 megatron_strategy:757] Finished restoring from RestoreConfig(path='/root/.cache/nemo/models/meta-llama/Meta-Llama-3-8B', adapter_path=None, load_model_state=True, load_optim_state=False, load_artifacts=True), cleaning up.\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.0.self_attention.linear_proj\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.0.self_attention.linear_qkv\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.0.mlp.linear_fc1\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.0.mlp.linear_fc2\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.1.self_attention.linear_proj\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.1.self_attention.linear_qkv\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.1.mlp.linear_fc1\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.1.mlp.linear_fc2\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.2.self_attention.linear_proj\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.2.self_attention.linear_qkv\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.2.mlp.linear_fc1\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.2.mlp.linear_fc2\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.3.self_attention.linear_proj\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.3.self_attention.linear_qkv\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.3.mlp.linear_fc1\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.3.mlp.linear_fc2\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.4.self_attention.linear_proj\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.4.self_attention.linear_qkv\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.4.mlp.linear_fc1\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.4.mlp.linear_fc2\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.5.self_attention.linear_proj\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.5.self_attention.linear_qkv\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.5.mlp.linear_fc1\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.5.mlp.linear_fc2\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.6.self_attention.linear_proj\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.6.self_attention.linear_qkv\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.6.mlp.linear_fc1\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.6.mlp.linear_fc2\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.7.self_attention.linear_proj\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.7.self_attention.linear_qkv\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.7.mlp.linear_fc1\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.7.mlp.linear_fc2\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.8.self_attention.linear_proj\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.8.self_attention.linear_qkv\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.8.mlp.linear_fc1\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.8.mlp.linear_fc2\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.9.self_attention.linear_proj\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.9.self_attention.linear_qkv\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.9.mlp.linear_fc1\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.9.mlp.linear_fc2\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.10.self_attention.linear_proj\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.10.self_attention.linear_qkv\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.10.mlp.linear_fc1\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.10.mlp.linear_fc2\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.11.self_attention.linear_proj\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.11.self_attention.linear_qkv\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.11.mlp.linear_fc1\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.11.mlp.linear_fc2\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.12.self_attention.linear_proj\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.12.self_attention.linear_qkv\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.12.mlp.linear_fc1\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.12.mlp.linear_fc2\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.13.self_attention.linear_proj\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.13.self_attention.linear_qkv\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.13.mlp.linear_fc1\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.13.mlp.linear_fc2\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.14.self_attention.linear_proj\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.14.self_attention.linear_qkv\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.14.mlp.linear_fc1\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.14.mlp.linear_fc2\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.15.self_attention.linear_proj\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.15.self_attention.linear_qkv\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.15.mlp.linear_fc1\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.15.mlp.linear_fc2\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.16.self_attention.linear_proj\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.16.self_attention.linear_qkv\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.16.mlp.linear_fc1\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.16.mlp.linear_fc2\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.17.self_attention.linear_proj\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.17.self_attention.linear_qkv\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.17.mlp.linear_fc1\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.17.mlp.linear_fc2\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.18.self_attention.linear_proj\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.18.self_attention.linear_qkv\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.18.mlp.linear_fc1\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.18.mlp.linear_fc2\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.19.self_attention.linear_proj\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.19.self_attention.linear_qkv\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.19.mlp.linear_fc1\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.19.mlp.linear_fc2\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.20.self_attention.linear_proj\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.20.self_attention.linear_qkv\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.20.mlp.linear_fc1\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.20.mlp.linear_fc2\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.21.self_attention.linear_proj\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.21.self_attention.linear_qkv\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.21.mlp.linear_fc1\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.21.mlp.linear_fc2\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.22.self_attention.linear_proj\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.22.self_attention.linear_qkv\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.22.mlp.linear_fc1\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.22.mlp.linear_fc2\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.23.self_attention.linear_proj\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.23.self_attention.linear_qkv\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.23.mlp.linear_fc1\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.23.mlp.linear_fc2\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.24.self_attention.linear_proj\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.24.self_attention.linear_qkv\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.24.mlp.linear_fc1\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.24.mlp.linear_fc2\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.25.self_attention.linear_proj\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.25.self_attention.linear_qkv\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.25.mlp.linear_fc1\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.25.mlp.linear_fc2\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.26.self_attention.linear_proj\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.26.self_attention.linear_qkv\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.26.mlp.linear_fc1\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.26.mlp.linear_fc2\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.27.self_attention.linear_proj\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.27.self_attention.linear_qkv\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.27.mlp.linear_fc1\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.27.mlp.linear_fc2\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.28.self_attention.linear_proj\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.28.self_attention.linear_qkv\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.28.mlp.linear_fc1\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.28.mlp.linear_fc2\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.29.self_attention.linear_proj\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.29.self_attention.linear_qkv\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.29.mlp.linear_fc1\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.29.mlp.linear_fc2\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.30.self_attention.linear_proj\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.30.self_attention.linear_qkv\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.30.mlp.linear_fc1\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.30.mlp.linear_fc2\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.31.self_attention.linear_proj\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.31.self_attention.linear_qkv\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.31.mlp.linear_fc1\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.31.mlp.linear_fc2\n",
+      "i.generate/0 [default0]:[NeMo I 2024-11-15 09:47:21 api:699] Predictions written to peft_prediction.jsonl\n",
+      "i.generate/0 I1115 09:47:24.254000 140737350272832 torch/distributed/elastic/agent/server/api.py:844] [default] worker group successfully finished. Waiting 300 seconds for other agents to finish.\n",
+      "i.generate/0 I1115 09:47:24.254000 140737350272832 torch/distributed/elastic/agent/server/api.py:889] Local worker group finished (WorkerState.SUCCEEDED). Waiting 300 seconds for other agents to finish\n",
+      "i.generate/0 I1115 09:47:24.254000 140737350272832 torch/distributed/elastic/agent/server/api.py:902] Done waiting for other agents. Elapsed: 0.0003161430358886719 seconds\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "Job nemo.collections.llm.api.generate-zhfd0nk1lqmhm finished: SUCCEEDED\n"
+     ]
+    },
+    {
+     "data": {
+      "text/html": [
+       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"background-color: #272822\">                                                                                                                   </span>\n",
+       "<span style=\"color: #959077; text-decoration-color: #959077; background-color: #272822\"># The experiment was run with the following tasks: ['nemo.collections.llm.api.generate']</span><span style=\"background-color: #272822\">                           </span>\n",
+       "<span style=\"color: #959077; text-decoration-color: #959077; background-color: #272822\"># You can inspect and reconstruct this experiment at a later point in time using:</span><span style=\"background-color: #272822\">                                  </span>\n",
+       "<span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">experiment </span><span style=\"color: #ff4689; text-decoration-color: #ff4689; background-color: #272822\">=</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\"> run</span><span style=\"color: #ff4689; text-decoration-color: #ff4689; background-color: #272822\">.</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">Experiment</span><span style=\"color: #ff4689; text-decoration-color: #ff4689; background-color: #272822\">.</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">from_id(</span><span style=\"color: #e6db74; text-decoration-color: #e6db74; background-color: #272822\">\"nemo.collections.llm.api.generate_1731692795\"</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">)</span><span style=\"background-color: #272822\">                                </span>\n",
+       "<span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">experiment</span><span style=\"color: #ff4689; text-decoration-color: #ff4689; background-color: #272822\">.</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">status() </span><span style=\"color: #959077; text-decoration-color: #959077; background-color: #272822\"># Gets the overall status</span><span style=\"background-color: #272822\">                                                                      </span>\n",
+       "<span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">experiment</span><span style=\"color: #ff4689; text-decoration-color: #ff4689; background-color: #272822\">.</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">logs(</span><span style=\"color: #e6db74; text-decoration-color: #e6db74; background-color: #272822\">\"nemo.collections.llm.api.generate\"</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">) </span><span style=\"color: #959077; text-decoration-color: #959077; background-color: #272822\"># Gets the log for the provided task</span><span style=\"background-color: #272822\">                          </span>\n",
+       "<span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">experiment</span><span style=\"color: #ff4689; text-decoration-color: #ff4689; background-color: #272822\">.</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">cancel(</span><span style=\"color: #e6db74; text-decoration-color: #e6db74; background-color: #272822\">\"nemo.collections.llm.api.generate\"</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">) </span><span style=\"color: #959077; text-decoration-color: #959077; background-color: #272822\"># Cancels the provided task if still running</span><span style=\"background-color: #272822\">                </span>\n",
+       "<span style=\"background-color: #272822\">                                                                                                                   </span>\n",
+       "</pre>\n"
+      ],
+      "text/plain": [
+       "\u001b[48;2;39;40;34m                                                                                                                   \u001b[0m\n",
+       "\u001b[38;2;149;144;119;48;2;39;40;34m# The experiment was run with the following tasks: ['nemo.collections.llm.api.generate']\u001b[0m\u001b[48;2;39;40;34m                           \u001b[0m\n",
+       "\u001b[38;2;149;144;119;48;2;39;40;34m# You can inspect and reconstruct this experiment at a later point in time using:\u001b[0m\u001b[48;2;39;40;34m                                  \u001b[0m\n",
+       "\u001b[38;2;248;248;242;48;2;39;40;34mexperiment\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;255;70;137;48;2;39;40;34m=\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mrun\u001b[0m\u001b[38;2;255;70;137;48;2;39;40;34m.\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mExperiment\u001b[0m\u001b[38;2;255;70;137;48;2;39;40;34m.\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mfrom_id\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m(\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m\"\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34mnemo.collections.llm.api.generate_1731692795\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m\"\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m)\u001b[0m\u001b[48;2;39;40;34m                                \u001b[0m\n",
+       "\u001b[38;2;248;248;242;48;2;39;40;34mexperiment\u001b[0m\u001b[38;2;255;70;137;48;2;39;40;34m.\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mstatus\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m(\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m)\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;149;144;119;48;2;39;40;34m# Gets the overall status\u001b[0m\u001b[48;2;39;40;34m                                                                      \u001b[0m\n",
+       "\u001b[38;2;248;248;242;48;2;39;40;34mexperiment\u001b[0m\u001b[38;2;255;70;137;48;2;39;40;34m.\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mlogs\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m(\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m\"\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34mnemo.collections.llm.api.generate\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m\"\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m)\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;149;144;119;48;2;39;40;34m# Gets the log for the provided task\u001b[0m\u001b[48;2;39;40;34m                          \u001b[0m\n",
+       "\u001b[38;2;248;248;242;48;2;39;40;34mexperiment\u001b[0m\u001b[38;2;255;70;137;48;2;39;40;34m.\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mcancel\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m(\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m\"\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34mnemo.collections.llm.api.generate\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m\"\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m)\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;149;144;119;48;2;39;40;34m# Cancels the provided task if still running\u001b[0m\u001b[48;2;39;40;34m                \u001b[0m\n",
+       "\u001b[48;2;39;40;34m                                                                                                                   \u001b[0m\n"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "text/html": [
+       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"background-color: #272822\">                                                                                                                   </span>\n",
+       "<span style=\"color: #959077; text-decoration-color: #959077; background-color: #272822\"># You can inspect this experiment at a later point in time using the CLI as well:</span><span style=\"background-color: #272822\">                                  </span>\n",
+       "<span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">nemo experiment status nemo.collections.llm.api.generate_1731692795</span><span style=\"background-color: #272822\">                                                </span>\n",
+       "<span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">nemo experiment logs nemo.collections.llm.api.generate_1731692795 </span><span style=\"color: #ae81ff; text-decoration-color: #ae81ff; background-color: #272822\">0</span><span style=\"background-color: #272822\">                                                </span>\n",
+       "<span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">nemo experiment cancel nemo.collections.llm.api.generate_1731692795 </span><span style=\"color: #ae81ff; text-decoration-color: #ae81ff; background-color: #272822\">0</span><span style=\"background-color: #272822\">                                              </span>\n",
+       "<span style=\"background-color: #272822\">                                                                                                                   </span>\n",
+       "</pre>\n"
+      ],
+      "text/plain": [
+       "\u001b[48;2;39;40;34m                                                                                                                   \u001b[0m\n",
+       "\u001b[38;2;149;144;119;48;2;39;40;34m# You can inspect this experiment at a later point in time using the CLI as well:\u001b[0m\u001b[48;2;39;40;34m                                  \u001b[0m\n",
+       "\u001b[38;2;248;248;242;48;2;39;40;34mnemo\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mexperiment\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mstatus\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mnemo.collections.llm.api.generate_1731692795\u001b[0m\u001b[48;2;39;40;34m                                                \u001b[0m\n",
+       "\u001b[38;2;248;248;242;48;2;39;40;34mnemo\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mexperiment\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mlogs\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mnemo.collections.llm.api.generate_1731692795\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;174;129;255;48;2;39;40;34m0\u001b[0m\u001b[48;2;39;40;34m                                                \u001b[0m\n",
+       "\u001b[38;2;248;248;242;48;2;39;40;34mnemo\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mexperiment\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mcancel\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mnemo.collections.llm.api.generate_1731692795\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;174;129;255;48;2;39;40;34m0\u001b[0m\u001b[48;2;39;40;34m                                              \u001b[0m\n",
+       "\u001b[48;2;39;40;34m                                                                                                                   \u001b[0m\n"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "from megatron.core.inference.common_inference_params import CommonInferenceParams\n",
+    "\n",
+    "\n",
+    "def trainer() -> run.Config[nl.Trainer]:\n",
+    "    strategy = run.Config(\n",
+    "        nl.MegatronStrategy,\n",
+    "        tensor_model_parallel_size=1,\n",
+    "        pipeline_model_parallel_size=1,\n",
+    "        context_parallel_size=1,\n",
+    "        sequence_parallel=False,\n",
+    "        # Inference only, so optimizer setup and optimizer state are disabled.\n",
+    "        setup_optimizers=False,\n",
+    "        store_optimizer_states=False,\n",
+    "    )\n",
+    "    trainer = run.Config(\n",
+    "        nl.Trainer,\n",
+    "        accelerator=\"gpu\",\n",
+    "        devices=1,\n",
+    "        num_nodes=1,\n",
+    "        strategy=strategy,\n",
+    "        plugins=bf16_mixed(),\n",
+    "    )\n",
+    "    return trainer\n",
+    "\n",
+    "def configure_inference():\n",
+    "    return run.Partial(\n",
+    "        llm.generate,\n",
+    "        path=str(peft_ckpt_path),\n",
+    "        trainer=trainer(),\n",
+    "        input_dataset=\"toy_testset.jsonl\",\n",
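+    "        # top_k=1 picks the most likely token at each step (greedy decoding); at most 20 new tokens are generated.\n",
+    "        inference_params=CommonInferenceParams(num_tokens_to_generate=20, top_k=1),\n",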
+    "        output_path=\"peft_prediction.jsonl\",\n",
+    "    )\n",
+    "\n",
+    "\n",
+    "def local_executor_torchrun(nodes: int = 1, devices: int = 1) -> run.LocalExecutor:\n",
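+    "    # Environment variables for the launched job are configured here.\n",
+    "    # Note: `nodes` is not used in this function; the executor launches `devices` tasks on the local machine.\n",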
+    "    env_vars = {\n",
+    "        \"TORCH_NCCL_AVOID_RECORD_STREAMS\": \"1\",\n",
+    "        \"NCCL_NVLS_ENABLE\": \"0\",\n",
+    "        \"NVTE_DP_AMAX_REDUCE_INTERVAL\": \"0\",\n",
+    "        \"NVTE_ASYNC_AMAX_REDUCTION\": \"1\",\n",
+    "        \"NVTE_FUSED_ATTN\": \"0\",\n",
+    "    }\n",
+    "\n",
+    "    executor = run.LocalExecutor(ntasks_per_node=devices, launcher=\"torchrun\", env_vars=env_vars)\n",
+    "\n",
+    "    return executor\n",
+    "\n",
+    "if __name__ == '__main__':\n",
+    "    run.run(configure_inference(), executor=local_executor_torchrun())\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "After inference completes, the predictions are written to `peft_prediction.jsonl`. The first three records should look similar to the following:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 13,
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "{\"input\": \"Context: Super Bowl 50 was an American football game to determine the champion of the National Football League (NFL) for the 2015 season. The American Football Conference (AFC) champion Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers 24\\u201310 to earn their third Super Bowl title. The game was played on February 7, 2016, at Levi's Stadium in the San Francisco Bay Area at Santa Clara, California. As this was the 50th Super Bowl, the league emphasized the \\\"golden anniversary\\\" with various gold-themed initiatives, as well as temporarily suspending the tradition of naming each Super Bowl game with Roman numerals (under which the game would have been known as \\\"Super Bowl L\\\"), so that the logo could prominently feature the Arabic numerals 50. Question: Which NFL team represented the AFC at Super Bowl 50? Answer:\", \"original_answers\": [\"Denver Broncos\", \"Denver Broncos\", \"Denver Broncos\"], \"label\": \"Denver Broncos\", \"prediction\": \" Denver Broncos\"}\n",
+      "{\"input\": \"Context: Super Bowl 50 was an American football game to determine the champion of the National Football League (NFL) for the 2015 season. The American Football Conference (AFC) champion Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers 24\\u201310 to earn their third Super Bowl title. The game was played on February 7, 2016, at Levi's Stadium in the San Francisco Bay Area at Santa Clara, California. As this was the 50th Super Bowl, the league emphasized the \\\"golden anniversary\\\" with various gold-themed initiatives, as well as temporarily suspending the tradition of naming each Super Bowl game with Roman numerals (under which the game would have been known as \\\"Super Bowl L\\\"), so that the logo could prominently feature the Arabic numerals 50. Question: Which NFL team represented the NFC at Super Bowl 50? Answer:\", \"original_answers\": [\"Carolina Panthers\", \"Carolina Panthers\", \"Carolina Panthers\"], \"label\": \"Carolina Panthers\", \"prediction\": \" Carolina Panthers\"}\n",
+      "{\"input\": \"Context: Super Bowl 50 was an American football game to determine the champion of the National Football League (NFL) for the 2015 season. The American Football Conference (AFC) champion Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers 24\\u201310 to earn their third Super Bowl title. The game was played on February 7, 2016, at Levi's Stadium in the San Francisco Bay Area at Santa Clara, California. As this was the 50th Super Bowl, the league emphasized the \\\"golden anniversary\\\" with various gold-themed initiatives, as well as temporarily suspending the tradition of naming each Super Bowl game with Roman numerals (under which the game would have been known as \\\"Super Bowl L\\\"), so that the logo could prominently feature the Arabic numerals 50. Question: Where did Super Bowl 50 take place? Answer:\", \"original_answers\": [\"Santa Clara, California\", \"Levi's Stadium\", \"Levi's Stadium in the San Francisco Bay Area at Santa Clara, California.\"], \"label\": \"Santa Clara, California\", \"prediction\": \" Levi's Stadium\"}\n"
+     ]
+    }
+   ],
+   "source": [
+    "%%bash\n",
+    "head -n 3 peft_prediction.jsonl"
+   ]
   }
  ],
  "metadata": {