This repository has been archived by the owner on Mar 20, 2023. It is now read-only.
Following the TensorFlow CPU quickstart, I run into a couple of issues
When creating the pool, I get a
RuntimeError: Could not find an Azure Batch Node Agent Sku for this offer=ubuntuserver publisher=canonical sku=16.04-lts. You can list the valid and available Marketplace images with the command: account images
From a look at the Azure Portal, it looks like only 18.04 is currently available; indeed, changing pool.yml to use 18.04-LTS instead is enough to get rid of this issue. This probably affects many of the bundled recipes.
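To get a rough idea of how many recipes are affected, something like the following could be run from a checkout of the repository. This is only a sketch: it assumes the recipes live under a `recipes` directory, that the pool configuration files are named `pool.yml` or `pool.yaml`, and that the SKU appears on its own `sku:` line. The function name `find_outdated_pools` is made up for illustration.

```python
import re
import sys
from pathlib import Path

# Matches a line such as "      sku: 16.04-LTS" (case-insensitive, optional quotes).
# Assumption: the SKU sits on its own line in the pool configuration.
SKU_LINE = re.compile(
    r"""^[ \t]*sku:[ \t]*['"]?16\.04-lts['"]?[ \t]*$""",
    re.IGNORECASE | re.MULTILINE,
)


def find_outdated_pools(recipes_root):
    """Return the sorted pool config paths that still request the 16.04-LTS SKU."""
    root = Path(recipes_root)
    hits = []
    # Assumption: pool configs are named pool.yml or pool.yaml.
    for pattern in ("pool.yml", "pool.yaml"):
        for path in root.rglob(pattern):
            if SKU_LINE.search(path.read_text(encoding="utf-8")):
                hits.append(path)
    return sorted(hits)


if __name__ == "__main__":
    # Default to a "recipes" directory in the current working directory.
    for path in find_outdated_pools(sys.argv[1] if len(sys.argv) > 1 else "recipes"):
        print(path)
```

Each path printed would be a candidate for the same 16.04-LTS to 18.04-LTS change described above.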
After the pool is created and I try to create the included job, I get another error:
$ ../shipyard jobs add --tail stdout.txt
2021-09-16 10:16:30.581 INFO - Adding job tensorflowjob to pool tensorflow-cpu
2021-09-16 10:16:30.673 DEBUG - constructing 1 task specifications for submission to job tensorflowjob
2021-09-16 10:16:30.738 DEBUG - submitting 1 task specifications to job tensorflowjob
2021-09-16 10:16:30.741 DEBUG - submitting 1 tasks (0 -> 0) to job tensorflowjob
2021-09-16 10:16:30.971 INFO - submitted all 1 tasks to job tensorflowjob
2021-09-16 10:16:30.971 DEBUG - attempting to stream file stdout.txt from job=tensorflowjob task=task-00000
Traceback (most recent call last):
File "/mnt/c/Users/username/repos/batch-shipyard/shipyard.py", line 3136, in <module>
cli()
File "/mnt/c/Users/username/repos/batch-shipyard/.shipyard/lib/python3.8/site-packages/click/core.py", line 764, in __call__
return self.main(*args, **kwargs)
File "/mnt/c/Users/username/repos/batch-shipyard/.shipyard/lib/python3.8/site-packages/click/core.py", line 717, in main
rv = self.invoke(ctx)
File "/mnt/c/Users/username/repos/batch-shipyard/.shipyard/lib/python3.8/site-packages/click/core.py", line 1137, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/mnt/c/Users/username/repos/batch-shipyard/.shipyard/lib/python3.8/site-packages/click/core.py", line 1137, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/mnt/c/Users/username/repos/batch-shipyard/.shipyard/lib/python3.8/site-packages/click/core.py", line 956, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/mnt/c/Users/username/repos/batch-shipyard/.shipyard/lib/python3.8/site-packages/click/core.py", line 555, in invoke
return callback(*args, **kwargs)
File "/mnt/c/Users/username/repos/batch-shipyard/.shipyard/lib/python3.8/site-packages/click/decorators.py", line 64, in new_func
return ctx.invoke(f, obj, *args, **kwargs)
File "/mnt/c/Users/username/repos/batch-shipyard/.shipyard/lib/python3.8/site-packages/click/core.py", line 555, in invoke
return callback(*args, **kwargs)
File "/mnt/c/Users/username/repos/batch-shipyard/shipyard.py", line 1968, in jobs_add
convoy.fleet.action_jobs_add(
File "/mnt/c/Users/username/repos/batch-shipyard/convoy/fleet.py", line 4065, in action_jobs_add
batch.add_jobs(
File "/mnt/c/Users/username/repos/batch-shipyard/convoy/batch.py", line 5892, in add_jobs
stream_file_and_wait_for_task(
File "/mnt/c/Users/username/repos/batch-shipyard/convoy/batch.py", line 3309, in stream_file_and_wait_for_task
tfp = batch_client.file.get_properties_from_task(
File "/mnt/c/Users/username/repos/batch-shipyard/.shipyard/lib/python3.8/site-packages/azure/batch/operations/_file_operations.py", line 328, in get_properties_from_task
raise models.BatchErrorException(self._deserialize, response)
azure.batch.models._models_py3.BatchErrorException: Request encountered an exception.
Code: None
Message: None
Removing the resource_files section is enough to take care of the issue; probably unsurprising, as the given blob_source (https://raw.githubusercontent.com/tensorflow/models/master/tutorials/image/mnist/convolutional.py) 404s.
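A quick way to confirm that a blob_source URL is dead before submitting the job is to send a HEAD request and look at the status code. The sketch below uses only the Python standard library; the helper name `http_status` and its `urlopen` parameter (there so the function can be exercised without network access) are illustrative and not part of Batch Shipyard.

```python
import urllib.error
import urllib.request


def http_status(url, timeout=10.0, urlopen=urllib.request.urlopen):
    """Return the HTTP status code of a HEAD request to url.

    HTTP error responses (such as 404) are returned as codes rather than
    raised. Connection-level failures (urllib.error.URLError) still propagate.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code


if __name__ == "__main__":
    # The blob_source URL from the recipe, as quoted in this issue.
    url = (
        "https://raw.githubusercontent.com/tensorflow/models/master/"
        "tutorials/image/mnist/convolutional.py"
    )
    print(url, http_status(url))
```

Any status other than 200 for a resource file would suggest the task cannot fetch it, which would be consistent with the failure above.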