Singularity 0.23.0
Changes in 0.23.0
Check out the 0.23.0 milestone to see new features / bugfixes in detail. 0.23.0
in general represents a number of performance improvements in relation to Singularity's usage of zookeeper and mysql as well as a mesos version bump.
Migrations
MySQL/Postgres
0.23.0
contains multiple database migrations (#1928 + #1956). These must be run BEFORE deploying the new version of SingularityService and are compatible with the running 0.22.0 release. You can check out our docs on migrations to run these with liquibase. If you manage a larger installation of Singularity utilizaing mysql (e.g. millions of tasks in task history), we recommend running the migrations using pt-online-schema-change to minimize interruptions. Migrations and ptosc arguments are listed below for convinience:
- Addition of usage tracking table - This can be run with liquibase since it is a non-blocking migration and is the first changeSet in the new release. To run only a single changeSet in liquibase (e.g. to then run the remaining ones with ptosc), add the
--count 1
option when runningdb migrate
- Change requestHistory table charset + add json column -
--alter "CHARACTER SET ascii COLLATE ascii_bin, MODIFY COLUMN request blob DEFAULT NULL, MODIFY COLUMN requestId varchar(100) CHARACTER SET ascii COLLATE ascii_bin NOT NULL, MODIFY COLUMN requestState ENUM ('CREATED', 'UPDATED', 'DELETING', 'DELETED', 'PAUSED', 'UNPAUSED', 'ENTERED_COOLDOWN', 'EXITED_COOLDOWN', 'FINISHED', 'DEPLOYED_TO_UNPAUSE', 'BOUNCED', 'SCALED', 'SCALE_REVERTED') NOT NULL, MODIFY COLUMN user varchar(100) CHARACTER SET ascii COLLATE ascii_bin DEFAULT NULL, MODIFY COLUMN message varchar(280) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT NULL, ADD COLUMN json JSON DEFAULT NULL"
- Change deployHistory table charset + add json column -
--alter "CHARACTER SET ascii COLLATE ascii_bin, MODIFY COLUMN bytes MEDIUMBLOB DEFAULT NULL, MODIFY COLUMN requestId varchar(100) CHARACTER SET ascii COLLATE ascii_bin NOT NULL, MODIFY COLUMN deployId varchar(100) CHARACTER SET ascii COLLATE ascii_bin NOT NULL, MODIFY COLUMN user varchar(100) CHARACTER SET ascii COLLATE ascii_bin DEFAULT NULL, MODIFY COLUMN message varchar(280) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT NULL, MODIFY COLUMN deployState ENUM ('SUCCEEDED', 'FAILED_INTERNAL_STATE', 'CANCELING', 'WAITING', 'OVERDUE', 'FAILED', 'CANCELED') NOT NULL, ADD COLUMN json JSON DEFAULT NULL"
- Change taskHistory table charset/enums + add json column -
--alter "CHARACTER SET ascii COLLATE ascii_bin, MODIFY COLUMN bytes MEDIUMBLOB DEFAULT NULL, MODIFY COLUMN taskId varchar(200) CHARACTER SET ascii COLLATE ascii_bin NOT NULL, MODIFY COLUMN requestId varchar(100) CHARACTER SET ascii COLLATE ascii_bin NOT NULL, MODIFY COLUMN lastTaskStatus ENUM ('TASK_LAUNCHED', 'TASK_STAGING', 'TASK_STARTING', 'TASK_RUNNING', 'TASK_CLEANING', 'TASK_KILLING', 'TASK_FINISHED', 'TASK_FAILED', 'TASK_KILLED', 'TASK_LOST', 'TASK_LOST_WHILE_DOWN', 'TASK_ERROR', 'TASK_DROPPED', 'TASK_GONE', 'TASK_UNREACHABLE', 'TASK_GONE_BY_OPERATOR', 'TASK_UNKNOWN') NOT NULL, MODIFY COLUMN runId varchar(100) CHARACTER SET ascii COLLATE ascii_bin DEFAULT NULL, MODIFY COLUMN deployId varchar(100) CHARACTER SET ascii COLLATE ascii_bin DEFAULT NULL, ADD COLUMN json JSON DEFAULT NULL, ADD KEY requestDeployUpdated (requestId, deployId, updatedAt), ADD KEY hostUpdated (host, updatedAt)"
- Change taskUsage table charset -
--alter "CHARACTER SET ascii COLLATE ascii_bin ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8, MODIFY COLUMN requestId varchar(100) CHARACTER SET ascii COLLATE ascii_bin NOT NULL DEFAULT '', MODIFY COLUMN taskId varchar(200) CHARACTER SET ascii COLLATE ascii_bin NOT NULL DEFAULT ''"
As seen above, these migrations prep Singularity to use mysql's json data type instead of a blob for history storage. All net-new history will be stored in the json format and old lob columns are not yet dropped. Singularity will look for either format currently when fetching individual task histories. You can kick off a backfill of data from blob -> json format by sending an http POST to the /api/history/sql-backfill?batchSize=20
endpoint on SingularityService (batch size is configurable to balance resources vs speed). If Singularity need to restart/etc, this process is idempotent and can be kicked off as many times as needed, though only one invocation can run at a time.
Zookeeper
On startup, Singularity will run a number of migrations to zookeeper task data. These are aimed at reducing the possible size of any single zookeeper read. You may notice that the first startup of the new Singularity release is slower due to these changes running. This migration is idempotent and will be re-attempted on next startup if it should fail.
Mesos Version Upgrade
Singularity 0.23.0
is build against mesos 1.8, but should be compatible with all earlier 1.x versions of mesos
New Features
- 1958 - Configurably request DNS preresolution for load-balanced services
- 1955 - Pre-resolve upstreams for Singularity-managed BaragonServices.
Performance Improvements
- 1976 - UI Performance Updates for active/pending tasks pages
- 1972 - More efficient active tasks call for executor cleanup
- 1956 - SQL migrations for efficiency (blob -> json + utf8 -> ascii)
- 1963 - Also purge old deploy and request history from SQL
- 1920 - Add ability to do sns-based updates instead of webhooks
- 1922 - The zoo is under new management (zk cleanup)
- 1938 - Make Slave/Rack Resource use proxy to leader
- 1932 - Refactor calling of offer evaluation
- 1939 - Add option to fetch a batch of requests
- 1928 - MySQL task resource usage storage
- 1906 - Reduce S3Uploader memory usage during directory scan
General Improvements
- 1962 - Support searching for request logs with a specified date range via SingularityClient.
- 1791 - Allow fetching full request data (with deploy data) in SingularityClient
- 1975 - junit5
- 1957 - Bump Baragon version to 0.9.0.
- 1961 - Rework cooldown logic
- 1954 - UI And Other Improvements
- 1919 - Configurably skip shell command prefix for Docker tasks only.
- 1908 - Cleaner failover + update dependencies
- 1909 - Skip fuser/lsof check when uploader is marked as immediate
- 1944 - Handle status updates from recovering agents appropriately
- 1951 - Add builder methods to
SingularityRequestBuilder
&SingularityDeployBuilder
- 1945 - Bump to mesos 1.8.0
- 1946 - Alternative way to specify auth for the mesos scheduler api
- 1947 - Add zk leader indicator on status ui
- 1941 - Add ability to disable task shuffle from UI
- 1942 - Check assigned ports are available in SingularityExecutor
- 1943 - Upstream validation
- 1937 - Shuffle tasks on hosts with overutilized memory resources
- 1930 - Flag for immediate task history persist
- 1915 - Make Singularity report byte counts to monitor against jute buffer size
- 1914 - Fix handling of file-based health check failure
- 1905 - Add token authenticator option
Bug Fixes
- 1978 - Fix deploy link on requests page + pending tasks table
- 1979 - Fix missing extension for additional logrotate file
- 1969 - Calculate max task lag after excluding on demands with instance limit
- 1970 - Proxy deploy cancellations to leader
- 1973 - tweak cooldown thresholds and evaluation logic
- 1974 - Fix task history page size on refresh
- 1953 - Only clean if the sandbox directory still exists.
- 1966 - Fix request state filter and task search task state in UI
- 1952 - Only log an exception if it has one
- 1949 - Update to newer download endpoint name
- 1950 - Configurably delete
logrotateAdditionalFile
s 15 mins after task termination. - 1933 - More explicit choice of canSkipZk flag
- 1940 - Files from S3Downloader should be world readable, not read-writeable
- 1934 - Explicitly check for UnknownHostExceptions
- 1936 - Correctly choose a system load metric.
- 1931 - Change persist strategy
- 1925 - Always remove from LB, even if add in WAITING
- 1926 - Also attempt to get TaskHistory from history manager for mail
- 1927 - Skip offers with null id
- 1929 - Put a time limit on the uploader check so we don't get stuck
- 1913 - Clarify logging in offer scheduler
- 1912 - Also fetch the original file for log snippets in email
- 1911 - Fix non-paginated fetch of s3 logs for task
- 1910 - Do the entire deploy history persist under the request lock
- 1907 - On demand requests with instances shouldn't trigger task lag over instance count
- 1902 - Fix typo in auth check path
- 1899 - Add missing enum
Documentation
- 1901 - New adopter