Copyright 2010-2013 The MathWorks, Inc.
This folder contains a number of files to allow Parallel Computing Toolbox
to be used with SLURM via the generic cluster interface.
Download these scripts and put them in ~/matlab/.
It is highly recommended that you put

    export TZ=America/Los_Angeles

in your ~/.bashrc file, or you will see complaints from Matlab.
* To start a Matlab pool on a single (compute) node, run

    cluster = get_LOCAL_cluster

  then use parpool or batch to start the pool, and parfor or spmd inside it. For a more detailed example, see the example_local.m file or the sketch after this list.
* To start a Matlab pool spanning multiple (compute) nodes through SLURM, run

    cluster = get_SLURM_cluster('-t 10:00:00 --mem-per-cpu=2G')

  then use parpool or batch to start the pool, and parfor or spmd inside it. Here "-t 10:00:00" is an example sbatch option; you can add more options, e.g. an account or memory request.
  Do not put "-n" (the number of tasks) here: the number of workers is requested through the parpool or batch command. Read Matlab's documentation on parpool and batch for details; a short sketch of both cases follows this list.
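For instance, a minimal session for each case might look like the sketch below (the worker counts, walltime, and use of the demo_spmd script are only illustrative; adjust them to your own job):

    % --- single-node pool through the local scheduler ---
    cluster = get_LOCAL_cluster;                 % configure the local cluster profile
    parpool(cluster, 4);                         % start 4 workers on the current compute node
    parfor i = 1:8
        fprintf('iteration %d ran on a local worker\n', i);
    end
    delete(gcp);                                 % shut the pool down when finished

    % --- multi-node pool submitted through SLURM ---
    c = get_SLURM_cluster('-t 01:00:00 --mem-per-cpu=2G');        % sbatch options, but no -n
    j = c.batch('demo_spmd', 'Pool', 16, 'CaptureDiary', true);   % asks for 16 workers plus 1 for the client
    wait(j);                                     % block until the SLURM job finishes
    j.diary                                      % print the captured output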
The jobs' output and Matlab logs are stored in ~/MATLAB_JOB_STORAGE_local and ~/MATLAB_JOB_STORAGE. If you wish to change these locations, edit get_LOCAL_cluster.m and get_SLURM_cluster.m, respectively.
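The directory is controlled by the cluster object's JobStorageLocation property; inside those two scripts the setting looks roughly like the sketch below (the variable name and the way the cluster object is obtained are assumptions, not copied from the scripts):

    c = parcluster;                                                          % however the script obtains its cluster object
    c.JobStorageLocation = fullfile(getenv('HOME'), 'MATLAB_JOB_STORAGE');   % point this at any writable directory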
Matlab's terminal output can be captured with the diary command; read Matlab's documentation for details.
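If you are running interactively rather than through batch, you can also turn on a diary for the client session itself, e.g.

    diary('matlab_session.log');   % start capturing command-window output (the filename is just an example)
    % ... run your parallel code ...
    diary off;                     % stop capturing

A full example session on the cluster looks like this: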
>> c=get_SLURM_cluster('-t 4:00:00 --mem-per-cpu=2G -p isi')
c =

 Generic Cluster

    Properties:

        Profile:
        Modified: true
        Host: hpc1458.hpcc.usc.edu
        NumWorkers: Inf
        NumThreads: 1
        JobStorageLocation: /staging/hpcc/jimichu/MATLAB_JOB_STORAGE
        ClusterMatlabRoot: /usr/usc/matlab/default/
        OperatingSystem: unix
        IndependentSubmitFcn: [1x2 cell]
        CommunicatingSubmitFcn: [1x2 cell]
        GetJobStateFcn: @getJobStateFcn
        CancelJobFcn: []
        CancelTaskFcn: []
        DeleteJobFcn: @deleteJobFcn
        DeleteTaskFcn: []
        RequiresMathWorksHostedLicensing: false

    Associated Jobs:

        Number Pending: 0
        Number Queued: 1
        Number Running: 4
        Number Finished: 0
>> j=c.batch('demo_spmd','Pool',90,'CaptureDiary',true)
j =

 Job

    Properties:

        ID: 6
        Type: pool
        Username: jimichu
        State: queued
        SubmitDateTime: 30-Mar-2018 11:07:37
        StartDateTime:
        Running Duration: 0 days 0h 0m 0s
        NumWorkersRange: [91 91]
        NumThreads: 1
        AutoAttachFiles: true
        Auto Attached Files: /home/rcf-00/jimichu/matlab/demo_spmd.m
        AttachedFiles: {}
        AdditionalPaths: {}

    Associated Tasks:

        Number Pending: 91
        Number Running: 0
        Number Finished: 0
        Task ID of Errors: []
        Task ID of Warnings: []
>> j.State
ans =
finished
>> j.diary
--- Start Diary ---
ans =
1 2 3 4 5
ans =
1 2 3 4 5
ans =
11 12 13 14 15
...
Lab 1:
this is display from worker: 1
Lab 3:
this is display from worker: 3
Lab 4:
this is display from worker: 4
...
Lab 76:
this is display from worker: 76
Lab 80:
this is display from worker: 80
Lab 81:
this is display from worker: 81
Lab 90:
this is display from worker: 90
>> jj=c.findJob
jj =
 9x1 Job array:

         ID   Type   State      FinishDateTime         Username   Tasks
    --------------------------------------------------------------------
    1     1   pool   finished                          jimichu       81
    2     2   pool   finished                          jimichu      129
    3     3   pool   finished                          jimichu       91
    4     4   pool   finished                          jimichu       91
    5     5   pool   finished   30-Mar-2018 11:08:12   jimichu       91
    6     6   pool   queued                            jimichu       91
    7     7   pool   finished                          jimichu      161
    8     8   pool   queued                            jimichu      129
    9     9   pool   queued                            jimichu      129
>> jj(9).State
ans =
running