MIPS-C Simulator
================
This was a pair coursework I worked on in 2nd year for a computer
architecture class. Below is the full spec for the project.
Architecture II Coursework
==========================
There are three central aims of this coursework:
- Solidify your understanding of how an instruction
processor actually functions. The overall functionality
of how a processor works is relatively easy to grasp,
but there is lots of interesting detail which gives
you some insight (both into CPUs and into
software and digital design).
- Understand the importance of having good specifications,
in terms of functionality, APIs, and requirements. This
is fundamental to CPU design and implementation, but is
also true in the wider world (again) of software and
digital design.
- Develop your skills in coding from scratch. There is
not much scaffolding here; I am genuinely asking you
to create your own CPU simulator from scratch. You
will also hopefully learn some important lessons about
reducing code repetition and automation.
Meta-comment
============
You might find this document very verbose, and there are lots of
clauses, clarifications, and restrictions on what you can do.
So try to think of it from the other side, and imagine you're
trying to write a spec that:
1 - will allow around 15 different simulators and testbenches to interoperate perfectly with each other.
2 - gives as much freedom as possible in the implementation of both simulators and testbenches.
3 - allows both the simulators _and the testbenches_ to be accurately tested/assessed.
The only way of achieving this is to try to define very clear APIs. You then
have to try to imagine all the possible ambiguities and corner cases in the
interpretation and implementation of this API, and try to close them down
or disambiguate them. So many of the clauses and restrictions here will
not seem relevant, unless you happen to think of doing something which
hits one of the anticipated problems.
This specification will still be imprecise, and _will_ evolve. Where there
is still a lack of clarity, it will be fixed.
Getting started
===============
The repository created during the lecture is at [LangProc/arch2-2017-cw-dt10](https://github.com/LangProc/arch2-2017-cw-dt10).
You can use anything within it without worrying about plagiarism. The video recorded while creating it [is also in panopto](https://imperial.cloud.panopto.eu/Panopto/Pages/Viewer.aspx?id=81109762-911c-49fe-91ec-a5b7e820e4f3).
Specification
=============
Your task is to develop a MIPS CPU simulator, which can accurately execute
MIPS-1 big-endian binaries. You will also need to develop a testbench
which is able to test a MIPS simulator, and try to work out whether it
is correct.
_Note: I was originally going to add a cache extension once we got to
that part of the course, but on reflection decided it added more
complexity than was needed. The goal here is purely to get a good
simulator working._
Terminology
-----------
For the sake of clarity, this document will use the following terms:
- *Simulator* : The MIPS CPU simulator being developed by you. This program
is running natively under a Linux/Windows/OSX Environment and has direct access
to your files, the keyboard (stdin), the screen (stdout), etc. The simulator
will be responsible for implementing a register file, program counter, and
memory, and then sequentially executing MIPS instructions according to the
MIPS ISA. This is the thing that you will spend the most time working on,
and it is up to you to make sure that it implements the interface expected
by a Binary, while interacting correctly with the Environment.
- *Binary* : The MIPS binary/program/executable which is currently being
executed/run/simulated by your _Simulator_. Each time your simulator is
run it will need to be given a binary, as by itself the simulator does
nothing (just like a "real" CPU does nothing if you switch it on but don't
give it instructions to execute). While a simulator can only execute one
binary each time it is run, the set of binaries that it can run is
unrestricted. You will develop your own test binaries, as well as
executing binaries from 3rd-party sources.
- *Environment* : This is the thing which is hosting and executing the
Simulator. Part of it is the operating system, but it also contains
elements of the C run-time library (e.g. libc), and also some elements of
the compiler itself. The distinction between OS and language run-time
may not be obvious to you at the moment, but an example is `std::cout`
and `printf`, which are part of the C++ and C run-time libraries (respectively).
Neither of these functions is provided by Linux; instead Linux provides
lower-level functions like `write`, while Windows provides `WriteFile`,
and OSX has... something. A standards-conforming C++ program should not
use OS specific calls like `write`, but instead relies on the run-time
library to provide a compliant environment.
- *Testbench* : This is your testing framework, which can take a given
Simulator, and through running tests attempt to ascertain what features of the
Simulator work. This should serve both to help you test and develop your own
Simulator, and also to act as a check on the functionality of any other
Simulator. The aim is that your Testbench should be able to check the
functionality of a Simulator at an Instruction granularity.
The MIPS ISA acts as the boundary between the Simulator and the Binary,
so any correct Binary should run on any correct Simulator, and should
deterministically do exactly the same thing. This is the same principle
as for the Environment that is running your Simulator; you would assume
that Linux+glibc are going to run your Simulator correctly as long as
your code plays by the rules, and the creator of any Binary will assume
the same of your Simulator.
The target Environment will be Ubuntu 16, with the standard GNU toolchain
installed (i.e. `g++`, `make`), standard command line utilities, and
bash. The lab Unix install should be a model of this environment, so
anything that works in the lab should be correct. Feel free to use other
environments during testing and development, but you should test in
the target environment too.
Simulator Input/Output
----------------------
Your Simulator will be a single executable, and has the following behaviour:
- *Binary* : the Binary location is passed as a command-line parameter, and
should be the path of a binary file containing MIPS-1 big-endian instructions.
These instructions should be loaded into a fixed region of "RAM" with a known
address, then execution should start at the first address in this region.
- *Input* : input to the simulated Binary will be passed in over the Simulator's
standard input (`std::cin` or `stdin`), and mapped into a 32-bit memory location.
If the Binary reads from the nominated memory location, it should be logically equivalent
to calling `std::getchar` or `getchar` (and one approach would be for the Simulator
to call these functions on behalf of the Binary).
- *Output* : output from the simulated Binary will be produced by
writing to a mapped 32-bit memory location. Writing to the nominated memory
location should be equivalent to calling `std::putchar` or `putchar` (and
again, the Simulator could call these functions on behalf of the Binary).
- *Exit* : A Binary signals successful termination/completion by executing the
instruction at address 0. This tells the Simulator that there are no more
instructions to execute, and that it should exit. The return code of the
Simulator is given by the low 8-bits of the value in register `$2`. These
8-bits should be used as a non-negative value to pass to `std::exit` or `exit`.
- *Exceptions* : The Binary may execute instructions which are illegal, and
so result in exceptions which should terminate execution of the Binary. To
indicate this, the Simulator should return one of the negative exit codes
detailed later on.
- *Errors* : Errors may occur within the Simulator (as opposed to exceptions
which are due to part of the Binary's logic). Examples might include instructions
which aren't implemented (limited functionality in the Simulator), or IO failures
(problems which occur due to run-time interactions between the Simulator and the Environment).
- *Logging* : A Simulator may choose to emit diagnostic/debugging messages at various
points, in order to record what is going on. This is completely fine,
but any diagnostic information _must_ be written to `std::cerr` / `stderr`.
Any output written to `std::cout` / `stdout` will be interpreted as
output from the Binary.
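As a rough illustration of the *Exit* and *Logging* rules above, the following sketch shows one way a Simulator might detect termination and compute its exit status. The function names are hypothetical; only the "execute address 0" rule, the "low 8 bits of `$2`" rule and the stderr requirement come from this spec.

```cpp
#include <cstdint>
#include <iostream>
#include <string>

// ADDR_NULL comes from the memory map; the function names are hypothetical.
const uint32_t ADDR_NULL = 0x00000000;

// The Binary signals completion by executing the instruction at address 0.
inline bool binary_has_finished(uint32_t pc) {
    return pc == ADDR_NULL;
}

// On successful completion the Simulator exits with the low 8 bits of $2,
// treated as a non-negative value (0..255).
inline int success_exit_code(uint32_t reg2) {
    return static_cast<int>(reg2 & 0xFFu);
}

// Diagnostics go to stderr only; anything on stdout counts as Binary output.
inline void log_debug(const std::string& message) {
    std::cerr << "[sim] " << message << '\n';
}
```

Inside a fetch/execute loop this might be used as `if (binary_has_finished(pc)) std::exit(success_exit_code(regs[2]));`, where `pc` and `regs` stand for the Simulator's own (hypothetical) state.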
Your Simulator _may_ take other private command line parameters, for example to enable
or disable extended debug features during development. These should have the form `--ext-XXX`,
for any value of XXX, and may take optional values if you wish. Note that your Testbench
should not rely on a Simulator supporting any private extensions, as they are not part of
the API. Nor should your Simulator rely on any extended command line parameters being
passed at run-time, as nobody else will know about the existence of these parameters.
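One way to keep private `--ext-XXX` options out of the way is to collect them separately and treat the first other argument as the Binary path. This is a minimal sketch under that assumption; the `SimulatorArgs` struct, the `parse_args` helper and the example flag names are hypothetical, and the Simulator still has to work when no `--ext-` flags are given at all.

```cpp
#include <string>
#include <vector>

// Hypothetical container for the parsed command line.
struct SimulatorArgs {
    std::string binary_path;             // empty if no Binary was supplied
    std::vector<std::string> ext_flags;  // e.g. "--ext-trace" (illustrative)
};

// Treats every argument starting with "--ext-" as a private flag and the
// first remaining argument as the Binary path.
inline SimulatorArgs parse_args(int argc, const char* const argv[]) {
    SimulatorArgs args;
    const std::string prefix = "--ext-";
    for (int i = 1; i < argc; ++i) {
        std::string arg = argv[i];
        if (arg.compare(0, prefix.size(), prefix) == 0) {
            args.ext_flags.push_back(arg);
        } else if (args.binary_path.empty()) {
            args.binary_path = arg;
        }
    }
    return args;
}
```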
Simulator build and execution
-----------------------------
The simulator should be buildable using the command:
```
make simulator
```
in the root of the repository. This should result
in a binary called `bin/mips_simulator`. An artificial requirement of
this coursework for assessment purposes (i.e. it isn't really
required for API reasons) is that the simulator is:
- a binary compiled from C++ sources, and
- compilable in the target Environment.
This means that if the following sequence is executed:
```
rm bin/mips_simulator
make simulator
```
Then a new binary will be compiled from C++ sources that are
included in the submission.
If we assume the existence of a Binary called `x.bin`, we would
simulate it using:
```
bin/mips_simulator x.bin
```
On startup all MIPS registers will be zero, any uninitialised
memory will be zero, and the program counter will point at the
first instruction in memory.
A Simulator should not assume it is being executed from any particular
directory, so it should not try to open any data files. It should also
not create or write to any other files.
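The startup state described above can be sketched as follows, assuming the whole Binary is read into a byte buffer and then packed into 32-bit words most-significant-byte first, as a big-endian Binary requires. `CpuState`, `read_file_bytes`, `bytes_to_words_be` and `fits_in_instr_memory` are hypothetical names, and zero-padding a trailing partial word is a choice made here rather than something the spec defines.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

const uint32_t ADDR_INSTR = 0x10000000;  // start of executable memory
const uint32_t INSTR_LEN  = 0x1000000;   // size of executable memory in bytes

// Registers and HI/LO start at zero; the PC starts at the first instruction.
struct CpuState {
    uint32_t regs[32] = {};
    uint32_t hi = 0;
    uint32_t lo = 0;
    uint32_t pc = ADDR_INSTR;
};

// Reads the whole Binary; returns false if it cannot be opened or read.
inline bool read_file_bytes(const std::string& path, std::vector<uint8_t>& out) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return false;
    out.assign(std::istreambuf_iterator<char>(in), std::istreambuf_iterator<char>());
    return !in.bad();
}

// Packs bytes into words, most significant byte first (big-endian).
// A trailing partial word is zero-padded.
inline std::vector<uint32_t> bytes_to_words_be(const std::vector<uint8_t>& bytes) {
    std::vector<uint32_t> words((bytes.size() + 3) / 4, 0);
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        words[i / 4] |= static_cast<uint32_t>(bytes[i]) << (8 * (3 - i % 4));
    }
    return words;
}

// A Binary larger than executable memory cannot be loaded.
inline bool fits_in_instr_memory(std::size_t n_bytes) {
    return n_bytes <= INSTR_LEN;
}
```

The resulting words would occupy `ADDR_INSTR`, `ADDR_INSTR + 4`, and so on; for example the bytes `24 02 00 05` become the word `0x24020005` (`addiu $2, $0, 5`).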
Testbench Input/Output
----------------------
A Testbench should take a single command-line parameter,
which is the path of the Simulator to be tested.
As output, the Testbench should [print a CSV file](https://github.com/m8pple/arch2-2017-cw/issues/24), where each row of
the file corresponds to exactly one execution of the Simulator under test.
Each row should have the following fields:
```
TestId , Instruction , Status , Author [, Message]
```
Whitespace between fields and commas is not important.
The meaning of the fields is as follows:
- `TestId` : A unique identifier for the particular test. This can be composed
of the characters `0-9`, `a-z`, `A-Z`, `-`, or `_`. So for example, ascending
integers would be fine, or combinations of words and integers, as long as there
are no spaces. Running the test-bench twice should produce the same set of
test identifiers in the same order, and this should reflect the order in which
tests are executed.
- `Instruction` : This should identify the instruction which is the _primary_
instruction being tested. Note that many (actually, most) instructions are
impossible to test in isolation, so a given test may fail either because
the instruction under test doesn't work, or because some other instruction
necessary for the tests is broken. The test should be written to be particularly
sensitive to the instruction under test, so it looks for a failure mode of
that particular instruction.
- `Status` : This will either be `Pass` or `Fail`. Note that a given test can
only test so much, so it is entirely possible that a test might pass even
if an instruction is broken. However, a `Fail` should only be returned
if the instruction under test (or another instruction) has clearly done
something wrong.
- `Author` : The login of the person who created the test.
- `Message` : This is an optional field which gives more details about what
exactly went wrong. This field is free-form text, but it must not contain
any commas, and should only be a single line.
All fields are case insensitive, including `TestId`.
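A Testbench has to respect the `TestId` character set and keep `Message` free of commas and line breaks. The helpers below are a hypothetical sketch of one way to do that; replacing forbidden characters with spaces is an assumption, and rejecting or truncating the message would be equally valid.

```cpp
#include <string>

// True if id is non-empty and uses only 0-9, a-z, A-Z, '-' or '_'.
inline bool is_valid_test_id(const std::string& id) {
    if (id.empty()) return false;
    for (char c : id) {
        bool ok = (c >= '0' && c <= '9') || (c >= 'a' && c <= 'z') ||
                  (c >= 'A' && c <= 'Z') || c == '-' || c == '_';
        if (!ok) return false;
    }
    return true;
}

// Builds one row: TestId, Instruction, Status, Author[, Message].
// Commas and line breaks in the message are replaced with spaces.
inline std::string csv_row(const std::string& test_id, const std::string& instr,
                           bool passed, const std::string& author,
                           const std::string& message = "") {
    std::string row = test_id + ", " + instr + ", " + (passed ? "Pass" : "Fail") + ", " + author;
    if (!message.empty()) {
        std::string clean = message;
        for (char& c : clean) {
            if (c == ',' || c == '\n' || c == '\r') c = ' ';
        }
        row += ", " + clean;
    }
    return row;
}
```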
Testbench build and execution
------------------------------
The Testbench should be built (or otherwise setup) using:
```
make testbench
```
_Note: it is entirely possible that nothing needs to happen when
this is executed. It is to allow for freedom of implementation._
This should result in an executable called:
```
bin/mips_testbench
```
_Note: this only needs to be an executable file; so unlike the Simulator
it does not need to be a binary built from C++, and could be a bash script._
The Testbench will _always_ be executed from within the root directory of
the submission, so you can use relative paths to data files.
Any temporary or working files created during execution should be created in a directory
called `test/temp`. Any files considered to be output of the Testbench (for
example per-test logfiles) should be created in `test/output`. However, there is no
requirement that output is created in either directory.
An example of running the Testbench on its own Simulator would be:
```
bin/mips_testbench bin/mips_simulator
```
corresponding output might be:
```
0, ADDU, Pass, dt10
1, ADD, Pass, dt10
2, ADDI, Pass, dt10
```
If we assume a different Testbench, and have a Simulator at the
path `../other-simulator/bin/mips_simulator`, then we could execute with:
```
bin/mips_testbench ../other-simulator/bin/mips_simulator
```
and the corresponding output might be:
```
jr1 , jr, Pass, dt10, Single JR statement back to NULL
addi1 , addi, Pass, hes2, Add 5 to $0
addi2 , addi, Fail, hes2, Add -5 to $0
jr2 , jr, Pass, hes2, JR->NOP->JR->NOP
```
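A compiled Testbench could launch the Simulator under test through the shell and inspect its exit status, much like the command lines above. This POSIX-only sketch uses `std::system` together with the `WIFEXITED`/`WEXITSTATUS` macros from `<sys/wait.h>`; the helper names are hypothetical. One detail worth knowing: on POSIX systems only the low 8 bits of the value passed to `std::exit` reach the parent process, so a Simulator exiting with -10 is observed as 246.

```cpp
#include <cstdlib>
#include <string>
#include <sys/wait.h>  // POSIX: WIFEXITED / WEXITSTATUS

// Runs a shell command and returns its exit status (0..255),
// or -1 if the command could not be run or did not exit normally.
inline int run_and_get_exit_status(const std::string& command) {
    int raw = std::system(command.c_str());
    if (raw == -1 || !WIFEXITED(raw)) return -1;
    return WEXITSTATUS(raw);
}

// Maps a Simulator exit code (possibly negative) to the status the
// parent actually sees, e.g. -10 -> 246.
inline int expected_status(int simulator_code) {
    return simulator_code & 0xFF;
}
```

For example, a test expecting an arithmetic exception might compare `run_and_get_exit_status(sim + " test/temp/addi_overflow.bin")` with `expected_status(-10)`; the Binary path here is purely illustrative.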
Memory-Map
----------
The memory map of the simulated process is as follows:
```
Offset | Length | Name | R | W | X | Cached | Description
-----------|-------------|------------|---|---|---|--------|--------------------------------------------------------------------
0x00000000 | 0x4 | ADDR_NULL | | | Y | | Jumping to this address means the Binary has finished execution.
0x00000004 | 0xFFFFFFC | .... | | | | |
0x10000000 | 0x1000000 | ADDR_INSTR | Y | | Y | Y | Executable memory. The Binary should be loaded here.
0x11000000 | 0xF000000 | .... | | | | |
0x20000000 | 0x4000000 | ADDR_DATA | Y | Y | | Y | Read-write data area. Should be zero-initialised.
0x24000000 | 0xC000000 | .... | | | | |
0x30000000 | 0x4 | ADDR_GETC | Y | | | | Location of memory mapped input. Read-only.
0x30000004 | 0x4 | ADDR_PUTC | | Y | | | Location of memory mapped output. Write-only.
0x30000008 | 0xCFFFFFF8 | .... | | | | |
-----------|-------------|------------|---|---|---|--------|--------------------------------------------------------------------
```
The Binary is not allowed to modify its own code, nor should it attempt to execute code outside the executable memory.
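The R/W/X columns of the memory map translate fairly directly into a permission check. This is a sketch under stated assumptions: the constants are copied from the table, `Access` and `access_allowed` are hypothetical names, and alignment checks for word and half-word accesses are left out. An access that fails the check would be reported as a memory exception (-11).

```cpp
#include <cstdint>

// Constants copied from the memory map above.
const uint32_t ADDR_NULL  = 0x00000000;
const uint32_t ADDR_INSTR = 0x10000000;
const uint32_t INSTR_LEN  = 0x1000000;
const uint32_t ADDR_DATA  = 0x20000000;
const uint32_t DATA_LEN   = 0x4000000;
const uint32_t ADDR_GETC  = 0x30000000;
const uint32_t ADDR_PUTC  = 0x30000004;

enum class Access { Read, Write, Execute };

// True if the access is permitted by the R/W/X columns of the table.
inline bool access_allowed(uint32_t addr, Access kind) {
    // Unsigned wrap-around turns "base <= addr < base + len" into one compare.
    bool in_instr = addr - ADDR_INSTR < INSTR_LEN;
    bool in_data  = addr - ADDR_DATA < DATA_LEN;
    bool in_getc  = addr - ADDR_GETC < 4;
    bool in_putc  = addr - ADDR_PUTC < 4;
    switch (kind) {
        case Access::Read:    return in_instr || in_data || in_getc;
        case Access::Write:   return in_data || in_putc;
        case Access::Execute: return addr == ADDR_NULL || in_instr;
    }
    return false;
}
```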
When a simulated program reads from address `ADDR_GETC`, the simulator should
- Block until a character is available (e.g. if a key needs to be pressed)
- Return the 8-bit character, zero-extended to 32 bits, as the result of the memory read.
- If there are no more characters (EOF), the memory read should return -1.
When a simulated program writes to address `ADDR_PUTC`, the simulator should
write the character to stdout. If the write fails, the appropriate Error
should be signalled.
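The `ADDR_GETC`/`ADDR_PUTC` behaviour above might be sketched like this. The streams are parameters so that `std::cin`/`std::cout` can be swapped for string streams in tests. Taking the output character from the low 8 bits of the stored word, and flushing after every character, are assumptions of this sketch rather than requirements stated here; the function names are hypothetical.

```cpp
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>  // string streams can stand in for std::cin / std::cout in tests

// Read from ADDR_GETC: returns the next byte zero-extended to 32 bits, or
// 0xFFFFFFFF (-1) at end of input. Like getchar, std::istream::get() blocks
// until a character is available. io_failed is set on a genuine stream error.
inline uint32_t read_getc(std::istream& in, bool& io_failed) {
    std::istream::int_type c = in.get();
    if (c == std::istream::traits_type::eof()) {
        io_failed = in.bad();  // plain end-of-file is not an error
        return 0xFFFFFFFFu;
    }
    io_failed = false;
    return static_cast<uint32_t>(c);  // get() yields 0..255 for a real character
}

// Write to ADDR_PUTC: emits the low 8 bits of the stored value. Returns false
// if the write failed, in which case the Simulator should exit with -21.
inline bool write_putc(std::ostream& out, uint32_t value) {
    out.put(static_cast<char>(value & 0xFFu));
    out.flush();
    return static_cast<bool>(out);
}
```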
Exceptions and Errors
---------------------
*Exceptions* are due to instructions which the Binary wants to execute which result
in some kind of exceptional or abnormal situation. Exceptions should not occur
due to bugs or errors within the Simulator. All exceptions are classified into
three types, each of which has a numeric code:
- Arithmetic exception (-10) : Any kind of arithmetic problem, such as overflow, divide by zero, ...
- Memory exception (-11) : Any problem relating to memory, such as address out of range, writing to
read-only memory, reading from an address that cannot be read, executing an address that cannot be executed ...
- Invalid instruction (-12) : The Binary tries to execute a memory location that does not contain a valid
instruction (this is not the same as trying to execute an address that is not executable, which is a memory exception).
If any of these exceptions are encountered, the Simulator should immediately terminate
with the given exit code using `std::exit`. Please note that an exception does
not automatically mean that a Binary must be incorrect or buggy. For example,
there are very well-defined situations where arithmetic overflow occurs, and a
Binary may choose to rely on this behaviour for performance reasons, rather than
explicitly checking for overflow all the time. Indeed, this performance argument
is a big reason for hardware overflow exceptions, so a Binary _must_ be able to
rely on them being correctly reported.
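Detecting overflow for `ADD`/`ADDI` does not need wider arithmetic: a signed 32-bit addition overflows exactly when both operands have the same sign and the result has the opposite sign. The hypothetical helper below checks that condition using unsigned arithmetic (whose wrap-around is well defined in C++), and leaves the destination untouched on overflow, matching the MIPS behaviour of not writing the result register when the overflow trap is taken. `ADDU`/`ADDIU` would simply use the wrapped sum.

```cpp
#include <cstdint>

// Returns false (the caller then exits with -10) if a + b overflows as a
// signed 32-bit addition; otherwise stores the sum in rd and returns true.
inline bool add_checked(uint32_t a, uint32_t b, uint32_t& rd) {
    uint32_t sum = a + b;  // wraps modulo 2^32
    // Overflow iff a and b share a sign bit and the sum's sign bit differs.
    bool overflow = ((~(a ^ b) & (a ^ sum)) & 0x80000000u) != 0;
    if (overflow) return false;  // rd is left unchanged
    rd = sum;
    return true;
}
```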
*Errors* are due to problems occurring within the simulator, rather than something
that the Binary did wrong. As with exceptions, an error may indicate a genuine problem
with the Simulator, or it may be due to an interaction between the Simulator and
the Environment. An example of the former is where a Simulator doesn't support
a particular op-code (yet), so cannot execute a correct Binary.
An example of an error which is _not_ the Simulator's fault is where the Binary has tried
to output a character, but the request to the Environment has failed in some way. You
may never have worried about it, but `std::cin >> x` can fail in various ways, and this
would not be the fault of the Binary (so is not an exception).
Error codes are:
- Internal error (-20) : the simulator has failed due to some unknown error
- IO error (-21) : the simulator encountered an error reading/writing input/output
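Because exceptions and errors are usually detected deep inside instruction execution, one possible structure (an assumption, not something this spec requires) is to throw a C++ exception carrying the exit code and convert it to a `std::exit` value in one place. The `ExitCode` names, `SimulatorStop` and `run_guarded` are hypothetical.

```cpp
#include <iostream>
#include <stdexcept>
#include <string>

// Codes from the Exceptions and Errors lists above.
enum ExitCode : int {
    ARITHMETIC_EXCEPTION = -10,
    MEMORY_EXCEPTION     = -11,
    INVALID_INSTRUCTION  = -12,
    INTERNAL_ERROR       = -20,
    IO_ERROR             = -21,
};

// Thrown from anywhere in the Simulator to stop with a specific code.
struct SimulatorStop : std::runtime_error {
    int code;
    SimulatorStop(int c, const std::string& why) : std::runtime_error(why), code(c) {}
};

// Runs the simulation and returns the value to pass to std::exit.
// Anything unexpected is reported as an internal error.
template <typename RunFn>
int run_guarded(RunFn run) {
    try {
        return run();
    } catch (const SimulatorStop& stop) {
        std::cerr << "stopping: " << stop.what() << '\n';  // diagnostics on stderr only
        return stop.code;
    } catch (const std::exception& e) {
        std::cerr << "internal error: " << e.what() << '\n';
        return INTERNAL_ERROR;
    } catch (...) {
        return INTERNAL_ERROR;
    }
}
```

A `main` could then end with `std::exit(run_guarded([&] { return simulate(state); }));`, where `simulate` is a hypothetical fetch/execute loop returning the success code.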
Instructions
------------
Instructions of interest are:
Code | Meaning | Complexity
--------|---------------------------------------------|-----------
ADD | Add (with overflow) | 2 XX
ADDI | Add immediate (with overflow) | 2 XX
ADDIU | Add immediate unsigned (no overflow) | 2 XX
ADDU | Add unsigned (no overflow) | 1 X
AND | Bitwise and | 1 X
ANDI | Bitwise and immediate | 2 XX
BEQ | Branch on equal | 3 XXX
BGEZ | Branch on greater than or equal to zero | 3 XXX
BGEZAL | Branch on non-negative (>=0) and link | 4 XXXX
BGTZ | Branch on greater than zero | 3 XXX
BLEZ | Branch on less than or equal to zero | 3 XXX
BLTZ | Branch on less than zero | 3 XXX
BLTZAL | Branch on less than zero and link | 4 XXXX
BNE | Branch on not equal | 3 XXX
DIV | Divide | 4 XXXX
DIVU | Divide unsigned | 4 XXXX
J | Jump | 3 XXX
JALR | Jump and link register | 4 XXXX
JAL | Jump and link | 4 XXXX
JR | Jump register | 1 X
LB | Load byte | 3 XXX
LBU | Load byte unsigned | 3 XXX
LH | Load half-word | 3 XXX
LHU | Load half-word unsigned | 3 XXX
LUI | Load upper immediate | 2 XX
LW | Load word | 2 XX
LWL | Load word left | 5 XXXXX
LWR | Load word right | 5 XXXXX
MFHI | Move from HI | 3 XXX
MFLO | Move from LO | 3 XXX
MTHI | Move to HI | 3 XXX
MTLO | Move to LO | 3 XXX
MULT | Multiply | 4 XXXX
MULTU | Multiply unsigned | 4 XXXX
OR | Bitwise or | 1 X
ORI | Bitwise or immediate | 2 XX
SB | Store byte | 3 XXX
SH | Store half-word | 3 XXX
SLL | Shift left logical | 2 XX
SLLV | Shift left logical variable | 3 XXX
SLT | Set on less than (signed) | 2 XX
SLTI | Set on less than immediate (signed) | 3 XXX
SLTIU | Set on less than immediate unsigned | 3 XXX
SLTU | Set on less than unsigned | 1 X
SRA | Shift right arithmetic | 2 XX
SRAV | Shift right arithmetic variable | 2 XX
SRL | Shift right logical | 2 XX
SRLV | Shift right logical variable | 3 XXX
SUB | Subtract | 2 XX
SUBU | Subtract unsigned | 1 X
SW | Store word | 2 XX
XOR | Bitwise exclusive or | 1 X
XORI | Bitwise exclusive or immediate | 2 XX
--------|---------------------------------------------|---------
INTERNAL| Not associated with a specific instruction |
FUNCTION| Testing the ability to support functions |
STACK | Testing for functions using the stack |
The final three entries are pseudo-instructions, for tests that don't map to
a single instruction. You are not required to use them, but they may be useful
for tests which are looking at more complex functionality, rather than narrowly
looking at one instruction.
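Most of the instructions in the table share the standard MIPS-1 field layout (R-type, I-type and J-type), so a single decode step can extract every field and let each instruction pick what it needs. The `Decoded` struct and helper names are hypothetical; the bit positions are the standard MIPS ones.

```cpp
#include <cstdint>

// All fields a MIPS-1 instruction word can carry; each instruction
// uses only the ones relevant to its format.
struct Decoded {
    uint32_t opcode;  // bits 31..26
    uint32_t rs;      // bits 25..21
    uint32_t rt;      // bits 20..16
    uint32_t rd;      // bits 15..11 (R-type)
    uint32_t shamt;   // bits 10..6  (shifts)
    uint32_t funct;   // bits 5..0   (R-type, opcode 0)
    uint32_t imm;     // bits 15..0  (I-type, raw)
    uint32_t target;  // bits 25..0  (J and JAL, raw)
};

inline Decoded decode(uint32_t word) {
    Decoded d;
    d.opcode = word >> 26;
    d.rs     = (word >> 21) & 0x1F;
    d.rt     = (word >> 16) & 0x1F;
    d.rd     = (word >> 11) & 0x1F;
    d.shamt  = (word >> 6) & 0x1F;
    d.funct  = word & 0x3F;
    d.imm    = word & 0xFFFF;
    d.target = word & 0x03FFFFFF;
    return d;
}

// ADDI, ADDIU, SLTI, SLTIU, load/store offsets and branch offsets
// sign-extend the immediate; ANDI, ORI and XORI zero-extend it.
inline uint32_t sign_extend16(uint32_t imm) {
    return (imm & 0x8000u) ? (imm | 0xFFFF0000u) : (imm & 0xFFFFu);
}
```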
Assessment
==========
Assessment is broken down into three components:
- Group: 80%
  - Simulator : 50% (pairs) / 40% (triples)
  - Testbench : 30% (pairs) / 40% (triples)
- Individual: 20%
  - Reflection : 20%
Each group will be assigned a shared mark based on
the objective correctness of the simulator and
testbenches.
Groups must also come to a shared decision on how
each member contributed to the group, and include
a `contribution.md` which attempts to assign credit
on different aspects of the project. This will be
turned into an individual score between -2 and +2
where the total within a group is 0, and used to
additively adjust the group component. So the
_maximum_ influence of this is +-2% overall.
Individuals should submit an individual copy of `reflection.md`.
Submission
==========
Submission is via github _and_ blackboard:
- Make sure you have pushed to your group github repo.
- All group members submit the _hash_ of their submission via
blackboard (not a zip). All members should submit the same
hash (to show agreement). If there is any discrepancy, the
earliest hash submitted (in commit history terms) will be used.
The deadline is Fri 8th at 22:00.
- Each individual submits a copy of `reflection.md` to blackboard
by Mon 11th at 22:00.
Additional Notes
================