This document contains everything that you need to know to work the SimpleScalar problems this semester. Section 1 is a summary that will familiarize you with SimpleScalar and the use of simulation tools in computer architecture research. Section 2 tells you how to get the tool set from the Web and set up the part that is needed for the textbook problems. Section 3 describes each of the simulators, including the information supplied by each of them. Section 4 describes the SPEC95 benchmark binaries that can be used with the simulators. Section 5 consists of a complete example of the use of one of the simulators. Section 6 briefly describes the complete tool set for people interested in going beyond the problems in the textbook.
When computer architecture researchers work to improve the performance of a computer system, they often use an existing system to simulate a proposed system. Although the intent is not always to measure raw performance (estimating power consumption is one alternative), performance estimation is one of the most important results obtained by simulation. The SimpleScalar tool set is designed to measure the performance of several parts of a superscalar processor and its memory hierarchy. This document describes the SimpleScalar simulators. Other simulation systems may be similar or very different.
The SimpleScalar tool set includes a compiler that creates binaries for a non-existent processor. The binaries can be executed on one of several simulators that are included in the tool set. This section describes the goals of processor simulation.
The execution of a processor can be modeled as a series of known states and the time (or other costs, i.e., power) required to make the transition between each pair of states. The state information may include all or a subset of:
· The values stored in all memory locations.
· The values stored in and the status of all cache memories.
· The values stored in and the status of the translation-lookaside buffer (TLB).
· The values stored in and the status of the branch prediction table(s) or branch target buffer (BTB).
· All processor state (i.e., the pipeline, execution units (integer ALU, load/store unit, etc.), register file, register update unit (RUU), etc.)
A good way to evaluate the performance of a program on a proposed processor architecture is to simulate the state of the architecture during the execution of the program. By simulating the states through which the processor will pass during the execution of a program and estimating the time (or other measurement) necessary for each state transition, the amount of time that the simulated processor will need to execute the program can be estimated.
The more state that is simulated, the longer a simulation will take. Complex simulations can execute 100s of times slower than a real processor. Therefore, simulating the execution of a program that would take an hour of CPU time on an existing processor can take a week on a complex simulator. For this reason, it is important to evaluate what measurements are desired and limit the simulation to only the state that is necessary to properly estimate those measurements. This is the reason for the inclusion of several different simulators in the SimpleScalar tool set.
In addition to estimating the execution time of a program on the simulated processor, profile information may be of use to computer architects. Profile information is a count of the number or frequency of events that occur during the execution of a program. One common example of profile data is a count of how often each type of instruction (i.e., branch, load, store, ALU operation, etc.) is executed during the running of a program.
Profile information can be used to gauge the relative importance of each part of a processor's implementation in determining its performance when executing the profiled program.
The SimpleScalar tool set is based on the MIPS R2000 processor's instruction set architecture (ISA). The processor is described in MIPS RISC Architecture by Gerry Kane, published by Prentice Hall, 1988. The ISA describes the instructions that the processor is capable of executing—and therefore the instructions that a compiler can generate—but it does not describe how the instructions are implemented. The implementation is what computer architects change in order to improve the performance of a processor.
An existing processor can be chosen as a base processor for several reasons. These may include:
· The architecture of the processor is well known and documented.
· The architecture of the processor is state-of-the-art and therefore it is likely to be useful as a base for the study of future processors.
· The architecture of the processor has been implemented as a real processor, allowing simulations to be compared to executions on a real, physical processor.
An important consideration in the choice of the MIPS architecture for the SimpleScalar tool set was the fact that the GNU GCC compiler was available in source-code form, and could compile to the MIPS architecture. This allowed the use of this public-domain software as part of the SimpleScalar tool set.
The SimpleScalar simulators are part of the complete SimpleScalar tool set that is described in Section 6. The simulators are the easiest part of the tool set to set up, and they can be used to simulate existing SimpleScalar binaries without the need for any of the other parts of the tool set.
The simulators are available in a file that is available on the SimpleScalar Tools Home Page at
http://www.cs.wisc.edu/~mscalar/simplescalar.html
on the Web. The file is called simplesim.tar. You can get it directly from the Web page or via ftp (the ftp file is named simplesim.tar.gz) as described on the Web page. At the time that this document was written, the current release of SimpleScalar was version 2.0.
Using standard Unix commands, you can unpack the archive. All that remains is to build the simulators by moving to the directory containing them (probably simplesim-2.0) and building them by typing make. Note that the default optimization is -O. The simulators will run much faster (i.e., 10 times) if you modify the make file (Makefile) section “Choose your optimization level here” so that the “for optimization” flags are used as the OFLAGS. If you have built the simulators and want to rebuild them with different optimizations, type make clean and then make.
Note: Building the most complex simulator, sim-outorder, may require 64MB of memory when it is compiled with all optimizations. If the building process is stalled for several minutes on the line that compiles sim-outorder, you may want to try changing the optimizations. You may also want to simply wait for the build to finish, as the simulator will run much faster with the optimizations included.
After you have built the simulators, either move the executables to a directory in your execution path or add the directory in which you built the executables to your execution path. The simulator executable files include:
· sim-bpred
· sim-cache
· sim-cheetah
· sim-fast
· sim-outorder
· sim-profile
· sim-safe
Each of these is described in the next section.
Each of the simulators appears here in a separate section. Each section includes the name of the simulator and the output that is generated when you ask for the simulator’s help screen by executing it with no input file or parameters.
These help screens—written by the SimpleScalar authors—include a short description of the function of the simulator, a short description of why you would want to use the particular simulator, and a description of the input parameters for the simulator.
sim-bpred: Version 2.0 of
July, 1997.
Copyright (c) 1994-1997 by
Todd M. Austin. All Rights
Reserved.
Usage: sim-bpred {-options}
executable {arguments}
sim-bpred: This simulator
implements a branch predictor analyzer.
#
#
-option <args> #
<default> # description
#
-config <string> #
<null> # load configuration from a file
-dumpconfig <string> # <null> # dump configuration to a
file
-h <true|false> #
false # print help message
-v <true|false> #
false # verbose operation
-d <true|false> #
false # enable debug message
-i <true|false> #
false # start in Dlite debugger
-seed <int> # 1 # random number generator seed (0
for timer seed)
-q <true|false> #
false # initialize and terminate immediately
-bpred <string> # bimod # branch predictor type
{nottaken|taken|bimod|2lev|comb}
-bpred:bimod <int> # 2048 # bimodal predictor config
(<table size>)
-bpred:2lev <int list...> # 1 1024 8 0 # 2-level predictor config
(<l1size> <l2size> <hist_size>
<xor>)
-bpred:comb <int> # 1024 # combining predictor config
(<meta_table_size>)
-bpred:ras <int> # 8 # return address stack size (0
for no return stack)
-bpred:btb <int list...> # 512 4 # BTB config (<num_sets>
<associativity>)
Branch predictor configuration examples for
2-level predictor:
Configurations: N, M, W, X
N
# entries in first level (# of shift register(s))
W
width of shift register(s)
M
# entries in 2nd level (# of counters, or other
FSM)
X
(yes-1/no-0) xor history and address for 2nd level
index
Sample predictors:
GAg
: 1, W, 2^W, 0
GAp
: 1, W, M (M > 2^W), 0
PAg
: N, W, 2^W, 0
PAp
: N, W, M (M == 2^(N+W)), 0
gshare
: 1, W, 2^W, 1
Predictor `comb' combines a bimodal and a
2-level predictor.
sim-cache: Version 2.0 of
July, 1997.
Copyright (c) 1994-1997 by
Todd M. Austin. All Rights
Reserved.
Usage: sim-cache {-options}
executable {arguments}
sim-cache: This simulator
implements a functional cache simulator.
Cache
statistics are generated for
a user-selected cache and TLB configuration,
which
may include up to two levels of instruction and data cache (with
any
levels unified), and one
level of instruction and data TLBs. No
timing
information is
generated.
#
#
-option <args> #
<default> # description
#
-config <string> # <null> # load configuration from a
file
-dumpconfig <string> # <null> # dump configuration to a
file
-h <true|false> #
false # print help message
-v <true|false> #
false # verbose operation
-d <true|false> #
false # enable debug message
-i <true|false> #
false # start in Dlite debugger
-seed <int> # 1 # random number generator seed (0
for timer seed)
-q <true|false> #
false # initialize and terminate immediately
-cache:dl1 <string> # dl1:256:32:1:l # l1 data cache
config, i.e., {<config>|none}
-cache:dl2 <string> # ul2:1024:64:4:l # l2 data cache
config, i.e., {<config>|none}
-cache:il1 <string> # il1:256:32:1:l # l1 inst cache
config, i.e., {<config>|dl1|dl2|none}
-cache:il2 <string> # dl2 # l2 instruction cache config,
i.e., {<config>|dl2|none}
-tlb:itlb <string> # itlb:16:4096:4:l # instruction TLB
config, i.e., {<config>|none}
-tlb:dtlb <string> # dtlb:32:4096:4:l # data TLB config,
i.e., {<config>|none}
-flush <true|false> #
false # flush caches on system calls
-icompress <true|false> #
false # convert 64-bit inst addresses to 32-bit inst
equivalents
-pcstat <string list...> # <null> # profile stat(s) against
text addr's (mult uses ok)
The cache config parameter <config> has
the following format:
<nsets> - number of sets in the
cache
<bsize> - block size of the
cache
<assoc> - associativity of the
cache
<repl> - block replacement strategy, 'l'-LRU,
'f'-FIFO, 'r'-random
Examples:
-cache:dl1 dl1:4096:32:1:l
-dtlb
dtlb:128:4096:32:r
hierarchy at the data cache hiearchy using
the "dl1" and "dl2" cache
configuration arguments. Most sensible combinations are supported,
e.g.,
-cache:il1 il1:128:64:1:l -cache:il2
dl2
-cache:dl1 dl1:256:32:1:l -cache:dl2
ul2:1024:64:2:l
Or, a fully unified cache hierarchy (il1
pointed at dl1):
-cache:il1 dl1
-cache:dl1 ul1:256:32:1:l -cache:dl2
ul2:1024:64:2:l
sim-cheetah: Version 2.0 of
July, 1997.
Copyright (c) 1994-1997 by
Todd M. Austin. All Rights
Reserved.
Portions Copyright (C)
1989-1993 by Rabin A. Sugumar and Santosh G. Abraham.
Usage: sim-cheetah {-options}
executable {arguments}
sim-cheetah: This program
implements a functional simulator driver for
Cheetah. Cheetah is a cache simulation package written
by Rabin Sugumar
and
Santosh Abraham which can efficiently simulate multiple
cache
configurations in a single
run of a program. Specifically, Cheetah
can
simulate ranges of single
level set-associative and fully-associative
caches. See the directory libcheetah/ for more
details on Cheetah.
#
#
-option <args> #
<default> # description
#
-config <string> # <null> # load configuration from a
file
-dumpconfig <string> # <null> # dump configuration to a
file
-h <true|false> #
false # print help message
-v <true|false> #
false # verbose operation
-d <true|false> #
false # enable debug message
-i <true|false> #
false # start in Dlite debugger
-seed <int> # 1 # random number generator seed (0
for timer seed)
-q <true|false> #
false # initialize and terminate immediately
-refs <string> # data # reference stream to analyze,
i.e., {inst|data|unified}
-R <string> # lru # replacement policy, i.e., lru
or opt
-C <string> # sa # cache configuration, i.e., fa,
sa, or dm
-a <int> # 7 # min number of sets (log base 2,
line size for DM)
-b <int> # 14 # max number of sets (log base 2, line
size for DM)
-l <int> # 4 # line size of the caches (log
base 2)
-n <int> # 1 # max degree of associativity to
analyze (log base 2)
-in <int> # 512 # cache size intervals at which
miss ratio is shown
-M <int> # 524288 # maximum cache size of
interest
-c <int> # 16 # size of cache (log base 2) for
DM analysis
sim-fast: Version 2.0 of
July, 1997.
Copyright (c) 1994-1997 by
Todd M. Austin. All Rights
Reserved.
Usage: sim-fast {-options}
executable {arguments}
sim-fast: This simulator
implements a very fast functional simulator.
This
functional simulator
implementation is much more difficult to digest than
the
simpler, cleaner sim-safe functional simulator.
By default, this
simulator performs no
instruction error checking, as a result, any
instruction errors will
manifest as simulator execution errors, possibly
causing sim-fast to execute
incorrectly or dump core. Such is
the
price
we pay for speed!!!!
#
#
-option <args> #
<default> # description
#
-config <string> # <null> # load configuration from a
file
-dumpconfig <string> # <null> # dump configuration to a
file
-h <true|false> #
false # print help message
-v <true|false> #
false # verbose operation
-d <true|false> #
false # enable debug message
-i <true|false> #
false # start in Dlite debugger
-seed <int> # 1 # random number generator seed (0
for timer seed)
-q <true|false> #
false # initialize and terminate immediately
sim-outorder: Version 2.0 of
July, 1997.
Copyright (c) 1994-1997 by
Todd M. Austin. All Rights
Reserved.
Usage: sim-outorder
{-options} executable {arguments}
sim-outorder: This simulator
implements a very detailed out-of-order issue
superscalar processor with a
two-level memory system and speculative
execution support. This simulator is a performance simulator,
tracking the
latency of all pipeline
operations.
#
#
-option <args> #
<default> # description
#
-config <string> # <null> # load configuration from a
file
-dumpconfig <string> # <null> # dump configuration to a
file
-h <true|false> #
false # print help message
-v <true|false> #
false # verbose operation
-d <true|false> #
false # enable debug message
-i <true|false> #
false # start in Dlite debugger
-seed <int> # 1 # random number generator seed (0
for timer seed)
-q <true|false> #
false # initialize and terminate immediately
-ptrace <string list...> # <null> # generate pipetrace, i.e.,
<fname|stdout|stderr> <range>
-fetch:ifqsize <int> # 4 # instruction fetch queue size
(in insts)
-fetch:mplat <int> # 3 # extra branch mis-prediction
latency
-fetch:speed <int> # 1 # speed of front-end of machine
relative to execution core
-bpred <string> # bimod # branch predictor type
{nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod <int> # 2048 # bimodal predictor config
(<table size>)
-bpred:2lev <int list...> # 1 1024 8 0 # 2-level predictor config
(<l1size> <l2size> <hist_size>
<xor>)
-bpred:comb <int> # 1024 # combining predictor config
(<meta_table_size>)
-bpred:ras <int> # 8 # return address stack size (0
for no return stack)
-bpred:btb <int list...> # 512 4 # BTB config (<num_sets>
<associativity>)
-bpred:spec_update
<string> # <null> # speculative predictors
update in {ID|WB} (default non-spec)
-decode:width <int> # 4 # instruction decode B/W
(insts/cycle)
-issue:width <int> # 4 # instruction issue B/W
(insts/cycle)
-issue:inorder <true|false> #
false # run pipeline with in-order issue
-issue:wrongpath
<true|false> # true # issue instructions down wrong
execution paths
-commit:width <int> # 4 # instruction commit B/W
(insts/cycle)
-ruu:size <int> # 16 # register update unit (RUU)
size
-lsq:size <int> # 8 # load/store queue (LSQ)
size
-cache:dl1 <string> # dl1:128:32:4:l # l1 data cache
config, i.e., {<config>|none}
-cache:dl1lat <int> # 1 # l1 data cache hit latency (in
cycles)
-cache:dl2 <string> # ul2:1024:64:4:l # l2 data cache
config, i.e., {<config>|none}
-cache:dl2lat <int> # 6 # l2 data cache hit latency (in
cycles)
-cache:il1 <string> # il1:512:32:1:l # l1 inst cache
config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat <int> # 1 # l1 instruction cache hit
latency (in cycles)
-cache:il2 <string> # dl2 # l2 instruction cache config,
i.e., {<config>|dl2|none}
-cache:il2lat <int> # 6 # l2 instruction cache hit
latency (in cycles)
-cache:flush <true|false> #
false # flush caches on system calls
-cache:icompress
<true|false> # false # convert 64-bit inst addresses
to 32-bit inst equivalents
-mem:lat <int list...> # 18 2 # memory access latency
(<first_chunk> <inter_chunk>)
-mem:width <int> # 8 # memory access bus width (in
bytes)
-tlb:itlb <string> # itlb:16:4096:4:l # instruction TLB
config, i.e., {<config>|none}
-tlb:dtlb <string> # dtlb:32:4096:4:l # data TLB config,
i.e., {<config>|none}
-tlb:lat <int> # 30 # inst/data TLB miss latency (in
cycles)
-res:ialu <int> # 4 # total number of integer ALU's
available
-res:imult <int> # 1 # total number of integer
multiplier/dividers available
-res:memport <int> # 2 # total number of memory system
ports available (to CPU)
-res:fpalu <int> # 4 # total number of floating point
ALU's available
-res:fpmult <int> # 1 # total number of floating point
multiplier/dividers available
-pcstat <string list...> # <null> # profile stat(s) against
text addr's (mult uses ok)
-bugcompat <true|false> #
false # operate in backward-compatible bugs mode (for testing
only)
Pipetrace range arguments are formatted as
follows:
Both ends of the range are optional, if
neither are specified, the entire
execution is traced. Ranges that start with a `@' designate an
address
range to be traced, those that start with an
`#' designate a cycle count
range.
All other range values represent an instruction count range. The
second argument, if specified with a `+',
indicates a value relative
to the first argument, e.g., 1000:+100 ==
1000:1100. Program symbols
may
be used in all
contexts.
Examples:
-ptrace FOO.trc #0:#1000
-ptrace BAR.trc
@2000:
-ptrace BLAH.trc
:1500
-ptrace UXXE.trc
:
-ptrace FOOBAR.trc
@main:+278
Branch predictor configuration examples for
2-level predictor:
Configurations: N, M, W, X
N
# entries in first level (# of shift register(s))
W
width of shift register(s)
M
# entries in 2nd level (# of counters, or other
FSM)
X
(yes-1/no-0) xor history and address for 2nd level
index
Sample predictors:
GAg
: 1, W, 2^W, 0
GAp
: 1, W, M (M > 2^W), 0
PAg
: N, W, 2^W, 0
PAp
: N, W, M (M == 2^(N+W)), 0
gshare
: 1, W, 2^W, 1
Predictor `comb' combines a bimodal and a
2-level predictor.
The cache config parameter <config> has
the following format:
<nsets> - number of sets in the
cache
<bsize> - block size of the
cache
<assoc> - associativity of the
cache
<repl> - block replacement strategy, 'l'-LRU,
'f'-FIFO, 'r'-random
Examples:
-cache:dl1 dl1:4096:32:1:l
-dtlb
dtlb:128:4096:32:r
Cache levels can be unified by pointing a
level of the instruction cache
hierarchy at the data cache hiearchy using
the "dl1" and "dl2" cache
configuration arguments. Most sensible combinations are supported,
e.g.,
A unified l2 cache (il2 is pointed at
dl2):
-cache:il1 il1:128:64:1:l -cache:il2
dl2
-cache:dl1 dl1:256:32:1:l -cache:dl2
ul2:1024:64:2:l
Or, a fully unified cache hierarchy (il1
pointed at dl1):
-cache:il1 dl1
-cache:dl1 ul1:256:32:1:l -cache:dl2
ul2:1024:64:2:l
sim-profile: Version 2.0 of
July, 1997.
Copyright (c) 1994-1997 by
Todd M. Austin. All Rights
Reserved.
Usage: sim-profile {-options}
executable {arguments}
sim-profile: This simulator
implements a functional simulator with
profiling support. Run with the `-h' flag to see profiling
options
available.
#
#
-option <args> #
<default> # description
#
-config <string> # <null> # load configuration from a
file
-dumpconfig <string> # <null> # dump configuration to a
file
-h <true|false> #
false # print help message
-v <true|false> #
false # verbose operation
-d <true|false> #
false # enable debug message
-i <true|false> #
false # start in Dlite debugger
-seed <int> # 1 # random number generator seed (0
for timer seed)
-q <true|false> #
false # initialize and terminate immediately
-all <true|false> #
false # enable all profile options
-iclass <true|false> #
false # enable instruction class profiling
-iprof <true|false> #
false # enable instruction profiling
-brprof <true|false> #
false # enable branch instruction profiling
-amprof <true|false> #
false # enable address mode profiling
-segprof <true|false> #
false # enable load/store address segment profiling
-tsymprof <true|false> #
false # enable text symbol profiling
-taddrprof <true|false> #
false # enable text address profiling
-dsymprof <true|false> #
false # enable data symbol profiling
-internal <true|false> #
false # include compiler-internal symbols during symbol
profiling
-pcstat <string list...> # <null> # profile stat(s) against
text addr's (mult uses ok)
sim-safe: Version 2.0 of
July, 1997.
Copyright (c) 1994-1997 by
Todd M. Austin. All Rights
Reserved.
Usage: sim-safe {-options}
executable {arguments}
sim-safe: This simulator
implements a functional simulator.
This
functional simulator is the
simplest, most user-friendly simulator in the
simplescalar tool set. Unlike sim-fast, this functional simulator
checks
for
all instruction errors, and the implementation is crafted for
clarity
rather than
speed.
#
#
-option <args> #
<default> # description
#
-config <string> #
<null> # load
configuration from a file
-dumpconfig <string> # <null> # dump configuration to a
file
-h <true|false> #
false # print help message
-v <true|false> #
false # verbose operation
-d <true|false> #
false # enable debug message
-i <true|false> #
false # start in Dlite debugger
-seed <int> # 1 # random number generator seed (0
for timer seed)
-q <true|false> #
false # initialize and terminate immediately
The Standard Performance Evaluation Corporation (SPEC) provides (for a fee) a set of benchmarks that are used to evaluate processors. At the time this document was written the current release of these benchmarks was called SPEC95. It consists of two parts: (i) CINT95 is a set of integer benchmarks and (ii) CFP95 is a set of floating point benchmarks. You can find information about these benchmarks on the Web at
The Standard Performance Evaluation Corporation has allowed the authors of the SimpleScalar tool set to make a set of SimpleScalar binaries for the SPEC95 benchmarks available on the Web and via ftp. These binaries can be run on the simulators as provided with no compilation or other manipulation. Because SPEC does not want the inputs used for their standard benchmark suite to be distributed, you will have to create appropriate command-line options and/or input files for each of the benchmarks. The inputs have been provided in the written problems that use the SPEC95 benchmarks. Additional information can be found by taking advantage of the fact that many of the SPEC95 benchmarks are derived from programs that are available free-of-charge on the Web.
In order to install the SPEC95 benchmarks, you will have to determine the endianness of the processor on which you will be running the simulators. The endianness of a processor is defined as the part (byte) of a multi-byte number that appears first in a word, when the word is viewed as consisting of bytes ordered by their physical order in memory. Thus, a little-endian processor will have the least-significant (e.g., the “little”) byte first and big-endian will have the most-significant (e.g., the “big”) byte first. The ordering of the first two 4-byte words of memory for each endianness is shown in Table 1.
|
Processor Endianness |
First Word Byte Locations |
Second Word Byte Locations | ||||||
|
Little-endian |
3 |
2 |
1 |
0 |
7 |
6 |
5 |
4 |
|
Big-endian |
0 |
1 |
2 |
3 |
4 |
5 |
6 |
7 |
Table 1: The difference between big- and little-endian.
Intel x86 processors are little-endian. Most other processors (Alpha, SPARC, MIPS, etc.) are big-endian. When you get the SPEC95 benchmark binaries, get the file that corresponds to the endianness of your processor. If you are not certain which endianness you processor uses, you can find out by executing the program sysprobe that is built in the same directory as the simulators (See Section 2.)
The SPEC95 SimpleScalar binaries are available on the Web at
or via ftp (the ftp files will have a .gz suffix) as described on that page. The file for little-endian processors is called simplebench.little.tar and the file for big-endian processors is called simplebench.big.tar. Using standard Unix commands, you can unpack the archive. The result will be a directory that contains one file for each SPEC95 benchmark. The files have a .ss extension to distinguish them as SimpleScalar binary files. See the next section to test your installation.
You can test your installation by executing the SPEC95 go benchmark using the fastest simulator. After you have installed the simulators, set up the execution path, and set up the proper SPEC95 SimpleScalar binaries as described above, move to the directory that contains the SPEC95 SimpleScalar binaries. You should then create an empty file called go.in using the Unix command touch go.in. You can now run the go benchmark by typing the command sim-fast go.ss 2 8 go.in. If the simulation works, you can watch the two computer players as they present their moves. The game (and therefore the simulation) will end at move number 40.
Here is the output you will see when you run the example:
sim-fast: Version 2.0 of
July, 1997.
Copyright (c) 1994-1997 by
Todd M. Austin. All Rights
Reserved.
sim:
simulation started @ Fri May 28
sim-fast: This simulator
implements a very fast functional simulator.
This
functional simulator
implementation is much more difficult to digest than
the
simpler, cleaner sim-safe functional simulator.
By default, this
simulator performs no
instruction error checking, as a result, any
instruction errors will
manifest as simulator execution errors, possibly
causing sim-fast to execute
incorrectly or dump core. Such is
the
price
we pay for speed!!!!
#
-config # load
configuration from a file
#
-dumpconfig # dump
configuration to a file
#
-h false # print help
message
#
-v false # verbose operation
#
-d false # enable
debug message
#
-i false # start in
Dlite debugger
-seed 1 # random number
generator seed (0 for timer seed)
#
-q false # initialize
and terminate immediately
sim:
** starting *fast* functional simulation **
Initial position from
go.in
1 B D6
2 W C4
3 B E3
4 W B6
5 B C2
6 W C7
7 B D7
8 W E4
9 B D4
10 W D8
11 B E6
12 W F8
13 B B3
14 W F7
15 B F6
16 W D5
17 B G4
18 W E7
19 B C8
20 W E8
21 B B7
22 W C6
23 B G7
24 W B8
25 B G6
26 W A7
27 B F3
28 W D3
29 B D2
30 W B4
31 B E2
32 W A3
33 B A2
34 W A4
35 B G8
36 W C8
37 B C3
38 W D4
39 B pass
40 W pass
Game
over
sim:
** simulation statistics **
sim_elapsed_time 8 # total simulation time in
seconds
ld_text_base 0x00400000 # program text (code)
segment base
ld_text_size 621600 # program text (code)
size in bytes
ld_data_base 0x10000000 # program initialized
data segment base
ld_data_size 578004 # program init'ed
`.data' and uninit'ed `.bss' size in bytes
ld_stack_base 0x7fffc000 # program stack segment
base (highest address in stack)
ld_stack_size 16384 # program initial stack
size
ld_prog_entry 0x00400140 # program entry point
(initial PC)
ld_environ_base 0x7fff8000 # program environment base
address address
ld_target_big_endian 0 # target executable endian-ness,
non-zero if big endian
mem_brk_point 0x10094000 # data segment break
point
mem_stack_min 0x7fff7500 # lowest address
accessed in stack segment
mem_total_data 565k # total bytes used in
init/uninit data segment
mem_total_heap 28k # total bytes used in
program heap segment
mem_total_stack 19k # total bytes used in
stack segment
mem_total_mem 612k # total bytes used in
data, heap, and stack segments
In addition to the simulators, the SimpleScalar tool set includes several other packages that allow you to create SimpleScalar executable files. These are described in detail in the document The SimpleScalar Tool Set, Version 2.0, written by Doug Burger and Todd M. Austin while they were at the University of Wisconsin. The document includes installation instructions for each of the packages. You can find a copy of the document at
on the Web. The file is called TR_1342.ps.
The complete tool set consists of these packages:
· simplesim.tar (simplesim.tar.gz via ftp) is the simulators as mentioned above.
· simpleutils.tar (simpleutils.tar.gz via ftp) is the GNU binutils version 2.5.2 modified to compile to the SimpleScalar architecture. These are needed if you are going to use the SimpleScalar compiler.
· simpletools.tar (simpletools.tar.gz via ftp) is the SimpleScalar-targeted GNU compiler version 2.6.3. With this and simpleutils.tar you can compile your own C programs to run on the SimpleScalar simulators. If you plan to do this, you should read the document The SimpleScalar Tool Set, Version 2.0 that is mentioned at the beginning of this section.