Last updated:
Sat Mar 13 05:04:37 CET 2010
How big is the design space?
This table lists the size of each design space, counting non-valid points as well.
| Parametric | 2799360 ≈ 10^6 |
| Structural | 5533090560 ≈ 10^10 |
| Hierarchical | 6.213557e+26 ≈ 10^27 |
| Compiler flags | 2723 ≈ 10^3 |
| Joint Compiler/Architecture | 1.691951e+30 ≈ 10^30 |
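As a quick sanity check on the table, the power-of-ten column appears to be the base-10 logarithm of each size rounded to the nearest integer, and the joint compiler/architecture size matches the product of the hierarchical and compiler-flag sizes. The sketch below reproduces both observations; the function names are illustrative and not part of the framework.

```cpp
#include <cassert>
#include <cmath>

// Nearest power-of-ten exponent for a design-space size
// (illustrative helper, not part of the framework).
long order_of_magnitude(double points) {
    assert(points > 0.0);
    return std::lround(std::log10(points));
}

// Size of a joint space in which every point of one space can be
// combined with every point of the other.
double joint_size(double space_a, double space_b) {
    return space_a * space_b;
}
```

For example, `joint_size(6.213557e26, 2723.0)` gives roughly 1.69e30, in line with the last row of the table.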
How to make my cache mechanism compatible with the framework?
The easiest way is to start from an existing cache module, e.g. the write-back non-blocking cache, and integrate your new features into it. This technique was successfully used by Nathanaël Prémillieu to implement a skewed data cache proposed by André Seznec.
How to register a parameter?
Use the construct 'parameters.add()' in your NewModule.sim.
| // Registering nCPUtoCacheDataPathSize |
| parameters.add("nCPUtoCacheDataPathSize",nCPUtoCacheDataPathSize); |
| // Registering nAssociativity with range specified |
| parameters.add("nAssociativity",nAssociativity,2,4); |
The three basic cache parameters have fixed names: nCacheLines, nAssociativity, and nLineSize.
You may not change them. All other parameters may be named freely.
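To make the two forms of 'parameters.add()' concrete, here is a minimal sketch of what such a registry could look like. It is a hypothetical stand-in, not the framework's actual `parameters` object: the class `ParameterRegistry`, the struct `ParameterInfo`, and the assumption that the last two arguments are an inclusive minimum and maximum are all illustrative.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical record kept for each registered parameter.
struct ParameterInfo {
    int value;
    bool has_range;
    int min_value;  // assumed inclusive lower bound
    int max_value;  // assumed inclusive upper bound
};

// Hypothetical stand-in for the framework's `parameters` object.
class ParameterRegistry {
public:
    // Register a parameter without a range.
    void add(const std::string& name, int value) {
        entries_[name] = ParameterInfo{value, false, 0, 0};
    }
    // Register a parameter together with its allowed range.
    void add(const std::string& name, int value, int min_value, int max_value) {
        assert(min_value <= max_value);
        entries_[name] = ParameterInfo{value, true, min_value, max_value};
    }
    bool contains(const std::string& name) const {
        return entries_.count(name) != 0;
    }
    const ParameterInfo& get(const std::string& name) const {
        return entries_.at(name);
    }
    // True once the three fixed-name cache parameters are registered.
    bool has_basic_cache_parameters() const {
        return contains("nCacheLines") && contains("nAssociativity") &&
               contains("nLineSize");
    }
private:
    std::map<std::string, ParameterInfo> entries_;
};
```

With a `ParameterRegistry parameters;` in scope, the two calls shown above, `parameters.add("nCPUtoCacheDataPathSize", 8)` and `parameters.add("nAssociativity", 2, 2, 4)`, both compile against this sketch.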
How to replace a cache?
Assume you want the Stride Prefetcher instead of the default non-blocking cache. Then you simply replace the #include of the default cache module in dse.uni.cxx with
| #include "CacheWBNBSP.sim" |
and change the MyL1Cache instantiation to match the class signature of CacheWBNBSP.sim:
| typedef MicrolibCacheWrapperWBNBSP<Instruction, |
|   __CacheWBNBSP_nCPUtoCacheDataPathSize, __CacheWBNBSP_nCachetoCPUDataPathSize, |
|   __CacheWBNBSP_nMemtoCacheDataPathSize, __CacheWBNBSP_nCachetoMemDataPathSize, |
|   __CacheWBNBSP_nLineSize, __CacheWBNBSP_nCacheLines, __CacheWBNBSP_nAssociativity, |
|   __CacheWBNBSP_nStages, __CacheWBNBSP_nDelay, 1, __CacheWBNBSP_nMSHR, |
|   __CacheWBNBSP_nMSHRRead, __CacheWBNBSP_nSPEntries, __CacheWBNBSP_nSPPCShift, 0> MyL1Cache; |
You need to recompile by typing 'make'.
For all registered parameters you need to specify a default value
If your NewModule.sim contains
| parameters.add("nCPUtoCacheDataPathSize",nCPUtoCacheDataPathSize); |
then your NewModule.default.h should include
| #ifndef __NewModule_nCPUtoCacheDataPathSize |
| #define __NewModule_nCPUtoCacheDataPathSize 8 |
| #endif |
If your NewModule.sim contains
| parameters.add("nAssociativity",nAssociativity,2,4); |
then your NewModule.default.h should include
| #ifndef __NewModule_nAssociativity |
| #define __NewModule_nAssociativity 2 |
| #endif |
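The '#ifndef' guards mean that a default only takes effect when the macro has not already been defined elsewhere. The sketch below illustrates this under the assumption that a value may be supplied earlier, for instance with a compiler option such as '-D__NewModule_nAssociativity=4'; whether the framework overrides defaults this way is an assumption, and the two accessor functions are illustrative.

```cpp
#include <cassert>

// Stands in for an earlier definition, e.g. a hypothetical
// -D__NewModule_nAssociativity=4 on the compiler command line.
#define __NewModule_nAssociativity 4

// Contents of NewModule.default.h, as described in the text.
#ifndef __NewModule_nCPUtoCacheDataPathSize
#define __NewModule_nCPUtoCacheDataPathSize 8
#endif
#ifndef __NewModule_nAssociativity
#define __NewModule_nAssociativity 2
#endif

// No earlier definition exists, so the default of 8 applies.
int configured_datapath_size() {
    return __NewModule_nCPUtoCacheDataPathSize;
}

// An earlier definition exists, so the default of 2 is skipped.
int configured_associativity() {
    assert(__NewModule_nAssociativity >= 1);
    return __NewModule_nAssociativity;
}
```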
For every registered parameter, you need a single line in your NewModule.range that either
enumerates the possible values to be explored or specifies that the value
depends on settings outside your module.
Use -1 to specify that the DataPath-like parameters are set from outside NewModule.h:
| nCPUtoCacheDataPathSize -1 |
| nCachetoCPUDataPathSize -1 |
| nMemtoCacheDataPathSize -1 |
| nCachetoMemDataPathSize -1 |
Enumerating the possible values to be explored:
| nLineSize 32 64 128 |
| nCacheLines 256 1024 4096 16384 65536 |
| nStages 1 |
| nDelay 1 |
| Snooping false |
For nAssociativity, you can allow a direct-mapped cache by using 1 and a fully associative cache by using 999.
You can specify separate ranges for the different cache levels by using a keyword to separate the levels (see the sample in NewModule.range).
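The range-file format described above (one parameter per line, a name followed by either -1 or a list of values) can be read with a few lines of code. The following is a minimal sketch, not the framework's own parser: `RangeEntry`, `parse_range`, and `count_points` are hypothetical names, values are kept as strings because entries such as 'Snooping false' are not numeric, and the per-level separator keyword is not handled.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// One line of a range file (hypothetical representation).
struct RangeEntry {
    std::string name;
    std::vector<std::string> values;  // kept as text, e.g. "false"
    bool external = false;            // true when the single value is -1
};

// Parse range-file text: whitespace-separated name and values per line.
std::vector<RangeEntry> parse_range(const std::string& text) {
    std::vector<RangeEntry> entries;
    std::istringstream lines(text);
    std::string line;
    while (std::getline(lines, line)) {
        std::istringstream fields(line);
        RangeEntry entry;
        if (!(fields >> entry.name)) continue;  // skip blank lines
        std::string value;
        while (fields >> value) entry.values.push_back(value);
        entry.external = entry.values.size() == 1 && entry.values[0] == "-1";
        entries.push_back(entry);
    }
    return entries;
}

// Number of combinations spanned by the enumerated parameters.
// Externally set parameters do not add combinations here.
unsigned long long count_points(const std::vector<RangeEntry>& entries) {
    unsigned long long points = 1;
    for (const RangeEntry& entry : entries) {
        assert(!entry.values.empty());  // every line needs at least one value
        if (!entry.external) points *= entry.values.size();
    }
    return points;
}
```

Applied to the nine sample lines above, this counts 3 × 5 × 1 × 1 × 1 = 15 combinations for the module's own parameters.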
How can I verify area and latency of my configuration?
For latency and area estimation we rely on Cacti.
Every run generates two files, 'areachecks' and 'latencychecks', containing the calls to Cacti that
are used to determine the latency and area estimates. If you install Cacti yourself, you
can see the estimates.
The option '--checking' stops the simulation after the check files have been
generated.
| ./dse --checking ../benchmarks/powerpc/hello |
How can latency and area be set properly?
For latency and area estimation we rely on Cacti.
To do so, we ask you to implement 'get_area()' and 'get_latency()' functions that simply generate the external Cacti calls to estimate latency and area for your cache. For area, you may generate multiple AreaEstimator calls if your cache consists of several blocks. Latency and area will be set automatically to the values reported by Cacti.
Sample for estimating the area
| fprintf(fp, "AreaEstimator %d %d %d 1 0 0 0 1 %f %d 0 0 0 0 %d\n", |
|         (nAssociativity==nCacheLines ? nLineSize*nAssociativity : |
|                                        nLineSize*nAssociativity*nCacheLines), |
|         nLineSize, |
|         (nAssociativity==nCacheLines ? 0 : nAssociativity), |
|         TECHNOLOGY, |
|         nCachetoCPUDataPathSize, extra_tag_bits); |
Use LatencyEstimator instead of AreaEstimator in 'get_latency()'.
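Since the area and latency calls differ only in the command name, both functions can share one formatting helper. The sketch below builds the same line as the sample above but returns it as a string; `estimator_call` and its `technology` argument (standing in for the TECHNOLOGY constant, whose actual value is not given here) are hypothetical.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Build one estimator line. `tool` is "AreaEstimator" or "LatencyEstimator".
// The fixed 0/1 fields are copied unchanged from the sample in the text.
std::string estimator_call(const std::string& tool, int nLineSize,
                           int nCacheLines, int nAssociativity,
                           double technology, int nCachetoCPUDataPathSize,
                           int extra_tag_bits) {
    // The sample treats nAssociativity == nCacheLines as fully associative.
    const bool fully_associative = (nAssociativity == nCacheLines);
    const int size = fully_associative
                         ? nLineSize * nAssociativity
                         : nLineSize * nAssociativity * nCacheLines;
    char buffer[256];
    const int written = std::snprintf(
        buffer, sizeof buffer,
        "%s %d %d %d 1 0 0 0 1 %f %d 0 0 0 0 %d\n", tool.c_str(), size,
        nLineSize, fully_associative ? 0 : nAssociativity, technology,
        nCachetoCPUDataPathSize, extra_tag_bits);
    assert(written > 0 && written < static_cast<int>(sizeof buffer));
    return std::string(buffer);
}
```

A 'get_area()' implementation could then write `estimator_call("AreaEstimator", ...)` to the output file once per block, and 'get_latency()' could do the same with "LatencyEstimator".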
Why is nAssociativity limited to 16?
The current version of Cacti does not provide latency, area, or power estimates for
associativity higher than 16 unless the cache is fully associative.
This limitation therefore only applies if you rely on Cacti for latency, area, or power estimates.
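A configuration can be checked against this limit before any estimator call is generated. The sketch below assumes the same convention as the area sample, namely that a cache counts as fully associative when nAssociativity equals nCacheLines; the function name is illustrative.

```cpp
#include <cassert>

// Highest set associativity for which estimates are available,
// as stated in the text.
constexpr int kMaxSetAssociativity = 16;

// True when Cacti can be expected to provide estimates for this
// configuration (hypothetical helper).
bool cacti_can_estimate(int nAssociativity, int nCacheLines) {
    assert(nAssociativity >= 1 && nCacheLines >= 1);
    const bool fully_associative = (nAssociativity == nCacheLines);
    return fully_associative || nAssociativity <= kMaxSetAssociativity;
}
```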
I have evaluated millions of design points myself. Can I add my results?
No, because we cannot verify your results without re-simulating them.
However, you can submit your module with your best configuration as its
default settings and see how it ranks.