Project Milestone Report (4/16)
Swamynathan Siva (swamynas) and Andrew Zhao (ahzhao)
Week | Todo |
Week 4 (4/16 - 4/19) | Swamy - Complete modifications to intercon simulator to support P2P comm and selective snooping; Andrew - Complete/Test MSI implementations and changes to coherence interface |
Week 4.5 (4/19 - 4/22) | Swamy/Andrew - Integration of respective components |
Week 5 (4/23 - 4/26) | Overflow of previous tasks/buffer |
Week 5.5 (4/27 - 4/29) | Swamy - Implement perfect predictor; Andrew - Identify/generate interesting workload traces to compare |
Week 6 (4/30 - 5/2) | Andrew - Implement probabilistic imperfect predictor; Swamy - Workload running and data collection |
Week 6.5 (5/3 - 5/5) | Swamy, Andrew - Write final report/presentation |
In our proposal, we hoped to have an initial implementation of CADSS modified with a directory-based cache coherence protocol complete by the milestone. So far, we have spent a substantial amount of time understanding and mapping out CADSS, planning the necessary modifications, and deciding how much simulation depth we need from the cache and interconnect components. At this point, we have implemented two changes that move the simulator toward directory-based coherence: (1) modifying the coherence network from a bus to one with point-to-point capabilities, and (2) extending the existing MI protocol to an MSI protocol that allows shared reads. These two components have been implemented separately; they have not yet been merged and are still being tested and refined. A further complication is that development is in C: we expected to be able to work in C++, but mixing the two languages has been more difficult than anticipated, and we are still experimenting on this front.
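As an illustration of what the MI-to-MSI change involves, the stable-state transitions can be sketched as a small function. This is a minimal sketch and not the CADSS code: the names `msi_state`, `msi_event`, `msi_result`, and `msi_step` are hypothetical, and transient states, acknowledgements, and directory sharer tracking are all omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for illustration; none of these come from CADSS. */
typedef enum { MSI_INVALID, MSI_SHARED, MSI_MODIFIED } msi_state;

typedef enum {
    EV_PROC_READ,    /* local processor load */
    EV_PROC_WRITE,   /* local processor store */
    EV_REMOTE_READ,  /* another cache asks for a shared copy */
    EV_REMOTE_WRITE  /* another cache asks for exclusive ownership */
} msi_event;

typedef struct {
    msi_state next;        /* stable state after the event */
    bool needs_request;    /* a coherence request must be sent */
    bool needs_writeback;  /* dirty data must be supplied or flushed */
} msi_result;

/* Stable-state MSI transitions only; transient states are not modeled. */
msi_result msi_step(msi_state cur, msi_event ev)
{
    msi_result r = { cur, false, false };
    switch (cur) {
    case MSI_INVALID:
        if (ev == EV_PROC_READ)  { r.next = MSI_SHARED;   r.needs_request = true; }
        if (ev == EV_PROC_WRITE) { r.next = MSI_MODIFIED; r.needs_request = true; }
        break;
    case MSI_SHARED:
        /* Reads hit locally; a write must upgrade to Modified. */
        if (ev == EV_PROC_WRITE)   { r.next = MSI_MODIFIED; r.needs_request = true; }
        if (ev == EV_REMOTE_WRITE) { r.next = MSI_INVALID; }
        break;
    case MSI_MODIFIED:
        /* Local reads and writes hit; remote requests force a writeback. */
        if (ev == EV_REMOTE_READ)  { r.next = MSI_SHARED;  r.needs_writeback = true; }
        if (ev == EV_REMOTE_WRITE) { r.next = MSI_INVALID; r.needs_writeback = true; }
        break;
    default:
        assert(!"unknown MSI state");
    }
    return r;
}
```

The Shared state is what MI lacks: under MI a remote read would invalidate the local copy, whereas here a Modified line downgrades to Shared and both caches can keep reading.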
Our target goals and deliverables remain similar, and the list from the proposal still holds. We still hope to show speedup graphs from our predictor simulations as our key result. These would include speedups relative to the snooping bus implementation provided by the simulator, as well as speedups of the coherence predictor, at different prediction accuracies, over a regular directory. We also hope to supplement these with an analysis of which application characteristics are most amenable to speedup from a directory-based protocol. We believe we will be able to meet these goals barring significant roadblocks.
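One way to sweep prediction accuracy is to wrap a perfect (oracle) answer in a predictor that returns it with a configurable probability and otherwise returns a wrong node. The sketch below is only an assumption about how this could look, not our implementation: `imperfect_predictor`, `predict_owner`, and the xorshift64 generator are hypothetical choices, with a seeded generator used so that runs are reproducible.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical structure for illustration; not part of CADSS. */
typedef struct {
    uint64_t rng_state;  /* xorshift64 state, must be nonzero */
    double accuracy;     /* probability in [0, 1] of a correct prediction */
    int num_nodes;       /* number of caches that could own a line */
} imperfect_predictor;

/* Marsaglia xorshift64: small, deterministic PRNG. */
static uint64_t xorshift64(uint64_t *s)
{
    uint64_t x = *s;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *s = x;
    return x;
}

/* Uniform double in [0, 1) built from the top 53 bits. */
static double next_uniform(uint64_t *s)
{
    return (double)(xorshift64(s) >> 11) * (1.0 / 9007199254740992.0);
}

/*
 * Given the true owner (as a perfect predictor would report it), return
 * it with probability `accuracy`; otherwise return a different node
 * chosen uniformly from the remaining ones.
 */
int predict_owner(imperfect_predictor *p, int true_owner)
{
    assert(p->rng_state != 0);
    assert(true_owner >= 0 && true_owner < p->num_nodes);
    if (p->num_nodes < 2 || next_uniform(&p->rng_state) < p->accuracy)
        return true_owner;
    int wrong = (int)(xorshift64(&p->rng_state) % (uint64_t)(p->num_nodes - 1));
    if (wrong >= true_owner)
        wrong++;  /* skip over the correct answer */
    return wrong;
}
```

Setting `accuracy` to 1.0 reproduces the perfect predictor and 0.0 gives a predictor that is always wrong, so a single implementation can cover the whole range of accuracies in the speedup graphs.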
Additionally, we have discovered that the base CADSS simulator simplifies and abstracts systems to a greater degree than we expected. As such, we will also analyze our results from the perspective of simulation fidelity, since some benefits we hope to observe may only be visible under a higher-resolution simulation. For example, we plan to start with the simpleCache, for which source code is provided, keeping in mind that this implementation has infinite capacity. In addition, since the simple processor in the simulator does not model latency on non-memory actions, all our workloads will be severely backend-bound. This may be a good thing in some cases, as it brings memory performance to the forefront, but it may also be bad because it annuls latency-hiding mechanisms that interleave memory with compute.
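The effect of unmodeled compute latency can be seen in a toy cycle count. This is a deliberately crude model under our own assumptions (uniform per-operation latencies, perfect overlap), not how CADSS accounts for time, and the function names are hypothetical.

```c
#include <stdint.h>

/* Toy model: every operation is one memory access plus some compute. */

/* No latency hiding: memory and compute run back to back. */
uint64_t cycles_serial(uint64_t n_ops, uint64_t mem_lat, uint64_t compute_lat)
{
    return n_ops * (mem_lat + compute_lat);
}

/* Perfect latency hiding: the shorter of the two is fully overlapped. */
uint64_t cycles_overlapped(uint64_t n_ops, uint64_t mem_lat, uint64_t compute_lat)
{
    uint64_t longer = mem_lat > compute_lat ? mem_lat : compute_lat;
    return n_ops * longer;
}
```

With 1000 operations, a memory latency of 100 cycles, and a compute latency of 50 cycles, overlap reduces 150000 cycles to 100000. With a compute latency of 0, as in the simple processor, both functions return 100000: every cycle is a memory cycle, and there is nothing left for latency hiding to overlap.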
We do not have preliminary results at this time. Per our plan, simulator development is a prerequisite for collecting results, and it is also the 75% progress point, which we have yet to reach.