
Project Milestone Report (4/16)

Swamynathan Siva (swamynas) and Andrew Zhao (ahzhao)

Updated Schedule

Week | Todo
Week 4 (4/16 - 4/19) | Swamy - Complete modifications to interconnect simulator to support P2P comm and selective snooping; Andrew - Complete/Test MSI implementations and changes to coherence interface
Week 4.5 (4/19 - 4/22) | Swamy/Andrew - Integration of respective components
Week 5 (4/23 - 4/26) | Overflow of previous tasks/buffer
Week 5.5 (4/27 - 4/29) | Swamy - Implement perfect predictor; Andrew - Identify/generate interesting workload traces to compare
Week 6 (4/30 - 5/2) | Andrew - Implement probabilistic imperfect predictor; Swamy - Workload running and data collection
Week 6.5 (5/3 - 5/5) | Swamy, Andrew - Write final report/presentation

Progress to Date

In our proposal, we hoped to have an initial implementation of CADSS modified with a directory-based cache coherence protocol complete by the milestone. So far, we have spent a substantial amount of time understanding and mapping out CADSS, planning the necessary modifications, and deciding what level of simulation depth we need from the cache and interconnect components. At this point, we have implemented two changes toward moving the simulator to directory-based coherence: (1) modifying the coherence network from a bus to one with point-to-point capabilities, and (2) extending the existing MI protocol to an MSI protocol that allows shared reads. These two components have been implemented separately; they have not yet been merged and are still being tested and refined. A further complication is that development is in C. We expected to be able to work in C++, but mixing the two languages has been more difficult than anticipated, and we are still experimenting on this front.
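As a rough illustration of change (2), the sketch below shows the stable-state transitions that distinguish MSI from MI for a single cache line. All type and function names here are hypothetical and are not taken from the CADSS coherence interface; transient states, acknowledgements, and timing are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names for illustration only; not the CADSS interface. */
typedef enum { LINE_INVALID, LINE_SHARED, LINE_MODIFIED } line_state;
typedef enum { MSG_NONE, MSG_GET_SHARED, MSG_GET_MODIFIED } coh_msg;

/* Local processor access: returns the next stable state and writes the
 * coherence request (if any) that the cache would have to issue to *out. */
line_state msi_on_local_access(line_state cur, bool is_write, coh_msg *out)
{
    assert(out != NULL);
    *out = MSG_NONE;
    switch (cur) {
    case LINE_MODIFIED:
        return LINE_MODIFIED;            /* reads and writes both hit */
    case LINE_SHARED:
        if (!is_write)
            return LINE_SHARED;          /* shared read hit: the case MI lacks */
        *out = MSG_GET_MODIFIED;         /* upgrade needs exclusive ownership */
        return LINE_MODIFIED;
    case LINE_INVALID:
        *out = is_write ? MSG_GET_MODIFIED : MSG_GET_SHARED;
        return is_write ? LINE_MODIFIED : LINE_SHARED;
    }
    return cur;
}

/* Request observed from another cache: returns the next stable state and
 * reports whether this cache holds the only up-to-date copy to supply. */
line_state msi_on_remote_request(line_state cur, coh_msg req, bool *must_supply_data)
{
    assert(must_supply_data != NULL);
    *must_supply_data = (cur == LINE_MODIFIED && req != MSG_NONE);
    if (req == MSG_GET_MODIFIED)
        return LINE_INVALID;             /* another writer invalidates us */
    if (req == MSG_GET_SHARED && cur == LINE_MODIFIED)
        return LINE_SHARED;              /* downgrade so both can read */
    return cur;
}
```

Under this sketch, two caches can both hold a line in `LINE_SHARED` after reading it, whereas in MI the second reader would force the first to give the line up.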

Goals and Deliverables Update

Our target goals and deliverables remain similar, and the list from the proposal still holds. We still hope to show speedup graphs from our predictor simulations as our key result. These would include speedups relative to the snooping bus implementation provided by the simulator, as well as speedups for the coherence predictor at different prediction accuracies over a regular directory. We also hope to supplement these with an analysis of which application characteristics are most amenable to speedup from directory-based coherence. We believe we will be able to meet these goals barring significant roadblocks.
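One way to sweep prediction accuracy is to wrap an oracle answer in a predictor that returns it only with a configurable probability. The following is a minimal sketch under that assumption; the struct, the function names, the notion of predicting a single "owner" processor, and the xorshift generator are all illustrative choices and not part of CADSS or of our final design.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical predictor state; names are for illustration only. */
typedef struct {
    double accuracy;  /* probability in [0, 1] of returning the oracle answer */
    uint64_t rng;     /* xorshift64 state, must be nonzero */
} prob_predictor;

/* Small deterministic PRNG so that runs are reproducible for a given seed. */
static uint64_t xorshift64(uint64_t *s)
{
    uint64_t x = *s;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *s = x;
    return x;
}

/* Uniform double in [0, 1) built from the top 53 bits of the generator. */
static double uniform01(uint64_t *s)
{
    return (double)(xorshift64(s) >> 11) * (1.0 / 9007199254740992.0);
}

void predictor_init(prob_predictor *p, double accuracy, uint64_t seed)
{
    assert(accuracy >= 0.0 && accuracy <= 1.0);
    p->accuracy = accuracy;
    p->rng = seed ? seed : 0x9E3779B97F4A7C15ULL;  /* avoid the all-zero state */
}

/* Returns the oracle owner with probability p->accuracy; otherwise returns
 * one of the other num_procs - 1 processors, chosen roughly uniformly. */
int predictor_guess(prob_predictor *p, int oracle_owner, int num_procs)
{
    assert(num_procs >= 2 && oracle_owner >= 0 && oracle_owner < num_procs);
    if (uniform01(&p->rng) < p->accuracy)
        return oracle_owner;
    int wrong = (int)(xorshift64(&p->rng) % (uint64_t)(num_procs - 1));
    return wrong >= oracle_owner ? wrong + 1 : wrong;  /* skip the true owner */
}
```

Setting `accuracy` to 1.0 reduces this to a perfect predictor, so the same code path could serve both predictor variants in the schedule.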

Additionally, we have discovered that the base CADSS simulator simplifies and abstracts systems to a greater degree than we expected. As such, we will also analyze our results from the perspective of simulation fidelity, since some benefits we hope to observe may only be observable under a higher-resolution simulation. For example, we plan to start with the simpleCache, for which source code is provided, keeping in mind that this implementation has infinite capacity. In addition, the simple processor in the simulator does not model latency on non-memory actions, so all our workloads will be severely backend-bound. This may be a good thing in some cases, as it brings memory performance to the forefront, but it may be bad because it annuls latency-hiding mechanisms that interleave memory with compute.

We do not have preliminary results at this time. Per our plan, simulator development is a prerequisite for collecting results, and it is also the 75% progress point, which we have yet to meet.

Points of Concern