Traleika

Traleika Glacier

Goal: Research and mature software technologies addressing major Exascale challenges, and get ready to intercept by 2018-2020

Objectives:

  • Energy efficiency: Software components interoperate, harmonize, exploit hardware features, and optimize the system for energy efficiency
  • Data locality: Programming (PGM) system and system software optimize to reduce data movement
  • Scalability: Software components are scalable and portable to extreme parallelism, O(10^9)
  • Programmability: Support for new (codelet) and legacy (MPI) models, with a gentle slope for productivity
  • Execution model: Objective-function-based, dynamic, global system optimization
  • Self-awareness: Dynamically respond to changing conditions and demands
  • Resiliency: Asymptotically provide reliability of N-modular redundancy using hardware/software co-design; hardware detection, software correction

Scope of the Project

TG-Scope.png

 

Roadmap

TG-Roadmap.png

 

Architecture

Straw-man System Architecture and Evaluation

TG-Strawman-System.png

Data Locality and Bandwidth (BW) Tapering: Why Are They So Important?

TG-Data-Locality.png

 

Programming and Execution Models

TG-Programming-Model.png

Programming model

  • Separation of concerns: Domain specification & HW mapping
  • Express data locality with hierarchical tiling
  • Global, shared, non-coherent address space
  • Optimization and automatic generation of hardware-specific codelets
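
As an illustration of expressing data locality with hierarchical tiling, the sketch below splits an n x n index space into outer tiles and then splits each outer tile into inner tiles. It is only a minimal sketch: the function names (`tiles`, `hierarchical_tiles`), the two-level depth, and the mapping of levels to hardware are assumptions made for this example and are not taken from the project's programming system.

```python
from typing import Iterator, Tuple

# (row_start, row_end, col_start, col_end), with exclusive end indices
Tile = Tuple[int, int, int, int]


def tiles(r0: int, r1: int, c0: int, c1: int, size: int) -> Iterator[Tile]:
    """Split the region [r0, r1) x [c0, c1) into tiles of edge length `size`."""
    for r in range(r0, r1, size):
        for c in range(c0, c1, size):
            # Clamp the tile to the region so ragged edges are handled.
            yield (r, min(r + size, r1), c, min(c + size, c1))


def hierarchical_tiles(n: int, outer: int, inner: int) -> Iterator[Tuple[Tile, Tile]]:
    """Yield (outer_tile, inner_tile) pairs covering an n x n index space.

    Hypothetical mapping: an outer tile could be assigned to one memory
    domain and each inner tile to one codelet working inside that domain.
    """
    for outer_tile in tiles(0, n, 0, n, outer):
        for inner_tile in tiles(*outer_tile, inner):
            yield outer_tile, inner_tile


if __name__ == "__main__":
    for outer_tile, inner_tile in hierarchical_tiles(8, 4, 2):
        print(outer_tile, inner_tile)
```

Because every inner tile is nested inside exactly one outer tile, work on an inner tile only touches data from its enclosing outer tile, which is the property a mapping step could exploit to keep data movement local.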

Execution model

  • Dataflow-inspired, tiny, self-contained codelets
  • Dynamic, event-driven, non-blocking scheduling
  • Dynamic decision to move computation to data
  • Observation-based adaptation (self-awareness)
  • Implemented in the runtime environment
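
The execution-model points above can be illustrated with a toy, single-threaded scheduler: each codelet becomes ready only when all of its inputs have been produced, and then runs to completion without blocking. This is a minimal sketch under stated assumptions; the `Codelet` class and `run` function are hypothetical names for this example and do not reflect the API of OCR, SWARM, or any other runtime named on this page.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List, Tuple


class Codelet:
    """A small, self-contained task that fires once all of its inputs exist."""

    def __init__(self, name: str, fn: Callable, deps: Iterable[str] = ()):
        self.name = name
        self.fn = fn
        self.deps: List[str] = list(deps)
        self.pending = len(self.deps)  # number of unsatisfied dependencies


def run(codelets: List[Codelet]) -> Tuple[Dict[str, object], List[str]]:
    """Execute codelets in dataflow order; return (results, firing order)."""
    # Map each producer to the codelets that consume its output.
    consumers: Dict[str, List[Codelet]] = {c.name: [] for c in codelets}
    for c in codelets:
        for dep in c.deps:
            consumers[dep].append(c)

    ready = deque(c for c in codelets if c.pending == 0)
    results: Dict[str, object] = {}
    order: List[str] = []

    while ready:
        c = ready.popleft()
        # All inputs are available, so the codelet never waits mid-execution.
        results[c.name] = c.fn(*[results[d] for d in c.deps])
        order.append(c.name)
        # Completion is the event that may make downstream codelets ready.
        for nxt in consumers[c.name]:
            nxt.pending -= 1
            if nxt.pending == 0:
                ready.append(nxt)
    return results, order


if __name__ == "__main__":
    graph = [
        Codelet("a", lambda: 2),
        Codelet("b", lambda: 3),
        Codelet("sum", lambda x, y: x + y, ["a", "b"]),
        Codelet("sq", lambda s: s * s, ["sum"]),
    ]
    print(run(graph))
```

A real runtime would dispatch ready codelets to many cores concurrently; the single ready queue here only shows the dependency-counting idea.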

Separation of concerns

  • User application, control, and resource management

 

Programming System Components

TG-System-Components.png

Runtime

  • Different runtimes target different aspects
    • IRR: targets the Intel straw-man architecture
    • SWARM: runtime for a wide range of parallel machines
    • DAR3TS: explores the codelet Program Execution Model (PXM) using portable C++
    • Habanero-C: interfaces with IRR, ties in to CnC
  • All explore related aspects of the codelet PXM
  • Goal: Converge towards the Open Collaborative Runtime (OCR)
    • Enabling technology development for codelet execution
    • Model systems, foster novel runtime systems research
  • Greater visibility through the SW stack enables more efficient computing
    • Break OS/Runtime information firewall

Some Promising Results:

TG-Runtime-Results.png

Runtime Research Agenda

  • Locality-aware scheduling: heuristics for locality and energy efficiency
    • Extensions to standard Habanero-C runtime
  • Adaptive boosting and idling of hardware
    • Avoid energy-expensive unsuccessful steals, which perform no work
    • Turbo mode for a core executing serial code
    • Fine-grained resource (including energy) management
  • Dynamic data-block movement
    • Co-locate codelets and data
    • Move codelets to data
  • Introspection and dynamic optimization
    • Performance counters and sensors provide real-time information
    • Optimization of the system for a user-defined objective
    • (Go beyond energy-proportional computing)
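
To make the "move codelets to data" and locality-aware scheduling items concrete, the sketch below places a codelet on the worker that minimizes a simple cost: bytes that would have to be moved, plus an optional penalty for work already queued on that worker. The cost function, the `load_weight` parameter, and the names `bytes_moved` and `place_codelet` are assumptions made for this example; they are not the heuristics used by Habanero-C or any other runtime listed here.

```python
from typing import Dict, Iterable, List


def bytes_moved(inputs: Iterable[str], location: Dict[str, int],
                sizes: Dict[str, int], worker: int) -> int:
    """Bytes that must be fetched if the codelet runs on `worker`."""
    return sum(sizes[b] for b in inputs if location[b] != worker)


def place_codelet(inputs: List[str], location: Dict[str, int],
                  sizes: Dict[str, int], workers: List[int],
                  load: Dict[int, int], load_weight: float = 0.0) -> int:
    """Pick the worker with the lowest (hypothetical) placement cost.

    cost(w) = bytes moved to w + load_weight * work already queued on w.
    With load_weight == 0 this is pure "move the codelet to its data".
    """
    def cost(w: int) -> float:
        return bytes_moved(inputs, location, sizes, w) + load_weight * load[w]

    # min() returns the first worker among ties, so placement is deterministic.
    return min(workers, key=cost)


if __name__ == "__main__":
    location = {"A": 0, "B": 1, "C": 1}    # data block -> worker holding it
    sizes = {"A": 100, "B": 400, "C": 50}  # data block -> size in bytes
    idle = {0: 0, 1: 0, 2: 0}
    print(place_codelet(["A", "B", "C"], location, sizes, [0, 1, 2], idle))
```

Swapping the cost function for one that weighs energy or latency differently is one way to read "optimization of the system for a user-defined objective": the placement logic stays the same while the objective changes.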

 

Simulators and Tools

TG-Simulators-Tools.png

Simulators: what to expect and what not to expect

  • Evaluation of architecture features for programming (PGM) and execution (EXE) models
  • Relative comparison of performance, energy
  • Data movement patterns to memory and interconnect
  • Relative evaluation of resource management techniques

TG-Simulator-Expect-Not.png

Results Using Simulators

TG-Simulator-Results.png

 

Applications and HW-SW Codesign

TG-App-HW-Co-design.png

 

X-Stack Components

TG-XStack-Components.png

 

Metrics

TG-Metrics.png

Resources

Handouts

The Traleika Glacier project will research and mature software technologies addressing major Exascale challenges, and get ready to intercept by the...

Posters

Traleika Poster Oct 2013

Quad Charts

Traleika Quad Chart Oct 2013