Workshop on the Future of Scientific Workflows
    04/20/2015 - 04/21/2015
    Rockville, Maryland

Mission

The mission of this workshop is to develop requirements for workflow methods and tools in a combined HPC and distributed computing environment, enabling science applications to better manage their end-to-end data flow. The following objectives will be investigated:

  1. Identifying the workflows of representative science use cases in HPC and distributed settings
  2. Understanding the state of the art in existing workflow technologies, including creation, execution, provenance, (re)usability, and reproducibility
  3. Addressing emerging hardware and software trends, both in centralized and distributed environments, as they relate to workflows
  4. Bridging the gap between in situ HPC and distributed postprocessing workflows, for example:
      • Increasing productivity in situ workflows given our experience in distributed workflows, and
      • Understanding the interface between in situ and distributed components of a total workflow

Definitions

The term workflow refers to the sequencing and orchestration of operations, along with attendant tasks such as moving data between workflow processing stages. Workflow management systems automate these processes, freeing the scientist from low-level details.

In the context of scientific computing, a workflow is the orchestration of multiple computing codes in the course of a science campaign. Examples of codes are computational simulations and data analysis/visualization software. It is assumed that a large-scale science campaign consists of several such codes working together. To use a programming analogy, workflows can be considered “programming in the large:” workflows are to codes what programs are to functions or subroutines. The workflow is the outer structure of the individual codes.

Workflows perform two basic functions. They manage the (a) execution of constituent codes and (b) information exchanged between them. Therefore, an instantiation of a workflow must represent both the operations and the data products associated with a particular scientific domain. It should be assumed that individual operations and data products were developed independently in an uncoordinated fashion. Workflows must be usable by the target audience (computational scientists) on target platforms (computing environments) while being represented by abstractions that can be reused across sciences and computing environments and whose performance and correctness can be modeled and verified.
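The two basic functions above can be illustrated with a minimal sketch. This is a hypothetical example, not any particular workflow system: `simulate` and `analyze` stand in for independently developed constituent codes, and `run_workflow` plays the role of the outer structure that sequences them and moves data products between stages.

```python
def simulate(params):
    # Stand-in for a computational simulation code (hypothetical).
    return [params["scale"] * x for x in range(5)]

def analyze(data):
    # Stand-in for a data analysis/visualization code (hypothetical).
    return {"count": len(data), "total": sum(data)}

def run_workflow(params):
    # The workflow is the "outer structure" of the individual codes:
    # it (a) manages execution of each constituent code and
    # (b) manages the information exchanged between them.
    raw = run_stage(simulate, params)   # stage 1: produce data products
    return run_stage(analyze, raw)      # stage 2: hand products to analysis

def run_stage(code, inputs):
    # A real workflow management system would also handle scheduling,
    # data movement, and provenance capture here; this sketch only
    # sequences the calls.
    return code(inputs)

result = run_workflow({"scale": 2})
print(result)
```

The point of the analogy is visible in the structure: just as a program composes functions, the workflow composes whole codes and is the natural place to record what ran and which data products flowed between stages.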


Call for White Papers

We are soliciting your input on scientific computing workflows in the form of 1-2 page white papers. White papers are voluntary and may be used to generate a report for future DOE funding opportunities. Your white paper will be visible to all workshop participants. A list of suggested topics and driving questions appears below. You may address one or more of these topics but are not limited to them. Please clearly indicate the topic(s) you are addressing.

Due date: Please complete white papers by April 10, 2015 and submit them using this website.

Suggested topics and driving questions:

  1. Science applications / use cases of workflows
    What are the needs of science communities in terms of managing large-scale workflows? What are the current practices? What would you like to do? How are the technological trends (better instruments, bigger data, different computing capabilities) going to affect your work?

  2. State of the art in distributed area (DA) and in situ (IS) workflow management systems
    What is the state of the art in DA and in IS? What are the strengths and weaknesses of the DA and IS workflows? What are the common functions in DA and IS workflows?

  3. Impact of emerging architectures on workflow systems
    How do new extreme scale architectures affect the DA and IS workflows? Possible considerations include power, concurrency, heterogeneity, bottlenecks, scheduling/use policies.

  4. Future needs for extreme-scale DA and IS workflows
    What are the challenges for DA and IS workflows going forward? These may include data management, computation scheduling, usability, verification, provenance, fault tolerance, performance, and others.

  5. The interface between DA and IS workflow systems
    In an overall science campaign, what is the interface between IS and DA workflows? How can the gap be bridged? How is information (data and metadata products) transferred from IS to DA systems?


Venue

Hilton Washington DC/Rockville Hotel & Executive Meeting Center
1750 Rockville Pike
Rockville, MD 20852
United States

Agenda

Day 1

8:30 Introduction (Ewa Deelman & Tom Peterka)

9:00 Panel: Science use cases; what are current workflow practices and what do scientists need in the future? (Peter Nugent, moderator)

10:30  Break      

11:00  Panel: Workflows state of the art; what is the current state of the art and is there a path to extreme scale? (Manish Parashar, moderator)

12:00  Working lunch

13:30  Parallel breakouts: How do new extreme scale architectures affect workflows?

    New technology trends (Jeff Vetter & Wilf Pinfold, moderators)

    Programming (Mary Hall & Rob Ross, moderators)

    Execution (Alok Choudhary & Pavan Balaji, moderators)

15:00  Break

15:30  Breakout regroup

16:15  Report back and discussion

17:00  Adjourn

Day 2

8:30 Parallel breakouts: Extreme scale workflows; what's needed?

    Workflow system design (Chris Carothers & Doug Thain, moderators)

    Programming, usability (Lavanya Ramakrishnan & Jim Ahrens, moderators)

    Validation (Michela Taufer & Kerstin Kleese van Dam, moderators)

10:00  Break

10:30  Breakout regroup

11:15  Report back and discussion

12:00  Working lunch

13:15  Parallel breakouts: What is the interface between in situ / distributed area workflows?

    Data management (Ian Foster & Ilkay Altintas, moderators)

    Workflow management (Valerio Pascucci & Miron Livny, moderators)

    Productivity and usability (Ken Moreland & Dan Katz, moderators)

15:00  Break, breakout regroup

15:30  Report back and wrap-up

16:30  Adjourn

Resources

Handouts

The increasing complexity of scientific processing and growth in data have led to the emergence of scientific workflows. Today, workflows exist in many...

Presentations

My presentation to the committee -- on Fermilab's program and challenges in the area of data science
Introduction slides
Science use cases panel slides
State of the art panel slides
Rich Carlson lunch talk

Breakout Notes

Ken Moreland & Dan Katz (posted by Tom Peterka on 05/07/2015)
Valerio Pascucci & Miron Livny (posted by Tom Peterka on 05/07/2015)
Ian Foster & Ilkay Altintas (posted by Tom Peterka on 05/07/2015)
Michela Taufer & Kerstin Kleese van Dam (posted by Tom Peterka on 04/27/2015)
Lavanya Ramakrishnan & Jim Ahrens (posted by Tom Peterka on 04/27/2015)
Chris Carothers & Doug Thain (posted by Tom Peterka on 04/27/2015)
Alok Choudhary & Pavan Balaji (posted by Tom Peterka on 04/27/2015)
Mary Hall & Rob Ross (posted by Tom Peterka on 04/27/2015)
Jeff Vetter & Wilf Pinfold (posted by Tom Peterka on 04/27/2015)