The mission of this workshop is to develop requirements for workflow methods and tools in a combined HPC and distributed work environment to enable science applications to better manage their end-to-end data flow. The following objectives will be investigated:
- Identifying the workflows of representative science use cases in HPC and distributed settings
- Understanding the state of the art in existing workflow technologies, including creation, execution, provenance, (re)usability, and reproducibility
- Addressing emerging hardware and software trends, both in centralized and distributed environments, as they relate to workflows
- Bridging the gap between in situ HPC and distributed postprocessing workflows, for example:
  - Increasing the productivity of in situ workflows by drawing on our experience with distributed workflows, and
  - Understanding the interface between the in situ and distributed components of a total workflow
The term workflow refers to the sequencing and orchestration of operations, along with attendant tasks such as moving data between workflow processing stages. Workflow management systems automate these processes, freeing the scientist from their details.
In the context of scientific computing, a workflow is the orchestration of multiple computing codes in the course of a science campaign. Examples of codes are computational simulations and data analysis/visualization software. It is assumed that a large-scale science campaign consists of several such codes working together. To use a programming analogy, workflows can be considered “programming in the large”: workflows are to codes what programs are to functions or subroutines. The workflow is the outer structure around the individual codes.
Workflows perform two basic functions: they manage (a) the execution of constituent codes and (b) the information exchanged between them. Therefore, an instantiation of a workflow must represent both the operations and the data products associated with a particular scientific domain. It should be assumed that the individual operations and data products were developed independently, in an uncoordinated fashion. Workflows must be usable by the target audience (computational scientists) on target platforms (computing environments) while being represented by abstractions that can be reused across sciences and computing environments and whose performance and correctness can be modeled and verified.
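The two basic functions above can be made concrete with a small sketch. The code below is a minimal, hypothetical illustration only, not the API of any actual workflow management system: the `Task` and `Workflow` classes and the toy "simulate"/"analyze" stages are assumptions introduced here to show (a) execution management in dependency order and (b) data exchange between constituent codes.

```python
class Task:
    """One workflow stage wrapping an independently developed code."""
    def __init__(self, name, func, inputs=()):
        self.name = name        # identifier for this stage
        self.func = func        # the "code" this stage runs
        self.inputs = inputs    # names of upstream stages it depends on

class Workflow:
    """Toy orchestrator for the two basic workflow functions."""
    def __init__(self):
        self.tasks = {}

    def add(self, task):
        self.tasks[task.name] = task

    def run(self):
        """Execute tasks in dependency order (function a), passing each
        task the outputs of its upstream tasks (function b)."""
        results = {}
        remaining = dict(self.tasks)
        while remaining:
            # Pick every task whose inputs have all been produced.
            ready = [t for t in remaining.values()
                     if all(i in results for i in t.inputs)]
            if not ready:
                raise RuntimeError("cycle or missing dependency")
            for t in ready:
                results[t.name] = t.func(*(results[i] for i in t.inputs))
                del remaining[t.name]
        return results

# Example campaign: a toy "simulation" code feeding an "analysis" code.
wf = Workflow()
wf.add(Task("simulate", lambda: [1, 2, 3]))
wf.add(Task("analyze", lambda data: sum(data), inputs=("simulate",)))
out = wf.run()
```

Real systems add what this sketch omits: scheduling on HPC resources, data movement between machines, provenance capture, and fault tolerance.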
Call for White Papers
We solicit voluntary 1-2 page white papers providing your input on scientific computing workflows; they may inform a report for future DOE funding opportunities. Your white paper will be visible to all participants of the workshop. A list of suggested topics and driving questions appears below. You may address one or more of these topics but are not limited to them. Please clearly indicate the topic(s) you are addressing.
Due date: Please complete white papers by April 10, 2015 and submit them using this website.
Suggested topics and driving questions:
Science applications / use cases of workflows
What are the needs of science communities in terms of managing large-scale workflows? What are the current practices? What would you like to do? How will technological trends (better instruments, bigger data, different computing capabilities) affect your work?
State of the art in distributed area (DA) and in situ (IS) workflow management systems
What is the state of the art in DA and in IS? What are the strengths and weaknesses of DA and IS workflows? What functions do DA and IS workflows have in common?
Impact of emerging architectures on workflow systems
How do new extreme-scale architectures affect DA and IS workflows? Possible considerations include power, concurrency, heterogeneity, bottlenecks, and scheduling/use policies.
Future needs for extreme-scale DA and IS workflows
What are the challenges for DA and IS workflows going forward? These may include data management, computation scheduling, usability, verification, provenance, fault tolerance, and performance, among others.
The interface between DA and IS workflow systems
In an overall science campaign, what is the interface between IS and DA workflows? How can the gap be bridged? How is information (data and metadata products) transferred from IS to DA systems?