One of the greatest research challenges of the 21st century is to effectively understand and leverage the growing wealth of scientific data. To analyze and understand this data, complex computational processes need to be assembled, often requiring the combination of loosely-coupled resources, specialized libraries, distributed computing infrastructure, and Web services.
Workflow and workflow-based systems have recently emerged as an alternative to the ad-hoc approaches to constructing computational tasks that are widely used in the scientific community. These systems can capture complex analysis processes at various levels of detail and systematically record the provenance information necessary for reproducibility, result publication, and sharing. Although the benefits of workflow systems are well known, workflows are hard to create and maintain, and this has been a major barrier to wider adoption of the technology in the scientific domain. Constructing complex analysis processes requires expertise both in the domain of the data being explored and in a number of different analysis and visualization tools. Furthermore, the path from "data to insight" requires a laborious, trial-and-error process, where users successively assemble, modify, and execute multiple workflows. Often, it also entails tightly-coupled collaborative efforts.
We advocate a data-centric view of workflow-based computational processes, where the workflows and information about their evolution are stored, along with their impact on the data they manipulate. This information captures detailed provenance of the steps followed in exploratory processes. We propose a new framework that lets users explore and re-use this detailed provenance information through intuitive interfaces. Our framework consists of two key components: a query-by-example interface, whereby users query workflows through the same familiar interface they use to create them; and a mechanism for semi-automatically creating and refining workflows by analogy, without requiring users to directly manipulate or edit the workflow specifications.
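To make the idea of refining workflows by analogy concrete, here is a minimal sketch and not the VisTrails implementation. It assumes a workflow can be modeled as a set of named modules with parameters plus a set of connections, and that modules in different workflows can be matched simply by name. The `Workflow` class, the `diff` and `apply_by_analogy` functions, and the module and file names are all hypothetical and chosen only for illustration. The sketch computes the difference between two workflows `a` and `b` and replays that difference on a third workflow `c`.

```python
from dataclasses import dataclass


@dataclass
class Workflow:
    """Toy workflow model (an assumption, not the VisTrails data model)."""
    modules: dict      # module name -> {parameter: value}
    connections: set   # {(source module, target module)}


def diff(a, b):
    """Describe how workflow b differs from workflow a."""
    return {
        "add_modules": {m: dict(p) for m, p in b.modules.items()
                        if m not in a.modules},
        "del_modules": set(a.modules) - set(b.modules),
        # Changed or new parameter values; parameter removal is ignored here.
        "set_params": {
            m: {k: v for k, v in b.modules[m].items()
                if a.modules[m].get(k) != v}
            for m in a.modules.keys() & b.modules.keys()
            if a.modules[m] != b.modules[m]
        },
        "add_connections": b.connections - a.connections,
        "del_connections": a.connections - b.connections,
    }


def apply_by_analogy(c, delta):
    """Replay the a -> b difference on c, matching modules by name."""
    modules = {m: dict(p) for m, p in c.modules.items()
               if m not in delta["del_modules"]}
    for m, params in delta["add_modules"].items():
        modules.setdefault(m, dict(params))
    for m, params in delta["set_params"].items():
        if m in modules:
            modules[m].update(params)
    connections = (c.connections - delta["del_connections"]) | delta["add_connections"]
    # Drop connections whose endpoints have no counterpart in the result.
    connections = {(s, t) for s, t in connections
                   if s in modules and t in modules}
    return Workflow(modules, connections)


# Hypothetical example: b adds a smoothing step to a.
pipeline = {("Reader", "Isosurface"), ("Isosurface", "Render")}
a = Workflow({"Reader": {"file": "head.vtk"},
              "Isosurface": {"value": 0.5},
              "Render": {}}, set(pipeline))
b = Workflow({"Reader": {"file": "head.vtk"},
              "Isosurface": {"value": 0.5},
              "Smooth": {"iterations": 10},
              "Render": {}},
             {("Reader", "Isosurface"), ("Isosurface", "Smooth"),
              ("Smooth", "Render")})
# c reads different data with a different isovalue.
c = Workflow({"Reader": {"file": "torso.vtk"},
              "Isosurface": {"value": 0.3},
              "Render": {}}, set(pipeline))

d = apply_by_analogy(c, diff(a, b))
print(sorted(d.modules))
print(sorted(d.connections))
```

In this sketch, `d` keeps the data file and isovalue of `c` while gaining the smoothing step that distinguishes `b` from `a`. Matching modules by name is the weakest assumption: when the target workflow uses different modules, a practical system would need some form of structural or approximate matching, which this sketch does not attempt.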
In this talk, we will describe the framework and demonstrate its use in VisTrails (www.vistrails.org), a publicly available, open-source system.
Dr Juliana Freire
Juliana Freire joined the faculty of the School of Computing at the University of Utah in July 2005. Before that, she was a member of the technical staff in the Database Systems Research Department at Bell Laboratories (Lucent Technologies) and an Assistant Professor at OGI/OHSU. Juliana's research has focused on extending traditional database technology and developing techniques to address new data management problems introduced by the Web and scientific applications. She is an active member of the database community, having co-authored over 60 technical papers and holding 4 U.S. patents. She has participated as a program committee member in over 40 events. She is a vice-chair of WWW2008, served as vice-chair for ICDE2007 and WWW2005, and co-chaired WebDB2003. Her research has been funded by grants from the National Science Foundation, the Department of Energy, and the University of Utah.