BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Chicago
X-LIC-LOCATION:America/Chicago
BEGIN:DAYLIGHT
TZOFFSETFROM:-0600
TZOFFSETTO:-0500
TZNAME:CDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0500
TZOFFSETTO:-0600
TZNAME:CST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20181221T160725Z
LOCATION:D173
DTSTART;TZID=America/Chicago:20181111T103000
DTEND;TZID=America/Chicago:20181111T105500
UID:submissions.supercomputing.org_SC18_sess163_ws_works101@linklings.com
SUMMARY:Reduction of Workflow Resource Consumption Using a Density-based C
 lustering Model
DESCRIPTION:Workshop\nReproducibility, Scientific Computing, Scientific Wo
 rkflows, Workflows, Workshop Reg Pass, HPC, Data Intensive\n\nReduction of
  Workflow Resource Consumption Using a Density-based Clustering Model\n\nZ
 hang, Kremer-Herman, Tovar, Thain\n\nOften, a researcher running a s
 cientific workflow will ask for orders of magnitude too few or too many re
 sources to run their workflow. If the resource requisition is too small, t
 he job may fail due to resource exhaustion; if it is too large, resources 
 will be wasted though the job may succeed. It would be ideal to allocate
  a near-optimal amount of resources to the workflow to ensure all jobs
  succeed and minimize resource waste. We present a strategy for solving
  the resourc
 e allocation problem: (1) resources consumed by each job are recorded by a
  resource monitor tool; (2) a density-based clustering model is proposed f
 or discovering clusters in all jobs; (3) a maximal resource requisition is
  calculated as the ideal amount for each cluster. We ran experiments
  with a
  synthetic workflow of homogeneous tasks as well as the bioinformatics too
 ls Lifemapper, SHRIMP, BWA and BWA-GATK to capture the inherent nature of 
 resource consumption of a workflow, the clustering allowed by the model, a
 nd its usefulness in real workflows. In Lifemapper, the minimum savings
  in time, cores, memory, and disk are 13.82%, 16.62%, 49.15%, and 93.89%,
  respectively. In SHRIMP, BWA, and BWA-GATK, the minimum savings in
  cores, memory, and disk are 50%, 90.14%, and 51.82%, respectively.
  Compared with a fixed resource allocation strategy, our approach
  provides a noticeable reduction in workflow resource consumption.
URL:https://sc18.supercomputing.org/presentation/?id=ws_works101&sess=sess
 163
END:VEVENT
END:VCALENDAR

