BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Chicago
X-LIC-LOCATION:America/Chicago
BEGIN:DAYLIGHT
TZOFFSETFROM:-0600
TZOFFSETTO:-0500
TZNAME:CDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0500
TZOFFSETTO:-0600
TZNAME:CST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20181221T160726Z
LOCATION:D220
DTSTART;TZID=America/Chicago:20181111T141200
DTEND;TZID=America/Chicago:20181111T141500
UID:submissions.supercomputing.org_SC18_sess160_ws_whpc103@linklings.com
SUMMARY:Optimizing Python Data Processing for the DESI Experiment on the N
 ERSC Cori Supercomputer
DESCRIPTION:Workshop\nDiversity, Education, Hot Topics, Workshop Reg Pass\
 n\nOptimizing Python Data Processing for the DESI Experiment on the NERSC 
 Cori Supercomputer\n\nStephey, Thomas, Bailey\n\nThe goal of the Dark Ener
 gy Spectroscopic Instrument (DESI) experiment is to better understand dark
  energy by making the most detailed 3D map of the universe to date. The im
 ages obtained each night over a period of 5 years starting in 2019 will be
  sent to the NERSC Cori supercomputer for processing and scientific analys
 is. \n\nThe DESI spectroscopic pipeline for processing these data is writt
 en exclusively in Python. Writing in Python allows the DESI scientists to 
 write very readable scientific code in a relatively short amount of time. 
 However, the drawback is that Python can be substantially slower than more
  traditional HPC languages like C, C++, and Fortran. \n\nThe goal of this 
 work is to increase the efficiency of the DESI spectroscopic data processi
 ng at NERSC while satisfying their requirement that the software remain in
  Python. As of this writing we have obtained speedups of over 6x and 7x on
  the Cori Haswell and KNL partitions, respectively. Several profiling tech
 niques were used to determine potential areas for improvement including Py
 thon's cProfile, line_profiler, Intel VTune, and TAU. Once we identified e
 xpensive kernels, we used the following techniques: 1) JIT-compiling hotsp
 ots using Numba (the most successful strategy so far), 2) reducing MPI dat
 a transfer where possible (e.g. replacing broadcast operations with scatte
 r), and 3) re-structuring the code to compute and store important data rat
 her than repeatedly calling expensive functions. We will continue using th
 ese strategies and also explore the requirements for future architectures 
 (for example, transitioning the DESI workload to GPUs).
URL:https://sc18.supercomputing.org/presentation/?id=ws_whpc103&sess=sess1
 60
END:VEVENT
END:VCALENDAR