BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Chicago
X-LIC-LOCATION:America/Chicago
BEGIN:DAYLIGHT
TZOFFSETFROM:-0600
TZOFFSETTO:-0500
TZNAME:CDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0500
TZOFFSETTO:-0600
TZNAME:CST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20181221T160728Z
LOCATION:D167/174
DTSTART;TZID=America/Chicago:20181112T163000
DTEND;TZID=America/Chicago:20181112T170000
UID:submissions.supercomputing.org_SC18_sess151_ws_mlhpce121@linklings.com
SUMMARY:Optimizing Machine Learning on Apache Spark in HPC Environments
DESCRIPTION:Workshop\nDeep Learning, Machine Learning, Workshop Reg Pass\n
 \nOptimizing Machine Learning on Apache Spark in HPC Environments\n\nLi, D
 avis, Jarvis\n\nMachine learning has established itself as a powerful tool
  for the construction of decision-making models and algorithms through the
  use of statistical techniques on training data. However, a significant im
 pediment to its progress is the time spent training and improving the accu
 racy of these models. A common approach to accelerating this process is to
  employ multiple machines simultaneously, a trait shared with the field of
  High Performance Computing (HPC) and its clusters. However, existing dist
 ributed frameworks for data analytics and machine learning are designed fo
 r commodity servers, which do not realize the full potential of an HPC clu
 ster.\n\nIn this work, we adapt Apache Spark, a distributed data-flow fram
 ework, to support machine learning in HPC environments. There are inheren
 t challenges to using Spark in this context: memory management, communicat
 ion costs, and synchronization overheads all pose challenges to its effici
 ency. To this end we introduce: (i) MapRDD, a fine-grained distributed dat
 a representation; (ii) a task-based all-reduce implementation; and (iii) a
  new asynchronous Stochastic Gradient Descent (SGD) algorithm using non-bl
 ocking all-reduce. We demonstrate up to a 2.6x overall speedup (or an 11.2
 x theoretical speedup with an Nvidia K80 graphics card) when training the
  GoogLeNet model to classify 10% of the ImageNet dataset on a 32-node clus
 ter. We also demonstrate a comparable convergence rate using the new async
 hronous SGD with respect to the synchronous method.
URL:https://sc18.supercomputing.org/presentation/?id=ws_mlhpce121&sess=ses
 s151
END:VEVENT
END:VCALENDAR