BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Chicago
X-LIC-LOCATION:America/Chicago
BEGIN:DAYLIGHT
TZOFFSETFROM:-0600
TZOFFSETTO:-0500
TZNAME:CDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0500
TZOFFSETTO:-0600
TZNAME:CST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20181221T160728Z
LOCATION:D168
DTSTART;TZID=America/Chicago:20181112T140000
DTEND;TZID=America/Chicago:20181112T142500
UID:submissions.supercomputing.org_SC18_sess140_ws_isav112@linklings.com
SUMMARY:In-Transit Molecular Dynamics Analysis with Apache Flink
DESCRIPTION:Workshop\nData Analytics\, Data Management\, Visualization\,
  Workshop Reg Pass\n\nIn-Transit Molecular Dynamics Analysis with Apache
  Flink\n\nColao\, Raffin\, Mures\, Padrón\n\nIn this paper\, an on-line
  parallel anal
 ytics framework is proposed to process and store in transit all the data
  being generated by a Molecular Dynamics (MD) simulation run\, using
  staging nodes in the same cluster executing the simulation. The
  implementation and deployment of such a parallel workflow with standard
  HPC tools\, managing problems such as data partitioning and load
  balancing\, can be a hard task for scientists. In this paper we propose
  to leverage Apache Flink\, a scalable stream processing engine from the
  Big Data domain\, in this HPC context. Flink enables analyses to be
  programmed within a simple window-based map/reduce model\, while the
  runtime takes care of the deployment\, load balancing\, and fault
  tolerance. We build a complete in-transit analytics workflow\,
  connecting an MD simulation to Apache Flink and to a distributed
  database\, Apache HBase\, to persist all the desired data. To
  demonstrate the expressivity of this programming model and its
  suitability for HPC scientific environments\, two common analytics in
  the MD field have been implemented. We assessed the performance of this
  framework\, concluding that it can handle simulations of sizes used in
  the literature while providing an effective and versatile tool for
  scientists to easily incorporate on-line parallel analytics in their
  current workflows.
URL:https://sc18.supercomputing.org/presentation/?id=ws_isav112&sess=sess1
 40
END:VEVENT
END:VCALENDAR

