BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Chicago
X-LIC-LOCATION:America/Chicago
BEGIN:DAYLIGHT
TZOFFSETFROM:-0600
TZOFFSETTO:-0500
TZNAME:CDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0500
TZOFFSETTO:-0600
TZNAME:CST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20181221T160903Z
LOCATION:C2/3/4 Ballroom
DTSTART;TZID=America/Chicago:20181115T083000
DTEND;TZID=America/Chicago:20181115T170000
UID:submissions.supercomputing.org_SC18_sess324_post151@linklings.com
SUMMARY:Performance Evaluation of the NVIDIA Tesla V100: Block Level Pipel
 ining vs. Kernel Level Pipelining
DESCRIPTION:Poster\nTech Program Reg Pass, Exhibits Reg Pass\n\nPerformanc
 e Evaluation of the NVIDIA Tesla V100: Block Level Pipelining vs. Kernel L
 evel Pipelining\n\nCui, Scogland, de Supinski, Feng\n\nAs accelerators bec
 ome more common, expressive and performant, interfaces for them become eve
 r more important. Programming models like OpenMP offer simple-to-use but p
 owerful directive-based offload mechanisms. By default, however, these m
 odels naively copy data to or from the device without overlapping the t
 ransfers with computation. Achieving high performance can require exte
 nsive hand-tuning to apply optimizations such as pipelining. To pipel
 ine a task, users must manually partition it into multiple chunks and t
 hen launch multiple sub-kernels. This approach can suffer from high ke
 rnel launch overhead, and its hyperparameters must be carefully tuned t
 o achieve optimal performance. To ameliorate these issues, we propose a b
 lock-level pipelining approach that overlaps data transfers and computa
 tion within a single kernel, handled by different streaming multiproce
 ssors on the GPU. Our results show that, without exhaustive tuning, our a
 pproach delivers stable performance of 95% to 108% of the best-tuned r
 esults achieved with traditional kernel-level pipelining on NVIDIA V10
 0 GPUs.
URL:https://sc18.supercomputing.org/presentation/?id=post151&sess=sess324
END:VEVENT
END:VCALENDAR