BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Chicago
X-LIC-LOCATION:America/Chicago
BEGIN:DAYLIGHT
TZOFFSETFROM:-0600
TZOFFSETTO:-0500
TZNAME:CDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0500
TZOFFSETTO:-0600
TZNAME:CST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20181221T160726Z
LOCATION:D220
DTSTART;TZID=America/Chicago:20181111T145600
DTEND;TZID=America/Chicago:20181111T145800
UID:submissions.supercomputing.org_SC18_sess160_ws_whpc117@linklings.com
SUMMARY:Challenges of Performance Portability for Fortran Unstructured Mes
 h Codes
DESCRIPTION:Workshop\nDiversity, Education, Hot Topics, Workshop Reg Pass\
 n\nChallenges of Performance Portability for Fortran Unstructured Mesh Cod
 es\n\nHsu, Neill Asanza\n\nWhat pathways exist for Fortran performance por
 tability to exascale? Fortran-based codes present different challenges for
  performance portability and productivity. Ideally, we want to develop one
  codebase that can run on many different HPC architectures. In reality, ea
 ch architecture has its own idiosyncrasies, requiring architecture-specifi
 c code. Therefore, we strive to write code that is as portable as possible
  to minimize the amount of development and maintenance effort. This
  project investigates how different approaches to parallel optimization
  impact the performance portability of unstructured mesh Fortran codes.
  In addition, it explores the productivity challenges posed by
  limitations in software tool and compiler support that are unique to
  Fortran. For this study, we use the Truchas software, a casting
  manufacturing simulation code, and develop initial ports of its
  computational kernels to OpenMP on CPUs, OpenMP offload on GPUs, and
  CUDA. Because no CUDA Fortran compiler is compatible with Truchas, the
  kernels must be rewritten in CUDA C and called from Fortran through
  a C interface. Meanwhile, only the IBM xlf compiler currently supports
  OpenMP offload to GPUs, and that support is still immature.
  In addition to the difficulties that Fortran brings, unstructured
  meshes involve more complex data access patterns. From an analysis of
  the Truchas gradient calculation kernel, we show some success in
  performance and portability, along with some issues unique to Fortran
  unstructured mesh codes. Through this study, we hope to encourage users
  and vendors to focus on productive pathways for developing Fortran
  applications for exascale architectures.
URL:https://sc18.supercomputing.org/presentation/?id=ws_whpc117&sess=sess1
 60
END:VEVENT
END:VCALENDAR