BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Chicago
X-LIC-LOCATION:America/Chicago
BEGIN:DAYLIGHT
TZOFFSETFROM:-0600
TZOFFSETTO:-0500
TZNAME:CDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0500
TZOFFSETTO:-0600
TZNAME:CST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20181221T160727Z
LOCATION:D172
DTSTART;TZID=America/Chicago:20181112T112000
DTEND;TZID=America/Chicago:20181112T114500
UID:submissions.supercomputing.org_SC18_sess168_ws_ia106@linklings.com
SUMMARY:Impact of Traditional Sparse Optimizations on a Migratory Thread A
 rchitecture
DESCRIPTION:Workshop\nArchitectures, Data Analytics, Graph Algorithms, Wor
 kshop Reg Pass\n\nImpact of Traditional Sparse Optimizations on a Migrator
 y Thread Architecture\n\nRolinger, Krieger\n\nAchieving high performance f
 or sparse applications is challenging due to irregular access patterns and
  weak locality. These properties preclude many static optimizations and de
 grade cache performance on traditional systems. To address these challenge
 s, novel systems such as the Emu architecture have been proposed. The Emu 
 design uses light-weight migratory threads, narrow memory, and near-memory
  processing capabilities to address weak locality and reduce the total loa
 d on the memory system. Because the Emu architecture is fundamentally diff
 erent from cache-based hierarchical memory systems, it is crucial to under
 stand the cost-benefit tradeoffs of standard sparse algorithm optimization
 s on Emu hardware. In this work, we explore sparse matrix-vector multiplic
 ation (SpMV) on the Emu architecture. We investigate the effects of differ
 ent sparse optimizations such as dense vector data layouts, work distribut
 ions, and matrix reorderings. Our study finds that initially distributing 
 work evenly across the system is inadequate to maintain load balancing ove
 r time due to the migratory nature of Emu threads. In severe cases, matri
 x sparsity patterns produce hot-spots as many migratory threads converge o
 n a single resource. We demonstrate that known matrix reordering technique
 s can improve SpMV performance on the Emu architecture by as much as 70% b
 y encouraging more consistent load balancing. This can be compared with a 
 performance gain of no more than 16% on a cache-memory-based system.
URL:https://sc18.supercomputing.org/presentation/?id=ws_ia106&sess=sess168
END:VEVENT
END:VCALENDAR

