What: JVM Challenges and Directions in the Multicore Era
Who: Dr. Cliff Click (JVM Challenges and Directions in the Multicore Era; The Azul Pauseless GC Algorithm)
When: October 15, 2008 6:00 PM
Where: Google Office - 111 8th Ave 4th Floor New York NY 10011 - Google Maps
Description:

Google Engineering Offices

76 Ninth Avenue (between 15th/16th St), 4th Floor

New York, NY 10011



IF YOU DO NOT REGISTER, YOU WILL NOT GET IN. PLEASE BE
PROMPT.






JVM Challenges and Directions in the Multicore Era



Available core counts are going up, up, up! Intel is shipping
quad-core chips; Sun’s Rock has (effectively) 64 CPUs and
Azul’s hardware nearly a thousand cores. How do we use all
those cores effectively? The JVM proper can directly make use of a
small number of cores (JIT compilation, profiling), and garbage
collection can use about 20 percent more cores than the application
is using to make garbage, but this hardly gets us to four cores.
Application servers and transactional (J2EE/bean) applications
scale well with thread pools to about 40 or 60 CPUs, and then
internal locking starts to limit scaling. Unless your application
(such as data mining, risk analysis, or, heaven forbid, a
Fortran-style weather-prediction application) has embarrassingly
parallel data, how can you use more CPUs to get more performance?
How do you debug the million-line concurrent program? “Locking”
paradigms (lock ranking, visual inspection) appear to be nearing
the limits of program sizes that are understandable and
maintainable. “Transactions,” the hot new academic
solution to concurrent-programming woes, have their own unsolved
issues (open nesting, “wait,” livelock, significant
slowdowns without contention). Neither locks nor transactions
provide compiler support, such as atomic sets, for keeping the
correct variables guarded by the correct synchronization.
Application-specific programming, such as stream programming or
graphics, is, well, application-specific. Tools (debuggers, static
analyzers, profilers) and libraries (JDK Concurrent utilities) are
necessary but not sufficient. Where is the general-purpose
concurrent programming model? Cliff claims that we need another
revolution in thinking about programs, especially for the JVM.
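
To make the point about atomic sets concrete, here is a minimal Java sketch. It is an illustration only, not taken from the talk: the class names Range and RangeDemo, the fields lo and hi, and the invariant lo <= hi are all hypothetical. The two fields are meant to be guarded by one lock, but that rule lives only in comments; the compiler accepts the unguarded method just as readily as the guarded ones.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical example: lo and hi form one "atomic set" with invariant lo <= hi.
final class Range {
    private final ReentrantLock lock = new ReentrantLock();
    private int lo = 0; // guarded by lock (a convention only; javac does not check it)
    private int hi = 0; // guarded by lock (same convention)

    // Updates both fields under the lock so readers never see half an update.
    void set(int newLo, int newHi) {
        if (newLo > newHi) throw new IllegalArgumentException("lo > hi");
        lock.lock();
        try {
            lo = newLo;
            hi = newHi;
        } finally {
            lock.unlock();
        }
    }

    // Reads both fields under the same lock and returns a consistent pair.
    int[] snapshot() {
        lock.lock();
        try {
            return new int[] { lo, hi };
        } finally {
            lock.unlock();
        }
    }

    // Compiles without complaint, yet reads the pair with no lock held,
    // so a concurrent set() may be observed halfway through.
    int unsafeWidth() {
        return hi - lo;
    }
}

final class RangeDemo {
    // Runs a writer thread against a reader and counts snapshots with lo > hi.
    static int countViolations(int iterations) {
        Range r = new Range();
        Thread writer = new Thread(() -> {
            for (int i = 0; i < iterations; i++) r.set(i, i + 1);
        });
        writer.start();
        int violations = 0;
        for (int i = 0; i < iterations; i++) {
            int[] s = r.snapshot();
            if (s[0] > s[1]) violations++;
        }
        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return violations;
    }

    public static void main(String[] args) {
        System.out.println("violations: " + countViolations(100_000));
    }
}
```

The locked snapshot() never observes a broken pair, but nothing stops a later maintainer from calling unsafeWidth() or adding another unguarded accessor. That gap between intent and what the compiler enforces is what the abstract is pointing at.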



The Azul Pauseless GC Algorithm



Modern transactional, response-time-sensitive applications have
run into practical limits on the size of garbage-collected heaps.
The heap can only grow until GC pauses exceed the response-time
limits. Sustainable, scalable concurrent collection has become a
feature worth paying for. Azul Systems has built a custom system
(CPU, chip, board, and OS) specifically to run garbage-collected
virtual machines. The custom CPU includes a read barrier
instruction. The read barrier enables a highly concurrent (no
stop-the-world phases), parallel and compacting GC algorithm. The
Pauseless algorithm is designed for uninterrupted application
execution and consistent mutator throughput in every GC phase.
Beyond the basic requirement of collecting faster than the
allocation rate, the Pauseless collector is never in a
“rush” to complete any GC phase. No phase places an
undue burden on the mutators, nor do phases race to complete before
the mutators produce more work. Portions of the Pauseless algorithm
also feature a “self-healing” behavior which limits
mutator overhead and reduces mutator sensitivity to the current GC
state. We present the Pauseless GC algorithm and the supporting
hardware features that enable it.
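
As a rough illustration of what a “self-healing” read barrier could look like, here is a toy, single-threaded software simulation in Java. It is a sketch under stated assumptions and not Azul's implementation: the real barrier is a hardware instruction, and the class ToyHeap, its integer "addresses," and the forwarding table are all invented for this example. The idea shown is that when a load finds a stale reference, the slow path repairs the memory slot it was loaded from, so later loads of that slot take the fast path.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model: references are plain ints standing in for addresses.
final class ToyHeap {
    private final int[] slots; // each slot holds one reference
    private final Map<Integer, Integer> forwarding = new HashMap<>(); // old -> new address
    private int slowPathHits = 0;

    ToyHeap(int[] initialRefs) {
        slots = initialRefs.clone();
    }

    // Collector side: record that the object at oldAddr now lives at newAddr.
    // Slots that still hold oldAddr are left stale on purpose.
    void relocate(int oldAddr, int newAddr) {
        forwarding.put(oldAddr, newAddr);
    }

    // Mutator side: every reference load goes through this barrier.
    int loadRef(int slot) {
        int ref = slots[slot];
        Integer moved = forwarding.get(ref);
        if (moved == null) {
            return ref; // fast path: reference is current
        }
        slowPathHits++;      // slow path: the reference was stale
        slots[slot] = moved; // self-healing: fix the slot it was loaded from
        return moved;
    }

    int slowPathHits() {
        return slowPathHits;
    }

    public static void main(String[] args) {
        ToyHeap heap = new ToyHeap(new int[] { 100, 200, 100 });
        heap.relocate(100, 900);
        System.out.println(heap.loadRef(0) + " after " + heap.slowPathHits() + " slow-path hit(s)");
        System.out.println(heap.loadRef(0) + " after " + heap.slowPathHits() + " slow-path hit(s)");
    }
}
```

In this model each stale slot costs the mutator one slow-path trip at most, regardless of how often it is read afterward. The sketch ignores concurrency, chained relocations, and everything else a real collector must handle.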