| Description: |
Google Engineering Offices 76 Ninth Avenue (between 15th/16th St), 4th Floor New York, NY 10011
IF YOU DO NOT REGISTER, YOU WILL NOT GET IN. PLEASE BE PROMPT.
JVM Challenges and Directions in the Multicore Era
Available core counts are going up, up, up! Intel is shipping quad-core chips; Sun’s Rock has (effectively) 64 CPUs, and Azul’s hardware has nearly a thousand cores. How do we use all those cores effectively? The JVM proper can directly make use of a small number of cores (JIT compilation, profiling), and garbage collection can use about 20 percent more cores than the application is using to make garbage, but this hardly gets us to four cores. Application servers and transactional (J2EE/bean) applications scale well with thread pools to about 40 or 60 CPUs, and then internal locking starts to limit scaling. Unless your application (such as a data-mining, risk-analysis, or, heaven forbid, Fortran-style weather-prediction application) has embarrassingly parallel data, how can you use more CPUs to get more performance? How do you debug the million-line concurrent program? “Locking” paradigms (lock ranking, visual inspection) appear to be nearing the limits of program sizes that are understandable and maintainable. “Transactions,” the hot new academic solution to concurrent-programming woes, have their own unsolved issues (open nesting, “wait,” livelock, significant slowdowns without contention). Neither locks nor transactions provide compiler support, such as atomic sets, for keeping the correct variables guarded by the correct synchronization. Application-specific programming, such as stream programming or graphics, is, well, application-specific. Tools (debuggers, static analyzers, profilers) and libraries (JDK Concurrent utilities) are necessary but not sufficient. Where is the general-purpose concurrent programming model? Cliff claims that we need another revolution in thinking about programs, especially for the JVM.
The Azul Pauseless GC Algorithm
Modern transactional, response-time-sensitive applications have run into practical limits on the size of garbage-collected heaps. The heap can only grow until GC pauses exceed the response-time limits. Sustainable, scalable concurrent collection has become a feature worth paying for. Azul Systems has built a custom system (CPU, chip, board, and OS) specifically to run garbage-collected virtual machines. The custom CPU includes a read-barrier instruction. The read barrier enables a highly concurrent (no stop-the-world phases), parallel, and compacting GC algorithm. The Pauseless algorithm is designed for uninterrupted application execution and consistent mutator throughput in every GC phase. Beyond the basic requirement of collecting faster than the allocation rate, the Pauseless collector is never in a “rush” to complete any GC phase. No phase places an undue burden on the mutators, nor do phases race to complete before the mutators produce more work. Portions of the Pauseless algorithm also feature a “self-healing” behavior, which limits mutator overhead and reduces mutator sensitivity to the current GC state. We present the Pauseless GC algorithm and the supporting hardware features that enable it.
|