High Performance Computing: Crays, Clusters, and Centers. What Next?

Gordon Bell and Jim Gray
{GBell, Gray} @ Microsoft.com
Bay Area Research Center, Microsoft Research
Microsoft Corporation, 301 Howard Street, #830, San Francisco, CA 94105

Technical Report MSR-TR-2001-76
August 2001

Abstract: After 50 years of building high performance scientific computers, two major architectures exist: (1) clusters of Cray-style vector supercomputers, and (2) clusters of scalar uni- and multi-processors. Clusters are in transition from (a) massively parallel computers and clusters running proprietary software, to (b) proprietary clusters running standard software, and (c) do-it-yourself Beowulf clusters built from commodity hardware and software. In 2001, only five years after its introduction, Beowulf has mobilized a community around a standard architecture and tools. Beowulf's economics and sociology are poised to kill off the other architectural lines, and will likely affect traditional supercomputer centers as well. Peer-to-peer and Grid communities are beginning to provide significant advantages for embarrassingly parallel problems and for sharing vast numbers of files. The Computational Grid can federate systems into supercomputers far beyond the power of any current computing center. The centers will become super-data and super-application centers. While these trends make high-performance computing much less expensive and much more accessible, there is a dark side: clusters perform poorly on applications that require large shared memory. Although there is vibrant computer architecture activity on microprocessors and on high-end cellular architectures, we appear to be entering an era of supercomputing mono-culture. Investing in next-generation software and hardware supercomputer architecture is essential to improve the efficiency and efficacy of these systems.

Introduction: Vectors and Clusters

High performance comes from parallelism, fast dense circuitry, and packaging technology. In the 1960s Seymour Cray introduced parallel instruction execution using parallel (CDC 6600) and pipelined (CDC 7600) function units, and by 1975 a vector register processor architecture (Cray 1). These were the first production supercomputers. By 1982 Cray Research had combined the multi-processor (X-MP) structure with the vector processor to establish the modern supercomputer architecture. That architecture worked extremely well with Fortran because the innermost loops could be carried out by a few pipelined vector instructions, while multiple processors executed the outermost loops in parallel (the sketch at the end of this section illustrates the pattern). Several manufacturers adopted this architecture for large machines (e.g., Fujitsu, Hitachi, IBM, and NEC), while others built and delivered mini-supercomputers, a.k.a. "Crayettes" (Alliant, Ardent, and Convex), in the early 1980s. In 2001 Cray-style supercomputers remain a significant part (10%) of the market and are vital for applications with fine-grain parallelism on a shared memory (e.g., legacy climate modeling and crash codes). Single-node vector supers have a maximum performance; to go beyond that limit, they must be clustered.
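To make the fine-grain model concrete, the sketch below (our illustration, not code from any Cray library) shows the kind of loop nest this architecture exploits: a stride-1 inner loop that maps onto a few pipelined vector instructions, and an independent outer loop that the processors of an X-MP-style shared-memory machine can divide among themselves. On a cluster, by contrast, the outer loop must be split across nodes and the data exchanged with explicit messages.

    #include <stdio.h>
    #define N 512

    /* Dense matrix-vector product y = A*x: the inner loop maps onto pipelined
       vector instructions; independent iterations of the outer loop can be
       spread across the processors of a shared-memory multiprocessor
       (or, with explicit messages, across the nodes of a cluster). */
    static double A[N][N], x[N], y[N];

    int main(void)
    {
        int i, j;
        for (i = 0; i < N; i++) {              /* fill a small test problem */
            x[i] = 1.0;
            for (j = 0; j < N; j++)
                A[i][j] = (i == j) ? 2.0 : 0.0;
        }
        for (i = 0; i < N; i++) {              /* outer loop: parallel across CPUs */
            double sum = 0.0;
            for (j = 0; j < N; j++)            /* inner loop: stride-1, vectorizable */
                sum += A[i][j] * x[j];
            y[i] = sum;
        }
        printf("y[0] = %f, y[%d] = %f\n", y[0], N - 1, y[N - 1]);
        return 0;
    }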
It has been clear since the early 1980s that clusters of CMOS-based "killer micros" would eventually challenge the performance of the vector supers, with much better price/performance and an ability to scale to thousands of processors and memory banks. By 1985, companies such as Encore and Sequent were building shared-memory multiple-microprocessor systems with a single shared bus that allowed any processor to access all connected memories. Combining a cache with each microprocessor reduced memory traffic by localizing memory accesses and provided a mechanism to observe all memory transactions: by snooping bus transactions, a single coherent memory image could be preserved. Bell predicted that all future computers or computer nodes would be such "multis" [Bell, 1985]. A flurry of new multi designs emerged to challenge custom bipolar and ECL minicomputers and mainframes.

A cluster is a single system composed of interconnected computers that communicate with one another either via message passing or by direct inter-node memory access using a single address space. In a cluster, inter-node communication is 10 to 1,000 times slower than intra-node memory access. Clusters with over 1,000 processors were called massively parallel processors, or MPPs. A constellation is a cluster whose nodes are multis of more than 16 processors; in practice, however, parallel software rarely exploits the shared memory within a node, especially if it is to be portable across clusters.

Tandem introduced its 16-node, uni-processor cluster architecture in 1975, followed in 1983 by Digital's VAXclusters and Teradata's 1,024-node database machine, and by the IBM Sysplex and SP2 in the early 90s. By the late 90s most manufacturers had evolved their micro-based products into clusters, or multicomputers [Bell and Newell, 1971] -- the only known way to build an arbitrarily large, scalable computer system. In the late 1990s, SGI pioneered large, non-uniform memory access (NUMA) shared-memory clusters.

In 1983 ARPA embarked on the Strategic Computing Initiative (SCI) to research, design, build, and buy exotic new, scalable computer architectures. ARPA funded about 20 research efforts and 40 companies to build scalable computers that exploited the new technologies. By the mid-90s, nearly all of these efforts had failed. The main benefit was an increased focus on scalability and parallelism that helped shift the market toward the coarse-grain parallelism that clusters require. Several other forces aided the transition to the cluster architecture: exorbitant tariffs and policies prevented US government agencies from purchasing Japanese supercomputers, and low-cost clusters gave users an escape from hard-to-use, proprietary, and expensive systems.

The shift from vectors to micro-based clusters can be quantified by comparing the Top500 machines of 1993 and 2001 (Table 1). Clusters and constellations from Compaq, Cray, HP, IBM, SGI, and Sun comprise 90% of the Top500. IBM supplied 42% of the 500, including the fastest (12.3 Tflops peak with 8,192 processors) and the slowest (96 Gflops peak with 64 processors). Vector supercomputers, including clustered supers from Fujitsu, Hitachi, and NEC, comprise only 10%. NEC's 128-processor clustered vector supercomputer operates at a peak of 1.28 Tflops. Based on the ratio of their peak speeds, one vector processor is roughly equal to 6-8 microprocessors.
Although a super's peak advertised performance (PAP) is very expensive, its real application performance (RAP) can be competitive with, or better than, a cluster's on some applications. Shared-memory vector computers deliver RAP of 30-50% of PAP; clusters typically deliver 5-15% [Bailey and Buzbee].

Table 1. Comparison of computer types in the Top500 between 1993 and 2001

                    1993                 2001
  Type        Number  Vendors    Number  Vendors  New vendors  Defunct vendors*
  Scalar        133       9        450       6         3              6
  Vector        332       4         50       3         0              1
  SIMD**         35       1          0       0         0              1

  *  Either the computer or the company producing it has ceased to exist.
  ** Single Instruction stream, Multiple Data-operations: an architecture with 16-64 thousand processing units, built to exploit VLSI, that was abandoned as microprocessors overtook it.

High performance computing has evolved into a small, stable, high-priced market for vector supers and constellations. This allows suppliers to lock customers into a unique hardware-software environment, e.g., PowerPC/Linux or SPARC/Solaris. Proprietary environments let vendors price systems at up to $30K per microprocessor, versus $3K per slice for commodity microprocessors, and maintain the margins needed to fund the high end's diseconomies of scale.

Enter Beowulf: Commercial Off-the-Shelf Hardware and Software

The 1993 Beowulf Project goal was to satisfy NASA's requirement for a one-Gflops workstation costing less than $50,000. The idea was to use commercial off-the-shelf (COTS) hardware and software configured as a cluster of machines. In 1994 a 16-node, $40,000 cluster built from Intel 486 computers achieved that goal. In 1997, a Beowulf cluster won the Gordon Bell Prize for performance/price. By 2000, several thousand-node Beowulf computers were operating. In June 2001, 28 Beowulfs were in the Top500, and the total Beowulf population is estimated at several thousand. High schools can now buy and assemble a Beowulf using the recipe in How to Build a Beowulf [Sterling, et al. 2001].

Beowulf is mostly about software. The Beowulf cluster's success stems from its unification of public-domain parallel tools and applications for the scientific software community. It builds on decades of parallel-processing research and on many attempts to apply loosely coupled computers to a variety of applications. The components include:
  - the message passing interface (MPI) programming model (a minimal MPI sketch appears at the end of this section);
  - the parallel virtual machine (PVM) programming, execution, and debugging model;
  - parallel file systems;
  - tools to configure, schedule, manage, and tune parallel applications (e.g., Condor, the Maui scheduler, PBS); and
  - higher-level libraries, e.g., Linpack and the BLAS.

Beowulfs enable do-it-yourself cluster computing using commodity microprocessors, the Linux/GNU or Windows 2000 operating system, and tools that have evolved from the research community. This standard software platform allows applications to run on many computer types, thereby fostering competition and avoiding lock-in. Most importantly, Beowulf is a convergent architecture that will run over multiple computer generations and hence protects application investment. Beowulf fosters a community of users with a common language, skills, and tools, but with diverse hardware. Beowulf is the alternative to the vector supercomputers and proprietary clusters normally found in centers.
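To make the programming model concrete, here is a minimal sketch of the kind of MPI program a Beowulf runs. It is our illustration of the MPI component listed above, not code from the Beowulf distribution: each process computes a partial sum of a numerical integration for pi, and a single collective message combines the partial results. The nodes share no memory, only messages.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, nprocs;
        long i;
        const long n = 1000000;                  /* integration intervals */
        double h, x, sum = 0.0, pi = 0.0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);    /* which node am I?  */
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);  /* how many nodes?   */

        h = 1.0 / (double)n;
        /* Each rank handles every nprocs-th interval: coarse-grain
           parallelism with no shared memory required. */
        for (i = rank; i < n; i += nprocs) {
            x = h * ((double)i + 0.5);
            sum += 4.0 / (1.0 + x * x);
        }
        sum *= h;

        /* One collective message-passing step combines the partial sums. */
        MPI_Reduce(&sum, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("pi is approximately %.12f using %d processes\n", pi, nprocs);

        MPI_Finalize();
        return 0;
    }

With an MPI implementation such as MPICH or LAM (both common on Beowulfs of this era), the program is compiled with mpicc and launched across the cluster with mpirun.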
Centers: Haven't We Seen this Movie?

Over time, we have seen computation and data migrate back and forth: to central facilities when no low-cost alternatives were available, then to distributed VAX minicomputers in the early 80s, then back to a few large NSF and state-supported centers (with personal computers for access) in the mid-80s, then to fewer, larger centers in the late 90s, and now back to build-it-yourself clusters.

Beowulf's economics have important socio-economic-political effects. Now individuals and laboratories believe they can assemble and incrementally grow an any-size supercomputer anywhere in the world. The decision of where and how to compute is a combination of cost, performance, availability (e.g., resource allocation, application programs, ease of access, and service), the application's focus and dataset support, and the need or desire for individual control.

Economics is a key Beowulf advantage -- the hardware and software are much less expensive. Centers add a cost factor of 2 to 5. Centers' costs are explicit: space (air conditioning, power, and raised floors for wiring and chilled-air ducts), networking, and personnel for administration, system maintenance, consulting, etc. A center's explicit costs become implicit when users build and operate their own clusters, because home-grown facilities ride free on organizational overhead that already covers space, networks, and especially personnel.

Sociology is an equally important Beowulf advantage. Its standards-setting and community nature, though not usually part of the decision, eliminates a barrier, because users have access to the generic and profession-specific programs and talent that centers try to provide. Furthermore, a standard platform enables a market for programs and enhances technical recognition.

The situation is similar to the late 70s, when the VAX was introduced and Cray users concluded that it was more productive and cost-effective to own and operate their own smaller, focused centers. Scientists left centers because they were unable to get sufficient computing power compared to a single-user VAX. Although a center's Cray typically outperformed a VAX by a factor of 5-10, and sometimes 100, the performance per dollar usually favored the VAX.

By the mid-80s, government studies bemoaned the lack of supercomputer centers and supercomputer access for university scientists. These researchers were often competing for breakthroughs with their counterparts in the extremely well funded Department of Energy labs -- labs that the Advanced Strategic Computing Initiative (ASCI) would later mandate to reach a 10-Teraflops level (10^13 floating-point operations per second) in 2001, on the way to Petaflops (10^15), in order to fulfill their role as the nation's nuclear stockpile steward. In response, NSF established five centers in 1985. Keeping all of these expensive supercomputer centers at the leading edge was neither affordable nor justified, especially in view of the relatively small number of users. To be competitive, a center has to operate one of the world's largest computers -- about two orders of magnitude larger than what a single researcher can afford. In 1999, in response to these realities, NSF reduced the number of supercomputing centers to two. This concentrated enough funding to achieve several Teraflops at each center. The plan was that every year or so one of the two centers would leapfrog the other with new technology, keeping the centers at the forefront and providing services that no single user could afford.
In 2001, NSF seemed to have forgotten all this and created a third center -- or at least it funded the CPUs and memory of what turned out to be the last Alpha cluster, inherently an orphan; storage was unaffordable. The next act is easy to predict: NSF will under-fund all three centers and then eventually discontinue one of them. The viability of individual centers decreases as more centers dilute the funding. Some centers claim a role with constellations built from large shared-memory multiprocessor nodes, each of which is more powerful than a Beowulf cluster of commodity PC uni- or dual-processors.

The centers idea may already be obsolete in light of Beowulfs, computational Grids, and peer-to-peer computing. Departmental Beowulfs are attractive for a small laboratory because they give low-overhead, dedicated access to nearly the same capability a large center provides. A center typically allocates between 64 and 128 nodes to a job, comparable to the Beowulf that most researchers can build in their own labs (like their VAXen two decades earlier). To be competitive, a supercomputer center needs at least 1,000 new (less than two years old) nodes, large data storage for each user community, and some asset beyond the scope of a small laboratory.

We believe that supercomputer centers may end up as brokers of fully distributed computation, either collocated with instrumentation sites, as in the astronomy community, or supporting peer-to-peer computing, e.g., www.seti.org, which averages 10 Teraflops from 1.6 million participants who donate their computer time, or www.entropia.com, which brokers fully distributed problems to Internet PCs. We see two possible futures for supercomputer centers:
  - Exotic: an application-centric vector or cellular supercomputer for an area like weather prediction, running applications that users have been unable to convert to a Beowulf architecture. Examples are IBM's Blue Gene (www.research.ibm.com/BlueGene) and Japan's Earth Observation Research Center simulator (www.eorc.nasda.go.jp).
  - Data Center: a concentration of peta-scale datasets (and their applications) in one place so that users get efficient and convenient access to the data. The various NASA data access archives and science data centers fit this model. The data center becomes increasingly feasible with an Internet II delivering 1-10 Gbits per second.
Both models cast the supercomputer center as the steward of a unique resource for specific application domains.

Paths to PetaFlops Computing

The dark side of Beowulf commodity clusters is that they perform poorly on applications that require large shared memory. We are concerned that traditional supercomputer architecture is dead and that we are entering a supercomputer mono-culture. At a minimum, we recommend increased investment in research on ultra-high-performance hardware-software architectures, including new programming paradigms, user interfaces, and especially peta-scale distributed databases.

In 1995 a group of eminent architects outlined approaches that would achieve a petaops by 2010 [Sterling, et al. 1995]. Their recommendation was three interconnected machines: (1) a 200-Teraflops multi-threaded, shared-memory architecture; (2) a cluster of 10,000 nodes of 0.1 Teraflops each; and (3) a machine with one million 1-Gflops processor-in-memory nodes.
Until recently, Sterling had been pursuing data-flow architectures with radical packaging and circuit technology. IBM's Blue Gene is following the third path (a million gigaflops chips) to build a petaflops machine by 2005, geared to protein folding and other embarrassingly parallel tasks with limited memory needs (it has a mips-to-megabyte ratio of 20:1 versus 1:1). IBM is also considering a better-balanced machine code-named Blue Light. Only a small number of unconventional, experimental architectures, e.g., Berkeley's processor-in-memory, are being pursued. Because custom system-on-a-chip experiments are so complex and the tools so limited, we can afford only a few such experiments.

Next-generation Beowulfs represent the middle path. It has taken 25 years to evolve the crude clusters we have today. The number of processors has stayed below a maximum of 10,000 for at least five years, with very few applications able to utilize more than 100 processors. By 2010, the cluster is likely to be the principal computing structure. Therefore, research programs that stimulate cluster understanding and training are a good investment for laboratories that depend on the highest-performance machines. Sandia's Computational Plant program is a good example (http://www.cs.sandia.gov/cplant/).

Future Investments

Continued investment to assure that Moore's Law remains valid underlies all of our assumptions about the future. Based on recent advances and predictions, progress is likely to continue for at least another decade. Assuming continued circuit progress, performance will come from a hierarchy of computers, starting with multi-processors on a chip. For example, several commodity chips with multiple processing units are being introduced that will operate at 20 Gflops. As the performance of single, multi-processor chips approaches 100 Gflops, a petaflops machine will need only 10,000 such chips (10^15 / 10^11 = 10^4). On the other hand, it is hardly reasonable to expect a revolutionary technology within this period, because we see no laboratory results pointing to a near-term revolution. Certainly petaflops performance will be achieved by special-purpose computers like IBM's Blue Gene, but they stand alone. SGI builds a shared-memory system with up to 256 processors and then clusters these to form a constellation, but this architecture is low-volume and hence expensive. On the other hand, research into high-speed interconnects such as InfiniBand may make the SGI approach a commodity. It is entirely possible that huge cache-only memory architectures will emerge in the next decade. All these systems require good locality, because on-chip latencies and bandwidth are so much better than off-chip. A processor-in-memory architecture or a multi-system on a chip will no doubt be part of the high-performance equation.

In 2001, the world's 500 top computers consist of about 100,000 processors, each operating at about one gigaflops; together they deliver slightly over 100 Teraflops. Seti@home does not run Linpack, so it does not qualify for the Top500. But Seti@home averages 13 Tflops, making it more powerful than the top three Top500 machines combined. This suggests that Grid and peer-to-peer computing using the Internet II is likely to remain the world's most powerful supercomputer. Beowulf and Grid computing technologies will likely merge in the next decade: when multi-gigabit LANs and WANs become ubiquitous, and when message-passing applications can tolerate high latency, the Grid becomes a Beowulf -- all the LAN-based PCs become Beowulfs and together they form the Grid.
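The peer-to-peer model just described works because every work unit is independent. The sketch below is a hypothetical, stripped-down client loop in that style; it is not code from seti@home or Entropia, and the fetch/analyze/report routines are stand-ins stubbed out so the example compiles. A real client would replace them with project-specific networking and science code.

    #include <stdbool.h>
    #include <stdio.h>

    typedef struct { long id; double data; } work_unit_t;
    typedef struct { long id; double score; } result_t;

    static long next_id = 0;

    /* Stand-in for downloading a work unit from a central server. */
    static bool fetch_work_unit(work_unit_t *wu)
    {
        if (next_id >= 4)
            return false;               /* pretend the server has run dry */
        wu->id = next_id++;
        wu->data = 0.5 * (double)wu->id;
        return true;
    }

    /* Stand-in for hours of purely local computation: no communication
       with other clients, so latency and node failures cost little. */
    static result_t analyze(const work_unit_t *wu)
    {
        result_t r;
        r.id = wu->id;
        r.score = wu->data * wu->data;
        return r;
    }

    /* Stand-in for uploading a few bytes of result to the server. */
    static void report_result(const result_t *r)
    {
        printf("work unit %ld -> score %f\n", r->id, r->score);
    }

    int main(void)
    {
        work_unit_t wu;
        while (fetch_work_unit(&wu)) {  /* download, compute, upload, repeat */
            result_t r = analyze(&wu);
            report_result(&r);
        }
        return 0;
    }

Because the clients never talk to one another, millions of donated Internet PCs can behave as a single supercomputer; applications that need tightly coupled communication, discussed next, are much harder to map onto this model.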
Progress has been great in parallelizing applications that had been challenging in the past (e.g., n-body problems). It is important to continue on this course and parallelize applications heretofore deemed the province of shared-memory multi-processors. These include problems requiring random variable access and adaptive mesh refinement; for example, automotive and aerodynamic engineering, climate and ocean modeling, and applications involving heterogeneous space remain the province of vector multi-processors. It is essential to have a list of such challenge applications in order to log progress; unfortunately, the vector-super community has not provided one.

Although great progress has been made by computational scientists working with computer scientists, the effort to adopt, understand, and train computer scientists in cluster and constellation parallelism has been minimal. Few computer science departments are working with their counterparts in other scientific disciplines to explore the application of these new architectures to scientific problems.

Acknowledgments

An early draft of this paper was circulated and received considerable discussion and constructive criticism. We are especially indebted to David Bailey (NERSC), Bill Buzbee (NCAR), Ken Kennedy (Rice U.), David Lifka (Cornell Theory Center), Ken Miura (U. Kyushu), George Spix (Microsoft), Tom Sterling (CalTech), Peter Denning (GWU), Erich Strohmaier (LBL), and Tadashi Watanabe (NEC) for their constructive comments. The paper is controversial, so it was not possible for us to accept all comments and suggestions.

References

Bailey, D. H. and W. Buzbee. Private communication.
Bell, C. G. and A. Newell, Computer Structures, McGraw-Hill, New York, 1971.
Bell, C. G., "Multis: A New Class of Multiprocessor Computers," Science, Vol. 228, pp. 462-467, April 26, 1985.
Bell, G., "Ultracomputers: A Teraflop Before Its Time," Communications of the ACM, Vol. 35, No. 8, August 1992, pp. 27-45.
Earth Observation Research Center, www.eorc.nasda.go.jp.
Foster, I. and C. Kesselman, editors, The Grid: Blueprint for a New Computing Infrastructure, Morgan Kaufmann, San Francisco, 1999.
IBM Blue Gene web site, www.research.ibm.com/BlueGene.
seti@home web site, www.seti.org.
Sterling, T., P. Messina, and P. H. Smith, Enabling Technologies for Petaflops Computing, MIT Press, Cambridge, MA, July 1995.
Sterling, T., Beowulf PC Cluster Computing with Windows and Beowulf PC Cluster Computing with Linux, MIT Press, Cambridge, MA, 2001.

Footnotes

This work has been submitted for publication to the Communications of the ACM. Copyright may be transferred without further notice, and the publisher may then post the accepted version. A version of this article appears at http://research.microsoft.com/pubs/.

The Top500 is a world-wide roster of the most powerful computers as measured by Linpack. See www.Top500.org.

Although NSF is an independent agency directly funded by Congress, it is subject to varying political winds and climate, including congresspersons, conflicting center and directorate advisory committees, and occasionally its own changing leadership.
At a center (with approximately 600 SP2 processors), one observed: 65% of the users ran on more than 16 processors; 24% on more than 32; 4% on more than 64; 4% on more than 128; and 1% on more than 256.