DL_POLY Benchmarks

DL_POLY

PERFORMANCE of VARIOUS COMPUTERS in COMPUTATIONAL CHEMISTRY


ABSTRACT

This report compares the performance of a number of different computer systems using the DL_POLY molecular simulation code. The benchmark suite comprises six molecular dynamics (MD) calculations. The comparison involves forty-nine computers, including scientific workstations from IBM, Sun, Hewlett Packard, Digital and Silicon Graphics, and Pentium-based PCs.

1. INTRODUCTION

Workstations that have been benchmarked, include those from

We should stress from the outset that our access to much of the hardware evaluated herein has been at best short lived, and has often involved the temporary loan or donation of machines as part of one of the hardware evaluation exercises run at the Daresbury Laboratory. In many cases these machines were not optimally configured in terms of either memory, or high speed disk, and consideration of the results presented here should be viewed in that light.

Following an introductory evaluation of hardware based on the SPEC Benchmarks (section 2), we present in sections 3 and 4 results using the DL_POLY simulation program [1].

Note that the present results are taken from a more detailed report on computational chemistry benchmarks; the associated MS PowerPoint presentation is also available.

2. THE SPEC BENCHMARKS

One of the most useful indicators of CPU performance is provided by the SPEC (``Standard Performance Evaluation Corporation'') benchmarks. This benchmark suite contains non-tuned application-based code to measure processor speed for both integer (SPECint) and floating point (SPECfp) arithmetic. While earlier versions of the suite (e.g. SPECmark89) had certain well-advertised flaws, the more recent offerings, SPECfp95 and SPECint95, have become industry standards in measuring primarily the performance of a system's processor, memory architecture, operating system and compiler.

SPECfp95 is derived from the results of ten floating-point benchmarks compiled with aggressive optimization. It is the geometric mean of ten normalized ratios (one for each floating-point benchmark). SPECint95 is derived from the results of eight integer benchmarks compiled with aggressive optimization. It is the geometric mean of eight normalized ratios (one for each integer benchmark). Note that the level of optimization is not mandated. While highly aggressive optimization is permitted, results derived from benchmarks compiled with conservative optimization (as in SPECbase) can be submitted.
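The geometric-mean construction described above can be sketched in a few lines of Python. This is an illustration only: the function name `spec_rating` and the ten ratios are hypothetical and are not taken from any published SPEC submission.

```python
import math


def spec_rating(ratios):
    """Return the geometric mean of per-benchmark normalized ratios."""
    if not ratios or any(r <= 0 for r in ratios):
        raise ValueError("ratios must be a non-empty sequence of positive numbers")
    # Averaging the logarithms avoids overflow when multiplying many ratios.
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Hypothetical normalized ratios for ten floating-point benchmarks.
example_ratios = [20.0, 25.0, 30.0, 18.0, 22.0, 40.0, 35.0, 28.0, 24.0, 26.0]
print(f"composite rating: {spec_rating(example_ratios):.1f}")
```

Because the mean is geometric, a single very high ratio raises the composite less than it would raise an arithmetic mean, which is one reason the construction is used for a mixed suite.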

SPECfp95 and SPECint95 results for many of the CPUs discussed in this paper are given in Table 1.

An analysis of trends in SPECfp95 ratings up to mid 1998 suggested that the leading CPUs from each of the key workstation vendors exhibited comparable performance figures. As of July 1998, the P2SC/160 CPU in the IBM RS/6000-397 exhibited the most impressive SPECfp95 rating. With a value of 26.6, the P2SC was marginally faster than the HP PA-9000/C240, 1.2 times faster than the HP PA-9000/C200 and SUN Enterprise HPC4500/336, 1.3 times faster than the DEC Alpha 8400/EV5-625 and SUN Ultra-2/300, and 1.4 times faster than the R10k-based SGI Origin2000/195.

This picture changed quite drastically with the arrival of both the EV6/A21264 from Compaq/Digital and the PA8500 CPU from Hewlett Packard, and more recently the EV67/A21264A from Compaq and PA8600 from HP. The 667 MHz EV67 exhibits SPECfp95 ratings of 72.20 and 65.50 in the Compaq ES40 and XP1000 respectively, while the 500 MHz EV6/A21264 shows ratings of 58.7 and 57.7 in the Compaq DS20 and Compaq ES40 respectively. The figure of 54.00 for the 440 MHz PA8500 CPU in the HP PA-9000/J5000 is almost identical to that of the 667 MHz EV67 in the API UP2000/6-667 (53.70, noticeably slower than the same CPU in the Compaq XP1000). The figure of 52.20 for the EV6-based Compaq PW XP1000/500 is almost identical to that of the 400 MHz PA8500 CPU in the HP PA-9000/C3000 (52.40). While the Winterhawk-II CPU from IBM is of comparable performance (SPECfp95 of 50.90), the current range of leading CPUs from other vendors, notably SUN and Silicon Graphics, is markedly inferior (e.g., 34.4 for the 300 MHz R12k in the SGI Origin 2000, and 27.90 for the SUN UltraSPARCIIi in the Ultra80/450). The 200 MHz RS6000 power3 CPU in the IBM/RS6000-43P Model 260 performs at a similar level to these, with a SPECfp rating of 30.10.

The following points should be noted regarding the values specified in Table 1:

  1. SPEC ratings for a number of the systems from SGI have not been published, and are estimates based on related CPUs. A number of other vendors have only published SPECbase_fp95 values, and not SPECfp95 ratings.
  2. Note that values quoted for the Cray T3E systems in Table 1 are estimates based on extrapolations from the corresponding Digital EV5 specifications. No SPEC figures have ever been submitted by Cray/SGI for the T3D or T3E series.

Using the Compaq PW XP1000/667 value of 65.5 to normalise the SPECfp ratings, we would expect the XP1000 and ES40/6-667 to be somewhat ahead of the API UP2000/6-667 and other EV6-based machines (the Compaq Alpha DS20, XP1000/500, ES40/500, DS20/500, Alpha GS140, and Alpha 8400/6-575) and the PA8500-based systems (the HP PA-9000/J5000, PA-9000/C3000 and PA-9000/N4000). These eleven machines, together with the IBM RS/6000-SP/375, appear far superior to the remainder. Based on this performance metric, the PW XP1000/667 is seen to be 2.2 times faster than the power3-based 200 MHz IBM RS/6000-43P/260 and 1.9 times faster than the 300 MHz R12k-based SGI Origin2000. All other CPUs are projected to run at significantly less than half the speed.
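The normalisation used here is simple arithmetic, and a minimal Python sketch makes it explicit. The SPECfp95 values are copied from a handful of Table 1 rows; the helper names `relative_percent` and `reference_speedup` are illustrative and not part of the original analysis.

```python
# SPECfp95 ratings copied from a small subset of Table 1.
SPECFP95 = {
    "Compaq Alpha ES40/667": 72.20,
    "Compaq PW XP1000/667": 65.50,
    "SGI Origin2000/R12k": 34.40,
    "IBM RS/6000-43P": 30.10,
    "SUN SPARC 10/41": 1.38,
}
REFERENCE = "Compaq PW XP1000/667"


def relative_percent(ratings, reference):
    """Express each rating as a whole-number percentage of the reference."""
    ref = ratings[reference]
    return {machine: round(100.0 * value / ref) for machine, value in ratings.items()}


def reference_speedup(ratings, reference, machine):
    """Return how many times faster the reference is than `machine`."""
    return ratings[reference] / ratings[machine]


for machine, pct in relative_percent(SPECFP95, REFERENCE).items():
    print(f"{machine:24s} {pct:4d}%")
print(f"{reference_speedup(SPECFP95, REFERENCE, 'IBM RS/6000-43P'):.1f}")
print(f"{reference_speedup(SPECFP95, REFERENCE, 'SGI Origin2000/R12k'):.1f}")
```

For the rows shown, the percentages agree with the relative-value columns of Table 1, and the two speedups agree with the factors of 2.2 and 1.9 quoted in the text.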

The EV67-, PA8500- and EV6-based machines from Compaq/DEC and HP are also seen to dominate the SPECint95 ratings, with typical values of 37, 32-34 and 24-27 respectively. The 667 MHz EV67 is rated at roughly twice the SGI Origin2000/R12k and SUN Ultra80/450. The value of 24.40 for the IBM RS/6000-SP/375 is seen to be competitive with the EV6-based CPUs. We also note that the SPECint95 ratings of Table 1 suggest that the Pentium III/550 is 1.63 times slower than the Compaq ES40/667, while the SPECfp95 ratings point to the Pentium being a factor of 4.8 times slower. Corresponding values for the AMD Athlon K7/600 are reduced to 1.34 and 3.34 respectively.

When considering the present benchmarking results, there are several factors we wish to consider in assessing the usefulness of the SPEC ratings;

i. Do the SPECfp95 values provide a reliable metric for evaluating the capabilities of hardware in computational chemistry? If so, we would expect to find a close mapping of the ratios for the various chemistry benchmarks onto the SPECfp ratios;

ii. Does any particular CPU consistently ``underperform'' based on the SPECfp criteria? - this would manifest itself as the ratios from the chemistry benchmarks falling below the SPECfp ratios. In particular we shall look for indicators of the memory problems of the SGI O2-R10k [2] impacting on the benchmarks.

We will attempt to address these issues below. We also note that a SPEC FAQ describing the SPEC benchmark suite and the SPEC consortium is periodically posted to comp.benchmarks, and can be found on the WWW at

http://www.specbench.org/spec/faq

An excellent summary of the SPEC benchmarks that is periodically updated is available via anonymous ftp from ftp.cs.toronto.edu in the file /pub/spectable. More SPEC-related information is available at the SPEC WWW site,

http://www.specbench.org

and at the Performance Database Web site,

http://performance.netlib.org/performance/html/spec.html#specsite.

Finally, we note that the next generation of SPEC benchmarks, SPEC CPU2000,

http://www.specbench.org/osg/cpu2000/results/cpu2000.html

has been announced and will replace SPEC95 during the year. At present there is only a limited set of CPU2000 results available and we have not included these in the present report.

3. THE DL_POLY BENCHMARK

The benchmark summarised below is designed to reflect the typical range of simulations undertaken by the molecular dynamicist. It includes 6 calculations carried out using the DL_POLY molecular dynamics code, and includes the following functionality;

The data presented in Table 2 is collected under control of the UNIX command time where available, and includes CPU time (both user and system), total elapsed time and Efficiency, measured as CPU versus elapsed. The total user CPU timings of Table 2 refer to the summed user CPU timings over all 6 calculations of the benchmark. Note that in contrast to the QC benchmark, little I/O is performed by the DL_POLY calculations, so that efficiency should always be high assuming the benchmarks were conducted on a dedicated resource.
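As a rough illustration of the efficiency measure described above (CPU time versus elapsed time), the following Python sketch computes it from the user, system and elapsed figures reported by `time`. The function name `efficiency` is hypothetical; the two sets of timings are copied from Table 2.

```python
def efficiency(user_min, system_min, elapsed_min):
    """Return total CPU time (user + system) as a fraction of elapsed time."""
    if elapsed_min <= 0:
        raise ValueError("elapsed time must be positive")
    return (user_min + system_min) / elapsed_min


# (user, system, elapsed) in minutes, copied from Table 2.
TIMINGS = {
    "HP PA-9000/N4000": (15.9, 0.1, 16.1),
    "Compaq Alpha DS10": (16.3, 0.2, 33.1),
}

for machine, (user, system, elapsed) in TIMINGS.items():
    print(f"{machine:20s} {100.0 * efficiency(user, system, elapsed):5.1f}%")
```

A value close to 100% is what one would expect from a dedicated machine doing little I/O. The much lower value for the DS10 corresponds to the flagged elapsed time in Table 2, and suggests that the machine was not dedicated to the benchmark during that run.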

The total CPU timings of Table 2 suggest that the Digital/Compaq Alpha EV67 CPU is dominant, with the ES40/667 and XP1000/667 showing comparable run times (10.8 and 11.2 mins. respectively), both ca. 1.2 times faster than the same CPU in the API UP2000/667. The EV67 is seen to be ca. 1.2-1.3 times faster than the EV6-based machines from Compaq, the AlphaServer GS140 (13.9 minutes), AlphaServer DS20 and ES40 (14.3 and 14.5 minutes respectively) and XP1000/500 (14.8 mins). The EV67-based Compaq XP1000/667 outperforms the EV5-based DEC Alpha 8400/5-625 (19.8 mins.) and Alpha PW/600AU (20.2 mins.) by a factor of 1.78. Of the 11 leading machines of Table 2, only two are not Alpha-based, with the Hewlett Packard (HP PA-9000/N4000) and Silicon Graphics (SGI Origin2000/R12k) systems outperformed by the XP1000/667 by factors of 1.42 and 1.67 respectively.
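The relative-performance column of Table 2 and the factors quoted above follow directly from the user CPU times. A minimal Python sketch, using a few rows copied from Table 2 and hypothetical helper names, is:

```python
# Total user CPU time in minutes, copied from a subset of Table 2.
USER_CPU_MIN = {
    "Compaq Alpha ES40/667": 10.8,
    "Compaq PW XP1000/667": 11.2,
    "API UP2000 6/667": 13.6,
    "HP PA-9000/N4000": 15.9,
    "SGI Origin2000/R12k": 18.7,
}
REFERENCE = "Compaq PW XP1000/667"


def relative_performance(times, reference):
    """Performance relative to the reference, in percent (shorter time is better)."""
    ref = times[reference]
    return {machine: round(100.0 * ref / t) for machine, t in times.items()}


def slowdown(times, reference, machine):
    """Factor by which the reference outperforms `machine`."""
    return times[machine] / times[reference]


print(relative_performance(USER_CPU_MIN, REFERENCE))
print(f"{slowdown(USER_CPU_MIN, REFERENCE, 'HP PA-9000/N4000'):.2f}")
print(f"{slowdown(USER_CPU_MIN, REFERENCE, 'SGI Origin2000/R12k'):.2f}")
```

Note that the ratio is inverted compared with the SPEC ratings: SPEC values are "higher is better", whereas timings are "lower is better", so the reference time goes in the numerator.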

When considering the performance of the CPUs from SUN, IBM and Hewlett Packard, we would note the following:

4. SUMMARY

As a summary of this work, we present in Table 3 the relative performance of 93 of the leading CPUs against the Compaq XP1000/667 in terms of the SPECfp95 and SPECint95 benchmarks, and those from the present DL_POLY evaluation and from the Matrix-97, Chemistry Kernels and GAMESS-UK benchmarks (detailed in computational chemistry benchmarks).

Based on the published SPECfp95 ratings, and normalising with respect to the Compaq XP1000/667 value of 65.5, we would expect (see section 2) the XP1000 and ES40/6-667 to be somewhat ahead of the API UP2000/6-667 and other EV6-based machines (the Compaq Alpha DS20, XP1000/500, ES40/500, DS20/500, Alpha GS140, and Alpha 8400/6-575) and the PA8500-based systems (the HP PA-9000/J5000, PA-9000/C3000 and PA-9000/N4000). These eleven machines, together with the IBM RS/6000-SP/375, appear far superior to the remainder. Based on this performance metric, the PW XP1000/667 is seen to be 2.2 times faster than the power3-based 200 MHz IBM RS/6000-43P/260 and 1.9 times faster than the 300 MHz R12k-based SGI Origin2000. All other CPUs are projected to run at significantly less than half the speed. Based on the relative SPECfp values given in the table, we expect a factor of 52.3 between the fastest processor and the slowest, the SUN SPARC/10-41.

The EV67-, PA8500- and EV6-based machines from Compaq/DEC and HP are also seen to dominate the SPECint95 ratings, with typical values of 37, 32-34 and 24-27 respectively. The 667 MHz EV67 is rated at roughly twice the SGI Origin2000/R12k and SUN Ultra80/450. The value of 24.40 for the IBM RS/6000-SP/375 is seen to be competitive with the EV6-based CPUs. We also note that the SPECint95 ratings of Table 1 suggest that the Pentium III/550 is 1.63 times slower than the Compaq ES40/667, while the SPECfp95 ratings point to the Pentium being a factor of 4.8 times slower. Corresponding values for the AMD Athlon K7/600 are reduced to 1.34 and 3.34 respectively.

When analysing the results, we wish to consider based on the present evaluation exercise, (i) do the SPECfp95 values provide a reliable metric for evaluating the capabilities of hardware in computational chemistry? If so, we would expect to find a close mapping of the ratios for the various chemistry benchmarks onto the SPECfp95 ratios, (ii) does any particular CPU consistently ``underperform'' based on the SPECfp criteria? - this would manifest itself as the ratios from the chemistry benchmarks falling below the SPECfp ratios, and (iii) do the ``simple'' Matrix and Chemistry Kernel benchmarks lead to the same conclusions as the GAMESS-UK and DL_POLY benchmarks?
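One way to make questions (i) and (ii) concrete is to divide each machine's application-benchmark percentage by its SPECfp95 percentage: a value near 1 indicates a close mapping, and a value well below 1 indicates underperformance relative to SPECfp95. The Python sketch below does this for four Table 3 rows, using the DL_POLY column. The threshold of 0.8 is an arbitrary cut-off chosen for illustration, not a criterion used in this report.

```python
# (SPECfp95 %, DL_POLY CPU %) relative to the XP1000/667, copied from Table 3.
ROWS = {
    "Compaq Alpha DS20": (90, 78),
    "HP PA-9000/J5000": (82, 57),
    "SGI Origin2000/R12k": (53, 60),
    "SUN Ultra80/450": (43, 25),
}


def benchmark_to_spec_ratio(spec_pct, bench_pct):
    """Ratio of application performance to SPECfp95 performance."""
    return bench_pct / spec_pct


def underperformers(rows, threshold=0.8):
    """Machines whose application ratio falls below threshold x SPECfp95.

    The threshold is an illustrative assumption, not taken from the report.
    """
    return sorted(
        machine
        for machine, (spec_pct, bench_pct) in rows.items()
        if benchmark_to_spec_ratio(spec_pct, bench_pct) < threshold
    )


for machine, (spec_pct, bench_pct) in ROWS.items():
    print(f"{machine:22s} {benchmark_to_spec_ratio(spec_pct, bench_pct):.2f}")
print(underperformers(ROWS))
```

The same comparison can be repeated with the Matrix-97, Chemistry Kernels or GAMESS-UK columns in place of the DL_POLY column.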

In the interests of providing a single Performance Index (PI) covering all machines of Table 3, we have provided an average value of the Matrix-97, Chemistry Kernels and GAMESS-UK benchmarks. Note that at this stage we have not included the DL_POLY benchmark results in computing the PI, since these results are only available on a subset of the machines. The value of such an index is somewhat debatable, for not only does it omit the DL_POLY benchmark, it also weights the chemistry kernels on an equal footing with the end-application codes, which is not ideal. Note that we have only provided PI estimates for those machines where data on all three benchmarks is available.
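Read literally, the index is an unweighted mean of three percentages. The Python sketch below follows that reading and returns no value when any of the three results is missing; the function name `performance_index` is hypothetical, and the input values are copied from Table 3.

```python
def performance_index(matrix97, kernels, gamess_uk):
    """Unweighted mean of the three benchmark percentages, or None if any is missing."""
    values = (matrix97, kernels, gamess_uk)
    if any(v is None for v in values):
        return None
    return sum(values) / len(values)


# (Matrix-97, Chemistry Kernels, GAMESS-UK) percentages copied from Table 3.
print(round(performance_index(107, 116, 113)))  # Compaq Alpha ES40/667
print(round(performance_index(77, 90, 75)))     # Compaq Alpha DS20
```

For these two machines the rounded results match the Final Index column of Table 3 (112 and 81). Any weighting scheme that favoured the end-application codes over the kernels would be a departure from what is described here.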

Summarising the major conclusions from the figures of Table 3, we would note the following;

5. REFERENCES

1
DL_POLY is a parallel molecular dynamics simulation package developed at Daresbury Laboratory by W. Smith and T.R. Forester under the auspices of the Engineering and Physical Sciences Research Council (EPSRC) for the EPSRC's Collaborative Computational Project for the Computer Simulation of Condensed Phases (CCP5) and the Molecular Simulation Group (MSG) at Daresbury Laboratory. The package is the property of the Central Laboratory of the Research Councils.

2
In theory the O2-R10k should have outperformed the corresponding Indigo2; with better memory bandwidth, superior I/O and more tightly coupled integration it should have done well. However SGI made a design decision which has seriously impaired the performance of the O2 in some application areas. It took about 3 months for this "flaw" to be fully identified. Until December 1996, SGI were claiming that the O2 R10k would perform in the region of 10, 12 or even 15 SPECfp95. Indeed on some benchmarks it does indeed achieve performance that matches these figures. However, the O2 has a Unified Memory Architecture which uses main system memory as memory for the graphics display and operations. Despite the impressive bandwidth figures for the O2, it does seem that the O2 memory architecture severely impedes the performance of the R10k processor, particularly when compared with the Octane, Indigo2 and Origin systems. This is shown by the SPEC comparisons of Table 1; we suspect that the two main factors limiting performance in the memory subsystem are the main memory speed and the CRIME chip.

The CRIME chip, which acts as the memory interface between the memory and the three drains on it - the CPU (800 MByte/second), I/O engine (500 MByte/sec) and the monitor display (700 MByte/second) - is probably the main bottleneck. This chip was designed to work as a built in memory controller, but the design was biased toward the R5k; it can't work directly with the R10k because the R5k expects 32 byte cache refills while the R10k wants to have 64 or 128 byte refills. Therefore SGI supply a custom ASIC with the R10k daughter board. This interfaces the R10k's level 2 cache with the CRIME chip. Performance problems are caused by the ASIC having to break each 128 byte cache refill operation into 4, 32 byte refills. The net impact of this effect is that the O2 R10k will only work well with problems that fit into the L2 cache (1 MByte). Not surprisingly, the memory intensive SPECfp95 figures are badly affected, although the impact on less memory intensive applications is not so severe. It should be noted that this type of incident is very rare; chips often fail to deliver but not system architectures designed for existing chips.

3
See the following recently opened web page to obtain Pentium Pro Optimized BLAS and FFTs for Intel Linux: http://www.cs.utk.edu/ ghenry/distrib

4
More evidence of the poor performance of the Sun Fortran f90 compiler is shown in the following figures for the Linpack 100X100 all-Fortran benchmark, kindly provided by Hans-Hermann Frese of ZIB, Berlin. The results were obtained with different Sun Fortran compiler releases on a Sun Ultra 60 Model 2360, using one UltraSPARC CPU at 360 MHz. The performance of the latest NAGWare f95 is also compared:

Linpack 100X100 Performance as a function of FORTRAN compiler.
Compiler         Options                    Linpack 100X100
Sun f77 4.2      -fast -O5                  164 Mflop/s
Sun f77 5.0      -fast -O5                  166 Mflop/s
Sun f90 1.2      -fast                      61 Mflop/s
Sun f90 2.0      -fast -ftrap=no%division   68 Mflop/s
NAGWare f95 4.0  -Wc,-fast                  166 Mflop/s

These figures underline that the performance of the Sun Fortran 90 compiler is still poor compared with Sun's Fortran 77 compiler. Surprisingly, the performance of the NAGWare Fortran 95 compiler which produces intermediate C code is as good as Sun's native Fortran 77 compiler.
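To quantify the gap, the Linpack rates quoted in this note can be expressed relative to Sun f77 5.0. The following Python sketch uses only the figures given above; the helper name `relative_to` is illustrative.

```python
# Linpack 100X100 rates in Mflop/s, as quoted in this note.
LINPACK_MFLOPS = {
    "Sun f77 4.2": 164,
    "Sun f77 5.0": 166,
    "Sun f90 1.2": 61,
    "Sun f90 2.0": 68,
    "NAGWare f95 4.0": 166,
}


def relative_to(results, baseline):
    """Return each rate as a fraction of the baseline compiler's rate."""
    base = results[baseline]
    return {compiler: rate / base for compiler, rate in results.items()}


for compiler, fraction in relative_to(LINPACK_MFLOPS, "Sun f77 5.0").items():
    print(f"{compiler:16s} {100.0 * fraction:5.1f}%")
```

On these figures the Sun f90 releases reach roughly 37% to 41% of the f77 5.0 rate, while NAGWare f95 4.0 matches it.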


Table 1. SPECfp95 and SPECint95. Absolute Values and Values Relative to the Compaq XP1000/667.
Machine  SPECfp95  SPECint95  Relative SPECfp95 (%)  Relative SPECint95 (%)
Compaq Alpha ES40/667 72.20 36.40 110% 97%
Compaq PW XP1000/667 65.50 37.50 100% 100%
Compaq Alpha DS20 58.70 27.70 90% 74%
Compaq Alpha ES40 57.70 27.30 88% 73%
HP PA-9000/J5000 54.00 32.50 82% 87%
API UP2000 6/667 53.70 32.10 82% 86%
HP PA-9000/C3000 52.40 31.80 80% 85%
Compaq PW XP1000/500 52.20 26.90 80% 72%
HP PA-9000/N4000 51.40 34.00 78% 91%
Compaq Alpha DS10 47.90 24.60 73% 66%
DEC Alpha 8400/6-575 47.70 30.30 73% 81%
Compaq Alpha GS140 45.20 27.80 69% 74%
SGI Origin2000/R12k 34.40 18.40 53% 49%
IBM RS/6000-43P 30.10 13.10 46% 35%
HP PA-9000/785 C360 28.10 26.00 43% 69%
SUN Ultra80/450 27.90 19.70 43% 53%
IBM RS/6000-397 26.60 8.61 41% 23%
SUN HPC4500/400 25.70 17.70 39% 47%
HP PA-9000/C240 25.40 17.30 39% 46%
HP PA-9000/V2250 24.80 16.40 38% 44%
SGI Octane/R12k-270 24.70 15.60 38% 42%
SGI Onyx2 IR2/250 24.50 14.70 37% 39%
SGI Origin2000/250 24.50 14.70 37% 39%
SUN HPC4500/336 21.90 15.00 33% 40%
DEC Alpha 1200/5-533 21.90 16.60 33% 44%
AMD Athlon K7/600 21.60 27.20 33% 73%
HP PA-9000/C200 21.40 14.20 33% 38%
DEC Alpha PW/600AU 21.30 16.30 33% 43%
DEC Alpha 8400/5-625 20.80 18.40 32% 49%
DEC Alpha 500/5-500 20.40 15.00 31% 40%
SGI Octane/R10k-250 20.30 13.60 31% 36%
SUN Ultra30/300 18.30 12.10 28% 32%
DEC Alpha PW/433AU 18.10 13.90 28% 37%
IBM RS/6000-595 17.60 6.17 27% 16%
SGI Octane/R10k-195 17.40 9.40 27% 25%
HP PA-9000/C160 16.30 10.40 25% 28%
SGI Origin200/180 15.60 8.59 24% 23%
SUN Ultra-2/300 15.50 12.30 24% 33%
SGI Octane/R10k-175 15.50 8.40 24% 22%
Pentium III/550 15.10 22.30 23% 59%
Pentium III/500 14.70 20.60 22% 55%
DEC Alpha 500/5-400 14.10 12.30 22% 33%
SGI PChall-R10k/195 13.80 8.85 21% 24%
Pentium II/450 13.30 18.50 20% 49%
DEC Alpha 600/5-333 13.20 9.23 20% 25%
ProLiant PII/450 13.10 17.60 20% 47%
Pentium II/400 12.80 16.90 20% 45%
DEC Alpha 8400/5-300 12.40 7.43 19% 20%
SGI O2 R12k/270 11.80 12.40 18% 33%
DEC Alpha 600/5-266 11.80 7.91 18% 21%
SUN Ultra-2/200 11.10 7.67 17% 20%
IBM RS/6000-590 10.40 3.33 16% 9%
IBM RS/6000-3CT 10.20 3.42 16% 9%
Pentium II/300 9.20 12.90 14% 34%
SUN Ultra-1/170 9.06 5.56 14% 15%
SGI O2 R5k/300 9.03 8.03 14% 21%
DEC Alpha 2100/5-250 8.39 5.96 13% 16%
SUN Ultra-1/140 7.90 4.66 12% 12%
SGI O2 R10k/175 7.83 9.02 12% 24%
Dell Optiplex/266 7.68 10.80 12% 29%
Pentium II/266 7.68 10.80 12% 29%
IBM RS/6000-3BT 7.50 3.14 11% 8%
Pentium Pro/200 6.75 8.09 10% 22%
HP PA-9000/J200 6.32 3.52 10% 9%
DEC Alpha 250/4-266 6.27 5.18 10% 14%
DEC AXP/3000-700 5.71 3.66 9% 10%
SGI O2 R5k/180 5.42 4.82 8% 13%
Pentium 233 MMX 5.21 - 8% -
SGI Indy-R5k 4.78 4.32 7% 12%
HP PA-9000/735-125 4.61 3.97 7% 11%
HP PA-9000/735 4.06 3.22 6% 9%
DEC AXP/3000-500 3.65 2.15 6% 6%
HP PA-9000/715-100 3.47 2.89 5% 8%
IBM PowerPC-43P 3.20 3.59 5% 10%
IBM PowerPC-250 2.32 1.82 4% 5%
SUN SPARC 10/41 1.38 1.13 2% 3%
MPP node
IBM RS/6000-SP/375 50.90 24.40 78% 65%
IBM SP2/160Thin 25.80 8.61 39% 23%
HP PA-9000/V2200 22.10 13.80 34% 37%
Cray T3E/1200 21.30 18.40 33% 49%
Cray T3E/900 17.25 13.60 26% 36%
IBM SP2/120Thin 16.60 5.61 25% 15%
IBM SP2/66Thin 9.35 3.31 14% 9%

Table 2. The DL_POLY Benchmark: Total CPU and Elapsed Time (minutes) for Calculations 1-6 (see text) and Performance relative to the Compaq XP1000/667.
Machine  User CPU Time  System CPU Time  Elapsed Time  Relative Performance (%)
Compaq Alpha ES40/667 10.8 0.0 10.8 104%
Compaq PW XP1000/667 11.2 0.0 11.2 100%
API UP2000 6/667 13.6 0.0 13.5 82%
Compaq Alpha GS140 13.9 0.0 13.9 80%
Compaq Alpha DS20 14.3 0.0 14.3 78%
Compaq Alpha ES40 14.5 0.0 14.5 77%
Compaq PW XP1000/500 14.8 0.0 14.8 75%
HP PA-9000/N4000 15.9 0.1 16.1 70%
Compaq Alpha DS10 16.3 0.2 33.1(*) 68%
AlphaPC 264DP-500 18.4 0.0 18.8 61%
SGI Origin2000/R12k 18.7 0.0 18.8 60%
HP PA-9000/J5000 19.6 0.0 19.7 57%
DEC Alpha 8400/5-625 19.7 0.1 19.9 56%
HP PA-9000/C3000 20.1 0.0 20.2 56%
DEC Alpha PW/600AU 20.2 0.0 20.2 55%
SGI Octane/R12k-270 20.9 0.0 20.9 54%
SGI Origin2000/250 21.6 0.0 21.7 52%
IBM RS/6000-SP/375 21.7 0.0 21.7 52%
DEC Alpha 1200/5-533 23.2 0.1 24.6 48%
SGI Origin200/225 24.0 0.0 24.1 46%
SGI Octane/R10k-250 24.5 0.0 24.5 46%
DEC Alpha PW/433AU 28.1 0.0 28.8 40%
AMD Athlon K7/600 (pgi) 28.3 0.0 28.3 40%
SGI O2 R12k/270 29.5 0.1 30.2 38%
SGI Origin2000/195 29.9 0.0 30.8 37%
HP PA-9000/V2250 30.7 0.1 30.8 36%
SGI PChall-R10k/195 33.0 0.1 33.2 34%
AMD Athlon K7/500 (pgi) 33.3 0.0 33.3 34%
HP PA-9000/V2200 33.5 0.1 33.6 33%
Pentium III/550 (pgi) 34.3 0.0 34.3 33%
IBM RS/6000-43P 35.8 0.0 35.7 31%
HP PA-9000/C240 36.2 0.0 36.5 31%
DEC Alpha 8400/5-300 37.2 0.1 37.2 30%
SGI Octane/R10k-175 40.4 0.1 40.5 28%
ProLiant PII/450 (pgi) 41.1 0.0 41.2 27%
Cray T3E/1200 41.2 0.6 42.6 27%
SUN Ultra80/450 44.0 0.0 44.1 25%
SUN HPC4500/400 49.1 0.0 49.2 23%
Pentium II/400 (pgi) 50.4 0.0 50.5 22%
Cray T3E/900 51.0 0.7 52.3 22%
SGI O2 R5k/300 51.7 0.1 53.0 22%
SUN HPC4500/336 62.6 0.0 62.7 18%
Pentium II/300 (pgi) 65.6 0.0 65.7 17%
IBM SP2/120Thin 67.5 0.0 68.1 17%
Pentium II/300 (abs) 72.1 0.0 72.1 16%
Pentium II/266 (pgi) 76.4 0.0 76.5 15%
SGI O2 R5k/180 80.5 0.2 83.5 14%
Pentium II/266 (abs) 83.8 0.0 83.8 13%
IBM RS/6000-59H 107.7 0.0 108.3 10%

(+) Version 2.11 of the DL_POLY Code


Table 3. The Chemistry Benchmark: Performance (%, see text) Relative to the Compaq XP1000/667.
Machine  SPECfp95  SPECint95  Matrix-97 (Matrix-89)  Chemistry Kernels  GAMESS CPU  DLPOLY CPU  Final Index (%)
Compaq Alpha ES40/667 110 97 107 116 113 104 112
Compaq PW XP1000/667 100 100 100 100 100 100 100
Compaq Alpha DS20 90 74 77 90 75 78 81
Compaq Alpha ES40 88 73 70 89 78 77 79
HP PA-9000/J5000 82 87 108 91 85 57 95
API UP2000 6/667 82 86 76 106 77 82 86
HP PA-9000/C3000 80 85 81 82 81 56 81
Compaq PW XP1000/500 80 72 70 83 72 75 75
HP PA-9000/N4000 78 91 104 81 - 70 93
IBM RS/6000-SP/375 78 65 115 100 94 52 103
Compaq Alpha DS10 73 66 54 73 63 68 63
DEC Alpha 8400/6-575 73 81 77 63 73 - 71
Compaq Alpha GS140 69 74 74 85 61 80 73
SGI Origin2000/R12k 53 49 65 68 51 60 61
IBM RS/6000-43P 46 35 71 64 52 31 62
HP PA-9000/785 C360 43 69 59 60 65 - 61
SUN Ultra80/450 43 53 57 56 48 25 54
IBM RS/6000-397 41 23 41 61 38 - 47
IBM SP2/160Thin 39 23 41 62 - - 52
SUN HPC4500/400 39 47 51 48 41 23 47
HP PA-9000/C240 39 46 55 47 48 31 50
HP PA-9000/V2250 38 44 60 38 - 30 49
SGI Octane/R12k-270 38 42 56 63 46 54 55
SGI Onyx2 IR2/250 37 39 51 52 - - 52
SGI Origin2000/250 37 39 50 54 40 52 48
HP PA-9000/V2200 34 37 60 39 39 33 46
SUN HPC4500/336 33 40 46 38 37 18 40
DEC Alpha 1200/5-533 33 44 42 47 34 48 41
AMD Athlon K7/600 (pgi) 33 73 33 56 - 40 45
AMD Athlon K7/600 33 73 38 52 51 - 47
HP PA-9000/C200 33 38 39 41 32 - 37
DEC Alpha PW/600AU 33 43 42 49 45 55 45
Cray T3E/1200 33 49 32 51 - 27 42
DEC Alpha 8400/5-625 32 49 43 48 38 56 43
DEC Alpha 500/5-500 31 40 32 39 30 - 34
SGI Octane/R10k-250 31 36 38 51 - 46 45
SGI Origin2000/195 29 25 38 37 33 37 36
AMD Athlon K7/500 29 61 33 45 42 - 40
AMD Athlon K7/500 (pgi) 29 61 33 49 - 34 41
SUN Ultra30/300 28 32 30 29 31 - 30
DEC Alpha PW/433AU 28 37 31 37 34 40 34
IBM RS/6000-595 27 16 (35) 46 31 - 37
SGI Octane/R10k-195 27 25 33 37 33 - 34
Cray T3E/900 26 36 29 43 - 22 36
IBM SP2/120Thin 25 15 32 45 29 17 35
HP PA-9000/C160 25 28 30 25 25 - 27
SGI Origin200/180 24 23 31 34 30 - 32
SUN Ultra-2/300 24 33 37 28 32 - 32
SGI Octane/R10k-175 24 22 30 33 30 28 31
Pentium III/550 23 59 31 37 40 - 36
Pentium III/550 (pgi) 23 59 31 41 - 33 36
Pentium III/500 22 55 26 31 27 - 28
DEC Alpha 500/5-400 22 33 27 33 29 - 30
SGI PChall-R10k/195 21 24 28 29 29 34 29
Pentium II/450 20 49 24 29 - - 27
DEC Alpha 600/5-333 20 25 26 28 25 - 26
ProLiant PII/450 20 47 30 31 - 27 31
Pentium II/400 (abs) 20 45 21 33 - - 27
Pentium II/400 (pgi) 20 45 23 31 - 22 27
Pentium II/400 20 45 22 27 29 - 26
DEC Alpha 8400/5-300 19 20 25 26 22 30 24
SGI O2 R12k/270 18 33 30 47 27 38 35
DEC Alpha 600/5-266 18 21 (17) 22 21 - 20
SUN Ultra-2/200 17 20 (21) 24 22 - 22
IBM RS/6000-590 16 9 (15) 22 15 10 17
IBM RS/6000-3CT 16 9 (17) 21 17 - 18
IBM SP2/66Thin 14 9 (17) 19 - - 18
Pentium II/300 (abs) 14 34 15 22 - 16 19
Pentium II/300 14 34 14 18 22 - 18
SUN Ultra-1/170 14 15 (18) 20 17 - 18
SGI O2 R5k/300 14 21 11 23 16 22 17
DEC Alpha 2100/5-250 13 16 (18) 20 20 - 19
SUN Ultra-1/140 12 12 (15) 17 15 - 16
SGI O2 R10k/175 12 24 21 23 20 - 21
Dell Optiplex/266 12 29 10 - - - 5
Pentium II/266 12 29 13 18 17 - 16
Pentium II/266 (pgi) 12 29 13 20 - 15 17
Pentium II/266 (abs) 12 29 14 21 - 13 18
IBM RS/6000-3BT 11 8 (13) 16 - - 15
Pentium Pro/200 10 22 7 14 13 - 11
HP PA-9000/J200 10 9 (13) 9 - - 11
DEC Alpha 250/4-266 10 14 (13) 15 14 - 14
DEC AXP/3000-700 9 10 (11) 11 12 - 11
SGI O2 R5k/180 8 13 7 14 11 14 11
Pentium 233 MMX 8 - 5 9 8 - 7
SGI Indy-R5k 7 12 ( 9) 13 9 - 10
HP PA-9000/735-125 7 11 (12) 11 13 - 12
HP PA-9000/735 6 9 (10) 10 11 - 10
DEC AXP/3000-500 6 6 ( 7) 5 6 - 6
HP PA-9000/715-100 5 8 ( 8) 9 8 - 8
IBM PowerPC-43P 5 10 ( 7) 10 - - 9
IBM PowerPC-250 4 5 ( 3) 5 - - 4
SUN SPARC 10/41 2 3 ( 2) 3 3 - 3

Table 4. APPENDIX: Machine Configurations under Evaluation.
Machine Configuration Location
SUN SPARCstation 10/30 SuperSPARC/36 MHz DL (loan)
SUN SPARCstation 2/GS SPARC/40 MHz DL (loan)
SUN 4/370 - DL
Solbourne S4000 - DL (loan)
SUN SPARCstation 10/41 SuperSPARC/40 MHz PNNL
SUN SPARCserver 1000 SuperSPARC/50 MHz DL (loan)
SUN SPARCstation 5/85 MicroSPARC II/85 MHz DL (loan)
SUN SPARCstation 20/HS21 HyperSPARC/125 MHz DL (loan)
SUN Ultra-1 Model 170 UltraSPARC-1/167 MHz DL
SUN Ultra-2 Model 2200 UltraSPARC-2/200 MHz DL (loan)
SUN Ultra-1 Model 140 UltraSPARC-1/143 MHz DL (loan)
SUN Ultra-2 Model 2300 UltraSPARC-2/300 MHz DL (loan)
SUN Ultra30/300 UltraSPARC-2/296 MHz Adelaide
SUN HPC4500/336 UltraSPARC-2/336 MHz SUN
SUN HPC4500/400 UltraSPARC-2/400 MHz SUN
SUN Ultra80/450 UltraSPARC-2/450 MHz SUN
HP PA-9000/755 PA7100/99 MHz DL
HP PA-9000/750 PA7000/66 MHz DL
HP PA-9000/720 PA7000/50 MHz DL
HP/Apollo DN10020 PRISM DL
HP PA-9000/735 PA7100/99 MHz DL
HP PA-9000/735/125 PA7150/125 MHz PNNL
HP PA-9000/715-80 PA7100LC/80 MHz DL (loan)
HP PA-9000/715-100 PA7100LC/100 MHz DL (loan)
HP PA-9000/J200 PA7200/100 MHz DL (loan)
HP PA-9000/C160 PA8000/160 MHz Berlin
HP PA-9000/C200 PA8200/200 MHz DL (loan)
HP PA-9000/V2200 PA8200/200 MHz Oxford
HP PA-9000/C240 PA8200/236 MHz DL (loan)
HP PA-9000/V2250 PA8200/240 MHz HP
HP PA-9000/785 C360 PA8500/367 MHz Berlin
HP PA-9000/C3000 PA8500/400 MHz Berlin
HP PA-9000/J5000 PA8500/440 MHz
HP PA-9000/N4000 PA8500/440 MHz HP
DEC S5000/200 R3000A/R3010A 25 MHz DL (loan)
DEC S5000/120 R3000A/R3010A 20 MHz DL (loan)
DEC AXP/3000-500 AXP A21064/150 MHz DL (loan)
DEC AXP/3000-600 AXP A21064/175 MHz PNNL
DEC AXP/3000-300 AXP A21064/150 MHz PNNL
DEC AXP/3000-700 AXP A21064A/225 MHz DL (loan)
DEC Alpha 250/4-266 AXP A21064A/266 MHz DL (loan)
DEC Alpha 8400/5-300 AXP A21164/300 MHz RAL
DEC Alpha 600/5-266 AXP A21164/266 MHz DL (loan)
DEC Alpha 600/5-333 AXP A21164/333 MHz DL (loan)
DEC Alpha 2100/5-250 AXP A21164/250 MHz DL (loan)
DEC Alpha 500/5-400 AXP A21164/400 MHz DL
DEC Alpha 500/5-500 AXP A21164/500 MHz DL (loan)
DEC Alpha 1200/5-533 AXP A21164/533 MHz Compaq TestD
DEC Alpha PW/433AU AXP A21164/433 MHz DL
DEC Alpha PW/600AU AXP A21164/600 MHz DL (loan)
DEC Alpha 8400/5-625 AXP A21164/625 MHz RAL
DEC Alpha 8400/6-575 AXP A21264/575 MHz CCC (Galway)
AlphaPC 264DP-500 AXP A21264/500 MHz DL (loan)
Compaq PW XP1000/500 AXP A21264/500 MHz DL
Compaq Alpha DS20 AXP A21264/500 MHz CCC (Galway)
Compaq Alpha ES40 AXP A21264/500 MHz CCC (Galway)
Compaq Alpha GS140 AXP A21264/525 MHz CCC (Galway)
Compaq Alpha DS10 AXP A21264/466 MHz DL
Compaq Alpha ES40/667 AXP A21264A/667 MHz CCC (Galway)
Compaq PW XP1000/667 AXP A21264A/667 MHz DL (loan)
API UP2000/6-667 AXP A21264A/667 MHz DL (loan)
SGI R4000 Indigo R4000/R4010 100 MHz DL (loan)
SGI 4D/420 - DL (loan)
SGI 4D/320 R3000A/R3010A 33 MHz DL (loan)
SGI R3000 Indigo R3000A/R3010A 33 MHz DL (loan)
SGI 4D/220 GTX - DL
SGI Challenge L/100 R4400/R4010 100 MHz Utrecht
SGI PChall-R8k/75 R8000/R8010 75 MHz DL (loan)
SGI Indigo2 R4400/150 R4400/R4010 150 MHz DL (loan)
SGI Challenge L/150 R4400/R4010 150 MHz Southampton
SGI R8k Indigo2 R8000/R8010 75 MHz DL (loan)
SGI PChall-R10k/195 R10000/R10010 195 MHz DL (loan)
SGI Indy-R5k R5000/R5000 180 MHz DL (loan)
SGI Indigo2-R10k/175 R10000/R10010 175 MHz Liverpool
SGI Indigo2 R4400/250 R4400/R4010 250 MHz DL
SGI Origin2000/195 R10000/R10010 195 MHz Manchester
SGI Origin2000/250 R10000/R10010 250 MHz Manchester
SGI Onyx2 IR2/250 R10000/R10010 250 MHz DL (loan)
SGI Origin2000/R12k R12000 300 MHz R2.3 Utrecht
SGI Octane/R10k-175 R10000/R10010 175 MHz Oxford
SGI Octane/R10k-195 R10000/R10010 195 MHz DL (loan)
SGI Octane/R10k-250 R10000/R10010 250 MHz DL (loan)
SGI Octane/R12k-270 R12000/R12010 270 MHz DL (loan)
SGI Origin200/180 R10000/R10010 180 MHz DL
SGI Origin200/225 R10000/R10010 225 MHz DL
SGI O2 R5k/180 R5000/R5010 180 MHz DL (loan)
SGI O2 R5k/300 R5000/R5000 300 MHz Aberdeen
SGI O2 R10k/175 R10000/R10010 175 MHz UNCC
SGI O2 R12k/270 R12000/R12010 270 MHz DL
Stardent VISTRA-800 - DL (loan)
Stardent 1520 - DL
IBM Power1 RS/6000-550 RS6000/41.6 MHz Perugia
IBM Power1 RS/6000-340 RS6000/33 MHz DL (loan)
IBM Power1 RS/6000-320 - DL (loan)
IBM Power1 RS/6000-350 RS6000/41.6 MHz DL (loan)
IBM Power1 RS/6000-360 RS6000/50 MHz DL (loan)
IBM Power1 RS/6000-530H RS6000/33 MHz DL
IBM Power2 RS/6000-590 RS6000/66 MHz IBM
IBM Power1 RS/6000-370 RS6000/62.5 MHz DL
IBM PowerPC-250 - DL
IBM Power2 RS/6000-3CT RS6000/72 MHz DL (loan)
IBM PowerPC-25T MPC601 66 MHz DL
IBM Power2 RS/6000-3BT RS6000/67 MHz DL (loan)
IBM PowerPC-43P MPC604 100 MHz DL
IBM Power2 RS/6000-595 RS6000/P2SC-135 MHz IBM
IBM Power2 RS/6000-397 RS6000/P2SC-160 MHz DL (loan)
IBM Power3 RS/6000-43P RS6000/Power3-200 MHz DL (loan)
PCs
Netpower PC Pentium Pro/200MHz DL (loan)
AMD Athlon K7/600 Microstar MS-6167 RAL
AMD Athlon K7/500 Microstar MS-6167 DL (loan)
Pentium III/550 Intel SE440BX (550MHz) Berlin
Pentium III/500 Birkbeck
ProLiant PII/450 Intel SE440BX (450MHz) Compaq TestD
Pentium II/450 Intel SE440BX (450MHz) Porto
Pentium II/400 Intel SE440BX (400MHz) RAL
Pentium II/300 Intel AL440LX (300MHz) Perugia
Pentium II/266 Intel AL440LX (266MHz) DL
Dell Optiplex/266 Intel AL440LX (266MHz) Edinburgh
Pentium 233 MMX Intel LT430TX (233MHz) Sussex
Vector Supercomputers
CRAY YMP C98/4256 - SARA
CRAY YMP J90/10 - EPCC
NEC SX-4 - NL
FUJITSU VPP-300/3 - RAL
MPP Nodes
KSR-2 KSR-2 node (80 MHz) PNNL
Cray T3D/AXP-150 AXP node (150 MHz) Edinburgh
IBM SP2/66Thin TN2 node (67 MHz) DL
Cray T3E/900 AXP node (450 MHz) Berlin
Hitachi SR2201 300 Mflop node Cambridge
IBM SP2/120Thin P2SC node (120 MHz) DL
IBM SP2/160Thin P2SC node (160 MHz) IBM (POK)
Cray T3E/1200E AXP node (600 MHz) Manchester
IBM SP/WII 375 Winterhawk II node (375MHz) DL

M.F. Guest
Apr 19 22:02:03 BST 2000