Subproject D.2

1-1-2007 - 12-30-2012

Results

In the first phase, a series of measurements was carried out on GPUs to assess how efficiently MD simulations can be ported to them:

A bucket sort on the C2050 GPU was compared with a parallel CPU version; the GPU version delivered a 9x speedup.
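The report does not describe how the bucket sort was implemented or which keys were sorted. As a minimal sketch, assuming the particles are binned by spatial cell along one axis, the usual histogram, prefix sum and scatter steps look as follows. The function name `bucket_sort_particles` and its parameters are hypothetical, and Python is used here for readability only; it is not the measured GPU code.

```python
def bucket_sort_particles(positions, box_length, n_cells):
    """Order particles by cell index with a counting (bucket) sort.

    Assumes 0 <= x <= box_length for every position (hypothetical setup).
    Returns the particle indices in cell order and the start offset of each cell.
    """
    cell_size = box_length / n_cells

    # Step 1: cell index of every particle (clamp so x == box_length stays in range).
    cells = [min(int(x / cell_size), n_cells - 1) for x in positions]

    # Step 2: histogram, i.e. the number of particles per cell.
    counts = [0] * n_cells
    for c in cells:
        counts[c] += 1

    # Step 3: exclusive prefix sum gives the first output slot of each cell.
    offsets = [0] * n_cells
    for c in range(1, n_cells):
        offsets[c] = offsets[c - 1] + counts[c - 1]

    # Step 4: scatter each particle index into the next free slot of its cell.
    order = [0] * len(positions)
    cursor = offsets[:]
    for i, c in enumerate(cells):
        order[cursor[c]] = i
        cursor[c] += 1

    return order, offsets


order, offsets = bucket_sort_particles([0.9, 0.1, 0.55, 0.3, 0.05], 1.0, 4)
print(order)    # [1, 4, 3, 2, 0]
print(offsets)  # [0, 2, 3, 4]
```

Steps 1, 2 and 4 are independent per particle and step 3 is a scan, which is why this kind of sort maps well onto a GPU.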

The data transfer rate over PCIe x16 between device memory (GPU) and page-locked host memory (CPU) is about 6 GB/s.
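To put the measured rate into perspective, the following sketch estimates how long one copy of the particle data would take at about 6 GB/s. The particle count and the 12 bytes per particle (three single-precision coordinates) are illustrative assumptions, not figures from the measurements; 1 GB is taken as 10^9 bytes, and per-transfer latency is ignored.

```python
def transfer_time_ms(n_particles, bytes_per_particle, bandwidth_gb_per_s=6.0):
    """Estimate the PCIe transfer time in milliseconds (bandwidth-only model)."""
    total_bytes = n_particles * bytes_per_particle
    seconds = total_bytes / (bandwidth_gb_per_s * 1e9)
    return seconds * 1e3


# Hypothetical example: one million particles, 3 x 4-byte floats each.
print(transfer_time_ms(1_000_000, 12))  # about 2 ms
```

For small systems the fixed cost of launching a transfer is likely to dominate, so this estimate is only meaningful for large buffers.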

Moreover, a billiard simulation was developed for the Kids Week event. In its first phase it achieves 2400 MFLOPS on a single core of a Xeon 5560 (2.8 GHz) for a system of 200 particles and 1000 iterations. Figure 4 depicts the billiard table with 200 balls, rendered using POV-Ray.
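The collision model of the billiard simulation is not specified in this report. As a sketch of the core step in such a simulation, assuming equal-mass, frictionless balls and perfectly elastic collisions in 2D, two touching balls exchange the velocity components along the line joining their centres. The function name `collide_equal_mass` is hypothetical.

```python
import math


def collide_equal_mass(p1, v1, p2, v2):
    """Return the velocities of two equal-mass balls after an elastic collision.

    p1, p2 are the centre positions and v1, v2 the velocities, as (x, y) tuples.
    Assumes the balls are in contact and their centres do not coincide.
    """
    # Unit normal pointing from ball 1 to ball 2.
    nx, ny = p2[0] - p1[0], p2[1] - p1[1]
    dist = math.hypot(nx, ny)
    nx, ny = nx / dist, ny / dist

    # Relative velocity along the normal; positive means the balls approach.
    rel = (v1[0] - v2[0]) * nx + (v1[1] - v2[1]) * ny
    if rel <= 0:
        return v1, v2  # already separating, nothing to do

    # Equal masses: swap the normal components, keep the tangential ones.
    new_v1 = (v1[0] - rel * nx, v1[1] - rel * ny)
    new_v2 = (v2[0] + rel * nx, v2[1] + rel * ny)
    return new_v1, new_v2


# Head-on hit of a moving ball on a resting one: the velocities are exchanged.
print(collide_equal_mass((0.0, 0.0), (1.0, 0.0), (1.0, 0.0), (0.0, 0.0)))
# ((0.0, 0.0), (1.0, 0.0))
```

Momentum and kinetic energy are both conserved by this update, which makes it a convenient correctness check for any optimized version.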

Another example of successfully porting simulation codes to the GPU is an industrial neural network. The port achieves a 12x speedup compared to the original implementation.

Figure 1: CUDA kernel overhead time on the C2050 and C1060.

Figure 2: Particle sorting on the C2050 GPU.

Figure 3: Device memory (GPU) and host memory (CPU) over PCIe x16.

Figure 4: A billiard table with 200 billiard balls.

Figure 5: Porting of the ANN to the C2050 and C1060 GPUs.