[ CUDA: nVIDIA Tesla K40m (GK110) ]

Device Properties:
  Device Name                              Tesla K40m
  GPU Code Name                            GK110
  PCI Domain / Bus / Device                0 / 1 / 0
  Clock Rate                               745 MHz
  Asynchronous Engines                     2
  Multiprocessors / Cores                  15 / 2880
  L2 Cache                                 1536 KB
  Max Threads Per Multiprocessor           2048
  Max Threads Per Block                    1024
  Max Registers Per Block                  65536
  Max 32-bit Registers Per Multiprocessor  65536
  Max Instructions Per Kernel              512 million
  Warp Size                                32 threads
  Max Block Size                           1024 x 1024 x 64
  Max Grid Size                            2147483647 x 65535 x 65535
  Max 1D Texture Width                     65536
  Max 2D Texture Size                      65536 x 65536
  Max 3D Texture Size                      4096 x 4096 x 4096
  Max 1D Linear Texture Width              134217728
  Max 2D Linear Texture Size               65000 x 65000
  Max 2D Linear Texture Pitch              1048544 bytes
  Max 1D Layered Texture Width             16384
  Max 1D Layered Texture Layers            2048
  Max Mipmapped 1D Texture Width           16384
  Max Mipmapped 2D Texture Size            16384 x 16384
  Max Cubemap Texture Size                 16384 x 16384
  Max Cubemap Layered Texture Size         16384 x 16384
  Max Cubemap Layered Texture Layers       2046
  Max Texture Array Size                   16384 x 16384
  Max Texture Array Slices                 2048
  Max 1D Surface Width                     65536
  Max 2D Surface Size                      65536 x 32768
  Max 3D Surface Size                      65536 x 32768 x 2048
  Max 1D Layered Surface Width             65536
  Max 1D Layered Surface Layers            2048
  Max 2D Layered Surface Size              65536 x 32768
  Max 2D Layered Surface Layers            2048
  Compute Mode                             Default: Multiple contexts allowed per device
  Compute Capability                       3.5
  CUDA DLL                                 nvcuda.dll (27.21.14.5423 - nVIDIA ForceWare 54.23)

Memory Properties:
  Memory Clock                             3004 MHz
  Global Memory Bus Width                  384-bit
  Total Memory                             4095 MB
  Total Constant Memory                    64 KB
  Max Shared Memory Per Block              48 KB
  Max Shared Memory Per Multiprocessor     48 KB
  Max Memory Pitch                         2147483647 bytes
  Texture Alignment                        512 bytes
  Texture Pitch Alignment                  32 bytes
  Surface Alignment                        512 bytes

Device Features:
  32-bit Floating-Point Atomic Addition    Supported
  32-bit Integer Atomic Operations         Supported
  64-bit Integer Atomic Operations         Supported
  Caching Globals in L1 Cache              Supported
  Caching Locals in L1 Cache               Supported
  Concurrent Kernel Execution              Supported
  Concurrent Memory Copy & Execute         Supported
  Double-Precision Floating-Point          Supported
  ECC                                      Enabled
  Funnel Shift                             Supported
  Half-Precision Floating-Point            Not Supported
  Host Memory Mapping                      Supported
  Integrated Device                        No
  Managed Memory                           Not Supported
  Multi-GPU Board                          No
  Stream Priorities                        Supported
  Surface Functions                        Supported
  TCC Driver                               Yes
  Warp Vote Functions                      Supported
  __ballot()                               Supported
  __syncthreads_and()                      Supported
  __syncthreads_count()                    Supported
  __syncthreads_or()                       Supported
  __threadfence_system()                   Supported

CUDA-Z Report
=============
Version: 0.10.251 64 bit http://cuda-z.sf.net/
OS Version: Windows x86 6.2.9200
Driver Version: 454.23 (TCC)
Driver Dll Version: 11.0 (27.21.14.5423)
Runtime Dll Version: 6.50

Core Information
----------------
  Name: Tesla K40m
  Compute Capability: 3.5
  Clock Rate: 745 MHz
  PCI Location: 0:1:0
  Multiprocessors: 15 (2880 Cores)
  Threads Per Multiproc.: 2048
  Warp Size: 32
  Regs Per Block: 65536
  Threads Per Block: 1024
  Threads Dimensions: 1024 x 1024 x 64
  Grid Dimensions: 2147483647 x 65535 x 65535
  Watchdog Enabled: No
  Integrated GPU: No
  Concurrent Kernels: Yes
  Compute Mode: Default
  Stream Priorities: Yes

Memory Information
------------------
  Total Global: 11.9291 GiB
  Bus Width: 384 bits
  Clock Rate: 3004 MHz
  Error Correction: No
  L2 Cache Size: 48 KiB
  Shared Per Block: 48 KiB
  Pitch: 2048 MiB
  Total Constant: 64 KiB
  Texture Alignment: 512 B
  Texture 1D Size: 65536
  Texture 2D Size: 65536 x 65536
  Texture 3D Size: 4096 x 4096 x 4096
  GPU Overlap: Yes
  Map Host Memory: Yes
  Unified Addressing: Yes
  Async Engine: Yes, Bidirectional

Performance Information
-----------------------
Memory Copy
  Host Pinned to Device: 9851.78 MiB/s
  Host Pageable to Device: 9006.76 MiB/s
  Device to Host Pinned: 9940.9 MiB/s
  Device to Host Pageable: 9067.91 MiB/s
  Device to Device: 96.2937 GiB/s
GPU Core Performance
  Single-precision Float: 3363.67 Gflop/s
  Double-precision Float: 1405.32 Gflop/s
  64-bit Integer: 176.151 Giop/s
  32-bit Integer: 708.11 Giop/s
  24-bit Integer: 699.47 Giop/s
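
As a rough cross-check of the measured CUDA-Z throughput, the sketch below derives theoretical peak figures from the values reported above (15 multiprocessors, 2880 cores, 745 MHz core clock, 3004 MHz memory clock, 384-bit bus). Three inputs are assumptions that do not appear in either report: one fused multiply-add is counted as 2 floating-point operations per core per cycle, GK110 is taken to have 64 double-precision units per multiprocessor, and the reported memory clock is taken to transfer data twice per cycle. The function names are illustrative only.

```python
# Values copied from the reports above.
MULTIPROCESSORS = 15
CORES = 2880
CORE_CLOCK_MHZ = 745
MEM_CLOCK_MHZ = 3004
BUS_WIDTH_BITS = 384
MEASURED_SP_GFLOPS = 3363.67
MEASURED_DP_GFLOPS = 1405.32

# Assumptions, not taken from the reports.
FLOPS_PER_UNIT_PER_CYCLE = 2       # one FMA counted as two operations
DP_UNITS_PER_MULTIPROCESSOR = 64   # assumed for GK110
TRANSFERS_PER_MEM_CLOCK = 2        # assumed double data rate on the reported clock


def peak_gflops(units, clock_mhz, flops_per_cycle=FLOPS_PER_UNIT_PER_CYCLE):
    """Theoretical peak in Gflop/s: units x clock x operations per cycle."""
    return units * clock_mhz * flops_per_cycle / 1000


def peak_bandwidth_gb_s(mem_clock_mhz, bus_width_bits,
                        transfers_per_clock=TRANSFERS_PER_MEM_CLOCK):
    """Theoretical peak memory bandwidth in GB/s (decimal gigabytes)."""
    bytes_per_transfer = bus_width_bits // 8
    return mem_clock_mhz * transfers_per_clock * bytes_per_transfer / 1000


if __name__ == "__main__":
    sp_peak = peak_gflops(CORES, CORE_CLOCK_MHZ)
    dp_peak = peak_gflops(MULTIPROCESSORS * DP_UNITS_PER_MULTIPROCESSOR,
                          CORE_CLOCK_MHZ)
    bandwidth = peak_bandwidth_gb_s(MEM_CLOCK_MHZ, BUS_WIDTH_BITS)

    print(f"Peak single precision: {sp_peak:.1f} Gflop/s "
          f"(measured is {MEASURED_SP_GFLOPS / sp_peak:.0%} of peak)")
    print(f"Peak double precision: {dp_peak:.1f} Gflop/s "
          f"(measured is {MEASURED_DP_GFLOPS / dp_peak:.0%} of peak)")
    print(f"Peak memory bandwidth: {bandwidth:.1f} GB/s")
```

The peaks assume the 745 MHz base clock shown in both reports; if the card runs at a higher boost clock during the benchmark, the measured-to-peak ratios would differ accordingly.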