source: http://www3.intel.com/cd/ids/developer/asmo-na/eng/257129.htm

 

CPUID(CPU_ID) for x64 Platforms and Microsoft Visual Studio* .NET 2005
by Eric Palmer

When targeting x64 platforms in Visual Studio .NET* 2005, programmers can no longer use inline assembly code as they did for 32-bit code. They must either rely on C/C++ code using intrinsics or write a 64-bit MASM (.asm) version of the function. Unfortunately, the Visual Studio .NET 2005 implementation of the CPUID intrinsic (__cpuid) accepts an input value only for the eax register, not for ecx, whose more recently defined inputs are required for queries about cache parameters and certain multi-core characteristics. Thus, a 64-bit .asm listing is required for full use of the CPUID instruction.

The following code samples demonstrate how to use the CPUID and RDTSC instructions with VS .Net 2005 for 64-bit (x64) platforms. The CPUID instruction is commonly used to obtain detailed information about the system’s CPU(s), and RDTSC is used to read the CPU’s internal time-stamp counter for timing and performance-measurement purposes. The RDTSC intrinsic (__rdtsc) does work as expected and can be used to replace inline assembly.

To build the 64-bit .asm file, create a custom build step that calls the 64-bit MASM, "ml64.exe", as shown in the screen-shot below. For the 32-bit configuration, the cpuid64.asm file should not be built, so for platform Win32, set General -> Excluded From Build to Yes.





The header file below (cpuid_32_64.h) creates a single definition of the functions _CPUID and _RDTSC that can be used in both 32-bit and 64-bit builds. For 64-bit builds, _CPUID uses the .asm function cpuid64, and _RDTSC uses the intrinsic __rdtsc. For 32-bit builds, _CPUID uses the inline-assembly function cpuid32, and _RDTSC uses the inline-assembly function _inl_rdtsc32.
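
As a rough illustration of what the 32-bit path does with the two register halves: RDTSC returns the low 32 bits of the counter in eax and the high 32 bits in edx, and _RDTSC_STACK stores them at offsets 0 and 4 of the 64-bit result. The portable C sketch below performs the same combination with a shift. The helper name tsc_from_halves is hypothetical and is not part of the header, and uint32_t and uint64_t are assumed stand-ins for DWORD and unsigned __int64.

```c
#include <stdint.h>

/* Hypothetical helper (not part of cpuid_32_64.h): combine the two
   halves that RDTSC leaves in edx:eax into one 64-bit tick count. */
static uint64_t tsc_from_halves(uint32_t eax, uint32_t edx)
{
    /* edx holds bits 63:32 of the counter, eax holds bits 31:0 */
    return ((uint64_t)edx << 32) | (uint64_t)eax;
}
```

The 64-bit build needs no such step, because the __rdtsc intrinsic already returns the full 64-bit value.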

There are two examples shown in the C file below (cpuid_32_64.c). The first is GetCoresPerPackage(), which calls _CPUID with eax=4 and ecx=0 in order to read the first set of deterministic cache parameters reported by the CPU and extract the field indicating the number of processor cores per processor package. (For example, this function would return 1 for a single-core Intel® Pentium® 4 processor, and 2 for a dual-core Intel® Pentium® D processor.) If the intrinsic __cpuid were used in this function on an x64 platform instead of the cpuid64 function, the input value of ecx would be whatever the register happened to contain, and the output would be unreliable. The second example function is timeSomethingExample(), which calls _RDTSC twice and calculates the timer ticks that elapsed across the loop. The _CPUID example shows how to use one definition to invoke either 64-bit .asm code or 32-bit inline assembly, and the _RDTSC example shows how to use one definition to invoke either a 64-bit intrinsic or 32-bit inline assembly.
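
The bit fields that GetCoresPerPackage() reads from eax can be sketched in isolation. The portable C fragment below decodes a value returned in eax by CPUID with eax=4: the cache type in bits 4:0 (0 means no further caches), the cache level in bits 7:5, and the cores-per-package field in bits 31:26, which holds the count minus one. The type LEAF4_EAX and the function decode_leaf4_eax are hypothetical names used only for illustration, and uint32_t is an assumed stand-in for DWORD.

```c
#include <stdint.h>

/* Illustrative view of the EAX fields returned by CPUID with eax=4.
   These names are hypothetical and do not appear in cpuid_32_64.h. */
typedef struct leaf4_eax_s {
    unsigned cache_type;        /* bits 4:0; 0 means "no more caches" */
    unsigned cache_level;       /* bits 7:5; cache level, starting at 1 */
    unsigned cores_per_package; /* decoded from bits 31:26 (count minus one) */
} LEAF4_EAX;

static LEAF4_EAX decode_leaf4_eax(uint32_t eax)
{
    LEAF4_EAX f;
    f.cache_type        = eax & 0x1F;
    f.cache_level       = (eax >> 5) & 0x7;
    f.cores_per_package = ((eax >> 26) & 0x3F) + 1; /* same math as the article */
    return f;
}
```

With the made-up sample input 0x04000121, this yields cache type 1, cache level 1, and 2 cores per package. An input of 0 yields cache type 0, the condition that ends the enumeration loop in GetCoresPerPackage().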

Both the _CPUID and _RDTSC examples show how to create utility functions that are transparently portable from Win32 to x64 platforms in cases where different underlying code is required for each platform. Furthermore, the cpuid64 function provides a workaround for a deficiency in the __cpuid intrinsic, allowing both 32-bit and 64-bit applications to fully utilize the capability of the CPUID instruction.

Header file (cpuid_32_64.h):

#pragma once

typedef struct cpuid_args_s {
	DWORD eax;
	DWORD ebx;
	DWORD ecx;
	DWORD edx;
} CPUID_ARGS;

#ifdef __cplusplus
extern "C" {
#endif

#ifdef _M_X64 // For 64-bit apps
unsigned __int64 __rdtsc(void);
#pragma intrinsic(__rdtsc)
#define _RDTSC __rdtsc

void cpuid64(CPUID_ARGS* p);
#define _CPUID cpuid64

#else // For 32-bit apps

#define _RDTSC_STACK(ts) \
	__asm rdtsc \
	__asm mov DWORD PTR [ts], eax \
	__asm mov DWORD PTR [ts+4], edx

__inline unsigned __int64 _inl_rdtsc32() {
	unsigned __int64 t;
	_RDTSC_STACK(t);
	return t;
}
#define _RDTSC _inl_rdtsc32

void cpuid32(CPUID_ARGS* p);
#define _CPUID cpuid32

#endif

// Our 32/64-bit example function
int GetCoresPerPackage();

#ifdef __cplusplus
}
#endif  

32/64-bit .c file (cpuid_32_64.c):

 
#include <windows.h>
#include "cpuid_32_64.h"

#ifndef _M_X64
void cpuid32(CPUID_ARGS* p) {
	__asm {
		mov	edi, p
		mov eax, [edi].eax
		mov ecx, [edi].ecx // for functions such as eax=4
		cpuid
		mov [edi].eax, eax
		mov [edi].ebx, ebx
		mov [edi].ecx, ecx
		mov [edi].edx, edx
	}
}
#endif

// Assumptions prior to calling:
// - CPUID instruction is available
// - We have already used CPUID to verify that this is an Intel® processor
int GetCoresPerPackage()
{
	// Is explicit cache info available?
	int nCaches=0;
	int coresPerPackage=1; // Assume 1 core per package if info not available 
	DWORD t;
	int cacheIndex;
	CPUID_ARGS ca;

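	ca.eax = 0; // function 0: eax returns the highest supported standard function
	ca.ecx = 0; // not used by function 0, but avoids passing an uninitialized value
	_CPUID(&ca);
	t = ca.eax;
	if ((t > 3) && (t < 0x80000000)) { // function 4 is supported
		for (cacheIndex=0; ; cacheIndex++) {
			ca.eax = 4;
			ca.ecx = cacheIndex; // index of the cache to describe
			_CPUID(&ca);
			t = ca.eax;
			if ((t & 0x1F) == 0) // cache type (bits 4:0) of 0: no more caches
				break;
			nCaches++;
		}
	}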
	}

	if (nCaches > 0) {
		ca.eax = 4;
		ca.ecx = 0; // first explicit cache
		_CPUID(&ca);
		coresPerPackage = ((ca.eax >> 26) & 0x3F) + 1; // 31:26
	}
	return coresPerPackage;
}

void timeSomethingExample()
{
	ULONGLONG tStart, tElapsed;
	int i;

	tStart = _RDTSC();
	for (i=0; i < 1000; i++)
	{
		// Do something here 1000 times
	}
	tElapsed = _RDTSC() - tStart; // CPU timer ticks taken to do something 1000 times
}
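
timeSomethingExample() computes tElapsed but does not use it. One way a caller might turn the two counter readings into a per-iteration figure is sketched below in portable C. The helper ticks_per_iteration is a hypothetical name, not part of the article's sources, and uint64_t is an assumed stand-in for ULONGLONG.

```c
#include <stdint.h>

/* Hypothetical helper: average time-stamp-counter ticks per loop
   iteration, given the readings taken before and after the loop. */
static double ticks_per_iteration(uint64_t t_start, uint64_t t_end,
                                  unsigned iterations)
{
    /* Unsigned subtraction stays correct even if the counter wrapped
       around once between the two readings. */
    uint64_t elapsed = t_end - t_start;

    /* Guard against division by zero for an empty measurement. */
    return iterations ? (double)elapsed / (double)iterations : 0.0;
}
```

In timeSomethingExample(), the call would take tStart, the second _RDTSC() reading, and 1000 as its arguments.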

64-bit .asm file (cpuid64.asm):

; call cpuid with args in eax, ecx
; store eax, ebx, ecx, edx to p
PUBLIC cpuid64
.CODE
        ALIGN   8
cpuid64 PROC FRAME
; void cpuid64(CPUID_ARGS* p);
; rcx <= p
        sub     rsp, 32
        .allocstack 32
        push    rbx                     ; cpuid overwrites rbx, which is nonvolatile
        .pushreg rbx
        .endprolog

        mov     r8, rcx                 ; keep p in r8, since ecx is a cpuid input
        mov     eax, DWORD PTR [r8+0]   ; p->eax
        mov     ecx, DWORD PTR [r8+8]   ; p->ecx
        cpuid
        mov     DWORD PTR [r8+0], eax
        mov     DWORD PTR [r8+4], ebx
        mov     DWORD PTR [r8+8], ecx
        mov     DWORD PTR [r8+12], edx

        pop     rbx
        add     rsp, 32

        ret
        ALIGN   8
cpuid64 ENDP
_TEXT ENDS
END
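
The displacements hard-coded in cpuid64.asm (0, 4, 8, and 12) depend on the layout of CPUID_ARGS. A small check along the following lines could catch a mismatch if the structure were ever changed. This is only a sketch: CPUID_ARGS_SKETCH mirrors the article's structure with uint32_t assumed equivalent to DWORD, and the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for CPUID_ARGS, with uint32_t assumed equivalent to DWORD. */
typedef struct cpuid_args_sketch_s {
    uint32_t eax;
    uint32_t ebx;
    uint32_t ecx;
    uint32_t edx;
} CPUID_ARGS_SKETCH;

/* Returns 1 when the field offsets match the displacements used in
   cpuid64.asm ([r8+0], [r8+4], [r8+8], [r8+12]), and 0 otherwise. */
static int cpuid_args_layout_matches_asm(void)
{
    return offsetof(CPUID_ARGS_SKETCH, eax) == 0
        && offsetof(CPUID_ARGS_SKETCH, ebx) == 4
        && offsetof(CPUID_ARGS_SKETCH, ecx) == 8
        && offsetof(CPUID_ARGS_SKETCH, edx) == 12
        && sizeof(CPUID_ARGS_SKETCH) == 16;
}
```

The 32-bit cpuid32 function does not need this check, because its inline assembly refers to the fields by name.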


isn.support@intel.com (2007-01-10T23:30:54.810) wrote:

New code sample posted for platform topology enumeration on IA-64: http://www3.intel.com/cd/ids/developer/asmo-na/eng/dc/itanium/335391.htm

isn.support@intel.com (2007-01-02T21:02:38.657) wrote:

nivedan_nigam, one of our engineers responds: I have tried the SIV on IA-64 from the following link – it seems to provide a lot of info: http://siv.mysite.orange.co.uk/index.html

nivedan_nigam@rediffmail.com (2006-12-22T04:04:45.123) wrote:

Is the cpucount program available for the IA64 platform? Any pointers?

isn.support@intel.com (2006-10-19T21:20:59.993) wrote:

To find similar information for IA-32 platforms, see Khang Nguyen and Shihjong Kuo's article "Detecting Multi-Core Processor Topology in an IA-32 Platform": http://www3.intel.com/cd/ids/developer/asmo-na/eng/275339.htm