source: http://isnwiki.jot.com/WikiHome/Articles/500086

Detecting Multi-Core Processors

Version 3, changed by dmsmith1 06/19/2006.

Created by: admin , Updated by: admin

Category: Multi-Core


In this entry we show how to detect the topological relationships among physical packages, processor cores, and logical processors sharing the same core in a multi-processing platform with IA-32 processors. The algorithm described in this paper applies across many hardware multi-processing configurations, including single-socket and multi-socket platforms and IA-32 processors supporting Hyper-Threading Technology, dual-core, or multi-core designs.

Background:
In 2002, Hyper-Threading Technology was introduced on the IA-32 platform. With it, each physical package provides functionality logically equivalent to two discrete, single-thread-execution processors that share the same processor core in the physical package. More details are available in another Wiki entry, "Counting Physical and Logical 32-bit Processor".

In 2005, Intel introduced IA-32 platforms with multi-core technology with the Intel Pentium® processor Extreme Edition, which provides two processor cores, each supporting HT Technology. Thus, hardware support for multi-processing has evolved from multiple discrete sockets, to HT Technology, to multiple cores with HT Technology. Enumerating processor topology correctly is essential for implementing licensing policy requirements. Understanding processor and cache topology information will allow multithreading software to make more efficient use of hardware multithreading resources and deliver optimal performance.

Software must recognize hardware multi-processing support in all of these combinations. For licensing purposes, Intel recommends a policy based on discrete physical packages. For performance optimization purposes, software may need to manage physical resources depending on the details of the sharing topology implemented in these various forms of hardware multiprocessing.

Hyper-Threading Technology, Multi-Core and Initial APIC ID

In a multithreading environment, using IA-32 processors with hardware multithreading support, each logical processor in the platform must have a unique identifier. This is established during platform power-up, and is referred to as “initial APIC ID.” Each initial APIC ID’s value in a multiprocessor (MP) platform is assigned in an orderly manner such that software can extract a topological relationship between an initial APIC ID and the physical package and processor core. (APIC stands for Advanced Programmable Interrupt Controller )        

Software can also extract the topological relationships of sibling logical processors sharing the same core or sharing a particular cache level of the cache hierarchy.

 

In general, each physical package can provide one or more processor cores. The number of logical processors sharing the same core must be derived from information provided by the CPUID instruction. Within a single microarchitecture, the number of logical processors sharing a given cache level may vary from one cache level to another in the cache hierarchy. The cache-sharing topology may also vary between different microarchitectures.

On processors with HT Technology enabled, each cache level is shared by all the logical processors sharing a processor core. Software must use the CPUID instruction to query the relevant data for each cache level at runtime. (Details of using the CPUID instruction to query initial APIC IDs and logical processor/core/cache configurations can be found in Chapter 3 of the IA-32 Intel® Architecture Software Developer’s Manual Vol. 2A.)

CPUID Instruction

Aside from the initial APIC IDs reported by CPUID instruction, there are three other essential parameters reported by CPUID. These parameters are used to determine the multithreading resource topology of an IA-32 platform:

 

Logical Processors per Package (CPUID.1.EBX[23:16]): Indicates the maximum number of logical processors in a physical package. This represents the hardware capability of the processor as manufactured, and does not necessarily equate to the number of logical processors enabled by the platform BIOS or operating system.

Cores per Package (2) (CPUID.4.EAX[31:26] + 1): The maximum number of cores in a physical package is indicated by one plus the decimal value represented by CPUID.4.EAX[31:26].

Logical Processors Sharing a Cache (CPUID.4.EAX[25:14] + 1): The maximum number of logical processors in a physical package sharing the target level cache is indicated by one plus the decimal value represented by CPUID.4.EAX[25:14].
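The three bit fields above can be decoded with plain shifts and masks once the raw register values are in hand. The following is a minimal sketch, not part of the original paper: the struct `topo_params` and the function `decode_topo_params` are hypothetical names, and the caller is assumed to have already captured EBX from CPUID leaf 1 and EAX from CPUID leaf 4 (after confirming that leaf 4 is supported).

```c
#include <stdint.h>

/* Hypothetical container for the three topology parameters. */
typedef struct {
    unsigned logical_per_package; /* CPUID.1.EBX[23:16]     */
    unsigned cores_per_package;   /* CPUID.4.EAX[31:26] + 1 */
    unsigned logical_per_cache;   /* CPUID.4.EAX[25:14] + 1 */
} topo_params;

/* leaf1_ebx and leaf4_eax are raw register values captured elsewhere;
   this sketch assumes leaf 4 was queried for the cache level of interest. */
static topo_params decode_topo_params(uint32_t leaf1_ebx, uint32_t leaf4_eax)
{
    topo_params p;
    p.logical_per_package = (leaf1_ebx >> 16) & 0xFFu;          /* 8-bit field  */
    p.cores_per_package   = ((leaf4_eax >> 26) & 0x3Fu) + 1u;   /* 6-bit field  */
    p.logical_per_cache   = ((leaf4_eax >> 14) & 0xFFFu) + 1u;  /* 12-bit field */
    return p;
}
```

Keeping the decoding separate from the CPUID call makes the field arithmetic easy to check against known register dumps.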

Intel supports only homogeneous MP systems. This means that in an MP system, any logical processor from any physical package must report the same values for the three parameters described above.

CPUID cannot be called directly from high-level languages such as C or C++. This must be done using assembly language. In this paper, we show sample code that executes the CPUID instruction from C/C++ source code using inline assembly.

 

Three Level Topology and Initial APIC ID

For a single-clustered MP system (3), the 8-bit value of an initial APIC ID decomposes into three bit fields. The order in which initial APIC IDs are assigned in an IA-32 MP system ensures that the rightmost bit field is wide enough to represent the maximum number of logical processors sharing the same core.

This sub-field is referred to as SMT_ID, and its width is related to the maximum number of unique IDs required to identify each logical processor sharing the same core. The width of this field can be zero; for example, the Pentium D processor does not support Hyper-Threading Technology. Software must use the algorithm described in this paper to determine the width dynamically.

 

Adjacent to the SMT_ID bit field is a bit field that represents the maximum number of processor cores in a package. This sub-field is referred to as CORE_ID. For a single-core processor, the width of the CORE_ID field is zero.

 

The remaining bit field can be used to identify a physical package in a non-clustered MP platform. The sub-field is referred to as PACKAGE_ID.

Figure 1 shows the layout of the initial APIC ID (content of CPUID.1.EBX[31:24]). Note that the width of each bit field depends on the multithreading hardware configuration. Figures 2 through 4 depict the dependence of each bit field position on three different hardware configurations. These three examples illustrate the important point that software must not assume each bit field has a constant width, or exists at all. This paper does not discuss clustered systems.

2 Software must check CPUID for its support of leaf 4 when implementing support for multi-core. If CPUID leaf 4 is not available at runtime, software can handle the situation as if there is only one core per package.

3 Typically, a clustered system is made up of many nodes of multi-processor systems. This paper applies to multi-processor systems within the same node in a clustered system. A multi-node clustered system will have a four-level topology and is beyond the scope of this paper.


 

Figure 1 Generalized bit position layout of initial APIC ID for a non-clustered system


With a dual-core system that supports Hyper-Threading Technology, x (the width of the SMT_ID field) will be equal to 1 and y (the width of the CORE_ID field) will be equal to 1. This is shown in Figure 2.



 

Figure 2 Bit position layout for a dual-core system configuration supporting Hyper-Threading Technology

 

With a dual-core system that does not support Hyper-Threading Technology, x will be equal to 0 and y will be equal to 1. This is shown in Figure 3.


 

Figure 3 Bit position layout for a dual-core system that does not support Hyper-Threading Technology

 

With a single-core system supporting Hyper-Threading Technology, x will be equal to 1 and y will be equal to 0. This is shown in Figure 4.



Figure 4 Bit position layout for a single-core system supporting Hyper-Threading Technology
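To make the three configurations concrete, the sketch below splits an 8-bit initial APIC ID given the two field widths, taking x as the SMT_ID width and y as the CORE_ID width. It is an illustration only: `topo_ids` and `split_apic_id` are hypothetical names, the code assumes x + y does not exceed 8, and unlike the routines later in this paper it shifts each field down so that every ID is zero-based.

```c
#include <stdint.h>

/* Hypothetical result type: each field shifted down to bit 0. */
typedef struct {
    uint8_t smt_id;
    uint8_t core_id;
    uint8_t package_id;
} topo_ids;

/* x = width of SMT_ID in bits, y = width of CORE_ID in bits (x + y <= 8). */
static topo_ids split_apic_id(uint8_t apic_id, unsigned x, unsigned y)
{
    topo_ids t;
    t.smt_id     = (uint8_t)(apic_id & ((1u << x) - 1u));        /* lowest x bits  */
    t.core_id    = (uint8_t)((apic_id >> x) & ((1u << y) - 1u)); /* next y bits    */
    t.package_id = (uint8_t)(apic_id >> (x + y));                /* remaining bits */
    return t;
}
```

For example, with x = 1 and y = 1 (the Figure 2 case), an initial APIC ID of binary 101 yields SMT_ID 1, CORE_ID 0, and PACKAGE_ID 1. With x = 0 (the Figure 3 case), the SMT_ID is always 0 because the field has no bits.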

An algorithm to map initial APIC ID to three-level topological IDs


The following algorithm demonstrates the method by which the initial APIC ID is decomposed into its three topological identifiers. Several support routines used in this algorithm are described in the following text.

 

Detecting Hardware Support for Multi-Threading

The first support routine determines if the current logical processor is contained in a physical package with hardware multi-threading support. This is accomplished by executing the CPUID instruction with register EAX set equal to one. If bit 28 of the returned value in register EDX is set, the associated physical package supports hardware multithreading. Note that this only asserts that a physical package is capable of hardware multithreading. It does not imply that hardware multithreading is enabled, or indicate the number of logical processors in the package enabled by system software.

//
// The function returns 0 when the hardware multi-threaded bit is
// not set.
//
#define HWD_MT_BIT (1 << 28) // CPUID.1.EDX[28]

unsigned int is_HWMT_supported( void )
{
    unsigned int Regedx = 0;

    if ((CpuIDSupported() >= 1) && GenuineIntel())
    {
        __asm
        {
            mov eax, 1
            cpuid
            mov Regedx, edx
        }
    }
    return (Regedx & HWD_MT_BIT);
}

//
// CpuIDSupported will return 0 if CPUID instruction is
// unavailable.
// Otherwise, it will return
// the maximum supported standard function.
//
unsigned int CpuIDSupported( void )
{
    unsigned int MaxInputValue = 0;

    __try // If CPUID instruction is supported
    {
        __asm
        {
            xor eax, eax // call cpuid with eax = 0
            cpuid
            mov MaxInputValue, eax
        }
    }
    __except (EXCEPTION_EXECUTE_HANDLER)
    {
        return (0); // cpuid instruction is unavailable
    }
    return MaxInputValue;
}

//
// GenuineIntel will return 0 if the processor is not a Genuine
// Intel Processor
//
unsigned int GenuineIntel( void )
{
    unsigned int VendorID[3] = {0, 0, 0};

    __try // If CPUID instruction is supported
    {
        __asm
        {
            xor eax, eax // call cpuid with eax = 0
            cpuid // Get vendor id string
            mov VendorID, ebx
            mov VendorID + 4, edx
            mov VendorID + 8, ecx
        }
    }
    __except (EXCEPTION_EXECUTE_HANDLER)
    {
        return (0); // cpuid instruction is unavailable
    }
    return ( (VendorID[0] == 'uneG') &&
             (VendorID[1] == 'Ieni') &&
             (VendorID[2] == 'letn'));
}

Determining Maximum Logical Processors per Physical Package (Socket)

The second support routine determines the maximum number of logical processors in a physical package. This parameter is obtained by executing CPUID with the register EAX set equal to one and storing bits 16 to 23 of register EBX on return. If an IA-32 processor does not have hardware multithreading support, the value in CPUID.1.EBX[23:16] is reserved. However, in this case, one can treat this as a discrete processor containing a maximum of one logical processor per package. Note that the number of logical processors obtained here is the maximum number of logical processors supported by this physical package. The actual number of logical processors enabled by system software and made available to applications may be less.

 

#define NUM_LOGICAL_BITS 0xFF0000 //(or (0xFF << 16))

unsigned GetMaxNumLPperPackage(void)
{
    unsigned int reg_ebx = 0;

    if (!is_HWMT_supported()) return 1;
    __asm {
        mov eax, 1
        cpuid
        mov reg_ebx, ebx
    }
    return (unsigned) ((reg_ebx & NUM_LOGICAL_BITS) >> 16);
}

 

Determining Maximum Number of Cores in a Package

The third support routine determines the maximum number of cores per physical package. This parameter is obtained by executing the CPUID instruction with the input value of 4 in EAX and 0 in ECX, followed by storing the returned decimal value of EAX[31:26] and incrementing it by one.

If a processor does not support the CPUID.4 leaf, then software must assume this is a single-core processor (4). Note that if the maximum number of cores in a physical package is greater than one, it only indicates that the physical package provides more than one core. The actual number of cores enabled by system software and made available to applications may be less.

 

#define NUM_CORE_BITS 0xfc000000 //(or (0x3Fu << 26) )

unsigned GetMaxNumCoresPerPackage(void)
{
    unsigned int reg_eax = 0;

    if (!is_HWMT_supported())
    {
        // must be single-core
        return (unsigned) 1;
    }
    __asm {
        mov eax, 0 // how many leaves does cpuid support?
        cpuid
        cmp eax, 4 // does cpuid support leaf 4
        jl single_core // if not, must be single core
        mov eax, 4 // call cpuid with eax = 4
        mov ecx, 0 // start with 1st level using index = 0
        cpuid
        mov reg_eax, eax // Has info on number of cores
        jmp multi_core
    single_core:
        mov reg_eax, 0 // must be single core
    multi_core:
    }
    return (unsigned) ((reg_eax & NUM_CORE_BITS) >> 26) + 1;
}

4 Software must check for the availability of CPUID leaf 4 before querying for information using CPUID leaf 4. If a BIOS allows the end user to configure a multi-core processor to work with an older operating system that is not compatible with the presence of CPUID leaf 4, the user must ensure that the proper configuration of the platform is enforced by restoring the BIOS setting to default such that CPUID leaf 4 is available. The proper configuration and optimal performance of a modern OS supporting multi-core processors relies on CPUID leaf 4 being available.

 

Determining the Width of a Bit Field

An important part of the algorithm is to determine the width of each bit field. This can be accomplished using a general purpose support routine that determines the width of a bit field based on the maximum number of unique identifiers that bit field can represent. The input value to determine the width of the SMT_ID bit field is the “maximum number of logical processors sharing the same core,” and the input value to determine the width of the CORE_ID bit field is the “maximum number of cores per physical package.” One should not assume that the number of available threads or cores will be a power of two.

 

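unsigned find_maskwidth(unsigned count_item)
{
    unsigned int mask_width, cnt = count_item;

    __asm
    {
        mov eax, cnt
        mov ecx, 0
        mov mask_width, ecx // default width is 0 (count_item == 1)
        dec eax // largest ID value is count_item - 1
        bsr cx, ax // index of the highest set bit; ZF is set if ax == 0
        jz next
        inc cx // width = highest set bit index + 1
        mov mask_width, ecx
    next:
        mov eax, mask_width
    }
    return mask_width;
}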
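For compilers without inline assembly support, the same width can be computed in portable C. The sketch below is an assumed equivalent of the routine above rather than code from the original paper; the name `mask_width_portable` is hypothetical. It returns the number of bits needed to represent the IDs 0 through count_item - 1, which is the base-2 logarithm of count_item rounded up.

```c
/* Portable sketch: number of bits needed to hold IDs 0 .. count_item - 1.
   A count of 0 or 1 needs no bits at all. */
static unsigned mask_width_portable(unsigned count_item)
{
    unsigned width = 0;
    unsigned max_id;

    if (count_item <= 1)
        return 0;

    max_id = count_item - 1;  /* largest ID the field must represent */
    while (max_id != 0) {     /* count the bits up to the highest set bit */
        width++;
        max_id >>= 1;
    }
    return width;
}
```

A count of 3 therefore needs a 2-bit field just as a count of 4 does, which is why the number of threads or cores does not have to be a power of two.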

 

Collecting the Initial APIC IDs of Each Logical Processor

Before putting the algorithm together, we must retrieve the initial APIC ID of each logical processor in the platform. This requires using an OS-specific processor affinity service and using an affinity mask to bind the current execution thread to a specific logical processor. Once the current thread has successfully affinitized to the specific logical processor, the following code will retrieve the initial APIC ID of the logical processor that this code is currently running on:

 

#define INITIAL_APIC_ID_BITS 0xff000000 //(or (0xFFu << 24) )

unsigned char GetInitialApicId (void)
{
    unsigned int reg_ebx = 0;

    __asm {
        mov eax, 1 // call cpuid with eax = 1
        cpuid
        mov reg_ebx, ebx // Has APIC ID info
    }
    return (unsigned char) ((reg_ebx & INITIAL_APIC_ID_BITS) >> 24);
}

 

Extracting a Bit Field from an 8-bit ID

The next routine completes the set of support routines needed to decompose an 8-bit initial APIC ID into three topological identifiers. A subset of bits in an 8-bit initial APIC ID can be extracted using the appropriate bit mask and shift value. The code below extracts a subset of bits from an 8-bit “Full_ID” using two other input parameters. The input parameter “MaxSubIDvalue” determines the width of the bit field to extract from the “Full_ID.” The input parameter “Shift_Count” specifies an offset from the rightmost bit of the 8-bit “Full_ID.”

 

//
// This routine extracts a subset of bit fields from the 8-bit Full_ID
// The return value, or subID, is a non-zero-based, 8-bit value
//
unsigned char GetNzbSubID(unsigned char Full_ID,
                          unsigned char MaxSubIDvalue,
                          unsigned char Shift_Count)
{
    unsigned MaskWidth;
    unsigned char SubID, MaskBits;

    MaskWidth = find_maskwidth((unsigned) MaxSubIDvalue);
    MaskBits = ((unsigned char) (0xff << Shift_Count)) ^
               ((unsigned char) (0xff << (Shift_Count + MaskWidth)));
    SubID = Full_ID & MaskBits;
    return SubID;
}
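The "non-zero-based" return value means the extracted bits stay at their original position inside the 8-bit ID instead of being shifted down to bit 0. The sketch below illustrates this with an assumed helper, `field_in_place`, that takes the field width directly rather than a maximum count; it is not part of the original paper and assumes width + shift does not exceed 8.

```c
/* Illustrative sketch: keep only the `width` bits that start at bit `shift`,
   leaving them at their original position (non-zero-based). */
static unsigned char field_in_place(unsigned char full_id,
                                    unsigned width,
                                    unsigned shift)
{
    unsigned mask = ((1u << width) - 1u) << shift; /* `width` ones at `shift` */
    return (unsigned char)(full_id & mask);
}
```

With an initial APIC ID of 0x07, a 1-bit SMT_ID at shift 0 and a 1-bit CORE_ID at shift 1, the SMT field comes back as 0x01 and the core field as 0x02 (not 1), because the core bit remains in bit position 1. Two IDs extracted this way can still be compared for equality, which is all the grouping code later in this paper requires.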

 

Putting Everything Together to Extract IDs for the Three-Level Topology

Under an MP-aware OS, an application can assemble a list of all the initial APIC IDs of logical processors that are enabled by OS and made available to applications. Although there is a one-to-one mapping between each bit in an affinity mask to each unique APIC ID value, the ordering of affinity mask bits and the numerical values of initial APIC IDs are platform-specific. Because the OS allows applications to manage logical processors via affinity masks, the algorithm we describe here will construct two tables:

 

  • One table provides a list of affinity masks corresponding to each unique PACKAGE_ID; each affinity mask includes logical processors residing in the same physical package.

 

  • The second table provides a list of affinity masks corresponding to each unique CORE_ID; each affinity mask includes logical processors residing in the same core.

 

These two tables are built by first extracting the three-level identifiers from the initial APIC ID of each logical processor. The code below illustrates how to use the support routines described previously to assemble individual tables that store the sub-field ID of each level for each logical processor.

 

To enumerate the processor and cache topology of all logical processors visible to an application, application software must rely on OS-specific services, such as affinity APIs, to bind the current execution context to each logical processor. The code example below uses Win32 APIs to manage processor affinity. The tables below demonstrate the relationships between processor/cache topology information with OS-specific affinity constructs.

 

DWORD_PTR dwProcessAffinity;
DWORD_PTR dwSystemAffinity;
DWORD_PTR dwAffinityMask;
int j, numLP_enabled, MaxLPPerCore;
unsigned char apicId;
unsigned char PackageIDMask;
unsigned char tblPkg_ID[256];
unsigned char tblCore_ID[256];
unsigned char tblSMT_ID[256];

GetProcessAffinityMask(
    GetCurrentProcess(),
    &dwProcessAffinity,
    &dwSystemAffinity);
if (dwProcessAffinity != dwSystemAffinity) {
    printf("Not all logical processors in the platform "
           "are enabled for this process.\n");
}
j = 0;
dwAffinityMask = 1;
numLP_enabled = 0;
//
// This algorithm assumes that each core within a package has the
// same number of logical processors.
// It does not assume that the value returned by
// GetMaxNumLPperPackage() or GetMaxNumCoresPerPackage() is a power of 2.
//
MaxLPPerCore =
    GetMaxNumLPperPackage() / GetMaxNumCoresPerPackage();
while (dwAffinityMask && dwAffinityMask <= dwSystemAffinity) {
    if (SetThreadAffinityMask(GetCurrentThread(), dwAffinityMask)) {
        Sleep(0); // Ensure this thread is on the affinitized CPU
        apicId = GetInitialApicId();
        //
        // store SMT_ID and Core_ID of each logical processor
        // Shift value for SMT_ID is 0
        // Shift value for Core_ID is the mask width for maximum
        // logical processors per core
        //
        tblSMT_ID[j] = GetNzbSubID(apicId, MaxLPPerCore, 0);
        tblCore_ID[j] = GetNzbSubID(
            apicId,
            GetMaxNumCoresPerPackage(),
            find_maskwidth(MaxLPPerCore));
        //
        // Extract PACKAGE_ID:
        // Assume single cluster.
        // Shift value is mask width for maximum logical processors
        // per package
        //
        PackageIDMask =
            ((unsigned char) (0xff <<
                find_maskwidth(GetMaxNumLPperPackage())));
        tblPkg_ID[j] = apicId & PackageIDMask;
        numLP_enabled++;
    }
    j++;
    dwAffinityMask <<= 1; // becomes 0 after the highest bit, ending the loop
}
//
// numLP_enabled contains the number of logical processors in the platform
// that are enabled by system software and available for applications
//
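The shift and width values used in the loop above all follow from two capability counts. The sketch below shows that derivation in isolation, without any CPUID or affinity calls. It is an illustration under stated assumptions: `apic_layout`, `width_for`, and `layout_from_counts` are hypothetical names, and the counts are passed in by the caller instead of being read from hardware.

```c
#include <assert.h>

/* Hypothetical summary of where each field sits in the initial APIC ID. */
typedef struct {
    unsigned smt_width;     /* bits in SMT_ID (starts at bit 0)          */
    unsigned core_width;    /* bits in CORE_ID (starts at bit smt_width) */
    unsigned package_shift; /* first bit of PACKAGE_ID                   */
} apic_layout;

/* Bits needed to represent IDs 0 .. count - 1. */
static unsigned width_for(unsigned count)
{
    unsigned width = 0;
    if (count <= 1)
        return 0;
    for (count -= 1; count != 0; count >>= 1)
        width++;
    return width;
}

static apic_layout layout_from_counts(unsigned max_lp_per_package,
                                      unsigned max_cores_per_package)
{
    apic_layout l;
    unsigned max_lp_per_core;

    assert(max_cores_per_package >= 1); /* avoid dividing by zero */
    /* Assumes every core in a package has the same number of logical
       processors, as the text above does. */
    max_lp_per_core = max_lp_per_package / max_cores_per_package;

    l.smt_width     = width_for(max_lp_per_core);
    l.core_width    = width_for(max_cores_per_package);
    l.package_shift = width_for(max_lp_per_package);
    return l;
}
```

For a package reporting 4 logical processors and 2 cores, this gives a 1-bit SMT_ID, a 1-bit CORE_ID, and a PACKAGE_ID that starts at bit 2, matching the dual-core Hyper-Threading layout described earlier.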

 

Counting Physical Packages Enabled in the Platform

Once we have the three-level topological IDs of each logical processor enabled in the platform, we can create an affinity mask to represent the sibling logical processors residing in the same physical package. The code below sorts the initial APIC IDs in the platform, puts those initial APIC IDs with identical PACKAGE_ID into the same group, and updates the affinity mask associated with each distinct PACKAGE_ID.

//
// pPkgMask points to a buffer allocated by the caller
// NumStartedLPs is an integer supplied by the caller after the
// caller had assembled three tables of PKG_ID, CORE_ID and SMT_ID
//
DWORD pPkgMask[256];
unsigned char PackageIDBucket[256];
unsigned ProcessorMask;
int ProcessorNum;
int i, PkgNum = 1;

PackageIDBucket[0] = tblPkg_ID[0];
ProcessorMask = 1;
pPkgMask[0] = ProcessorMask;
for (ProcessorNum = 1; ProcessorNum < NumStartedLPs;
     ProcessorNum++) {
    ProcessorMask <<= 1;
    for (i = 0; i < PkgNum; i++) {
        //
        // we may be comparing bit-fields of logical processors
        // residing in different packages, the code below assumes
        // that the bit-masks are the same on all processors in
        // the system
        //
        if (tblPkg_ID[ProcessorNum] == PackageIDBucket[i]) {
            pPkgMask[i] |= ProcessorMask;
            break;
        }
    }
    if (i == PkgNum) {
        //
        // Did not match any bucket, start new bucket
        //
        PackageIDBucket[i] = tblPkg_ID[ProcessorNum];
        pPkgMask[i] = ProcessorMask;
        PkgNum++;
    }
}
//
// PkgNum has the actual number of physical packages enabled in
// the platform
// pPkgMask[i] has the affinity mask of the sibling logical processors
// for the i-th package
//

 

Counting Processor Cores Enabled in the Platform

The procedure to count OS-enabled processor cores in a platform is similar to the previous routine. Here, we compare both PACKAGE_ID and CORE_ID to determine whether two or more logical processors are siblings in the same core. The code below sorts the initial APIC IDs in the platform, puts those initial APIC IDs with identical values for PACKAGE_ID and CORE_ID into the same group, and updates the affinity mask associated with each distinct core.

//
// pCoreProcessorMask points to a buffer allocated by the caller
// NumStartedLPs is an integer supplied by the caller after the
// caller had assembled three tables of PKG_ID, CORE_ID and SMT_ID
//
DWORD pCoreProcessorMask[256];
unsigned char CoreIDBucket[256];
unsigned ProcessorMask;
int ProcessorNum;
int i, CoreNum = 1;

CoreIDBucket[0] = tblPkg_ID[0] | tblCore_ID[0];
ProcessorMask = 1;
pCoreProcessorMask[0] = ProcessorMask;
for (ProcessorNum = 1; ProcessorNum < NumStartedLPs;
     ProcessorNum++) {
    ProcessorMask <<= 1;
    for (i = 0; i < CoreNum; i++) {
        //
        // we may be comparing bit-fields of logical processors
        // residing in different packages, the code below assumes
        // that the bit-masks are the same on all processors in
        // the system
        //
        if ((tblPkg_ID[ProcessorNum] | tblCore_ID[ProcessorNum]) ==
            CoreIDBucket[i]) {
            pCoreProcessorMask[i] |= ProcessorMask;
            break;
        }
    }
    if (i == CoreNum) {
        //
        // Did not match any bucket, start new bucket
        //
        CoreIDBucket[i] = tblPkg_ID[ProcessorNum] |
                          tblCore_ID[ProcessorNum];
        pCoreProcessorMask[i] = ProcessorMask;
        CoreNum++;
    }
}
//
// CoreNum has the actual number of cores enabled in the platform
// pCoreProcessorMask[i] has the affinity mask of the sibling
// logical processors for the i-th core
//
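Both counting routines above follow the same pattern: bucket the logical processors by a key and OR each processor's affinity bit into its bucket. The sketch below factors that pattern into one function that can be exercised without any hardware access. It is illustrative only: `group_by_key` is a hypothetical name, the keys are supplied by the caller (PACKAGE_ID alone for packages, PACKAGE_ID OR CORE_ID for cores), and the code assumes at most 32 logical processors, with logical processor i on affinity bit i.

```c
#include <stddef.h>
#include <stdint.h>

/* Groups logical processors that share a key.
   keys[i]         key of the logical processor on affinity bit i
   bucket_keys[g]  receives the key of group g
   bucket_masks[g] receives the affinity mask of group g
   Returns the number of distinct groups. Both output arrays must have
   room for n entries; n must not exceed 32 in this sketch. */
static size_t group_by_key(const uint8_t *keys, size_t n,
                           uint8_t *bucket_keys, uint32_t *bucket_masks)
{
    size_t groups = 0;
    size_t lp;

    for (lp = 0; lp < n; lp++) {
        size_t g;
        for (g = 0; g < groups; g++) {
            if (bucket_keys[g] == keys[lp])
                break;                      /* found an existing bucket */
        }
        if (g == groups) {                  /* no match: start a new bucket */
            bucket_keys[g] = keys[lp];
            bucket_masks[g] = 0;
            groups++;
        }
        bucket_masks[g] |= (uint32_t)1 << lp;
    }
    return groups;
}
```

Given package keys {0x00, 0x00, 0x04, 0x04} for four logical processors, the function reports two packages with affinity masks 0x3 and 0xC. Passing the combined package-and-core keys instead gives the per-core masks.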

 

The following example shows how to tie all of the above code snippets together. The example examines every logical processor visible to the running process and enumerates the processor topology and cache topology of these logical processors. The status of multi-core and Hyper-Threading capability in the platform is also summarized. The source code of the example is listed below and can be compiled in a Win32 environment as well as a Linux* environment (5). Some compilers do not support inline assembly; it is left as an exercise for the reader to complete compiler-specific customizations.

5 The source code listing can be compiled using Linux* kernel version 2.6 or higher (e.g. RH 4AS-2.8 using GCC 3.4.4). Due to syntax variances of Linux affinity APIs with earlier kernel versions and dependence on glibc library versions, compilation in a Linux environment with older kernels and compilers may require kernel patches or compiler upgrades.


 

 

Note: if the reader wishes to create a cpp file by copying the source listing below, please ensure that only plain ASCII text is pasted into a standard ASCII file.

//
// Copyright (c) 2005 Intel Corporation
// All rights reserved
//
//
// CpuCount.cpp : Detects hardware multi-threading topology on IA-32
// platforms: Multi-processor, Multi-core, and HyperThreading Technology.
// This application enumerates the HW topology of the logical processors
// enabled by OS and BIOS by using information provided by CPUID
// instruction.
// The relevant topology can be identified using a three level
// decomposition of "initial APIC ID" into Package_id, core_id, and
// SMT_id. Such decomposition provides a three-level map of the topology
// of hardware resources. This allows multi-threaded software to manage
// shared hardware resources in the platform to reduce resource
// contention
//
// Multicore detection algorithm for processor and cache topology
// requires all leaf functions of CPUID instructions be available. System
// administrator must ensure BIOS settings is not configured to restrict
// CPUID functionalities.
//-----------------------------------------------------------------------

//
// cpuid(EAX=1)EDX[28] set if HT or multi-core capable
//
#define HWD_MT_BIT 0x10000000

//
// cpuid(EAX=1)EBX[23:16] contains the maximum number of logical
// processors in physical processor
//
#define NUM_LOGICAL_BITS 0x00FF0000

//
// cpuid(EAX=4, ECX=0)EAX[31:26] contains the maximum number of cores
// in physical processor, minus one
//
#define NUM_CORE_BITS 0xFC000000

//
// cpuid(EAX=1)EBX[31:24] is the 8 bit initial APIC ID of the logical
// processor
//
#define INITIAL_APIC_ID_BITS 0xFF000000

//
// Status Flags
//
#define SINGLE_CORE_AND_HT_ENABLED 1
#define SINGLE_CORE_AND_HT_DISABLED 2
#define SINGLE_CORE_AND_HT_NOT_CAPABLE 4
#define MULTI_CORE_AND_HT_NOT_CAPABLE 5
#define MULTI_CORE_AND_HT_ENABLED 6
#define MULTI_CORE_AND_HT_DISABLED 7
#define USER_CONFIG_ISSUE 8

unsigned int CpuIDSupported(void);
unsigned int GenuineIntel(void);
unsigned int HWD_MTSupported(void);
unsigned int MaxLogicalProcPerPhysicalProc(void);
unsigned int MaxCorePerPhysicalProc(void);
unsigned int find_maskwidth(unsigned int);
unsigned char GetAPIC_ID(void);
unsigned char GetNzbSubID(unsigned char,
                          unsigned char,
                          unsigned char);
unsigned char CPUCount(unsigned int *,
                       unsigned int *,
                       unsigned int *);

// Define constant "LINUX" to compile under Linux
#ifdef LINUX
// The Linux source code listing can be compiled using Linux kernel
// version 2.6 or higher (e.g. RH 4AS-2.8 using GCC 3.4.4).
// Due to syntax variances of Linux affinity APIs with earlier kernel
// versions and dependence on glibc library versions, compilation on
// Linux environment with older kernels and compilers may require kernel
// patches or compiler upgrades.
#include <unistd.h>
#include <string.h>
#include <sched.h>
#define DWORD unsigned long
#else
#include <windows.h>
#endif
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>

char g_s3Levels[2048];

int
main(void)
{
    unsigned int TotAvailLogical = 0, // # of available logical CPUs in
                                      // the system
                 TotAvailCore = 0,    // # of available cores in the
                                      // system
                 PhysicalNum = 0;     // # of available physical
                                      // processors
    unsigned char StatusFlag = 0;
    int MaxLPPerCore;

    if (CpuIDSupported() < 4) { // CPUID does not report leaf 4 information
        printf("User Warning: CPUID Leaf 4 is not supported or "
               "disabled. Please check "
               "BIOS and correct system configuration error if leaf 4 is "
               "disabled. \n");
    }
    StatusFlag = CPUCount(&TotAvailLogical, &TotAvailCore,
                          &PhysicalNum);
    if (USER_CONFIG_ISSUE == StatusFlag) {
        printf("User Configuration Error: Not all logical processors"
               " in the system are enabled \n"
               "while running this process. Please rerun this application"
               " after making corrections. \n");
        exit(1);
    }
    printf("\n----Counting Hardware MultiThreading Capabilities and"
           " Availability ---------- \n\n");
    printf("This application displays information on three forms of hardware "
           "multithreading\n");
    printf("capability and their availability to apps. The three forms of"
           " capabilities are:\n");
    printf("multi-processor (MP), Multi-core (core), and HyperThreading "
           "Technology (HT).\n");
    printf("\nHardware capability results represents the maximum number"
           " provided in hardware.\n");
    printf("Note, Bios/OS or experienced user can make configuration changes"
           " resulting in \n");
    printf("less-than-full HW capabilities are available to applications.\n");
    printf("For best result, the operator is responsible to configure the"
           " BIOS/OS such that\n");
    printf("full hardware multi-threading capabilities are enabled.\n");
    printf("\n---------------------------------------------------------"
           "- \n");
    printf("\nCapabilities:\n\n");

    switch (StatusFlag) {
    case MULTI_CORE_AND_HT_NOT_CAPABLE:
        printf("\tHyper-Threading Technology: not capable \n\t"
               "Multi-core: Yes \n\tMulti-processor: ");
        if (PhysicalNum > 1) printf("yes\n"); else printf("No\n");
        break;
    case SINGLE_CORE_AND_HT_NOT_CAPABLE:
        printf("\tHyper-Threading Technology: Not capable \n\t"
               "Multi-core: No \n\tMulti-processor: ");
        if (PhysicalNum > 1) printf("yes\n"); else printf("No\n");
        break;
    case SINGLE_CORE_AND_HT_DISABLED:
        printf("\tHyper-Threading Technology: Disabled \n\t"
               "Multi-core: No \n\tMulti-processor: ");
        if (PhysicalNum > 1) printf("yes\n"); else printf("No\n");
        break;
    case SINGLE_CORE_AND_HT_ENABLED:
        printf("\tHyper-Threading Technology: Enabled \n\t"
               "Multi-core: No \n\tMulti-processor: ");
        if (PhysicalNum > 1) printf("yes\n"); else printf("No\n");
        break;
    case MULTI_CORE_AND_HT_DISABLED:
        printf("\tHyper-Threading Technology: Disabled \n\t"
               "Multi-core: Yes \n\tMulti-processor: ");
        if (PhysicalNum > 1) printf("yes\n"); else printf("No\n");
        break;
    case MULTI_CORE_AND_HT_ENABLED:
        printf("\tHyper-Threading Technology: Enabled \n\t"
               "Multi-core: Yes \n\tMulti-processor: ");
        if (PhysicalNum > 1) printf("yes\n"); else printf("No\n");
        break;
    }

    printf("\n\nHardware capability and its availability to "
           "applications: \n");
    printf("\n System wide availability: %u physical processors, %u cores, "
           "%u logical processors\n", PhysicalNum, TotAvailCore,
           TotAvailLogical);
    MaxLPPerCore = MaxLogicalProcPerPhysicalProc() /
                   MaxCorePerPhysicalProc();
    printf(" Multi-core capability : Maximum %u cores per package \n",
           MaxCorePerPhysicalProc());
    printf(" HT capability: Maximum %d logical processors per core \n",
           MaxLPPerCore);
    assert(PhysicalNum * MaxCorePerPhysicalProc() >= TotAvailCore);
    assert(PhysicalNum * MaxLogicalProcPerPhysicalProc() >=
           TotAvailLogical);
    if (PhysicalNum * MaxCorePerPhysicalProc() > TotAvailCore) {
        printf("\n Not all cores in the system are enabled for this "
               "application.\n");
    } else {
        printf("\n All cores in the system are enabled for this "
               "application.\n");
    }
    printf("\n\nRelationships between OS affinity mask, Initial APIC ID,"
           " and 3-level sub-IDs: \n");
    printf("\n%s", g_s3Levels);
    printf("\n\nPress Enter To Continue\n");
    getchar();
    return 0;
}

//

// CpuIDSupported() will return 0 if CPUID instruction is unavailable.

// Otherwise, it will return the maximum supported standard function.

//

unsigned int CpuIDSupported(void)

{

unsigned int MaxInputValue =0;

// If CPUID instruction is supported

#ifdef LINUX

try

{

MaxInputValue = 0;

// call cpuid with eax = 0

asm

(

"xorl %%eax,%%eaxnt"

"cpuidnt"

: "=a" (MaxInputValue)

:

: "%ebx", "%ecx", "%edx"

);

}

catch (...)

{

return (0); // cpuid instruction is unavailable

}

#else //Win32

try

{

__asm

{

xor eax, eax // call cpuid with eax = 0

cpuid

mov MaxInputValue, eax

}

}

catch (...)

{

return(0); // cpuid instruction is unavailable

}

#endif

return MaxInputValue;

}

//

// GenuineIntel() will return 0 if the processor is not a Genuine Intel

// processor.

//

unsigned int GenuineIntel(void)

{

#ifdef LINUX

unsigned int VendorIDb = 0,VendorIDd = 0, VendorIDc = 0;

try

// If CPUID instruction is supported

{

// Get vendor id string

asm

(

//get the vendor string

// call cpuid with eax = 0

"xorl %%eax, %%eax\n\t"

"cpuid\n\t"

: "=b" (VendorIDb),

"=d" (VendorIDd),

"=c" (VendorIDc)

:

: "%eax"

);

}

catch (...)

{

return (0); // cpuid instruction is unavailable

}

return ( (VendorIDb == 'uneG') &&

(VendorIDd == 'Ieni') &&

(VendorIDc == 'letn'));

#else

unsigned int VendorID[3] = {0, 0, 0};

try // If CPUID instruction is supported

{

__asm

{

xor eax, eax // call cpuid with eax = 0

cpuid // Get vendor id string

mov VendorID, ebx

mov VendorID + 4, edx

mov VendorID + 8, ecx

}

}

catch (...)

{

return(0); // cpuid instruction is unavailable

}

return ( (VendorID[0] == 'uneG') &&

(VendorID[1] == 'Ieni') &&

(VendorID[2] == 'letn'));

#endif

}

//

// MaxCorePerPhysicalProc() returns the maximum cores per physical package.

// Note that the number of AVAILABLE cores per physical package to be used

// by an application might be less than this maximum value.

//

unsigned int MaxCorePerPhysicalProc(void)

{

unsigned int Regeax = 0;

if (!HWD_MTSupported())

return (unsigned int) 1; // Single core

#ifdef LINUX

{

// Query the highest supported standard leaf first, then read leaf 4

// (index 0) only if it exists. Each asm statement is self-contained,

// so no register state is assumed to survive between statements.

unsigned int MaxLeaf = 0, Regecx = 0;

asm volatile

(

"cpuid"

: "=a" (MaxLeaf)

: "0" (0)

: "%ebx", "%ecx", "%edx"

);

if (MaxLeaf >= 4) // Single core if no leaf 4

{

// start with index = 0; Leaf 4 reports at least one valid cache level

asm volatile

(

"cpuid"

: "=a" (Regeax), "+c" (Regecx)

: "0" (4)

: "%ebx", "%edx"

);

}

}

#else

__asm

{

xor eax, eax

cpuid

cmp eax, 4 // check if cpuid supports leaf 4

jl single_core // Single core if no leaf 4

mov eax, 4

mov ecx, 0 // start with index = 0; Leaf 4 reports

cpuid // at least one valid cache level

mov Regeax, eax

jmp multi_core

single_core:

xor eax, eax

multi_core:

}

#endif

return (unsigned int)((Regeax & NUM_CORE_BITS) >> 26)+1;

}

//

// HWD_MTSupported() returns 0 when the hardware multi-threaded bit is

// not set.

//

unsigned int HWD_MTSupported(void)

{

unsigned int Regedx = 0;

if ((CpuIDSupported() >= 1) && GenuineIntel())

{

#ifdef LINUX

asm

(

"movl $1,%%eax\n\t"

"cpuid"

: "=d" (Regedx)

:

: "%eax","%ebx","%ecx"

);

#else

__asm

{

mov eax, 1

cpuid

mov Regedx, edx

}

#endif

}

return (Regedx & HWD_MT_BIT);

}

//

// MaxLogicalProcPerPhysicalProc() returns the maximum logical processors

// per physical package. Note that the number of AVAILABLE logical

// processors per physical package to be used by an application might be

// less than this maximum value.

//

unsigned int MaxLogicalProcPerPhysicalProc(void)

{

unsigned int Regebx = 0;

if (!HWD_MTSupported())

return (unsigned int) 1;

#ifdef LINUX

asm

(

"movl $1,%%eax\n\t"

"cpuid"

: "=b" (Regebx)

:

: "%eax","%ecx","%edx"

);

#else

__asm

{

mov eax, 1

cpuid

mov Regebx, ebx

}

#endif

return (unsigned int) ((Regebx & NUM_LOGICAL_BITS) >> 16);

}

//

// GetAPIC_ID() returns the initial APIC ID of the processor it

// executes on.

//

unsigned char GetAPIC_ID(void)

{

unsigned int Regebx = 0;

#ifdef LINUX

asm

(

"movl $1, %%eax\n\t"

"cpuid"

: "=b" (Regebx)

:

: "%eax","%ecx","%edx"

);

#else

__asm

{

mov eax, 1

cpuid

mov Regebx, ebx

}

#endif

return (unsigned char) ((Regebx & INITIAL_APIC_ID_BITS) >> 24);

}

//

// find_maskwidth() returns the number of bits required to represent

// the argument CountItem

//

unsigned int find_maskwidth(unsigned int CountItem)

{

unsigned int MaskWidth, count = CountItem;

#ifdef LINUX

// A single asm statement: the compiler loads count into eax and reads

// MaskWidth back from ecx, so no manual push/pop or cross-statement

// jump is needed.

asm

(

"xorl %%ecx, %%ecx\n\t" // MaskWidth = 0

"decl %%eax\n\t"

"bsrw %%ax, %%cx\n\t" // index of highest set bit in (count - 1)

"jz 1f\n\t" // (count - 1) == 0: width stays 0

"incw %%cx\n\t"

"jmp 2f\n"

"1:\n\t"

"xorl %%ecx, %%ecx\n" // bsr leaves cx undefined for a zero source

"2:"

: "=c" (MaskWidth), "+a" (count)

:

: "cc"

);

#else

__asm

{

mov eax, count

mov ecx, 0

mov MaskWidth, ecx

dec eax

bsr cx, ax

jz next

inc cx

mov MaskWidth, ecx

next:

}

#endif

return MaskWidth;

}

//

// GetNzbSubID() extracts the sub bit field of maximum value MaxSubIDValue

// at bit position ShiftCount from the 8-bit value FullID.

// It returns the 8-bit sub ID value, left in its original bit position.

//

unsigned char GetNzbSubID(unsigned char FullID,

unsigned char MaxSubIDValue,

unsigned char ShiftCount)

{

unsigned int MaskWidth;

unsigned char MaskBits;

MaskWidth = find_maskwidth((unsigned int) MaxSubIDValue);

MaskBits = (0xff << ShiftCount) ^

((unsigned char) (0xff << (ShiftCount + MaskWidth)));

return (FullID & MaskBits);

}

//

// CPUCount() returns the total number of available logical processors,

// cores, and physical processors in the platform

//

unsigned char CPUCount(unsigned int *TotAvailLogical,

unsigned int *TotAvailCore,

unsigned int *PhysicalNum)

{

unsigned char StatusFlag = 0;

unsigned int numLPEnabled = 0;

DWORD dwAffinityMask;

int j = 0, MaxLPPerCore;

unsigned char apicID, PackageIDMask;

unsigned char tblPkgID[256], tblCoreID[256], tblSMTID[256];

char tmp[256];

unsigned int i, ProcessorNum;

g_s3Levels[0] = 0;

*TotAvailCore = 1;

*PhysicalNum = 1;

#ifdef LINUX

// We need to make sure that this process is allowed to run on

// all of the logical processors that the OS itself can run on.

// A process could acquire/inherit affinity settings that restrict the

// current process to a subset of the logical processors visible to the OS.

// Linux doesn't easily allow us to look at the affinity bitmask directly,

// but it does provide an API to test the affinity mask bits of the current

// process against each logical processor visible to the OS.

// Number of processors configured in the system.

int sysNumProcs = sysconf(_SC_NPROCESSORS_CONF);

//this will tell us which processors this process can run on.

cpu_set_t allowedCPUs;

sched_getaffinity(0, sizeof (allowedCPUs), &allowedCPUs);

for ( int i = 0; i < sysNumProcs; i++ )

{

if ( CPU_ISSET(i, &allowedCPUs) == 0 )

{

StatusFlag = USER_CONFIG_ISSUE;

return StatusFlag;

}

}

#else

DWORD dwProcessAffinity, dwSystemAffinity;

GetProcessAffinityMask(GetCurrentProcess(),

&dwProcessAffinity,

&dwSystemAffinity);

if (dwProcessAffinity != dwSystemAffinity) // not all CPUs are enabled

{

StatusFlag = USER_CONFIG_ISSUE;

return StatusFlag;

}

#endif

//

// Assume that cores within a package have the SAME number of

// maximum logical processors. Values returned by

// MaxLogicalProcPerPhysicalProc and MaxCorePerPhysicalProc do not

// have to be a power of 2.

//

MaxLPPerCore = MaxLogicalProcPerPhysicalProc() /

MaxCorePerPhysicalProc();

dwAffinityMask = 1;

#ifdef LINUX

cpu_set_t currentCPU;

while ( j < sysNumProcs )

{

CPU_ZERO(&currentCPU);

CPU_SET(j, &currentCPU);

if ( sched_setaffinity (0, sizeof (currentCPU), &currentCPU)

== 0 )

{

sleep(0); // Yield so the scheduler moves this thread to the selected CPU

#else

while (dwAffinityMask && dwAffinityMask <= dwSystemAffinity)

{

if (SetThreadAffinityMask(GetCurrentThread(), dwAffinityMask))

{

Sleep(0); // Yield so the scheduler moves this thread to the selected CPU

#endif

apicID = GetAPIC_ID();

//

// Store SMT ID and core ID of each logical processor

// Shift value for SMT ID is 0

// Shift value for core ID is the mask width for maximum logical

// processors per core

//

tblSMTID[j] = GetNzbSubID(apicID, (unsigned char) MaxLPPerCore, 0);

tblCoreID[j] = GetNzbSubID(apicID,

(unsigned char) MaxCorePerPhysicalProc(),

(unsigned char) find_maskwidth(MaxLPPerCore));

//

// Extract package ID, assume single cluster.

// Shift value is the mask width for max logical processors per package

//

PackageIDMask = (unsigned char) (0xff <<

find_maskwidth(MaxLogicalProcPerPhysicalProc()));

tblPkgID[j] = apicID & PackageIDMask;

numLPEnabled ++; // increment count of available logical

// processors in the system.

sprintf(tmp, " AffinityMask = %d; Initial APIC = %d;"

" Physical ID = %d, Core ID = %d, SMT ID = %d\n",

dwAffinityMask, apicID, tblPkgID[j],

tblCoreID[j], tblSMTID[j]);

strcat(g_s3Levels, tmp);

} // if

j++;

dwAffinityMask = 1 << j;

} // while

// restore the affinity setting to its original state

#ifdef LINUX

sched_setaffinity (0, sizeof (allowedCPUs), &allowedCPUs);

#else

SetThreadAffinityMask(GetCurrentThread(), dwProcessAffinity);

#endif

*TotAvailLogical = numLPEnabled;

{

//

// Count available cores (TotAvailCore) in the system

//

unsigned char CoreIDBucket[256];

DWORD ProcessorMask, pCoreMask[256];

CoreIDBucket[0] = tblPkgID[0] | tblCoreID[0];

ProcessorMask = 1;

pCoreMask[0] = ProcessorMask;

for (ProcessorNum = 1; ProcessorNum < numLPEnabled; ProcessorNum++)

{

ProcessorMask <<= 1;

for (i = 0; i < *TotAvailCore; i++)

{

//

// Comparing bit-fields of logical processors residing in

// different packages

// Assumes that the bit-masks are the same on all processors

// in the system.

//

if ((tblPkgID[ProcessorNum] | tblCoreID[ProcessorNum]) ==

CoreIDBucket[i])

{

pCoreMask[i] |= ProcessorMask;

break;

}

} // for i

//

// Did not match any bucket. Start a new one.

//

if (i == *TotAvailCore)

{

CoreIDBucket[i] = tblPkgID[ProcessorNum] |

tblCoreID[ProcessorNum];

pCoreMask[i] = ProcessorMask;

(*TotAvailCore)++; // Increment count of available cores

// in the system

}

} // for ProcessorNum

}

{

//

// Count physical processor (PhysicalNum) in the system

//

unsigned char PackageIDBucket[256];

DWORD pPackageMask[256], ProcessorMask;

PackageIDBucket[0] = tblPkgID[0];

ProcessorMask = 1;

pPackageMask[0] = ProcessorMask;

for (ProcessorNum = 1; ProcessorNum < numLPEnabled; ProcessorNum++)

{

ProcessorMask <<= 1;

for (i = 0; i < *PhysicalNum; i++)

{

//

// Comparing bit-fields of logical processors residing in

// different packages

// Assuming the bit-masks are the same on all processors

// in the system.

if (tblPkgID[ProcessorNum]== PackageIDBucket[i])

{

pPackageMask[i] |= ProcessorMask;

break;

}

} // for i

//

// Did not match any bucket. Start a new one.

//

if (i == *PhysicalNum)

{

PackageIDBucket[i] = tblPkgID[ProcessorNum];

pPackageMask[i] = ProcessorMask;

(*PhysicalNum)++; // Increment count of available physical

// processors in the system

}

} // for ProcessorNum

}

//

// Check to see if the system is multi-core

// Check if the system is hyper-threading

//

if (*TotAvailCore > *PhysicalNum)

{

// Multi-core

if (MaxLPPerCore == 1)

StatusFlag = MULTI_CORE_AND_HT_NOT_CAPABLE;

else if (numLPEnabled > *TotAvailCore)

StatusFlag = MULTI_CORE_AND_HT_ENABLED;

else StatusFlag = MULTI_CORE_AND_HT_DISABLED;

}

else

{

// Single-core

if (MaxLPPerCore == 1)

StatusFlag = SINGLE_CORE_AND_HT_NOT_CAPABLE;

else if (numLPEnabled > *TotAvailCore)

StatusFlag = SINGLE_CORE_AND_HT_ENABLED;

else StatusFlag = SINGLE_CORE_AND_HT_DISABLED;

}

return StatusFlag;

}
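
The bit-field arithmetic that find_maskwidth(), GetNzbSubID() and CPUCount() perform on the 8-bit initial APIC ID can also be written without inline assembly. The following is only a minimal sketch of that decomposition under stated assumptions, not the code from this article: the names mask_width, topo_ids and split_apic_id are hypothetical, the sub-IDs are shifted down to start at zero (GetNzbSubID() leaves them in place), and a count of 0 is treated as width 0. As in CPUCount(), the package ID is taken above the mask width of the maximum logical processors per package.

```c
#include <assert.h>

/* Hypothetical portable counterpart of find_maskwidth():
   number of bits needed to represent the values 0 .. count-1. */
static unsigned int mask_width(unsigned int count)
{
    unsigned int width = 0;
    if (count == 0)
        return 0;               /* assumption: treat an empty range as width 0 */
    count--;
    while (count != 0) {        /* count the significant bits of (count - 1) */
        width++;
        count >>= 1;
    }
    return width;
}

/* Hypothetical container for the three sub-IDs, each normalized to start at 0. */
struct topo_ids {
    unsigned int smt;           /* logical processor within a core */
    unsigned int core;          /* core within a package */
    unsigned int pkg;           /* physical package */
};

/* Split an 8-bit initial APIC ID into SMT, core and package IDs, given the
   maximum logical processors and maximum cores per physical package. */
static struct topo_ids split_apic_id(unsigned char apic_id,
                                     unsigned int max_lp_per_pkg,
                                     unsigned int max_cores_per_pkg)
{
    struct topo_ids ids;
    unsigned int lp_per_core, smt_bits, core_bits, pkg_shift;

    assert(max_cores_per_pkg != 0 && max_lp_per_pkg >= max_cores_per_pkg);

    lp_per_core = max_lp_per_pkg / max_cores_per_pkg;
    smt_bits    = mask_width(lp_per_core);
    core_bits   = mask_width(max_cores_per_pkg);
    pkg_shift   = mask_width(max_lp_per_pkg);

    ids.smt  = apic_id & ((1u << smt_bits) - 1u);
    ids.core = ((unsigned int) apic_id >> smt_bits) & ((1u << core_bits) - 1u);
    ids.pkg  = (unsigned int) apic_id >> pkg_shift;
    return ids;
}
```

For example, with a maximum of 4 logical processors and 2 cores per package, split_apic_id(7, 4, 2) yields SMT ID 1, core ID 1 and package ID 1, since the initial APIC ID 7 is 0000 0111 in binary with one SMT bit and one core bit.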

 

About the Authors

Khang Nguyen is an Applications Engineer working with Intel's Software and Solutions Group. He can be reached at khang.t.nguyen@intel.com.

 

Shihjong Kuo is a Senior Technical Marketing Engineer in Intel’s Digital Enterprise Group. He can be reached at shihjong.kuo@intel.com.