PMC.P4(3) | MidnightBSD Library Functions Manual | PMC.P4(3) |
pmc.p4 - measurement events for Intel Pentium 4 and other Netburst architecture CPUs
Performance Counters Library (libpmc, -lpmc)
#include <pmc.h>
Intel P4 PMCs are present in Intel Pentium 4 and Xeon processors that use the Netburst CPU architecture.
These PMCs are documented in Volume 3: System Programming Guide, IA-32 Intel(R) Architecture Software Developer's Manual, Order Number 245472-012, Intel Corporation, 2003. Further information about using these PMCs may be found in IA-32 Intel(R) Architecture Optimization Guide, Order Number 248966-009, Intel Corporation, 2003. Some of these events are affected by processor errata described in Intel(R) Pentium(R) 4 Processor Specification Update, Document Number: 249199-059, Intel Corporation, April 2005.
Intel Pentium 4 PMCs are 40 bits wide. Each CPU contains 18 PMCs, divided into 4 groups with 4, 4, 4 and 6 PMCs respectively. On processors with hyperthreading support, PMC resources are shared between logical processors. These PMCs support the following capabilities:
Capability | Support |
PMC_CAP_CASCADE | Yes |
PMC_CAP_EDGE | Yes |
PMC_CAP_INTERRUPT | Yes |
PMC_CAP_INVERT | Yes |
PMC_CAP_READ | Yes |
PMC_CAP_PRECISE | Unimplemented |
PMC_CAP_SYSTEM | Yes |
PMC_CAP_TAGGING | Yes |
PMC_CAP_THRESHOLD | Yes |
PMC_CAP_USER | Yes |
PMC_CAP_WRITE | Yes |
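Because the counters are 40 bits wide, arithmetic on raw counter values has to account for wrap-around at 2^40. The following is a minimal sketch of that calculation, not part of libpmc: the names P4_PMC_WIDTH, P4_PMC_MASK and p4_pmc_delta() are hypothetical, and the helper assumes the counter wrapped at most once between the two readings. Values obtained through the library may already be adjusted for overflow, in which case no such correction is needed.

```c
#include <stdint.h>

/* Counter width stated in this manual page. */
#define P4_PMC_WIDTH 40
#define P4_PMC_MASK  ((UINT64_C(1) << P4_PMC_WIDTH) - 1)

/*
 * Hypothetical helper (not a libpmc function): number of events
 * between two raw 40-bit readings, assuming at most one wrap.
 */
static uint64_t
p4_pmc_delta(uint64_t before, uint64_t after)
{
    /* Unsigned subtraction wraps modulo 2^64; masking reduces it modulo 2^40. */
    return ((after - before) & P4_PMC_MASK);
}
```

For example, a reading of 4 taken after a reading of 2^40 - 1 corresponds to five events.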
Event specifiers for Intel P4 PMCs can have the following common qualifiers:
active=choice
The allowed values of choice are:
any
both
none
single
The default is “both”.
cascade
edge
complement
mask=qualifier
os
precise
tag=value
threshold=value
usr
If neither of the “os” or “usr” qualifiers is specified, the default is to enable both.
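An event specifier is an event name followed by comma-separated qualifiers. The sketch below shows one way such a string could be assembled before it is handed to the allocation routines described in pmc(3). The helper p4_build_spec() and its parameters are hypothetical and are not part of libpmc; it only concatenates text and does not check that the event or flag names are valid.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not a libpmc function): build an event
 * specifier of the form "event[,mask=flags][,usr][,os]".
 * Returns 0 on success, or -1 if the buffer is too small.
 */
static int
p4_build_spec(char *buf, size_t len, const char *event,
    const char *mask, int usr, int os)
{
    int n;

    n = snprintf(buf, len, "%s%s%s%s%s", event,
        mask != NULL ? ",mask=" : "",
        mask != NULL ? mask : "",
        usr ? ",usr" : "",
        os ? ",os" : "");
    /* snprintf() reports the length it needed; detect truncation. */
    return ((n < 0 || (size_t)n >= len) ? -1 : 0);
}
```

Called with the event “p4-branch-retired”, the mask “mmtp+mmtm” and only usr set, the helper produces “p4-branch-retired,mask=mmtp+mmtm,usr”.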
On Intel Pentium 4 processors with HTT, events are divided into two classes:
Only TS events are allowed for use with process-mode PMCs on Pentium-4/HTT CPUs.
The event specifiers supported by Intel P4 PMCs are:
p4-128bit-mmx-uop [,mask=flags]
all
If an instruction contains more than one 128 bit MMX uop, then each uop will be counted.
p4-64bit-mmx-uop [,mask=flags]
all
If an instruction contains more than one 64 bit MMX uop, then each uop will be counted.
p4-b2b-cycles
p4-bnr
p4-bpu-fetch-request [,mask=qualifier]
tcmiss
The default qualifier is also “mask=tcmiss”.
p4-branch-retired [,mask=flags]
Qualifier flags is a list of the following ‘+’ separated strings:
mmnp
mmnm
mmtp
mmtm
The default qualifier counts all four kinds of branches.
p4-bsq-active-entries [,mask=qualifier]
Qualifier qualifier is a ‘+’ separated set of the following flags:
req-type0, req-type1
Bit “req-type1” is the MSB for this two bit number.
req-len0, req-len1
Bit “req-len1” is the MSB for this two bit number.
req-io-type
req-lock-type
req-lock-cache
req-split-type
req-dem-type
req-ord-type
mem-type0, mem-type1, mem-type2
Bit “mem-type2” is the MSB of this 3-bit number.
The default qualifier has all the above bits set.
Edge triggering using the “edge” qualifier should not be used with this event when counting cycles.
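Multi-bit fields such as the request type are expressed as individual flags, one per bit. As an illustration, the sketch below converts a two-bit request type value into the matching flag list. The helper p4_bsq_req_type_flags() is hypothetical and not part of libpmc, and it assumes that “req-type0” is the least significant bit, which follows from “req-type1” being the MSB.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not a libpmc function): write the flags
 * that encode a two-bit request type (0..3) into buf.
 * Assumes req-type0 is bit 0 and req-type1 is bit 1 (the MSB).
 * Returns 0 on success, -1 on a bad value or a short buffer.
 */
static int
p4_bsq_req_type_flags(unsigned int value, char *buf, size_t len)
{
    int n;

    if (value > 3)
        return (-1);
    n = snprintf(buf, len, "%s%s%s",
        (value & 1) ? "req-type0" : "",
        (value == 3) ? "+" : "",
        (value & 2) ? "req-type1" : "");
    return ((n < 0 || (size_t)n >= len) ? -1 : 0);
}
```

A value of 2 yields “req-type1”, a value of 3 yields “req-type0+req-type1”, and a value of 0 yields an empty string because neither bit is set.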
p4-bsq-allocation [,mask=qualifier]
Qualifier qualifier is a ‘+’ separated set of the following flags:
req-type0, req-type1
Bit “req-type1” is the MSB for this two bit number.
req-len0, req-len1
Bit “req-len1” is the MSB for this two bit number.
req-io-type
req-lock-type
req-lock-cache
req-split-type
req-dem-type
req-ord-type
mem-type0, mem-type1, mem-type2
Bit “mem-type2” is the MSB of this 3-bit number.
The default qualifier has all the above bits set.
This event is usually used along with the “edge” qualifier to avoid multiple counting.
p4-bsq-cache-reference [,mask=qualifier]
Qualifier qualifier is a ‘+’ separated list of the following keywords:
rd-2ndl-hits
rd-2ndl-hite
rd-2ndl-hitm
rd-3rdl-hits
rd-3rdl-hite
rd-3rdl-hitm
rd-2ndl-miss
rd-3rdl-miss
wr-2ndl-miss
The default is to count all the above events.
p4-execution-event [,mask=flags]
Qualifier flags is a list of the following strings, separated by ‘+’ characters:
nbogus0, nbogus1, nbogus2, nbogus3
bogus0, bogus1, bogus2, bogus3
This event requires additional (upstream) events to be allocated to perform the desired uop tagging. The default is to set all the above flags. This event can be used for precise event based sampling.
p4-front-end-event [,mask=flags]
Qualifier flags is a list of strings separated by ‘+’ characters.
This event requires additional (upstream) events to be allocated to perform the desired uop tagging. The default is to select both kinds of events. This event can be used for precise event based sampling.
p4-fsb-data-activity [,mask=flags]
Qualifier flags is a ‘+’ separated set of the following flags:
drdy-drv
drdy-own
drdy-other
dbsy-drv
dbsy-own
dbsy-other
Flags “drdy-own” and “drdy-other” are mutually exclusive. Flags “dbsy-own” and “dbsy-other” are mutually exclusive. The default value for qualifier is “drdy-drv+drdy-own+dbsy-drv+dbsy-own”.
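The mutual-exclusion rule for p4-fsb-data-activity can be checked before a specifier is built. The sketch below is one possible check on a ‘+’ separated mask string. Both has_flag() and p4_fsb_mask_valid() are hypothetical helpers, not libpmc functions, and the check covers only the two exclusive pairs named above; it does not reject unknown flag names.

```c
#include <stdbool.h>
#include <string.h>

/* Return true if flag occurs as a whole '+'-separated token in mask. */
static bool
has_flag(const char *mask, const char *flag)
{
    size_t flen = strlen(flag);
    const char *p = mask;

    while (*p != '\0') {
        size_t tlen = strcspn(p, "+");  /* length of the current token */

        if (tlen == flen && strncmp(p, flag, flen) == 0)
            return (true);
        p += tlen;
        if (*p == '+')
            p++;                        /* skip the separator */
    }
    return (false);
}

/*
 * Hypothetical validity check (not a libpmc function): reject masks
 * that combine flags documented as mutually exclusive.
 */
static bool
p4_fsb_mask_valid(const char *mask)
{
    if (has_flag(mask, "drdy-own") && has_flag(mask, "drdy-other"))
        return (false);
    if (has_flag(mask, "dbsy-own") && has_flag(mask, "dbsy-other"))
        return (false);
    return (true);
}
```

The documented default mask passes this check, while “drdy-own+drdy-other” does not.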
p4-global-power-events [,mask=flags]
running
p4-instr-retired [,mask=flags]
Qualifier flags is a list of the following strings, separated by ‘+’ characters:
nbogusntag
nbogustag
bogusntag
bogustag
The default qualifier counts all the above kinds of instructions.
p4-ioq-active-entries [,mask=qualifier] [,busreqtype=req-type]
Qualifier qualifier is a ‘+’ separated set of the following flags:
all-read
all-write
mem-uc
mem-wc
mem-wt
mem-wp
mem-wb
own
other
prefetch
The default value for qualifier is to enable all the above flags.
The req-type qualifier is a 5-bit number that can additionally be used to select a specific bus request type. The default is 0.
The “edge” qualifier should not be used when counting cycles with this event. The exact behavior of this event depends on the processor revision.
p4-ioq-allocation [,mask=qualifier] [,busreqtype=req-type]
Qualifier qualifier is a ‘+’ separated set of the following flags:
all-read
all-write
mem-uc
mem-wc
mem-wt
mem-wp
mem-wb
own
other
prefetch
The default value for qualifier is to enable all the above flags.
The req-type qualifier is a 5-bit number that can additionally be used to select a specific bus request type. The default is 0.
The “edge” qualifier is normally used with this event to prevent multiple counting. The exact behavior of this event depends on the processor revision.
p4-itlb-reference [,mask=qualifier]
Qualifier qualifier is a list of strings separated by ‘+’ characters.
If no qualifier is specified, the default is to count all three kinds of ITLB translations.
p4-load-port-replay [,mask=qualifier]
split-ld
The default value for qualifier is “split-ld”.
p4-mispred-branch-retired [,mask=flags]
nbogus
p4-machine-clear [,mask=flags]
Qualifier flags is a list of the following strings, separated by ‘+’ characters:
clear
moclear
smclear
Use qualifier “edge” to get a count of occurrences of machine clears. The default qualifier is “clear”.
p4-memory-cancel [,mask=event-list]
Qualifier event-list is a list of the following strings, separated by ‘+’ characters:
st-rb-full
64k-conf
If event-list is not specified, then the default is to count both kinds of events.
p4-memory-complete [,mask=event-list]
Qualifier event-list is a ‘+’ separated list of the following flags:
lsc
ssc
The default is to count both kinds of operations.
p4-mob-load-replay [,mask=qualifier]
Qualifier qualifier is a ‘+’ separated list of the following flags:
no-sta
no-std
partial-data
unalgn-addr
The default qualifier is no-sta+no-std+partial-data+unalgn-addr.
p4-packed-dp-uop [,mask=flags]
all
p4-packed-sp-uop [,mask=flags]
all
p4-page-walk-type [,mask=qualifier]
Qualifier qualifier is a ‘+’ separated list of keywords.
The default value for qualifier is “dtmiss+itmiss”.
p4-replay-event [,mask=flags]
Qualifier flags is a ‘+’ separated set of strings.
This event requires additional (upstream) events to be allocated to perform the desired uop tagging. The default qualifier counts both kinds of uops. This event can be used for precise event based sampling.
p4-resource-stall [,mask=flags]
sbfull
p4-response
p4-retired-branch-type [,mask=flags]
Qualifier flags is a ‘+’ separated list of strings:
conditional
call
return
indirect
The default qualifier counts all the above branch types.
p4-retired-mispred-branch-type [,mask=flags]
Qualifier flags is a ‘+’ separated list of strings:
conditional
call
return
indirect
The default qualifier counts all the above branch types.
p4-scalar-dp-uop [,mask=flags]
all
p4-scalar-sp-uop [,mask=flags]
all
p4-snoop
p4-sse-input-assist [,mask=flags]
all
p4-store-port-replay [,mask=qualifier]
split-st
The default value for qualifier is “split-st”.
p4-tc-deliver-mode [,mask=qualifier]
Qualifier qualifier is a list of the following strings, separated by ‘+’ characters:
DD
DB
DI
BD
BB
BI
ID
IB
If there is only one logical processor in the processor package then the qualifier for logical processor 1 is ignored. If no qualifier is specified, the default qualifier is “DD+DB+DI+BD+BB+BI+ID+IB”.
p4-tc-ms-xfer [,mask=flags]
cisc
p4-uop-queue-writes [,mask=flags]
Qualifier flags is a list of the following strings, separated by ‘+’ characters:
from-tc-build
from-tc-deliver
from-rom
The default qualifier counts all the above kinds of uops.
p4-uop-type [,mask=flags]
Qualifier flags is a list of the following strings, separated by ‘+’ characters:
tagloads
The default qualifier counts both kinds of uops.
p4-uops-retired [,mask=flags]
Qualifier flags is a list of strings separated by ‘+’ characters.
The default qualifier counts both kinds of uops.
p4-wc-buffer [,mask=flags]
Qualifier flags is a list of the following strings, separated by ‘+’ characters:
wcb-evicts
wcb-full-evict
The default qualifier counts both kinds of evictions.
p4-x87-assist [,mask=flags]
Qualifier flags is a list of the following strings, separated by ‘+’ characters:
fpsu
fpso
poao
poau
prea
The default qualifier counts all the above types of instruction retirements.
p4-x87-fp-uop [,mask=flags]
all
If an instruction contains more than one x87 floating-point uop, then all x87 floating-point uops will be counted. This event does not count x87 floating-point data movement operations.
p4-x87-simd-moves-uop [,mask=flags]
Qualifier flags is a list of strings separated by ‘+’ characters.
The default is to count all uops. (Errata) This event may be affected by processor errata N43.
PMC cascading support is currently poorly implemented. While individual event counters may be allocated with a “cascade” qualifier, the current API does not offer the ability to name and allocate all the resources needed for a cascaded event counter pair in a single operation.
Support for precise event based sampling is currently unimplemented.
The following table shows the mapping between the PMC-independent aliases supported by Performance Counters Library (libpmc, -lpmc) and the underlying hardware events used.
pmc(3), pmc.atom(3), pmc.core(3), pmc.core2(3), pmc.iaf(3), pmc.k7(3), pmc.k8(3), pmc.p5(3), pmc.p6(3), pmc.soft(3), pmc.tsc(3), pmclog(3), hwpmc(4)
The pmc library first appeared in FreeBSD 6.0.
The Performance Counters Library (libpmc, -lpmc) was written by Joseph Koshy <jkoshy@FreeBSD.org>.
October 4, 2008 | midnightbsd-3.1 |