Solaris Processor Sets Made Easy

By Dr. Matthias Laux, June 2001

Abstract

Processor sets are an interesting feature of the Solaris operating environment that allows processes to be bound to groups of CPUs in a multiprocessor system. There are numerous reasons why processor sets can be useful:

Single processes requiring a dedicated CPU can be bound to a processor set, thus ensuring access to the required resources and avoiding costly context switches.
Different applications running on a large system can be assigned to separate processor sets. Moving CPUs between processor sets can then be used to dynamically control the resources available to each application, for example, in reaction to application peak loads.
Scalability tests are made easy by confining an application to different numbers of CPUs on a multiprocessor system.
In a benchmark scenario, processor sets are frequently employed with complex requirements, for example, separating application processes from CPUs handling network interrupts, or separating database work processes from central processes like log or database writer.

This article describes the support Solaris offers for processor set creation and process binding. In addition, it describes a modular set of Perl scripts that simplify the handling of even very complex processor set and process binding requirements, and demonstrates their use in a real-life example scenario based on two important ISV applications: the SAP R/3 ERP system and the Oracle RDBMS.

Note that the scripts described below are available as a compressed tar archive.

Identifying CPUs and Their State

When process binding and processor set creation are considered, the first step is to identify the available CPUs in the system and their state.

The basic tools that Solaris makes available to accomplish these tasks are psrinfo to identify CPUs and their state and psradm to change the state of a CPU if necessary. The corresponding system calls are processor_info and p_online, respectively (see the Appendix for a complete listing of all relevant OS commands and system calls).

A CPU is in one of several possible states, which are reported by psrinfo:

State on-line (the default state: available for LWP scheduling and interrupt handling)
State off-line (not available for LWP scheduling)
State no-intr (available for LWP scheduling, but does not handle network or I/O interrupts)

Only CPUs in the on-line and no-intr states can be used for process binding and assigned to processor sets. In addition, a CPU may already be assigned to a processor set; such CPUs can be identified using psrset -p. The state of a CPU (on-line, off-line, or no-intr) can be changed using the psradm command. The OS ensures that at least one CPU remains able to handle interrupts and run LWPs. Note that the CPUs handling interrupts can be identified, for example, using mpstat.
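To make the state rules above concrete, the following minimal Python sketch (not one of the article's Perl scripts) parses psrinfo output and keeps only the CPUs in the on-line or no-intr state. It assumes the default psrinfo line format of CPU ID, state, and a "since" timestamp; the function names parse_psrinfo and bindable_cpus are hypothetical, and lines with any other state are simply ignored.

```python
import re

# Assumed psrinfo line format (default output, one line per CPU), e.g.
#   "0       on-line   since 06/01/2001 09:00:00"
PSRINFO_LINE = re.compile(r"^\s*(\d+)\s+(on-line|off-line|no-intr)\b")

# States in which a CPU may be used for binding or assigned to a processor set.
BINDABLE_STATES = {"on-line", "no-intr"}


def parse_psrinfo(text):
    """Map CPU ID -> state for every line matching the assumed format."""
    states = {}
    for line in text.splitlines():
        match = PSRINFO_LINE.match(line)
        if match:
            states[int(match.group(1))] = match.group(2)
    return states


def bindable_cpus(states):
    """Sorted CPU IDs whose state allows binding (on-line or no-intr)."""
    return sorted(cpu for cpu, state in states.items() if state in BINDABLE_STATES)


sample = """\
0       on-line   since 06/01/2001 09:00:00
1       no-intr   since 06/01/2001 09:00:00
2       off-line  since 06/01/2001 09:05:12
"""
print(bindable_cpus(parse_psrinfo(sample)))  # [0, 1]
```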

Traditional Process Binding

The capability of binding a process to a specific CPU has been available in Solaris for a long time. On the OS level, this binding is accomplished using the pbind command, and the processor_bind system call can be used in programs. Note that pbind operates on the process level only, while processor_bind also supports LWP binding.

Using this approach, all LWPs of a given process are bound to the specified CPU, and the binding is not exclusive, that is, other LWPs in the system may still be scheduled on the same CPU. Thus, it is not possible to reserve a CPU exclusively for a given process. It is also not easy to allocate more than one CPU to a multithreaded application: this would have to be done programmatically with processor_bind, which binds each LWP to a single CPU, so the application itself would have to distribute its LWPs across the CPUs instead of leaving this to the scheduler. This is where processor sets come in.

Solaris Processor Sets

With release 2.6, Solaris added support for processor sets, a logical partitioning of the CPUs in a multiprocessor system.

Processor sets are maintained using the psrset command. For example,

psrset -c 3 4 5

would create a processor set using CPUs 3, 4, and 5 (provided they are available for assignment to such a set). Each processor set in a system has a unique ID (>= 1), which the OS returns when the set is created. Note that processor sets are not persistent, that is, they do not survive a reboot of the system.

Only processes and LWPs that have been explicitly bound to a processor set via psrset -b or pset_bind are scheduled on the CPUs of this set; these CPUs are reserved exclusively. This allows you to achieve the goals outlined in the Abstract, so the remainder of this article is concerned with processor sets only.
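The two psrset operations mentioned so far, creating a set with -c and binding processes with -b, can be wrapped in a few helper functions. The Python sketch below only builds the argument lists (which could be passed to subprocess.run on a Solaris host) and extracts the set ID from the creation message. The helper names are hypothetical, and the assumption that psrset -c reports the new ID in a line of the form "created processor set N" should be checked against the local psrset output.

```python
import re


def create_cmd(cpus):
    """argv for 'psrset -c <cpu> ...': create a set from the given CPU IDs."""
    return ["psrset", "-c"] + [str(cpu) for cpu in cpus]


def bind_cmd(set_id, pids):
    """argv for 'psrset -b <set> <pid> ...': bind processes to a set."""
    return ["psrset", "-b", str(set_id)] + [str(pid) for pid in pids]


def parse_set_id(output):
    """Extract N from an assumed 'created processor set N' message."""
    match = re.search(r"created processor set (\d+)", output)
    if match is None:
        raise ValueError("no processor set ID found in psrset output")
    return int(match.group(1))


# Only printed here; on Solaris the lists could be run with subprocess.run().
print(create_cmd([3, 4, 5]))              # ['psrset', '-c', '3', '4', '5']
set_id = parse_set_id("created processor set 1\n")
print(bind_cmd(set_id, [19673, 19674]))   # ['psrset', '-b', '1', '19673', '19674']
```

Keeping command construction separate from execution makes the logic easy to check without touching a live system.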

Note that pbind and processor_bind cannot be used to bind any other processes or LWPs to CPUs assigned to a user-created processor set. Conversely, a CPU to which a process has been bound using pbind or processor_bind cannot be assigned to a user processor set as long as this binding is present.

It is also not possible to assign every CPU in a multiprocessor system to a user processor set. At least one CPU needs to remain unassigned, since otherwise the kernel itself would have no CPU left for its own processing.

Note that psrset also offers the capability to change the interrupt handling state of all CPUs within a given set. This is more convenient than using psradm for each CPU in the set.

Another Solaris tool offering support for processor sets is mpstat, which reports statistical data like user, system, and idle time for each CPU. Using the -P flag, these statistics can be limited to a specific processor set. Using the -p flag adds another column to the output, indicating the processor set each CPU in a system belongs to.

Creating a Processor Set - A More Convenient Approach

There are four basic steps involved in binding one or more processes to a processor set:

Check the available CPUs and their state.
Create the processor set.
Identify the required process IDs (PIDs).
Bind the processes to the processor set.

The basic Solaris tools described so far offer the complete functionality required to handle these tasks. However, there are some shortcomings:

To create a processor set, individual CPU IDs need to be specified. These may not be contiguous in a multiprocessor system (for example, if not all boards are populated) or CPUs may not be available for various reasons (off-lined, already assigned to other processor sets, and so on).
A more realistic use case would be "create a processor set with N CPUs".
To bind processes, process IDs need to be determined. In complex scenarios involving many processes, this can be very inconvenient.
A more realistic use case would be "bind all processes of a specific type to a specified processor set".

The Solaris commands are too generic to handle these use cases, and thus a modular set of Perl scripts to easily handle more complex tasks was developed as an abstraction layer between the user and the Solaris commands. These scripts can then be reused by even higher-level scripts to automatically generate very complex processor sets and process binding patterns on distributed host configurations.

check_available_cpus

Before creating a processor set, a snapshot of the CPUs in the system and their state is required. The script check_available_cpus combines the output of psrinfo and psrset -p and presents the CPUs in the system sorted by state and availability, thus identifying the CPUs that are available for assignment to a processor set.

An example output could look like this:

10 CPU total in system
    0  1  2  3  4  5  6  7  10 11
 8 CPU on-line
    0  1  2  4  5  6  10 11
 3 CPU no-intr
    2  4  10
 2 CPU off-line
    3  7
 4 CPU assigned to set(s)
    0  1  2  10
 4 CPU available for assignment
    4  5  6  11

Now a complete picture of the CPUs in the system is available, which will be used by the next script in the stack to create new processor sets. Note that check_available_cpus supports -i as a command line switch, in which case CPUs in the no-intr state will not be considered as available for assignment.
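The classification performed by check_available_cpus can be sketched as follows. This is a Python illustration, not the Perl script itself: it assumes the CPU states (from psrinfo) and the CPUs already assigned to sets (from psrset -p) have been parsed beforehand, and, matching the example output above, it counts no-intr CPUs as on-line. The function name cpu_report is hypothetical.

```python
def cpu_report(states, assigned, exclude_nointr=False):
    """Group CPU IDs the way the check_available_cpus example output does.

    states:   dict CPU ID -> 'on-line' | 'no-intr' | 'off-line' (from psrinfo)
    assigned: CPU IDs already in a processor set (from psrset -p)
    exclude_nointr: mimics the -i switch (no-intr CPUs are not available)
    """
    total = sorted(states)
    online = [c for c in total if states[c] != "off-line"]  # includes no-intr
    nointr = [c for c in total if states[c] == "no-intr"]
    offline = [c for c in total if states[c] == "off-line"]
    taken = set(assigned)
    in_sets = sorted(taken & set(states))
    available = [
        c for c in online
        if c not in taken and not (exclude_nointr and states[c] == "no-intr")
    ]
    return {"total": total, "on-line": online, "no-intr": nointr,
            "off-line": offline, "assigned": in_sets, "available": available}


# The same configuration as in the example output above.
states = {0: "on-line", 1: "on-line", 2: "no-intr", 3: "off-line",
          4: "no-intr", 5: "on-line", 6: "on-line", 7: "off-line",
          10: "no-intr", 11: "on-line"}
report = cpu_report(states, assigned={0, 1, 2, 10})
for label, cpus in report.items():
    print(f"{len(cpus):2d} CPU {label}: {' '.join(map(str, cpus))}")
```

With this data the available CPUs are 4 5 6 11, or 5 6 11 when no-intr CPUs are excluded as with the -i switch.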

make_psrset

After it has been determined which CPUs can be used for assignment to a new processor set, this set needs to be created. Rather than explicitly specifying the CPU numbers determined above as arguments to psrset -c, make_psrset handles the basic use case of creating a processor set with a given number of CPUs:

make_psrset 4

would create a set using the first 4 available CPUs as determined by check_available_cpus. Note that make_psrset also supports a flag -e to exclude specific CPUs from being assigned to a processor set, should such a need arise.
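The selection logic of make_psrset can be sketched like this, assuming the list of available CPUs comes from a check_available_cpus-style step. The Python functions pick_cpus and make_psrset_cmd are hypothetical; they take the first N candidates after removing excluded CPUs, mirroring the -e flag, and fail when too few CPUs are left. The resulting psrset -c call may still be rejected by the OS, for example if it would leave no CPU outside user processor sets.

```python
def pick_cpus(available, count, exclude=()):
    """Return the first `count` available CPU IDs, skipping any in `exclude`."""
    excluded = set(exclude)
    candidates = [cpu for cpu in available if cpu not in excluded]
    if count < 1 or count > len(candidates):
        raise ValueError(f"cannot pick {count} CPUs from {candidates}")
    return candidates[:count]


def make_psrset_cmd(available, count, exclude=()):
    """argv for psrset -c using the picked CPUs."""
    return ["psrset", "-c"] + [str(cpu) for cpu in pick_cpus(available, count, exclude)]


print(make_psrset_cmd([4, 5, 6, 11], 3, exclude=[5]))  # ['psrset', '-c', '4', '6', '11']
```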

Some Convenience Scripts

This section describes some utility scripts that facilitate common tasks related to processor set and process management.

delete_psrsets

This script is a convenience wrapper for the Solaris psrset -d command. It identifies all processor sets in the system and removes them.
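A Python sketch of the same idea, assuming that psrset without arguments lists one line per user set in the form "user processor set N: processors ...". The function names are hypothetical; only user processor sets are collected, and one psrset -d command is built per set.

```python
import re

# Assumed shape of the plain 'psrset' listing, one line per user set, e.g.
#   "user processor set 1: processors 4 5"
USER_SET = re.compile(r"^user processor set (\d+):", re.MULTILINE)


def user_set_ids(listing):
    """IDs of all user processor sets found in the listing."""
    return [int(n) for n in USER_SET.findall(listing)]


def delete_cmds(listing):
    """One 'psrset -d <id>' argv per user processor set."""
    return [["psrset", "-d", str(set_id)] for set_id in user_set_ids(listing)]


listing = "user processor set 1: processors 4 5\nuser processor set 2: processors 6 11\n"
for cmd in delete_cmds(listing):
    print(" ".join(cmd))  # psrset -d 1, then psrset -d 2
```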

identify_procs_psrsets

This script gives a complete picture of all processor sets in the system and the processes bound to each of them. For each processor set, the list of processes is also written to a file for further postprocessing. identify_procs_psrsets can be handy to quickly gain an overview of the system and to check whether processor set creation and process binding were successful. Note that identify_procs_psrsets also supports -s as a command line switch, limiting the output to a quick summary of the most important data.
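The bookkeeping part of identify_procs_psrsets can be sketched as below. Where the (PID, set ID) pairs come from is left open here; the function names and the file naming scheme pset_<id>.pids are hypothetical. summary corresponds loosely to the -s switch.

```python
import os
from collections import defaultdict


def group_by_set(bindings):
    """bindings: iterable of (pid, set_id); set_id None means 'not bound'."""
    groups = defaultdict(list)
    for pid, set_id in bindings:
        if set_id is not None:
            groups[set_id].append(pid)
    return {set_id: sorted(pids) for set_id, pids in sorted(groups.items())}


def write_lists(groups, directory="."):
    """Write one PID per line to <directory>/pset_<id>.pids (hypothetical names)."""
    paths = []
    for set_id, pids in groups.items():
        path = os.path.join(directory, f"pset_{set_id}.pids")
        with open(path, "w") as handle:
            handle.writelines(f"{pid}\n" for pid in pids)
        paths.append(path)
    return paths


def summary(groups):
    """One short line per processor set."""
    return [f"set {set_id}: {len(pids)} process(es)" for set_id, pids in groups.items()]


groups = group_by_set([(101, 1), (102, None), (103, 2), (100, 1)])
print("\n".join(summary(groups)))
```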

setup_interrupt_cpus

This is a first example application built on top of the other scripts described in this article. setup_interrupt_cpus creates a processor set with the specified size and enables interrupt handling for the CPUs in this set (using psrset -n). Interrupt handling is disabled (using psradm -i) for all other online CPUs.
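The interrupt setup described above boils down to two kinds of commands. The Python sketch below, with the hypothetical function interrupt_plan, builds them for a given set: psrset -n to enable interrupts for the CPUs of the set, then psradm -i for all other on-line CPUs. Enabling the set first keeps at least one CPU available for interrupts throughout.

```python
def interrupt_plan(online, set_cpus, set_id):
    """Commands that confine interrupt handling to the CPUs of one set.

    online:   CPU IDs currently on-line (including no-intr ones)
    set_cpus: CPUs of the newly created interrupt processor set
    set_id:   ID of that processor set
    """
    in_set = set(set_cpus)
    others = [cpu for cpu in online if cpu not in in_set]
    commands = [["psrset", "-n", str(set_id)]]  # enable interrupts inside the set
    if others:
        # put every other on-line CPU into the no-intr state
        commands.append(["psradm", "-i"] + [str(cpu) for cpu in others])
    return commands


for cmd in interrupt_plan(online=[0, 1, 2, 3], set_cpus=[0, 1], set_id=1):
    print(" ".join(cmd))
```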

In scenarios where servers have to handle heavy network traffic, isolating the CPUs that handle interrupts in a separate processor set can be a useful performance tuning approach. Since interrupt handling can consume a significant share of these CPUs' time, application processes scheduled on the same CPUs would experience a high rate of context switches, especially given the high priority of interrupts. This can significantly impact the performance of such application processes.

Note, however, that enabling and disabling interrupt handling for a CPU affects both network and I/O interrupts, so it is crucial to monitor application, network, and I/O performance when modifying the interrupt handling state of CPUs. Experience has shown that under heavy network and I/O load, I/O performance suffers when interrupts are confined to a small set of CPUs. In these cases, it is best to identify the CPUs handling network interrupts (for example, using mpstat) and isolate them in a separate processor set while leaving interrupt handling enabled on all CPUs. The I/O interrupts are then distributed over many more CPUs, which improves network and I/O performance and, consequently, application performance.

Identifying Processes

Now that a processor set can be created, the processes to be bound to it need to be identified. Ideally, this should be possible at an abstract level rather than by specifying lists of PIDs. So the task is to map an abstract description of a process (or set of processes) to a PID (or a list of PIDs).

Unfortunately, there is no general solution to this. The natural first approach is to use ps, typically with some flags like -ef, to determine the process descriptions in the CMD column of the output. This, however, is not a panacea, since there are numerous cases where different process types are listed with the same data in this column. SAP R/3 work processes or IBM DB2 database processes are just two examples where this is true.
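The naive CMD-based approach can be sketched in a few lines of Python. The function pids_by_command is hypothetical and assumes output in the format of ps -e -o pid,comm with a header line. The sample data is illustrative only: two R/3 work processes, which may well be of different types, show the same command, which is exactly the limitation described above.

```python
import re


def pids_by_command(ps_output, pattern):
    """PIDs whose command field matches `pattern` (a regular expression).

    ps_output: text in the format of 'ps -e -o pid,comm' (header line first).
    """
    regex = re.compile(pattern)
    pids = []
    for line in ps_output.splitlines()[1:]:  # skip the header line
        fields = line.split(None, 1)
        if len(fields) == 2 and regex.search(fields[1]):
            pids.append(int(fields[0]))
    return pids


# Illustrative sample data, not taken from a real system.
sample = """\
  PID COMMAND
  412 ora_pmon_C46
  413 ora_dbw0_C46
  520 dw.sapC46_D10
  521 dw.sapC46_D10
"""
print(pids_by_command(sample, r"^dw\.sap"))  # [520, 521]
```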

Thus, in more complicated scenarios, a different strategy is needed to identify processes. The example scenario below describes scripts whose sole purpose is to identify the PIDs of a specific process type in an application. Following the modular approach, they can then be re-used by higher-level scripts.

Sometimes all processes of an application need to be bound to a processor set, but it is difficult to know which processes actually are started by the application. If all application processes are spawned by one parent process, one approach is to use the Solaris ptree command to identify the process hierarchy of the application. Using the PID of any process of the application,

ptree <PID>

prints the process tree, that is, all parents and children for this process. The top-level parent process PID can then be used in a second call to ptree to get the entire process tree for the application. In a more complicated scenario, for example if an application is started by several initial processes, more complicated combinations of ps and ptree may be required.
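The two-step ptree procedure (walk up to the top-level parent, then list its whole subtree) can be sketched on a PID-to-parent-PID map, for example one built from ps -e -o pid,ppid. This Python sketch, with the hypothetical function application_tree, is not ptree itself; it treats init (PID 1) and PID 0 as outside the application, so the walk stops at the last ancestor below them.

```python
def application_tree(ppid_of, pid):
    """Find the top-level ancestor of `pid` (below init) and all its descendants.

    ppid_of: dict PID -> parent PID.
    Returns (root_pid, sorted list of all PIDs in the root's subtree).
    """
    root = pid
    # Walk upwards until the parent is init/sched (or unknown).
    while ppid_of.get(root, 1) not in (0, 1):
        root = ppid_of[root]

    children = {}
    for child, parent in ppid_of.items():
        children.setdefault(parent, []).append(child)

    tree, stack = [], [root]
    while stack:  # depth-first walk over the subtree
        current = stack.pop()
        tree.append(current)
        stack.extend(children.get(current, []))
    return root, sorted(tree)


ppid_of = {1: 0, 100: 1, 101: 100, 102: 101, 103: 101, 200: 1}
print(application_tree(ppid_of, 103))  # (100, [100, 101, 102, 103])
```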

Another approach is to look for application-specific tools that can be used to retrieve the required information. One example is dpmon, which can be used to identify the dispatcher process of an SAP R/3 instance.

Sometimes it can also be useful to use the BSD variant of ps, which can be found under /usr/ucb/ps. The standard ps has the disadvantage that the output in the CMD column is truncated, so information that might be used to distinguish processes is not available. The BSD variant offers flags to display a wider output listing with the full command line used to start the processes. In addition, the information printed sometimes differs from the standard variant. In the case of IBM DB2 mentioned above, using

/usr/ucb/ps axw

correctly displays the different process types and thus allows you to distinguish them, while the standard variant prints the same process name in the CMD column.

So, to summarize, there is no single strategy that satisfies all abstract process identification requirements. The recommendation is to build scripts, based on the approaches described in this section, that handle this task for specific applications, and then to re-use them as modules in higher-level scripts as outlined in the next section.

Example Scenario

Scenario Description

This last section is devoted to an example application of the concepts described and the scripts developed so far. The idea is to demonstrate that even complex process binding requirements can easily be handled by adding some script support in between the user requirements and the basic Solaris tools. The individual scripts are not described on a detailed code basis since some of them are quite lengthy, but the basic ideas and the integration of the concepts and scripts of the previous sections are outlined. Since the source code of these scripts is available, their modification and extension to meet different requirements should be straightforward.

The scenario described here is based on two important ISV applications: the SAP R/3 ERP system and the Oracle RDBMS. The first layer of scripts described below covers these applications individually, while the second layer then deals with them in a joint scenario (SAP R/3 application servers connecting to an Oracle database).

Identifying Processes

identify_r3_procs

The task of this script is to identify the PIDs of all processes of an R/3 instance. The problem to be solved here is that several of the main R/3 processes (like dispatcher, dialog, update, batch, spool and enqueue) show up with the same value in the CMD column of the ps output. Thus, it is not possible to tell them apart using this information only. identify_r3_procs employs different strategies to identify the PIDs of all processes related to an R/3 instance:

Call dpmon to identify the different types of R/3 work processes.
Use the ps -ef output to identify dispatcher, message server, and some other processes.
Use /usr/proc/bin/ptree to identify sapstart and all remaining processes created by sapstart (since sapstart is at the top of the instance process hierarchy, creating the dispatcher, which in turn creates the work processes).

The details of how these various strategies are combined can be seen by inspecting the script source code.
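A minimal sketch of how such partial results might be merged is shown below. The precedence used here (dpmon first, then ps, and OTH for any remaining PID found via ptree) is an assumption for illustration, not necessarily what identify_r3_procs does; the function classify_r3 and the input dictionaries are hypothetical, and the PIDs are borrowed from the example output that follows.

```python
def classify_r3(tree_pids, dpmon_types, ps_types):
    """Assign a three-letter type to every PID of an R/3 instance.

    tree_pids:   PIDs found below sapstart (e.g. via ptree)
    dpmon_types: PID -> type reported by dpmon (DIA, UPD, BTC, ...)
    ps_types:    PID -> type derived from ps -ef (DIS, MSG, GWR, ...)
    Assumed precedence: dpmon, then ps, otherwise OTH.
    """
    pids = set(tree_pids) | set(dpmon_types) | set(ps_types)
    return sorted(
        (dpmon_types.get(pid) or ps_types.get(pid) or "OTH", pid) for pid in pids
    )


rows = classify_r3(
    tree_pids=[19666, 19673, 19709, 26730],
    dpmon_types={26730: "DIA", 19709: "UPD"},
    ps_types={19673: "DIS", 19696: "GWR"},
)
for kind, pid in rows:
    print(f"{kind}  {pid}")
```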

Example:

identify_r3_procs C46 D14

could produce the following output (process type - PID):

DIA  26730
DIA  26731
DIA  26732
DIA  26733
DIA  26734
DIA  19704
DIA  26735
DIA  26736
DIS  19673
GWR  19696
OTH  19666
SEP  19674
UPD  19709
UPD  19710
UPD  19711

The process types are identified using three-letter acronyms. The complete list of these acronyms is:

DIA - Dialog process
BTC - Batch process
SPO - Spool process
ENQ - Enqueue process
UPD - Update 1 process
UP2 - Update 2 process
DIS - Dispatcher
MSG - Message server
COP - Systemlog-collector
SEP - Systemlog-sender
GWR - Gateway-process
OTH - All other processes (e. g. sapstart)

These output lists can be processed by other scripts. An example of this is manage_psrsets.

identify_oracle_procs

The task of this script is to identify the PIDs of all processes of an Oracle instance. This task is much simpler than for an R/3 instance since the processes can readily be identified by the value in the CMD column of the ps output. The invocation and the output format are very similar to identify_r3_procs, except for the three-letter acronyms, which are of course different:

DBW - DB writer
LGW - Log writer
LSR - Listener
SHA - Shadow process
CKP - Checkpoint process
PMO - PMON process
SMO - SMON process
SNP - SNP process
REC - RECO process

These output lists can also be processed by other scripts like manage_psrsets.
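The Oracle case can be sketched as a simple classification of the ps command string. The naming conventions assumed here (background processes named ora_<name>_<SID>, shadow processes named oracle<SID>, and the listener binary tnslsnr) are typical of Oracle on UNIX but should be verified for the release in use; the function oracle_type is hypothetical, and commands it does not recognize yield None.

```python
import re

# Assumed background process names -> acronyms used by identify_oracle_procs.
BACKGROUND = {"dbw": "DBW", "lgwr": "LGW", "ckpt": "CKP", "pmon": "PMO",
              "smon": "SMO", "snp": "SNP", "reco": "REC"}
# e.g. ora_dbw0_C46, ora_lgwr_C46 (optional digit after the name)
BACKGROUND_NAME = re.compile(r"ora_([a-z]+)\d*_(\w+)$")


def oracle_type(command, sid):
    """Three-letter type for one ps command string, or None if unrelated."""
    tokens = command.split()
    if not tokens:
        return None
    program = tokens[0].rsplit("/", 1)[-1]  # drop any directory prefix
    match = BACKGROUND_NAME.match(program)
    if match:
        name, owner = match.groups()
        return BACKGROUND.get(name) if owner == sid else None
    if program == f"oracle{sid}":
        return "SHA"  # shadow (server) process
    if program == "tnslsnr":
        return "LSR"  # listener
    return None


print(oracle_type("ora_lgwr_C46", "C46"))           # LGW
print(oracle_type("oracleC46 (LOCAL=NO)", "C46"))   # SHA
```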

Note that full support for the IBM DB2 database is planned for a future release.

Putting it all together

Now that processor sets can be created and processes can be identified, both tasks need to be brought together. This is where manage_psrsets comes in.

manage_psrsets

manage_psrsets is the master script accepting a wide variety of process specifiers to be bound to a processor set:

-f=<file>                    : PIDs specified in the input file <file>
-u=<user>                    : Processes owned by user <user>
-c=<command>                 : Processes whose CMD field in the ps
                               output matches <command>
-r=<SID>:<instance>:<procs>  : Processes of an SAP R/3 instance
-o=<SID>:<procs>             : Processes of an Oracle instance

The processes thus identified are then bound to a newly created processor set. The key feature here is that any number of such specifiers can be combined in one call of manage_psrsets.

SAP R/3 instances are specified by the ternary identifier <SID>:<instance>:<procs>, where <procs> is one of the three-letter identifiers introduced in the last section (or the additional identifier ALL, selecting all processes of an instance). Similarly, the binary identifier <SID>:<procs> specifies an Oracle instance and a set of processes, again one of the three-letter identifiers listed in the last section, or the additional identifier ALL.
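Parsing these specifiers is straightforward. The Python sketch below, with the hypothetical function parse_spec, splits each argument at the first = sign and checks the number of colon-separated fields and the final process type against the acronym lists given earlier (plus ALL). It is an illustration of the specifier syntax, not the manage_psrsets source.

```python
R3_TYPES = {"DIA", "BTC", "SPO", "ENQ", "UPD", "UP2", "DIS", "MSG",
            "COP", "SEP", "GWR", "OTH", "ALL"}
ORACLE_TYPES = {"DBW", "LGW", "LSR", "SHA", "CKP", "PMO", "SMO", "SNP",
                "REC", "ALL"}
SIMPLE = {"-f": "file", "-u": "user", "-c": "command"}
# flag -> (kind, number of ':' separated fields, allowed process types)
LAYOUTS = {"-r": ("r3", 3, R3_TYPES), "-o": ("oracle", 2, ORACLE_TYPES)}


def parse_spec(arg):
    """Split one process specifier into (kind, fields)."""
    flag, sep, value = arg.partition("=")
    if not sep or not value:
        raise ValueError(f"expected <flag>=<value>, got {arg!r}")
    if flag in SIMPLE:
        return SIMPLE[flag], (value,)
    if flag not in LAYOUTS:
        raise ValueError(f"unknown specifier {flag!r}")
    kind, size, known = LAYOUTS[flag]
    fields = tuple(value.split(":"))
    if len(fields) != size or fields[-1] not in known:
        raise ValueError(f"malformed {flag} specifier {value!r}")
    return kind, fields


print(parse_spec("-r=C46:D10:DIS"))  # ('r3', ('C46', 'D10', 'DIS'))
```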

Some examples:

Bind all processes of user orac46 and the PIDs listed in pidlist.txt to a new set of 5 CPUs:

manage_psrsets -u=orac46 -f=pidlist.txt 5

Bind the dispatchers of instances D10, D11, and D12 of R/3 system C46 and all processes of instance DVEBMGS00 in the R/3 system BIN to a new processor set of 10 CPUs:

manage_psrsets -r=C46:D10:DIS -r=C46:D11:DIS \
  -r=C46:D12:DIS -r=BIN:DVEBMGS00:ALL 10

Note that the latter example could be simplified using a shortcut for the <instance> specifier supported by manage_psrsets:

manage_psrsets -r=C46:D.10.12:DIS -r=BIN:DVEBMGS00:ALL 10

D.<A>.<B> selects all instances from D<A> to D<B>, inclusive. This shorthand notation can be helpful in large SAP R/3 environments.
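The D.<A>.<B> shorthand can be expanded as sketched below. The function expand_instances is hypothetical, and the two-digit zero padding of instance numbers (for example, D05 rather than D5) is an assumption based on the usual SAP instance numbering.

```python
import re


def expand_instances(spec):
    """Expand 'D.<A>.<B>' into D<A> ... D<B> (inclusive); other names pass through."""
    match = re.fullmatch(r"D\.(\d+)\.(\d+)", spec)
    if match is None:
        return [spec]
    first, last = int(match.group(1)), int(match.group(2))
    if first > last:
        raise ValueError(f"empty instance range {spec!r}")
    return [f"D{number:02d}" for number in range(first, last + 1)]


print(expand_instances("D.10.12"))    # ['D10', 'D11', 'D12']
print(expand_instances("DVEBMGS00"))  # ['DVEBMGS00']
```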

manage_psrsets also supports other flags, such as the -e option to exclude specific CPUs (passed on directly to make_psrset) and the -d flag to delete all existing processor sets first. These make manage_psrsets a comprehensive solution for even very complex process binding requirements.

match_db_r3_procs

This is an example application built on top of some of the other scripts described in this article. match_db_r3_procs identifies the database processes connected to SAP R/3 work processes. Currently only Oracle is supported; full support for IBM DB2 is planned in a future release.

First, all Oracle process IDs and their matching counterparts (SAP R/3 processes) are obtained from the v$session and v$process views. Then, all instances known to the SAP R/3 message server are retrieved using the SAP lgtst tool. For each of these instances, identify_r3_procs returns the process types and the PIDs. This information is matched with the output obtained from the Oracle v$ views to produce the desired output.
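The matching step can be sketched as a join on operating system PIDs. The query text in the sketch assumes the standard columns v$process.spid (OS PID of the Oracle shadow process), v$session.process (OS PID of the client), and the join v$session.paddr = v$process.addr; the function match_db_to_r3 is hypothetical. Since PIDs are only unique per host, a real implementation would also have to consider the host of each R/3 instance (for example via v$session.machine), which this sketch ignores.

```python
# Assumed query; its rows are (shadow process OS PID, client OS PID) strings.
QUERY = """
SELECT p.spid, s.process
  FROM v$session s, v$process p
 WHERE s.paddr = p.addr
"""


def match_db_to_r3(session_rows, r3_procs):
    """Pair Oracle shadow PIDs with the R/3 work processes they serve.

    session_rows: (spid, client_pid) pairs as returned by QUERY
    r3_procs:     R/3 PID -> (instance, type), e.g. from identify_r3_procs
    Returns sorted (instance, type, r3_pid, shadow_pid) tuples.
    """
    matches = []
    for spid, client in session_rows:
        # skip background sessions (no client PID) and non-numeric values
        if not (spid and client and str(spid).isdigit() and str(client).isdigit()):
            continue
        r3_pid = int(client)
        if r3_pid in r3_procs:
            instance, kind = r3_procs[r3_pid]
            matches.append((instance, kind, r3_pid, int(spid)))
    return sorted(matches)


def shadow_pids(matches, instance=None, kind=None):
    """Shadow PIDs for one instance and/or one R/3 process type."""
    return [shadow for inst, k, _, shadow in matches
            if (instance is None or inst == instance) and (kind is None or k == kind)]
```

The output of shadow_pids could then be written to a PID file and handed to manage_psrsets via -f.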

This output can be used in manage_psrsets, for example, to create a processor set for all the database processes handling the data traffic for a specific SAP R/3 instance, or to create a processor set for all the database processes handling data traffic of SAP R/3 dialog processes.

Summary

The scripts described in this article can significantly simplify the task of creating processor sets with complex process binding requirements. They form an additional abstraction layer on top of the Solaris tools psrinfo and psrset, and free the user from the tedious tasks of explicitly dealing with CPU and process IDs. The scripts and applications built on top of them are regularly used in many projects including high-end benchmarks.


Appendix: Solaris Support for Process Binding and Processor Sets

Name            Type         Description

CPU information
psrinfo         OS command   Display CPU type and state
processor_info  System call  Get CPU type and state

CPU administration
psradm          OS command   Change CPU state
p_online        System call  Get or change CPU state

Process binding
pbind           OS command   Bind (or unbind) a process to a CPU
processor_bind  System call  Bind (or unbind) a process or LWP to a CPU

Processor set administration and process binding
psrset          OS command   Administer processor sets
pset_create     System call  Create a processor set
pset_assign     System call  Assign a CPU to (or remove it from) a processor set
pset_destroy    System call  Destroy a processor set
pset_info       System call  Get information about a processor set
pset_bind       System call  Bind (or unbind) a process or LWP to a processor set

Miscellaneous
mpstat          OS command   Display CPU statistics, optionally on a processor set basis
ptree           OS command   Display the process hierarchy in which a process runs

About the Author

Dr. Matthias Laux is a system engineer working in the Global SAP-Sun Competence Center in Walldorf, Germany. His main interests are Java/J2EE/XML infrastructure and programming, databases and SAP benchmarking. Although he also has a background in aerospace engineering and HPC / parallel programming, today his languages of choice are Java and Perl.
