Posts Tagged ‘API
System Topology Enumeration Using CPUID Extended Topology Leaf
The algorithm of system topology enumeration can be summarized as three phase of operation:
- Derive “mask width” constants that will be used to extract each Sub IDs.
- Gather the unique APIC IDs of each logical processor in the system, and extract/decompose each APIC ID into three sets of Sub IDs.
- Analyze the relationship of hierarchical Sub IDs to establish mapping tables between OS’s thread management services according to three hierarchical levels of processor topology.
Table 1 Modular Structure of Deriving System Topology Enumeration Information
Table 1 shows an example of the basic structure of the three phases of system wide topology as applied to processor topology and cache topology. Figure 1 outlines the procedures of querying CPUID leaf 11 for the x2APIC ID and extracting sub IDs corresponding to the “SMT”, “Core”, “physical package” levels of the hierarchy.
Figure 1 Procedures to Extract Sub IDs from the x2APIC ID of Each Logical Processor
System topology enumeration at the application level using CPUID involves executing CPUID instruction on each logical processor in the system. This implies context switching using services provided by an OS. On-demand context switching by user code generally relies on a thread affinity management API provided by the OS. The capability and limitation of thread affinity API by different OS vary. For example, in some OS, the thread affinity API has a limit of 32 or 64 logical processors. It is expected that enhancement to thread affinity API to manage larger number of logical processor will be available in future versions.
Storage Systems and the Grid Storage API
In a Grid environment, data may be stored in different locations and on different devices with different characteristics. The mechanism neutrality implies that applications should not need to be aware of the specic low-level mechanisms required to access data at a particular location. Instead, applications should be presented with a uniform view of data and with uniform mechanisms for accessing that data. These requirements are met by the storage system abstraction and our grid storage API. Together, these dene our data access service.
1. Data Abstraction: Storage Systems
We introduce as a basic data grid component what we call a storage system, which we dene as an entity that can be manipulated with a set of functions for creating, destroying, reading, writing, and manipulating the attributes of named sequences of bytes called file instances. Notice that our definition of a storage system is a logical one: a storage system can be implemented by any storage technology that can support the required access functions. Implementations that target Unix file systems, HTTP servers, hierarchical storage systems such as HPSS, and network caches such as the Distributed Parallel Storage System (DPSS) are certainly envisioned. In fact, a storage system need not map directly to a single low-level storage device. For example, a distributed file system that manages files distributed over multiple storage devices or even sites can serve as a storage system, as can an SRB system that serves requests by mapping to multiple storage systems of different types.
Our definition of a file instance is also logical rather than physical. A storage system holds data, which may actually be stored in a file system, database, or other system; we do not care about how data is stored but specify simply that the basic unit that we deal with is a named sequences of uninterpreted bytes. The use of the term “file instance” for this basic unit is not intended to imply that the data must live in a conventional file system. For example, a data grid implementation might use a system such as SRB to access data stored within a database management system. A storage system will associate with each of the le instances that it contains a set of properties, including a name and attributes such as its size and access restrictions. The name assigned to a leinstance by a particular storage system is arbitrary and has meaning only to that storage system. In many storage systems, a name will be a hierarchical directory path. In other systems such as SRB, it may be a set of application metadata that the storage system maps internally to a physical leinstance.
2. Grid Storage API
The behavior of a storage system as seen by a data grid user is defined by the data grid storage API, which defines a variety of operations on storage systems and file instances. Our understanding of the functionality required in this API is still evolving, but it certainly should include support for remote requests to read and/or write named file instances and to determine file instance attributes such as size. In addition, to support optimized implementation of replica management services (discussed below) we require a third party transfer operation used to transfer the entire contents of a le instance from one storage system to another.
While the basic storage system functions just listed are relatively simple, various data grid considerations can increase the complexity of an implementation. For example, storage system access functions must be integrated with the security environment of each site to which remote access is required. Robust performance within higher-level functions requires reservation capabilities within storage systems and network interfaces. Applications should be able to provide storage systems with hints concerning access patterns, network performance, and so forth that the storage system can use to optimize its behavior. Similarly, storage systems should be capable of characterizing and monitoring their own performance; this information, when made available to storage system clients, allows them to optimize their behavior. Finally, data movement functions must be able to detect and report errors. While it may be possible to recover from some errors with in the storage system, other errors may need to reported back to the remote application that initiated the movement.



