Cornelis Technical Documentation

5.2.14. Integrating Other Service Applications with vFabrics

5.2.14.1. Unicast Applications

Virtual Fabrics are designed to integrate automatically with the OpenFabrics Alliance stack. Applications that use the standard connection establishment and address resolution (also called Path Record resolution) mechanisms, such as RDMA CM, IB CM, and ibacm, automatically make the necessary SA queries. These SA queries allow the Fabric Manager to provide Path Records with the appropriate settings for SL, PKey, MTU, and other parameters used for fabric communications.

The standard flow for Path Records and virtual fabric address resolution is shown in the following figure:

Figure 79. Address Resolution and vFabrics

As shown in the flow diagram, the SA uses the Service Identifier (Service ID) supplied in the PathRecord query to identify the application making the request. The source and destination nodes specified in the query imply the possible set of DeviceGroups that are relevant, and the SA then finds the virtual fabric(s) that represent the intersection of the given application with the possible device groups. For this process to work correctly, the Fabric Manager configuration of Virtual Fabrics must specify the appropriate set of Service IDs for the application. Some applications may provide direct configuration of the Service ID as a 64-bit number. This is especially true of those using the IB CM or ibacm directly. Applications using the RDMA CM specify a protocol and port number that are used to compose a 64-bit Service ID.
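The matching step described above can be pictured with a small model. The following is a minimal sketch only, not the Fabric Manager's implementation: the `VirtualFabric` class, its fields, and the node names and PKey/SL values are all hypothetical, and a real configuration also supports features this sketch ignores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VirtualFabric:
    """Hypothetical, simplified view of one configured virtual fabric."""
    name: str
    service_ids: frozenset  # Service IDs of the applications in this vFabric
    members: frozenset      # nodes contributed by the vFabric's DeviceGroups
    pkey: int
    base_sl: int


def matching_vfabrics(vfabrics, service_id, src_node, dst_node):
    """Return the vFabrics whose applications include service_id and whose
    device groups include both the source and the destination node."""
    return [
        vf for vf in vfabrics
        if service_id in vf.service_ids
        and src_node in vf.members
        and dst_node in vf.members
    ]


# Illustrative configuration; names and values are made up for this sketch.
storage = VirtualFabric("Storage", frozenset({0x00000000010603DB}),
                        frozenset({"node1", "node2", "oss1"}), 0x8002, 1)
compute = VirtualFabric("Compute", frozenset({0x00000000010604A7}),
                        frozenset({"node1", "node2"}), 0x8003, 0)

matches = matching_vfabrics([storage, compute],
                            0x00000000010603DB, "node1", "oss1")
print([vf.name for vf in matches])  # ['Storage']
```

If the Service ID in the query is not listed for any virtual fabric that contains both nodes, the model returns an empty list, which is why the configured Service IDs need to match what the application actually uses.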

RDMA Service IDs take the format 0x0000000001NNPPPP, where N is the IANA protocol number and P is the port number the application binds to, both in hexadecimal. For example, Lustre is known to use the TCP protocol (0x06) on port 987 (0x03DB), so its Service ID is expected to be 0x00000000010603DB. Similarly, IBM Storage Scale uses the TCP protocol (0x06) on port 1191 (0x04A7), so its Service ID is expected to be 0x00000000010604A7. The opafm.xml file contains predefined rules for Lustre, iSER, and general RDMA applications, as well as a few examples of other applications.
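The composition of an RDMA Service ID from a protocol number and port can be checked with a few lines of Python. This is a minimal sketch of the 0x0000000001NNPPPP layout described above; the function names are illustrative and are not part of any Omni-Path tool.

```python
RDMA_CM_PREFIX = 0x0000000001000000  # fixed upper bits of 0x0000000001NNPPPP


def rdma_service_id(protocol: int, port: int) -> int:
    """Compose a 64-bit Service ID from an IANA protocol number and a port."""
    if not 0 <= protocol <= 0xFF:
        raise ValueError("protocol must fit in 8 bits")
    if not 0 <= port <= 0xFFFF:
        raise ValueError("port must fit in 16 bits")
    return RDMA_CM_PREFIX | (protocol << 16) | port


def format_service_id(service_id: int) -> str:
    """Render a Service ID as a 16-digit hexadecimal string."""
    return f"0x{service_id:016X}"


TCP = 0x06  # IANA protocol number for TCP

print(format_service_id(rdma_service_id(TCP, 987)))   # 0x00000000010603DB
print(format_service_id(rdma_service_id(TCP, 1191)))  # 0x00000000010604A7
```

The two printed values reproduce the Lustre and IBM Storage Scale examples given in the text.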

In some cases, the application may use nonstandard, out-of-band, or ad-hoc mechanisms to establish connections. In that case, the key connection parameters of PKey, BaseSL, and MTU will need to be specified a priori. This approach is shown in the following figure.

Figure 80. vFabric Cheats for QoS and Security

This process may be manual. In this case, the system administrator must make explicit choices of BaseSL, PKey, and MTU in the Fabric Manager configuration and must then provide exactly the same values to the application. If the values do not match, the application could end up running at the wrong QoS level, or could even fail to start because it cannot communicate with the desired nodes in the fabric.

As discussed in Integrating Job Schedulers with Virtual Fabrics, Omni-Path provides assorted tools that can identify the PKey, BaseSL, and MTU associated with a given virtual fabric. In some cases, those tools can be used to automate the discovery of the BaseSL, PKey, and MTU and provide them directly to the application. This reduces the risk of human error and lets future starts of the application pick up any changes to the configuration of the virtual fabric.

5.2.14.2. Multicast Applications

Applications that take advantage of standard multicast mechanisms (also known as Multicast Member Records) either directly, through IPoFabric (also known as IPoIB), or ibacm, automatically receive the proper SL, PKey, and MTU as part of the multicast group parameters.

The standard flow for Multicast Member Records and virtual fabric address resolution when a new multicast group is created is shown in the following figure.

Figure 81. Multicast Create and vFabrics

As shown in the flow diagram in the previous figure, the SA uses the Multicast Group ID (MGID) supplied in the Multicast Member Record to identify the application making the request. The source nodes specified in the query imply the possible set of DeviceGroups that are relevant, and the SA then finds the virtual fabric(s) that represent the intersection of the given application with the possible device groups. For this process to work correctly, the FM configuration of Virtual Fabrics must specify the appropriate set of MGIDs for the application. Some applications may provide direct configuration of the MGID as a 128-bit number. This is especially true of those using the IB SA or ibacm directly. Applications using IPoFabric specify an IP multicast group, which is used to compose a 128-bit MGID.

For IPv4, MGIDs are composed by IPoFabric as follows:

MGID = 0xffFS401bPPPP0000:00000000GGGGGGGG

where F=flags, S=scope, P=PKey, and G=IP Multicast Group

For IPv6, MGIDs are composed by IPoFabric as follows:

MGID = 0xffFS601bPPPPGGGG:GGGGGGGGGGGGGGGG

where F=flags, S=scope, P=PKey and G=IP Multicast Group
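The two layouts above can be reproduced with a short Python sketch, which may help when working out which MGIDs to list for an application. It rests on several assumptions that are not stated in this section: following the IPoIB mapping in RFC 4391, only the low 28 bits of an IPv4 group address and the low 80 bits of an IPv6 group address are carried in the MGID, and the default flags (0x1) and scope (0x2) are merely common values chosen for illustration. The function name is hypothetical.

```python
import ipaddress


def ipofabric_mgid(group: str, pkey: int, flags: int = 0x1, scope: int = 0x2) -> str:
    """Compose an IPoFabric MGID string from an IP multicast group address.

    Assumes the RFC 4391 mapping; flags and scope defaults are illustrative.
    """
    addr = ipaddress.ip_address(group)
    if not addr.is_multicast:
        raise ValueError(f"{group} is not a multicast address")
    if not 0 <= pkey <= 0xFFFF:
        raise ValueError("pkey must fit in 16 bits")
    if not (0 <= flags <= 0xF and 0 <= scope <= 0xF):
        raise ValueError("flags and scope must each fit in 4 bits")

    if addr.version == 4:
        signature = 0x401B                      # IPv4 signature
        group_bits = int(addr) & 0x0FFFFFFF     # low 28 bits of the group
    else:
        signature = 0x601B                      # IPv6 signature
        group_bits = int(addr) & ((1 << 80) - 1)  # low 80 bits of the group

    mgid = ((0xFF << 120) | (flags << 116) | (scope << 112)
            | (signature << 96) | (pkey << 80) | group_bits)
    digits = f"{mgid:032x}"
    return f"0x{digits[:16]}:{digits[16:]}"


print(ipofabric_mgid("224.0.0.1", 0xFFFF))  # 0xff12401bffff0000:0000000000000001
print(ipofabric_mgid("ff02::1", 0xFFFF))    # 0xff12601bffff0000:0000000000000001
```

Because the PKey is embedded in the MGID, the same IP multicast group maps to a different MGID in each partition.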

When an application joins an existing multicast group through the Multicast Member Record mechanisms, the steps in the following figure occur:

Figure 82. Multicast Join and vFabrics

In this case, the preexisting multicast group will already have an assigned SL, PKey, and MTU. The new join request will merely need to confirm its ability to communicate with that PKey.

There is limited support for nonstandard participation in multicast groups. While groups may be precreated in the opafm.xml configuration file (by specifying an appropriate MulticastGroup section with all the relevant parameters for the multicast group), the act of joining the group requires the use of Multicast Member Records. Such joins may be performed using the proxy join mechanism, whereby one node issues Multicast Member Record join or leave requests on behalf of another node. In that case, out-of-band mechanisms may be used by the application to communicate or discover the address (for example, Multicast LID) and parameters assigned to the multicast group by the FM.

The current set of multicast groups, along with their SL, PKey, MTU, and members, can be listed using the opashowmc command. See opashowmc in the CN5000 Commands Guide for more information.