Cornelis Technical Documentation

5.2.10.6. IPoIB and vFabrics

vFabrics are configured within the hardware in the order in which they appear in the configuration file. When IPoIB runs, it uses the first PKey on the given port for the default (hfi1_0...) network device. Therefore, it is best to place the Networking/IPoIB vFabric first.

IPoIB starts with the PKey for the IPoIB interface and uses that to define the MGID of the VLAN’s broadcast multicast group. Many aspects of the IPoIB VLAN are defined by the multicast group itself. Among them is the MTU for the VLAN.

A given port or node can participate in more than one IPoIB subnet. Each such subnet must have its own unique PKey. For vFabrics other than the first, the PKey should be specified manually in the VirtualFabric section and must also be supplied to IPoIB. On some Linux systems with the CN5000 Omni-Path Fabric stack, additional IPoIB virtual interfaces can be created on an IB interface such as ib0 with a command such as:

echo 0x1234 > /sys/class/net/ib0/create_child

The PKey given is ORed with 0x8000 to define the PKey for the multicast group, so 0x1234 becomes 0x9234. This creates an ib0.9234 interface that can be assigned the appropriate IP address and IP parameters.
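The arithmetic behind the interface name can be checked with a short sketch. The helper name child_interface is hypothetical and only illustrates the OR with 0x8000 described above; it does not touch sysfs or create any interface.

```python
from typing import Tuple


def child_interface(parent: str, pkey: int) -> Tuple[int, str]:
    """Return the PKey used for the multicast group and the child interface name.

    Hypothetical helper: it only reproduces the naming arithmetic.
    """
    full_pkey = (pkey | 0x8000) & 0xFFFF  # OR with 0x8000, keep 16 bits
    return full_pkey, f"{parent}.{full_pkey:04x}"


full_pkey, name = child_interface("ib0", 0x1234)
print(hex(full_pkey), name)  # 0x9234 ib0.9234
```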

The operation of Linux with multiple IPoIB subnets is very similar to the use of IP over Ethernet when VLANs are being used. It is up to the administrator which network interfaces are actually used and assigned IP addresses. This is done using the standard ifcfg files.

5.2.10.6.1. Pre-Created Multicast Groups

Multicast Groups that are pre-created by the SM are defined in the MulticastGroup subsection of the Multicast subsection within the Sm section. Each MulticastGroup section defines one or more Multicast Groups that will be pre-created by the SM.

  • If neither a VirtualFabric nor a PKey is specified for a given pre-created MulticastGroup, the group is created for the single vFabric that lists the given MGID in an application and is compatible with the remaining group properties (Rate, MTU, and SL).

  • If MGIDs are specified for the MulticastGroup section, the group must match exactly one VirtualFabric. If no vFabric matches the MGIDs, these implicit multicast groups will not be created and a warning will appear in the logs.

  • When no MGIDs are explicitly specified, the necessary IPoIB multicast groups for IPv4 and IPv6 are pre-created against the selected VirtualFabric/PKeys (all applicable vFabrics if no specific VirtualFabric/PKey selected). When such automatic precreation occurs, the PKey assigned to the vFabric is inserted into the MGIDs per the IPoIB standard.

Omni-Path requires a pre-created MulticastGroup for IPoIB. The configuration can specify other groups that are also needed. Every pre-created MulticastGroup can have one or more MGIDs. The MGID must be unique among all MulticastGroups within an FM instance and must be able to match a single vFabric.

When defined in the Common section, the MGID must be unique across all FM instances. MGIDs are specified as two 64-bit values separated by a colon (:). A single MGID can be specified as <MGID>0xabc:0x123567</MGID> in the MulticastGroup section. If no MGIDs are specified, four groups for IPv4 and/or four groups for IPv6 will be created depending on which application is defined in the matching vFabric (Networking, IPv4 or IPv6). The groups for IPv4 are broadcast, all nodes, all routers, and mDNS, while the groups for IPv6 are all nodes, all routers, mDNS, and MLDv2-capable routers.
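As an illustration of the two-part MGID format and the uniqueness requirement, the following minimal sketch parses an MGID string into a single 128-bit value and reports duplicates. The function names parse_mgid and duplicate_mgids are hypothetical, and the parsing rules (hexadecimal halves, optional leading zeros) are assumptions rather than the Fabric Manager's actual parser.

```python
from typing import Dict, Iterable, List


def parse_mgid(text: str) -> int:
    """Parse 'HIGH:LOW' (two 64-bit hex values) into one 128-bit integer."""
    high_text, low_text = text.strip().split(":")
    high, low = int(high_text, 16), int(low_text, 16)
    if not (0 <= high < 2**64 and 0 <= low < 2**64):
        raise ValueError(f"each half of an MGID must fit in 64 bits: {text!r}")
    return (high << 64) | low


def duplicate_mgids(mgids: Iterable[str]) -> List[str]:
    """Return MGID strings whose numeric value repeats an earlier entry."""
    seen: Dict[int, str] = {}
    duplicates = []
    for text in mgids:
        value = parse_mgid(text)
        if value in seen:
            duplicates.append(text)
        else:
            seen[value] = text
    return duplicates


# The second entry is the same MGID written with leading zeros.
print(duplicate_mgids(["0xabc:0x123567", "0x0abc:0x00123567"]))
```

Comparing parsed values rather than strings catches duplicates that differ only in zero padding or letter case.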

The following is an example of the IPoIB IPv4 and IPv6 multicast for all vFabrics that have IPoIB as an application.

<MulticastGroup>
    <Create>1</Create>
    <MTU>2048</MTU>
    <Rate>25g</Rate>
    <!-- <SL>0</SL> -->
    <QKey>0x0</QKey>
    <TClass>0x0</TClass>
</MulticastGroup>

The following is an example of the IPoIB IPv4 and IPv6 multicast for 0x8002/0x0002 PKey. This can be useful if there are multiple IPoIB vFabrics and different multicast parameters (Rate, MTU, etc.) are desired for each IPoIB vFabric. Because IPoIB MGID includes PKey, we specify PKey, not VirtualFabric. MGIDs specified must use the Full PKey (0x8000-bit set).

<MulticastGroup> 
  <Create>0</Create> 
  <PKey>0x0002</PKey> 
  <!--  PKey 0x8002/0x0002 is part of IPv4 MGID below  -->  
  <!--  MGID = 0xffFS401bPPPP0000:00000000GGGGGGGG  --> 
  <!--  where F=flags, S=scope, P=PKey and G=IP Multicast Group --> 
  <MGID>0xff12401b80020000:0x00000000ffffffff</MGID> <!--  bcast --> 
  <MGID>0xff12401b80020000:0x0000000000000001</MGID> <!--  all nodes --> 
  <MGID>0xff12401b80020000:0x0000000000000002</MGID> <!--  all routers --> 
  <MGID>0xff12401b80020000:0x00000000000000fb</MGID> <!--  all mDNS --> 
  <!--  PKey 0x8002/0x0002 is part of IPv6 MGIDs below  --> 
  <!--  MGID = 0xffFS601bPPPPGGGG:GGGGGGGGGGGGGGGG  --> 
  <!--  where F=flags, S=scope, P=PKey and G=IP Multicast Group --> 
  <MGID>0xff12601b80020000:0x0000000000000001</MGID> <!--  all nodes --> 
  <MGID>0xff12601b80020000:0x0000000000000002</MGID> <!--  all routers --> 
  <MGID>0xff12601b80020000:0x0000000000000016</MGID> <!--  all MLDv2 routers -->
  <MGID>0xff12601b80020000:0x00000000000000fb</MGID> <!--  all mDNS --> 
  <MTU>2048</MTU> 
  <Rate>100g</Rate> 
  <QKey>0x0</QKey> 
  <TClass>0x0</TClass> 
</MulticastGroup> 
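The MGID patterns given in the comments of the example above can be expanded mechanically. The sketch below is an illustration only: the function name ipoib_mgid and the group dictionaries are hypothetical, and the flags/scope byte 0x12 is simply copied from the example rather than derived.

```python
# IP multicast group values taken from the example MGIDs above.
IPV4_GROUPS = {"bcast": 0xFFFFFFFF, "all nodes": 0x1, "all routers": 0x2, "mDNS": 0xFB}
IPV6_GROUPS = {"all nodes": 0x1, "all routers": 0x2, "MLDv2 routers": 0x16, "mDNS": 0xFB}


def ipoib_mgid(pkey: int, group: int, ipv6: bool = False, flags_scope: int = 0x12) -> str:
    """Build an MGID string following the 0xffFS401b... / 0xffFS601b... patterns."""
    full_pkey = (pkey | 0x8000) & 0xFFFF        # MGIDs use the full PKey
    signature = 0x601B if ipv6 else 0x401B      # IPv6 vs IPv4 marker from the patterns
    upper = (0xFF << 56) | (flags_scope << 48) | (signature << 32) | (full_pkey << 16)
    if ipv6:
        upper |= (group >> 64) & 0xFFFF         # top 16 bits of the 80-bit group field
        lower = group & 0xFFFFFFFFFFFFFFFF
    else:
        lower = group & 0xFFFFFFFF              # IPv4 group occupies the low 32 bits
    return f"0x{upper:016x}:0x{lower:016x}"


for label, group in IPV4_GROUPS.items():
    print(ipoib_mgid(0x0002, group), label)
for label, group in IPV6_GROUPS.items():
    print(ipoib_mgid(0x0002, group, ipv6=True), label)
```

For PKey 0x0002 this prints the same eight MGID values that appear in the example, which makes it a convenient cross-check when writing a section for a different PKey.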

If a MulticastGroup section specifies a PKey of 0x0002, it can only match a VirtualFabric with PKey 0x0002. By default, no VirtualFabric has such a PKey. Therefore, the MGIDs of this group can never be assigned to a vFabric, and the FM will fail to start because a configuration-defined multicast group could not find a matching vFabric.

The parameters in the following table define the policies and controls for the Multicast Groups.

Table 12. Pre-Created MulticastGroup Parameters

| Parameter | Description |
|---|---|
| Create | Enables (1) or disables (0) the creation of the given set of Multicast Groups. This provides a convenient way to disable a MulticastGroup section without needing to delete it from the configuration file. |
| VirtualFabric or PKey | Controls the virtual fabric for which the MulticastGroup is created. Alternatively, a PKey may be specified. If neither is specified, IPv4/6-type MGIDs will be created for all Virtual Fabrics that contain those MGIDs in the Application section. |
| Rate | The Static Rate for the multicast group. Only nodes and paths that have a rate greater than or equal to this value will be able to join the group. This also sets the upper bound for the performance of the multicast group. Rate is specified in natural format with values of 25g, 50g, 75g, 100g, 150g, or 200g. |
| MTU | The MTU for the multicast group. Only nodes and paths that have an MTU greater than or equal to this value will be able to join the group. This also sets the upper bound for the message sizes that may be sent to the multicast group. MTU is specified in natural format with values of 2048, 4096, 8192, or 10240. |
| SL | The Service Level for the Multicast Group. If specified, it must match the MulticastSL of the corresponding Virtual Fabric. If unspecified, this will default to the MulticastSL of said Virtual Fabric. |
| QKey | The QKey to be used for the group. |
| TClass | The Traffic Class to be used for the group. |



MulticastGroup Matching Rule

All MGIDs of all pre-created MulticastGroups must match a VirtualFabric.

For a multicast group to locate a matching vFabric, the following rules must apply:

  • The vFabric name must match. If the MulticastGroup specifies no name, the name is not used for matching.

  • The PKey must match. If the MulticastGroup specifies no PKey, the PKey is not used for matching.

  • MC group Rate should be smaller than or equal to the vFabric Rate. If MC Rate is missing, Fabric Manager uses the default value of 25g.

  • MC group MTU should be smaller than or equal to vFabric MTU. If MC MTU is missing, Fabric Manager uses the default value of 2048.

  • The application property of the candidate vFabric should have the same MGID associated with the MC group MGID.

  • ALL MGIDs within a Multicast Group must match the same vFabric.

  • If a single MGID matches more than one vFabric, or no vFabric at all, this is treated as a configuration error and the FM will not start.

  • If implicit multicast groups do not find any matching vFabric, a warning will be issued. The groups will not be created.
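The rules above can be sketched as a small matching routine. This is a hedged illustration, not the Fabric Manager's implementation: the VFabric and McGroup classes, their field names, and the choice to compare PKeys on the low 15 bits are all assumptions made for the example, and only groups with explicit MGIDs are handled.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class VFabric:                      # hypothetical model of a VirtualFabric
    name: str
    pkey: int
    rate_g: int                     # rate in Gb/s, e.g. 100 for "100g"
    mtu: int
    app_mgids: Set[str] = field(default_factory=set)


@dataclass
class McGroup:                      # hypothetical model of a MulticastGroup section
    mgids: List[str]
    vfabric: Optional[str] = None
    pkey: Optional[int] = None
    rate_g: int = 25                # default when Rate is missing
    mtu: int = 2048                 # default when MTU is missing


def candidates(group: McGroup, mgid: str, vfabrics: List[VFabric]) -> List[VFabric]:
    """Return every vFabric that satisfies the per-MGID rules."""
    found = []
    for vf in vfabrics:
        if group.vfabric is not None and vf.name != group.vfabric:
            continue
        # Assumption: 0x8002 and 0x0002 name the same partition.
        if group.pkey is not None and (vf.pkey & 0x7FFF) != (group.pkey & 0x7FFF):
            continue
        if group.rate_g > vf.rate_g or group.mtu > vf.mtu:
            continue
        if mgid not in vf.app_mgids:
            continue
        found.append(vf)
    return found


def match_group(group: McGroup, vfabrics: List[VFabric]) -> VFabric:
    """Return the single vFabric matched by all MGIDs, or raise ValueError."""
    if not group.mgids:
        raise ValueError("this sketch only handles explicitly listed MGIDs")
    matched: Optional[VFabric] = None
    for mgid in group.mgids:
        found = candidates(group, mgid, vfabrics)
        if len(found) != 1:
            raise ValueError(f"MGID {mgid} matches {len(found)} vFabrics; exactly one is required")
        if matched is not None and found[0] is not matched:
            raise ValueError("all MGIDs in a MulticastGroup must match the same vFabric")
        matched = found[0]
    return matched


bcast = "0xff12401b80020000:0x00000000ffffffff"
networking = VFabric("Networking", 0x8002, 100, 4096, {bcast})
compute = VFabric("Compute", 0x8003, 100, 8192)
group = McGroup(mgids=[bcast], pkey=0x0002, rate_g=100, mtu=2048)
print(match_group(group, [networking, compute]).name)  # Networking
```

Raising an error for zero or multiple matches mirrors the rule that such a configuration prevents the FM from starting.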

5.2.10.6.2. MLID Sharing

Current Cornelis switches have a limit of 8k Multicast LIDs (MLIDs) in their Multicast Forwarding Tables. If a configuration file specified 1000 VirtualFabrics, it would be possible to exhaust the entire MLID space very quickly. It would also be possible for a single tenant to consume the bulk of the MLIDs, leaving few for others to use. To alleviate this, it is possible to configure the Subnet Manager to share MLIDs between multiple MGIDs. This is done in the MLIDShare section of the configuration file.

To limit the number of MLIDs consumed by a single tenant, the Subnet Manager can be configured to allocate a maximum number of MLIDs for a given partition. The following shows a sample MLIDShare configuration:

<MLIDShare>
    <Enable>0</Enable>
    <MGIDMask>0x0000000000000000:0x0000000000000000</MGIDMask>
    <MGIDValue>0x0000000000000000:0x0000000000000000</MGIDValue>
    <MaxMLIDs>8000</MaxMLIDs>
    <MaxMLIDsPerPKey>8</MaxMLIDsPerPKey>
</MLIDShare>

Since MGIDMask and MGIDValue are both zero, all MGIDs that are not part of another MLIDShare group fall into this group. Since MaxMLIDsPerPKey is specified, no partition can consume more than 8 MLIDs. This means MGIDs of the same partition will end up sharing from the same pool of MLIDs. To ensure proper security between partitions, MGIDs with different PKeys never share MLIDs.
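The effect of these settings can be illustrated with a toy model. Everything in the sketch below is an assumption made for illustration: the names in_share_group and MlidAllocator are hypothetical, MLIDs are represented as plain counters starting at 0, MaxMLIDs is treated as a cap on the whole group, and the round-robin reuse policy is a guess, since the Subnet Manager's actual allocation strategy is not described here.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def in_share_group(mgid: int, mask: int, value: int) -> bool:
    """An MGID belongs to a share group when its masked bits equal the value."""
    return (mgid & mask) == value


class MlidAllocator:
    """Toy allocator: per-PKey pools capped at max_per_pkey, shared once full."""

    def __init__(self, max_mlids: int, max_per_pkey: int) -> None:
        self.max_mlids = max_mlids
        self.max_per_pkey = max_per_pkey
        self.next_mlid = 0                                  # abstract MLID index
        self.pools: Dict[int, List[int]] = defaultdict(list)
        self.assigned: Dict[Tuple[int, int], int] = {}

    def assign(self, pkey: int, mgid: int) -> int:
        key = (pkey, mgid)
        if key in self.assigned:
            return self.assigned[key]
        pool = self.pools[pkey]
        if len(pool) < self.max_per_pkey and self.next_mlid < self.max_mlids:
            mlid = self.next_mlid                           # take a fresh MLID
            self.next_mlid += 1
            pool.append(mlid)
        elif pool:
            # Pool is full: reuse an MLID from the same PKey only (assumed round-robin).
            used = sum(1 for (p, _) in self.assigned if p == pkey)
            mlid = pool[used % len(pool)]
        else:
            raise RuntimeError("no MLIDs available for this PKey")
        self.assigned[key] = mlid
        return mlid


# With a zero mask and zero value, every MGID falls into the group.
print(in_share_group(0xFF12401B800200000000000000000001, 0, 0))  # True

alloc = MlidAllocator(max_mlids=8000, max_per_pkey=8)
tenant_a = {alloc.assign(0x8002, mgid) for mgid in range(20)}
tenant_b = {alloc.assign(0x8003, mgid) for mgid in range(20)}
print(len(tenant_a), len(tenant_b), tenant_a.isdisjoint(tenant_b))  # 8 8 True
```

In this model, twenty MGIDs in one partition collapse onto eight MLIDs, and the two partitions never share an MLID.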