3.4.2. Install the OPX Software
This section provides the instructions for installing the CN5000 OPX Software.
The zipped RPM package and the package manager repository include meta packages that point to the RPMs and RPM source files for the HFI driver. Which of these you need for installation depends on your software product and on whether you are using GPU or non-GPU components.
3.4.2.1. Prerequisites
Where applicable, ensure the following has been completed before installing the software:
For ROCm or CUDA requirements, refer to AMD GPU Requirements and NVIDIA GPU Requirements, respectively.
Customers requiring dual-fabric with CN5000 SuperNIC and InfiniBand HCAs installed must use the inbox RDMA/ofed stack to support the InfiniBand HCA. The MLNX_OFED/MOFED stack is incompatible with the hfi1 driver.
3.4.2.2. Procedure
The following instructions provide the steps for installing the host software on the first management node and then the remaining nodes.
Note
If you plan to use a local repository and the TGZ package from the Cornelis Customer Center, download and extract the package to the management node.
Download and install the OPX Software package:
Create a repository folder or confirm that the following already exists.
For RHEL:
/etc/yum.repos.d/
For SLES:
/etc/zypp/repos.d/
For Ubuntu:
/etc/apt/sources.list.d/
Create a repository file.
For RHEL:
sudo vi /etc/yum.repos.d/cornelis-repository.repo
For SLES:
sudo vi /etc/zypp/repos.d/cornelis-repository.repo
For Ubuntu:
sudo vi /etc/apt/sources.list.d/cornelis-repository.list
Edit the repository file to add the following content.
Using a local repository and RHEL or SLES TGZ package:
[Cornelis-Package]
name=Cornelis Repository
baseurl=file:///<file path to local downloaded tar file>
autorefresh=1 (SLES ONLY)
enable=1
gpgcheck=1
For Ubuntu, perform the following:
Import the GPG public key.
sudo gpg --dearmor -o /etc/apt/trusted.gpg.d/cornelis.gpg <file path to local downloaded tar file>/CN5000-Packages-Public-GPG-Key.asc
Add content to the repository (.list) file.
# cat /etc/apt/sources.list.d/cornelis-repository.list
deb [trusted=yes] file:<file path to local downloaded tar file>/ ./
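The repository file content above can also be generated rather than typed by hand. The following is a minimal sketch, not part of the product: the function name repo_file_body and the example path /opt/cornelis/CN5000 are hypothetical, and the keys simply mirror the listing above. It only prints the file body, so you decide where to write it.

```shell
#!/bin/sh
# Print the repository file body for one distro family.
# Usage: repo_file_body <rhel|sles|ubuntu> <absolute path to extracted package>
# The function name and arguments are illustrative assumptions.
repo_file_body() {
    case "$1" in
        rhel|sles)
            echo "[Cornelis-Package]"
            echo "name=Cornelis Repository"
            # An absolute path starts with "/", which yields file:///...
            echo "baseurl=file://$2"
            # autorefresh is listed as SLES only in the steps above.
            if [ "$1" = "sles" ]; then
                echo "autorefresh=1"
            fi
            echo "enable=1"
            echo "gpgcheck=1"
            ;;
        ubuntu)
            echo "deb [trusted=yes] file:$2/ ./"
            ;;
        *)
            echo "unsupported distro: $1" >&2
            return 1
            ;;
    esac
}

# Example (not executed here), writing the RHEL file with elevated rights:
#   repo_file_body rhel /opt/cornelis/CN5000 | sudo tee /etc/yum.repos.d/cornelis-repository.repo
```

Printing to standard output keeps the sketch free of side effects, so the result can be reviewed before it is written under /etc.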
Clean and update the package index.
For RHEL:
sudo dnf clean all
sudo dnf makecache
For SLES:
sudo zypper clean
sudo zypper refresh
For Ubuntu:
sudo apt clean
sudo apt update
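When the same procedure is scripted across mixed nodes, the distribution can be read from /etc/os-release to pick the matching commands. This is a rough sketch under stated assumptions: the function names os_release_id and refresh_cmds are hypothetical, and only the IDs rhel, sles, and ubuntu are handled (derivative distributions would need their own entries). It prints the commands instead of running them.

```shell
#!/bin/sh
# Read the ID field from an os-release file (defaults to /etc/os-release).
os_release_id() {
    # Source the file in a subshell so its variables do not leak.
    ( . "${1:-/etc/os-release}" && printf '%s\n' "$ID" )
}

# Print the clean-and-refresh commands listed above for a given ID.
refresh_cmds() {
    case "$1" in
        rhel)   printf '%s\n' "sudo dnf clean all" "sudo dnf makecache" ;;
        sles)   printf '%s\n' "sudo zypper clean" "sudo zypper refresh" ;;
        ubuntu) printf '%s\n' "sudo apt clean" "sudo apt update" ;;
        *)      echo "unsupported distro ID: $1" >&2; return 1 ;;
    esac
}

# Example (not executed here): show the commands for the current host.
#   refresh_cmds "$(os_release_id)"
```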
Determine the meta package you need to use for installation.
Note
Ubuntu uses dashes in place of underscores in the following list.
cn5000_pkgs_non_gpu_meta - CPU-based installation
cn5000_pkgs_cuda_meta - NVIDIA GPU installation
cn5000_pkgs_rocm_meta - AMD GPU installation
Caution
Installing meta packages will override existing packages. If multiple versions are desired (for example, GPU/non-GPU), additional components can be installed directly using the package manager, but not using the meta packages.
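The naming rule in the note above (underscores on RHEL and SLES, dashes on Ubuntu) can be captured in a small helper. This is an illustrative sketch only; the function name meta_package_name and its argument order are assumptions, not part of the OPX Software.

```shell
#!/bin/sh
# Print the meta package name for a distro and hardware variant.
# Usage: meta_package_name <rhel|sles|ubuntu> <non_gpu|cuda|rocm>
meta_package_name() {
    case "$2" in
        non_gpu|cuda|rocm) ;;
        *) echo "unknown variant: $2" >&2; return 1 ;;
    esac
    name="cn5000_pkgs_${2}_meta"
    # Ubuntu uses dashes in place of underscores.
    if [ "$1" = "ubuntu" ]; then
        name=$(printf '%s' "$name" | tr '_' '-')
    fi
    printf '%s\n' "$name"
}

# Example (not executed here):
#   sudo apt install "$(meta_package_name ubuntu cuda)"
```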
Install the package using one of the following commands.
For RHEL:
sudo dnf install <META_PACKAGE_NAME>
For SLES:
sudo zypper install <META_PACKAGE_NAME>
Note
If zypper prompts for confirmation, you can select the default to continue.
Note
If a conflict with existing in-distro packages occurs, you will be provided with solutions to choose from to resolve the conflict. Select the solution that replaces the old packages with the new packages (usually Solution 1).
For Ubuntu:
sudo apt install <META_PACKAGE_NAME>
Note
If you receive an error related to required dependencies that are not the latest versions available on Ubuntu, Cornelis recommends that you install and use aptitude in place of apt in the command.
Alternatively, you can use apt or apt-get with --mark-auto to manually install the dependencies.
Perform the following:
For RHEL and SLES, continue to the next step to download, build, and install the kmod-opxs-kernel-updates packages.
For Ubuntu, reboot the server to finalize the installation, then go to Step 13 to install the software to each remaining server.
Note
The opxs-modules-dkms packages are already installed with the Ubuntu meta package.
Download, build, and install the kmod-opxs-kernel-updates packages (RHEL and SLES only):
Copy the relevant RPM source file (src.rpm) to a local directory.
Important
Ensure you are using the correct src.rpm file to build and install the version of opxs-kernel-updates relevant to your hardware configuration (AMD GPU, NVIDIA GPU, or CPU-only). Three opxs-kernel-updates RPM source files are provided, one each for AMD GPU, NVIDIA GPU, and CPU-only.
RHEL Only: To rebuild the source RPM, you must install kernel-rpm-macros (for AMD GPUs) and kernel-abi-stablelists.
SLES Only: You need to build unifdef before running rpmbuild:
cd /lib/modules/$(uname -r)/source/scripts
gcc -o unifdef unifdef.c
Build the source RPM.
rpmbuild --rebuild --define "_topdir <local_directory>" --define 'dist %{nil}' --target x86_64 --define 'kver $(uname -r)' <src.rpm_file>
Verify the RPMs are built by changing directories to the <local_directory>/RPMS/x86_64 directory and running ls -1.
Output should be:
kmod-opxs-kernel-updates-<build_version>.x86_64.rpm
opxs-kernel-updates-devel-<build_version>.x86_64.rpm
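The verification step can be scripted so that a missing package fails loudly before the install step. The sketch below is an assumption-laden helper, not a Cornelis tool: the function name verify_built_rpms is hypothetical, and the two file name prefixes are taken from the expected output listed above. If your build produces differently named packages, adjust the prefixes.

```shell
#!/bin/sh
# Return 0 if both expected RPMs exist under <local_directory>/RPMS/x86_64.
# Usage: verify_built_rpms <local_directory>
verify_built_rpms() {
    rpm_dir="$1/RPMS/x86_64"
    for prefix in kmod-opxs-kernel-updates opxs-kernel-updates-devel; do
        # ls fails when the glob matches no file.
        if ! ls "$rpm_dir"/"$prefix"-*.x86_64.rpm >/dev/null 2>&1; then
            echo "missing: $rpm_dir/$prefix-<build_version>.x86_64.rpm" >&2
            return 1
        fi
    done
    echo "both RPMs found in $rpm_dir"
}

# Example (not executed here), stopping a script when the build is incomplete:
#   verify_built_rpms /path/to/local_directory || exit 1
```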
Install the new RPMs.
For RHEL:
sudo dnf install kmod-opxs-kernel-updates-<build_version>.x86_64.rpm opxs-kernel-updates-devel-<build_version>.x86_64.rpm
For SLES:
zypper --no-gpg-checks install opxs-kernel-updates-devel-<build_version>.x86_64.rpm opxs-kernel-updates-kmp-default-<build_version>-50.x86_64.rpm
Reboot the primary node to apply changes.
Install the OPX Software to each remaining server:
Copy the repository folder, the tar file (if using a local repository), and the <local_directory> containing the newly built RPMs to each remaining server.
Important
The folders and files must be in the same directories as they were on the management node.
Extract the TGZ package.
Install the software on the remaining nodes using the commands in steps 6 and 11.
Reboot the rest of the nodes to apply changes.
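Because the files must land in the same directories on every node, one possible approach is rsync with the -R (--relative) option, which recreates the full source path on the target. The following sketch only prints the commands for review. The function name copy_plan, the node names, and the example paths are hypothetical, and it assumes rsync is installed and that the remote user is allowed to write to the target directories.

```shell
#!/bin/sh
# Print (do not run) one rsync command per remaining node.
# Usage: copy_plan "<space-separated absolute paths>" <node> [<node> ...]
copy_plan() {
    paths=$1
    shift
    for node in "$@"; do
        # -a preserves file attributes; -R recreates the full source path
        # under "/" on the target, matching the management node layout.
        echo "rsync -aR $paths $node:/"
    done
}

# Example (not executed here):
#   copy_plan "/etc/yum.repos.d/cornelis-repository.repo /path/to/local_directory" node01 node02
```

Printing the plan first makes it easy to check the paths before any data is copied.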
After installation, disable the repository on every node using one of the following methods:
For RHEL, run the following command:
sudo dnf config-manager --set-disabled <repo>
For RHEL and SLES, in the repository file, change enable to 0.
For Ubuntu, in the repository (.list) file, comment out the following line:
#deb [trusted=yes] file:/<file path to local downloaded tar file>/ ./
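The manual file edits above can be sketched as a small helper for use on many nodes. This is an illustration under assumptions: the function name disable_repo_file is hypothetical, the key name mirrors the repository file shown earlier in this procedure, and the caller needs write access to the file (for example, by running as root).

```shell
#!/bin/sh
# Rewrite a repository file in place so the repository is disabled.
# .list files: comment out active deb lines. Other files: set enable to 0.
# Usage: disable_repo_file <path to repository file>
disable_repo_file() {
    tmp=$(mktemp) || return 1
    case "$1" in
        *.list) sed 's/^deb /#deb /' "$1" > "$tmp" ;;
        *)      sed 's/^enable=1$/enable=0/' "$1" > "$tmp" ;;
    esac
    # Write back with cat so the original file keeps its owner and mode.
    cat "$tmp" > "$1"
    rm -f "$tmp"
}

# Example (not executed here), run as root:
#   disable_repo_file /etc/apt/sources.list.d/cornelis-repository.list
```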
Host Channel Adapter (HCA) - This network fabric interconnect provides a computer with port connections to other devices.
Remote Direct Memory Access (RDMA) - This technology enables two networked computers to exchange data in main memory without CPU intervention.
OpenFabrics Enterprise Distribution (OFED) - This is open-source software for RDMA and kernel bypass applications.