Cornelis Technical Documentation

7.3.2. Software/Firmware Maintenance

This section describes software and firmware maintenance, including upgrades, updates, and installation of new modules.

7.3.2.1. Download the Firmware

Download the firmware or software using the following procedure.

  1. Using a web browser, go to the Cornelis Customer Center.

  2. Under Download Library, clear the navigation filters.

  3. In the search box, enter your search string (for example, "firmware").

    The results are displayed.

  4. Select one or more items and click Download Selected.

  5. Review the Software License Agreement(s) and click Accept for each item.

    The firmware is saved to your computer.

7.3.2.2. Install the SuperNIC Firmware Update Tool

7.3.2.2.1. Prerequisites
  • The OPX Software (containing updateAgent dependencies) has been installed on the target server.

7.3.2.2.2. Procedure

To install the SuperNIC Firmware Update Tool, perform the following steps.

  1. Download and extract the Firmware Update TGZ package (contains updateAgent) from the Cornelis Customer Center.

  2. Copy the updateAgent binary to the root home (/root) on the target server containing the SuperNIC you want to update.
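The two steps above can be sketched in shell as follows. This is a minimal sketch rather than the documented procedure: the find_update_agent helper, the package file name, and the target host name are hypothetical placeholders, and the directory layout inside the extracted package is assumed to be unknown, which is why the helper searches for the binary instead of hard-coding a path.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the Cornelis tooling): print the path of
# the first regular file named updateAgent found under the given directory.
# Prints nothing if no such file exists. Relies on GNU find for -quit.
find_update_agent() {
    find "$1" -type f -name updateAgent -print -quit
}

# Example use (names in <angle brackets> are placeholders, not real names):
#   mkdir -p fw && tar -xzf <firmware-update-package>.tgz -C fw
#   scp "$(find_update_agent fw)" root@<target-server>:/root/
```

Searching for the binary keeps the sketch independent of how the package happens to be laid out; if you already know the path after extraction, a plain scp of that path does the same job.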

7.3.2.3. Update the SuperNIC Firmware

Use the SuperNIC Firmware Update Tool to update your SuperNIC firmware.

7.3.2.3.1. Prerequisites
  • The updateAgent must be copied to the server containing the SuperNIC you want to update.

  • The hfi1 driver must be loaded before using the updateAgent.

    If the driver is not loaded, updateAgent reports errors such as "Failed to open MAD port for HFI 0" and "Failed to build HFI list".
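The driver prerequisite can be checked from a script before updateAgent is run. The following is a minimal sketch: hfi1_loaded is a hypothetical helper name, and it assumes the usual lsmod layout in which the module name is the first column.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read an lsmod listing on stdin and exit 0 only if a
# module whose name is exactly "hfi1" appears in the first column.
hfi1_loaded() {
    awk '$1 == "hfi1" { found = 1 } END { exit !found }'
}

# Example use:
#   lsmod | hfi1_loaded || sudo modprobe hfi1
```

Reading the listing from stdin keeps the helper free of side effects, so it can be tried against saved lsmod output before it is used on a live server.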

7.3.2.3.2. Procedure
  1. Obtain the SuperNIC firmware (CN5000_SuperNICFirmware-<version>.pkg) from the Cornelis Customer Center and copy the file to the server containing the SuperNIC you want to update. 

  2. Verify the hfi1 driver is loaded and working correctly.

    • lsmod | grep hfi1 should return results

    • opainfo should have entries for all SuperNICs in the system

      If needed, load the hfi1 driver using modprobe hfi1, then recheck opainfo.

  3. Check the current SuperNIC firmware version.

    ./updateAgent -V
    HFI hfi1_0 activeComponentImageSetVersionString: <current version>
    

    Note

    You can use ./updateAgent -V -d all to display all of the SuperNICs on a server.

  4. If the Fabric Manager is running anywhere in the fabric, disable the SuperNIC port.

    opaportconfig disable -h1 -p2

    Alternatively, you can stop the Fabric Manager.

    systemctl stop opafm
  5. Update the SuperNIC.

    ./updateAgent /path/to/firmware.pkg

    Note

    • To update all SuperNICs on a server, you can use:

      ./updateAgent -d all /path/to/firmware.pkg
    • To update all SuperNICs in a fabric, you can use a tool such as pdsh, as shown in the following example:

      pdsh -w <hostfile> updateAgent -d all /path/to/firmware.pkg
  6. Check the current SuperNIC firmware version and verify that the new version is listed as pendingComponentImageSetVersionString.

    ./updateAgent -V
    HFI hfi1_0 activeComponentImageSetVersionString: <old version>
    HFI hfi1_0 pendingComponentImageSetVersionString: <new version>
  7. Power cycle the server.

  8. Check the current SuperNIC firmware version again and verify that the status is activeComponentImageSetVersionString.

    ./updateAgent -V
    HFI hfi1_0 activeComponentImageSetVersionString: <new version>

    If the firmware is still pending, power cycle the server using the BMC.

  9. If you stopped the Fabric Manager, restart it.

    systemctl start opafm
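The version checks in steps 3, 6, and 8 can be scripted by parsing the updateAgent -V output. The sketch below is an illustration only: fw_version is a hypothetical helper, and it assumes the output has exactly the four-field line format shown in the steps above (HFI, device name, key, version) with no spaces inside the version string.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `updateAgent -V` output on stdin and print the
# version of the requested kind ("active" or "pending") for one device.
# The device defaults to hfi1_0. Prints nothing if no matching line exists.
fw_version() {
    awk -v key="${1}ComponentImageSetVersionString:" -v dev="${2:-hfi1_0}" \
        '$1 == "HFI" && $2 == dev && $3 == key { print $4 }'
}

# Example use after step 5 (before the power cycle):
#   ./updateAgent -V | fw_version pending
# Example use after step 7 (after the power cycle):
#   ./updateAgent -V | fw_version active
```

An empty result from fw_version pending after the power cycle, together with the new version from fw_version active, corresponds to the state described in step 8.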

7.3.2.4. Update the Switch Firmware

If you are updating both BMC and ASIC firmware, you must update the BMC firmware first.

Note

  • In the following instructions for the Pull Method, user@hostname implies DNS is configured on the switch. If using static IP addresses, replace hostname with the IP address of the server containing the .pkg file.

  • When using firmware update at the switch CLI to transfer (“pull”) firmware from a remote server, you’ll be prompted for the remote server password.

  • When using SCP on a remote server to transfer (“push”) firmware to the switch, you’ll be prompted for the switch password.

  • It may take up to 20 minutes (for CN5000 Switch) or 40 minutes (for DCS) for a firmware update to complete.

Perform the following steps to update your switch firmware.

Note

If you are updating multiple switches, repeat these steps for each switch.

  1. Download and extract the target Switch firmware package files (BMC Firmware and/or Switch Firmware) from the Cornelis Customer Center onto a server on the same Ethernet network as the switch to be updated.

Updating the BMC Firmware

Note

During the initial installation of this new version (after the forced reboot) or when inserting new boards, the BMCs may require multiple updates and reboots (up to 2) to synchronize the firmware across the entire DCS. After this initial synchronization, future updates should only require a single final reboot to apply. As long as the ASIC is off, this process should be automatic.

  1. Run the firmware update command to begin the update process.

    • Pull Method: If not already logged in, log into the switch using the admin account. Specify the user@hostname:/path/to/file.pkg path. Enter the password of the host when prompted.

      admin@CNEdge -> firmware update user@hostname:/path/to/CN5000_BMCFirmware-<version>.pkg
      root@hostname's password:
      Copying firmware image to staging area...
      Firmware update started. Wait (up to 20 minutes), check status with "firmware update -s", and initiate a reboot when ready
    • Push Method: Specify the admin@switchName:/tmp/images destination path. Enter the switch password if/when prompted.

      [user@servername ~]# scp -O root/fw/CN5000_BMCFirmware-<version>.pkg admin@switchname:/tmp/images
      admin@switchname's password:
      Copying firmware image to staging area...
      Firmware update started. Wait (up to 20 minutes), check status with "firmware update -s", and initiate a ‘reboot force’ when ready
Updating the Switch Firmware (ASIC)
  1. Run the firmware update command to begin the update process.

    • Pull Method: If not already logged in, log into the switch using the admin account. Specify the user@hostname:/path/to/file.pkg path. Enter the password of the host when prompted.

      admin@CNEdge -> firmware update user@hostname:/path/to/CN5000_SwitchFirmware-<version>.pkg
      root@hostname's password:
      Copying firmware image to staging area...
      Firmware update started. Wait (up to 20 minutes), check status with "firmware update -s", and initiate a 'reboot -f' when ready
    • Push Method: Specify the admin@switchName:/tmp/images destination path. Enter the switch password if/when prompted.

      [user@servername ~]# scp -O root/fw/CN5000_SwitchFirmware-<version>.pkg admin@switchname:/tmp/images
      admin@switchname's password:
      Copying firmware image to staging area...
      Firmware update started. Wait (up to 20 minutes), check status with "firmware update -s", and initiate a ‘reboot force’ when ready
Completing the Update
  1. Check the status of the update using firmware update -s.

    admin@CNEdge -> firmware update -s
    BMC:
        Image 1: Booted and Active
        Image 2: Currently updating
    ASIC A:
        Image 1: Booted and Active
        Image 2: Currently updating

    When staging completes, the status changes to:

    admin@CNEdge -> firmware update -s
    BMC:
        Image 1: Booted and Active
        Image 2: Staged for update
    ASIC A:
        Image 1: Booted and Active
        Image 2: Staged for update

    When the firmware shows Staged for update, the switch is ready for reboot.

  2. Reboot the switch.

    admin@CNEdge -> reboot force
    Rebooting in 1 second(s)
    Lost Communication with server
    Connection to <hostname> closed by remote host.
    Connection to <hostname> closed.

    Note

    If you try to reboot before the firmware status shows Staged for update, you will receive the following message:

    Error during firmware update, rebooting now could be dangerous. Are you sure you wish to continue?

    Type no and wait for the status to change.

After updating the switch firmware, check the versions.

admin@CNEdge -> firmware version
Firmware Versions: 

Switch BMC Chip version: <current version>
ASIC Chip A version: <current version>
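The readiness check in "Completing the Update" can also be applied to saved status text. The following is a minimal sketch under stated assumptions: ready_for_reboot is a hypothetical helper, it runs on a server rather than on the switch CLI, and it assumes the firmware update -s output uses the exact phrases "Currently updating" and "Staged for update" shown above. How the status text is captured from the switch is left open.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read captured `firmware update -s` output on stdin.
# Exit 0 only if at least one image is "Staged for update" and no image is
# still "Currently updating"; exit 1 otherwise.
ready_for_reboot() {
    awk '/Currently updating/ { busy = 1 }
         /Staged for update/  { staged = 1 }
         END { exit !(staged && !busy) }'
}

# Example use with status text saved to a file (file name is a placeholder):
#   ready_for_reboot < status.txt && echo "ready for reboot"
```

Requiring that no image is still updating matches the caution in the note above: when both BMC and ASIC firmware are being updated, a reboot is only appropriate once every image has finished staging.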

7.3.2.6. Remove OPX Software from a Host

If you need to remove the OPX Software from a host due to an issue with an OS upgrade, perform the following steps.

  1. Uninstall the software using the command for your distribution.

    • Using RHEL:

      sudo dnf remove <META_PACKAGE_NAME>
    • Using SLES:

      sudo zypper remove <META_PACKAGE_NAME>
    • Using Ubuntu:

      sudo apt remove <META_PACKAGE_NAME>
  2. Remove kernel packages.

    • Using RHEL:

      sudo dnf remove kmod-opxs-kernel-updates opxs-kernel-updates-devel
    • Using SLES:

      sudo zypper remove kmod-opxs-kernel-updates opxs-kernel-updates-devel
    • Using Ubuntu:

      sudo apt remove opxs-modules-dkms*
  3. Remove residual packages.

    • Using RHEL:

      sudo dnf remove opa-*
      sudo dnf remove libfabric-*
      sudo dnf remove libpsm2-*
    • Using SLES:

      sudo zypper remove opa-* 
      sudo zypper remove libfabric-* 
      sudo zypper remove libpsm2-*
    • Using Ubuntu:

      sudo apt remove opa-*
      sudo apt remove libfabric-*
      
  4. Reboot.
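When the same removal has to be run on hosts with different distributions, the package-manager choice in steps 1 through 3 can be made from the ID field of /etc/os-release. This is a minimal sketch: pkg_remove_cmd is a hypothetical helper, and it handles only the three distributions named above (ID values rhel, sles, and ubuntu); derivative distributions with other ID values are deliberately rejected.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map an os-release ID to the removal command used in
# the steps above. Prints the command and returns 0, or returns 1 with a
# message on stderr for any other ID.
pkg_remove_cmd() {
    case "$1" in
        rhel)   echo "dnf remove" ;;
        sles)   echo "zypper remove" ;;
        ubuntu) echo "apt remove" ;;
        *)      echo "unsupported distribution: $1" >&2; return 1 ;;
    esac
}

# Example use (<META_PACKAGE_NAME> is the placeholder from step 1):
#   . /etc/os-release
#   sudo $(pkg_remove_cmd "$ID") <META_PACKAGE_NAME>
```

The helper only selects the command; the package names still differ per distribution in steps 2 and 3, so those lists must be taken from the steps above.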