Known issues
EESSI Production Repository (v2023.06)
Failed to modify UD QP to INIT on mlx5_0: Operation not permitted
This is an error that occurs with OpenMPI after updating to OFED 23.10.
An upstream issue for this problem has been opened with EasyBuild: https://github.com/easybuilders/easybuild-easyconfigs/issues/20233
Workarounds
You can instruct OpenMPI not to use libfabric and to turn off `uct` (see https://openucx.readthedocs.io/en/master/running.html#running-mpi) by passing the options `--mca pml ucx --mca btl '^uct,ofi' --mca mtl '^ofi'` to `mpirun`.
Or equivalently, you can set the environment variables `OMPI_MCA_pml='ucx'`, `OMPI_MCA_btl='^uct,ofi'`, and `OMPI_MCA_mtl='^ofi'`.
You may also set these environment variables via site-specific Lmod hooks:
require("strict")
local hook = require("Hook")

-- Work around "Failed to modify UD QP to INIT on mlx5_0: Operation not permitted"
function fix_ud_qp_init_openmpi(t)
    -- Module name without the version, i.e. everything before the first "/"
    local simpleName = string.match(t.modFullName, "(.-)/")
    if simpleName == 'OpenMPI' then
        -- Exclude the uct and ofi BTLs, select the UCX PML, exclude the ofi MTL
        setenv('OMPI_MCA_btl', '^uct,ofi')
        setenv('OMPI_MCA_pml', 'ucx')
        setenv('OMPI_MCA_mtl', '^ofi')
    end
end

-- Run the EESSI load hook first (if it is defined), then apply the fix
local function combined_load_hook(t)
    if eessi_load_hook ~= nil then
        eessi_load_hook(t)
    end
    fix_ud_qp_init_openmpi(t)
end

hook.register("load", combined_load_hook)
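
For sites that do not use the Lmod hook, the two other workarounds described above could look roughly like this in a job script. This is a minimal sketch: the values are the same ones the hook sets, while the application name `./my_mpi_app` and the rank count are hypothetical placeholders, so the `mpirun` lines are left commented out.

```shell
#!/bin/sh
# Sketch only: "./my_mpi_app" and "-np 4" are hypothetical placeholders.

# Option 1: pass the MCA parameters directly on the mpirun command line.
# mpirun --mca pml ucx --mca btl '^uct,ofi' --mca mtl '^ofi' -np 4 ./my_mpi_app

# Option 2: export the equivalent environment variables before calling mpirun.
export OMPI_MCA_pml='ucx'        # select the UCX PML
export OMPI_MCA_btl='^uct,ofi'   # exclude the uct and ofi BTLs
export OMPI_MCA_mtl='^ofi'       # exclude the ofi MTL
# mpirun -np 4 ./my_mpi_app

# Print the settings so they show up in the job log.
env | grep '^OMPI_MCA_' | sort
```

Exporting the variables has the advantage that every `mpirun` call later in the same script picks them up, whereas the command-line form has to be repeated for each invocation.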