drm/amdgpu: gate VM CPU HDP flush on reset lock

During GPU reset, the application can still run CPU page table updates.
Each commit calls amdgpu_device_flush_hdp(), which on SR-IOV sends work
through the KIQ ring. That can advance sync_seq while the GPU is being
reset, leaving fence writeback out of sync and causing
amdgpu_fence_emit_polling() to time out on later KIQ use.
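
The failure mode above can be illustrated with a toy model. This is a
sketch under stated assumptions, not the driver code: the struct, the
toy_* helpers, the gpu_alive flag and the fixed poll budget are all
hypothetical, and the model simply assumes that a polled emit bumps
sync_seq first and then waits for writeback to come within
num_fences_mask of the new value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified model of a polled fence driver. */
struct toy_fence_drv {
	uint32_t sync_seq;        /* last sequence number handed out by the CPU */
	uint32_t writeback;       /* last sequence number the "GPU" wrote back */
	uint32_t num_fences_mask; /* how far sync_seq may run ahead of writeback */
};

/* Bounded poll: true if writeback has reached wait_seq within the budget. */
static bool toy_wait_polling(const struct toy_fence_drv *drv,
			     uint32_t wait_seq, int polls)
{
	while (polls-- > 0) {
		/* Signed difference keeps the comparison wrap-safe. */
		if ((int32_t)(wait_seq - drv->writeback) <= 0)
			return true;
	}
	return false;
}

/*
 * Returns 0 on success, -1 on timeout. sync_seq advances either way.
 * gpu_alive == false models a GPU under reset: the fence is queued but
 * its writeback never lands.
 */
static int toy_emit_polling(struct toy_fence_drv *drv, bool gpu_alive)
{
	uint32_t seq;

	assert(drv != NULL);
	seq = ++drv->sync_seq;

	if (!toy_wait_polling(drv, seq - drv->num_fences_mask, 16))
		return -1;

	if (gpu_alive)
		drv->writeback = seq;
	return 0;
}
```

In this model, emits issued while gpu_alive is false keep advancing
sync_seq without moving writeback, so a later emit with gpu_alive set
to true still times out. That is the kind of desynchronization this
patch avoids by not submitting during reset.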

Fix:
amdgpu_vm_cpu_commit():
  Reset flushes HDP anyway, so the HDP flush in amdgpu_vm_cpu_commit()
  can be skipped while a reset is ongoing.
  Take reset_domain->sem with down_read_trylock() before calling
  amdgpu_device_flush_hdp(). If the reset path holds the write lock,
  skip the HDP flush so that no HDP-related HW access (including KIQ)
  runs during reset; state is re-established after reset.
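
The gating pattern described above can be sketched in userspace as
follows. This is only an illustration: reset_gate, the gate_* helpers
and the fake_* functions are hypothetical stand-ins for
reset_domain->sem, the kernel rwsem API and the amdgpu functions, and
the toy gate is built on C11 atomics rather than a real rw_semaphore.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy reader/writer gate: -1 = writer (reset) holds it, n >= 0 = n readers. */
static atomic_int reset_gate;

/* Counts how often the stand-in for the HDP flush actually ran. */
static int hdp_flush_count;

/* Non-blocking read acquire, analogous in spirit to down_read_trylock(). */
static bool gate_try_read(void)
{
	int v = atomic_load(&reset_gate);

	while (v >= 0) {
		/* On failure, v is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&reset_gate, &v, v + 1))
			return true;
	}
	return false;	/* a writer holds the gate */
}

static void gate_read_unlock(void)
{
	int prev = atomic_fetch_sub(&reset_gate, 1);

	assert(prev > 0);
	(void)prev;	/* silence unused warning when NDEBUG is set */
}

/* Reset side: succeeds only when there are no readers and no writer. */
static bool gate_try_write(void)
{
	int expected = 0;

	return atomic_compare_exchange_strong(&reset_gate, &expected, -1);
}

static void gate_write_unlock(void)
{
	atomic_store(&reset_gate, 0);
}

/* Commit side: returns true if the flush ran, false if it was skipped. */
static bool fake_vm_cpu_commit(void)
{
	if (!gate_try_read())
		return false;	/* reset in progress: skip the flush */

	hdp_flush_count++;	/* stand-in for the HDP flush */
	gate_read_unlock();
	return true;
}
```

The commit side never blocks: while the writer holds the gate the flush
is skipped rather than delayed, which matches the reasoning that the
reset re-establishes the state anyway.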

Signed-off-by: Chenglei Xie <Chenglei.Xie@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Chenglei Xie
2026-04-07 10:51:24 -04:00
committed by Alex Deucher
parent 574b3b14f7
commit ddda81c4d7
+11 -1
@@ -21,6 +21,8 @@
  */
 #include "amdgpu_vm.h"
+#include "amdgpu.h"
+#include "amdgpu_reset.h"
 #include "amdgpu_object.h"
 #include "amdgpu_trace.h"
@@ -108,11 +110,19 @@ static int amdgpu_vm_cpu_update(struct amdgpu_vm_update_params *p,
 static int amdgpu_vm_cpu_commit(struct amdgpu_vm_update_params *p,
 				struct dma_fence **fence)
 {
+	struct amdgpu_device *adev = p->adev;
 	if (p->needs_flush)
 		atomic64_inc(&p->vm->tlb_seq);
 	mb();
-	amdgpu_device_flush_hdp(p->adev, NULL);
+	/* A reset flushed the HDP anyway, so that here can be skipped when a reset is ongoing */
+	if (!down_read_trylock(&adev->reset_domain->sem))
+		return 0;
+	amdgpu_device_flush_hdp(adev, NULL);
+	up_read(&adev->reset_domain->sem);
 	return 0;
 }