sched/deadline: Fix 'stuck' dl_server

Andrea reported the dl_server getting stuck for him. He tracked it
down to a state where dl_server_start() saw dl_defer_running==1, but
the dl_server's job is no longer valid at the time of
dl_server_start().

In the state diagram this corresponds to [4] D->A (or dl_server_stop()
due to no more runnable tasks) followed by [1], which in case of a
lapsed deadline must then be A->B.

Now our A has dl_defer_running==1, while B demands
dl_defer_running==0, therefore it must get cleared when the CBS wakeup
rules demand a replenish.
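
The A->B transition described above can be sketched as a small userspace
model. This is an illustration only, not the kernel code: struct
model_dl_se, model_time_before(), model_replenish_new_period(),
model_wakeup() and the apply_fix switch are all hypothetical names, the
model only covers the lapsed-deadline case (dl_entity_overflow() is left
out), and setting dl_defer_armed stands in for starting the zero-laxity
timer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model; NOT the kernel's struct sched_dl_entity. */
struct model_dl_se {
	uint64_t deadline;	/* absolute deadline of the current job */
	uint64_t period;	/* relative period used on replenish */
	bool dl_defer_running;
	bool dl_defer_armed;
	bool dl_throttled;
};

/* Wraparound-safe "a is earlier than b", in the spirit of dl_time_before(). */
static bool model_time_before(uint64_t a, uint64_t b)
{
	return (int64_t)(a - b) < 0;
}

/* Start a fresh job; defer (throttle + arm) unless marked as running. */
static void model_replenish_new_period(struct model_dl_se *se, uint64_t now)
{
	se->deadline = now + se->period;	/* fwd period */
	if (!se->dl_defer_running) {
		se->dl_throttled = true;
		se->dl_defer_armed = true;
	}
}

/*
 * Wakeup path ([1] in the state diagram). With apply_fix set, a lapsed
 * deadline clears dl_defer_running before replenishing, i.e. A->B.
 */
static void model_wakeup(struct model_dl_se *se, uint64_t now, bool apply_fix)
{
	assert(se != NULL);

	if (model_time_before(se->deadline, now)) {
		if (apply_fix)
			se->dl_defer_running = false;
		model_replenish_new_period(se, now);
	}
}
```

In this model, waking an entity that still has dl_defer_running==1 and a
lapsed deadline leaves dl_defer_armed and dl_throttled clear when
apply_fix is false, and sets both when apply_fix is true.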

Fixes: a110a81c52 ("sched/deadline: Deferrable dl server")
Reported-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Juri Lelli <juri.lelli@redhat.com>
Tested-by: Andrea Righi <arighi@nvidia.com>
Link: https://lkml.kernel.org/r/20260123161645.2181752-1-arighi@nvidia.com
Link: https://patch.msgid.link/20260130124100.GC1079264@noisy.programming.kicks-ass.net
Peter Zijlstra
2026-01-30 13:41:00 +01:00
parent 63804fed14
commit 1151354225

@@ -1034,6 +1034,12 @@ static void update_dl_entity(struct sched_dl_entity *dl_se)
 			return;
 		}
 
+		/*
+		 * When [4] D->A is followed by [1] A->B, dl_defer_running
+		 * needs to be cleared, otherwise it will fail to properly
+		 * start the zero-laxity timer.
+		 */
+		dl_se->dl_defer_running = 0;
 		replenish_dl_new_period(dl_se, rq);
 	} else if (dl_server(dl_se) && dl_se->dl_defer) {
 		/*
@@ -1655,6 +1661,12 @@ void dl_server_update(struct sched_dl_entity *dl_se, s64 delta_exec)
  *   dl_server_active = 1;
  *   enqueue_dl_entity()
  *     update_dl_entity(WAKEUP)
+ *       if (dl_time_before() || dl_entity_overflow())
+ *         dl_defer_running = 0;
+ *         replenish_dl_new_period();
+ *           // fwd period
+ *           dl_throttled = 1;
+ *           dl_defer_armed = 1;
  *       if (!dl_defer_running)
  *         dl_defer_armed = 1;
  *         dl_throttled = 1;