POSIX CPU Timers TOCTOU race (CVE-2025-38352)

Tip

Learn and practice AWS Hacking: HackTricks Training AWS Red Team Expert (ARTE)
Learn and practice GCP Hacking: HackTricks Training GCP Red Team Expert (GRTE)
Learn and practice Azure Hacking: HackTricks Training Azure Red Team Expert (AzRTE)

Support HackTricks

This page documents a TOCTOU race condition in Linux/Android POSIX CPU timers that can corrupt timer state and crash the kernel, and that under some circumstances can be steered toward privilege escalation.

  • Affected component: kernel/time/posix-cpu-timers.c
  • Primitive: expiry vs deletion race during task exit
  • Config dependent: CONFIG_POSIX_CPU_TIMERS_TASK_WORK=n (IRQ-context expiry path)

Brief overview of internals (relevant for exploitation)

  • Three CPU clocks drive timer accounting via cpu_clock_sample():
    • CPUCLOCK_PROF: utime + stime
    • CPUCLOCK_VIRT: utime only
    • CPUCLOCK_SCHED: task_sched_runtime()
  • Timer creation binds the timer to a task/pid and initializes the timerqueue node:
    static int posix_cpu_timer_create(struct k_itimer *new_timer) {
        struct pid *pid;
        rcu_read_lock();
        pid = pid_for_clock(new_timer->it_clock, false);
        if (!pid) { rcu_read_unlock(); return -EINVAL; }
        new_timer->kclock = &clock_posix_cpu;
        timerqueue_init(&new_timer->it.cpu.node);
        new_timer->it.cpu.pid = get_pid(pid);
        rcu_read_unlock();
        return 0;
    }
  • Arming inserts the timer into a per-base timerqueue and may update the next-expiry cache:
    static void arm_timer(struct k_itimer *timer, struct task_struct *p) {
        struct posix_cputimer_base *base = timer_base(timer, p);
        struct cpu_timer *ctmr = &timer->it.cpu;
        u64 newexp = cpu_timer_getexpires(ctmr);
        if (!cpu_timer_enqueue(&base->tqhead, ctmr)) return;
        if (newexp < base->nextevt) base->nextevt = newexp;
    }
  • The fast path skips the expensive processing unless the cached expiries indicate that a timer may have expired:
    static inline bool fastpath_timer_check(struct task_struct *tsk) {
        struct posix_cputimers *pct = &tsk->posix_cputimers;
        if (!expiry_cache_is_inactive(pct)) {
            u64 samples[CPUCLOCK_MAX];
            task_sample_cputime(tsk, samples);
            if (task_cputimers_expired(samples, pct))
                return true;
        }
        return false;
    }
  • Expiration collects the expired timers, marks them as firing, and removes them from the queue; actual delivery is deferred:
    #define MAX_COLLECTED 20
    static u64 collect_timerqueue(struct timerqueue_head *head,
                                  struct list_head *firing, u64 now) {
        struct timerqueue_node *next; int i = 0;
        while ((next = timerqueue_getnext(head))) {
            struct cpu_timer *ctmr = container_of(next, struct cpu_timer, node);
            u64 expires = cpu_timer_getexpires(ctmr);
            if (++i == MAX_COLLECTED || now < expires) return expires;
            ctmr->firing = 1;                           // critical state
            rcu_assign_pointer(ctmr->handling, current);
            cpu_timer_dequeue(ctmr);
            list_add_tail(&ctmr->elist, firing);
        }
        return U64_MAX;
    }
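
The collection step can be modeled in user space to see how the cap and the expiry comparison interact. This is only a sketch, not kernel code: `collect_model_timer`, `model_collect()` and the sorted array standing in for the timerqueue are illustrative assumptions.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_COLLECTED 20

/* Hypothetical stand-in for struct cpu_timer: only the fields the model needs. */
struct collect_model_timer {
    uint64_t expires;  /* absolute expiry in clock units */
    int firing;        /* set to 1 once collected for delivery */
    int queued;        /* 1 while still on the modeled timerqueue */
};

/*
 * Walk timers sorted by ascending expiry, mirroring the loop in the excerpt:
 * stop at the first timer that has not expired yet or when the cap is hit,
 * and return the next pending expiry (UINT64_MAX if the queue drained).
 */
static uint64_t model_collect(struct collect_model_timer *q, size_t n,
                              uint64_t now, size_t *collected)
{
    int i = 0;

    *collected = 0;
    for (size_t k = 0; k < n; k++) {
        if (!q[k].queued)
            continue;
        if (++i == MAX_COLLECTED || now < q[k].expires)
            return q[k].expires;
        q[k].firing = 1;   /* from here on a deleter must not free the timer */
        q[k].queued = 0;   /* dequeued: no longer visible on the queue */
        (*collected)++;
    }
    return UINT64_MAX;
}
```

Because the counter is pre-incremented before the comparison, a single pass in this model collects at most MAX_COLLECTED - 1 timers. The important point for the race is that `firing` is the only remaining marker of an in-flight expiry once the timer has left the queue.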

Two expiry-processing modes

  • CONFIG_POSIX_CPU_TIMERS_TASK_WORK=y: expiry is deferred via task_work on the target task
  • CONFIG_POSIX_CPU_TIMERS_TASK_WORK=n: expiry is handled directly in IRQ context

Run paths for POSIX CPU timers

```c
void run_posix_cpu_timers(void) {
    struct task_struct *tsk = current;
    __run_posix_cpu_timers(tsk);
}

#ifdef CONFIG_POSIX_CPU_TIMERS_TASK_WORK
static inline void __run_posix_cpu_timers(struct task_struct *tsk) {
    if (WARN_ON_ONCE(tsk->posix_cputimers_work.scheduled))
        return;
    tsk->posix_cputimers_work.scheduled = true;
    task_work_add(tsk, &tsk->posix_cputimers_work.work, TWA_RESUME);
}
#else
static inline void __run_posix_cpu_timers(struct task_struct *tsk) {
    lockdep_posixtimer_enter();
    handle_posix_cpu_timers(tsk);  // IRQ-context path
    lockdep_posixtimer_exit();
}
#endif
```

In the IRQ-context path, the firing list is processed outside the sighand lock.

IRQ-context handling path

```c
static void handle_posix_cpu_timers(struct task_struct *tsk) {
    struct k_itimer *timer, *next;
    unsigned long flags, start;
    LIST_HEAD(firing);

    if (!lock_task_sighand(tsk, &flags))
        return;  // may fail on exit

    do {
        start = READ_ONCE(jiffies);
        barrier();
        check_thread_timers(tsk, &firing);
        check_process_timers(tsk, &firing);
    } while (!posix_cpu_timers_enable_work(tsk, start));

    unlock_task_sighand(tsk, &flags);  // race window opens here

    list_for_each_entry_safe(timer, next, &firing, it.cpu.elist) {
        int cpu_firing;

        spin_lock(&timer->it_lock);
        list_del_init(&timer->it.cpu.elist);
        cpu_firing = timer->it.cpu.firing;  // read then reset
        timer->it.cpu.firing = 0;
        if (likely(cpu_firing >= 0))
            cpu_timer_fire(timer);
        rcu_assign_pointer(timer->it.cpu.handling, NULL);
        spin_unlock(&timer->it_lock);
    }
}
```

Root cause: TOCTOU between IRQ-time expiry and concurrent deletion during task exit

Preconditions

  • CONFIG_POSIX_CPU_TIMERS_TASK_WORK is disabled (IRQ path in use)
  • The target task is exiting but has not been fully reaped
  • Another thread concurrently calls posix_cpu_timer_del() on the same timer

Sequence

  1. update_process_times() triggers run_posix_cpu_timers() in IRQ context for the exiting task.
  2. collect_timerqueue() sets ctmr->firing = 1 and moves the timer onto the temporary firing list.
  3. handle_posix_cpu_timers() drops sighand via unlock_task_sighand() so that timers are delivered outside the lock.
  4. Immediately after the unlock, the exiting task can be reaped; a sibling thread runs posix_cpu_timer_del().
  5. In this window, posix_cpu_timer_del() can fail to acquire state via cpu_timer_task_rcu()/lock_task_sighand() and therefore skips the normal in-flight guard that checks timer->it.cpu.firing. Deletion proceeds as if the timer were not firing, corrupting state while expiry is still being handled and leading to crashes/UB.
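
The check-versus-use gap in steps 4 and 5 can be illustrated with a small deterministic model. It is a sketch under stated assumptions, not the kernel implementation: `del_model_task`, `del_model_timer`, `model_lock_task()` and `model_timer_del()` are hypothetical names, and the reaped flag stands in for a failed cpu_timer_task_rcu()/lock_task_sighand().

```c
#include <stdbool.h>
#include <stddef.h>

enum model_del_result { MODEL_DEL_OK = 0, MODEL_DEL_RETRY = 1 };

struct del_model_task {
    bool reaped;  /* true once the exiting task can no longer be looked up or locked */
};

struct del_model_timer {
    int firing;                   /* 1 while the expiry path still owns the timer */
    struct del_model_task *task;  /* task the timer is attached to */
};

/* Models the task lookup plus sighand locking step: fails for a reaped task. */
static bool model_lock_task(const struct del_model_timer *t)
{
    return t->task != NULL && !t->task->reaped;
}

/*
 * Models the deletion path as described above: the firing check only runs
 * when the task could be locked. If locking fails, the guard is skipped and
 * deletion is reported as successful even though expiry is still in flight.
 */
static enum model_del_result model_timer_del(const struct del_model_timer *t)
{
    if (model_lock_task(t)) {
        if (t->firing)
            return MODEL_DEL_RETRY;  /* expiry in flight: caller must retry */
    }
    return MODEL_DEL_OK;             /* caller goes on to free the timer */
}
```

With `firing == 1`, the model returns MODEL_DEL_RETRY while the task is still lockable and MODEL_DEL_OK once it is marked reaped, which is the state combination the race depends on.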

Why TASK_WORK mode is safe by design

  • With CONFIG_POSIX_CPU_TIMERS_TASK_WORK=y, expiry is deferred to task_work; exit_task_work() runs before exit_notify(), so IRQ-time expiry cannot overlap with reaping.
  • Even in that mode, task_work_add() fails if the task is already exiting; gating on exit_state makes both modes behave consistently.

Fix (Android common kernel) and rationale

  • Add an early return if the current task is exiting, skipping all processing:
    // kernel/time/posix-cpu-timers.c (Android common kernel commit 157f357d50b5038e5eaad0b2b438f923ac40afeb)
    if (tsk->exit_state)
        return;
  • This prevents entering handle_posix_cpu_timers() for exiting tasks, eliminating the window in which posix_cpu_timer_del() can miss it.cpu.firing and race with expiry processing.
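
The effect of the guard can be sketched with a toy model of the entry point. Everything here is an assumption made for illustration: `fix_model_task`, `model_run_timers()` and the `with_fix` switch are hypothetical, and the exact position of the check inside the real function is not taken from the source.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the task_struct fields that matter here. */
struct fix_model_task {
    int exit_state;       /* non-zero once the task is exiting (zombie/dead) */
    bool expiry_pending;  /* what the fast-path check would report */
    int handler_entries;  /* how often the modeled expiry handler ran */
};

/* Stands in for handle_posix_cpu_timers(): just counts entries. */
static void model_handle_timers(struct fix_model_task *t)
{
    t->handler_entries++;
}

/*
 * Models the tick-time entry point. With the fix enabled, an exiting task
 * returns before any timer is collected or marked as firing.
 */
static void model_run_timers(struct fix_model_task *t, bool with_fix)
{
    if (with_fix && t->exit_state)
        return;                /* exit_state gate: no expiry work at all */
    if (!t->expiry_pending)
        return;                /* fast path: nothing to do */
    model_handle_timers(t);
}
```

Without the gate, an exiting task with a pending expiry still reaches the handler; with it, the handler is never entered, so no timer can be left in the firing state while the task is being reaped.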

Impact

  • Kernel memory corruption of timer structures during concurrent expiry/deletion can cause immediate crashes (DoS) and is a strong primitive toward privilege escalation, because it opens opportunities for arbitrary manipulation of kernel state.

Triggering the bug (safe, reproducible conditions)

Build/config

  • Ensure CONFIG_POSIX_CPU_TIMERS_TASK_WORK=n and use a kernel without the exit_state gating fix.
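
One way to confirm the build precondition is to scan the kernel configuration text for the option. The helper below is a minimal sketch: `task_work_expiry_enabled()` is a hypothetical name, and it assumes the caller has already loaded the config text, for example from the build's .config or from a distribution file such as /boot/config-<release> where one exists.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Returns true when the given kernel config text enables task-work expiry.
 * A missing line and "# CONFIG_... is not set" both mean the option is off,
 * i.e. the IRQ-context expiry path is in use.
 */
static bool task_work_expiry_enabled(const char *config_text)
{
    static const char key[] = "CONFIG_POSIX_CPU_TIMERS_TASK_WORK=y";
    const char *hit = strstr(config_text, key);

    while (hit) {
        /* Accept only matches that start a line, not commented-out text. */
        if (hit == config_text || hit[-1] == '\n')
            return true;
        hit = strstr(hit + 1, key);
    }
    return false;
}
```

A return value of false matches the configuration this section requires; true means expiry is deferred to task_work and the IRQ-path race does not apply.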

Runtime strategy

  • Target a thread that is about to exit and attach a CPU timer to it (per-thread or process-wide clock):
    • Per-thread: timer_create(CLOCK_THREAD_CPUTIME_ID, …)
    • Process-wide: timer_create(CLOCK_PROCESS_CPUTIME_ID, …)
  • Arm it with a very short initial expiry and a small interval to maximize the number of IRQ-path entries:
    #include <signal.h>
    #include <stdio.h>
    #include <time.h>

    static timer_t t;
    static void setup_cpu_timer(void) {
        struct sigevent sev = {0};
        sev.sigev_notify = SIGEV_SIGNAL;    // delivery type not critical for the race
        sev.sigev_signo = SIGUSR1;
        if (timer_create(CLOCK_THREAD_CPUTIME_ID, &sev, &t)) perror("timer_create");
        struct itimerspec its = {0};
        its.it_value.tv_nsec = 1;           // fire ASAP
        its.it_interval.tv_nsec = 1;        // re-fire
        if (timer_settime(t, 0, &its, NULL)) perror("timer_settime");
    }
  • From a sibling thread, concurrently delete the same timer while the target thread exits:
    void *deleter(void *arg) {
        (void)arg;                          // unused
        for (;;) (void)timer_delete(t);     // hammer delete in a loop
    }
  • Race amplifiers: a high scheduler tick rate, CPU load, and repeated thread exit/re-create cycles. The crash typically shows up when posix_cpu_timer_del() fails to notice firing because the task lookup/locking fails right after unlock_task_sighand().

Detection and hardening

  • Mitigation: apply the exit_state guard; prefer enabling CONFIG_POSIX_CPU_TIMERS_TASK_WORK when feasible.
  • Observability: add tracepoints/WARN_ONCE around unlock_task_sighand()/posix_cpu_timer_del(); alert when it.cpu.firing==1 is observed together with failed cpu_timer_task_rcu()/lock_task_sighand(); watch for timerqueue inconsistencies around task exit.

Audit hotspots (for reviewers)

  • update_process_times() → run_posix_cpu_timers() (IRQ)
  • __run_posix_cpu_timers() selection (TASK_WORK vs IRQ path)
  • collect_timerqueue(): sets ctmr->firing and moves nodes
  • handle_posix_cpu_timers(): drops sighand before the firing loop
  • posix_cpu_timer_del(): relies on it.cpu.firing to detect in-flight expiry; this check is skipped when task lookup/lock fails during exit/reap

Notes for exploitation research

  • The behavior described here is a reliable kernel crash primitive; turning it into privilege escalation typically requires an additional controllable overlap (object lifetime or write-what-where influence) that is beyond the scope of this summary. Treat any PoC as potentially destabilizing and run it only in emulators/VMs.

Chronomaly exploit strategy (priv-esc without fixed text offsets)

  • Tested target & configs: x86_64 v5.10.157 under QEMU (4 cores, 3 GB RAM). Critical options: CONFIG_POSIX_CPU_TIMERS_TASK_WORK=n, CONFIG_PREEMPT=y, CONFIG_SLAB_MERGE_DEFAULT=n, DEBUG_LIST=n, BUG_ON_DATA_CORRUPTION=n, LIST_HARDENED=n.
  • Race steering with CPU timers: A racing thread (race_func()) burns CPU while CPU timers fire; free_func() polls SIGUSR1 to confirm if the timer fired. Tune CPU_USAGE_THRESHOLD so signals arrive only sometimes (intermittent “Parent raced too late/too early” messages). If timers fire every attempt, lower the threshold; if they never fire before thread exit, raise it.
  • Dual-process alignment into send_sigqueue(): Parent/child processes try to hit a second race window inside send_sigqueue(). The parent sleeps PARENT_SETTIME_DELAY_US microseconds before arming timers; adjust downward when you mostly see “Parent raced too late” and upward when you mostly see “Parent raced too early”. Seeing both indicates you are straddling the window; success is expected within ~1 minute once tuned.
  • Cross-cache UAF replacement: The exploit frees a struct sigqueue then grooms allocator state (sigqueue_crosscache_preallocs()) so both the dangling uaf_sigqueue and the replacement realloc_sigqueue land on a pipe buffer data page (cross-cache reallocation). Reliability assumes a quiet kernel with few prior sigqueue allocations; if per-CPU/per-node partial slab pages already exist (busy systems), the replacement will miss and the chain fails. The author intentionally left it unoptimized for noisy kernels.
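
The delay-tuning rule for the second race window can be written down as a small feedback step. This is a hypothetical helper, not part of the exploit: `tune_parent_delay_us()`, its counters, and the fixed step size are assumptions chosen to mirror the "too late means lower, too early means raise" guidance above.

```c
/*
 * Hypothetical tuning step for PARENT_SETTIME_DELAY_US.
 * too_early / too_late: how often each message was seen in the last batch.
 * step_us: how far to move the delay per adjustment (assumed, not from the source).
 */
static long tune_parent_delay_us(long delay_us, int too_early, int too_late,
                                 long step_us)
{
    if (too_late > too_early)
        delay_us -= step_us;   /* mostly late: arm the timers sooner */
    else if (too_early > too_late)
        delay_us += step_us;   /* mostly early: wait longer before arming */
    /* Equal counts mean the window is being straddled: keep the delay. */

    if (delay_us < 0)
        delay_us = 0;          /* a sleep duration cannot be negative */
    return delay_us;
}
```

For example, a batch with nine "too late" and one "too early" message moves a 100 µs delay down by one step, while a balanced batch leaves it unchanged.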

See also

  • ksmbd streams xattr OOB write (CVE-2025-37947)
