How does the Go runtime scheduler work?

Question

Accepted Answer

The Go runtime includes a **scheduler** that multiplexes many goroutines onto a small number of OS threads. This is **M:N scheduling**: M goroutines run on N OS threads (these two letters are unrelated to the G, M, and P of the model below). Together with small, growable stacks, it is what makes goroutines cheap and lets Go's concurrency scale, and it explains most of what you observe about goroutine performance.

## The G-M-P model

```text
G (Goroutine) — your concurrent task (lightweight, ~2KB stack to start)
M (Machine)   — an OS thread (the actual thread the OS schedules)
P (Processor) — a logical processor / scheduling context; holds a queue of runnable Gs
                (the number of P's = GOMAXPROCS, default = number of logical CPUs available)

The scheduler runs G's on M's, coordinated through P's:
  Each P has a local run queue of goroutines; an M must hold a P to run G's.
```

The runtime maps many goroutines (G) onto OS threads (M) via logical processors (P). By default there is one P per logical CPU (the `GOMAXPROCS` setting), and each P has its own queue of goroutines that are ready to run.
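The three counts can be observed from inside a program, at least approximately. The sketch below is illustrative only: `schedSnapshot` and `snapshot` are hypothetical names, and the thread figure comes from the predefined `threadcreate` profile in `runtime/pprof`, which counts threads the runtime has created rather than threads currently running.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/pprof"
)

// schedSnapshot is a hypothetical helper type for this example, not a runtime API.
type schedSnapshot struct {
	Goroutines     int // live Gs
	ThreadsCreated int // Ms created so far, as reported by the threadcreate profile
	Procs          int // Ps, i.e. the current GOMAXPROCS value
}

// snapshot reads the three counts using only public runtime APIs.
func snapshot() schedSnapshot {
	return schedSnapshot{
		Goroutines:     runtime.NumGoroutine(),
		ThreadsCreated: pprof.Lookup("threadcreate").Count(),
		Procs:          runtime.GOMAXPROCS(0), // an argument below 1 only queries
	}
}

func main() {
	s := snapshot()
	fmt.Printf("G=%d  M(created)=%d  P=%d\n", s.Goroutines, s.ThreadsCreated, s.Procs)
}
```

The exact numbers depend on the machine and on what the program has done so far, so no expected output is shown.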

## How it achieves cheap, scalable concurrency

```text
✓ Creating a goroutine is cheap (tiny stack, runtime-managed) → millions feasible
✓ Switching goroutines happens in USER space (runtime), not via expensive OS
  context switches → far lighter than switching OS threads
✓ With P = CPU cores, Go runs goroutines truly in parallel across cores
```

Because the scheduler handles goroutine switching in user space (not kernel-level thread switches), and goroutines have tiny stacks, you can run vastly more goroutines than OS threads, cheaply.
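To make the "far more goroutines than threads" point concrete, here is a minimal sketch that keeps a large number of goroutines alive at the same time and reports how many the runtime sees. The function name `peakGoroutines` and the figure of 100,000 are choices made for this example.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// peakGoroutines starts n goroutines, holds them all parked on a channel,
// records how many goroutines are alive at that moment, then releases them.
func peakGoroutines(n int) int {
	release := make(chan struct{})
	var ready, done sync.WaitGroup
	ready.Add(n)
	done.Add(n)

	for i := 0; i < n; i++ {
		go func() {
			defer done.Done()
			ready.Done() // signal that this goroutine has started
			<-release    // park here until the channel is closed
		}()
	}

	ready.Wait() // every goroutine has started and none can exit yet
	peak := runtime.NumGoroutine()

	close(release) // wake all parked goroutines
	done.Wait()
	return peak
}

func main() {
	fmt.Println("goroutines alive at peak:", peakGoroutines(100_000))
}
```

The reported peak is at least `n` because none of the goroutines can finish before `release` is closed; the calling goroutine is counted as well.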

## Key scheduler behaviors

```text
✓ Work stealing — an idle P STEALS goroutines from a busy P's queue → load balancing
✓ Cooperative + preemptive — goroutines yield at function calls/channel ops/blocking;
  since Go 1.14, the scheduler can also PREEMPT goroutines asynchronously, even in
  tight loops that make no function calls (one goroutine can no longer monopolize a P)
✓ Blocking handled smartly:
   - A goroutine blocked on a CHANNEL/sync → parked, the M runs other goroutines (no thread wasted)
   - A goroutine blocked on a SYSCALL (e.g. file I/O) → the M blocks, but the P is
     handed to ANOTHER M so other goroutines keep running
```

**Work stealing** keeps cores busy, **preemption** prevents one goroutine from hogging a core, and crucially, when a goroutine blocks (on a channel or syscall), the scheduler keeps other goroutines running instead of wasting the thread.
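The channel case can be sketched with a single P. In the example below (`pingPong` is a hypothetical name), two goroutines exchange a counter over unbuffered channels while `GOMAXPROCS` is temporarily set to 1. Every send or receive that cannot proceed parks the calling goroutine, which frees the only P for the other one, so the exchange completes even though the two never run in parallel.

```go
package main

import (
	"fmt"
	"runtime"
)

// pingPong bounces a counter between two goroutines for the given number
// of rounds while only one P is available, and returns the final value.
func pingPong(rounds int) int {
	prev := runtime.GOMAXPROCS(1) // one P: at most one goroutine runs at a time
	defer runtime.GOMAXPROCS(prev)

	ping, pong := make(chan int), make(chan int)

	go func() {
		for v := range ping { // parks here whenever ping is empty
			pong <- v + 1 // parks here until the caller receives
		}
		close(pong)
	}()

	v := 0
	for i := 0; i < rounds; i++ {
		ping <- v  // parks the caller until the helper receives
		v = <-pong // parks the caller until the helper replies
	}
	close(ping)
	<-pong // wait for the helper to finish; pong is closed at this point
	return v
}

func main() {
	fmt.Println(pingPong(1000)) // prints 1000
}
```

This only illustrates parking on channels. The syscall hand-off and work stealing happen inside the runtime and are not directly visible from user code.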

## GOMAXPROCS

```go
runtime.GOMAXPROCS(4)    // at most 4 OS threads execute Go code at once; returns the previous setting
// default = number of logical CPUs; rarely needs changing
```
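If you do need to change the setting, for example in a benchmark, a common pattern is to restore the previous value afterwards. The sketch below relies on two documented behaviors of `runtime.GOMAXPROCS`: it returns the previous setting, and an argument below 1 queries without changing anything. The helper names `currentProcs` and `withMaxProcs` are made up for this example.

```go
package main

import (
	"fmt"
	"runtime"
)

// currentProcs reports the current GOMAXPROCS value without changing it.
func currentProcs() int {
	return runtime.GOMAXPROCS(0)
}

// withMaxProcs runs fn with GOMAXPROCS set to n and restores the
// previous value afterwards, even if fn panics.
func withMaxProcs(n int, fn func()) {
	prev := runtime.GOMAXPROCS(n)
	defer runtime.GOMAXPROCS(prev)
	fn()
}

func main() {
	fmt.Println("before:", currentProcs())
	withMaxProcs(2, func() {
		fmt.Println("inside:", currentProcs())
	})
	fmt.Println("after: ", currentProcs())
}
```

The "before" and "after" values match each other and depend on the machine; "inside" is 2.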

## Why goroutines aren't free, though

```text
Goroutines are cheap but not zero-cost: at roughly 2KB of initial stack each, a million
goroutines need about 2GB before they do any work, and very large numbers of runnable
goroutines add scheduling overhead. Bound concurrency (worker pools) for heavy loads.
```
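One way to bound concurrency is a fixed-size worker pool fed by a channel. The following is a minimal sketch, not a general-purpose pool: `squareAll` is a hypothetical function, and squaring integers stands in for whatever real work each job would do.

```go
package main

import (
	"fmt"
	"sync"
)

// squareAll squares every input using at most `workers` goroutines,
// no matter how many inputs there are.
func squareAll(inputs []int, workers int) []int {
	if workers < 1 {
		workers = 1 // avoid blocking forever with no receivers
	}
	results := make([]int, len(inputs))
	jobs := make(chan int) // carries indexes into inputs

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				// Each index is delivered to exactly one worker,
				// so these writes never overlap.
				results[i] = inputs[i] * inputs[i]
			}
		}()
	}

	for i := range inputs {
		jobs <- i // blocks while all workers are busy
	}
	close(jobs) // lets the workers' range loops end
	wg.Wait()
	return results
}

func main() {
	fmt.Println(squareAll([]int{1, 2, 3, 4, 5}, 2)) // prints [1 4 9 16 25]
}
```

Because the `jobs` channel is unbuffered, the producer blocks whenever every worker is busy, so the number of in-flight jobs never exceeds the number of workers.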

## Why it matters

The scheduler is what makes goroutines cheap and scalable. Its **M:N model** (many goroutines on few OS threads, via the G-M-P design) explains *why* a Go program can run millions of goroutines where a thread-per-task design typically becomes impractical at thousands of OS threads, and *why* its concurrency is efficient: switching happens in user space rather than through OS context switches, and up to `GOMAXPROCS` goroutines run in parallel across cores.

The smart handling of blocking (parking channel-blocked goroutines, handing off P on syscalls so threads aren't wasted), **work stealing** for load balancing, and **preemption** (since 1.14) to prevent starvation are what make Go's concurrency robust under real workloads.

While you rarely interact with the scheduler directly, understanding it deepens your intuition for goroutine performance, explains behaviors (why blocking doesn't waste threads, why you can spawn so many goroutines), and informs decisions like bounding concurrency.

It's an advanced topic that demonstrates genuine depth of Go runtime knowledge and is valued in senior interviews.