ggml separates the description of a computation graph from its execution. A backend is a pluggable execution target: CPU cores, a CUDA device, an Apple Silicon GPU, or a remote machine. You write the graph-building routine once, and ggml dispatches the graph to whatever hardware is available.
ggml_backend_t is an opaque handle to an initialized backend instance. It holds an execution stream and is the primary object you pass to graph compute calls such as ggml_backend_graph_compute.
ggml_backend_buffer_t and ggml_backend_buffer_type_t
Buffers hold the raw memory for tensors. A buffer type (ggml_backend_buffer_type_t) is a descriptor that tells ggml where and how to allocate memory, for example in host RAM or in device memory. You obtain a buffer type from a backend and use it to allocate buffers.
Every registered backend exposes one or more ggml_backend_dev_t objects. You can enumerate all available devices at runtime:
#include <stdio.h>
#include "ggml-backend.h"

int main(void) {
    ggml_backend_load_all(); // load all available backend libraries; statically linked backends are registered automatically
    size_t count = ggml_backend_dev_count();
    for (size_t i = 0; i < count; i++) {
        ggml_backend_dev_t dev = ggml_backend_dev_get(i);
        struct ggml_backend_dev_props props;
        ggml_backend_dev_get_props(dev, &props);
        printf("%s: %s (%.1f GB free)\n", props.name, props.description, props.memory_free / 1e9);
    }
    return 0;
}
Device types are defined by ggml_backend_dev_type:
| Enum value | Meaning |
|---|---|
| GGML_BACKEND_DEVICE_TYPE_CPU | CPU using system memory |
| GGML_BACKEND_DEVICE_TYPE_GPU | Discrete GPU with dedicated memory |
| GGML_BACKEND_DEVICE_TYPE_IGPU | Integrated GPU using host memory |
| GGML_BACKEND_DEVICE_TYPE_ACCEL | Accelerator used alongside the CPU (e.g. BLAS, AMX) |
Convenience initializers select a backend without enumerating devices manually:
// Best available GPU, or CPU if no GPU is found
ggml_backend_t backend = ggml_backend_init_best();

// First device of a specific type
ggml_backend_t cpu = ggml_backend_init_by_type(GGML_BACKEND_DEVICE_TYPE_CPU, NULL);

// Backend by name (e.g. "CUDA0", "Metal")
ggml_backend_t named = ggml_backend_init_by_name("CUDA0", NULL);
The following example is drawn directly from examples/simple/simple-backend.cpp and shows the full lifecycle: backend selection, graph construction, scheduling, and result retrieval.