Prerequisites
- CMake 3.14 or later
- A C11 / C++17 compiler: GCC, Clang, or MSVC
- Git
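
A quick way to check the prerequisites above is a small shell script. This is a minimal sketch, not part of ggml: the helper name `version_ge` is hypothetical, and it assumes a `sort` that supports `-V` (GNU coreutils and recent BSD/macOS versions do). It only checks that a compiler driver is present, not that it supports C11 / C++17.

```shell
# Return success when dotted version $1 is at least dotted version $2.
# Assumption: `sort -V` (version sort) is available.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Report each required tool without aborting, so all gaps show up in one run.
for tool in cmake git; do
    if command -v "$tool" >/dev/null 2>&1; then
        echo "found:   $tool"
    else
        echo "missing: $tool"
    fi
done

# Any one of these compiler drivers satisfies the compiler requirement.
compiler=""
for cc in gcc clang cl; do
    if command -v "$cc" >/dev/null 2>&1; then
        compiler=$cc
        break
    fi
done
echo "compiler: ${compiler:-none found}"

# Compare the installed CMake version against the 3.14 minimum.
if command -v cmake >/dev/null 2>&1; then
    cmake_version=$(cmake --version | head -n 1 | awk '{print $3}')
    if version_ge "$cmake_version" 3.14; then
        echo "cmake $cmake_version is new enough"
    else
        echo "cmake $cmake_version is older than 3.14"
    fi
fi
```

The script only prints its findings, so it is safe to run on any machine before attempting a build.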
Basic build
The build produces the `ggml` library and all examples in `build/bin/`. The
default build targets the CPU backend with native ISA optimizations enabled.
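
A default build might look like the sketch below. It assumes the current directory is the root of a ggml checkout and that the build directory is named `build`, matching the `build/bin/` output path mentioned above. The `cmake -B` and `cmake --build` forms are standard CMake; the guard around them is only there so the script does nothing when run from the wrong directory.

```shell
# Assumption: the current directory is the root of a ggml checkout.
BUILD_DIR=build

if [ -f CMakeLists.txt ]; then
    # Configure: generate the build system in ./build with default options.
    cmake -B "$BUILD_DIR"
    # Build: --config only matters for multi-config generators such as Visual Studio.
    cmake --build "$BUILD_DIR" --config Release
    # List what was produced.
    ls "$BUILD_DIR/bin"
else
    echo "No CMakeLists.txt here; run this from the ggml source root." >&2
fi
```

Keeping the build tree in a separate directory means it can be deleted and regenerated without touching the sources.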
Backend builds
- CUDA
- Metal
- Vulkan
- HIP (AMD)
The CUDA backend requires the NVIDIA CUDA Toolkit (tested with CUDA 11.x and 12.x) and a
compatible NVIDIA GPU. Ensure `nvcc` is on your `PATH`.

| Flag | Default | Description |
|---|---|---|
| `GGML_CUDA_FORCE_MMQ` | OFF | Use MMQ kernels instead of cuBLAS |
| `GGML_CUDA_FORCE_CUBLAS` | OFF | Always use cuBLAS instead of MMQ kernels |
| `GGML_CUDA_FA` | ON | Compile FlashAttention CUDA kernels |
| `GGML_CUDA_GRAPHS` | OFF | Enable CUDA graph capture (llama.cpp) |
| `GGML_CUDA_NO_VMM` | OFF | Disable CUDA virtual memory management |
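
As an illustration of how these flags are passed, the sketch below checks for `nvcc` and prints a configure command with two flags from the table changed from their defaults. Two things here are assumptions: `GGML_CUDA=ON` as the switch that enables the CUDA backend (it is not listed in the table above), and the build directory name `build-cuda`. The command is printed rather than executed so it can be reviewed first.

```shell
# Assumption: GGML_CUDA is the switch that enables the CUDA backend; it is not
# listed in the table above, so verify it against the project's CMake options.
CUDA_OPTS="-DGGML_CUDA=ON -DGGML_CUDA_GRAPHS=ON -DGGML_CUDA_NO_VMM=ON"

# The CUDA toolchain must be discoverable before configuring.
if command -v nvcc >/dev/null 2>&1; then
    echo "nvcc found: $(command -v nvcc)"
else
    echo "nvcc not found on PATH; install the CUDA Toolkit or fix PATH first" >&2
fi

# Print the configure command for review; run it from the ggml source root.
echo "cmake -B build-cuda $CUDA_OPTS"
```

`GGML_CUDA_FORCE_MMQ` and `GGML_CUDA_FORCE_CUBLAS` select opposite kernel paths, so it would make little sense to enable both in one configuration.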
General CMake options
These options apply to all build configurations.

| Flag | Default | Description |
|---|---|---|
| `GGML_STATIC` | OFF | Static link libraries |
| `GGML_NATIVE` | ON | Optimize for the host CPU (enables AVX2, etc.) |
| `GGML_LTO` | OFF | Enable link-time optimization |
| `GGML_CCACHE` | ON | Use ccache if available |
| `GGML_BACKEND_DL` | OFF | Build backends as dynamic libraries |
| `BUILD_SHARED_LIBS` | ON | Build shared instead of static libraries |
| `GGML_OPENMP` | ON | Use OpenMP for CPU multi-threading |
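
For example, a statically linked build with link-time optimization could be configured as sketched below. The flags come from the table above; the build directory name `build-static` is an arbitrary choice for this example. The second half relies on standard CMake behavior: after configuring, the effective option values are recorded in `CMakeCache.txt`, which makes it easy to confirm what was actually set.

```shell
# Flags taken from the table above; the build directory name is arbitrary.
STATIC_OPTS="-DBUILD_SHARED_LIBS=OFF -DGGML_STATIC=ON -DGGML_LTO=ON"

# Print the configure command for review; run it from the ggml source root.
echo "cmake -B build-static $STATIC_OPTS"

# After configuring, CMake records the effective values in CMakeCache.txt.
CACHE=build-static/CMakeCache.txt
if [ -f "$CACHE" ]; then
    grep -E '^(GGML_[A-Z0-9_]+|BUILD_SHARED_LIBS):' "$CACHE"
else
    echo "$CACHE not found; configure first to inspect cached option values"
fi
```

Using a separate build directory per configuration lets a static build and the default shared build coexist side by side.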
CPU instruction set options
When `GGML_NATIVE=ON` (the default), the compiler detects and enables all
supported ISA extensions automatically. Set individual flags only when
cross-compiling or targeting a specific baseline.

| Flag | Description |
|---|---|
| `GGML_AVX` | Enable AVX |
| `GGML_AVX2` | Enable AVX2 |
| `GGML_AVX512` | Enable AVX-512F |
| `GGML_FMA` | Enable FMA |
| `GGML_F16C` | Enable F16C |
| `GGML_SSE42` | Enable SSE 4.2 |
| `GGML_BMI2` | Enable BMI2 |
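
To target a fixed baseline instead of the build machine, one approach is to turn `GGML_NATIVE` off and enable the wanted extensions explicitly. The sketch below assembles such a command for an AVX2-class baseline without AVX-512; the choice of extensions and the build directory name `build-avx2` are assumptions for illustration, so adjust the list to the oldest CPU the binaries must run on.

```shell
# Extensions to enable explicitly; names match the GGML_<ISA> flags in the table.
ISA_LIST="SSE42 AVX AVX2 FMA F16C BMI2"

# Start by disabling host detection, then append one -D flag per extension.
ISA_OPTS="-DGGML_NATIVE=OFF"
for isa in $ISA_LIST; do
    ISA_OPTS="$ISA_OPTS -DGGML_${isa}=ON"
done

# Print the configure command for review; run it from the ggml source root.
echo "cmake -B build-avx2 $ISA_OPTS"
```

Binaries built this way will fail with an illegal-instruction error on CPUs that lack any of the enabled extensions, so a shorter list is the safer choice for distribution.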
