The Metal backend runs tensor operations on Apple GPUs using Metal compute shaders. It is the recommended backend for macOS and provides the best performance on Apple Silicon (M-series) chips as well as AMD GPUs on Intel Macs.

Requirements

  • macOS 13.0 (Ventura) or later
  • Apple Silicon (M1/M2/M3/M4) or AMD GPU on an Intel Mac
  • Xcode Command Line Tools

Build

cmake -B build -DGGML_METAL=ON
cmake --build build
On macOS, Metal is detected automatically, and GGML_METAL=ON may already be the default in upstream build configurations, in which case passing -DGGML_METAL=ON is redundant. Check the GGML_METAL entry in your CMakeCache.txt to confirm.
Useful CMake options:
| Option | Default | Description |
| --- | --- | --- |
| GGML_METAL=ON | OFF (non-Apple) | Enable the Metal backend |
| GGML_METAL_EMBED_LIBRARY=ON | OFF | Embed the Metal shader library into the binary |
| GGML_METAL_SHADER_DEBUG=ON | OFF | Compile shaders with debug info |

Initialization

#include <stdio.h>

#include "ggml-metal.h"

ggml_backend_t backend = ggml_backend_metal_init();
if (!backend) {
    fprintf(stderr, "failed to initialize Metal backend\n");
    return 1;
}
Alternatively, use the generic backend selector:
ggml_backend_load_all();
ggml_backend_t backend = ggml_backend_init_best();
// On macOS with a supported GPU, this returns the Metal backend

Checking GPU family support

Metal devices are organised into feature families. You can query whether the device supports a specific family before using features that depend on it:
// Check for the Apple7 family (A14/M1 and newer)
if (ggml_backend_metal_supports_family(backend, 7)) {
    // Family-dependent features are available here, e.g. SIMD-group matrix operations
}
Refer to Apple’s Metal Feature Set Tables for the capabilities of each family.
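When several code paths depend on different minimum families, it can be convenient to determine the highest supported family once and cache it. The following is a minimal, self-contained sketch of that idea, not part of the ggml API: family_probe_fn, highest_supported_family, and fake_probe are hypothetical names, and the probe is a stand-in that real code would implement by calling ggml_backend_metal_supports_family(backend, family). It assumes family numbers start at 1.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical probe type: returns true if `family` is supported.
 * In real code this would wrap ggml_backend_metal_supports_family(). */
typedef bool (*family_probe_fn)(int family, void * ctx);

/* Probe from the highest family down and return the first one supported,
 * or 0 if none of 1..max_family is reported as supported. */
static int highest_supported_family(family_probe_fn probe, void * ctx, int max_family) {
    for (int family = max_family; family >= 1; --family) {
        if (probe(family, ctx)) {
            return family;
        }
    }
    return 0;
}

/* Stand-in probe for illustration only: pretends the device tops out at
 * the family number stored in *ctx. */
static bool fake_probe(int family, void * ctx) {
    const int device_max = *(const int *) ctx;
    return family <= device_max;
}
```

Probing downward from the top avoids assuming anything about how families nest; the helper simply reports the first family for which the probe answers yes.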

Abort callback

Register a callback to cancel a Metal compute pass early:
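// Flag set elsewhere in the application to request cancellation
static bool should_cancel = false;

bool my_abort(void * user_data) {
    (void) user_data; // unused in this example
    return should_cancel;
}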

ggml_backend_metal_set_abort_callback(backend, my_abort, NULL);
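The user_data pointer lets the callback carry state instead of relying on a global. As one possible use, the sketch below aborts once a wall-clock deadline has passed. The names struct deadline, deadline_abort, and deadline_in are hypothetical and exist only for this illustration; the callback itself has the same shape as my_abort above.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical user data: the wall-clock time after which work should stop. */
struct deadline {
    time_t expires_at;
};

/* Same shape as the callback above: bool (*)(void * user_data).
 * Returns true (abort) once the deadline has passed. */
static bool deadline_abort(void * user_data) {
    const struct deadline * d = (const struct deadline *) user_data;
    return time(NULL) >= d->expires_at;
}

/* Helper that builds a deadline `seconds` from now. */
static struct deadline deadline_in(int seconds) {
    struct deadline d = { time(NULL) + seconds };
    return d;
}
```

To use it, pass deadline_abort as the callback and a pointer to a struct deadline as the last argument of ggml_backend_metal_set_abort_callback. The struct must stay alive for as long as the callback remains registered.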

GPU capture

To capture a Metal compute pass with Xcode Instruments, call this before executing the graph you want to capture:
ggml_backend_metal_capture_next_compute(backend);
ggml_backend_graph_compute(backend, graph);
// The captured frame appears in the GPU timeline in Instruments
GPU captures require running the process from Xcode or with Metal capture enabled in the scheme. Use this during development to profile shader performance.

Using Metal with the scheduler

Metal works with ggml_backend_sched_t the same way any other backend does. Add a CPU backend after it in the backends array as a fallback for operations not yet implemented in Metal:
ggml_backend_t metal = ggml_backend_metal_init();
ggml_backend_t cpu   = ggml_backend_cpu_init();

ggml_backend_t backends[2] = { metal, cpu };
ggml_backend_sched_t sched = ggml_backend_sched_new(
    backends, NULL, 2, GGML_DEFAULT_GRAPH_SIZE, false, true
);
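The fallback idea can be pictured with a toy model: walk the backends in array order and give each operation to the first one that supports it. The sketch below is a deliberately simplified illustration, not the scheduler's actual algorithm, which takes more into account when assigning work. Every name in it (toy_backend, pick_backend, toy_gpu_supports, toy_cpu_supports) is hypothetical and unrelated to the ggml API.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a backend: a name plus a "supports op" predicate. */
struct toy_backend {
    const char * name;
    bool (*supports_op)(int op);
};

/* Return the index of the first backend (in array order) that supports
 * `op`, or -1 if none does. */
static int pick_backend(const struct toy_backend * backends, size_t n, int op) {
    for (size_t i = 0; i < n; ++i) {
        if (backends[i].supports_op(op)) {
            return (int) i;
        }
    }
    return -1;
}

/* Illustration only: the toy GPU handles even op ids, the toy CPU handles all. */
static bool toy_gpu_supports(int op) { return op % 2 == 0; }
static bool toy_cpu_supports(int op) { (void) op; return true; }
```

With the toy GPU first and the toy CPU second, an operation the GPU rejects falls through to the CPU, which is why the catch-all backend sits at the end of the array.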

API summary

| Function | Description |
| --- | --- |
| ggml_backend_metal_init() | Create a Metal backend instance |
| ggml_backend_is_metal(backend) | Check whether a backend is the Metal backend |
| ggml_backend_metal_supports_family(backend, family) | Query GPU feature family support |
| ggml_backend_metal_set_abort_callback(backend, cb, data) | Register an abort callback |
| ggml_backend_metal_capture_next_compute(backend) | Capture the next compute pass for Xcode profiling |
| ggml_backend_metal_reg() | Return the Metal backend registry entry |