Go Meets PHP: Enhancing Your PHP Applications with Go via FFI

Use PHP’s FFI to supercharge your application by offloading compute-heavy work to Go!

As an interpreted language, PHP has inherent performance limitations, especially when it comes to CPU-bound tasks. Go, on the other hand, is a compiled language known for its speed and efficiency. By leveraging PHP’s Foreign Function Interface (FFI), we can call Go functions from PHP via a shared C layer and achieve significant performance improvements in the right scenarios.

Before We Start

There are a few caveats to keep in mind:

This approach only benefits CPU-bound tasks — it won’t help with I/O-bound operations like database queries or API calls.
FFI adds overhead. For simple tasks, PHP may still be faster despite Go’s raw speed.
We’re using Go’s C bindings, which add an extra layer. For the absolute best performance, writing in C directly is faster.
Cross-platform support can be tricky — you’ll need to compile your Go shared library separately for each target platform and architecture.
Memory management between PHP and Go requires care — you need to handle allocation and freeing of memory correctly on both sides.

That said, for the right use cases, this technique can be extremely powerful without the complexity of writing low-level C code.

Hello World!

No tutorial would be complete without a “Hello, World!” example — but let’s skip the static string and jump straight into a personalized greeting.

In Go, it’s as simple as:

package main

import "fmt"

func HelloWorld(name string) {
	fmt.Printf("Hello %s!\n", name)
}

Calling it is straightforward:

	HelloWorld("Dominik")

Which prints: Hello Dominik!

To make this callable from PHP, we’ll need to export it as a C function. Here's a basic binding:

package main

import "C"
import (
	"fmt"
)

//export HelloWorldC
func HelloWorldC(name *C.char) {
	result := C.GoString(name)
	fmt.Printf("Hello %s!\n", result)
}

However, mixing conversion and logic can get messy. A cleaner approach is to separate concerns:

package main

import "C"
import (
	"fmt"
)

//export HelloWorld
func HelloWorld(name *C.char) {
	HelloWorldGo(C.GoString(name))
}

func HelloWorldGo(name string) {
	fmt.Printf("Hello %s!\n", name)
}

func main() {}

Now we have a clear boundary: HelloWorld handles data conversion, and HelloWorldGo contains the business logic.

The //export comment is essential — without it, Go won’t export the function. You also need an empty main() function to satisfy the Go compiler when building shared libraries in the main package.

Build it with:

go build -buildmode=c-shared -o hello.so hello.go

This generates two files: hello.so and hello.h, both of which we’ll need on the PHP side.

Wiring It Up in PHP

Create an FFI instance in PHP:

<?php

$ffi = FFI::cdef(
    file_get_contents(__DIR__ . '/hello.h'),
    __DIR__ . '/hello.so',
);

However, PHP uses a non-standard C header parser, so we’ll need to trim hello.h to just this:

extern void HelloWorld(char* name);

Once that’s done, you can call it directly:

$ffi->HelloWorld("Dominik");

Which outputs: Hello Dominik!

The FFI Overhead

Before we dive deeper, let’s compare the performance of this FFI approach against a native PHP function. For simple functions like this, the FFI overhead is significant, and using Go wouldn’t make much sense.

Running the following code, we compare the performance of calling the Go function via FFI a thousand times versus calling a native PHP function:

<?php

$ffi = FFI::cdef(
    file_get_contents(__DIR__ . '/hello.h'),
    __DIR__ . '/hello.so',
);

function HelloWorld(string $name): void
{
    echo "Hello {$name}!", PHP_EOL;
}

$start = microtime(true);
for ($i = 0; $i < 1000; $i++) {
    $ffi->HelloWorld("Dominik");
}
$end = microtime(true);

$timeGo = $end - $start;

$start = microtime(true);
for ($i = 0; $i < 1000; $i++) {
    HelloWorld("Dominik");
}
$end = microtime(true);
$timePhp =  $end - $start;

echo "Go version took {$timeGo} seconds.", PHP_EOL;
echo "PHP version took {$timePhp} seconds.", PHP_EOL;

The results:

Go version took 0.51009082794189 seconds.
PHP version took 0.0016758441925049 seconds.

As you can see, the Go version is much slower here — over 300 times slower than native PHP. That’s not because Go is slow, but because FFI incurs a high cost per call. Each of those 1,000 calls crosses the PHP–C–Go boundary.

Now let’s move the loop inside Go to reduce the number of boundary crossings. Here’s the updated Go function:

func HelloWorldGo(name string) {
	for range 1000 {
		fmt.Printf("Hello, %s!\n", name)
	}
}

And an equivalent PHP function for fairness:

function HelloWorld(string $name): void
{
    for ($i = 0; $i < 1000; $i++) {
        echo "Hello {$name}!", PHP_EOL;
    }
}

The results now look very different:

Go version took 0.0031590461730957 seconds.
PHP version took 0.012860059738159 seconds.

This time, the Go version is clearly faster. Why? Because we’ve reduced the number of PHP–FFI–Go context switches from 1,000 down to just 1. This highlights the most important performance tip when using FFI: minimize the number of boundary crossings. Let Go do as much as possible once you’re there.

Fibonacci

Now that we’ve seen how performance improves with fewer context switches, let’s try something that’s inherently CPU-bound: calculating the nth number in the Fibonacci sequence. We’ll stick with a naive recursive implementation to keep things simple (and CPU-intensive).

Here’s the Go version:

//export Fibonacci
func Fibonacci(n C.int) C.int {
	return C.int(fibonacciGo(int(n)))
}

func fibonacciGo(n int) int {
	if n <= 1 {
		return n
	}
	return fibonacciGo(n-1) + fibonacciGo(n-2)
}

And here’s the equivalent PHP version:

function fibonacci(int $n): int
{
    if ($n <= 1) {
        return $n;
    }

    return fibonacci($n - 1) + fibonacci($n - 2);
}

To benchmark both implementations:

<?php

$ffi = FFI::cdef(
    file_get_contents(__DIR__ . '/hello.h'),
    __DIR__ . '/hello.so',
);

function fibonacci(int $n): int
{
    if ($n <= 1) {
        return $n;
    }

    return fibonacci($n - 1) + fibonacci($n - 2);
}

$start = microtime(true);
$result = $ffi->Fibonacci(35);
$end = microtime(true);
$time = $end - $start;

echo "Go result: {$result}. It took {$time} seconds to compute.", PHP_EOL;

$start = microtime(true);
$result = fibonacci(35);
$end = microtime(true);
$time = $end - $start;

echo "PHP result: {$result}. It took {$time} seconds to compute.", PHP_EOL;

The output:

Go result: 9227465. It took 0.041604042053223 seconds to compute.
PHP result: 9227465. It took 3.975930929184 seconds to compute.

Same result, but Go is almost 100 times faster. And the difference gets even more dramatic with larger inputs. Here’s what happens with fibonacci(40):

Go result: 102334155. It took 0.39231300354004 seconds to compute.
PHP result: 102334155. It took 44.720011949539 seconds to compute.

That’s nearly 45 seconds for PHP versus less than half a second for Go. It’s a striking example of why you’d want to offload compute-heavy tasks to Go via FFI.

Where It Makes Sense

Some potential real-world use cases:

Sorting large in-memory datasets
Matrix operations and other complex math
Cryptographic algorithms not natively supported by PHP (e.g., BLAKE3)
Custom sorters (e.g., geo distance, radix sort)
Compression formats unsupported by PHP extensions
Working with XLS files (via Go libraries)
Concurrent workloads

Concurrent Work

Let’s now explore one of Go’s major strengths: concurrency. As an example, imagine a user uploads multiple images and your application needs to generate thumbnails for them. We’ll simulate the image processing step using time.Sleep to represent a long-running operation.

Here’s a simplified image processing function in Go:

func ResizeImage(path string) error {
	time.Sleep(300 * time.Millisecond)

	if rand.Int()%2 == 0 {
		return errors.New("test")
	}

	return nil
}

In Go, returning an error is a common idiom. Returning nil (similar to null in other languages) indicates success.

Now let’s look at the function we’ll be calling from PHP:

func ResizeImagesGo(paths []string) []string {
	var waitGroup sync.WaitGroup // create a wait group - once it's empty, everything has been processed
	var mutex sync.Mutex         // a mutex to safely write into the failed slice below
	failed := make([]string, 0)  // create a slice that can contain strings and has initial length of zero

	for _, path := range paths { // iterate over all paths
		path := path     // this recreates the path variable inside the current scope to avoid race conditions
		waitGroup.Add(1) // add one to the wait group
		go func() {      // run this in a goroutine (similar to threads in other languages)
			defer waitGroup.Done() // after this function finishes, waitGroup.Done() will be called
			err := ResizeImage(path)
			if err != nil { // if we have an error
				mutex.Lock()                  // lock the mutex to make sure only one goroutine is writing to the failed slice
				failed = append(failed, path) // add a new path to the list of failed paths
				mutex.Unlock()                // unlock the mutex so that any other goroutine can lock it again
			}
		}()
	}

	waitGroup.Wait() // wait until all wait groups are done

	return failed
}

I’ve commented the code heavily, but here’s the high-level flow:

Accept a list of image paths
Process each image in its own goroutine (like a lightweight thread)
Safely track which images failed using a mutex
Wait for all images to finish processing
Return the list of failed paths

Now comes the only messy part — the C binding. Unfortunately, that’s just how FFI works at this level:

//export ResizeImages
func ResizeImages(input **C.char, count C.int, failedOut ***C.char, failedCount *C.int) {
	// because this is a C binding and C doesn't have any nice structures built-in,
	// we have to pass the data as a char[] pointer and provide the count of items as
	// a second parameter

	// to avoid having to create a custom struct, we return the data by having them passed as references
	// the triple asterisk means it's a pointer to char array, the single asterisk means it's a pointer to
	// an integer

	paths := unsafe.Slice(input, int(count)) // we have to make a slice out of the input
	goPaths := make([]string, count)         // create a new Go slice with the correct length
	for i, path := range paths {
		goPaths[i] = C.GoString(path) // convert the C-strings to Go-strings
	}

	failed := ResizeImagesGo(goPaths) // call the Go function and assign the result

	// the parts below are some C-level shenanigans, basically you need to allocate (C.malloc) enough memory
	// to hold the amount of pointers that will be assigned, which is the length of the failed slice
	failedAmount := len(failed)
	ptrSize := unsafe.Sizeof(uintptr(0))
	cArray := C.malloc(C.size_t(failedAmount) * C.size_t(ptrSize))
	cStrs := unsafe.Slice((**C.char)(cArray), failedAmount)

	for i, str := range failed { // iterate over the failed paths
		cStrs[i] = C.CString(str) // and assign it to the C array
	}

	*failedOut = (**C.char)(cArray)    // assign the array to the reference input parameter
	*failedCount = C.int(failedAmount) // assign the count of failed items to the reference input parameter
}

Yes, it’s a bit messy — but that’s standard practice when working with low-level bindings in Go or C. The important part is that we’ve isolated the complexity into this layer. Imagine writing the actual business logic in C — suddenly Go feels a lot more pleasant.

Now, after rebuilding the library, you’ll need to update hello.h to include:

extern void ResizeImages(char** input, int count, char*** failedOut, int* failedCount);

PHP Integration

Let’s now call this function from PHP. Here’s the full example:

<?php

$ffi = FFI::cdef(
    file_get_contents(__DIR__ . '/hello.h'),
    __DIR__ . '/hello.so',
);

$imagePaths = [
    "pathA",
    "pathB",
    "pathC",
    "pathD",
];
$imagesCount = count($imagePaths);

$cArray = FFI::new("char*[" . count($imagePaths) . "]"); // create a new array with fixed size
$buffers = []; // this will just hold variables to prevent PHP's garbage collection

foreach ($imagePaths as $i => $path) {
    $size = strlen($path); // the size to allocate in bytes
    $buffer = FFI::new("char[" . ($size + 1) . "]"); // create a new C string of length +1 to add space for null terminator
    FFI::memcpy($buffer, $path, $size); // copy the content of $path to memory at $buffer with size $size
    $cArray[$i] = FFI::cast("char*", $buffer); // cast it to a C char*, aka a string
    $buffers[] = $buffer; // assigning it to the $buffers array ensures it doesn't go out of scope and PHP cannot garbage collect it
}

$failedOut = FFI::new("char**"); // create a string array in C, this will be passed as reference
$failedCount = FFI::new("int"); // create an integer which will be passed as reference

$start = microtime(true);
$ffi->ResizeImages(
    $cArray,
    count($imagePaths),
    FFI::addr($failedOut),
    FFI::addr($failedCount),
);
$end = microtime(true);
$time = $end - $start;

$count = $failedCount->cdata; // fetch the count of failed items

echo "Failed items: {$count}", PHP_EOL;
for ($i = 0; $i < $count; $i++) {
    echo " - ", FFI::string($failedOut[$i]), PHP_EOL; // cast each item to a php string and print it
}
echo "Processing took: {$time} seconds", PHP_EOL;

Depending on randomness, you’ll see output similar to:

Failed items: 4
 - pathA
 - pathC
 - pathD
 - pathB
Processing took: 0.30362796783447 seconds

Two things to notice:

The failed items are out of order — a clear sign the operations ran in parallel. Each image was processed in its own goroutine and reported failure as soon as it was done.
Total time is around 300 ms — the time it takes to process a single image, despite processing four at once. This shows we achieved true concurrency.

Memory Management

The previous example contains a memory leak — something you typically don’t have to worry about in PHP or Go, since both languages have garbage collectors. But once you introduce C into the mix, you’re responsible for manually managing memory.

Whether this matters depends on how you run your PHP code. If you use the traditional execute-and-die model (e.g. a web server spawns a PHP process that dies at the end of each request), then memory leaks are mostly harmless — the operating system will reclaim all memory when the process exits.

However, if you're using modern alternatives like RoadRunner, Swoole, AMPHP, ReactPHP, or any long-running PHP worker (Symfony Messenger), memory leaks will accumulate across requests and eventually exhaust system memory.

The rule of thumb is simple: if your C glue code allocates memory, you must free it once it’s no longer needed. In our case, both the outer array and the individual strings are allocated in Go:

// C.malloc is a direct allocation of memory
cArray := C.malloc(C.size_t(failedAmount) * C.size_t(ptrSize))

for i, str := range failed {
    // C.CString uses malloc in the background
	cStrs[i] = C.CString(str) 
}

To free this memory in PHP, you can use FFI::free() directly:

echo "Failed items: {$count}", PHP_EOL;
for ($i = 0; $i < $count; $i++) {
    echo " - ", FFI::string($failedOut[$i]), PHP_EOL;
    FFI::free($failedOut[$i]); // free each string after use
}
FFI::free($failedOut); // finally free the array itself

Again, if you're only using short-lived PHP processes, this isn't a concern. But in long-running environments, proper memory management is essential to avoid leaks and unpredictable crashes.

Conclusion

The C-level glue code can be verbose and awkward, but once it’s in place, combining Go and PHP can unlock performance that’s hard to beat — all while keeping most of your code in modern, high-level languages.

What do you think? Would you consider using Go alongside PHP for performance-critical workloads?

Tip: To comment on or like this post, open it on your favourite Fediverse instance (such as Mastodon or Lemmy).