How PIC extraction works

A plain-language tour of what picblobs does under the hood — and why the bytes it spits out can run anywhere.

The 30-second version

You write ordinary C code. We compile it with some special flags so the resulting machine code doesn't care where it ends up in memory. Then we crack open the compiled file, scoop out just the executable part, and hand you back the raw bytes.

Figure: From C source to portable machine code, showing what picblobs does end to end.

Those bytes are portable shellcode: load them at any address, in any process running on the OS and CPU architecture they were built for, and they execute correctly.

Why "position-independent" is the whole point

Traditionally, programs are compiled on the assumption that they will be loaded at a fixed, known address. The compiler and linker write literal addresses into the machine code: "jump to address 0x401050", "read from 0x404020". If you copy that code somewhere else, all those numbers point at the wrong things and it usually crashes immediately.

For shellcode — code you intend to inject into a process whose memory layout you don't control — that's a non-starter. You need position-independent code (PIC): machine code that uses relative references instead of absolute ones. Every jump and every data access is expressed as an offset from the current instruction, so the code keeps working no matter where it lands.

Figure: Same bytes, three different addresses, three working executions. That's the whole point of position-independent code.
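
The arithmetic behind relative addressing can be sketched in a few lines of Python. The example models a single x86-64 `jmp rel32` instruction (opcode `0xE9` followed by a signed 32-bit little-endian displacement, measured from the address of the next instruction). The load addresses and blob offsets are made-up illustration values, and `rel32_target` is a hypothetical helper, not part of picblobs.

```python
import struct


def rel32_target(load_base: int, insn_offset: int, insn: bytes) -> int:
    """Resolve the absolute target of an x86-64 `jmp rel32` (opcode 0xE9).

    The displacement is relative to the address of the *next* instruction,
    so the target moves together with the code.
    """
    assert insn[0] == 0xE9 and len(insn) == 5
    (disp,) = struct.unpack("<i", insn[1:5])
    next_insn = load_base + insn_offset + len(insn)
    return next_insn + disp


# Hypothetical blob layout: a jump at offset 0x10 that lands on offset 0x40.
JMP = bytes([0xE9]) + struct.pack("<i", 0x40 - (0x10 + 5))

for base in (0x400000, 0x7F12_3456_0000, 0x1000):
    target = rel32_target(base, 0x10, JMP)
    # The absolute target differs per base, but the blob offset never does.
    print(hex(base), "->", hex(target), "blob offset", hex(target - base))
```

The instruction bytes are identical in all three runs. Only the base changes, and the jump still lands on the same offset inside the blob. An absolute `0x401050` baked into the code would stay `0x401050` wherever the bytes were copied.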

GCC and Clang can both produce PIC, but the resulting .so file is still wrapped in a lot of housekeeping — ELF headers, symbol tables, relocation entries, debug info. To actually use the code, you need just the code. That's what extraction is for.

What the compiler hands us

Each blob is compiled with these flags:

Flag	What it means
`-ffreestanding`	"Don't assume a C standard library exists."
`-nostdlib`	"Don't link libc. Don't add the C runtime."
`-fPIC`	"Generate position-independent code."
`-Os`	"Optimise for size."

The result is an ELF shared object (.so) — but a strange one. There is no main(), no printf(), no dynamic linker glue. The code talks to the kernel directly through syscalls. A custom linker script plants two symbols, __blob_start and __blob_end, around the bytes we actually want to keep.

Figure: An ELF .so packs the code we want between a lot of metadata we don't. The linker script plants __blob_start and __blob_end markers so picblobs can find the right slice.

Everything outside those markers is metadata the ELF format demands but that we'll throw away.
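
As a rough illustration, the flags from the table could be combined into a compiler command like the one assembled below. This is a sketch under assumptions, not the actual picblobs build: the file names `hello.c`, `hello.so` and `blob.ld` are hypothetical, and using `-shared` and `-T` to produce the .so and pass the linker script is a guess at how such a build might be wired up.

```python
import shlex

# Flags taken from the table above.
PIC_FLAGS = ["-ffreestanding", "-nostdlib", "-fPIC", "-Os"]


def build_command(source: str, output: str, linker_script: str,
                  cc: str = "gcc") -> list[str]:
    """Assemble a compiler invocation for one blob (illustrative only)."""
    return [
        cc,
        *PIC_FLAGS,
        "-shared",             # assumption: emit an ELF shared object (.so)
        "-T", linker_script,   # assumption: how the custom linker script is passed
        "-o", output,
        source,
    ]


# Hypothetical file names; nothing is executed, the command is only printed.
cmd = build_command("hello.c", "hello.so", "blob.ld")
print(shlex.join(cmd))
```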

The extraction step

picblobs opens the .so with pyelftools, a pure-Python library for parsing ELF files. It walks the symbol table, finds the __blob_start and __blob_end markers, then iterates over every section that falls inside that window:

For sections that contain real bytes (.text, .rodata, .data), it copies the bytes straight out.
For sections that are implied zeros (.bss — uninitialised globals, not stored in the file), it fills the gap with the right number of zero bytes.
It records the offset of __config_start so callers can patch in runtime parameters at the right spot.
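
The steps above can be sketched roughly as follows. This is not the picblobs source: `Section`, `flatten` and `extract_blob` are hypothetical names, and the sketch assumes the marker symbols live in `.symtab` and that only allocated sections matter. The layout logic is kept separate from the pyelftools calls so it can be tried without an ELF file at hand.

```python
from typing import Iterable, NamedTuple, Optional


class Section(NamedTuple):
    addr: int              # virtual address (sh_addr)
    size: int              # size in memory (sh_size)
    data: Optional[bytes]  # file contents, or None for SHT_NOBITS (.bss)


def flatten(sections: Iterable[Section], start: int, end: int) -> bytes:
    """Lay out every section inside [start, end) as one flat byte string."""
    blob = bytearray(end - start)        # pre-zeroed: covers .bss and padding
    for sec in sections:
        if sec.size == 0 or sec.addr < start or sec.addr + sec.size > end:
            continue                     # outside the marker window
        if sec.data is not None:
            data = sec.data[:sec.size]
            off = sec.addr - start
            blob[off:off + len(data)] = data
    return bytes(blob)


def extract_blob(path: str) -> tuple[bytes, int]:
    """Read an ELF .so and return (blob bytes, config offset)."""
    from elftools.elf.elffile import ELFFile  # third-party: pyelftools

    SHF_ALLOC = 0x2
    with open(path, "rb") as f:
        elf = ELFFile(f)
        symtab = elf.get_section_by_name(".symtab")  # assumption: not stripped

        def addr_of(name: str) -> int:
            return symtab.get_symbol_by_name(name)[0]["st_value"]

        start, end = addr_of("__blob_start"), addr_of("__blob_end")
        sections = [
            Section(s["sh_addr"], s["sh_size"],
                    None if s["sh_type"] == "SHT_NOBITS" else s.data())
            for s in elf.iter_sections()
            if s["sh_flags"] & SHF_ALLOC
        ]
        return flatten(sections, start, end), addr_of("__config_start") - start


# Synthetic example: 4 bytes of code, a 4-byte alignment gap, 4 bytes of .bss.
demo = [
    Section(0x1000, 4, b"\x90\x90\x90\xc3"),
    Section(0x1008, 4, None),
    Section(0x2000, 2, b"zz"),           # outside the window, ignored
]
flat = flatten(demo, 0x1000, 0x100C)
print(flat.hex())
```

Starting from a zero-filled buffer means .bss and any alignment padding between sections come out as zeros without special handling.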

The output is a flat byte string plus a tiny bundle of metadata: total length, config offset, entry offset, and a SHA-256 fingerprint for sanity-checking.
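
A minimal sketch of what that metadata bundle might look like, and of how a caller could use the config offset to patch in parameters. The class name, field names and `patch_config` helper are hypothetical and chosen for illustration. Only the four pieces of information themselves come from the description above.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class BlobMeta:
    length: int          # total blob length in bytes
    config_offset: int   # where runtime parameters get patched in
    entry_offset: int    # where execution should begin
    sha256: str          # hex digest of the blob bytes


def describe(blob: bytes, config_offset: int, entry_offset: int) -> BlobMeta:
    """Bundle the metadata for an extracted blob."""
    return BlobMeta(len(blob), config_offset, entry_offset,
                    hashlib.sha256(blob).hexdigest())


def patch_config(blob: bytes, meta: BlobMeta, config: bytes) -> bytes:
    """Return a copy of `blob` with `config` written at the config offset."""
    end = meta.config_offset + len(config)
    if end > meta.length:
        raise ValueError("config does not fit inside the blob")
    return blob[:meta.config_offset] + config + blob[end:]


# Stand-in blob of 16 zero bytes with a config area at offset 8.
blob = bytes(16)
meta = describe(blob, config_offset=8, entry_offset=0)
patched = patch_config(blob, meta, b"\x01\x02")
print(meta.length, meta.sha256[:16], patched.hex())
```

Patching changes the bytes, so a fingerprint taken at extraction time describes the unpatched blob only.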

What you can do with the bytes

Because the code is position-independent, the bytes are usable in exactly the ways you'd hope:

mmap a chunk of rwx memory at a random address, copy them in, jump to them — done.
Write them to a file and run them through a tiny loader on a target machine — done.
Inject them into another process via your favourite injection technique — done.

The bytes don't know or care where they are. They were built that way.

Try it yourself

The smallest demonstration of the whole pipeline is two commands:

picblobs-cli extract hello linux:x86_64 -o /tmp/hello.bin
hexdump -C /tmp/hello.bin | head

That's a few hundred bytes of pure x86-64 machine code that prints "Hello, world!" when executed at any address you like.