How PIC extraction works
A plain-language tour of what picblobs does under the hood — and why
the bytes it spits out can run anywhere.
The 30-second version
You write ordinary C code. We compile it with some special flags so the resulting machine code doesn't care where it ends up in memory. Then we crack open the compiled file, scoop out just the executable part, and hand you back the raw bytes.
That's everything picblobs does, end to end. Those bytes are portable shellcode: load them at any address in any process running on the matching OS and CPU architecture, and they execute correctly.
Why "position-independent" is the whole point
Programs are normally compiled with the assumption that they own their
own address space. The compiler writes literal addresses into the
machine code: "jump to address 0x401050", "read from 0x404020".
If you copy that code somewhere else, all those numbers point at the
wrong things and the program usually crashes on the spot.
For shellcode — code you intend to inject into a process whose memory layout you don't control — that's a non-starter. You need position-independent code (PIC): machine code that uses relative references instead of absolute ones. Every jump and every data access is expressed as an offset from the current instruction, so the code keeps working no matter where it lands.
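To make "an offset from the current instruction" concrete, here is an x86-64 example. A `call` with opcode 0xE8 stores a signed 32-bit displacement measured from the end of the instruction, not a target address. The sketch below is plain Python, covers only this one encoding, and uses an illustrative function name. It computes where the same call lands when the code is loaded at two different base addresses. The bytes never change, and the target moves along with the code.

```python
import struct


def call_rel32_target(code: bytes, insn_offset: int, load_base: int) -> int:
    """Absolute address an x86-64 `call rel32` (opcode 0xE8) would jump to."""
    if code[insn_offset] != 0xE8:
        raise ValueError("not a call rel32 instruction")
    # The 4 bytes after the opcode are a signed little-endian displacement.
    (disp,) = struct.unpack_from("<i", code, insn_offset + 1)
    # The displacement is relative to the *next* instruction (opcode + 4 bytes).
    next_insn = load_base + insn_offset + 5
    return next_insn + disp


# `call +0x10` followed by padding; these bytes are identical at every address.
CODE = b"\xe8" + struct.pack("<i", 0x10) + b"\x90" * 0x20

if __name__ == "__main__":
    for base in (0x400000, 0x7F12_3450_0000):
        target = call_rel32_target(CODE, 0, base)
        # target - base is the same for every base: the reference is relative.
        print(hex(base), hex(target), hex(target - base))
```

Non-PIC code would instead bake the target into the instruction bytes, so moving it would require rewriting those bytes.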
GCC and Clang can both produce PIC, but the resulting .so file is
still wrapped in a lot of housekeeping — ELF headers, symbol tables,
relocation entries, debug info. To actually use the code, you need
just the code. That's what extraction is for.
What the compiler hands us
Each blob is compiled with these flags:
| Flag | What it means |
|---|---|
| -ffreestanding | "Don't assume a C standard library exists." |
| -nostdlib | "Don't link libc. Don't add the C runtime." |
| -fPIC | "Generate position-independent code." |
| -Os | "Optimise for size." |
The result is an ELF shared object (.so) — but a strange one. There
is no main(), no printf(), no dynamic linker glue. The code talks
to the kernel directly through syscalls. A custom linker script plants
two symbols, __blob_start and __blob_end, around the bytes we
actually want to keep.
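The flags in the table plus the custom linker script add up to one compiler invocation per blob. The sketch below only assembles that command line. It is an assumption, not picblobs' actual build code: `blob_compile_command` is a hypothetical helper, `-shared` (to get a .so) and the `-T` linker-script argument are illustrative choices, and the file names are placeholders.

```python
from typing import List

# The four flags from the table above.
BLOB_CFLAGS = ["-ffreestanding", "-nostdlib", "-fPIC", "-Os"]


def blob_compile_command(source: str, output: str, linker_script: str,
                         cc: str = "gcc") -> List[str]:
    """Build the compiler argv for one blob (hypothetical helper)."""
    return [
        cc,
        *BLOB_CFLAGS,
        "-shared",            # assumption: emit an ELF shared object (.so)
        "-T", linker_script,  # assumption: script that places __blob_start/__blob_end
        "-o", output,
        source,
    ]


if __name__ == "__main__":
    print(" ".join(blob_compile_command("hello.c", "hello.so", "blob.ld")))
```

On a machine with gcc and real input files, the returned list can be passed to `subprocess.run(cmd, check=True)`.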
The .so wraps the code we want in a lot of metadata we don't. Everything outside the __blob_start and __blob_end markers is metadata the ELF format demands but that we'll throw away.
The extraction step
picblobs opens the .so with pyelftools, a pure-Python library for
parsing ELF files. It walks the symbol table, finds the
__blob_start and __blob_end markers, then iterates over every
section that falls inside that window:
- For sections that contain real bytes (.text, .rodata, .data), it copies the bytes straight out.
- For sections that are implied zeros (.bss, the uninitialised globals that are not stored in the file), it fills the gap with the right number of zero bytes.
- It records the offset of __config_start so callers can patch in runtime parameters at the right spot.
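The walk described above can be sketched in Python. This is a minimal sketch, not picblobs' source. `Section`, `flatten` and `load_blob` are hypothetical names, and zero-filling any alignment padding between sections is an assumption. The pyelftools calls (`ELFFile`, `get_section_by_name`, `get_symbol_by_name`, `iter_sections`, `data()`) are part of that library's public API. `flatten` works on plain data, so the layout logic can be exercised without a real .so.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class Section:
    name: str
    addr: int       # virtual address (sh_addr)
    size: int       # size in memory (sh_size)
    nobits: bool    # True for SHT_NOBITS sections such as .bss
    data: bytes = b""


def flatten(sections: Iterable[Section], blob_start: int, blob_end: int) -> bytes:
    """Lay out every section inside [blob_start, blob_end) as one flat image."""
    out = bytearray()
    for sec in sorted(sections, key=lambda s: s.addr):
        if sec.size == 0 or sec.addr < blob_start or sec.addr + sec.size > blob_end:
            continue  # outside the marker window: ELF housekeeping we drop
        gap = (sec.addr - blob_start) - len(out)
        if gap < 0:
            raise ValueError(f"section {sec.name} overlaps the previous one")
        out += bytes(gap)  # assumption: alignment padding becomes zero bytes
        # .bss has no file contents, so synthesize its zeros; otherwise copy bytes.
        out += bytes(sec.size) if sec.nobits else sec.data[: sec.size]
    out += bytes((blob_end - blob_start) - len(out))  # pad up to __blob_end
    return bytes(out)


def load_blob(path: str) -> Tuple[bytes, int]:
    """Return (flat blob bytes, config offset) for a compiled blob .so."""
    from elftools.elf.constants import SH_FLAGS  # pip install pyelftools
    from elftools.elf.elffile import ELFFile

    with open(path, "rb") as f:
        elf = ELFFile(f)
        symtab = elf.get_section_by_name(".symtab")

        def symbol(name: str) -> int:
            return symtab.get_symbol_by_name(name)[0]["st_value"]

        sections = [
            Section(s.name, s["sh_addr"], s["sh_size"], s["sh_type"] == "SHT_NOBITS",
                    b"" if s["sh_type"] == "SHT_NOBITS" else s.data())
            for s in elf.iter_sections()
            if s["sh_flags"] & SH_FLAGS.SHF_ALLOC  # only sections loaded into memory
        ]
        start, end = symbol("__blob_start"), symbol("__blob_end")
        return flatten(sections, start, end), symbol("__config_start") - start
```

Filtering on `SHF_ALLOC` keeps non-loaded sections such as .symtab (which have address 0) from accidentally landing inside the window.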
The output is a flat byte string plus a tiny bundle of metadata: total length, config offset, entry offset, and a SHA-256 fingerprint for sanity-checking.
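A sketch of that metadata bundle, plus the config patching it enables, might look like this. `BlobInfo`, `describe`, `verify` and `patch_config` are hypothetical names, and the layout of whatever goes at the config offset is not specified here. Only the SHA-256 call is standard library behaviour.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class BlobInfo:
    """Hypothetical container for the extraction metadata."""
    length: int
    config_offset: int
    entry_offset: int
    sha256: str


def describe(blob: bytes, config_offset: int, entry_offset: int) -> BlobInfo:
    return BlobInfo(len(blob), config_offset, entry_offset,
                    hashlib.sha256(blob).hexdigest())


def verify(blob: bytes, info: BlobInfo) -> bool:
    """Sanity check: same length and same fingerprint as at extraction time."""
    return len(blob) == info.length and hashlib.sha256(blob).hexdigest() == info.sha256


def patch_config(blob: bytes, config_offset: int, config: bytes) -> bytes:
    """Return a copy of blob with `config` written at `config_offset`."""
    end = config_offset + len(config)
    if config_offset < 0 or end > len(blob):
        raise ValueError("config does not fit inside the blob")
    patched = bytearray(blob)
    patched[config_offset:end] = config
    return bytes(patched)
```

Verify the pristine blob first and patch afterwards: once runtime parameters are written in, the fingerprint no longer matches by design.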
What you can do with the bytes
Because the code is position-independent, the bytes are usable in exactly the ways you'd hope:
- mmap a chunk of rwx memory at a random address, copy them in, and jump to them. Done.
- Write them to a file and run them through a tiny loader on a target machine. Done.
- Inject them into another process via your favourite injection technique. Done.
The bytes don't know or care where they are. They were built that way.
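The first option can be sketched in Python for Linux on x86-64. Assumptions: the kernel allows an anonymous mapping that is readable, writable and executable at once (hardened systems may refuse), and the payload is a hand-assembled 9-byte toy blob rather than a picblobs blob, because a real blob's entry convention is not described here. `run_blob`, `TOY_BLOB` and `can_run_here` are illustrative names. The toy blob begins with a relative jump, so running it at two different offsets inside the mapping shows the same bytes working at different addresses.

```python
import ctypes
import mmap
import platform
import sys

# Hand-assembled x86-64 toy payload (NOT a picblobs blob):
#   EB 01           jmp short +1   ; relative jump over the next byte
#   CC              int3           ; trap, only reached if the jump were wrong
#   B8 2A 00 00 00  mov eax, 42
#   C3              ret
TOY_BLOB = bytes.fromhex("eb01ccb82a000000c3")


def can_run_here() -> bool:
    return sys.platform.startswith("linux") and platform.machine() == "x86_64"


def run_blob(blob: bytes, offset: int = 0) -> int:
    """Copy blob into a fresh RWX mapping at `offset` and call it as int(void)."""
    size = mmap.PAGESIZE * ((offset + len(blob)) // mmap.PAGESIZE + 1)
    mem = mmap.mmap(-1, size, prot=mmap.PROT_READ | mmap.PROT_WRITE | mmap.PROT_EXEC)
    try:
        mem.seek(offset)
        mem.write(blob)
        anchor = ctypes.c_char.from_buffer(mem)  # borrow the mapping to learn its address
        entry = ctypes.addressof(anchor) + offset
        del anchor  # drop the buffer export so mem.close() succeeds; address stays valid
        return ctypes.CFUNCTYPE(ctypes.c_int)(entry)()
    finally:
        mem.close()


if __name__ == "__main__" and can_run_here():
    for shift in (0, 123):
        print(f"offset {shift}: returned {run_blob(TOY_BLOB, shift)}")
```

The kernel chooses where the mapping lands, and the offset shifts the entry point within it; neither affects the result because the only control transfer in the blob is relative.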
Try it yourself
The smallest demonstration of the whole pipeline is two commands:
```shell
picblobs-cli extract hello linux:x86_64 -o /tmp/hello.bin
hexdump -C /tmp/hello.bin | head
```
That's a few hundred bytes of pure x86-64 machine code that, when
executed at any address you like, will print Hello, world!.