vcn: trying (and failing really hard) to turn on the hardware encoder on a cursed ps5 apu

how this started

it all started when i was first trying (and miserably failing) to turn my bc-250 into a cloud gaming pc. when i finalllllyy got something working on moonshine? moonlight? sunshine? idk, i think sunshine is the server and moonlight is the client. anyways when it worked, my poor bc-250 wasn't even overclocked, i had no governor, the thing was like, anemic, totally, and sunshine was using like 20% of the whole cpu just doing software encode. so

ok so what is vcn anyway

vcn (not vnc nonono) stands for video core next. i just call it hardware encoding, or even better a porra do encoder ("the damn encoder") if you're from brazil zil zil like me. so the thing is.

this vcn thingy is the thing that lets you like, stream your screen and a bunch of other mostly-video-related stuff without frying your cpu, cuz it's a pipeline to decode and encode video baked into the chip itself. it's literally a PHYSICAL path the data takes to get processed, so it doesn't tank your cpu. cuz it's fixed-function hardware, basically no cpu brainpower gets wasted when you use it (a little does, but it's irrelevant; the point is that it lets you stream the screen without burning like 30% of cpu). so yeah, the board has it cuz the ps5 for sure has it (although we'll circle back to this later). the thing is.

the amd crew didn't give a single fuck about it

this chip is primarily sold for mining, so the amd crew just didn't give a single fuck about vcn running on this board. on the kernel it's like they wrote:

char MakeVcnWork() {
    // fuckit implement yourself dumbass
    return 🖕;
}

except that's me being dramatic. the real kernel is actually worse, they didn't even bother to write a stub, they just left the switch case empty. look at drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c:

case IP_VERSION(2, 0, 3):
    break;    // <- this is it. this is the entire vcn impl for our chip

that break is the whole story. no ip block registered, no firmware mapping, no power gating handler. the driver sees vcn v2.0.3, shrugs, walks past it. right next to it in the same switch, navi10 (2.0.0 / 2.0.2 / 2.2.0) does the normal amdgpu_device_ip_block_add(adev, &vcn_v2_0_ip_block) dance. we just got a break.

so yeah, i was in for a big adventure...

so for starters let's just see what the other navis do

navi (cute codename for their gpus) drivers do this vcn thing by registering vcn as an ip block (basically modules, funny enough they're called intellectual property blocks 🧠 — not much intellect was left in me after this btw) and initializing them by doing some magic (sends some signals to the thing to wake it up via smu or whatever) annnnndddd booom, works, yayy —

EXCEPT IT DON'T bro...... bro......

the wall

this fucking thing. this vcn. has been stealing my sleep all day all night. the other drivers' init sequences just fucking kill it on this board cuz you can't use smu, can't use direct memory access, can't use the mailbox, can't even look at the fucking thing without it dying and hanging everything. cuz apu = gpu + cpu, so if the gpu hangs the cpu commits seppuku with it cuz ram and everything is shared, yeahhh!!! so you have to reboot it like every 2 seconds trying to catch the right log line.

and even worse, at some point i was like

no way bro for sure it's simple-framebuffer (the thing that renders tty when no gpu driver or something like that idk)

cuz it was hanging on a function that had something to do with it. and what does this mean? no video. and what does that mean? no logs, cuz the fucking thing crashed on boot, which means i had to fucking enable every debug option possible in the kernel, add prints everywhere, recompile, turn my "pc" on, FUCKING RECORD MY SCREEN WITH A CELLPHONE IN SLOW MOTION, and check the footage afterwards for logs. nah fuck this bro, i gave up for like a week.

(and for the record: simple-framebuffer was part of it, i had drm.debug=0x1e in cmdline spamming infinite simpledrm output, plus an aperture_remove_conflicting_pci_devices conflict. but even after fixing both, the driver still crashed the moment the vcn ip blocks were registered. simplefb was just the loudest symptom, not the cause.)

.

.

.

then claude accidentally solved it (a little)

days passed and i was like — k, i'll just drop an ai on a shell and be eating while i constantly restart the pc and send it photos of the logs. until

claude fucking deactivated the driver and called it quits.

buuuttt, in deactivating the driver, the pc booted fine. and i was like, "fuukkkcckckck turn on the fucking driver mf". and the little dude

modprobes this shit

and i see

everything going thru my screen until it hangs.

which gave me a STUPID idea.

and here comes the GENIUS part

i literally added a bunch of prints and "sleeps" to the source code. annnndddd here comes the genius part:

i created a script that started the driver and wrote dmesg + journal to a file every like 0.2 seconds. (ikik, truly albert einstein grade thinking here.)
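
a minimal sketch of that kind of snapshot loop, in python. the function name snapshot_logs, the commands passed in, and the output path are assumptions for illustration, not the original script. the one part that really matters is the fsync: when the board hard-hangs, anything still sitting in the page cache is gone.

```python
import os
import subprocess
import time


def snapshot_logs(commands, out_path, interval=0.2, iterations=None):
    """Append the output of every command to out_path, once per interval.

    iterations=None loops forever (until the machine hangs or you Ctrl-C).
    Returns the number of snapshots taken.
    """
    count = 0
    with open(out_path, "ab") as out:
        while iterations is None or count < iterations:
            for cmd in commands:
                result = subprocess.run(
                    cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT
                )
                # Header line so snapshots can be told apart afterwards.
                out.write(f"--- {' '.join(cmd)} @ {time.time():.3f}\n".encode())
                out.write(result.stdout)
            out.flush()
            os.fsync(out.fileno())  # push to disk before the next possible hang
            count += 1
            time.sleep(interval)
    return count
```

assumed usage: run snapshot_logs([["dmesg"], ["journalctl", "-k", "-b", "--no-pager"]], "/root/vcn-debug.log") as root in one shell, modprobe the driver in another, then cat the file after the reboot. it re-dumps the whole log every tick, which is wasteful but means the last snapshot before the hang is always complete.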

yeah, i debugged a bunch like that cuz it's way easier to cat a file instead of literally recording your screen. 10x faster iteration, and i actually started finding real stuff.

what i found

i was reading the wrong offsets for like three weeks 💀

quick confession before i brag: for most of this saga i was hammering the wrong addresses and just crashing the board over and over. i was reading 0x14068 / 0x14069 / 0x140D0 thinking those were UVD_PGFSM_CONFIG / PGFSM_STATUS / POWER_STATUS. they're not. those are MMIO dword offsets (the ones RREG32_SOC15 uses inside the kernel), NOT raw SMN addresses you can just dd out of debugfs.

the real SMN addresses for the PGFSM are 0x5C9A0, 0x5C9A4, 0x5C9B0. like three whole segments off from where i was reading. so every read i had been doing was landing on unrelated power-gated stuff, returning 0xFFFFFFFF and hanging the PCIe bus. i thought the vcn was dead. for weeks. it wasn't. my offsets were dead.
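
a tiny sketch of the unit mismatch, for anyone about to make the same mistake. register offsets in the amdgpu headers count 32-bit words, so turning one into a byte offset means multiplying by 4. the helper name is made up for illustration, and this only shows the units problem: it does not reproduce the lookup that gets you from there to the real SMN addresses.

```python
def dword_index_to_byte_offset(index):
    """Register headers count in 32-bit words; a byte offset is 4x that."""
    return index * 4


# The three values that were being treated as raw addresses.
wrong_guesses = [0x14068, 0x14069, 0x140D0]

for idx in wrong_guesses:
    print(f"dword index {idx:#x} -> byte offset {dword_index_to_byte_offset(idx):#x}")
```

even after the x4, none of these equal 0x5C9A0 / 0x5C9A4 / 0x5C9B0, so the conversion alone isn't the whole mapping. the point is just that the two sets of numbers aren't even in the same units.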

only figured it out when i cross-referenced cyan_skillfish_ip_offset.h in the kernel against a python RE of the SMU firmware (the mapping table at SRAM offset 0x15130 has the real PGFSM register addresses stored as constants). switched to the right SMN space, stopped the governor, read one register at a time via dd, and suddenly half the map lit up.

the governor was using the bar and messing everything up

the cyan-skillfish-governor-smu service (a community-maintained dpm governor that keeps clocks sane on this board) is racist. just racist. the thing does periodic SMN reads via PCIE_INDEX/DATA, which races with anything else touching SMN (the driver, my debugfs scripts, whatever). so half the results i had been collecting earlier were corrupted by that race. governor stopped, reads one at a time, and suddenly registers respond that didn't before. this alone invalidated a ton of previous testing.
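
why an index/data pair races, as a toy model in python. this is a simulation under stated assumptions, not real hardware access: the IndexDataPort class is hypothetical, and the two register values are borrowed from the register dump further down in this post. a read is two steps (write the address to the index register, then read the data register), so anyone who writes the index in between changes what you get back.

```python
import threading


class IndexDataPort:
    """Toy model of an index/data register pair in front of a register space."""

    def __init__(self, registers):
        self.registers = registers  # address -> value
        self.index = 0
        self.lock = threading.Lock()

    def write_index(self, address):
        self.index = address

    def read_data(self):
        # Unknown addresses read back as all-ones, like a dead register.
        return self.registers.get(self.index, 0xFFFFFFFF)

    def read_locked(self, address):
        # Holding the lock makes index-write + data-read one atomic step.
        with self.lock:
            self.write_index(address)
            return self.read_data()


regs = {0x5C9A0: 0x00080001, 0x5C000: 0x0000001E}
port = IndexDataPort(regs)

# Unsynchronised interleaving: the "governor" sneaks in between our two steps.
port.write_index(0x5C9A0)   # we select UVD_PGFSM_CONFIG
port.write_index(0x5C000)   # governor selects its own register
raced = port.read_data()    # we read, but get the governor's register

safe = port.read_locked(0x5C9A0)
print(f"raced read: {raced:#010x}, locked read: {safe:#010x}")
```

a lock like this only helps when every accessor shares it, which a separate service and a debugfs script don't. that's why the practical fix here was just stopping the governor and reading one register at a time.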

the ps5 has an auxiliary chip (forgot the name)

the ps5 has an on-die auxiliary chip that does all the non-gpu-core housekeeping, including bringing up vcn. it touches power rails, initializes rings, loads firmware, the whole dance. on the bc-250, this chip got cut during binning and doesn't show up in IP Discovery. gone. so nobody is doing the vcn power-on.

(yeah, i forgot the name while writing this. had to ssh into the machine and grep my own notes. it's the ARM A53, codename MP4.)

and the SMU on the bc-250 (Robin 5, v11.0.8, Xtensa LE, 257 KB of firmware) has zero vcn handlers:

  • SMU feature mask = 0xDD602C7D, bit 19 (VCN_PG) not set
  • SMU_MSG_PowerUpVcn (msg 0x2B) → 0xFE UnknownCmd
  • scanned Q0 message IDs 0x01–0x50 looking for anything vcn-related → nothing
  • cyan_skillfish_ppt_funcs literally doesn't define .dpm_set_vcn_enable, so amdgpu_dpm_enable_vcn() is a silent NO-OP that returns 0
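
checking that feature mask by hand is easy to get wrong, so here's a quick python sketch. the mask value and the bit number come from the list above; the helper names are made up for illustration.

```python
SMU_FEATURE_MASK = 0xDD602C7D
VCN_PG_BIT = 19  # bit position as stated in this post


def feature_enabled(mask, bit):
    """True if the given bit is set in the feature mask."""
    return bool((mask >> bit) & 1)


def set_bits(mask, width=32):
    """List every bit position that is set in the mask."""
    return [bit for bit in range(width) if (mask >> bit) & 1]


print("VCN_PG enabled:", feature_enabled(SMU_FEATURE_MASK, VCN_PG_BIT))
print("enabled bits:", set_bits(SMU_FEATURE_MASK))
```

the whole nibble covering bits 16 to 19 is zero in this mask, so VCN_PG comes back as not enabled.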

ps5 had the A53 do it. we don't have the A53. the SMU doesn't know how. the driver's enable_vcn path is a stub. it's just... nothing.

(yeah maybe it's impossible without the A53. maybe. but not before we try.)

buuuttt it does respond in some registers

and this made me very happy cuz it meant all this time wasn't in vain. after fixing the offsets + stopping the governor + reading one at a time, a bunch of the vcn always-on registers answer back with coherent values instead of 0xFFFFFFFF:

address   name               value
0x50000   VCN0 base          0x00000001 (present)
0x5B000   RSMU Block A       0x00000028, 17 live power regs
0x5B800   JPEG RSMU          8 live regs
0x5C000   CGC clock gating   0x0000001E
0x5C9A0   UVD_PGFSM_CONFIG   0x00080001
0x5C9A4   UVD_PGFSM_STATUS   0x40105001 (mixed state)
0x5C9B0   UVD_POWER_STATUS   0x00000000 (tiles off)
0x5F800   UVD decode base    0x0000001E

that PGFSM_STATUS of 0x40105001 is two bits per sub-block: 6 of them report "on", 4 are stuck in "transitioning", and the upper bits say someone tried this before and got stuck. so the vcn isn't fully dead, it's partially powered and hanging halfway through a state change that nobody finished.

so yeah — it's not harvested (IP Discovery says 2x VCN v2.0.3 + 2x JPEG v4.5, all with harvest=0x00), it's partially on (since it responds), and it's kinda accessible if you load the driver and try poking some registers. yayyy.

so what's next you ask

poke everything until something makes noise, basically.

i've been probing registers, trying a bunch of things, reversing every fucking file of a ps5 i can find on the web. the SMU firmware itself, analyzed statically in python (didn't even need ghidra for this part), gave me 76 vcn register offsets straight from the firmware SRAM:

UVD_PGFSM_CONFIG  = 0x0005C9A0
UVD_PGFSM_STATUS  = 0x0005C9A4
UVD_PGFSM_WRITE   = 0x0005C9A8
UVD_PGFSM_READ    = 0x0005C9AC
... (72 more)

and a mapping table inside the smu firmware at offset 0x13EF8 confirms VCN0 = PG domain 4. so the smu at least knows about the vcn, it just doesn't expose any message to power it on.
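
a rough idea of what "analyzed statically in python" can look like: walk the firmware image as little-endian 32-bit words and keep the ones that land in the address window you care about. this is a sketch under assumptions, not the actual RE script. the function name, the 0x50000 to 0x60000 window, and the synthetic blob standing in for the real firmware are all made up for illustration.

```python
import struct


def find_u32_constants(blob, lo, hi, align=4):
    """Yield (offset, value) for every aligned little-endian u32 in [lo, hi)."""
    for offset in range(0, len(blob) - 3, align):
        (value,) = struct.unpack_from("<I", blob, offset)
        if lo <= value < hi:
            yield offset, value


# Synthetic stand-in for a firmware image: zero padding plus four constants.
fake_fw = (
    bytes(16)
    + struct.pack("<4I", 0x5C9A0, 0x5C9A4, 0x5C9A8, 0x5C9AC)
    + bytes(8)
)

for offset, value in find_u32_constants(fake_fw, 0x50000, 0x60000):
    print(f"blob offset {offset:#06x}: {value:#010x}")
```

on a real image you'd read the file into bytes and expect plenty of false positives, since any 4 bytes can look like an address. runs of consecutive hits spaced 4 apart (like CONFIG / STATUS / WRITE / READ) are the ones worth a closer look.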

the most promising path rn is those 17 live RSMU Block A registers at 0x5B000+. they're clearly power-control related (some are stat regs, some look like enables) and they're the only live power-control surface we have direct SMN access to. my guess is these are what the A53 wrote on ps5 to bring the vcn rails up. if i can figure out the right sequence and ship it from the driver, maybe we bypass the smu handler entirely.

three kernel patches sketched so far:

  1. register vcn_v2_0_ip_block + jpeg_v2_0_ip_block for IP_VERSION(2,0,3) (aka fill in that empty break)
  2. map IP_VERSION(2,0,3) → navi10_vcn.bin firmware (2.0.0 is close enough to 2.0.3 that the load doesn't explode)
  3. bypass PSP for firmware load, copy direct to VRAM via memcpy_toio (the psp on this board doesn't recognize vcn firmware and hangs otherwise)

with those three, the driver gets further into init and fails on vcn_v2_0_disable_static_power_gating(). it writes 0x00055555 to UVD_PGFSM_CONFIG, then polls UVD_PGFSM_STATUS forever because the sub-blocks can't finish transitioning without power rails up. same root cause. the write goes through, the state machine just can't complete without someone bringing the rails up first.
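
where 0x00055555 comes from, as a small python sketch: it's the value you get by putting 1 into each of ten 2-bit fields. the ten-field layout is an assumption read off the value itself, and the helper names are made up for illustration. what a given field value means to the hardware isn't covered here.

```python
NUM_PG_FIELDS = 10  # assumption: ten 2-bit fields, which is what 0x00055555 implies
FIELD_WIDTH = 2


def pack_fields(values, width=FIELD_WIDTH):
    """Pack small integers into one word, field 0 in the lowest bits."""
    word = 0
    for i, v in enumerate(values):
        if not 0 <= v < (1 << width):
            raise ValueError(f"field {i} value {v} does not fit in {width} bits")
        word |= v << (i * width)
    return word


def unpack_fields(word, count=NUM_PG_FIELDS, width=FIELD_WIDTH):
    """Split a word back into its fields, lowest field first."""
    mask = (1 << width) - 1
    return [(word >> (i * width)) & mask for i in range(count)]


config = pack_fields([1] * NUM_PG_FIELDS)
print(f"config word: {config:#010x}")
print("fields:", unpack_fields(config))
```

unpack_fields is handy for eyeballing whatever the status register reads back while the poll is spinning, one field per sub-block instead of a wall of hex.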

also, theflow (andy nguyen) recently dropped a video running linux on the ps5 plus a bunch of GFX1013 patches (pci ids for the PS5, dcn 2.01 fixes), and mesa merged bigger bc-250 support. which maybe kinda helps me, since more people are getting interested in this platform.

stay tuned for part 2

so yeah, we're continuing the research. stay tuned for part 2, where i hopefully poke the right bit and get h.264 encode working, or more realistically, discover 4 more reasons this shouldn't be possible.

— gabriwar
