This is a write-up of how we solved a PXE boot problem with SCCM 2012 at my workplace. I wrote it because there were few solutions available online.
Our SCCM 2012 installation has been working great for OSD ever since we installed it. We have been deploying Windows 7 to HP EliteBooks on the client VLAN, and Windows Server 2008 R2 to ProLiant BL460c G1 blades on the server VLAN. But when we tried G7 and G8 blades, the WinPE boot would halt: it would be slow, then crash with an error message. This bugged us for months. PXE booting from our Linux PXE server (utilities, firmware CDs, etc.) worked fine, so it had to be something Microsoft does differently. Googling mostly turned up results for the usual PXE issues, like IP helpers, or for SCCM 2007, which does PXE differently.
After a lot of trial and error PXE booting WinPE (both 3.0 and 4.0) from the Configuration Manager server, the results looked like this:
ProLiant BL460c G1 - Embedded Ethernet - Server VLAN - WORKS
ProLiant BL465c G7 (same blade enclosure) - FlexFabric Embedded Ethernet - Server VLAN - FAILS
ProLiant BL465c Gen8 - HP FlexFabric 10Gb 2-port 554FLB Adapter - Server VLAN - FAILS
VMware Workstation 9 (hw9) VM - E1000 - Client VLAN - WORKS
VMware ESXi VM (on Gen8) - vmxnet3 & E1000 - Server VLAN - FAILS
These results ruled out drivers and WinPE itself; it had to be something in the network. I had been monitoring the WDS processes (TFTP and the WDS server) with perfmon.exe, and they looked OK. A colleague then set up a mirror port from our test VM to a machine running Kali Linux with Wireshark, and it became obvious what was happening.
WDS uses 1456 bytes as the packet (block) size for its TFTP transfers. After the initial transfer of the boot loader, it hands the process over to the distribution point service, which then fills the ramdisk. Microsoft explains this in a TechNet article. The SCCM distribution point, however, requests 16384 bytes as the packet size. Google quickly pointed us to a registry key that got things flying:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\SMS\DP
Change RamDiskTFTPBlockSize from 16384 to 1456.
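To see why the smaller value might help, it is worth doing the packet-size arithmetic. The sketch below assumes IPv4 without options (20-byte header), UDP (8-byte header), the 4-byte TFTP DATA header from RFC 1350 (2-byte opcode plus 2-byte block number) and a standard 1500-byte Ethernet MTU. Under those assumptions a 1456-byte block fits in a single frame, while a 16384-byte block has to be split into a dozen IP fragments. Fragmentation is a plausible reason for the stalls on the failing hosts, but treat it as an assumption, not a confirmed root cause. The constant and function names are illustrative.

```python
# Sketch: size of one full TFTP DATA packet on the wire, and how many IPv4
# fragments it needs. Assumptions: IPv4 with no options, UDP, RFC 1350 DATA
# header, standard 1500-byte Ethernet MTU (no jumbo frames).

IPV4_HEADER = 20       # bytes, minimum IPv4 header
UDP_HEADER = 8         # bytes
TFTP_DATA_HEADER = 4   # bytes: 2-byte opcode + 2-byte block number
MTU = 1500             # assumption: standard Ethernet MTU


def datagram_size(block_size: int) -> int:
    """Total IPv4 datagram size for one full TFTP DATA packet."""
    return IPV4_HEADER + UDP_HEADER + TFTP_DATA_HEADER + block_size


def fragment_count(block_size: int, mtu: int = MTU) -> int:
    """Number of IPv4 fragments needed to carry one full DATA packet."""
    ip_payload = UDP_HEADER + TFTP_DATA_HEADER + block_size
    # Every fragment except the last must carry a multiple of 8 payload bytes.
    per_fragment = (mtu - IPV4_HEADER) // 8 * 8
    return -(-ip_payload // per_fragment)  # ceiling division


for size in (1456, 16384):
    print(f"blksize {size:>5}: {datagram_size(size):>5} bytes on the wire, "
          f"{fragment_count(size)} fragment(s) at MTU {MTU}")

# blksize  1456:  1488 bytes on the wire, 1 fragment(s) at MTU 1500
# blksize 16384: 16416 bytes on the wire, 12 fragment(s) at MTU 1500
```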
Note: a larger block size should give better performance; 16384 is the maximum.
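The performance note follows from how TFTP moves data. The sketch below assumes plain lock-step TFTP as described in RFC 1350, where each DATA packet must be acknowledged before the next one is sent, so every packet costs at least one network round trip. The 250 MiB image size and 1 ms round-trip time are hypothetical values chosen for illustration, and the function name is illustrative.

```python
# Sketch: why a larger RamDiskTFTPBlockSize should speed up the ramdisk download.
# Assumption: lock-step TFTP (RFC 1350), one ACK per DATA packet.
# BOOT_IMAGE_BYTES and RTT_MS are hypothetical example values.


def data_packets(file_size: int, block_size: int) -> int:
    """DATA packets needed to send file_size bytes.

    A packet shorter than block_size ends the transfer, so a file that is an
    exact multiple of block_size needs one extra, empty packet.
    """
    return file_size // block_size + 1


BOOT_IMAGE_BYTES = 250 * 1024 * 1024  # hypothetical 250 MiB boot image
RTT_MS = 1.0                          # hypothetical round-trip time in ms

for size in (1456, 16384):
    packets = data_packets(BOOT_IMAGE_BYTES, size)
    # Lock-step: at least one round trip per DATA packet.
    seconds = packets * RTT_MS / 1000
    print(f"blksize {size:>5}: {packets:>6} DATA packets, "
          f"at least {seconds:.1f} s of ACK round trips")

# blksize  1456: 180044 DATA packets, at least 180.0 s of ACK round trips
# blksize 16384:  16001 DATA packets, at least 16.0 s of ACK round trips
```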