Day 3 of Being an imposter đŸ˜›
PLE (Per Layer Embedding) is a surprisingly similar approach to MoE,
Instead of doing all at once, let’s do fewer focused things at a time.
In an LLM with # PLE-based activation, the layers get a different set of signals fed, instead of feeding all at once. Since it has a significantly smaller input, it needs less processing power (RAM/VRAM+CPU/GPU/NPU). Often to the extent of 1/7th compared to a fully connected dense model.
Most recent example of PLE in action, checkout #Gemma4 family’s E2B and E4B by Google đŸ˜€
Have any more interesting thoughts to share on faster and more efficient LLM modelling? Share in comments!
For more such posts, connect or follow Anmoldeep đŸ™‚
#LLM #Architecture #Efficiency
