Kernel functions are usually pipelined, and local on-chip memory is reused to minimize external memory bandwidth.
Historically, on-chip memory has been built from static random-access memory (SRAM), typically using cells of six transistors each.
From an energy perspective, the most relevant kinds of memory are a microcontroller's on-chip memory and Flash memory; off-chip RAM is rarely, if ever, used.
Kernels and streams are scheduled at compile time and moved to on-chip memory at runtime via a scoreboard.
Because it requires relatively little external memory bandwidth and only a modest amount of on-chip memory, tiled rendering is a popular technique for embedded GPUs.
There is 12 KiB of on-chip memory used for pixel and vertex caches.
Code and data are normally fetched from on-chip memory, which the user must split into regions of different word sizes as desired.
If the off-chip memory is configured as 32-bit words to avoid waste, then only the on-chip memory may be used for code execution and extended floating-point.
Operating systems may use overlays to work around this problem, transferring 48-bit data to on-chip memory as needed for execution.
Features on-chip memory that can be used either as processor cache or mapped RAM.
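As a rough illustration of the tiled rendering mentioned above, the sketch below splits a framebuffer into tiles, shades each tile in a small local buffer (standing in for on-chip memory), and writes every pixel to the full framebuffer (standing in for external memory) exactly once. The names `tiles`, `render_tiled`, and `shade` are hypothetical, and the model ignores real GPU details such as geometry binning; it is a minimal sketch under those assumptions, not a description of any particular GPU.

```python
def tiles(width, height, tile_w, tile_h):
    """Yield (x0, y0, x1, y1) rectangles that cover the framebuffer."""
    for y0 in range(0, height, tile_h):
        for x0 in range(0, width, tile_w):
            # Clamp edge tiles so they never run past the framebuffer.
            yield x0, y0, min(x0 + tile_w, width), min(y0 + tile_h, height)


def render_tiled(width, height, tile_w, tile_h, shade):
    """Shade one tile at a time and count pixel writes to 'external' memory."""
    # The full framebuffer stands in for external memory (an assumption).
    framebuffer = [[0] * width for _ in range(height)]
    external_writes = 0
    for x0, y0, x1, y1 in tiles(width, height, tile_w, tile_h):
        # The tile buffer stands in for on-chip memory: all shading work
        # for this tile happens here before anything is written out.
        tile = [[shade(x, y) for x in range(x0, x1)] for y in range(y0, y1)]
        # Write the finished tile back, one row at a time.
        for row_index, row in enumerate(tile):
            framebuffer[y0 + row_index][x0:x1] = row
            external_writes += len(row)
    return framebuffer, external_writes


if __name__ == "__main__":
    fb, writes = render_tiled(5, 3, 2, 2, lambda x, y: x + 10 * y)
    print(writes)
    for row in fb:
        print(row)
```

In this model the local buffer never holds more than `tile_w * tile_h` pixels, which is the sense in which the on-chip memory requirement stays modest regardless of the framebuffer size.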