Good discussion of SCC architecture

The SCC's most interesting feature is its memory architecture. First, though, the basic layout: it has 24 tiles of 2 cores each, for 48 cores in total. Each tile also has 4 ports to the on-chip mesh network, a message-passing buffer, and a static look-up table for translating addresses.

The memory architecture: the chip has four separate physical memory spaces, each with its own DDR3 controller. These physical spaces are divided into chunks: one private chunk for each core, with the leftover shared among the cores. A core uses its look-up table to map its private and shared addresses to a physical address on a specific controller. The look-up table is loaded at boot time. It could be modified dynamically if desired, but that is not the intended use.
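To make the look-up-table idea concrete, here is a minimal toy model in Python. It is a sketch under stated assumptions, not the chip's actual table format: the 16 MiB page size, the 256-entry table, the `LutEntry` fields, and the slot numbers used for private and shared memory are all illustrative choices of mine.

```python
from dataclasses import dataclass

PAGE_BITS = 24        # assumption: each table entry covers a 16 MiB page
LUT_ENTRIES = 256     # assumption: top 8 bits of a 32-bit core address pick the entry


@dataclass(frozen=True)
class LutEntry:
    controller: int   # which of the four memory controllers (0..3)
    page: int         # page number inside that controller's physical space


def translate(lut, core_addr):
    """Map a 32-bit core address to (controller, physical address)."""
    index = core_addr >> PAGE_BITS
    offset = core_addr & ((1 << PAGE_BITS) - 1)
    entry = lut[index]
    if entry is None:
        raise LookupError(f"no mapping for core address {core_addr:#x}")
    return entry.controller, (entry.page << PAGE_BITS) | offset


def make_lut(private_entry, shared_entry):
    """Build a table with one private page (slot 0) and one shared page (slot 0x80)."""
    lut = [None] * LUT_ENTRIES
    lut[0x00] = private_entry
    lut[0x80] = shared_entry
    return lut


# Two cores: different private chunks, the same shared chunk.
core0_lut = make_lut(LutEntry(controller=0, page=0), LutEntry(controller=1, page=5))
core1_lut = make_lut(LutEntry(controller=0, page=1), LutEntry(controller=1, page=5))
```

With these tables, the same private address on two cores lands in different physical chunks, while the same shared address lands in the same place on both, which is the whole point of the per-core table.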

Each core has its own L1 and L2 caches, which can hold either private or shared memory. However, there is no hardware coherence protocol. Instead, when using shared memory, the executable has to declare the variables as being of the shared-memory type, "Message Passing Buffer Type". This causes a tag to be added to each cache line holding shared-memory values. A special instruction uses this tag: when executed, it invalidates all cache lines holding shared memory. Together, these two features make a software coherence protocol possible.
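The tag-plus-invalidate scheme can be illustrated with a small Python simulation. This is only a sketch of the idea: the `Core` class, the `shared` flag, the write-through behaviour, and the one-value-per-line cache are simplifying assumptions of mine, not a description of the real cache hardware.

```python
class Core:
    """Toy core with a private cache and no hardware coherence."""

    def __init__(self, memory):
        self.memory = memory   # dict: address -> value, the common backing store
        self.cache = {}        # address -> (value, shared_tag)

    def read(self, addr, shared=False):
        # On a miss, fill the line from memory and record the shared tag.
        if addr not in self.cache:
            self.cache[addr] = (self.memory[addr], shared)
        return self.cache[addr][0]

    def write(self, addr, value, shared=False):
        # Assumption: writes go straight through to memory.
        self.cache[addr] = (value, shared)
        self.memory[addr] = value

    def invalidate_shared(self):
        # Stand-in for the special instruction: drop every tagged line.
        self.cache = {a: line for a, line in self.cache.items() if not line[1]}


memory = {0x10: 1, 0x20: 7}
writer, reader = Core(memory), Core(memory)

reader.read(0x10, shared=True)            # reader caches the shared value
reader.read(0x20)                         # and a private value
writer.write(0x10, 2, shared=True)        # writer updates the shared value

stale = reader.read(0x10, shared=True)    # served from the reader's cache
reader.invalidate_shared()                # software coherence step
fresh = reader.read(0x10, shared=True)    # refetched from memory
```

Here `stale` holds the old value and `fresh` the new one, while the private line at `0x20` stays cached across the invalidation. The software protocol's job is to run the invalidate at the right points, for example before reading data another core has just published.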

Each core also has a message-passing buffer, which is accessed via memory-mapping. There's a library to use for sending and receiving messages.
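As a rough picture of what messaging through a memory-mapped buffer looks like, here is a toy Python sketch. It does not reproduce the library's API: the `send` and `recv` functions, the buffer size, and the flag/length/payload layout are all hypothetical, and a `bytearray` stands in for the memory-mapped region.

```python
MPB_SIZE = 64                  # assumption: toy size in bytes, not the real capacity
FLAG, LENGTH, DATA = 0, 1, 2   # assumed layout: ready flag, payload length, payload


def send(mpb, payload):
    """Copy payload into the buffer, then raise the flag. Returns False if busy."""
    if mpb[FLAG]:
        return False
    if len(payload) > MPB_SIZE - DATA:
        raise ValueError("payload too large for buffer")
    mpb[DATA:DATA + len(payload)] = payload
    mpb[LENGTH] = len(payload)
    mpb[FLAG] = 1              # set last, so a receiver never sees a partial message
    return True


def recv(mpb):
    """Return the pending payload and clear the flag, or None if nothing is waiting."""
    if not mpb[FLAG]:
        return None
    payload = bytes(mpb[DATA:DATA + mpb[LENGTH]])
    mpb[FLAG] = 0
    return payload


mpb = bytearray(MPB_SIZE)      # stand-in for one core's memory-mapped buffer
```

A sender would call `send(mpb, b"hello")` and the receiving core would poll `recv(mpb)` until it returns the payload. A real library would also have to deal with the caching issues described above, which this sketch ignores.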

That's the most interesting part of the chip.