Projections suggest that the number of cores on a single processor chip will approach 1000, and optimistically 4096. Among many other problems, memory contention will pose a serious threat to scaling hardware capability into software performance. Because caches do not scale to this number of cores, due to the difficulty of implementing cache coherency hardware, we can safely assume that no caches interfere with memory access. Memory access requests would be queued at a network/array of memory controllers, which would service the requests in a predefined order according to priority semantics for the cores. When the number of cores n is small, this scenario does not cause any problems. But if n reaches 1000 or 4096 cores, which is currently projected as possible within the next decade, it could cause serious performance lags.
A scheme to overcome, or at least reduce, this memory bandwidth limitation is needed. Memory bandwidth utilization could be optimized using the orthogonality of binary codes, as in code division multiple access (CDMA).
More about CDMA: http://en.wikipedia.org/wiki/Code_division_multiple_access
The idea is simple: multiplex the bus using CDMA techniques to aggregate the responses to memory accesses from different cores on a single processor.
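To make the idea concrete, here is a minimal sketch in Python of how orthogonal codes let several responses share one bus. It assumes Walsh codes (one common choice of orthogonal binary codes), one data bit per core, and an ideal noise-free bus that can carry summed multi-level values. The function names walsh_codes, spread and despread are made up for this illustration and do not describe any real hardware.

```python
def walsh_codes(n):
    """Return n mutually orthogonal +/-1 codes of length n (n must be a power of two)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    codes = [[1]]
    while len(codes) < n:
        # Sylvester construction: H_2k = [[H, H], [H, -H]]
        codes = ([row + row for row in codes]
                 + [row + [-x for x in row] for row in codes])
    return codes


def spread(bits, codes):
    """Superimpose one data bit per core onto a single shared bus signal."""
    signal = [0] * len(codes[0])
    for bit, code in zip(bits, codes):
        symbol = 1 if bit else -1  # map bit 1 -> +1, bit 0 -> -1
        for i, chip in enumerate(code):
            signal[i] += symbol * chip
    return signal


def despread(signal, code):
    """Recover one core's bit by correlating the bus signal with that core's code."""
    correlation = sum(s * c for s, c in zip(signal, code))
    return 1 if correlation > 0 else 0


# Hypothetical example: responses for four cores share the bus at once.
codes = walsh_codes(4)
bits = [1, 0, 0, 1]  # one response bit destined for each core
bus = spread(bits, codes)
recovered = [despread(bus, code) for code in codes]
print(bus, recovered)
```

Each core only needs its own code to pick its bit out of the combined signal. In this sketch every bit is spread over n chips and the bus values are not binary, so whether this gains bandwidth in practice depends on how the bus is built.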
What do you think?