
Transactional Memory Coherence and Consistency

Transactional memory Coherence and Consistency (TCC) provides a model in which atomic transactions are always the basic unit of parallel work, communication, memory coherence, and memory reference consistency. TCC greatly simplifies parallel software by eliminating the need for synchronization using conventional locks and semaphores, along with their complexities.

TCC hardware must combine all writes from each transaction region in a program into a single packet and broadcast this packet to the permanent shared memory state atomically as a large block. This simplifies the coherence hardware because it reduces the need for small, low-latency messages and completely eliminates the need for conventional snoopy cache coherence protocols, as multiple speculatively written versions of a cache line may safely coexist within the system. Meanwhile, automatic, hardware-controlled rollback of speculative transactions resolves any correctness violations that may occur when several processors attempt to read and write the same data simultaneously. The cost of this simplified scheme is higher interprocessor bandwidth.

Under the TCC model, processors continually execute speculative transactions. A transaction is a sequence of instructions that is guaranteed to execute and complete only as an atomic unit. Each transaction produces a block of writes, called the write state, which is committed to shared memory only as an atomic unit, after the transaction completes execution. Once the transaction is complete, the hardware must arbitrate system-wide for permission to commit its writes. After this permission is granted, the processor can take advantage of high system interconnect bandwidths to simply broadcast all writes for the entire transaction out as one large packet to the rest of the system. The broadcast can be over an unordered interconnect, with individual stores separated and reordered, as long as stores from different commits are not reordered or overlapped. Snooping by other processors on these store packets maintains coherence in the system, and allows them to detect when they have used data that has subsequently been modified by another transaction and must roll back (a dependence violation). Combining all writes from the entire transaction together minimizes the latency sensitivity of this scheme, because fewer interprocessor messages and arbitrations are required, and because flushing out the write state is a one-way operation. At the same time, since sequencing only needs to be controlled between entire transactions, instead of individual loads and stores, the commit operation provides inherent synchronization and a greatly simplified consistency protocol. This continual cycle of speculative buffering, broadcast, and potential violation allows conventional coherence and consistency protocols to be replaced simultaneously:
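The buffer/arbitrate/broadcast/violate cycle described above can be sketched in software. This is a minimal, hypothetical model, not the paper's hardware: the `Processor` class, its `write_buffer`, `read_set`, and `violated` fields are all illustrative names, and system-wide arbitration is reduced to a simple method call.

```python
# Illustrative software model of the TCC transaction cycle (hypothetical,
# not the actual hardware): writes are buffered speculatively, then
# committed to shared memory as one atomic packet; any other processor
# that read a committed address used stale data and is violated.

class Processor:
    def __init__(self, pid, memory):
        self.pid = pid
        self.memory = memory          # shared memory state (a dict here)
        self.write_buffer = {}        # speculative "write state"
        self.read_set = set()         # addresses read this transaction
        self.violated = False

    def load(self, addr):
        self.read_set.add(addr)
        # Loads see this transaction's own speculative writes first.
        return self.write_buffer.get(addr, self.memory.get(addr, 0))

    def store(self, addr, value):
        self.write_buffer[addr] = value   # buffered, not yet visible

    def commit(self, others):
        # "Arbitration" is implicit: whoever calls commit() first wins.
        # Flush the entire write state to shared memory as one packet.
        for addr, value in self.write_buffer.items():
            self.memory[addr] = value
        # Snooping: readers of any committed address must roll back.
        for p in others:
            if p.read_set & self.write_buffer.keys():
                p.violated = True
                p.write_buffer.clear()
                p.read_set.clear()
        self.write_buffer.clear()
        self.read_set.clear()


memory = {"x": 0}
a, b = Processor(0, memory), Processor(1, memory)

b.load("x")            # B speculatively reads x == 0
a.store("x", 42)       # A buffers a write to x
a.commit([b])          # A commits first; B's earlier read is now stale

print(memory["x"])     # 42
print(b.violated)      # True: B must restart its transaction
```

In a real TCC system the "others" are reached by the broadcast interconnect rather than a function argument, and rollback restores a register/cache checkpoint; the sketch only shows the ordering logic.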


Rather than imposing ordering rules between individual memory reference instructions, as most consistency models do, TCC just imposes a sequential ordering between transaction commits. This can drastically reduce the number of latency-sensitive arbitration and synchronization events required by low-level protocols in a typical multiprocessor system. As far as the global memory state and software are concerned, all memory references from a processor that commits earlier happened "before" all memory references from a processor that commits afterwards, even if the references actually executed in an interleaved fashion. A processor that reads data that is subsequently updated by another processor's commit, before it can commit itself, is forced to violate and roll back in order to enforce this model. Interleaving between processors' memory references is only allowed at transaction boundaries, greatly simplifying the process of writing programs that make fine-grained accesses to shared variables. In fact, by imposing the original sequential program's transaction order on the transaction commits, a TCC system can effectively provide the illusion of uniprocessor execution to the sequence of memory references generated by parallel software.
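The ordering rule above can be illustrated with a tiny sketch: no matter how two transactions' loads and stores interleave during execution, the memory order that software observes is simply the commit order. The `commit_ticket` arbiter and `commit` helper here are invented for illustration, standing in for the system-wide commit arbitration.

```python
# Hypothetical sketch of TCC's consistency rule: a single global commit
# order decides whose writes "happened before" whose, regardless of how
# the transactions' individual memory references interleaved.
import itertools

commit_ticket = itertools.count()   # stand-in for system-wide arbitration

log = []
def commit(name, writes):
    seq = next(commit_ticket)       # acquire the next global commit slot
    log.append((seq, name, writes)) # every write in the packet shares it

# Even if T0 and T1 executed their loads and stores interleaved, software
# only observes the sequential commit order:
commit("T0", {"x": 1})
commit("T1", {"x": 2})

# Replaying commits in order yields the final memory state.
final = {}
for _, _, writes in sorted(log):
    final.update(writes)
print(final["x"])   # 2: T1 committed after T0, so its write wins
```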


Stores are buffered and kept within the processor node for the duration of the transaction in order to maintain the atomicity of the transaction. No conventional, MESI-style cache protocols are used to maintain lines in "shared" or "exclusive" states at any point in the system, so it is legal for many processor nodes to hold the same line simultaneously in either an unmodified or speculatively modified form. At the end of each transaction, the broadcast notifies all other processors about what state has changed during the completing transaction. During this process, they perform conventional invalidation (if the commit packet only contains addresses) or update (if it contains both addresses and data) to keep their cache state coherent. Simultaneously, they must determine whether they may have used shared data too early. If they have read any data modified by the committing transaction during their currently executing transaction, they are forced to restart and reload the correct data. This hardware mechanism protects against true data dependencies automatically, without requiring programmers to insert locks or related constructs. At the same time, data antidependencies are handled simply by the fact that later processors will eventually get their own turn to flush out data to memory. Until that point, their "later" results are not seen by transactions that commit earlier (avoiding WAR dependencies), and they are able to freely overwrite previously modified data in a clearly sequenced manner (handling WAW dependencies in a legal way). Effectively, the simple, sequentialized consistency model allows the coherence model to be greatly simplified as well.
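The invalidate-versus-update choice and the violation check can be sketched as one snooping step. This is an illustrative model only; the `snoop` function, and the representation of a commit packet as a set of addresses (with an optional data map), are assumptions made for the sketch.

```python
# Illustrative sketch of a node snooping a TCC commit packet:
# invalidation-style packets carry only addresses, update-style packets
# carry addresses and data. Either way, a node that already read one of
# the committed addresses has a dependence violation.

def snoop(cache, read_set, packet, data=None):
    """Apply one commit packet to a local cache; return True on violation."""
    violated = any(addr in read_set for addr in packet)
    for addr in packet:
        if data is None:
            cache.pop(addr, None)      # invalidate: drop the stale line
        else:
            cache[addr] = data[addr]   # update: install the committed value
    return violated

cache = {"x": 0, "y": 7}
read_set = {"x"}   # this node's current transaction has read x

# Invalidation packet for {"x"}: line dropped, and x was read, so violate.
print(snoop(cache, read_set, {"x"}))             # True
print("x" in cache)                              # False

# Update packet for {"y"}: value refreshed in place; y was unread, no violation.
print(snoop(cache, read_set, {"y"}, {"y": 9}))   # False
print(cache["y"])                                # 9
```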
