Monday, March 06, 2006

Eraser - In Brief

Multi-threaded programming

* error-prone

* easy to make synch mistakes := data race

* hard to locate synch mistakes in debug


In few Word: Eraser is ?

* dynamically detects data races in lock-based multi-threaded programs

* uses binary rewriting techniques

* monitor every shared memory reference

* verify "consistent" locking behavior/discipline


Introduction

* multi-threaded programs are the common programming techniques

* Dynamic race detection based on "happens-before" relation

* checks conflicting mem accesses from different threads

* Based on locking discipline: programming policy, ensures absence of data races E.g. require every variable shared b/n threads to be protected by a mutual exclusion lock


Definitions

* Lock : synch object; used for mutex; either available or owned by a thread

= ops on lock L: lock(L); unlock(L)

= is essentially a binary semaphore ,but ONLY lock's owner can release that lock

* Data Race : when two (or more) concurrent threads access a shared var

= at least one access is a write AND

= the threads use no explicit mech to prevent the accesses from being simultaneous

= If a prog has a potential data race, then the impact of conflicting accesses to the shared var *will depend on the interleaving* (of the thread executions)


Related Work ? Monitors

* group of shared vars + procedures that are allowed to access those vars

* single "anonymous" lock (by which we mean implicit)

* whenever one of those procedures is entered, that lock is acquired & whenever one of those procedures is exited, the lock is released

* the shared vars in the monitor are invisible outside of the monitor

* provide a static, compile-time guarantee that shared vars are

* serialized and thus free from data races

* Don't protect against data races in progs with dynamically allocated shared vars


Happens-Before Relation

* The HB order is a partial order on all events of all threads in a concurrent execution.

* W/in any single thread: events ordered in order in which they occur.

* B/n threads, events ordered according to the properties of the synch objects they access;

= if thread T1 accesses synch object L1 and the next access to L1 is by T2 then the T1's access of L1 "happens before" T2's

= if two (or more) threads access a shared var and the accesses are NOT ordered by the happens-before relation, then in another exec of the prog, the two accesses could have happened simultaneously == a data race could have occurred

# If there were no happens-before relation b/n T1's unlock(L) and T2's lock(L) then T1 could have read v = 3; T2 could have read v = 3; then T1 would write v = 4; so would T2 (which is not what SHOULD have happened had those two variable assignments been serialized)


Drawbacks to tools based on HB

* inefficient; require per-thread info a/b concurrent accesses to all shared mem locs

* effectiveness of tools based on HB is highly dependent upon the interleaving produced by the scheduler


Interleaving produced by the Scheduler

* Thread 1

y++ //y1

A

* Thread 2

B

y++ //y2

* So possible orderings (y_1 comes before A; y_2 comes after B):

= y_1, A, B, y_2

= y_1, B, A, y_2

= y_1, B, y_2, A

= B, y_1, A, y_2

= B, y_1, y_2, A <=== potential data race

= B, y_2, y_1, A <=== potential data race


The Lockset algo

* enforces simple locking discipline: "every shared var must be protected by some lock?

= the lock is held by the thread accessing the var *when* that thread accesses the shared var

* Eraser monitors all reads and writes as the program executes; infers protection relation (b/n particular locks & specific vars) from execution history (since it does not know which lock protects which shared var)

* V => shared var

* C(V) => candidate locks for V

* as would expect, when new var V is initialized, its candidate set, C(V), includes all possible locks

* when V is accessed, it updates C(V) with the intersection of C(V) and the set of locks held by the current thread => "lockset refinement?

* if C(V) becomes empty, then this indicates that there is no lock that consistently protects V. (error)


The Lockset algo V.1

* Let locks held(t) be the set of locks held by thread t.

* For each v, initialize C(v) to the set of all locks.

* On each access to v by thread t,

= set C(v) := C(v) ? locks held(t);

= if C(v) := { }, then issue a warning.


Improving the Locking Discipline

* Three cases violate the discipline but don't have data races

= Initialization: shared vars are frequently initialized w/o holding a lock

= Read-shared data: some shared vars are written during init only and are read-only thereafter; is OK to have multiple readers simultaneously reading

= Read-write locks: read-write locks allow multiple readers to access a shared var but only ONE writer to do so


Accommodating Initialization

* by delaying refinement of a variable's candidate after that var has been initialized

* how to know when initialization is complete?

= Eraser says a shared var has been initialized at the instant that var is first accessed by a second thread (Virgin)

= As long as a var has been touched by a single thread only, all reads & writes of that var are considered part of initialization (Exclusive). C(V) is not refined here


Accommodating Multiple-Readers

* A read access from another state changes the state of the variable to shared

* C(V) is refined, but errors are not reported even if C(V) is null

* Accommodating Read-Write

* A write by another thread on a shared var, takes the var from Exclusive or shared state to shared-modified

* C(V) is refined and error is reported if C(V) is null

* Require that for every variable V, some lock M protects V

= M is held in write-mode for every write of V

= M is held in *some* mode (read or write) for every read of V


The Lockset algo V.2

* keep track of all locks held by thread T; locks_held(T)

* keep track of locks held in write mode by T; write_locks_held(T)

* Upon entering the Shared-Modified, C(V) = set of all locks

= If we are READING V, C(V) = C(V) intersection locks_held(T)

= If we are writing V, C(V) = C(V) intersection write_locks_held(T)

= if C(V) == {}, report error


Implementation

* Eraser takes an unmodified program binary as input and adds instrumentation to produce a new binary that is functionally equivalent but that includes calls to the Eraser runtime to implement the Lockset algo

* To maintain C(V), Eraser instruments each load & store in the program

* To maintain locks_held(T) for each thread T, Eraser instruments each call to acquire or release a lock as well as the stubs used to manage thread initi & finalization.

* To initialize C(V) for dynamically allocated data, Eraser instruments each call to the storage allocator

* Eraser treats each 32-bit word in the heap or global data as a possible shared var

* Eraser does NOT instrument loads & stores whose address mode is indirect off the stack pointer -- since these are presumed to be stack refs which are presumed to be local to the thread data, not global data (which will be in a global location or in the heap).

* Eraser maintains candidate sets for stack locations that are accessed via registers *besides* the stack pointer


Representing Candidate Lock Sets

* naive implem: store a list of candidate locks for each mem loc

* exploit: distinct sets of locks observed in practice is small

= represent each set of locks by an int (lockset index)

= lockset index indexes into a table whose entries represent the set of locks as sorted vectors of lock addresses

= each entry represents some set of locks stored as a vector where that vector is sorted by the addresses of the locks comprising the set

= use hashing to eliminate dupes in the table & to find the key for a given set of locks

= the entries in the table are never deallocated nor modified; lockset index valid for program's lifetime

= Eraser caches result of several set intersections so fast case for set intersection is just a table lookup

= each lock set is -- again -- stored as a sorted vector so the worst case, intersection can be performed by comparing two sorted vectors

* For every 32-bit word in the data segment or in the heap, there is a shadow word that contains

= the 30-bit lockset index for that word

= a 2-bit state condition (Virgin, Exclusive, Shared, Shared-Modified).

= In the Exclusive state the 30 bits are used to store the ID of the thread with exclusive access.

* All the std. mem alloc routines are instrumented to allocate and initialize a shadow word for each word allocated by the prog.

* When a thread accesses a mem loc, Eraser finds the shadow word for that mem loc by adding some fixed displacement to the memory location's


Annotations

* How to suppress false positives without eliminating true positives?

* If false alarms suppressed with accurate & specific "annotations,? then when a prog is modified and modified prog is tested, only "fresh and relevant" warnings will be produced


Kinds of False Alarms

* mem reuse: mem is reused w/o resetting shadow var

= many progs implement free lists or private allocators so now way for Eraser to know that some piece of memory has been "privately recycled" (and thus is protected by a *new* set of locks) must be accommodated

= EraserReuse(address,size)

= instructs eraser to reset the shadow mem corresponding to the mem range to the Virgin state

* Private locks:

= locks taken w/o communicating this seizure to Eraser

= caused by private implems of multiple-reader, single-writer locks (not part of pthreads i/f that Eraser implements)

= EraserReadLock(lock)

= EraserReadUnlock(lock)

= EraserWriteLock(lock)

= EraserWriteUnlock(lock)

= communicate private lock implementations

* Benign races:

= true data races; didn't affect program's correctness

= some intentional, others no

= EraserIgnoreOn()

= EraserIgnoreOff()

= surround code which the race detector should not worry about w.r.t. finding races


Additional Thoughts/Works

* Protection by multiple locks

* Deadlock


Protection by multiple locks

* Some programs protect a shared variable by multiple locks instead of single

* Working :

= any writer must hold all locks for a shared var

= any reader must hold at least one lock for a shared var

= aimed at avoiding dead lock

* modified Lockset in following manner:

= On each read of V by T,

= if C(V) == {}, issue warning

= On each write of V by T,

= set C(V) == C(V) intersect locks_held(T)

= if C(V) == {}, issue warning

* prevented false alarms, could also cause false negatives

= if T1 reads V while holding M1 and T2 writes V while holding M2

= violation of locking discipline will only be reported if the write preceeds the read

* requires complex DSs: sets of sets of locks


Deadlocks

* Discipline to avoid deadlock:

= choose partial ordering among all locks

= and program each thread so that whenver it holds more than one lock, it acquires them in increasing order (prevents Circularity requirement of deadlock).

* deadlock checking could be useful addition to Eraser

No comments: