Wednesday, March 08, 2006

void __builtin_return_address (unsigned int level);

void __builtin_return_address (unsigned int level);
    Returns the return address of the current function, or of one of its callers. Where level argument is a constant literal indicating the number of frames to scan up the call stack. The level must range from 0 to 63. A value of 0 yields the return address of the current function, a value of 1 yields the return address of the caller of the current function and so on.
Notes:
  1. When the top of the stack is reached, the function will return 0.
  2. The level must range from 0 to 63, otherwise a warning message will be issued and the compilation will halt.
  3. When functions are inlined, the return address corresponds to that of the function that is returned to.
  4. Compiler optimization may affect expected return value due to introducing extra stack frames or fewer stack frames than expected due to optimizations such as inlining.
    __builtin_return_address(0) yields the address to which the current function will return. __builtin_return_address(1) yields the address to which the caller will return, and so on up the stack.
    Return probes operates by replacing the return address in the stack (or in a known register, such as the lr register for ppc - what else can you do with a system with 100's of registers :) ). This may cause __builtin_return_address(0), when invoked from return-probed function, to return the address of the return-probes trampoline. There has been concern that this will break something.
    __builtin_return_address is used entirely for tracing, profiling, and error reporting.

Tuesday, March 07, 2006

Keeping Deleted Code for Future Reference (with comments)

Many a times we have come across a situation where we want to
keep a piece of code that does not execute. If often do it with "/*" and
"*/". But these won't work if there are similar comments within the
piece of code we want to comment. It achieve this use "#if 0" and
"

eg:

#if 0

    /* Used for debugging */
    while (a < n)

      printf ("%d", a);

#endif

Monday, March 06, 2006

Eraser - In Brief

Multi-threaded programming

* error-prone

* easy to make synch mistakes := data race

* hard to locate synch mistakes in debug


In few Word: Eraser is ?

* dynamically detects data races in lock-based multi-threaded programs

* uses binary rewriting techniques

* monitor every shared memory reference

* verify "consistent" locking behavior/discipline


Introduction

* multi-threaded programs are the common programming techniques

* Dynamic race detection based on "happens-before" relation

* checks conflicting mem accesses from different threads

* Based on locking discipline: programming policy, ensures absence of data races E.g. require every variable shared b/n threads to be protected by a mutual exclusion lock


Definitions

* Lock : synch object; used for mutex; either available or owned by a thread

= ops on lock L: lock(L); unlock(L)

= is essentially a binary semaphore ,but ONLY lock's owner can release that lock

* Data Race : when two (or more) concurrent threads access a shared var

= at least one access is a write AND

= the threads use no explicit mech to prevent the accesses from being simultaneous

= If a prog has a potential data race, then the impact of conflicting accesses to the shared var *will depend on the interleaving* (of the thread executions)


Related Work ? Monitors

* group of shared vars + procedures that are allowed to access those vars

* single "anonymous" lock (by which we mean implicit)

* whenever one of those procedures is entered, that lock is acquired & whenever one of those procedures is exited, the lock is released

* the shared vars in the monitor are invisible outside of the monitor

* provide a static, compile-time guarantee that shared vars are

* serialized and thus free from data races

* Don't protect against data races in progs with dynamically allocated shared vars


Happens-Before Relation

* The HB order is a partial order on all events of all threads in a concurrent execution.

* W/in any single thread: events ordered in order in which they occur.

* B/n threads, events ordered according to the properties of the synch objects they access;

= if thread T1 accesses synch object L1 and the next access to L1 is by T2 then the T1's access of L1 "happens before" T2's

= if two (or more) threads access a shared var and the accesses are NOT ordered by the happens-before relation, then in another exec of the prog, the two accesses could have happened simultaneously == a data race could have occurred

# If there were no happens-before relation b/n T1's unlock(L) and T2's lock(L) then T1 could have read v = 3; T2 could have read v = 3; then T1 would write v = 4; so would T2 (which is not what SHOULD have happened had those two variable assignments been serialized)


Drawbacks to tools based on HB

* inefficient; require per-thread info a/b concurrent accesses to all shared mem locs

* effectiveness of tools based on HB is highly dependent upon the interleaving produced by the scheduler


Interleaving produced by the Scheduler

* Thread 1

y++ //y1

A

* Thread 2

B

y++ //y2

* So possible orderings (y_1 comes before A; y_2 comes after B):

= y_1, A, B, y_2

= y_1, B, A, y_2

= y_1, B, y_2, A

= B, y_1, A, y_2

= B, y_1, y_2, A <=== potential data race

= B, y_2, y_1, A <=== potential data race


The Lockset algo

* enforces simple locking discipline: "every shared var must be protected by some lock?

= the lock is held by the thread accessing the var *when* that thread accesses the shared var

* Eraser monitors all reads and writes as the program executes; infers protection relation (b/n particular locks & specific vars) from execution history (since it does not know which lock protects which shared var)

* V => shared var

* C(V) => candidate locks for V

* as would expect, when new var V is initialized, its candidate set, C(V), includes all possible locks

* when V is accessed, it updates C(V) with the intersection of C(V) and the set of locks held by the current thread => "lockset refinement?

* if C(V) becomes empty, then this indicates that there is no lock that consistently protects V. (error)


The Lockset algo V.1

* Let locks held(t) be the set of locks held by thread t.

* For each v, initialize C(v) to the set of all locks.

* On each access to v by thread t,

= set C(v) := C(v) ? locks held(t);

= if C(v) := { }, then issue a warning.


Improving the Locking Discipline

* Three cases violate the discipline but don't have data races

= Initialization: shared vars are frequently initialized w/o holding a lock

= Read-shared data: some shared vars are written during init only and are read-only thereafter; is OK to have multiple readers simultaneously reading

= Read-write locks: read-write locks allow multiple readers to access a shared var but only ONE writer to do so


Accommodating Initialization

* by delaying refinement of a variable's candidate after that var has been initialized

* how to know when initialization is complete?

= Eraser says a shared var has been initialized at the instant that var is first accessed by a second thread (Virgin)

= As long as a var has been touched by a single thread only, all reads & writes of that var are considered part of initialization (Exclusive). C(V) is not refined here


Accommodating Multiple-Readers

* A read access from another state changes the state of the variable to shared

* C(V) is refined, but errors are not reported even if C(V) is null

* Accommodating Read-Write

* A write by another thread on a shared var, takes the var from Exclusive or shared state to shared-modified

* C(V) is refined and error is reported if C(V) is null

* Require that for every variable V, some lock M protects V

= M is held in write-mode for every write of V

= M is held in *some* mode (read or write) for every read of V


The Lockset algo V.2

* keep track of all locks held by thread T; locks_held(T)

* keep track of locks held in write mode by T; write_locks_held(T)

* Upon entering the Shared-Modified, C(V) = set of all locks

= If we are READING V, C(V) = C(V) intersection locks_held(T)

= If we are writing V, C(V) = C(V) intersection write_locks_held(T)

= if C(V) == {}, report error


Implementation

* Eraser takes an unmodified program binary as input and adds instrumentation to produce a new binary that is functionally equivalent but that includes calls to the Eraser runtime to implement the Lockset algo

* To maintain C(V), Eraser instruments each load & store in the program

* To maintain locks_held(T) for each thread T, Eraser instruments each call to acquire or release a lock as well as the stubs used to manage thread initi & finalization.

* To initialize C(V) for dynamically allocated data, Eraser instruments each call to the storage allocator

* Eraser treats each 32-bit word in the heap or global data as a possible shared var

* Eraser does NOT instrument loads & stores whose address mode is indirect off the stack pointer -- since these are presumed to be stack refs which are presumed to be local to the thread data, not global data (which will be in a global location or in the heap).

* Eraser maintains candidate sets for stack locations that are accessed via registers *besides* the stack pointer


Representing Candidate Lock Sets

* naive implem: store a list of candidate locks for each mem loc

* exploit: distinct sets of locks observed in practice is small

= represent each set of locks by an int (lockset index)

= lockset index indexes into a table whose entries represent the set of locks as sorted vectors of lock addresses

= each entry represents some set of locks stored as a vector where that vector is sorted by the addresses of the locks comprising the set

= use hashing to eliminate dupes in the table & to find the key for a given set of locks

= the entries in the table are never deallocated nor modified; lockset index valid for program's lifetime

= Eraser caches result of several set intersections so fast case for set intersection is just a table lookup

= each lock set is -- again -- stored as a sorted vector so the worst case, intersection can be performed by comparing two sorted vectors

* For every 32-bit word in the data segment or in the heap, there is a shadow word that contains

= the 30-bit lockset index for that word

= a 2-bit state condition (Virgin, Exclusive, Shared, Shared-Modified).

= In the Exclusive state the 30 bits are used to store the ID of the thread with exclusive access.

* All the std. mem alloc routines are instrumented to allocate and initialize a shadow word for each word allocated by the prog.

* When a thread accesses a mem loc, Eraser finds the shadow word for that mem loc by adding some fixed displacement to the memory location's


Annotations

* How to suppress false positives without eliminating true positives?

* If false alarms suppressed with accurate & specific "annotations,? then when a prog is modified and modified prog is tested, only "fresh and relevant" warnings will be produced


Kinds of False Alarms

* mem reuse: mem is reused w/o resetting shadow var

= many progs implement free lists or private allocators so now way for Eraser to know that some piece of memory has been "privately recycled" (and thus is protected by a *new* set of locks) must be accommodated

= EraserReuse(address,size)

= instructs eraser to reset the shadow mem corresponding to the mem range to the Virgin state

* Private locks:

= locks taken w/o communicating this seizure to Eraser

= caused by private implems of multiple-reader, single-writer locks (not part of pthreads i/f that Eraser implements)

= EraserReadLock(lock)

= EraserReadUnlock(lock)

= EraserWriteLock(lock)

= EraserWriteUnlock(lock)

= communicate private lock implementations

* Benign races:

= true data races; didn't affect program's correctness

= some intentional, others no

= EraserIgnoreOn()

= EraserIgnoreOff()

= surround code which the race detector should not worry about w.r.t. finding races


Additional Thoughts/Works

* Protection by multiple locks

* Deadlock


Protection by multiple locks

* Some programs protect a shared variable by multiple locks instead of single

* Working :

= any writer must hold all locks for a shared var

= any reader must hold at least one lock for a shared var

= aimed at avoiding dead lock

* modified Lockset in following manner:

= On each read of V by T,

= if C(V) == {}, issue warning

= On each write of V by T,

= set C(V) == C(V) intersect locks_held(T)

= if C(V) == {}, issue warning

* prevented false alarms, could also cause false negatives

= if T1 reads V while holding M1 and T2 writes V while holding M2

= violation of locking discipline will only be reported if the write preceeds the read

* requires complex DSs: sets of sets of locks


Deadlocks

* Discipline to avoid deadlock:

= choose partial ordering among all locks

= and program each thread so that whenver it holds more than one lock, it acquires them in increasing order (prevents Circularity requirement of deadlock).

* deadlock checking could be useful addition to Eraser