Thursday, May 29, 2014

Circular causality in Computing

More often than not, circular causality is associated with philosophy and is seen as contradicting scientific thought, let alone computer science. It might not be wrong to say that the paradigm of linear causality is so embedded in the human thought process that taking another view of it, being unable to think in terms of cause and effect, or being unable to pin events down on a time scale seems unreasonable and counterintuitive. In fact, intuition as we understand it is itself based on causality.

For computer scientists, the term ‘causality’ as we know it actually refers to linear causality. Linear causality is one of the most fundamental concepts required for designing, implementing, debugging and proving the correctness of systems at different scales. But there exist examples in current systems which show non-linear behavior, such as N-tiered application performance, user intent capturing systems and the management of large-scale data centers.

The hypothesis is that non-linear effects are primarily due to hidden feedback loops and the self-organizing nature of systems, which cannot be captured using linear causality. These systems can be modelled using the notion of circular causality, which can then be used to explain the otherwise unexplained behavior. Contrary to what one might expect, circular causality does not replace the existing linear causality paradigm but can complement it to explain the non-linear behavior of large-scale distributed systems.

Here are some potential case studies which can be carried out in this regard:


Example #1 - Management of large scale clouds

There has been recent research which treats large data centers as different kinds of devices (as defined by physicists):
  • Thermodynamic device: Examples include entropy-based treatment of phenomena observed in data center workloads
  • Computational device: Offloading and/or web servers
  • Information storing, processing and creating device: Storage clouds, sensor repositories and stream processing to generate new data
  • Self-organizing device: Management of virtual machines in a data center
None of them explains the overall nature of the cloud. Why? Because each of the above works tries to explain a different part of a bigger whole. So the hypothesis is that a large-scale cloud is actually all of the above and can be characterized by dynamical systems theory. How, I don't know.

So the cloud essentially behaves as multiple types of physical devices and hence cannot be explained by linear causality: interactions at the lowest level affect decisions at the higher level, and decisions at the higher level affect lower-level operations. This hierarchical loop leads to emergent properties (like consciousness) which cannot be explained by linear causality.

Example #2 - User Intent formulation

This is a very interesting and economically attractive area, but it is basically taking the wrong path by trying to capture user intent from clicks (in the case of the internet), from linear analysis of brain signals (brain imaging), or from a user's habits. None of these methods has proven to be effective.


This post is not meant to provide a solution to these issues but to ask readers to contribute ideas on how we can collectively think or work towards a solution.

CDMA based memory interface

It is projected that the number of cores on a single processor chip will approach 1000, and optimistically 4096. Among many other problems, memory contention will pose a serious threat to translating hardware capability into software performance. Because caches do not scale to this number of cores, owing to the difficulty of implementing cache coherency hardware, we assume here that no caches interfere with memory access. Memory access requests would be queued at the network/array of memory controllers, which service the requests in a predefined order according to priority semantics for the cores. When the number of cores n is small, this scenario does not cause problems, but once n reaches 1000 or 4096 cores, which is currently projected to be possible in the next decade, it could cause serious performance lags.


A scheme to overcome, or at least reduce, this memory bandwidth limitation is needed. Memory bandwidth utilization could be optimized using the orthogonality of binary codes.

More about CDMA: http://en.wikipedia.org/wiki/Code_division_multiple_access

The idea is simple: multiplex the bus using CDMA techniques to aggregate responses to memory accesses from different cores on a single processor.
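To make the orthogonality idea a bit more concrete, here is a minimal sketch of synchronous CDMA using Walsh codes. Everything in it is an assumption for illustration: the function names (walsh_codes, spread, combine, despread) are made up, each core sends a single bit, and the "bus" is modelled as a plain sum of chip values rather than real hardware signalling.

```python
# Illustrative sketch only: synchronous CDMA over a shared "bus" using Walsh codes.
# All names here are hypothetical; the bus is modelled as a simple sum of chips.

def walsh_codes(n):
    """Return n mutually orthogonal +1/-1 codes of length n (n a power of two)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    codes = [[1]]
    while len(codes) < n:
        # Sylvester construction: double the code length at each step.
        codes = ([row + row for row in codes]
                 + [row + [-x for x in row] for row in codes])
    return codes

def spread(bit, code):
    """Encode one bit as +code (bit 1) or -code (bit 0)."""
    symbol = 1 if bit else -1
    return [symbol * chip for chip in code]

def combine(signals):
    """Model the shared bus: chip-wise sum of all transmitted signals."""
    return [sum(chips) for chips in zip(*signals)]

def despread(bus, code):
    """Recover one core's bit by correlating the bus with that core's code."""
    correlation = sum(b * c for b, c in zip(bus, code))
    return 1 if correlation > 0 else 0

codes = walsh_codes(4)       # one code per core
bits = [1, 0, 0, 1]          # one response bit per core
bus = combine([spread(b, c) for b, c in zip(bits, codes)])
recovered = [despread(bus, c) for c in codes]
```

Because the codes are orthogonal, correlating the bus with one core's code cancels the contributions of all the other cores. Note that the combined bus value is multi-level (here between -4 and 4), so the sketch assumes a bus that can carry summed levels, and each bit occupies n chips on it.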

What do you think ?

OCCI - On chip Cooling Instruction



With the increasing miniaturization of devices in recent times, power dissipated as heat is a major challenge for chip manufacturers and end consumer device manufacturers. Heat is generated by all components, such as memory, processor, storage and I/O controllers. In this post, I focus on processors only. Let's think about what processors do: basically, they read instructions and execute them. Today's microprocessor is a highly complex beast: transistor sizes are shrinking, operating voltages are dropping, and chips combine many types of cores with varying capabilities, multiple levels of cache hierarchy, etc. Chips themselves keep shrinking.
The basis of this post is my hunch that, for a given processor, each instruction has a different heat profile. Also, each instruction activates a certain group of transistors (which could inherently depend on the state).

So the question I ask is: can we design instructions which have negative heat dissipation, or which absorb heat?

At first glance, this sounds crazy, right? What do you think?

Opinion

I think that it should be possible, given that we have a large body of research on thermoelectric materials. Why can't nano thermocouples be fabricated in the processor itself, with instructions designed to activate them? The electricity generated by these thermocouples could then be harvested and used later. How exactly, I don't know.

What do you think ?

SASP: Self Aware suggesting Programming


A technique which lets a lower-level software stack suggest to the upper-level stack the various actions or cross-layer API calls that can be performed without a foreseeable error. This allows the upper-layer software stack to decide at runtime which actions and/or calls should be made, according to the lower-level stack. In existing stacks, the technique can help avoid a large number of bugs that arise in the development/integration phase.


Introduction
In any system there are different layers or components that interact with each other to facilitate a particular task. A similar abstraction is now proposed for different devices working in tandem to provide a complete user experience.


For example, capturing an image on a handheld device requires interactions between various layers of software. A minimal implementation of this use case requires the firmware handling the camera to interact with the camera's OS driver; post-processing filters are applied to the raw captured image; and the post-processed image is then passed on to the display, which resizes it and displays it in a viewfinder application or on screen. When the capture button is pressed, the image is passed on to a JPEG encoder and finally saved to a storage device.


Similarly, it is proposed that the functionality of each layer in the above example could be offloaded to the cloud or a nearby device. The interfaces used to facilitate the collaboration can be implemented in user mode or as a complete stack of software across the virtualization, operating system, user mode and application levels.


There are two points worth noticing in the above discussion. Firstly, there are various software stacks or devices which are interacting with each other to provide the required functionality. In practice this interaction is handled and fine tuned by engineers in a final product (as of now). 


Secondly, most of the above-mentioned software/devices have a fixed way of interacting with the other software stacks/devices. Pick any of the above: the OS driver (fixed semantics for a given OS), the camera firmware (mostly encapsulated in some standard middleware specification like OpenMAX), and the same applies to imaging components, encoders, etc.


So, in other words, the know-how required to integrate and deploy the system is already present within the system itself. As of now, engineers are required to understand the interaction semantics and code the applications to match the different layers of software or interacting devices. We argue that the system could well be made autonomous, learning these interactions at run time and then using them. This paper implements a system offering such functionality.
   
Another concern addressed by this design is that a large portion of bugs in software arise from improper calls to the various software stack layers in a complex system. Statistics suggest that a large portion of the bugs found in the development phase arise from wrong actions performed by the service user, that is, actions which the service provider software stack does not allow. For example, an OpenMAX IL user can issue a command which is not allowed in the current state of the component. A lot of man hours are wasted in correcting these bugs. It has been observed in practice that most of the software development cycle consists of resolving bugs which are mainly caused by a lack of synchronisation between developers. Although proper documentation is often put forward as a foolproof solution, practice suggests otherwise. The cost is the time taken by such fixes, which in turn translates into delays and hampers the product's time to market. This could be avoided with a little overhead in the implementation.


The proposed technique utilises the current state information available to a software stack/device (say A) to suggest actions to the software/device (say B) which is using the services offered by A. In most cases, A is software with well-defined interfaces and behaviour, like OpenMAX IL, OpenGL or any protocol stack, and B is an application or middleware utilising the services offered by A.

Technique
As suggested earlier, the proposed technique utilises the information the software has about its own state and about the actions it is expecting or is ready to undertake.


We have a software stack A which exposes n interfaces (APIs) to its user:
I = {i1, i2, ..., in}


Each interface exposes methods; M(i,k) stands for the k-th method exposed by the i-th interface. Let's also suppose that any interface exposes at most l methods.


At any given point in time, A can be in one of m states:
S = {s1, s2, ..., sm}


Each state allows b calls to methods of these interfaces, where b can vary from 0 to n*l. The interfaces that have at least one allowed method in a given state form the set
C = {c1, c2, ..., ck} : 0 <= k <= n
which is a subset of I.

For each cj (j = 1 to k), the allowed methods Cj are a subset of the methods M(i,k) exposed by that interface.


Wrong calls: calls on which an error may occur in the system, or calls that are not expected by the service provider.
Correct calls: calls with no foreseeable error.
Summed over all states, the total number of wrong calls (WC) and correct calls (CC) which can arise in the system can be expressed as follows, where U is the set of all methods exposed by A, A(sj) is the set of calls allowed in state sj, and |X| is the number of elements in X:

WC = sum for j = 1 to m of |U \ A(sj)|
CC = sum for j = 1 to m of |A(sj)|
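As a small worked example of these counts, here is a sketch that tallies wrong and correct calls for a made-up component. The method names, the state names and the count_calls helper are all hypothetical, chosen only to illustrate the bookkeeping: in each state, every exposed method is either allowed (correct) or not (wrong).

```python
# Illustrative sketch only: counting wrong and correct calls per state.
# The methods, states and helper name below are hypothetical.

ALL_METHODS = {"init", "start", "pause", "stop"}

# Calls allowed in each state of the (hypothetical) service provider.
ALLOWED = {
    "loaded": {"init"},
    "idle": {"start"},
    "executing": {"pause", "stop"},
}

def count_calls(all_methods, allowed_by_state):
    """Return (WC, CC) summed over every state."""
    correct = sum(len(allowed & all_methods)
                  for allowed in allowed_by_state.values())
    wrong = sum(len(all_methods - allowed)
                for allowed in allowed_by_state.values())
    return wrong, correct

wc, cc = count_calls(ALL_METHODS, ALLOWED)
print("wrong calls:", wc, "correct calls:", cc)
```

With three states and four methods there are twelve (state, call) pairs in total, so WC and CC always add up to twelve here; the more restrictive the provider, the larger the share of wrong calls a user can accidentally make.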

Suppose the software stack/device (A) could communicate with the user software/device (B) after each call, suggesting which calls would be wrong and which would be correct. The service user then knows which calls to the service provider it can make. Depending on the result the service user needs, it can call the required method of the interface. This closely resembles a closed feedback loop (ODA loop), but here the control depends not on the outcome but on the allowed calls reported by the service provider and the result the service user requires.

This could well be implemented using a callback that returns a bitmap after each call, indicating the allowed calls. Using a bitmap for suggestions requires a small initialization overhead. Looking up the supported calls in the current state amounts to getting a bitmap from the software stack, and testing a call against it can be done in O(1) using the bit manipulation instructions offered by almost all processors.
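A minimal sketch of such a callback is given below. It is only one possible shape for the idea, not a reference implementation: the Call flags, the state names, the transition table and the Provider/User classes are all hypothetical and only loosely inspired by a component state machine.

```python
# Illustrative sketch only: a provider that reports a bitmap of allowed calls
# to its user after every call. All names and states here are hypothetical.
from enum import IntFlag

class Call(IntFlag):
    INIT = 1
    START = 2
    PAUSE = 4
    STOP = 8

# One bitmap of allowed calls per provider state (built once, up front).
ALLOWED = {
    "loaded": Call.INIT,
    "idle": Call.START,
    "executing": Call.PAUSE | Call.STOP,
    "paused": Call.START | Call.STOP,
}

TRANSITIONS = {
    ("loaded", Call.INIT): "idle",
    ("idle", Call.START): "executing",
    ("executing", Call.PAUSE): "paused",
    ("executing", Call.STOP): "idle",
    ("paused", Call.START): "executing",
    ("paused", Call.STOP): "idle",
}

class Provider:
    """Software stack A: suggests the allowed calls after each call."""
    def __init__(self, on_suggest):
        self.state = "loaded"
        self.on_suggest = on_suggest
        self.on_suggest(ALLOWED[self.state])

    def call(self, method):
        if not ALLOWED[self.state] & method:
            raise RuntimeError(f"{method.name} not allowed in state {self.state}")
        self.state = TRANSITIONS[(self.state, method)]
        self.on_suggest(ALLOWED[self.state])  # the suggestion callback

class User:
    """Software B: remembers the latest suggestion and checks it in O(1)."""
    def __init__(self):
        self.allowed = Call(0)

    def suggest(self, bitmap):
        self.allowed = bitmap

    def can(self, method):
        return bool(self.allowed & method)  # a single bitwise AND

user = User()
provider = Provider(user.suggest)
if user.can(Call.INIT):
    provider.call(Call.INIT)
provider.call(Call.START)
```

Here the user consults the last bitmap before issuing a call, so a wrong call such as STOP in the loaded state can be caught on the user side before it ever reaches the provider.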

Using this callback, the following are analysed in detail in light of the proposed technique:
1. Memory footprint
2. Latency
3. Number of man hours it would save in development
4. Increase in debugging options


Feel free to add to the discussion.