Archive for the ‘Architecture’ Category

Data streams, lakes, oceans and “Finding Nemo” in them….

April 4, 2014

This weekend, I complete three years of my second innings at TCS. For most of that period I have been working with large insurance providers, trying to figure out how technology can add value to their operations and strategy.

This has also been a period of re-imagination. Companies and individuals (consumers and employees alike) are slowly reimagining themselves in the wake of converging digital forces: cloud computing, analytics and big data, social networking and mobile computing.

The focus of my career in Information Technology has always been “Information”, not technology. I am a firm believer in information-led transformation rather than technology-led transformation. The basis for information is data, together with the ability to process and interpret that data so it becomes applicable and relevant to the operational or strategic issues business leaders are addressing.

Technologists are busy claiming that their own technology is best suited to current data processing needs. Storage vendors are finding business in providing storage in the cloud. Mobility providers are betting big on wearable devices, making computing ever more pervasive. Big industrial manufacturers are embedding sensors everywhere and connecting them over the internet, following the trend set by human social networking sites. A new breed of scientists, calling themselves data scientists, are inventing algorithms to derive insights quickly from the data being collected. Each group pushes itself to the front while leaning on the others to market itself.

Amid this rush, there is a distinctive trend in business houses: the CTO projecting technology as a driver of business growth and taking a dominant role. Data flows get plumbed across the IT landscape through a variety of technologies, leading to a lot of hurried and constantly changing plumbing.

In my view data should flow naturally, just like streams of water. Information should flow through the landscape on its own course, with technology used to keep that flow gentle and to avoid floods and tsunamis. Creating data pools in cloud storage, and connecting those pools into a knowledge ecosystem that grows the right insights for the business context, remains the big challenge today.

Information architecture in the big data and analytics arena is much like managing big rivers: building the right reservoirs and connecting them so the landscape gets the best benefit. A CIO is still needed, and still responsible for this, in the corporation.

If data becomes an ocean and finding insight becomes an effort like “Finding Nemo”, the overall objective may be lost. Let us cautiously avoid the data ocean and keep (big) data in its pools and lakes as usable information while we reimagine data in this current world of re-imagination. This applies to corporate business houses and individuals alike.

Hoping that innovative reimagination in the digital world helps improve life in the ecosystems of the real world….


Cloud Architecture Security & Reliability

January 31, 2014

Yesterday I gave a presentation on Cloud Architecture Security & Reliability at SSIT, Tumkur, to faculty members of SSIT and SIT Tumkur.

With the advent of the Cloud Computing paradigm, at least five categories of “Actors” have emerged:

  1. Cloud Consumers
  2. Cloud Providers
  3. Cloud Brokers
  4. Cloud Auditors
  5. Cloud Carriers

The NIST conceptual reference model gives a nice overview of these. ( http://www.nist.gov/itl/cloud/upload/NIST_SP-500-291_Version-2_2013_June18_FINAL.pdf )

[Figure: NIST Cloud Computing conceptual reference model showing the five actors]

Security, or more specifically “Information Security”, is a cross-cutting concern across all these actors. The CSA publishes its top threats regularly here. The top threats for 2013 are:

  1. Data Breaches
  2. Data Loss
  3. Account Hijacking
  4. Insecure APIs
  5. Denial of Service
  6. Malicious Insiders
  7. Abuse of Cloud Services
  8. Insufficient Due Diligence
  9. Shared Technology Issues

All these threats translate to protecting four major areas of Cloud Architecture…

  1. Application Access – authentication and authorization
  2. Separation of Concerns – controlling privileged user access to sensitive data
  3. Key Management – secure handling of encryption keys
  4. Data at Rest – secure management of copies of data (see the sketch after this list)
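
To make items 3 and 4 concrete, here is a minimal envelope-encryption sketch, assuming Python with the cryptography package; the key names are illustrative, and a real deployment would keep the master key in whatever KMS/HSM the cloud provider or consumer uses.

    # Minimal envelope-encryption sketch (assumes the 'cryptography' package).
    # A data key encrypts the data at rest; a separate master key wraps the
    # data key, so key management stays independent of the stored copies.
    from cryptography.fernet import Fernet

    master_key = Fernet.generate_key()   # in practice held in a KMS/HSM
    data_key = Fernet.generate_key()     # per-object data encryption key

    # Encrypt the payload with the data key.
    ciphertext = Fernet(data_key).encrypt(b"policyholder record ...")

    # Wrap (encrypt) the data key with the master key and store it alongside
    # the ciphertext; the plaintext data key is then discarded.
    wrapped_data_key = Fernet(master_key).encrypt(data_key)

    # To read the data back: unwrap the data key, then decrypt the payload.
    plaintext = Fernet(Fernet(master_key).decrypt(wrapped_data_key)).decrypt(ciphertext)
    assert plaintext == b"policyholder record ..."

The point is only that the copies of data at rest and the keys that protect them travel on separate paths, which is what areas 3 and 4 above ask for.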

Interestingly, the ENISA threat landscape also points to similar emerging threats related to Cloud Computing:

[Figure: ENISA Threat Landscape – emerging threats for Cloud Computing]

Is there any shortcut to security for any of the actors in the Cloud? I do not think so. The perspective presented by Booz & Co on cloud security includes a nice ICT Resilience life cycle, which we also discussed.

Finally, there was a good discussion on reliability and redundancy. The key question was how to achieve better reliability in a complex IT system made up of multiple components across multiple layers (web, application, database): let the healthy components share the load, isolate a failing component, decouple it from the cluster, and seamlessly rebalance the workload across the components that are still working.
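
A minimal sketch of that isolate-and-rebalance idea, assuming Python; the node names and the is_healthy check are hypothetical placeholders for whatever heartbeat or monitoring mechanism a real cluster uses.

    import random

    # Hypothetical cluster of application nodes; in reality these would be
    # servers behind a load balancer in the web/application/database tiers.
    nodes = {"app-1": True, "app-2": True, "app-3": True}

    def is_healthy(node):
        # Placeholder health check; a real system would use heartbeats,
        # HTTP health endpoints, or a cluster membership protocol.
        return nodes[node]

    def dispatch(request):
        # Keep only healthy nodes: failed nodes are decoupled from the pool
        # and the workload is rebalanced across the rest.
        healthy = [n for n in nodes if is_healthy(n)]
        if not healthy:
            raise RuntimeError("no healthy nodes available")
        target = random.choice(healthy)   # naive rebalancing by random spread
        return f"routing {request} to {target}"

    nodes["app-2"] = False                # simulate a component failure
    print(dispatch("GET /policies/42"))   # only app-1 or app-3 will be chosen

Real deployments layer retries, session draining and automatic re-admission of recovered nodes on top of this, but the isolate-and-rebalance loop is the core of the redundancy discussion.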

Overall, it was a good session interacting with academia!

The slide deck that was used:

Small & Big Data processing philosophies

January 3, 2013
In this first post of 2013, I would like to cover some fundamental philosophical aspects of “data” & “processing”.
With the buzz around “Big Data” running high, I classify the original structured, relational data as “small data”, even though I have seen some very large databases holding 100+ terabytes of data with an I/O volume of 150+ terabytes per day.
Present-day data processing predominantly uses the Von Neumann architecture of computing, in which “data” and its “processing” are distinct, separated into “memory” and “processor” connected by a “bus”. Any data that needs to be processed is moved into the processor over the bus, the required arithmetic or logical operation is performed on it to produce a “result”, and the result is moved back to memory/storage for further reference. The list of operations to be performed (the processing logic, or program) is stored in memory as well, and each next instruction must likewise be moved from memory into the processor over the bus.
So in essence, in the Von Neumann architecture, both the data and the operations to be performed on it live in memory, which cannot process anything, while the facility that can process data is always dependent on memory.
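As a toy illustration of that separation, here is a minimal sketch, assuming Python, of a fetch-execute loop: the “program” and the “data” both sit in one memory list, and every instruction and operand must cross the “bus” (here, a plain function call) into the processor before anything happens.

    # Toy Von Neumann machine: program and data share one memory,
    # and the processor only ever works on what is fetched over the "bus".
    memory = [
        ("LOAD", 8),    # program: load memory[8] into the accumulator
        ("ADD", 9),     #          add memory[9] to the accumulator
        ("STORE", 10),  #          store the accumulator into memory[10]
        ("HALT", None),
        None, None, None, None,
        40, 2, 0,       # data: operands at addresses 8 and 9, result slot at 10
    ]

    def fetch(address):
        return memory[address]          # every access crosses the bus

    accumulator, pc = 0, 0              # processor state: one register + program counter
    while True:
        op, addr = fetch(pc)            # fetch the next instruction from memory
        pc += 1
        if op == "LOAD":
            accumulator = fetch(addr)   # fetch the operand from memory
        elif op == "ADD":
            accumulator += fetch(addr)
        elif op == "STORE":
            memory[addr] = accumulator  # write the result back to memory
        elif op == "HALT":
            break

    print(memory[10])                   # prints 42: the result ends up back in memory

Everything, instructions and data alike, keeps shuttling across that bus, which is exactly the bottleneck the rest of this post is about.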
Traditionally, the data has been moved to the place where the processing logic is deployed, because the amount of data is small while the processing it needs is relatively large and involves complex logic. RDBMS engines like Oracle read blocks from storage into the SGA buffer cache of the running database instance for processing, and transactions modify only small amounts of data at any given time.
Over time, “analytical processing” required bringing huge amounts of data from storage into the processing node, which created a bottleneck on the network pipe. Add to that the large growth in semi-structured and unstructured data that started flowing in, and a different philosophy of data processing was needed.
Enter HDFS and the map-reduce framework of Hadoop, which took the processing to the data. Around the same time came Oracle Exadata, which took database processing down to the storage layer with a feature called “query offloading”.
In the new paradigm, snippets of processing logic are shipped to a cluster of connected nodes where the data, partitioned by a hashing algorithm, already resides; the results of that processing are then reduced to produce the final result sets. It has become economical to take the processing to the data because the volume of data is relatively large while the required processing consists of fairly simple tasks: matching, aggregating, indexing and so on.
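Here is a minimal sketch of that map-and-reduce flow, assuming plain Python rather than a real Hadoop cluster; the hash-based partitions simply stand in for data blocks that already live on different nodes.

    from collections import Counter
    from functools import reduce

    # Data already "lives" on three nodes, partitioned by a hash of the record.
    records = ["claim motor", "claim home", "quote motor", "claim motor"]
    partitions = {0: [], 1: [], 2: []}
    for record in records:
        partitions[hash(record) % 3].append(record)

    # Map phase: the same small snippet of logic runs where each partition is.
    def count_words(partition):
        return Counter(word for record in partition for word in record.split())

    mapped = [count_words(p) for p in partitions.values()]

    # Reduce phase: the partial results are combined into the final result set.
    totals = reduce(lambda a, b: a + b, mapped, Counter())
    print(totals)   # e.g. Counter({'claim': 3, 'motor': 3, 'home': 1, 'quote': 1})

The word counts here are only a stand-in for the matching, aggregating and indexing tasks mentioned above; the point is that only the small snippet of logic and the small partial results travel over the network, not the data itself.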
So we now have a choice: take small amounts of data to complex processing in structured RDBMS engines with the traditional shared-everything architecture, or take the processing to the data in shared-nothing big data architectures. Which to use depends purely on the type of data processing problem at hand; traditional RDBMS technologies will not be replaced by the new big data architectures, nor can the new big-data problems be solved by traditional RDBMS technologies. They go hand in hand, complementing each other and adding value to the business when properly implemented.
Non-Von Neumann architectures still deserve closer attention from technologists; they probably hold the key to the way the human brain processes and deals with information seamlessly, whether it is structured data or unstructured streams of voice, video and so on.
Any non-Von-Neumann architecture enthusiasts over here?

Software Architecture simplified

July 12, 2012

I was asked to present some of my experiences, along with some key points of “Software Architecture”, to the faculty of an engineering college.

Hope this gives a good overview!