Archive for the ‘Social’ Category

Data streams, lakes, oceans and “Finding Nemo” in them….

April 4, 2014

This weekend, I complete three years of my second innings at TCS. For most of those three years I have been working with large insurance providers, trying to figure out ways to use technology to add value to their operations and strategy.

The same period has been a period of re-imagination. Companies and individuals (consumers / employees) are slowly moving towards reimagining themselves in the wake of converging digital forces like cloud computing, analytics & big data, social networking and mobile computing.

The focus of my career in Information Technology has always been “Information” and not technology. I am a firm believer in “Information”-led transformation rather than “technology”-led transformation. The basis for information is data and the ability to process and interpret that data, making it applicable and relevant to the operational or strategic issues that business leaders are addressing.

Technologists are busy claiming that their own technology is best suited to current data processing needs. Storage vendors are finding business in providing storage in the cloud. Mobility providers are betting big on wearable devices, making computing more and more pervasive. The big industrial manufacturers are busy fusing sensors everywhere and connecting them on the internet, following the trend set by human social networking sites. A new breed of scientists calling themselves data scientists are inventing algorithms to quickly derive insights from the data that is being collected. Each of them is pushing to the front, leaning on the others to market themselves.

In this rush, there is a distinct trend in business houses: it is common for the CTO to project technology as a business growth driver and take a dominant role. Data flows then have to be plumbed across the IT landscape and across various technologies, causing a lot of hurried, fast-changing plumbing issues.

In my view, data should flow naturally, just like streams of water. Information should flow naturally through the landscape, and technology should be used to keep the flow gentle, avoiding floods and tsunamis. Creating data pools in cloud storage and connecting those pools to form a knowledge ecosystem that grows the right insights for the business context remains the big challenge today.

Information architecture in the big data and analytics arena is just like dealing with big rivers: having the right reservoirs and connecting them to get the best benefit across the landscape. And a CIO is still needed, and responsible for this, in the corporate world.

If data becomes an ocean and getting insights becomes an effort like “Finding Nemo”, the overall objective may be lost. Cautiously avoiding the data ocean, let us keep the (big) data in its pools and lakes as usable information while reimagining data in the current world of re-imagination. This applies to corporate business houses as well as to individuals.

Hoping innovative reimagination in the digital world helps improve life in the ecosystems of the real world….

Social Analytics for Online Communities

January 10, 2014

This is the first post of 2014. Happy new year to one and all………..

A recent discussion on knome (TCS’ internal social platform) about managing online communities, controlling spam and making the best of an enterprise social platform with ~200K members made me study how Social Analytics can be applied to achieve these objectives.

While researching on the internet, I came across this paper –… titled “Scalable Social Analytics for Online Communities” by Marcel Karnstedt, Digital Enterprise Research Institute (DERI), National University of Ireland, Galway.

This post is to summarize the contents of the paper and some of my thoughts around it.

The success of a social platform depends on the strength of the analytics used to understand and drive the dynamics of the network built on the platform.

To achieve these goals we need a set of tools that can perform multidimensional analysis along four dimensions: structural, behavioural, content/semantic and cross-community.

Structural Analysis: Analyse all the communities, memberships, sub-communities based on strong relations between the members, influencers/leaders and followers.

Behavioural Analysis: Analyse the interactions to identify the helpful experts (or sub-groups) who provide information and the newbies who seek information and benefit from the interactions. Both a micro-level (individual-level) and a macro-level analysis are needed.

Content / Semantic analysis: Use text mining to detect, track and quantitatively measure current interest and shift in interest in topic and sentiment within the community.

Cross community dynamics: Understand how the community structures and sub-structures influence each other, to detect redundant and complementary communities that can be merged or linked together.

The analysis from all four dimensions needs to be combined in a scalable, real-time model to achieve the best understanding, control and utility of socially generated data (or rather, knowledge!).

New solutions for new problems! Have a nice weekend………..

Models of Innovation diffusion in social networks

July 12, 2013


Having covered trust modeling and centrality in social networks, this is the third and last post of the series on social network analysis.

Innovation diffusion, influence propagation or ‘viral marketing’ is one of the most researched subjects of the contemporary era.

Some theory:

Compartmental models of the spread of epidemics, which have susceptible (S), infected (I) and recovered (R) states (the ‘SIR’ states), are also used to study influence propagation in electronic social networks. These started out as descriptive models of how nodes behave when exposed to a new innovation or piece of information: each node has an initial probability of adopting the innovation. As each node adopts the new innovation, it exerts a specific amount of influence on the nodes connected to it.
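To make the SIR idea concrete, here is a minimal sketch of a discrete-time SIR process on a network. The function name `simulate_sir`, the adjacency-dictionary graph format and the two probabilities `p_infect` and `p_recover` are my own assumptions for illustration; they do not come from any particular paper.

```python
import random

def simulate_sir(graph, initially_infected, p_infect, p_recover, steps, seed=0):
    """Discrete-time SIR on an undirected graph given as {node: [neighbors]}.

    Returns a list of (S, I, R) counts, one entry per time step including t=0.
    All parameter names are illustrative assumptions.
    """
    rng = random.Random(seed)
    state = {node: "S" for node in graph}
    for node in initially_infected:
        state[node] = "I"

    def counts():
        values = list(state.values())
        return values.count("S"), values.count("I"), values.count("R")

    history = [counts()]
    for _ in range(steps):
        new_state = dict(state)
        for node, status in state.items():
            if status != "I":
                continue
            # An infected node gets one attempt per susceptible neighbor.
            for neighbor in graph[node]:
                if state[neighbor] == "S" and rng.random() < p_infect:
                    new_state[neighbor] = "I"
            # The infected node then recovers with probability p_recover.
            if rng.random() < p_recover:
                new_state[node] = "R"
        state = new_state
        history.append(counts())
    return history
```

With both probabilities set to 1.0 on a simple chain of nodes, the "infection" moves exactly one hop per step, which is a handy sanity check before trying more realistic values.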

Primarily, two basic models are used to study the spread in a social network. In both, an initial set of ‘active’ nodes at time t0 exerts influence on the connected nodes, and at t1 some of those connected nodes become ‘active’. In the ‘Linear Threshold’ model, each individual node has a threshold θi and becomes active when the weighted influence from its active neighbors reaches or exceeds this threshold; at each step, the nodes that were active up to the previous step remain active and keep influencing their neighbors with their weights. In the ‘Independent Cascade’ model, each newly active node is given only one chance to influence each of its neighbors, and succeeds with a probability p(i).
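The two models can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the dictionary formats `in_weights` and `out_probs`, the function names, and the choice to activate a node when the influence is at least its threshold are all hypothetical, not taken from a specific paper.

```python
import random

def linear_threshold(in_weights, thresholds, seeds):
    """Linear Threshold model.

    in_weights[v] = {u: weight of u's influence on v}
    thresholds[v] = activation threshold of v
    Returns the final set of active nodes.
    """
    active = set(seeds)
    changed = True
    while changed:
        changed = False
        for node, weights in in_weights.items():
            if node in active:
                continue
            # Sum the weights of the neighbors that are already active.
            influence = sum(w for u, w in weights.items() if u in active)
            if influence >= thresholds[node]:
                active.add(node)  # active nodes never deactivate
                changed = True
    return active

def independent_cascade(out_probs, seeds, seed=0):
    """Independent Cascade model.

    out_probs[u] = {v: probability that u activates v}
    Returns the final set of active nodes.
    """
    rng = random.Random(seed)
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            # u gets exactly one chance at each inactive neighbor.
            for v, p in out_probs.get(u, {}).items():
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active
```

Because activation is permanent in the threshold model, sweeping the nodes in any order reaches the same final set as a strictly step-by-step update; the cascade version is random, so in practice it is run many times and the spread is averaged.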

Based on the above two diffusion models, the maximization problem is to determine the best set of initial ‘active’ nodes in a network, that is, the set that maximizes the spread of influence for a ‘viral marketing’ campaign.
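One common way to attack this seed selection problem is greedy hill-climbing: keep adding the node that gives the largest extra spread. The sketch below is my own illustration, not necessarily the algorithm of any paper linked here. The names `greedy_seeds` and `reachable` are hypothetical, and the spread function used in the test is the simplest possible one (every activation succeeds, so spread is just reachability); for the probabilistic models the spread would have to be estimated by repeated simulation.

```python
def reachable(graph, seeds):
    """Nodes reachable from the seeds in a directed graph {node: [successors]}.

    This is the spread when every activation attempt succeeds.
    """
    seen = set(seeds)
    stack = list(seeds)
    while stack:
        u = stack.pop()
        for v in graph.get(u, []):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def greedy_seeds(nodes, spread, k):
    """Pick k seeds, each time adding the node with the largest marginal gain.

    spread(seed_list) must return a number: the (estimated) size of the cascade.
    Assumes k is not larger than the number of nodes.
    """
    chosen = []
    for _ in range(k):
        base = spread(chosen)
        best = max(
            (n for n in nodes if n not in chosen),
            key=lambda n: spread(chosen + [n]) - base,
        )
        chosen.append(best)
    return chosen
```

Passing the spread function in as a parameter keeps the selection loop independent of the diffusion model, so the same loop can be reused with a threshold or cascade simulation.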

It is an NP-hard problem, and this paper – discusses some interesting approximation algorithms for general cascade, threshold and triggering models.

Have a good weekend reading!

On Centrality and Power in social networks

June 21, 2013
After last week’s post on ‘Trust’ – – let us quickly review another important measure of (social) network structure.

Centrality is a structural measure of a network that gives an indication of relative importance of a node in the graph / network.
The simplest way of measuring centrality is by counting the number of connections a node has. This is called ‘degree centrality’.

Another way of measuring centrality is to see how far a node is from all the other nodes of the graph. This measure is called ‘closeness centrality’, as it is based on the shortest-path lengths from the node to every other node.

‘Betweenness Centrality’ is a measure of the number of times a node acts as a bridge on the shortest path between two other nodes. It shows how important each node is in connecting the whole network.
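The three measures above can be computed with plain breadth-first search on a small graph. This is a minimal sketch under my own assumptions: the graph is undirected, unweighted and connected, it is stored as a dictionary of neighbor lists, closeness is taken as (n − 1) divided by the sum of distances, and betweenness is left unnormalized. The function names are illustrative.

```python
from collections import deque

def bfs_distances(graph, source):
    """Shortest-path lengths (in hops) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def degree_centrality(graph):
    """Number of connections of each node."""
    return {node: len(neighbors) for node, neighbors in graph.items()}

def closeness_centrality(graph):
    """(n - 1) / sum of distances to all other nodes; assumes a connected graph."""
    result = {}
    for node in graph:
        total = sum(bfs_distances(graph, node).values())
        result[node] = (len(graph) - 1) / total if total else 0.0
    return result

def betweenness_centrality(graph):
    """Brandes-style accumulation of shortest-path counts (unnormalized)."""
    score = {node: 0.0 for node in graph}
    for s in graph:
        order = []                              # nodes in BFS visiting order
        preds = {node: [] for node in graph}    # predecessors on shortest paths
        sigma = {node: 0 for node in graph}     # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        delta = {node: 0.0 for node in graph}
        for w in reversed(order):
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each undirected pair was counted once from each endpoint.
    return {node: value / 2 for node, value in score.items()}
```

On a simple chain a-b-c-d, the two middle nodes come out ahead on all three measures, which matches the intuition that they hold the chain together.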

To complicate centrality further, we have a measure called ‘eigenvector centrality’, which considers the influence of a node in the network. This method takes into account the power of the nodes to which the current node is connected. To explain it simply, my being connected to 500 people on LinkedIn is different from Barack Obama being connected to 500 of his friends on LinkedIn. His 500 connections are (probably) more influential than my 500 connections. Google’s PageRank is a variant of eigenvector centrality.
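Eigenvector centrality is usually computed by repeated averaging, known as power iteration: every node keeps taking on the scores of its neighbors until the scores settle. The sketch below is my own illustration on an undirected graph stored as a dictionary of neighbor lists. Adding each node’s own score in every round is an assumption I make so that the iteration also settles on bipartite graphs such as a star; it does not change the final ranking.

```python
def eigenvector_centrality(graph, iterations=100, tol=1e-9):
    """Power iteration on an undirected graph given as {node: [neighbors]}.

    Scores are scaled so that the most central node has score 1.0.
    """
    score = {node: 1.0 for node in graph}
    for _ in range(iterations):
        # New score = own score + sum of the neighbors' scores.
        new = {
            node: score[node] + sum(score[n] for n in graph[node])
            for node in graph
        }
        norm = max(new.values())
        new = {node: value / norm for node, value in new.items()}
        # Stop once no score moves by more than the tolerance.
        if max(abs(new[n] - score[n]) for n in graph) < tol:
            return new
        score = new
    return score
```

The point of the measure shows up when two nodes have the same number of connections: the one whose connections are themselves well connected ends up with the higher score.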

When an external factor is also considered for each node, and eigenvector centrality is extended with a parameter α to take that external factor into account, it is called ‘alpha centrality’.

When we extend the alpha centrality measure from a node’s immediate neighbors to cover multiple radii (first degree, second degree and so on), with factors β(i), and measure centrality as a function of influence at those varying degrees, it is called ‘beta centrality’.

The key problem with centrality computation is the amount of computing power needed to arrive at the beta centrality measure of a social network with millions of nodes. I recently came across this paper – which proposes an alternative approximation algorithm that is computationally efficient and gives a fairly accurate estimate of centrality. This alter-based, non-recursive method works well on non-bipartite networks and is well suited to social networks.

The title of this post mentions “power”, yet the content so far has not said anything about it. Generally, centrality is considered an indicator of power or influence. But in some situations power is not directly proportional to centrality. Think about it.

Trust modeling in social media

June 14, 2013


After last week’s “tie strength” post, this week let me give some fundamentals on the importance of modeling TRUST in social media.

What is Trust?
It is difficult to define. But questions like “Will you loan a moderate amount to the other person?” or “Will you seek a reference or recommendation from them regarding a key decision?” help in understanding the term TRUST.

There are two components to TRUST. The first is the trusting nature of the person: some people are more trusting than others, and some establish trust quickly whereas others take a long time to establish it. This component is not easy to model. The second component is the credibility of the trusted person.

Measuring Trust:
In social media, the second component can be measured by analyzing the sentiment with which others reference a person’s blogs. This is called “network based trust inference”.

This paper describes a model for measuring trust using link polarity.

Have a good weekend reading!