Archive for the ‘Technology’ Category

Accelerating Analytics using “Blink” aka “BLU acceleration”

April 5, 2013
This Friday marks the completion of my two years in my second innings with TCS's Technology Excellence Group, and it is time for a technical blog post.
This week, IBM announced the new "BLU Acceleration"-enabled DB2 10.5, which claims a 10- to 20-times performance improvement out of the box. (Ref: http://ibmdatamag.com/2013/04/super-analytics-super-easy/ )
This post aims to give a brief summary of the Blink project, which brought this acceleration to analytic queries.
The Blink technology has two primary components that achieve this acceleration of analytic processing:
1. Compression at load time
2. Query processing
Compression & Storage:
At load time, each column is compressed using "frequency partitioning": an order-preserving, fixed-length dictionary encoding. Each partition of the column has a dictionary of its own, which allows shorter column codes. Because the encoding preserves order, comparison operators/predicates can be applied directly to the encoded values without decompressing them.
Rows are packed from the bit-aligned column codes into byte-aligned banks of 8, 16, 32, or 64 bits for efficient ALU operations. This bank-major storage is combined into blocks that are then loaded into memory (or storage). The bank-major layout exploits the SIMD (Single Instruction, Multiple Data) capability of IBM's modern POWER processors.
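To make the idea concrete, here is a minimal Python sketch of order-preserving dictionary encoding with frequency partitioning. It is only an illustration of the concept described above (the column data, the two-way split, and the bit widths are all made up), not IBM's actual Blink/BLU implementation.

from collections import Counter
import math

def frequency_partition(values, num_partitions=2):
    # Rank distinct values by descending frequency so that the most common
    # values land in the smallest (shortest-code) dictionary partition.
    ranked = [v for v, _ in Counter(values).most_common()]
    size = math.ceil(len(ranked) / num_partitions)
    return [ranked[i:i + size] for i in range(0, len(ranked), size)]

def build_dictionaries(partitions):
    # Each partition gets its own sorted dictionary; codes are assigned in
    # sort order, so comparing codes is equivalent to comparing values.
    dicts = []
    for part in partitions:
        ordered = sorted(part)
        dicts.append({"encode": {v: i for i, v in enumerate(ordered)},
                      "bits": max(1, math.ceil(math.log2(len(ordered))))})
    return dicts

column = ["DE", "US", "US", "IN", "US", "DE", "BR", "IN", "US"]
dicts = build_dictionaries(frequency_partition(column))

# Evaluate the predicate country >= 'IN' directly on the codes, partition by
# partition, without decoding any value.
for d in dicts:
    by_code = sorted(d["encode"], key=d["encode"].get)
    boundary = next((d["encode"][v] for v in by_code if v >= "IN"), None)
    matches = [] if boundary is None else by_code[boundary:]
    print(f"{d['bits']}-bit codes, values matching >= 'IN': {matches}")

Because the frequent values get the shortest codes and predicates run on the codes themselves, scans touch far fewer bytes.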
Query Processing:
In Blink there are no indexes, no materialized views, and no run-time query optimizer, so it is simple. However, each query must be compiled to handle the different encoded column lengths of each horizontal partition of the data.
Each SQL statement is split into a series of single-table queries (STQs) that perform scans with filtering. All joins are hash joins. On a typical snowflake schema these scans proceed in an outside-in fashion, creating intermediate hybrid STQs.
Blink executes these STQs over multiple blocks, assigning them to threads that each run on a processor core. Since most modern ALUs can operate on 128-bit registers, the operations are bit-level operations exploiting SIMD, which makes processing fast.
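The query-processing idea can also be sketched in a few lines of Python. The star-schema tables and column names below are invented for illustration; the point is only to show the outside-in shape of the plan: each dimension is reduced by a single-table scan with filtering, the qualifying keys become hash tables, and a final scan of the fact table probes them, so every join is a hash join.

# Tiny assumed snowflake/star schema: one fact table, two dimensions.
date_dim   = [{"date_id": 1, "year": 2012}, {"date_id": 2, "year": 2013}]
store_dim  = [{"store_id": 10, "region": "EU"}, {"store_id": 11, "region": "US"}]
sales_fact = [
    {"date_id": 1, "store_id": 10, "amount": 100.0},
    {"date_id": 2, "store_id": 10, "amount": 250.0},
    {"date_id": 2, "store_id": 11, "amount": 400.0},
]

# STQ 1 and STQ 2: single-table scans with filtering on the outer dimensions,
# producing hash tables of the qualifying join keys.
dates_2013 = {row["date_id"] for row in date_dim if row["year"] == 2013}
eu_stores  = {row["store_id"] for row in store_dim if row["region"] == "EU"}

# Final STQ: one scan of the fact table that probes both hash tables
# (hash joins) while aggregating.
total = sum(row["amount"] for row in sales_fact
            if row["date_id"] in dates_2013 and row["store_id"] in eu_stores)
print(total)  # 250.0 -> EU sales in 2013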
For more technical details of the Blink project, refer to http://sites.computer.org/debull/A12mar/blink.pdf
I hope this will give analytics a boost and bring some competition to Oracle's Exa- appliances. Views, comments?

Data Replication

April 12, 2012

The need for data replication is evergreen. The use case may vary from high availability to operational reporting (BI) to real-time analytics; whatever the case, the need to replicate data exists in information systems and solutions.

In this post, I try to summarize the evolution of data replication technology from an Oracle Database standpoint.

  1. Looking back at Oracle-based data replication technologies, the first is the "Database Link". Starting with Oracle 5 or 6, one could create a database link and pull or push data directly into a remote database. This is the earliest method of replication, in which the application has to push or pull the data from the remote database and apply the necessary logic to identify what data has changed and what to do with those changes (see the sketch after this list).
  2. The next improvement in replication arrived around Oracle 8: trigger-based replication. Whenever a transaction changes the data in a table, a trigger can invoke a function that handles the replication without changing the application. So the database started providing a way to replicate data using triggers.
  3. The next improvement came around 9.2 with Streams and log-based replication: the ability to mine the redo logs and move committed transactions to the target systems. (The DBMS_CDC_PUBLISH and DBMS_CDC_SUBSCRIBE packages were introduced.)
  4. Oracle Advanced Queuing enhanced Streams with a robust publish/subscribe replication model based on enqueue/dequeue communication. I was involved in a very large project that set up a custom changed data capture to migrate data from a legacy system into SAP ERP, involving large tables with 100 million records.
  5. Later, the log-mining technology was used for physical and logical standby databases and evolved into the Active Data Guard technology.
  6. With GoldenGate, Oracle's heterogeneous log-based data replication solution is complete, with capabilities to extract, replicate, and synchronize data with bi-directional movement.
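As a concrete illustration of the first item above, here is a minimal sketch of application-driven replication, in which the application itself identifies changed rows via a last-modified timestamp and applies them to the target. SQLite stands in for the two databases, and the table, columns, and sync logic are assumptions made up for this example rather than any Oracle feature.

import sqlite3

source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")
for db in (source, target):
    db.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, updated_at TEXT)")

source.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                   [(1, "Asha", "2012-04-01"), (2, "Ravi", "2012-04-10")])

def replicate_changes(last_sync):
    # Pull rows changed since the last sync and upsert them into the target;
    # the application, not the database, decides what has changed.
    changed = source.execute(
        "SELECT id, name, updated_at FROM customers WHERE updated_at > ?",
        (last_sync,)).fetchall()
    target.executemany(
        "INSERT OR REPLACE INTO customers (id, name, updated_at) VALUES (?, ?, ?)",
        changed)
    target.commit()
    return len(changed)

print(replicate_changes("2012-04-05"))  # -> 1 (only the row updated on 2012-04-10)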

Depending on the need, one should choose the right technology to achieve the required data replication.

Maximum Security Architecture

May 1, 2010

In a past post (February 2008), I simply listed the different technologies available in the data security area.

With the Oracle 11g database, the security focus has taken a more methodical and architectural approach.

To put things together, data security falls under the following four broad categories:
1. User Management
2. Access Control
3. Encryption and Masking
4. Auditing/Monitoring

Just like the Maximum Availability Architecture for highly available architectural patterns, we can call this the Maximum Security Architecture for highly secure architectures.

One should choose the required options and implement them properly to really make the data SECURE!

This link gives more details of MSA on the Oracle 11g database.

Platform As A Service – Private Cloud

February 17, 2010

Some time back, Oracle published this white paper.

This gives a clear vision of where Oracle is heading in the technological direction of Cloud Computing.

With standardization of the technology stack for enterprise applications and the Fusion Middleware enhancements, PaaS seems the natural direction for enterprises in the medium term.

Link to my past blog post on Private Clouds Here.

SOA Governance standalone? or merge with IT Management?

February 9, 2010

"Governance" in general, and SOA governance in particular, has been a buzzword for some time now.

My definition of governance is making sure that a "thing" works the way it is supposed to work. Replace the "thing" with SOA, IT, a state, a country, etc.

One of the emerging trends of the past few years is "Business Transaction Management" for managing the configuration, performance, and governance of composite IT applications.

Traditionally, IT management tools and frameworks have viewed IT through disciplines such as operations management, quality management, performance management, and configuration management.

In this trend, the problem of IT management is approached from a business point of view. That is good. So, what is happening in the technology?

1. Application Performance Management
2. Composite Application Management using run time discovery of relationships between the components (services) of the application
3. Business transaction management tools

Here is the question:

Will the new trend merge into the traditional IT management tools?
or
Will a standalone set of new SOA Governance tools emerge because of it?

Based on my "UNIX" philosophy, I think an integrated toolset that combines traditional IT management with the new SOA Governance would be the best fit for the needs of contemporary corporations!

Let us wait and see….

Hybrid Columnar Compression (HCC)

February 3, 2010

Oracle has introduced a new data storage technology in Exadata V2 called Hybrid Columnar Compression. This method uses a "compression unit" made up of several rows, physically stored as compressed "column vectors" spanning multiple storage blocks.

There are two flavors of columnar compression: "QUERY" (warehouse compression, better for executing queries) and "ARCHIVE" (archive compression, best for data storage).

Both QUERY and ARCHIVE take a further modifier to set the compression mode to "LOW" or "HIGH".
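The following rough sketch shows why the column-vector layout helps. It is only an analogy using Python's zlib, not Oracle's actual on-disk format, codecs, or compression levels: a batch of rows (the "compression unit") is pivoted into per-column vectors and each vector is compressed separately. Similar values sit next to each other, so they compress well, and a higher compression level (the ARCHIVE-style setting here) trades CPU for space.

import zlib

# A made-up compression unit: a batch of rows with repetitive column values.
rows = [("ACME", "US", 19.99), ("ACME", "US", 24.99), ("ZETA", "DE", 19.99)] * 100

# Pivot from row-major storage into one vector per column.
column_vectors = list(zip(*rows))
raw_size = len(repr(rows).encode())

for level, label in [(1, "QUERY-like (fast)"), (9, "ARCHIVE-like (max)")]:
    packed = sum(len(zlib.compress(repr(vec).encode(), level)) for vec in column_vectors)
    print(f"{label}: {raw_size} bytes of rows -> {packed} bytes of column vectors")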

This technical white paper gives an overview of the new technology.

This article on Oracle Magazine may also be useful in understanding this new feature.

More data can be stored in “less” space. Good!

Sizing a data center management solution

December 9, 2009

Introduction:

A typical data center management solution will have a “data store” for storing
a. configuration data
b. monitoring data

At regular intervals, it collects this data and uploads it to the central data store.

A host, database, listener, application server instance, etc., are the managed entities that need monitoring and management.

Before implementing the solution, every organization will try to “size” the infrastructure requirements for the solution.

The sizing involves estimating the solution's consumption of:
1. disk storage
2. memory
3. CPU cycles
4. network bandwidth

The Problem:
The complexity involved in sizing a data center management solution is mainly due to a lack of clarity in the definitions.
For example, a customer wants to monitor 10 databases on 10 servers (one database on each server).
Depending on the database version, whether it uses ASM for storage or sits on a cold-failover cluster, and several other considerations, the number of managed entities will vary. A database can have a single tablespace with one datafile, or 10K tablespaces with 100K datafiles. If the customer wants to monitor database space usage by tablespace, a database with 10K tablespaces will produce 10K times more data than a database with one tablespace.

Each metric (a monitored data point) can be collected once every 5 minutes or once an hour; the collection frequency varies with customer requirements.

A simple formula: the number of metrics (counted at the lowest granule) * the number of collections per day gives the total number of metric values per managed entity per day. Multiplying this by the bytes required to store an average metric value gives the bytes required to store the metric values per managed entity per day.

If the customer wants to keep data at this granularity for a week, then the storage requirement is the number of days of retention at raw granularity * the bytes required to store one day's data. Multiplying this by the number of managed entities gives the total storage requirement for this type of managed entity (e.g., database).

The same exercise needs to be repeated for all types of managed entities to get the overall storage requirement.

Collecting that many bytes over a day also means transferring that data over the network from the host where the managed entity resides to the host where the management data store resides. That gives the network bandwidth requirement.
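The storage and bandwidth arithmetic above can be captured in a few lines. The numbers below (60 metrics per entity, 50 bytes per value, and so on) are purely illustrative assumptions, to be replaced with real figures from the tool vendor and the customer.

metrics_per_entity      = 60        # metrics at the lowest granule, per entity
collections_per_day     = 24 * 4    # one collection every 15 minutes
bytes_per_metric_value  = 50        # average stored size of one metric value
raw_retention_days      = 7         # keep raw-granularity data for a week
entity_count            = 1000      # managed entities of this type (e.g. databases)

values_per_entity_per_day = metrics_per_entity * collections_per_day
bytes_per_entity_per_day  = values_per_entity_per_day * bytes_per_metric_value
storage_for_entity_type   = raw_retention_days * bytes_per_entity_per_day * entity_count

# The same daily volume must travel over the network from the monitored hosts
# to the central data store, which gives an average bandwidth figure.
daily_upload_bytes  = bytes_per_entity_per_day * entity_count
avg_bandwidth_bytes = daily_upload_bytes / 86_400   # seconds in a day

print(f"raw storage for this entity type: {storage_for_entity_type / 2**30:.2f} GiB")
print(f"average upload bandwidth        : {avg_bandwidth_bytes / 1024:.1f} KiB/s")

Repeat the same calculation per managed-entity type, and then add the roll-up, configuration, and job-output storage described below on top of it.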

This data needs to be rolled up into hourly and daily averages for historical trending. One needs to calculate the space and processing requirements of this roll-up process.

Old data needs to be purged from the data store, which also takes processing cycles.

All managed entities also have a set of configuration data that needs to be collected, compared on a regular basis, and kept up to date. One needs to compute the resources required to collect, compare, and store the original and changed values of the different configurations as they evolve.

Managing all the "heartbeats" between the central management solution and the monitored hosts requires a proactive two-way mechanism, which needs processing capacity.

Monitoring websites with synthetic transactions, i.e., periodically replaying a pre-recorded transaction to make sure the application works, requires additional processing, storage, memory, etc. This needs to be added to the above estimate.

A provisioning solution used to deploy new instances of databases and application servers needs the GOLD IMAGES to be stored somewhere. Patching enterprise applications needs a considerable amount of patch download space and propagation at regular intervals. This also needs to be considered.

Another thing to consider is the periodic tasks (jobs) that execute routine operations. Each job may produce a few KB of output that needs to be collected and stored in the central store for a period of time.

The next thing to consider is the number of users of the application, the number of concurrent users, and the amount of data they retrieve from the data store to perform their operations.

Collecting all this information and adding the block-header, segment-header, and other overheads of the physical structure of the target solution is a complex task. By the time this exercise is nearly complete, the tool vendor will have released the next version of the data center management tool with some modifications!!

The Solution:
It is nearly impossible to "size" any application to the last byte of accuracy for storage, memory, network, and CPU utilization.

Instead, set ceilings: no managed entity should generate more than 25 MB of data in the monitoring store, and no managed entity should have more than 100 KB of configuration data.
So, allocating 250 GB of storage for 1,000 monitored entities and configuring the solution such that it does not exceed this requirement is a wise man's solution.
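A quick check of that rule of thumb, using only the ceilings stated above:

entities          = 1000
monitoring_cap_mb = 25     # per-entity ceiling for monitoring data
config_cap_kb     = 100    # per-entity ceiling for configuration data

raw_gb = (entities * monitoring_cap_mb) / 1024 + (entities * config_cap_kb) / (1024 * 1024)
print(f"data at the caps: ~{raw_gb:.1f} GB")   # ~24.5 GB

# Provisioning 250 GB therefore leaves roughly 10x headroom for roll-ups,
# indexes, block/segment overheads, and growth.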

Considering that 1,000 monitored entities will have at most 100 administrators, with at most 30 of them concurrently logged in, an average OLTP database application with a 250 GB database and 24 * 4 (assuming 4 collections an hour, i.e., a 15-minute collection frequency) * 1,000 (number of entities) metric uploads per day would require a 2-processor database server and a 2-processor application server with 4 GB RAM each.

Starting with this configuration and implementing the solution iteratively is the best approach to balance up-front sizing against not impacting the service level of this critical service.

Conclusion:

In the current world of virtualization and cloud technology, it is easier to build scalable, grid-like applications, and scaling solutions horizontally is a trend that is here to stay and will pick up further momentum. Data center management solutions are not exempt from this trend. For every additional 1,000 monitored entities, we will add a node to the DB grid and another node to the application server grid, along with additional storage to the storage grid.

IS IT NOT??

Improved Application Performance Management

December 2, 2009

In April 2008 I wrote this post: Application Performance Management

With
a. Oracle Enterprise Manager 10.2.0.5 Service Level Management,
b. REUI 6.0,
c. the SOA Management Pack, and
d. the Diagnostic Pack for Middleware,
the solution is now available.

A corporation can now implement a completely integrated APM solution based on the above four components of Oracle Enterprise Manager.

The new capabilities of REUI 6.0 include:
a. Full capture and replay of end user sessions.
b. The ability to link a slow-performing JSP to the overall configuration in the SOA Management Pack and then jump directly into the diagnostic tool (AD4J).
c. Customizable dashboards from the collected data.

What is REUI? "Real End User Experience Insight"! And what is 6.0? The new version, released on 1 December 2009.

The ACTIVE MONITORING functionality is already provided by Enterprise Manager Service Level Management pack.

We hope these new tools will truly improve the Real End User Experience of the web applications.

Exadata V2 – World’s First OLTP database machine

October 20, 2009

Last year, the Exadata Database Machine was announced, improving data warehouse performance by moving database query processing to the storage layer.

This year, the new Exadata V2 is claimed to be the fastest database machine for OLTP applications.

The trick lies in the flash solid-state storage and the 40 Gb/s InfiniBand internal network fabric between the database servers and the storage servers. The full rack contains 8 database servers and 14 storage servers; other configurations are available.

All the flash storage (600 GB per storage server) is treated as extended memory of the servers, and each database server has 72 GB of DDR3 memory as cache.

Guess what, an open challenge to IBM can be found at http://www.oracle.com/features/exadatachallenge.html

So, why not try out for a $10 Million prize?

Oracle Open World 2009 – a mega event

September 4, 2009

Three years back, I was at Oracle Open World 2006, staffing a demo pod on "Deployment Best Practices for Enterprise Manager".

This year I am back at the event, presenting one session and three hands-on lab sessions to attendees.

Link to the Sessions

S309533 – Applications Management Best Practices with Oracle Enterprise Manager
Oracle Enterprise Manager Hands-on Lab: Oracle Application Testing Suite
Oracle Enterprise Manager Hands-on Lab: Oracle Data Masking Pack
Oracle Enterprise Manager: Oracle Real Application Testing

Event dates: October 11-15 2009
Catch me there!!