Archive for the ‘in memory’ Category

Accelerating Analytics using “Blink” aka “BLU acceleration”

April 5, 2013
This Friday marks completion of my 2 years in the second innings with TCS ‘s Technology Excellence Group and it is time for a technical blog post.
During this week, I have seen IBM announcing new “BLU acceleration” enabled DB2 10.5 that claims a 10 to 20 times performance improvement out of box.  (Ref: http://ibmdatamag.com/2013/04/super-analytics-super-easy/ )
This post aims at giving a brief summary of the Blink Project which has brought in this acceleration to the analytic queries.
The Blink technology has primarily two components that achieve the said acceleration to the analytic processing:
1.       The compression at the load time
2.       The query processing
Compression & Storage:
At load time each column is compressed using a “Frequency Partitioning” order preserving fixed length dictionary encoding method. Each partition of the column has a dictionary of its own making it to use shorter column codes. As it preserves order the comparison operators/predicates can be applied directly to the encoded values without needing to uncompress them.
Rows of are packed using the bit aligned columns to a byte aligned banks of 8, 16, 32 or 64bits for efficient ALU operations. This bank-major storage is combined to form blocks that are then loaded into the memory (or storage.) This bank-major storage exploits SIMD (Single Instruction, Multiple Data) capability of modern POWER processor chips of IBM.
Query Processing:
In Blink there are no indexes, no materialized views nor a run-time query optimizer. So, it is simple. But the query must be compiled to take care of different encoded column lengths of each horizontal partition of the data.
Each SQL is split into a series of single-table queries (STQs) which does scans with filtering. All the joins are hash joins. These scans happen in an outside-in fashion on a typical snowflake schema creating intermediate hybrid STQs.
Blink executes these STQs in multiple blocks to threads each running on a processor core. As most modern ALUs can operate on 128bit registers all the operations are bit operations exploiting SIMD which makes the processing fast.
For more technical details of Blink project refer to – http://sites.computer.org/debull/A12mar/blink.pdf
Hope this will bring “Analytics” a boost and some competition to Oracle’s Exa- appliances. Views, Comments?

in memory computing

February 14, 2012

Approximately two years back I made a post on Enterprise Data Fabric technology. The aim of the data grid or “in memory data store” is to remove the movement of data in and out of slower disk storage for processing. Instead the data in kept in “pooled main memory” during the processing.

To get above the physical limitations on the amount of main memory, the data grid technologies will create a pooled memory cluster with data distributed over multiple nodes connected using a high bandwidth network.

With SAP bringing HANA, an in memory database that has option to store data in traditional row store and column store (read storing data in rows and columns) within an in-memory technology and Oracle bringing the Exaletics appliance, the in-memory computing is getting more attention.

So, the claims are that the in memory technology will boost the performance by multiple degrees. But the truth is it can only remove the time taken to move the data out of disk into main memory. If there is a query that is processing the data using a wrong access method, even if all the data is moved into a memory store the processing will still take as long to provide the answer!

In memory computing would need re designing the applications to use the technology for better information processing. OLTP workload will surely improve the performance due to memory caching but the consistency of the data need to be managed by the application moving to a event based architecture.

OLAP and Analytical workloads would also improve the performance by using memory based column stores with a good design of the underlying structure of data that suits the processing requirements.

Overall, in memory computing is promising at the moment but without the right design to use the new technology, the old systems will not just get the performance boost just by moving the data store into the main memory

Let us wait and see how the technology shapes further in future…..