Archive for the ‘nosql’ Category

F1 Database from Google: A scalable distributed SQL database

August 30, 2013

 

This world is a sphere. We keep going round and round. After the great hype around highly distributed NoSQL databases, Google has now presented a paper at the 39th VLDB conference on how they implemented a highly scalable SQL-based database to support their “AdWords” business.


The news item: http://www.theregister.co.uk/2013/08/30/google_f1_deepdive/
and the paper: http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/pubs/archive/41344.pdf


The key changes I liked:
1. Hierarchically clustered physical schema model: I always thought a hierarchical model is better suited to real-life applications than a pure relational model. This implementation is proving it.

2. Protocol Buffers: Columns can hold structured types. This saves a lot of ORM-style conversion when moving data between storage and memory.
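
To make the first point concrete, here is a minimal sketch of hierarchical clustering as I understand it from the paper: a child row's primary key starts with its parent's key, so sorting by key places each parent next to its children. The table names, keys and payloads are illustrative assumptions, and plain Python tuples stand in for the real storage layer.

```python
import bisect

# Each row is (primary_key, table_name, payload).
# Assumption for illustration: a child's key is its parent's key plus one more part.
rows = [
    ((2,), "Customer", "globex"),
    ((1, 20), "Campaign", "summer"),
    ((1,), "Customer", "acme"),
    ((1, 10, 100), "AdGroup", "shoes"),
    ((1, 10), "Campaign", "spring"),
    ((2, 30), "Campaign", "launch"),
]

# Sorting by key clusters every customer with its campaigns and ad groups.
clustered = sorted(rows, key=lambda r: r[0])
keys = [r[0] for r in clustered]


def subtree(root_key):
    """Return the root row and all its descendants as one contiguous slice."""
    start = bisect.bisect_left(keys, root_key)
    end = start
    while end < len(keys) and keys[end][:len(root_key)] == root_key:
        end += 1
    return clustered[start:end]


for key, table, payload in subtree((1,)):
    print(key, table, payload)
```

Everything belonging to customer 1 comes back from a single range scan, with no join across separately stored tables.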


A quote from the paper’s conclusion:

In recent years, conventional wisdom in the engineering community has been that if you need a highly scalable, high-throughput data store, the only viable option is to use a NoSQL key/value store, and to work around the lack of ACID transactional guarantees and the lack of conveniences like secondary indexes, SQL, and so on. When we sought a replacement for Google’s MySQL data store for the AdWords product, that option was simply not feasible: the complexity of dealing with a non-ACID data store in every part of our business logic would be too great, and there was simply no way our business could function without SQL queries.

So, ACID is needed and SQL is essential for running the businesses! Have a nice weekend!!

Storing Rows and Columns

December 4, 2011

A fundamental requirement of a database is to store and retrieve data. In Relational Database Management Systems (RDBMS) the data is organized into tables that contain rows and columns. Traditionally the data is stored in blocks of rows. For example, a “sales transaction row” may have 30 data items representing 30 columns. Assuming a record occupies 256 bytes, a block of 8KB can hold 32 such records. Again assuming a million such transactions per day, they need 31,250 blocks per day. All this works well as long as we need the data as ROWS! When we want to access one row or a group of rows at a time to process that data, this organization has no issues.
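
The block arithmetic above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: it assumes fixed-size records and ignores block headers and free space, which a real engine would reserve.

```python
# Figures taken from the paragraph above.
ROW_BYTES = 256
BLOCK_BYTES = 8 * 1024
ROWS_PER_DAY = 1_000_000

# Whole rows that fit in one block (assumes no per-block overhead).
rows_per_block = BLOCK_BYTES // ROW_BYTES

# Ceiling division: a partly filled last block still counts as a block.
blocks_per_day = -(-ROWS_PER_DAY // rows_per_block)

print(rows_per_block)   # 32
print(blocks_per_day)   # 31250
```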

Let us consider a case where we want a summary of the total value of type x items sold in the past seven days. This query needs to retrieve 7 million records that contain 30 columns each just to total the value of items of type x. All that we need are two columns, item type and amount, to process this. This type of analytical requirement leads us to store the data in columns. We group each column's values together and store them in blocks. This speeds up retrieval of the needed columns from the overall table for the purpose of analyzing the data.
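
Here is a small sketch of the same query against both layouts. The three-column rows and their values are made up for illustration (a real sales row would carry all 30 columns), and Python lists stand in for disk blocks.

```python
# Row layout: each record keeps all of its columns together.
rows = [
    {"txn_id": 1, "item_type": "x", "amount": 10.0},
    {"txn_id": 2, "item_type": "y", "amount": 4.5},
    {"txn_id": 3, "item_type": "x", "amount": 2.5},
]


def total_from_rows(rows, wanted_type):
    # Every full row is read, even though only two fields are used.
    return sum(r["amount"] for r in rows if r["item_type"] == wanted_type)


# Column layout: one list per column, values kept in the same row order.
columns = {name: [r[name] for r in rows] for name in rows[0]}


def total_from_columns(columns, wanted_type):
    # Only the two columns the query needs are touched.
    pairs = zip(columns["item_type"], columns["amount"])
    return sum(amount for item_type, amount in pairs if item_type == wanted_type)


print(total_from_rows(rows, "x"))
print(total_from_columns(columns, "x"))
```

Both functions return the same total; the difference is how much data has to be read to get there.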

But column storage has its limitations when it comes to writes and updates.

With high volumes of social data, where a high volume of writes is needed (messages, status updates, likes, comments, etc.), highly distributed NoSQL-based column stores are emerging into the mainstream. Apache Cassandra is one of this new breed of NoSQL column stores; it was initially developed by Facebook.

So, we have a variety of databases / data stores available now: standard RDBMS engines with SQL support for OLTP applications, column-based engines for OLAP processing, NoSQL-based key-value stores for in-memory processing, highly clustered Hadoop-style stores with the map/reduce framework for big data processing, and NoSQL-based column stores for high-volume social write and read efficiency.


Making the right choice of data store for the problem at hand is becoming tough with so many solution options. But that is the job of an architect; is it not?

ACID and BASE of data

October 9, 2011

I am completing 18 years of working in the field of Information Technology.

All these years, an enterprise data store has generally provided four qualities to its transactions: Atomicity, Consistency, Isolation and Durability (ACID). Oracle has emerged as a leader in providing enterprise-class ACID transactional capabilities to applications.
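
As a small illustration of the "A" in ACID, here is a sketch using Python's built-in sqlite3 module (chosen only because it needs no setup; it is not the Oracle stack discussed here). The account table and the transfer function are hypothetical. A transfer that would overdraw an account fails halfway, and the half-done credit is rolled back with it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("a", 100), ("b", 0)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst; either both updates apply or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account"))


print(transfer(conn, "a", "b", 150), balances(conn))  # overdraft: nothing changes
print(transfer(conn, "a", "b", 40), balances(conn))   # valid: both rows change
```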

Recently, at OpenWorld 2011, Oracle announced a NoSQL database, a class of data store typically characterized by the BASE acronym: Basically Available, Soft state, Eventually consistent.

I have seen a lot of debate of late on SQL vs NoSQL, ACID vs BASE and Shared Everything vs Shared Nothing architectures of data stores; with Oracle getting on to the NoSQL bandwagon, this debate has just picked up additional momentum.

Oracle has posted this paper, which nicely explains their NoSQL database: http://www.oracle.com/technetwork/database/nosqldb/learnmore/nosql-database-498041.pdf

In my opinion, the choice between SQL and NoSQL is straightforward to make:

big query: Are we storing data or BIG-DATA (read my old post on transactional data vs machine generated big data – http://technofunctionalconsulting.blogspot.com/2011/02/analytics.html)

With the new trends in ‘BIG DATA’, almost all the data becomes key-value pairs with read and insert only operations and minimal or no updates to the data records. NoSQL/BASE is best suited to handle this type of data. Traditional transactional databases of an OLTP nature still need ACID-compliant transactions.
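
The "read and insert only" access pattern can be sketched as a tiny in-memory store. The class and its method names are hypothetical and do not follow any particular NoSQL product's API; the point is only that records are appended and never changed in place.

```python
class AppendOnlyStore:
    """Toy key-value store: records are appended, never updated in place."""

    def __init__(self):
        self._log = []    # (key, value) pairs in arrival order
        self._index = {}  # key -> positions of that key's records in the log

    def insert(self, key, value):
        self._index.setdefault(key, []).append(len(self._log))
        self._log.append((key, value))

    def read(self, key):
        """Return every value inserted under key, oldest first."""
        return [self._log[pos][1] for pos in self._index.get(key, [])]


store = AppendOnlyStore()
store.insert("sensor-1", 20)
store.insert("sensor-2", 7)
store.insert("sensor-1", 21)
print(store.read("sensor-1"))
```

Because nothing is rewritten, a new reading never has to lock or modify an older one, which is part of what makes this kind of data easy to spread across many machines.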

So, when designing big data solutions, an architect should surely look at the NoSQL dataBASE. Is it not?

Publishing this post on 09/10/11 (dd/mm/yy) and this is my 85th post to this blog.