Bigtable in Cloud computing

Bigtable:

The expansion of Internet facilities and digital communications has created a widespread demand for storage systems that provide efficient access to stored data. Conventional database systems are relational: they consist of several tables made up of rows and named columns. These systems generally guarantee the ACID properties, that is, atomicity, consistency, isolation and durability.

However, with large-scale datasets these properties are difficult to maintain while also providing good availability and tolerance of network partitions. In very large and highly distributed environments, databases that enforce ACID properties therefore become difficult to manage, and this creates the need for alternative databases that can deliver high availability and performance.

Bigtable is one such alternative database, designed specifically for storing very large amounts of data. It is a storage system for distributed environments, organized as one large table. This table can hold petabytes of data, spread across several thousand machines. Bigtable is capable of serving millions of queries from millions of users while storing vast numbers of images and other pieces of information.

The credit for developing Bigtable goes to Google. Google built Bigtable around 2005 and put it to use in several of its services. According to Google’s white paper, the definition of Bigtable is as follows:

“Bigtable is a distributed storage system for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers. It is a sparse, distributed, persistent multi-dimensional sorted map. The map is indexed by a row key, column key and timestamp; each value in the map is an uninterpreted array of bytes.”

Structure of Bigtable:

Bigtable can be visualized as a map that is indexed by three values:

  • Row key
  • Column key
  • Timestamp
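
As a rough illustration of this indexing scheme, the following minimal Python sketch models the table as a dictionary keyed by (row key, column key, timestamp), with raw bytes as values. The names put and get, as well as the sample keys and values, are assumptions made for this example only and are not part of Bigtable's actual API.

```python
from typing import Dict, Optional, Tuple

# Hypothetical alias for this sketch: (row key, column key, timestamp).
CellKey = Tuple[str, str, int]

# The whole table is modelled as one map from a three-part key to bytes.
table: Dict[CellKey, bytes] = {}


def put(row: str, column: str, timestamp: int, value: bytes) -> None:
    """Store a value under the given row key, column key and timestamp."""
    table[(row, column, timestamp)] = value


def get(row: str, column: str, timestamp: int) -> Optional[bytes]:
    """Return the stored bytes, or None if that exact cell does not exist."""
    return table.get((row, column, timestamp))


# Illustrative data only; the keys and contents are made up.
put("com.example.www", "contents:html", 3, b"<html>v3</html>")
put("com.example.www", "anchor:news.example", 5, b"Example")

print(get("com.example.www", "contents:html", 3))
```

The values are stored as plain bytes on purpose: the sketch never inspects them, and any interpretation is left to the application.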

Every value is itself an array of bytes. Bigtable is a huge structure of rows and columns, accessed through (key, value) pairs: the key identifies a row, and the value is that row's set of columns. The structure of Bigtable has some striking features:

1. Persistent: All data is stored persistently on disk.

2. Sparse: Different rows may use different numbers of columns, and some columns may remain empty for a given row.

3. Distributed: The data stored in the table is spread across many machines, which can number in the tens of thousands.

4. Sorted: Bigtable is structured as a map, that is, an associative array. Associative arrays are generally not sorted, since their values are often accessed by hashing. Bigtable differs here and keeps its records sorted by key. Keys can be chosen so that data records which are related to each other are stored together.

5. Multi-Dimensional: The table is indexed by its rows. Each row consists of one or more column families, and each column family may contain one or more named columns. Column families are normally defined when the table is created. A column family usually stores data of the same type, and the columns within a column family are generally compressed together. The combination of rows, column families and columns gives a three-level naming hierarchy for accessing data.

6. Time-Based: Time is an important parameter in the Bigtable structure. A column family can hold several versions of the same data. To access a particular version, an application specifies its timestamp. If the application does not specify one, it gets the latest version of the requested data.
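
The sparse, sorted, multi-dimensional and time-based properties listed above can be sketched together in a small in-memory model. This is only a toy under stated assumptions: the class TinyTable, its methods put, get and scan, and all sample keys are hypothetical names chosen for this example, and nothing here reflects how Bigtable is implemented or exposed to clients.

```python
import bisect
from typing import Dict, List, Optional

# Versions of one cell: timestamp -> bytes.
Versions = Dict[int, bytes]


class TinyTable:
    """Toy model: row -> column family -> column -> {timestamp: bytes}."""

    def __init__(self) -> None:
        self._row_keys: List[str] = []  # row keys, kept in sorted order
        self._rows: Dict[str, Dict[str, Dict[str, Versions]]] = {}

    def put(self, row: str, family: str, column: str,
            timestamp: int, value: bytes) -> None:
        if row not in self._rows:
            bisect.insort(self._row_keys, row)  # preserve sorted order
            self._rows[row] = {}
        cell = self._rows[row].setdefault(family, {}).setdefault(column, {})
        cell[timestamp] = value  # each timestamp is a separate version

    def get(self, row: str, family: str, column: str,
            timestamp: Optional[int] = None) -> Optional[bytes]:
        # Sparse: a missing row, family or column simply yields None.
        cell = self._rows.get(row, {}).get(family, {}).get(column, {})
        if not cell:
            return None
        if timestamp is None:
            return cell[max(cell)]  # no timestamp given: latest version
        return cell.get(timestamp)

    def scan(self, start: str, end: str) -> List[str]:
        """Return row keys in the half-open range [start, end), in order."""
        lo = bisect.bisect_left(self._row_keys, start)
        hi = bisect.bisect_left(self._row_keys, end)
        return self._row_keys[lo:hi]


t = TinyTable()
t.put("com.example.www", "contents", "html", 1, b"old")
t.put("com.example.www", "contents", "html", 2, b"new")
t.put("org.sample.www", "contents", "html", 1, b"other")
t.put("com.example.mail", "contents", "html", 1, b"mail")

print(t.get("com.example.www", "contents", "html"))     # latest version
print(t.get("com.example.www", "contents", "html", 1))  # explicit version
print(t.get("org.sample.www", "anchor", "link"))        # empty cell
print(t.scan("com.", "org."))                           # related rows together
```

In this sketch, the row keys are written with the most general part first, so that rows belonging to the same site sort next to each other and can be read with one range scan. That key layout is a design choice of the example, shown only to illustrate how key selection keeps related records together.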