Types of Data in Data Science
Types of Data:
Data, and Big Data in particular, is characterized by huge volume, high velocity, and a wide variety of forms. The data falls into four types:
1. Unstructured Data
2. Semi-Structured Data
3. Metadata
4. Structured Data
Any data with an unknown form or structure is classified as unstructured data. Besides being huge in size, unstructured data poses multiple challenges when it is processed to derive value from it. A typical example of unstructured data is a heterogeneous data source containing a combination of simple text files, images, videos, etc. Nowadays organizations have a wealth of data available to them, but unfortunately they often do not know how to derive value from it because the data is in its raw, unstructured form.
Any data that can be stored, accessed and processed in a fixed format is termed ‘structured’ data. Over time, computer science has achieved considerable success in developing techniques for working with this kind of data and for deriving value from it. However, issues are now emerging as the size of such data grows to a huge extent, with typical sizes in the range of multiple zettabytes.
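As a minimal sketch of what a fixed format means in practice, the example below stores a few rows in a relational table using Python's built-in sqlite3 module. The table name, column names and values are hypothetical and chosen only for illustration.

```python
import sqlite3

# In-memory database; the schema below is an illustrative assumption.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, salary REAL NOT NULL)"
)

# Every row must supply the same three columns with the declared types.
rows = [(1, "Asha", 52000.0), (2, "Ravi", 61000.0), (3, "Meena", 58000.0)]
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)", rows)

# Because all rows share one schema, an aggregate query is straightforward.
average_salary = conn.execute("SELECT AVG(salary) FROM employees").fetchone()[0]
print(average_salary)

conn.close()
```

Because the format is fixed in advance, the database can validate, index and query the data without inspecting each record individually.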
Semi-structured data combines features of both structured and unstructured data. It has a recognizable structure, but that structure is not defined by a table definition as in a relational DBMS. Data stored in an XML or JSON file is a common example.
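The following sketch illustrates the idea with a small JSON document, assuming hypothetical records and field names. Each record is self-describing, but the records do not share a single fixed set of fields, so forcing them into table columns leaves gaps.

```python
import json

# Hypothetical records: each one carries its own field names,
# and no two records use exactly the same set of fields.
raw = """
[
  {"id": 1, "name": "Asha", "email": "asha@example.com"},
  {"id": 2, "name": "Ravi", "phones": ["555-0100", "555-0101"]},
  {"id": 3, "name": "Meena", "address": {"city": "Pune"}}
]
"""
records = json.loads(raw)

# The union of all keys is what a fixed table definition would need.
all_keys = sorted({key for record in records for key in record})

# Flatten into table-like rows; fields a record lacks become None.
table = [[record.get(key) for key in all_keys] for record in records]

print(all_keys)
for row in table:
    print(row)
```

The None entries in the flattened rows show why such data does not fit a relational table definition cleanly, even though every record is individually well formed.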
Metadata is data that provides information about one or more aspects of other data. It summarizes basic information about the data, which makes tracking and working with specific data easier.
Types of Metadata:
There are three main types of metadata.
1. Descriptive metadata: It describes a resource for purposes such as discovery and identification. It can include elements such as title, abstract, author and keywords.
2. Structural metadata: It indicates how compound objects are put together.
3. Administrative metadata: It provides information to help manage a resource, such as when and how it was created, file type and other technical information, and who can access it.
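The three types above can be sketched for a single file as follows. This is only an illustration: the descriptive and structural entries are hypothetical values supplied by hand, while the administrative entries are read from the file system with Python's standard os and mimetypes modules.

```python
import mimetypes
import os
import tempfile
from datetime import datetime, timezone

# Create a small throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
    handle.write("Quarterly sales summary\n")
    path = handle.name

stats = os.stat(path)

metadata = {
    # Descriptive: supports discovery and identification (hypothetical values).
    "descriptive": {
        "title": "Quarterly sales summary",
        "author": "unknown",
        "keywords": ["sales", "summary"],
    },
    # Structural: how the parts of a compound object fit together (hypothetical).
    "structural": {"parts": ["body"]},
    # Administrative: technical details and access information from the file system.
    "administrative": {
        "file_type": mimetypes.guess_type(path)[0],
        "size_bytes": stats.st_size,
        "modified": datetime.fromtimestamp(stats.st_mtime, tz=timezone.utc).isoformat(),
        "permissions": oct(stats.st_mode & 0o777),
    },
}

os.remove(path)  # clean up the temporary file

for kind, fields in metadata.items():
    print(kind, fields)
```

Only the administrative part can be gathered automatically here; descriptive and structural metadata usually have to be recorded by whoever creates or catalogues the resource.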
A metadata repository is an integral part of a data warehouse system. It contains the following metadata:
i. Business metadata
ii. Operational metadata
iii. Data for mapping from the operational environment to the data warehouse
iv. Algorithms for summarization