Data Science Terminology
1. Big Data:
As per the Oxford English Dictionary, the definition of Big Data is “data of a very large size, typically to the extent that its manipulation and management present significant logistical challenges.” Big data is a term applied to data sets whose size or type is beyond the ability of traditional relational databases to capture, manage and process the data with low latency.
Big data usually includes data sets whose size is beyond the ability of commonly used software tools to capture, curate, manage and process within a tolerable elapsed time.
2. Business Intelligence (BI):
Business Intelligence is technology that uses transformed and loaded historical data to create reports. It is a set of methodologies, processes, and theories that transform raw data into useful information to help companies make better decisions.
Business Intelligence is a process for analyzing data and presenting actionable information to help executives, managers and other corporate end users make informed business decisions. Common functions of business intelligence technologies include reporting, online analytical processing, analytics, data mining, process mining, complex event processing, business performance management, benchmarking, text mining, predictive analytics and prescriptive analytics.
BI can be used by enterprises to support a wide range of business decisions ranging from operational to strategic. Basic operating decisions include product positioning or pricing. Strategic business decisions involve priorities, goals and directions at the broadest level. Often BI applications use data gathered from a data warehouse (DW) or a data mart.
3. Data Analytics:
Data analytics is a general term for the field and for a comprehensive collection of associated methods. It tends to be used for the application of analytics methods to the data that large organizations generate.
Data analysts collect, process and perform statistical analyses of data. Their skills may not be as advanced as those of data scientists, but their goals are the same: to discover how data can be used to answer questions and solve problems.
4. Data Wrangling:
The process of converting data, often with scripting languages, to make it easier to work with is known as Data Wrangling or Data Munging. If you have 900,000 birth date values in the format yyyy-mm-dd and 100,000 in the format mm/dd/yyyy, and you write a Perl script to convert the latter to look like the former so that you can use them all together, you are doing data wrangling.
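The date conversion described above can be sketched in a few lines. The text mentions Perl; this sketch uses Python instead, and the function name `normalize_birth_date` and the sample values are hypothetical, chosen only for illustration.

```python
from datetime import datetime

# Formats we expect to see in the raw data: yyyy-mm-dd and mm/dd/yyyy.
KNOWN_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")

def normalize_birth_date(value: str) -> str:
    """Return the date as yyyy-mm-dd, whichever known format it arrived in."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue  # not this format, try the next one
    raise ValueError(f"unrecognized date format: {value!r}")

# Hypothetical sample values mixing both formats.
raw_dates = ["1984-03-07", "11/23/1990", "07/04/1976"]
clean_dates = [normalize_birth_date(d) for d in raw_dates]
print(clean_dates)
```

Raising an error on an unknown format, rather than passing the value through, is a deliberate choice here: silent pass-through would leave mixed formats in the cleaned data.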
5. Algorithm:
A series of repeatable steps for carrying out a certain type of task with data. As with data structures, people studying computer science learn about the different algorithms and their suitability for various tasks. Specific data structures often play a role in how certain algorithms get implemented.
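As one illustration of a repeatable series of steps that depends on a data structure, here is a minimal sketch of binary search. It is only an example chosen for this glossary, not a technique the text itself names, and it assumes the input is a list already sorted in ascending order.

```python
from typing import Optional, Sequence

def binary_search(sorted_values: Sequence[int], target: int) -> Optional[int]:
    """Return the index of target in sorted_values, or None if it is absent."""
    low, high = 0, len(sorted_values) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_values[mid] == target:
            return mid
        if sorted_values[mid] < target:
            low = mid + 1   # target can only be in the upper half
        else:
            high = mid - 1  # target can only be in the lower half
    return None

# Hypothetical sample data: the list must already be sorted.
years = [1976, 1984, 1990, 2001, 2015]
print(binary_search(years, 1990))  # 2
print(binary_search(years, 1999))  # None
```

The halving step only works because the data structure keeps its elements in order, which is the kind of dependency between algorithm and data structure described above.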
6. Machine Learning:
Analytics in which computers “learn” from data to produce models or rules that apply to those data and other similar data. Predictive modelling techniques such as neural nets, classification and regression trees, naive Bayes, k-nearest neighbour and support vector machines are generally included. One characteristic of these techniques is that the form of the resulting model is flexible and adapts to the data. Statistical modelling methods that have highly structured model forms, such as linear regression, logistic regression and discriminant analysis, are generally not considered part of machine learning. Unsupervised learning methods such as association rules and clustering are also considered part of machine learning.
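To make the idea of a model that adapts to the data concrete, here is a minimal sketch of k-nearest neighbour classification, one of the techniques listed above. It uses only the Python standard library; the function name `knn_predict`, the two-feature points and the labels are hypothetical, and a real project would more likely use an established library.

```python
from collections import Counter
from math import dist  # Euclidean distance, available in Python 3.8+

def knn_predict(train, query, k=3):
    """Predict a label for query by majority vote among its k nearest training points.

    train is a list of (point, label) pairs, where point is a tuple of numbers.
    """
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical training data: two well-separated groups of points.
train = [
    ((1.0, 1.0), "low"),
    ((1.5, 2.0), "low"),
    ((2.0, 1.5), "low"),
    ((6.0, 6.5), "high"),
    ((7.0, 6.0), "high"),
    ((6.5, 7.5), "high"),
]

print(knn_predict(train, (1.2, 1.4)))  # low
print(knn_predict(train, (6.8, 6.9)))  # high
```

There is no fixed equation here: the "model" is the training data itself, so its shape changes whenever the data changes.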
7. Web Analytics:
Statistical or machine learning methods applied to web data such as page views, hits, clicks, and conversions. The aim is generally to learn which web presentations are most effective in achieving the organizational goal. This goal might be to sell products and services on a site, to serve and sell advertising space, to purchase advertising on other sites, or to collect contact information. Key challenges in web analytics are the volume and constant flow of data, the navigational complexity, and the sometimes lengthy gaps that precede users’ relevant web decisions.
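A very small sketch of this kind of analysis is a conversion-rate comparison between two page presentations. The event records, the field names `variant` and `event`, and the function `conversion_rates` are all hypothetical; real web logs are far larger and messier.

```python
from collections import defaultdict

def conversion_rates(events):
    """Return conversions divided by page views for each page variant."""
    views = defaultdict(int)
    conversions = defaultdict(int)
    for record in events:
        if record["event"] == "page_view":
            views[record["variant"]] += 1
        elif record["event"] == "conversion":
            conversions[record["variant"]] += 1
    # Only variants with at least one view appear in views, so no division by zero.
    return {variant: conversions[variant] / views[variant] for variant in views}

# Hypothetical event log: variant A has 4 views and 1 conversion,
# variant B has 4 views and 2 conversions.
events = (
    [{"variant": "A", "event": "page_view"}] * 4
    + [{"variant": "A", "event": "conversion"}] * 1
    + [{"variant": "B", "event": "page_view"}] * 4
    + [{"variant": "B", "event": "conversion"}] * 2
)

rates = conversion_rates(events)
print(rates)
```

With so few events the difference between the variants means little; in practice a statistical test on a much larger sample would be needed before preferring one presentation.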