Before beginning this Hadoop tutorial, let's explain some terms.
What is Big Data?
Big Data is now a reality of doing business for most organizations. Big Data is a collection of data sets so large that they cannot be processed using routine data processing techniques. It is no longer just data; it has become a complete subject that involves various tools, techniques and frameworks. Big Data includes the data produced by applications and devices, and several fields come under the Big Data umbrella.
Some examples of Big Data
Social media data: platforms such as Facebook and Twitter capture the information and views posted by millions of people worldwide.
Big Data types
The data falls into three types: structured, semi-structured, and unstructured.
Technologies used in Big Data
Various technologies for managing large volumes of data are available on the market from different vendors, including Amazon, IBM and Microsoft. In this article we will examine the following two classes of technology:
Operational Big Data
This class includes systems such as MongoDB, which provide operational capabilities for real-time, interactive workloads where data is mainly captured and stored. NoSQL Big Data systems are designed to take advantage of new cloud architectures, which makes operational workloads on large data much easier to manage, cheaper, and faster to implement.
Big Data Analytics
This class includes systems such as Massively Parallel Processing (MPP) database systems and MapReduce, which provide analytical capabilities for complex analysis that may touch most or all of the data. MapReduce provides a new method of analyzing data that is complementary to the capabilities of SQL, and a MapReduce-based system can be scaled from a single server to thousands of high-end and low-end machines.
Difficulties encountered by Big Data
The main challenges related to large volumes of data are:
Hadoop Big Data Solution
Limitation of the traditional approach
The traditional approach works where the volume of data is small enough to be accommodated by standard database servers, or stays within the limit of the processor that is handling the data. But when it comes to dealing with huge amounts of data, processing it through a traditional database server becomes a tedious task.
The Google solution
Where does Apache Hadoop fit in?
Let's begin this Hadoop tutorial with what Apache Hadoop actually is. Hadoop is an open source software framework that can store and process data across clusters of hardware. It is designed to scale from a single server to thousands of machines, each offering local storage and computation. Hadoop provides massive storage for any type of data, enormous processing power, and the ability to handle a virtually unlimited number of parallel tasks or jobs.
Hadoop Big Data Solution and history
Doug Cutting, Mike Cafarella and their team took the solution provided by Google and started an open source project called Hadoop in 2005. Hadoop is a trademark of the Apache Software Foundation. Apache Hadoop is an open source framework written in Java that allows large data sets to be processed on distributed computer clusters using simple programming models. Hadoop runs applications using the MapReduce algorithm, in which the data is processed in parallel on different nodes. The Hadoop framework makes it possible to develop applications that run on computer clusters and perform complete statistical analysis on huge amounts of data.
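To make the MapReduce idea more concrete, here is a minimal sketch of a word count in plain Python. It does not use Hadoop or any Hadoop API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for illustration, each input string stands in for a block of data stored on a separate node, and a thread pool stands in for the nodes working in parallel.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of input."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped_chunks):
    """Shuffle: group all emitted values by key (the word)."""
    grouped = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: sum the counts collected for each word."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(chunks, workers=4):
    # Each chunk stands in for a block of data held on a separate node;
    # the thread pool stands in for the nodes working in parallel.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, chunks))
    return reduce_phase(shuffle(mapped))


if __name__ == "__main__":
    chunks = ["big data needs big tools", "hadoop processes big data"]
    print(word_count(chunks))
```

The point of the sketch is the shape of the computation: the map step can run on every chunk independently, so adding more nodes lets more chunks be processed at the same time, and only the grouped intermediate results need to be brought together for the reduce step.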
Hadoop architecture framework
The Hadoop framework consists of four modules: Hadoop Common, the Hadoop Distributed File System (HDFS), Hadoop YARN, and Hadoop MapReduce.
The four Hadoop modules in detail
Hadoop works in three steps, which this Hadoop tutorial discusses in detail:
Step 1: The user or application submits a job to Hadoop for the required processing by specifying the following:
Step 2: The Hadoop job client then submits the job (jar, executable, etc.) and its configuration to the Job Tracker, which then assumes responsibility for distributing the software and configuration to the slaves, scheduling and monitoring the tasks, and providing status and diagnostic information to the job client.
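The hand-off described in these steps can be pictured with a small, purely illustrative Python sketch. It is not Hadoop code: the `Job` and `JobTracker` classes, the worker names, and the round-robin assignment are all assumptions made for the example. A real scheduler takes more into account, such as where the data is stored.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    # Hypothetical job description: an executable name plus its input splits.
    executable: str
    input_splits: list


@dataclass
class JobTracker:
    # Hypothetical stand-in for the Job Tracker described above.
    workers: list
    status: dict = field(default_factory=dict)

    def submit(self, job):
        """Schedule one task per input split, round-robin across workers."""
        assignments = {}
        for index, split in enumerate(job.input_splits):
            worker = self.workers[index % len(self.workers)]
            assignments.setdefault(worker, []).append(split)
            self.status[split] = "SCHEDULED"
        return assignments

    def mark_done(self, split):
        """Record that a worker finished the task for this split."""
        self.status[split] = "DONE"

    def report(self):
        """Status information handed back to the job client."""
        done = sum(1 for state in self.status.values() if state == "DONE")
        return f"{done}/{len(self.status)} tasks done"


if __name__ == "__main__":
    tracker = JobTracker(workers=["node-1", "node-2"])
    job = Job("wordcount.jar", ["split-0", "split-1", "split-2"])
    print(tracker.submit(job))
    tracker.mark_done("split-0")
    print(tracker.report())
```

The sketch keeps the three responsibilities from Step 2 separate: `submit` distributes and schedules the work, `mark_done` tracks progress as tasks finish, and `report` is what the job client would see.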
Benefits of Hadoop
The top reason to choose Hadoop is its ability to store and process huge amounts of data quickly. Other benefits of Hadoop are:
Author: Written by Mubeen Khalid for CodeGravity.com ®