When I wrote about Big Data, I mentioned that Big Data is a problem and Hadoop is a solution for it.
Let me start this post with the problems of Big Data:
- Storing a colossal amount of data.
- Storing heterogeneous data.
- Accessing and processing the data quickly.
Here comes Hadoop…!!
- Hadoop is a framework to address “Big Data”.
- It is based on master-slave architecture.
- Slaves are called Data Nodes. Data Nodes are scalable: more can be added as the data grows.
- Hadoop allows you to store “Big Data” in a distributed environment (across the Data Nodes).
- Storing data in a distributed environment helps increase processing speed, because the data can be processed in parallel.
Components of Hadoop
HDFS (Hadoop Distributed File System)
- It is the storage unit of the Hadoop framework.
- It has many Data Nodes and a single Master Node.
- It can store very large amounts of data (i.e. Big Data) in a distributed way, across multiple Data Nodes managed by a single Master Node.
- Data is stored on the Data Nodes in blocks, and you can configure the block size. For example, with a block size of 128 MB, a 512 MB file is stored as 4 blocks.
- Data blocks are spread across different Data Nodes.
- The default replication factor is 3, so each block is stored on 3 Data Nodes.
- Data Nodes can be added when needed.
- Heterogeneous data (structured, unstructured or semi-structured) can be stored on HDFS.
- There is no schema validation before the data is loaded.
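To make the block and replication arithmetic concrete, here is a minimal Python sketch. It is not how HDFS is implemented: the function names and the node names (`dn1` to `dn5`) are made up for illustration, and the round-robin placement is a simplifying assumption (real HDFS uses its own placement policy).

```python
def split_into_blocks(file_size_mb, block_size_mb=128):
    """Return the size (in MB) of each block needed to hold the file."""
    full_blocks, remainder = divmod(file_size_mb, block_size_mb)
    sizes = [block_size_mb] * full_blocks
    if remainder:
        sizes.append(remainder)  # the last block only holds what is left
    return sizes


def place_replicas(num_blocks, data_nodes, replication_factor=3):
    """Assign every block to `replication_factor` distinct nodes.

    Round-robin placement is an assumption made for this sketch only.
    """
    if replication_factor > len(data_nodes):
        raise ValueError("need at least as many Data Nodes as replicas")
    placement = {}
    for block_id in range(num_blocks):
        placement[block_id] = [
            data_nodes[(block_id + i) % len(data_nodes)]
            for i in range(replication_factor)
        ]
    return placement


# The example from the list above: a 512 MB file with 128 MB blocks.
blocks = split_into_blocks(512, 128)
nodes = ["dn1", "dn2", "dn3", "dn4", "dn5"]  # hypothetical Data Nodes
placement = place_replicas(len(blocks), nodes)

print(len(blocks), "blocks:", blocks)
for block_id, replicas in placement.items():
    print("block", block_id, "->", replicas)
```

Because every block lives on 3 different Data Nodes, losing a single node in this sketch never removes the only copy of a block.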
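YARN (Yet Another Resource Negotiator)
- It is the processing unit of Hadoop: it manages the cluster's resources and schedules processing jobs on the nodes.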
- It helps to process data faster because “we move processing to data and not data to processing”.
- In YARN, the processing logic is sent to the various slave nodes, and the data is then processed in parallel across those nodes.
- The processed results are sent to the master node, where they are merged, and the response is sent back to the client.
- This addresses the third problem of Big Data: accessing and processing data with a traditional database is slow.
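The flow described above (send the logic to the nodes, process in parallel, merge on the master) can be imitated in a few lines of Python. This is only a toy simulation under stated assumptions: threads stand in for slave nodes, the `node_blocks` data and the function names are invented for the example, and no real Hadoop API is used.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: each node already holds its own blocks of text.
node_blocks = {
    "dn1": ["big data big", "hadoop"],
    "dn2": ["data nodes store data"],
    "dn3": ["hadoop stores big data"],
}


def count_words_locally(lines):
    """The 'processing logic' sent to a node: count words in its local data."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def run_job(blocks_by_node):
    # Each node works on its own data in parallel (threads stand in for nodes).
    with ThreadPoolExecutor(max_workers=len(blocks_by_node)) as pool:
        partial_results = list(pool.map(count_words_locally, blocks_by_node.values()))
    # The master merges the partial results into a single response.
    total = Counter()
    for partial in partial_results:
        total.update(partial)
    return total


result = run_job(node_blocks)
print(result.most_common(3))
```

Only the small per-node counts travel to the master; the raw text never leaves the node that stores it, which is the point of moving processing to the data.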