What is Big Data? What are the characteristics and problems of Big Data?

Big Data is often associated with Hadoop, but you cannot compare the two directly, because they are complementary to each other.

Understand Big Data as a problem statement and Hadoop as a solution to it.

What is Big Data?

  • Big Data is a term used for a “collection of data sets” that is so large and complex that it is difficult to store and process using available database management tools or traditional data processing applications.
  • Big Data is not a technology. It is a term used to describe a collection of data.


Characteristics of Big Data

  1. VOLUME: Volume refers to the ‘amount of data’, which is growing day by day at a very fast pace.
  2. VELOCITY: Velocity is the pace at which different sources generate data every day.
  3. VARIETY: Variety refers to the types of data being generated. Data can be structured, semi-structured or unstructured (images, video, text etc.).
  4. VERACITY: Veracity refers to doubt or uncertainty about the available data, caused by inconsistency and incompleteness. For example, a record that lists the price of an iPhone as $1.
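To make the veracity point concrete, here is a minimal sketch of a plausibility check that flags records like the $1 iPhone. The function name `flag_suspect_prices`, the record layout and the price bounds are all hypothetical and chosen only for illustration.

```python
def flag_suspect_prices(records, low, high):
    """Return the records whose price is missing or outside [low, high]."""
    suspect = []
    for record in records:
        price = record.get("price")
        # A missing price is incomplete data; an out-of-range price is inconsistent data.
        if price is None or not (low <= price <= high):
            suspect.append(record)
    return suspect


# Illustrative records; the plausible price range below is an assumption.
records = [
    {"item": "iPhone", "price": 999.0},
    {"item": "iPhone", "price": 1.0},    # implausibly cheap
    {"item": "iPhone", "price": None},   # missing value
]

suspect = flag_suspect_prices(records, low=200.0, high=2500.0)
for record in suspect:
    print("suspect record:", record)
```

A real pipeline would need rules per field and per source, but the idea is the same: data in doubt has to be detected before it can be trusted.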

Problems with Big Data – Why do we need the Hadoop framework?

  • The first problem is storing the colossal amount of data. A traditional system cannot hold data at this scale.
  • The second problem is storing heterogeneous data. Data comes in various formats: unstructured, semi-structured and structured. A traditional system cannot easily store all these varieties of data generated from various sources.
  • The third problem is access and processing speed. Hard disk capacity is increasing, but disk transfer speed (access speed) is not increasing at a similar rate. Let me explain this with an example: if you have only one 100 MB/s I/O channel and you are processing, say, 1 TB of data, just reading it will take around 2.91 hours.
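The 2.91-hour figure can be checked with a few lines of arithmetic. The sketch below assumes binary units (1 TB = 1024 × 1024 MB) and a rate of 100 megabytes per second, which is what reproduces the number above. The `channels` parameter is an added assumption: it models the idealized case where the data is split evenly across several independent I/O channels read in parallel.

```python
MB = 1024 ** 2  # bytes per megabyte (binary convention, an assumption)
TB = 1024 ** 4  # bytes per terabyte (binary convention, an assumption)


def transfer_hours(data_bytes, rate_bytes_per_s, channels=1):
    """Hours needed to read data_bytes at rate_bytes_per_s per channel.

    Assumes the data divides evenly across channels with no overhead,
    which is an idealization.
    """
    seconds = data_bytes / (rate_bytes_per_s * channels)
    return seconds / 3600


single = transfer_hours(1 * TB, 100 * MB)
parallel = transfer_hours(1 * TB, 100 * MB, channels=10)

print(f"1 channel:   {single:.2f} hours")    # 2.91 hours
print(f"10 channels: {parallel:.2f} hours")  # 0.29 hours
```

With ten channels working in parallel the same read drops to roughly 17 minutes, which is the intuition behind spreading data over many machines.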


