What is Big Data? Let’s understand!

Vineet Negi
2 min read · Sep 17, 2020


And why is it important? What challenges does it solve?

Over 2.5 quintillion bytes of data are created every single day, and that figure keeps growing. It was estimated that by 2020, 1.7 MB of data would be created every second for every person on earth.

What is Big Data?

The name itself tells you a lot: Big Data refers to a massive volume of structured and unstructured data, so large that it is difficult to process using traditional techniques.

The 5 Vs of Big Data

  • Volume: The size of the data, in simple terms. With the increase in use of the internet, social media, and IoT technology, the data generated by all these sources is skyrocketing.
  • Velocity: As volume grows, the speed at which data arrives and must be processed needs to grow with it, and it is growing.
  • Variety: Data comes in three types: structured, semi-structured, and unstructured. Nowadays most data, or we could say almost all of it, arrives in unstructured formats such as social media posts, photos, audio, and video.
  • Veracity: The quality and trustworthiness of the data. It involves exploring a data set for quality issues and systematically cleansing it so that it is useful for analysis.
  • Visualization: Once the data has been analyzed, it needs to be presented visually so that end users can understand it and act on it.
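To make the veracity idea concrete, here is a minimal Python sketch of a cleansing step. The records and field names are invented for illustration; the point is just the two checks: reject incomplete records, then drop exact duplicates.

```python
# Hypothetical raw records: one has a missing value, one is a duplicate.
raw = [
    {"user": "amy", "age": "34"},
    {"user": "bob", "age": ""},    # incomplete record
    {"user": "amy", "age": "34"},  # exact duplicate
    {"user": "cho", "age": "29"},
]

def cleanse(records):
    """Drop records with empty fields, then de-duplicate, keeping order."""
    seen, clean = set(), []
    for rec in records:
        if not all(rec.values()):
            continue  # veracity check: reject incomplete records
        key = tuple(sorted(rec.items()))  # hashable fingerprint of the record
        if key not in seen:
            seen.add(key)
            clean.append(rec)
    return clean

print(cleanse(raw))  # → [{'user': 'amy', 'age': '34'}, {'user': 'cho', 'age': '29'}]
```

Real pipelines would use tools like Spark or pandas for this, but the logic is the same: filter out bad rows, then remove duplicates, before any analysis begins.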

Solving the Big Data Problem

The problems of volume and velocity are solved by a concept called distributed storage, which uses a master–slave topology. For example, say we need to store 40 GB of data, but each available server has only 10 GB of storage. To any single server, this data set is "big data". To resolve the issue, we divide the data into four pieces of 10 GB each and send one piece to each of four servers that have 10 GB (or more) of storage. This also resolves the velocity problem: because the four 10 GB pieces are written in parallel to the four servers at the same time, the total transfer takes roughly one quarter of the time a single 40 GB write would, which makes disk I/O (reading from and writing to the hard disk) far more efficient.
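The split-and-fan-out idea above can be sketched in a few lines of Python. This is a toy simulation, not a real storage system: 40 bytes stand in for 40 GB, `store_on_node` is a hypothetical stand-in for a network write, and four threads play the role of four slave servers.

```python
import concurrent.futures

CHUNK_COUNT = 4  # number of storage nodes (hypothetical)

def split(data: bytes, parts: int) -> list:
    """Split data into `parts` roughly equal chunks."""
    size = -(-len(data) // parts)  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]

def store_on_node(node_id: int, chunk: bytes) -> int:
    # Stand-in for a network write to storage node `node_id`;
    # returns how many bytes that node stored.
    return len(chunk)

data = b"x" * 40  # pretend this is 40 GB of data
chunks = split(data, CHUNK_COUNT)

# Fan the chunks out to all nodes in parallel, as a master would.
with concurrent.futures.ThreadPoolExecutor(max_workers=CHUNK_COUNT) as pool:
    written = list(pool.map(store_on_node, range(CHUNK_COUNT), chunks))

print(written)  # each of the 4 nodes stored a 10-"GB" piece
```

Because the four writes run concurrently, the wall-clock time is governed by the slowest single 10 GB transfer rather than by one sequential 40 GB transfer, which is exactly the velocity gain described above.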

To implement this distributed storage concept for big data, one widely used free and open-source framework is Hadoop, whose storage layer is the Hadoop Distributed File System (HDFS).
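For a flavor of how Hadoop expresses the ideas above, here is an illustrative fragment of an HDFS configuration file. `dfs.replication` and `dfs.blocksize` are real HDFS properties; the values shown are common defaults, not a tuned production setup.

```xml
<!-- hdfs-site.xml (illustrative values only) -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <!-- each block is stored on 3 DataNodes for fault tolerance -->
    <value>3</value>
  </property>
  <property>
    <name>dfs.blocksize</name>
    <!-- files are split into 128 MB blocks, like the 10 GB pieces above -->
    <value>134217728</value>
  </property>
</configuration>
```

HDFS automatically splits large files into blocks, spreads them across DataNodes, and replicates each block, so the manual splitting described earlier happens transparently.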


Written by Vineet Negi

★ Aspiring DevOps Engineer ★ Technical Volunteer @LinuxWorld ★ Technical Content Writer @Medium ★ ARTH Learner
