Big Data! How Companies Manage and Manipulate Big Data with Hadoop
[Image: data centers]

Big Data –>

Well, how big is big data, really? This question may arise in our minds when we hear the term. For small companies, dealing with gigabytes or 50 terabytes of data may already count as big. But for companies like Facebook, Google, or Yahoo, big data means petabytes. Let us see how Hadoop helps these companies store, manage, and manipulate their data.

Why and how is big data a problem?

In today's world, everyone wants all of their information stored online, permanently. This is a major problem for the companies that provide storage to their users. For example, on Facebook everyone expects their photos and videos to be stored forever, and on Google thousands of new web pages are added every day and must remain available.

To store this data, these companies need storage on a massive scale: large numbers of hard disks that can hold the data and serve it back to users.

Why don't these companies just use hard disks with petabytes or zettabytes of capacity?

Building a hard disk of huge capacity is not the hard part; disk manufacturers could make such drives if Facebook or Google placed an order. So why don't they use them? The reason is the I/O problem: with a single hard disk of such enormous capacity, the speed of loading data from the disk into RAM (and writing it back from RAM to disk) becomes extremely slow. For example, if Google kept everything on one giant disk, a single search could take days just to fetch the data from the hard disk, load it into RAM, and show the result. This is known as the I/O problem.
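To get a feel for the scale, here is a back-of-the-envelope calculation in Python. The 1 PB data size and 100 MB/s disk throughput are illustrative assumptions for the sketch, not figures from any real system:

```python
# Rough estimate of how long a single disk takes to read a petabyte.
# All figures below are illustrative assumptions.
PETABYTE = 10**15               # bytes
DISK_THROUGHPUT = 100 * 10**6   # bytes/second (~100 MB/s, a typical spinning disk)

seconds = PETABYTE / DISK_THROUGHPUT
days = seconds / 86_400  # seconds in a day
print(f"Sequential read of 1 PB: {seconds:,.0f} s (~{days:.0f} days)")
```

Even with optimistic numbers, reading a petabyte sequentially off one disk takes on the order of months, which is why a single huge drive cannot work.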

How do companies handle this problem?

Companies like Google, Facebook, Yahoo, and many other big companies use the concept of distributed storage. In this setup, each computer shares part of its hard disk with a main computer at the center. Suppose there are 5 computers, each with 100 GB of capacity. One of them is the main computer, and the other 4 each contribute 50 GB of their storage to it; the total storage available to the main computer then grows to 300 GB. An amazing concept, isn't it? Even 2 computers can do this: if both share their hard disks, the overall capacity available to each increases.
Distributed storage also solves the I/O problem: software such as Hadoop distributes the data evenly across the shared computers, so each individual disk holds less data and can be read and written in parallel, which greatly reduces loading and storing times.
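The idea can be sketched in a few lines of Python. This is a toy model, not Hadoop's actual placement logic; the node names are made up, the 50 GB contribution comes from the example above, and 128 MB is HDFS's default block size:

```python
# Toy model of distributed storage: 4 worker nodes each contribute
# 50 GB to the pool, and a file is split into fixed-size blocks that
# are placed across the nodes round-robin.
NODES = ["node1", "node2", "node3", "node4"]
CONTRIBUTION_GB = 50
BLOCK_MB = 128  # HDFS's default block size is 128 MB

pool_gb = len(NODES) * CONTRIBUTION_GB
print(f"Pooled capacity added to the main computer: {pool_gb} GB")

def place_blocks(file_size_mb):
    """Split a file into BLOCK_MB-sized blocks and assign each to a node."""
    n_blocks = -(-file_size_mb // BLOCK_MB)  # ceiling division
    return {block: NODES[block % len(NODES)] for block in range(n_blocks)}

placement = place_blocks(1000)  # a 1000 MB file -> 8 blocks
print(placement)
```

Because every node holds only a fraction of the file, all the blocks can be read back at the same time instead of one after another.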

[Diagram: a distributed storage system]

What is Hadoop?

Hadoop is an open-source software platform, first released in 2006, designed for managing and analyzing big data in both structured and unstructured forms. It was developed on the basis of distributed-systems research papers published by Google.
Hadoop is a tool/software installed on a cluster: multiple computers connected in parallel with one main computer, an arrangement known as "master-slave topology" (though the layout can vary according to a company's requirements).
Hadoop helps in storing and analyzing big data in many ways. Let us see:
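As a concrete illustration, a minimal master-slave HDFS cluster is wired together through two configuration files on every node. This is a sketch: the hostname `master` and port 9000 are placeholder values, not figures from this article, while `fs.defaultFS` and `dfs.replication` are real Hadoop configuration properties:

```xml
<!-- core-site.xml: every node points at the master (NameNode) -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
</configuration>

<!-- hdfs-site.xml: keep 3 copies of every block across the slaves -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>
```

With this in place, the master runs the NameNode process and each slave runs a DataNode that contributes its local disk to the shared pool.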

How does Hadoop help in solving big data problems?

Hadoop is an open-source platform, and because people use it, they also contribute to making it more efficient and secure. Being open source also allows Hadoop to run on many kinds of servers.
The core components of Hadoop are the Hadoop Distributed File System (HDFS), YARN, MapReduce, and Hadoop Common, along with the newer Hadoop Ozone and Hadoop Submarine projects.

Hadoop Distributed File System (HDFS)

As the name suggests, HDFS is the component responsible for distributing data/files from the main node to the different data nodes.

Hadoop collects the data, analyzes it, and distributes it across a cluster of machines with many hard disks. Because each disk holds only a portion of the data and the disks are read in parallel, the time to move data between hard disk and RAM drops sharply; depending on the cluster, data can be loaded in less than a second. Hadoop is used by all the big companies: Amazon, Yahoo, Facebook, LinkedIn, and many more.
Let us see how these companies manage data by using distributed storage technology–>
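The parallel-read speed-up can be sketched with a rough calculation. The numbers here (1 PB of data, 100 MB/s per disk, a 1000-disk cluster) are illustrative assumptions for the sketch, not figures from this article:

```python
# How spreading data across many disks shrinks the read time.
# All figures are illustrative assumptions.
DATA = 10**15                   # 1 PB in bytes
DISK_THROUGHPUT = 100 * 10**6   # ~100 MB/s per disk

def read_time_seconds(n_disks):
    """Time to read DATA when it is spread evenly over n_disks
    and all disks are read in parallel."""
    return DATA / (DISK_THROUGHPUT * n_disks)

print(f"1 disk:     {read_time_seconds(1):>12,.0f} s")     # months
print(f"1000 disks: {read_time_seconds(1000):>12,.0f} s")  # under 3 hours
```

The total amount of data is unchanged; only the number of disks reading it at once grows, and the read time falls in direct proportion.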

1) Amazon

We are all aware that Amazon is the world's largest online retailer. But Amazon also runs AWS, a technology that lets us rent Amazon's servers for a price. These servers apply the same concepts as Hadoop: Amazon has built server warehouses in which huge numbers of hard disks are connected and data is distributed across them. Pinterest and Instagram both run on rented AWS servers.
According to some reports, Pinterest pays Amazon around $50 an hour during peak hours of the day and about $15 an hour at night, when traffic is lower.

2) Facebook

[Image: Facebook data statistics, 2015]

Facebook uses Hadoop on a large scale: it runs one of the largest clusters in the world, with more than 4,000 computers connected in it. Facebook also developed its own software, Scuba, which lets its engineers instantly analyze data describing the length and breadth of the company's massive infrastructure. Normally, processing such a huge amount of data takes time, but Scuba is what is called an in-memory data store: it keeps all of the data in RAM on high-speed systems running across hundreds of servers. This is why we are able to fetch data from Facebook in real time.

3) Google

As we all know, Google has more than 2 billion pages and websites in its database, and beyond that it also provides storage services such as Google Drive. Have you ever wondered how Google keeps all this data permanently? Google's homebrewed data infrastructure is not exactly the same as Hadoop, but it works on much the same principles. The company is known for its army of programmer wizards and its talent for creating distributed programs, and it manages big data very well. For example, Google once reported sorting a petabyte of data in about 6 hours using 8,000 computers, and a few hard drives were probably killed in the whole process. So Google, too, uses distributed systems to handle the big data problem.

In this way, all the big companies keep user data permanent and safe, and it is possible only because of the concepts of distributed systems and Hadoop. So if you also want more storage on your laptop or PC, just ask a friend to share their storage with you.
