In part one, we defined big data, and in part two, we talked a little about why companies might need big data and how an open source solution might be the best way to go. In this part, we will take a look at one open source solution that is making headlines. It is called Apache Hadoop.
Hadoop comes from the Apache Software Foundation, the same organization that develops the world’s most widely used web server software, Apache HTTP Server. Hadoop is free and open source and is the big data solution of choice for many enterprises. Its website describes it as a “reliable, scalable, distributed computing” software project.
Essentially, if you have large amounts of data and need to spread it out over clusters of servers, Hadoop can help you accomplish this using a programming model called MapReduce. Its distributed file system stores data on the compute nodes themselves, making access faster and easier for applications that need to process and analyze these large data sets.
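To give a feel for the MapReduce model, here is a minimal conceptual sketch in Python. This is not Hadoop’s actual API (Hadoop jobs are typically written in Java against its own interfaces); it just illustrates the idea of a mapper emitting key/value pairs, the framework grouping them by key, and a reducer aggregating each group, using the classic word-count example. All function names here are illustrative, not part of Hadoop.

```python
from collections import defaultdict

def mapper(line):
    # Emit a (word, 1) pair for every word in a line of input.
    for word in line.split():
        yield (word.lower(), 1)

def reducer(word, counts):
    # Sum all the 1s emitted for a given word.
    return (word, sum(counts))

def map_reduce(lines):
    # The "shuffle" step: group mapper output by key.
    # In Hadoop, this grouping happens across machines in the cluster.
    groups = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            groups[key].append(value)
    # The reduce step runs once per key.
    return dict(reducer(k, v) for k, v in groups.items())

result = map_reduce(["big data is big", "data is distributed"])
print(result)  # {'big': 2, 'data': 2, 'is': 2, 'distributed': 1}
```

The appeal of the model is that the map and reduce steps are independent per line and per key, so Hadoop can run them in parallel across many servers, close to where the data is stored.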
Hadoop has attracted many major contributors to its code, and many large corporations now use it to handle their big data. Moreover, companies like Amazon now offer Hadoop services on top of their EC2/S3 compute and storage platforms. As more companies find themselves in need of big data solutions, Hadoop may very well become the standard.