In part one, we defined big data, but the name only describes the problem. It says nothing about the solution. You have a massive amount of unstructured data and no way of organizing it. What should you do? This is where big data solutions come into play.
In the old days, companies would spend thousands of dollars on large analytical systems that pored over massive amounts of data and took days, weeks, months, or even years to finish. Today, that is no longer acceptable. Data arrives too fast, and industries change far too quickly. The Internet has made change rapid, and that pace demands a big data solution that can deliver results quickly.
The purpose of a big data solution is to impose some structure on massive amounts of data so that users can access it quickly and analyze it efficiently. Ideally, users should also be able to do this without spending a fortune on the hardware needed to run such a system. This was a problem that needed a universal solution, one that the entire IT world could use and develop together. In other words, it needed to be free and open source.
In part three, we will learn a little about that solution.