Pig Pig Pig!


Have you heard of Pig? It is a very useful tool on Hadoop for querying and analyzing large datasets. Pig turns my scripts into MapReduce jobs on the Hadoop cluster, so I don't have to write the lengthy and complicated MapReduce code in Python myself. I got the IMDB dataset again, and this time I want to find out which movie is the oldest 5-star movie and which movies are popular bad movies with ratings below 2. The file below walks through the whole procedure; please take a look.
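To make the two questions concrete, here is a tiny sketch of the logic in plain Python. It is not the Pig script from the file, just an illustration of what the filter, sort, and limit steps compute. The records, the field order (title, year, rating, votes), and the `min_votes` cutoff for "popular" are all assumptions I made up for the example, not values from the IMDB dataset.

```python
# Hypothetical sample records: (title, year, rating, votes).
# These are invented for illustration, not taken from the IMDB dataset.
movies = [
    ("Movie A", 1931, 5.0, 120),
    ("Movie B", 1975, 5.0, 900),
    ("Movie C", 1999, 1.5, 45000),
    ("Movie D", 2004, 1.8, 300),
    ("Movie E", 2010, 3.9, 7000),
]


def oldest_five_star(records):
    """Keep only 5-star movies, then pick the one with the earliest year."""
    five_star = [m for m in records if m[2] == 5.0]
    return min(five_star, key=lambda m: m[1]) if five_star else None


def popular_bad_movies(records, max_rating=2.0, min_votes=1000):
    """Keep movies rated below max_rating with at least min_votes votes,
    sorted by vote count, most-voted first. min_votes is an assumed cutoff."""
    bad = [m for m in records if m[2] < max_rating and m[3] >= min_votes]
    return sorted(bad, key=lambda m: m[3], reverse=True)


print(oldest_five_star(movies))
print(popular_bad_movies(movies))
```

In Pig the same idea would roughly be a filter on the rating, an ordering on year or votes, and a limit, with Pig taking care of running it as MapReduce jobs on the cluster.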
