Part A: Assignments Based on Hadoop

  1. Hadoop installation on a) a single node and b) multiple nodes
  2. Design a distributed application using MapReduce that processes a system log file and lists the users who have been logged in for the maximum period. Use a sample log file from the Internet and process it in pseudo-distributed mode on the Hadoop platform.
  3. Design and develop a distributed application to find the coolest/hottest year from the available weather data. Use weather data from the Internet and process it using MapReduce.
  4. Write an application using HBase and HiveQL for a flight information system that includes:
  • Creating, dropping, and altering database tables
  • Creating an external Hive table that connects to the HBase Customer Information table
  • Loading tables with data, inserting new values and fields, and joining tables with Hive
  • Creating an index on the Flight Information table
  • Finding the average departure delay per day in 2008
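For assignment 2, the map and reduce logic can be sketched in Python in the style of a Hadoop Streaming job and checked locally before it is submitted to the cluster. This is a minimal sketch under assumptions: the log format (user,login_time,logout_time with "%Y-%m-%d %H:%M:%S" timestamps) and the names mapper, reducer and run_local are hypothetical, so adapt the parsing to whichever sample log you download. The mapper emits one (user, session_seconds) pair per session; the reducer sums these per user and keeps every user tied for the largest total, which only works if a single reduce task sees all users.

```python
from datetime import datetime
from itertools import groupby

# Hypothetical log format (an assumption, not given in the assignment):
#   user,login_timestamp,logout_timestamp
TIME_FMT = "%Y-%m-%d %H:%M:%S"


def mapper(lines):
    """Map step: emit (user, session_seconds) for each well-formed line."""
    for line in lines:
        parts = line.strip().split(",")
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        user, start, end = parts
        try:
            t0 = datetime.strptime(start, TIME_FMT)
            t1 = datetime.strptime(end, TIME_FMT)
        except ValueError:
            continue  # skip lines with unparseable timestamps
        yield user, int((t1 - t0).total_seconds())


def reducer(sorted_pairs):
    """Reduce step: total time per user, then the user(s) with the maximum.

    Expects pairs sorted by user, as Hadoop's shuffle/sort delivers them,
    and assumes one reduce task sees every user.
    """
    totals = {}
    for user, group in groupby(sorted_pairs, key=lambda kv: kv[0]):
        totals[user] = sum(seconds for _, seconds in group)
    if not totals:
        return [], 0
    best = max(totals.values())
    return sorted(u for u, s in totals.items() if s == best), best


def run_local(lines):
    """Imitate map -> shuffle/sort -> reduce on a single machine."""
    return reducer(sorted(mapper(lines)))


if __name__ == "__main__":
    sample = [
        "alice,2024-01-01 09:00:00,2024-01-01 17:00:00",
        "bob,2024-01-01 10:00:00,2024-01-01 12:00:00",
        "bob,2024-01-02 08:00:00,2024-01-02 14:00:00",
        "carol,not a valid line",
    ]
    print(run_local(sample))  # (['alice', 'bob'], 28800)
```

Under Hadoop Streaming, the mapper and reducer would each be a separate script that reads lines from standard input and prints tab-separated key/value lines; run_local only imitates the shuffle by sorting the mapper output in memory.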
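For assignment 3, a similar map/reduce sketch applies. Here "coolest/hottest year" is interpreted as the years with the lowest and highest mean temperature, which is one possible definition (the single highest or lowest reading is another). The input format "YYYY-MM-DD,temperature" and the function names are hypothetical; real weather datasets use their own record layouts and missing-value markers, so the mapper's parsing must be adapted.

```python
from itertools import groupby

# Hypothetical input format (an assumption): "YYYY-MM-DD,temperature"


def mapper(lines):
    """Map step: emit (year, temperature) for each parseable reading."""
    for line in lines:
        parts = line.strip().split(",")
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        date, temp = parts
        year = date[:4]
        if not year.isdigit():
            continue  # skip lines without a numeric year
        try:
            value = float(temp)
        except ValueError:
            continue  # skip missing or non-numeric readings such as "NA"
        yield year, value


def reducer(sorted_pairs):
    """Reduce step: mean temperature per year (input sorted by year)."""
    for year, group in groupby(sorted_pairs, key=lambda kv: kv[0]):
        temps = [t for _, t in group]
        yield year, sum(temps) / len(temps)


def hottest_and_coolest(lines):
    """Run map -> sort -> reduce locally and pick the extreme years."""
    means = dict(reducer(sorted(mapper(lines))))
    if not means:
        return None, None, {}
    hottest = max(means, key=means.get)
    coolest = min(means, key=means.get)
    return hottest, coolest, means


if __name__ == "__main__":
    sample = [
        "2001-01-15,10.0",
        "2001-07-15,30.0",
        "2002-01-15,5.0",
        "2002-07-15,15.0",
        "2003-06-01,25.0",
        "2003-06-02,NA",
    ]
    print(hottest_and_coolest(sample))
    # ('2003', '2002', {'2001': 20.0, '2002': 10.0, '2003': 25.0})
```

The mapper emits raw readings rather than per-split averages because an average of averages is not the overall average; if a combiner were added to reduce shuffle traffic, it would have to emit (sum, count) pairs instead.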
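For the last item of assignment 4, the average departure delay per day is a GROUP BY aggregation. The query below is intended to be valid both as HiveQL and as SQLite SQL; Python's built-in sqlite3 module stands in for Hive here only so the query can be tried on a few rows without a cluster. The table name flights and the columns Year, Month, DayofMonth and DepDelay are assumptions; rename them to match the schema of your own flight table.

```python
import sqlite3

# Aggregation intended to be valid both as HiveQL and as SQLite SQL.
# Table and column names are assumptions about the flight data schema.
AVG_DELAY_2008 = """
SELECT Month, DayofMonth, AVG(DepDelay) AS avg_dep_delay
FROM flights
WHERE Year = 2008
GROUP BY Month, DayofMonth
ORDER BY Month, DayofMonth
"""


def average_delay_per_day(rows):
    """Load (Year, Month, DayofMonth, DepDelay) rows into an in-memory
    SQLite table standing in for Hive, and run the query on them."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE flights ("
            "Year INTEGER, Month INTEGER, DayofMonth INTEGER, DepDelay INTEGER)"
        )
        conn.executemany("INSERT INTO flights VALUES (?, ?, ?, ?)", rows)
        return conn.execute(AVG_DELAY_2008).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    sample = [
        (2008, 1, 1, 10),
        (2008, 1, 1, 20),
        (2008, 1, 2, None),  # NULL delay: AVG ignores it
        (2008, 1, 2, -4),    # negative delay: departed early
        (2007, 1, 1, 100),   # filtered out by WHERE Year = 2008
    ]
    print(average_delay_per_day(sample))  # [(1, 1, 15.0), (1, 2, -4.0)]
```

Because AVG skips NULL values, a day on which some flights have no DepDelay value is averaged only over the flights that do have one.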