You need to identify and carry out a series of analyses (i.e., at least two) of a large dataset (or a collection of large datasets) utilising appropriate programming languages and programming environments.
Your project must incorporate the following elements:
1. Utilisation of a MapReduce environment for some part of the analysis
2. Source dataset(s) should be stored in appropriate database(s) prior to processing by MapReduce
3. Post-MapReduce processing dataset(s) should be stored in appropriate database(s)
4. Programmatically accessing the MapReduce source data
5. Programmatically storing the MapReduce output data
6. Follow-up analysis on the MapReduce output data

For example, you may initially utilise MySQL to store a dataset, and your MapReduce processing would then utilise the MySQL database as an input source. After processing the data through MapReduce, you may store the output in HBase or MongoDB. Following that, you may use Python's NumPy/Pandas/Matplotlib to conduct further analysis of the MapReduce output data (e.g., statistical analysis) and generate data visualisation plots for better presentation of results.
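The flow described above (source database, MapReduce, output database, follow-up analysis) can be sketched end to end in a few lines. This is only an illustrative sketch under stated assumptions, not a prescribed implementation: Python's built-in sqlite3 stands in for both the source database (e.g., MySQL) and the output database (e.g., HBase or MongoDB), the map, shuffle and reduce phases are simulated in-process rather than run on a real MapReduce environment, and the table names, column names and sample rows are hypothetical.

```python
import sqlite3
import statistics
from collections import defaultdict


def load_source(conn):
    """Create and fill a hypothetical source table (stand-in for e.g. MySQL)."""
    conn.execute("CREATE TABLE readings (city TEXT, temp REAL)")
    # Made-up toy rows, for illustration only.
    rows = [("Dublin", 11.0), ("Cork", 12.5), ("Dublin", 13.0),
            ("Galway", 10.0), ("Cork", 13.5), ("Galway", 12.0)]
    conn.executemany("INSERT INTO readings VALUES (?, ?)", rows)
    conn.commit()


def map_phase(records):
    """Map: emit one (key, value) pair per input record."""
    for city, temp in records:
        yield city, temp


def shuffle(pairs):
    """Shuffle: group all values that share a key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: collapse each key's values to their mean."""
    return {key: sum(values) / len(values) for key, values in groups.items()}


def store_output(conn, results):
    """Store the reduced output (stand-in for e.g. HBase or MongoDB)."""
    conn.execute("CREATE TABLE city_mean (city TEXT PRIMARY KEY, mean_temp REAL)")
    conn.executemany("INSERT INTO city_mean VALUES (?, ?)", sorted(results.items()))
    conn.commit()


def run_pipeline():
    source = sqlite3.connect(":memory:")
    sink = sqlite3.connect(":memory:")
    load_source(source)

    # Programmatically access the source data.
    records = source.execute("SELECT city, temp FROM readings")
    results = reduce_phase(shuffle(map_phase(records)))

    # Programmatically store the MapReduce output.
    store_output(sink, results)

    # Follow-up analysis on the stored output.
    means = [m for (m,) in sink.execute("SELECT mean_temp FROM city_mean")]
    return results, statistics.mean(means), statistics.pstdev(means)


if __name__ == "__main__":
    per_city, overall_mean, spread = run_pipeline()
    print(per_city, overall_mean, spread)
```

In an actual project, each stand-in would be replaced by the chosen database connector and MapReduce framework. The separation into load, map, shuffle, reduce and store steps is kept deliberately so that each of the six required elements maps onto an identifiable piece of code.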
The results of your analysis should be included in a project report. The project report should discuss the programming and data handling challenges that you encountered and the means and mechanisms you implemented to overcome these challenges.
The report should be around 3000 words in length (excluding references) and should follow the IEEE format, with appropriate referencing and academic style.
The report should provide the following:
1. A description of the underlying dataset(s)
2. A description of the objective of the analysis; the analysis should answer a novel question
3. A description of the data processing activities carried out
4. Algorithms to process the dataset in a MapReduce environment
5. Presentation of results by making appropriate use of figures, tables, etc.
6. Discussion of the rationale and justification for the choices you have made in terms of data processing, programming language choice, and algorithms that you have implemented