Fall 2020
Data is growing faster than ever before: more data has been created in the past two years than in the entire previous history. During 2020, about 1.7 megabytes of new information were created every second for every human being on the planet. Our accumulated data has grown to around 44 zettabytes, or 44 trillion gigabytes. The number of devices is also growing quickly. In 2020, there were over 6.1 billion smartphone users globally and over 50 billion smart connected devices in the world, all designed to collect and share data. Processing these large volumes of data to extract insights in real time presents new challenges and opportunities for existing parallel data processing platforms and cloud computing infrastructures.
This course introduces cloud computing and big data, and demonstrates the core tools used to wrangle and analyze big data on the cloud. No prior experience is required: you will walk through hands-on examples with the Hadoop and Spark frameworks, two of the most common in the industry, and manage elastic processing environments using Amazon Web Services. You will also explore the basics of cloud services and cloud deployment models. You will become acquainted with commonly used industry terms, typical business scenarios and applications for the cloud, and the benefits and limitations inherent in the new paradigm that is the cloud.
Main course site: https://www.ucm.es/imllorente/cloud-computing-and-big-data
Extreme-scale data science, at the convergence of big data and massively parallel computing, is enabling simulation, modelling and real-time analysis of complex natural and social phenomena at unprecedented scales. The aim of the projects is to gain practical experience with this interplay by applying parallel computation principles to solve a data-intensive problem.
These final projects solve a data-intensive problem with parallel processing on the AWS cloud. Each team identified a data science problem, analysed its compute scaling requirements, collected the data, designed and implemented parallel software, and demonstrated the scaled performance of an end-to-end application.
Presented on 19 and 21 January 2021
Group | Project Title | Team | Links |
---|---|---|---|
1 | MovieTrends | Javier Gómez Moraleda, Ramón Arjona Quiñones, Michael Steven Paredes Sánchez | GitHub, Website |
2 | Spotify Popularity Prediction | Iago Zamorano Chouciño, David Cantador Piedras, Jonathan José Jiménez Jiménez, Begoña Martínez Martín | GitHub, Website |
3 | USA Elections Analysis | Mario Román Dono, Álvaro Penalva Alberca, Gonzalo Martinez Berzal, Javier I. Sotelino Barriga | GitHub, Website |
4 | Study of the Incidence of Covid-19 Worldwide (Estudio de la incidencia del Covid-19 en el mundo) | Javier Guzmán Muñoz, Carla Martínez Nieto-Márquez, Daniel Ledesma Ventura, Pablo Morer Olmos, Gonzalo Fernández Megía | GitHub, Website |
5 | Covid-19 analysis | Pablo López Martín, Juan Tecedor Roa, Elena Fernández Jiménez | GitHub, Website |
6 | Play Store Study (Estudio de la Play Store) | Álvaro Casado Molinero, Jesús Sánchez Granado, Sergio Morán Agüero, Daniel Sanz Mayo | GitHub, Website |
7 | Steam Products Data Analysis | Óscar Molano Buitrago, Ricardo Freire Sacco, Cristina Ioana Duduta, Pablo García Grossi, Gonzalo Cidoncha Pérez | GitHub, Website |
8 | Android Log Analysis (Análisis de logs en Android) | Ramón Costales de Ledesma, Jose Ignacio Daguerre Garrido, Daniel Puente Arribas | GitHub, Website |
9 | Reddit Analysis (Análisis de Reddit) | Francisco Javier Lozano Hernández, Daniel Alcázar Muñoz, Jorge Roselló Martín | GitHub, Website |
10 | Vietnam War Bombings Analysis | Robert Farzan Rodríguez, Raquel Pérez Gónzalez de Ossuna, Miguel Robledo Casal | GitHub, Website |
11 | European Hotel Reviews Analysis (Análisis de Reviews de Hoteles Europeos) | Juan Antonio Carrión García, Laura González Falque, Nathael Pérez Toyos, Xukai Chen, Jesús Miguel Darias Goitia | GitHub, Website |