This repository contains a collection of Apache Spark scripts for learning the basics of batch processing of big data, and a collection of Apache Flink scripts for learning the basics of stream processing of big data.
The Apache Spark scripts cover a range of topics such as:
- manipulating RDDs via:
  - functional programming principles like pattern matching
  - regular expressions
  - functions like `map`, `flatMap`, `reduceByKey`, `flatten`, and `filter`
- manipulating DataFrames via:
  - Spark SQL
  - custom aggregation functions using `Window`
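The RDD operations listed above compose naturally into a classic word count. The following is a minimal sketch, not code taken from the scripts in this repository; the object name, input data, and local-mode `SparkSession` are all illustrative:

```scala
import org.apache.spark.sql.SparkSession

object RddWordCount {
  def main(args: Array[String]): Unit = {
    // Local session for experimentation; adjust master/app name as needed.
    val spark = SparkSession.builder()
      .appName("RddWordCount")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    val lines = sc.parallelize(Seq("to be or not to be", "to do is to be"))

    val counts = lines
      .flatMap(_.split("\\s+")) // split each line into words via a regex
      .filter(_.nonEmpty)       // drop empty tokens
      .map(word => (word, 1))   // pair each word with a count of 1
      .reduceByKey(_ + _)       // sum the counts per word

    counts.collect().foreach { case (word, n) => println(s"$word: $n") }
    spark.stop()
  }
}
```

The `case (word, n)` clause in the final `foreach` is an example of the pattern matching mentioned above: each tuple in the result is destructured directly into its two components.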
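Custom aggregations over DataFrames can be expressed either through the `Window` API or through an equivalent Spark SQL query. The sketch below shows both side by side; the column names and sample data are made up for illustration and do not come from this repository:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.sum

object WindowDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("WindowDemo")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val sales = Seq(
      ("books", 10), ("books", 30), ("games", 20), ("games", 5)
    ).toDF("category", "amount")

    // Running total per category, ordered by amount, via the Window API.
    val byCategory = Window.partitionBy($"category").orderBy($"amount")
    sales.withColumn("running_total", sum($"amount").over(byCategory)).show()

    // The same aggregation expressed in Spark SQL on a temporary view.
    sales.createOrReplaceTempView("sales")
    spark.sql(
      """SELECT category, amount,
        |       SUM(amount) OVER (PARTITION BY category ORDER BY amount) AS running_total
        |FROM sales""".stripMargin).show()

    spark.stop()
  }
}
```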
The Apache Flink scripts cover a range of topics such as:
- basic manipulation of DataStreams via functions like `map`, `filter`, and `flatMap`
- working with stateful streams via `keyBy`
- dealing with infinite streams via:
  - different kinds of window assigners like `TumblingEventTimeWindows` or `SlidingEventTimeWindows`
  - keyed and non-keyed windows
  - the `ProcessWindowFunction`
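The DataStream transformations, `keyBy`, and window assigners listed above can be combined into a small streaming word count. This is a sketch rather than code from this repository; it uses Flink's Scala API with a processing-time tumbling window for brevity, since the event-time assigners mentioned above additionally require timestamp assignment and watermarks:

```scala
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows
import org.apache.flink.streaming.api.windowing.time.Time

object StreamingWordCount {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // A bounded example source; a real job would read from a socket, Kafka, etc.
    val lines: DataStream[String] = env.fromElements("spark flink", "flink")

    lines
      .flatMap(_.split("\\s+"))  // basic DataStream transformation
      .filter(_.nonEmpty)
      .map(word => (word, 1))
      .keyBy(_._1)               // partition the (now stateful) stream by word
      .window(TumblingProcessingTimeWindows.of(Time.seconds(5)))
      .sum(1)                    // per-key, per-window counts
      .print()

    env.execute("StreamingWordCount")
  }
}
```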
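Where the built-in aggregations are not enough, a `ProcessWindowFunction` gives access to all elements of a window plus window metadata. The class below is a hypothetical example, not taken from this repository, assuming a keyed stream of `(String, Int)` tuples keyed by the `String` component:

```scala
import org.apache.flink.streaming.api.scala.function.ProcessWindowFunction
import org.apache.flink.streaming.api.windowing.windows.TimeWindow
import org.apache.flink.util.Collector

// Emits the per-key element count together with the window's end timestamp.
class CountPerWindow
    extends ProcessWindowFunction[(String, Int), String, String, TimeWindow] {
  override def process(key: String,
                       context: Context,
                       elements: Iterable[(String, Int)],
                       out: Collector[String]): Unit = {
    out.collect(s"$key: ${elements.size} events in window ending ${context.window.getEnd}")
  }
}
```

It would be attached to a keyed, windowed stream with `.process(new CountPerWindow)` in place of the simpler `.sum(...)` call.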
| Purpose | Name |
| --- | --- |
| Programming language | Scala |
| Cluster computing framework | Apache Spark, Apache Flink |
It is assumed that both a Java JDK and an IDE such as IntelliJ are installed and that the user's operating system is Windows.
- Install the Scala support plugin for your IDE.
- Import the corresponding subfolder of this repository as a Maven project and resolve all dependencies.
These Big Data scripts are published under the MIT licence, which can be found in the LICENSE file. For this repository, the terms laid out there shall not apply to any individual who is currently enrolled at a higher education institution as a student. Such individuals shall not interact with any part of this repository besides this README in any way, for example by cloning it or viewing its source code, nor shall they have someone else interact with this repository on their behalf.
The Apache Spark logo was taken from Wikipedia and the Apache Flink logo from .