A Docker image of the Apache Hadoop 3.3.4 distribution on Ubuntu 22.04 with Spark 3.3.1, Pig 0.17.0, and Hive 3.1.3.
Find it on Docker Hub: https://hub.docker.com/r/fedric/hadoop-spark-pig-hive
docker build -t fedric/hadoop-spark-pig-hive .
docker pull fedric/hadoop-spark-pig-hive
To use the Docker image you have just built or pulled, run:
docker run -it -p 50070:50070 -p 8088:8088 -p 8080:8080 fedric/hadoop-spark-pig-hive bash
You can run one of the Hadoop examples:
# run the MapReduce grep example
yarn jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.4.jar grep input output 'dfs[a-z.]+'
# check the output
hdfs dfs -cat output/*
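The examples jar carries the Hadoop version in its file name, so a hard-coded name stops working whenever the image is rebuilt against another release. The sketch below shows one way to look the jar up instead. The function name find_examples_jar is hypothetical and not part of the image, and the sketch assumes the standard Hadoop layout under $HADOOP_HOME/share/hadoop/mapreduce.

```shell
# Hypothetical helper (not shipped with the image): print the path of the
# MapReduce examples jar without hard-coding its version number.
# Optional argument: a Hadoop home directory; defaults to $HADOOP_HOME.
find_examples_jar() {
  _dir="${1:-$HADOOP_HOME}/share/hadoop/mapreduce"
  for _jar in "$_dir"/hadoop-mapreduce-examples-*.jar; do
    # Skip auxiliary jars that match the same glob.
    case "$_jar" in
      *-sources.jar|*-tests.jar|*-test-sources.jar) continue ;;
    esac
    # If the glob matched nothing, the pattern stays literal and -f fails.
    if [ -f "$_jar" ]; then
      printf '%s\n' "$_jar"
      return 0
    fi
  done
  echo "no hadoop-mapreduce-examples jar found in $_dir" >&2
  return 1
}

# Possible use inside the container (assumes an 'input' directory already
# exists in HDFS):
#   yarn jar "$(find_examples_jar)" grep input output 'dfs[a-z.]+'
```

The lookup returns a non-zero status when no jar is found, so a calling script can stop before submitting a job.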
hive
or
beeline -u jdbc:hive2://
pig
Scala
spark-shell
Python
pyspark