This project uses Apache Spark Discretized Streams (DStreams) to read web request logs as messages from Apache Kafka and store them in an Apache Hive table.
spark-submit --class com.weblogs.stream.WebLogStreaming --master local[2] target/SparkStreaming-0.0.1-SNAPSHOT.jar
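The actual job lives in this repo, but for orientation, here is a minimal sketch of what a DStream job like `WebLogStreaming` could look like. The 5-second batch interval, consumer group id, and Hive table name `weblogs` are assumptions, not the repo's exact settings:

```scala
package com.weblogs.stream

import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

object WebLogStreaming {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("WebLogStreaming")
    val ssc = new StreamingContext(conf, Seconds(5)) // batch interval is an assumption

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "localhost:9095",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "weblog-stream", // group id is an assumption
      "auto.offset.reset" -> "earliest"
    )

    // Read the WebLogs topic as a direct DStream
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Array("WebLogs"), kafkaParams)
    )

    // Append each micro-batch of raw log lines to a Hive table
    stream.map(_.value).foreachRDD { rdd =>
      val spark = SparkSession.builder().enableHiveSupport().getOrCreate()
      import spark.implicits._
      rdd.toDF("raw_log").write.mode("append").saveAsTable("weblogs") // table name is an assumption
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```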
# create the topics
kafka-topics --create --replication-factor 1 --bootstrap-server localhost:9095 --partitions 1 --topic weblog_out
kafka-topics --create --replication-factor 1 --bootstrap-server localhost:9095 --partitions 1 --topic WebLogs
# consume messages from the topic
kafka-console-consumer --bootstrap-server localhost:9095 --topic WebLogs --from-beginning
# produce sample log messages to the topic
kafka-console-producer --bootstrap-server localhost:9095 --topic WebLogs
10.131.2.1,30/Nov/2017:15:50:53,GET /details.php?id=43 HTTP/1.1,200
10.131.2.1,30/Nov/2017:15:34:56,POST /process.php HTTP/1.1,302
10.131.2.1,02/Dec/2017:18:35:48,GET /fonts/fontawesome-webfont.woff2?v=4.6.3 HTTP/1.1,304
10.129.2.1,14/Nov/2017:02:54:51,GET /robots.txt HTTP/1.1,404
10.130.2.1,22/Nov/2017:23:21:04,POST /process.php HTTP/1.1,302
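Each message is a comma-separated record: client IP, timestamp, HTTP request line, and status code. As a sketch, a line of this shape could be parsed into a typed record like the following (the `WebLog` case class and its field names are assumptions for illustration, not the repo's actual model):

```scala
// Hypothetical model for one log line: ip,timestamp,request,status
case class WebLog(ip: String, timestamp: String, request: String, status: Int)

def parseLine(line: String): Option[WebLog] =
  line.split(",") match {
    case Array(ip, ts, req, status) =>
      // Drop records whose status code is not a valid integer
      scala.util.Try(status.trim.toInt).toOption.map(WebLog(ip, ts, req, _))
    case _ => None // malformed line (wrong number of fields)
  }

// parseLine("10.129.2.1,14/Nov/2017:02:54:51,GET /robots.txt HTTP/1.1,404")
// => Some(WebLog("10.129.2.1", "14/Nov/2017:02:54:51", "GET /robots.txt HTTP/1.1", 404))
```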
First, clone the repo to your machine by running the following command:
git clone https://github.com/devibhattaraii/RCT-Dev-Deployment.git
Install the dependencies by following this guide: https://www.tecmint.com/install-apache-spark-on-ubuntu/ Note: unlike the guide, this project uses Spark 3.2.1 and Scala 2.12.15.
To start the project, we need a Kafka server. In one terminal, run docker-compose, then switch to another terminal:
sudo docker-compose up
In another terminal, start the Spark server using the following command:
# starts the spark server
make logs
In another terminal, send the logs using the following command:
make send-logs
To view the Kafdrop UI, visit http://localhost:8000
After this, we can watch the total number of messages in Kafka grow via the Kafdrop UI under the topic WebLogs.
The messages we just pushed to Kafka are shown below:
In another terminal, after stopping "make logs", run the following command to view the analytics:
make log-analytics
After this, we can see the analytics computed over the Hive table data via SQL.
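For illustration, queries of this shape could be run against the Hive table from a Spark session. The table and column names (`weblogs`, `status`, `ip`) are assumptions carried over from the sketches above, and presume the raw lines have been parsed into columns:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("WebLogAnalytics")
  .enableHiveSupport()
  .getOrCreate()

// Requests per HTTP status code (assumes a parsed `status` column)
spark.sql(
  """SELECT status, COUNT(*) AS hits
    |FROM weblogs
    |GROUP BY status
    |ORDER BY hits DESC""".stripMargin).show()

// Top client IPs by request volume (assumes a parsed `ip` column)
spark.sql(
  """SELECT ip, COUNT(*) AS requests
    |FROM weblogs
    |GROUP BY ip
    |ORDER BY requests DESC
    |LIMIT 10""".stripMargin).show()
```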
The UI screenshots above were taken on Fedora 35 (Workstation).