This project was built as an assignment. The code is meant to serve as an example of a restaurant data scraper that generates business insights from the scraped data.
Restaurants Data Scraper provides:
Event driven architecture.
Highly decoupled and scalable.
Single-purpose event processing components that asynchronously receive and process events.
Configurable scrape data sources.
Scalable and highly available NoSQL data store.
Prometheus and Grafana for Insight dashboard and monitoring.
According to the assignment, the distribution of delivery fees has to be shown.
The calculation is done in RestaurantRegionDataProcessor.
The data is exposed to Prometheus.
However, the distribution is not yet displayed in Grafana; I could not figure out how to do that.
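One way to expose such a distribution is as a Prometheus histogram, that is, as cumulative `_bucket` series with an `le` label plus `_sum` and `_count` series. The sketch below is only an illustration under assumptions: the class `DeliveryFeeHistogram`, the metric name `delivery_fee`, the sample fees and the bucket bounds are all hypothetical and are not taken from RestaurantRegionDataProcessor. It writes the Prometheus text exposition format by hand instead of using a client library.

```java
// Hypothetical helper; NOT the project's RestaurantRegionDataProcessor.
class DeliveryFeeHistogram {

    // For each upper bound, count the fees at or below it.
    // Prometheus histogram buckets are cumulative, so a fee can count in several buckets.
    static long[] cumulativeCounts(double[] fees, double[] upperBounds) {
        long[] counts = new long[upperBounds.length];
        for (double fee : fees) {
            for (int i = 0; i < upperBounds.length; i++) {
                if (fee <= upperBounds[i]) {
                    counts[i]++;
                }
            }
        }
        return counts;
    }

    // Render the histogram in the Prometheus text exposition format.
    static String render(String metric, double[] fees, double[] upperBounds) {
        long[] counts = cumulativeCounts(fees, upperBounds);
        double sum = 0;
        for (double fee : fees) {
            sum += fee;
        }
        StringBuilder out = new StringBuilder();
        out.append("# TYPE ").append(metric).append(" histogram\n");
        for (int i = 0; i < upperBounds.length; i++) {
            out.append(metric).append("_bucket{le=\"").append(upperBounds[i]).append("\"} ")
               .append(counts[i]).append('\n');
        }
        // The +Inf bucket always holds every observation.
        out.append(metric).append("_bucket{le=\"+Inf\"} ").append(fees.length).append('\n');
        out.append(metric).append("_sum ").append(sum).append('\n');
        out.append(metric).append("_count ").append(fees.length).append('\n');
        return out.toString();
    }

    public static void main(String[] args) {
        double[] fees = {0.0, 1.5, 2.5, 4.0, 7.5};   // made-up sample fees
        double[] bounds = {1.0, 2.0, 5.0};           // made-up bucket bounds
        System.out.print(render("delivery_fee", fees, bounds));
    }
}
```

Series in this shape are what PromQL's `histogram_quantile` function operates on, which may make them easier to chart than a custom set of gauges.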
Authentication and authorization are not taken into consideration.
A NoSQL data store was the obvious choice: MongoDB provides high scalability and availability.
: Common utils and exception classes. -
: Read data sources (URLs and metadata) from sources.yml and publish an event for the scraper to consume. -
: Scrape source and publish content to Apache Kafka. -
: Extract data and persist to DB. -
: Read data from DB, calculate/generate KPIs and expose to Prometheus.
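The flow between the four apps (source-publisher, scraper, data-extractor and kpi-publisher) can be pictured with a small in-process sketch. It is only an illustration under assumptions: the real stages are separate apps that exchange events over Apache Kafka and persist to the DB, whereas here each stage is a hypothetical static method chained with `CompletableFuture`, and the URL, the canned content and the metric name `delivery_fee` are made up.

```java
import java.util.concurrent.CompletableFuture;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// In-process stand-in for the four apps; the real stages communicate over Kafka.
class PipelineSketch {

    // source-publisher: emits the URL of a data source (hypothetical value).
    static String publishSource() {
        return "https://example.invalid/restaurants";
    }

    // scraper: would fetch the page; here it returns canned content instead.
    static String scrape(String url) {
        return "{\"source\":\"" + url + "\",\"deliveryFee\":2.5}";
    }

    // data-extractor: pulls the delivery fee out of the scraped content.
    static double extractDeliveryFee(String content) {
        Matcher m = Pattern.compile("\"deliveryFee\":([0-9.]+)").matcher(content);
        if (!m.find()) {
            throw new IllegalArgumentException("no delivery fee in content");
        }
        return Double.parseDouble(m.group(1));
    }

    // kpi-publisher: formats the value as a line Prometheus could scrape.
    static String toKpi(double fee) {
        return "delivery_fee " + fee;
    }

    // Each stage runs asynchronously once the previous stage's result arrives.
    static CompletableFuture<String> run() {
        return CompletableFuture.supplyAsync(PipelineSketch::publishSource)
                .thenApplyAsync(PipelineSketch::scrape)
                .thenApplyAsync(PipelineSketch::extractDeliveryFee)
                .thenApplyAsync(PipelineSketch::toKpi);
    }

    public static void main(String[] args) {
        // join() blocks until the last stage has finished.
        System.out.println(run().join());
    }
}
```

Because each stage only sees the output of the previous one, any stage can be replaced or scaled without touching the others, which is the property the Kafka-based design aims for.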
Improve the architectural design. The project was completed in around 10 hours, as a first-time user of Prometheus, Grafana and MongoDB.
Code improvements
Add end-to-end tests and increase unit test coverage.
Handle failure scenarios properly. Publishing failure data to Prometheus would be a nice way to represent them.
Handle Kafka publish errors in DataSourcePublisherService, DataScraperService and DataExtractionNotificationPublisherService.
Handle missing data in RestaurantRegionDataProcessor.
Make APIs to add/delete/modify datasource, instead of
. -
Build docker image (plugin already added in the pom).
Generate and check OWASP report.
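The Kafka publish-error item in this list could start from a small retry wrapper that also counts failed attempts, so that the count can later be exposed as a metric. This is a minimal sketch, not the project's code: `PublishRetry` and `withRetry` are hypothetical names, the publish call is modeled as a plain `Supplier`, and there is no backoff between attempts.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

// Hypothetical helper; the publish call is abstracted as a Supplier.
class PublishRetry {
    // Number of failed attempts so far; a candidate value to expose to Prometheus.
    private final AtomicLong failedAttempts = new AtomicLong();

    // Try the publish action up to maxAttempts times, then give up with an exception.
    <T> T withRetry(Supplier<T> publish, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return publish.get();
            } catch (RuntimeException e) {
                failedAttempts.incrementAndGet();
                last = e;
            }
        }
        throw new IllegalStateException("publish failed after " + maxAttempts + " attempts", last);
    }

    long failedAttempts() {
        return failedAttempts.get();
    }

    public static void main(String[] args) {
        PublishRetry retry = new PublishRetry();
        int[] calls = {0};
        // Simulated publisher that fails twice before succeeding.
        String result = retry.withRetry(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("broker unavailable");
            }
            return "sent";
        }, 5);
        System.out.println(result + " after " + retry.failedAttempts() + " failed attempts");
    }
}
```

A production version would likely add a delay between attempts and decide what to do with events that still fail, for example writing them to a separate topic.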
JDK 1.8 (Tested with Oracle JDK)
Maven 3.6.x+
Docker (19.03.4), Docker Compose (1.24.1)
# Runs the following services:
kafka-zookeeper: Kafka zookeeper
kafka: Event / Message bus
kafdrop: UI to administer Kafka
mongodb: NoSQL DB
mongo-express: UI to administer MongoDB
prometheus: Time series database
grafana: Analytics & monitoring solution
apps: source-publisher, scraper, data-extractor and kpi-publisher
$ mvn clean compile package
$ docker-compose build
$ docker-compose up
You should be able to see the Kafka topics at
. -
You should be able to see the extracted data in MongoDB at
. -
You should be able to see the data published for Prometheus at
. -
You should be able to see Prometheus at
. -
You should be able to see the Grafana dashboard at
. Username/pass: admin/admin. -
How to change datasource url?
. -
. Make a GET call to `http://ip:8080/`. -
mvn clean compile package && docker-compose build && docker-compose up
To run the unit tests, execute the following command
mvn clean test-compile test
To run the integration tests, execute the following command
mvn clean test-compile verify -DskipTests=true
To run both the unit and integration tests, execute the following command
mvn clean test-compile verify
Licensed under the MIT License, see the LICENSE file for details.