Map-Reduce Implementation on Kubernetes (Project 2024)

Built for the Principles of Distributed Systems class, Technical University of Crete.

Overview

This project implements a Map-Reduce framework on Kubernetes that computes word frequencies from input files stored in Minio. It consists of a UI service, an authentication service, a Manager service, Cassandra for metadata storage, Minio for file storage, and Kubernetes-managed workers that execute the Map-Reduce tasks.

Deployment and Cleanup

Create Kubernetes Deployment

To deploy the entire system on Kubernetes:

```
make deploy
```

Clean Kubernetes Deployment

To remove all deployed components from Kubernetes:

```
make clean
```

How to Run

Using the Client CLI (client.py)

Port Forwarding for UI Service:

Forward the UI service port to access it locally:
```
kubectl port-forward service/ui-service 8080:8080 -n dena
```

Admin Operations:

Login as admin:

```
python3 client.py login --username admin --password admin
```

Logout:
```
python3 client.py logout
```

Create a new user as admin:

```
python3 client.py admin create-user user2
```

Job Submission:

Submit a job to process a specific file (ensure the file is in the "map-reduce-input-files" bucket in Minio):
```
python3 client.py jobs submit filename
```
Job Status:

Check the status of a submitted job using its job_id:
```
python3 client.py jobs status job_id
```
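Because a job moves through several phases before it finishes, a client usually has to check its status more than once. The following is a minimal polling sketch, not part of the project: `wait_for_job` and the `get_status` callable are hypothetical names, and `get_status` is assumed to return the job's status string (for example by wrapping the `jobs status` command above; parsing its output is left to the caller). Only the `completed` status is taken from this README.

```python
import time


def wait_for_job(get_status, job_id, interval=2.0, timeout=60.0, sleep=time.sleep):
    """Poll get_status(job_id) until it returns "completed".

    get_status is a hypothetical callable supplied by the caller; it is
    assumed to return the current status string for the given job_id.
    Raises TimeoutError once roughly `timeout` seconds have been waited.
    """
    waited = 0.0
    while True:
        status = get_status(job_id)
        if status == "completed":
            return status
        if waited >= timeout:
            raise TimeoutError(f"job {job_id} still {status!r} after {waited:.0f}s")
        sleep(interval)
        waited += interval
```

Passing `sleep` as a parameter keeps the helper easy to exercise without real delays.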

Additional Scripts in `testing_scripts`

This directory includes useful scripts for development and testing:

- display_content.py: Displays the contents of a file stored in Minio.
- generate_file.py: Generates input files with a specified number of words and stores them in Minio.
- test_cassandra.py: Tests connectivity and functionality of Cassandra.
- test_minio: Checks the contents of Minio buckets.
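To illustrate the kind of input these scripts work with, here is a rough sketch of generating a text with a fixed number of words. It is not the code of generate_file.py: the function name `generate_words`, the default vocabulary, and the `seed` parameter are all assumptions, and the upload to Minio is left out.

```python
import random


def generate_words(num_words, vocabulary=("alpha", "beta", "gamma", "delta"), seed=None):
    """Return a space-separated string of num_words words drawn at random.

    The vocabulary and seed are illustrative assumptions; a fixed seed
    makes the generated text reproducible.
    """
    rng = random.Random(seed)
    return " ".join(rng.choice(vocabulary) for _ in range(num_words))
```

A small vocabulary produces many repeated words, which is convenient when checking word-frequency results by hand.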

Architecture

Components

- UI Service: Flask API handling user commands.
- Auth Service: FastAPI service managing user login and token assignment.
- Cassandra: Distributed data storage for job metadata and temporary data.
- Manager: Flask API coordinating Map-Reduce execution, managing workers, and metadata.
- Workers: Kubernetes jobs executing Map-Reduce tasks.
- Minio: Persistent storage for input files, output files, and Map-Reduce chunks.

Map-Reduce Algorithm Workflow

Job Initialization:
- The UI service forwards the job to the /initialize_job endpoint of the Manager service, which stores the job metadata in Cassandra with status initialized.
Split Phase (Worker Job - Split):
- Worker job retrieves the input file from Minio, splits it into chunks, and stores them in a Minio bucket (chunk_bucket). It notifies the Manager service upon completion.
Map Phase (Worker Job - Map):
- Manager creates mapper jobs based on the number of chunks.
- Mapper jobs retrieve chunks from Minio, perform mapping, and store data in Cassandra (map_table). They notify the Manager upon completion.
Shuffle-Sort Phase (Worker Job - Shuffle-Sort):
- Manager aggregates and sorts mapped data into shuffle_sort_table in Cassandra.
- Worker job retrieves data, performs shuffling and sorting, and stores results back in Cassandra. It notifies the Manager upon completion.
Reduce Phase (Worker Job - Reduce):
- Manager creates reducer jobs based on configured reducers.
- Reducer jobs retrieve data from Cassandra, perform reduction operations, and store results in reduce_table. They notify the Manager upon completion.
Combine Phase (Worker Job - Combine):
- Manager initiates a combine job to aggregate reduced results.
- Worker job retrieves data from Cassandra, performs combining, and stores the output JSON file in the output Minio bucket. It notifies the Manager upon completion.
Job Completion:
- Manager updates job status to completed upon receiving notification from the combine job.
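The phases above can be sketched as a single-process word count. This is an illustration only, not the project's worker code: everything runs in memory, with no Minio, Cassandra, or Kubernetes involved. All function names, the word-based chunking, and the CRC32 partitioning of keys across reducers are assumptions made for the example.

```python
from collections import defaultdict
import zlib


def split_text(text, chunk_size):
    """Split phase: cut the input into chunks of at most chunk_size words."""
    words = text.split()
    return [words[i:i + chunk_size] for i in range(0, len(words), chunk_size)]


def map_chunk(chunk):
    """Map phase: emit a (word, 1) pair for every word in one chunk."""
    return [(word.lower(), 1) for word in chunk]


def shuffle_sort(mapped_pairs):
    """Shuffle-sort phase: group the counts by word, ordered by key."""
    grouped = defaultdict(list)
    for word, count in mapped_pairs:
        grouped[word].append(count)
    return sorted(grouped.items())


def partition(grouped, num_reducers):
    """Assign each key to exactly one reducer using a stable hash (assumption)."""
    buckets = [[] for _ in range(num_reducers)]
    for word, counts in grouped:
        index = zlib.crc32(word.encode("utf-8")) % num_reducers
        buckets[index].append((word, counts))
    return buckets


def reduce_bucket(bucket):
    """Reduce phase: sum the grouped counts for each word in one bucket."""
    return {word: sum(counts) for word, counts in bucket}


def combine(partial_results):
    """Combine phase: merge the per-reducer dictionaries into one sorted result."""
    combined = {}
    for partial in partial_results:
        combined.update(partial)  # safe: each word lives in exactly one bucket
    return dict(sorted(combined.items()))


def word_count(text, chunk_size=2, num_reducers=2):
    """Run all phases in order and return the word frequencies."""
    chunks = split_text(text, chunk_size)
    mapped = [pair for chunk in chunks for pair in map_chunk(chunk)]
    grouped = shuffle_sort(mapped)
    buckets = partition(grouped, num_reducers)
    return combine(reduce_bucket(bucket) for bucket in buckets)


if __name__ == "__main__":
    print(word_count("the quick fox jumps over the lazy fox"))
```

Because every word is routed to exactly one reducer, the combine step only has to merge disjoint dictionaries, so the result does not depend on the number of reducers.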
