This program implements a histogram using the Message Passing Interface (MPI) and OpenMP. The histogram is built from a dataset of the ages of viewers who watch a specific TV show. The program calculates how many viewers fall into each age group and generates a frequency histogram that visually represents the data.
To run this program, you need the following:
- MPI library installed (e.g., Open MPI, MPICH)
- OpenMP library installed (compatible with your compiler)
- C/C++ compiler with OpenMP support (typically invoked through an MPI compiler wrapper such as `mpicc`)
The program expects the following inputs:
- Number of bars: Number of intervals or bins the range of values will be divided into.
- Number of points: Total number of data points in the dataset.
- Number of threads: Number of threads to be used for parallel computation with OpenMP.
- Number of processes: Number of processes to be used for parallel computation with MPI.
The dataset is assumed to be stored in a file named `dataset.txt`.
The program generates the following output:
- The range of each interval/bar along with the count of data points falling into that interval.
Example output:
The range starts with 0 and ends with 6 with a count of 2
The range starts with 6 and ends with 12 with a count of 6
The range starts with 12 and ends with 18 with a count of 4
The range starts with 18 and ends with 24 with a count of 8
The program utilizes both MPI and OpenMP to achieve parallel computation.
The MPI library is used to distribute the dataset among multiple processes and gather the results.
Data Distribution:
- The dataset is divided equally among the available processes.
- Each process reads its portion of the dataset from the file.
Data Gathering:
- MPI communication functions such as `MPI_Scatter` and `MPI_Gather` are used to distribute and gather data, respectively.
- The root process (rank 0) receives the data from all processes and merges it into a single dataset.
The OpenMP library is used for parallel computation within each process, utilizing multiple threads.
Thread Distribution:
- The root process distributes the merged dataset evenly among the available threads.
- Each thread calculates the frequency of age groups in its portion of the dataset.
Thread Synchronization:
- OpenMP synchronization constructs such as `omp barrier` and `omp critical` are used to ensure that the frequency counts from each thread are accumulated correctly.
Interval Calculation:
- The program divides the age range into intervals based on the number of bars specified.
- The width of each interval is the total range of ages divided by the number of bars.
Frequency Counting:
- Each thread counts the number of data points falling into its assigned interval.
- The thread-local counts are accumulated to obtain the final frequency counts for each interval.
Output Generation:
- The results are then printed to the console, showing the range of each interval and the count of data points within it.
- Ensure that you have MPI and OpenMP installed on your system.
- Compile the code with your MPI implementation's compiler wrapper (e.g., `mpicc`) and enable OpenMP (e.g., `-fopenmp` with GCC).
- Run the compiled program through an MPI launcher such as `mpirun`, providing the required inputs when prompted.
- The program will read the dataset from the file named `dataset.txt`.
- After processing, the program will display the frequency histogram on the console.
The dataset should be stored in a text file named `dataset.txt`. Each line of the file represents one data point, which should be a valid age value. The program assumes that the dataset file is properly formatted and contains one age value per line.
The program currently assumes that the input values provided are valid. However, for robustness, you may need to add error handling and input validation to handle incorrect inputs. Additionally, file read errors should be handled gracefully.
Here are some potential improvements for the program:
- Error handling and input validation can be added to handle incorrect inputs and file read errors.
- The program can be extended to support different data types and formats for the dataset.
- Additional statistical calculations, such as mean and standard deviation, can be included.
- Visualization libraries can be integrated to generate a graphical representation of the histogram.
Contributions are welcome! If you find any issues or have suggestions for improvement, please open an issue or submit a pull request.
- Khaled Ashraf Hanafy Mahmoud - 20190186.
- Ahmed Sayed Hassan Youssef - 20190034.
- Samah Moustafa Hussien Mahmoud - 20190248.
This program is licensed under the MIT License.