Маслова Ульяна. SEQ-MPI technology. Character frequency count in a string. Variant 23. #59
Merged: allnes merged 27 commits into learning-process:master from ulianamaslova25:maslova_u_char_frequency_count on Dec 10, 2025
27 commits
36f66e7 Add new files and rename it (ulianamaslova25)
61be904 Add InType and OutType (ulianamaslova25)
274eaa7 Add SEQ solution (ulianamaslova25)
6471fad Add MPI solution (ulianamaslova25)
14230ad fix TestType (ulianamaslova25)
b1156fc fix ops_mpi.cpp (ulianamaslova25)
c6481e5 Add functional tests (ulianamaslova25)
41a127e Add performance tests (ulianamaslova25)
10ca6e8 Add report (ulianamaslova25)
95d15e7 fix bug with type (ulianamaslova25)
5e6b799 change size_t to int (ulianamaslova25)
14baf81 all int (ulianamaslova25)
249fb70 fix clang-format (ulianamaslova25)
a9186b6 add more simbols in perf tests (ulianamaslova25)
b06fb23 fix clang_tidy (ulianamaslova25)
5486c3d fix clang-format (ulianamaslova25)
335277b fix all clang-tidy issues (ulianamaslova25)
95acd08 i hope i fixed all clang-tidy issues (ulianamaslova25)
58c353e the last fix clang-tidy issues (ulianamaslova25)
ce8b04f add new func tests (ulianamaslova25)
c9b23ef fix report (ulianamaslova25)
16499dd some changes (ulianamaslova25)
864ad8b fix bugs from allnes (ulianamaslova25)
e742daf fix report for new data (ulianamaslova25)
d17208f fix clang-tidy (ulianamaslova25)
b99f28b new try to fix clang-format (ulianamaslova25)
2c1b7d8 edited info (ulianamaslova25)
17 changes: 17 additions & 0 deletions
tasks/maslova_u_char_frequency_count/common/include/common.hpp
@@ -0,0 +1,17 @@
```cpp
#pragma once

#include <cstddef>
#include <string>
#include <tuple>
#include <utility>

#include "task/include/task.hpp"

namespace maslova_u_char_frequency_count {

using InType = std::pair<std::string, char>;
using OutType = size_t;
using TestType = std::tuple<InType, OutType, std::string>;
using BaseTask = ppc::task::Task<InType, OutType>;

}  // namespace maslova_u_char_frequency_count
```
@@ -0,0 +1,9 @@
```json
{
  "student": {
    "first_name": "Ульяна",
    "last_name": "Маслова",
    "middle_name": "Александровна",
    "group_number": "3823Б1ФИ2",
    "task_number": "23"
  }
}
```
22 changes: 22 additions & 0 deletions
tasks/maslova_u_char_frequency_count/mpi/include/ops_mpi.hpp
@@ -0,0 +1,22 @@
```cpp
#pragma once

#include "maslova_u_char_frequency_count/common/include/common.hpp"
#include "task/include/task.hpp"

namespace maslova_u_char_frequency_count {

class MaslovaUCharFrequencyCountMPI : public BaseTask {
 public:
  static constexpr ppc::task::TypeOfTask GetStaticTypeOfTask() {
    return ppc::task::TypeOfTask::kMPI;
  }
  explicit MaslovaUCharFrequencyCountMPI(const InType &in);

 private:
  bool ValidationImpl() override;
  bool PreProcessingImpl() override;
  bool RunImpl() override;
  bool PostProcessingImpl() override;
};

}  // namespace maslova_u_char_frequency_count
```
109 changes: 109 additions & 0 deletions
tasks/maslova_u_char_frequency_count/mpi/src/ops_mpi.cpp
@@ -0,0 +1,109 @@
```cpp
#include "maslova_u_char_frequency_count/mpi/include/ops_mpi.hpp"

#include <mpi.h>

#include <algorithm>
#include <climits>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

#include "maslova_u_char_frequency_count/common/include/common.hpp"

namespace maslova_u_char_frequency_count {

MaslovaUCharFrequencyCountMPI::MaslovaUCharFrequencyCountMPI(const InType &in) {
  SetTypeOfTask(GetStaticTypeOfTask());
  GetInput() = in;
  GetOutput() = 0;
}

bool MaslovaUCharFrequencyCountMPI::ValidationImpl() {
  int rank = 0;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  int flag = 0;  // 0 - OK, 1 - error
  if (rank == 0) {
    if (GetInput().first.size() > static_cast<size_t>(INT_MAX)) {
      flag = 1;
    }
  }
  MPI_Bcast(&flag, 1, MPI_INT, 0, MPI_COMM_WORLD);
  return (flag == 0);
}

bool MaslovaUCharFrequencyCountMPI::PreProcessingImpl() {
  return true;
}

bool MaslovaUCharFrequencyCountMPI::RunImpl() {
  int rank = 0;
  int proc_size = 0;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);       // rank of this process
  MPI_Comm_size(MPI_COMM_WORLD, &proc_size);  // number of processes

  std::string input_string;
  char input_char = 0;
  size_t input_str_size = 0;

  if (rank == 0) {
    input_string = GetInput().first;
    input_char = GetInput().second;
    input_str_size = input_string.size();  // read the input data
  }

  uint64_t size_for_mpi = 0;
  if (rank == 0) {
    size_for_mpi = static_cast<uint64_t>(input_str_size);  // explicit cast before sending
  }

  MPI_Bcast(&size_for_mpi, 1, MPI_UINT64_T, 0, MPI_COMM_WORLD);  // broadcast the string size

  if (rank != 0) {
    input_str_size = static_cast<size_t>(size_for_mpi);  // cast back for convenient use below
  }

  if (input_str_size == 0) {
    GetOutput() = 0;  // set on every process
    return true;
  }

  MPI_Bcast(&input_char, 1, MPI_CHAR, 0, MPI_COMM_WORLD);  // broadcast the target character

  std::vector<int> send_counts(proc_size);  // sizes of all chunks
  std::vector<int> displs(proc_size);       // displacements
  if (rank == 0) {
    size_t part = input_str_size / proc_size;
    size_t rem = input_str_size % proc_size;
    for (size_t i = 0; std::cmp_less(i, proc_size); ++i) {
      send_counts[i] = static_cast<int>(part + (i < rem ? 1 : 0));  // base size plus one extra character while the remainder lasts
    }
    displs[0] = 0;
    for (size_t i = 1; std::cmp_less(i, proc_size); ++i) {
      displs[i] = displs[i - 1] + send_counts[i - 1];
    }
  }

  MPI_Bcast(send_counts.data(), proc_size, MPI_INT, 0, MPI_COMM_WORLD);  // broadcast the chunk sizes
  std::vector<char> local_str(send_counts[rank]);
  MPI_Scatterv((rank == 0) ? input_string.data() : nullptr, send_counts.data(), displs.data(), MPI_CHAR,
               local_str.data(), static_cast<int>(local_str.size()), MPI_CHAR, 0, MPI_COMM_WORLD  // scatter the data
  );

  size_t local_count = std::count(local_str.begin(), local_str.end(), input_char);
  auto local_count_for_mpi = static_cast<uint64_t>(local_count);
  uint64_t global_count = 0;
  MPI_Allreduce(&local_count_for_mpi, &global_count, 1, MPI_UINT64_T, MPI_SUM,
                MPI_COMM_WORLD);  // sum the counts from all processes

  GetOutput() = static_cast<size_t>(global_count);  // store the result, cast to the output type

  return true;
}

bool MaslovaUCharFrequencyCountMPI::PostProcessingImpl() {
  return true;
}

}  // namespace maslova_u_char_frequency_count
```
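
The chunk arithmetic that rank 0 performs before `MPI_Scatterv` can be checked without an MPI runtime. The following is a minimal sketch, not code from the PR: `Partition` and `MakePartition` are hypothetical names introduced here, and the function only reproduces the `send_counts` / `displs` computation shown above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of the PR): reproduces the per-rank chunk
// sizes and offsets that rank 0 computes before calling MPI_Scatterv.
struct Partition {
  std::vector<int> counts;  // number of characters assigned to each rank
  std::vector<int> displs;  // offset of each rank's chunk in the input string
};

Partition MakePartition(std::size_t total, int procs) {
  assert(procs > 0);
  Partition p{std::vector<int>(procs), std::vector<int>(procs)};
  const std::size_t part = total / static_cast<std::size_t>(procs);
  const std::size_t rem = total % static_cast<std::size_t>(procs);
  for (int i = 0; i < procs; ++i) {
    // The first `rem` ranks receive one extra character.
    p.counts[i] = static_cast<int>(part + (static_cast<std::size_t>(i) < rem ? 1 : 0));
    // Each offset is the previous offset plus the previous chunk size.
    p.displs[i] = (i == 0) ? 0 : p.displs[i - 1] + p.counts[i - 1];
  }
  return p;
}
```

For example, `MakePartition(10, 4)` assigns chunk sizes 3, 3, 2, 2 at offsets 0, 3, 6, 8, so chunk sizes differ by at most one character.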
@@ -0,0 +1,129 @@

# Character frequency count in a string

- Student: Маслова Ульяна Александровна, group 3823Б1ФИ2
- Technology: SEQ | MPI
- Variant: 23

## 1. Introduction
Problem: Counting the frequency of a character sequentially is slow for very large strings.
Task: Speed this up with parallel computation using MPI.
Expected result: A significant reduction in execution time compared with the sequential version.

## 2. Problem Statement
Find the number of occurrences of the character input_char in the string input_string.
- Input: A pair (std::string, char).
- Output: An integer (size_t).

## 3. Baseline Algorithm (Sequential)
A single pass over the string in a loop, incrementing a counter each time the target character is found. The algorithm has linear time complexity O(N), where N is the length of the string.
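
As an illustration of the baseline, the same single pass can be written with `std::count`. This is a sketch only: `CountChar` is a hypothetical free function, whereas the PR implements the loop by hand inside `RunImpl()`.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical free function (not part of the PR): one linear pass over the
// string, counting the positions that hold the target character.
std::size_t CountChar(const std::string &text, char target) {
  return static_cast<std::size_t>(std::count(text.begin(), text.end(), target));
}
```

For instance, `CountChar("hello world", 'l')` visits all 11 characters once and finds 3 matches.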

## 4. Parallelization Scheme
The process with rank 0 splits the input string into P parts, where P is the number of processes, and sends each process its fragment. Each process counts the target character in its own part independently. The local counts are then summed with MPI_Allreduce, so every rank, including rank 0, receives the total.
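
The scheme can be simulated in a single process to see why the split does not change the answer. In this sketch, which assumes nothing beyond the standard library, `SimulatedParallelCount` is a hypothetical name, each loop iteration stands in for one rank, and the running sum stands in for the reduction. It is not the MPI implementation itself.

```cpp
#include <algorithm>
#include <cstddef>
#include <string_view>

// Hypothetical single-process simulation (not part of the PR): `procs` plays
// the role of the communicator size and each iteration plays one rank.
std::size_t SimulatedParallelCount(std::string_view text, char target, std::size_t procs) {
  if (procs == 0) {
    return 0;
  }
  const std::size_t part = text.size() / procs;
  const std::size_t rem = text.size() % procs;
  std::size_t offset = 0;
  std::size_t total = 0;
  for (std::size_t r = 0; r < procs; ++r) {
    const std::size_t len = part + (r < rem ? 1 : 0);
    // The fragment this "rank" would receive from the scatter step.
    const std::string_view chunk = text.substr(offset, len);
    // Local count, added to the total as the reduction step would do.
    total += static_cast<std::size_t>(std::count(chunk.begin(), chunk.end(), target));
    offset += len;
  }
  return total;
}
```

Because the fragments are disjoint and together cover the whole string, the sum of the local counts equals the sequential count for any number of simulated ranks, including more ranks than characters.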

## 5. Implementation Details
- common: Defines the shared data types (InType, OutType).
- seq: Contains a simple sequential implementation of the algorithm.
- mpi: Contains the parallel MPI implementation of the algorithm.
- tests: Includes two test suites: functional tests for correctness and performance tests for timing.
The maximum string length the program can process is 2<sup>31</sup> - 1, that is, 2,147,483,647 characters.
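
The length limit corresponds to the `INT_MAX` check in `ValidationImpl()`. A plausible reason, stated here as an assumption rather than something the report says, is that the chunk sizes and displacements passed to `MPI_Scatterv` are stored as `int`. A minimal sketch of the check, with the hypothetical name `FitsInMpiCount`:

```cpp
#include <climits>
#include <cstddef>

// Hypothetical predicate (not part of the PR) mirroring the validation check:
// the string length must be representable as a non-negative int.
constexpr bool FitsInMpiCount(std::size_t length) {
  return length <= static_cast<std::size_t>(INT_MAX);
}
```

On platforms where `int` is 32 bits wide, `INT_MAX` is 2,147,483,647, which matches the figure quoted above.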

## 6. Experimental Setup
- Hardware: AMD Ryzen 7 7840HS (8 cores, 16 logical processors, base clock 3.80 GHz)
- RAM: 16 GB
- Operating system: Windows 11
- Compiler: g++
- Build type: Release

## 7. Results and Discussion

### 7.1 Correctness
Correctness was checked on strings of various lengths and kinds (empty, single-word, letters only, mixed, and so on).

### 7.2 Performance

Test on input data of 100,000,000 characters:

| Mode | Processes | Time, s | Speedup | Efficiency |
|------|-----------|---------|---------|------------|
| seq  | 1         | 0.07242 | 1.00    | N/A        |
| mpi  | 2         | 0.04439 | 1.63    | 81.5%      |
| mpi  | 4         | 0.03100 | 2.34    | 58.5%      |
| mpi  | 8         | 0.03050 | 2.37    | 29.6%      |
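
The Speedup and Efficiency columns are consistent with the usual definitions: speedup is the sequential time divided by the parallel time, and efficiency is the speedup divided by the number of processes. The report does not state these formulas, so the sketch below assumes them. `Scaling` and `ComputeScaling` are hypothetical names.

```cpp
#include <cassert>

// Hypothetical helper (not part of the PR): derives the two scaling metrics
// from measured times, assuming the standard definitions.
struct Scaling {
  double speedup;     // T_seq / T_par
  double efficiency;  // speedup / number of processes, as a fraction
};

Scaling ComputeScaling(double seq_time, double par_time, int procs) {
  assert(par_time > 0.0 && procs > 0);
  const double speedup = seq_time / par_time;
  return {speedup, speedup / procs};
}
```

Applied to the 2-process row, `ComputeScaling(0.07242, 0.04439, 2)` gives a speedup of about 1.63 and an efficiency a little above 0.81, in line with the table.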

## 8. Conclusions
The MPI version gives a clear performance gain: execution time decreases as processes are added, reaching a speedup of 2.34 on 4 processes. Efficiency, however, falls as the process count grows. On 8 processes the speedup is only 2.37, because MPI communication overhead begins to outweigh the benefit of additional parallelism.

## 9. References
1. Lectures and practical sessions of the course "Parallel Programming"

## Appendix (Optional)
```cpp
bool MaslovaUCharFrequencyCountMPI::RunImpl() {
  int rank = 0;
  int proc_size = 0;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);       // rank of this process
  MPI_Comm_size(MPI_COMM_WORLD, &proc_size);  // number of processes

  std::string input_string;
  char input_char = 0;
  size_t input_str_size = 0;

  if (rank == 0) {
    input_string = GetInput().first;
    input_char = GetInput().second;
    input_str_size = input_string.size();  // read the input data
  }

  uint64_t size_for_mpi = 0;
  if (rank == 0) {
    size_for_mpi = static_cast<uint64_t>(input_str_size);  // explicit cast before sending
  }

  MPI_Bcast(&size_for_mpi, 1, MPI_UINT64_T, 0, MPI_COMM_WORLD);  // broadcast the string size

  if (rank != 0) {
    input_str_size = static_cast<size_t>(size_for_mpi);  // cast back for convenient use below
  }

  if (input_str_size == 0) {
    GetOutput() = 0;  // set on every process
    return true;
  }

  MPI_Bcast(&input_char, 1, MPI_CHAR, 0, MPI_COMM_WORLD);  // broadcast the target character

  std::vector<int> send_counts(proc_size);  // sizes of all chunks
  std::vector<int> displs(proc_size);       // displacements
  if (rank == 0) {
    size_t part = input_str_size / proc_size;
    size_t rem = input_str_size % proc_size;
    for (size_t i = 0; std::cmp_less(i, proc_size); ++i) {
      send_counts[i] = static_cast<int>(part + (i < rem ? 1 : 0));  // base size plus one extra character while the remainder lasts
    }
    displs[0] = 0;
    for (size_t i = 1; std::cmp_less(i, proc_size); ++i) {
      displs[i] = displs[i - 1] + send_counts[i - 1];
    }
  }

  MPI_Bcast(send_counts.data(), proc_size, MPI_INT, 0, MPI_COMM_WORLD);  // broadcast the chunk sizes
  std::vector<char> local_str(send_counts[rank]);
  MPI_Scatterv((rank == 0) ? input_string.data() : nullptr, send_counts.data(), displs.data(), MPI_CHAR,
               local_str.data(), static_cast<int>(local_str.size()), MPI_CHAR, 0, MPI_COMM_WORLD  // scatter the data
  );

  size_t local_count = std::count(local_str.begin(), local_str.end(), input_char);
  auto local_count_for_mpi = static_cast<uint64_t>(local_count);
  uint64_t global_count = 0;
  MPI_Allreduce(&local_count_for_mpi, &global_count, 1, MPI_UINT64_T, MPI_SUM,
                MPI_COMM_WORLD);  // sum the counts from all processes

  GetOutput() = static_cast<size_t>(global_count);  // store the result, cast to the output type

  return true;
}

bool MaslovaUCharFrequencyCountMPI::PostProcessingImpl() {
  return true;
}
```
22 changes: 22 additions & 0 deletions
tasks/maslova_u_char_frequency_count/seq/include/ops_seq.hpp
@@ -0,0 +1,22 @@
```cpp
#pragma once

#include "maslova_u_char_frequency_count/common/include/common.hpp"
#include "task/include/task.hpp"

namespace maslova_u_char_frequency_count {

class MaslovaUCharFrequencyCountSEQ : public BaseTask {
 public:
  static constexpr ppc::task::TypeOfTask GetStaticTypeOfTask() {
    return ppc::task::TypeOfTask::kSEQ;
  }
  explicit MaslovaUCharFrequencyCountSEQ(const InType &in);

 private:
  bool ValidationImpl() override;
  bool PreProcessingImpl() override;
  bool RunImpl() override;
  bool PostProcessingImpl() override;
};

}  // namespace maslova_u_char_frequency_count
```
@@ -0,0 +1,44 @@
```cpp
#include "maslova_u_char_frequency_count/seq/include/ops_seq.hpp"

#include <climits>
#include <cstddef>
#include <string>

#include "maslova_u_char_frequency_count/common/include/common.hpp"

namespace maslova_u_char_frequency_count {

MaslovaUCharFrequencyCountSEQ::MaslovaUCharFrequencyCountSEQ(const InType &in) {
  SetTypeOfTask(GetStaticTypeOfTask());
  GetInput() = in;
  GetOutput() = 0;
}

bool MaslovaUCharFrequencyCountSEQ::ValidationImpl() {
  return GetInput().first.size() <= static_cast<size_t>(INT_MAX);
}

bool MaslovaUCharFrequencyCountSEQ::PreProcessingImpl() {
  return true;
}

bool MaslovaUCharFrequencyCountSEQ::RunImpl() {
  std::string &input_string = GetInput().first;
  char input_char = GetInput().second;  // read the input data
  size_t frequency_count = 0;

  for (const char c : input_string) {
    if (c == input_char) {
      frequency_count++;
    }
  }

  GetOutput() = frequency_count;  // store the result
  return true;
}

bool MaslovaUCharFrequencyCountSEQ::PostProcessingImpl() {
  return true;
}

}  // namespace maslova_u_char_frequency_count
```
@@ -0,0 +1,7 @@
```json
{
  "tasks_type": "processes",
  "tasks": {
    "mpi": "enabled",
    "seq": "enabled"
  }
}
```
@@ -0,0 +1,13 @@
```yaml
InheritParentConfig: true

Checks: >
  -modernize-loop-convert,
  -cppcoreguidelines-avoid-goto,
  -cppcoreguidelines-avoid-non-const-global-variables,
  -misc-use-anonymous-namespace,
  -modernize-use-std-print,
  -modernize-type-traits

CheckOptions:
  - key: readability-function-cognitive-complexity.Threshold
    value: 50  # Relaxed for tests
```