Refactor parseArtSci function and add tests for newly introduced parseDepartmentList#1725
Refactor parseArtSci function and add tests for newly introduced parseDepartmentList#1725r-weng wants to merge 20 commits into
Conversation
Coverage Report for CI Build 0Coverage increased (+0.4%) to 58.998%Details
Uncovered Changes
Coverage RegressionsNo coverage regressions found. Coverage Stats
💛 - Coveralls |
david-yz-liu
left a comment
There was a problem hiding this comment.
@r-weng This is good, but please write tests for this new function
| ### 🔧 Internal changes | ||
|
|
||
| - Refactor `parseArtSci` function in `app/WebParsing/ArtSciParser.hs` by introducing `parseDepartmentList` | ||
| - Add test cases for the `parseDepartmentList` function in `backend-test/WebParsing/ArtSciParserTests.hs` |
There was a problem hiding this comment.
It isn't necessary to add a second entry describing test cases added (this is implied as part of the first change)
| "Research Opportunity/Research Excursions (299/398/399)"] | ||
| bodyTags <- httpBodyTags url | ||
| let deptList = getDeptList bodyTags | ||
| let cleaned = map (BF.second (T.replace "\160" " ")) deptList |
There was a problem hiding this comment.
First, move this into getDeptList (i.e., we don't need to separate out the "cleaning" step). That function currently only calls T.strip, so it can be simplified by using getDeptList anyways.
Second, rather than do this replacing, please use the existing cleanText. That said, replacing "\160" with " " is better than replacing it with "", so you can also go ahead and modify cleanText itself. (See here for a reference for the '\160' char.)
| bodyTags <- httpBodyTags url | ||
| let deptList = getDeptList bodyTags | ||
| let cleaned = map (BF.second (T.replace "\160" " ")) deptList | ||
| return $ filter (\(deptPage, deptName) -> "/" `T.isPrefixOf` deptPage && deptName `notElem` ignoredDepts && not (" College)" `T.isSuffixOf` deptName)) cleaned |
There was a problem hiding this comment.
As was mentioned at our meeting yesterday, this filtering is a great candidate for a local helper function. The idiomatic way to define this is to use a where block in this function:
return filterDepartments deptList
where
filterDepartments :: [(T.Text, T.Text)] -> [(T.Text, T.Text)]
...| @@ -1,11 +1,12 @@ | |||
| module WebParsing.ArtSciParser | |||
| (parseCalendar, getDeptList) where | |||
| (parseCalendar, getDeptList, parseDepartmentList) where | |||
There was a problem hiding this comment.
While parseDepartmentList is good to be exported for testing purposes, getDeptList isn't used anywhere else in the codebase so we can remove it from this export list.
… refactor-parseArtSci-function
…ng a local helper function
| import Data.List (findIndex, nubBy) | ||
| import Data.Maybe (fromMaybe, mapMaybe) | ||
| import qualified Data.Text as T | ||
| import qualified Data.Bifunctor as BF |
There was a problem hiding this comment.
Put this in alphabetical order (i.e., above the Data.List)
| "Data Science", | ||
| "Faculty of Arts and Science Programs (299/398/399)", | ||
| "Laboratory Medicine and Pathobiology)", -- Displayed as "Pathobiology (see Laboratory Medicine and Pathobiology)" on program areas page | ||
| "Research Opportunity/Research Excursions (299/398/399)"] |
There was a problem hiding this comment.
This is good, I just realized there's one more section to exclude: "Writing in the Faculty of Arts & Science".
| -- Exclude departments with no courses, duplicate courses, and program areas belonging to a college. | ||
| parseDepartmentList :: String -> IO [(T.Text, T.Text)] | ||
| parseDepartmentList url = do | ||
| let ignoredDepts = ["ASIP (Arts & Science Internship Program)", |
There was a problem hiding this comment.
please move this into the where clause as well, since it's only used by the helper
Proposed Changes
This pull request refactors the
parseArtScifunction inapp/WebParsing/ArtSciParser.hsby introducing a new function,parseDepartmentList, to handle parsing of department lists.parseDepartmentListnow ignores department pages with no courses, duplicate courses, and program areas belonging to a college. Additionally, this pull request adds test for the newly introducedparseDepartmentListfunction.Type of Change
(Write an
Xor a brief description next to the type or types that best describe your changes.)Checklist
(Complete each of the following items for your pull request. Indicate that you have completed an item by changing the
[ ]into a[x]in the raw text, or by clicking on the checkbox in the rendered description on GitHub.)Before opening your pull request:
After opening your pull request:
Questions and Comments
(Include any questions or comments you have regarding your changes.)