- Once we have finished rebranding, we should add a Wikipedia article (DE and EN) and point the acronym "fork" articles RDB (de) and RDB (en) to it.
-
Experiment with `plotly::plot_ly(colors = viridis::turbo(n = ?))` in `plot_topic_segmentation()`
-
There is at least one "Landsgemeinde" referendum (`id = 5bbc004292a21351232e52e7`) with no result (`NA`) where the result should actually be `"no"` (Landsgemeinde rejected proposal) if I'm not mistaken. Do we handle "Landsgemeinde" votes specially or why is that?
-
Should we also try to collect "surrounding conditions" information besides the formal institutional stuff? For example, international law recognizes a set of conditions that must be met for a secession referendum to be considered legal.
-
Add additional database fields:
- `description`, holding a short prose description of the referendum.
- `tags` (maybe we'll find a better name, e.g. `keywords`?), holding a list of freely definable tags, so we can create ad-hoc collections of referendums that share some common characteristic
- `status` to capture the recognition status of a voting
Any more?
-
According to Uwe, we only capture "official"/"authorized" votings, but there are already unofficial ones present in the database like this one for which sudd.ch reports:
Diese Abstimmung ist nicht offiziell und wird von niemandem anerkannt. ("This vote is not official and is not recognized by anyone.")
Instead of not capturing such votings, it would be better to introduce another variable indicating the status of a voting (official, unofficial, ...); currently we only have an institutional variable `legal_basis_type` (formerly `official_status`) which measures a completely different thing. Maybe name this new variable simply `status`?
-
Territories disputed under international law: There is no explicit RDB policy on this so far, so we need to define one.
For example,
- all votes concerning the Republic of Kosovo are listed under the `country_name` Serbia...
- inconsistent `country_name`s are used for the votes in Taiwan: Taiwan, Province of China for the votes on 2018-11-24, and simply Taiwan for the others...
A pragmatic approach would be to simply adopt the practice of official/diplomatic Switzerland.
-
There is obviously not much consistency in how the referendum titles in the three languages are captured. According to the guidelines to add Swiss votings (`~/Arbeit/ZDA/Lokal/RDB/Materialen von Mayowa/CH_Vorgehen_Abstimmungseingabe.docx`), the `title_de` (and `title_fr` if one exists) are the official titles by the authorities and `title_en` is a translation of the German one. But
- the Swiss authorities (sometimes) also translate the title to English themselves (example).
- the guidelines to add international votings (`~/Arbeit/ZDA/Lokal/RDB/Materialen von Mayowa/Intl_Vorgehen_Abstimmungseingabe.docx`) don't say anything about the titles; but sometimes there's a German title for countries where almost certainly no official German version exists (e.g. Venezuela).
Therefore, we should define a better/stricter policy on how titles are captured (and identify existing entries violating this policy, so they can be corrected).
-
Adjust the topics hierarchy:
- broaden the `topic_tier_3` "homosexuals" somewhat, e.g. to "sexual orientation / gender identity".
- shorten the `topic_tier_3` "compensation for loss of earnings for persons on military service or civil protection duty"!
Other suggestions?
-
We should ensure minimum quality of attachments (e.g. correct orientation, page ordering, OCR). Bad example: https://services.c2d.ch/s3_objects/referendum_5bbbf59192a21351232e2e65_0001.pdf
I could probably write some validation fn in pkg rdb that checks all PDFs for text content to determine if they're OCR'ed, but maybe there's already more sophisticated software available for this.
Ideally, the data portal should ensure the minimum quality requirements upon upload and display an informative warning in case of violation (with an opt-in override to upload nevertheless).
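A minimal sketch of such a text-content check, assuming that "OCR'ed" can be approximated by "every page yields at least a few non-whitespace characters". The helper name `has_text_layer()` and the threshold `min_chars_per_page` are hypothetical and not part of the rdb package; for real files, the per-page text could be obtained with `pdftools::pdf_text()`.

```r
# Hypothetical helper (not part of pkg rdb): decide from the per-page text of
# a PDF whether it appears to contain a text layer (i.e. whether it is OCR'ed).
has_text_layer <- function(pages_text, min_chars_per_page = 20L) {
  # count non-whitespace characters on each page
  n_chars <- nchar(gsub("\\s", "", pages_text))
  # require at least one page, and some text on every page
  length(n_chars) > 0L && all(n_chars >= min_chars_per_page)
}

# Intended usage with a real attachment (requires the pdftools package):
# has_text_layer(pdftools::pdf_text("path/to/attachment.pdf"))

# toy input: one character string per page
ocr_ok <- has_text_layer(c("Explanations of the government on the popular vote",
                           "Full text of the proposal put to the vote"))
ocr_missing <- has_text_layer(c("Explanations of the government on the popular vote", ""))
```

Such a check cannot detect wrong orientation or page ordering, so it would only cover the OCR part of the minimum quality requirements.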
-
Prefix all Plotly fns with `plotly_` instead of `plot_` to avoid misconceptions.
-
Adapt code to use NocoDB's API instead of `services.c2d.ch` after we have finished the basic transition to NocoDB.
We should probably either use
- the R package rapiclient which can automatically extract the relevant specs from the OpenAPI definition, or
- the OpenAPI Generator which apparently can generate a ready-made R package from an OpenAPI definition (see also this post).
-
Automated vote entry creation by feeding scraped sudd.ch data to `rdb::add_rfrnds()`.
-
For meaningful cross-time analyses, we need additional information about countries ("jurisdictions"):
-
Information about territorial changes. This is (at least for the most essential part) covered by our current ISO 3166-based country classification.
TODO: Investigate whether additional information could be sourced from the Correlates of War project's Territorial Change dataset.
-
Information about political/jurisdictional changes. Currently we don't cover this beyond ISO 3166. ISO 3166-1 numeric does indirectly cover those political changes which are also accompanied by territorial changes, but no intra-territorial changes.
Consider for example the country Libya (`LY`) which has already experienced 4 major political systems which are not reflected in ISO 3166:
- (United) Kingdom of Libya (1951--1969)
- Libyan Arab Republic (1969--1977)
- Great Socialist People's Libyan Arab Jamahiriya (1977--2011)
- National Transitional Council of Libya (2011--2012) / Libya (2012--)
Viable sources for political/jurisdictional changes of countries include:
- Wikidata
-
Information about (in)dependency status of geographical entities on the national level (i.e. "countries"). Currently, we treat external dependent territories like Norfolk Island (an external territory of Australia) the same as fully independent countries like Switzerland.
Ideally, we'd augment our data with an additional variable from a suitable external source that holds information about a country's (in)dependency status. See this Wikipedia article for an overview. Maybe we could source the information from Wikidata?
-
-
Add a terminology reference complementing other structure information like the RDB codebook. E.g. we often use the term "voting" to refer to a referendum instance but we don't have a formal lookup reference for this kind of thing.
-
Are `id_sudd`s stable over time? Maybe contact the creator Beat Müller and ask?
-
Clarify in more detail to what extent data from Swissvotes could be integrated or linked. Possibly contact the technical staff behind Swissvotes to find out which further developments are to be expected (keyword: API!); a look at the source code of the swissdd R package might be quite instructive!
-
For several votes, sudd.ch reports
"id=..." ist gelöscht, weil keine Volksabstimmung stattgefunden hat. ("id=..." was deleted because no popular vote took place.)
The following votes are affected:
- Egypt 1976-06-10; in our database: 5bbbe82f92a21351232e0381
- New Zealand 1931-12-02; in our database 3 entries, one per option: 5bbbe29792a21351232de3c9, 5bbbe29792a21351232de3c7, 5bbbe29792a21351232de3c7
- Puerto Rico 1952-11-04 (3rd proposal "Abolition of certain social rights"); in our database: 5bbbe2ba92a21351232ded1f
If no votes actually took place in these cases, the entries should be removed from the RDB!
-
For the vote Norfolk Island 1980-07-10, sudd.ch claims it took place on 1979-07-10 instead. If sudd.ch is right, this should be corrected.
-
Silagadze & Gherghina (2019) (p. 467) detected some referendums that are missing in the database -> systematically check/add these!
They include Italy 1929 and 1934, Andorra 1933, Austria 1938, Romania 2009, Slovenia 2015, Bulgaria 2016, Netherlands 2016, UK 2016.
-
A total of 858 referendums don't have a `type` set though it's a mandatory field (at least in the C2D admin interface) -> the missing `type`s should be traced and added ASAP!
-
For the following referendums, the `votes_*` and `electorate_*` numbers have to be double-checked and possibly corrected since `electorate_total < sum(votes_*)`, which should by definition be impossible:

    rdb::rfrnds() %>%
      rdb::add_turnout(excl_dubious = FALSE) %>%
      dplyr::filter(turnout > 1.0) %>%
      dplyr::select(id, electorate_total, matches("^votes_(yes|no|empty|invalid)"), turnout)

We should probably also double-check improbably high turnout numbers, e.g. those > 0.9:

    rdb::rfrnds() %>%
      rdb::add_turnout(excl_dubious = FALSE) %>%
      dplyr::filter(dplyr::between(turnout, 0.9, 1.0)) %>%
      dplyr::select(id, electorate_total, matches("^votes_(yes|no|empty|invalid)"), turnout)
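To illustrate the logic of both checks without access to the database, here is a small base R sketch on made-up data. It assumes that turnout is simply the sum of all `votes_*` columns divided by `electorate_total`; how `rdb::add_turnout()` actually computes it may differ. The data frame `rfrnds_toy` and the `check` labels are invented for illustration.

```r
# toy data standing in for the output of rdb::rfrnds() (all values made up)
rfrnds_toy <- data.frame(
  id = c("a", "b", "c"),
  electorate_total = c(1000, 1000, 1000),
  votes_yes = c(400, 700, 500),
  votes_no = c(300, 400, 430),
  votes_empty = c(10, 0, 10),
  votes_invalid = c(5, 0, 10),
  stringsAsFactors = FALSE
)

# assumption: turnout = sum of all votes_* columns / electorate_total
votes_cols <- grep("^votes_(yes|no|empty|invalid)$", names(rfrnds_toy), value = TRUE)
rfrnds_toy$turnout <- rowSums(rfrnds_toy[votes_cols], na.rm = TRUE) / rfrnds_toy$electorate_total

# classify: > 1 is impossible by definition, (0.9, 1] is improbably high
rfrnds_toy$check <- cut(rfrnds_toy$turnout,
                        breaks = c(-Inf, 0.9, 1, Inf),
                        labels = c("ok", "improbable", "impossible"))

# entries that need a manual double-check
to_review <- rfrnds_toy[rfrnds_toy$check != "ok", c("id", "turnout", "check")]
```

Note that `cut()` with its default right-closed intervals puts a turnout of exactly 0.9 into "ok", whereas `dplyr::between(turnout, 0.9, 1.0)` includes it.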
-
Complete and add Aargau cantonal referendums 1888--1971 once issue #29 is resolved.
Also add the referendums from the similar Excel sheets for the remaining 25 cantons we got. See the HLS R project for some partial data cleansing/tidying.
-
Voting with `id == "5bbbfee992a21351232e4f37"` (Romania 2008-02-01) was limited to the region Szeklerland, therefore `subnational_entity` should be set to `Székely Land`
-
As soon as it's possible to filter by draft status via the admin interface, the existing drafts should be reviewed -> either complete and publish them or delete them!
-
Clean `id_official`; there are likely erroneous entries or ones that don't designate an `id_official` but another kind of ID; entries to double-check:

    rdb::rfrnds() %>%
      dplyr::filter(stringr::str_detect(string = id_official, pattern = "\\D") |
                      !(country_code == "CH" & level == "national") & !is.na(id_official))

Plus: Nobody knows what `id_official = "0"` means, so it should be replaced with `NA` (if no proper `id_official` can be determined).
-
`municipality` seems to be assigned inconsistently; it contains values that clearly designate a municipality (e.g. `"London"`), but also ones like `"Republic of Serbian Krajina until 1991"` or `Republic of Serbian People (1963-1992)`... for the latter,
- the appropriate `country_code_historical` for "Yugoslavia" should be set
- `country_code = "CS"` for the successor state "Serbia and Montenegro" should be set (or better leave it empty? TBD!)
- `is_past_jurisdiction = TRUE` should be set
- the current value in `municipality` should be entered in `subnational_entity_name` instead
-
Strictly speaking, the votes Netherlands 2005-04-08 and 2014-12-17 took place on Sint Eustatius, see the sudd.ch entries (1, 2); Sint Eustatius is a special municipality of the Netherlands, but has its own ISO country code (BQ-SE) etc.
Shouldn't `country_name` therefore rather be set to Sint Eustatius? Otherwise, `subnational_entity` should be set to `"Sint Eustatius"`, since not the whole of the Netherlands was able to vote!
-
The votes France 2006-02-23 and 2006-09-06 refer to referendums in Sark; `country_name` should therefore be set to `"United Kingdom"` or (better?) `"Guernsey"`, and `subnational_entity = "Sark"`!
-
Regarding data about subnational referendums in the US, we currently know of two up-to-date compilations:
-
Ballotpedia maintains a List of veto referendum ballot measures, seems to be very rich in information.
-
The National Conference of State Legislatures (NCSL) maintains a Statewide Ballot Measures Database that "includes all statewide ballot measures in the 50 states and the District of Columbia, starting over a century ago". It's unclear (to me) what data this database exactly includes, but it might also be a viable avenue for automated additions to our database.
The NCSL also provides information about the institutional conditions regarding direct democracy in the US states, e.g. here. Also very rich is the information that Wikipedia provides in the article Initiatives and referendums in the United States
-
-
Systematically inspect/handle all `applicability_constraint` violations (see `validate_rfrnds(check_applicability_constraint = TRUE)`).
-
Systematically check if variables that are "completely dependent" on other variables (like `inst_trigger_actor` on `inst_trigger_type`) are correctly filled.
E.g. `inst_trigger_type` is missing for referendums with IDs `5cb82f07cb48652399618eb1` and `6080ef7d4132d76d38bfe9e0` although `inst_trigger_actor` is present!
If the "completely dependent" property of these variables really holds, we should auto-fill them in the back-end and avoid the possibility of manual changes.
-
Systematically check if all votes in the `sudd.ch` database are included in the RDB -> parse `https://sudd.ch/list.php?mode=allrefs` (the `id_sudd` is part of the link in the last column)
A challenge is to identify the bogus referendums included on sudd.ch like this one (totally fabricated by Gregor von Rezzori)
-
According to the guidelines in `~/Arbeit/ZDA/Lokal/RDB/Materialen von Mayowa/CH_Vorgehen_Abstimmungseingabe.docx`, the PDF `files` of `country_code == "CH"` entries must be named consistently `Voting_brochure_CH/Kantonskürzel_Jahr_Monat_Tag` ("Abstimmungsbroschüre", i.e. voting brochure) and `Results_CH/Kantonskürzel_Jahr_Monat_Tag` (results), i.e. `CH` or the canton abbreviation followed by year, month and day -> check if this is actually always the case!
-
check `country_code` for obsolete codes, i.e. check if

    ISOcodes::ISO_3166_3$Alpha_4 %>%
      # exclude simple renamings where `country_code` didn't change
      magrittr::extract(stringr::str_sub(string = ., start = 3L) != "AA") %>%
      # extract former `country_code`
      stringr::str_sub(end = 2L) %>%
      # check
      magrittr::is_in(data2$country_code) %>%
      any()

and if so, assign `country_code_historical`, set `is_past_jurisdiction = TRUE` and assign proper new `country_code` (`ISOcodes::ISO_3166_3$Alpha_4 %>% stringr::str_sub(start = 3L)`; if it's `"HH"`, a "manual" decision about which successor country we shall assign has to be hard-coded)
-
check `electorate_abroad` for obvious errors (e.g. `id == "5f99b6c8d1291cc3961f1c2c"` is one)
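For the `country_code` check above, the following base R sketch shows how the ISO 3166-3 four-letter codes could be split into the former alpha-2 code and its successor without needing the ISOcodes package. The five codes are a hand-picked sample of real ISO 3166-3 entries; the function `find_obsolete_codes()` and its output columns are hypothetical names chosen for illustration.

```r
# sample of ISO 3166-3 codes: the first two letters are the former alpha-2
# code, the last two the successor ("AA": mere renaming, "HH": no single successor)
alpha_4 <- c("BUMM", "CSHH", "DDDE", "BYAA", "YUCS")

# hypothetical helper: report which of the given country codes are obsolete
find_obsolete_codes <- function(country_codes, alpha_4) {
  successor <- substr(alpha_4, 3L, 4L)
  # exclude simple renamings where the alpha-2 code didn't change
  changed <- successor != "AA"
  lookup <- data.frame(
    country_code_historical = alpha_4[changed],
    country_code_former = substr(alpha_4[changed], 1L, 2L),
    country_code_successor = successor[changed],
    stringsAsFactors = FALSE
  )
  hits <- lookup[lookup$country_code_former %in% country_codes, ]
  # "HH" requires a manual decision about the successor country
  hits$needs_manual_decision <- hits$country_code_successor == "HH"
  hits
}

obsolete <- find_obsolete_codes(country_codes = c("CH", "DD", "CS", "BY"), alpha_4 = alpha_4)
```

With the full table, `alpha_4` would be `ISOcodes::ISO_3166_3$Alpha_4` and `country_codes` the `country_code` column of the referendum data.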
-
The database needs a logo! (Then matching favicons could be generated, too!)
-
Take a closer look at Harvard Dataverse (Uwe thinks "including" the RDB there could be a good fit) and compare it with Zenodo (see FA notes) and Datahub.
To clarify:
-
License requirements? The default for uploads is CC0, but a different license can be defined (there are examples with ODbL 1.0!); in addition, non-legally-binding community norms apply (a much better approach than in FORSbase!)
-
Are datasets actually deposited or only references?
-
-
The C2D link on the ZDA website should be changed to HTTPS!
-
IT companies that might be candidates to succeed CCM Design:
- Furqan Software; founder Mahmud Ridwan developed two notable Goldmark extensions, goldmark-d2 and goldmark-katex, which, among other notable activity, shows he deeply understands how open source is best organized and engineered!
- PM TechHub, Slovenia: For rebuilding the `rdb.vote` main site. They are JAMStack pros and maintainers of the Git-based CMS Decap.
- Brudi, Zurich
- Cloud68, Estonia
- Liip, Zurich (and other Swiss cities)
- Ops One, Zurich
-
Can someone explain to me the exact meaning of `inst_object_revision_extent` as well as the `*precondition*` variables (in particular, I can't make sense of `inst_precondition_decision`...)?
-
Are the definitions in the codebook correct? In particular:
- `committee_name`
- `type`
- and all `inst_*` variables
-
`inst_quorum_turnout` should be standardized -> what would be a suitable, exhaustive set of values?
-
`inst_object_legal_level` should IMO be related to `level`, but it isn't. Consequently, `inst_object_legal_level` can be ambiguous (example: is `inst_object_legal_level = "law"` local, cantonal or national law for a referendum at the Swiss municipal level?)
We should therefore change this (i.e. capture it in an unambiguous way). Suggestions?
If this problem were fixed, `inst_object_legal_level` could presumably also be classified as `ordinal_ascending`.
-
Is `position_government` (formerly `recommendation`) always the recommendation of the government? Or always that of the parliament? Or sometimes one, sometimes the other?
Also: The variable currently has effectively 3 values, but I treat the value `"None"` as `NA` because the current admin portal doesn't allow `NA`s, i.e. coders are forced to enter `"None"` when the value is unknown. What's your view on this?
-
Currently, `inst_trigger_threshold` is a free text field which is really bad for analysis since no coding consistency at all is enforced. Instead, we should define how the same information could be captured more systematically (splitting it into two vars `inst_trigger_threshold_relative` and `inst_trigger_threshold_absolute` might make sense), introduce the new variables and then convert the old values to the new format.
What does Irina think?
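A rough sketch of what such a conversion could look like, using base R only. The free-text formats assumed here (a percentage like "10% of ..." for relative thresholds, a plain or comma/apostrophe-grouped number for absolute ones) are guesses, since the actual values in the database are not shown here; `extract_first()` and `parse_trigger_threshold()` are hypothetical helper names. Anything that doesn't match would stay `NA` and need manual recoding.

```r
# hypothetical helper: first regex match per element, NA where nothing matches
extract_first <- function(x, pattern) {
  m <- regexpr(pattern, x, perl = TRUE)
  out <- rep(NA_character_, length(x))
  hit <- !is.na(m) & m != -1L
  out[hit] <- regmatches(x, m)
  out
}

# hypothetical converter from the free-text field to the two proposed variables
parse_trigger_threshold <- function(x) {
  has_pct <- grepl("%", x, fixed = TRUE)
  # relative threshold: a number directly followed by a percent sign
  rel <- extract_first(x, "[0-9]+(?:[.][0-9]+)?(?=\\s*%)")
  # absolute threshold: first number (with optional , or ' as thousands
  # separator), considered only if the text contains no percent sign
  abs_raw <- extract_first(x, "[0-9][0-9,']*")
  abs_raw[has_pct] <- NA_character_
  data.frame(
    inst_trigger_threshold = x,
    inst_trigger_threshold_relative = as.numeric(rel) / 100,
    inst_trigger_threshold_absolute = as.numeric(gsub("[,']", "", abs_raw)),
    stringsAsFactors = FALSE
  )
}

# made-up example values
parsed <- parse_trigger_threshold(c("50,000 signatures", "10% of registered voters",
                                    "100000", "unknown"))
```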
-
MongoDB/API: Track atomic edit history, traceable by author, and make it visually inspectable (some kind of diff viewer would be cool). On top of this, some method to easily undo specific or all edits by a specific user account should be added.
See issue #34 (point 3) for a tentative request and @liviass' answer about an already existing events collection (with no API endpoint so far).
-
MongoDB: optimize order of subvariables (`files` and `context.votes_per_canton`); doing this post-hoc in R is slow/inefficient, so getting the JSON in the desired order directly from the API would be cool...
but is this actually possible? generally, the order of variables in the returned JSON seems random: compare e.g. `date` of here vs. here
-
C2D Website: Ditch the `by ccm.design` promo in the footer?

## Reported to CCM Design

-
Publish code under AGPL >= 3, see issue #26
We should then make the repository public!
-
C2D Admin Front-end: Add possibility to filter by draft status (binary) and color draft rows (e.g. in orange) -> see issue #26
-
C2D Website: Licensing of the data is missing (also affects the download via the rdb package)! ODC-ODbL would be a good fit. -> see issue #37
Once this is implemented, the same license terms should be added to the rdb package documentation!
-
Introduce `date_time_last_edited` holding the timestamp of a referendum entry's last edit. -> see issue #29
-
Add `id_official` and `id_sudd`! Then I can populate them with the (corrected) data from the former `number` variable and `number` can be deleted. -> see issue #29
-
Introduce `subnational_entity_code`; ISO 3166-2 codes seem perfectly suitable
open points:
-
we need to establish a policy on which `country_code` to assign to subnational entities that have both an ISO 3166-1 country code as well as an ISO 3166-2 subdivision code (recommended: avoid using the own ISO 3166-1 country code but assign the one from the subdivision's "parent" country)
how to deal with subnational entity changes? ISO 3166-2 is regularly updated but there doesn't seem to exist an equivalent to ISO-3166-3 codes on the subnational level...
-
how to deal with "unofficial" / non-authorized subnational entities that don't have an ISO 3166-2 code?
-
(I think introducing dedicated variables to capture the administrative division hierarchy below the national level in a more fine-grained way makes little sense since administrative division levels vary widely across the globe.)
-> see issue #29
-
-
Introduce `is_past_jurisdiction` signifying if the relevant jurisdiction where the referendum took place still exists (`FALSE`) or not (`TRUE`) -> see issue #29
-
Introduce `country_code_historical` that holds the ISO 3166-3 code for referendums in countries that don't exist anymore (see also this site by Statistics Canada; also informative: https://en.wikipedia.org/wiki/United_Nations_list_of_Non-Self-Governing_Territories); ISO 3166-3 seems to only assign codes for countries that ceased to exist since 1974 -> is there any classification for older historical entities? -> see issue #29
-
Introduce `question` holding the referendum question 1:1 as it was asked and `question_en` containing an English translation; open question: what to do when the question was officially asked in multiple languages like in CH? -> see issue #29
-
Outsource institutional variables into separate database/MongoDB collection and adapt everything. -> see issue #42
-
Extend the set of variables, so the `remarks` field isn't overloaded anymore. Possible extensions (taken from Louis' `remarks` structure (cf. `~/Arbeit/ZDA/Lokal/RDB/Materialen von Mayowa/Intl_Vorgehen_Abstimmungseingabe.docx`)):
- Background information on the vote (most important actors and events (sudd.ch, Wikipedia, NZZ etc.), content/main points)
- Voting question; original language
- Voting question; English (translation if necessary)
- Legal basis
- Name of the institution in original language
- Specialities of the institution (e.g. special quorum or 2 collecting periods)
- Specialities of the result (e.g. contradictory numbers)
-
`topics`: Adapt back-end to apply the same 3-tier topics logic that the R package does:
- Parent topics should be implicit, i.e. it should be impossible to select a parent topic and one of its child topics at the same time (selecting a child topic should always result in implicit selection of its parent (e.g. in a different (e.g. faded) color)).
- The upper limit of 3 topics should refer to main topics (i.e. excluding any implicit parent topics).
- Based on the user's selection of main topics, parent topics should automatically be derived from child topics based on the hierarchical topic structure and all the topics should be assigned to the 3 variables `topics_tier_1`, `topics_tier_2`, `topics_tier_3`.
-> see issue #41
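The three rules above can be sketched in a few lines of base R. The hierarchy `topic_parent` is a made-up excerpt (the topic names do not necessarily match the real RDB hierarchy), and `topic_lineage()` / `assign_topic_tiers()` are hypothetical helpers, not the R package's actual implementation.

```r
# made-up excerpt of a 3-tier hierarchy: each topic points to its parent
# (NA for tier-1 topics)
topic_parent <- c(
  "society" = NA_character_,
  "minorities" = "society",
  "sexual orientation" = "minorities",
  "economy" = NA_character_,
  "taxes" = "economy"
)

# hypothetical helper: walk up from a topic to its tier-1 ancestor
topic_lineage <- function(topic, topic_parent) {
  lineage <- topic
  while (!is.na(topic_parent[[lineage[1L]]])) {
    lineage <- c(topic_parent[[lineage[1L]]], lineage)
  }
  lineage
}

# hypothetical helper: derive the three tier variables from the main topics
assign_topic_tiers <- function(main_topics, topic_parent, max_main_topics = 3L) {
  # the upper limit refers to main topics only (implicit parents don't count)
  stopifnot(length(main_topics) <= max_main_topics,
            all(main_topics %in% names(topic_parent)))
  lineages <- lapply(main_topics, topic_lineage, topic_parent = topic_parent)
  # a parent must not be selected together with one of its child topics
  ancestors <- unlist(lapply(lineages, function(l) l[-length(l)]))
  stopifnot(!any(main_topics %in% ancestors))
  tier <- function(i) {
    as.character(unique(unlist(lapply(lineages, function(l) if (length(l) >= i) l[i]))))
  }
  list(topics_tier_1 = tier(1L), topics_tier_2 = tier(2L), topics_tier_3 = tier(3L))
}

tiers <- assign_topic_tiers(c("sexual orientation", "taxes"), topic_parent)
```

Selecting "sexual orientation" and "taxes" as main topics implicitly adds "society", "minorities" and "economy" as parent topics, while selecting "minorities" together with "sexual orientation" would be rejected.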
-
Standardize `subnational_entity_name`; ISO 3166-2 country subdivision names (definition in chap. 3.29) seem suitable (mapping codes <-> names in R via `ISOcodes::ISO_3166_2`; note that for some subdivisions, different names exist for multiple languages, e.g. some Swiss cantons; `ISOcodes::ISO_3166_2` only tracks one name (the most "native" one per subdivision, I guess))
-> see issue #44
-
Rethink standardization of `country_name`. Current problem: standardization happens only when creating/editing entries. Thus, it's not consistent, e.g. for `country_code == "GB"` there are entries from before the relaunch with `country_name = "United Kingdom"`, and there are newer entries with the auto-deduced `country_name = "United Kingdom of Great Britain and Northern Ireland"`.
See `countrycode::codelist` for possible standards; ISO 3166 English short country names (`countrycode::codelist$iso.name.en`) seem most promising.
Ideally, this would be done in the API back-end so `country_name` is determined at request time if possible.
-> see issue #43 for a closed (and partially invalid) first problem report and #51 for a follow-up requesting an improved UX.
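The idea of deriving `country_name` at request time instead of storing it can be illustrated with a tiny lookup. The named vector `country_names` is a hand-written stand-in (in practice it could presumably be built from the `iso2c` and `iso.name.en` columns of `countrycode::codelist`), and `derive_country_name()` is a hypothetical helper.

```r
# hand-written stand-in for a lookup table from country code to ISO 3166
# English short name
country_names <- c(
  CH = "Switzerland",
  GB = "United Kingdom of Great Britain and Northern Ireland"
)

# hypothetical helper: derive the name from the code (NA for unknown codes)
derive_country_name <- function(country_code, lookup = country_names) {
  unname(lookup[country_code])
}

# made-up entries with inconsistently stored names
entries <- data.frame(
  id = c("x1", "x2", "x3"),
  country_code = c("GB", "GB", "CH"),
  country_name = c("United Kingdom",
                   "United Kingdom of Great Britain and Northern Ireland",
                   "Switzerland"),
  stringsAsFactors = FALSE
)

# overwrite the stored names with the derived ones
entries$country_name <- derive_country_name(entries$country_code)
```

Because the name is always computed from `country_code`, old and new entries can no longer diverge.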
-
Deduplicate file attachments! Currently, file attachments like voting brochures which apply to multiple proposals on the same ballot date are attached to each individual proposal, thus resulting in file duplications. Example: subnational ballot date in ZH, CH @ 2022-05-15 has 4 proposals and the file `voting_brochure_zh_2022_05_15_de.pdf` is uploaded 4 different times to our Amazon S3 bucket:
- `62d6760ca52c3995043a8a1e`: https://services.c2d.ch/s3_objects/referendum_62d6760ca52c3995043a8a1e_0001.pdf
- `62d67203a52c3995043a8a16`: https://services.c2d.ch/s3_objects/referendum_62d67203a52c3995043a8a16_0001.pdf
- `62d66e97a52c3995043a8a0f`: https://services.c2d.ch/s3_objects/referendum_62d66e97a52c3995043a8a0f_0001.pdf
- `62d66ce0a52c3995043a8a08`: https://services.c2d.ch/s3_objects/referendum_62d66ce0a52c3995043a8a08_0002.pdf
Ideally, we'd have two different attachment types:
- Attachments that belong to an individual proposal.
- Attachments that belong to a whole ballot date in a jurisdiction, i.e. all proposals at that ballot.
That way uploading and assigning e.g. a voting brochure to referendums would be a single action.
Rough outline of the procedure for introducing the second attachment type:
-
Create new ballot-date-level database with primary key `country_code_historical`/`country_code` + `subnational_entity_code` + `municipality` + `date`, plus a field for attachment metadata (to be discussed what exactly is sensible here).
-
Create necessary API endpoints and front-end logic for type 2 attachments.
-
Treat all existing attachments as belonging to individual proposals (type 1).
-
Programmatically identify the attachments that belong to ballot dates (type 2) instead by comparing file hashes (open question: are file hashes already available from S3 some way or do we have to download all attachments and calculate them ourselves?) and convert them to type 2.
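The hash comparison in the last step could look roughly like this if the attachments have to be downloaded and hashed locally (whether S3 already exposes usable hashes remains the open question). The sketch uses `tools::md5sum()` from base R on three toy files written to a temporary directory; the file names and contents are made up.

```r
# write three toy "attachments" to a temporary directory, two of them identical
tmp_dir <- tempfile("attachments_")
dir.create(tmp_dir)
paths <- file.path(tmp_dir, c("referendum_a_0001.pdf",
                              "referendum_b_0001.pdf",
                              "referendum_c_0001.pdf"))
writeLines("voting brochure for the whole ballot date", paths[1L])
writeLines("voting brochure for the whole ballot date", paths[2L])
writeLines("document specific to a single proposal", paths[3L])

# MD5 hash per file
hashes <- tools::md5sum(paths)

# group file names by hash; groups with more than one file are duplicates
# and thus candidates for conversion to ballot-date-level (type 2) attachments
groups <- split(basename(paths), unname(hashes))
type_2_candidates <- Filter(function(files) length(files) > 1L, groups)

# clean up the toy files
unlink(tmp_dir, recursive = TRUE)
```

In the real procedure, identical hashes alone would not be sufficient: the duplicates should also belong to proposals sharing the same ballot-date key before they are converted.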
-
C2D admin front-end: Louis' document `~/Arbeit/ZDA/Lokal/RDB/Materialen von Mayowa/Intl_Louis/3_Test_Datenbank.docx`
-
C2D website: Provide a way to report incorrect/missing data! Before CCM Design is commissioned with this, we should define roughly what it should look like. E.g. simply via an HTML form with suitable fields (already pre-filled depending on the page it is called from (`country_code`, `level`, `id` etc.))?
-
C2D website: The about text should be overhauled.
-
C2D website: The listing of referendums should be overhauled. It currently lacks important information, e.g. `level`.
-
Once referendum deletions are possible on production servers, extend tests to modify every single data field individually.
-
Once issue #82 is fixed, remove/adapt all remaining code handling `country_code_historical` and `is_past_jurisdiction` (especially the sudd.ch-related fns have to be overhauled)
-
As soon as issue #57 is resolved, properly process the question variables (and adapt codebook).
-
Implement fn to rename file attachments as soon as issue #69 is resolved.