`README.md` (36 additions, 3 deletions)
```diff
@@ -47,7 +47,7 @@ Consumers:
 
 ### Triggering workflows
 
-In order to unify the workflow triggering mechanism, we use [a Cloud Run function](./src/README.md) that can be invoked in a number of ways (e.g. listen to PubSub messages), do intermediate checks and trigger the particular Dataform workflow execution configuration.
+In order to unify the workflow triggering mechanism, we use [a Cloud Run function](./infra/README.md) that can be invoked in a number of ways (e.g. listen to PubSub messages), do intermediate checks and trigger the particular Dataform workflow execution configuration.
 
 ## Contributing
 
```
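The trigger function itself is not part of this diff. As a rough illustration of the "intermediate checks" step, the JavaScript sketch below decodes a Pub/Sub push request body (base64-encoded `message.data`) and maps it to a workflow execution configuration. The `name` field in the payload, the `WORKFLOW_CONFIGS` routing table, and the configuration IDs are all hypothetical, and the Dataform API call that would actually start the workflow invocation is omitted.

```javascript
// HYPOTHETICAL routing table: message name -> Dataform workflow execution
// configuration ID. These names are illustrative, not the repository's.
const WORKFLOW_CONFIGS = {
  crawl_complete: 'crawl-complete',
  blink_report: 'blink-report'
};

/**
 * Decodes a Pub/Sub push request body and returns the workflow
 * configuration to trigger, or null if any intermediate check fails.
 * Assumes the publisher sends base64-encoded JSON such as {"name": "..."}.
 */
function resolveWorkflowConfig(pushBody) {
  const data = pushBody && pushBody.message && pushBody.message.data;
  if (typeof data !== 'string') return null; // not a Pub/Sub push message

  let payload;
  try {
    payload = JSON.parse(Buffer.from(data, 'base64').toString('utf8'));
  } catch (err) {
    return null; // payload is not valid JSON
  }
  if (payload === null || typeof payload !== 'object') return null;

  // Only trigger configurations that are explicitly allowed.
  if (!Object.prototype.hasOwnProperty.call(WORKFLOW_CONFIGS, payload.name)) {
    return null;
  }
  return WORKFLOW_CONFIGS[payload.name];
}
```

Keeping the checks in a pure function like this makes them testable without deploying the Cloud Run service; the HTTP handler would only pass the request body in and act on a non-null result.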
```diff
@@ -59,5 +59,38 @@ In order to unify the workflow triggering mechanism, we use [a Cloud Run functio
 
 #### Workspace hints
 
-1. In `workflow_settings.yaml` set `env_name: dev` to process sampled data.
-2. In `includes/constants.js` set `today` or other variables to a custome value.
+1. In `workflow_settings.yaml` set `environment: dev` to process sampled data.
+2. For development and testing, you can modify variables in `includes/constants.js`, but note that these are programmatically generated.
+
+## Repository Structure
+
+- `definitions/` - Contains the core Dataform SQL definitions and declarations
+- `output/` - Contains the main pipeline transformation logic
+- `declarations/` - Contains referenced tables/views declarations and other resources definitions
+- `includes/` - Contains shared JavaScript utilities and constants
+- `infra/` - Infrastructure code and deployment configurations
+- `dataform-trigger/` - Cloud Run function for workflow automation
```
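The workspace hints mention an `environment: dev` setting and programmatically generated constants, and `usage.js` interpolates `constants.currentMonth` and `constants.devRankFilter`. The sketch below shows one way such values could be derived. It assumes `currentMonth` is the first day of the current UTC month in `YYYY-MM-DD` form and that dev mode adds a rank cutoff; `buildConstants` and the `10000` threshold are hypothetical, not taken from `includes/constants.js`. In a Dataform project the environment would more likely be read from compilation variables (`dataform.projectConfig.vars`); here it is passed in so the sketch runs standalone.

```javascript
/**
 * HYPOTHETICAL sketch of generated constants.
 * Assumes currentMonth is the first day of the current UTC month and
 * that dev runs are restricted to top-ranked pages (sampled data).
 */
function buildConstants(environment, now = new Date()) {
  const year = now.getUTCFullYear();
  const month = String(now.getUTCMonth() + 1).padStart(2, '0');
  const currentMonth = `${year}-${month}-01`;

  // Appended after other WHERE predicates in the SQL, hence the leading AND.
  const devRankFilter = environment === 'dev' ? 'AND rank <= 10000' : '';

  return { currentMonth, devRankFilter };
}

// Example: a dev build in mid-June 2024.
const constants = buildConstants('dev', new Date(Date.UTC(2024, 5, 15)));
console.log(constants.currentMonth);  // 2024-06-01
console.log(constants.devRankFilter); // AND rank <= 10000
```

Returning an empty string outside dev lets the same SQL template serve both environments without conditional logic in each definition.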
`definitions/output/blink_features/usage.js` (44 additions, 20 deletions)
```diff
@@ -2,14 +2,36 @@ publish('usage', {
   schema: 'blink_features',
   type: 'incremental',
   protected: true,
+  bigquery: {
+    partitionBy: 'date',
+    clusterBy: ['client','rank','feature']
+  },
+  description: 'Used in https://lookerstudio.google.com/u/0/reporting/1M8kXOqPkwYNKjJhtag_nvDNJCpvmw_ri/page/tc5b, embedded in https://chromestatus.com/metrics/feature/timeline/popularity/2203',
   tags: ['crawl_complete','blink_report']
 }).preOps(ctx=>`
 DELETE FROM ${ctx.self()}
-WHERE yyyymmdd = REPLACE('${constants.currentMonth}', '-', '');
+WHERE date = '${constants.currentMonth}';
 `).query(ctx=>`
+WITH pages AS (
 SELECT
-  REPLACE(CAST(date AS STRING), '-', '') AS yyyymmdd,
+  date,
   client,
+  rank,
+  page,
+  features
+FROM ${ctx.ref('crawl','pages')}
+WHERE
+  date = '${constants.currentMonth}' AND
+  is_root_page = TRUE
+  ${constants.devRankFilter}
+), ranks AS (
+  SELECT DISTINCT rank FROM pages
+)
+
+SELECT
+  date,
+  client,
+  rank,
   id,
   feature,
   type,
```
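The `preOps` block deletes the current month's rows before the incremental insert, so re-running the workflow for a month replaces that month's data instead of duplicating it. Keying the delete on `date`, which is now also the partitioning column, should let BigQuery limit it to one partition rather than compare a derived `yyyymmdd` string. The in-memory model below only illustrates that delete-then-insert pattern: the table is a plain array of row objects, `runIncrementalMonth` is a hypothetical name, and `toYyyymmdd` reproduces the old `REPLACE(..., '-', '')` key for comparison.

```javascript
// Old key format, equivalent to REPLACE(CAST(date AS STRING), '-', '')
function toYyyymmdd(isoDate) {
  return isoDate.replace(/-/g, '');
}

// HYPOTHETICAL model of one incremental run: the preOps DELETE for the month,
// then an INSERT of the query results, which carry the month in `date`.
function runIncrementalMonth(table, month, queryRows) {
  const kept = table.filter(row => row.date !== month); // DELETE ... WHERE date = month
  const inserted = queryRows.map(row => ({ ...row, date: month }));
  return kept.concat(inserted);
}

let table = [{ date: '2024-05-01', feature: 'Foo', num_urls: 3 }];
const rows = [{ feature: 'Foo', num_urls: 5 }];
table = runIncrementalMonth(table, '2024-06-01', rows);
table = runIncrementalMonth(table, '2024-06-01', rows); // re-run: no duplicates
console.log(table.length);             // 2
console.log(toYyyymmdd('2024-06-01')); // 20240601
```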
```diff
@@ -19,20 +41,22 @@ SELECT
   sample_urls
 FROM (
   SELECT
-    yyyymmdd AS date,
+    date,
     client,
-    id,
-    feature,
-    type,
-    COUNT(DISTINCT url) AS num_urls,
-    ARRAY_AGG(url ORDER BY rank, url LIMIT 100) AS sample_urls
-  FROM ${ctx.ref('blink_features','features')}
-  WHERE
-    yyyymmdd = '${constants.currentMonth}'
-    ${constants.devRankFilter}
+    ranks.rank,
+    feature.id,
+    feature.feature,
+    feature.type,
+    COUNT(DISTINCT page) AS num_urls,
+    ARRAY_AGG(page ORDER BY pages.rank, page LIMIT 100) AS sample_urls
```
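The diff is cut off before the new `FROM` clause, so how `pages`, `ranks`, and the unnested `feature` are joined is not shown. Assuming each page is counted in every rank bucket at or above its own rank (`pages.rank <= ranks.rank`), that `features` is unnested so each element yields one row, and that the query groups by the selected columns, the aggregation can be modeled roughly as follows. `featureUsage` and the row shapes are hypothetical.

```javascript
/**
 * HYPOTHETICAL model of the new usage aggregation.
 * ASSUMPTIONS: each page joins every rank bucket with pages.rank <= ranks.rank,
 * `features` is unnested, and rows are grouped by the selected columns.
 */
function featureUsage(pages, sampleLimit = 100) {
  // ranks AS (SELECT DISTINCT rank FROM pages)
  const ranks = [...new Set(pages.map(p => p.rank))];
  const groups = new Map();

  for (const bucket of ranks) {
    for (const p of pages) {
      if (p.rank > bucket) continue; // assumed join condition
      for (const f of p.features) {
        // Grouping key: date, client, ranks.rank, feature.id, feature.feature, feature.type
        const key = JSON.stringify([p.date, p.client, bucket, f.id, f.feature, f.type]);
        if (!groups.has(key)) {
          groups.set(key, {
            date: p.date, client: p.client, rank: bucket,
            id: f.id, feature: f.feature, type: f.type, rows: []
          });
        }
        groups.get(key).rows.push({ page: p.page, rank: p.rank });
      }
    }
  }

  return [...groups.values()].map(({ rows, ...group }) => ({
    ...group,
    // COUNT(DISTINCT page)
    num_urls: new Set(rows.map(r => r.page)).size,
    // ARRAY_AGG(page ORDER BY pages.rank, page LIMIT 100)
    sample_urls: rows
      .slice()
      .sort((a, b) => a.rank - b.rank || (a.page < b.page ? -1 : a.page > b.page ? 1 : 0))
      .slice(0, sampleLimit)
      .map(r => r.page)
  }));
}

const pages = [
  { date: '2024-06-01', client: 'mobile', rank: 1000, page: 'https://a.example/',
    features: [{ id: 1, feature: 'Foo', type: 'default' }] },
  { date: '2024-06-01', client: 'mobile', rank: 10000, page: 'https://b.example/',
    features: [{ id: 1, feature: 'Foo', type: 'default' }] }
];
console.log(featureUsage(pages).map(r => `${r.rank}:${r.num_urls}`).join(' ')); // 1000:1 10000:2
```

Under this assumption the buckets are cumulative: a page ranked in the top 1,000 also counts toward the top 10,000, which is why the sample URLs are ordered by the page's own rank first.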