Commit

Update documentation on writing queries
* Bump gaarf-py to 1.9.0

Change-Id: I1fa82662df79b20066f52d1ea787a502a15ff427
AVMarkin committed Jul 25, 2023
1 parent 8df0b70 commit bc6a5d3
Showing 3 changed files with 178 additions and 203 deletions.
230 changes: 29 additions & 201 deletions README.md
@@ -4,6 +4,7 @@ Google Ads API Report Fetcher (gaarf)
[![Downloads npm](https://img.shields.io/npm/dw/google-ads-api-report-fetcher?logo=npm)](https://www.npmjs.com/package/google-ads-api-report-fetcher)
[![PyPI](https://img.shields.io/pypi/v/google-ads-api-report-fetcher?logo=pypi&logoColor=white&style=flat-square)](https://pypi.org/project/google-ads-api-report-fetcher/)
[![Downloads PyPI](https://img.shields.io/pypi/dw/google-ads-api-report-fetcher?logo=pypi)](https://pypi.org/project/google-ads-api-report-fetcher/)
[![GitHub Workflow CI](https://img.shields.io/github/actions/workflow/status/google/ads-api-report-fetcher/pytest.yaml?branch=main&label=pytest&logo=python&logoColor=white&style=flat-square)](https://github.com/google/ads-api-report-fetcher/actions/workflows/pytest.yaml?branch=main)


## Table of contents
@@ -75,7 +76,7 @@ Options:
* `account` - Ads account id (aka customer id). It can contain multiple ids separated by commas, and can also be specified in google-ads.yaml as 'customer-id' (as a string or a list)
* `input` - input type - where queries are coming from (Python only). Supports the following values:
* `file` - (default) local or remote (GCS, S3, Azure, etc.) files
  * `console` - data are read from standard input
* `output` - output type. Supports the following values:
* `csv` - write data to CSV files
* `bq` or `bigquery` - write data to BigQuery
@@ -124,24 +125,14 @@ Options specific for SqlAlchemy writer (*Python version only*):
* `sqldb.connection-string` to specify where to write the data (see [more](https://docs.sqlalchemy.org/en/14/core/engines.html))
* `sqldb.if-exists` - specify how to behave if the table already exists (see [more](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_sql.html))

#### Query specific options

All parameters whose names start with the `macro.` prefix are passed to queries as a params object.
For example, if we pass the parameters `--macro.start_date=2021-12-01 --macro.end_date=2022-02-28`,
then inside SQL we can use the `start_date` and `end_date` parameters in curly brackets:
```sql
AND segments.date >= "{start_date}"
AND segments.date <= "{end_date}"
```
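
A minimal Python sketch of the substitution described above, assuming plain text replacement of `{name}` placeholders. The regex and the `substitute_macros` helper are hypothetical and not gaarf's actual implementation; placeholders without a matching macro, `${...}` expressions and Jinja `{{ ... }}` blocks are left untouched.

```python
import re
from typing import Dict

# Matches {name} but not ${expression} or {{ jinja }} (hypothetical rule)
_MACRO_RE = re.compile(r"(?<![{$])\{([\w-]+)\}(?!\})")


def substitute_macros(query: str, macros: Dict[str, str]) -> str:
    """Replace {name} placeholders with macro values; unknown names stay as-is."""
    return _MACRO_RE.sub(lambda m: macros.get(m.group(1), m.group(0)), query)


query = '''
AND segments.date >= "{start_date}"
AND segments.date <= "{end_date}"
'''
print(substitute_macros(query, {"start_date": "2021-12-01", "end_date": "2022-02-28"}))
```

Because the substitution is purely textual, the quotes around `{start_date}` in the query are what make the result a string literal.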
If your query contains macros, templates, or SQL parameters, you need to pass `--macro.`, `--template.`, or `--sql.` CLI flags to `gaarf`.
Learn more about each of those in the [How to write queries](docs/how-to-write-queries.md) document:
* [Macros](docs/how-to-write-queries.md#macros)
* [Templates](docs/how-to-write-queries.md#templates)
* [Sql](docs/how-to-write-queries.md#sql)

Full example:
```
gaarf google_ads_queries/*.sql --ads-config=google-ads.yaml \
--account=1234567890 --output=bq \
--macro.start_date=2021-12-01 \
--macro.end_date=2022-02-28 \
--bq.project=my_project \
--bq.dataset=my_dataset
```
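
The dotted flags in the full example fall into namespaces (`macro`, `bq`, and so on). The Python sketch below shows one way such `--namespace.name=value` flags could be grouped; `group_cli_flags` is a hypothetical helper, not gaarf's argument parser.

```python
from typing import Dict, List, Tuple


def group_cli_flags(argv: List[str]) -> Tuple[Dict[str, Dict[str, str]], List[str]]:
    """Split "--namespace.name=value" flags into nested dicts (hypothetical helper)."""
    grouped: Dict[str, Dict[str, str]] = {}
    other: List[str] = []
    for arg in argv:
        head, sep, value = arg.partition("=")
        if arg.startswith("--") and sep and "." in head:
            namespace, name = head[2:].split(".", 1)
            grouped.setdefault(namespace, {})[name] = value
        else:
            # Positional paths and plain options such as --account=... or --output=...
            other.append(arg)
    return grouped, other


argv = [
    "google_ads_queries/*.sql", "--ads-config=google-ads.yaml",
    "--account=1234567890", "--output=bq",
    "--macro.start_date=2021-12-01", "--macro.end_date=2022-02-28",
    "--bq.project=my_project", "--bq.dataset=my_dataset",
]
grouped, other = group_cli_flags(argv)
print(grouped["macro"])  # {'start_date': '2021-12-01', 'end_date': '2022-02-28'}
print(grouped["bq"])     # {'project': 'my_project', 'dataset': 'my_dataset'}
```

Only the part before the first "=" is checked for a dot, so values such as `google-ads.yaml` do not create a namespace.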

If you run the Python version of `gaarf`, you can provide a query directly from the console:

@@ -166,12 +157,12 @@ gaarf-bq <files> [options]
gaarf-sql <files> [options]
```

Options:
* `sql.*` - named SQL parameters to be used in queries as `@param`. E.g. a parameter 'date' supplied via CLI as `--sql.date=2022-06-01` can be used in a query as `@date`.
* `macro.*` - macro parameters to substitute into queries as `{param}`. E.g. a parameter 'dataset' supplied via CLI as `--macro.dataset=myds` can be used as `{dataset}` in the query's text.
* `template.*` - parameters for templates; values containing "," will be converted to lists/arrays.

If your query contains macros, templates, or SQL parameters, you need to pass `--macro.`, `--template.`, or `--sql.` CLI flags to `gaarf-bq` or `gaarf-sql`.
Learn more about each of those in the [How to write queries](docs/how-to-write-queries.md) document:
* [Macros](docs/how-to-write-queries.md#macros)
* [Templates](docs/how-to-write-queries.md#templates)
* [Sql](docs/how-to-write-queries.md#sql)


The tool assumes that the scripts you provide are DDL, i.e. contain statements like CREATE TABLE or CREATE VIEW.

In general it's recommended to keep tables with data from the Ads API separate from the final tables/views created by your post-processing queries.
@@ -182,15 +173,15 @@ In general it's recommended to separate tables with data from Ads API and final
* `dataset-location` - BigQuery [locations](https://cloud.google.com/bigquery/docs/locations) for newly created dataset(s)

So it's likely that your final tables will be in a separate dataset (or datasets). To allow the tool to create those datasets for you, make sure that the names of the macros for your datasets contain the word "dataset".
In that case `gaarf-bq` will check that the dataset exists and create it if it doesn't.


For example:
```
CREATE OR REPLACE TABLE `{dst_dataset}.my_dashboard_table` AS
SELECT * FROM {ads_ds}.{campaign}
```
In this case `gaarf-bq` will check for the existence of the dataset specified as the 'dst_dataset' macro.
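
The selection rule described above can be sketched in Python, assuming it is a plain substring check on macro names. `datasets_to_check` is a hypothetical helper, and actually creating a missing dataset (for example with the `google-cloud-bigquery` client's `create_dataset(..., exists_ok=True)`) is left out; this is not necessarily how `gaarf-bq` does it.

```python
from typing import Dict, List


def datasets_to_check(macros: Dict[str, str]) -> List[str]:
    """Return the values of macros whose names contain "dataset" (hypothetical helper)."""
    return sorted({value for name, value in macros.items() if "dataset" in name})


macros = {"dst_dataset": "dashboards", "ads_ds": "ads_raw", "campaign": "campaign"}
print(datasets_to_check(macros))  # ['dashboards']
```

Note that `ads_ds` from the example is not checked, because its name does not contain "dataset".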

**SqlAlchemy specific options [Python only]:**
* `connection-string` - specific connection to the selected DB (see [more](https://docs.sqlalchemy.org/en/14/core/engines.html))
@@ -210,181 +201,6 @@ export GAARF_DB_PORT=12345
export GAARF_DB_NAME=test
```

**Common options**

There are three types of parameters that you can pass to a script: `macro`, `sql`, and `template`.

*Macro*

A macro is just a text substitution in the script.
For example:
```
SELECT *
FROM {dst_dataset}.{table-src}
```
Here `dst_dataset` and `table-src` are macros that can be supplied as:
```
gaarf-bq --macro.table-src=table1 --macro.dst_dataset=dataset1
```

*SQL*

You can also use regular SQL query parameters with the `sql` argument:
```
SELECT *
FROM {dst_dataset}.{table-src}
WHERE name LIKE @name
```
and to execute:
`gaarf-bq --macro.table-src=table1 --macro.dst_dataset=dataset1 --sql.name='myname%'`

It will create a parameterized query to run in BQ:
```
SELECT *
FROM dataset1.table1
WHERE name LIKE @name
```
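
To illustrate the difference between the two steps, here is a sketch that uses Python's built-in `sqlite3` as a stand-in for BigQuery (an assumption made only so the example runs locally): macros change the query text, while `@name` stays in the text and its value is bound separately by the database driver. The simplified `str.replace` loop is not gaarf's macro implementation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite stand-in for BigQuery: an attached in-memory database plays "dataset1"
conn.execute("ATTACH DATABASE ':memory:' AS dataset1")
conn.execute("CREATE TABLE dataset1.table1 (name TEXT)")
conn.executemany(
    "INSERT INTO dataset1.table1 VALUES (?)",
    [("myname_a",), ("myname_b",), ("other",)],
)

template = "SELECT name FROM {dst_dataset}.{table-src} WHERE name LIKE @name ORDER BY name"
# Step 1: macros are substituted into the query text (simplified replacement)
macros = {"dst_dataset": "dataset1", "table-src": "table1"}
query = template
for key, value in macros.items():
    query = query.replace("{" + key + "}", value)
# Step 2: @name stays in the text and its value is bound by the driver
rows = conn.execute(query, {"name": "myname%"}).fetchall()
print(query)  # SELECT name FROM dataset1.table1 WHERE name LIKE @name ORDER BY name
print(rows)   # [('myname_a',), ('myname_b',)]
```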

*Template*

Your SQL scripts can be templates using a template engine: [Jinja](https://jinja.palletsprojects.com) for Python and [Nunjucks](https://mozilla.github.io/nunjucks/) for NodeJS.
A script will be processed as a template if and only if you supply a `template` argument.

Inside templates you can use the syntax and control structures of the corresponding template engine (Jinja/Nunjucks).
They are mostly compatible, but please consult the documentation if you migrate between platforms (Python <-> NodeJS).

Template blocks (if-else, for-loop) usually reference some variables. To pass their values, use `--template.` arguments.

Example:
```
SELECT
customer_id AS
{% if level == "0" %}
root_account_id
{% else %}
leaf_account_id
{% endif %}
FROM dataset1.table1
WHERE name LIKE @name
```
and to execute:

`gaarf-bq path/to/query.sql --template.level=0 --sql.name='myname%'`

This will create a column named `root_account_id` since the specified level is 0.

Please note that all values passed through CLI arguments are strings. There is one special case: a value containing "," is treated as an array (see the following example).

Templates are great when you need to create multiple columns based on a condition:

```
SELECT
{% for day in cohort_days %}
SUM(GetCohort(lag_data.installs, {{day}})) AS installs_{{day}}_day,
{% endfor %}
FROM asset_performance
```
and to execute:

`gaarf-bq path/to/query.sql --template.cohort_days=0,1,3,4,5,10,30`

It will create 7 columns (named `installs_0_day`, `installs_1_day`, etc.) because the `cohort_days` argument was processed as a list.
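
A plain-Python sketch (not Jinja or Nunjucks) of what happens in this example, assuming the comma rule is a simple split with no whitespace trimming: the CLI string becomes a list, and the loop emits one column per element. `parse_template_value` and `expand_cohort_columns` are hypothetical helpers; a real template engine would produce equivalent text, possibly with different whitespace.

```python
from typing import List, Union


def parse_template_value(value: str) -> Union[str, List[str]]:
    """CLI values arrive as strings; a value containing "," becomes a list."""
    return value.split(",") if "," in value else value


def expand_cohort_columns(cohort_days: List[str]) -> str:
    """Plain-Python equivalent of the {% for day in cohort_days %} loop above."""
    columns = [
        f"  SUM(GetCohort(lag_data.installs, {day})) AS installs_{day}_day,"
        for day in cohort_days
    ]
    return "\n".join(["SELECT", *columns, "FROM asset_performance"])


days = parse_template_value("0,1,3,4,5,10,30")
print(days)  # ['0', '1', '3', '4', '5', '10', '30']
print(expand_cohort_columns(days))
```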

ATTENTION: passing macros into SQL queries is vulnerable to SQL injection, so be very careful about where you take their values from.
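
The risk can be demonstrated with Python's built-in `sqlite3` as a stand-in database (an assumption for illustration only; the table and values are made up). A macro-style value is pasted into the query text and can change its logic, while the same value passed as a bound `@name` parameter is treated as plain data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT)")
conn.executemany("INSERT INTO accounts VALUES (?)", [("alice",), ("bob",)])

untrusted = "nobody' OR '1'='1"  # a value an attacker could supply

# Macro-style: the value is pasted into the SQL text and changes the query logic
macro_sql = "SELECT name FROM accounts WHERE name = '{name}'".replace("{name}", untrusted)
leaked = conn.execute(macro_sql).fetchall()

# SQL-parameter style: the value is bound separately and treated as data only
safe = conn.execute(
    "SELECT name FROM accounts WHERE name = @name", {"name": untrusted}
).fetchall()

print(macro_sql)  # SELECT name FROM accounts WHERE name = 'nobody' OR '1'='1'
print(len(leaked), len(safe))  # 2 0
```

Where a value only needs to be compared against data, passing it through `--sql.` rather than `--macro.` avoids this class of problem.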


## Expressions and Macros
> *Note*: currently expressions are supported only in the NodeJS version.

As noted earlier, both Ads queries and BigQuery queries support macros. They are named values that can be passed alongside
parameters (e.g. command line, config files) and substituted into queries. Their syntax is `{name}`.
On top of this, queries can contain expressions. The syntax for expressions is `${expression}`.
They are evaluated right after macro substitution, so macros can contain expressions inside.
Both expressions and macros deal with query text before it is submitted for execution.
Inside an expression block we can do anything that the MathJS library supports (see https://mathjs.org/docs/index.html),
plus work with dates and times. This covers all sorts of arithmetic operations as well as string and date manipulations.

One typical use case is evaluating date/time expressions to get dynamic date conditions in queries: instead of providing
a specific date, you evaluate it right in the query. For example, a condition for the last month's date range
can be expressed as a range from today minus 1 month to today (or yesterday):
```
WHERE start_date >= '${today()-period('P1M')}' AND end_date <= '${today()}'
```
will be evaluated to:
`WHERE start_date >= '2022-06-20' AND end_date <= '2022-07-20'`
if today is July 20th, 2022.
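
Expressions themselves are NodeJS-only, so the following Python sketch merely reproduces the date arithmetic of `today()-period('P1M')` with a fixed "today" of 2022-07-20. It assumes calendar-month subtraction that clamps the day to the length of the target month (e.g. March 31 minus 1 month gives the end of February); `minus_months` is a hypothetical helper.

```python
import calendar
from datetime import date


def minus_months(d: date, months: int) -> date:
    """Subtract calendar months, clamping the day to the target month's length."""
    total = d.year * 12 + (d.month - 1) - months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


today = date(2022, 7, 20)  # fixed "today" so the result is reproducible
where = (
    f"WHERE start_date >= '{minus_months(today, 1).isoformat()}' "
    f"AND end_date <= '{today.isoformat()}'"
)
print(where)  # WHERE start_date >= '2022-06-20' AND end_date <= '2022-07-20'
```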

Also you can use expressions for making table names dynamic (in BQ scripts), e.g.
```
CREATE OR REPLACE TABLE `{bq_dataset}_bq.assetssnapshots_${format(yesterday(),'yyyyMMdd')}` AS
```

Supported functions:
* `datetime` - factory function to create a DateTime object, by default in ISO format (`datetime('2022-12-31T23:59:59')`) or in a format specified in the second argument (`datetime('12/31/2022 23:59','M/d/yyyy HH:mm')`)
* `date` - factory function to create a Date object, supported formats: `date(2022,12,31)`, `date('2022-12-31')`, `date('12/31/2022','M/d/yyyy')`
* `duration` - returns a Duration object for a string in [ISO_8601](https://en.wikipedia.org/wiki/ISO_8601#Durations) format (PnYnMnDTnHnMnS)
* `period` - returns a Period object for a string in [ISO_8601](https://en.wikipedia.org/wiki/ISO_8601#Durations) format (PnYnMnD)
* `today` - returns a Date object for today's date
* `yesterday` - returns a Date object for yesterday's date
* `tomorrow` - returns a Date object for tomorrow's date
* `now` - returns a DateTime object for current timestamp (date and time)
* `format` - formats Date or DateTime using a provided format, e.g. `${format(date('2022-07-01'), 'yyyyMMdd')}` returns '20220701'

Please note that functions without arguments should still be called with parentheses (e.g. `today()`).

For dates and datetimes the following operations are supported:
* add or subtract Date and Period, e.g. `today()-period('P1D')` - subtract 1 day from today (i.e. yesterday)
* add or subtract DateTime and Duration, e.g. `now()-duration('PT12H')` - subtract 12 hours from the current datetime
* for both Date and DateTime, add or subtract a number, which is treated as a number of days, e.g. `today()-1`
* subtract two Dates to get a Period, e.g. `tomorrow()-today()` - subtract today from tomorrow and get 1 day, i.e. 'P1D'
* subtract two DateTimes to get a Duration - similar to subtracting dates but get a duration, i.e. a period with time (e.g. PT10H for 10 hours)
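
For readers more familiar with Python, the day- and hour-based operations above roughly correspond to standard `datetime` arithmetic, sketched below with fixed dates. This is only an analogy, not how expressions are evaluated: Python's `timedelta` has no month unit, so month-based Periods such as `P1M` need calendar arithmetic instead.

```python
from datetime import date, datetime, timedelta

today = date(2022, 7, 21)            # fixed "today" for reproducibility
now = datetime(2022, 7, 21, 15, 30)  # fixed "now"

yesterday = today - timedelta(days=1)      # like today()-period('P1D') or today()-1
earlier = now - timedelta(hours=12)        # like now()-duration('PT12H')
gap = (today + timedelta(days=1)) - today  # like tomorrow()-today()

print(yesterday.isoformat())  # 2022-07-20
print(earlier.isoformat())    # 2022-07-21T03:30:00
print(gap.days)               # 1
```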

By default all dates will be parsed and converted from/to strings in [ISO format](https://en.wikipedia.org/wiki/ISO_8601)
(yyyy-MM-dd for dates and yyyy-MM-dd'T'HH:mm:ss.SSS for datetimes).
But additionally you can specify a format explicitly (for parsing with the `datetime` and `date` functions and formatting with the `format` function)
using standard [Java Date and Time Patterns](https://docs.oracle.com/javase/7/docs/api/java/text/SimpleDateFormat.html):

* G Era designator
* y Year
* Y Week year
* M Month in year (1-based)
* w Week in year
* W Week in month
* D Day in year
* d Day in month
* F Day of week in month
* E Day name in week (e.g. Tuesday)
* u Day number of week (1 = Monday, ..., 7 = Sunday)
* a Am/pm marker
* H Hour in day (0-23)
* k Hour in day (1-24)
* K Hour in am/pm (0-11)
* h Hour in am/pm (1-12)
* m Minute in hour
* s Second in minute
* S Millisecond
* z Time zone - General time zone (e.g. Pacific Standard Time; PST; GMT-08:00)
* Z Time zone - RFC 822 time zone (e.g. -0800)
* X Time zone - ISO 8601 time zone (e.g. -08; -0800; -08:00)
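
To relate a few of these pattern letters to Python's `strftime` codes, here is a sketch that translates a small subset (`yyyy`, `MM`, `dd`, `HH`, `mm`, `ss`). The mapping and the `java_format` helper are hypothetical and cover only this subset; any other characters are copied literally.

```python
import re
from datetime import date, datetime, timedelta
from typing import Union

# Hypothetical mapping for a small subset of the pattern letters listed above
_JAVA_TO_STRFTIME = {
    "yyyy": "%Y", "MM": "%m", "dd": "%d",
    "HH": "%H", "mm": "%M", "ss": "%S",
}
# Longest tokens first so "yyyy" is matched before any shorter token
_TOKEN_RE = re.compile(
    "|".join(sorted(map(re.escape, _JAVA_TO_STRFTIME), key=len, reverse=True))
)


def java_format(value: Union[date, datetime], pattern: str) -> str:
    """Format a date with a Java-style pattern (subset only; other characters are literal)."""
    strftime_pattern = _TOKEN_RE.sub(lambda m: _JAVA_TO_STRFTIME[m.group(0)], pattern)
    return value.strftime(strftime_pattern)


print(java_format(date(2022, 7, 1), "yyyyMMdd"))  # 20220701
yesterday = date(2022, 7, 21) - timedelta(days=1)
print("assetssnapshots_" + java_format(yesterday, "yyyyMMdd"))  # assetssnapshots_20220720
```

Note the case sensitivity: `MM` is the month while `mm` is the minute, and `HH` (0-23) differs from `hh` (1-12).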

Examples:
```
${today() - period('P2D')}
```
output: today minus 2 days, e.g. '2022-07-19' if today is 2022-07-21

```
${today()+1}
```
output: today plus 1 day, e.g. '2022-07-22' if today is 2022-07-21

```
${date(2022,7,20).plusMonths(1)}
```
output: "2022-08-20"


### Dynamic dates
Macro values can contain a special syntax for dynamic dates. If a macro value starts with *:YYYY* it will be processed
as a dynamic expression to calculate a date based on the current date.
@@ -418,11 +234,23 @@ But you can override it via arguments if needed (e.g. `--macro.date_iso=:YYYYMMD


## Docker
You can run Gaarf as a Docker container.

```
export GAARF_ACCOUNT=123456
docker run \
-v $HOME/google-ads.yaml:/root/google-ads.yaml \
ghcr.io/google/gaarf-py:latest \
gaarf "SELECT customer.id AS account_id FROM customer" \
--input=console --output=console \
--account=$GAARF_ACCOUNT --ads-config=/root/google-ads.yaml
```

### Build a container image
The repository contains sample `Dockerfile`s for both versions ([Node](js/Dockerfile)/[Python](py/Dockerfile))
that you can use to build a Docker image.

If you cloned the repo then you can just run `docker build` (see below) inside it (in js/py folders) with the local [Dockerfile](js/Dockerfile).
Otherwise you can just download `Dockerfile` into an empty folder:
```