|
# Tutorial

## Explanation

To simplify, let's consider a single profile for a single year.
Let's denote it as $p_i$, where $i = 1,\dots,N$.
The clustering process consists of:

1. Split the `N` time steps into (let's assume equal) _periods_ of size `m = period_duration`.
   We can rename $p_i$ as

   $$p_{j,k}, \qquad \text{where} \qquad j = 1,\dots,m, \quad k = 1,\dots,N/m.$$
2. Compute `num_rps` representative periods

   $$r_{j,\ell}, \qquad \text{where} \qquad j = 1,\dots,m, \qquad \ell = 1,\dots,\text{num\_rps}.$$
3. While computing the representative periods, we also obtain weights
   $w_{k,\ell}$ between each period $k$ and each representative period $\ell$,
   such that

   $$p_{j,k} = \sum_{\ell = 1}^{\text{num\_rps}} r_{j,\ell} \ w_{k,\ell}, \qquad \forall j = 1,\dots,m, \quad k = 1,\dots,N/m.$$

   The sketch below shows the same relation in matrix form.

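
Here is that sketch, with made-up sizes and random data (this is not a call to TulipaClustering): $P$ stacks the profile into an $m \times N/m$ matrix, $R$ stacks the representative periods, and $W$ stacks the weights, so that $P = R W^\top$.

```julia
# Hypothetical sizes: m = period_duration, N/m periods, num_rps representative periods
m, num_periods, num_rps = 3, 4, 2

R = rand(m, num_rps)           # representative periods r[j, ℓ]
W = rand(num_periods, num_rps) # weights w[k, ℓ]
W ./= sum(W; dims = 2)         # normalize the random data so each period's weights sum to one

P = R * W'                     # P[j, k] = Σ_ℓ R[j, ℓ] * W[k, ℓ], the reconstructed profile
size(P)                        # (m, num_periods): one column per original period
```
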
## High-level API / DuckDB API

!!! note "High-level API"
    This tutorial focuses on the highest level of the API, which requires the
    use of a DuckDB connection.

The high-level API of TulipaClustering focuses on using TulipaClustering as part of the [Tulipa workflow](@ref TODO).
This API consists of three main functions: [`transform_wide_to_long!`](@ref), [`cluster!`](@ref), and [`dummy_cluster!`](@ref).
In this tutorial we'll use all three.

Normally, you will already have a DuckDB connection from the larger Tulipa workflow,
so here we create a temporary connection with fake data to show an example
of the workflow. You can look at the source code of this documentation to see
how this fake data is created.

```@setup duckdb_example
using DuckDB

# Create an in-memory DuckDB database and fill it with fake 'avail', 'solar',
# and 'demand' profiles: 28 periods (7 * 4) of 24 time steps each
connection = DBInterface.connect(DuckDB.DB)
DuckDB.query(
  connection,
  "CREATE TABLE profiles_wide AS
  SELECT
    2030 AS year,
    i + 24 * (p - 1) AS timestep,
    4 + 0.3 * cos(4 * 3.14 * i / 24) + random() * 0.2 AS avail,
    solar_rand * greatest(0, (5 + random()) * cos(2 * 3.14 * (i - 12.5) / 24)) AS solar,
    3.6 + 3.6 * sin(3.14 * i / 24) ^ 2 * (1 + 0.3 * random()) AS demand,
  FROM
    generate_series(1, 24) AS _timestep(i)
  CROSS JOIN (
    SELECT p, RANDOM() AS solar_rand
    FROM generate_series(1, 7 * 4) AS _period(p)
  )
  ORDER BY timestep
  ",
)
```

Here are the tables in that connection:

```@example duckdb_example
using DataFrames, DuckDB

nice_query(str) = DataFrame(DuckDB.query(connection, str))
nice_query("show tables")
```

And here are the first rows of `profiles_wide`:

```@example duckdb_example
nice_query("from profiles_wide limit 10")
```

And finally, here is a plot of the data:

```@example duckdb_example
using Plots

table = DuckDB.query(connection, "from profiles_wide")
plot(size=(800, 400))
timestep = [row.timestep for row in table]
# Plot each of the three profiles over the full time horizon
for profile_name in (:avail, :solar, :demand)
  value = [row[profile_name] for row in table]
  plot!(timestep, value, lab=string(profile_name))
end
plot!()
```

## Transform a wide profiles table into a long table

!!! warning "Required"
    The long table format is a requirement of TulipaClustering, even for the dummy clustering example.

In this context, a wide table is a table where each profile occupies its own column. A long table is a table where the profile names are stacked in one column and the corresponding values are stored in a separate column.
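
For illustration only, here is a tiny example of the same reshaping using DataFrames.jl (the table and column names here are made up and independent of TulipaClustering):

```julia
using DataFrames

# A wide table: one column per profile
wide = DataFrame(timestep = 1:3, avail = [0.9, 0.8, 0.7], demand = [1.1, 1.3, 1.2])

# The corresponding long table: profile names stacked in one column, values in another
long = stack(wide, [:avail, :demand]; variable_name = :profile_name, value_name = :value)
```
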
Given the name of the source table (in this case, `profiles_wide`), we can create a long table with the following call:

```@example duckdb_example
using TulipaClustering

transform_wide_to_long!(connection, "profiles_wide", "input_profiles")

nice_query("FROM input_profiles LIMIT 10")
```

The name `input_profiles` was chosen to conform to the format expected by `TulipaEnergyModel.jl`.

## Dummy Clustering

A dummy clustering essentially skips the clustering step and creates the tables needed by the next steps in the Tulipa workflow.

```@example duckdb_example
# Drop the output tables in case they already exist (e.g., from a previous run)
for table_name in (
  "cluster_rep_periods_data",
  "cluster_rep_periods_mapping",
  "cluster_profiles_rep_periods",
)
  DuckDB.query(connection, "DROP TABLE IF EXISTS $table_name")
end

clusters = dummy_cluster!(connection)

nice_query("FROM cluster_rep_periods_data LIMIT 5")
```

```@example duckdb_example
nice_query("FROM cluster_rep_periods_mapping LIMIT 5")
```

```@example duckdb_example
nice_query("FROM cluster_profiles_rep_periods LIMIT 5")
```


## Clustering

We can perform an actual clustering by using the [`cluster!`](@ref) function with two extra arguments (see [Explanation](@ref) for their deeper meaning):

- `period_duration`: how long the split periods are;
- `num_rps`: how many representative periods to compute.

```@example duckdb_example
period_duration = 24
num_rps = 3

# Drop the tables created by the dummy clustering so they can be re-created
for table_name in (
  "cluster_rep_periods_data",
  "cluster_rep_periods_mapping",
  "cluster_profiles_rep_periods",
)
  DuckDB.query(connection, "DROP TABLE IF EXISTS $table_name")
end

clusters = cluster!(connection, period_duration, num_rps)

nice_query("FROM cluster_rep_periods_data LIMIT 5")
```

```@example duckdb_example
nice_query("FROM cluster_rep_periods_mapping LIMIT 5")
```

```@example duckdb_example
nice_query("FROM cluster_profiles_rep_periods LIMIT 5")
```

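To see how much total weight each original period assigns to the representative periods, you can aggregate the mapping table. This is only a sketch: it assumes the mapping table has `period` and `weight` columns, as suggested by the query output above.

```julia
# Assumption: `cluster_rep_periods_mapping` has `period` and `weight` columns
nice_query("
  SELECT period, SUM(weight) AS total_weight
  FROM cluster_rep_periods_mapping
  GROUP BY period
  ORDER BY period
  LIMIT 5
")
```
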
## [TODO](@id TODO)

- [ ] Link to TulipaWorkflow