Commit 508f5dc

Author: Quarto GHA Workflow Runner
Commit message: Built site for gh-pages
1 parent 15db82a commit 508f5dc

4 files changed

Lines changed: 32 additions & 32 deletions

.nojekyll

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-fa436ae3
+7ef5fdd7

index.html

Lines changed: 10 additions & 10 deletions
@@ -216,7 +216,7 @@ <h2 class="anchored" data-anchor-id="scenario-auditing-a-partner-dataset">Scenar
 <section id="step-1-efficiently-loading-a-large-csv" class="level2">
 <h2 class="anchored" data-anchor-id="step-1-efficiently-loading-a-large-csv">Step 1: Efficiently Loading a Large CSV</h2>
 <p>Rather than using <code>pandas.read_csv()</code> directly, <strong>csvplus</strong> provides <code>load_optimized_csv()</code> to reduce memory usage automatically.</p>
-<div id="73856fcf" class="cell" data-execution_count="1">
+<div id="9cea5837" class="cell" data-execution_count="1">
 <div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb1"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> csvplus.load_optimized_csv <span class="im">import</span> load_optimized_csv</span>
 <span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a></span>
 <span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a><span class="co"># Load a large csv dataset</span></span>
@@ -243,7 +243,7 @@ <h3 class="anchored" data-anchor-id="what-this-function-does">What this function
 <section id="step-2-resolving-inconsistent-string-values" class="level2">
 <h2 class="anchored" data-anchor-id="step-2-resolving-inconsistent-string-values">Step 2: Resolving Inconsistent String Values</h2>
 <p>Text fields such as company names are often inconsistent due to typos or formatting differences. The <code>resolve_string_value()</code> function uses fuzzy matching to standardize values.</p>
-<div id="60265d52" class="cell" data-execution_count="2">
+<div id="0eb8d769" class="cell" data-execution_count="2">
 <div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb2"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> pandas <span class="im">as</span> pd</span>
 <span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> csvplus.data_correction <span class="im">import</span> resolve_string_value</span>
 <span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a></span>
@@ -278,7 +278,7 @@ <h2 class="anchored" data-anchor-id="step-2-resolving-inconsistent-string-values
 <section id="step-3-generating-a-data-summary-report" class="level2">
 <h2 class="anchored" data-anchor-id="step-3-generating-a-data-summary-report">Step 3: Generating a Data Summary Report</h2>
 <p>Before performing deeper analysis, it is often helpful to understand the structure and quality of the dataset.</p>
-<div id="bfdd8fdc" class="cell" data-execution_count="3">
+<div id="ff2c9a34" class="cell" data-execution_count="3">
 <div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb4"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> pandas <span class="im">as</span> pd</span>
 <span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> csvplus.generate_report <span class="im">import</span> summary_report</span>
 <span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a></span>
@@ -338,7 +338,7 @@ <h2 class="anchored" data-anchor-id="step-3-generating-a-data-summary-report">St
 <li>Confidence intervals</li>
 </ul>
 <p>To inspect categorical columns:</p>
-<div id="94253df7" class="cell" data-execution_count="4">
+<div id="ed4459eb" class="cell" data-execution_count="4">
 <div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb5"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a>categorical_stats.loc[<span class="st">'city'</span>, <span class="st">'n_unique'</span>]</span>
 <span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a>categorical_stats.loc[<span class="st">'city'</span>, <span class="st">'n_unique'</span>]</span>
 <span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a>categorical_stats.loc[<span class="st">'city'</span>, <span class="st">'top_values'</span>]</span>
@@ -352,7 +352,7 @@ <h2 class="anchored" data-anchor-id="step-3-generating-a-data-summary-report">St
 <section id="step-4-comparing-dataset-versions" class="level2">
 <h2 class="anchored" data-anchor-id="step-4-comparing-dataset-versions">Step 4: Comparing Dataset Versions</h2>
 <p>A week later, you receive an updated CSV file of the original CSV with potential schema and data changes. At this point, you want to understand <strong>what changed</strong> compared to the original dataset. After loading the dataset, the <code>data_version_diff()</code> function computes a structured comparison between the two DataFrames.</p>
-<div id="3577fc11" class="cell" data-execution_count="5">
+<div id="297e6e10" class="cell" data-execution_count="5">
 <div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb7"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> pandas <span class="im">as</span> pd</span>
 <span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> csvplus.data_version_diff <span class="im">import</span> data_version_diff</span>
 <span id="cb7-3"><a href="#cb7-3" aria-hidden="true" tabindex="-1"></a></span>
@@ -408,19 +408,19 @@ <h2 class="anchored" data-anchor-id="step-4-comparing-dataset-versions">Step 4:
 <section id="step-5-inspecting-dataframe-changes-programmatically" class="level2">
 <h2 class="anchored" data-anchor-id="step-5-inspecting-dataframe-changes-programmatically">Step 5: Inspecting Dataframe Changes Programmatically</h2>
 <p>You can explore specific components of the diff object directly:</p>
-<div id="cfed0b8f" class="cell" data-execution_count="6">
+<div id="490aeee8" class="cell" data-execution_count="6">
 <div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb9"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a>diff[<span class="st">"columns_added"</span>]</span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
 <div class="cell-output cell-output-display" data-execution_count="5">
 <pre><code>['amount', 'category']</code></pre>
 </div>
 </div>
-<div id="650fd399" class="cell" data-execution_count="7">
+<div id="4f961608" class="cell" data-execution_count="7">
 <div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb11"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb11-1"><a href="#cb11-1" aria-hidden="true" tabindex="-1"></a>diff[<span class="st">"row_count_change"</span>]</span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
 <div class="cell-output cell-output-display" data-execution_count="6">
 <pre><code>{'old_row_count': 3, 'new_row_count': 4, 'row_difference': 1}</code></pre>
 </div>
 </div>
-<div id="c1dee536" class="cell" data-execution_count="8">
+<div id="be427c76" class="cell" data-execution_count="8">
 <div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb13"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb13-1"><a href="#cb13-1" aria-hidden="true" tabindex="-1"></a>diff[<span class="st">"missing_value_changes"</span>]</span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
 <div class="cell-output cell-output-display" data-execution_count="7">
 <div>
@@ -457,7 +457,7 @@ <h2 class="anchored" data-anchor-id="step-5-inspecting-dataframe-changes-program
 </div>
 </div>
 </div>
-<div id="d8847202" class="cell" data-execution_count="9">
+<div id="eb4381e1" class="cell" data-execution_count="9">
 <div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb14"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb14-1"><a href="#cb14-1" aria-hidden="true" tabindex="-1"></a>diff[<span class="st">"numeric_summary_changes"</span>]</span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
 <div class="cell-output cell-output-display" data-execution_count="8">
 <div>
@@ -518,7 +518,7 @@ <h2 class="anchored" data-anchor-id="step-5-inspecting-dataframe-changes-program
 <section id="step-6-displaying-a-human-readable-report" class="level2">
 <h2 class="anchored" data-anchor-id="step-6-displaying-a-human-readable-report">Step 6: Displaying a Human-Readable Report</h2>
 <p>For interactive use, <strong>csvplus</strong> provides a clean, console-friendly summary of changes in your dataframes from step 5.</p>
-<div id="7721ba0a" class="cell" data-execution_count="10">
+<div id="1e29af5b" class="cell" data-execution_count="10">
 <div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb15"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb15-1"><a href="#cb15-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> csvplus.data_version_diff <span class="im">import</span> display_data_version_diff</span>
 <span id="cb15-2"><a href="#cb15-2" aria-hidden="true" tabindex="-1"></a></span>
 <span id="cb15-3"><a href="#cb15-3" aria-hidden="true" tabindex="-1"></a>display_data_version_diff(diff)</span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>

sitemap.xml

Lines changed: 11 additions & 11 deletions
@@ -2,46 +2,46 @@
 <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
 <url>
 <loc>https://UBC-MDS.github.io/DSCI_524_group37_csvplus/CODE_OF_CONDUCT.html</loc>
-<lastmod>2026-02-03T18:43:56.148Z</lastmod>
+<lastmod>2026-02-03T21:25:58.684Z</lastmod>
 </url>
 <url>
 <loc>https://UBC-MDS.github.io/DSCI_524_group37_csvplus/reference/data_version_diff.html</loc>
-<lastmod>2026-02-03T18:44:46.897Z</lastmod>
+<lastmod>2026-02-03T21:26:43.360Z</lastmod>
 </url>
 <url>
 <loc>https://UBC-MDS.github.io/DSCI_524_group37_csvplus/reference/index.html</loc>
-<lastmod>2026-02-03T18:44:46.851Z</lastmod>
+<lastmod>2026-02-03T21:26:43.314Z</lastmod>
 </url>
 <url>
 <loc>https://UBC-MDS.github.io/DSCI_524_group37_csvplus/index.html</loc>
-<lastmod>2026-02-03T18:43:56.149Z</lastmod>
+<lastmod>2026-02-03T21:25:58.685Z</lastmod>
 </url>
 <url>
 <loc>https://UBC-MDS.github.io/DSCI_524_group37_csvplus/DEVELOPMENT.html</loc>
-<lastmod>2026-02-03T18:43:56.148Z</lastmod>
+<lastmod>2026-02-03T21:25:58.684Z</lastmod>
 </url>
 <url>
 <loc>https://UBC-MDS.github.io/DSCI_524_group37_csvplus/CHANGELOG.html</loc>
-<lastmod>2026-02-03T18:43:56.148Z</lastmod>
+<lastmod>2026-02-03T21:25:58.684Z</lastmod>
 </url>
 <url>
 <loc>https://UBC-MDS.github.io/DSCI_524_group37_csvplus/tutorial.html</loc>
-<lastmod>2026-02-03T18:43:56.150Z</lastmod>
+<lastmod>2026-02-03T21:25:58.686Z</lastmod>
 </url>
 <url>
 <loc>https://UBC-MDS.github.io/DSCI_524_group37_csvplus/CONTRIBUTING.html</loc>
-<lastmod>2026-02-03T18:43:56.148Z</lastmod>
+<lastmod>2026-02-03T21:25:58.684Z</lastmod>
 </url>
 <url>
 <loc>https://UBC-MDS.github.io/DSCI_524_group37_csvplus/reference/generate_report.html</loc>
-<lastmod>2026-02-03T18:44:46.917Z</lastmod>
+<lastmod>2026-02-03T21:26:43.377Z</lastmod>
 </url>
 <url>
 <loc>https://UBC-MDS.github.io/DSCI_524_group37_csvplus/reference/load_optimized_csv.html</loc>
-<lastmod>2026-02-03T18:44:46.886Z</lastmod>
+<lastmod>2026-02-03T21:26:43.348Z</lastmod>
 </url>
 <url>
 <loc>https://UBC-MDS.github.io/DSCI_524_group37_csvplus/reference/data_correction.html</loc>
-<lastmod>2026-02-03T18:44:46.908Z</lastmod>
+<lastmod>2026-02-03T21:26:43.368Z</lastmod>
 </url>
 </urlset>
