Deployed 8a0d808 with MkDocs version: 1.2.2

2026-02-06 07:55:45 +00:00 · 2021-09-28 00:31:12 +01:00
parent c76e3c542a
commit 0e17f26631
9 changed files with 41 additions and 28 deletions
--- a/discussion/approach.html
+++ b/discussion/approach.html
@@ -710,7 +710,7 @@
 <li>The key being the id across all columns (<code>id_all_columns</code>).</li>
 <li>The value being the raw data as an array.</li>
 </ul>
-<p>The mapping table is then condensed to a single dictionary with these key, value pairs and is used as a side input further down the pipeline.</p>
+<p>The mapping table is then condensed to a single dictionary with these key, value pairs (automatically deduplicating repeated rows) and is used as a side input further down the pipeline.</p>
 <p>This mapping table is created to ensure the <code>GroupByKey</code> operation is as quick as possible. The more data you have to process in a <code>GroupByKey</code>, the longer the operation takes. By doing the <code>GroupByKey</code> using just the ids, the pipeline can process the files much quicker than if we included the raw data in this operation.</p>
 <h2 id="prepare-stage">Prepare stage<a class="headerlink" href="#prepare-stage" title="Permanent link">&para;</a></h2>
 <ul>