|
1 | | -## DataTalks.Club Website |
| 1 | +# DataTalks.Club Website |
| 2 | + |
| 3 | +This repository contains the source code and content for [datatalks.club](https://datatalks.club), a Jekyll-based community website for data science, machine learning, AI, and data engineering practitioners. |
| 4 | + |
| 5 | +## What this repository is |
| 6 | + |
| 7 | +- Static website built with Jekyll |
| 8 | +- Content-first structure: markdown, data files, and reusable templates |
| 9 | +- Main entities are modeled as Jekyll collections (`_posts`, `_podcast`, `_books`, `_people`, etc.) |
| 10 | +- Navigation, events, announcements, and sponsors are managed via YAML files in `_data` |
| 11 | + |
| 12 | +## Main pages on the website |
| 13 | + |
| 14 | +| URL | Source file | What it means | How it works | |
| 15 | +|---|---|---|---| |
| 16 | +| `/` | `index.md` | Main landing page for the community | Uses Liquid loops to aggregate data from multiple sources: upcoming events (`_data/events.yaml`), latest podcast episodes (`_podcast`), latest posts (`_posts`), sponsors (`_data/sponsors.yaml`), and active books (`_books`). | |
| 17 | +| `/articles.html` | `articles.md` | Full article index | Iterates over `site.posts` and links to each article with author references from `_people`. | |
| 18 | +| `/podcast.html` | `podcast.md` | Podcast hub page | Lists all episodes by season from `_podcast`; each episode gets its own detail page via collection permalink rules. | |
| 19 | +| `/books.html` | `books.md` | "Book of the Week" program | Splits books into upcoming vs archive using date filters (`book.end > site.time` and `book.end < site.time`). | |
| 20 | +| `/events.html` | `events.md` | Public events calendar page | Reads `_data/events.yaml` and divides events into upcoming and past based on event timestamp relative to `site.time`. | |
| 21 | +| `/people.html` | `people.md` | Community people directory | Renders all person profiles from `_people`, each with an auto-generated profile URL. | |
| 22 | +| `/slack.html` | `slack.md` | Slack onboarding page | Uses `subscribe.html` include for invite flow and documents key channels and participation guidelines. | |
| 23 | +| `/support.html` | `support.md` | Community support and sponsorship page | Static content page for funding model, sponsor principles, and contact details. | |
| 24 | +| `/tools.html` | `tools.md` | Open-source spotlight page | Iterates through `_tools` collection entries (tool links, demos, maintainers). | |
| 25 | +| `/blog/guide-to-free-online-courses-at-datatalks-club.html` | Post in `_posts` | Primary courses landing page in navigation | The top nav "Courses" item points here; individual Zoomcamp pages live mostly in `_posts` plus legacy `_courses` docs. | |
| 26 | + |
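The date-based split described for `/events.html` and `/books.html` is done in Liquid against `site.time`. As a rough illustration only, the same logic can be sketched in Python. The `time` field name, the sample records, and the sort order are assumptions of this sketch, not taken from the repository.

```python
from datetime import datetime


def split_by_time(records, now, key="time"):
    """Partition records into (upcoming, past) relative to `now`.

    `key` is a hypothetical field name; check `_data/events.yaml`
    (or a book's `end` field) for the real one.
    """
    # Upcoming records, soonest first (ordering is a choice of this sketch).
    upcoming = sorted(
        (r for r in records if r[key] > now),
        key=lambda r: r[key],
    )
    # Records exactly at `now` are treated as past here; most recent first.
    past = sorted(
        (r for r in records if r[key] <= now),
        key=lambda r: r[key],
        reverse=True,
    )
    return upcoming, past


# Made-up sample records, for illustration only.
events = [
    {"title": "Past webinar", "time": datetime(2024, 1, 10, 17, 0)},
    {"title": "Next workshop", "time": datetime(2024, 3, 5, 17, 0)},
    {"title": "Later podcast", "time": datetime(2024, 4, 1, 17, 0)},
]
upcoming, past = split_by_time(events, now=datetime(2024, 2, 1))
print([e["title"] for e in upcoming])
print([e["title"] for e in past])
```

In the real site, `now` corresponds to `site.time`, which is fixed at build time, so a page only moves an event from upcoming to past when the site is rebuilt.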
| 27 | +## Website architecture (at a glance) |
| 28 | + |
| 29 | +| Layer | Folder/files | Responsibility | |
| 30 | +|---|---|---| |
| 31 | +| Content pages | `*.md` in repo root | Entry pages and hubs (`index.md`, `events.md`, `podcast.md`, etc.). | |
| 32 | +| Blog posts | `_posts/*.md` | Long-form articles, course landing pages, and announcements; rendered under `/blog/:title.html`. | |
| 33 | +| Domain collections | `_podcast`, `_books`, `_people`, `_courses`, `_tools`, `_conferences` | Structured content types with dedicated layouts and permalinks. | |
| 34 | +| Data sources | `_data/*.yaml` | Site-wide data for menus, events, sponsors, and header announcements. | |
| 35 | +| Layouts | `_layouts/*.html` | High-level page skeletons (`home`, `page`, `post`, `podcast`, `book`, `author`). | |
| 36 | +| Reusable components | `_includes/*.html` | Shared snippets (header/footer, authors, event cards, subscribe blocks, etc.). | |
| 37 | +| Assets | `images`, `assets` | Static media, styles, and supporting files. | |
| 38 | +| Generated output | `_site` | Local build output generated by Jekyll. | |
| 39 | + |
| 40 | +## How it works |
| 41 | + |
| 42 | +### Content model |
| 43 | + |
| 44 | +| Type | Location | URL shape | Typical usage | |
| 45 | +|---|---|---|---| |
| 46 | +| Posts | `_posts/*.md` | `/blog/:title.html` | Articles, guides, Zoomcamp pages, editorial content. | |
| 47 | +| Podcast episodes | `_podcast/*.md` | `/podcast/:title.html` | Episode pages linked from `/podcast.html` and homepage. | |
| 48 | +| Books | `_books/*.md` | `/books/:title.html` | Book of the Week detail pages and archive entries. | |
| 49 | +| People | `_people/*.md` | `/people/:title.html` | Author/speaker profiles used across posts, episodes, and events. | |
| 50 | +| Courses | `_courses/*.md` | `/courses/:title.html` | Legacy standalone course pages; many newer course pages are posts. | |
| 51 | +| Tools | `_tools/*.md` | `/tools/:title.html` | Open-source tool spotlights. | |
| 52 | +| Conferences | `_conferences/*.md` | `/conferences/:title.html` | Conference-specific pages. | |
| 53 | + |
| 54 | +| Global data file | Purpose | Used by | |
| 55 | +|---|---|---| |
| 56 | +| `_data/navigation.yaml` | Top and bottom navigation links | `header.html`, `footer.html` includes | |
| 57 | +| `_data/events.yaml` | Event records and metadata | `index.md`, `events.md`, event include | |
| 58 | +| `_data/header.yaml` | Optional announcement bar | `header.html` include | |
| 59 | +| `_data/sponsors.yaml` | Sponsor names/logos/links | Homepage sponsors section | |
| 60 | + |
| 61 | +### Templating and layouts |
| 62 | + |
| 63 | +- Shared page layouts live in `_layouts` (`home`, `page`, `post`, `podcast`, `book`, `author`) |
| 64 | +- Reusable fragments live in `_includes` (`header`, `footer`, `authors`, `event`, subscribe forms, etc.) |
| 65 | +- Pages and collection documents combine front matter + markdown/html + Liquid loops/filters |
| 66 | + |
| 67 | +### Routing and permalink rules |
| 68 | + |
| 69 | +- The global permalink rule in `_config.yml` is `/blog/:title.html` for posts. |
| 70 | +- Collections define their own permalinks in `_config.yml` (`/:collection/:title.html`). |
| 71 | +- This means each content type can have both: |
| 72 | + - a hub/list page (e.g. `podcast.md` -> `/podcast.html`) |
| 73 | + - item detail pages (e.g. `_podcast/*.md` -> `/podcast/<slug>.html`) |
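The routing rules above can be sketched as a small Python helper that predicts the URL for a source file. This is a hedged approximation, not Jekyll's implementation: it assumes post filenames use the usual `YYYY-MM-DD-` prefix, that no document overrides its slug or permalink in front matter, and the function name `expected_url` is hypothetical.

```python
import re
from pathlib import PurePosixPath

# Jekyll post filenames conventionally start with a date prefix.
POST_DATE_PREFIX = re.compile(r"^\d{4}-\d{2}-\d{2}-")


def expected_url(source_path):
    """Approximate the URL a source file is served at."""
    path = PurePosixPath(source_path)
    stem = path.stem
    if len(path.parts) == 1:
        # Root-level hub pages: index.md -> "/", podcast.md -> "/podcast.html".
        return "/" if stem == "index" else f"/{stem}.html"
    collection = path.parts[0].lstrip("_")
    if collection == "posts":
        # Posts follow /blog/:title.html, with the date prefix dropped.
        return f"/blog/{POST_DATE_PREFIX.sub('', stem)}.html"
    # Other collections follow /:collection/:title.html.
    return f"/{collection}/{stem}.html"


# The file names below are made up for illustration.
print(expected_url("podcast.md"))
print(expected_url("_podcast/some-episode.md"))
print(expected_url("_posts/2021-03-01-some-article.md"))
```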
| 74 | + |
| 75 | +## Local development |
| 76 | + |
| 77 | +### Prerequisites |
| 78 | + |
| 79 | +- Ruby 2.7.0 |
| 80 | +- Bundler |
| 81 | +- Python environment manager (`uv`) for helper scripts |
| 82 | + |
| 83 | +### Run Jekyll locally |
2 | 84 | |
3 | | -### Running Jekyll locally |
4 | | -Use ruby 2.7.0: |
5 | | - |
6 | | -``` |
| 85 | +```bash |
7 | 86 | rvm use ruby-2.7.0 |
8 | | - |
9 | 87 | gem install bundler |
10 | | -``` |
11 | | - |
12 | | -Running it for the first time: |
13 | | - |
14 | | -``` |
15 | 88 | bundle install |
16 | | -``` |
17 | | - |
18 | | -Running Jekyll: |
19 | | - |
20 | | -``` |
21 | 89 | bundle exec jekyll serve |
22 | 90 | ``` |
23 | 91 | |
24 | | -Open [http://localhost:4000](http://localhost:4000) |
| 92 | +Open [http://localhost:4000](http://localhost:4000). |
| 93 | + |
| 94 | +## Common contributor workflows |
25 | 95 | |
| 96 | +| Task | Edit this | Notes | |
| 97 | +|---|---|---| |
| 98 | +| Publish a new article | `_posts` | Include front matter (`title`, `description`, `authors`, `tags`, `layout`, `date`). | |
| 99 | +| Publish a new podcast episode | `_podcast` | Make sure `season` and `episode` are set for correct grouping on `/podcast.html`. | |
| 100 | +| Add/update event | `_data/events.yaml` | Event type controls styling (`webinar`, `podcast`, `workshop`, `conference`). | |
| 101 | +| Add/update person profile | `_people` | Required for author/speaker linking across pages and includes. | |
| 102 | +| Add/update a book | `_books` | `start`/`end` dates determine upcoming vs archived display. | |
| 103 | +| Update top menu links | `_data/navigation.yaml` | Header links are rendered from `top` entries. | |
| 104 | +| Update homepage blocks | `index.md` | Homepage sections are manually structured and data-driven via Liquid. | |
| 105 | +| Update announcement bar | `_data/header.yaml` | Shown in header only when announcement data exists. | |
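Because author and speaker links depend on `_people` records, a pre-publish check along the following lines may help. It is a minimal sketch: it assumes authors are referenced by the person file's name without extension (as `--author angelicaloduca` suggests), and the function name and sample data are hypothetical.

```python
def missing_people(documents, known_people):
    """Return {document: [unknown author ids]} for unresolved references.

    `documents` maps a document name to its `authors` list;
    `known_people` holds ids derived from file names in `_people`.
    """
    known = set(known_people)
    missing = {}
    for name, authors in documents.items():
        unknown = [author for author in authors if author not in known]
        if unknown:
            missing[name] = unknown
    return missing


# Made-up sample data, for illustration only.
people = ["angelicaloduca", "someauthor"]
docs = {
    "_posts/2021-03-01-some-article.md": ["angelicaloduca"],
    "_podcast/some-episode.md": ["someauthor", "unknownguest"],
}
print(missing_people(docs, people))
```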
26 | 106 | |
27 | | -## Scripts |
| 107 | +## Content and maintenance scripts |
28 | 108 | |
29 | | -Installing the requirements: |
| 109 | +Install script dependencies: |
30 | 110 | |
31 | 111 | ```bash |
32 | 112 | uv sync |
33 | | - |
34 | 113 | cd previews |
35 | 114 | npm install |
36 | 115 | cd .. |
37 | 116 | ``` |
38 | 117 | |
39 | | -Running: |
| 118 | +Run the helper creator script: |
40 | 119 | |
41 | 120 | ```bash |
42 | 121 | uv run python scripts/create.py |
43 | | -``` |
| 122 | +``` |
| 123 | + |
| 124 | +This script helps create/update content entities such as people, books, and events from templates. |
44 | 125 | |
45 | | -### Generating post from docx |
| 126 | +### Script quick reference |
| 127 | + |
| 128 | +| Script/command | Purpose | |
| 129 | +|---|---| |
| 130 | +| `uv run python scripts/create.py` | Interactive helper to create people, books, and events using templates. | |
| 131 | +| `uv run python scripts/pandoc_full.py ...` | Generate post draft content from a DOCX source. | |
| 132 | +| `scripts/generate-book-preview.sh` (called internally) | Creates book preview assets for newly added books. | |
| 133 | + |
| 134 | +### Generate a post from DOCX |
46 | 135 | |
47 | 136 | ```bash |
48 | 137 | uv run python scripts/pandoc_full.py \ |
49 | | - --input ~/Downloads/template.docx \ |
50 | | - --author angelicaloduca \ |
51 | | - --tags "mlops,devops,process" |
52 | | -``` |
| 138 | + --input ~/Downloads/template.docx \ |
| 139 | + --author angelicaloduca \ |
| 140 | + --tags "mlops,devops,process" |
| 141 | +``` |
| 142 | + |
| 143 | +## Where to edit common things |
| 144 | + |
| 145 | +- Add/edit article: `_posts` |
| 146 | +- Add/edit podcast episode: `_podcast` |
| 147 | +- Add/edit person profile: `_people` |
| 148 | +- Add/edit book discussion: `_books` |
| 149 | +- Add/edit event: `_data/events.yaml` |
| 150 | +- Edit top menu/footer links: `_data/navigation.yaml` |
| 151 | +- Edit homepage content blocks: `index.md` |
| 152 | +- Edit global page structure/header/footer: `_layouts` and `_includes` |
| 153 | + |
| 154 | +## Deployment notes |
| 155 | + |
| 156 | +- Site URL is configured in `_config.yml` as `https://datatalks.club` |
| 157 | +- GitHub-specific files (like `.github`) are excluded from Jekyll output |
| 158 | +- Generated site output is in `_site` during local builds |
| 159 | + |
| 160 | +## Important implementation details |
| 161 | + |
| 162 | +- The repository includes many pages written in markdown with embedded HTML and Liquid; this is expected and used heavily for SEO and rich formatting. |
| 163 | +- Author references across posts/podcast/books depend on `_people` records; missing person entries usually cause broken attribution links. |
| 164 | +- Event rendering logic is date-driven (`site.time` comparisons), so event timestamp format consistency in `_data/events.yaml` is important. |
| 165 | +- Navigation is fully data-driven from `_data/navigation.yaml`, which keeps menu edits separate from template code. |
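Since event rendering depends on consistent timestamps, a small check like the one below could flag entries that do not parse. It is only a sketch: the format string `%Y-%m-%d %H:%M:%S` and the `time` field name are assumptions, so adjust both to whatever `_data/events.yaml` actually uses.

```python
from datetime import datetime

# Assumed timestamp format; not confirmed by the repository.
ASSUMED_FORMAT = "%Y-%m-%d %H:%M:%S"


def find_bad_timestamps(events, fmt=ASSUMED_FORMAT, key="time"):
    """Return (index, value) pairs for events whose timestamp fails to parse."""
    bad = []
    for index, event in enumerate(events):
        value = event.get(key)
        try:
            # A missing value becomes the string "None" and fails to parse.
            datetime.strptime(str(value), fmt)
        except ValueError:
            bad.append((index, value))
    return bad


# Made-up sample records, for illustration only.
sample_events = [
    {"title": "Well-formed", "time": "2024-05-01 17:00:00"},
    {"title": "Wrong format", "time": "05/01/2024"},
    {"title": "Missing time"},
]
print(find_bad_timestamps(sample_events))
```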