Commit 0d26c52

Enhance README.md with comprehensive project overview, content structure, and local development instructions. Added detailed sections on website architecture, content model, and contributor workflows to improve clarity for new contributors.
1 parent c0a1164 commit 0d26c52

1 file changed

Lines changed: 140 additions & 27 deletions

README.md

@@ -1,52 +1,165 @@
1-
## DataTalks.Club Website
1+
# DataTalks.Club Website
2+
3+
This repository contains the source code and content for [datatalks.club](https://datatalks.club), a Jekyll-based community website for data science, machine learning, AI, and data engineering practitioners.
4+
5+
## What this repository is
6+
7+
- Static website built with Jekyll
8+
- Content-first structure: markdown, data files, and reusable templates
9+
- Main entities are modeled as Jekyll collections (`_posts`, `_podcast`, `_books`, `_people`, etc.)
10+
- Navigation, events, announcements, and sponsors are managed via YAML files in `_data`
11+
12+
## Main pages on the website
13+
14+
| URL | Source file | What it means | How it works |
15+
|---|---|---|---|
16+
| `/` | `index.md` | Main landing page for the community | Uses Liquid loops to aggregate data from multiple sources: upcoming events (`_data/events.yaml`), latest podcast episodes (`_podcast`), latest posts (`_posts`), sponsors (`_data/sponsors.yaml`), and active books (`_books`). |
17+
| `/articles.html` | `articles.md` | Full article index | Iterates over `site.posts` and links to each article with author references from `_people`. |
18+
| `/podcast.html` | `podcast.md` | Podcast hub page | Lists all episodes by season from `_podcast`; each episode gets its own detail page via collection permalink rules. |
19+
| `/books.html` | `books.md` | "Book of the Week" program | Splits books into upcoming vs archive using date filters (`book.end > site.time` and `book.end < site.time`). |
20+
| `/events.html` | `events.md` | Public events calendar page | Reads `_data/events.yaml` and divides events into upcoming and past based on event timestamp relative to `site.time`. |
21+
| `/people.html` | `people.md` | Community people directory | Renders all person profiles from `_people`, each with an auto-generated profile URL. |
22+
| `/slack.html` | `slack.md` | Slack onboarding page | Uses `subscribe.html` include for invite flow and documents key channels and participation guidelines. |
23+
| `/support.html` | `support.md` | Community support and sponsorship page | Static content page for funding model, sponsor principles, and contact details. |
24+
| `/tools.html` | `tools.md` | Open-source spotlight page | Iterates through `_tools` collection entries (tool links, demos, maintainers). |
25+
| `/blog/guide-to-free-online-courses-at-datatalks-club.html` | Post in `_posts` | Primary courses landing page in navigation | The top nav "Courses" item points here; individual Zoomcamp pages live mostly in `_posts` plus legacy `_courses` docs. |
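
The date-based split described for `/books.html` and `/events.html` can be sketched outside Liquid. The following is a minimal Python illustration, not the site's actual template code: the `split_by_end` helper and the sample records are hypothetical, and `now` stands in for `site.time` (which Jekyll sets when the site is built, so the split only changes on rebuild).

```python
from datetime import datetime, timezone


def split_by_end(items, now):
    """Partition items into (upcoming, archive) by comparing each 'end' to now.

    Mirrors the two strict comparisons named in the table:
    `book.end > site.time` and `book.end < site.time`.
    """
    upcoming = [item for item in items if item["end"] > now]
    archive = [item for item in items if item["end"] < now]
    return upcoming, archive


# Hypothetical records standing in for `_books` front matter.
books = [
    {"title": "Book A", "end": datetime(2030, 1, 10, tzinfo=timezone.utc)},
    {"title": "Book B", "end": datetime(2020, 1, 10, tzinfo=timezone.utc)},
]

# Plays the role of `site.time`.
now = datetime(2025, 1, 1, tzinfo=timezone.utc)

upcoming, archive = split_by_end(books, now)
print([b["title"] for b in upcoming], [b["title"] for b in archive])
```

Because both comparisons are strict, an item whose `end` equals the build time exactly would land in neither list in this sketch.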
26+
27+
## Website architecture (at a glance)
28+
29+
| Layer | Folder/files | Responsibility |
30+
|---|---|---|
31+
| Content pages | `*.md` in repo root | Entry pages and hubs (`index.md`, `events.md`, `podcast.md`, etc.). |
32+
| Blog posts | `_posts/*.md` | Long-form articles, course landing pages, and announcements; rendered under `/blog/:title.html`. |
33+
| Domain collections | `_podcast`, `_books`, `_people`, `_courses`, `_tools`, `_conferences` | Structured content types with dedicated layouts and permalinks. |
34+
| Data sources | `_data/*.yaml` | Site-wide data for menus, events, sponsors, and header announcements. |
35+
| Layouts | `_layouts/*.html` | High-level page skeletons (`home`, `page`, `post`, `podcast`, `book`, `author`). |
36+
| Reusable components | `_includes/*.html` | Shared snippets (header/footer, authors, event cards, subscribe blocks, etc.). |
37+
| Assets | `images`, `assets` | Static media, styles, and supporting files. |
38+
| Generated output | `_site` | Local build output generated by Jekyll. |
39+
40+
## How it works
41+
42+
### Content model
43+
44+
| Type | Location | URL shape | Typical usage |
45+
|---|---|---|---|
46+
| Posts | `_posts/*.md` | `/blog/:title.html` | Articles, guides, Zoomcamp pages, editorial content. |
47+
| Podcast episodes | `_podcast/*.md` | `/podcast/:title.html` | Episode pages linked from `/podcast.html` and homepage. |
48+
| Books | `_books/*.md` | `/books/:title.html` | Book of the Week detail pages and archive entries. |
49+
| People | `_people/*.md` | `/people/:title.html` | Author/speaker profiles used across posts, episodes, and events. |
50+
| Courses | `_courses/*.md` | `/courses/:title.html` | Legacy standalone course pages; many newer course pages are posts. |
51+
| Tools | `_tools/*.md` | `/tools/:title.html` | Open-source tool spotlights. |
52+
| Conferences | `_conferences/*.md` | `/conferences/:title.html` | Conference-specific pages. |
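
Since people profiles are referenced from posts, episodes, and events, it can help to check that every referenced person has a profile. The sketch below is one possible check, under two assumptions that the README does not spell out: a person's id is the filename stem of `_people/<id>.md`, and documents list those ids in their `authors` field. The function names and sample data are hypothetical.

```python
from pathlib import Path


def person_ids(people_dir):
    """Collect ids from profile filenames (assumes `_people/<id>.md`)."""
    return {path.stem for path in Path(people_dir).glob("*.md")}


def missing_authors(authors_by_doc, known_ids):
    """Map each document to the referenced ids that have no profile."""
    report = {}
    for doc, authors in authors_by_doc.items():
        unknown = sorted(set(authors) - known_ids)
        if unknown:
            report[doc] = unknown
    return report


# Hypothetical in-memory data; in a checkout, `known` could instead
# come from person_ids("_people").
known = {"angelicaloduca"}
references = {
    "_posts/first.md": ["angelicaloduca"],
    "_posts/second.md": ["someoneelse"],
}

result = missing_authors(references, known)
print(result)
```

Parsing the `authors` field out of front matter is left out on purpose, since its exact format is not shown here.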
53+
54+
| Global data file | Purpose | Used by |
55+
|---|---|---|
56+
| `_data/navigation.yaml` | Top and bottom navigation links | `header.html`, `footer.html` includes |
57+
| `_data/events.yaml` | Event records and metadata | `index.md`, `events.md`, event include |
58+
| `_data/header.yaml` | Optional announcement bar | `header.html` include |
59+
| `_data/sponsors.yaml` | Sponsor names/logos/links | Homepage sponsors section |
60+
61+
### Templating and layouts
62+
63+
- Shared page layouts live in `_layouts` (`home`, `page`, `post`, `podcast`, `book`, `author`)
64+
- Reusable fragments live in `_includes` (`header`, `footer`, `authors`, `event`, subscribe forms, etc.)
65+
- Pages and collection documents combine front matter + markdown/html + Liquid loops/filters
66+
67+
### Routing and permalink rules
68+
69+
- The global `permalink` setting in `_config.yml` is `/blog/:title.html`; it determines post URLs.
70+
- Collections define their own permalinks in `_config.yml` (`/:collection/:title.html`).
71+
- This means each content type can have both:
72+
  - a hub/list page (e.g. `podcast.md` -> `/podcast.html`)
73+
  - item detail pages (e.g. `_podcast/*.md` -> `/podcast/<slug>.html`)
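
The rules above can be summarized as a small path-to-URL mapping. This is an illustrative Python sketch, not Jekyll's implementation: `expected_url` is a hypothetical helper, it assumes post filenames follow Jekyll's `YYYY-MM-DD-title.md` convention, and it ignores front matter overrides such as a custom `permalink` or `slug`.

```python
import re
from pathlib import PurePosixPath

# Jekyll post filenames start with a date, which is not part of `:title`.
POST_DATE_PREFIX = re.compile(r"^\d{4}-\d{2}-\d{2}-")


def expected_url(source_path):
    """Guess the published URL for a source file from the documented rules."""
    path = PurePosixPath(source_path)
    folder = path.parts[0]
    slug = path.stem

    if folder == "_posts":
        # Posts: /blog/:title.html
        return f"/blog/{POST_DATE_PREFIX.sub('', slug)}.html"
    if folder.startswith("_"):
        # Collections: /:collection/:title.html
        return f"/{folder[1:]}/{slug}.html"
    if slug == "index":
        # The landing page is served at the site root.
        return "/"
    # Root-level hub pages, e.g. podcast.md -> /podcast.html
    return f"/{slug}.html"


# The file names below are made up for illustration.
for source in [
    "_posts/2024-01-15-my-article.md",
    "_podcast/example-episode.md",
    "podcast.md",
    "index.md",
]:
    print(source, "->", expected_url(source))
```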
74+
75+
## Local development
76+
77+
### Prerequisites
78+
79+
- Ruby 2.7.0
80+
- Bundler
81+
- `uv` (Python package and environment manager) for the helper scripts
82+
83+
### Run Jekyll locally
284

3-
### Running Jekyll locally
4-
Use ruby 2.7.0:
5-
6-
```
85+
```bash
786
rvm use ruby-2.7.0
8-
987
gem install bundler
10-
```
11-
12-
Running it for the first time:
13-
14-
```
1588
bundle install
16-
```
17-
18-
Running Jekyll:
19-
20-
```
2189
bundle exec jekyll serve
2290
```
2391

24-
Open [http://localhost:4000](http://localhost:4000)
92+
Open [http://localhost:4000](http://localhost:4000).
93+
94+
## Common contributor workflows
2595

96+
| Task | Edit this | Notes |
97+
|---|---|---|
98+
| Publish a new article | `_posts` | Include front matter (`title`, `description`, `authors`, `tags`, `layout`, `date`). |
99+
| Publish a new podcast episode | `_podcast` | Make sure `season` and `episode` are set for correct grouping on `/podcast.html`. |
100+
| Add/update event | `_data/events.yaml` | Event type controls styling (`webinar`, `podcast`, `workshop`, `conference`). |
101+
| Add/update person profile | `_people` | Required for author/speaker linking across pages and includes. |
102+
| Add/update a book | `_books` | `start`/`end` dates determine upcoming vs archived display. |
103+
| Update top menu links | `_data/navigation.yaml` | Header links are rendered from `top` entries. |
104+
| Update homepage blocks | `index.md` | Homepage sections are manually structured and data-driven via Liquid. |
105+
| Update announcement bar | `_data/header.yaml` | Shown in header only when announcement data exists. |
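
For the "Publish a new article" row, a post skeleton with the listed front matter fields might look like the output of the sketch below. The field formats are assumptions (for example, `authors` and `tags` written as inline YAML lists, and `layout: post`); the `new_post` helper and the sample values are hypothetical, so compare against an existing file in `_posts` before relying on it.

```python
import re
from datetime import date


def new_post(title, description, authors, tags, published):
    """Return (path, text) for a post skeleton; field formats are assumptions."""
    # Build a filename slug from the title: lowercase, hyphen-separated.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    path = f"_posts/{published.isoformat()}-{slug}.md"
    lines = [
        "---",
        "layout: post",
        f'title: "{title}"',
        f'description: "{description}"',
        f"authors: [{', '.join(authors)}]",
        f"tags: [{', '.join(tags)}]",
        f"date: {published.isoformat()}",
        "---",
        "",
        "Write the article body here.",
        "",
    ]
    return path, "\n".join(lines)


path, text = new_post(
    title="Getting Started with MLOps",
    description="A placeholder description.",
    authors=["angelicaloduca"],
    tags=["mlops", "devops", "process"],
    published=date(2025, 1, 15),
)
print(path)
print(text)
```

The sketch does not escape double quotes inside `title` or `description`, so values containing them would need manual quoting.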
26106

27-
## Scripts
107+
## Content and maintenance scripts
28108

29-
Installing the requirements:
109+
Install script dependencies:
30110

31111
```bash
32112
uv sync
33-
34113
cd previews
35114
npm install
36115
cd ..
37116
```
38117

39-
Running:
118+
Run the content creation helper script:
40119

41120
```bash
42121
uv run python scripts/create.py
43-
```
122+
```
123+
124+
This script helps create/update content entities such as people, books, and events from templates.
44125

45-
### Generating post from docx
126+
### Script quick reference
127+
128+
| Script/command | Purpose |
129+
|---|---|
130+
| `uv run python scripts/create.py` | Interactive helper to create people, books, and events using templates. |
131+
| `uv run python scripts/pandoc_full.py ...` | Generate post draft content from a DOCX source. |
132+
| `scripts/generate-book-preview.sh` (called internally) | Creates book preview assets for newly added books. |
133+
134+
### Generate a post from DOCX
46135

47136
```bash
48137
uv run python scripts/pandoc_full.py \
49-
--input ~/Downloads/template.docx \
50-
--author angelicaloduca \
51-
--tags "mlops,devops,process"
52-
```
138+
--input ~/Downloads/template.docx \
139+
--author angelicaloduca \
140+
--tags "mlops,devops,process"
141+
```
142+
143+
## Where to edit common things
144+
145+
- Add/edit article: `_posts`
146+
- Add/edit podcast episode: `_podcast`
147+
- Add/edit person profile: `_people`
148+
- Add/edit book discussion: `_books`
149+
- Add/edit event: `_data/events.yaml`
150+
- Edit top menu/footer links: `_data/navigation.yaml`
151+
- Edit homepage content blocks: `index.md`
152+
- Edit global page structure/header/footer: `_layouts` and `_includes`
153+
154+
## Deployment notes
155+
156+
- Site URL is configured in `_config.yml` as `https://datatalks.club`
157+
- GitHub-specific files (like `.github`) are excluded from Jekyll output
158+
- Generated site output is in `_site` during local builds
159+
160+
## Important implementation details
161+
162+
- The repository includes many pages written in markdown with embedded HTML and Liquid; this is expected and used heavily for SEO and rich formatting.
163+
- Author references across posts/podcast/books depend on `_people` records; missing person entries usually cause broken attribution links.
164+
- Event rendering logic is date-driven (`site.time` comparisons), so event timestamp format consistency in `_data/events.yaml` is important.
165+
- Navigation is fully data-driven from `_data/navigation.yaml`, which keeps menu edits separate from template code.
