Commit 7284a92

up 2026.3.31
1 parent 22bee55 commit 7284a92

62 files changed

Lines changed: 67392 additions & 1360 deletions

Lecture/01_homework.md

Lines changed: 1 addition & 1 deletion
@@ -10,4 +10,4 @@
 
 ## 作业展示
 
-- [2026 小组作业展示](../homework/HW-pre-2026.md)
+- [2026 小组作业展示](https://github.com/lianxhcn/dsfin/blob/main/homework/HW-pre-2026.md)

Lecture/data_clean/lecture_data_clean.ipynb

Lines changed: 2 additions & 2 deletions
@@ -884,7 +884,7 @@
 "\n",
 "即使一份数据没有任何格式错误,它仍然可能因为「哪些观测进入了样本、哪些没有」这一机制,导致估计结果系统性偏误。这就是**样本选择偏差**(sample selection bias)。\n",
 "\n",
-"#### 一个直观的例子\n",
+"### 一个直观的例子\n",
 "\n",
 "我们的研究问题是:**上市公司的贷款利率(Rate)受哪些因素影响?**\n",
 "\n",
@@ -963,7 +963,7 @@
 "\n",
 "> **已知组的拟合线高估了规模对利率的负向效应。** 真实斜率(包含缺失组后)应更平缓。\n",
 "\n",
-"![](fig/fig_sample_selection.png){width=\"90%\"}\n",
+"![](fig/fig_data_clean_simulation_sample_selection.png){width=\"100%\"}\n",
 "\n",
 "图(a)同样值得关注:缺失组的杠杆率(Leverage)均值略高于已知组(0.71 vs 0.66),方向符合预期,但因样本量极小(n = 4),未达统计显著。在真实数据中,这类方向性证据仍需认真对待。\n",
 "\n",
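These `lecture_data_clean.ipynb` hunks touch the notebook's discussion of sample selection bias. The point is that a dataset with no formatting errors can still give systematically biased estimates because of which observations enter the sample. In the notebook's loan-rate example, the line fitted on firms with a known Rate overstates the negative effect of firm size.

A minimal NumPy simulation of that idea follows. Every modelling choice is a hypothetical assumption, not taken from the notebook:

- `size` and `shock` are standard normal.
- The true slope is -0.5.
- Rate is observed only when `size + shock < 1.0`.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical data-generating process (assumed, not from the notebook)
true_slope = -0.5
size = rng.normal(0.0, 1.0, n)   # standardized firm size
shock = rng.normal(0.0, 1.0, n)  # unobserved rate shock
rate = 5.0 + true_slope * size + shock

# Hypothetical selection rule: Rate is reported only when size + shock < 1.0,
# so large firms that also carry unusually high rates are the ones that go missing.
observed = (size + shock) < 1.0


def ols_slope(x, y):
    """Slope of a simple OLS fit of y on x (with intercept)."""
    slope, _intercept = np.polyfit(x, y, deg=1)
    return slope


slope_full = ols_slope(size, rate)                       # all firms
slope_known = ols_slope(size[observed], rate[observed])  # known group only

print(f"share observed   : {observed.mean():.2f}")
print(f"full-sample slope: {slope_full:.3f}")
print(f"known-group slope: {slope_known:.3f}")
```

Under this particular rule the known-group slope comes out steeper than the true -0.5, which is the direction the notebook describes. A different mechanism gives a different result. For example, dropping firms purely because their rate is high tends to flatten the slope instead. The direction of the bias therefore depends on how observations go missing.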

Lecture/data_get_data/data/GMD.csv

Lines changed: 56851 additions & 0 deletions
Large diffs are not rendered by default.

Lecture/data_get_data/data_02_get_data_GMD.ipynb

Lines changed: 948 additions & 0 deletions
Large diffs are not rendered by default.

Lecture/data_get_data/data_03_TS_SZ_index.ipynb

Lines changed: 1159 additions & 0 deletions
Large diffs are not rendered by default.

Lecture/data_get_data/lecture_get_data.ipynb

Lines changed: 621 additions & 21 deletions
Large diffs are not rendered by default.

Lecture/data_store_structure/lecture_data_management_organization.ipynb

Lines changed: 13 additions & 13 deletions
@@ -141,7 +141,7 @@
 "\n",
 "数据管理中最核心的三个概念是:**数据表**、**主键** 和 **粒度**。\n",
 "\n",
-"### 2.1 数据表\n",
+"### 数据表",
 "\n",
 "一张数据表可以理解为:对某一类观测单位在某一层级上的系统记录。例如:\n",
 "\n",
@@ -150,7 +150,7 @@
 "- 日度收益率表:每行是一个公司-日;\n",
 "- 公告元数据表:每行是一篇公告。\n",
 "\n",
-"### 2.2 主键\n",
+"### 主键",
 "\n",
 "主键可以理解为:**唯一标识一行数据的变量或变量组合**。例如:\n",
 "\n",
@@ -164,7 +164,7 @@
 "K_i \\neq K_j \\quad \\text{for all } i \\neq j\n",
 "$$\n",
 "\n",
-"### 2.3 粒度\n",
+"### 粒度",
 "\n",
 "粒度可以理解为:**数据记录的细致程度**。在金融数据中,最常见的粒度包括:\n",
 "\n",
@@ -215,7 +215,7 @@
 "metadata": {},
 "source": [
 "\n",
-"### 2.4 一个典型错误:忽略粒度直接合并\n",
+"### 一个典型错误:忽略粒度直接合并",
 "\n",
 "设有两张表:\n",
 "\n",
@@ -367,7 +367,7 @@
 "metadata": {},
 "source": [
 "\n",
-"### 3.1 文件命名规则\n",
+"### 文件命名规则",
 "\n",
 "除了目录分层,文件命名也应尽量规范。比较稳妥的做法,是让文件名包含三类信息:\n",
 "\n",
@@ -384,7 +384,7 @@
 "\n",
 "这样的命名方式比 `new_final_use_this.csv` 稳妥得多。文件名本身就能提供足够的上下文信息,便于快速判断其内容和用途。\n",
 "\n",
-"### 3.2 数据字典与 README\n",
+"### 数据字典与 README",
 "\n",
 "当数据表数量较多时,仅靠文件名仍然不够。建议为关键数据表维护一个简单的数据字典,至少记录:\n",
 "\n",
@@ -409,7 +409,7 @@
 "\n",
 "在分析任务中,最常见的几类存储方式包括:`CSV / Excel`、`Parquet`、`SQLite` 和 `DuckDB`。它们的区别不在于「谁更高级」,而在于适合的任务不同。\n",
 "\n",
-"### 4.1 CSV / Excel\n",
+"### CSV / Excel",
 "\n",
 "优点是直观、易于交换、便于人工查看。适合:\n",
 "\n",
@@ -424,7 +424,7 @@
 "- 大文件读写效率较低\n",
 "- 容易产生多个版本副本\n",
 "\n",
-"### 4.2 Parquet\n",
+"### Parquet",
 "\n",
 "`Parquet` 是适合分析场景的列式存储格式。可以把它理解为:比 `CSV` 更现代、更高效的一种表格文件形式。它通常具有以下优点:\n",
 "\n",
@@ -433,7 +433,7 @@
 "- 更好地保留变量类型\n",
 "- 适合大表和重复读取\n",
 "\n",
-"### 4.3 SQLite\n",
+"### SQLite",
 "\n",
 "`SQLite` 是一种轻量级本地关系数据库。它与 `CSV` 的最大区别在于:它不是单纯的数据文件,而是一个可以使用 SQL 进行查询和连接的小型数据库系统。它尤其适合:\n",
 "\n",
@@ -442,7 +442,7 @@
 "- 项目主要在本地单机上运行\n",
 "- 不需要复杂的多人并发访问\n",
 "\n",
-"### 4.4 DuckDB\n",
+"### DuckDB",
 "\n",
 "如果说 `SQLite` 更偏向轻量级关系数据库,那么 `DuckDB` 更适合分析型任务。它很适合处理多个本地大文件、多次聚合和连接,以及分析型 SQL 查询。\n",
 "\n",
@@ -527,7 +527,7 @@
 "\n",
 "这并不是一个非此即彼的问题。更现实的做法,是把它看成一个连续谱。\n",
 "\n",
-"### 6.1 继续使用 `pandas` 即可的情况\n",
+"### 继续使用 `pandas` 即可的情况",
 "\n",
 "以下情况下,继续使用 `pandas` 往往已经足够:\n",
 "\n",
@@ -537,7 +537,7 @@
 "- 项目是一次性的;\n",
 "- 最终只需生成一张分析样本表。\n",
 "\n",
-"### 6.2 可以考虑 SQLite 或 DuckDB 的情况\n",
+"### 可以考虑 SQLite 或 DuckDB 的情况",
 "\n",
 "出现下面这些信号时,可以开始考虑数据库或分析型 SQL 工具:\n",
 "\n",
@@ -636,4 +636,4 @@
 },
 "nbformat": 4,
 "nbformat_minor": 5
-}
+}
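The `lecture_data_management_organization.ipynb` hunks renumber headings in two relevant passages:

- The primary key (主键) is defined as the variable or combination of variables that uniquely identifies a row, formalized as K_i != K_j for all i != j.
- A section warns against merging tables without regard to their granularity (一个典型错误:忽略粒度直接合并).

A minimal pandas sketch of both checks follows. The tables `firm_year` and `firm_day`, their values, and the helper `is_primary_key` are hypothetical illustrations, not code from the notebook.

```python
import pandas as pd

# Hypothetical firm-year table: intended key is (firm, year)
firm_year = pd.DataFrame({
    "firm": ["A", "A", "B", "B"],
    "year": [2023, 2024, 2023, 2024],
    "leverage": [0.60, 0.62, 0.70, 0.71],
})

# Hypothetical firm-day return table: intended key is (firm, date)
firm_day = pd.DataFrame({
    "firm": ["A", "A", "A", "B", "B"],
    "date": pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04",
                            "2024-01-02", "2024-01-03"]),
    "ret": [0.010, -0.004, 0.002, 0.006, -0.001],
})


def is_primary_key(df, cols):
    """True if no two rows share the same values on `cols` (K_i != K_j for i != j)."""
    return not df.duplicated(subset=cols).any()


print(is_primary_key(firm_year, ["firm", "year"]))  # True
print(is_primary_key(firm_year, ["firm"]))          # False

# Typical error: merging on firm alone ignores the year dimension, so every
# daily row is paired with every year of the same firm and rows multiply.
naive = firm_day.merge(firm_year, on="firm", how="left")
print(len(firm_day), "->", len(naive))  # 5 -> 10

# Safer: derive the year from each date so the daily table carries the full
# firm-year key, then merge on that key and let pandas verify that each daily
# row matches at most one firm-year row.
firm_day["year"] = firm_day["date"].dt.year
safe = firm_day.merge(firm_year, on=["firm", "year"], how="left", validate="m:1")
print(len(firm_day), "->", len(safe))  # 5 -> 5

# The same validation rejects the naive firm-only key outright.
try:
    firm_day.merge(firm_year, on="firm", how="left", validate="m:1")
    rejected = False
except pd.errors.MergeError:
    rejected = True
print(rejected)  # True
```

With `validate="m:1"`, pandas raises `MergeError` when the right-hand merge keys are not unique. This turns a silent row explosion into an explicit error at merge time.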

_quarto.yml

Lines changed: 3 additions & 0 deletions
@@ -51,6 +51,9 @@ book:
 - part: "**数据分析**"
 chapters:
 - Lecture/data_get_data/lecture_get_data.ipynb
+- Lecture/data_get_data/data_02_get_data_GMD.ipynb
+- Lecture/data_get_data/data_03_TS_SZ_index.ipynb
+- Lecture/data_store_structure/lecture_data_management_organization.ipynb
 - Lecture/data_clean/lecture_data_clean.ipynb
 
 - part: "**数据爬取**"

docs/Lecture/00-setup/01_00_coding_with_AI.html

Lines changed: 26 additions & 8 deletions
@@ -191,11 +191,29 @@
 <a href="../../Lecture/data_get_data/lecture_get_data.html" class="sidebar-item-text sidebar-link">
 <span class="menu-text"><span class="chapter-number">6</span>&nbsp; <span class="chapter-title">金融数据获取</span></span></a>
 </div>
+</li>
+<li class="sidebar-item">
+<div class="sidebar-item-container">
+<a href="../../Lecture/data_get_data/data_02_get_data_GMD.html" class="sidebar-item-text sidebar-link">
+<span class="menu-text"><span class="chapter-number">7</span>&nbsp; <span class="chapter-title">获取数据:GMD</span></span></a>
+</div>
+</li>
+<li class="sidebar-item">
+<div class="sidebar-item-container">
+<a href="../../Lecture/data_get_data/data_03_TS_SZ_index.html" class="sidebar-item-text sidebar-link">
+<span class="menu-text"><span class="chapter-number">8</span>&nbsp; <span class="chapter-title">上证指数的时序特征</span></span></a>
+</div>
+</li>
+<li class="sidebar-item">
+<div class="sidebar-item-container">
+<a href="../../Lecture/data_store_structure/lecture_data_management_organization.html" class="sidebar-item-text sidebar-link">
+<span class="menu-text"><span class="chapter-number">9</span>&nbsp; <span class="chapter-title">数据管理与组织</span></span></a>
+</div>
 </li>
 <li class="sidebar-item">
 <div class="sidebar-item-container">
 <a href="../../Lecture/data_clean/lecture_data_clean.html" class="sidebar-item-text sidebar-link">
-<span class="menu-text"><span class="chapter-number">7</span>&nbsp; <span class="chapter-title">数据清洗</span></span></a>
+<span class="menu-text"><span class="chapter-number">10</span>&nbsp; <span class="chapter-title">数据清洗</span></span></a>
 </div>
 </li>
 </ul>
@@ -212,19 +230,19 @@
 <li class="sidebar-item">
 <div class="sidebar-item-container">
 <a href="../../Lecture/crawing/craw_01_introduction.html" class="sidebar-item-text sidebar-link">
-<span class="menu-text"><span class="chapter-number">8</span>&nbsp; <span class="chapter-title">网络爬虫简介</span></span></a>
+<span class="menu-text"><span class="chapter-number">11</span>&nbsp; <span class="chapter-title">网络爬虫简介</span></span></a>
 </div>
 </li>
 <li class="sidebar-item">
 <div class="sidebar-item-container">
 <a href="../../Lecture/crawing/craw_02_sec01_lingnan_prompt.html" class="sidebar-item-text sidebar-link">
-<span class="menu-text"><span class="chapter-number">9</span>&nbsp; <span class="chapter-title">实例-提示词模式:爬取岭南学院教师名录</span></span></a>
+<span class="menu-text"><span class="chapter-number">12</span>&nbsp; <span class="chapter-title">实例-提示词模式:爬取岭南学院教师名录</span></span></a>
 </div>
 </li>
 <li class="sidebar-item">
 <div class="sidebar-item-container">
 <a href="../../Lecture/crawing/craw_02_sec02_lingnan_dialog.html" class="sidebar-item-text sidebar-link">
-<span class="menu-text"><span class="chapter-number">10</span>&nbsp; <span class="chapter-title">实例-对话模式:爬取岭南学院教师名录</span></span></a>
+<span class="menu-text"><span class="chapter-number">13</span>&nbsp; <span class="chapter-title">实例-对话模式:爬取岭南学院教师名录</span></span></a>
 </div>
 </li>
 </ul>
@@ -241,25 +259,25 @@
 <li class="sidebar-item">
 <div class="sidebar-item-container">
 <a href="../../Lecture/text_analysis/lecture_text_01_ailn.html" class="sidebar-item-text sidebar-link">
-<span class="menu-text"><span class="chapter-number">11</span>&nbsp; <span class="chapter-title">金融文本分析(上)——从文本到结构化数据</span></span></a>
+<span class="menu-text"><span class="chapter-number">14</span>&nbsp; <span class="chapter-title">金融文本分析(上)——从文本到结构化数据</span></span></a>
 </div>
 </li>
 <li class="sidebar-item">
 <div class="sidebar-item-container">
 <a href="../../Lecture/text_analysis/lecture_text_02_ailn.html" class="sidebar-item-text sidebar-link">
-<span class="menu-text"><span class="chapter-number">12</span>&nbsp; <span class="chapter-title">金融文本分析(下)——情感分析与文本建模</span></span></a>
+<span class="menu-text"><span class="chapter-number">15</span>&nbsp; <span class="chapter-title">金融文本分析(下)——情感分析与文本建模</span></span></a>
 </div>
 </li>
 <li class="sidebar-item">
 <div class="sidebar-item-container">
 <a href="../../Lecture/text_analysis/lecture_text_analysis_01_claude.html" class="sidebar-item-text sidebar-link">
-<span class="menu-text"><span class="chapter-number">13</span>&nbsp; <span class="chapter-title">文本数据处理基础</span></span></a>
+<span class="menu-text"><span class="chapter-number">16</span>&nbsp; <span class="chapter-title">文本数据处理基础</span></span></a>
 </div>
 </li>
 <li class="sidebar-item">
 <div class="sidebar-item-container">
 <a href="../../Lecture/text_analysis/lecture_text_analysis_02_claude.html" class="sidebar-item-text sidebar-link">
-<span class="menu-text"><span class="chapter-number">14</span>&nbsp; <span class="chapter-title">文本分析:情感分析与主题建模</span></span></a>
+<span class="menu-text"><span class="chapter-number">17</span>&nbsp; <span class="chapter-title">文本分析:情感分析与主题建模</span></span></a>
 </div>
 </li>
 </ul>

docs/Lecture/00-setup/01_01_install_anaconda.html

Lines changed: 26 additions & 8 deletions
@@ -225,11 +225,29 @@
 <a href="../../Lecture/data_get_data/lecture_get_data.html" class="sidebar-item-text sidebar-link">
 <span class="menu-text"><span class="chapter-number">6</span>&nbsp; <span class="chapter-title">金融数据获取</span></span></a>
 </div>
+</li>
+<li class="sidebar-item">
+<div class="sidebar-item-container">
+<a href="../../Lecture/data_get_data/data_02_get_data_GMD.html" class="sidebar-item-text sidebar-link">
+<span class="menu-text"><span class="chapter-number">7</span>&nbsp; <span class="chapter-title">获取数据:GMD</span></span></a>
+</div>
+</li>
+<li class="sidebar-item">
+<div class="sidebar-item-container">
+<a href="../../Lecture/data_get_data/data_03_TS_SZ_index.html" class="sidebar-item-text sidebar-link">
+<span class="menu-text"><span class="chapter-number">8</span>&nbsp; <span class="chapter-title">上证指数的时序特征</span></span></a>
+</div>
+</li>
+<li class="sidebar-item">
+<div class="sidebar-item-container">
+<a href="../../Lecture/data_store_structure/lecture_data_management_organization.html" class="sidebar-item-text sidebar-link">
+<span class="menu-text"><span class="chapter-number">9</span>&nbsp; <span class="chapter-title">数据管理与组织</span></span></a>
+</div>
 </li>
 <li class="sidebar-item">
 <div class="sidebar-item-container">
 <a href="../../Lecture/data_clean/lecture_data_clean.html" class="sidebar-item-text sidebar-link">
-<span class="menu-text"><span class="chapter-number">7</span>&nbsp; <span class="chapter-title">数据清洗</span></span></a>
+<span class="menu-text"><span class="chapter-number">10</span>&nbsp; <span class="chapter-title">数据清洗</span></span></a>
 </div>
 </li>
 </ul>
@@ -246,19 +264,19 @@
 <li class="sidebar-item">
 <div class="sidebar-item-container">
 <a href="../../Lecture/crawing/craw_01_introduction.html" class="sidebar-item-text sidebar-link">
-<span class="menu-text"><span class="chapter-number">8</span>&nbsp; <span class="chapter-title">网络爬虫简介</span></span></a>
+<span class="menu-text"><span class="chapter-number">11</span>&nbsp; <span class="chapter-title">网络爬虫简介</span></span></a>
 </div>
 </li>
 <li class="sidebar-item">
 <div class="sidebar-item-container">
 <a href="../../Lecture/crawing/craw_02_sec01_lingnan_prompt.html" class="sidebar-item-text sidebar-link">
-<span class="menu-text"><span class="chapter-number">9</span>&nbsp; <span class="chapter-title">实例-提示词模式:爬取岭南学院教师名录</span></span></a>
+<span class="menu-text"><span class="chapter-number">12</span>&nbsp; <span class="chapter-title">实例-提示词模式:爬取岭南学院教师名录</span></span></a>
 </div>
 </li>
 <li class="sidebar-item">
 <div class="sidebar-item-container">
 <a href="../../Lecture/crawing/craw_02_sec02_lingnan_dialog.html" class="sidebar-item-text sidebar-link">
-<span class="menu-text"><span class="chapter-number">10</span>&nbsp; <span class="chapter-title">实例-对话模式:爬取岭南学院教师名录</span></span></a>
+<span class="menu-text"><span class="chapter-number">13</span>&nbsp; <span class="chapter-title">实例-对话模式:爬取岭南学院教师名录</span></span></a>
 </div>
 </li>
 </ul>
@@ -275,25 +293,25 @@
 <li class="sidebar-item">
 <div class="sidebar-item-container">
 <a href="../../Lecture/text_analysis/lecture_text_01_ailn.html" class="sidebar-item-text sidebar-link">
-<span class="menu-text"><span class="chapter-number">11</span>&nbsp; <span class="chapter-title">金融文本分析(上)——从文本到结构化数据</span></span></a>
+<span class="menu-text"><span class="chapter-number">14</span>&nbsp; <span class="chapter-title">金融文本分析(上)——从文本到结构化数据</span></span></a>
 </div>
 </li>
 <li class="sidebar-item">
 <div class="sidebar-item-container">
 <a href="../../Lecture/text_analysis/lecture_text_02_ailn.html" class="sidebar-item-text sidebar-link">
-<span class="menu-text"><span class="chapter-number">12</span>&nbsp; <span class="chapter-title">金融文本分析(下)——情感分析与文本建模</span></span></a>
+<span class="menu-text"><span class="chapter-number">15</span>&nbsp; <span class="chapter-title">金融文本分析(下)——情感分析与文本建模</span></span></a>
 </div>
 </li>
 <li class="sidebar-item">
 <div class="sidebar-item-container">
 <a href="../../Lecture/text_analysis/lecture_text_analysis_01_claude.html" class="sidebar-item-text sidebar-link">
-<span class="menu-text"><span class="chapter-number">13</span>&nbsp; <span class="chapter-title">文本数据处理基础</span></span></a>
+<span class="menu-text"><span class="chapter-number">16</span>&nbsp; <span class="chapter-title">文本数据处理基础</span></span></a>
 </div>
 </li>
 <li class="sidebar-item">
 <div class="sidebar-item-container">
 <a href="../../Lecture/text_analysis/lecture_text_analysis_02_claude.html" class="sidebar-item-text sidebar-link">
-<span class="menu-text"><span class="chapter-number">14</span>&nbsp; <span class="chapter-title">文本分析:情感分析与主题建模</span></span></a>
+<span class="menu-text"><span class="chapter-number">17</span>&nbsp; <span class="chapter-title">文本分析:情感分析与主题建模</span></span></a>
 </div>
 </li>
 </ul>
