# Part 3: Producing Your Table 1

```{r}
#| label: setup
#| include: false
source("_common.R")
```

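Estimated time for this part: 40 minutes

In Part 2, you learned how to read data and explore its basic structure. Now you will use that data to produce one of the most important tables in a medical paper: Table 1.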

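::: {.callout-note collapse="true" title="One-click prompt: run the whole chapter at once"}

> I have an R data frame `patient_data` (read from patient_data.csv) with the columns treatment (A/B), age, gender (M/F), and los (length of stay in days). Please use gtsummary to do all of the following in one go:
>
> 1. Build a Table 1 grouped by treatment, showing mean ± SD for continuous variables and n (%) for categorical variables
> 2. Add p-values (`add_p()`) and explain how gtsummary automatically chooses the statistical test
> 3. Rename the variables with Chinese labels (age → 年齡, gender → 性別, los → 住院天數) and show p < 0.05 in bold
> 4. Export the table to a Word file (Table1.docx) with flextable and officer
>
> Give me one complete script that runs from `read.csv()` all the way to the Word export, with comments in Chinese.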

:::

::: {.callout-note}
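## Why re-read the data?
Every chapter in this book reads `patient_data.csv` again from scratch, so you can start practicing from any chapter without running the earlier chapters in order.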
:::

## Task 9: What is Table 1?

📋 **Copy this prompt and paste it to the AI:**

> In medical papers, what is Table 1 usually? What is its purpose, and what does it typically contain?

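### What Table 1 is

In medical papers, Table 1 is usually the baseline characteristics table. It is used to:

1. Describe the basic characteristics of the study participants
2. Compare baseline data between groups
3. Let readers judge whether the groups are balanced
4. Give an overall picture of the study population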
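To make the list above concrete, here is a minimal base R sketch of what a Table 1 computes for each group. It uses a small hypothetical data frame, `toy`, invented only for illustration (it is not `patient_data.csv`), and formats a continuous variable as mean (SD) and a categorical variable as n (%) within each treatment group. gtsummary automates this bookkeeping for every variable at once.

```{r}
#| label: table1-by-hand
#| echo: true

# Hypothetical toy data for illustration only (not patient_data.csv)
toy <- data.frame(
  treatment = c("A", "A", "A", "B", "B", "B"),
  age       = c(50, 60, 70, 40, 50, 60),
  gender    = c("M", "F", "M", "F", "F", "M")
)

# Continuous variable: "mean (SD)" of age within each treatment group
age_cells <- tapply(toy$age, toy$treatment, function(x) {
  sprintf("%.1f (%.1f)", mean(x), sd(x))
})

# Categorical variable: "n (%)" of each gender within each treatment group
counts  <- table(toy$gender, toy$treatment)           # rows = gender, cols = group
percent <- prop.table(counts, margin = 2) * 100       # column percentages
gender_cells <- matrix(
  sprintf("%d (%.0f%%)", as.vector(counts), as.vector(percent)),
  nrow = nrow(counts),
  dimnames = dimnames(counts)
)

age_cells
gender_cells
```

Each cell has the same shape as what `tbl_summary()` produces with `"{mean} ({sd})"` and `"{n} ({p}%)"`, although gtsummary applies its own rounding rules.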

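## Task 10: Your first Table 1

📋 **Copy this prompt and paste it to the AI:**

> I have an R data frame called `patient_data` with these columns:
>
> - treatment: treatment group (A or B)
> - age: age
> - gender: gender (M or F)
> - los: length of stay (days)
>
> Please use the gtsummary package to make a Table 1 grouped by treatment, showing descriptive statistics for the other variables. Give me code I can run directly.

### Building Table 1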

```{r}
#| label: table1-basic
#| echo: true
#| message: false
#| warning: false

library(gtsummary)
library(dplyr)

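# Read the data
patient_data <- read.csv("patient_data.csv")

# Build a basic Table 1
# Tip: %>% is the "pipe"; it passes the result on its left to the function on its right.
# Think of it as a production line: data → select columns → summarize
table1 <- patient_data %>%
  select(treatment, age, gender, los) %>%
  tbl_summary(
    by = treatment,                        # one column per treatment group
    statistic = list(
      all_continuous() ~ "{mean} ({sd})",  # continuous: mean (SD)
      all_categorical() ~ "{n} ({p}%)"     # categorical: count (percent)
    )
  )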

table1
```
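The pipe described in the code comments above just rewrites nested calls so they read left to right. A minimal sketch using base R's native pipe `|>` (available from R 4.1; for simple calls like these it behaves like the `%>%` used in this chapter) shows that both spellings give the same answer:

```{r}
#| label: pipe-demo
#| echo: true

x <- c(1, 4, 9)

# Nested calls read inside-out: take square roots, then add them up
nested <- sum(sqrt(x))

# The pipe reads left to right, like a production line
piped <- x |> sqrt() |> sum()

nested
piped
identical(nested, piped)
```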

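## Task 11: Add statistical tests

📋 **Copy this prompt and paste it to the AI:**

> That Table 1 looks great. Please add p-values so I can see whether the two groups differ significantly.

### Adding p-values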

```{r}
#| label: table1-pvalue
#| echo: true
#| message: false
#| warning: false

# Build a Table 1 with p-values
table1_with_p <- patient_data %>%
  select(treatment, age, gender, los) %>%
  tbl_summary(
    by = treatment,
    statistic = list(
      all_continuous() ~ "{mean} ({sd})",
      all_categorical() ~ "{n} ({p}%)"
    )
  ) %>%
  add_p()  # add a column of p-values comparing the groups

table1_with_p
```

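## Task 12: Understand gtsummary's choices

📋 **Copy this prompt and paste it to the AI:**

> When gtsummary calculates p-values, how does it decide which statistical test to use? For example, when does it use a t-test, when a Wilcoxon test, and when a chi-square test?

### How gtsummary chooses a test

gtsummary automatically picks a test based on the variable type. The defaults when comparing two groups are:

- **Continuous variables**: the Wilcoxon rank-sum test (a nonparametric test); with more than two groups, the Kruskal-Wallis test
- **Categorical variables**: the chi-square test, or Fisher's exact test when any expected cell count is below 5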
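To see what `add_p()` is doing, here is a base R sketch that applies the same decision rules by hand to hypothetical toy data (`toy2` is simulated for illustration only). It assumes the two-group defaults listed above. Whether gtsummary's chi-square test applies a continuity correction can depend on the package version, so treat these p-values as approximate and compare them with the footnote of your own table.

```{r}
#| label: tests-by-hand
#| echo: true
#| warning: false

set.seed(42)
# Hypothetical toy data for illustration only (not patient_data.csv)
toy2 <- data.frame(
  treatment = rep(c("A", "B"), each = 20),
  los       = c(rpois(20, lambda = 5), rpois(20, lambda = 8)),
  gender    = sample(c("M", "F"), size = 40, replace = TRUE)
)

# Continuous variable: Wilcoxon rank-sum test comparing the two groups
p_los <- wilcox.test(los ~ treatment, data = toy2)$p.value

# Categorical variable: look at the expected counts before choosing a test
tab      <- table(toy2$gender, toy2$treatment)
expected <- chisq.test(tab)$expected
p_gender <- if (all(expected >= 5)) {
  chisq.test(tab, correct = FALSE)$p.value  # Pearson chi-square, no correction
} else {
  fisher.test(tab)$p.value                  # Fisher's exact test
}

c(los = p_los, gender = p_gender)
```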

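The footnote under the table names the test actually used for each variable. You can also choose the test for each variable yourself: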

```{r}
#| label: custom-tests
#| echo: true
#| message: false
#| warning: false

# Choose the statistical test for each variable yourself
table1_custom <- patient_data %>%
  select(treatment, age, gender, los) %>%
  tbl_summary(
    by = treatment,
    statistic = list(
      all_continuous() ~ "{mean} ({sd})",
      all_categorical() ~ "{n} ({p}%)"
    )
  ) %>%
  add_p(
    test = list(
      age ~ "t.test",        # t-test
      los ~ "wilcox.test",   # Wilcoxon rank-sum test
      gender ~ "chisq.test"  # chi-square test
    )
  )

table1_custom
```
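The test names in `add_p(test = ...)` essentially correspond to familiar base R functions: `"t.test"` to `stats::t.test()` and `"wilcox.test"` to `stats::wilcox.test()`. As a rough sketch of why the choice matters, the code below runs both tests on the same hypothetical right-skewed variable (simulated with `rexp()` as a stand-in for something like length of stay). The t-test compares means and assumes roughly normal data within each group; the Wilcoxon rank-sum test works on ranks, so the two p-values can differ.

```{r}
#| label: ttest-vs-wilcox
#| echo: true

set.seed(1)
# Hypothetical right-skewed data for illustration only
skewed <- data.frame(
  treatment = rep(c("A", "B"), each = 30),
  los       = c(rexp(30, rate = 1 / 4), rexp(30, rate = 1 / 6))
)

# Welch two-sample t-test (R's default, var.equal = FALSE): compares means
p_t <- t.test(los ~ treatment, data = skewed)$p.value

# Wilcoxon rank-sum test: compares the groups using ranks
p_w <- wilcox.test(los ~ treatment, data = skewed)$p.value

round(c(t_test = p_t, wilcoxon = p_w), 3)
```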

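## Task 13: Customize your table

📋 **Copy this prompt and paste it to the AI:**

> I want to modify my gtsummary table:
>
> 1. Show continuous variables as "mean ± SD" instead of the median
> 2. Rename the variables with Chinese labels (age → 年齡, gender → 性別, los → 住院天數)
> 3. Show p-values below 0.05 in bold

### Customizing the table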

```{r}
#| label: table1-custom
#| echo: true
#| message: false
#| warning: false

# Customized Table 1
table1_final <- patient_data %>%
  select(treatment, age, gender, los) %>%
  tbl_summary(
    by = treatment,
    label = list(
      age ~ "年齡",       # "Age"
      gender ~ "性別",    # "Gender"
      los ~ "住院天數"    # "Length of stay (days)"
    ),
    statistic = list(
      all_continuous() ~ "{mean} ± {sd}",
      all_categorical() ~ "{n} ({p}%)"
    )
  ) %>%
  add_p() %>%
  bold_p(t = 0.05) %>%                    # bold p-values below 0.05
  modify_header(label ~ "**變項**") %>%   # label column header: "Variable"
  # spanning header over both group columns: "Treatment group"
  modify_spanning_header(c("stat_1", "stat_2") ~ "**治療組別**")

table1_final
```
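Without a `statistic` argument, `tbl_summary()` summarizes continuous variables as the median with the 25th and 75th percentiles, which is why the prompt asks for mean ± SD "instead of the median". For a skewed variable such as length of stay, the two formats can look quite different. A base R sketch on a hypothetical vector `los_toy` (invented for illustration) with one very long stay:

```{r}
#| label: mean-vs-median
#| echo: true

# Hypothetical lengths of stay (days) with one long outlier
los_toy <- c(2, 3, 3, 4, 4, 5, 30)

# "mean ± SD" style, as used in this chapter's customized table
mean_sd <- sprintf("%.1f ± %.1f", mean(los_toy), sd(los_toy))

# "median (Q1, Q3)" style, similar to gtsummary's default for continuous variables
q <- quantile(los_toy, c(0.25, 0.5, 0.75))
median_iqr <- sprintf("%.1f (%.1f, %.1f)", q[2], q[1], q[3])

mean_sd
median_iqr
```

Here a single outlier pulls the mean well above the median and inflates the SD, while the median and quartiles barely move. The quartiles use R's default `quantile()` definition; gtsummary may use a different quantile type, so its numbers can differ slightly.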

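## Task 14: Export your table

📋 **Copy this prompt and paste it to the AI:**

> I have made a gtsummary table and stored it in the variable table1_final. I want to use flextable and officer to export it as Word and PowerPoint files so I can paste it into my paper. Please give me the code.

### Exporting the table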

```{r}
#| label: export-table
#| echo: true
#| eval: false

# Install the required packages (if you haven't already)
# install.packages(c("flextable", "officer"))

library(flextable)
library(officer)

# Convert the gtsummary table to a flextable
ft <- table1_final %>% as_flex_table()

# Save as a Word file
doc <- read_docx() %>%
  body_add_flextable(ft)
print(doc, target = "Table1.docx")

# Save as a PowerPoint file
ppt <- read_pptx() %>%
  add_slide(layout = "Title and Content", master = "Office Theme") %>%
  ph_with(ft, location = ph_location_type(type = "body"))
print(ppt, target = "Table1.pptx")

message("Table 1 was exported to Table1.docx and Table1.pptx")
```