Improve white-space handling in browser clipping extension

Summary: when converting HTML to Markdown, whitespace characters need to be handled according to each element's white-space property

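### Background

Whitespace should be parsed the way the web page actually renders it.

This covers two scenarios:

- Web clipping → handled by the browser extension ~and Lute~
- Copying content from a web page and pasting it into SiYuan → handled by the ~Lute~ frontend

Specification: https://developer.mozilla.org/zh-CN/docs/Web/CSS/white-space

The white-space property controls how whitespace is handled. The following kinds of whitespace need to be handled according to the element's actual white-space value: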

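  | New lines | Spaces and tabs | Text wrapping | End-of-line spaces | End-of-line other space separators
-- | -- | -- | -- | -- | --
normal | Collapse | Collapse | Wrap | Remove | Hang
nowrap | Collapse | Collapse | No wrap | Remove | Hang
pre | Preserve | Preserve | No wrap | Preserve | No wrap
pre-wrap | Preserve | Preserve | Wrap | Hang | Hang
pre-line | Preserve | Collapse | Wrap | Remove | Hang
break-spaces | Preserve | Preserve | Wrap | Wrap | Wrap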
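
For HTML → Markdown conversion, the first two columns (new lines, spaces and tabs) probably matter most, since the wrapping and end-of-line columns describe layout rather than text content. A minimal sketch of how those two columns could be applied to the text of a single text node is shown here. `WHITE_SPACE_RULES` and `normalizeWhitespace` are hypothetical names used for illustration, not existing functions in the extension or in Lute, and trimming at block boundaries is left out:

```js
// Hypothetical lookup table built from the first two columns of the table.
// true = collapse, false = preserve.
const WHITE_SPACE_RULES = new Map([
  ['normal',       { collapseNewlines: true,  collapseSpaces: true }],
  ['nowrap',       { collapseNewlines: true,  collapseSpaces: true }],
  ['pre',          { collapseNewlines: false, collapseSpaces: false }],
  ['pre-wrap',     { collapseNewlines: false, collapseSpaces: false }],
  ['pre-line',     { collapseNewlines: false, collapseSpaces: true }],
  ['break-spaces', { collapseNewlines: false, collapseSpaces: false }],
]);

// Hypothetical helper: normalize the text of one text node.
function normalizeWhitespace(text, whiteSpace) {
  // Unknown values fall back to `normal`, the CSS initial value.
  const rule = WHITE_SPACE_RULES.get(whiteSpace) || WHITE_SPACE_RULES.get('normal');
  // Treat CRLF and lone CR as a single line feed first.
  const result = text.replace(/\r\n?/g, '\n');
  if (rule.collapseNewlines && rule.collapseSpaces) {
    // normal / nowrap: each run of spaces, tabs and new lines becomes one space.
    return result.replace(/[ \t\n]+/g, ' ');
  }
  if (rule.collapseSpaces) {
    // pre-line: keep new lines, drop spaces and tabs around them, collapse the rest.
    return result.replace(/[ \t]*\n[ \t]*/g, '\n').replace(/[ \t]+/g, ' ');
  }
  // pre / pre-wrap / break-spaces: keep everything as written.
  return result;
}

console.log(JSON.stringify(normalizeWhitespace('a  \n\t b', 'normal')));      // "a b"
console.log(JSON.stringify(normalizeWhitespace('a  \n  b   c', 'pre-line'))); // "a\nb c"
console.log(JSON.stringify(normalizeWhitespace('a  \n  b', 'pre')));          // "a  \n  b"
```

Keeping the rules in a table rather than in branching code makes it easier to compare the implementation against the specification row by row.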

### Proposal

1. First, get the element's white-space property value

    - Web clipping → the browser extension can use [Window.getComputedStyle()](https://developer.mozilla.org/zh-CN/docs/Web/API/Window/getComputedStyle) to get the element's style, for example:
    
      ```js
      const pElement = document.querySelector('p');
      const computedStyle = window.getComputedStyle(pElement);
      console.log(computedStyle.whiteSpace); // the white-space value actually applied (including inherited values)
      ```
    
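    - Copying content from a web page and pasting it into SiYuan → in the copied HTML, all of an element's styles are inline styles, so the value can be read directly

2. Then handle the whitespace characters in the element according to its actual white-space value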
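
For the paste scenario in step 1, reading the value from an inline style could look roughly like the sketch below. `getInlineWhiteSpace` is a hypothetical helper name, and the parsing is deliberately naive: it splits the `style` attribute on `;`, ignores `!important` priority, and returns `null` when no recognized value is found, so the caller can decide whether to fall back to a parent element or to `normal`:

```js
// Keyword values of `white-space` listed in the table in the Background section.
const KNOWN_WHITE_SPACE_VALUES = [
  'normal', 'nowrap', 'pre', 'pre-wrap', 'pre-line', 'break-spaces',
];

// Hypothetical helper: read `white-space` from the text of a style attribute.
function getInlineWhiteSpace(styleAttr) {
  let value = null;
  for (const declaration of (styleAttr || '').split(';')) {
    const index = declaration.indexOf(':');
    if (index === -1) continue; // not a "property: value" pair
    const property = declaration.slice(0, index).trim().toLowerCase();
    if (property !== 'white-space') continue;
    const candidate = declaration
      .slice(index + 1)
      .replace(/!important/i, '') // simplification: priority is ignored
      .trim()
      .toLowerCase();
    // Simplification: a later valid declaration overrides an earlier one.
    if (KNOWN_WHITE_SPACE_VALUES.includes(candidate)) value = candidate;
  }
  return value;
}

console.log(getInlineWhiteSpace('color: red; white-space: pre-wrap;')); // pre-wrap
console.log(getInlineWhiteSpace('color: red'));                         // null
```

In a browser context the same string would typically come from `element.getAttribute('style')`.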

### Related issues

- [ ] https://github.com/siyuan-note/siyuan/issues/13195
- [x] https://github.com/siyuan-note/siyuan/issues/13838
- [x] https://github.com/siyuan-note/siyuan/issues/14400
- [ ] https://github.com/siyuan-note/siyuan/issues/14772#issuecomment-2858550587

@ruin1990 

Improve white-space handling in browser clipping extension #14775
