
ch 11: Does the solution to Exercise 11.9.2 actually answer the question? #107

@YueZhengMeng


Exercise 11.9.2

Show how to implement the algorithm without using $\mathbf{g}_t'$. Why might this be a good idea?
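For reference, these are the Adadelta updates the exercise refers to, as we read them from the book's section 11.9. All operations are element-wise, and in the code below `s`, `delta`, and `eps` correspond to $\mathbf{s}_t$, $\Delta\mathbf{x}_t$, and $\epsilon$. The second form of the last line is just the substitution $\mathbf{g}_t'^2 = \frac{\Delta\mathbf{x}_{t-1} + \epsilon}{\mathbf{s}_t + \epsilon} \odot \mathbf{g}_t^2$, which suggests $\mathbf{g}_t'$ is a convenience rather than a required quantity:

```math
\begin{aligned}
\mathbf{s}_t &= \rho\,\mathbf{s}_{t-1} + (1-\rho)\,\mathbf{g}_t^2\\
\mathbf{g}_t' &= \frac{\sqrt{\Delta\mathbf{x}_{t-1} + \epsilon}}{\sqrt{\mathbf{s}_t + \epsilon}} \odot \mathbf{g}_t\\
\mathbf{x}_t &= \mathbf{x}_{t-1} - \mathbf{g}_t'\\
\Delta\mathbf{x}_t &= \rho\,\Delta\mathbf{x}_{t-1} + (1-\rho)\,\mathbf{g}_t'^2
= \rho\,\Delta\mathbf{x}_{t-1} + (1-\rho)\,\frac{\Delta\mathbf{x}_{t-1} + \epsilon}{\mathbf{s}_t + \epsilon} \odot \mathbf{g}_t^2
\end{aligned}
```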
Solution:
  Without using $\mathbf{g}_t'$, the update step of the Adadelta algorithm can be modified as follows:

import torch

def adadelta(params, states, hyperparams):
    rho, eps = hyperparams['rho'], 1e-5

    for param, (s, delta) in zip(params, states):
        with torch.no_grad():
            # Moving average of the squared gradients
            s[:] = rho * s + (1 - rho) * param.grad ** 2

            # Compute the parameter update
            update = (torch.sqrt(delta + eps) / torch.sqrt(s + eps)) * param.grad

            # Update the parameters
            param[:] -= update

            # Moving average of the squared parameter updates
            delta[:] = rho * delta + (1 - rho) * update ** 2

        # Zero the gradient
        param.grad.data.zero_()

I compared this code line by line with the book's implementation of adadelta:

import torch

def adadelta(params, states, hyperparams):
    rho, eps = hyperparams['rho'], 1e-5
    for p, (s, delta) in zip(params, states):
        with torch.no_grad():
            # In-place updates via [:]
            s[:] = rho * s + (1 - rho) * torch.square(p.grad)
            g = (torch.sqrt(delta + eps) / torch.sqrt(s + eps)) * p.grad
            p[:] -= g
            delta[:] = rho * delta + (1 - rho) * g * g
        p.grad.data.zero_()

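The only difference between the two is that the variable g has been renamed to update, so the solution does not actually implement the algorithm without $\mathbf{g}_t'$ as the exercise asks.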
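One possible reading of the exercise (our assumption, not necessarily the book's intended answer) is that $\mathbf{g}_t'$ never needs a parameter-sized buffer of its own. The gradient $\mathbf{g}_t$ is no longer needed once it has been rescaled, and it is zeroed right after the update anyway, so $\mathbf{g}_t'$ can be written over it in place. In PyTorch that would look roughly like `p.grad.mul_(torch.sqrt(delta + eps) / torch.sqrt(s + eps))`, then `p[:] -= p.grad` and `delta[:] = rho * delta + (1 - rho) * p.grad ** 2`. The likely benefit is that one fewer tensor the size of each parameter has to be kept alive per step (the rescaling factor itself still needs a temporary), which matters for large models. The sketch below illustrates the idea in plain Python, with lists of floats standing in for tensors; the names `adadelta_with_temp`, `adadelta_in_place`, and `demo`, and the quadratic objective, are hypothetical and chosen only for illustration.

```python
import math


def adadelta_with_temp(params, grads, states, rho=0.9, eps=1e-5):
    """Mirrors the book's code: g_t' is materialized as a separate list."""
    for p, grad, (s, delta) in zip(params, grads, states):
        for i in range(len(p)):
            s[i] = rho * s[i] + (1 - rho) * grad[i] ** 2
        # Extra parameter-sized buffer holding g_t'.
        g = [math.sqrt(delta[i] + eps) / math.sqrt(s[i] + eps) * grad[i]
             for i in range(len(p))]
        for i in range(len(p)):
            p[i] -= g[i]
            delta[i] = rho * delta[i] + (1 - rho) * g[i] ** 2


def adadelta_in_place(params, grads, states, rho=0.9, eps=1e-5):
    """Same update, but g_t' overwrites the gradient buffer (no extra list)."""
    for p, grad, (s, delta) in zip(params, grads, states):
        for i in range(len(p)):
            s[i] = rho * s[i] + (1 - rho) * grad[i] ** 2
            # After this line grad[i] holds g_t' instead of g_t.
            grad[i] *= math.sqrt(delta[i] + eps) / math.sqrt(s[i] + eps)
            p[i] -= grad[i]
            delta[i] = rho * delta[i] + (1 - rho) * grad[i] ** 2


def demo(steps=5):
    """Runs both versions on the hypothetical f(x) = 0.1*x0^2 + 2*x1^2."""
    p1, p2 = [1.0, -2.0], [1.0, -2.0]
    st1 = [([0.0, 0.0], [0.0, 0.0])]
    st2 = [([0.0, 0.0], [0.0, 0.0])]
    for _ in range(steps):
        # Fresh gradients every step, as backward() would refill p.grad.
        g1 = [0.2 * p1[0], 4.0 * p1[1]]
        g2 = [0.2 * p2[0], 4.0 * p2[1]]
        adadelta_with_temp([p1], [g1], st1)
        adadelta_in_place([p2], [g2], st2)
    return p1, p2


p1, p2 = demo()
print(p1 == p2)  # True
```

Both versions perform the same floating-point operations in the same order, so the parameters match exactly; the in-place version simply gives up the original gradient values, which the training loop discards anyway.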

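My own understanding is limited and I can't come up with a better solution myself.
I'd appreciate pointers from more experienced members of the community.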
