
CS329 Machine Learning Quiz 3

Before Quiz

  1. Learning

    $p(\theta|D) \propto p(D|\theta)\,p(\theta)$

    • If $p(D|\theta)\,p(\theta)$ is conjugate, the posterior has a closed-form solution.
    • Otherwise, minimize the negative log posterior $L(\theta) = -\ln p(D|\theta) - \ln p(\theta)$ by Newton's method:
      • $b = \nabla_\theta L(\theta)$
      • $H = \nabla_\theta^2 L(\theta)$

    $\theta \leftarrow \theta - H^{-1}b$

    $p(\theta|D) \approx \mathcal{N}(\theta_{MAP},\, H_{MAP}^{-1})$

  2. Prediction

    $p(t_{N+1}|x_{N+1},D) = \int p(t_{N+1},\theta|x_{N+1},D)\,d\theta = \int p(t_{N+1}|x_{N+1},\theta)\,p(\theta|D)\,d\theta$

    • Regression: $t_{N+1} = y(x_{N+1},\theta) + v$, $v \sim \mathcal{N}(0,\beta^{-1})$

      $p(t_{N+1}|x_{N+1},D) \approx \mathcal{N}\!\left(y(x_{N+1},\theta_{MAP}),\; \beta^{-1} + g_{MAP}^T H_{MAP}^{-1} g_{MAP}\right)$, where $g = \nabla_\theta y(x,\theta)$

    • Classification: $y(x_{N+1},\theta) = \sigma(\Phi^T(x_{N+1})\,\theta)$

      $p(t_{N+1}|x_{N+1},D) \approx p(t_{N+1}|x_{N+1},\theta_{MAP}) = y(x_{N+1},\theta_{MAP})^{\,t_{N+1}}\left(1 - y(x_{N+1},\theta_{MAP})\right)^{1-t_{N+1}}$

  3. Evaluation

    $p(D) = \int p(D|\theta)\,p(\theta)\,d\theta$

    $\ln p(D) \approx \ln\!\int p(D|\theta_{MAP})\,p(\theta_{MAP})\exp\!\left(-\tfrac{1}{2}(\theta-\theta_{MAP})^T H_{MAP}\,(\theta-\theta_{MAP})\right) d\theta = \ln p(D|\theta_{MAP}) + \ln p(\theta_{MAP}) + \tfrac{M}{2}\ln(2\pi) - \tfrac{1}{2}\ln|H_{MAP}|$
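The Newton step and the evidence formula above can be checked numerically. The sketch below is my own toy example (not from the notes): for 1-D Bayesian linear regression, $L(\theta)$ is exactly quadratic, so a single Newton step $\theta \leftarrow \theta - H^{-1}b$ lands on $\theta_{MAP}$ and the Laplace evidence is exact (with $M = 1$).

```python
import numpy as np

# Toy model (made-up data): t_n ~ N(theta*x_n, 1/beta), theta ~ N(0, 1/alpha).
rng = np.random.default_rng(0)
alpha, beta = 2.0, 5.0                         # prior / noise precision
x = rng.normal(size=10)
t = 0.7 * x + rng.normal(scale=beta**-0.5, size=10)

# One Newton step on L(theta) = -ln p(D|theta) - ln p(theta)
theta = 3.0                                      # arbitrary starting point
b = alpha*theta - beta*np.sum(x*(t - theta*x))   # b = dL/dtheta
H = alpha + beta*np.sum(x**2)                    # H = d^2L/dtheta^2
theta_map = theta - b/H                          # exact minimizer (L is quadratic)

def log_gauss(r, var):
    # sum of independent 1-D Gaussian log densities with residuals r
    return np.sum(-0.5*np.log(2*np.pi*var) - 0.5*r**2/var)

# Laplace evidence: ln p(D|th_MAP) + ln p(th_MAP) + (M/2) ln 2pi - (1/2) ln|H|
log_evidence = (log_gauss(t - theta_map*x, 1/beta)
                + log_gauss(np.array([theta_map]), 1/alpha)
                + 0.5*np.log(2*np.pi) - 0.5*np.log(H))

# Exact evidence for comparison: t ~ N(0, C) with C = I/beta + x x^T / alpha
C = np.eye(10)/beta + np.outer(x, x)/alpha
exact = (-5*np.log(2*np.pi) - 0.5*np.linalg.slogdet(C)[1]
         - 0.5 * t @ np.linalg.solve(C, t))
```

For a neural network $L(\theta)$ is only approximately quadratic, so the Newton iteration must be repeated and the evidence formula becomes an approximation.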

Question 1 Neural Networks without Prior

(Figure: a two-layer feed-forward network with inputs $z_i$, hidden units $z_j = h(a_j)$, and outputs $y_k$, connected by weights $w_{ji}$ and $w_{kj}$.)

Question 1.1

What are the gradients $\partial y_k/\partial w_{kj}$ and $\partial y_k/\partial w_{ji}$ for regression and classification, respectively?

Solution 1.1

  1. Regression

    $y_k = a_k$

    $\dfrac{\partial y_k}{\partial w_{kj}} = \dfrac{\partial a_k}{\partial w_{kj}} = z_j$

    $z_j = h(a_j), \quad \dfrac{\partial a_j}{\partial w_{ji}} = z_i$

    $\dfrac{\partial y_k}{\partial w_{ji}} = \dfrac{\partial a_k}{\partial z_j}\dfrac{\partial z_j}{\partial a_j}\dfrac{\partial a_j}{\partial w_{ji}} = w_{kj}\,h'(a_j)\,z_i$
  2. Classification

    $y_k = \sigma(a_k)$

    $\dfrac{\partial y_k}{\partial w_{kj}} = \dfrac{\partial y_k}{\partial a_k}\dfrac{\partial a_k}{\partial w_{kj}} = \sigma'(a_k)\,z_j = y_k(1-y_k)\,z_j$

    $z_j = h(a_j), \quad \dfrac{\partial a_j}{\partial w_{ji}} = z_i$

    $\dfrac{\partial y_k}{\partial w_{ji}} = \dfrac{\partial y_k}{\partial a_k}\dfrac{\partial a_k}{\partial z_j}\dfrac{\partial z_j}{\partial a_j}\dfrac{\partial a_j}{\partial w_{ji}} = y_k(1-y_k)\,w_{kj}\,h'(a_j)\,z_i$
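A quick way to trust these expressions is a finite-difference check. The sketch below tests the classification case of Solution 1.1 on a tiny made-up two-layer network (tanh hidden units for $h$, sigmoid outputs); all shapes and values are arbitrary test data.

```python
import numpy as np

rng = np.random.default_rng(1)
z_in = rng.normal(size=3)              # inputs z_i
W1 = rng.normal(size=(4, 3))           # hidden-layer weights w_ji
W2 = rng.normal(size=(2, 4))           # output-layer weights w_kj

def forward(W1, W2):
    a_j = W1 @ z_in                    # hidden activations
    z_j = np.tanh(a_j)                 # z_j = h(a_j)
    y_k = 1/(1 + np.exp(-(W2 @ z_j)))  # y_k = sigma(a_k)
    return a_j, z_j, y_k

a_j, z_j, y_k = forward(W1, W2)
k, j, i = 1, 2, 0                      # pick one weight of each layer

# analytic gradients from Solution 1.1 (classification)
g_kj = y_k[k]*(1 - y_k[k])*z_j[j]                                     # dy_k/dw_kj
g_ji = y_k[k]*(1 - y_k[k])*W2[k, j]*(1 - np.tanh(a_j[j])**2)*z_in[i]  # dy_k/dw_ji

# forward finite differences for comparison
eps = 1e-6
W2p = W2.copy(); W2p[k, j] += eps
fd_kj = (forward(W1, W2p)[2][k] - y_k[k]) / eps
W1p = W1.copy(); W1p[j, i] += eps
fd_ji = (forward(W1p, W2)[2][k] - y_k[k]) / eps
```

The regression case is the same check with the output sigmoid removed (so the $y_k(1-y_k)$ factor drops out).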

Question 1.2

What are the gradients $\partial E_n/\partial w_{kj}$ and $\partial E_n/\partial w_{ji}$ for regression and classification, respectively?

Solution 1.2

Define $\delta_k \equiv y_k - t_k$ and $\delta_j \equiv \sum_{k=1}^K \dfrac{\partial E_n}{\partial a_k}\dfrac{\partial a_k}{\partial a_j} = h'(a_j)\sum_{k=1}^K w_{kj}\,\delta_k$.
  1. Regression

    $E_n = \dfrac{1}{2}\sum_{k=1}^K (y_k - t_k)^2$

    $\dfrac{\partial E_n}{\partial w_{kj}} = \dfrac{\partial E_n}{\partial y_k}\dfrac{\partial y_k}{\partial w_{kj}} = (y_k - t_k)\,z_j = \delta_k z_j$

    $\dfrac{\partial E_n}{\partial w_{ji}} = \dfrac{\partial E_n}{\partial a_j}\dfrac{\partial a_j}{\partial w_{ji}} = h'(a_j)\sum_{k=1}^K w_{kj}\,\delta_k\, z_i = \delta_j z_i$
  2. Classification

    $E_n = -\sum_{k=1}^K \left[t_k \ln y_k + (1-t_k)\ln(1-y_k)\right]$

    $\dfrac{\partial E_n}{\partial w_{kj}} = \dfrac{\partial E_n}{\partial y_k}\dfrac{\partial y_k}{\partial a_k}\dfrac{\partial a_k}{\partial w_{kj}} = (y_k - t_k)\,z_j = \delta_k z_j$

    $\dfrac{\partial E_n}{\partial w_{ji}} = \dfrac{\partial E_n}{\partial a_j}\dfrac{\partial a_j}{\partial w_{ji}} = h'(a_j)\sum_{k=1}^K w_{kj}\,\delta_k\, z_i = \delta_j z_i$
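The backprop deltas of Solution 1.2 can be verified the same way. This hedged sketch checks the classification (cross-entropy) case on a tiny made-up network: the outer products $\delta_k z_j$ and $\delta_j z_i$ should match finite-difference gradients of $E_n$.

```python
import numpy as np

rng = np.random.default_rng(2)
z_in = rng.normal(size=3)
W1 = rng.normal(size=(4, 3))          # w_ji
W2 = rng.normal(size=(2, 4))          # w_kj
t = np.array([1.0, 0.0])              # targets t_k

def net(W1, W2):
    a_j = W1 @ z_in
    z_j = np.tanh(a_j)                            # z_j = h(a_j)
    y = 1/(1 + np.exp(-(W2 @ z_j)))               # y_k = sigma(a_k)
    E = -np.sum(t*np.log(y) + (1 - t)*np.log(1 - y))  # E_n
    return a_j, z_j, y, E

a_j, z_j, y, E = net(W1, W2)
delta_k = y - t                                     # delta_k = y_k - t_k
delta_j = (1 - np.tanh(a_j)**2) * (W2.T @ delta_k)  # delta_j = h'(a_j) sum_k w_kj delta_k
dE_dW2 = np.outer(delta_k, z_j)                     # dE_n/dw_kj = delta_k z_j
dE_dW1 = np.outer(delta_j, z_in)                    # dE_n/dw_ji = delta_j z_i

# finite-difference spot checks on one weight per layer
eps = 1e-6
W2p = W2.copy(); W2p[0, 1] += eps
fd2 = (net(W1, W2p)[3] - E) / eps
W1p = W1.copy(); W1p[2, 0] += eps
fd1 = (net(W1p, W2)[3] - E) / eps
```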

Question 1.3

What is the gradient $\partial y_k/\partial z_i$ for regression and classification, respectively?

Solution 1.3

  1. Regression

    Since the input $z_i$ feeds every hidden unit $j$, the chain rule sums over $j$:

    $\dfrac{\partial y_k}{\partial z_i} = \sum_j \dfrac{\partial a_k}{\partial z_j}\dfrac{\partial z_j}{\partial a_j}\dfrac{\partial a_j}{\partial z_i} = \sum_j w_{kj}\,h'(a_j)\,w_{ji}$
  2. Classification

    $\dfrac{\partial y_k}{\partial z_i} = \sum_j \dfrac{\partial y_k}{\partial a_k}\dfrac{\partial a_k}{\partial z_j}\dfrac{\partial z_j}{\partial a_j}\dfrac{\partial a_j}{\partial z_i} = y_k(1-y_k)\sum_j w_{kj}\,h'(a_j)\,w_{ji}$
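The sum over hidden units in Solution 1.3 is easy to get wrong, so here is a hedged check of the classification case on a made-up network: the full input-to-output Jacobian $J_{ki} = y_k(1-y_k)\sum_j w_{kj} h'(a_j) w_{ji}$ is compared against a finite difference in one input.

```python
import numpy as np

rng = np.random.default_rng(3)
z_in = rng.normal(size=3)
W1 = rng.normal(size=(4, 3))          # w_ji
W2 = rng.normal(size=(2, 4))          # w_kj

def forward(z):
    a_j = W1 @ z
    y = 1/(1 + np.exp(-(W2 @ np.tanh(a_j))))
    return a_j, y

a_j, y = forward(z_in)
# J[k, i] = y_k(1-y_k) * sum_j W2[k, j] * h'(a_j) * W1[j, i]
J = (y*(1 - y))[:, None] * ((W2 * (1 - np.tanh(a_j)**2)) @ W1)

# numerical dy_k/dz_0 for all k
eps = 1e-6
zp = z_in.copy(); zp[0] += eps
fd = (forward(zp)[1] - y) / eps
```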

Question 2 Neural Networks with Prior

Suppose the prior is $w \sim \mathcal{N}(m_0, \Sigma_0^{-1})$ for both regression and classification (the solutions below take the special case $m_0 = 0$, $\Sigma_0^{-1} = \alpha^{-1} I$); then

Question 2.1

What are the MAP solution $w_{MAP}$ and the posterior $p(w|D)$ for both cases?

Solution 2.1

By iterating $w_{new} = w_{old} - A^{-1}\nabla E(w)$ until convergence, we obtain $w_{MAP}$.

  1. Regression

    $E(w) = -\ln p(w|t) = \dfrac{\alpha}{2}w^T w + \dfrac{\beta}{2}\sum_{n=1}^N \left[y(x_n,w) - t_n\right]^2 + C$

    $\nabla E(w) = \alpha w + \beta\sum_{n=1}^N (y_n - t_n)\,g_n, \quad g_n = \nabla_w y(x_n,w)\big|_{w=w_{MAP}}$

    $A = \nabla\nabla E(w) = \alpha I + \beta H$
  2. Classification

    $E(w) = -\ln p(w|t) = \dfrac{\alpha}{2}w^T w - \sum_{n=1}^N \left[t_n \ln y_n + (1-t_n)\ln(1-y_n)\right]$

    $\nabla E(w) = \alpha w + \sum_{n=1}^N (y_n - t_n)\,g_n, \quad g_n = \nabla_w y(x_n,w)\big|_{w=w_{MAP}}$

    $A = \nabla\nabla E(w) = \alpha I + H$

where $H$ is the Hessian matrix of the sum-of-errors term.

Hence the Laplace approximation to the posterior is $p(w|D) \approx \mathcal{N}(w\,|\,w_{MAP}, A^{-1})$.
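The iteration of Solution 2.1 can be sketched concretely. Below is a hedged toy example (made-up data and $\alpha$) for the classification case with a model that is linear in $w$, $y = \sigma(w^T\phi)$, so $g_n = \phi_n$ and $H = \sum_n y_n(1-y_n)\,\phi_n\phi_n^T$; the loop is exactly $w_{new} = w_{old} - A^{-1}\nabla E(w)$.

```python
import numpy as np

rng = np.random.default_rng(4)
Phi = rng.normal(size=(60, 2))                       # feature vectors phi_n
t = (Phi @ np.array([1.5, -2.0]) > 0).astype(float)  # synthetic labels
alpha = 1.0                                          # prior precision

w = np.zeros(2)
for _ in range(25):
    y = 1/(1 + np.exp(-Phi @ w))                     # y_n = sigma(w^T phi_n)
    grad = alpha*w + Phi.T @ (y - t)                 # grad E = alpha w + sum (y_n - t_n) g_n
    A = alpha*np.eye(2) + (Phi * (y*(1 - y))[:, None]).T @ Phi  # A = alpha I + H
    w = w - np.linalg.solve(A, grad)                 # Newton update

# Laplace posterior p(w|D) ~= N(w_map, A^{-1}), with A evaluated at w_map
w_map = w
y = 1/(1 + np.exp(-Phi @ w_map))
A = alpha*np.eye(2) + (Phi * (y*(1 - y))[:, None]).T @ Phi
S = np.linalg.inv(A)                                 # posterior covariance
```

For a real neural network $g_n$ and $H$ come from backprop rather than the closed forms used here, but the update is the same.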

Question 2.2

What are the predictive distributions of a new data input xN+1 and label tN+1 for both cases?

Solution 2.2

  1. Regression

    $p(t_{N+1}|x_{N+1},D) = \mathcal{N}\!\left(t_{N+1}\,\middle|\,y(x_{N+1},w_{MAP}),\; \beta^{-1} + g^T A^{-1} g\right)$, where $g = \nabla_w y(x_{N+1},w)\big|_{w=w_{MAP}}$
  2. Classification

    With $a_{MAP} = a(x_{N+1},w_{MAP})$, $\sigma_a^2 = g^T A^{-1} g$, and $\kappa(\sigma^2) = \left(1 + \pi\sigma^2/8\right)^{-1/2}$:

    $p(t_{N+1}=1|x_{N+1},D) = \sigma\!\left(\kappa(\sigma_a^2)\,a_{MAP}\right)$

    or, using the MAP point estimate directly,

    $p(t_{N+1}|x_{N+1},D) \approx p(t_{N+1}|x_{N+1},w_{MAP}) = y(x_{N+1},w_{MAP})^{\,t_{N+1}}\left(1 - y(x_{N+1},w_{MAP})\right)^{1-t_{N+1}}$
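Both predictive formulas can be sketched with made-up numbers. Below, $g$, $A$, $\beta$, $a_{MAP}$, and $\sigma_a^2$ are arbitrary example values (not derived from any dataset); the probit approximation $\sigma(\kappa(\sigma_a^2)\,a_{MAP})$ is sanity-checked against a Monte Carlo estimate of $\mathbb{E}[\sigma(a)]$ with $a \sim \mathcal{N}(a_{MAP}, \sigma_a^2)$.

```python
import numpy as np

rng = np.random.default_rng(5)

def sigmoid(a):
    return 1/(1 + np.exp(-a))

# Regression: predictive variance beta^{-1} + g^T A^{-1} g
beta = 10.0
g = np.array([0.2, -0.5])
A = np.array([[2.0, 0.1],
              [0.1, 3.0]])
var_pred = 1/beta + g @ np.linalg.solve(A, g)

# Classification: probit approximation vs Monte Carlo marginalization
a_map, s2 = 0.8, 1.5                         # a_MAP and sigma_a^2 = g^T A^{-1} g
kappa = (1 + np.pi*s2/8)**-0.5               # kappa(s2) = (1 + pi*s2/8)^{-1/2}
p_probit = sigmoid(kappa*a_map)
p_mc = sigmoid(rng.normal(a_map, np.sqrt(s2), size=200_000)).mean()
```

Note how marginalizing over the posterior pulls the prediction toward 0.5 relative to the point estimate $\sigma(a_{MAP})$: that shrinkage is exactly what the $\kappa < 1$ factor encodes.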
