[Pytorch]3. AutoGrad & Optimizer (2022. 1. 28.)
In [2]:
from IPython.display import Image
import numpy as np
import torch
from torch import nn
from torch import Tensor
nn.Parameter
- A subclass of the Tensor object
- A Tensor that becomes a learning target when it is assigned as an attribute inside an nn.Module; it is created with requires_grad=True, so it becomes a target of AutoGrad
- You rarely need to declare one yourself (the built-in layers already register their parameters); a minimal check is sketched right below
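A minimal sketch (variable names are illustrative, not from the original notebook) confirming that nn.Parameter is tracked for gradients while a plain Tensor is not:

p = nn.Parameter(torch.randn(3, 2))   # requires_grad=True by default
t = torch.randn(3, 2)                 # a plain Tensor
print(p.requires_grad)   # True  -> becomes a learning target when assigned to an nn.Module
print(t.requires_grad)   # False -> a plain Tensor attribute is not registered as a parameter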
In [15]:
class MyLiner(nn.Module):
    def __init__(self, in_features, out_features, bias=True):
        super(MyLiner, self).__init__()
        self.in_features = in_features
        self.out_features = out_features            # number of output features
        self.weights = nn.Parameter(                # randomly initialized weights
            torch.randn(in_features, out_features))
        self.bias = nn.Parameter(torch.randn(out_features))

    def forward(self, x: Tensor):
        return x @ self.weights + self.bias         # xW + b
In [16]:
x = torch.randn(5, 7)   # 5 samples, 7 features
x
Out[16]:tensor([[ 2.2315, 0.0956, 1.8159, 0.1136, -1.5745, -0.5874, -0.7236], [ 1.9480, -0.3755, 1.7415, -0.2828, -1.3193, 0.9586, 0.0352], [ 0.4110, 1.4071, 0.5583, 0.1973, 0.1787, 1.5075, 0.2040], [ 0.1399, 0.8770, -0.8301, 0.8772, 0.0065, -1.6743, 1.0261], [ 1.0581, -1.7018, 1.6004, -0.9947, 1.7289, -0.7401, 0.9518]])
In [17]:
layer = MyLiner(7, 12)   # project 7 features up to 12
layer(x)
Out[17]:tensor([[-0.2589, -0.1171, 7.2235, -3.2484, 1.8778, 2.3318, -2.2566, 0.2182, -4.4338, -1.6875, -3.4916, 0.8335], [-3.2494, 2.0115, 6.5132, -0.4704, 2.4123, 3.3352, -2.7785, 2.1241, -5.8575, -1.3325, -0.7621, 2.7125], [-0.5090, 2.4947, 1.9423, -1.8518, 0.4672, 4.6600, -2.4372, 2.9846, -1.3338, 0.7118, -1.5926, 3.2961], [-0.2319, -0.7517, -2.8503, -2.9107, -0.7730, 1.3788, 0.2797, -0.9661, -0.0976, 2.8919, -0.1761, -3.8557], [-2.0619, -1.0440, 3.3978, 5.3282, 2.0542, 1.6618, -1.9756, 6.2608, -1.6469, -3.9365, 6.7581, -7.1280]], grad_fn=<AddBackward0>)
In [18]:layer(x).shape
Out[18]:torch.Size([5, 12])
In [19]:
for value in layer.parameters():
    print(value)   # the values that receive gradients when backward() runs
Parameter containing: tensor([[-1.5480, 0.1631, 1.0952, -0.7923, -0.5274, 0.6655, -1.1778, -0.9224, -1.0598, -1.5020, 0.0603, 0.7770], [ 0.5643, -0.0360, -0.0771, -2.3711, -0.6963, 0.3136, -0.9718, -0.1148, -0.0801, 0.7217, -1.8141, 0.4873], [ 0.4632, -0.5194, 1.4989, 0.4411, 0.6651, 0.2560, -0.3979, 2.7196, -0.2847, -0.8004, -0.4595, -1.3037], [ 0.1466, -0.0753, -0.8628, 0.3369, -0.2586, 1.1739, 1.0376, 0.8156, 1.4424, -0.6178, -0.5628, -0.0255], [ 0.4825, -0.3550, -0.3252, 1.0316, -0.4442, 0.6301, -0.3017, 1.3094, 1.6264, -1.4975, 1.2606, -1.1069], [-0.8737, 1.3001, 0.2661, 0.7043, 0.1763, 1.0441, -0.3086, 0.5652, -0.3237, -0.1710, 0.2037, 2.3431], [-2.4293, 0.2242, -1.3115, 0.3302, -0.2049, 0.0649, -0.4978, 1.1964, -1.4836, 0.8056, 1.3153, -1.4734]], requires_grad=True) Parameter containing: tensor([ 0.7723, 0.8410, 0.8585, 0.1840, 1.1988, 1.8709, 0.0525, 0.5159, -0.4110, 1.2433, 0.5020, -0.0096], requires_grad=True)
In [23]:dict(layer.named_parameters())
Out[23]:{'weights': Parameter containing: tensor([[-1.5480, 0.1631, 1.0952, -0.7923, -0.5274, 0.6655, -1.1778, -0.9224, -1.0598, -1.5020, 0.0603, 0.7770], [ 0.5643, -0.0360, -0.0771, -2.3711, -0.6963, 0.3136, -0.9718, -0.1148, -0.0801, 0.7217, -1.8141, 0.4873], [ 0.4632, -0.5194, 1.4989, 0.4411, 0.6651, 0.2560, -0.3979, 2.7196, -0.2847, -0.8004, -0.4595, -1.3037], [ 0.1466, -0.0753, -0.8628, 0.3369, -0.2586, 1.1739, 1.0376, 0.8156, 1.4424, -0.6178, -0.5628, -0.0255], [ 0.4825, -0.3550, -0.3252, 1.0316, -0.4442, 0.6301, -0.3017, 1.3094, 1.6264, -1.4975, 1.2606, -1.1069], [-0.8737, 1.3001, 0.2661, 0.7043, 0.1763, 1.0441, -0.3086, 0.5652, -0.3237, -0.1710, 0.2037, 2.3431], [-2.4293, 0.2242, -1.3115, 0.3302, -0.2049, 0.0649, -0.4978, 1.1964, -1.4836, 0.8056, 1.3153, -1.4734]], requires_grad=True), 'bias': Parameter containing: tensor([ 0.7723, 0.8410, 0.8585, 0.1840, 1.1988, 1.8709, 0.0525, 0.5159, -0.4110, 1.2433, 0.5020, -0.0096], requires_grad=True)}
In [13]:
class MyLiner(nn.Module):   # what if the parameters are declared as plain Tensors?
    def __init__(self, in_features, out_features, bias=True):
        super(MyLiner, self).__init__()
        self.in_features = in_features
        self.out_features = out_features            # number of output features
        self.weights = torch.Tensor(                # randomly initialized weights
            torch.randn(in_features, out_features))
        self.bias = torch.Tensor(torch.randn(out_features))

    def forward(self, x: Tensor):
        return x @ self.weights + self.bias         # xW + b

layer = MyLiner(7, 12)   # project 7 features up to 12
layer(x)
Out[13]:tensor([[-2.3932, 0.2151, 0.3737, 0.4169, 1.2809, -0.6099, -1.9564, -4.4724, -4.7116, -0.1150, -1.4709, -1.6101], [-0.9545, 1.4772, 1.4300, 2.5543, -1.3388, -2.6983, -5.5924, -3.0663, -0.0821, -2.6961, -1.0033, -1.8998], [ 0.2819, -0.3657, 1.7904, 1.5366, -0.0506, -1.0388, -2.8368, -1.4347, -1.2372, -1.2309, -1.6568, -0.4108], [-3.2857, -1.4354, 3.0789, -0.5547, 6.1609, 1.4157, 1.3279, -2.7192, -8.8037, -2.2403, -1.7092, 1.2031], [-0.8241, -2.0260, 1.6704, 2.7957, -2.0761, 1.2131, -0.5549, -0.9469, -2.3634, -1.5369, 1.0155, 0.9032]])
In [14]:
for value in layer.parameters():   # prints nothing: plain Tensors are not registered as parameters and are not differentiated
    print(value)
Backward: performing backpropagation
- Computes the gradients of the Parameters in each layer
The loss (the difference between the forward output, i.e. the model's prediction $\hat{y}$, and the true value ${y}$) is differentiated, and the resulting gradients are used to update the Parameters
- AutoGrad actually runs when the backward function is called!
loss.backward() computes $\frac{\partial L}{\partial W}$ (a minimal sketch follows)
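A minimal sketch (toy tensors, not part of the original notebook): after loss.backward(), every tensor created with requires_grad=True holds its gradient in .grad.

w = torch.randn(3, requires_grad=True)   # the "parameter" to be learned
x = torch.randn(3)                       # a fixed input
y_hat = (w * x).sum()                    # forward pass
loss = (y_hat - 1.0) ** 2                # squared error against a target of 1
loss.backward()                          # AutoGrad fills in dL/dw
print(w.grad)                            # equals 2 * (y_hat - 1) * x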
In [ ]:
for epoch in range(epochs):
    # Clear gradient buffers so gradients from the previous epoch do not carry forward
    optimizer.zero_grad()   # reset so earlier updates do not affect the current step

    # run the model to get its outputs
    outputs = model(inputs)

    # compute the loss from the difference between outputs and labels
    loss = criterion(outputs, labels)
    print(loss)

    # compute the gradient of the loss w.r.t. every weight
    loss.backward()

    # update parameters
    optimizer.step()
AutoGrad for Linear Regression
Source: https://towardsdatascience.com/linear-regression-with-pytorch-eb6dedead817
\begin{align}y = 2x + 1\end{align}

In [25]:
import numpy as np

# create dummy data for training
x_values = [i for i in range(11)]
x_train = np.array(x_values, dtype=np.float32)
x_train = x_train.reshape(-1, 1)

y_values = [2*i + 1 for i in x_values]
y_train = np.array(y_values, dtype=np.float32)
y_train = y_train.reshape(-1, 1)
In [26]:x_train
Out[26]:array([[ 0.], [ 1.], [ 2.], [ 3.], [ 4.], [ 5.], [ 6.], [ 7.], [ 8.], [ 9.], [10.]], dtype=float32)
In [28]:
import torch
from torch.autograd import Variable

class LinearRegression(torch.nn.Module):
    def __init__(self, inputSize, outputSize):
        super(LinearRegression, self).__init__()
        self.linear = torch.nn.Linear(inputSize, outputSize)

    def forward(self, x):
        out = self.linear(x)
        return out
In [33]:
inputDim = 1        # takes variable 'x'
outputDim = 1       # takes variable 'y'
learningRate = 0.01
epochs = 100

model = LinearRegression(inputDim, outputDim)   # create the model
##### For GPU #######
if torch.cuda.is_available():
    model.cuda()

criterion = torch.nn.MSELoss()   # MSE Loss
# pass the parameters to be optimized: model.parameters()
optimizer = torch.optim.SGD(model.parameters(), lr=learningRate)
In [ ]:
for epoch in range(epochs):
    # Converting inputs and labels to Variable
    if torch.cuda.is_available():
        # torch.autograd.Variable: lets autograd compute gradients automatically
        inputs = Variable(torch.from_numpy(x_train).cuda())
        labels = Variable(torch.from_numpy(y_train).cuda())
    else:
        inputs = Variable(torch.from_numpy(x_train))
        labels = Variable(torch.from_numpy(y_train))

    # Clear gradient buffers: we don't want gradients from the previous epoch
    # to carry forward or accumulate
    optimizer.zero_grad()

    # get output from the model, given the inputs
    outputs = model(inputs)

    # get loss for the predicted output
    loss = criterion(outputs, labels)
    print(loss)

    # get gradients w.r.t. the parameters
    loss.backward()

    # update parameters
    optimizer.step()

    print('epoch {}, loss {}'.format(epoch, loss.item()))
In [35]:
with torch.no_grad():   # we don't need gradients in the testing phase
    if torch.cuda.is_available():
        predicted = model(Variable(torch.from_numpy(x_train).cuda())).cpu().data.numpy()
    else:
        predicted = model(Variable(torch.from_numpy(x_train))).data.numpy()
    print(predicted)
[[ 0.4443256] [ 2.5243478] [ 4.6043696] [ 6.684392 ] [ 8.764414 ] [10.844436 ] [12.9244585] [15.00448 ] [17.084503 ] [19.164524 ] [21.244547 ]]
In [36]:
for p in model.parameters():
    if p.requires_grad:
        print(p.name, p.data)
None tensor([[2.0800]], device='cuda:0') None tensor([0.4443], device='cuda:0')
Backward from scratch
- The actual backward pass can be specified directly at the Module level, but you normally don't need to (AutoGrad handles it)
- This means overriding backward and the optimizer step inside the Module
- The burden is that you have to write the derivative formulas yourself (you will rarely do this, but you should still understand the order of operations); one common mechanism for this is sketched below
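One common way to supply a manual backward is torch.autograd.Function, where forward and backward are both written by hand. The sketch below is illustrative only (MySquare is a made-up example, not part of the original notebook):

import torch

class MySquare(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)      # stash x for the backward pass
        return x ** 2

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        return grad_output * 2 * x    # chain rule: dL/dx = dL/dy * 2x

x = torch.randn(4, requires_grad=True)
y = MySquare.apply(x).sum()
y.backward()
print(x.grad)   # same values as 2 * x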
Logistic Regression
Writing logistic regression by hand, without AutoGrad (the gradient formulas it relies on are given below)
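For reference, the hand-written backward implemented further down uses the standard binary cross-entropy gradients for logistic regression with $m$ samples, $\hat{y} = \sigma(w^{T}x + b)$, and $X$ the (features $\times$ $m$) input matrix:

\begin{align}\frac{\partial L}{\partial w} = \frac{1}{m}\, X(\hat{y} - y)^{T}, \qquad \frac{\partial L}{\partial b} = \frac{1}{m}\sum_{i=1}^{m}(\hat{y}_{i} - y_{i})\end{align}

These are exactly the expressions stored in grads["dw"] and grads["db"] in the LR class.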
In [38]:
import torch
import torchvision
import torch.nn as nn
from torchvision import datasets, models, transforms
import os
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

## print out the pytorch version used (1.3.1 at the time of the original tutorial)
print(torch.__version__)
1.5.0
In [39]:
## configuration to detect cuda or cpu
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(device)
cuda:0
In [40]:
DATA_PATH = "https://download.pytorch.org/tutorial/hymenoptera_data.zip"

import urllib.request
import os
import shutil
from zipfile import ZipFile

urllib.request.urlretrieve(DATA_PATH, "hymenoptera_data.zip")

with ZipFile("hymenoptera_data.zip", 'r') as zipObj:
    # Extract all the contents of the zip file into the current directory
    zipObj.extractall()
os.rename("hymenoptera_data", "data")
In [41]:
# configure root folder on your local machine
data_dir = "./data"

# image transform: reshape each image tensor
class ReshapeTransform:
    def __init__(self, new_size):
        self.new_size = new_size

    def __call__(self, img):
        result = torch.reshape(img, self.new_size)
        return result

## transformations used to standardize and normalize the datasets
data_transforms = {
    'train': transforms.Compose([
        transforms.Resize(224),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        ReshapeTransform((-1,))   # flattens the data
    ]),
    'val': transforms.Compose([
        transforms.Resize(224),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        ReshapeTransform((-1,))   # flattens the data
    ]),
}

## load the corresponding folders
image_datasets = {x: datasets.ImageFolder(os.path.join(data_dir, x), data_transforms[x])
                  for x in ['train', 'val']}

## load the entire dataset; we are not using minibatches here
train_dataset = torch.utils.data.DataLoader(image_datasets['train'],
                                            batch_size=len(image_datasets['train']),
                                            shuffle=True)
test_dataset = torch.utils.data.DataLoader(image_datasets['val'],
                                           batch_size=len(image_datasets['val']),
                                           shuffle=True)
In [42]:
## load the entire dataset from the generator
x, y = next(iter(train_dataset))

## print one example
dim = x.shape[1]
print("Dimension of image:", x.shape, "\n",
      "Dimension of labels", y.shape)

plt.imshow(x[160].reshape(1, 3, 224, 224).squeeze().T.numpy())
Dimension of image: torch.Size([244, 150528]) Dimension of labels torch.Size([244])
Out[42]:<matplotlib.image.AxesImage at 0x1823b77cfd0>
In [65]:
class LR(nn.Module):
    def __init__(self, dim, lr=torch.scalar_tensor(0.01)):
        super(LR, self).__init__()
        # initialize parameters by hand; since AutoGrad is not used, nn.Parameter is not needed
        self.w = torch.zeros(dim, 1, dtype=torch.float).to(device)
        self.b = torch.scalar_tensor(0).to(device)
        self.grads = {"dw": torch.zeros(dim, 1, dtype=torch.float).to(device),
                      "db": torch.scalar_tensor(0).to(device)}
        self.lr = lr.to(device)

    def forward(self, x):
        z = torch.mm(self.w.T, x) + self.b
        a = self.sigmoid(z)   # activation
        return a              # yhat

    def sigmoid(self, z):
        return 1/(1 + torch.exp(-z))

    def backward(self, x, yhat, y):
        self.grads["dw"] = (1/x.shape[1]) * torch.mm(x, (yhat - y).T)
        self.grads["db"] = (1/x.shape[1]) * torch.sum(yhat - y)

    def optimize(self):
        self.w = self.w - self.lr * self.grads["dw"]
        self.b = self.b - self.lr * self.grads["db"]

# utility functions
def loss(yhat, y):
    m = y.size()[1]
    return -(1/m) * torch.sum(y*torch.log(yhat) + (1 - y)*torch.log(1 - yhat))

def predict(yhat, y):
    y_prediction = torch.zeros(1, y.size()[1])
    for i in range(yhat.size()[1]):
        if yhat[0, i] <= 0.5:
            y_prediction[0, i] = 0
        else:
            y_prediction[0, i] = 1
    return 100 - torch.mean(torch.abs(y_prediction - y)) * 100
In [67]:
## model pretesting
x, y = next(iter(train_dataset))

## flatten/transform the data
x_flatten = x.T
y = y.unsqueeze(0)

## num_px is the dimension of the images
dim = x_flatten.shape[0]

## model instance
model = LR(dim)
model.to(device)
yhat = model.forward(x_flatten.to(device))
yhat = yhat.data.cpu()

## calculate loss
cost = loss(yhat, y)
prediction = predict(yhat, y)
print("Cost: ", cost)
print("Accuracy: ", prediction)

## backpropagate; a single step here is only a sanity check, since it is not repeated across iterations
model.backward(x_flatten.to(device), yhat.to(device), y.to(device))
model.optimize()
Cost: tensor(0.6931) Accuracy: tensor(50.4098)
In [68]:
## hyperparams
costs = []
dim = x_flatten.shape[0]
learning_rate = torch.scalar_tensor(0.0001).to(device)
num_iterations = 100
lrmodel = LR(dim, learning_rate)
lrmodel.to(device)

## transform the data
def transform_data(x, y):
    x_flatten = x.T
    y = y.unsqueeze(0)
    return x_flatten, y

## training the model
for i in range(num_iterations):
    x, y = next(iter(train_dataset))
    test_x, test_y = next(iter(test_dataset))
    x, y = transform_data(x, y)
    test_x, test_y = transform_data(test_x, test_y)

    # forward
    yhat = lrmodel.forward(x.to(device))
    cost = loss(yhat.data.cpu(), y)
    train_pred = predict(yhat, y)

    # backward
    lrmodel.backward(x.to(device), yhat.to(device), y.to(device))
    lrmodel.optimize()

    ## test
    yhat_test = lrmodel.forward(test_x.to(device))
    test_pred = predict(yhat_test, test_y)

    if i % 10 == 0:
        costs.append(cost)
        print("Cost after iteration {}: {} | Train Acc: {} | Test Acc: {}".format(
            i, cost, train_pred, test_pred))
Cost after iteration 0: 0.6931470036506653 | Train Acc: 50.40983581542969 | Test Acc: 45.75163269042969 Cost after iteration 10: 0.669146716594696 | Train Acc: 64.3442611694336 | Test Acc: 54.24836730957031 Cost after iteration 20: 0.6513173580169678 | Train Acc: 68.44261932373047 | Test Acc: 54.24836730957031 Cost after iteration 30: 0.6367812156677246 | Train Acc: 68.03278350830078 | Test Acc: 54.24836730957031 Cost after iteration 40: 0.6245325207710266 | Train Acc: 69.67213439941406 | Test Acc: 54.90196228027344 Cost after iteration 50: 0.6139214038848877 | Train Acc: 70.90164184570312 | Test Acc: 56.20914840698242 Cost after iteration 60: 0.6045222878456116 | Train Acc: 72.54098510742188 | Test Acc: 56.86274337768555 Cost after iteration 70: 0.596049964427948 | Train Acc: 74.18032836914062 | Test Acc: 57.51633834838867 Cost after iteration 80: 0.58830726146698 | Train Acc: 73.77049255371094 | Test Acc: 57.51633834838867 Cost after iteration 90: 0.581154465675354 | Train Acc: 74.59016418457031 | Test Acc: 58.1699333190918
In [69]:
## the trend in the context of loss
plt.plot(costs)
plt.show()
In [70]:costs
Out[70]:[tensor(0.6931), tensor(0.6691), tensor(0.6513), tensor(0.6368), tensor(0.6245), tensor(0.6139), tensor(0.6045), tensor(0.5960), tensor(0.5883), tensor(0.5812)]