[Pytorch]3. AutoGrad & Optimizer (2022. 1. 28.)
In [2]:
from IPython.display import Image
import numpy as np
import torch
from torch import nn
from torch import Tensor
nn.Parameter
- A subclass of the Tensor object
- A Tensor that becomes a learning target when it is assigned as an attribute inside an nn.Module; it is created with requires_grad=True, so it becomes a target of AutoGrad
- You rarely need to declare one yourself (the built-in layers already register their parameters); a minimal check is sketched right below
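A minimal sketch (variable names are illustrative, not from the original notebook) confirming that nn.Parameter is tracked for gradients while a plain Tensor is not:

p = nn.Parameter(torch.randn(3, 2))   # requires_grad=True by default
t = torch.randn(3, 2)                 # a plain Tensor
print(p.requires_grad)   # True  -> becomes a learning target when assigned to an nn.Module
print(t.requires_grad)   # False -> a plain Tensor attribute is not registered as a parameter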
In [15]:
class MyLiner(nn.Module):
    def __init__(self, in_features, out_features, bias=True):
        super(MyLiner, self).__init__()
        self.in_features = in_features
        self.out_features = out_features            # number of output features
        self.weights = nn.Parameter(                # randomly initialized weights
            torch.randn(in_features, out_features))
        self.bias = nn.Parameter(torch.randn(out_features))

    def forward(self, x: Tensor):
        return x @ self.weights + self.bias         # xW + b
In [16]:
x = torch.randn(5, 7)   # 5 samples, 7 features
x
Out[16]:tensor([[ 2.2315, 0.0956, 1.8159, 0.1136, -1.5745, -0.5874, -0.7236], [ 1.9480, -0.3755, 1.7415, -0.2828, -1.3193, 0.9586, 0.0352], [ 0.4110, 1.4071, 0.5583, 0.1973, 0.1787, 1.5075, 0.2040], [ 0.1399, 0.8770, -0.8301, 0.8772, 0.0065, -1.6743, 1.0261], [ 1.0581, -1.7018, 1.6004, -0.9947, 1.7289, -0.7401, 0.9518]])
In [17]:
layer = MyLiner(7, 12)   # project 7 features up to 12
layer(x)
Out[17]:tensor([[-0.2589, -0.1171, 7.2235, -3.2484, 1.8778, 2.3318, -2.2566, 0.2182, -4.4338, -1.6875, -3.4916, 0.8335], [-3.2494, 2.0115, 6.5132, -0.4704, 2.4123, 3.3352, -2.7785, 2.1241, -5.8575, -1.3325, -0.7621, 2.7125], [-0.5090, 2.4947, 1.9423, -1.8518, 0.4672, 4.6600, -2.4372, 2.9846, -1.3338, 0.7118, -1.5926, 3.2961], [-0.2319, -0.7517, -2.8503, -2.9107, -0.7730, 1.3788, 0.2797, -0.9661, -0.0976, 2.8919, -0.1761, -3.8557], [-2.0619, -1.0440, 3.3978, 5.3282, 2.0542, 1.6618, -1.9756, 6.2608, -1.6469, -3.9365, 6.7581, -7.1280]], grad_fn=<AddBackward0>)
In [18]:layer(x).shape
Out[18]:torch.Size([5, 12])
In [19]:
for value in layer.parameters():
    print(value)   # the values that receive gradients when backward() runs
Parameter containing: tensor([[-1.5480, 0.1631, 1.0952, -0.7923, -0.5274, 0.6655, -1.1778, -0.9224, -1.0598, -1.5020, 0.0603, 0.7770], [ 0.5643, -0.0360, -0.0771, -2.3711, -0.6963, 0.3136, -0.9718, -0.1148, -0.0801, 0.7217, -1.8141, 0.4873], [ 0.4632, -0.5194, 1.4989, 0.4411, 0.6651, 0.2560, -0.3979, 2.7196, -0.2847, -0.8004, -0.4595, -1.3037], [ 0.1466, -0.0753, -0.8628, 0.3369, -0.2586, 1.1739, 1.0376, 0.8156, 1.4424, -0.6178, -0.5628, -0.0255], [ 0.4825, -0.3550, -0.3252, 1.0316, -0.4442, 0.6301, -0.3017, 1.3094, 1.6264, -1.4975, 1.2606, -1.1069], [-0.8737, 1.3001, 0.2661, 0.7043, 0.1763, 1.0441, -0.3086, 0.5652, -0.3237, -0.1710, 0.2037, 2.3431], [-2.4293, 0.2242, -1.3115, 0.3302, -0.2049, 0.0649, -0.4978, 1.1964, -1.4836, 0.8056, 1.3153, -1.4734]], requires_grad=True) Parameter containing: tensor([ 0.7723, 0.8410, 0.8585, 0.1840, 1.1988, 1.8709, 0.0525, 0.5159, -0.4110, 1.2433, 0.5020, -0.0096], requires_grad=True)
In [23]:dict(layer.named_parameters())
Out[23]:{'weights': Parameter containing: tensor([[-1.5480, 0.1631, 1.0952, -0.7923, -0.5274, 0.6655, -1.1778, -0.9224, -1.0598, -1.5020, 0.0603, 0.7770], [ 0.5643, -0.0360, -0.0771, -2.3711, -0.6963, 0.3136, -0.9718, -0.1148, -0.0801, 0.7217, -1.8141, 0.4873], [ 0.4632, -0.5194, 1.4989, 0.4411, 0.6651, 0.2560, -0.3979, 2.7196, -0.2847, -0.8004, -0.4595, -1.3037], [ 0.1466, -0.0753, -0.8628, 0.3369, -0.2586, 1.1739, 1.0376, 0.8156, 1.4424, -0.6178, -0.5628, -0.0255], [ 0.4825, -0.3550, -0.3252, 1.0316, -0.4442, 0.6301, -0.3017, 1.3094, 1.6264, -1.4975, 1.2606, -1.1069], [-0.8737, 1.3001, 0.2661, 0.7043, 0.1763, 1.0441, -0.3086, 0.5652, -0.3237, -0.1710, 0.2037, 2.3431], [-2.4293, 0.2242, -1.3115, 0.3302, -0.2049, 0.0649, -0.4978, 1.1964, -1.4836, 0.8056, 1.3153, -1.4734]], requires_grad=True), 'bias': Parameter containing: tensor([ 0.7723, 0.8410, 0.8585, 0.1840, 1.1988, 1.8709, 0.0525, 0.5159, -0.4110, 1.2433, 0.5020, -0.0096], requires_grad=True)}
In [13]:
class MyLiner(nn.Module):   # what if the parameters are declared as plain Tensors?
    def __init__(self, in_features, out_features, bias=True):
        super(MyLiner, self).__init__()
        self.in_features = in_features
        self.out_features = out_features            # number of output features
        self.weights = torch.Tensor(                # randomly initialized weights
            torch.randn(in_features, out_features))
        self.bias = torch.Tensor(torch.randn(out_features))

    def forward(self, x: Tensor):
        return x @ self.weights + self.bias         # xW + b

layer = MyLiner(7, 12)   # project 7 features up to 12
layer(x)
Out[13]:tensor([[-2.3932, 0.2151, 0.3737, 0.4169, 1.2809, -0.6099, -1.9564, -4.4724, -4.7116, -0.1150, -1.4709, -1.6101], [-0.9545, 1.4772, 1.4300, 2.5543, -1.3388, -2.6983, -5.5924, -3.0663, -0.0821, -2.6961, -1.0033, -1.8998], [ 0.2819, -0.3657, 1.7904, 1.5366, -0.0506, -1.0388, -2.8368, -1.4347, -1.2372, -1.2309, -1.6568, -0.4108], [-3.2857, -1.4354, 3.0789, -0.5547, 6.1609, 1.4157, 1.3279, -2.7192, -8.8037, -2.2403, -1.7092, 1.2031], [-0.8241, -2.0260, 1.6704, 2.7957, -2.0761, 1.2131, -0.5549, -0.9469, -2.3634, -1.5369, 1.0155, 0.9032]])
In [14]:
for value in layer.parameters():   # prints nothing: plain Tensors are not registered as parameters and are not differentiated
    print(value)
Backward: performing backpropagation
- Computes the gradients of the Parameters in each layer
The loss (the difference between the forward output, i.e. the model's prediction $\hat{y}$, and the true value ${y}$) is differentiated, and the resulting gradients are used to update the Parameters
- AutoGrad actually runs when the backward function is called!
loss.backward() computes $\frac{\partial L}{\partial W}$ (a minimal sketch follows)
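A minimal sketch (toy tensors, not part of the original notebook): after loss.backward(), every tensor created with requires_grad=True holds its gradient in .grad.

w = torch.randn(3, requires_grad=True)   # the "parameter" to be learned
x = torch.randn(3)                       # a fixed input
y_hat = (w * x).sum()                    # forward pass
loss = (y_hat - 1.0) ** 2                # squared error against a target of 1
loss.backward()                          # AutoGrad fills in dL/dw
print(w.grad)                            # equals 2 * (y_hat - 1) * x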
In [ ]:
for epoch in range(epochs):
    # Clear gradient buffers so gradients from the previous epoch do not carry forward
    optimizer.zero_grad()   # reset so earlier updates do not affect the current step

    # run the model to get its outputs
    outputs = model(inputs)

    # compute the loss from the difference between outputs and labels
    loss = criterion(outputs, labels)
    print(loss)

    # compute the gradient of the loss w.r.t. every weight
    loss.backward()

    # update parameters
    optimizer.step()
AutoGrad for Linear Regression
Source: https://towardsdatascience.com/linear-regression-with-pytorch-eb6dedead817
\begin{align}y = 2x + 1\end{align}

In [25]:
import numpy as np

# create dummy data for training
x_values = [i for i in range(11)]
x_train = np.array(x_values, dtype=np.float32)
x_train = x_train.reshape(-1, 1)

y_values = [2*i + 1 for i in x_values]
y_train = np.array(y_values, dtype=np.float32)
y_train = y_train.reshape(-1, 1)
In [26]:x_train
Out[26]:array([[ 0.], [ 1.], [ 2.], [ 3.], [ 4.], [ 5.], [ 6.], [ 7.], [ 8.], [ 9.], [10.]], dtype=float32)
In [28]:
import torch
from torch.autograd import Variable

class LinearRegression(torch.nn.Module):
    def __init__(self, inputSize, outputSize):
        super(LinearRegression, self).__init__()
        self.linear = torch.nn.Linear(inputSize, outputSize)

    def forward(self, x):
        out = self.linear(x)
        return out
In [33]:
inputDim = 1        # takes variable 'x'
outputDim = 1       # takes variable 'y'
learningRate = 0.01
epochs = 100

model = LinearRegression(inputDim, outputDim)   # create the model
##### For GPU #######
if torch.cuda.is_available():
    model.cuda()

criterion = torch.nn.MSELoss()   # MSE Loss
# pass the parameters to be optimized: model.parameters()
optimizer = torch.optim.SGD(model.parameters(), lr=learningRate)
In [ ]:
for epoch in range(epochs):
    # Converting inputs and labels to Variable
    if torch.cuda.is_available():
        # torch.autograd.Variable: lets autograd compute gradients automatically
        inputs = Variable(torch.from_numpy(x_train).cuda())
        labels = Variable(torch.from_numpy(y_train).cuda())
    else:
        inputs = Variable(torch.from_numpy(x_train))
        labels = Variable(torch.from_numpy(y_train))

    # Clear gradient buffers: we don't want gradients from the previous epoch
    # to carry forward or accumulate
    optimizer.zero_grad()

    # get output from the model, given the inputs
    outputs = model(inputs)

    # get loss for the predicted output
    loss = criterion(outputs, labels)
    print(loss)

    # get gradients w.r.t. the parameters
    loss.backward()

    # update parameters
    optimizer.step()

    print('epoch {}, loss {}'.format(epoch, loss.item()))
In [35]:
with torch.no_grad():   # we don't need gradients in the testing phase
    if torch.cuda.is_available():
        predicted = model(Variable(torch.from_numpy(x_train).cuda())).cpu().data.numpy()
    else:
        predicted = model(Variable(torch.from_numpy(x_train))).data.numpy()
    print(predicted)
[[ 0.4443256] [ 2.5243478] [ 4.6043696] [ 6.684392 ] [ 8.764414 ] [10.844436 ] [12.9244585] [15.00448 ] [17.084503 ] [19.164524 ] [21.244547 ]]
In [36]:
for p in model.parameters():
    if p.requires_grad:
        print(p.name, p.data)
None tensor([[2.0800]], device='cuda:0') None tensor([0.4443], device='cuda:0')
Backward from scratch
- The actual backward pass can be specified directly at the Module level, but you normally don't need to (AutoGrad handles it)
- This means overriding backward and the optimizer step inside the Module
- The burden is that you have to write the derivative formulas yourself (you will rarely do this, but you should still understand the order of operations); one common mechanism for this is sketched below
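One common way to supply a manual backward is torch.autograd.Function, where forward and backward are both written by hand. The sketch below is illustrative only (MySquare is a made-up example, not part of the original notebook):

import torch

class MySquare(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)      # stash x for the backward pass
        return x ** 2

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        return grad_output * 2 * x    # chain rule: dL/dx = dL/dy * 2x

x = torch.randn(4, requires_grad=True)
y = MySquare.apply(x).sum()
y.backward()
print(x.grad)   # same values as 2 * x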
Logistic Regression
Writing logistic regression by hand, without AutoGrad (the gradient formulas it relies on are given below)
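For reference, the hand-written backward implemented further down uses the standard binary cross-entropy gradients for logistic regression with $m$ samples, $\hat{y} = \sigma(w^{T}x + b)$, and $X$ the (features $\times$ $m$) input matrix:

\begin{align}\frac{\partial L}{\partial w} = \frac{1}{m}\, X(\hat{y} - y)^{T}, \qquad \frac{\partial L}{\partial b} = \frac{1}{m}\sum_{i=1}^{m}(\hat{y}_{i} - y_{i})\end{align}

These are exactly the expressions stored in grads["dw"] and grads["db"] in the LR class.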
In [38]:
import torch
import torchvision
import torch.nn as nn
from torchvision import datasets, models, transforms
import os
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

## print out the pytorch version used (1.3.1 at the time of the original tutorial)
print(torch.__version__)
1.5.0
In [39]:
## configuration to detect cuda or cpu
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(device)
cuda:0
In [40]:
DATA_PATH = "https://download.pytorch.org/tutorial/hymenoptera_data.zip"

import urllib.request
import os
import shutil
from zipfile import ZipFile

urllib.request.urlretrieve(DATA_PATH, "hymenoptera_data.zip")

with ZipFile("hymenoptera_data.zip", 'r') as zipObj:
    # Extract all the contents of the zip file into the current directory
    zipObj.extractall()
os.rename("hymenoptera_data", "data")
In [41]:
# configure root folder on your local machine
data_dir = "./data"

# image transform: reshape each image tensor
class ReshapeTransform:
    def __init__(self, new_size):
        self.new_size = new_size

    def __call__(self, img):
        result = torch.reshape(img, self.new_size)
        return result

## transformations used to standardize and normalize the datasets
data_transforms = {
    'train': transforms.Compose([
        transforms.Resize(224),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        ReshapeTransform((-1,))   # flattens the data
    ]),
    'val': transforms.Compose([
        transforms.Resize(224),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        ReshapeTransform((-1,))   # flattens the data
    ]),
}

## load the corresponding folders
image_datasets = {x: datasets.ImageFolder(os.path.join(data_dir, x), data_transforms[x])
                  for x in ['train', 'val']}

## load the entire dataset; we are not using minibatches here
train_dataset = torch.utils.data.DataLoader(image_datasets['train'],
                                            batch_size=len(image_datasets['train']),
                                            shuffle=True)
test_dataset = torch.utils.data.DataLoader(image_datasets['val'],
                                           batch_size=len(image_datasets['val']),
                                           shuffle=True)
In [42]:
## load the entire dataset from the generator
x, y = next(iter(train_dataset))

## print one example
dim = x.shape[1]
print("Dimension of image:", x.shape, "\n",
      "Dimension of labels", y.shape)

plt.imshow(x[160].reshape(1, 3, 224, 224).squeeze().T.numpy())
Dimension of image: torch.Size([244, 150528]) Dimension of labels torch.Size([244])
Out[42]:<matplotlib.image.AxesImage at 0x1823b77cfd0>
In [65]:
class LR(nn.Module):
    def __init__(self, dim, lr=torch.scalar_tensor(0.01)):
        super(LR, self).__init__()
        # initialize parameters by hand; since AutoGrad is not used, nn.Parameter is not needed
        self.w = torch.zeros(dim, 1, dtype=torch.float).to(device)
        self.b = torch.scalar_tensor(0).to(device)
        self.grads = {"dw": torch.zeros(dim, 1, dtype=torch.float).to(device),
                      "db": torch.scalar_tensor(0).to(device)}
        self.lr = lr.to(device)

    def forward(self, x):
        z = torch.mm(self.w.T, x) + self.b
        a = self.sigmoid(z)   # activation
        return a              # yhat

    def sigmoid(self, z):
        return 1/(1 + torch.exp(-z))

    def backward(self, x, yhat, y):
        self.grads["dw"] = (1/x.shape[1]) * torch.mm(x, (yhat - y).T)
        self.grads["db"] = (1/x.shape[1]) * torch.sum(yhat - y)

    def optimize(self):
        self.w = self.w - self.lr * self.grads["dw"]
        self.b = self.b - self.lr * self.grads["db"]

# utility functions
def loss(yhat, y):
    m = y.size()[1]
    return -(1/m) * torch.sum(y*torch.log(yhat) + (1 - y)*torch.log(1 - yhat))

def predict(yhat, y):
    y_prediction = torch.zeros(1, y.size()[1])
    for i in range(yhat.size()[1]):
        if yhat[0, i] <= 0.5:
            y_prediction[0, i] = 0
        else:
            y_prediction[0, i] = 1
    return 100 - torch.mean(torch.abs(y_prediction - y)) * 100
In [67]:
## model pretesting
x, y = next(iter(train_dataset))

## flatten/transform the data
x_flatten = x.T
y = y.unsqueeze(0)

## num_px is the dimension of the images
dim = x_flatten.shape[0]

## model instance
model = LR(dim)
model.to(device)
yhat = model.forward(x_flatten.to(device))
yhat = yhat.data.cpu()

## calculate loss
cost = loss(yhat, y)
prediction = predict(yhat, y)
print("Cost: ", cost)
print("Accuracy: ", prediction)

## backpropagate; a single step here is only a sanity check, since it is not repeated across iterations
model.backward(x_flatten.to(device), yhat.to(device), y.to(device))
model.optimize()
Cost: tensor(0.6931) Accuracy: tensor(50.4098)
In [68]:
## hyperparams
costs = []
dim = x_flatten.shape[0]
learning_rate = torch.scalar_tensor(0.0001).to(device)
num_iterations = 100
lrmodel = LR(dim, learning_rate)
lrmodel.to(device)

## transform the data
def transform_data(x, y):
    x_flatten = x.T
    y = y.unsqueeze(0)
    return x_flatten, y

## training the model
for i in range(num_iterations):
    x, y = next(iter(train_dataset))
    test_x, test_y = next(iter(test_dataset))
    x, y = transform_data(x, y)
    test_x, test_y = transform_data(test_x, test_y)

    # forward
    yhat = lrmodel.forward(x.to(device))
    cost = loss(yhat.data.cpu(), y)
    train_pred = predict(yhat, y)

    # backward
    lrmodel.backward(x.to(device), yhat.to(device), y.to(device))
    lrmodel.optimize()

    ## test
    yhat_test = lrmodel.forward(test_x.to(device))
    test_pred = predict(yhat_test, test_y)

    if i % 10 == 0:
        costs.append(cost)
        print("Cost after iteration {}: {} | Train Acc: {} | Test Acc: {}".format(
            i, cost, train_pred, test_pred))
Cost after iteration 0: 0.6931470036506653 | Train Acc: 50.40983581542969 | Test Acc: 45.75163269042969 Cost after iteration 10: 0.669146716594696 | Train Acc: 64.3442611694336 | Test Acc: 54.24836730957031 Cost after iteration 20: 0.6513173580169678 | Train Acc: 68.44261932373047 | Test Acc: 54.24836730957031 Cost after iteration 30: 0.6367812156677246 | Train Acc: 68.03278350830078 | Test Acc: 54.24836730957031 Cost after iteration 40: 0.6245325207710266 | Train Acc: 69.67213439941406 | Test Acc: 54.90196228027344 Cost after iteration 50: 0.6139214038848877 | Train Acc: 70.90164184570312 | Test Acc: 56.20914840698242 Cost after iteration 60: 0.6045222878456116 | Train Acc: 72.54098510742188 | Test Acc: 56.86274337768555 Cost after iteration 70: 0.596049964427948 | Train Acc: 74.18032836914062 | Test Acc: 57.51633834838867 Cost after iteration 80: 0.58830726146698 | Train Acc: 73.77049255371094 | Test Acc: 57.51633834838867 Cost after iteration 90: 0.581154465675354 | Train Acc: 74.59016418457031 | Test Acc: 58.1699333190918
In [69]:
## the trend in the context of loss
plt.plot(costs)
plt.show()
In [70]:costs
Out[70]:[tensor(0.6931), tensor(0.6691), tensor(0.6513), tensor(0.6368), tensor(0.6245), tensor(0.6139), tensor(0.6045), tensor(0.5960), tensor(0.5883), tensor(0.5812)]