About RNNs

August 8, 2018

I needed to share some material on RNNs, so I wrote this post about them.

RNN?

An RNN is a neural network that learns from sequential data. Natural language processing, typified by Google Translate, is one example.

The input to an RNN is sequential data x, like the data below. A sentence is one example of x.

The output we want, o, can also be a sequence, just like the input, as shown below.

The s in the middle is the hidden state, the state value computed at each step along the way. It can be written as $s_t = f(U x_t + W s_{t-1})$. As the figure below shows, an output o can be produced at every state.

W, U, and V are parameters, and they are shared across every state (time step).
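
To make the recurrence concrete, here is a minimal pure-Python sketch of it. It assumes f is tanh (which is what the TensorFlow code later in this post uses) and that the output is the linear map o_t = V s_t; the output formula, the toy weights, and the helper names `matvec` and `rnn_forward` are my own assumptions for illustration.

```python
import math

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def rnn_forward(xs, U, W, V):
    """Run s_t = tanh(U x_t + W s_{t-1}) and o_t = V s_t over a sequence."""
    state = [0.0] * len(W)          # all-zero initial hidden state (assumption)
    states, outputs = [], []
    for x in xs:                    # the same U, W, V are reused at every step
        pre = [a + b for a, b in zip(matvec(U, x), matvec(W, state))]
        state = [math.tanh(p) for p in pre]
        states.append(state)
        outputs.append(matvec(V, state))
    return states, outputs

# Toy sizes (hypothetical): 2-dimensional inputs, 3 hidden units, 2 outputs.
U = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.2]]
W = [[0.1, 0.0, 0.2], [0.0, 0.1, 0.0], [0.3, 0.0, 0.1]]
V = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]
xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]

states, outputs = rnn_forward(xs, U, W, V)
print(len(states), len(outputs))   # one state and one output per input step
```

Only three parameter matrices exist no matter how long the sequence is, which is what "shared" means here.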

RNN (https://aikorea.org/blog/rnn-tutorial-1/)

An RNN Core Written in TensorFlow

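I'm going to walk through the RNN example from O'Reilly's Learning TensorFlow. The code uses the TensorFlow 1.x API.
The figure below is the graph that TensorBoard draws for the TensorFlow code.
It reads less like a flow specific to RNNs and more like the flow of deep learning in general.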

The RNN training log as visualized in TensorBoard

Input, Transpose

The code below loads the MNIST data used in the RNN example and transposes the input.

	from __future__ import print_function
	import tensorflow as tf

	from tensorflow.examples.tutorials.mnist import input_data
	mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)

	element_size = 28
	time_steps = 28
	num_classes = 10
	batch_size = 128
	hidden_layer_size = 128

	LOG_DIR = "logs/RNN_with_summaries"

	_inputs = tf.placeholder(tf.float32,
	                         shape=[None, time_steps, element_size],
	                         name='inputs')
	y = tf.placeholder(tf.float32, shape=[None, num_classes], name='labels')


	# Processing inputs to work with scan function
	# Current input shape: (batch_size, time_steps, element_size)
	processed_input = tf.transpose(_inputs, perm=[1, 0, 2])
	# Current input shape now: (time_steps,batch_size, element_size)

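To see what `perm=[1, 0, 2]` does to the data, here is a small pure-Python sketch of the same axis swap on nested lists. The helper name `to_time_major` and the toy batch are made up for illustration; the real code works on tensors.

```python
def to_time_major(batch):
    """Reorder [batch][time][element] nested lists into [time][batch][element]."""
    time_steps = len(batch[0])
    return [[example[t] for example in batch] for t in range(time_steps)]

# Toy batch (hypothetical): 2 examples, 3 time steps, 2 elements per step.
batch = [
    [[1, 2], [3, 4], [5, 6]],        # example 0
    [[10, 20], [30, 40], [50, 60]],  # example 1
]

time_major = to_time_major(batch)
print(time_major[0])   # → [[1, 2], [10, 20]]
```

After the swap, indexing the first axis gives one time step for the whole batch, which is the slice a step function wants to consume on each iteration.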

RNN States

The code below computes the hidden states.

	# Weights and bias for input and hidden layer
	# NOTE: variable_summaries is a TensorBoard summary helper that is not
	# shown in this post; it has to be defined before this block runs.
	with tf.name_scope('rnn_weights'):
	    with tf.name_scope("W_x"):
	        Wx = tf.Variable(tf.zeros([element_size, hidden_layer_size]))
	        variable_summaries(Wx)
	    with tf.name_scope("W_h"):
	        Wh = tf.Variable(tf.zeros([hidden_layer_size, hidden_layer_size]))
	        variable_summaries(Wh)
	    with tf.name_scope("Bias"):
	        b_rnn = tf.Variable(tf.zeros([hidden_layer_size]))
	        variable_summaries(b_rnn)

	def rnn_step(previous_hidden_state, x):
	    # One time step: h_t = tanh(h_{t-1} Wh + x_t Wx + b)
	    current_hidden_state = tf.tanh(
	        tf.matmul(previous_hidden_state, Wh) +
	        tf.matmul(x, Wx) + b_rnn)

	    return current_hidden_state

	initial_hidden = tf.zeros([batch_size, hidden_layer_size])
	# Getting all state vectors across time
	all_hidden_states = tf.scan(rnn_step,
	                            processed_input,
	                            initializer=initial_hidden,
	                            name='states')

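The way `tf.scan` threads the previous state into the next call can be mimicked in plain Python with `itertools.accumulate`. This is only a sketch of the idea, not how TensorFlow implements it; `scan` and `toy_step` are hypothetical names, and the `initial` keyword needs Python 3.8 or later.

```python
from itertools import accumulate

def scan(step, sequence, initializer):
    """Return every intermediate state of folding `step` over `sequence`.

    As with tf.scan when an initializer is given, the initial state itself
    is not part of the result (accumulate yields it first, so it is dropped).
    """
    return list(accumulate(sequence, step, initial=initializer))[1:]

def toy_step(previous_state, x):
    # Stand-in for rnn_step: a scalar recurrence instead of matrix products.
    return 0.5 * previous_state + x

all_states = scan(toy_step, [1.0, 2.0, 3.0], initializer=0.0)
print(all_states)   # → [1.0, 2.5, 4.25]
```

Each returned element plays the role of one hidden state, so the list has as many entries as the sequence has time steps.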

RNN Output

The code below produces the RNN output o. A linear layer is used for the output.

	# Weights for output layers
	with tf.name_scope('linear_layer_weights') as scope:
	    with tf.name_scope("W_linear"):
	        Wl = tf.Variable(tf.truncated_normal([hidden_layer_size, num_classes],
	                                             mean=0, stddev=.01))
	        variable_summaries(Wl)
	    with tf.name_scope("Bias_linear"):
	        bl = tf.Variable(tf.truncated_normal([num_classes],
	                                             mean=0, stddev=.01))
	        variable_summaries(bl)


	# Apply linear layer to state vector
	def get_linear_layer(hidden_state):
	    return tf.matmul(hidden_state, Wl) + bl


	with tf.name_scope('outputs') as scope:
	    # Iterate across time, apply linear layer to all RNN outputs
	    all_outputs = tf.map_fn(get_linear_layer, all_hidden_states)
	    # Get Last output -- h_28
	    output = all_outputs[-1]
	    tf.summary.histogram('outputs', output)


RNN Train, Test

The code below runs training and testing with the tensors built above.

	# NOTE: train_step, accuracy and cross_entropy are not shown in this post;
	# they have to be defined before this block runs.
	# Merge all the summaries
	merged = tf.summary.merge_all()

	# Get a small test set
	test_data = mnist.test.images[:batch_size].reshape((-1, time_steps, element_size))
	test_label = mnist.test.labels[:batch_size]

	with tf.Session() as sess:
	    # Write summaries to LOG_DIR -- used by TensorBoard
	    train_writer = tf.summary.FileWriter(LOG_DIR + '/train',
	                                         graph=tf.get_default_graph())
	    test_writer = tf.summary.FileWriter(LOG_DIR + '/test',
	                                        graph=tf.get_default_graph())

	    sess.run(tf.global_variables_initializer())

	    for i in range(10000):

	        batch_x, batch_y = mnist.train.next_batch(batch_size)
	        # Reshape data to get 28 sequences of 28 pixels
	        batch_x = batch_x.reshape((batch_size, time_steps, element_size))
	        summary, _ = sess.run([merged, train_step],
	                              feed_dict={_inputs: batch_x, y: batch_y})
	        # Add to summaries
	        train_writer.add_summary(summary, i)

	        if i % 1000 == 0:
	            acc, loss, = sess.run([accuracy, cross_entropy],
	                                  feed_dict={_inputs: batch_x, y: batch_y})
	            print("Iter " + str(i) + ", Minibatch Loss= " +
	                  "{:.6f}".format(loss) + ", Training Accuracy= " +
	                  "{:.5f}".format(acc))
	        if i % 100 == 0:
	            # Calculate accuracy for 128 mnist test images and
	            # add to summaries
	            summary, acc = sess.run([merged, accuracy],
	                                    feed_dict={_inputs: test_data, y: test_label})
	            test_writer.add_summary(summary, i)

	    test_acc = sess.run(accuracy, feed_dict={_inputs: test_data, y: test_label})
	    print("Test Accuracy:", test_acc)


Results

As the log below shows, accuracy on the 128-image test subset comes out at about 97 percent, which is not bad.
Iter 0, Minibatch Loss= 2.302837, Training Accuracy= 7.03125
Iter 1000, Minibatch Loss= 1.187517, Training Accuracy= 56.25000
Iter 2000, Minibatch Loss= 0.601766, Training Accuracy= 82.81250
Iter 3000, Minibatch Loss= 0.380846, Training Accuracy= 85.93750
Iter 4000, Minibatch Loss= 0.118226, Training Accuracy= 97.65625
Iter 5000, Minibatch Loss= 0.102771, Training Accuracy= 96.87500
Iter 6000, Minibatch Loss= 0.083136, Training Accuracy= 97.65625
Iter 7000, Minibatch Loss= 0.153865, Training Accuracy= 95.31250
Iter 8000, Minibatch Loss= 0.074156, Training Accuracy= 98.43750
Iter 9000, Minibatch Loss= 0.049012, Training Accuracy= 98.43750
Test Accuracy: 97.65625
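
The numbers in this log can be sanity-checked with a little arithmetic. The sketch below assumes the loss is a softmax cross-entropy and that accuracy is the percentage of correct predictions in a 128-example batch; neither definition appears in this post, so treat both as assumptions. The function name `softmax_cross_entropy` is my own.

```python
import math

def softmax_cross_entropy(logits, label_index):
    """Cross-entropy of softmax(logits) against a single true class index."""
    shifted = [z - max(logits) for z in logits]          # for numerical stability
    log_sum = math.log(sum(math.exp(z) for z in shifted))
    return log_sum - shifted[label_index]

# With all-equal logits every one of the 10 classes gets probability 0.1,
# so the loss is -ln(0.1) = ln(10), whatever the true label is.
untrained_loss = softmax_cross_entropy([0.0] * 10, label_index=3)
print(round(untrained_loss, 6))   # → 2.302585

# Accuracy on a 128-example batch can only move in steps of 100/128.
print(100 * 9 / 128)     # → 7.03125
print(100 * 125 / 128)   # → 97.65625
```

Under those assumptions, the loss of 2.302837 at iteration 0 is what a network that is still guessing almost uniformly would give, and the final test accuracy corresponds to 125 of the 128 test images being classified correctly.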

TensorFlow Built-in Functions

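The code below uses the RNN core that is built into TensorFlow. It takes the place of the hand-written hidden-state code above (the weights, rnn_step, and tf.scan). tf.nn.dynamic_rnn accepts batch-major input by default, so _inputs is passed in without the transpose, and the returned outputs are the hidden states at every time step, so a linear output layer still has to be applied on top. Related code: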

	rnn_cell = tf.contrib.rnn.BasicRNNCell(hidden_layer_size)
	outputs, _ = tf.nn.dynamic_rnn(rnn_cell, _inputs, dtype=tf.float32)


Problems with RNNs, LSTM

Looking at how the RNN hidden state above is computed, it is a multiplication: Hidden State × Wh.

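Because this multiplication is repeated at every step, the influence of earlier inputs can keep shrinking when Wh is (loosely speaking) smaller than 1, and can grow without bound when it is larger than 1, even though the data is fed in sequentially. LSTM (Long Short-Term Memory) exists to address this.
I haven't studied the structure of LSTM in detail, so I'll leave that for another time.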
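
The shrinking and growing effect described above can be seen with nothing more than repeated multiplication. In this sketch a single scalar `w` stands in for the matrix Wh and the tanh nonlinearity is ignored, so it is a deliberate simplification; `repeated_scale` is a hypothetical helper name.

```python
def repeated_scale(w, steps):
    """Multiply a starting value of 1.0 by the same weight w at every step."""
    value = 1.0
    for _ in range(steps):
        value *= w
    return value

time_steps = 28   # same sequence length as the MNIST example
for w in (0.5, 1.0, 1.5):
    print(w, repeated_scale(w, time_steps))
```

With w = 0.5 the value collapses toward zero after 28 steps, with w = 1.5 it grows into the tens of thousands, and only w = 1.0 leaves it unchanged.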

LSTM

Bidirectional RNN

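Reading a sentence in order matters, but its interpretation can also change depending on what comes before and after. A fill-in-the-blank problem is a good example. For cases like that there is the bidirectional RNN, which uses both the earlier and the later data. It builds an RNN in each direction, so that every state is influenced by two RNNs.
Bidirectional RNNs often show up among the high-ranking solutions on Kaggle.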
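
Here is a minimal sketch of the "two RNNs per state" idea, using a scalar RNN so that it stays short. The function names and weights are hypothetical, and for brevity both directions share the same weights, whereas a real bidirectional RNN would normally give each direction its own parameters.

```python
import math

def run_rnn(xs, w_in, w_rec):
    """Scalar RNN: s_t = tanh(w_in * x_t + w_rec * s_{t-1}), starting from 0."""
    state, states = 0.0, []
    for x in xs:
        state = math.tanh(w_in * x + w_rec * state)
        states.append(state)
    return states

def bidirectional_states(xs, w_in=1.0, w_rec=0.5):
    """Pair each time step's forward state with its backward state."""
    forward = run_rnn(xs, w_in, w_rec)
    # Run on the reversed sequence, then flip back so index t lines up again.
    backward = run_rnn(xs[::-1], w_in, w_rec)[::-1]
    return list(zip(forward, backward))

sequence = [0.0, 0.0, 1.0]
for t, (fwd, bwd) in enumerate(bidirectional_states(sequence)):
    print(t, round(fwd, 4), round(bwd, 4))
```

At t = 0 the forward state is still zero because it has not seen the final input yet, while the backward state is already non-zero. That is the extra context a fill-in-the-blank style task can use.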

Bidirectional RNN

김띵준's Programming Story