Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference

Miss_Baker 2019. 4. 18. 03:00


2. Quantized Inference

2.1 Quantization scheme

q : quantized value
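- In 8-bit quantization:
  - q is quantized as an 8-bit integer
  - Some arrays, typically bias vectors, are quantized as 32-bit integers
    - 32 bits are used because each bias value is added to many output activations, so an error in the bias carries a heavier penalty
r : real value
S : scale, usually represented as a floating-point number; Section 2.2 describes how to avoid this.
Z : zero-point, the quantized value that corresponds to the real value 0
- Needed for efficient implementation of operations such as zero-padding.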
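To make the roles of q, r, S and Z concrete, here is a minimal sketch assuming the affine mapping r = S(q - Z) from the paper. The function names and the example values of S and Z are illustrative assumptions, not taken from any library.

```python
def quantize(r, scale, zero_point, qmin=0, qmax=255):
    """Map a real value r to an integer q so that r ≈ scale * (q - zero_point)."""
    # Python's round() rounds halves to even; ties are rare enough to ignore here.
    q = round(r / scale) + zero_point
    return max(qmin, min(qmax, q))  # saturate to the uint8 range


def dequantize(q, scale, zero_point):
    """Recover the real value represented by q: r = S * (q - Z)."""
    return scale * (q - zero_point)


# Hypothetical parameters for a tensor whose values lie in roughly [-1, 1].
S, Z = 2.0 / 255, 128

q_zero = quantize(0.0, S, Z)                     # real 0 maps exactly to Z
r_back = dequantize(quantize(0.5, S, Z), S, Z)   # close to 0.5, within S / 2
```

Because real 0 maps exactly to Z, a zero-padded region can be filled with the integer Z without adding any quantization error.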

2.2 Integer-arithmetic-only matrix multiplication

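The only non-integer quantity is the multiplier M = S1 S2 / S3, which is written as M = 2^-n M0, where M0 is in [0.5, 1) and n is a non-negative integer (0, 1, 2, ...).
- Half-open interval: [ includes the endpoint, ) does not include it
The normalized multiplier M0 is well suited to being expressed as a fixed-point multiplier, and 2^-n can be implemented with a bit-shift.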
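The decomposition into M0 and n can be sketched as below. This is a simplified illustration, not the gemmlowp implementation: the function names are made up for this note, M0 is assumed to be stored as an int32 with 31 fractional bits, and the rounding is plain round-half-up, which may differ from what a production kernel does.

```python
import math


def decompose_multiplier(m):
    """Split a real multiplier 0 < m < 1 into (m0_fixed, n) with m ≈ 2**-n * m0.

    m0 lies in [0.5, 1) and is stored as a fixed-point integer
    with 31 fractional bits, i.e. m0_fixed ≈ m0 * 2**31.
    """
    if not 0.0 < m < 1.0:
        raise ValueError("expected a multiplier in (0, 1)")
    m0, exponent = math.frexp(m)      # m == m0 * 2**exponent, 0.5 <= m0 < 1
    n = -exponent                     # non-negative because m < 1
    m0_fixed = round(m0 * (1 << 31))
    if m0_fixed == 1 << 31:           # rounding pushed m0 up to 1.0
        m0_fixed >>= 1                # represent it as 0.5 instead ...
        n -= 1                        # ... and shift one bit less
    return m0_fixed, n


def multiply_by_quantized_multiplier(acc, m0_fixed, n):
    """Approximate round(acc * m) using only integer multiply, add and shift."""
    shift = 31 + n
    rounding = 1 << (shift - 1)       # add one half before shifting right
    return (acc * m0_fixed + rounding) >> shift


m0_fixed, n = decompose_multiplier(0.0072)
scaled = multiply_by_quantized_multiplier(1000, m0_fixed, n)  # about 1000 * 0.0072
```

Python integers never overflow, so the 64-bit intermediate product that a C implementation would need is implicit here.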

2.3 Efficient handling of zero-points

2.4 Implementation of a typical fused layer

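Equation (7) is modified to obtain a fused layer that adds bias-addition and activation function evaluation to the matrix multiplication.
The inference code must match the "fake quantization" operator used during training exactly.
If the q1 matrix holds the weights and the q2 matrix holds the activations, both of type uint8, the products are accumulated in a signed 32-bit type.
A signed type is chosen so that the bias vector, which is quantized as int32 (with zero-point 0), can be added directly to the accumulator.
The sum in equation (9) therefore has the form int32 += uint8 * uint8.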

After that, three steps remain:
1. scale down (to the scale of the 8-bit output activations)
2. cast down (to uint8)
3. apply the activation function (this yields the final 8-bit output)
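The whole fused layer, including the three steps above, can be sketched in plain Python. Several things here are simplifying assumptions for readability: the function name is hypothetical, zero-points are subtracted inside the inner loop rather than using the factored form of Section 2.3 (the result is mathematically the same), the multiplier m is kept as a float instead of the fixed-point form of Section 2.2, and the activation is assumed to be a clamp such as ReLU or ReLU6.

```python
def fused_layer(q1, q2, bias, z1, z2, z3, m, act_min=0, act_max=255):
    """Quantized matmul + bias + activation on plain Python lists.

    q1: weights, rows of uint8 values (shape I x J)
    q2: activations, rows of uint8 values (shape J x K)
    bias: I int32 values, assumed zero-point 0 and scale S1 * S2
    m: real multiplier S1 * S2 / S3 (a float here, for brevity only)
    """
    rows, inner, cols = len(q1), len(q2), len(q2[0])
    out = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for k in range(cols):
            acc = 0  # plays the role of the int32 accumulator
            for j in range(inner):
                acc += (q1[i][j] - z1) * (q2[j][k] - z2)
            acc += bias[i]                              # int32 bias addition
            scaled = round(acc * m) + z3                # 1. scale down
            q = max(0, min(255, scaled))                # 2. saturating cast to uint8
            out[i][k] = max(act_min, min(act_max, q))   # 3. activation as a clamp
    return out


# Tiny example: one output row, one output column.
result = fused_layer([[130, 126]], [[10], [4]], [8], z1=128, z2=0, z3=3, m=0.5)
```

In the example the accumulator is 2 * 10 + (-2) * 4 + 8 = 20, which scales down to 10 and becomes 13 after adding the output zero-point.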

3. Training with simulated quantization

3.1 Learning quantization ranges

For each layer, quantization is parameterized by the number of quantization levels and clamping range, and is performed by applying point-wise the quantization function q defined as follows:

Where:
- r : real value
- [a; b] : quantization range
- n : the number of quantization levels
- ⌊·⌉ : rounding to the nearest integer
The quantization range is handled differently for weight quantization and for activation quantization.
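The quantization function built from the symbols above can be sketched as follows, assuming the paper's definition: clamp r to [a; b], then snap it to the nearest of n evenly spaced levels. The function name `fake_quantize` is chosen for this note only, and Python's `round` (which rounds ties to even) stands in for ⌊·⌉.

```python
def fake_quantize(r, a, b, n):
    """Snap real r onto n evenly spaced levels in [a, b]; the result is still a real value."""
    clamped = min(max(r, a), b)          # clamp(r; a, b)
    step = (b - a) / (n - 1)             # s(a, b, n): spacing between levels
    return round((clamped - a) / step) * step + a


# 5 levels in [0, 1] are 0, 0.25, 0.5, 0.75, 1.
inside = fake_quantize(0.3, 0.0, 1.0, 5)   # nearest level to 0.3
above = fake_quantize(2.0, 0.0, 1.0, 5)    # clamped to the top of the range
```

The output stays in floating point, which is why this is only a simulation of quantization during training: the forward pass sees the rounding error, while the arithmetic itself remains float.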
