'개발' 카테고리의 글 목록

https://www.xilinx.com/support/documentation/application_notes/xapp1093-amp-bare-metal-microblaze.pdf

Cortex®-A9 processors and a MicroBlaze processor, each running its own bare-metal software application, and allowing each processor to communicate with the other through shared memory.
CA9랑 MB 프로세서 두개에 bare-metal sw app 각 각 올리고 커뮤니케이션으로 shared memory를 쓴다.

Zynq-7000 AP SoC는 PL (Programmable Logic)에 상주하는 MicroBlaze 프로세서에서도 액세스 할 수있는 공통 메모리 및 주변 장치를 공유하는 2 개의 Cortex-A9 프로세서를 제공합니다. (CA9는 둘 중 하나만 사용함)

AMP (Asymmetric Multiprocessing)는 여러 프로세서가 공유 리소스를 통해 해당 응용 프로그램을 느슨하게 연결할 수있는 자체 운영 체제 또는 베어 메탈 응용 프로그램을 실행할 수 있도록하는 메커니즘입니다.

In this AMP example, the bare-metal application running on CPU0 is the master of the system and is responsible for:
• System initialization • Releasing PL reset • Communicating with MB0 • Sharing the UART with MB0

The bare-metal application running on MB0 is responsible for:
• Communicating with CPU0 • Servicing interrupts from a core in the PL • Sharing the UART with CPU0

Zynq SoC PS에는 CPU0 전용 개인 리소스와 MB0에서 액세스 할 수있는 기타 리소스가 포함되어 있습니다. AMP 구성에서 디자인을 실행할 때 두 CPU가 이러한 공유 리소스에 대해 경합하지 않도록주의해야합니다.

Examples of some of the private resources are:
• L1 cache • Private peripheral interrupts (PPI) • Memory management unit (MMU) • Private timers

Examples of some of the shared resources are:
• Interrupt control distributor (ICD) • DDR memory • OCM • Global timer • Snoop control unit (SCU) and L2 cache • UART0

이 예에서 CPU0은 마스터로 취급되며 공유 자원을 제어합니다. MB0이 공유 자원의 제어를 요구해야하는 경우, 요청을 CPU0에 전달하고 CPU0이 자원을 제어하게해야합니다. 이 참조 디자인의 복잡성을 최소한으로 유지하기 위해 MB0에서 실행되는 베어 메탈 응용 프로그램은 공유 리소스에 대한 액세스를 제한합니다.

OCM은 두 프로세서가 서로 통신하는 데 사용됩니다. DDR 메모리와 비교할 때 OCM은 두 프로세서 모두에서 매우 높은 성능과 낮은 대기 시간 액세스를 제공합니다. 두 프로세서 모두에서 OCM에 대한 캐시 액세스를 비활성화하면 결정적 액세스가 더욱 보장됩니다.

공유 리소스 관련 문제를 방지하기 위해이 디자인에서 취한 조치는 다음과 같습니다.
• DDR memory: CPU0 has only been made aware of memory at 0x00100000 to 0x2FFFFFFF. MB0 uses DDR memory from 0x30000000 to 0x3FFFFFFF for its bare-metal application.
• OCM: Accesses to OCM are handled very carefully by each CPU to prevent contention. A single OCM address location is used as a flag to communicate between the two processors. CPU0 initializes the flag to 0 before starting MB0. When the flag is zero, CPU0 owns the UART. When the flag is not zero, MB0 owns the UART. Only CPU0 sets the flag and only MB0 clears it.

MicroBlaze Address Map
0x30000000–0x3FFFFFFF : PS DDR via MB0 cache accesses through the HP ports. Refer to Zynq-7000 All Programmable SoC Technical Reference Manual [Ref 1] for more details regarding access to the PS DDR from the HP ports.

Software : FSBL, Bare-metal app for CPU0/MB0

Inter-Processor Communication
예제 디자인에서 프로세서 간 통신은 세마포어 플래그입니다. 세마포어가 설정되면 MB0이 UART를 소유하고 MB0에 의해 지워지면 CPU0은 UART를 자유롭게 사용할 수 있습니다. 이것은 리소스를 공유하는 간단한 메커니즘입니다. 지연 시간이 짧은 공유 리소스이므로 OCM 메모리가 선택됩니다. 또한이 OCM 영역은 캐시되지 않으므로 메모리 액세스가 일관되고 결정적입니다.

용어	풀이
PL	programmable logic
HP	high-performance
PS	processing system
OCM	on-chip memory
VIO	virtual input/output
ILA	integrated logic analyzer
S_GP0	slave general-purpose port
MB0	MicroBlaze processor
FSBL	First stage boot loader

저작자표시 비영리 변경금지

'개발' 카테고리의 다른 글

tvm (0)	2019.12.21
stack (0)	2019.12.17
어셈블리어 (0)	2019.12.17
AArch64 generic timer (0)	2019.12.09
plot graph color (0)	2019.10.04

- Tensor Core란 무엇인가? https://www.nvidia.com/en-us/data-center/tensorcore/
. NVIDIA의 new Volta, Turing GPU Architecture에서 정의하는 feature인데, matrix multiplication이랑 convolution을 위한 거대한 boost를 주는 것임;
. mixed-precision을 사용해서 accuracy 감소 없이 높은 thrroughput을 달성하게 해줌
. Accelerate large matrix operations, perform mixed-precision matrix multiply and accumulate calculations in a single operation. = 대규모 매트릭트 연산 가속화, mixed-precision 매트릭스 곱 수행하고 단일 연산으로 계산 누적
. Enables massive increases in throughput and efficiency = 하나의 NVIDIA GPU에서 병렬적으로 수행되는 수백개의 텐서코어 연산으로 처리량과 효율성을 크게 향상시킬 수 있음
. TensorCore in TVM https://sampl.cs.washington.edu/tvmconf/slides/2019/L02-Siyuan-Feng.pdf
. TensorCores는 Hardware accelerators이다.
. Warp-level opration이다.
. New memory scope fragment
. WMMA : Warp-Level Matrix Multiply and Accumulate API (CUDA 9에서 소개)

전통적인 GPU schdule에서 global, shared, local memory scope를 가지고 있음.
https://docs.tvm.ai/tutorials/optimize/opt_conv_tensorcore.html
TensorCore를 원하기 위해서 3개의 특별한 메모리 스코프를 추가함 : wmma.matrix_a, wmma.matrix_bb, wmma.accumulator
하드웨어에서 모든 fragments scope는 온칩 레지스터 레벨에서 저장됨. 이거는 local memory와 동일함.

TensorCore는 특별한 하드웨어 오퍼레이션임. 그래서 TensorCore 명령어로 컴퓨터 유닛에 대체하기 위해서 tensorize를 사용할 수 있음. 먼저 Tensor intrinsic을 정의해야함.

TensorCore의 4개의 기본 operation : fill_fragment, load_matrix, mma_sync, store_matrix
fill_fragment랑 mma_sync는 둘다 매트릭스 곱셈에 사용되어짐.
그래서 샘플에서는 load, gemm, stora 이 세가지 intrinsic을 만듬! ir_builder로..

Warp-level operation이란?
TensorCore의 모든 명령어는 워프레벨 명령어입니다. 워프안에 모든 32개의 스레드 동시에 명령어를 해야합니다. threadidx.x = 32로 하는 것은 가장 쉬운 방법 중에 하나입니다. 그런 다음 threadldx.x를 직접 또는 간접적으로 TensorCore intrinsic이 포함된 루프를 제외한 모든 루프에 바인딩할 수있습니다. 이건 그냥 하나의 솔루션일 뿐 워프의 모든 스레드가 동시에 TensorCore를 호출할 수 있도록 해야한다.

Turing includes Tensor Cores, which are specialized hardware units designed for performing mixed precision matrix computations commonly used in deep learning neural network training and inference applications.

CUDA C++ makes Tensor Cores available via the Warp-Level Matrix Operations (WMMA) API. This API exposes specialized matrix load, matrix multiply and accumulate, and matrix store operations to efficiently use Tensor Cores from a CUDA-C++ program,

--------------------
https://docs.tvm.ai/tutorials/optimize/opt_matmul_auto_tensorcore.html
When the “tensor_core” pragma is set, the “rewrite for tensorcore” ir pass will automatically transform the schedule for tensorcore codegen, otherwise normal CUDA code, with lower performance but equal functionality, will be generated.

---------------
fireiron : 2018년부터해서 만들고 있다, 9개월정도 됨
GPU에서 linear algebra를 위한 고성능 도메인 스페시픽 랭귀지 DSL
Halid와 TVM을 기초로 계층적 스케쥴링 랭귀지
성능 최대를 위한 GPU 최적화를 표현하기 위해 디자인됨
Storage hierarchy(register, fragments, shared memory)와 compute hierarchy(threads, warps, blocks, kernels)의 요소들을 직접 표현할수 있다? 대신할 수 있다?
tensorcore와 machine level operation을 추론할 수 있다.
auto-scheduling과 auto-tuning에 적합하다.

matmul을 더 작은 부분으로 분해하기

저작자표시 비영리 변경금지

'개발' 카테고리의 다른 글

Simple AMP: Zynq SoC Cortex-A9 Bare-MetalSystem with MicroBlaze Processor (0)	2020.01.16
stack (0)	2019.12.17
어셈블리어 (0)	2019.12.17
AArch64 generic timer (0)	2019.12.09
plot graph color (0)	2019.10.04

STACK은 history 기능, 뭔가를 쌓는 구조의 Memory 영역, LIFO, 마지막에 집어넣은 Data가 가장 처음으로 나오는 Data

push는 stack에 자료를 집어넣는 용어, pop은 stack에서 자료를 빼는 것

어떻게 채워지느냐, 보통 Stack은 높은 주소에서 낮은 주소로 쌓아감.

Stack이 자라는 방향이 높은 주소로 자라는 것은 Ascending, 낮은 주소로 자라는 것은 Descending stack으로 분류

현재 stack pointer가 방금 push나 pop을 한 data를 포함하면 Full, 아니면 empty로 분류

stack pointer가 data를 넣은 후 변하느냐 아니면 먼저 변하고 sp가 먼저 변하느냐에 따라 After와 Before로 분류

stack에는 multiple register transfer addressing 명령어를 이용하여 push/pop

ST는 store = push, LD는 load = pop, M은 Multiple이라는 의미로 다 포함

IB는 Increase Before로서 sp를 data를 넣기 전에 증가 시킨다

IA는 Increase After로서 sp를 datas를 넣고 난 후에 증가 시킨다

DB는 Decrease Before로서 sp를 data에 넣기 전에 감소 시킨다

DA는 Decrease After로서 sp를 data 넣고 난 후에 감소

명령어 r9!, {r0, r1, r5} : R9가 가리키는 곳에 R0, R1, R5를 넣고 넣은 개수 3개 만큼 R9을 update하라는 의미

서브 루틴 호출 시 수행되는 일
1. 전달 인자와 돌아갈 주소를 스택에 push
2. 함수 호출(즉, pc를 불리워진 함수의 주소로 jump 시킴
3. 지역변수에 대하여 스택에 공간을 할당하는 일)
4. 호출된 함수를 수행하는 일
5. stack에서 부터 할당된 지역변수 저장고간의 해제
6. 돌아갈 주소를 stack으로부터 꺼내와 함수로부터의 복귀
7. 전달인자에 의해 사용되던 공간을 해제

Linked Register를 이용해서(R14), Branch하기 전에 돌아올 주소를 R1에 넣어두고 복귀할 때 R14를 pc에 넣고 돌아오는 mechanism 이용

저작자표시 비영리 변경금지

'개발' 카테고리의 다른 글

Simple AMP: Zynq SoC Cortex-A9 Bare-MetalSystem with MicroBlaze Processor (0)	2020.01.16
tvm (0)	2019.12.21
어셈블리어 (0)	2019.12.17
AArch64 generic timer (0)	2019.12.09
plot graph color (0)	2019.10.04

Multiple Register Transfer 명령어, ldmfd 같은 명령어를 쓸 때, 끝에다 ^를 붙여주면 SPSR을 CPSR로 넣어줌.

ldmfd sp!, {r1-12, pc}^

r0, r12, pc로 stack에 있는 값을 빼내오면서, ^을 이용해서 SPSR을 CPSR로 넣어주는 일도 함

LDM SP!, {PC}^ : SP에 저장되어있는 주소값을 PC에 넣을 때 '^'을 붙여주면 CPSR이 SPSR로부터 동시에 복원

SWI_Handler

stmfd sp!, {r0-r12, r14} : register들을 backup
--- 구현하고 싶은 내용 추가
ldmfd sp!, {r0-r12, pc}^ : register 들을 restore

저작자표시 비영리 변경금지

'개발' 카테고리의 다른 글

tvm (0)	2019.12.21
stack (0)	2019.12.17
AArch64 generic timer (0)	2019.12.09
plot graph color (0)	2019.10.04
0713 kernel (1)	2019.07.13

HCR_EL2 : configuration routing.. Exception state determined by..

Software can initiate a return from an exception by executing an ERET instruction from AArch64.
-> 예외에서 복귀하는 명령어 ERET

Vector Base Address Register, VBAR_ELx

The processor timers

- EL1 physical timer
- EL1 virtual timer

CNTPCT_EL0 system register reports the current system count value.
CNTFRQ_EL0 reports the frequency of the system count.

<timer>_CTL_EL<x> : Contorl register
<timer>_CVAL_EL<x> : Comparator value
<timer>_TVAL_EL<x> : Timer value
--> <timer> : EL1 physical timer는 CNTP, EL1 virtual timer는 CNTV가 prefix임

CNTKCTL_EL1 을 통해 EL physical, virtual timer들은 컨트롤 될 수 있음

타이머를 설정하는건 두가지 방법이 있어. 하나는 CVAL을 사용하는거고, 다른 하나는 TVAL을 사용하는거지.
CVAL 레지스터는 64bit고, 이 레지스터에 count가 도달하거나 초과하는 경우에 트리거를 발생시키는건가?
TVAL은 32bit고, 소프트웨어가 TVAL을 쓸 때, 프로세서는 현재 시스템 카운터를 읽고, 쓴 값을 더해서 CVAL 값을 구함.

CTL 레지스터에서 아래 필드는..
• ENABLE – Enables the timer.
• IMASK – Interrupt mask. Enables or disables interrupt generation.
• ISTATUS – When ENABLE==1, reports whether the timer is firing (CVAL <= System Count).
-> To generate an interrupt, software must set ENABLE to 1 and clear IMASK to 0.

The interrupt ID (INTID)
- EL1 physical timer : 30
- EL1 vitrual timer : 27
-> Private Peripheral Interrupt (PPI) range.

저작자표시 비영리 변경금지

'개발' 카테고리의 다른 글

stack (0)	2019.12.17
어셈블리어 (0)	2019.12.17
plot graph color (0)	2019.10.04
0713 kernel (1)	2019.07.13
컴파일러 (0)	2019.01.08

* ARM processor

- word : CPU가 한번에 처리할 수 있는 크기

- Thumb mode : 32 bit ARM에서 돌아가는 16bit 기계어, ARM 명령어들을 16bit로 압축한 명령어 set

- ARM Mode

Mode		약자	설명
Normal mode	User	USR	Normal Program execution mode
Privileged mode	System	SYS	Run privileged operating system tasks
	Fast Interrupt Request	FIQ	When a high priority fast interuppt is raised
	Interrupt Request	IRQ	When a low priority normal interrupt is raised
	Supervisor	SVC	A protected mode for the operaing system, entered when a SWI(소프트웨어인터럽트) instruction is executed
	Abort	ABT	Used to handle memory access violations
	Undefined Instruction	UND	Used to handle undefined instructions

- Privileged Mode는 IRQ나 FIQ 등의 Interrupt의 사용 가능 유무를 직접 설정할 수 있다.

- Privileged Mode는 자기들끼리, 자기네들 스스로 서로 Mode 변경이 가능하다. Normal Mode는 자기 스스로 Privileged Mode로 Mode 변경이 불가능하다.

- ARM의 default mode는 SCV mode이다. 여기서 출발해야 boot up시에 ARM에 대한 모든 권한을 행사할 수 있으니!

* Compile

*.c

arm-elf-gcc

C compiler

*.o

linker.ld

arm-elf-gcc

or

arm-elf-ld

Linker

*.elf

arm-elf-objcopy

Object copy

*.bin

*.s

arm-elf-as

Assembly compiler

- elf : 헤더, 바이너리, 심볼 3개의 파트로 나뉘어짐
. 헤더 : 파일의 구성을 나타내는 로더맵 역할
. 바이너리 : 실제 타켓 플래시 메모리에 올라가는 실행할 수 있는 코드로 구성
. 심볼 : 디버깅을 위한 정보로 개발자가 정의한 함수나 변수들이 실제 메모리 주소와 일치가 되도록 매핑

* Startup.S

- 부팅 과정 중 가장 첫번째 단계를 맏고 있는 파일로 *.s 어셈블리언어로 만든 소스
. 왜 어셈블리 언어로?
1. 인터럽트 처리는 개발자 몫인데, 인터럽트는 자주 발생되니까 최대한 빨리 적게 개발하는 방법으로 실행 코드 수를 줄이는 방법으로 선택
2. 메모리 컨트롤러 레지스터 설정과 스택 주소 할당을 위해서. main 함수도 C언어로 만들었기에 스택 영역이 필요하고, 스택 할당 전 SDRAM이 읽고 쓰기가 가능해야하기 때문에 스택 영역 할당 전 메모리 컨트롤러 레지스터에 데이터 값을 넣어 초기화.

- 익셉션 벡터, 인터럽트 disable, PPL(Phase Looked Loop), SDRAM 초기화, 스택 영역 할당, 변수 초기화, C언어의 main 함수 진입 등을 어셈블리 언어로 만든다.

- 익셉션 벡터 : 임베디드 시스템이 리셋되거나 인터럽트 발생시 ARM core는 고정된 주소로 점프한 후 개발자가 지정한 주소로 다시 점프해서 프로그램이 흘러가도록 함

- 인터럽트 disable : 임베디드 시스템 부팅 초기가 매우 중요하기 때문에 그 순간 만큼은 인터럽트를 받지 않기 위함

- PLL 설정 : CPU와 각종 디바이스에 클럽 설정하기 위함

- 메모리 클러스터 설정 : CPU가 SDRAM 메모를 읽기, 쓰기할 수 있도록

- main 함수 진입 : CPU가 SDRAM에 읽기 쓰기가 가능해지면 C 프로그램이 동작하기 위한 스택 영역 지정 후 C언어의 main으로 점프한다.

- startup.s와 main.c 파일을 컴파일하면 누가 먼저 실행되어야할지를 linker.ld라는 링크 스크립트에서 결정할 수 있다. : ENTRY(_start) -> startup.s에 있는 레이블

* Makefile

- 빌드 과정을 한번 만에 할 수있도록 제공

- 프리웨어인 ARM-GCC, 상용 컴파일러 ADS

- startup.o와 main.o를 가지고 링크 시켜야함, 이 과정 후 *.elf 파일 생성, 링크시켜주는 파일은 arm-elf-ld인데, 컴파일러 버전에 따라 arm-elf-gcc에서도 링크 역할해줌

- 구성 : source file name / compiler name / assembler compiler option / c compiler option / compile command

## File Definition ##

PRJ = general

INIT1 = Startup

CM1 = main

## Destination path Definition ##

PRE = arm-elf

SOURCE=./sources/

## ARM tool Definition ##

ARMASM = $(PRE)-as

ARMCC = $(PRE)-gcc

...

## Option Definition ##

AFLAGS = -marm9tdmi -EL -M --gdwarf2

CFLAGS = -B$(GCC_EXEC_PREFIX) -g -gdwarf-2 -Oo -c -mcpu=arm9tdmi -mlittle-endian -mapcs-frame -mno-apcs-stack-check

LFLAGS = -Bstatic -nostartfiles -Xlinker --script=linker.ld -lc

OBJS = $(INIT1).o $(CM1).o

$(PRJ).elf: $(OBJS)

$(ARMLINK) $(LFALGS) -o $(PRJ).elf $(OBJS)

$(ARMOBJCOPY) $(PRJ).elf --output-target=binary $(PRJ).bin

..

all: clean $(PRJ).elf

clean:

rm -f *.o ......

* 링커 스크립트(Linker Script) 파일

- 컴파일을 하고 나면 오브젝트 파일이 만들어 질 때 4개의 독립된 영역으로 나누어 진다 : 코드 / 데이터 / 힙/ 스택

- 링크 스크립트는 섹션 영역의 start와 end 영역을 지정, 개발자가 링크 스크립트를 만들 때 임베디드 제품의 메모리 맵을 확인한 다음 각 섹션 영역을 어떻게 나눌 것인지 결정한 후 만든다.

- 일반적으로 .text_start와 .data_start 그리고 .bss_start 주소를 지정해주면 된다.

- arm-elf-size general.elf : 해당 섹션 영역의 사이즈를 알 수 있다.

- 어떤 컴파일을 사용하느냐에 따라 스케트(Scatter) 파일 또는 링크 스크립트 파일이라고 부른다.

- 만드는 방법
1. MCU 데이터 시트를 열어서 메모리 맵 확인 : 플래시 메모리 시작 주소와 사이즈, SDRAM 시작 주소와 사이즈 확인
2. 메모리 영역을 어떻게 나눌 것인지 결정 : 플래시 메모리에는 코드영역과 데이터영역 지정, SDRAM 영역은 주로 BSS(Block Started Segment)의 시작 주소 지정

- 힙 영역은 링크 스크립트에서 따로 지정하지 않아도 됨. malloc 함수 자체가 자동으로 .bss_end+4 주소 이후로 잡아준다.

-ARM 프로세서에서 스택은 상위주소에서 하위주소로 증가, Heap은 하위 주소에서 상위 주소로 증가.

- 스택 영역 시작 주소 지정은 linker.ld에서 하지 않고 startup.s에서 지정

* RTOS, Embedded OS

- Kernel : Context Switing, Scheduling, Memory Management, ISR Management 등을 담당하는 부분을 따로 떼어서 Kernel이라고 부른다. Task, Process 등이 Kernel에게 Service를 받기 위해 Kernel API를 부르던가, Interrupt가 걸려서 Interrupt 처리할 때 Interrupt 처리 routine의 끝에 Kernel API가 불리던가 하는 등으로 Kernel 활성화

저작자표시 비영리 변경금지

'개발 > System' 카테고리의 다른 글

특정 파일 UTF8 ↔ ANSI 변환 (0)	2013.08.24
UNIX의 디렉토리 (0)	2012.10.29
파일 정보의 획득 (0)	2012.10.29
다수의 이름을 갖는 파일 (0)	2012.10.29
access, chmod, chown 시스템 호출 (0)	2012.10.29

plot graph 색깔입히기!!

import numpy as np
import matplotlib.pyplot as plt
from matplotlib import gridspec

########## color
background = '#272822'
fontcolor = '#737373' #'w'
#actorcolor = ['#4BB2F2', '#F2766B']
actorcolor = ['#66d9ef', '#f92772']

########## figure
fig = plt.figure(figsize=(10, 5))
fig.suptitle('Realtime Performance Graph', fontsize=17, fontweight='bold', color='w')
fig.set_facecolor(background)
gs = gridspec.GridSpec(1, 2, width_ratios=[2, 3]) 
plt.rcParams['text.color'] ="w"

########## first graph
#data
height = [3, 12]
bars = ('A', 'B')
y_pos = np.arange(len(bars))

ax1 = fig.add_subplot(gs[0])
ax1.title.set_text('First Plot')
ax1.set_facecolor("None")
ax1.set_xlabel("FPS", color='w')
ax1.set_ylabel("Second", color='w')
ax1.tick_params(color='w', labelcolor='w')
ax1.set_xticklabels(['A', 'B'], color='w')
ax1.set_xticks(y_pos)
for spine in ax1.spines:
    ax1.spines[spine].set_color('w')

ax1.bar(y_pos, height, 0.5, color=actorcolor)

########## second graph
# data
x2 = range(60,300)
y2 = [v*v for v in x2]
x3 = range(60,300)
y3 = [v*v for v in y2]

ax2 = fig.add_subplot(gs[1])
ax2.title.set_text('Second Plot')
ax2.set_facecolor("None")
ax2.set_xlabel('time', color='w')
ax2.set_ylabel('Realtime', color='w')
ax2.tick_params(color='w', labelcolor='w')
for spine in ax2.spines:
    ax2.spines[spine].set_color('w')

ax2.plot(x2, y2, color=actorcolor[0])
ax2.plot(x3, y3, color=actorcolor[1])

########## show
fig.tight_layout()
plt.subplots_adjust(top=0.85)
plt.show()

저작자표시 비영리 변경금지

'개발' 카테고리의 다른 글

어셈블리어 (0)	2019.12.17
AArch64 generic timer (0)	2019.12.09
0713 kernel (1)	2019.07.13
컴파일러 (0)	2019.01.08
[AWS] 스케일업, 스케일다운: 오토스케일링과 클라우드와치 (0)	2018.10.13

https://www.quora.com/How-is-reinforcement-learning-related-to-genetic-algorithms

How is reinforcement learning related to genetic algorithms?

Answer (1 of 4): Reinforcement learning (RL) attempts to maximise the expected sum of rewards (as per a pre-defined reward structure) obtained by the agent. It does so by learning a value (or action-value) function which is updated using information obtain

www.quora.com

RL은 에이전트로부터 얻은 reward의 합계를 최대하기 위해 시도한다. Q function을 사용해서.

GA는 무작위로 생성된 solution으로 시작해서 자연 선택 원리를 사용하여 솔루션을 발견한다. 그 선택은 다음 세대에 선발될 확률이 높아진다. mutation이나 crossover 영향을 받는다.

GA는 heuristics, RL은 gradient-based update

experience replay는 RL에만 있다. (stored and recalled)

둘 다 nature-inspired, gooood soultion을 찾기 위해 시도함

저작자표시 비영리 변경금지

'개발 > Deep learning' 카테고리의 다른 글

Stock Price Prediction \| AI in Finance (0)	2019.04.22
TF-Lite uses gemmlowp for matrix multiplication (0)	2019.04.18
Quantization and Training of Neural Networks for EfficientInteger-Arithmetic-Only Inference (0)	2019.04.18
밑바닥 딥러닝_7장 합성곱 신경망(CNN) (1)	2019.01.27

어셈블리 기초 http://blog.naver.com/PostView.nhn?blogId=gangst11&logNo=145839687

캐시레지스터 http://jake.dothome.co.kr/cache2/

- msr : 시스템 레지스터는 MSR, MRS 인스트럭션을 사용하여 접근할 수 있다.
msr SPsel , #1 // 1을 stack pointer seletor에 1 값을 쓴다.
. The Processor State, or PSTATE fields, for AArch64 have the following definitions:
SP : Stack pointer selector, 0 = SP_EL0, 1 = SP_ELn
(https://developer.arm.com/docs/100878/latest/processor-state)

- mrs : 레지스터에 값을 복사해주는 명령
mrs x0, CurrentEL // x0에 CurrentEL(Current Exception Level)을 읽어온다

- CPSR : Current Program Status Register
현재 프로그램 상태 레지스터
연산결과를 저장하는 CPSR(Current Program Status Register)의 4개 필드

N (Negative) : 연산결과가 음수인 경우 1로 SET.
Z (Zero) : 연산결과가 0인 경우 1로 SET.
C (Carry) : 덧셈 연산의 Carry일 경우 1로 SET. 뺄셈 연산의 Borrow일 경우 0으로 Clear. rotation shift 연산에서는 shift되어 밀린 1 bit를 저장.
V (Overflow) : signed 연산의 덧셈, 뺼셈에서 연산 결과로 overflow가 발생한 경우 1로 SET

- cmp : x0과 CurrendEL_EL2의 값 비교
cmp x0, #CurrentEL_EL2 // CurrentEL_EL2의 값은 8

- b.eq : branch on equal 약어. 같이 같으면 분기, condition 부분에 .eq(매크로일수도)
b.eq 1f // f = forward, b = backward, 가까운 1: 으로 이동

- mov_q : 64bit register에 상수를 즉시 move, operand bit가 한정되어있어서 인스트럭션 두개로 나눔
mov_q x0, (SCTLR_EL1_RES1 | ENDIAN_SET_EL1)

- SCTLR : The System Control Register (SCTLR) is a register that controls standard memory, system facilities and provides status information for functions that are implemented in the core.
(https://developer.arm.com/docs/100878/latest/the-system-control-register)

- mov : 왼쪽에 있는 값을 오른쪽으로 복사한다.
mov w0, #BOOT_CPU_MODE_EL1

- isb : Instruction Synchronization Barrier
isb 명령이 동작하는 순간 파이프라인으로 인해 다음 명령이 뒤 따라 들어오게 되는데 이를 모두 버리게한다. 파이프라인 flush. 우선순위를 확실히 구분해야하는 루틴에서는 두 명령 사이에 isb를 실행시켜 두 명령어의 실행순서를 명확히 보장한다.

- adrp : Address of 4KB page at a PC-relative offset. PC에서부터 _PHYS_OFFSET(0x82000)을 더하여 x23에 저장
adrp x23, __PHYS_OFFSET

- adr_l : macro, adrp/add
adr_l x1, __boot_cpu_mode

- adrp : From PC-relative address to 4KB page. 페이지 시스템에서 사용하기위한?

- ADR은 ARM 명령어가 아니라 PC의 상대 주소를 계산하여 주소값을 얻어오는 어셈블러 매크로입니다.
adr r0, adr_test // r0 = adr_test 함수의 주소를 복사한다?

- str : 저장 명령어
str w0, [x1] : x1의 주소에 w0을 저장한다.
[Rm] = Rd. 메모리 주소에 레지스터 값을 저장합니다.

- ldr : 로드 명령어
ldr r1, [r2, #16] : r2에 16byte만큼 더한 주소에서 정수형 값을 읽어와 r1에 저장한다.
Rd = [Rm]. 해당 메모리 주소의 값을 레지스터에 저장합니다.

- dmb : http://jake.dothome.co.kr/barriers/
dmb sy

- dc :
dc ivac, x1

- __create_page_tables : head.S line 287

- TTBR : Translation Table Base Register, 페이지 테이블 주소가 저장되는 레지스터

저작자표시 비영리 변경금지

'개발' 카테고리의 다른 글

AArch64 generic timer (0)	2019.12.09
plot graph color (0)	2019.10.04
컴파일러 (0)	2019.01.08
[AWS] 스케일업, 스케일다운: 오토스케일링과 클라우드와치 (0)	2018.10.13
AWS dynamodb (0)	2018.09.04

https://www.youtube.com/watch?v=7vunJlqLZok

70년대부터 월가에서 시장의 가격 예측을 위한 예측 모델을 사용했다
금융 데이터 포인터 : 재무자료, 배당금, 세금, 주가
시간 분석으로 회귀 분석 해야한다
분류하는 여러 방법. 분류 문제/회귀
실무적회계/재무보고/배당조정/화해
금융 산업에서 인공지능 시장의 규모는 2017년 15조원에서 22년에는 80조원까지 오를 것으로 예상 -> 시장이 커진다!
인공지능이 핀테크 시장에 쓰이는 방법에 대한 지도 : 신용 평가, 밀레니얼 세대를 위한 개인융자, 금융 조언, 보안분야(오탐률, 사기로 분류된 거래 등 오분류, 비정상 금융거래탐지), 신용 대출, 자산관리, ...
회귀 모델은 수치형 데이터만 사용 가능하다
NLTK, Text blob 라이브러리로 문자로 감정/의미/생각 분석 가능, 좋다 나쁘다~~
분류 문제의 결과가 회귀 모델의 데이터로 들어갈 수도 (수치형)
LSTM : 주가 분석에 월등한 성능을 보이는 모델, 먼 미래까지 순차적으로 예측
강화 학습 : 시행착오를 통하여 학습

저작자표시 비영리 변경금지

'개발 > Deep learning' 카테고리의 다른 글

RL & GA (0)	2019.10.01
TF-Lite uses gemmlowp for matrix multiplication (0)	2019.04.18
Quantization and Training of Neural Networks for EfficientInteger-Arithmetic-Only Inference (0)	2019.04.18
밑바닥 딥러닝_7장 합성곱 신경망(CNN) (1)	2019.01.27

배움의 과정 : 실천

개발

Simple AMP: Zynq SoC Cortex-A9 Bare-MetalSystem with MicroBlaze Processor

'개발' 카테고리의 다른 글

tvm

'개발' 카테고리의 다른 글

stack

'개발' 카테고리의 다른 글

어셈블리어

'개발' 카테고리의 다른 글

AArch64 generic timer

'개발' 카테고리의 다른 글

임베디드 스케치 공부

'개발 > System' 카테고리의 다른 글

plot graph color

'개발' 카테고리의 다른 글

RL & GA

'개발 > Deep learning' 카테고리의 다른 글

0713 kernel

'개발' 카테고리의 다른 글

Stock Price Prediction | AI in Finance

'개발 > Deep learning' 카테고리의 다른 글

+ Recent posts

티스토리툴바