
After setting up the Jetson Nano: installing TensorFlow and checking that it works
2022-10-05 last update
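11 minutes reading Jetson JetsonNano TensorFlow
Introduction
What sets the Jetson Nano apart from the Raspberry Pi is its on-board GPU, so naturally I want to try machine learning on it. The bundled samples are too advanced as a starting point, though, so let's begin by building a simple AND circuit.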
Installing TensorFlow
An official TensorFlow build for the Jetson Nano is available [1], so install that.
sudo apt-get install python3-pip libhdf5-serial-dev hdf5-tools
pip3 install --extra-index-url https://developer.download.nvidia.com/compute/redist/jp/v42 tensorflow-gpu==1.13.1+nv19.4 --user
You can verify the installation with the following command.
python3 -c 'import tensorflow; print(tensorflow.__version__)'
At the time of writing, this installs version 1.13.1.
TensorBoard and Estimator are installed alongside it as related packages. I have never used TensorRT, so I plan to look into it separately.
$ pip3 freeze | grep tensor
tensorboard==1.13.1
tensorflow-estimator==1.13.0
tensorflow-gpu==1.13.1+nv19.4
tensorrt==5.0.6.3
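The one-liner above only prints the version. As a slightly fuller check, the sketch below also asks TensorFlow for the name of the GPU device it can see. `check_tensorflow` is a helper name made up for this example. `tf.test.gpu_device_name()` is an existing TensorFlow call that returns an empty string when no GPU is usable. The function returns None when TensorFlow cannot be imported, so it is safe to run on a machine without it.

```python
def check_tensorflow():
    """Return the TensorFlow version and GPU device name, or None if TF is missing."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return {
        "version": tf.__version__,
        # An empty string means TensorFlow found no usable GPU.
        "gpu_device": tf.test.gpu_device_name(),
    }


if __name__ == "__main__":
    info = check_tensorflow()
    if info is None:
        print("TensorFlow is not installed")
    else:
        print("TensorFlow", info["version"])
        print("GPU device:", info["gpu_device"] or "none found")
```

If the GPU device comes back empty on a Jetson Nano, the install most likely picked up a CPU-only wheel instead of the NVIDIA one.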
Writing the source code
The official TensorFlow build reportedly ships with Keras, so I used Keras. Using an editor of your choice, create the following and.py.
and.py
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
# Model
model = Sequential()
model.add(Dense(1, input_shape=(2, ), activation='sigmoid'))
model.compile(loss='mse', optimizer='adam', metrics=['acc'])
model.summary()
# Training
x_train = np.array([[0, 0], [0, 1], [1, 0], [1, 1]]) # N x 2
y_train = np.array([0, 0, 0, 1]).reshape(-1, 1) # N x 1
model.fit(x_train, y_train, epochs=3000, verbose=True)
# Evaluation
x_test = x_train
y_test = y_train
score = model.evaluate(x_test, y_test, verbose=False)
print('Test score:', score[0])
print('Test accuracy:', score[1])
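To make it clearer what this tiny model computes, here is a minimal sketch of the same idea without TensorFlow: one sigmoid neuron with two weights and a bias, trained on the AND truth table by plain gradient descent on the mean squared error. This is not what Keras does internally (and.py uses the Adam optimizer). The function names, the learning rate, and the epoch count are choices made for this illustration only.

```python
import math

# AND truth table: inputs and expected outputs
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
Y = [0, 0, 0, 1]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, b, x):
    # Same computation as Dense(1, activation='sigmoid') on a 2-element input
    return sigmoid(w[0] * x[0] + w[1] * x[1] + b)


def train(epochs=5000, lr=1.0):
    w = [0.0, 0.0]
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        gw = [0.0, 0.0]
        gb = 0.0
        for x, y in zip(X, Y):
            p = predict(w, b, x)
            # Gradient of the mean squared error with respect to the
            # pre-activation z: 2 * (p - y) * p * (1 - p), averaged over n samples
            g = 2.0 * (p - y) * p * (1.0 - p) / n
            gw[0] += g * x[0]
            gw[1] += g * x[1]
            gb += g
        w[0] -= lr * gw[0]
        w[1] -= lr * gw[1]
        b -= lr * gb
    return w, b


def mse(w, b):
    return sum((predict(w, b, x) - y) ** 2 for x, y in zip(X, Y)) / len(X)


def accuracy(w, b):
    hits = sum((predict(w, b, x) > 0.5) == bool(y) for x, y in zip(X, Y))
    return hits / len(X)


if __name__ == "__main__":
    w, b = train()
    print("weights:", w, "bias:", b)
    print("MSE:", mse(w, b), "accuracy:", accuracy(w, b))
```

The two weights and the bias are the same three trainable parameters that the Keras model has.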
Running
Run the program with python3.
$ python3 and.py
WARNING:tensorflow:From /home/yamamo-to/.local/lib/python3.6/site-packages/tensorflow/python/ops/resource_variable_ops.py:435: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From /home/yamamo-to/.local/lib/python3.6/site-packages/tensorflow/python/keras/utils/losses_utils.py:170: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense (Dense) (None, 1) 3
=================================================================
Total params: 3
Trainable params: 3
Non-trainable params: 0
_________________________________________________________________
WARNING:tensorflow:From /home/yamamo-to/.local/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
2019-05-20 01:36:57.560684: W tensorflow/core/platform/profile_utils/cpu_utils.cc:98] Failed to find bogomips in /proc/cpuinfo; cannot determine CPU frequency
2019-05-20 01:36:57.561667: I tensorflow/compiler/xla/service/service.cc:161] XLA service 0x15dbc480 executing computations on platform Host. Devices:
2019-05-20 01:36:57.561737: I tensorflow/compiler/xla/service/service.cc:168] StreamExecutor device (0): <undefined>, <undefined>
2019-05-20 01:36:57.627701: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:965] ARM64 does not support NUMA - returning NUMA node zero
2019-05-20 01:36:57.627993: I tensorflow/compiler/xla/service/service.cc:161] XLA service 0x15e9b700 executing computations on platform CUDA. Devices:
2019-05-20 01:36:57.628052: I tensorflow/compiler/xla/service/service.cc:168] StreamExecutor device (0): NVIDIA Tegra X1, Compute Capability 5.3
2019-05-20 01:36:57.628401: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: NVIDIA Tegra X1 major: 5 minor: 3 memoryClockRate(GHz): 0.9216
pciBusID: 0000:00:00.0
totalMemory: 3.87GiB freeMemory: 598.47MiB
2019-05-20 01:36:57.628476: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-05-20 01:36:58.625436: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-05-20 01:36:58.625521: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2019-05-20 01:36:58.625559: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2019-05-20 01:36:58.625749: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 130 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X1, pci bus id: 0000:00:00.0, compute capability: 5.3)
Epoch 1/3000
2019-05-20 01:36:59.260357: I tensorflow/stream_executor/dso_loader.cc:153] successfully opened CUDA library libcublas.so.10.0 locally
4/4 [==============================] - 1s 204ms/sample - loss: 0.2856 - acc: 0.5000
Epoch 2/3000
4/4 [==============================] - 0s 2ms/sample - loss: 0.2853 - acc: 0.5000
Epoch 3/3000
...
Epoch 2999/3000
4/4 [==============================] - 0s 2ms/sample - loss: 0.0840 - acc: 1.0000
Epoch 3000/3000
4/4 [==============================] - 0s 2ms/sample - loss: 0.0840 - acc: 1.0000
Test score: 0.08399419486522675
Test accuracy: 1.0
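Two numbers in the output above are worth a quick check. The summary reports 3 parameters because a Dense layer has one weight per input per unit plus one bias per unit. The accuracy can be 1.0 while the loss is still about 0.084 because, as far as I understand Keras, 'acc' on a single-unit output resolves to binary accuracy, which only asks whether each prediction lands on the correct side of 0.5. The sketch below illustrates both points. The helper names and the prediction values are made up for illustration and are not taken from the trained model.

```python
def dense_param_count(n_inputs, n_units):
    # One weight per (input, unit) pair, plus one bias per unit
    return n_inputs * n_units + n_units


def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def binary_accuracy(y_true, y_pred, threshold=0.5):
    hits = sum((p > threshold) == (t == 1) for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


y_true = [0, 0, 0, 1]
# Hypothetical sigmoid outputs: each is on the correct side of 0.5,
# but none is close to 0 or 1.
y_pred = [0.1, 0.3, 0.3, 0.6]

print(dense_param_count(2, 1))          # 3
print(binary_accuracy(y_true, y_pred))  # 1.0
print(mse(y_true, y_pred))              # roughly 0.0875
```

So a perfect accuracy here means the decision boundary is right, not that the outputs have converged to exact 0s and 1s.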
Other notes
Power draw rose from 3.8 W before the run to 5.8-6.0 W during it, an increase of about 2 W. Measuring the run time with the time command gives the following.
$ time python3 and.py
...
real 0m47.031s
user 1m2.516s
sys 0m9.184s
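A little arithmetic puts these measurements in perspective. The sketch below converts the `time` output to seconds and derives the average wall-clock time per epoch, the average number of busy CPU cores, and a rough estimate of the extra energy used. The per-epoch figure includes start-up overhead such as importing TensorFlow and initializing the GPU, and the energy figure assumes the extra 2 W was drawn for the whole run, so both are approximations. `parse_time` is a helper written for this example.

```python
import re


def parse_time(text):
    """Convert a `time` field such as '1m2.516s' to seconds."""
    m = re.fullmatch(r"(\d+)m([\d.]+)s", text)
    if m is None:
        raise ValueError("unexpected time format: " + text)
    return int(m.group(1)) * 60 + float(m.group(2))


real = parse_time("0m47.031s")
user = parse_time("1m2.516s")
sys_time = parse_time("0m9.184s")
epochs = 3000
extra_watts = 2.0  # approximate increase reported above

print("ms per epoch (incl. start-up):", round(real / epochs * 1000, 1))  # 15.7
print("average busy CPU cores:", round((user + sys_time) / real, 2))     # 1.52
print("extra energy in joules (rough):", round(extra_watts * real))      # 94
```

The CPU time (user plus sys) exceeds the wall-clock time, which shows that more than one core was kept busy on average even though the model ran on the GPU.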
[1] Official TensorFlow for Jetson Nano !!!