When building a house, the selection of wood is a problem.
It is essential for a carpenter to carry tools that cut well, and to sharpen them whenever he has the time.
- Miyamoto Musashi, The Book of Five Rings
This article is excerpted from Chapter 2 of Python for Finance, 2nd Edition.
For new Python users, Python deployment may seem simple, and the wealth of optional libraries and packages may seem just as easy to install. But first of all, there is more than one Python: it comes in many different "flavors," such as CPython, Jython, IronPython, and PyPy. And Python 2.7 and the 3.x line are still different worlds. [1]
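To see what "different worlds" means in practice, here is a minimal, purely illustrative sketch (assuming it is run under a Python 3.x interpreter) of two well-known behavioral differences between the version lines:

# minimal illustration, assuming this code is run under Python 3.x;
# the same expressions behave differently under Python 2.7
print(7 / 2)   # Python 3: true division  -> 3.5
               # Python 2: floor division -> 3
print(7 // 2)  # both versions: floor division -> 3
# in Python 2.7, `print "hello"` is a statement; in Python 3.x,
# print() is a function and the statement form raises a SyntaxError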
Even after you have settled on a version, deployment remains difficult for the following reasons:
- The interpreter (a standard CPython installation) comes only with the so-called standard library (covering, for example, typical mathematical functions);
- Optional Python packages must be installed separately, and there are hundreds of them;
- Compiling/building such nonstandard packages on your own can be tricky due to dependencies and operating-system-specific requirements;
- Keeping track of dependencies and version consistency over time (that is, maintenance) is tedious and time-consuming;
- Updating or upgrading certain packages may force the recompilation of many other packages;
- Replacing or updating one package can cause trouble in many other places.
Fortunately, tools and strategies are available to help. This chapter describes the following techniques that assist with Python deployment.
Package manager
Package managers such as pip and conda are used to install, update, and remove Python packages; they also help keep versions of different packages consistent.
Virtual Environment Manager
Virtual environment managers such as virtualenv or conda can manage multiple Python installations in parallel (for example, installing Python 2.7 and 3.7 on a single machine to test the latest development version of an interesting Python package without risk). [2]
Container
A Docker container represents a complete file system containing all the system components (such as code, runtime libraries, or system tools) needed to run a specific piece of software. For example, you can run Ubuntu 18.04 inside a Docker container on a machine running Mac OS or Windows 10, install Python 3.7 there, and run your Python code in isolation.
Cloud instance
Deploying Python code for financial applications usually requires high availability, security, and performance; these requirements can only be met with professional compute and storage infrastructure. Today, such infrastructure is available in the form of cloud instances of all sizes. Compared with a dedicated server on a long-term lease, one advantage of a cloud instance (i.e., a virtual server) is that users typically pay only for the time actually used; another is that such an instance can be up and running within a minute or two, which supports agile development and improves scalability.
The structure of this chapter is as follows.
Conda as package manager
This section describes Conda used as a Python package manager.
Conda as virtual environment manager
This section focuses on the functions of conda as a virtual environment manager.
Using Docker containers
This section provides a brief overview of Docker containerization technology, focusing on building an Ubuntu-based container with a Python 3.7 installation.
Using cloud instances
This section describes how to deploy Python and Jupyter Notebook, a powerful, browser-based tool suite for Python development and deployment, in the cloud.
The goal of this chapter is to build a proper Python environment on professional infrastructure, including the most important tools as well as the numerical, data analysis, and visualization packages (libraries). This combination will later serve as the backbone for implementing and deploying the Python code in subsequent chapters, whether as interactive financial analytics code or as scripts and modules.
2.1 conda as a package manager
conda can be installed on its own, but a more efficient way is via Miniconda, a minimal Python distribution that includes conda, which serves as both a package manager and a virtual environment manager.
2.1.1 Installation of Miniconda
Miniconda is available for Windows, Mac OS, and Linux, and the different versions can be downloaded from the Miniconda web page. In the following, a 64-bit Python 3.7 installation is assumed. The main example in this section is a session inside an Ubuntu-based Docker container, which downloads the Linux 64-bit installer via wget and then installs Miniconda. The code shown should work (perhaps with minor modifications) on any other Linux or Mac OS machine as well.
$ docker run -ti -h py4fi -p 11111:11111 ubuntu:latest /bin/bash
root@py4fi:/# apt-get update; apt-get upgrade -y
...
root@py4fi:/# apt-get install -y bzip2 gcc wget
...
root@py4fi:/# cd root
root@py4fi:~# wget \
> https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh \
> -O miniconda.sh
...
HTTP request sent, awaiting response... 200 OK
Length: 62574861 (60M) [application/x-sh]
Saving to: 'miniconda.sh'

miniconda.sh         100%[====================>]  59.68M  5.97MB/s    in 11s

2018-09-15 09:44:28 (5.42 MB/s) - 'miniconda.sh' saved [62574861/62574861]

root@py4fi:~# bash miniconda.sh

Welcome to Miniconda3 4.5.11

In order to continue the installation process, please review the license
agreement.
Please, press ENTER to continue
>>>
Press ENTER to begin the installation process. After reviewing the license agreement, approve the terms by answering yes:
...
Do you accept the license terms? [yes|no]
[no] >>> yes

Miniconda3 will now be installed into this location:
/root/miniconda3

  - Press ENTER to confirm the location
  - Press CTRL-C to abort the installation
  - Or specify a different location below

[/root/miniconda3] >>>
PREFIX=/root/miniconda3
installing: python-3.7. ...
...
installing: requests-2.19.1-py37_0 ...
installing: conda-4.5.11-py37_0 ...
installation finished.
After you have agreed to the license terms and confirmed the installation location, you should answer yes once more to allow Miniconda to prepend the new Miniconda installation location to the PATH environment variable:
Do you wish the installer to prepend the Miniconda3 install location
to PATH in your /root/.bashrc ? [yes|no]
[no] >>> yes

Appending source /root/miniconda3/bin/activate to /root/.bashrc
A backup will be made to: /root/.bashrc-miniconda3.bak

For this change to become active, you have to open a new terminal.

Thank you for installing Miniconda3!
root@py4fi:~#
After that, you may want to upgrade conda and Python. [3]
root@py4fi:~# export PATH="/root/miniconda3/bin/:$PATH"
root@py4fi:~# conda update -y conda python
...
root@py4fi:~# echo ". /root/miniconda3/etc/profile.d/conda.sh" >> ~/.bashrc
root@py4fi:~# bash
After this rather simple installation procedure, both a basic Python installation and conda are available. The basic Python installation already comes with some nice features, such as the SQLite3 database engine. Once the relevant path has been appended to the corresponding environment variable (as done earlier), you can check whether Python starts in a new shell instance:
root@py4fi:~# python
Python 3.7.0 (default, Jun 28 2018, 13:15:42)
[GCC 7.2.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> print('Hello Python for Finance World.')
Hello Python for Finance World.
>>> exit()
root@py4fi:~#
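For instance, the sqlite3 module from the standard library is available out of the box. The following minimal sketch (the table and values are made up purely for illustration) creates an in-memory database and runs a query:

import sqlite3  # ships with the Python standard library

con = sqlite3.connect(':memory:')  # in-memory database, nothing written to disk
con.execute('CREATE TABLE quotes (symbol TEXT, price REAL)')
con.execute("INSERT INTO quotes VALUES ('AAA', 100.5)")  # made-up sample row
for row in con.execute('SELECT * FROM quotes'):
    print(row)  # -> ('AAA', 100.5)
con.close()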
2.1.2 Basic operations with conda
conda can be used to efficiently handle the installation, update, and deletion of Python packages. Here is an overview of the main features.
Install Python x.x version
conda install python=x.x
Update Python
conda update python
Install package
conda install $PACKAGE_NAME
Update package
conda update $PACKAGE_NAME
Remove package
conda remove $PACKAGE_NAME
Update conda
conda update conda
Search package
conda search $SEARCH_TERM
List installed packages
conda list
With these features in mind, installing NumPy, one of the most important libraries of the so-called scientific stack, requires only a single command. When installing on a machine with an Intel processor, the procedure automatically installs the Intel Math Kernel Library (MKL), which speeds up numerical operations not only in NumPy but also in several other scientific Python libraries. [4]
root@py4fi:~# conda install numpy
Solving environment: done

## Package Plan ##

  environment location: /root/miniconda3

  added / updated specs:
    numpy

The following packages will be downloaded:

    Package                    |            build
    ---------------------------|-----------------
    mkl-2019.0                 |              117       204.4 MB
    intel-openmp-2019.0        |              117         721 KB
    mkl_random-1.0.1           |   py37h4414c95_1         372 KB
    libgfortran-ng-7.3.0       |       hdf63c60_0         1.3 MB
    numpy-1.15.1               |   py37h1d66e8a_0          37 KB
    numpy-base-1.15.1          |   py37h81de0dd_0         4.2 MB
    blas-1.0                   |              mkl           6 KB
    mkl_fft-1.0.4              |   py37h4414c95_1         149 KB
    ------------------------------------------------------------
                                           Total:       211.1 MB

The following NEW packages will be INSTALLED:

    blas:           1.0-mkl
    intel-openmp:   2019.0-117
    libgfortran-ng: 7.3.0-hdf63c60_0
    mkl:            2019.0-117
    mkl_fft:        1.0.4-py37h4414c95_1
    mkl_random:     1.0.1-py37h4414c95_1
    numpy:          1.15.1-py37h1d66e8a_0
    numpy-base:     1.15.1-py37h81de0dd_0

Proceed ([y]/n)? y

Downloading and Extracting Packages
mkl-2019.0           | 204.4 MB | ###################################### | 100%
...
numpy-1.15.1         | 37 KB    | ###################################### | 100%
numpy-base-1.15.1    | 4.2 MB   | ###################################### | 100%
...
root@py4fi:~#
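To check whether your NumPy build is indeed linked against MKL, you can inspect its build configuration from within Python (a quick verification step, not part of the original installation session):

import numpy as np

# prints the BLAS/LAPACK libraries NumPy was built against;
# an MKL-linked build mentions 'mkl' in the output
np.show_config()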
You can also install multiple packages at once. The -y flag indicates that all (possible) questions shall be answered with yes.
root@py4fi:/# conda install -y ipython matplotlib pandas pytables scikit-learn \
> scipy
...
pytables-3.4.4       | 1.5 MB   | ##################################### | 100%
kiwisolver-1.0.1     | 83 KB    | ##################################### | 100%
icu-58.2             | 22.5 MB  | ##################################### | 100%
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
root@py4fi:~#
After this installation procedure, some of the most important libraries for financial analytics are available in addition to the standard ones:
IPython
An improved interactive Python shell;
matplotlib
The standard plotting library for Python;
NumPy
For efficient array processing;
pandas
Used to manage tabular data, such as financial time series data;
PyTables
Python HDF5 library wrapper;
scikit-learn
Software package for machine learning and related tasks;
SciPy
A set of scientific classes and functions (installed as dependencies).
This provides a basic toolset for data analysis, and for financial analytics in particular. The following example uses IPython and NumPy to draw a set of pseudo-random numbers.
root@py4fi:~# ipython
Python 3.7.0 (default, Jun 28 2018, 13:15:42)
Type 'copyright', 'credits' or 'license' for more information
IPython 6.5.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import numpy as np

In [2]: np.random.seed(100)

In [3]: np.random.standard_normal((5, 4))
Out[3]:
array([[-1.74976547,  0.3426804 ,  1.1530358 , -0.25243604],
       [ 0.98132079,  0.51421884,  0.22117967, -1.07004333],
       [-0.18949583,  0.25500144, -0.45802699,  0.43516349],
       [-0.58359505,  0.81684707,  0.67272081, -0.10441114],
       [-0.53128038,  1.02973269, -0.43813562, -1.11831825]])

In [4]: exit
root@py4fi:~#
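Since pandas and PyTables are installed as well, a similarly small sketch illustrates how a made-up financial time series can be handled as tabular data and stored in HDF5 format (the file name data.h5 and the column names are chosen arbitrarily for this example):

import numpy as np
import pandas as pd

np.random.seed(100)
index = pd.date_range('2019-01-01', periods=5, freq='B')  # five business days
data = pd.DataFrame(np.random.standard_normal((5, 2)),
                    index=index, columns=['AAA', 'BBB'])   # made-up symbols
print(data.cumsum())  # cumulative sums, as for a return series

data.to_hdf('data.h5', key='data')  # HDF5 storage, using PyTables under the hood
print(pd.read_hdf('data.h5', 'data').head())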
Execute the conda list command to display the installed packages.
root@py4fi:~# conda list
# packages in environment at /root/miniconda3:
#
# Name                    Version                   Build  Channel
asn1crypto                0.24.0                   py37_0
backcall                  0.1.0                    py37_0
blas                      1.0                         mkl
blosc                     1.14.4               hdbcaa40_0
bzip2                     1.0.6                h14c3975_5
...
python                    3.7.0                hc3d631a_0
...
wheel                     0.31.1                   py37_0
xz                        5.2.4                h14c3975_4
yaml                      0.1.7                had09818_2
zlib                      1.2.11               ha838bed_2
root@py4fi:~#
If you no longer need a package, you can use conda remove to delete it efficiently.
root@py4fi:~# conda remove scikit-learn
Solving environment: done

## Package Plan ##

  environment location: /root/miniconda3

  removed specs:
    - scikit-learn

The following packages will be REMOVED:

    scikit-learn: 0.19.1-py37hedc7406_0

Proceed ([y]/n)? y

Preparing transaction: done
Verifying transaction: done
Executing transaction: done
root@py4fi:~#
conda as a package manager is already quite useful. However, its full power only becomes evident when its virtual environment management capabilities are added to the picture.
Simple package management
As a package manager, conda makes it easy to install, update and delete Python packages. There's no need to build and compile packages yourself - this step can be tricky at times, given the list of dependencies specified by the packages and the details to consider for different operating systems.
2.2 conda as a virtual environment manager
Depending on the installer version you choose, Miniconda provides a default Python 2.7 or 3.7 installation. conda's virtual environment manager allows different combinations; for example, a completely separate Python 2.7.x installation can be added alongside the Python 3.7 default installation. For this purpose, conda offers the following functionality.
Create a virtual environment
conda create --name $ENVIRONMENT_NAME
Activate environment
conda activate $ENVIRONMENT_NAME
Deactivate the environment
conda deactivate
Delete environment
conda env remove --name $ENVIRONMENT_NAME
Export to environment file
conda env export > $FILE_NAME
Create an environment from a file
conda env create -f $FILE_NAME
List all environments
conda info --envs
The following code is a simple illustration: create an environment called py27, install IPython in it, and execute a line of Python 2.7.x code.
root@py4fi:~# conda create --name py27 python=2.7
Solving environment: done

## Package Plan ##

  environment location: /root/miniconda3/envs/py27

  added / updated specs:
    - python=2.7

The following NEW packages will be INSTALLED:

    ca-certificates: 2018.03.07-0
    ...
    python:          2.7.15-h1571d57_0
    ...
    zlib:            1.2.11-ha838bed_2

Proceed ([y]/n)? y

Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use:
# > conda activate py27
#
# To deactivate an active environment, use:
# > conda deactivate
#
root@py4fi:~#
Notice how the prompt changes to include (py27) after the environment is activated.
root@py4fi:~# conda activate py27
(py27) root@py4fi:~# conda install ipython
Solving environment: done
...
Executing transaction: done
(py27) root@py4fi:~#
Finally, the following code allows you to use IPython with Python 2.7 syntax.
(py27) root@py4fi:~# ipython
Python 2.7.15 |Anaconda, Inc.| (default, May  1 2018, 23:32:55)
Type "copyright", "credits" or "license" for more information.

IPython 5.8.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: print "Hello Python for Finance World!"
Hello Python for Finance World!

In [2]: exit
(py27) root@py4fi:~#
As the above example shows, using conda as a virtual environment manager allows different Python versions to be installed side by side, along with different versions of certain packages. The default Python installation is not affected by this procedure, nor are any other environments that exist on the same machine. conda env list displays all available environments.
(py27) root@py4fi:~# conda env list
# conda environments:
#
base                     /root/miniconda3
py27                  *  /root/miniconda3/envs/py27

(py27) root@py4fi:~#
Sometimes you need to share environment information with others or use it on multiple machines. For this purpose, conda env export exports the list of installed packages to a file. By default, however, this only works for machines with the same operating system, because the build versions are included in the resulting YAML file; they can be omitted with the --no-builds flag so that only the package versions are specified.
(py27) root@py4fi:~# conda env export --no-builds > py27env.yml
(py27) root@py4fi:~# cat py27env.yml
name: py27
channels:
  - defaults
dependencies:
  - backports=1.0
  ...
  - python=2.7.15
  ...
  - zlib=1.2.11
prefix: /root/miniconda3/envs/py27

(py27) root@py4fi:~#
Technically, a virtual environment is nothing more than a certain (sub)folder structure, and environments are often created for quick tests [5]. In such a case, a deactivated environment is easily removed with conda env remove.
(py27) root@py4fi:/# conda deactivate
root@py4fi:~# conda env remove -y --name py27

Remove all packages in environment /root/miniconda3/envs/py27:

## Package Plan ##

  environment location: /root/miniconda3/envs/py27

The following packages will be REMOVED:

    backports: 1.0-py27_1
    ...
    zlib:      1.2.11-ha838bed_2

root@py4fi:~#
This concludes the functional overview of conda's virtual environment manager.
Easy environment management
conda not only helps with package management, it is also a virtual environment manager for Python. It simplifies the creation of different Python environments, so that multiple Python versions and optional packages can coexist on the same machine without affecting each other. conda also allows environment information to be exported and imported, making it easy to replicate an environment on multiple machines or to share it with others.
2.3 Using Docker containers
Docker containers have taken the IT world by storm. Although the technology is still relatively young, it has established itself as a benchmark for the efficient development and deployment of almost any kind of software application.
For the purposes of this book, you can think of a Docker container as a stand-alone ("containerized") file system that contains an operating system (such as Ubuntu Server 18.04), a (Python) runtime, additional system and development tools, and whatever other (Python) libraries and packages you need. Such a Docker container can run on a local machine with Windows 10 or on a cloud instance running a Linux operating system, for example.
This section does not go into all the exciting details of Docker containers; it only briefly illustrates what Docker technology can do in the context of Python deployment. [6]
2.3.1 Docker images and containers
Before we get to that, two fundamental Docker concepts need to be distinguished. The first is a Docker image, which can be compared to a Python class. The second is a Docker container, which can be compared to an instance of that Python class. [7]
A more technical definition of a Docker image can be found in the Docker glossary:
Docker images are the basis of containers. An image is an ordered collection of root filesystem changes and the corresponding execution parameters for use within a container runtime. An image typically contains a union of layered filesystems stacked on top of each other. An image does not have state and it never changes.
Similarly, the Docker glossary's definition of a container makes the analogy to Python classes and class instances quite clear:
A container is a runtime instance of a Docker image. A Docker container consists of a Docker image, an execution environment, and a standard set of instructions.
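To make the analogy concrete, here is a tiny, purely illustrative Python sketch (not Docker code): the class plays the role of the image, while the objects created from it play the role of containers.

# purely illustrative analogy, not Docker code
class UbuntuPythonImage:
    """The 'image': a static blueprint that does not change."""
    python_version = '3.7'

# the 'containers': independent runtime instances of the same blueprint
container_1 = UbuntuPythonImage()
container_2 = UbuntuPythonImage()
print(container_1 is container_2)  # False -> separate, isolated instances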
Depending on the operating system, the installation of Docker is slightly different, which is why it is not detailed in this section. More information and links can be found on the About Docker CE page.
2.3.2 Building an Ubuntu and Python Docker image
This section illustrates how to build a Docker image based on the latest version of Ubuntu that also includes Miniconda and a few important Python packages. In addition, it does some Linux "housekeeping" by updating the Linux package index, upgrading packages where necessary, and installing certain additional system tools. Two scripts are needed for this: a bash script that does all the work at the Linux level [8], and a so-called Dockerfile, which controls the build process of the image itself.
The bash installation script in Example 2-1 consists of three main parts: the first handles the Linux housekeeping, the second installs Miniconda, and the third installs the optional Python packages. More detailed comments are embedded in the script itself.
Example 2-1: Script to install Python and optional packages
#!/bin/bash
#
# Script to Install
# Linux System Tools and
# Basic Python Components
#
# Python for Finance, 2nd ed.
# (c) Dr. Yves J. Hilpisch
#
# GENERAL LINUX
apt-get update  # updates the package index cache
apt-get upgrade -y  # updates packages
# installs system tools
apt-get install -y bzip2 gcc git htop screen vim wget
apt-get upgrade -y bash  # upgrades bash if necessary
apt-get clean  # cleans up the package index cache

# INSTALL MINICONDA
# downloads Miniconda
wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O \
  Miniconda.sh
bash Miniconda.sh -b  # installs it
rm -rf Miniconda.sh  # removes the installer
export PATH="/root/miniconda3/bin:$PATH"  # prepends the new path

# INSTALL PYTHON LIBRARIES
conda update -y conda python  # updates conda & Python (if required)
conda install -y pandas  # installs pandas
conda install -y ipython  # installs IPython shell
The Dockerfile in Example 2-2 uses the bash script from Example 2-1 to build a new Docker image. Its main parts are also commented inline.
Example 2-2: Dockerfile for building the image
#
# Building a Docker Image with
# the Latest Ubuntu Version and
# Basic Python Install
#
# Python for Finance, 2nd ed.
# (c) Dr. Yves J. Hilpisch
#
# latest Ubuntu version
FROM ubuntu:latest
# information about maintainer
MAINTAINER yves
# add the bash script
ADD install.sh /
# change rights for the script
RUN chmod u+x /install.sh
# run the bash script
RUN /install.sh
# prepend the new path
ENV PATH /root/miniconda3/bin:$PATH
# execute IPython when container is run
CMD ["ipython"]
If the two files are in the same folder and Docker is installed, building a new image is straightforward. Here, the tag py4fi:basic is used for the image; this tag is needed to reference the image later, for example when running a container based on it.
~/Docker$ docker build -t py4fi:basic .
...
Removing intermediate container 5fec0c9b2239
 ---> accee128d9e9
Step 6/7 : ENV PATH /root/miniconda3/bin:$PATH
 ---> Running in a2bb97686255
Removing intermediate container a2bb97686255
 ---> 73b00c215351
Step 7/7 : CMD ["ipython"]
 ---> Running in ec7acd90c991
Removing intermediate container ec7acd90c991
 ---> 6c36b9117cd2
Successfully built 6c36b9117cd2
Successfully tagged py4fi:basic
~/Docker$
Existing Docker images can be listed with docker images. The new image should appear at the top of the list:
(py4fi) ~/Docker$ docker images
REPOSITORY   TAG      IMAGE ID       CREATED              SIZE
py4fi        basic    f789dd230d6f   About a minute ago   1.79GB
ubuntu       latest   cd6d8154f1e1   9 days ago           84.1MB
(py4fi) ~/Docker$
After successfully building py4fi:basic, you can run a corresponding Docker container with docker run. The parameter combination -ti is needed to run interactive processes (such as a shell process) inside the Docker container (see the docker run reference page):
~/Docker$ docker run -ti py4fi:basic
Python 3.7.0 (default, Jun 28 2018, 13:15:42)
Type 'copyright', 'credits' or 'license' for more information
IPython 6.5.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import numpy as np

In [2]: a = np.random.standard_normal((5, 3))

In [3]: import pandas as pd

In [4]: df = pd.DataFrame(a, columns=['a', 'b', 'c'])

In [5]: df
Out[5]:
          a         b         c
0 -1.412661 -0.881592  1.704623
1 -1.294977  0.546676  1.027046
2  1.156361  1.979057  0.989772
3  0.546736 -0.479821  0.693907
4 -1.972943 -0.193964  0.769500

In [6]:
Exiting IPython will also exit the container, since IPython is the only application running inside it. However, you can detach from the container by typing Ctrl-P followed by Ctrl-Q.
After detaching from the container, the docker ps command still shows it as running (along with any other currently running containers):
~/Docker$ docker ps
CONTAINER ID  IMAGE          COMMAND      CREATED             STATUS
e815df8f0f4d  py4fi:basic    "ipython"    About a minute ago  Up About a minute
4518917de7dc  ubuntu:latest  "/bin/bash"  About an hour ago   Up About an hour
d081b5c7add0  ubuntu:latest  "/bin/bash"  21 hours ago        Up 21 hours
~/Docker$
You can attach to the Docker container with the docker attach $CONTAINER_ID command (note that only a few characters of the container ID are needed):
~/Docker$ docker attach e815d

In [6]: df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 3 columns):
a    5 non-null float64
b    5 non-null float64
c    5 non-null float64
dtypes: float64(3)
memory usage: 200.0 bytes

In [7]: exit
~/Docker$
The exit command terminates IPython and stops the Docker container. The container can be deleted by docker rm:
~/Docker$ docker rm e815d
e815d
~/Docker$
Similarly, the Docker image py4fi:basic can be removed with docker rmi if it is no longer needed. Although containers are relatively lightweight, a single image can still consume quite a lot of storage. In the case of py4fi:basic, the image size is close to 2 GB. That is why you might want to clean up the list of Docker images regularly:
~/Docker$ docker rmi 6c36b9117cd2
Of course, much more could be said about Docker containers and their benefits in certain application scenarios. For the purposes of this book, it is enough to know that they provide a modern approach to deploying Python, to doing Python development in a completely separate (containerized) environment, and to shipping code for algorithmic trading.
Benefits of Docker containers
If you are not yet using Docker containers, you should consider doing so. They provide a number of benefits for Python deployment and development work, not only locally but especially when working with remote cloud instances and servers for deploying algorithmic trading code.
2.4 Using cloud instances
This section describes how to set up a complete Python infrastructure on a DigitalOcean cloud instance. There are many other cloud providers, among them the industry-leading Amazon Web Services. However, DigitalOcean is known for its simplicity and its relatively low prices for small cloud instances, which it calls Droplets. The smallest Droplet, which is usually sufficient for exploration and development, costs only $5 per month (or $0.007 per hour). Charges are by the hour, so you can easily use a Droplet for two hours, delete it, and be charged just $0.014. [9]
The goal of this section is to set up a Droplet on DigitalOcean with a Python 3.7 installation plus the typical packages (such as NumPy and pandas), combined with a password-protected, Secure Sockets Layer (SSL)-encrypted Jupyter Notebook server installation. This server installation provides three important tools that can be used through a regular browser.
Jupyter Notebook
A popular interactive development environment that features a choice of different language kernels (for example, Python, R, and Julia).
Terminal
A system shell accessible via the browser, with which all typical system administration tasks can be performed and utilities such as Vim and git can be used.
Editor
A browser-based file editor with syntax highlighting for many different programming languages and file types, plus typical text/code editing capabilities.
Once Jupyter Notebook is installed on the Droplet, Python development and deployment can be done via the browser, without the need to log in to the cloud instance via Secure Shell (SSH).
To achieve the goal of this section, several files are needed.
Server setup script
This script orchestrates all the necessary steps, such as copying the other files to the Droplet and running them there.
Python and Jupyter installation script
This script installs Python, the add-on packages, and Jupyter Notebook, and starts the Jupyter Notebook server.
Jupyter Notebook configuration file
This file configures the Jupyter Notebook server, for example with regard to password protection.
RSA public and private key files
These two files are needed for the SSL encryption of the Jupyter Notebook server.
The following subsections work through these files in reverse order.
2.4.1 RSA public key and private key
To establish a secure connection to the Jupyter Notebook server via an arbitrary browser, an SSL certificate consisting of RSA public and private keys is needed. In general, such a certificate comes from a so-called Certificate Authority (CA). For the purposes of this book, however, a self-generated certificate is "good enough." [10] A popular tool for generating RSA key pairs is OpenSSL. The brief interactive session that follows generates a certificate suitable for the Jupyter Notebook server (insert your own country name and other values at the prompts):
~/cloud$ openssl req -x509 -nodes -days 365 -newkey \
> rsa:1024 -out cert.pem -keyout cert.key
Generating a 1024 bit RSA private key
..++++++
.......++++++
writing new private key to 'cert.key'
You will be asked to enter information that is incorporated into the certificate request, the so-called Distinguished Name (DN). There are quite a few fields, but you can leave some blank or accept the default values. Entering "." leaves a field blank:
Country Name (2 letter code) [AU]:DE
State or Province Name (full name) [Some-State]:Saarland
Locality Name (eg, city) []:Voelklingen
Organization Name (eg, company) [Internet Widgits Pty Ltd]:TPQ GmbH
Organizational Unit Name (eg, section) []:Python for Finance
Common Name (e.g. server FQDN or YOUR name) []:Jupyter
Email Address []:team@tpq.io
~/cloud$ ls
cert.key cert.pem
~/cloud$
cert.key and cert.pem must be copied to the Droplet and referenced by the Jupyter Notebook configuration file. The configuration file is described below.
2.4.2 Jupyter Notebook configuration file
A public Jupyter Notebook server can be deployed as explained in the Jupyter Notebook documentation. Among other things, Jupyter Notebook can be password protected. To that end, the notebook.auth subpackage provides a password hash generation function, passwd(). The following code generates a hash for the password jupyter:
~/cloud$ ipython
Python 3.7.0 (default, Jun 28 2018, 13:15:42)
Type 'copyright', 'credits' or 'license' for more information
IPython 6.5.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: from notebook.auth import passwd

In [2]: passwd('jupyter')
Out[2]: 'sha1:d4d34232ac3a:55ea0ffd78cc3299e3e5e6ecc0d36be0935d424b'

In [3]: exit
This password hash must be placed in the Jupyter Notebook configuration file shown in Example 2-3. The configuration file assumes that the RSA key files have been copied to the /root/.jupyter/ folder on the Droplet.
Example 2-3 Jupyter Notebook configuration file
#
# Jupyter Notebook Configuration File
#
# Python for Finance, 2nd ed.
# (c) Dr. Yves J. Hilpisch
#
# SSL ENCRYPTION
# replace the following filenames (and files used) with your choice/files
c.NotebookApp.certfile = u'/root/.jupyter/cert.pem'
c.NotebookApp.keyfile = u'/root/.jupyter/cert.key'

# IP ADDRESS AND PORT
# set ip to '*' to bind on all IP addresses of the cloud instance
c.NotebookApp.ip = '*'
# it is a good idea to set a known, fixed default port for server access
c.NotebookApp.port = 8888

# PASSWORD PROTECTION
# here: 'jupyter' as password
# replace the hash code with the one for your strong password
c.NotebookApp.password = 'sha1:d4d34232ac3a:55ea0ffd78cc3299e3e5e6ecc0d36be0935d424b'

# NO BROWSER OPTION
# prevent Jupyter from trying to open a browser
c.NotebookApp.open_browser = False
Jupyter and security
Deploying Jupyter Notebook in the cloud raises a number of security issues, since it is a full-fledged development environment accessed through a browser. It is therefore paramount to use the security measures that the Jupyter Notebook server provides, such as password protection and SSL encryption. But that is just the beginning: depending on the work done on the cloud instance, further security measures are recommended.
The next step is to make sure Python and Jupyter Notebook are installed on the Droplet.
2.4.3 Python and Jupyter Notebook installation script
The bash script that installs Python and Jupyter Notebook is similar to the script from Section 2.3 that installs Python via Miniconda inside a Docker container. However, the script in Example 2-4 also needs to start the Jupyter Notebook server. All major parts and lines of code are commented inline.
Example 2-4: Bash script to install Python and run the Jupyter Notebook server
#!/bin/bash
#
# Script to Install
# Linux System Tools,
# Basic Python Packages and
# Jupyter Notebook Server
#
# Python for Finance, 2nd ed.
# (c) Dr. Yves J. Hilpisch
#
# GENERAL LINUX
apt-get update  # updates the package index cache
apt-get upgrade -y  # updates packages
apt-get install -y bzip2 gcc git htop screen vim wget  # installs system tools
apt-get upgrade -y bash  # upgrades bash if necessary
apt-get clean  # cleans up the package index cache

# INSTALLING MINICONDA
wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O \
  Miniconda.sh
bash Miniconda.sh -b  # installs Miniconda
rm Miniconda.sh  # removes the installer
# prepends the new path for current session
export PATH="/root/miniconda3/bin:$PATH"
# prepends the new path in the shell configuration
echo ". /root/miniconda3/etc/profile.d/conda.sh" >> ~/.bashrc
echo "conda activate" >> ~/.bashrc

# INSTALLING PYTHON LIBRARIES
# More packages can/must be added
# depending on the use case.
conda update -y conda  # updates conda if required
conda create -y -n py4fi python=3.7  # creates an environment
source activate py4fi  # activates the new environment
conda install -y jupyter  # interactive data analytics in the browser
conda install -y pytables  # wrapper for HDF5 binary storage
conda install -y pandas  # data analysis package
conda install -y matplotlib  # standard plotting library
conda install -y scikit-learn  # machine learning library
conda install -y openpyxl  # library for Excel interaction
conda install -y pyyaml  # library to manage YAML files

pip install --upgrade pip  # upgrades the package manager
pip install cufflinks  # combining plotly with pandas

# COPYING FILES AND CREATING DIRECTORIES
mkdir /root/.jupyter
mv /root/jupyter_notebook_config.py /root/.jupyter/
mv /root/cert.* /root/.jupyter
mkdir /root/notebook
cd /root/notebook

# STARTING JUPYTER NOTEBOOK
jupyter notebook --allow-root

# STARTING JUPYTER NOTEBOOK
# as background process:
# jupyter notebook --allow-root &
This script must be copied to the Droplet and started by the orchestration script described in the next section.
2.4.4 Script to orchestrate the Droplet setup
The second bash script, which sets up the Droplet, is the shortest one (Example 2-5). Its main job is to copy all the other files to the Droplet, whose IP address is passed as a parameter. The script's last line starts the install.sh script, which in turn starts the Jupyter Notebook server once the installation has finished.
Example 2-5: Bash script to set up the Droplet
#!/bin/bash
#
# Setting up a DigitalOcean Droplet
# with Basic Python Stack
# and Jupyter Notebook
#
# Python for Finance, 2nd ed.
# (c) Dr Yves J Hilpisch
#
# IP ADDRESS FROM PARAMETER
MASTER_IP=$1

# COPYING THE FILES
scp install.sh root@${MASTER_IP}:
scp cert.* jupyter_notebook_config.py root@${MASTER_IP}:

# EXECUTING THE INSTALLATION SCRIPT
ssh root@${MASTER_IP} bash /root/install.sh
Everything is now in place to set up the infrastructure. On DigitalOcean, create a new Droplet with the following options.
Operating system
Ubuntu 18.10 x64 (the latest version at the time of writing).
Size
1 core, 1 GB memory, 25 GB SSD (the smallest Droplet).
Data center region
Frankfurt (because the author lives in Germany).
SSH key
Add a (new) SSH key for password-free login. [11]
Droplet name
You can accept the pre-specified name or choose a name such as py4fi.
Clicking the Create button starts the Droplet creation process, which usually takes about a minute. The major outcome of this setup is an IP address, which might, for instance, be 46.101.156.199 if Frankfurt was chosen as the data center location. Setting up the Droplet now only requires the following simple command:
(py3) ~/cloud$ bash setup.sh 46.101.156.199
The subsequent process may take a couple of minutes. It is finished when the Jupyter Notebook server reports a message like the following:
The Jupyter Notebook is running at: https://[all ip addresses on your system]:8888/
In any modern browser, you can access the running Jupyter Notebook server by visiting the following address (note https protocol):
https://46.101.156.199:8888
The server may ask you to add a security exception. Once that is done, the Jupyter Notebook login screen appears, prompting for the password (jupyter, in our example). You are then ready to start developing Python in the browser via Jupyter Notebook, via IPython in a terminal window, or via the text file editor. Other file management capabilities, such as uploading files, deleting files, and creating folders, are also available.
Cloud benefits
The combination of cloud instances from providers such as DigitalOcean and Jupyter Notebook is powerful, giving Python developers and "quants" access to professional compute and storage infrastructure. Professional cloud and data center providers take care of the physical security and high availability of your (virtual) machines. Using cloud instances also keeps research and development costs low, since charges are usually based on the hours of actual use and no long-term commitments are required.
2.5 Conclusion
Python is the programming language and technology platform of choice for this book, and it has been adopted by almost all leading financial institutions. Nevertheless, Python deployment can be difficult at best, and sometimes even tedious and nerve-wracking. Fortunately, several technologies that help with deployment have emerged in recent years. The open source project conda helps manage both Python packages and virtual environments. Docker containers go a step further by making it easy to create complete file systems and runtime environments in a technically isolated "sandbox" (the container). Going even further, cloud providers such as DigitalOcean offer compute and storage capacity in professionally managed, secure data centers within minutes, billed by the hour. Combined with a Python 3.7 installation and a secure Jupyter Notebook server installation, this provides a professional environment for the development and deployment of Python projects in finance.
2.6 Further reading
Python package management can refer to the following resources:
- pip package manager home page;
- conda package manager home page;
- Python package installation page.
Virtual environment management can refer to the following resources:
- virtualenv Environment Manager page;
- conda environment management page;
- pipenv package and environment manager.
The following resources provide information about the Docker container:
- Docker homepage;
- Matthias, Karl, and Sean Kane (2015). Docker: Up and Running. Sebastopol, CA: O'Reilly.
For the introduction and overview of bash script language, please refer to:
- Robbins, Arnold (2016). Bash Pocket Reference. Sebastopol, CA: O'Reilly.
The Jupyter Notebook documentation explains how to run a public Jupyter Notebook server securely. There is also JupyterHub, which allows the management of Jupyter Notebook servers for multiple users.
Sign up for DigitalOcean via the referral link to receive a starting credit of $10 in your new account, which covers two months of usage of the smallest Droplet.
[1] This edition is based on CPython 3.7 (the latest major version at the time of writing), the most popular flavor of the original Python programming language. - original note
[2] The latest pipenv project combines the capabilities of package manager pip and virtual environment manager virtualenv. - original note
[3] Miniconda installers are usually not updated as often as conda and Python. - original note
[4] Install the metapackage nomkl (for example, use the command conda install numpy nomkl) to avoid automatic installation and use of mkl and other related packages. - original note
[5] In Python's official documentation, you can find the following explanation: "Python 'virtual environment' allows Python packages to be installed in a separate location for a specific application, rather than in a global installation." - original note
[6] For a comprehensive introduction to Docker technology, see Matthias and Kane (2015). - original note
[7] Don't worry if the meaning of these terms is not yet completely clear; they are discussed further in Chapter 6. - original note
[8] See Robbins (2016) for a concise introduction to and overview of bash scripting. - original note
[9] New users who register through the referral link receive a $10 credit from DigitalOcean. - original note
[10] If you use a self generated certificate, you may need to add a security exception when prompted by your browser. - original note
[11] If you need help, visit the How to Add SSH Keys to Droplets or How to Create SSH Keys with PuTTY on Windows Web pages. - original note