Introducing Environments — Run Anything on PiCloud

September 26th, 2011 by Ken Elkabany

Environments mark an important milestone for PiCloud. Whereas Function Publishing makes the computing power of PiCloud accessible to all programming languages, Environments enable you to use any library or binary you need in your computation. The possibilities are limitless, but here are a couple of examples of what you can do with environments:

  1. Install any non-Python software package you need via apt-get or make.
  2. Install any Python module that we cannot automatically extract from your machine, typically one that requires compilation or depends on external libraries.

In this post, we’ll show you how to create and use your first environment. We’ll be installing the ObsPy package, which is a Python toolbox for processing seismological data.

Why Environments?

We strive to make moving your computation to the cloud as easy as possible. That’s why our cloud Python package automatically detects and transfers dependencies over to our cloud.

import cloud
from your_expansive_library_of_functions import complex_function
# cloud.call transfers all the modules needed to run complex_function on PiCloud
cloud.call(complex_function)

Unfortunately, automatic dependency transfer only works for pure Python modules. The ObsPy package requires both a .pth file and C-code compilation for proper operation. So the following simple function quickly runs into problems:

def simple_function():
    import obspy
>>> jid = cloud.call(simple_function)
>>> cloud.result(jid)
[Mon Sep 19 16:39:13 2011] - [WARNING] - Cloud: Job 1337 threw exception:
 Could not depickle job
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/cloud/serialization/cloudpickle.py", line 679, in subimport
    __import__(name)
ImportError: No module named obspy

Importing ObsPy fails because it could not be transferred to PiCloud in working form. You might be wondering, then, how you’ve been able to use NumPy, SciPy, and other natively-compiled libraries on PiCloud. The answer is that we have many libraries pre-installed on our systems; see our documentation for the lists of what is pre-installed for Python 2.6 and for Python 2.7.
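
If you’re unsure whether a particular library is already available, one quick check is to run a small job that imports it and reports its version. Here is a minimal sketch using the cloud.call and cloud.result calls shown above:

import cloud

def library_versions():
    # These imports run on PiCloud, so they reflect what is pre-installed there
    import numpy
    import scipy
    return {'numpy': numpy.__version__, 'scipy': scipy.__version__}

jid = cloud.call(library_versions)
print cloud.result(jid)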

Creating a new Environment

Step 1: Go to the Environments tab in the Control Panel.



Step 2: Click “create new environment”.



A popup box will appear. The Base Environment option lets you choose which distribution of Ubuntu Linux to use as the base filesystem. It’s important to understand why we give you this option: if you use Python 2.7 on your local machine to offload computation to PiCloud, we run your functions in the Python 2.7 interpreter from the Ubuntu 11.04 (Natty) base; if you use Python 2.6 locally, we run your functions in the Python 2.6 interpreter from the Ubuntu 10.10 (Maverick) base. We are consistent about which interpreter we use because the modules you install in your environment may be compiled against a specific version of Python. In short, if you’re using Python 2.6 on your machines but choose the Natty base, or vice versa, you will most likely run into compatibility issues.
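
Not sure which interpreter you’re running locally? A quick check tells you which base to pick:

import sys
# (2, 7) pairs with the Ubuntu 11.04 (Natty) base; (2, 6) with Ubuntu 10.10 (Maverick)
print sys.version_info[:2]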

The Environment Name is the name you’ll use to reference the environment in your jobs. The Environment Description is for yourself and/or your team to keep track of the purpose and contents of each environment.

Step 3: Click submit.

When you click submit, your environment will appear under the “Environments being configured” tab. You may have to wait a minute or two while we boot and configure a server with the appropriate base environment for you.

For our example, we’ve named the environment seismology_env.

Connecting to your Environment Setup Server

When the server is ready, click the connect link. Note that the instructions are currently tailored towards *nix environments. If you are using Windows and do not have an SSH client, we recommend Tunnelier.



Download the private key we have generated for you. You will use this same private key for all future environment setup servers. SSH requires that the key file be readable only by its owner, which is why we instruct you to run chmod 400 privatekey.pem. Once you’ve done that, SSH into the provided server, passing the private key with the -i flag as shown in the instructions.

Getting Around Your Environment

Once you’ve SSHed in, you’ll find yourself in an Ubuntu Linux filesystem environment.

picloud@ip-10-46-223-4:~$ ls /
bin  boot  dev  etc  home  lib  lib64  media  mnt  opt  proc  root  sbin  selinux  srv  sys  tmp  usr  var

Your current working directory is /home/picloud:

picloud@ip-10-46-223-4:~$ pwd
/home/picloud

You can verify the distribution of Ubuntu you’re using:

picloud@ip-10-46-223-4:~$ cat /etc/issue
Ubuntu 11.04 \n \l

We give you sudo access so that you have the freedom to install anything anywhere.

# this does not produce an error
picloud@ip-10-46-223-4:~$ sudo touch /root/i_can_be_root

Important: The owner and group for files and directories in your environment do not matter. While you’ll be using the picloud and root user accounts, your jobs will be run with an entirely different user account that will have access to the entire filesystem environment.

Setting Up Your Environment

We’ll use sudo access to install the ObsPy library.

picloud@ip-10-46-223-4:~$ sudo pip install obspy.core obspy.signal
Downloading/unpacking obspy.core
  Downloading obspy.core-0.4.8.zip (186Kb): 186Kb downloaded
  Running setup.py egg_info for package obspy.core

    no previously-included directories found matching 'docs/other/*'
Downloading/unpacking obspy.signal
  Downloading obspy.signal-0.4.9.zip (4.0Mb): 4.0Mb downloaded
  Running setup.py egg_info for package obspy.signal

Requirement already satisfied (use --upgrade to upgrade): numpy>1.0.0 in /usr/local/lib/python2.7/dist-packages (from obspy.core)
Requirement already satisfied (use --upgrade to upgrade): scipy in /usr/local/lib/python2.7/dist-packages (from obspy.signal)
Installing collected packages: obspy.core, obspy.signal
  Running setup.py install for obspy.core

    no previously-included directories found matching 'docs/other/*'
    Skipping installation of /usr/local/lib/python2.7/dist-packages/obspy/__init__.py (namespace package)
    Installing /usr/local/lib/python2.7/dist-packages/obspy.core-0.4.8-nspkg.pth
    Installing obspy-runtests script to /usr/local/bin
  Running setup.py install for obspy.signal

    building 'libsignal' extension
    gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I/usr/local/lib/python2.7/dist-packages/numpy/core/include -I/usr/include/python2.7 -c obspy/signal/src/recstalta.c -o build/temp.linux-x86_64-2.7/obspy/signal/src/recstalta.o
    gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I/usr/local/lib/python2.7/dist-packages/numpy/core/include -I/usr/include/python2.7 -c obspy/signal/src/xcorr.c -o build/temp.linux-x86_64-2.7/obspy/signal/src/xcorr.o
    ...
    gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I/usr/local/lib/python2.7/dist-packages/numpy/core/include -I/usr/include/python2.7 -c obspy/signal/src/fft/fftpack_litemodule.c -o build/temp.linux-x86_64-2.7/obspy/signal/src/fft/fftpack_litemodule.o
    gcc -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions build/temp.linux-x86_64-2.7/obspy/signal/src/recstalta.o build/temp.linux-x86_64-2.7/obspy/signal/src/xcorr.o build/temp.linux-x86_64-2.7/obspy/signal/src/coordtrans.o build/temp.linux-x86_64-2.7/obspy/signal/src/pk_mbaer.o build/temp.linux-x86_64-2.7/obspy/signal/src/filt_util.o build/temp.linux-x86_64-2.7/obspy/signal/src/arpicker.o build/temp.linux-x86_64-2.7/obspy/signal/src/bbfk.o build/temp.linux-x86_64-2.7/obspy/signal/src/fft/fftpack.o build/temp.linux-x86_64-2.7/obspy/signal/src/fft/fftpack_litemodule.o -o build/lib.linux-x86_64-2.7/obspy/signal/lib/libsignal.so
    Skipping installation of /usr/local/lib/python2.7/dist-packages/obspy/__init__.py (namespace package)
    Installing /usr/local/lib/python2.7/dist-packages/obspy.signal-0.4.9-nspkg.pth
Successfully installed obspy.core obspy.signal
Cleaning up...

As you can see, installing obspy.signal requires compiling C code with references to the NumPy library. We would not have been able to automatically extract this package from your machine.

Save the Environment

When you click “save” from the Environment Panel, your SSH connection will be closed. The length of time it takes to save your environment depends on how much you’ve installed. Once it’s ready, your new Environment will be listed under the “Your environments” section.

Using Your Environment

To use an environment, pass the _env keyword argument with the name of the environment you want to use. _env is valid for cloud.call, cloud.map, cloud.cron.register, and cloud.rest.publish.
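
For example, here’s a minimal sketch (the function name is our own) that verifies an environment is in effect:

import cloud

def check_environment():
    # This import only succeeds because seismology_env has ObsPy installed
    import obspy.core
    return obspy.core.__file__

# The same keyword works with cloud.map, cloud.cron.register, and cloud.rest.publish
jid = cloud.call(check_environment, _env='seismology_env')
print cloud.result(jid)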

To demonstrate, we will run a beamforming algorithm using the ObsPy library that we just installed. Beamforming is a technique used in seismology for geolocating seismic events; in this case, the event is the demolition of the AGFA skyscraper in Munich. We’ve adapted the example from the ObsPy documentation.

Step 1: Upload the recorded dataset from the demolition to cloud.files.

cloud.files.put('agfa.dump')
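
For reference, here’s a rough sketch of how agfa.dump could be produced locally before the upload, assuming st is the ObsPy Stream holding the recordings:

import pickle

# `st` is assumed to be an obspy Stream with the demolition recordings
with open('agfa.dump', 'wb') as f:
    pickle.dump(st, f)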

Step 2: Define the beamforming function.

Note that our version pulls the data from cloud.files directly into memory using the getf function.

import cloud
import pickle
from obspy.core import UTCDateTime
from obspy.signal.array_analysis import sonic
from obspy.signal import cornFreq2Paz

def beamforming(file_name):
    st = pickle.loads(cloud.files.getf(file_name).read())

    # Instrument correction to 1Hz corner frequency
    paz1hz = cornFreq2Paz(1.0, damp=0.707)
    st.simulate(paz_remove='self', paz_simulate=paz1hz)

    # Execute sonic
    kwargs = dict(
        # slowness grid: X min, X max, Y min, Y max, Slow Step
        sll_x=-3.0, slm_x=3.0, sll_y=-3.0, slm_y=3.0, sl_s=0.03,
        # sliding window properties
        win_len=1.0, win_frac=0.05,
        # frequency properties
        frqlow=1.0, frqhigh=8.0, prewhiten=0,
        # restrict output
        semb_thres=-1e9, vel_thres=-1e9, verbose=True, timestamp='mlabhour',
        stime=UTCDateTime("20080217110515"), etime=UTCDateTime("20080217110545")
    )

    return sonic(st, **kwargs)

Step 3: Run it on PiCloud using the _env keyword.

Since this is a computationally intensive task, we use the c2 core type to take advantage of 2.5 compute units of power.

>>> jid = cloud.call(beamforming, 'agfa.dump', _env='seismology_env', _type='c2')
>>> out = cloud.result(jid)
>>> out
array([[  7.33088462e+05,   6.52313948e-01,   4.77909058e-17,
          1.63009177e+02,   1.12929181e+00],
       [  7.33088462e+05,   6.54728115e-01,   4.92721051e-17,
          1.58838740e+02,   9.97246208e-01],
       [  7.33088462e+05,   6.65887892e-01,   5.40099541e-17,
          1.58552264e+02,   9.02496537e-01],
       ...,
       [  7.33088462e+05,   7.97200561e-01,   2.54941247e-16,
          1.84349488e+01,   1.23328829e+00],
       [  7.33088462e+05,   7.93642402e-01,   2.90117096e-16,
         -1.20947571e+01,   8.59069264e-01],
       [  7.33088462e+05,   8.08987796e-01,   3.06803433e-16,
         -2.65650512e+01,   8.72066511e-01]])

To better visualize the result, we can plot the output:



The following code was used to generate the plot:

import matplotlib.pyplot as plt

# columns of `out`: timestamp, rel.power, abs.power, backazimuth (baz), slowness (slow)
labels = 'rel.power abs.power baz slow'.split()

fig = plt.figure()
for i, lab in enumerate(labels):
    ax = fig.add_subplot(4, 1, i + 1)
    ax.scatter(out[:, 0], out[:, i + 1], c=out[:, 1], alpha=0.6,
               edgecolors='none')
    ax.set_ylabel(lab)
    ax.xaxis_date()

fig.autofmt_xdate()
fig.subplots_adjust(top=0.95, right=0.95, bottom=0.2, hspace=0)
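
To view the plot interactively or write it to disk, finish with either of the following:

plt.show()
# or save the figure to a file:
fig.savefig('beamforming.png')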

Pricing

Creating environments is free, and there are no additional charges associated with this feature. From our perspective, environments let you run more cycles on PiCloud, and that is where we get our payoff.

Conclusion

With Environments, every programming language and software package is now a viable tool to use on PiCloud. While we love Python, we’re excited that it is no longer the “sole language of the PiCloud Platform.” Coupled with our function publishing feature, we envision Python serving as the glue language through which users can orchestrate their computation on our otherwise language-agnostic platform.



Thanks to the select group of users who beta tested this feature for us, and to Ken Park from PiCloud who envisioned and shepherded this project to launch!


Categories: What's New


13 Responses to “Introducing Environments — Run Anything on PiCloud”

  1. Eugene says:

    Wow great work by PiCloud.

    Exactly what I’ve been waiting for.

    So I assume any data stored in the environment will be lost should I finish the function?

  2. Ken Elkabany says:

    @Eugene
    Yes, all data written to the filesystem by a job, whether or not an environment is used, is treated as transient, and is purged at the completion of the job.
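
    If you need to keep results around, one option is to push them to cloud.files before the job finishes. A rough sketch (we call cloud.files.getf from inside the beamforming job above, and put should work from a job in the same way):

    import cloud

    def job_that_persists_output():
        with open('/tmp/results.txt', 'w') as f:
            f.write('my results')  # written to the job's transient disk
        # upload to cloud.files so it survives after the job's filesystem is purged
        cloud.files.put('/tmp/results.txt')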

  3. Amit says:

    Very cool! Now, I can finally port my dryrun (https://bitbucket.org/amitksaha/dryrun) to the cloud! Looking forward to trying it out very soon.

  4. Greg says:

    Absolutely fantastic news. Having installed OpenCV in an environment, we can now distribute and run our image analysis using your infrastructure!

    Thanks a lot and congratulations.

  5. Stan says:

    This looks great! Now I can think of PiCloud as the friendly layer on top of Amazon EC2.

    Our C++ libraries need some environment variables set to run. (Ugly, I know…) Where is the appropriate place to put that in a PiCloud environment? Can I put stuff in /etc/profile.d?

  6. Ken Elkabany says:

    @Stan
    You can “import os”, and then modify “os.environ” in your function to set the appropriate environment variables. Since we aren’t running a shell (just a bare Python process), modifying /etc/profile.d will have no effect, unless you explicitly execute a shell from your Python function.
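
    A rough sketch (the module, variable, and environment names here are made up for illustration):

    import os
    import cloud

    def run_native_code():
        # Set the variables the C++ libraries expect before importing their bindings
        os.environ['MY_LIB_CONFIG'] = '/etc/mylib.conf'  # hypothetical variable
        import my_cpp_bindings  # hypothetical wrapper installed in the environment
        return my_cpp_bindings.run()

    jid = cloud.call(run_native_code, _env='cpp_env')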

  7. Barry Carter says:

    Note that it’s now "picloud@[server]", not "setup@[server]". This tripped me up briefly.

  8. Ken Elkabany says:

    @Barry Thanks! Change made.

  9. Luke Stanley says:

    Would love to see you guys doing a nice Common Crawl demo!

  10. [...] the release of Environments, many of our users are running non-Python multithreaded programs. Some of those can use as many [...]

  11. Is it possible to initiate Environment creation using the CLI or API? Thanks.

  12. Tatiana says:

    Hi! Can you comment on obtained slowness value of 2-2.5 s/km?
