Posts Tagged ‘new features’

Introducing Scraping-Optimized Cores

Monday, October 10th, 2011

For users who aggregate data from around the web, you’ll find our latest core to be an integral part of your toolbox. s1 cores are comparable in performance to c1 cores with one extra trick up their sleeve: each job running in parallel will have a different IP.

Why is this important?

Many sites automatically throttle fast, repeated access from a single IP. Spreading your requests across unique IPs minimizes that throttling.
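Many sites enforce limits like this per client IP. The sketch below is a hypothetical illustration of a site-side limiter. The class, the budget of 10 requests per 60 seconds, and the addresses are all made up, and no particular site is claimed to work exactly this way. It uses a sliding-window limiter keyed by IP. In the example, 40 simultaneous requests from one IP are mostly refused, while the same 40 spread over four IPs all get through.

```python
from collections import defaultdict, deque


class PerIPRateLimiter:
    """Hypothetical site-side throttle: at most `limit` requests per IP
    within any `window`-second span (a sliding-window log)."""

    def __init__(self, limit, window):
        self.limit = limit
        self.window = window
        self.history = defaultdict(deque)  # ip -> timestamps of accepted requests

    def allow(self, ip, now):
        log = self.history[ip]
        # Forget requests that have slid out of the window.
        while log and now - log[0] >= self.window:
            log.popleft()
        if len(log) >= self.limit:
            return False  # throttled
        log.append(now)
        return True


def count_allowed(limiter, ips, requests_per_ip, now=0.0):
    """Fire `requests_per_ip` requests from each IP at the same instant."""
    return sum(limiter.allow(ip, now) for ip in ips for _ in range(requests_per_ip))


# 40 requests from one IP vs. the same 40 spread across 4 IPs.
single = count_allowed(PerIPRateLimiter(limit=10, window=60), ["10.0.0.1"], 40)
spread = count_allowed(PerIPRateLimiter(limit=10, window=60),
                       ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"], 10)
print(single, spread)  # 10 40
```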

How do I use it?

If you’re already using our c1 cores, all you’ll need to do is set the _type keyword.

cloud.call(func, _type='s1')

How much?

$0.04/core/hour
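The following rough worked example assumes cost scales linearly as cores × hours × rate. This assumption ignores any billing granularity or minimums, which this post does not describe. Under it, 100 s1 cores running for 3 hours would cost 100 × 3 × $0.04 = $12.00. The helper name is hypothetical.

```python
from decimal import Decimal

S1_RATE = Decimal("0.04")  # dollars per core-hour, from the price above


def estimate_cost(cores, hours, rate=S1_RATE):
    """Rough estimate assuming linear billing: cores x hours x rate.
    Ignores billing granularity or minimums (an assumption)."""
    return cores * Decimal(str(hours)) * rate


print(estimate_cost(100, 3))   # 12.00
print(estimate_cost(20, 0.5))  # 0.400
```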

Why don’t other cores have individual IPs?

For other core types, such as c2, multiple cores may be running on a single machine that is assigned only a single IP address. When using s1 cores, you’re guaranteed that each core sits on a different machine.

Suggestions?

We’re excited to move the s1 core type out of beta for our customers. If you have any suggestions for other core types you would like to see, please let us know.

Introducing Environments — Run Anything on PiCloud

Monday, September 26th, 2011

Environments mark an important milestone for PiCloud. Whereas Function Publishing makes the computing power of PiCloud accessible to all programming languages, Environments enable you to use any library or binary you need in your computation. The possibilities are limitless, but here are a couple of examples of what you can do with environments:

  1. Install any non-Python software package you need via apt-get or make.
  2. Install any Python module that we do not automatically extract from your machines, which are typically those that require compilation or depend on external libraries.

In this post, we’ll show you how to create and use your first environment. We’ll be installing the ObsPy package, which is a Python toolbox for processing seismological data.

Why Environments?

We strive to make moving your computation to the cloud as easy as possible. That’s why our cloud Python package automatically detects and transfers dependencies over to our cloud.

import cloud
from your_expansive_library_of_functions import complex_function
# cloud.call transfers all the modules needed to run complex_function on PiCloud
cloud.call(complex_function)

Unfortunately, automatic dependency transfer only works for pure Python modules. The ObsPy package requires both a .pth file and C-code compilation for proper operation. So the following simple function quickly runs into problems:

def simple_function():
    import obspy
>>> jid = cloud.call(simple_function)
>>> cloud.result(jid)
[Mon Sep 19 16:39:13 2011] - [WARNING] - Cloud: Job 1337 threw exception:
 Could not depickle job
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/cloud/serialization/cloudpickle.py", line 679, in subimport
    __import__(name)
ImportError: No module named obspy

Importing ObsPy fails because it could not be transferred to PiCloud in working form. You might be wondering how you’ve been able to use NumPy, SciPy, and other natively compiled libraries on PiCloud. The answer is that we have many libraries pre-installed on our systems. Here are the respective links for what we have pre-installed for Python 2.6 and Python 2.7.
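The distinction that matters here is whether a module is pure Python source or a compiled extension. The hypothetical helper below is not part of the cloud library and is not PiCloud's actual dependency-detection logic. It classifies an already-imported module by its file, using the standard `importlib.machinery` suffix lists. It only sketches the kind of check involved.

```python
import importlib.machinery
import json
import sys
import types


def module_kind(mod):
    """Classify a loaded module as 'builtin', 'extension', or 'source'.

    Hypothetical helper for illustration; it only inspects mod.__file__.
    """
    path = getattr(mod, "__file__", None)
    if path is None:
        return "builtin"  # compiled into the interpreter itself
    if path.endswith(tuple(importlib.machinery.EXTENSION_SUFFIXES)):
        return "extension"  # machine code (.so/.pyd) built for one platform
    if path.endswith(tuple(importlib.machinery.SOURCE_SUFFIXES)):
        return "source"  # plain .py file, portable between machines
    return "unknown"


print(module_kind(json))  # source
print(module_kind(sys))   # builtin

# A stand-in for any compiled extension module.
fake = types.ModuleType("fake_ext")
fake.__file__ = "/tmp/fake_ext" + importlib.machinery.EXTENSION_SUFFIXES[0]
print(module_kind(fake))  # extension
```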

Creating a new Environment

Step 1: Go to the Environments tab in the Control Panel.



Step 2: Click “create new environment”.



A popup box will appear. The Base Environment option allows you to choose which distribution of Ubuntu Linux you would like to use as the base filesystem. It’s important to understand why we give you this option. If you use Python 2.7 on your local machine to offload computation to PiCloud, we will run your functions in the Python 2.7 interpreter from the Ubuntu 11.04 (Natty) base. If you use Python 2.6 on your local machine, we will run your functions in the Python 2.6 interpreter from the Ubuntu 10.10 (Maverick) base. We are consistent about which interpreter we use because the modules you install in your environment may be compiled against a specific version of Python. In short, if you’re using Python 2.6 on your machines but choose the Natty base, or vice versa, you will most likely run into compatibility issues.

The Environment Name is the name you’ll use to reference the environment in your jobs. The Environment Description is for yourself and/or your team to keep track of the purpose and contents of each environment.
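The pairing between your local interpreter and the base environment can be summarized as a lookup. The helper below is hypothetical (it is not part of the cloud library). It encodes only the two pairings stated in this post and rejects any other version.

```python
import sys

# Pairings described above: local interpreter version -> environment base.
BASE_FOR_PYTHON = {
    (2, 6): "Ubuntu 10.10 (Maverick)",
    (2, 7): "Ubuntu 11.04 (Natty)",
}


def matching_base(version_info=sys.version_info):
    """Return the base environment matching a local Python version.

    Hypothetical helper; raises ValueError for versions not listed above.
    """
    key = tuple(version_info[:2])
    try:
        return BASE_FOR_PYTHON[key]
    except KeyError:
        raise ValueError("no base environment listed for Python %d.%d" % key)


print(matching_base((2, 7, 1)))  # Ubuntu 11.04 (Natty)
print(matching_base((2, 6, 5)))  # Ubuntu 10.10 (Maverick)
```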

Step 3: Click submit.

When you click submit, your environment will appear under the “Environments being configured” tab. You may have to wait a minute or two while we boot and configure a server with the appropriate base environment for you.

For our example, we’ve named the environment seismology_env.

Connecting to your Environment Setup Server

When the server is ready, click the connect link. Note that the instructions are currently tailored towards *nix environments. If you are using Windows and do not have an SSH client, we recommend Tunnelier.



Download the private key we have generated for you. You will use this same private key for all future environment setup servers. SSH refuses to use a private key file that other users can access, which is why we instruct you to run chmod 400 privatekey.pem. Once you’ve done that, SSH into the provided server with the private key by passing it to the -i flag, as shown in the instructions.

Getting Around Your Environment

Once you’ve SSH-ed in, you’ll find yourself in an Ubuntu Linux filesystem environment.

picloud@ip-10-46-223-4:~$ ls /
bin  boot  dev  etc  home  lib  lib64  media  mnt  opt  proc  root  sbin  selinux  srv  sys  tmp  usr  var

Your current working directory is /home/picloud:

picloud@ip-10-46-223-4:~$ pwd
/home/picloud

You can verify the distribution of Ubuntu you’re using:

picloud@ip-10-46-223-4:~$ cat /etc/issue
Ubuntu 11.04 \n \l

We give you sudo access so that you have the freedom to install anything anywhere.

# this does not produce an error
picloud@ip-10-46-223-4:~$ sudo touch /root/i_can_be_root

Important: The owner and group for files and directories in your environment do not matter. While you’ll be using the setup and root user accounts, your jobs will be run with an entirely different user account that will have access to the entire filesystem environment.

Setting Up Your Environment

We’ll use sudo access to install the ObsPy library.

picloud@ip-10-46-223-4:~$ sudo pip install obspy.core obspy.signal
Downloading/unpacking obspy.core
  Downloading obspy.core-0.4.8.zip (186Kb): 186Kb downloaded
  Running setup.py egg_info for package obspy.core

    no previously-included directories found matching 'docs/other/*'
Downloading/unpacking obspy.signal
  Downloading obspy.signal-0.4.9.zip (4.0Mb): 4.0Mb downloaded
  Running setup.py egg_info for package obspy.signal

Requirement already satisfied (use --upgrade to upgrade): numpy>1.0.0 in /usr/local/lib/python2.7/dist-packages (from obspy.core)
Requirement already satisfied (use --upgrade to upgrade): scipy in /usr/local/lib/python2.7/dist-packages (from obspy.signal)
Installing collected packages: obspy.core, obspy.signal
  Running setup.py install for obspy.core

    no previously-included directories found matching 'docs/other/*'
    Skipping installation of /usr/local/lib/python2.7/dist-packages/obspy/__init__.py (namespace package)
    Installing /usr/local/lib/python2.7/dist-packages/obspy.core-0.4.8-nspkg.pth
    Installing obspy-runtests script to /usr/local/bin
  Running setup.py install for obspy.signal

    building 'libsignal' extension
    gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I/usr/local/lib/python2.7/dist-packages/numpy/core/include -I/usr/include/python2.7 -c obspy/signal/src/recstalta.c -o build/temp.linux-x86_64-2.7/obspy/signal/src/recstalta.o
    gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I/usr/local/lib/python2.7/dist-packages/numpy/core/include -I/usr/include/python2.7 -c obspy/signal/src/xcorr.c -o build/temp.linux-x86_64-2.7/obspy/signal/src/xcorr.o
    ...
I/usr/include/python2.7 -c obspy/signal/src/fft/fftpack_litemodule.c -o build/temp.linux-x86_64-2.7/obspy/signal/src/fft/fftpack_litemodule.o
    gcc -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions build/temp.linux-x86_64-2.7/obspy/signal/src/recstalta.o build/temp.linux-x86_64-2.7/obspy/signal/src/xcorr.o build/temp.linux-x86_64-2.7/obspy/signal/src/coordtrans.o build/temp.linux-x86_64-2.7/obspy/signal/src/pk_mbaer.o build/temp.linux-x86_64-2.7/obspy/signal/src/filt_util.o build/temp.linux-x86_64-2.7/obspy/signal/src/arpicker.o build/temp.linux-x86_64-2.7/obspy/signal/src/bbfk.o build/temp.linux-x86_64-2.7/obspy/signal/src/fft/fftpack.o build/temp.linux-x86_64-2.7/obspy/signal/src/fft/fftpack_litemodule.o -o build/lib.linux-x86_64-2.7/obspy/signal/lib/libsignal.so
    Skipping installation of /usr/local/lib/python2.7/dist-packages/obspy/__init__.py (namespace package)
    Installing /usr/local/lib/python2.7/dist-packages/obspy.signal-0.4.9-nspkg.pth
Successfully installed obspy.core obspy.signal
Cleaning up...

As you can see, installing obspy.signal requires compiling C code with references to the NumPy library. We would not have been able to automatically extract this package from your machine.

Save the Environment

When you click “save” from the Environment Panel, your SSH connection will be closed. The length of time it takes to save your environment depends on how much you’ve installed. Once it’s ready, your new Environment will be listed under the “Your environments” section.

Using Your Environment

To use an environment, use the _env keyword argument to specify the environment you want to use by name. _env is valid for cloud.call, cloud.map, cloud.cron.register, or cloud.rest.publish.

To demonstrate, we will run a beamforming algorithm using the ObsPy library that we just installed. Beamforming is a technique used in seismology for geolocating seismic events. In this case, the event is the demolition of the AGFA skyscraper in Munich. We’ve derived the example from here.

Step 1: Upload the recorded dataset from the demolition to cloud.files

cloud.files.put('agfa.dump')

Step 2: Define the beamforming function.

Note that our version pulls the data from cloud.files directly into memory using the getf function.

import cloud
import pickle
from obspy.core import UTCDateTime
from obspy.signal.array_analysis import sonic
from obspy.signal import cornFreq2Paz

def beamforming(file_name):
    # load the pickled ObsPy Stream from cloud.files straight into memory
    st = pickle.loads(cloud.files.getf(file_name).read())

    # Instrument correction to 1Hz corner frequency
    paz1hz = cornFreq2Paz(1.0, damp=0.707)
    st.simulate(paz_remove='self', paz_simulate=paz1hz)

    # Execute sonic
    kwargs = dict(
        # slowness grid: X min, X max, Y min, Y max, Slow Step
        sll_x=-3.0, slm_x=3.0, sll_y=-3.0, slm_y=3.0, sl_s=0.03,
        # sliding window properties
        win_len=1.0, win_frac=0.05,
        # frequency properties
        frqlow=1.0, frqhigh=8.0, prewhiten=0,
        # restrict output
        semb_thres=-1e9, vel_thres=-1e9, verbose=True, timestamp='mlabhour',
        stime=UTCDateTime("20080217110515"), etime=UTCDateTime("20080217110545")
    )

    return sonic(st, **kwargs)

Step 3: Run it on PiCloud using the _env keyword.

Since this is a computationally intensive task, we use the c2 core type to take advantage of 2.5 compute units of power.

>>> jid = cloud.call(beamforming, 'agfa.dump', _env='seismology_env', _type='c2')
>>> out = cloud.result(jid)
>>> out
array([[  7.33088462e+05,   6.52313948e-01,   4.77909058e-17,
          1.63009177e+02,   1.12929181e+00],
       [  7.33088462e+05,   6.54728115e-01,   4.92721051e-17,
          1.58838740e+02,   9.97246208e-01],
       [  7.33088462e+05,   6.65887892e-01,   5.40099541e-17,
          1.58552264e+02,   9.02496537e-01],
       ...,
       [  7.33088462e+05,   7.97200561e-01,   2.54941247e-16,
          1.84349488e+01,   1.23328829e+00],
       [  7.33088462e+05,   7.93642402e-01,   2.90117096e-16,
         -1.20947571e+01,   8.59069264e-01],
       [  7.33088462e+05,   8.08987796e-01,   3.06803433e-16,
         -2.65650512e+01,   8.72066511e-01]])

To better visualize the result, we can plot the output:



The following code was used to generate the plot:

import matplotlib.pyplot as plt

# out is the array returned by cloud.result(jid) above; column 0 holds the
# timestamps and columns 1-4 hold the quantities named in labels
labels = 'rel.power abs.power baz slow'.split()

fig = plt.figure()
for i, lab in enumerate(labels):
    ax = fig.add_subplot(4, 1, i + 1)
    ax.scatter(out[:, 0], out[:, i + 1], c=out[:, 1], alpha=0.6,
               edgecolors='none')
    ax.set_ylabel(lab)
    ax.xaxis_date()

fig.autofmt_xdate()
fig.subplots_adjust(top=0.95, right=0.95, bottom=0.2, hspace=0)
plt.show()

Pricing

Creating environments is free. There are no additional charges associated with this feature. From our perspective, environments allow you to run more cycles on PiCloud, which is where we get our payoff.

Conclusion

With Environments, every programming language and software package is now a viable tool to use on PiCloud. While we love Python, we’re excited that it is no longer the “sole language of the PiCloud Platform.” Coupled with our function publishing feature, we envision Python serving as the glue language through which users can orchestrate their computation on our otherwise language-agnostic platform.



Thanks to the select group of users who beta tested this feature for us, and to Ken Park from PiCloud who envisioned and shepherded this project to launch!

Introducing Function Publishing via REST

Wednesday, September 14th, 2011

We’ve been devoting significant time to making PiCloud a useful utility outside of the Python ecosystem. The first feature we have to showcase this is the ability to publish a Python function to a URL. There are a few reasons you might want to do this:

  1. To call Python functions from a programming language other than Python, for example, if you’re integrating the PiCloud platform into a Java codebase or even into a smartphone application (Android or iPhone).
  2. To use PiCloud from Google App Engine, since our cloud client library is not supported on GAE.
  3. Because you’re tired of setting up web application projects when what you really need is a scalable RPC system.

In this post, we’ll give you your first taste of publishing functions on the web.

Define your Function

Just like when you offload regular computation to PiCloud, feel free to do anything in your function including importing custom libraries and making external connections.

def add(x, y):
    """This function adds!"""
    return x+y

Publish It

>>> import cloud
>>> cloud.setkey(key, secret_key)
>>> cloud.rest.publish(add, 'addition')
'https://api.picloud.com/r/2/addition'

The first argument, add, is your function. The second argument, addition, is a label so you can reference the function later; it’s also present in the returned URL for clarity. For a list of all other arguments, refer to the cloud.rest module documentation.

Let’s get information about the function we just published by making a GET request on the returned URL. We recommend using curl to do this from a shell. We authenticate requests using basic authentication. In curl, use “-u” as shown below to specify your key as your username and your secret key as your password. Note that we automatically extract the function’s docstring as the description.

$ curl -k -u 'key:secret_key' https://api.picloud.com/r/2/addition/
{"output_encoding": "json", "version": "0.1", "description": "This function adds!", "signature": "addition(x, y)", "uri": "https://api.picloud.com/r/2/addition", "label": "addition"}

You can also see your published functions from your account control panel.
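curl's -u flag turns the key and secret key into an HTTP Basic `Authorization` header. A client that doesn't use curl builds the header as shown below, following the standard Basic scheme. The function name is a hypothetical illustration, and the test pair is the classic example from the HTTP authentication RFCs rather than real PiCloud credentials.

```python
import base64


def basic_auth_header(key, secret_key):
    """Build the Authorization header value that `curl -u 'key:secret_key'` sends."""
    credentials = "%s:%s" % (key, secret_key)
    token = base64.b64encode(credentials.encode("utf-8")).decode("ascii")
    return "Basic " + token


print(basic_auth_header("Aladdin", "open sesame"))
# Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
```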

Call the Published Function

Now let’s call the function by using a POST request to the same URL. To specify arguments to the function add, you simply pass them in as JSON encoded POST values. In this case, you would specify the POST values x and y.

$ curl -k -u 'key:secret_key' https://api.picloud.com/r/2/addition/ -d x=1 -d y=1
{'jid': 809730}

Get the Result

There are two ways we can grab the result of this job. The standard way is through your Python console:

>>> import cloud
>>> cloud.setkey(key, secret_key)
>>> cloud.result(809730)
2

The language-agnostic way to do this using our REST API is to query the following URL: https://api.picloud.com/job/{job_id}/result/.

$ curl -k -u 'key:secret_key' https://api.picloud.com/job/809730/result/
{"result": 2}

The difference between these two methods is that cloud.result will block until the result is ready; our REST API will return a “job not done” error, so you’ll have to keep querying until it’s ready.
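Because the REST endpoint does not block, a client has to poll it. Below is a minimal sketch of a polling loop with capped exponential backoff. The URL pattern is the one given above. Everything else is an assumption:

- `fetch` is any function that takes a URL and returns the decoded JSON body.
- The exact form of the “job not done” error isn't described here, so the sketch assumes `fetch` raises a hypothetical `JobNotDone` exception in that case.

```python
import time

RESULT_URL = "https://api.picloud.com/job/{job_id}/result/"


class JobNotDone(Exception):
    """Hypothetical: raised by `fetch` when the API reports "job not done"."""


def wait_for_result(job_id, fetch, initial_delay=0.5, max_delay=30.0,
                    timeout=600.0, sleep=time.sleep, clock=time.monotonic):
    """Poll the result URL until the job finishes, backing off between tries."""
    url = RESULT_URL.format(job_id=job_id)
    deadline = clock() + timeout
    delay = initial_delay
    while True:
        try:
            return fetch(url)["result"]
        except JobNotDone:
            if clock() + delay > deadline:
                raise TimeoutError("job %s not done after %.0fs" % (job_id, timeout))
            sleep(delay)
            delay = min(delay * 2, max_delay)  # exponential backoff, capped


# Usage with a fake fetch that reports "not done" twice, then succeeds.
responses = iter([JobNotDone(), JobNotDone(), {"result": 2}])

def fake_fetch(url):
    r = next(responses)
    if isinstance(r, Exception):
        raise r
    return r

print(wait_for_result(809730, fake_fetch, sleep=lambda s: None))  # 2
```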

For a full specification of our API, please see our REST API documentation.

Taking Advantage of JSON Arguments

Since arguments are specified as JSON, you can easily pass strings, lists, and dictionaries into your published functions. For example, we can concatenate two strings using our addition function:

$ curl -k -u 'key:secret_key' https://api.picloud.com/r/2/addition -d "x=\"Hello, \"" -d "y=\"World\""
{'jid': 809731}
$ curl -k -u 'key:secret_key' https://api.picloud.com/job/809731/result/
{"result": "Hello, World"}

We can also merge two lists using our addition function:

$ curl -k -u 'key:secret_key' https://api.picloud.com/r/2/addition -d "x=[1,2,3]" -d "y=[4,5,6]"
{'jid': 809732}
$ curl -k -u 'key:secret_key' https://api.picloud.com/job/809732/result/
{"result": [1, 2, 3, 4, 5, 6]}

These work, of course, because in Python the addition operator can be applied to strings and lists, not just numbers.
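Each argument in the curl examples above is a form field whose value is a JSON literal. A client written in another language follows the same two steps: JSON-encode each argument, then form-encode the fields. The sketch below shows those steps in Python. The helper name is hypothetical, and any standard form encoder would do.

```python
import json
from urllib.parse import parse_qsl, urlencode


def encode_arguments(**kwargs):
    """JSON-encode each argument, then form-encode them as a POST body."""
    return urlencode({name: json.dumps(value) for name, value in kwargs.items()})


print(encode_arguments(x=1, y=1))  # x=1&y=1
body = encode_arguments(x="Hello, ", y="World")
print(body)                        # x=%22Hello%2C+%22&y=%22World%22

# Decoding on the receiving side reverses both steps.
decoded = {k: json.loads(v) for k, v in parse_qsl(body)}
print(decoded)                     # {'x': 'Hello, ', 'y': 'World'}
```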

Handling Raw Data

JSON does not natively support binary data. While you could base64-encode the data and decode it in your function, we offer a more straightforward and efficient method: pass binary data into a published function as a file upload using multipart/form-data (that is, a part whose MIME Content-Disposition header has a filename parameter).

Example

To showcase raw data handling, we’re going to publish a function to create thumbnails. We’ll use this picture of Albert Einstein.



Here’s the function we’ll use to create a thumbnail of an image. We use StringIO so that we can open and save the image in a memory buffer, rather than to a file.

from PIL import Image
from cStringIO import StringIO

def thumbnail(raw_img_data, width=50, height=50, output_format='JPEG'):
    im = Image.open(StringIO(raw_img_data))
    im.thumbnail((width, height))
    out_data = StringIO()
    im.save(out_data, output_format)
    return out_data.getvalue()

import cloud
# be sure to set the output encoding to raw
cloud.rest.publish(thumbnail, 'thumbnail', out_encoding='raw')

Call the function. Use -F in conjunction with the @ symbol to POST an image file as a file upload, which will be treated as raw data by PiCloud. We can adjust the width and height by passing in POST values, or if we omit them, the default value of 50 will be used.

$ curl -k -u 'key:secret_key' -F width=60 -F height=76 -F "raw_img_data=@albert_einstein.jpg" https://api.picloud.com/r/2/thumbnail/
{'jid': 809737}

The content of the result is the binary data representing the thumbnail image. Unlike JSON encoded results, there is no enclosing dictionary. Thus, all you have to do to see the image is pipe the result of the job into a file.

$ curl -k -u 'key:secret_key' https://api.picloud.com/job/809737/result/ > albert_einstein.thumb.jpg

Open the thumbnail in your favorite image program!

Albert Einstein Thumbnail

Conclusion: Take a rest, and then give it a spin!

We’re particularly excited by function publishing because it bridges PiCloud with the world outside of Python, and in doing so, brings all the computing benefits of our standard service. You can publish functions without any care for the amount of hardware running underneath. As your functions get called more frequently, we automatically scale our servers to meet demand. You can also reserve real-time cores if you want to guarantee a certain number of cores at all times. Lastly, you can be confident that your computation is being run on a system built with performance, robustness, and redundancy at its core.

If this technology captivates you, follow us on Twitter, or go above and beyond and join our team!

Introducing High-Memory Cores

Friday, September 2nd, 2011

When we first launched PiCloud, we provided two options for processing power: standard and high cpu. Standard provided 1 compute unit with 300MB of RAM, while high cpu provided 2.5 compute units and 800MB of RAM. But what about tasks that require GBs of memory and an even faster CPU? Enter core types.

You can now select the type of core you want your job to be run on.

  • c1: Replaces our standard option as default.
  • c2: Replaces the high cpu option.
  • m1: Our new high-memory core with 3.25 compute units and 8GB of memory.
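The three core types differ in compute units and memory. The hypothetical helper below picks the smallest core that satisfies a job's needs, and the returned name is what you would pass as `_type`. Its figures come from this post: c1 and c2 inherit the standard and high cpu numbers from the paragraph above, and m1 comes from the list. It treats 8GB as 8 × 1024 MB, an assumption. s1 is left out because its memory isn't stated here.

```python
from collections import namedtuple

CoreType = namedtuple("CoreType", "name compute_units memory_mb")

# Figures from this post, ordered from smallest to largest.
CORE_TYPES = [
    CoreType("c1", 1.0, 300),
    CoreType("c2", 2.5, 800),
    CoreType("m1", 3.25, 8 * 1024),
]


def pick_core(min_compute_units=0.0, min_memory_mb=0):
    """Return the name of the least powerful listed core meeting both needs.

    Hypothetical helper, not part of the cloud library.
    """
    for core in CORE_TYPES:
        if core.compute_units >= min_compute_units and core.memory_mb >= min_memory_mb:
            return core.name
    raise ValueError("no listed core type is large enough")


print(pick_core(min_memory_mb=200))    # c1
print(pick_core(min_compute_units=2))  # c2
print(pick_core(min_memory_mb=4000))   # m1
```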

For more details, see our updated pricing page.

How to Use It

We’re committed to maintaining an extraordinarily simple API for you. With our old library you would do the following:

cloud.call(func, _high_cpu=True)

With our new library (available here), you do this instead:

cloud.call(func, _type='c2')
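If older code still passes `_high_cpu`, the translation is mechanical: per the list above, c1 replaces standard and c2 replaces high cpu. The sketch below is a hypothetical migration helper, not part of the cloud library. It rewrites keyword arguments before they reach `cloud.call`, e.g. `cloud.call(func, **migrate_kwargs({'_high_cpu': True}))`.

```python
def migrate_kwargs(kwargs):
    """Return a copy of cloud.call keyword arguments with the old
    _high_cpu flag replaced by the equivalent _type (hypothetical helper)."""
    new = dict(kwargs)  # leave the caller's dict untouched
    if "_high_cpu" in new:
        high_cpu = new.pop("_high_cpu")
        if "_type" in new:
            raise ValueError("pass either _high_cpu or _type, not both")
        new["_type"] = "c2" if high_cpu else "c1"
    return new


print(migrate_kwargs({"_high_cpu": True}))                # {'_type': 'c2'}
print(migrate_kwargs({"_high_cpu": False, "_env": "x"}))  # {'_env': 'x', '_type': 'c1'}
```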

Additional Information

In conjunction with these changes, real-time cores are now reserved by type. You can see the new interface in your control panel.

We’ll be releasing more cores as we hear demand for them. Our next core, s1, which is in beta, is a solution for users on our platform who scrape the web. When running jobs in parallel on s1 cores, each job will have its own IP, minimizing throttling effects. However, consecutive jobs may share the same IP address.

Crons in the Cloud!

Tuesday, August 10th, 2010

We’re pleased to announce the addition of crons to the PiCloud platform. A cron is a simple way to schedule a function to be run periodically. Times and dates are specified using the standard crontab format. Crons can be triggered as often as every minute, and there’s no limit to the number of functions you can register as crons. You will be billed for the amount of compute time consumed by the function triggered by your cron, just like if you were running a function on PiCloud. We have also added a tab to the web interface for managing crons.
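The five crontab fields are minute, hour, day of month, month, and day of week. The sketch below checks whether a given time matches an expression. It is a simplified illustration with a hypothetical function name. It only understands `*` and comma-separated numbers; real crontab also allows ranges and steps such as `1-5` and `*/15`. It also ignores crontab's special rule for when both day fields are restricted.

```python
from datetime import datetime


def cron_matches(expr, when):
    """Return True if datetime `when` matches a five-field crontab expression.

    Simplified: each field is '*' or a comma-separated list of integers.
    Day of week uses cron numbering (0 = Sunday).
    """
    minute, hour, dom, month, dow = expr.split()
    actual = [when.minute, when.hour, when.day, when.month,
              (when.weekday() + 1) % 7]  # Python: Monday=0 -> cron: Sunday=0
    for field, value in zip((minute, hour, dom, month, dow), actual):
        if field != "*" and value not in {int(v) for v in field.split(",")}:
            return False
    return True


t = datetime(2010, 8, 10, 19, 0)  # 19:00 on Tuesday, August 10th, 2010
print(cron_matches("* * * * *", t))                     # True
print(cron_matches("0 19 * * *", t))                    # True
print(cron_matches("0 19 * * *", t.replace(minute=1)))  # False
```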

Here’s how to register a cron:

# registers function ping_webserver with the label heartbeat
# this function could be checking whether a webservice is active
cloud.cron.register(ping_webserver, 'heartbeat', '* * * * *') # runs every minute

When you no longer need a cron, you can deregister it via our web interface or using the following:

# deregister the cron labeled heartbeat (crons are referenced by their label)
cloud.cron.deregister('heartbeat')

Here’s how to register a function to run once a day at noon:

# 19 is the 19th GMT hour, which translates to 12pm PDT (GMT -7)
cloud.cron.register(sudo_make_me_a_sandwich, 'lunch', '0 19 * * *')
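The comment's arithmetic can be checked with fixed UTC offsets: 19:00 GMT minus 7 hours is noon PDT. Pacific time is UTC-8 (PST) outside daylight saving time, so the same expression fires at 11am local time in winter. The sketch uses fixed offsets rather than a time zone database to keep that explicit. The function name and the default date are illustrative.

```python
from datetime import datetime, timedelta, timezone

PDT = timezone(timedelta(hours=-7), "PDT")  # Pacific Daylight Time
PST = timezone(timedelta(hours=-8), "PST")  # Pacific Standard Time


def local_fire_time(utc_hour, utc_minute, tz, on_date=(2010, 8, 10)):
    """Local wall-clock time at which a cron set for utc_hour:utc_minute fires."""
    fire = datetime(*on_date, utc_hour, utc_minute, tzinfo=timezone.utc)
    return fire.astimezone(tz).strftime("%H:%M %Z")


print(local_fire_time(19, 0, PDT))  # 12:00 PDT
print(local_fire_time(19, 0, PST))  # 11:00 PST
```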

That’s all it takes! See our documentation for the full cron specification.

Store your files with PiCloud!

Monday, May 3rd, 2010

One of the most frequent questions we get is “where do I put my data?” To this, we’ve always had the same answer: Anywhere you want. Unlike other platforms, we’ve never believed in locking your data into our proprietary data store. Our users keep data in all sorts of different places (AWS, Rackspace, or on their local machines), and in all different forms (flat files, relational databases, and key-value stores). We don’t plan to change this, because we don’t believe we can provide the single best data storage solution to satisfy everyone’s needs. We’re big fans of using the correct tool for every problem.

So what is our new file storage solution? It’s a simple and easy way for our users to get their data on the cloud to be crunched by PiCloud. We don’t pretend that it’s the holy grail of data storage solutions, but it’s a solid answer for users who don’t already have a data store set up. If you don’t need it, you won’t be affected.

The module is included in our cloud library as cloud.files. Here’s the most basic way to use it:

cloud.files.put('data.txt') # stores data.txt on the cloud
cloud.files.get('data.txt') # saves data.txt onto your machine
cloud.files.getf('data.txt') # gets a stream of the contents of data.txt

See our documentation for the full specification and examples.

New Users, New Features, and PyCon!

Friday, February 19th, 2010

Wait no longer! We’ve opened up PiCloud to another batch of users today, and from now onward, we promise to accelerate the roll out of PiCloud to new users. For users, both new and old, I wanted to highlight some of the many changes we’ve made in the past month that haven’t necessarily been the most visible.

Variable Compute Units
We had customers asking us for more powerful CPUs, and so we’ve delivered. With a simple keyword argument change, you can now switch from 1 Compute Unit (1-1.2 GHz Xeon) to 2.5 Compute Units (2.5-3 GHz Xeon). Check it out (code):

cloud.call(cpu_intensive_func, _high_cpu=True) # uses 2.5 compute units

Profiler Option
While we’ve gotten great feedback on profiling functions that run on PiCloud, we’ve also received requests for the ability to turn off the profiler. After all, the deterministic profiler does have overhead that scales with the number of function calls in a script. To turn off the profiler, simply pass another keyword argument, _profile:

cloud.call(foo, _profile=False)
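The overhead comes from the profiler hooking every function call. Python's own deterministic profiler, `cProfile`, shows the effect: it records every call, so its cost grows with the number of calls. The sketch below is standard-library code run locally, not PiCloud's profiler. It runs the same loop with and without `cProfile` and reads back the recorded call count. Timings vary by machine, so none are claimed here.

```python
import cProfile
import pstats
import time


def tiny(x):
    return x + 1


def many_calls(n):
    total = 0
    for _ in range(n):
        total = tiny(total)
    return total


def timed(fn, *args):
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


N = 100_000
plain_result, plain_secs = timed(many_calls, N)

profiler = cProfile.Profile()
profiled_result, profiled_secs = timed(profiler.runcall, many_calls, N)

# pstats keys are (filename, lineno, funcname); values start with
# (primitive call count, total call count, ...).
stats = pstats.Stats(profiler).stats
tiny_calls = next(v[1] for k, v in stats.items() if k[2] == "tiny")

print(plain_result == profiled_result, tiny_calls)  # True 100000
print("unprofiled %.3fs, profiled %.3fs" % (plain_secs, profiled_secs))
```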

Drop in for multiprocessing
If you’re already using Python multiprocessing, but want to run your computation across our cluster, now you can. Check out our docs to see how.

cloud library is now open source
We previously told users that the client library was not open source because, frankly, we didn’t believe it was stable enough to deserve the attention of developers in the community. We are now at that point, so the client library has been released under the LGPL.

Inclusion in the Enthought Python Distribution (EPD)
EPD is ideal for scientists and engineers looking for an easy, standardized way to deploy a powerful set of scientific tools on their own computer or across a whole organization. As of the latest EPD release, 6.0, the cloud library is now included in the distribution. Welcome EPD customers!

Bug fixes
Having hundreds of users on our platform is the easiest way to expose all the nitty-gritty bugs and race conditions lurking in our system. We would like to thank our ever-growing community for the many bug reports and critical fixes over the past month.

Lastly, our CTO, Aaron Staley, and I will be at PyCon this weekend. Hope to see you all there!