How to encode all of your videos, quickly and cheaply!

July 21st, 2010 by Ken Elkabany

Update: This post is outdated. Please refer to our most recent documentation example on video encoding.

With the ubiquity of video on the web, it’s important that services be able to encode their videos in a variety of formats to maximize their viewership. Specific formats are necessary for displaying content on certain platforms, most notably flv for Flash videos and mp4 for the iPhone. Video encoding is a time-consuming and computationally intensive task, which makes the computing power of the cloud ideal for the job. This post covers how to use PiCloud to offload encoding to the cloud using our cloud library and ffmpeg, a popular video encoding tool. With just a couple of lines of code, you’ll be able to leverage the compute power of hundreds of cores on Amazon Web Services without touching a single server, at a fraction of the cost (3%-20%) of encoding.com.

Source Video

You can use any avi file as the “source video.” If you want to follow this post to the letter, you can download what we used: rickroll.avi. Use the “Save file to your PC” link (BEWARE: The “Download Now” graphics are ads).

ffmpeg Basics

ffmpeg provides a command-line interface for manipulating videos. Since it’s not our purpose to teach ffmpeg in this post, here are the two command strings we’ll be using:

1. Converting to flv: 'ffmpeg -i source_video.avi -y -b 200000 -r 25 -s 320x240 -ab 56 -ar 44100 -f flv output_video.flv'
2. Converting to mp4: 'ffmpeg -i source_video.avi -y -b 200000 -r 25 -s 320x240 -acodec aac -ab 128kb -vcodec mpeg4 -b 1200kb -mbd 2 -flags +4mv -cmp 2 -subcmp 2 -s 320x180 output_video.mp4'

For more useful commands, check out the 19 ffmpeg commands for all needs.

Example 1: Encoding a video locally

Assuming you have ffmpeg installed, the function below, ffmpeg_exec(), will encode a specified source video on your local machine.

from subprocess import Popen, PIPE

encoding_cmd_strings = {
'flv': 'ffmpeg -i {0} -y -b 200000 -r 25 -s 320x240 -ab 56 -ar 44100 -f flv {1}',
'mp4': 'ffmpeg -i {0} -y -b 200000 -r 25 -s 320x240 -acodec aac -ab 128kb -vcodec mpeg4 -b 1200kb -mbd 2 -flags +4mv -cmp 2 -subcmp 2 -s 320x180 {1}'
}

def ffmpeg_exec(source, target, encoding):
    """Uses a shell call to ffmpeg to convert a video
    to the desired encoding"""

    # Popen calls the ffmpeg process, and collects the standard out/error
    p = Popen(encoding_cmd_strings[encoding].format(source, target),
                  stdout=PIPE,
                  stderr=PIPE,
                  shell=True)
    stdout, stderr = p.communicate(input=None)

    # return these for debugging purposes
    return stdout, stderr

Running the function ffmpeg_exec('rickroll.avi', 'rickroll.flv', 'flv') produces a flash video of the rickroll.avi source file. Likewise, ffmpeg_exec('rickroll.avi', 'rickroll.mp4', 'mp4') produces an mpeg4 encoding.
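Because ffmpeg_exec() formats file names straight into a shell string, a name containing spaces or shell metacharacters would break the command. One possible variation, sketched here under the assumption that you are free to change how the command is assembled, splits the template into an argument list first and substitutes the file names afterwards. The helper name build_ffmpeg_args() is hypothetical and not part of the original example; only the flv template is repeated to keep the sketch short.

```python
import shlex

# Same flv command string as in Example 1 (the mp4 entry is omitted for brevity).
ENCODING_TEMPLATES = {
    'flv': 'ffmpeg -i {0} -y -b 200000 -r 25 -s 320x240 -ab 56 -ar 44100 -f flv {1}',
}

def build_ffmpeg_args(source, target, encoding):
    """Hypothetical helper: return the ffmpeg command as a list of arguments.

    The template is split into tokens before the file names are filled in,
    so a file name with spaces stays a single argument.
    """
    tokens = shlex.split(ENCODING_TEMPLATES[encoding])
    return [token.format(source, target) for token in tokens]

args = build_ffmpeg_args('my video.avi', 'my video.flv', 'flv')
print(args[:3])   # ['ffmpeg', '-i', 'my video.avi']
```

The resulting list could then be passed to Popen(args, stdout=PIPE, stderr=PIPE) without shell=True.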

Example 2: Retrieving a file from the cloud, encoding it locally, and then putting it on the cloud.

We’ll define a function, convert_video(), to download the source video, encode it using ffmpeg_exec(), and then put the encoded file on the cloud. For convenience, we’ll use the cloud.files module to get and put your video files, but you could use other storage locations such as Amazon S3 (via boto), a database, or even a website.

If you have downloaded the rick roll video, you can store it on the cloud from the Python console:

>>> import cloud
>>> cloud.files.put('rickroll.avi')

convert_video() uses cloud.files.get() to retrieve the source video that we’ve stored on the cloud, encodes it, and then puts the encoded file on the cloud with cloud.files.put().

import os
import cloud

def convert_video(source, encoding):
    """Gets the source file, converts it to the specified encoding,
    and puts it on the cloud"""

    # automatically generate target name, e.g. video.avi -> video.flv
    basename, ext = os.path.splitext(source)
    target = '%s.%s' % (basename, encoding)

    # gets the source file from the cloud and saves it to the
    # current directory with the same name
    cloud.files.get(source, source)

    # execute ffmpeg (Example 1)
    ret = ffmpeg_exec(source, target, encoding)

    # store output file on the cloud
    cloud.files.put(target)

    return ret

You can verify that convert_video('rickroll.avi', 'flv') adds ‘rickroll.flv’ to your cloud files collection.

>>> convert_video('rickroll.avi', 'flv')
>>> cloud.files.list()
['rickroll.avi', 'rickroll.flv']
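The target name that convert_video() stores on the cloud comes entirely from the os.path.splitext() step. As a small illustration of how that naming behaves, here is the same logic pulled out into a standalone function; the name target_name() is hypothetical and used only for this sketch.

```python
import os

def target_name(source, encoding):
    """Replace the extension of source with the requested encoding."""
    basename, _ext = os.path.splitext(source)
    return '%s.%s' % (basename, encoding)

print(target_name('rickroll.avi', 'flv'))      # rickroll.flv
print(target_name('intro.final.avi', 'mp4'))   # intro.final.mp4
print(target_name('rickroll.flv', 'flv'))      # rickroll.flv
```

Note the last case: when the source already has the target extension, the generated name is identical to the source name, so a different naming scheme would be needed there.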

Example 3: Encoding a video with PiCloud

Now that we’ve created the functions to encode a video locally, we want to move the computation to the cloud. We’ll use our cloud library to do this. The most basic function in the library is cloud.call(), which takes a function as its argument, and returns a job id (an integer). cloud.call() inspects the execution state of the Python interpreter and copies everything it needs to execute the given function on PiCloud’s cluster. The only change we’ll need to make is the following: Instead of calling convert_video() directly, we’ll instead pass the function into cloud.call().

# executes convert_video('rickroll.avi', 'flv') on the cloud
# _high_cpu mode dedicates 2.5 compute units to the task (2.5-3.0 GHz core)
jid = cloud.call(convert_video, 'rickroll.avi', 'flv', _high_cpu=True)

The function is now running on PiCloud. You can check the jobs panel in the web interface to see its status.


Alternatively, you can use cloud.status(jid) to see when the function is done.

>>> cloud.status(jid)
'processing'
>>> cloud.status(jid)   # after some time has passed
'done'
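If you would rather wait in a script than retype the status check, a simple polling loop is one option. The sketch below is an assumption-laden illustration, not part of the cloud library: wait_until_done() is a hypothetical helper, it takes the status function as a parameter, and it only knows about the 'done' status shown above. Any failure statuses the service may report are not handled.

```python
import time

def wait_until_done(get_status, jid, poll_seconds=5.0, max_polls=120):
    """Hypothetical helper: poll get_status(jid) until it returns 'done'.

    Returns True once the job reports 'done', or False if max_polls
    checks pass without seeing it.
    """
    for attempt in range(max_polls):
        if get_status(jid) == 'done':
            return True
        # Sleep between checks, but not after the final one.
        if attempt < max_polls - 1:
            time.sleep(poll_seconds)
    return False

# Stand-in for cloud.status so the sketch runs without a PiCloud account.
fake_statuses = iter(['processing', 'processing', 'done'])
print(wait_until_done(lambda jid: next(fake_statuses), 1, poll_seconds=0))   # True
```

With the real library, the call would look like wait_until_done(cloud.status, jid).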

If you check the result of the function using cloud.result() (blocks until completion), you’ll find ffmpeg’s console output, which looks like this:


FFmpeg version SVN-r22379, Copyright (c) 2000-2010 the FFmpeg developers
  built on Mar  9 2010 12:45:06 with gcc 4.4.1
  libavutil     50.11. 0 / 50.11. 0
  libavcodec    52.58. 0 / 52.58. 0
  libavformat   52.55. 0 / 52.55. 0
  libavdevice   52. 2. 0 / 52. 2. 0
  libswscale     0.10. 0 /  0.10. 0
Input #0, avi, from 'rickroll.avi':
  Duration: 00:03:34.96, start: 0.000000, bitrate: 2108 kb/s
    Stream #0.0: Video: mpeg4, yuv420p, 704x544 [PAR 1:1 DAR 22:17], 25 tbr,
 25 tbn, 25 tbc
    Stream #0.1: Audio: mp3, 48000 Hz, 2 channels, s16, 128 kb/s
Output #0, flv, to 'rickroll.flv':
  Metadata:
    encoder         : Lavf52.55.0
    Stream #0.0: Video: flv, yuv420p, 320x240 [PAR 33:34 DAR 22:17],
q=2-31, 200 kb/s, 1k tbn, 25 tbc
    Stream #0.1: Audio: libmp3lame, 44100 Hz, 2 channels, s16, 0 kb/s
Stream mapping:
  Stream #0.0 -> #0.0
  Stream #0.1 -> #0.1
Press [q] to stop encoding
[mp3 @ 0x1adfe70]incomplete frame   8785kB time=211.24 bitrate= 340.7kbits/s
frame= 5374 fps=177 q=2.0 Lsize=    8877kB time=214.96 bitrate= 338.3kbits/s
video:5305kB audio:3359kB global headers:0kB muxing overhead 2.456632%

Congrats! You’re now officially encoding on the cloud.

Example 4: Leveraging Parallelism to Batch Process a Large Video Collection

While encoding a dozen hours of videos using the above functions may be tractable on a single machine, encoding an entire library composed of thousands of hours is not. This is where the elasticity of the cloud shines. Using PiCloud, you can easily leverage the parallel computing power of hundreds of cores on Amazon. Instead of using cloud.call to run a function once in the cloud, use cloud.map to run the same encoding function on all videos.

To encode all videos in both flv and mp4 locally, we can do the following:

# this list can contain as many source files as you want
source_names = ['rickroll.avi', 'source1.avi', 'source2.avi']
source_args = 2*source_names
encoding_args = ['flv']*len(source_names)+['mp4']*len(source_names)

# expands to: map(convert_video, ['rickroll.avi', 'source1.avi', 'source2.avi', 'rickroll.avi', 'source1.avi', 'source2.avi'], ['flv', 'flv', 'flv', 'mp4', 'mp4', 'mp4'])
map(convert_video, source_args, encoding_args)

To move the work to PiCloud, change the map function to the cloud.map function:

jids = cloud.map(convert_video, source_args, encoding_args, _high_cpu=True)
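The two parallel argument lists handed to map() and cloud.map() must line up one-to-one, which is easy to get wrong when building them by hand. As an alternative sketch (not the approach used in this post), itertools.product can generate every (encoding, source) combination, and the two lists can be derived from those pairs. The encodings list is introduced here only for illustration.

```python
from itertools import product

source_names = ['rickroll.avi', 'source1.avi', 'source2.avi']
encodings = ['flv', 'mp4']

# Every encoding paired with every source, grouped by encoding.
pairs = list(product(encodings, source_names))

source_args = [source for _encoding, source in pairs]
encoding_args = [encoding for encoding, _source in pairs]

print(len(pairs))        # 6
print(encoding_args)     # ['flv', 'flv', 'flv', 'mp4', 'mp4', 'mp4']
```

Both lists always have the same length, so they can be passed to map() or cloud.map() exactly as in the examples above.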

That’s all it takes to offload your encoding to our cluster! We’ll automatically scale up the number of Amazon EC2 instances in our cluster depending on how much workload you give us (we estimate this on the fly). Here’s a graph demonstrating the speed gains from this one-line change:


The local machine is equivalent to a single 2.5 GHz Core i7 Intel processor. If you’re still thinking, “but I need to process videos even faster,” then check out our real-time compute units feature.

How much did that cost?

According to my account, encoding 30 three-minute videos, which took about 120 seconds total when run in parallel, cost me $0.073. Each video took about 70 seconds to get, encode, and save, for a total of 30*70=2100 seconds of job time. Because I was using high cpu mode (2.5 compute units), that works out to (2100 seconds)/(3600 seconds/hour)*(2.5 compute units)=1.46 compute unit hours. At the rate of $0.05/compute unit/hour, the total cost was 1.46 compute unit hours * $0.05/compute unit/hour = $0.073.
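The same arithmetic can be written as a few lines of Python, which makes it easy to plug in your own numbers. The function name compute_cost() and its parameter names are made up for this sketch; the figures are the ones quoted above.

```python
def compute_cost(num_videos, seconds_per_video, compute_units, rate_per_unit_hour):
    """Estimate the cost in dollars from total job time and compute units."""
    total_seconds = num_videos * seconds_per_video
    unit_hours = total_seconds / 3600.0 * compute_units
    return unit_hours * rate_per_unit_hour

# 30 videos, ~70 seconds each, high cpu mode (2.5 units), $0.05 per unit per hour
cost = compute_cost(30, 70, 2.5, 0.05)
print('$%.3f' % cost)   # $0.073
```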

With encoding.com, the same task would cost $2.97 at their cheapest high-volume tier. This was derived from $1.80/GB * (55 MB/Rick Roll) * (30 Rick Rolls). That makes PiCloud less than 3% of the cost of encoding.com! To be fair, if you aren’t storing your videos on Amazon, you’ll have to pay bandwidth costs, which will be (1.65GB Data In)*($0.15/GB) + (1.65GB Data Out)*($0.16/GB) = $0.512. PiCloud’s total cost would be $0.512+$0.073=$0.585, which is still only 20% of the cost of encoding.com. Extra point for PiCloud: We didn’t include the amount you’d have to pay for bandwidth to send and receive video files with encoding.com. Needless to say, they do have a full video encoding service with a wide range of options and customer support, whereas we’re showing you a building block that could be used to replicate their service. But this does give you an idea of the premium they are charging for their service.
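For readers who want to check the comparison or substitute their own file sizes, here is the calculation as a short script. All prices are the 2010 figures quoted in this post, and the variable names are chosen only for this sketch.

```python
GB_PER_VIDEO = 0.055        # 55 MB per video
NUM_VIDEOS = 30
data_gb = GB_PER_VIDEO * NUM_VIDEOS              # about 1.65 GB in total

encoding_com_cost = 1.80 * data_gb               # $1.80 per GB
bandwidth_cost = data_gb * 0.15 + data_gb * 0.16 # data in + data out
picloud_compute_cost = 0.073                     # figure from the previous paragraph
picloud_total_cost = picloud_compute_cost + bandwidth_cost

print('%.1f%%' % (100 * picloud_compute_cost / encoding_com_cost))   # 2.5%
print('%.1f%%' % (100 * picloud_total_cost / encoding_com_cost))     # 19.7%
```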

Summary (TL;DR)

  • ffmpeg is a tool for encoding videos, and is available on PiCloud.
  • PiCloud offers the cloud.files module, a simple file storage service, as an easy way to get and put files on the cloud. Using cloud.files is completely optional (use whatever other data store you want), but it’s there when you need it.
  • Getting on the cloud with PiCloud is easy!
    • Passing convert_video() into cloud.call() is all you need to do to offload your encoding to the cloud.
    • If you want to encode a lot of videos, use cloud.map() instead of map(), and all of it will be pushed to the cloud for processing.
  • We’re inexpensive!

Take it from here, Rick!



Categories: How To

8 Responses to “How to encode all of your videos, quickly and cheaply!”

  1. Jim says:

    What if I would like to convert something other than an avi?

  2. Greg says:

    One note of warning: you’ll be charged full hours for every partial hour you start on Amazon. Might need to take that into consideration for realistic cost estimation, as opposed to the one in the text.
    Very good writeup, otherwise! Thanks!

  3. Ken says:

    @Jim
    We support many of the standard video codecs (mpeg-1, mpeg-2, h.264, xvid, theora, schroedinger, and more), as well as most of the container formats (quicktime, avi, mpeg-1/2, ogg). We’ll get an official list up soon detailing all the binary executables we have pre-installed (ffmpeg), as well as what features they have (video & audio codecs, and container formats). Also, if there are any additional formats/codecs you’d like to use, feel free to file a support ticket, and we’ll install them.

  4. Ken says:

    @Greg
    Our users interact with PiCloud directly, rather than Amazon EC2, which we run on top of. We’ve decided to charge our users only for the exact number of milliseconds they use on our service–no rounding to the hour. We can do this because our servers are multi-tenant: When user A’s encoding task is finished, we can put user B’s computational task on the same server.

  5. [...] that uses the Twitter API to search for new “picloud” tweets. As with our previous video encoding tutorial, we’ll first demonstrate how to run the retweeter locally, and then show how to move it to [...]

  6. Murali Kumar says:

    If you can give a complete code to easily implement and get encoded then we would be more interested in your service.

  7. Soroush says:

    Most used video codec in video compression and FFmpeg is x264 that is highly multi-threading optimized and support virtually up to 127 cores but picloud.com run at best on _high_cpu which I think is single core; if picloud support running command on multi cores then I would consider it for my video encoding needs seriously.

  8. Ken Elkabany says:

    @Soroush
    Good point. We’ve been working on, and will soon be adding a “_cores” keyword argument to allow users to choose the number of cores their processes can utilize.
