Using Google App Engine blobstore with web2py

File uploads on GAE-hosted web2py sites have to be stored in the database, since the file system is read-only. Uploaded binary data can be held in a blob datatype, but this type is limited to a maximum size of 1MB. In addition, any one HTTP request to GAE has a data transfer limit of 10MB.

Google now supports a blobstore API (http://code.google.com/appengine/docs/python/blobstore/overview.htm) to upload, store and serve larger files. Blobs of up to 50MB each can be stored in the blobstore, and each blob is then accessed through a key that is provided as a text string. The blobstore API is currently listed by Google as experimental.

File uploads take place on a specific URL that Google generates for you at the time of upload, so uploads are not subject to the usual GAE 10MB request limit.
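
As a rough illustration of the size limits quoted above, the following sketch picks a storage option from the upload size. The function name and constants are made up for this example, and "MB" is assumed to mean 2**20 bytes; the real limits are whatever Google currently documents.

```python
# Limits as quoted in this article; "MB" is assumed to mean 2**20 bytes.
MB = 1024 * 1024
DATASTORE_BLOB_LIMIT = 1 * MB   # maximum size of a blob datatype value
BLOBSTORE_LIMIT = 50 * MB       # maximum size of a single blobstore blob


def storage_for(size_in_bytes):
    """Return which storage option can hold an upload of the given size."""
    if size_in_bytes <= DATASTORE_BLOB_LIMIT:
        return 'datastore blob field'
    if size_in_bytes <= BLOBSTORE_LIMIT:
        return 'blobstore'
    return 'too large'
```

For example, a 20MB video would come back as 'blobstore', while a 500KB thumbnail still fits in a regular blob field.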

To subsequently provide the file as a download, the blob key string can be injected as an HTTP header of a blank page response. GAE then detects this header, hooks into the page output and outputs the file contents (up to 50MB downloads).

This can all be tested in development with dev_appserver.py. For production you need to enable billing for the GAE-hosted app; however, you get the first 1GB of file storage for free.

Web2py file uploads currently only work with blob datatypes. I've adapted a project I'm working on using web2py 1.75.4 to successfully use the blobstore API for file uploads. When running in a non-gae environment it falls back to standard web2py file upload behaviour.

This is by no means a generic plug-in yet but I will list the steps I took in case anyone else would like to try this:

  • modify db definition of file upload table

To my existing table I added a blob_key string field:

Field('blob_key',
      readable=False,
      writable=False)

  • Create blank upload method

Once the blobstore completes an upload, it needs to redirect back to a regular web2py url. I set up a new controller file called gae.py with a blank method called 'upload' for this.

  • Modify existing form upload controller code

I had existing controller code to output a form upload in a page as follows:

media_form=SQLFORM(db.my_uploads,fields=['file'])   
if media_form.accepts(request.vars,session):
    response.flash='media file uploaded'
    ...

I modified this code so when running on gae hosting, the form submit will jump to a generated blobstore url:

media_form = SQLFORM(db.my_uploads, fields=['file'])

# Get the blobstore file upload url if on gae
upload_url = ""

if request.env.web2py_runtime_gae:
    from google.appengine.ext import blobstore
    upload_url = blobstore.create_upload_url(URL(r=request, c='gae', f='upload', args=id))

    # Point the form at the generated blobstore url
    media_form['_action'] = upload_url

if media_form.accepts(request.vars, session):
    response.flash = 'media file uploaded'
    ...

blobstore.create_upload_url generates a url for a google-supplied handler to carry out the file upload.

This method takes in as one parameter the web2py url to jump back to after the upload has completed, which I have pointed to my blank upload method.

In my case each file upload record also holds a reference to a related table. For my purposes I pass that reference into the url here as args=id. The upload method will be responsible for actually inserting the file upload record.
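
The round trip described above can be pictured with a small stand-in for the blobstore service. This is only a simulation in plain Python: FakeBlobstore, the '/fake-upload/' prefix and the example paths are all made up for illustration, and nothing here calls the real GAE API.

```python
import uuid


class FakeBlobstore(object):
    """Hypothetical stand-in for the blobstore service (not the GAE API)."""

    def __init__(self):
        self.callbacks = {}   # upload url -> web2py url to return to
        self.blobs = {}       # blob key -> (filename, data)

    def create_upload_url(self, success_path):
        # Hand out a fresh url and remember where to go afterwards.
        upload_url = '/fake-upload/' + uuid.uuid4().hex
        self.callbacks[upload_url] = success_path
        return upload_url

    def handle_post(self, upload_url, filename, data):
        # Store the file, then report the url the request is passed on to.
        blob_key = uuid.uuid4().hex
        self.blobs[blob_key] = (filename, data)
        return self.callbacks[upload_url], blob_key


store = FakeBlobstore()
# The form's _action points at the generated url, not at web2py.
action = store.create_upload_url('/myapp/gae/upload/7')
# Posting the form stores the file and hands control to the callback url.
next_url, blob_key = store.handle_post(action, 'photo.jpg', b'raw bytes')
```

The point of the sketch is the ordering: web2py never sees the raw file, only the follow-up request to the callback url, which is why the upload method has to create the database record itself.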

  • fix web2py unicode issue (Warning: This is a hack)

The web2py framework was crashing for me when the blobstore upload handler tried to invoke my upload URL. What I found was that request.env.PATH_INFO was coming in as a unicode string, which is not usually the case, and this was causing an error later in gluon/main.py, during session.connect.

To fix this I modified my gaehandler.py and added one line to the wsgiapp method, just before the return statement:

def wsgiapp(env, res):
    ...
    env['PATH_INFO'] = str(env['PATH_INFO'])
    return ...
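
If you would rather not edit gaehandler.py in place, the same coercion could in principle be done in a small WSGI wrapper. The sketch below is an assumption about how that might look, not something tested against web2py; the wrapper name is made up. On Python 2 (which this web2py version runs on) str() turns an ASCII-only unicode value into a byte string, while on Python 3 it simply returns a plain str.

```python
def force_native_path_info(app):
    """Wrap a WSGI app so PATH_INFO always arrives as the built-in str type."""
    def wrapped(environ, start_response):
        # Same one-line fix as above, applied before the wrapped app runs.
        environ['PATH_INFO'] = str(environ.get('PATH_INFO', ''))
        return app(environ, start_response)
    return wrapped


def show_path_type(environ, start_response):
    # Tiny demo app: report the type name of PATH_INFO.
    return [type(environ['PATH_INFO']).__name__]


demo_app = force_native_path_info(show_path_type)
```
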
  • implement upload url

This is called by the blobstore API once it has processed and stored the raw upload data. Information about the upload file(s) is available in the page request as mime-encoded data. The blobstore API provides individual methods to access this data and turn it back into blobinfo objects, and it also provides a WSGI request handler base class to do the same thing.

I could not figure out how to get the individual method calls working, so I wrapped a call to the request handler instead. Here is my gae.py controller with upload method:

from google.appengine.api import users
from google.appengine.ext import blobstore
from google.appengine.ext import webapp
from google.appengine.ext.webapp import blobstore_handlers
from google.appengine.ext.webapp.util import run_wsgi_app

#web2py controller, handle gae upload  
def upload():

    #define WSGI request handler for upload 
    class UploadHandler(blobstore_handlers.BlobstoreUploadHandler):
        def post(self):
            upload_files = self.get_uploads('file')
            blob_info = upload_files[0]
            globals()['blob_info'] = blob_info

    #create wsgi application
    application = webapp.WSGIApplication([(request.env.path_info, UploadHandler)],debug=True)
    application(request.wsgi.environ,request.wsgi.start_response)

    blob_info = globals()['blob_info']

    #Create new file upload record 
    db.my_file_uploads.insert(related_table=request.args[0],
                              filename=blob_info.filename,
                              blob_key=blob_info.key())

    session.flash='file uploaded'
    redirect(URL(r=request,c='admin',f='edit',args=request.args))

So UploadHandler is a class derived from the google-supplied blobstore_handlers.BlobstoreUploadHandler. This responds to a web request. The line

application = webapp.WSGIApplication([(request.env.path_info, UploadHandler)],debug=True)

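creates a WSGI application that routes the current request path to UploadHandler. The line after it in the controller calls that application with the current request's WSGI environ, which runs the handler's post method on the incoming upload request.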
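
To make the mechanism concrete, here is a framework-free sketch of the same idea: build a WSGI application that routes one path to a handler, then call it directly with an environ instead of letting a server do it. make_app, upload_handler and the example path are illustrative names only; webapp.WSGIApplication does considerably more than this.

```python
def make_app(routes):
    """Build a WSGI application from a {path: handler} mapping."""
    def app(environ, start_response):
        handler = routes.get(environ['PATH_INFO'])
        if handler is None:
            start_response('404 Not Found', [('Content-Type', 'text/plain')])
            return [b'not found']
        return handler(environ, start_response)
    return app


def upload_handler(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'handled ' + environ['REQUEST_METHOD'].encode('ascii')]


# Stand-in for the environ of the request currently being served.
environ = {'PATH_INFO': '/myapp/gae/upload', 'REQUEST_METHOD': 'POST'}

statuses = []
def start_response(status, headers):
    statuses.append(status)

# Route the current path to the handler, then invoke the application inline.
application = make_app({environ['PATH_INFO']: upload_handler})
body = application(environ, start_response)
```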

Within the handler I retrieve the blobinfo object with details on the file upload. I did not know how to pass this back to the calling method, so as a hack I store this as the global variable blob_info.
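
One possible alternative to the global variable is to let the nested handler class write into a dictionary that lives in the enclosing function. The sketch below shows only that pattern in plain Python; the base class, get_uploads and the sample data are stand-ins, and I have not tried this inside a real BlobstoreUploadHandler.

```python
def upload_sketch(uploaded_files):
    captured = {}  # shared with the nested handler through the closure

    class UploadHandler(object):
        # Stand-in for blobstore_handlers.BlobstoreUploadHandler
        def get_uploads(self, field_name):
            return uploaded_files.get(field_name, [])

        def post(self):
            upload_files = self.get_uploads('file')
            # Store the result where the outer function can read it.
            captured['blob_info'] = upload_files[0]

    # In the article this call is made by the WSGI application.
    UploadHandler().post()
    return captured['blob_info']
```

Because the dictionary is local to each call, two requests handled one after the other cannot see each other's blob_info, which a module-level global does not guarantee.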

Once the WSGI handler completes, back in regular web2py code I save details in my file upload table, then finally redirect to an appropriate url specific to my application.

Just a mention about the database field 'filename' - this is my own text field, separate from the built-in web2py file database type. In a gae installation this field gets set to blob_info.filename as above. For completeness here is how this gets set in the non-gae version:

In non-gae mode, because the form url is not modified, the media_form.accepts line gets the chance to run on the HTTP form post when the file upload is submitted. In this situation I have the following logic to retrieve the filename after successful upload handling:

if media_form.accepts(request.vars,session):

    # look up the record that accepts() just inserted
    new_record = db(db.my_uploads.id==media_form.vars.id).select()[0]
    new_record.update_record(filename = request.vars.file.filename)

  • Implement download url.

When the file upload table is displayed by sqlform and crud methods, the hyperlink for the download field is generated from the definition of the table in db.py. These links will fail with GAE blobstore uploaded files.

In my project I put the following hack in for crud generated urls to work with gae-uploaded files:

  • set the hyperlink generated for this field in crud/sqlform methods to a custom download url

In db.py, to my my_uploads table I added a represent property to my file column definition as follows:

Field('file',
      'upload',
      ...
      represent=lambda file: A('download', _href=URL(r=request, c='gae', f='download', args=file)))

  • when a blobstore file upload is processed, set the file database column to something

Previously the file column would be blank when the website is GAE-hosted, since the blobstore is handling everything. Now I set it to a unique id just so it can be used as a lookup key.

In the upload handler in gae.py I modified my existing:

#Create new file upload record 
db.my_file_uploads.insert(related_table=request.args[0],
                              filename=blob_info.filename,
                              blob_key=blob_info.key())

to

# needs "import uuid" at the top of gae.py
db.my_uploads.upload_field=False
db.my_uploads.insert(related_table=request.args[0],
                     filename=blob_info.filename,
                     blob_key=blob_info.key(),
                     file=str(uuid.uuid4()).replace('-',''))

Running in non-GAE mode, column file will be the standard unique string used by the built-in web2py file upload logic. In gae mode the same field is now a unique id I set with the call to uuid.uuid4()
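
As a side note, the standard library's UUID objects expose a hex attribute that gives the same 32-character string as stripping the dashes by hand. The helper name below is made up for this example.

```python
import uuid


def new_file_token():
    """Return a random 32-character hexadecimal id for the file column."""
    return uuid.uuid4().hex


# Both spellings give the same string for the same UUID object.
sample = uuid.uuid4()
same = (str(sample).replace('-', '') == sample.hex)
```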

  • provide a custom download handler

Thanks to the represent property above, I can funnel all download requests to the following controller method:

(in controller gae.py)

def download():

    # handle non-gae download
    if not request.env.web2py_runtime_gae:
        return response.download(request, db)

    # handle gae download: find the record by its unique file id
    my_uploads = db(db.my_uploads.file == request.args[0]).select()[0]
    blob_info = blobstore.get(my_uploads.blob_key)

    # tell GAE to serve the blob, and tell the browser to save it as a file
    response.headers['X-AppEngine-BlobKey'] = my_uploads.blob_key
    response.headers['Content-Type'] = blob_info.content_type
    response.headers['Content-Disposition'] = "attachment; filename=%s" % blob_info.filename
    return response.body.getvalue()

Examining the non-gae portion first: the request args are just forwarded to the regular response.download method to handle.

In gae mode the file column of this record holds only the unique id set at upload time, which is used here to find the record. The record's blob_key string is the key I can use to look up the blob through the blobstore API. The X-AppEngine-BlobKey response header is set to this key to tell GAE we want the blobstore download helper to kick in for this page response. We also have to manually set the response headers Content-Type and Content-Disposition here so web browsers will show a File Save As dialog and not just dump the binary in the webpage window.
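
The three headers can also be built in one place, which makes it easy to quote the filename so that names containing spaces survive. This is a standalone sketch with a made-up helper name; it only builds a dictionary and does not touch web2py's response object or the blobstore.

```python
def blob_download_headers(blob_key, content_type, filename):
    """Build the response headers that ask GAE to serve a blob as a download."""
    # Escape backslashes and double quotes so the quoted filename stays valid.
    safe_name = filename.replace('\\', '\\\\').replace('"', '\\"')
    return {
        'X-AppEngine-BlobKey': str(blob_key),
        'Content-Type': content_type,
        'Content-Disposition': 'attachment; filename="%s"' % safe_name,
    }


headers = blob_download_headers('abc123', 'image/png', 'my photo.png')
```

In a controller the result could then be applied with response.headers.update(headers), assuming response.headers behaves like a regular dictionary.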

(the blobstore name is available with the 'from google.appengine.ext import blobstore' that already exists at the top of my gae.py file)

I could have implemented this by invoking a BlobstoreDownloadHandler class like I did for BlobstoreUploadHandler, but in this instance the logic was simple enough to avoid this and use native web2py code instead.

Comments (2)

  • cfhowes 14 years ago
    Thanks for the great tutorial. I made some changes when i implemented it to support a table that has more data than just the uploaded image, and i also figured out how to get the blob_info w/o creating a new dummy request. I also made it handle deletion requests and update requests (assuming that the same blob is only referenced in 1 record). Here is my gae.py (there are different names for the tables and fields here; i was hoping to make it generic but have not yet done that):
    
    from google.appengine.ext import blobstore
    from google.appengine.ext import webapp
    from google.appengine.ext.webapp import blobstore_handlers
    from google.appengine.ext.webapp.util import run_wsgi_app
    import uuid
    import logging
    
    #web2py controller, handle gae blobstore upload
    def upload():
        logging.info("in upload")
        logging.info(request.args)
        logging.info(request.vars)
        #@TODO: make the table name and field parameters to pass in
        table_name = 'artwork'
    
        form = SQLFORM(db[table_name])
        if request.args and request.vars.id:
            form = SQLFORM(db[table_name], request.vars.id)
    
        if form.accepts(request.vars, session):
            logging.info("form accepted")
            row = db(db[table_name].id == form.vars.id).select().first()
            if request.vars.preview_image__delete == 'on' or \
                (form.vars.preview_image and row.blob_key):
                #remove from blobstore
                key = row.blob_key
                blobstore.delete(key)
                #remove reference here
                row.update_record(blob_key=None, preview_image=None)
            if form.vars.preview_image:
                logging.info("new image")
                #@TODO: delete old image if replacing.
                blob_info = blobstore.parse_blob_info(form.vars.preview_image)
                row.update_record(preview_image = \
                    table_name+".preview_image."+str(uuid.uuid4()).replace('-','')+".jpg",
                    blob_key = blob_info.key())
            crud.archive(form)
        else:
            logging.info("form not accepted")
            logging.info(form.errors)
            session.flash=BEAUTIFY(form.errors)
            #there was an error, let's delete the newly uploaded image
            if request.vars.preview_image != None:
                logging.info("deleting new image")
                blob_info = blobstore.parse_blob_info(request.vars.preview_image)
                blobstore.delete(blob_info.key())
    
    
        logging.info("all done")
        #Raise the HTTP exception so that the response content stays empty.  calling
        #redirect puts content in the body which fails the blob upload
        raise HTTP(303,
                   Location=URL(r=request,c='form',f='artwork',args=request.vars.id or []))
    
    
    def download():
    
        #handle non-gae download
        if not request.env.web2py_runtime_gae or not request.args[0]:
            return response.download(request,db)
    
        #handle gae download
        my_uploads=db(db.artwork.preview_image==request.args[0]).select()[0]
        if not my_uploads.blob_key:
            return None
        blob_info = blobstore.get(my_uploads.blob_key)
    
        response.headers['X-AppEngine-BlobKey'] = my_uploads.blob_key;
        response.headers['Content-Type'] = blob_info.content_type;
        response.headers['Content-Disposition'] = "attachment; filename=%s" % blob_info.filename
        return response.body.getvalue()
    
    

  • cfhowes 14 years ago
    So i did some more work, and learned some more things:
    - i think in the above there are some errors
    - it seems that you have to parse the field into the blob_info object before you call form.accepts. at least when i made some changes to my code today it was broken until i moved that.
    - i also created a single function that self-posts for the image work, i've included that below. I learned that when i wanted to customize the form with a sub-set of fields my function above was not quite generic enough.
    Here is my custom form function (lightly tested, so there might be some bugs):
    @auth.requires_login()
    def upload_art():
      """
      This is where an artist uploads a work of art.
      """
      form = SQLFORM(db.artwork,
                     fields=['title',
                             'type',
                             'completed_date',
                             'image'])
    
      if request.env.web2py_runtime_gae:
        from google.appengine.ext import blobstore
        import uuid
        #get the blob_info.  NOTE this MUST be done before any other operations on
        # the request vars.  otherwise something modifies them (perhaps the form
        # validators) in a way that makes this not work
        blob_info = None
        if request.vars.image != None:
            blob_info = blobstore.parse_blob_info(request.vars.image)
    
        upload_url = blobstore.create_upload_url(URL(r=request,f='upload_art',
                                                     args=request.args))
    
        form['_action']=upload_url
        if form.accepts(request.vars,session, formname="artworkform"):
            #@TODO: can this blob-key update be a post-validation function?
            #get the record we just inserted/modified
            row = db(db.artwork.id == form.vars.id).select().first()
            if request.vars.image__delete == 'on' or \
                (form.vars.image != None and (row and row.blob_key)):
                #remove from blobstore because of delete or update of image
                key = row.blob_key
                blobstore.delete(key)
                #remove reference in the artwork record
                row.update_record(blob_key=None, image=None)
            if form.vars.image != None:
                #add reference to image in this record
                row.update_record(image = \
                    "artwork.image."+str(uuid.uuid4()).replace('-','')+".jpg",
                    blob_key = blob_info.key())
            crud.archive(form)
            #Raise the HTTP exception so that the response content stays empty.
            #calling redirect puts content in the body which fails the blob upload
            raise HTTP(303,
                       Location= URL(r=request,f='index'))
        elif form.errors:
            #logging.info("form not accepted")
            logging.info(form.errors)
            session.flash=BEAUTIFY(form.errors)
            #there was an error, let's delete the newly uploaded image
            if request.vars.image != None:
                blobstore.delete(blob_info.key())
            #Raise the HTTP exception so that the response content stays empty.
            #calling redirect puts content in the body which fails the blob upload
            raise HTTP(303,
                       Location= URL(r=request,f='upload_art'))
    
      return dict(form=form)
    
