diff --git a/_posts/2016-01-25-azure-storage-sas.md b/_posts/2016-01-25-azure-storage-sas.md new file mode 100644 index 0000000..b44bf1c --- /dev/null +++ b/_posts/2016-01-25-azure-storage-sas.md @@ -0,0 +1,120 @@ +--- +layout: post +title: Shared Access Signatures with Azure Storage +subtitle: How to replace your more or less secure ftp server. +description: Shared Access Signatures with Azure Storage +category: general +author: [Martin Danielsson](https://github.com/DonMartin76) +author_email: martin.danielsson@haufe-lexware.com +--- + +Continuing our API journey, we're currently designing an API for one of our most valuable assets: Our content, such as law texts and commentaries. Let's call this project the "Content Hub". The API will eventually consist of different sub-APIs: content search, retrieval and ingestion ("upload"). This blog post will shed some light on how we will support bulk ingestion (uploading) of documents into our content hub using Azure out of the box technology: [Azure Storage SAS - Shared Access Signatures](https://azure.microsoft.com/en-us/documentation/articles/storage-dotnet-shared-access-signature-part-1/). + +### Problem description + +In order to create new content, our API needs a means to upload content into the Content Hub, both single documents and bulk ZIP files, which for example correspond to updated products (blocks of content). Ingesting single documents via an API is less a problem (and not covered in this blog post), but supporting large size ZIP files (up to 2 GB and even larger) is a different story, for various reasons: + +* Large http transfers need to be supported by all layers of the web application stack (chunked transfer), which potentially introduces additional complexity +* Transferring large files is a rather difficult problem we do not want to solve on our own (again) +* Most SaaS API gateways (such as [Azure API Management](https://azure.microsoft.com/en-us/services/api-management/)) have traffic limits on their API gateways + +### First approach: Setting up an sftp server + +Our first architectural solution approach was to set up an (s)ftp server for use with the bulk ingestion API. From a high level perspective, this looks like a valid solution: We make use of existing technology which is known to work well. When detailing the solution, we found a series of caveats which made us look a little further: + +* Providing secure access to the sftp server requires dynamic creation of users; this would also - to make the API developer experience smooth - have to be integrated into the API provisioning process (via an API portal) +* Likewise: After a document upload has succeeded, we would want to revoke the API client's rights on the sftp server to avoid misuse/abuse +* Setting up an (s)ftp server requires an additional VM, which introduces operations efforts, e.g. for OS patching and monitoring. +* The sftp server has to provide reasonable storage space for multiple ZIP ingest processes, and this storage would have to be provided up front and paid for subsequently. + +### Second approach: Enter Azure Storage + +Knowing we will most probably host our content hub on the [Microsoft Azure](https://azure.microsoft.com) cloud, looks immediately went to services offered via Azure, and in our case we we took a closer look at [Azure Storage](https://azure.microsoft.com/en-us/services/storage/). The immediate benefits of having such a storage as a SaaS offering are striking: + +* You only pay for the storage you actually use, and the cost is rather low +* Storage capacity is unlimited for most use cases (where you don't actually need multiple TB of storage), and for our use case (document ingest) more than sufficient +* Storage capacity is does not need to be defined up front, but adapts automatically as you upload more files ("blobs") into the storage +* Azure Storage has a large variety of SDKs for use with it, all leveraging a standard REST API (which you can also use fairly simple) + +Remains the question of securing the access to the storage, which was one of the main reasons why an ftp server seemed like a less than optimal idea. + +### Accessing Azure Storage + +Accessing an Azure Storage usually involves passing a storage identifier and an access key (the application secret), which in turn grants full access to the storage. Having an API client have access to these secrets is obviously a security risk, and as such not advisable. Similarly to the ftp server approach, it would in principle be possible to create multiple users/roles which have limited access to the storage, but this is also an additional administrational effort, and/or an extra implementation effort to make it automatic. + +### Azure Storage Shared Access Signatures + +Luckily, Azure already provides a means of anonymous and restricted access to storages using a technique which is know e.g. from JWT tokens: Signed access tokens with a limited time span, a.k.a. "Shared Access Signatures" ("SAS"). These SAS tokens actually match our requirements: + +* Using a SAS, you can limit access to the storage to either a "container" (similar to a folder/directory) or a specific "blob" (similar to a file) +* The SAS only has a limited validity which you can define freely, e.g. from "now" to "now plus 30 minutes"; after the validity of the token has expired, the storage can no longer be accessed +* Using an Azure Storage SDK, creating SAS URLs is extremely simple. Tokens are created without Storage API interaction, simply by *signing* the URL with the application secret key. This in turn can be validated by Azure Storage (which obviously also has the secret key). + +We leverage the SAS feature to explicitly grant **write** access to one single blob (file) on the storage for which we define the file name. The access is granted for 60 minutes (one hour), which is enough to transfer large scale files. Our Contant API exposes an end point which returns an URL containing the SAS token which can immediately be used to do a `PUT` to the storage. + +