Following is a list of examples to help you learn how to use the Azure Storage extension.
Create an Azure Storage account and populate it with data
- Create an Azure Storage account.
To create an Azure Storage account, if you don't have one already, customize the values of <resource_group>, <___location>, <account_name>, and <container_name>, and run the following Azure CLI command:

random_suffix=$(tr -dc 'a-z0-9' </dev/urandom | head -c8)
resource_group="resource-group-$random_suffix"
___location="eastus2"
storage_account="storageaccount$random_suffix"
blob_container="container-$random_suffix"
az group create --name $resource_group --___location $___location
az storage account create --resource-group $resource_group --name $storage_account --___location $___location --sku Standard_LRS --kind BlobStorage --public-network-access enabled --access-tier hot
echo "Take note of the storage account name, which you'll have to replace in subsequent examples, whenever you find a reference to <account_name>:"
echo $storage_account
echo "Take note of the container name, which you'll have to replace in subsequent examples, whenever you find a reference to <container_name>:"
echo $blob_container

- Create a blob container.
To create the blob container, run the following Azure CLI command:
az storage container create --account-name $storage_account --name $blob_container -o tsv

- Fetch one of the two access keys assigned to the storage account. Make sure you copy the value of your access_key, as you need to pass it as an argument to azure_storage.account_add in a subsequent step.
To fetch the first of the two access keys, run the following Azure CLI command:
access_key=$(az storage account keys list --resource-group $resource_group --account-name $storage_account --query [0].value)
echo "Following is the value of your access key:"
echo $access_key

- Download the files with the data set that is used during the examples, and upload them to your blob container.
To download the files with the data set and upload them to the blob container, run the following Azure CLI commands:
mkdir --parents azure_storage_examples
cd azure_storage_examples
curl -L -O https://github.com/Azure-Samples/azure-postgresql-storage-extension/raw/main/storage_extension_sample.parquet
az storage blob upload-batch --account-name $storage_account --destination $blob_container --source . --pattern "storage_extension_sample.parquet" --account-key $access_key --overwrite --output none --only-show-errors
curl -L -O https://github.com/Azure-Samples/azure-postgresql-storage-extension/raw/main/parquet_without_extension
az storage blob upload-batch --account-name $storage_account --destination $blob_container --source . --pattern "parquet_without_extension" --account-key $access_key --overwrite --output none --only-show-errors
curl -L -O https://github.com/Azure-Samples/azure-postgresql-storage-extension/raw/main/storage_extension_sample.csv
az storage blob upload-batch --account-name $storage_account --destination $blob_container --source . --pattern "storage_extension_sample.csv" --account-key $access_key --overwrite --output none --only-show-errors
curl -L -O https://github.com/Azure-Samples/azure-postgresql-storage-extension/raw/main/csv_without_extension
az storage blob upload-batch --account-name $storage_account --destination $blob_container --source . --pattern "csv_without_extension" --account-key $access_key --overwrite --output none --only-show-errors
Note
You can list containers or the blobs stored in them for a specific storage account, but only if your PostgreSQL user or role is granted permission on the reference to that storage account by using azure_storage.account_user_add. Members of the azure_storage_admin role are granted this privilege over all Azure Storage accounts that have been added using azure_storage.account_add. By default, only members of azure_pg_admin are granted the azure_storage_admin role.
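If you want to check which users or roles currently hold the azure_storage_admin role, you can query the standard PostgreSQL catalogs; the following sketch uses built-in system views and isn't specific to the extension:

-- List all members of the azure_storage_admin role (standard catalog query).
SELECT m.rolname AS member
FROM pg_auth_members am
JOIN pg_roles g ON g.oid = am.roleid  -- the role that was granted
JOIN pg_roles m ON m.oid = am.member  -- the user or role that holds it
WHERE g.rolname = 'azure_storage_admin';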
Create a table in which data is loaded
Let's create the table into which we import the contents of the files that we uploaded to the storage account. To do so, connect to your instance of Azure Database for PostgreSQL flexible server using PostgreSQL for Visual Studio Code (Preview), psql, pgAdmin, or the client of your preference, and execute the following statement:
CREATE TABLE IF NOT EXISTS sample_data (
id BIGINT PRIMARY KEY,
sample_text TEXT,
sample_integer INTEGER,
sample_timestamp TIMESTAMP
);
Prepare the extension for usage
Before proceeding, make sure that you allowlist the azure_storage extension on your server and create it in your database.
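For reference, once the extension is allowlisted through the azure.extensions server parameter, you enable it in a database with a standard CREATE EXTENSION statement. A minimal sketch, assuming you have the required privileges:

CREATE EXTENSION IF NOT EXISTS azure_storage;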
Add access key of storage account
This example illustrates how to add a reference to a storage account, together with the access key of that storage account, which is required to access its content via the functionality provided by the azure_storage extension in your instance of Azure Database for PostgreSQL flexible server.
<account_name> must be set to the name of your storage account. If you used the previous scripts, this value should match whatever value you set to the storage_account environment variable in those scripts.
Similarly, <access_key> must be set to the value you fetched from your storage account.
SELECT azure_storage.account_add('<account_name>', '<access_key>');
Tip
If you want to retrieve the storage account name and one of its access keys from the Azure portal, search for your storage account, select Access keys in the resource menu, copy the Storage account name, and copy the Key from the key1 section (you have to select Show next to the key first).
Grant access to a user or role on the Azure Blob storage reference
This example illustrates how to grant access to a user or role named <regular_user>, so that this PostgreSQL user can use the azure_storage extension to access the blobs stored in containers hosted by the referenced Azure storage account.
<account_name> must be set to the name of your storage account. If you used the previous scripts, this value should match whatever value you set to the storage_account environment variable in those scripts.
<regular_user> must be set to the name of an existing user or role.
SELECT * FROM azure_storage.account_user_add('<account_name>', '<regular_user>');
List all blobs in a container
This example illustrates how to list all existing blobs inside container <container_name> of storage account <account_name>.
<account_name> must be set to the name of your storage account. If you used the previous scripts, this value should match whatever value you set to the storage_account environment variable in those scripts.
<container_name> must be set to the name of your blob container. If you used the previous scripts, this value should match whatever value you set to the blob_container environment variable in those scripts.
SELECT * FROM azure_storage.blob_list('<account_name>','<container_name>');
List blobs with a specific name prefix
This example illustrates how to list all existing blobs inside container <container_name> of storage account <account_name>, whose blob name begins with <blob_name_prefix>.
<account_name> must be set to the name of your storage account. If you used the previous scripts, this value should match whatever value you set to the storage_account environment variable in those scripts.
<container_name> must be set to the name of your blob container. If you used the previous scripts, this value should match whatever value you set to the blob_container environment variable in those scripts.
<blob_name_prefix> must be set to the prefix that the names of the enumerated blobs should begin with. To return all blobs, set this parameter to an empty string, or omit it entirely, in which case the value defaults to an empty string.
SELECT * FROM azure_storage.blob_list('<account_name>','<container_name>','<blob_name_prefix>');
Alternatively, you can use the following syntax:
SELECT * FROM azure_storage.blob_list('<account_name>','<container_name>') WHERE path LIKE '<blob_name_prefix>%';
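For example, with the sample files uploaded earlier, the following call lists only the blobs whose names begin with storage_extension_sample:

SELECT * FROM azure_storage.blob_list('<account_name>','<container_name>','storage_extension_sample');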
Import data using a COPY FROM statement
The following example shows the import of data from a blob called storage_extension_sample.parquet that resides in the blob container <container_name> in the Azure Storage account <account_name>, via the COPY command:
Create a table that matches the schema of the source file:
CREATE TABLE IF NOT EXISTS sample_data (
    id BIGINT PRIMARY KEY,
    sample_text TEXT,
    sample_integer INTEGER,
    sample_timestamp TIMESTAMP
);

Use a COPY statement to copy data into the target table. The format is inferred as Parquet from the file extension.

TRUNCATE TABLE sample_data;
COPY sample_data FROM 'https://<account_name>.blob.core.windows.net/<container_name>/storage_extension_sample.parquet';

Use a COPY statement to copy data into the target table. Because the encoding format can't be inferred from the file extension, it's explicitly specified via the FORMAT option.

TRUNCATE TABLE sample_data;
COPY sample_data FROM 'https://<account_name>.blob.core.windows.net/<container_name>/parquet_without_extension' WITH (FORMAT 'parquet');

Use a COPY statement to copy data into the target table. The encoding format can be inferred from the file extension, but the presence of column headers in the first row needs to be explicitly configured via the HEADER option.

TRUNCATE TABLE sample_data;
COPY sample_data FROM 'https://<account_name>.blob.core.windows.net/<container_name>/storage_extension_sample.csv' WITH (HEADER);

Execute the following SELECT statement to confirm that the data was loaded into the table.

SELECT * FROM sample_data LIMIT 100;
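As an additional sanity check, you can count the rows that were imported:

SELECT COUNT(*) FROM sample_data;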
Export data using a COPY TO statement
The following examples show how to export data from a table called sample_data to multiple blobs with different names and characteristics, such as their encoding format, all of which reside in the blob container <container_name> in the Azure Storage account <account_name>, via the COPY command:
Create the table that serves as the source of the data:
CREATE TABLE IF NOT EXISTS sample_data (
    id BIGINT PRIMARY KEY,
    sample_text TEXT,
    sample_integer INTEGER,
    sample_timestamp TIMESTAMP
);

Load data into the table. Either run INSERT statements to populate it with several synthetic rows, or use the Import data using a COPY FROM statement example to populate it with the contents of the sample data set.

Use a COPY statement to copy data out of the target table. Specify that the encoding format must be Parquet.

COPY sample_data TO 'https://<account_name>.blob.core.windows.net/<container_name>/storage_extension_sample_exported.parquet' WITH (FORMAT 'parquet');

Use a COPY statement to copy data out of the target table. Specify that the encoding format must be CSV, and that the first row of the resulting file must contain the column headers.

COPY sample_data TO 'https://<account_name>.blob.core.windows.net/<container_name>/storage_extension_sample_exported.csv' WITH (FORMAT 'csv', HEADER);

Execute the following SELECT statement to confirm that the blobs exist in the storage account.

SELECT * FROM azure_storage.blob_list('<account_name>','<container_name>') WHERE path LIKE 'storage_extension_sample_exported%';
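To spot-check the exported data without loading it back into a table, you can read the exported Parquet blob with blob_get, which is introduced in the next example:

SELECT * FROM azure_storage.blob_get('<account_name>','<container_name>','storage_extension_sample_exported.parquet', NULL::sample_data) LIMIT 5;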
Read content from a blob
The blob_get function retrieves the contents of one specific blob in the referenced container <container_name> of the storage account <account_name>. In order for blob_get to know how to parse the data, you can pass a value in the form NULL::table_name, where table_name refers to a table whose schema matches that of the blob being read. In this example, it refers to the sample_data table we created at the very beginning.
<account_name> must be set to the name of your storage account. If you used the previous scripts, this value should match whatever value you set to the storage_account environment variable in those scripts.
<container_name> must be set to the name of your blob container. If you used the previous scripts, this value should match whatever value you set to the blob_container environment variable in those scripts.
<blob_name> should be set to the full path of the blob whose contents you want to read.
In this case, the decoder that must be used to parse the blob is inferred from the .parquet file extension.
SELECT * FROM azure_storage.blob_get
('<account_name>'
,'<container_name>'
,'storage_extension_sample.parquet'
, NULL::sample_data)
LIMIT 5;
Alternatively, you can explicitly define the schema of the result using the AS clause after the blob_get function.
SELECT * FROM azure_storage.blob_get('<account_name>','<container_name>','storage_extension_sample.parquet')
AS res (
id BIGINT,
sample_text TEXT,
sample_integer INTEGER,
sample_timestamp TIMESTAMP)
LIMIT 5;
Read, filter, and modify content read from a blob
This example illustrates how to filter and modify the content imported from the blob before loading it into a SQL table.
<account_name> must be set to the name of your storage account. If you used the previous scripts, this value should match whatever value you set to the storage_account environment variable in those scripts.
<container_name> must be set to the name of your blob container. If you used the previous scripts, this value should match whatever value you set to the blob_container environment variable in those scripts.
SELECT concat('P-',id::text) FROM azure_storage.blob_get
('<account_name>'
,'<container_name>'
,'storage_extension_sample.parquet'
, NULL::sample_data)
WHERE sample_integer=780
LIMIT 5;
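The example above only returns the transformed values. To actually load them into a SQL table, wrap the same query in an INSERT ... SELECT statement. The following is a minimal sketch; the target table transformed_data is hypothetical and only illustrates the pattern:

-- Hypothetical target table for the transformed values.
CREATE TABLE IF NOT EXISTS transformed_data (tag TEXT);

INSERT INTO transformed_data (tag)
SELECT concat('P-', id::text)
FROM azure_storage.blob_get
    ('<account_name>'
    ,'<container_name>'
    ,'storage_extension_sample.parquet'
    , NULL::sample_data)
WHERE sample_integer = 780;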
Read content from file with custom options (headers, column delimiters, escape characters)
This example illustrates how you can use custom separators and escape characters, by passing the result of options_csv_get to the options argument.
<account_name> must be set to the name of your storage account. If you used the previous scripts, this value should match whatever value you set to the storage_account environment variable in those scripts.
<container_name> must be set to the name of your blob container. If you used the previous scripts, this value should match whatever value you set to the blob_container environment variable in those scripts.
SELECT * FROM azure_storage.blob_get
('<account_name>'
,'<container_name>'
,'storage_extension_sample.csv'
,NULL::sample_data
,options := azure_storage.options_csv_get(header := 'true')
);
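The section title also mentions column delimiters and escape characters. Assuming options_csv_get accepts delimiter, quote, and escape parameters in addition to header (verify the exact signature in your environment, for example with \df azure_storage.options_csv_get in psql), a call that overrides those settings could look like this sketch:

SELECT * FROM azure_storage.blob_get
('<account_name>'
,'<container_name>'
,'storage_extension_sample.csv'
,NULL::sample_data
-- delimiter, quote, and escape are assumed parameter names; check your version's signature
,options := azure_storage.options_csv_get(header := 'true', delimiter := ',', quote := '"', escape := '"')
);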
Use the decoder option
This example illustrates the use of the decoder option. When the decoder option isn't present, it's inferred from the extension of the file. But when the file name doesn't have an extension, or when that extension doesn't correspond to the decoder that must be used to properly parse the contents of the file, you can explicitly pass the decoder argument.
<account_name> must be set to the name of your storage account. If you used the previous scripts, this value should match whatever value you set to the storage_account environment variable in those scripts.
<container_name> must be set to the name of your blob container. If you used the previous scripts, this value should match whatever value you set to the blob_container environment variable in those scripts.
SELECT * FROM azure_storage.blob_get
('<account_name>'
,'<container_name>'
,'parquet_without_extension'
, NULL::sample_data
, decoder := 'parquet')
LIMIT 5;
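The same approach applies to the CSV file that was uploaded without an extension earlier in this article; assuming 'csv' is an accepted decoder value (mirroring the formats used throughout these examples), the call looks like this:

SELECT * FROM azure_storage.blob_get
('<account_name>'
,'<container_name>'
,'csv_without_extension'
, NULL::sample_data
, decoder := 'csv')
LIMIT 5;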
Compute aggregations over the content of a blob
This example illustrates how you can compute aggregations over information that is stored in a blob container, without the need to import the contents of the blob into PostgreSQL tables.
<account_name> must be set to the name of your storage account. If you used the previous scripts, this value should match whatever value you set to the storage_account environment variable in those scripts.
<container_name> must be set to the name of your blob container. If you used the previous scripts, this value should match whatever value you set to the blob_container environment variable in those scripts.
SELECT sample_integer, COUNT(*) FROM azure_storage.blob_get
('<account_name>'
,'<container_name>'
,'storage_extension_sample.parquet'
, NULL::sample_data)
GROUP BY sample_integer
ORDER BY 2 DESC
LIMIT 5;
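Any other aggregate works the same way. For example, the following query computes the earliest and latest timestamps in the sample data set, again without importing it:

SELECT MIN(sample_timestamp) AS first_seen
     , MAX(sample_timestamp) AS last_seen
FROM azure_storage.blob_get
('<account_name>'
,'<container_name>'
,'storage_extension_sample.parquet'
, NULL::sample_data);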
Write content to a blob
The blob_put function composes the contents of one specific blob (sample_data_copy.parquet in this case), and uploads it to the referenced container <container_name> of the storage account <account_name>. This example constructs a set of five rows from the sample_data table, which are then passed to the blob_put aggregate function, which uploads them as a blob named sample_data_copy.parquet.
<account_name> must be set to the name of your storage account. If you used the previous scripts, this value should match whatever value you set to the storage_account environment variable in those scripts.
<container_name> must be set to the name of your blob container. If you used the previous scripts, this value should match whatever value you set to the blob_container environment variable in those scripts.
The encoding format is inferred as Parquet, based on the .parquet file extension.
SELECT azure_storage.blob_put
('<account_name>'
,'<container_name>'
,'sample_data_copy.parquet'
, top_5_sample_data)
FROM (SELECT * FROM sample_data LIMIT 5) AS top_5_sample_data;
The encoding format is inferred as CSV, based on the .csv file extension.
SELECT azure_storage.blob_put
('<account_name>'
,'<container_name>'
,'sample_data_copy.csv'
, top_5_sample_data)
FROM (SELECT * FROM sample_data LIMIT 5) AS top_5_sample_data;
The encoding format can't be inferred because the file doesn't have an extension, so it's explicitly configured as Parquet. Also, the compression algorithm is set to zstd.
SELECT azure_storage.blob_put
('<account_name>'
,'<container_name>'
,'sample_parquet_data_copy_without_extension_with_zstd_compression'
, top_5_sample_data
,'parquet'
,'zstd')
FROM (SELECT * FROM sample_data LIMIT 5) AS top_5_sample_data;
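To confirm that the three blobs were written, list them with blob_list:

SELECT * FROM azure_storage.blob_list('<account_name>','<container_name>')
WHERE path LIKE 'sample_data_copy%' OR path LIKE 'sample_parquet_data_copy%';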
List all the references to Azure storage accounts
This example illustrates how to find out which Azure storage accounts the azure_storage extension can reference in this database. It also shows the type of authentication used to access each storage account, and which users or roles are granted permission, via the azure_storage.account_user_add function, to access that Azure storage account through the functionality provided by the extension.
SELECT * FROM azure_storage.account_list();
Revoke access from a user or role on the Azure Blob storage reference
This example illustrates how to revoke access from a user or role named <regular_user>, so that this PostgreSQL user can't use the azure_storage extension to access the blobs stored in containers hosted by the referenced Azure storage account.
<account_name> must be set to the name of your storage account. If you used the previous scripts, this value should match whatever value you set to the storage_account environment variable in those scripts.
<regular_user> must be set to the name of an existing user or role.
SELECT * FROM azure_storage.account_user_remove('<account_name>', '<regular_user>');
Remove reference to storage account
This example illustrates how to remove any reference to a storage account, so that no user in the current database can use the azure_storage extension functionality to access that storage account.
<account_name> must be set to the name of your storage account. If you used the previous scripts, this value should match whatever value you set to the storage_account environment variable in those scripts.
SELECT azure_storage.account_remove('<account_name>');