Python-SQL Interaction (pyodbc): Azure ML Studio

The following tutorial illustrates how to use Microsoft Azure to create a database and execute SQL commands using Python. Specifically, this tutorial is divided into the following two segments:

  1. Configure Azure by setting up a storage account and SQL database, and launch the Machine Learning Workspace
  2. Configure a Jupyter Notebook to interact with the Azure database and execute SQL commands through Python

Microsoft Azure is a cornerstone of the cloud computing services available today, and it has become a formidable commercial competitor to Amazon Web Services. For instance, a reported 80 percent of the world’s banking firms currently use Azure as their cloud computing platform of choice. The capabilities of Azure are far-reaching: while on the development side its main uses centre around building and deploying applications, the Azure Machine Learning Studio caters very well to users looking for cloud analytics tools.

While the capabilities of Azure are very broad (and I intend to cover them more fully in future tutorials), I wanted to start off by exploring how Python – a key language in data science – can be used to interact with a SQL database on the Azure platform. Given that an Azure deployment involves a storage account alongside the SQL database, configuring Python to interact with the database is also more involved than working with a local database would be.

Azure Configuration

The first thing we will do is create a storage account, as this is where output generated from the Machine Learning Workspace (e.g. tables and files) will be stored.

We then need to create a SQL database which we will then interact with inside the Machine Learning Studio:

[Image: Azure configuration]

Now that we have created our storage account and SQL database, we can go ahead and open up the Machine Learning Workspace, entering a Workspace Name upon doing so. Once the workspace has been named (I have named mine esquire, as below), we select All Resources, select esquire (or whatever name you have assigned to the workspace), and then select Launch Machine Learning Studio:

[Image: Machine Learning Studio]

Once we have opened the workspace, select Notebook → Python 2. This opens a Jupyter Notebook, which we will use to write our Python code.

Python Script

There are a variety of Python libraries that could be used to interact with a SQL database, including PyMySQL, but I chose the pyodbc library here because, from personal experience, it is the one that tends to work best across cloud environments.

You will note that we are importing a variety of libraries below, such as numpy (the main package for scientific computing, useful should we wish to extract data from the database into Python). The most essential imports, however, are pyodbc (which allows the Machine Learning Studio to connect to the SQL database), tables (the PyTables library, for managing table-structured datasets on the Python side), and the BlobService class from azure.storage.blob (which wraps the Blob storage REST API and provides access to the storage account hosting the generated data).

We start off our Python code by importing the following libraries:

import pandas as pd
from pandas import Series, DataFrame
import numpy as np
import matplotlib.pyplot as plt
import time                                   # timing utilities
import pyodbc                                 # ODBC connectivity to the Azure SQL database
import os
import tables                                 # PyTables, for table-structured data in Python
from azure.storage.blob import BlobService    # legacy storage SDK client for the storage account
import urllib2

Then we need to specify the server and database names, as well as input the appropriate user ID, password and database (DB) driver:

SERVER_NAME   = 'yourname.database.windows.net'
DATABASE_NAME = 'yourdatabase'
USERID        = 'userid'
PASSWORD      = 'password'
DB_DRIVER     = 'SQL Server Native Client 11.0'

Once the appropriate credentials have been entered, we then specify the storage account name and the storage account key:

STORAGEACCOUNTNAME = "accountstorage"
STORAGEKEY = "storagekey"

To obtain the storage key (to be inputted into the code above), go back to the All Resources tab in Azure, select the storage account, navigate to Settings → Access Keys, and copy the appropriate access key:

[Image: Access keys]
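
Although this tutorial does not write any files out to the storage account, the BlobService import is what we would use to do so. Here is a minimal sketch, assuming the legacy azure-storage SDK (the one that exposes BlobService) and a hypothetical container named outputs:

# Hypothetical example: push a small text output into the storage account.
blob_service = BlobService(account_name=STORAGEACCOUNTNAME, account_key=STORAGEKEY)
blob_service.create_container('outputs')  # 'outputs' is an assumed container name
blob_service.put_block_blob_from_text('outputs', 'results.txt', 'sample output')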

Once that is done, the driver, server, user ID and password for the database are specified as the components of our connection string:

driver = 'DRIVER={' + DB_DRIVER + '}'
server = 'SERVER=' + SERVER_NAME 
database = 'DATABASE=' + DATABASE_NAME
uid = 'UID=' + USERID 
pwd = 'PWD=' + PASSWORD
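
For reference, once these pieces are joined together (as in the next code block), the printed connection string will take the following form:

DRIVER={SQL Server Native Client 11.0};SERVER=yourname.database.windows.net;DATABASE=yourdatabase;UID=userid;PWD=password;Encrypt=yes;TrustServerCertificate=no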

Now that we have configured the Jupyter Notebook to interact successfully with our Azure SQL database, we can go ahead and run some SQL from Python using pyodbc.connect and the cursor it provides.

For this particular example, we will use Python to:

  1. Create a new table titled sample_table
  2. Insert new data into our table
  3. Run queries using the LIKE, DELETE, and CASE operators

CONNECTION_STRING = ';'.join([driver, server, database, uid, pwd, 'Encrypt=yes;TrustServerCertificate=no'])
print CONNECTION_STRING
con = pyodbc.connect(CONNECTION_STRING)
cur = con.cursor()

command1 = "CREATE TABLE sample_table(ID varchar(255), Variable1 decimal(18,5), Variable2 decimal(18,5), Variable3 decimal(18,5), Variable4 decimal(18,5), Variable5 decimal(18,5), Variable6 decimal(18,5), Variable7 decimal(18,5), Variable8 decimal(18,5), Variable9 decimal(18,5), Variable10 decimal(18,5))"

command2 = "INSERT INTO sample_table VALUES('SADLA', '13.11', '3.44', '0.78', '18.65', '6.42', '30.78', '433585899', '2026413087', '603175377', '244785459'), ('SLDWE', '-3.54', '2.75', '0.88', '8.48', '13.89', '-7.88', '244254641', '3128993268', '642955083', '390710702'), ('SPQPO', '-6.55', '2.13', '0.14', '25.66', '15.31', '-9.14', '677650559', '2328483810', '456055649', '321279232'), ('STOWG', '-14.06', '2.85', '0.61', '10.94', '8.62', '11.61', '475063755', '3283736400', '795785580', '385985579'), ('WOEPQ', '0.03', '3.13', '1.84', '22.84', '9.23', '21.84', '410089090', '3591560399', '160777388', '218052057'), ('WOQPP', '-2.07', '2.64', '0.59', '19.1', '8.47', '10.59', '626769580', '4357602324', '680939681', '210111931'), ('WOWOL', '18.87', '0.18', '0.21', '24.29', '12.86', '-20.21', '167320788', '2665596994', '707040194', '337396151'), ('WWIOP', '-2.25', '3.31', '0.69', '13.13', '13.7', '-2.69', '382902244', '1469725591', '624286274', '226692854'), ('ZZXAAZ', '0.77', '3.5', '2.9', '12.94', '7.26', '-6.9', '398499039', '3379796323', '432637933', '309340545')"

command3 = "SELECT * FROM sample_table WHERE ID LIKE 'S%'"

command4 = "DELETE FROM sample_table WHERE ID='SADLA'"

command5 = "SELECT ID, Variable2, CASE WHEN Variable2 >= 2.5 THEN 'Variable2 at or above 2.5' ELSE 'Variable2 below 2.5' END AS Variable2Band FROM sample_table"
cur.execute(command1)   # create the table
cur.execute(command2)   # insert the sample rows
cur.execute(command3)   # LIKE query (results shown below)
cur.execute(command4)   # delete the SADLA row
cur.execute(command5)   # CASE query
con.commit()            # commit the changes to the database

Note that we are using cur.execute to carry out each of the commands defined above (command1, command2, et al.) and then con.commit to commit the changes made through the cursor to the database.

Using the above commands, we are 1) creating a table (sample_table) in our database, and 2) using related SQL queries through Python to manipulate the data in that table.
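
As an aside, when the values being inserted come from user input or an external feed, it is safer to let pyodbc bind the parameters than to build the SQL string by hand. A minimal sketch, using a hypothetical row and an explicit column list:

# Parameterized insert: pyodbc substitutes the ? placeholders safely.
cur.execute("INSERT INTO sample_table (ID, Variable1) VALUES (?, ?)", ('QQTST', 5.25))
con.commit()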

Using the LIKE command as a case in point, suppose that we wish to screen for IDs in our table beginning with the letter S. Running the LIKE query (command3 above) yields the following table:

[Image: LIKE query results]
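
The screenshot above comes from the database side; to view the same rows inside the notebook itself, we can re-run the query and fetch the results through the cursor:

# Fetch the LIKE results back into Python and print each row.
cur.execute("SELECT * FROM sample_table WHERE ID LIKE 'S%'")
for row in cur.fetchall():
    print(row)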

The above are just a few of the commands that can be run against a SQL database. Of course, one of the big draws of using Python with a SQL database is the ability not only to make direct edits to the database, but also to import data into Python for more robust analysis (another aspect that I intend to cover in future tutorials).
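
As a small taste of that, pandas can pull a query result straight into a DataFrame over the existing pyodbc connection; a minimal sketch:

# Load the table into a pandas DataFrame for analysis in Python.
df = pd.read_sql("SELECT * FROM sample_table", con)
print(df.describe())  # summary statistics for the numeric columns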

In any case, once we have executed the Python code, we can navigate to our SQL database in the portal and export it to a BACPAC file if we wish:

[Image: SQL database export]

Conclusion

As mentioned, the draw of the Azure Machine Learning Studio lies not only in access to languages such as Python and R, but also in the ability to integrate these tools (in this case, Python and SQL) within a single cloud computing platform.

Moreover, Azure is highly efficient when it comes to handling large datasets, with compute that can be scaled to suit your price point. For instance, when integrating Python with a large SQL database, doing so over Azure is often a better solution than a standard on-premises SQL Server, which offers less flexibility around factors such as clustering or database mirroring.

Additionally, Azure SQL is quite flexible when it comes to tweaking performance levels. Should a database need a higher performance level in the future, the pricing tier can be adjusted accordingly. From a data standpoint, Azure SQL and the Machine Learning Studio combine the best of both worlds – the ability to conduct robust analysis with a wide range of machine learning tools on databases that are highly flexible in terms of performance.
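
For illustration, the pricing tier can even be changed with plain T-SQL through the same pyodbc setup. This is only a sketch, assuming a Standard-tier service objective of 'S2', and noting that ALTER DATABASE must run outside a transaction (hence autocommit):

# Assumed example: scale the database to service objective 'S2'.
# ALTER DATABASE cannot run inside a transaction, so we use autocommit.
scale_con = pyodbc.connect(CONNECTION_STRING, autocommit=True)
scale_con.cursor().execute("ALTER DATABASE yourdatabase MODIFY (SERVICE_OBJECTIVE = 'S2')")
scale_con.close()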

In this regard, Azure is definitely my go-to platform when it comes to data analysis in the cloud, and I intend to cover more of the platform’s features in the near future.