Connecting to SQL Server in Python

“SQL is an important tool for any data scientist” — is my entry for the understatement of the year.

Image for post
Image for post

As information jockeys, data scientists use SQL to query the data they will need for analysis from established databases. Depending on the demands of your position as a data scientist, you might even be asked to help construct a database from your employer’s data sources, which SQL is also helpful for. Being able to select the information that is pertinent to the current project at the start of the project is a simple efficiency to learn and will save time on the project.

Early on in my journey to learn data science, I wanted to be able to leverage Python with data pulled from a Microsoft SQL Server Database and after learning about the use of sqlite3 in Python, I figured out how to connect directly to streamline my work, pulling data directly into a Jupyter notebook. Continue reading if you would like to learn how to do this!

If you’re not familiar with the sqlite3 Python module, it is a module which provides a SQL interface to SQLite databases. There is plenty of literature out on how to use SQLite in Python, but I found this blog post to be most helpful.

If you aren’t building your own database with SQLite and you want to connect to a database that is already established, you can use the pyodbc module. Pyodbc allows a user to connect with a DataBase Management System (DBMS) database using the ODBC driver. A quick pip install should get you up and running, but you can find full installation docs here. Once you have the pyodbc installed (full documentation for pyodbc can be found here) you will need to install the ODBC driver for the DBMS you want to connect to — a quick Google search should direct you to the appropriate download.

Now I’ll walk through an example of writing a connection to a Microsoft SQL Server DB and extracting a query.

Similar to sqlite3 module, to connect to the database you will need to establish a connection object to represent the database. Pyodbc passes an ODBC connection string to the local driver manager which in turn calls the relevant database driver which in turn calls the database to request the connection.

Image for post
Image for post

For Microsoft SQL Server 2017, there are a number of connection strings to select from depending on whether you are connecting to a db with standard security, connecting to a trusted connection, or if you want to enable Multiple Active Result Sets (MARS). I used the following to connect to a trusted connection:

‘Driver={ODBC Driver 13 for SQL Server}; Server=MyServerAddress; Database= myDataBase; Trusted_Connection=yes;’

Find other connection strings configured for numerous DBMSs here.

Once you have your connection established, there are two methods you can use to extract a query. The first way is to establish a cursor object. A cursor object enables the traversal over the rows of a result set pulled from a database. A cursor will allow you to process individual rows in the set by managing the context of fetch operations. If the .cursor() method is called without specified class and index arguments, it creates a DB-API style cursor which can use numerous operations, like find(), execute(), fetchone(), and fetchmany().

Image for post
Image for post

As you can see in the above code, you can then select a subset of the query to pull — fetchone() would pull just one row, fetchall() selects all rows, and fetchmany(x) pulls the next x remaining rows and returns them as a list of tuples.

The second method is to use pandas.read_sql function, which is a wrapper function for the read_sql_table and read_sql_query calls shown above. This function allows the user to input a written query or database table name and the connection string and feed the result directly into a dataframe.

Image for post
Image for post

And just like that you have your data ready to use in a dataframe!

Some helpful tips to keep in mind:

  • Queries can be written in python exactly as they would be in a DBMS.
  • Convert data types directly in the query to reduce conversion operations.
  • When writing long queries, wrap the query string in triple quote marks — this allows strings to span multiple lines, so you can indent for readability.

Hopefully this helps you streamline your data sourcing!

Written by

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store