How to Trick SQL into Doing All the Work for You

2:45pm - 3:15pm on Friday, October 5 in Madison

Molly Leen

Audience Level:
All
Slides:
https://slides.com/mollyleen/pygotham2018
Watch:
https://youtu.be/1984uglrQSo

Overview

What do you get when you combine the power of Python, the sheer volume of healthcare big data, and the speed of SQL’s COPY command? An efficient way to import that data into a database. This talk will demonstrate how and why I designed a file-like object to streamline the import.

Description

Consider importing data into a SQL database with a COPY command from within your Python app. As the data grows, it becomes increasingly important that your preprocessing steps are as efficient as possible.

Before importing the data, you must validate and reformat it. Because SQL controls the import step, most developers would assume that at least one additional pass over the data is needed to validate and reformat it before sending it to COPY. But what if you tricked the COPY step into doing the validating and formatting for you?
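For contrast, here is a minimal sketch of the conventional two-pass approach this paragraph describes. It assumes rows arrive as Python dicts and that COPY expects CSV. The row fields, the `is_valid` and `reformat` rules, and the `visits` table are hypothetical illustrations. The first pass validates and reformats every row into an in-memory buffer, and COPY then reads that buffer a second time.

```python
import csv
import io

# Hypothetical raw records; in practice these might come from a large input file.
RAW_ROWS = [
    {"patient_id": " 001 ", "visit_date": "2018-10-05"},
    {"patient_id": "", "visit_date": "2018-10-06"},  # invalid: missing patient_id
    {"patient_id": "002", "visit_date": "2018-10-07"},
]


def is_valid(row):
    """Hypothetical rule: a row needs a non-blank patient_id."""
    return bool(row["patient_id"].strip())


def reformat(row):
    """Hypothetical rule: strip whitespace and order the fields for COPY."""
    return [row["patient_id"].strip(), row["visit_date"]]


def build_copy_buffer(rows):
    """First pass: validate and reformat every row into a CSV buffer held in memory."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in rows:
        if is_valid(row):
            writer.writerow(reformat(row))
    buffer.seek(0)  # rewind so the consumer reads from the start
    return buffer


buffer = build_copy_buffer(RAW_ROWS)
# Second pass (not run here): COPY reads the whole buffer again, e.g. with psycopg2:
#   cursor.copy_expert("COPY visits FROM STDIN WITH CSV", buffer)
```

The whole reformatted data set has to exist in the buffer (or a temporary file) before COPY starts. That extra iteration, and its memory cost, is what the talk sets out to avoid.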

In this talk, I will demonstrate how to create a file-like object that COPY reads from, validating and reformatting the data as it is read. This eliminates the separate preprocessing pass over large data sets, so execution time is not inflated by an extra iteration over the data. While this talk explores passing a file-like object to a COPY command, the technique extends to any method that reads from a file-like object.
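A minimal sketch of such a file-like object follows, assuming rows arrive as Python dicts and COPY expects CSV. `ValidatingReader`, the validation and formatting rules, the row fields, and the `visits` table are hypothetical illustrations, not the speaker's implementation. psycopg2's documentation states that `copy_from` needs a file with `read()` and `readline()` methods, so the sketch implements both. Each call produces only as many validated, reformatted rows as the reader requests.

```python
import csv
import io
import shutil


class ValidatingReader(io.TextIOBase):
    """Hypothetical file-like object that validates and reformats rows lazily.

    Each read()/readline() call pulls just enough raw rows to satisfy the
    request, so validation happens while the consumer (e.g. COPY) reads,
    instead of in a separate pass beforehand.
    """

    def __init__(self, rows, is_valid, reformat):
        super().__init__()
        self._rows = iter(rows)
        self._is_valid = is_valid
        self._reformat = reformat
        self._pending = ""  # formatted text produced but not yet handed out

    def readable(self):
        return True

    def _next_line(self):
        """Return the next valid row as a CSV line, or "" once input is exhausted."""
        for row in self._rows:
            if self._is_valid(row):
                out = io.StringIO()
                csv.writer(out, lineterminator="\n").writerow(self._reformat(row))
                return out.getvalue()
        return ""  # invalid rows are skipped; "" signals end of data

    def read(self, size=-1):
        if size is None or size < 0:
            # Return everything that is left.
            chunks = [self._pending]
            self._pending = ""
            line = self._next_line()
            while line:
                chunks.append(line)
                line = self._next_line()
            return "".join(chunks)
        # Format rows only until `size` characters are available (or input ends).
        while len(self._pending) < size:
            line = self._next_line()
            if not line:
                break
            self._pending += line
        data, self._pending = self._pending[:size], self._pending[size:]
        return data

    def readline(self, size=-1):
        # _pending never holds more than the tail of a single formatted row.
        if not self._pending:
            self._pending = self._next_line()
        if size is None or size < 0:
            size = len(self._pending)
        line, self._pending = self._pending[:size], self._pending[size:]
        return line


rows = [
    {"patient_id": " 001 ", "visit_date": "2018-10-05"},
    {"patient_id": "", "visit_date": "2018-10-06"},  # dropped: no patient_id
    {"patient_id": "002", "visit_date": "2018-10-07"},
]


def make_reader():
    """Build a reader with hypothetical validation and formatting rules."""
    return ValidatingReader(
        rows,
        is_valid=lambda r: bool(r["patient_id"].strip()),
        reformat=lambda r: [r["patient_id"].strip(), r["visit_date"]],
    )


# Any consumer that calls read() works; copyfileobj reads 4 characters at a time here.
sink = io.StringIO()
shutil.copyfileobj(make_reader(), sink, 4)

# With psycopg2 (not run here), the same object could be handed to COPY:
#   cursor.copy_expert("COPY visits FROM STDIN WITH CSV", make_reader())
```

Because rows are formatted only when a reader asks for more data, memory stays bounded by the requested chunk size plus one row. The data is traversed once, by the consumer itself. `shutil.copyfileobj` stands in here for "any method that reads from an object".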
