Blog - Page 5 of 6 - Data in the New Age

Apache Spark’s fine-grained processing of RDBMS records part 14: Generic JDBC Write Facilities

Written by admin
Posted on October 9, 2018 (updated October 20, 2018)

The previous post introduced the commonalities in implementing the update and insert operations. Now, we translate that insight into some classes presented in this post. First, let’s look at the class sitting at the top…

Written by admin
Posted on October 6, 2018 (updated October 15, 2018)

Where Are We? So where are we now? Let’s take a step back. We started with the set of requirements for the write operation from a Spark dataframe to a database table. Mainly, we’re required…

Written by admin
Posted on October 4, 2018 (updated October 15, 2018)

Now that all the runtime and operational issues have been addressed, let’s move on to code quality. We need to make the code more robust, flexible, and maintainable. In other words, the code needs to…

Written by admin
Posted on October 2, 2018 (updated October 15, 2018)

The Problem: The previous post suggested a fix to execSQL() to handle null values, but we are not done yet. Let’s take a look at the main function updateTable() in approach IV: In the call updateDF.foreachPartition()…

Written by admin
Posted on September 29, 2018 (updated October 15, 2018)

The Problem: As alluded to in the Approach IV post, that implementation would not work on real production data. What’s going on? Now’s the time to deal with it. Take a look at this key line of…

Written by admin
Posted on September 27, 2018 (updated October 14, 2018)

This is a bonus post. It chronicles an activity that doesn’t produce a concrete outcome, yet reflects the considerations that emerge at this point in an engineering project. Motivation: Approach IV has now been chosen. Approach III…

Written by admin
Posted on September 25, 2018 (updated October 14, 2018)

Approach III, although the best so far, still doesn’t quite make it. Hence, we now move on to the next one. Summary: Each row in the updateDF dataframe is extracted individually and explicitly updates the…

Written by admin
Posted on September 22, 2018 (updated October 14, 2018)

The first two approaches don’t cut it. We now move on to the third one, in search of a solution that can reasonably perform the Update operation. Summary: We dump the updateDF dataframe to a…

Written by admin
Posted on September 20, 2018 (updated October 16, 2018)

We now explore the second approach, using the Update operation as the first test. An examination of it follows. Summary: In a nutshell, this approach writes update data to a temp table, then uses database trigger…

Written by admin
Posted on September 18, 2018 (updated October 11, 2018)

We now start exploring different approaches, with the Update operation as the first evaluation criterion. The first potential solution follows. Summary: This first approach is the most naive, relying most heavily on available…

Apache Spark’s fine-grained processing of RDBMS records part 14: Generic JDBC Write Facilities

Apache Spark’s fine-grained processing of RDBMS records part 13: Motivation for JDBC Write Facilities

Apache Spark’s fine-grained processing of RDBMS records part 12: SparkSQL Values Retrieval Facility

Apache Spark’s fine-grained processing of RDBMS records part 11: Yet another issue in execSQL()!

Apache Spark’s fine-grained processing of RDBMS records part 10: Fixing an issue in execSQL()

Apache Spark’s fine-grained processing of RDBMS records part 9: Benchmark between Approaches III and IV

Apache Spark’s fine-grained processing of RDBMS records part 8: Approach IV – Granular Level Record Update

Apache Spark’s fine-grained processing of RDBMS records part 7: Approach III – SQL Batch Update via joining with temp table

Apache Spark’s fine-grained processing of RDBMS records part 6: Approach II – Using Database Trigger

Apache Spark’s fine-grained processing of RDBMS records part 5: Approach I – Forcing DataFrameWriter API to work