Partially two of the SQL Scripting announcement weblog collection, we’ll look at the executive process we mentioned in half one—easy methods to apply a case-insensitive rule to each STRING column in a desk. We’ll stroll by that instance step-by-step, clarify the options used, and broaden it past a single desk to cowl a complete schema.
You can even comply with alongside on this pocket book.
Altering the collation of all textual content fields in all tables in a schema
Databricks helps a variety of language-aware, case-insensitive, and accent-insensitive collations. It is simple to make use of this function for brand new tables and columns. However what when you’ve got an present system utilizing higher() or decrease() in predicates all over the place and wish to decide up the efficiency enhancements related to a local case-insensitive collation whereas simplifying your queries? That can require some programming; now you are able to do all of it in SQL.
Let’s use the next take a look at schema:
The order relies on the ASCII codepoints, the place all uppercase letters precede all lowercase letters. Are you able to repair this with out including higher() or decrease()?
Dynamic SQL statements and setting variables
Our first step is to inform the desk to vary its default collation for newly added columns. You possibly can feed your native variables with parameter markers, which the pocket book will mechanically detect and add widgets. You can even use EXECUTE IMMEDIATE to run a dynamically composed ALTER TABLE assertion.
Each SQL script consists of a BEGIN .. END (compound) assertion. Native variables are outlined first inside a compound assertion, adopted by the logic.
That is all only a set of linear statements. To this point, you would write all this with SQL Session variables with out the compound assertion. You additionally haven’t achieved a lot. In any case, you needed to vary the collation for present columns. To do that, it’s essential to:
- Uncover all present string columns within the desk
- Change the collation for every column
In brief, it’s essential to loop over the INFORMATION_SCHEMA.COLUMNS desk.
Loops
SQL Scripting provides 4 methods of looping and methods to regulate loop iterations.
- LOOP … END LOOP;
This can be a “eternally” loop.
This loop will proceed till an exception or an specific ITERATE or LEAVE command breaks out of the loop.
We’ll focus on exception dealing with later and level to the ITERATE and LEAVE documentation explaining easy methods to management loops. - WHILE predicate DO … END WHILE;
This loop might be entered and re-entered so long as the predicate expression evaluates to true or the loop is damaged out by an exception, ITERATE or LEAVE. - REPEAT … UNTIL predicate END REPEAT;
Not like WHILE, this loop is entered not less than as soon as and re-executes till the predicate expression evaluates to false or the loop is damaged by an exception, LEAVE, or ITERATE command. - FOR question DO …. END FOR;
This loop executes as soon as per row the question returns except it’s left early with an exception, LEAVE, or ITERATE assertion.
Now, apply the FOR loop to our collation script. The question will get the column names of all string columns of the desk. The loop physique alters every column collation in flip:
Let’s confirm that the desk has been correctly up to date:
To this point, so good. Our code is functionally full, however it’s best to inform Delta to research the columns you modified to profit from file skipping. You do not wish to do that per column. However collect all of them collectively and do the work provided that there was, in reality, a string column for which the collation was altered. Choices, selections ….
Conditional logic
SQL Scripting provides 3 ways to carry out conditional execution of SQL statements.
- If-then-else logic. The syntax for that is simple:
IF predicate THEN … ELSEIF predicate THEN … ELSE …. END IF;
Naturally, you may have any variety of elective ELSEIF blocks, and the ultimate ELSE can also be elective. - A easy CASE assertion
This assertion is the SQL Scripting model of the easy case expression.
CASE expression WHEN choice THEN … ELSE … END CASE;
A single execution of an expression is in comparison with a number of choices, and the primary match decides which set of SQL statements needs to be executed. If none match, the elective ELSE block might be executed. - A searched CASE assertion
This assertion is the SQL Scripting model of the searched case expression.
CASE WHEN predicate THEN …. ELSE … END CASE;
The THEN block is executed for the primary of any predicates that consider to true. If none match, the elective ELSE block is executed.
For our collation script, a easy IF THEN END IF will suffice. You additionally want to gather the set of columns to use ANALYZE to and a few higher-order perform magic to supply the column checklist:
Nesting
What you may have written up to now works for particular person tables. What if you wish to function on all tables in a schema? SQL Scripting is absolutely composable. You possibly can nest compound statements, conditional statements, and loops inside different SQL scripting statements.
So what you’ll do right here is twofold:
- Add an outer FOR loop to seek out all tables inside a schema utilizing INFORMATION_SCHEMA.TABLES. As a part of this, it’s essential to substitute the references to the desk title variable with references to the outcomes of the FOR loop question.
- Add a nested compound to maneuver the column checklist variable down into the outer FOR loop. You can not declare a variable immediately within the FOR loop physique; it doesn’t add a brand new scope. That is primarily a call associated to coding type, however you’ll have a extra severe purpose for a brand new scope..
This error is smart. You might have a number of methods to proceed:
- Filter out unsupported desk varieties, corresponding to views, within the info schema question. The issue is that there are quite a few desk varieties, and new ones are often added.
- Deal with views. That is a terrific thought. Let’s name that your homework task.
- Tolerating the error situation
Exception dealing with
A key functionality of SQL Scripting is the power to intercept and deal with exceptions. Situation handlers are outlined within the declaration part of a compound assertion, they usually apply to any assertion inside that compound, together with nested statements. You possibly can deal with particular error situations by title, particular SQLSTATEs dealing with a number of error situations, or all error situations. Inside the physique of the situation handler, you should utilize the GET DIAGNOSTICS assertion to retrieve details about the exception being dealt with and execute any SQL scripting you deem acceptable, corresponding to recording the error in a log or working an alternate logic to the one which failed. You possibly can then SIGNAL a brand new error situation, RESIGNAL the unique situation, or just exit the compound assertion the place the handler is outlined and proceed with the next assertion.
In our script, you wish to skip any assertion for which the ALTER TABLE DEFAULT COLLATION assertion didn’t apply and log the thing’s title.
Above, you may have developed an administrative script purely in SQL. You can even write ELT scripts and switch them into Jobs. SQL Scripting is a very highly effective device it’s best to exploit.
What to do subsequent
Whether or not you’re an present Databricks consumer or migrating from one other product, SQL Scripting is a functionality it’s best to use. SQL Scripting follows the ANSI normal and is absolutely appropriate with OSS Apache Spark™. SQL Scripting is described intimately in SQL Scripting | Databricks Documentation.
You can even use this pocket book to see for your self.