Intermediate SQL

  1. Summarizing Data. (Aggreagte functions, CASE)

  2. Date and Math functions

  3. Processing Data with T-SQL (Loops, Derived Tables and CTE)

  4. Windowing and it's common functions

Chapter 1: Summarizing Data

We summarize date using Aggregate functions.

  • Aggregate functions

Some exercises

SELECT Country, AVG(InternetUse) As MeanInternetUse,
MIN(GDP) As SmallestGDP,
MAX(InternetUse) As MaxInternetUSe
FROM EconomicIndicators
GROUP BY Country
HAVING MAX(InternetUse) > 1000
-- Calculate the aggregations by Shape
SELECT Shape,
       AVG(DurationSeconds) AS Average, 
       MIN(DurationSeconds) AS Minimum, 
       MAX(DurationSeconds) AS Maximum
FROM Incidents
GROUP BY Shape
-- Return records where minimum of DurationSeconds is greater than 1
having min(DurationSeconds) > 1
  • Dealing with Missing Data

Blank is not NULL

  • A blank is not the same as a NULL value

  • May show up in columns containing texts

  • An Empty string '' can be used to find blank values

  • The best way is look for a column where the length or LEN > 0

  • Replacing Null values ISNULL(column, 'replacedValue') is there are any nulls it will be replaced with the replacedValue

we can replace with existing column

we have also another method COALESCE(value_1, value_2, value_3, ... value_n) returns the first non-missing value

Binning Date with Case

The case statement allows us to evaluate records like an if statement

for example, we can check if a record contains a value and if it does exist we then replace it with a value from our choice, and if it doesn't we can replace it with another value

The ELSE is optional

we can use CASE to create groups which are often called bins

another example

Chapter 2: Math Functions

  • Count and sum

  • Dates

DATEPART is used to determine what part of the date you want to Calculate, some of the common abbreviation are:

  • DD for Day

  • MM for Month

  • YY for Year

  • HH for Hour

DATEADD(DATEPART, number, date) Add or subtract datetime values - Always returns a date

what is the date is 30 days from Jun 12, 2020? and also 30 days before

DATEDIFF(datepart, startdate, enddate) Obtain the difference between two datetime values - Always returns a number

Practical Example

  • Write a query that returns the number of days between OrderDate and ShipDate.

  • Write a query that returns the approximate delivery date as five days after the ShipDate.

  • Rounding And Truncating numbers

ROUND(number, length, [, function]) -> length: the number of places the number should be rounded.

  • Absolute, square root, square, logarithm

  • ABS()

  • SQUARE()

  • SQRT()

  • LOG()

Chapter 3: Processing Data

  • While Loops While loops here are just as in programming languages exactly

  • Derived Tables Derived Tables are another name for a query acting as a table and are commonly used to do aggregations in T-SQL

we used derived table when we need to break down a complex query into smaller steps it's a great solution, if we want to create intermediate Calculations that need to ne used in a larger query

They are specified in the FROM clause

-- you will create a derived table to return all patient records with the highest BloodPressure at their Age level

  • CTES (Common Table Expression) are another type of derived table, they are little different as they can be used multiple times in a query and are defined like a table. [Defined before you use it]

the columns names need to maatch coluns in the query

Some Exercises

Chapter 4: Window functions

Intro

Window functions provides the ability to create and analyze groups of data. With window functions, we can look at the current row, next row and previous row all at the same time very efficiently

Data here are processed as a group, allowing each group to be evaluated seperately

here the data are seperated onto groups based on SalesYear

we've different categories of window functions

  • Aggreate functions - AVG, SUM, COUNt, ...

  • Ranking Functions - RANK, DENSE_RANK, ROW_NUMBER, ...

  • Analytic functions - LEAD, LAG, FIRST_VALUE, LAST_VALUE, ...

using windowing function, you can create a query to return values by year, without knowing the value of year is!

  • Window function is create by OVER() clause

  • to arrange the results, use ORDER BY.

  • Allows aggregations to be created at the same time as the window.

OVER accepts:

  • PARTITON BY which creates the frame, dividing the query result set into partitions, the window function is applied to each partition seperately, if you do not include the PARTITION BY the frame is entire table

  • ORDER BY which determines the logical orders of rows

  • RANGE or ROWS which limites the rows within the parition by Specifying the start and end points within the partition.

the default for ROWS or RANGE clause is

which means => Do the windowing function from the start of the result set to the current row [cumulatively] for example

would result in

Name
Gender
Salary
AVG
SUM
COUNT

t

M

1000

1000

1000

1

l

f

2000

1500

3000

2

x

f

3000

2000

6000

3

y

M

4000

2500

10000

4

z

f

5000

3000

15000

5

j

M

6000

3500

21000

6

the avg result for 3rd row is the average of (1st row to the 3rd row)

there's also another values for RANGE

which means => Do the windowing function from the start of the result set to the current row [cumulatively] # for the entire result set

for example

would result in

Name
Gender
Salary
AVG
SUM
COUNT

t

M

1000

3500

21000

6

l

f

2000

3500

21000

6

x

f

3000

3500

21000

6

y

M

4000

3500

21000

6

z

f

5000

3500

21000

6

j

M

6000

3500

21000

6

  • Write a T-SQL query that returns the sum of OrderPrice by creating partitions for each TerritoryName

OrderID
TerritoryName
TotalPrice

43706

Australia

1469

43722

Australia

1469

-- calculate the number of orders in each territory.

OrderID
TerritoryName
TotalOrders

43706

Australia

13

43722

Australia

16

Common Window functions

we'll cover 4 commonly used windowing functions

  • FIRST_VALUE

  • LAST_VALUE

  • LEAD

  • LAG`

(FIRST_VALUE, LAST_VALUE) each one of them return only one value for the entire window

The ORDER BY is required in the OVER statement adter the column bein paritioned, because the order values appear in the window and that determines the first or last value

SalesPerson
SalesYear
CurrentQuota
StartQuota
EndQuota
ModDate

Bob

2011

28000.00

28000.00

91000.00

2011-04-16

Bob

2011

7000.00

28000.00

91000.00

2011-07-17

Bob

2011

91000.00

28000.00

91000.00

2011-10-17

Bob

2012

140000.00

140000.00

107000.00

2012-01-15

Bob

2012

70000.00

140000.00

107000.00

2012-04-15

Bob

2012

154000.00

140000.00

107000.00

2012-07-16

Bob

2012

107000.00

140000.00

107000.00

2012-10-16

(LEAD, LAG) Using LEAD we can compare the value of the current row to the value of the next row Using LAG we can compare the value of the current row to the value of the previous row

  • they both provide the ability to query values from the [next, previous] row

  • requires using ORDER BY

For the last row in the window, the LEAD value is null For the first row in the window, the LAG value is null

** note: ORDER BY is required only when using common windowing functions (LEAD, LAG, FIRST_VALUE, LAST_VALUE) but not required when using aggregation functions (SUM, AVG, ...)

Time to practice

  • Suppose you want to figure out the first OrderDate in each territory or the last one. How would you do that?

TerritoryName
OrderDate
FirstOrder

Australia

2015-02-23 09:00:00

2015-02-23 09:00:00

Australia

2015-02-23 11:00:00

2015-02-23 09:00:00

Australia

2015-02-23 12:00:00

2015-02-23 09:00:00

Australia

2015-04-23 02:00:00

2015-02-23 09:00:00

  • Suppose you want to know the next and previous order for each TerritoryName?

TerritoryName
OrderDate
PreviousOrder
NextOrder

Australia

2015-02-23 09:00:00

null

2015-02-23 11:00:00

Australia

2015-02-23 11:00:00

2015-02-23 09:00:00

2015-02-23 12:00:00

Increasing the Complexity of Windowing functions

Calculating running totals, and the problem we'll face when doing so with Windowing function and how to overcome those problems ? First let's take a look at what we did in this previous example

SalesPerson
SalesYear
CurrentQuota
YearlyTotal
ModDate

Bob

2011

28000.00

1551000.00

2011-04-16

Bob

2011

7000.00

1551000.00

2011-07-17

Bob

2011

91000.00

1551000.00

2011-10-17

Mary

2011

140000.00

1551000.00

2012-04-15

Mary

2011

70000.00

1551000.00

2012-07-15

Mary

2011

154000.00

1551000.00

2012-01-15

Mary

2012

107000.00

1859000.00

2012-01-16

you'll notice that we have one total for each window Now if we take the same query and add ORDER BY clause in the OVER clause, you will see different result

SalesPerson
SalesYear
CurrentQuota
YearTotal
ModDate

Bob

2011

28000.00

35000.00

2011-04-16

Bob

2011

7000.00

35000.00

2011-07-17

Mary

2011

367000.00

958000.00

2011-10-17

Mary

2011

367000.00

958000.00

2012-04-15

Bob

2012

70000.00

401000.00

2012-07-15

Bob

2012

154000.00

401000.00

2012-10-16

Because now the values in the CurrentQuota column are totaled by SalesPerson and SalesYear so let's break down what've happened we've partitioned by SalesYear beside Ordering by SalesPerson so for example, year 2011 and Bob => 28000 + 7000 = 35000 year 2011' and 'Mary => 367000 + 367000 =

Creating running totals You usually don't have to use ORDER BY when using aggregations, but if you want to create running totals, you should arrange your rows

Further Read

Last updated