Intermediate SQL
Summarizing Data. (Aggreagte functions, CASE)
Date and Math functions
Processing Data with T-SQL (Loops, Derived Tables and CTE)
Windowing and it's common functions
Chapter 1: Summarizing Data
We summarize date using Aggregate functions.
Aggregate functions
Some exercises
SELECT Country, AVG(InternetUse) As MeanInternetUse,
MIN(GDP) As SmallestGDP,
MAX(InternetUse) As MaxInternetUSe
FROM EconomicIndicators
GROUP BY Country
HAVING MAX(InternetUse) > 1000-- Calculate the aggregations by Shape
SELECT Shape,
AVG(DurationSeconds) AS Average,
MIN(DurationSeconds) AS Minimum,
MAX(DurationSeconds) AS Maximum
FROM Incidents
GROUP BY Shape
-- Return records where minimum of DurationSeconds is greater than 1
having min(DurationSeconds) > 1Dealing with Missing Data
Blank is not NULL
A blank is not the same as a NULL value
May show up in columns containing texts
An Empty string '' can be used to find blank values
The best way is look for a column where the length or LEN > 0
Replacing Null values
ISNULL(column, 'replacedValue')is there are any nulls it will be replaced with thereplacedValue
we can replace with existing column
we have also another method COALESCE(value_1, value_2, value_3, ... value_n) returns the first non-missing value
Binning Date with Case
The case statement allows us to evaluate records like an if statement
for example, we can check if a record contains a value and if it does exist we then replace it with a value from our choice, and if it doesn't we can replace it with another value
The ELSE is optional
we can use CASE to create groups which are often called bins
another example
Chapter 2: Math Functions
Count and sum
Dates
DATEPART is used to determine what part of the date you want to Calculate, some of the common abbreviation are:
DDfor DayMMfor MonthYYfor YearHHfor Hour
DATEADD(DATEPART, number, date) Add or subtract datetime values - Always returns a date
what is the date is 30 days from Jun 12, 2020? and also 30 days before
DATEDIFF(datepart, startdate, enddate) Obtain the difference between two datetime values - Always returns a number
Practical Example
Write a query that returns the number of days between OrderDate and ShipDate.
Write a query that returns the approximate delivery date as five days after the ShipDate.
Rounding And Truncating numbers
ROUND(number, length, [, function]) -> length: the number of places the number should be rounded.
Absolute, square root, square, logarithm
ABS()SQUARE()SQRT()LOG()
Chapter 3: Processing Data
While Loops While loops here are just as in programming languages exactly
Derived Tables Derived Tables are another name for a query acting as a table and are commonly used to do aggregations in T-SQL
we used derived table when we need to break down a complex query into smaller steps it's a great solution, if we want to create intermediate Calculations that need to ne used in a larger query
They are specified in the FROM clause
-- you will create a derived table to return all patient records with the highest BloodPressure at their Age level
CTES (Common Table Expression) are another type of derived table, they are little different as they can be used multiple times in a query and are defined like a table. [Defined before you use it]
the columns names need to maatch coluns in the query
Some Exercises
Chapter 4: Window functions
Intro
Window functions provides the ability to create and analyze groups of data. With window functions, we can look at the current row, next row and previous row all at the same time very efficiently
Data here are processed as a group, allowing each group to be evaluated seperately
here the data are seperated onto groups based on SalesYear
we've different categories of window functions
Aggreate functions - AVG, SUM, COUNt, ...
Ranking Functions - RANK, DENSE_RANK, ROW_NUMBER, ...
Analytic functions - LEAD, LAG, FIRST_VALUE, LAST_VALUE, ...
using windowing function, you can create a query to return values by year, without knowing the value of year is!
Window function is create by OVER() clause
to arrange the results, use ORDER BY.
Allows aggregations to be created at the same time as the window.
OVER accepts:
PARTITON BYwhich creates the frame, dividing the query result set into partitions, the window function is applied to each partition seperately, if you do not include the PARTITION BY the frame is entire tableORDER BYwhich determines the logical orders of rowsRANGEorROWSwhich limites the rows within the parition by Specifying the start and end points within the partition.
the default for ROWS or RANGE clause is
which means => Do the windowing function from the start of the result set to the current row [cumulatively] for example
would result in
t
M
1000
1000
1000
1
l
f
2000
1500
3000
2
x
f
3000
2000
6000
3
y
M
4000
2500
10000
4
z
f
5000
3000
15000
5
j
M
6000
3500
21000
6
the avg result for 3rd row is the average of (1st row to the 3rd row)
there's also another values for RANGE
which means => Do the windowing function from the start of the result set to the current row [cumulatively] # for the entire result set
for example
would result in
t
M
1000
3500
21000
6
l
f
2000
3500
21000
6
x
f
3000
3500
21000
6
y
M
4000
3500
21000
6
z
f
5000
3500
21000
6
j
M
6000
3500
21000
6
Write a T-SQL query that returns the sum of OrderPrice by creating partitions for each TerritoryName
43706
Australia
1469
43722
Australia
1469
-- calculate the number of orders in each territory.
43706
Australia
13
43722
Australia
16
Common Window functions
we'll cover 4 commonly used windowing functions
FIRST_VALUELAST_VALUELEADLAG`
(FIRST_VALUE, LAST_VALUE) each one of them return only one value for the entire window
The ORDER BY is required in the OVER statement adter the column bein paritioned, because the order values appear in the window and that determines the first or last value
Bob
2011
28000.00
28000.00
91000.00
2011-04-16
Bob
2011
7000.00
28000.00
91000.00
2011-07-17
Bob
2011
91000.00
28000.00
91000.00
2011-10-17
Bob
2012
140000.00
140000.00
107000.00
2012-01-15
Bob
2012
70000.00
140000.00
107000.00
2012-04-15
Bob
2012
154000.00
140000.00
107000.00
2012-07-16
Bob
2012
107000.00
140000.00
107000.00
2012-10-16
(LEAD, LAG) Using LEAD we can compare the value of the current row to the value of the next row Using LAG we can compare the value of the current row to the value of the previous row
they both provide the ability to query values from the [next, previous] row
requires using ORDER BY
For the last row in the window, the
LEAD value is null For the first row in the window, the
LAG value is null
** note: ORDER BY is required only when using common windowing functions (LEAD, LAG, FIRST_VALUE, LAST_VALUE) but not required when using aggregation functions (SUM, AVG, ...)
Time to practice
Suppose you want to figure out the first OrderDate in each territory or the last one. How would you do that?
Australia
2015-02-23 09:00:00
2015-02-23 09:00:00
Australia
2015-02-23 11:00:00
2015-02-23 09:00:00
Australia
2015-02-23 12:00:00
2015-02-23 09:00:00
Australia
2015-04-23 02:00:00
2015-02-23 09:00:00
Suppose you want to know the next and previous order for each TerritoryName?
Australia
2015-02-23 09:00:00
null
2015-02-23 11:00:00
Australia
2015-02-23 11:00:00
2015-02-23 09:00:00
2015-02-23 12:00:00
Increasing the Complexity of Windowing functions
Calculating running totals, and the problem we'll face when doing so with Windowing function and how to overcome those problems ? First let's take a look at what we did in this previous example
Bob
2011
28000.00
1551000.00
2011-04-16
Bob
2011
7000.00
1551000.00
2011-07-17
Bob
2011
91000.00
1551000.00
2011-10-17
Mary
2011
140000.00
1551000.00
2012-04-15
Mary
2011
70000.00
1551000.00
2012-07-15
Mary
2011
154000.00
1551000.00
2012-01-15
Mary
2012
107000.00
1859000.00
2012-01-16
you'll notice that we have one total for each window Now if we take the same query and add ORDER BY clause in the OVER clause, you will see different result
Bob
2011
28000.00
35000.00
2011-04-16
Bob
2011
7000.00
35000.00
2011-07-17
Mary
2011
367000.00
958000.00
2011-10-17
Mary
2011
367000.00
958000.00
2012-04-15
Bob
2012
70000.00
401000.00
2012-07-15
Bob
2012
154000.00
401000.00
2012-10-16
Because now the values in the CurrentQuota column are totaled by SalesPerson and SalesYear so let's break down what've happened we've partitioned by SalesYear beside Ordering by SalesPerson so for example, year 2011 and Bob => 28000 + 7000 = 35000 year 2011' and 'Mary => 367000 + 367000 =
Creating running totals You usually don't have to use ORDER BY when using aggregations, but if you want to create running totals, you should arrange your rows
Further Read
Last updated