Writing Stored Procedures And Functions
Temporal EDA, Variables & Date Manipulation
EDA: Exploratory Data analysis In this section, we will study date manipulation using different Date functions such as datepart, datename and datediff
-- Convert date time to date data type
select TOP(1) PickupDate, convert(Date, pickupDate) as DateOnly from YellowTripData
--here we use datepart to select the busiest hours for Taxi pickups
select top 3 count(Id) as BumverOfRides, Datepart(hour, pickupdate) as hour from ... group by datapart(hour, pickupdate) order by count(id) desc
--find the difference between the pickup date and dropoff date in seconds in sunday
select AVG(Datediff(second, pickupdate, dropoffdate) / 60)
as avgRideLenghtInmin
from ... where datename(weekday, pickupdate) = 'Sunday';
exercise: ransactions per day It's time for you to do some temporal EDA on the BikeShare dataset. Write a query to determine how many transactions exist per day.
SELECT
-- Select the date portion of StartDate
convert(date, StartDate) as StartDate,
-- Measure how many records exist for each StartDate
count(StartDate) as CountOfRows
FROM CapitalBikeShare
-- Group by the date portion of StartDate
group BY convert(date, StartDate)
-- Sort the results by the date portion of StartDate
order BY convert(date, StartDate);
second exercise: seconds or no seconds? is the second datapart on startDate is equal to zero or not
SELECT
-- Count the number of IDs
COUNT(ID) AS Count,
-- Use DATEPART() to evaluate the SECOND part of StartDate
"StartDate" = CASE WHEN datepart(second, StartDate) = 0 THEN 'SECONDS = 0'
WHEN datepart(second, StartDate) > 0 THEN 'SECONDS > 0' END
FROM CapitalBikeShare
GROUP BY
-- Use DATEPART() to Group By the CASE statement
CASE WHEN datepart(second, StartDate) = 0 THEN 'SECONDS = 0'
WHEN datepart(second, StartDate) > 0 THEN 'SECONDS > 0' END
third exercies: Which day of week is busiest?
SELECT
-- Select the day of week value for StartDate
datename(weekday, StartDate) as DayOfWeek,
-- Calculate TotalTripHours
sum(datediff(second, StartDate, EndDate))/ 3600 as TotalTripHours
FROM CapitalBikeShare
-- Group by the day of week
GROUP BY datename(weekday, StartDate)
-- Order TotalTripHours in descending order
ORDER BY TotalTripHours DESC
fourth exercise: find the outliers what is outlier? In statistics, an outlier is a data point that differs significantly from other observations. An outlier may be due to variability in the measurement or it may indicate experimental error; the latter are sometimes excluded from the data set. An outlier can cause serious problems in statistical analyses. in the previous exercise 'Satureday' was the busiest day of the week, Do you wonder if there were any individual Saturday outliers that contributed to this
SELECT
-- Calculate TotalRideHours using SUM() and DATEDIFF()
sum(datediff(second, StartDate, EndDate))/ 3600 AS TotalRideHours,
-- Select the DATE portion of StartDate
convert(date, StartDate) AS DateOnly,
-- Select the WEEKDAY
datename(weekday, convert(date, StartDate)) AS DayOfWeek
FROM CapitalBikeShare
-- Only include Saturday
WHERE datename(WEEKDAY, StartDate) = 'Saturday'
GROUP BY CONVERT(DATE, StartDate);
Variable for datetime data
User-defined Functions
Stored Procedures
NYC Taxi Ride Case Study
Variables for datetime date: we can create single value variable or table variable
DECLARE @TaxiRideDate TABLE(StartDate date, EndDate datetime)
DECLARE @StartTime as time = '08:00 AM' --intial value
DECLARE @StartTime As time
SET @StartDate = '08:00 AM'
--declare and then set value depending on the result of select statement
DECLARE @BeginDate as date
SET
@BeginDate = (
SELECT TOP 1 PickupDate
FROM YellowTripDate
Order by PickupDate Asc
)
CASTing: sql server use default rules when exeuting CAST() statements so you can cast a time to datetime, even whn there's no datetime present
declare @StartDateTime as datetime
set @StartDateTime = CAST(@BeginDate as datetime) + CAST(@StartTime as datetime)
Table Variables
Declare @TaxiRideDates Table(
StartDate date,
EndDate date
)
-- INSERT static values into table variable
insert into @TaxiRideDates (StartDate, EndDate)
SELECT '3/1/2018', '3/2/2018'
--insert a query result
insert into @TaxiRideDates(StartDate, EndDate)
Select Distinct CAST(pickupdate as date), CAST(DropOffDate as date)
FROM YellowTripDate
table variables are not recommended to store more than one hundered recored beuase of potential performance
-- Create @ShiftStartTime
declare @ShiftStartTime AS time = '08:00 AM'
-- Create @StartDate
declare @StartDate AS date
-- Set StartDate to the first StartDate from CapitalBikeShare
set
@StartDate = (
SELECT TOP 1 StartDate
FROM CapitalBikeShare
ORDER BY StartDate ASC
)
-- Create ShiftStartDateTime
declare @ShiftStartDateTime AS datetime
-- Cast StartDate and ShiftStartTime to datetime data types
SET @ShiftStartDateTime = CAST(@StartDate AS datetime) + CAST(@ShiftStartTime AS datetime)
SELECT @ShiftStartDateTime
-- Declare @Shifts as a TABLE
declare @Shifts Table(
-- Create StartDateTime column
StartDateTime datetime ,
-- Create EndDateTime column
EndDateTime datetime)
-- Populate @Shifts
insert into @Shifts (StartDateTime, EndDateTime)
SELECT '3/1/2018 8:00 AM', '3/1/2018 4:00 PM'
SELECT *
FROM @Shifts
-- Declare @RideDates
DECLARE @RideDates TABLE(
-- Create RideStart
RideStart date,
-- Create RideEnd
RideEnd date)
-- Populate @RideDates
INSERT INTO @RideDates(RideStart, RideEnd)
-- Select the unique date values of StartDate and EndDate
SELECT DISTINCT
-- Cast StartDate as date
CAST(StartDate as date),
-- Cast EndDate as date
CAST(EndDate as date)
FROM CapitalBikeShare
SELECT *
FROM @RideDates
Date manipulation GETDATE DATEADD -> we don't have DATESUBTRACT because we can do the same functionality using negative number
-- Yersterday's Taxi Passenger Count
SELECT SUM(PassengerCount) FROM YellowTripDate WHERE CAST(PickupDate as date) = DATEADD(d, -1, GETDATE())
note very important: the return value of DATEDIFF() depends on the datepart argument you pass
--First day of the current week
SELECT DATEADD(week, DATEDIFF(week, 0 , GETDATE()), 0)
what does this query actually about? when starting to figure out complex query, start by the inner functions GETDATE() returns the current Date Second, we're calling DATEDIFF() to see how many weeks between today and 0 -> which is 1/1/1900 By adding zero weeks to the 6.217th week with the DATEADD(), sql server will return the date of the beginning of the week
Exercise Parameters matter with DATEDIFF How many times, in terms of days, weeks, and months, are the datepart boundaries crossed between the dates 2/26/2018 and 3/3/2018?
answer: 5 days, 0 weeks, 1 month. now you understand the assumption sql server take to evaluate datediff
find the fist day of the current month
select dateadd(month, datediff(month, 0, GETDATE()), 0)
User defined functions
-- Scalar function with no input parameters
CREATE FUNCTION GetTomorrow()
RETURNS date AS BEGIN
RETURN (SELECT DATEADD(day, 1, GETDATE()))
END
-- Scalar function with one Parameter
-- return numeric value, which is the sum of all trips equal to PickupDate
CREATE FUNCTION GetRideHrsOneDay(@DateParm date)
RETURNS numeric as begin
return (SELECT
SUM(DATEDIFF(second, PickupDate, DropoffDate) / 360) from yellowTripData WHERE CONVERT(date, PickupDate) = @DatePam
) END;
-- Create GetYesterday()
CREATE function GetYesterday()
-- Specify return data type
returns date
AS
BEGIN
-- Calculate yesterday's date value
return( SELECT dateadd(day, -1, Getdate()))
END
-- Create SumRideHrsSingleDay
CREATE FUNCTION SumRideHrsSingleDay (@DateParm date)
-- Specify return data type
RETURNS numeric
AS
-- Begin
BEGIN
RETURN
-- Add the difference between StartDate and EndDate
(SELECT SUM(DATEDIFF(second, StartDate, EndDate))/3600
FROM CapitalBikeShare
-- Only include transactions where StartDate = @DateParm
WHERE CAST(StartDate AS date) = @DateParm)
-- End
END
-- Create the function
CREATE FUNCTION SumRideHrsDateRange (@StartDateParm datetime, @EndDateParm datetime)
-- Specify return data type
RETURNS numeric
AS
BEGIN
RETURN
-- Sum the difference between StartDate and EndDate
(SELECT SUM(DATEDIFF(second, StartDate, EndDate))/3600
FROM CapitalBikeShare
-- Include only the relevant transactions
WHERE StartDate > @StartDateParm and StartDate < @EndDateParm)
END
Table Valued UDF
there are two types: 1- Inline table valued functions (ITVF)
create function SumLoationStats(
@StartDate As datetime = '1/1/2017'
) RETURNS Table as return
select PUlocaation as PickupLocation,
Count(id) as RideCount,
SUM(TRIPDistance) AS TotalTripDistance
from YellowTriData
WHERE PUlocationID;
2-Multi statement table function (MSTFV)
create function CountTripAvgFareDay(
@Month char(2),
@Year char(4)
) RETURNS @TripCountAvgFare Table(
DropOffDate date, TripCount int, AvgFare numeric
) AS BEGIN INSRT INTO @TripCountAVGFare
SELECT
CAST(DropOffDate as date),
COUNT(ID),
AVG(FareAmount) as AVGFAreAmt
FROM YellowTripDate
WHERE
DATEPART(month, DropOffDate) = @Month
AND DatePART(year, DropOffDate) = @Year
Group by CAST(DropOffDate as data)
returns END
;
exercises
-- Create the function
CREATE FUNCTION SumStationStats(@StartDate AS datetime)
-- Specify return data type
RETURNS TABLE
AS
RETURN
SELECT
StartStation,
-- Use COUNT() to select RideCount
COUNT(ID) as RideCount,
-- Use SUM() to calculate TotalDuration
SUM(DURATION) as TotalDuration
FROM CapitalBikeShare
WHERE CAST(StartDate as Date) = @StartDate
-- Group by StartStation
GROUP BY StartStation;
-- Create the function
CREATE FUNCTION CountTripAvgDuration (@Month CHAR(2), @Year CHAR(4))
-- Specify return variable
RETURNS @DailyTripStats TABLE(
TripDate date,
TripCount int,
AvgDuration numeric)
AS
BEGIN
-- Insert query results into @DailyTripStats
INSERT @DailyTripStats
SELECT
-- Cast StartDate as a date
CAST(StartDate AS date),
COUNT(ID),
AVG(Duration)
FROM CapitalBikeShare
WHERE
DATEPART(month, StartDate) = @Month AND
DATEPART(year, StartDate) = @Year
-- Group by StartDate as a date
GROUP BY CAST(StartDate AS date)
-- Return
RETURN
END
UDF in action: executing the UDF
SELECT dbo.GetTomorrow()
the schema must be specified when creating and executing the function if we don't specifiy the schema upon creation, the sql server will by default assign to to the user defulat scehama
DECLARE @TotalRideHrs as numberic
EXEC @TotalRideHrs = dbo.GetRideHrsOneDay @DateParm = '1/15/2017'
SELECT
'Toal Ride Hours for 1/15/2017:',
@TotalRideHrs
exec is used to execute the function and assign the function to the local variable called @TotalRideHrs
EXecuting scalar UDF
-- Declare parameter variable
-- set to oldest date in YellowTripData
-- Pass to function with select
Declare @DateParm as date = (SELECT TOP 1 CONVERT(date, PickupDate) FROM YellowTripDate ORDER BY PickupDate DESC)
SELECT @DateParm, dbo.GetRideHrsOneDay (@DateParm)
it's possibly to execute the function in the where clause, but this affect the query performance
since table valued function returns a table, we can execute it using the select from keywords
SELECT TOP 10 * FROM dbo.SumLocationStats('2017')
ORDER BY RideCount DESC
you can sepcify order by clause when executing the function, but it's not allowed n thr funciton itsslef
DECLARE @CountTripAvgFareDay TAble(
DropOffDate date,
TripCount int,
AvgFare numeric
)
INSERT INTO @CountTripAvgFareDay
SELECT TOP 10 *
FROM dbo.CountTripAvgFareDay
ORDER BY DropOFFDATE
SELECT * FROM @CountTripAvgFareDay
here we created a local table variable with the same structure as the function table result structure
exercises
-- Create @BeginDate
declare @BeginDate AS date = '3/1/2018'
-- Create @EndDate
declare @EndDate as date = '3/10/2018'
SELECT
-- Select @BeginDate
@BeginDate AS BeginDate,
-- Select @EndDate
@EndDate AS EndDate,
-- Execute SumRideHrsDateRange()
dbo.SumRideHrsDateRange(@BeginDate, @EndDate) AS TotalRideHrs
-- Create @RideHrs
declare @RideHrs AS numeric
-- Execute SumRideHrsSingleDay function and store the result in @RideHrs
exec @RideHrs = dbo.SumRideHrsSingleDay @DateParm = '3/5/2018'
SELECT
'Total Ride Hours for 3/5/2018:',
@RideHrs
-- Create @StationStats
DECLARE @StationStats TABLE(
StartStation nvarchar(100),
RideCount int,
TotalDuration numeric)
-- Populate @StationStats with the results of the function
INSERT INTO @StationStats
SELECT TOP 10 *
-- Execute SumStationStats with 3/15/2018
FROM dbo.SumStationStats ('3/15/2018')
ORDER BY RideCount DESC
-- Select all the records from @StationStats
SELECT *
FROM @StationStats
Maintaing user defined Functions
change the function or modify using ALTER Function
alter function SumStationStats (@EndDate as datetime = '1/01/2017')
...
..
Exercise
-- Update SumStationStats
CREATE OR ALTER FUNCTION dbo.SumStationStats(@EndDate AS date)
-- Enable SCHEMABINDING
RETURNS TABLE WITH SCHEMABINDING
AS
RETURN
SELECT
StartStation,
COUNT(ID) AS RideCount,
SUM(DURATION) AS TotalDuration
FROM dbo.CapitalBikeShare
-- Cast EndDate as date and compare to @EndDate
WHERE CAST(EndDate AS Date) = @EndDate
GROUP BY StartStation;
-- Create the stored procedure
CREATE PROCEDURE dbo.cuspSumRideHrsSingleDay
-- Declare the input parameter
@DateParm date,
-- Declare the output parameter
@RideHrsOut numeric OUTPUT
AS
-- Don't send the row count
SET NOCOUNT ON
BEGIN
-- Assign the query result to @RideHrsOut
SELECT
@RideHrsOut = SUM(DATEDIFF(second, StartDate, EndDate))/3600
FROM CapitalBikeShare
-- Cast StartDate as date and compare with @DateParm
WHERE CAST(StartDate AS date) = @DateParm
RETURN
END
Oh crud
-- Create the stored procedure
CREATE PROCEDURE dbo.cusp_RideSummaryCreate
(@DateParm date, @RideHrsParm numeric)
AS
BEGIN
SET NOCOUNT ON
-- Insert into the Date and RideHours columns
INSERT INTO dbo.RideSummary(Date, RideHours)
-- Use values of @DateParm and @RideHrsParm
VALUES(@DateParm, @RideHrsParm)
-- Select the record that was just inserted
SELECT
-- Select Date column
Date,
-- Select RideHours column
RideHours
FROM dbo.RideSummary
-- Check whether Date equals @DateParm
WHERE Date = @DateParm
END;
-- Create the stored procedure
CREATE PROCEDURE dbo.cuspRideSummaryUpdate
-- Specify @Date input parameter
(@Date date,
-- Specify @RideHrs input parameter
@RideHrs numeric(18,0))
AS
BEGIN
SET NOCOUNT ON
-- Update RideSummary
UPDATE RideSummary
-- Set
SET
Date = @Date,
RideHours = @RideHrs
-- Include records where Date equals @Date
WHERE Date = @Date
END;
-- Create the stored procedure
CREATE PROCEDURE dbo.cuspRideSummaryDelete
-- Specify @DateParm input parameter
(@DateParm date,
-- Specify @RowCountOut output parameter
@RowCountOut int OUTPUT)
AS
BEGIN
-- Delete record(s) where Date equals @DateParm
DELETE FROM dbo.RideSummary
WHERE Date = @DateParm
-- Set @RowCountOut to @@ROWCOUNT
SET @RowCountOut = @@ROWCOUNT
END;
Oh EXEC
-- Create @RideHrs
DECLARE @RideHrs AS numeric(18,0)
-- Execute the stored procedure
EXEC dbo.cuspSumRideHrsSingleDay
-- Pass the input parameter
@DateParm = '3/1/2018',
-- Store the output in @RideHrs
@RideHrsOut = @RideHrs OUTPUT
-- Select @RideHrs
SELECT @RideHrs AS RideHours
-- Create @ReturnStatus
DECLARE @ReturnStatus AS int
-- Execute the SP
EXEC @ReturnStatus = dbo.cuspRideSummaryUpdate
-- Specify @DateParm
@DateParm = '3/1/2018',
-- Specify @RideHrs
@RideHrs = 300
-- Select the columns of interest
SELECT
@ReturnStatus AS ReturnStatus,
Date,
RideHours
FROM dbo.RideSummary
WHERE Date = '3/1/2018';
-- Create @ReturnStatus
DECLARE @ReturnStatus AS int
-- Create @RowCount
DECLARE @RowCount AS int
-- Execute the SP, storing the result in @ReturnStatus
EXEC @ReturnStatus = dbo.cuspRideSummaryDelete
-- Specify @DateParm
@DateParm = '3/1/2018',
-- Specify RowCountOut
@RowCountOut = @RowCount OUTPUT
-- Select the columns of interest
SELECT
@ReturnStatus AS ReturnStatus,
@RowCount as 'RowCount';
TRY CATCH
-- Alter the stored procedure
CREATE OR ALTER PROCEDURE dbo.cuspRideSummaryDelete
-- Specify @DateParm
@DateParm nvarchar(30),
-- Specify @Error
@Error nvarchar(max) = NULL OUTPUT
AS
SET NOCOUNT ON
BEGIN
-- Start of the TRY block
BEGIN TRY
-- Delete
DELETE FROM RideSummary
WHERE Date = @DateParm
-- End of the TRY block
END TRY
-- Start of the CATCH block
BEGIN CATCH
SET @Error =
'Error_Number: '+ CAST(ERROR_NUMBER() AS VARCHAR) +
'Error_Severity: '+ CAST(ERROR_SEVERITY() AS VARCHAR) +
'Error_State: ' + CAST(ERROR_STATE() AS VARCHAR) +
'Error_Message: ' + ERROR_MESSAGE() +
'Error_Line: ' + CAST(ERROR_LINE() AS VARCHAR)
-- End of the CATCH block
END CATCH
END;
-- Create @ReturnCode
DECLARE @ReturnCode int
-- Create @ErrorOut
DECLARE @ErrorOut nvarchar(max)
-- Execute the SP, storing the result in @ReturnCode
EXECUTE @ReturnCode = dbo.cuspRideSummaryDelete
-- Specify @DateParm
@DateParm = '1/32/2018',
-- Assign @ErrorOut to @Error
@Error = @ErrorOut OUTPUT
-- Select @ReturnCode and @ErrorOut
SELECT
@ReturnCode AS ReturnCode,
@ErrorOut AS ErrorMessage;
CASE STUDY
SELECT
-- PickupDate is after today
COUNT (CASE WHEN PickupDate > GetDate() THEN 1 END) AS 'FuturePickup',
-- DropOffDate is after today
COUNT (CASE WHEN DropOffDate > GetDate() THEN 1 END) AS 'FutureDropOff',
-- PickupDate is after DropOffDate
COUNT (CASE WHEN PickupDate > DropOffDate THEN 1 END) AS 'PickupBeforeDropoff',
-- TripDistance is 0
COUNT (CASE WHEN TripDistance = 0 THEN 1 END) AS 'ZeroTripDistance'
FROM YellowTripData;
-- Create the stored procedure
CREATE PROCEDURE dbo.cuspImputeTripDistanceMean
AS
BEGIN
-- Specify @AvgTripDistance variable
DECLARE @AvgTripDistance AS numeric (18,4)
-- Calculate the average trip distance
SELECT @AvgTripDistance = AVG(TripDistance)
FROM YellowTripData
-- Only include trip distances greater than 0
WHERE TripDistance > 0
-- Update the records where trip distance is 0
UPDATE YellowTripData
SET TripDistance = @AvgTripDistance
WHERE TripDistance = 0
END;
-- Create the function
CREATE FUNCTION dbo.GetTripDistanceHotDeck()
-- Specify return data type
RETURNS numeric(18,4)
AS
BEGIN
RETURN
-- Select the first TripDistance value
(SELECT TOP 1 TripDistance
FROM YellowTripData
-- Sample 1000 records
TABLESAMPLE(1000 rows)
-- Only include records where TripDistance is > 0
WHERE TripDistance > 0)
END;
Formatting
Calculate Total Fare Amount per Total Distance for each day of week. If the TripDistance is zero use the Hot Deck imputation function you created earlier in the chapter.
SELECT
-- Select the pickup day of week
DATENAME(weekday, PickupDate) as DayofWeek,
-- Calculate TotalAmount per TripDistance
CAST(AVG(TotalAmount/
-- Select TripDistance if it's more than 0
CASE WHEN TripDistance > 0 THEN TripDistance
-- Use GetTripDistanceHotDeck()
ELSE dbo.GetTripDistanceHotDeck() END) as decimal(10,2)) as 'AvgFare'
FROM YellowTripData
GROUP BY DATENAME(weekday, PickupDate)
-- Order by the PickupDate day of week
ORDER BY
CASE WHEN DATENAME(weekday, PickupDate) = 'Monday' THEN 1
WHEN DATENAME(weekday, PickupDate) = 'Tuesday' THEN 2
WHEN DATENAME(weekday, PickupDate) = 'Wednesday' THEN 3
WHEN DATENAME(weekday, PickupDate) = 'Thursday' THEN 4
WHEN DATENAME(weekday, PickupDate) = 'Friday' THEN 5
WHEN DATENAME(weekday, PickupDate) = 'Saturday' THEN 6
WHEN DATENAME(weekday, PickupDate) = 'Sunday' THEN 7
END ASC;
Write a query to display the TotalDistance, TotalRideTime and TotalFare for each day and NYC Borough. Display the date, distance, ride time, and fare totals for German culture.
Write a query to display the TotalDistance, TotalRideTime and TotalFare for each day and NYC Borough. Display the date, distance, ride time, and fare totals for German culture.
Last updated