13  Introduction to SQL

13.1 Overview

  • Data-definition language (DDL):
    • defining and modifying relation schemas
    • integrity constrains
    • view definition
    • authorization: access rights to relations and views.
  • Data-manipulation language (DML): querry information from and modify tuples in relations
  • Transaction Control: Specifying beginning and end points of transactions.
  • Embedded/dynamic SQL: how SQL is embedded in general-purpose programmig languages.

13.2 SQL DDL

following can be specified:

  • schema for each relation
  • set of indices to be maintained for each relation
  • security and authorization information for each relation
  • physical storage structure of each relation on disk

13.2.1 Basic Types

  • char(n): fixed n-long string. (full form: character(n) )
  • varchar(n): variable-length character string with max length n. (full form character varying(n))
  • int (full form: integer)
  • smallint
  • numeric(p, d): p digits in total, d of the digits after decimal point.
  • real, double precision: floating-point / double precision floating point.
  • float(n): Floating-point with precision of at least n digits.

13.2.2 Basic Schema Definition

create table department (
    dept_name   varchar(20),
    building    varchar(15),
    budget      numeric(12,2),
    primary key (dept_name) --primary key integrity constraint
);

general form:

create table r(
    a1 domain1,
    a2 domain2,
    ...
    a_n domain_n
    [integrity constraint 1],
    ...
    [integrity constraing 2] -- these are optional
)

13.2.3 Basic Constraints:

  • primary key: a1, …, a_n together form the primary key of the relation:

    primary key(a1, a2, ..., a_n)
  • foreign key: a1, …, a_n together form a foreign key over a relation s, i.e. a1, …, a_n must be a primary key of some tuple in s (existence: referential integrity)

    foreign key (a1, a2, ..., a_n) references s
  • not null:

    name varchar(20) not null, --name can not be null

concrete example:

create table instructor(
  ID          varchar(5),
  name        varchar(20) not null, --name can not be null
  dept_name   varchar(20),
  salary      numeric(8,2),
  primary key (ID),
  foreign key (dept_name) references department
)

Note that explicitly specifying the primary key of the referenced relation

foreign key (dept_name) references department(dept_name)

is also possible but not required.

13.2.4 Altering the Schema

  • remove a relation completely from the database schema:

    drop table r; --not to be confused with 'delete from'
  • modify a relation schema:

    alter table r add a1 a2; -- add the attributes a1, a2 to the relation r
    alter table r drop a; -- drop/remove attribute a from the relation r

    some dbs’ don’t support dropping single attributes but only whole tables.

13.3 Basic SQL Querries

A typical sql querry has the form

select a1, ..., a_n
from r1, ..., r_m
where P -- P is a predicate/condition
select name
from instructor
Displaying records 1 - 10
name
Srinivasan
Wu
Mozart
Einstein
El Said
Gold
Katz
Califieri
Singh
Crick
select dept_name
from instructor
Displaying records 1 - 10
dept_name
Comp. Sci.
Finance
Music
Physics
History
Physics
Comp. Sci.
History
Finance
Biology
select distinct dept_name --removes duplicates
from instructor
7 records
dept_name
Physics
Biology
Elec. Eng.
Finance
Comp. Sci.
History
Music

Select allows arbitrary expressions of attributes. we can output 10% salary raise for instructors

select id, name, dept_name, salary * 1.1
from instructor
Displaying records 1 - 10
id name dept_name ?column?
10101 Srinivasan Comp. Sci. 71500
12121 Wu Finance 99000
15151 Mozart Music 44000
22222 Einstein Physics 104500
32343 El Said History 66000
33456 Gold Physics 95700
45565 Katz Comp. Sci. 82500
58583 Califieri History 68200
76543 Singh Finance 88000
76766 Crick Biology 79200

13.3.1 Where clause.

  • Names of all instructors in the CS department who have salary greater than $70,000:
select name
from instructor
where dept_name = 'Comp. Sci.'
and salary > 70000
2 records
name
Katz
Brandt

13.3.2 Joining Tables

  • names of instructors, names of their departments and the names of the buildings where departments are located:
select i.name, d.name as dept_name, building
from instructor i, department d
where i.dept_name = d.name
Displaying records 1 - 10
name dept_name building
Nishimoto Biology Watson
Crick Biology Watson
Brandt Comp. Sci. Taylor
Katz Comp. Sci. Taylor
Srinivasan Comp. Sci. Taylor
Kim Elec. Eng. Taylor
Singh Finance Painter
Wu Finance Painter
Roznicki History Painter
Califieri History Painter
  • names of instructors and identifiers of courses they have tought:
select i.name, t.course_id
from instructor i, teaches t
where i.id = t.instructor_id
Displaying records 1 - 10
name course_id
Srinivasan CS-101
Srinivasan CS-315
Srinivasan CS-347
Wu FIN-201
Mozart MU-199
Einstein PHY-101
El Said HIS-351
Katz CS-101
Katz CS-319
Crick BIO-101
  • names of instructors from the CS department and identifiers of courses that they have tought:
select i.name, t.course_id
from instructor i, teaches t
where i.id = t.instructor_id and i.dept_name = 'Comp. Sci.'
8 records
name course_id
Srinivasan CS-101
Srinivasan CS-315
Srinivasan CS-347
Katz CS-101
Katz CS-319
Brandt CS-190
Brandt CS-190
Brandt CS-319

13.3.3 Renaming

select i.name as instructor_name, t.course_id -- as can be omitted
from instructor as i, teaches as t -- as can be ommited
where i.id = t.instructor_id
Displaying records 1 - 10
instructor_name course_id
Srinivasan CS-101
Srinivasan CS-315
Srinivasan CS-347
Wu FIN-201
Mozart MU-199
Einstein PHY-101
El Said HIS-351
Katz CS-101
Katz CS-319
Crick BIO-101

Renaming useful when comparing tuples in the same relation:

  • names of instructors whose salary is greater than at least one instructor in the biology department \(\approx\) names of all instructors who earn more than the lowest paid instructor in the Biology department:
select distinct i.name
from instructor i , instructor i2 
where i.salary > i2.salary and i2.dept_name = 'Biology'
8 records
name
Brandt
Gold
Wu
Kim
Nishimoto
Katz
Einstein
Singh
  • correlation name = table alias = tuple variable

13.3.4 String Operations

  • concatenation with ||
values 
('hey' || ' there!')
1 records
column1
hey there!
  • upper() and lower()
values 
(upper('hey there')),
(lower('HEY THERE'))
2 records
column1
HEY THERE
hey there
  • removing spaces with trim()
values 
(trim('hey   ' ) || ' there!')
1 records
column1
hey there!

13.3.5 Pattern Matching

  • %: matches any string
  • _: matches any character
  • examples:
    • intro%: any string beginning with ‘Intro’
    • %Comp%: any string containing ‘Comp’ as a substring
    • ___: any string of exactly three characters
    • ___%: any string of at least three characters
  • concrete example; information of courses that have ‘comp’ as a substring in their title:
select *
from course c
where c.title ilike '%comp%' --ilike is case insensitive like
2 records
id title dept_name credits
BIO-399 Computational Biology Biology 3
CS-101 Intro. to Computer Science Comp. Sci. 4

*Escaping special characters like “%” with “":

select *
from (
    values
    ('%15')
) r(a)
where a like '\%%'  
1 records
a
%15
  • Defining custom escape characters other than “":
select *
from (
    values
    ('%15')
) r(a)
where a like '^%%' escape '^' -- '^' is defined as the escape character
1 records
a
%15

13.3.6 Ordering Display of Tuples

  • ordering
select name
from instructor
where dept_name  = 'Physics'
order by name;
2 records
name
Einstein
Gold
  • ordering order, multiple attributes
select *
from instructor
order by salary desc, name asc
Displaying records 1 - 10
id name dept_name salary
30765 Green Music NA
22222 Einstein Physics 95000
83821 Brandt Comp. Sci. 92000
12121 Wu Finance 90000
33456 Gold Physics 87000
98345 Kim Elec. Eng. 80000
76543 Singh Finance 80000
45565 Katz Comp. Sci. 75000
10032 Nishimoto Biology 73120
76766 Crick Biology 72000

13.3.7 Between

SELECT name
from instructor i
where i.salary BETWEEN 90000 and 100000
3 records
name
Wu
Einstein
Brandt

13.3.8 Tuples in Where Predicates

SELECT i.name, t.course_id
from instructor i, teaches t
where i.id = t.instructor_id and i.dept_name = 'Biology' 

is equivalent to

SELECT i.name, t.course_id
from instructor i, teaches t
where (i.id, i.dept_name) = (t.instructor_id, 'Biology')
2 records
name course_id
Crick BIO-101
Crick BIO-301

13.4 Set Operations

Set operations eliminate duplicates by default. Duplicates retained by all keyword.

13.4.1 Union

corresponds to \(\cup\).

  • courses offered in 2017 Fall or 2018 Spring:
(
    select  course_id
    from section
    where semester = 'Fall' and year = 2017
)
UNION
(
    select course_id
    from section
    where semester = 'Spring' and year = 2018
)
8 records
course_id
CS-101
CS-315
CS-319
CS-347
FIN-201
HIS-351
MU-199
PHY-101

13.4.2 Intersect

Corresponds to \(\cap\).

  • Courses offered in both Fall 2017 and Spring 2018:
(
    select  course_id
    from section
    where semester = 'Fall' and year = 2017
)
intersect
(
    select course_id
    from section
    where semester = 'Spring' and year = 2018
)
1 records
course_id
CS-101

13.4.3 Except

Corresponds to \(\backslash\).

  • courses offered in the Fall 2017 but not in Spring 2018:
(
    select  course_id
    from section
    where semester = 'Fall' and year = 2017
)
except
(
    select course_id
    from section
    where semester = 'Spring' and year = 2018
)
2 records
course_id
PHY-101
CS-347
  • Illustrating how all works:

union all:

with r(a) as (
    values
    (1),
    (1)
), s(a) as (
    values
    (1)
)
(select *
from r )
union  all
(select *
from s)
3 records
a
1
1
1

intersect all:

with r(a) as (
    values
    (1),
    (1),
    (1)
), s(a) as (
    values
    (1),
    (1)
)
(select *
from r )
INTERSECT all
(select *
from s)
2 records
a
1
1

except all:

with r(a) as (
    values
    (1),
    (1)
), s(a) as (
    values
    (1)
)
(select *
from r )
except  all
(select *
from s)
1 records
a
1
with r(a) as (
    values
    (1),
    (1)
), s(a) as (
    values
    (1)
)
(select *
from s )
except  all
(select *
from r)
0 records
a

13.5 Null

  • null represents values that are not known. (can be anything).
  • arithmetic expressions involving null produce null:
values
(1 + null)
1 records
column1
NA
  • boolean expressions and predicates involving null (other than is [not] null) have a special truth value unknown.
    • e.g. (1 < unknown) = uknown
  • truth tables for unknown:
p q p and q
false unknown false
true unknown unknown
p q p or q
false unknown unknown
true unknown true

not unknown = uknown

  • tuples that evaluate to uknown in the where clause are not included in the result (just like the ones that evaluate to false)

  • we can test if a value is or isn’t null:

select a
from (
    values 
    (1, 3),
    (2, null)
) r(a, b)
where b is null
1 records
a
2
select a
from (
    values 
    (1, 3),
    (2, null)
) r(a, b)
where b is not null
1 records
a
1
  • we can test whether the result of a predicate is or isn’t uknown
select a
from (
    values 
    (1, 3),
    (2, null)
) r(a, b)
where b > 2 is unknown
1 records
a
2
select a
from (
    values 
    (1, 3),
    (2, null)
) r(a, b)
where b > 2 is not unknown
1 records
a
1

13.6 Aggregate Functions

  • Take collection (i.e. aggregate, multiset) of values as input and return a single value.
  • Built in aggregate functions in sql:
    • avg: input must be numeric
    • min
    • max
    • sum: computes the total sum of the values in the aggregate, input must be numeric
    • count

13.6.1 Basic Aggregation

  • Average salary of instructors in the CS department:
select avg(salary) as avg_salary
from instructor 
where dept_name = 'Comp. Sci.'
1 records
avg_salary
77333.33

Duplicates are retained. Retention of duplicates is obviously important for calculating averages. But sometimes we may want to eliminate duplicates:

  • Find the total number of instructors that teach a course in the Spring 2018 semester:
select count(distinct t.instructor_id)
from teaches t
where semester = 'Spring' and year = 2018
1 records
count
6

as opposed to

select count(t.instructor_id)
from teaches t
where semester = 'Spring' and year = 2018
1 records
count
7
  • counting all attributes with count(*):
select count(*)
from course
1 records
count
13

count(*) retains null values. It is the only aggregate function that does so. All other aggregate functions ignore null values, including count functions applied on a single attribute. That is count(*) and count(attr) are different:

select count(*) as count_star, count(a) as count_attribute
from (
    values 
    (1),
    (null)
) r(a)
1 records
count_star count_attribute
2 1

13.6.2 Aggregation with Grouping

Instead of applying the aggregate function to a single aggregate (collection), we can apply it to multiple aggregates/collections, that consist of tuples grouped together w.r.t certain grouping attributes:

  • find the average salary in each department:
select dept_name, avg(salary) as avg_dept_salary
from instructor  
group by dept_name
order by avg(salary) desc
7 records
dept_name avg_dept_salary
Physics 91000.00
Finance 85000.00
Elec. Eng. 80000.00
Comp. Sci. 77333.33
Biology 72560.00
History 59666.67
Music 40000.00
  • find number of instructors working in each department:
select dept_name, count(*) cnt
from instructor 
group by dept_name 
order by dept_name
7 records
dept_name cnt
Biology 2
Comp. Sci. 3
Elec. Eng. 1
Finance 2
History 3
Music 2
Physics 2
  • find the number of instructors in each department who teach a course in the Spring 2018 semester:
select i.dept_name, count(distinct i.id) as instr_count
from instructor i , teaches t 
where i.id = t.instructor_id 
and t.semester = 'Spring' and t.year = 2018
group by i.dept_name
4 records
dept_name instr_count
Comp. Sci. 3
Finance 1
History 1
Music 1
Warning

Note that only attributes allowed to appear in the select clause (other than the attribute being aggregated) are the attributes used in the group by clause. Thus

select a, X, avg(b) -- X doesn't appear in the group by clause below
from r -- some relation r
where P -- some predicate
group by a

it not legal.

13.6.3 Having Clause

Specifies a condition that applies to groups rather than to tuples.

  • departments where average salayr is morethan $42,000:
select i.dept_name, avg(i.salary)
from instructor i 
group by dept_name
having avg(i.salary) > 42000
order by avg(i.salary) desc
6 records
dept_name avg
Physics 91000.00
Finance 85000.00
Elec. Eng. 80000.00
Comp. Sci. 77333.33
Biology 72560.00
History 59666.67

Like with select the attributes that are allowed in the having clause are either present in the group by clause, it is the attribute that is being aggregated.

13.6.4 Semantics of Group by and Having

Can be understood roughly as:

  1. from is evaluated to get a relation
  2. where predicate is applied on each tupel to get a new relation
  3. tupels that agree on values of those attributes listed in the group by clause are placed into groups.
  4. having clause applied to each group, the ones that satisfy it are retianed to obtain a new relation
  5. select clause is applied to the relation to obtain the resulting relation.

A query with both where and having:

  • for each course section offered in 2017, find the average total credits (tot_cred) of all students enrolled in the section, if the section has at least 2 students:
select t.course_id, t.sec_id, t.semester, avg(s.tot_cred)
from takes t, student s 
where t.student_id = s.id 
and t."year" = 2017
group by course_id, sec_id, semester
having count(s.id) > 1;
3 records
course_id sec_id semester avg
CS-101 1 Fall 65
CS-190 2 Spring 43
CS-347 1 Fall 67
Aggregation with null

All aggregate functions except of count(*) ignore null values

Aggregation of Boolean Values
  • the aggregate function some() can be applied to an aggregate consisting of boolean values to compute the disjunction of these values
  • the aggregate function every() can be applied to an aggregate consisting of boolean values to compute the conjunction of the values:
select every(a)
from (
    values 
    (true),
    (true)
) r(a)
1 records
every
TRUE

13.7 Nested Subqueries

Nested subqueries are used for:

  • test for set membership with [not] in
  • make set comparisons
  • determine set cardinality by nesting queries in the where clause

Queries can be nested in the

  • where clause
  • from clause

Simple example for a subquery:

-- names of departments and the average pay
-- where average pay in that department is above the overall average pay
select dept_name, avg(i.salary) as avg_dep_pay
from instructor i 
group by i.dept_name 
having avg(i.salary) > (
    select avg(i2.salary)
    from instructor i2 
)
order by avg(i.salary) desc
4 records
dept_name avg_dep_pay
Physics 91000.00
Finance 85000.00
Elec. Eng. 80000.00
Comp. Sci. 77333.33

13.7.1 Testing for Set Membership

Reconsider the queries

  • Find all the courses taught in both Fall 2017 and Spring 2018 semesters
  • Find all the courses taught in Fall 2017 but not in Spring 2018

Previously we used set operations. Now we can use in:

select distinct s.course_id  
from "section" s 
where s.semester = 'Fall' and s."year" = 2017 
and s.course_id in (
    select s.course_id
    from "section" s 
    where s.semester = 'Spring' and s."year" = 2018
)
1 records
course_id
CS-101
select distinct s.course_id  
from "section" s 
where s.semester = 'Fall' and s."year" = 2017 
and s.course_id not in (
    select s.course_id
    from "section" s 
    where s.semester = 'Spring' and s."year" = 2018
)
2 records
course_id
CS-347
PHY-101

where

select course_id
from section
where semester = 'Spring' and year = 2018

is a nested subquery.

distinct is used since set operations remove duplicates by default.

13.7.2 Enumerated Sets

[not] in can be used on enumerated sets:

  • Names of al instructors other than Mozart and Einstein:
select distinct name
from instructor i 
where name not in ('Mozart', 'Einstein')
Displaying records 1 - 10
name
Singh
Srinivasan
Crick
Green
Brandt
Gold
Roznicki
Califieri
Wu
Kim
Warning

enumerated sets shouldn’t be confused with tupels

13.7.3 Set Comparison

Reconsider the query:

  • Names of all instructors whose salary is grater than at least one instructor in the Biology department:

Previously we used the somewhat awkward solution:

select i.name
from instructor i , instructor i2 
where i2.dept_name = 'Biology'
and i.salary > i2.salary 

Now we can express this much more similar to natural language:

select i.name
from instructor i 
where i.salary > some(
    select salary 
    from instructor
    where dept_name = 'Biology'
)
8 records
name
Wu
Einstein
Gold
Katz
Singh
Brandt
Kim
Nishimoto

Contrast:

with cross product with set comparison
select i.name
from instructor i , instructor i2
where i2.dept_name = 'Biology'
and i.salary > i2.salary
select i.name
from instructor i
where i.salary > some(
    select salary
    from instructor
    where dept_name = 'Biology'
)

Consider another query:

  • Names of instructors that have a salary greater than that of any/each instructor in the Biology department:
select i."name"
from instructor i 
where i.salary > all (
    select salary 
    from instructor 
    where dept_name = 'Biology'
)
7 records
name
Wu
Einstein
Gold
Katz
Singh
Brandt
Kim
Warning
  • = some (...) is identical to in (...)
  • <> some (...) is not identical to not in (...)
  • <> all (...) is identical to not in (...)
  • = all(...) is not identical to in (...)
  • name of the department with the highest average salary:
select dept_name
from instructor i 
group by i.dept_name
having avg(i.salary) >= all (
    select avg(salary)
    from instructor 
    group by dept_name 
)
1 records
dept_name
Physics

13.7.4 Testing for Empty Relations

We can test whether a query has any tuples in its result (whether if it’s non-empty) with exists. Reconsider the query

  • All courses taught both in the Fall 2017 and Spring 2018:
select course_id 
from "section" s
where s.semester = 'Fall' and s."year" = 2017
and exists (
    select *
    from "section" s2
    where s2.semester = 'Spring' and s2."year" = 2018
    and s.course_id = s2.course_id 
)
1 records
course_id
CS-101

Above a correlation name from the outer querry has been used in the inner querry. This can equivalently be achieved with a usual join operation:

select s.course_id
from "section" s , "section" s2
where s.semester = 'Fall' and s."year" = 2017
and s2.semester = 'Spring' and s2."year" = 2018
and s.course_id = s2.course_id 

Non-existence can be queried with not exists. For example we can use it to simulate set containment.

\[ B \subseteq A \equiv \texttt{not exists (B except A)} \]

  • All students who have taken all courses offered in the Biology department:
select s.id, s."name" 
from student s 
where not exists (
    (select c.id
    from course c
    where c.dept_name = 'Biology')
        except
    (select t.course_id 
    from takes t 
    where s.id = t.course_id)
)
0 records
id name

The subquerry

select c.id
from course c
where c.dept_name = 'Biology'

finds all courses offered in the Biology department. The subquerry:

select t.course_id 
from takes t 
where s.id = t.course_id

finds all courses that the student ‘s’ has taken.

Consider the query:

  • How many students have taken a course by instructor with the ID ‘10101’

We can construct this query in three different ways:

  • With the usual join operation:
select count(distinct t.student_id)
from takes t, teaches t2 
where t.course_id = t2.course_id 
    and t.sec_id = t2.sec_id 
    and t.semester = t2.semester 
    and t."year" = t2."year" 
    and t2.instructor_id = '10101'
  • With set membership test using in with a tuple instead of a single attribute
select count(distinct t.student_id)
from takes t 
where (t.course_id, t.sec_id, t.semester, t."year") in ( 
    select t2.course_id , t2.sec_id , t2.semester , t2."year" 
    from teaches t2 
    where t2.instructor_id = '10101'
)
  • with the exists construct
select count(distinct t.student_id)
from takes t 
where exists (
    select t2.course_id , t2.sec_id , t2.semester , t2."year" 
    from teaches t2 
    where t.course_id = t2.course_id 
    and t.sec_id = t2.sec_id 
    and t.semester = t2.semester 
    and t."year" = t2."year" 
    and t2.instructor_id = '10101'
)
1 records
count
6

13.7.5 Test for Absence of Duplicates / Test for Uniqueness

Testing if subquery has duplicate tuples with unique:

  • courses that were offered at most once in 2017:
select c.id 
from course c 
where unique (
    select 
    from "section" s 
    where c.id = s.course_id 
    and s."year" = 2017
)

Since unique not implemented in Postgresql, we can simulate it as follows:

select c.id 
from course c 
where 1 >= (
    select count(s.course_id)
    from "section" s 
    where s."year" = 2017
    and c.id = s.course_id
)
Displaying records 1 - 10
id
BIO-101
BIO-301
BIO-399
CS-101
CS-315
CS-319
CS-347
EE-181
FIN-201
HIS-351

(For some reason this delivers a different result than two solutions below. Why?)

Another solution using aggregate functions:

select c.id 
from course c , "section" s 
where c.id = s.course_id 
and s."year" = 2017
group by c.id
having count(*) <= 1
5 records
id
BIO-101
CS-101
CS-347
EE-181
PHY-101

Yet another far less elegant solution:

  • First we find courses offered more than once in 2017:
select course_id
from "section" s 
where s."year" = 2017
    except all
select distinct course_id 
from "section" s 
where s."year" = 2017
1 records
course_id
CS-190
  • Than courses offered at most once in 2017:
select distinct course_id
from "section" s 
where s."year" = 2017
and course_id not in (
    select course_id
from "section" s 
where s."year" = 2017
    except all
select distinct course_id 
from "section" s 
where s."year" = 2017
)
5 records
course_id
BIO-101
CS-101
CS-347
EE-181
PHY-101

13.7.6 Subqueries in the From Clause

Since relations appear in the from clause, there is nothing preventing them being subqueries. Reconsider the query

  • Average instructor salaries per department, where the average salary in that department is greater than 42000:

Previously we solved using group by and having - which is the natural way:

select i.dept_name, avg(i.salary) avg_salary
from instructor i 
group by i.dept_name 
having avg(i.salary) > 42000

We can re-write it without having using subquery in the from clause:

select dept_name, avg_salary
from (
    select i.dept_name, avg(i.salary)
    from instructor i 
    group by i.dept_name 
) dep_salaries(dept_name, avg_salary) --table alias 
where avg_salary > 42000
6 records
dept_name avg_salary
Physics 91000.00
Biology 72560.00
Elec. Eng. 80000.00
Finance 85000.00
Comp. Sci. 77333.33
History 59666.67

Contrast:

with having with nested query in from clause
select i.dept_name, avg(i.salary) avg_salary
from instructor i
group by i.dept_name
having avg(i.salary) > 42000
select dept_name, avg_salary
from (
    select i.dept_name, avg(i.salary)
    from instructor i
    group by i.dept_name
) dep_salaries(dept_name, avg_salary) --table alias
where avg_salary > 42000

Another example:

  • max total salary in a department across all departments:
select max(tot_sal) as max_tot_sal
from (
    select sum(i.salary)
    from instructor i 
    group by i.dept_name 
) r(tot_sal) -- table alias
1 records
max_tot_sal
232000
Note

Correlation variables are allowed in a from subquery using the lateral keyword.

  • names of instructors, their salaries, alongside with the average salary of their department:
select i.name, i.salary , i.dept_name, avg_dep_salary
from instructor i , lateral (
    select avg(i2.salary)
    from instructor i2 
    where i.dept_name = i2.dept_name 
) r(avg_dep_salary)
Displaying records 1 - 10
name salary dept_name avg_dep_salary
Srinivasan 65000 Comp. Sci. 77333.33
Wu 90000 Finance 85000.00
Mozart 40000 Music 40000.00
Einstein 95000 Physics 91000.00
El Said 60000 History 59666.67
Gold 87000 Physics 91000.00
Katz 75000 Comp. Sci. 77333.33
Califieri 62000 History 59666.67
Singh 80000 Finance 85000.00
Crick 72000 Biology 72560.00

13.7.7 With Clause

Defines temporary relations whose definition is available only in the query in which with clause occurs.

  • Departments with maximum budget
with max_budget(value) as (
    select max(d.budget)
    from department d 
)
select d."name" , d.budget as budget
from department d, max_budget mb
where d.budget = mb.value 
1 records
name budget
Finance 120000

Alternatively we can use nested subquerries in the with clause or in the from clause:

  • as a simple subquery in where clause:
select d."name" , d.budget 
from department d 
where d.budget = (
    select max(d2.budget)
    from department d2 
)
  • using all with a where-clause subquery:
select d.name, d.budget  
from department d 
where d.budget >= all (
    select budget 
    from department d2 
)
  • as from-clause subquery:
select d.name, d.budget 
from department d, (
    select max(d2.budget)
    from department d2 
) bd(val)
where d.budget = bd.val

Using with improves readability. Consider another example:

  • departments where total salary greater than the average of the total salary of all departments:
with dep_tot_salary(d_name, t_salary) as (
    select i.dept_name, sum(i.salary)
    from instructor i 
    group by i.dept_name    
)
select *
from dep_tot_salary
where t_salary > (
    select avg(t_salary)
    from dep_tot_salary
)
4 records
d_name t_salary
Physics 182000
Finance 170000
Comp. Sci. 232000
History 179000

13.7.8 Scalar Subqueries

Queries returning one single tuple with one attribute are called scalar. Such queries can be used in place of values as scalar subquerries, even in select-clause:

  • all departments along with number of instructors in each department
select d."name" , 
       ( -- scalare unteranfragen duerfen innerhalb select eingesetzt werden
        select count(*)
        from instructor i 
        where d."name" = i.dept_name --d is a correlatin variable
       ) cnt
from department d 
order by cnt desc
7 records
name cnt
History 3
Comp. Sci. 3
Finance 2
Biology 2
Physics 2
Music 2
Elec. Eng. 1

Of course this can simply be achieved with grouping:

select i.dept_name , count(*)
from instructor i 
group by i.dept_name 
order by count(*) desc
7 records
dept_name count
History 3
Comp. Sci. 3
Physics 2
Finance 2
Biology 2
Music 2
Elec. Eng. 1

Alternatively as subquery of from with lateral:

select d."name", res.val as cnt
from department d , lateral (
    select count(*)
    from instructor i 
    where i.dept_name = d."name" 
) res(val)
order by cnt desc

Scalar subqueries can occur in select, where and having clauses.

13.7.9 Queries Without From

  • with values
values
(1, 'a'),
(2, 'b')
2 records
column1 column2
1 a
2 b
  • with select
select 
(1), ('a')
  union all
select 
(2), ('b')
2 records
?column? ?column?..2
1 a
2 b

13.8 Modifying the Database

  • adding - insert into R
  • removing - delete from R
  • changing - update R set

infromation from the database as opposed to querying.

13.8.1 Deletion

Expressed just like a query,

  • delete tuples from r satisfying condition P:
delete from r -- relation
where P -- predicate
  • delete all tuples from instructor:
delete from instructor;

or equivalently

delete from instructor
where true

The predicate can be arbitrarily complex,

  • Delete all instructors that work in the Watson building:
delete from instructor
where dept_name in (
  select dept_name
  from department
  where building = 'Watson'
)

delete deletes tupels from a single relation, but we can still reference any number of relations nested in select, from and where, including the one that we are deleting from. Consider:

  • delete all instructors that earn less than the overall average instructor salary
delete from instructor
where salary < (
  select avg(salary)
  from instructor
)
  • delete(which is anoter way of saying ‘fire’) all instructors that haven’t taught in the year 2018
delete from instructor i
where i.id not in (
  select t.instructor_id
  from teaches t 
  where t."year" = 2018
)

above a relation other than the one being deleted was referenced in the nested subquery in the where clause, demonstrating we can reference arbitrary relations.

Note

deletions are performed after the tupels are filtered (the ones that pass the test). Otherwise the result could be influenced.

13.8.2 Insertion

Any query result delivering a collection of tuples can be inserted into a relation, as long as they agree with the cardinality and domains of the attributes of the relation:

  • providing tuples explicitetly using values:
insert into course
values 
('CS-437', 'Database Systems', 'Comp. Sci.', 4),
('PHY-201', 'Intro. Theo. Phys.', 'Physics', 6)
  • or using select:
insert into course
select 
('CS-437'), ('Database Systems'), ('Comp. Sci.'), (4)
  union
select 
('PHY-201'), ('Intro. Theo. Phys.'), ('Physics'), (6)
  • we can explicitely specify the order of the attributes:
insert into course(title, id, credits, dept_name)
  values ('Database Systems', 'CS-437', 4, 'Comp. Sci.')
  • even omit some of the attributes from the specification (ommited attributes are set automatically as null):
insert into course (title, id)
  values ('Baking Cakes', 'BK-101')

Above dept_name and credit are set automatically as null.

More generally, results of an arbitary query can be inserted as well. Consider:

  • make each student in the Music department who has earned more than 144 credit hours an instructor in the Music department with a salary of $18,000.
insert into instructor 
  select s.id, s."name", s.dept_name, 18000
  from student s
  where s.dept_name = 'Music' and tot_cred > 144;
  • insert a copy of a relation to itself:
insert into s
  select *
  from s
Note

Importantly in the above example:

insert into s
  select *
  from s

the query is evaluated before the insertion is performed. Otherwise we could face infinite loops. Also note that this operation is only possible if the relation s has no primary key defined. Otherwise duplicates are not allowed.

13.8.3 Updates

Changing some values of tuples is possible with update. Consider:

  • Increase the salaries of all instructors by 5%
update instructor
set salary = salary * 1.05
  • Raise the salary only of those instructors with pay less than $70,000
update instructor
set salary = salary * 1.05
where salary < 70000

Nested subqueries are allowed in the where clause, referencing the relation being updated or other arbitrary relations. Consider:

  • give a 5% salary raise to instructors whose salary is less than overall average salary:
update instructor
set salary = salary * 1.05
where salary < (
  select avg(salary)
  from instructor 
)
  • give a 5% salary raise to instructors that have taugh more than 1 course in 2018:
update instructor i
set i.salary = i.salary * 1.05
where 1 < (
    select count(*)
    from teaches t 
    where t.instructor_id = i.id
    and  t."year" = 2018
)
  • give 3% raise to instructors with salary over $100,000, 5% to others.

One solution is to write to update statements:

update instructor 
set salary = salary * 1.03
where salary > 100000
update instructor 
set salary = salary * 1.05
where salary <= 100000
Warning

in the above solution the order of statements is important, otherwise we could end up giving a 8% raise to instructors whose salary is 100000 or just below it.

Alternatively use case statement. Then the order won’t be important.

update instructor
set salary = 
  case
    when salary <= 100000 then salary * 1.05
    else salary * 1.03
  end 
Note

General syntax of case statement:

case 
  when P1 then res1 -- P1 is a predicate 
  when P2 then res2 
  ...
  when Pn then res_n
  else res0
end

since case statement is an expression that is evaluated to a value, it can be used any place where a value is used.

Complex subqueries can follow after set clause:

  • set total credit of each student to the sum of the credits of courses successfully completed by the student. (Grade is not an ‘F’ or null)
update student
set tot_cred = (
    select sum(c.credits)
    from takes t, course c
    where t.course_id = c.id 
        and t.student_id = student.id 
        and t.grade is not null 
        and t.grade not like 'F'
)

For student that haven’t successfully completed a course, total credit will be set to null. Instead we may set it to 0 using case:

update student
set tot_cred = (
  case
    when select sum(c.credits) is not null then sum(credits)
    else 0
    from takes t, course c
    where t.course_id = c.id 
        and t.student_id = student.id 
        and t.grade is not null 
        and t.grade not like 'F'
)

Equivalently, this can be ahieved by coalesce provided by some DBs.

update student
set tot_cred = (
    select coalesce(sum(c.credits), 0)
    from takes t, course c
    where t.course_id = c.id 
        and t.student_id = student.id 
        and t.grade is not null 
        and t.grade not like 'F'
)

where coalesce(x, y) evaluates to x, if x is not null, and to y otherwise.