13 Introduction to SQL

13.1 Overview

Data-definition language (DDL):
- defining and modifying relation schemas
- integrity constrains
- view definition
- authorization: access rights to relations and views.
Data-manipulation language (DML): querry information from and modify tuples in relations
Transaction Control: Specifying beginning and end points of transactions.
Embedded/dynamic SQL: how SQL is embedded in general-purpose programmig languages.

13.2 SQL DDL

following can be specified:

schema for each relation
set of indices to be maintained for each relation
security and authorization information for each relation
physical storage structure of each relation on disk

13.2.1 Basic Types

char(n): fixed n-long string. (full form: character(n) )
varchar(n): variable-length character string with max length n. (full form character varying(n))
int (full form: integer)
smallint
numeric(p, d): p digits in total, d of the digits after decimal point.
real, double precision: floating-point / double precision floating point.
float(n): Floating-point with precision of at least n digits.

13.2.2 Basic Schema Definition

create table department (
    dept_name   varchar(20),
    building    varchar(15),
    budget      numeric(12,2),
    primary key (dept_name) --primary key integrity constraint
);

general form:

create table r(
    a1 domain1,
    a2 domain2,
    ...
    a_n domain_n
    [integrity constraint 1],
    ...
    [integrity constraing 2] -- these are optional
)

13.2.3 Basic Constraints:

primary key: a1, …, a_n together form the primary key of the relation:
```
primary key(a1, a2, ..., a_n)
```
foreign key: a1, …, a_n together form a foreign key over a relation s, i.e. a1, …, a_n must be a primary key of some tuple in s (existence: referential integrity)
```
foreign key (a1, a2, ..., a_n) references s
```

not null:

name varchar(20) not null, --name can not be null

concrete example:

create table instructor(
  ID          varchar(5),
  name        varchar(20) not null, --name can not be null
  dept_name   varchar(20),
  salary      numeric(8,2),
  primary key (ID),
  foreign key (dept_name) references department
)

Note that explicitly specifying the primary key of the referenced relation

foreign key (dept_name) references department(dept_name)

is also possible but not required.

13.2.4 Altering the Schema

remove a relation completely from the database schema:

drop table r; --not to be confused with 'delete from'

modify a relation schema:

alter table r add a1 a2; -- add the attributes a1, a2 to the relation r

alter table r drop a; -- drop/remove attribute a from the relation r

some dbs’ don’t support dropping single attributes but only whole tables.

13.3 Basic SQL Querries

A typical sql querry has the form

select a1, ..., a_n
from r1, ..., r_m
where P -- P is a predicate/condition

select name
from instructor

Displaying records 1 - 10
name
Srinivasan
Wu
Mozart
Einstein
El Said
Gold
Katz
Califieri
Singh
Crick

select dept_name
from instructor

Displaying records 1 - 10
dept_name
Comp. Sci.
Finance
Music
Physics
History
Physics
Comp. Sci.
History
Finance
Biology

select distinct dept_name --removes duplicates
from instructor

7 records
dept_name
Physics
Biology
Elec. Eng.
Finance
Comp. Sci.
History
Music

Select allows arbitrary expressions of attributes. we can output 10% salary raise for instructors

select id, name, dept_name, salary * 1.1
from instructor

Displaying records 1 - 10
id	name	dept_name	?column?
10101	Srinivasan	Comp. Sci.	71500
12121	Wu	Finance	99000
15151	Mozart	Music	44000
22222	Einstein	Physics	104500
32343	El Said	History	66000
33456	Gold	Physics	95700
45565	Katz	Comp. Sci.	82500
58583	Califieri	History	68200
76543	Singh	Finance	88000
76766	Crick	Biology	79200

13.3.1 Where clause.

Names of all instructors in the CS department who have salary greater than $70,000:

select name
from instructor
where dept_name = 'Comp. Sci.'
and salary > 70000

2 records
name
Katz
Brandt

13.3.2 Joining Tables

names of instructors, names of their departments and the names of the buildings where departments are located:

select i.name, d.name as dept_name, building
from instructor i, department d
where i.dept_name = d.name

Displaying records 1 - 10
name	dept_name	building
Nishimoto	Biology	Watson
Crick	Biology	Watson
Brandt	Comp. Sci.	Taylor
Katz	Comp. Sci.	Taylor
Srinivasan	Comp. Sci.	Taylor
Kim	Elec. Eng.	Taylor
Singh	Finance	Painter
Wu	Finance	Painter
Roznicki	History	Painter
Califieri	History	Painter

names of instructors and identifiers of courses they have tought:

select i.name, t.course_id
from instructor i, teaches t
where i.id = t.instructor_id

Displaying records 1 - 10
name	course_id
Srinivasan	CS-101
Srinivasan	CS-315
Srinivasan	CS-347
Wu	FIN-201
Mozart	MU-199
Einstein	PHY-101
El Said	HIS-351
Katz	CS-101
Katz	CS-319
Crick	BIO-101

names of instructors from the CS department and identifiers of courses that they have tought:

select i.name, t.course_id
from instructor i, teaches t
where i.id = t.instructor_id and i.dept_name = 'Comp. Sci.'

8 records
name	course_id
Srinivasan	CS-101
Srinivasan	CS-315
Srinivasan	CS-347
Katz	CS-101
Katz	CS-319
Brandt	CS-190
Brandt	CS-190
Brandt	CS-319

13.3.3 Renaming

select i.name as instructor_name, t.course_id -- as can be omitted
from instructor as i, teaches as t -- as can be ommited
where i.id = t.instructor_id

Displaying records 1 - 10
instructor_name	course_id
Srinivasan	CS-101
Srinivasan	CS-315
Srinivasan	CS-347
Wu	FIN-201
Mozart	MU-199
Einstein	PHY-101
El Said	HIS-351
Katz	CS-101
Katz	CS-319
Crick	BIO-101

Renaming useful when comparing tuples in the same relation:

names of instructors whose salary is greater than at least one instructor in the biology department $\approx$ names of all instructors who earn more than the lowest paid instructor in the Biology department:

select distinct i.name
from instructor i , instructor i2 
where i.salary > i2.salary and i2.dept_name = 'Biology'

8 records
name
Brandt
Gold
Wu
Kim
Nishimoto
Katz
Einstein
Singh

correlation name = table alias = tuple variable

13.3.4 String Operations

concatenation with ||

values 
('hey' || ' there!')

1 records
column1
hey there!

upper() and lower()

values 
(upper('hey there')),
(lower('HEY THERE'))

2 records
column1
HEY THERE
hey there

removing spaces with trim()

values 
(trim('hey   ' ) || ' there!')

1 records
column1
hey there!

13.3.5 Pattern Matching

%: matches any string
_: matches any character
examples:
- intro%: any string beginning with ‘Intro’
- %Comp%: any string containing ‘Comp’ as a substring
- ___: any string of exactly three characters
- ___%: any string of at least three characters
concrete example; information of courses that have ‘comp’ as a substring in their title:

select *
from course c
where c.title ilike '%comp%' --ilike is case insensitive like

2 records
id	title	dept_name	credits
BIO-399	Computational Biology	Biology	3
CS-101	Intro. to Computer Science	Comp. Sci.	4

*Escaping special characters like “%” with “":

select *
from (
    values
    ('%15')
) r(a)
where a like '\%%'

1 records
a
%15

Defining custom escape characters other than “":

select *
from (
    values
    ('%15')
) r(a)
where a like '^%%' escape '^' -- '^' is defined as the escape character

1 records
a
%15

13.3.6 Ordering Display of Tuples

ordering

select name
from instructor
where dept_name  = 'Physics'
order by name;

2 records
name
Einstein
Gold

ordering order, multiple attributes

select *
from instructor
order by salary desc, name asc

Displaying records 1 - 10
id	name	dept_name	salary
30765	Green	Music	NA
22222	Einstein	Physics	95000
83821	Brandt	Comp. Sci.	92000
12121	Wu	Finance	90000
33456	Gold	Physics	87000
98345	Kim	Elec. Eng.	80000
76543	Singh	Finance	80000
45565	Katz	Comp. Sci.	75000
10032	Nishimoto	Biology	73120
76766	Crick	Biology	72000

13.3.7 Between

SELECT name
from instructor i
where i.salary BETWEEN 90000 and 100000

3 records
name
Wu
Einstein
Brandt

13.3.8 Tuples in Where Predicates

SELECT i.name, t.course_id
from instructor i, teaches t
where i.id = t.instructor_id and i.dept_name = 'Biology'

is equivalent to

SELECT i.name, t.course_id
from instructor i, teaches t
where (i.id, i.dept_name) = (t.instructor_id, 'Biology')

2 records
name	course_id
Crick	BIO-101
Crick	BIO-301

13.4 Set Operations

Set operations eliminate duplicates by default. Duplicates retained by all keyword.

13.4.1 Union

corresponds to $\cup$.

courses offered in 2017 Fall or 2018 Spring:

(
    select  course_id
    from section
    where semester = 'Fall' and year = 2017
)
UNION
(
    select course_id
    from section
    where semester = 'Spring' and year = 2018
)

8 records
course_id
CS-101
CS-315
CS-319
CS-347
FIN-201
HIS-351
MU-199
PHY-101

13.4.2 Intersect

Corresponds to $\cap$.

Courses offered in both Fall 2017 and Spring 2018:

(
    select  course_id
    from section
    where semester = 'Fall' and year = 2017
)
intersect
(
    select course_id
    from section
    where semester = 'Spring' and year = 2018
)

1 records
course_id
CS-101

13.4.3 Except

Corresponds to $\backslash$.

courses offered in the Fall 2017 but not in Spring 2018:

(
    select  course_id
    from section
    where semester = 'Fall' and year = 2017
)
except
(
    select course_id
    from section
    where semester = 'Spring' and year = 2018
)

2 records
course_id
PHY-101
CS-347

Illustrating how all works:

union all:

with r(a) as (
    values
    (1),
    (1)
), s(a) as (
    values
    (1)
)
(select *
from r )
union  all
(select *
from s)

3 records
a
1
1
1

intersect all:

with r(a) as (
    values
    (1),
    (1),
    (1)
), s(a) as (
    values
    (1),
    (1)
)
(select *
from r )
INTERSECT all
(select *
from s)

2 records
a
1
1

except all:

with r(a) as (
    values
    (1),
    (1)
), s(a) as (
    values
    (1)
)
(select *
from r )
except  all
(select *
from s)

1 records
a
1

with r(a) as (
    values
    (1),
    (1)
), s(a) as (
    values
    (1)
)
(select *
from s )
except  all
(select *
from r)

0 records
a

13.5 Null

null represents values that are not known. (can be anything).
arithmetic expressions involving null produce null:

values
(1 + null)

1 records
column1
NA

boolean expressions and predicates involving null (other than is [not] null) have a special truth value unknown.
- e.g. (1 < unknown) = uknown
truth tables for unknown:

p	q	p and q
`false`	`unknown`	`false`
`true`	`unknown`	`unknown`

p	q	p or q
`false`	`unknown`	`unknown`
`true`	`unknown`	`true`

not unknown = uknown

tuples that evaluate to uknown in the where clause are not included in the result (just like the ones that evaluate to false)
we can test if a value is or isn’t null:

select a
from (
    values 
    (1, 3),
    (2, null)
) r(a, b)
where b is null

1 records
a
2

select a
from (
    values 
    (1, 3),
    (2, null)
) r(a, b)
where b is not null

1 records
a
1

we can test whether the result of a predicate is or isn’t uknown

select a
from (
    values 
    (1, 3),
    (2, null)
) r(a, b)
where b > 2 is unknown

1 records
a
2

select a
from (
    values 
    (1, 3),
    (2, null)
) r(a, b)
where b > 2 is not unknown

1 records
a
1

13.6 Aggregate Functions

Take collection (i.e. aggregate, multiset) of values as input and return a single value.
Built in aggregate functions in sql:
- avg: input must be numeric
- min
- max
- sum: computes the total sum of the values in the aggregate, input must be numeric
- count

13.6.1 Basic Aggregation

Average salary of instructors in the CS department:

select avg(salary) as avg_salary
from instructor 
where dept_name = 'Comp. Sci.'

1 records
avg_salary
77333.33

Duplicates are retained. Retention of duplicates is obviously important for calculating averages. But sometimes we may want to eliminate duplicates:

Find the total number of instructors that teach a course in the Spring 2018 semester:

select count(distinct t.instructor_id)
from teaches t
where semester = 'Spring' and year = 2018

1 records
count
6

as opposed to

select count(t.instructor_id)
from teaches t
where semester = 'Spring' and year = 2018

1 records
count
7

counting all attributes with count(*):

select count(*)
from course

1 records
count
13

count(*) retains null values. It is the only aggregate function that does so. All other aggregate functions ignore null values, including count functions applied on a single attribute. That is count(*) and count(attr) are different:

select count(*) as count_star, count(a) as count_attribute
from (
    values 
    (1),
    (null)
) r(a)

1 records
count_star	count_attribute
2	1

13.6.2 Aggregation with Grouping

Instead of applying the aggregate function to a single aggregate (collection), we can apply it to multiple aggregates/collections, that consist of tuples grouped together w.r.t certain grouping attributes:

find the average salary in each department:

select dept_name, avg(salary) as avg_dept_salary
from instructor  
group by dept_name
order by avg(salary) desc

7 records
dept_name	avg_dept_salary
Physics	91000.00
Finance	85000.00
Elec. Eng.	80000.00
Comp. Sci.	77333.33
Biology	72560.00
History	59666.67
Music	40000.00

find number of instructors working in each department:

select dept_name, count(*) cnt
from instructor 
group by dept_name 
order by dept_name

7 records
dept_name	cnt
Biology	2
Comp. Sci.	3
Elec. Eng.	1
Finance	2
History	3
Music	2
Physics	2

find the number of instructors in each department who teach a course in the Spring 2018 semester:

select i.dept_name, count(distinct i.id) as instr_count
from instructor i , teaches t 
where i.id = t.instructor_id 
and t.semester = 'Spring' and t.year = 2018
group by i.dept_name

4 records
dept_name	instr_count
Comp. Sci.	3
Finance	1
History	1
Music	1

Warning

Note that only attributes allowed to appear in the select clause (other than the attribute being aggregated) are the attributes used in the group by clause. Thus

select a, X, avg(b) -- X doesn't appear in the group by clause below
from r -- some relation r
where P -- some predicate
group by a

it not legal.

13.6.3 Having Clause

Specifies a condition that applies to groups rather than to tuples.

departments where average salayr is morethan $42,000:

select i.dept_name, avg(i.salary)
from instructor i 
group by dept_name
having avg(i.salary) > 42000
order by avg(i.salary) desc

6 records
dept_name	avg
Physics	91000.00
Finance	85000.00
Elec. Eng.	80000.00
Comp. Sci.	77333.33
Biology	72560.00
History	59666.67

Like with select the attributes that are allowed in the having clause are either present in the group by clause, it is the attribute that is being aggregated.

13.6.4 Semantics of Group by and Having

Can be understood roughly as:

from is evaluated to get a relation
where predicate is applied on each tupel to get a new relation
tupels that agree on values of those attributes listed in the group by clause are placed into groups.
having clause applied to each group, the ones that satisfy it are retianed to obtain a new relation
select clause is applied to the relation to obtain the resulting relation.

A query with both where and having:

for each course section offered in 2017, find the average total credits (tot_cred) of all students enrolled in the section, if the section has at least 2 students:

select t.course_id, t.sec_id, t.semester, avg(s.tot_cred)
from takes t, student s 
where t.student_id = s.id 
and t."year" = 2017
group by course_id, sec_id, semester
having count(s.id) > 1;

3 records
course_id	sec_id	semester	avg
CS-101	1	Fall	65
CS-190	2	Spring	43
CS-347	1	Fall	67

Aggregation with null

All aggregate functions except of count(*) ignore null values

Aggregation of Boolean Values

the aggregate function some() can be applied to an aggregate consisting of boolean values to compute the disjunction of these values
the aggregate function every() can be applied to an aggregate consisting of boolean values to compute the conjunction of the values:

select every(a)
from (
    values 
    (true),
    (true)
) r(a)

1 records
every
TRUE

13.7 Nested Subqueries

Nested subqueries are used for:

test for set membership with [not] in
make set comparisons
determine set cardinality by nesting queries in the where clause

Queries can be nested in the

where clause
from clause

Simple example for a subquery:

-- names of departments and the average pay
-- where average pay in that department is above the overall average pay
select dept_name, avg(i.salary) as avg_dep_pay
from instructor i 
group by i.dept_name 
having avg(i.salary) > (
    select avg(i2.salary)
    from instructor i2 
)
order by avg(i.salary) desc

4 records
dept_name	avg_dep_pay
Physics	91000.00
Finance	85000.00
Elec. Eng.	80000.00
Comp. Sci.	77333.33

13.7.1 Testing for Set Membership

Reconsider the queries

Find all the courses taught in both Fall 2017 and Spring 2018 semesters
Find all the courses taught in Fall 2017 but not in Spring 2018

Previously we used set operations. Now we can use in:

select distinct s.course_id  
from "section" s 
where s.semester = 'Fall' and s."year" = 2017 
and s.course_id in (
    select s.course_id
    from "section" s 
    where s.semester = 'Spring' and s."year" = 2018
)

1 records
course_id
CS-101

select distinct s.course_id  
from "section" s 
where s.semester = 'Fall' and s."year" = 2017 
and s.course_id not in (
    select s.course_id
    from "section" s 
    where s.semester = 'Spring' and s."year" = 2018
)

2 records
course_id
CS-347
PHY-101

where

select course_id
from section
where semester = 'Spring' and year = 2018

is a nested subquery.

distinct is used since set operations remove duplicates by default.

13.7.2 Enumerated Sets

[not] in can be used on enumerated sets:

Names of al instructors other than Mozart and Einstein:

select distinct name
from instructor i 
where name not in ('Mozart', 'Einstein')

Displaying records 1 - 10
name
Singh
Srinivasan
Crick
Green
Brandt
Gold
Roznicki
Califieri
Wu
Kim

Warning

enumerated sets shouldn’t be confused with tupels

13.7.3 Set Comparison

Reconsider the query:

Names of all instructors whose salary is grater than at least one instructor in the Biology department:

Previously we used the somewhat awkward solution:

select i.name
from instructor i , instructor i2 
where i2.dept_name = 'Biology'
and i.salary > i2.salary

Now we can express this much more similar to natural language:

select i.name
from instructor i 
where i.salary > some(
    select salary 
    from instructor
    where dept_name = 'Biology'
)

8 records
name
Wu
Einstein
Gold
Katz
Singh
Brandt
Kim
Nishimoto

Contrast:

with cross product with set comparison

select i.name
from instructor i , instructor i2
where i2.dept_name = 'Biology'
and i.salary > i2.salary

select i.name
from instructor i
where i.salary > some(
    select salary
    from instructor
    where dept_name = 'Biology'
)

Consider another query:

Names of instructors that have a salary greater than that of any/each instructor in the Biology department:

select i."name"
from instructor i 
where i.salary > all (
    select salary 
    from instructor 
    where dept_name = 'Biology'
)

7 records
name
Wu
Einstein
Gold
Katz
Singh
Brandt
Kim

Warning

= some (...) is identical to in (...)
<> some (...) is not identical to not in (...)
<> all (...) is identical to not in (...)
= all(...) is not identical to in (...)

name of the department with the highest average salary:

select dept_name
from instructor i 
group by i.dept_name
having avg(i.salary) >= all (
    select avg(salary)
    from instructor 
    group by dept_name 
)

1 records
dept_name
Physics

13.7.4 Testing for Empty Relations

We can test whether a query has any tuples in its result (whether if it’s non-empty) with exists. Reconsider the query

All courses taught both in the Fall 2017 and Spring 2018:

select course_id 
from "section" s
where s.semester = 'Fall' and s."year" = 2017
and exists (
    select *
    from "section" s2
    where s2.semester = 'Spring' and s2."year" = 2018
    and s.course_id = s2.course_id 
)

1 records
course_id
CS-101

Above a correlation name from the outer querry has been used in the inner querry. This can equivalently be achieved with a usual join operation:

select s.course_id
from "section" s , "section" s2
where s.semester = 'Fall' and s."year" = 2017
and s2.semester = 'Spring' and s2."year" = 2018
and s.course_id = s2.course_id

Non-existence can be queried with not exists. For example we can use it to simulate set containment.

\[ B \subseteq A \equiv \texttt{not exists (B except A)} \]

All students who have taken all courses offered in the Biology department:

select s.id, s."name" 
from student s 
where not exists (
    (select c.id
    from course c
    where c.dept_name = 'Biology')
        except
    (select t.course_id 
    from takes t 
    where s.id = t.course_id)
)

0 records
id	name

The subquerry

select c.id
from course c
where c.dept_name = 'Biology'

finds all courses offered in the Biology department. The subquerry:

select t.course_id 
from takes t 
where s.id = t.course_id

finds all courses that the student ‘s’ has taken.

Consider the query:

How many students have taken a course by instructor with the ID ‘10101’

We can construct this query in three different ways:

With the usual join operation:

select count(distinct t.student_id)
from takes t, teaches t2 
where t.course_id = t2.course_id 
    and t.sec_id = t2.sec_id 
    and t.semester = t2.semester 
    and t."year" = t2."year" 
    and t2.instructor_id = '10101'

With set membership test using in with a tuple instead of a single attribute

select count(distinct t.student_id)
from takes t 
where (t.course_id, t.sec_id, t.semester, t."year") in ( 
    select t2.course_id , t2.sec_id , t2.semester , t2."year" 
    from teaches t2 
    where t2.instructor_id = '10101'
)

with the exists construct

select count(distinct t.student_id)
from takes t 
where exists (
    select t2.course_id , t2.sec_id , t2.semester , t2."year" 
    from teaches t2 
    where t.course_id = t2.course_id 
    and t.sec_id = t2.sec_id 
    and t.semester = t2.semester 
    and t."year" = t2."year" 
    and t2.instructor_id = '10101'
)

1 records
count
6

13.7.5 Test for Absence of Duplicates / Test for Uniqueness

Testing if subquery has duplicate tuples with unique:

courses that were offered at most once in 2017:

select c.id 
from course c 
where unique (
    select 
    from "section" s 
    where c.id = s.course_id 
    and s."year" = 2017
)

Since unique not implemented in Postgresql, we can simulate it as follows:

select c.id 
from course c 
where 1 >= (
    select count(s.course_id)
    from "section" s 
    where s."year" = 2017
    and c.id = s.course_id
)

Displaying records 1 - 10
id
BIO-101
BIO-301
BIO-399
CS-101
CS-315
CS-319
CS-347
EE-181
FIN-201
HIS-351

(For some reason this delivers a different result than two solutions below. Why?)

Another solution using aggregate functions:

select c.id 
from course c , "section" s 
where c.id = s.course_id 
and s."year" = 2017
group by c.id
having count(*) <= 1

5 records
id
BIO-101
CS-101
CS-347
EE-181
PHY-101

Yet another far less elegant solution:

First we find courses offered more than once in 2017:

select course_id
from "section" s 
where s."year" = 2017
    except all
select distinct course_id 
from "section" s 
where s."year" = 2017

1 records
course_id
CS-190

Than courses offered at most once in 2017:

select distinct course_id
from "section" s 
where s."year" = 2017
and course_id not in (
    select course_id
from "section" s 
where s."year" = 2017
    except all
select distinct course_id 
from "section" s 
where s."year" = 2017
)

5 records
course_id
BIO-101
CS-101
CS-347
EE-181
PHY-101

13.7.6 Subqueries in the From Clause

Since relations appear in the from clause, there is nothing preventing them being subqueries. Reconsider the query

Average instructor salaries per department, where the average salary in that department is greater than 42000:

Previously we solved using group by and having - which is the natural way:

select i.dept_name, avg(i.salary) avg_salary
from instructor i 
group by i.dept_name 
having avg(i.salary) > 42000

We can re-write it without having using subquery in the from clause:

select dept_name, avg_salary
from (
    select i.dept_name, avg(i.salary)
    from instructor i 
    group by i.dept_name 
) dep_salaries(dept_name, avg_salary) --table alias 
where avg_salary > 42000

6 records
dept_name	avg_salary
Physics	91000.00
Biology	72560.00
Elec. Eng.	80000.00
Finance	85000.00
Comp. Sci.	77333.33
History	59666.67

Contrast:

with having with nested query in from clause

select i.dept_name, avg(i.salary) avg_salary
from instructor i
group by i.dept_name
having avg(i.salary) > 42000

select dept_name, avg_salary
from (
    select i.dept_name, avg(i.salary)
    from instructor i
    group by i.dept_name
) dep_salaries(dept_name, avg_salary) --table alias
where avg_salary > 42000

Another example:

max total salary in a department across all departments:

select max(tot_sal) as max_tot_sal
from (
    select sum(i.salary)
    from instructor i 
    group by i.dept_name 
) r(tot_sal) -- table alias

1 records
max_tot_sal
232000

Note

Correlation variables are allowed in a from subquery using the lateral keyword.

names of instructors, their salaries, alongside with the average salary of their department:

select i.name, i.salary , i.dept_name, avg_dep_salary
from instructor i , lateral (
    select avg(i2.salary)
    from instructor i2 
    where i.dept_name = i2.dept_name 
) r(avg_dep_salary)

Displaying records 1 - 10
name	salary	dept_name	avg_dep_salary
Srinivasan	65000	Comp. Sci.	77333.33
Wu	90000	Finance	85000.00
Mozart	40000	Music	40000.00
Einstein	95000	Physics	91000.00
El Said	60000	History	59666.67
Gold	87000	Physics	91000.00
Katz	75000	Comp. Sci.	77333.33
Califieri	62000	History	59666.67
Singh	80000	Finance	85000.00
Crick	72000	Biology	72560.00

13.7.7 With Clause

Defines temporary relations whose definition is available only in the query in which with clause occurs.

Departments with maximum budget

with max_budget(value) as (
    select max(d.budget)
    from department d 
)
select d."name" , d.budget as budget
from department d, max_budget mb
where d.budget = mb.value

1 records
name	budget
Finance	120000

Alternatively we can use nested subquerries in the with clause or in the from clause:

as a simple subquery in where clause:

select d."name" , d.budget 
from department d 
where d.budget = (
    select max(d2.budget)
    from department d2 
)

using all with a where-clause subquery:

select d.name, d.budget  
from department d 
where d.budget >= all (
    select budget 
    from department d2 
)

as from-clause subquery:

select d.name, d.budget 
from department d, (
    select max(d2.budget)
    from department d2 
) bd(val)
where d.budget = bd.val

Using with improves readability. Consider another example:

departments where total salary greater than the average of the total salary of all departments:

with dep_tot_salary(d_name, t_salary) as (
    select i.dept_name, sum(i.salary)
    from instructor i 
    group by i.dept_name    
)
select *
from dep_tot_salary
where t_salary > (
    select avg(t_salary)
    from dep_tot_salary
)

4 records
d_name	t_salary
Physics	182000
Finance	170000
Comp. Sci.	232000
History	179000

13.7.8 Scalar Subqueries

Queries returning one single tuple with one attribute are called scalar. Such queries can be used in place of values as scalar subquerries, even in select-clause:

all departments along with number of instructors in each department

select d."name" , 
       ( -- scalare unteranfragen duerfen innerhalb select eingesetzt werden
        select count(*)
        from instructor i 
        where d."name" = i.dept_name --d is a correlatin variable
       ) cnt
from department d 
order by cnt desc

7 records
name	cnt
History	3
Comp. Sci.	3
Finance	2
Biology	2
Physics	2
Music	2
Elec. Eng.	1

Of course this can simply be achieved with grouping:

select i.dept_name , count(*)
from instructor i 
group by i.dept_name 
order by count(*) desc

7 records
dept_name	count
History	3
Comp. Sci.	3
Physics	2
Finance	2
Biology	2
Music	2
Elec. Eng.	1

Alternatively as subquery of from with lateral:

select d."name", res.val as cnt
from department d , lateral (
    select count(*)
    from instructor i 
    where i.dept_name = d."name" 
) res(val)
order by cnt desc

Scalar subqueries can occur in select, where and having clauses.

13.7.9 Queries Without From

with values

values
(1, 'a'),
(2, 'b')

2 records
column1	column2
1	a
2	b

with select

select 
(1), ('a')
  union all
select 
(2), ('b')

2 records
?column?	?column?..2
1	a
2	b

13.8 Modifying the Database

adding - insert into R
removing - delete from R
changing - update R set

infromation from the database as opposed to querying.

13.8.1 Deletion

Expressed just like a query,

delete tuples from r satisfying condition P:

delete from r -- relation
where P -- predicate

delete all tuples from instructor:

delete from instructor;

or equivalently

delete from instructor
where true

The predicate can be arbitrarily complex,

Delete all instructors that work in the Watson building:

delete from instructor
where dept_name in (
  select dept_name
  from department
  where building = 'Watson'
)

delete deletes tupels from a single relation, but we can still reference any number of relations nested in select, from and where, including the one that we are deleting from. Consider:

delete all instructors that earn less than the overall average instructor salary

delete from instructor
where salary < (
  select avg(salary)
  from instructor
)

delete(which is anoter way of saying ‘fire’) all instructors that haven’t taught in the year 2018

delete from instructor i
where i.id not in (
  select t.instructor_id
  from teaches t 
  where t."year" = 2018
)

above a relation other than the one being deleted was referenced in the nested subquery in the where clause, demonstrating we can reference arbitrary relations.

Note

deletions are performed after the tupels are filtered (the ones that pass the test). Otherwise the result could be influenced.

13.8.2 Insertion

Any query result delivering a collection of tuples can be inserted into a relation, as long as they agree with the cardinality and domains of the attributes of the relation:

providing tuples explicitetly using values:

insert into course
values 
('CS-437', 'Database Systems', 'Comp. Sci.', 4),
('PHY-201', 'Intro. Theo. Phys.', 'Physics', 6)

or using select:

insert into course
select 
('CS-437'), ('Database Systems'), ('Comp. Sci.'), (4)
  union
select 
('PHY-201'), ('Intro. Theo. Phys.'), ('Physics'), (6)

we can explicitely specify the order of the attributes:

insert into course(title, id, credits, dept_name)
  values ('Database Systems', 'CS-437', 4, 'Comp. Sci.')

even omit some of the attributes from the specification (ommited attributes are set automatically as null):

insert into course (title, id)
  values ('Baking Cakes', 'BK-101')

Above dept_name and credit are set automatically as null.

More generally, results of an arbitary query can be inserted as well. Consider:

make each student in the Music department who has earned more than 144 credit hours an instructor in the Music department with a salary of $18,000.

insert into instructor 
  select s.id, s."name", s.dept_name, 18000
  from student s
  where s.dept_name = 'Music' and tot_cred > 144;

insert a copy of a relation to itself:

insert into s
  select *
  from s

Note

Importantly in the above example:

insert into s
  select *
  from s

the query is evaluated before the insertion is performed. Otherwise we could face infinite loops. Also note that this operation is only possible if the relation s has no primary key defined. Otherwise duplicates are not allowed.

13.8.3 Updates

Changing some values of tuples is possible with update. Consider:

Increase the salaries of all instructors by 5%

update instructor
set salary = salary * 1.05

Raise the salary only of those instructors with pay less than $70,000

update instructor
set salary = salary * 1.05
where salary < 70000

Nested subqueries are allowed in the where clause, referencing the relation being updated or other arbitrary relations. Consider:

give a 5% salary raise to instructors whose salary is less than overall average salary:

update instructor
set salary = salary * 1.05
where salary < (
  select avg(salary)
  from instructor 
)

give a 5% salary raise to instructors that have taugh more than 1 course in 2018:

update instructor i
set i.salary = i.salary * 1.05
where 1 < (
    select count(*)
    from teaches t 
    where t.instructor_id = i.id
    and  t."year" = 2018
)

give 3% raise to instructors with salary over $100,000, 5% to others.

One solution is to write to update statements:

update instructor 
set salary = salary * 1.03
where salary > 100000

update instructor 
set salary = salary * 1.05
where salary <= 100000

Warning

in the above solution the order of statements is important, otherwise we could end up giving a 8% raise to instructors whose salary is 100000 or just below it.

Alternatively use case statement. Then the order won’t be important.

update instructor
set salary = 
  case
    when salary <= 100000 then salary * 1.05
    else salary * 1.03
  end

Note

General syntax of case statement:

case 
  when P1 then res1 -- P1 is a predicate 
  when P2 then res2 
  ...
  when Pn then res_n
  else res0
end

since case statement is an expression that is evaluated to a value, it can be used any place where a value is used.

Complex subqueries can follow after set clause:

set total credit of each student to the sum of the credits of courses successfully completed by the student. (Grade is not an ‘F’ or null)

update student
set tot_cred = (
    select sum(c.credits)
    from takes t, course c
    where t.course_id = c.id 
        and t.student_id = student.id 
        and t.grade is not null 
        and t.grade not like 'F'
)

For student that haven’t successfully completed a course, total credit will be set to null. Instead we may set it to 0 using case:

update student
set tot_cred = (
  case
    when select sum(c.credits) is not null then sum(credits)
    else 0
    from takes t, course c
    where t.course_id = c.id 
        and t.student_id = student.id 
        and t.grade is not null 
        and t.grade not like 'F'
)

Equivalently, this can be ahieved by coalesce provided by some DBs.

update student
set tot_cred = (
    select coalesce(sum(c.credits), 0)
    from takes t, course c
    where t.course_id = c.id 
        and t.student_id = student.id 
        and t.grade is not null 
        and t.grade not like 'F'
)

where coalesce(x, y) evaluates to x, if x is not null, and to y otherwise.