This is not book summary
This is me keeping notes of all the functionalities of PostgreSQL used within the book for reference.
Fly By
Pattern Matching
LIKE and ILIKE
like expression returns true if string(column’s value) matches the supplied pattern. It is case sensitive match. ilike matches the string with pattern case insensitively.
If pattern
does not contain percent signs or underscores, then the pattern only represents the string itself; in that case LIKE
acts like the equals operator. An underscore (_
) in pattern
stands for (matches) any single character; a percent sign (%
) matches any sequence of zero or more characters.
More in the official docs.
POSIX Regex
There are four regex operators:
Operator | Description |
---|---|
string ~ pattern | string matches regex pattern case sensitively |
string *~ pattern | string matches regex pattern case insensitively |
string !~ pattern | string does not matches regex pattern case sensitively |
string !*~ pattern | string does not matches regex pattern case insensitively |
More in the official docs. |
Bent Fields
Sargable vs Non Sargable queries
Sargable stands for Search Argument able. Sargable queries are queries which use indexes to speed up execution of query. Non Sargable therefore are queries which do not use indexes to speed up execution of query. Sargablity matters in where, order by, group by clause. In Select clause it doesn’t matter much.
One of the most common way to make query non-sargable is to use function in where clause.
SELECT *
FROM myTable
WHERE SQRT(myIntField) > 11.7
Sargable version of this query would be to compare int value with square of 11.7
SELECT *
FROM myTable
WHERE myIntField > 11.7 * 11.7
WHERE
clauses that are sargable typically have field values on the left of the operator, and scalar values or expressions on the right side of the operator.
For WHERE...LIKE
clause, the query is sargable if they have field values on the left of the operator, and LIKE
text strings that do not begin with the %
on the right.
-- Non-sargable
SELECT *
FROM myTable
WHERE myNameField LIKE '%Wales%'
--Sargable
SELECT *
FROM myTable
WHERE myNameField LIKE 'Wales%'
Type Casts
Type casts converts one data type to another if they are compatible.
CAST (`expression` AS `type`)
`expression`::`type`
Full text Indexing
Full-text indexing in PostgreSQL enables efficient searching of natural language documents. Full text indexing allows documents to be preprocessed and an index saved for later rapid searching. Data types, functions and operators usually involved in creating full text index. Data Types:
- tsvector: Stores preprocessed text in a searchable format.
- tsquery: Represents the search query in a format that can be compared with tsvector.
Functions:
- to_tsvector: Converts text to tsvector.
- to_tsquery: Converts a query string to tsquery.
Operators:
- @@: Matches tsvector against tsquery.
- ←>: Proximity operator.
Steps in full text indexing:
- Create a colum with type tsvector in a table.
- create a GIN index of that column for making searches fast.
- Use to_tsvector function in insert/update/upsert query to add texts as tsvector in the row.
insert into articles (title, body, search_vector)
values (
'Optimizing Search Queries',
'Optimization of search queries involves indexing strategies and query planning.',
to_tsvector('english', 'Optimizing Search Queries Optimization of search queries involves indexing strategies and query planning.')
);
- Use to_tsquery function and @@ operator to do text search on table.
select id, title, date from enceladus_events where search @@ to_tsquery('closest')
Views
Refer sql-views and postgres docs
Sniff the Sky
Use csvkit to import CSVs into PostgreSQL.
Character Types
Name | Description |
---|---|
character varying(n),varchar(n) | variable-length with limit i.e storing strings longer than limit will throw an error and strings shorter than limit will be stored as is. If one casts string longer than n, the string will be truncated to n characters. |
character(n), char(n),bpchar(n) | fixed-length, blank-padded i.e storing strings shorter than n will be stored with blank space. When values are retrieved or when value is being used in comparison, for these types the trailing spaces are removed. |
bpchar | variable unlimited length, blank-trimmed i.e any length string can be stored but if string has any trailing space it will be removed before storing. |
text | variable unlimited length |
As per tip from documentation, char(n) is usually slower than text and varchar and for most cases text and varchar must be used. |
How strings are stored in PostgreSQL for all character types:
The storage requirement for a short string (up to 126 bytes) is 1 byte plus the actual string, which includes the space padding in the case of
character
. Longer strings have 4 bytes of overhead instead of 1. Long strings are compressed by the system automatically, so the physical requirement on disk might be less. Very long values are also stored in background tables so that they do not interfere with rapid access to shorter column values. In any case, the longest possible character string that can be stored is about 1 GB. (The maximum value that will be allowed forn
in the data type declaration is less than that. It wouldn’t be useful to change this because with multibyte character encodings the number of characters and bytes can be quite different. If you desire to store long strings with no specific upper limit, usetext
orcharacter varying
without a length specifier, rather than making up an arbitrary length limit.
Common Table Expressions
Common Table expressions allows to chain queries together, passing the results of one to the next.
CTEs, can be thought of as defining temporary tables that exist just for one query. Each auxiliary statement in a
WITH
clause can be aSELECT
,INSERT
,UPDATE
, orDELETE
; and theWITH
clause itself is attached to a primary statement that can be aSELECT
,INSERT
,UPDATE
,DELETE
, orMERGE
.
All the ways table can be used in a query, CTEs can be used just that the scope of usage of CTEs is limited to that query. Also check TAOP:CTE , Documentation
Ring Dust
Calculations in SQL are code smells and can be simplified using functions or user defined functions. PostgreSQL provides four kinds of functions:
- query language functions (functions written in SQL)
- procedural language functions (functions written in, for example, PL/pgSQL or PL/Tcl)
- internal functions
- C-language functions
Query Language Functions
Create function Syntax
Drop Function Syntax
Documentation
SQL functions execute SQL statements within the function and returns the output of last query in the function. If return type of function is specified as SETOF sometype
it will return all the result of the last query else it will return first row of the result. Unless the function is declared to return void
, the last statement must be a SELECT
, or an INSERT
, UPDATE
, or DELETE
that has a RETURNING
clause. If you want to defined an function which performs action but doesn’t return anything, the return type should be specified as void
.
Functions are created on schema level and if not mentioned will be created for current schema. Functions with same name and different arguments can coexists and are technically different functions.
To replace the current definition of an existing function, use
CREATE OR REPLACE FUNCTION
. It is not possible to change the name or argument types of a function this way (if you tried, you would actually be creating a new, distinct function). Also,CREATE OR REPLACE FUNCTION
will not let you change the return type of an existing function. To do that, you must drop and recreate the function. (When usingOUT
parameters, that means you cannot change the types of anyOUT
parameters except by dropping the function.)
Select Into
Can be used to create a table from select query. Syntax
Concat
In PostgresSql, concatenation of strings can be done using ||
operator or concat
function. Description of these can be found in these page of documentation.
To remember
null can be casted into any type like
null::timestamptz
in PostgresSQL
Window Functions
Window function performs calculation across a set of table rows that are somehow related to current row. This relation is defined by Partition By
clause.
A window function call always contains an
OVER
clause directly following the window function’s name and argument(s). This is what syntactically distinguishes it from a normal function or non-window aggregate. TheOVER
clause determines exactly how the rows of the query are split up for processing by the window function. ThePARTITION BY
clause withinOVER
divides the rows into groups, or partitions, that share the same values of thePARTITION BY
expression(s). For each row, the window function is computed across the rows that fall into the same partition as the current row.You can also control the order in which rows are processed by window functions using
ORDER BY
withinOVER
. (The windowORDER BY
does not even have to match the order in which the rows are output.)There is another important concept associated with window functions: for each row, there is a set of rows within its partition called its window frame. Some window functions act only on the rows of the window frame, rather than of the whole partition. By default, if
ORDER BY
is supplied then the frame consists of all rows from the start of the partition up through the current row, plus any following rows that are equal to the current row according to theORDER BY
clause. WhenORDER BY
is omitted the default frame consists of all rows in the partition.
Also aggregate functions can also act as window functions if they are followed by OVER
clause.
Documentation explanation Syntax List of window functions
Under the ICE
Explain and Analyze
Explain command shows execution plan that Postgres planner generates. If Analyse is used with Explain command then it will output query plan and also run the query to show actual query output stats so should be used carefully with Update and Delete command. The command creates tree structure of plan nodes where each node of query plan is represented by ->
. Each nodes branches are subaction and query execution order is inside out where deepest node runs first. The order of execution of same level nodes can be different than whats shown in output.
For more refer documentation and thoughtbot blog.
Instead of using foreign key constraints and junction table to join two tables using ids, indexes can be used on actual joining column to make joins faster. In book, timestamp column was actual column to join and indexing it helped in making joins faster.
PostgreSQL supports ranges data type which can be further divided into two types:
- numeric ranges
- date/time ranges
tsrange
is used to create timestamp without timezone range types. The ranges also have two types of bounds, inclusive and exclusive. Inclusive is represented by square brackets[]
and exclusive is by()
. Bounds type can be combined. Range Types Operators for Range Types range operators can’t be used with BTREE index, GIST index should be used instead. Slides Explaining Ranges