This is not book summary

This is me keeping notes of all the functionalities of PostgreSQL used within the book for reference.

Fly By

Pattern Matching

LIKE and ILIKE

like expression returns true if string(column’s value) matches the supplied pattern. It is case sensitive match. ilike matches the string with pattern case insensitively. If pattern does not contain percent signs or underscores, then the pattern only represents the string itself; in that case LIKE acts like the equals operator. An underscore (_) in pattern stands for (matches) any single character; a percent sign (%) matches any sequence of zero or more characters. More in the official docs.

POSIX Regex

There are four regex operators:

OperatorDescription
string ~ patternstring matches regex pattern case sensitively
string *~ patternstring matches regex pattern case insensitively
string !~ patternstring does not matches regex pattern case sensitively
string !*~ patternstring does not matches regex pattern case insensitively
More in the official docs.

Bent Fields

Sargable vs Non Sargable queries

Sargable stands for Search Argument able. Sargable queries are queries which use indexes to speed up execution of query. Non Sargable therefore are queries which do not use indexes to speed up execution of query. Sargablity matters in where, order by, group by clause. In Select clause it doesn’t matter much.

One of the most common way to make query non-sargable is to use function in where clause.

SELECT *
FROM   myTable
WHERE  SQRT(myIntField) > 11.7

Sargable version of this query would be to compare int value with square of 11.7

SELECT *
FROM   myTable
WHERE  myIntField > 11.7 * 11.7

WHERE clauses that are sargable typically have field values on the left of the operator, and scalar values or expressions on the right side of the operator.

For WHERE...LIKE clause, the query is sargable if they have field values on the left of the operator, and LIKE text strings that do not begin with the % on the right.

-- Non-sargable
SELECT *
FROM   myTable
WHERE  myNameField LIKE '%Wales%'
 
--Sargable
SELECT *
FROM   myTable
WHERE  myNameField LIKE 'Wales%'

Type Casts

Type casts converts one data type to another if they are compatible.

CAST (`expression` AS `type`)
`expression`::`type`

Docs

Full text Indexing

Full-text indexing in PostgreSQL enables efficient searching of natural language documents. Full text indexing allows documents to be preprocessed and an index saved for later rapid searching. Data types, functions and operators usually involved in creating full text index. Data Types:

  • tsvector: Stores preprocessed text in a searchable format.
  • tsquery: Represents the search query in a format that can be compared with tsvector.

Functions:

  • to_tsvector: Converts text to tsvector.
  • to_tsquery: Converts a query string to tsquery.

Operators:

  • @@: Matches tsvector against tsquery.
  • >: Proximity operator.

Steps in full text indexing:

  1. Create a colum with type tsvector in a table.
  2. create a GIN index of that column for making searches fast.
  3. Use to_tsvector function in insert/update/upsert query to add texts as tsvector in the row.
insert into articles (title, body, search_vector)
values (
'Optimizing Search Queries',
'Optimization of search queries involves indexing strategies and query planning.',
to_tsvector('english', 'Optimizing Search Queries Optimization of search queries involves indexing strategies and query planning.')
);
  1. Use to_tsquery function and @@ operator to do text search on table.
select id, title, date from enceladus_events where search @@ to_tsquery('closest')

Docs

Views

Refer sql-views and postgres docs

Sniff the Sky

Use csvkit to import CSVs into PostgreSQL.

Character Types

NameDescription
character varying(n),varchar(n)variable-length with limit i.e storing strings longer than limit will throw an error and strings shorter than limit will be stored as is. If one casts string longer than n, the string will be truncated to n characters.
character(n), char(n),bpchar(n)fixed-length, blank-padded i.e storing strings shorter than n will be stored with blank space. When values are retrieved or when value is being used in comparison, for these types the trailing spaces are removed.
bpcharvariable unlimited length, blank-trimmed i.e any length string can be stored but if string has any trailing space it will be removed before storing.
textvariable unlimited length
As per tip from documentation, char(n) is usually slower than text and varchar and for most cases text and varchar must be used.

How strings are stored in PostgreSQL for all character types:

The storage requirement for a short string (up to 126 bytes) is 1 byte plus the actual string, which includes the space padding in the case of character. Longer strings have 4 bytes of overhead instead of 1. Long strings are compressed by the system automatically, so the physical requirement on disk might be less. Very long values are also stored in background tables so that they do not interfere with rapid access to shorter column values. In any case, the longest possible character string that can be stored is about 1 GB. (The maximum value that will be allowed for n in the data type declaration is less than that. It wouldn’t be useful to change this because with multibyte character encodings the number of characters and bytes can be quite different. If you desire to store long strings with no specific upper limit, use text or character varying without a length specifier, rather than making up an arbitrary length limit.

Documentation

Common Table Expressions

Common Table expressions allows to chain queries together, passing the results of one to the next.

CTEs, can be thought of as defining temporary tables that exist just for one query. Each auxiliary statement in a WITH clause can be a SELECTINSERTUPDATE, or DELETE; and the WITH clause itself is attached to a primary statement that can be a SELECTINSERTUPDATEDELETE, or MERGE.

All the ways table can be used in a query, CTEs can be used just that the scope of usage of CTEs is limited to that query. Also check TAOP:CTE , Documentation

Ring Dust

Calculations in SQL are code smells and can be simplified using functions or user defined functions. PostgreSQL provides four kinds of functions:

  • query language functions (functions written in SQL)
  • procedural language functions (functions written in, for example, PL/pgSQL or PL/Tcl)
  • internal functions
  • C-language functions

Query Language Functions

Create function Syntax Drop Function Syntax Documentation SQL functions execute SQL statements within the function and returns the output of last query in the function. If return type of function is specified as SETOF sometype it will return all the result of the last query else it will return first row of the result. Unless the function is declared to return void, the last statement must be a SELECT, or an INSERTUPDATE, or DELETE that has a RETURNING clause. If you want to defined an function which performs action but doesn’t return anything, the return type should be specified as void.

Functions are created on schema level and if not mentioned will be created for current schema. Functions with same name and different arguments can coexists and are technically different functions.

To replace the current definition of an existing function, use CREATE OR REPLACE FUNCTION. It is not possible to change the name or argument types of a function this way (if you tried, you would actually be creating a new, distinct function). Also, CREATE OR REPLACE FUNCTION will not let you change the return type of an existing function. To do that, you must drop and recreate the function. (When using OUT parameters, that means you cannot change the types of any OUT parameters except by dropping the function.)

Select Into

Can be used to create a table from select query. Syntax

Concat

In PostgresSql, concatenation of strings can be done using || operator or concat function. Description of these can be found in these page of documentation.

To remember

null can be casted into any type like null::timestamptz in PostgresSQL

Window Functions

Window function performs calculation across a set of table rows that are somehow related to current row. This relation is defined by Partition By clause.

A window function call always contains an OVER clause directly following the window function’s name and argument(s). This is what syntactically distinguishes it from a normal function or non-window aggregate. The OVER clause determines exactly how the rows of the query are split up for processing by the window function. The PARTITION BY clause within OVER divides the rows into groups, or partitions, that share the same values of the PARTITION BY expression(s). For each row, the window function is computed across the rows that fall into the same partition as the current row.

You can also control the order in which rows are processed by window functions using ORDER BY within OVER. (The window ORDER BY does not even have to match the order in which the rows are output.)

There is another important concept associated with window functions: for each row, there is a set of rows within its partition called its window frame. Some window functions act only on the rows of the window frame, rather than of the whole partition. By default, if ORDER BY is supplied then the frame consists of all rows from the start of the partition up through the current row, plus any following rows that are equal to the current row according to the ORDER BY clause. When ORDER BY is omitted the default frame consists of all rows in the partition.

Also aggregate functions can also act as window functions if they are followed by OVER clause.

Documentation explanation Syntax List of window functions

Under the ICE

Explain and Analyze

Explain command shows execution plan that Postgres planner generates. If Analyse is used with Explain command then it will output query plan and also run the query to show actual query output stats so should be used carefully with Update and Delete command. The command creates tree structure of plan nodes where each node of query plan is represented by -> . Each nodes branches are subaction and query execution order is inside out where deepest node runs first. The order of execution of same level nodes can be different than whats shown in output. For more refer documentation and thoughtbot blog.

Instead of using foreign key constraints and junction table to join two tables using ids, indexes can be used on actual joining column to make joins faster. In book, timestamp column was actual column to join and indexing it helped in making joins faster.

PostgreSQL supports ranges data type which can be further divided into two types:

  • numeric ranges
  • date/time ranges tsrange is used to create timestamp without timezone range types. The ranges also have two types of bounds, inclusive and exclusive. Inclusive is represented by square brackets [] and exclusive is by (). Bounds type can be combined. Range Types Operators for Range Types range operators can’t be used with BTREE index, GIST index should be used instead. Slides Explaining Ranges