Notes on Postgresql Internals

Heap Table File

Page layout of a heap table file
  1. heap tuple(s) – A heap tuple is a record data itself. They are stacked in order from the bottom of the page.
  2. line pointer(s) – a pointer to each heap tuple. It is also called an item pointer. Line pointers form a simple array, which plays the role of index(offset number) to the tuples. When a new tuple is added to the page, a new line pointer is also pushed onto the array to point to the new one.
  3. header data – It is 24 byte long and contains general information about the page.
  • pd_lsn – This variable stores the LSN of XLOG record written by the last change of this page. It is an 8-byte unsigned integer, related to the WAL (Write-Ahead Logging) mechanism.
  • pd_checksum – This variable stores the checksum value of this page. (Note that this variable is supported in version 9.3 or later; in earlier versions, this part had stored the timelineId of the page.)
  • pd_lower, pd_upper – pd_lower points to the end of line pointers, and pd_upper to the beginning of the newest heap tuple.
  • pd_special – This variable is for indexes. In the page within tables, it points to the end of the page.
  • tuple identifier (TID) comprises block number and offset number of line pointer

Reading Heap Tuples

Sequential scan and index scan

5. Concurrency Control

  • There are three broad concurrency control techniques, i.e. Multi-version Concurrency Control (MVCC), Strict Two-Phase Locking (S2PL), and Optimistic Concurrency Control (OCC), and each technique has many variations
  • In MVCC, each modification creates its own version of data item while retaining the old version of it
  • when a transaction reads a data item, the system selects the correct version of the data item to ensure isolation of the individual transaction
  • In MVCC, writers don’t block readers, readers don’t block writers
  • In S2PL, writers acquire an exclusive lock which blocks readers
  • Postgres uses the variation of MVCC, called Snaption Isolation(SI)
  • SI does not allow some anomalies such as Dirty reads, Non-repeatable reads, and Phantom Reads. But it cannot achieve true serializability because it allows serialization anomalies such as Write skew, Read-only transaction
  • Serializable Snapshot Isolation(SSI) is introduced with version 9.1 to prevent these
  • Postgresql supports these transaction isolation levels: READ COMMITTED, REPEATABLE READS, SERIALIZABLE (
  • Postgresql uses SSI for DML(Select, insert, delete…), while 2PL for DDL (create table…)

5.1. Transaction ID(txid)

  • txids are assigned once the command following the BEGIN command is issued
  • at the viewpoint of txid 100, the txids that are greater than 100 are invisible, as they are “in the future”, while txids that are less than 100 are visible, as they are in the past.
transaction-ids in PostgreSQL

5. 2. Tuple Structure

tuple structure
  • t_xmin: the transaction which created the tuple
  • t_xmax: the transaction which deleted or updated the tuple. otherwise 0.
  • t_cid: hold the command id, which means how many SQL commands are executed before this command in the current transaction beginning from 0. Assume that we execute three INSERT commands within a single transaction: ‘BEGIN; INSERT; INSERT; INSERT; COMMIT;’. If the first command inserts this tuple, t_cid is set to 0. If the second command inserts this, t_cid is set to 1, and so on.
  • t_ctid: holds the tuple id which points to itself or a new tuple. When this tuple is updated, the t_ctid of this tuple points to the new tuple; otherwise, the t_ctid points to itself.

5.3. Inserting, deleting and updating tuples

representation of tuples
  • deleted tuples are regarded as dead tuples and eventually removed in VACUUM processing
  • every update operation creates a new tuple which is pointed by the old updated tuple.

5.3.1. Insertion

tuple insertion
  • t_xmin is set to 99 because this tuple is inserted by txid 99.
  • t_xmax is set to 0 because this tuple has not been deleted or updated.
  • t_cid is set to 0 because this tuple is the first tuple inserted by txid 99.
  • t_ctid is set to (0,1), which points to itself because this is the latest tuple.

5.3.2. Deletion

5.3.3. Update

Update the same row twice

5.4 Commit Log (clog)

  • postgres retains a commit log which holds the status of the current transaction in the shared memory
  • the clog is basically an array whose each item holds the status of a transaction
How clog works
  • transaction logs are maintained by storing in files during server shutdown

5.5. Transaction Snapshot

  • txids that are less than 100 are not active.
  • txids that are equal to or greater than 104 are active.
  • txids 100 and 102 are active since they exist in the xip list, whereas txids 101 and 103 are not active.
Example of transaction snapshot representation
  • Transaction snapshots are provided by the transaction manager. In the READ COMMITTED isolation level, the transaction obtains a snapshot whenever an SQL command is executed; otherwise (REPEATABLE READ or SERIALIZABLE), the transaction only gets a snapshot when the first SQL command is executed. The obtained transaction snapshot is used for a visibility check of tuples, which is described in Section 5.7.
Transaction manager and transactions

5.6. Visibility Check Rules




You can find my notes I take when learning something new or reading, watching. So, they only help me to refresh and remember what I’ve consumed.

Love podcasts or audiobooks? Learn on the go with our new app.

Recommended from Medium

Understanding recursion with drawings


First Sprint Planning with a new Scrum Team

The order of random and parallel Unit Tests in Xcode

BP’s Daily Digest #14 — GraalVM 22.1, Salesforce Mobile SDK, and more

RESTful API, HOW TO | Part 1 —  Design

Azure Blob Storage

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Mustafa Atik

Mustafa Atik

You can find my notes I take when learning something new or reading, watching. So, they only help me to refresh and remember what I’ve consumed.

More from Medium

Is it necessary for a programmer to understand how the computer works at the hardware level?

Efficient Transfinite Lists

What is Competitive Programming and why is it important?