03 - Database Indexing

Inserting 1 million row at a temp table

Here's the prerequisists

docker run -e POSTGRES_PASSWORD=postgres --name pg1-indexing postgres
docker exec -it pg1-indexing psql -U postgres
CRATE TABLE temp(int t);
INSERT INTO temp(t) SELECT random() * 100 FROM generate_series(0, 1000000);
--employees table for the indexing lecture
--paste these commands into the postgres
--start the docker instance
docker run --name pg -e POSTGRES_PASSWORD=postgres -d postgres

docker start pg
--run postgres command shell
docker exec -it pg psql -U postgres
--the command should switch to
--postgres=#
-- paste these sql
create table employees( id serial primary key, name text);

create or replace function random_string(length integer) returns text as
$$
declare
  chars text[] := '{0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T,U,V,W,X,Y,Z,a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z}';
  result text := '';
  i integer := 0;
  length2 integer := (select trunc(random() * length + 1));
begin
  if length2 < 0 then
    raise exception 'Given length cannot be less than 0';
  end if;
  for i in 1..length2 loop
    result := result || chars[1+random()*(array_length(chars, 1)-1)];
  end loop;
  return result;
end;
$$ language plpgsql;

insert into employees(name)(select random_string(10) from generate_series(0, 1000000));

we've an index over the id employees_pkey

Notice we didn't go to the heap, as the information we need is found on the index (that's the best query you can make at your life :D)

However here the id is on the index, but the name column not, so we jump to the heap with id=2000 to find the page and retrieve name from the heap.

Let's have another interesting query

postgres or other databases don't need to entirely scan through table using one worker thread, most of the times a parallel worker threads are scaning the table for faster search.

now let's create an index on the name column;

Let's do a query on the name

Good, now we're getting great result, so Let's take another interesting query

Oh, back to the parallel sequential scan again :D, why does this happen?

[!note] Using like statement using like is a kind of statement that doen't match only a single value, we can't use this to select from index, so we've to go to the heap to match with all occurances So the planner will decide once see statements like this to plan a worker threads to start scaning through table

Knowing how to read the explain analyze statement is very important to you as a software engineer to take further decisions based on that readings, in this section we'll learn how to read it.

Prerequisists

now let's have a simple query

  • First part (Index Scan using g_idx on grades ) describes the query plan

  • Second part ((cost=0.15..31.66 rows=501 width=22))

    • cost usually starts from zero

Last updated