Skip to content

CSC versus dictionary #213

@amontoison

Description

@amontoison

@gdalle I did some benchmarks to determine what is the best storage between a CSC matrix and a dictionary to store the index of each edge in the graph.
My conclusion is that a CSC format is the most efficient format.

Benchmark 1 -- strictly upper triangular matrix that is dense

This is the worst scenario for a CSC matrix to find a coefficient M[i,j].

using BenchmarkTools
using SparseArrays

function prepare_csc_and_dict(n)
  d = Dict{Tuple{Int,Int}, Int}()
  M = spzeros(n,n)
  counter = 0
  for j = 1:n
    for i = 1:j-1
      counter += 1
      M[i,j] = counter
      d[(i,j)] = counter
    end
  end
  return M, d
end

function test_csc(M, n)
  for j = 1:n
    for i = 1:j-1
      val = M[i,j]
    end
  end
end

function test_dict(d, n)
  for j = 1:n
    for i = 1:j-1
      val = d[i,j]
    end
  end
end

function test_opt(M, n)
  counter = 0
  for j = 1:n
    for i = 1:j-1
        counter += 1
      val = M.nzval[counter]
    end
  end
end

for n in (10, 50, 100, 500, 1000, 2000, 5000)
  nvertices = n
  nedges = n *(n+1) ÷ 2 - n
  println("Number of vertices: $nvertices.")
  println("Number of edges: $nedges.")
  M, d = prepare_csc_and_dict(n)
  timer_csc = @belapsed test_csc($M, $n)
  println("Timer for csc: $(timer_csc).")
  timer_dict = @belapsed test_dict($d, $n)
  println("Timer for dict: $(timer_dict).")
  timer_opt = @belapsed test_opt($M, $n)
  println("Timer for opt: $(timer_opt).")
  ratio1 = timer_dict / timer_csc
  println("Ratio timer_dict / timer_csc: $ratio1.")
  timer_best = min(timer_csc, timer_dict)
  ratio2 = timer_best / timer_opt
  println("Ratio timer_best / timer_opt: $ratio2.")
  println()
end
Number of vertices: 10.
Number of edges: 45.
Timer for csc: 3.130583333333333e-7.
Timer for dict: 5.678695652173912e-7.
Timer for opt: 6.376302349336056e-8.
Ratio timer_dict / timer_csc: 1.8139416995258326.
Ratio timer_best / timer_opt: 4.909715947926012.

Number of vertices: 50.
Number of edges: 1225.
Timer for csc: 1.1041e-5.
Timer for dict: 1.7181e-5.
Timer for opt: 4.7484183673469385e-7.
Ratio timer_dict / timer_csc: 1.5561090480934698.
Ratio timer_best / timer_opt: 23.25195285218494.

Number of vertices: 100.
Number of edges: 4950.
Timer for csc: 5.1628e-5.
Timer for dict: 7.5252e-5.
Timer for opt: 1.0981e-6.
Ratio timer_dict / timer_csc: 1.457581157511428.
Ratio timer_best / timer_opt: 47.015754485019585.

Number of vertices: 500.
Number of edges: 124750.
Timer for csc: 0.005272895.
Timer for dict: 0.004950737.
Timer for opt: 1.3779e-5.
Ratio timer_dict / timer_csc: 0.9389030124817582.
Ratio timer_best / timer_opt: 359.2958124682488.

Number of vertices: 1000.
Number of edges: 499500.
Timer for csc: 0.025094128.
Timer for dict: 0.030086788.
Timer for opt: 4.1699e-5.
Ratio timer_dict / timer_csc: 1.1989573018835322.
Ratio timer_best / timer_opt: 601.7920813448764.

Number of vertices: 2000.
Number of edges: 1999000.
Timer for csc: 0.108613685.
Timer for dict: 0.149196474.
Timer for opt: 0.00012618.
Ratio timer_dict / timer_csc: 1.373643422557664.
Ratio timer_best / timer_opt: 860.7836820415281.

Number of vertices: 5000.
Number of edges: 12497500.
Timer for csc: 0.820027958.
Timer for dict: 1.18408318.
Timer for opt: 0.000532919.
Ratio timer_dict / timer_csc: 1.4439546462390251.
Ratio timer_best / timer_opt: 1538.747835975073.

It's unexpected but even in this worst case scenario, CSC format is faster.
To make the dictionary format more efficient, we can have a dictionary for each column as you suggested.

Benchmark 2 -- vector of dictionaries for each column in the same setup than benchmark 1

using BenchmarkTools
using SparseArrays

function prepare_csc_and_dict_v2(n)
  d = [Dict{Int, Int}() for i = 1:n]
  M = spzeros(n,n)
  counter = 0
  for j = 1:n
    for i = 1:j-1
      counter += 1
      M[i,j] = counter
      d[j][i] = counter
    end
  end
  return M, d
end

function test_csc_v2(M, n)
  for j = 1:n
    for i = 1:j-1
      val = M[i,j]
    end
  end
end

function test_dict_v2(d, n)
  for j = 1:n
    for i = 1:j-1
      val = d[j][i]
    end
  end
end

function test_opt_v2(M, n)
  counter = 0
  for j = 1:n
    for i = 1:j-1
        counter += 1
      val = M.nzval[counter]
    end
  end
end

for n in (10, 50, 100, 500, 1000, 2000, 5000)
  nvertices = n
  nedges = n *(n+1) ÷ 2 - n
  println("Number of vertices: $nvertices.")
  println("Number of edges: $nedges.")
  M, d = prepare_csc_and_dict_v2(n)
  timer_csc = @belapsed test_csc_v2($M, $n)
  println("Timer for csc: $(timer_csc).")
  timer_dict = @belapsed test_dict_v2($d, $n)
  println("Timer for dict: $(timer_dict).")
  timer_opt = @belapsed test_opt_v2($M, $n)
  println("Timer for opt: $(timer_opt).")
  ratio1 = timer_dict / timer_csc
  println("Ratio timer_dict / timer_csc: $ratio1.")
  timer_best = min(timer_csc, timer_dict)
  ratio2 = timer_best / timer_opt
  println("Ratio timer_best / timer_opt: $ratio2.")
  println()
end
Number of vertices: 10.
Number of edges: 45.
Timer for csc: 3.0325099601593627e-7.
Timer for dict: 3.9227941176470584e-7.
Timer for opt: 6.733265097236437e-8.
Ratio timer_dict / timer_csc: 1.2935799615447627.
Ratio timer_best / timer_opt: 4.503773305174051.

Number of vertices: 50.
Number of edges: 1225.
Timer for csc: 1.1783e-5.
Timer for dict: 1.1003e-5.
Timer for opt: 4.5107614213197967e-7.
Ratio timer_dict / timer_csc: 0.9338029364338453.
Ratio timer_best / timer_opt: 24.39277756521348.

Number of vertices: 100.
Number of edges: 4950.
Timer for csc: 5.7684e-5.
Timer for dict: 4.3329e-5.
Timer for opt: 1.0158999999999999e-6.
Ratio timer_dict / timer_csc: 0.7511441647597253.
Ratio timer_best / timer_opt: 42.65085146175805.

Number of vertices: 500.
Number of edges: 124750.
Timer for csc: 0.005379704.
Timer for dict: 0.001874464.
Timer for opt: 1.1274e-5.
Ratio timer_dict / timer_csc: 0.3484325531664939.
Ratio timer_best / timer_opt: 166.264324995565.

Number of vertices: 1000.
Number of edges: 499500.
Timer for csc: 0.026058051.
Timer for dict: 0.010558865.
Timer for opt: 3.366e-5.
Ratio timer_dict / timer_csc: 0.4052054775700608.
Ratio timer_best / timer_opt: 313.691770647653.

Number of vertices: 2000.
Number of edges: 1999000.
Timer for csc: 0.111248414.
Timer for dict: 0.03566116.
Timer for opt: 0.000103614.
Ratio timer_dict / timer_csc: 0.3205543226890407.
Ratio timer_best / timer_opt: 344.1731812303356.

Number of vertices: 5000.
Number of edges: 12497500.
Timer for csc: 0.817216241.
Timer for dict: 0.283180196.
Timer for opt: 0.000511742.
Ratio timer_dict / timer_csc: 0.3465180717082689.
Ratio timer_best / timer_opt: 553.3651644774123.

We can see that the vector of dictionaries is a good improvement and can lead up to 3x speed compared the CSC format.

Benchmark 3 : Sparse matrices -- CSC versus vector of dictionaries

If we have dense matrices, we don't use a coloring so a practical setup is sparse matrices with a density 0.5% (quite big for large matrices but common for small / medium ones). It represents 500 nnz per column for a matrix of size 1 million.

using BenchmarkTools
using SparseArrays

function prepare_csc_and_dict_v3(n)
  d = [Dict{Int, Int}() for i = 1:n]
  density = 0.005  # 0.5% of each column contains non-zeros
  M = sprand(n, n, density)
  M = triu(M, 1)
  counter = 0
  for j = 1:n
    for pos in nzrange(M, j)
      i = M.rowval[pos]
      counter += 1
      d[j][i] = counter
    end
  end
  return M, d
end

function test_csc_v3(M, n)
  for j = 1:n
    for pos in nzrange(M, j)
      i = M.rowval[pos]
      val = M[i,j]
    end
  end
end

function test_dict_v3(M, d, n)
  for j = 1:n
    for pos in nzrange(M, j)
      i = M.rowval[pos]
      val = d[j][i]
    end
  end
end

function test_opt_v3(M, n)
  counter = 0
  for j = 1:n
    for pos in nzrange(M, j)
        counter += 1
      val = M.nzval[counter]
    end
  end
end

for i in 2:6
  n = 10^i
  M, d = prepare_csc_and_dict_v3(n)
  nvertices = n
  nedges = nnz(M)
  println("Number of vertices: $nvertices.")
  println("Number of edges: $nedges.")
  timer_csc = @belapsed test_csc_v3($M, $n)
  println("Timer for csc: $(timer_csc).")
  timer_dict = @belapsed test_dict_v3($M, $d, $n)
  println("Timer for dict: $(timer_dict).")
  timer_opt = @belapsed test_opt_v3($M, $n)
  println("Timer for opt: $(timer_opt).")
  ratio1 = timer_dict / timer_csc
  println("Ratio timer_dict / timer_csc: $ratio1.")
  timer_best = min(timer_csc, timer_dict)
  ratio2 = timer_best / timer_opt
  println("Ratio timer_best / timer_opt: $ratio2.")
  println()
end
Number of vertices: 100.
Number of edges: 18.
Timer for csc: 1.9388019169329071e-7.
Timer for dict: 2.8431095406360424e-7.
Timer for opt: 1.2294573643410853e-7.
Ratio timer_dict / timer_csc: 1.466426000410453.
Ratio timer_best / timer_opt: 1.5769574229782157.

Number of vertices: 1000.
Number of edges: 2481.
Timer for csc: 1.394e-5.
Timer for dict: 2.249e-5.
Timer for opt: 2.932222222222222e-6.
Ratio timer_dict / timer_csc: 1.6133428981348636.
Ratio timer_best / timer_opt: 4.754073512694203.

Number of vertices: 10000.
Number of edges: 250139.
Timer for csc: 0.002098647.
Timer for dict: 0.005312056.
Timer for opt: 0.000151369.
Ratio timer_dict / timer_csc: 2.5311812801295313.
Ratio timer_best / timer_opt: 13.864443842530505.

Number of vertices: 100000.
Number of edges: 24993632.
Timer for csc: 0.343294563.
Timer for dict: 0.654198804.
Timer for opt: 0.002141986.
Ratio timer_dict / timer_csc: 1.905648601839348.
Ratio timer_best / timer_opt: 160.26928420633936.

Number of vertices: 1000000.
Number of edges: 2499989252.
Timer for csc: 103.759899929.
Timer for dict: 114.619194849.
Timer for opt: 0.046490819.
Ratio timer_dict / timer_csc: 1.1046579162800918.
Ratio timer_best / timer_opt: 2231.836353087262.

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions