# Further Voxel World Optimisations

This article explains how we optimised chunk mesh generation down to 0.48ms per chunk.

This article is a continuation from the original article Voxel World Optimisations, which details the initial methods used to increase chunk initialisation time in our destructive first person shooter Sector's Edge.

In the previous article, the benchmark for initialising one chunk was 0.89 milliseconds. This article describes the changes made to reduce this time down to **0.48ms (54% faster)**.

If any terms used in this article are unfamiliar, please see the original article.

The full source code is available here.

*For clarity, i, j and k refer to chunk-relative positions (range 0-31)and x, y and z refer to global map positions (range 0-511)*

## Constants

These constants are referenced multiple times throughout the article:

```
const int EMPTY = 0;
const int CHUNK_SIZE = 32;
const int CHUNK_SIZE_SQUARED = 1024;
const int CHUNK_SIZE_CUBED = 32768;
const int CHUNK_SIZE_MINUS_ONE = 31;
const int CHUNK_SIZE_SHIFTED = 32 << 6;
```

## GenerateMesh

The `GenerateMesh`

function iterates over each block in the chunk, combining faces across multiple blocks where possible to reduce the triangle count of the mesh. The first downfall of the original function is that all 32768 blocks in the data array are checked, when most may be empty.

To reduce the amount of blocks checked, we store two heightmaps for the bottom- and top-most non-empty blocks in each column in the chunk. These arrays are updated each time a block is added or removed from the chunk:

```
byte[] MinY = new byte[CHUNK_SIZE_SQUARED];
byte[] MaxY = new byte[CHUNK_SIZE_SQUARED];
```

To make use of this heightmap, we restructure the loop as follows:

```
int access;
// Z axis
for (int k = 0; k < CHUNK_SIZE; k++)
{
// X axis
for (int i = 0; i < CHUNK_SIZE; i++)
{
int j = MinY[i + k * CHUNK_SIZE_SQUARED];
int topJ = MaxY[i + k * CHUNK_SIZE_SQUARED];
// Y axis
for (; j < topJ; j++)
{
access = i + j * CHUNK_SIZE + k * CHUNK_SIZE_SQUARED;
ref Block b = ref data[access];
if (b.kind == EMPTY)
continue;
CreateRun(ref b, i, j, k, chunkHelper, access);
}
}
// Extend the array if it is nearly full
if (vertexBuffer.used > vertexBuffer.data.Length - 2048)
vertexBuffer.Extend(2048);
}```
```

As the inner-most loop now iterates over the Y axis, we should restructure the way that we store block positions in `data[]`

so that we can access the next highest block in the column by incrementing `access`

by 1:

```
// Old
int access = i + j * CHUNK_SIZE + k * CHUNK_SIZE_SQUARED;
// New
int access = j + i * CHUNK_SIZE + k * CHUNK_SIZE_SQUARED;
```

The next issue is that `i * CHUNK_SIZE`

, `k * CHUNK_SIZE_SQUARED`

and other values within `CreateRun()`

are redundantly calculated many times. Therefore where possible, any value that is calculated multiple times should be stored in a variable in the outer loop and referenced later on. This reduces the amount of multiplications from 65536 down to 1056, and the amount of additions from 197664 down to 135232 (worst case).

The new loop structure is as follows:

```
// Precalculate the map-relative Y position of the chunk in the map
int chunkY = chunkPosY * CHUNK_SIZE;
// Allocate variables on the stack
int access, heightMapAccess, iCS, kCS2, i1, k1, j, topJ;
bool minXEdge, maxXEdge, minZEdge, maxZEdge;
k1 = 1;
for (int k = 0; k < CHUNK_SIZE; k++, k1++)
{
// Calculate this once, rather than multiple times in the inner loop
kCS2 = k * CHUNK_SIZE_SQUARED;
i1 = 1;
heightMapAccess = k * CHUNK_SIZE;
// Is the current run on the Z- or Z+ edge of the chunk
minZEdge = k == 0;
maxZEdge = k == CHUNK_SIZE_MINUS_ONE;
for (int i = 0; i < CHUNK_SIZE; i++; i1++)
{
// Determine where to start the innermost loop
j = MinY[heightMapAccess];
topJ = MaxY[heightMapAccess];
heightMapAccess++;
// Calculate this once, rather than multiple times in the inner loop
iCS = i * CHUNK_SIZE;
// Calculate access here and increment it each time in the innermost loop
access = kCS2 + iCS + j;
// Is the current run on the X- or X+ edge of the chunk
minX = i == 0;
maxX = i == CHUNK_SIZE_MINUS_ONE;
// X and Z runs search upwards to create runs, so start at the bottom.
for (; j < topJ; j++, access++)
{
ref Block b = ref data[access];
if (b.kind != EMPTY)
{
CreateRun(ref b, i, j,
k << 12,
i1,
k1 << 12,
j + chunkY, access,
minX, maxX,
j == 0, j == CHUNK_SIZE_MINUS_ONE,
minZ, maxZ,
iCS, kCS2);
}
}
// Extend the array if it is nearly full
if (vertexBuffer.used > vertexBuffer.data.Length - 2048)
vertexBuffer.Extend(2048);
}
}
```

## CreateRun

As the new `CreateRun()`

function has many changes, we will split it into sections. The first difference is the new parameters:

```
// New CreateRun function
void CreateRun(ref Block b, int i, int j,
int k,
int i1,
int k1,
int y, int access,
bool minX, bool maxX,
bool minY, bool maxY,
bool minZ, bool maxZ,
int iCS, int kCS2)
{
...
}
```

`int i, int j`

- chunk-relative position of the block`k`

- chunk-relative position of the block shifted left 12 bits`int i1`

- precalculated`i + 1`

`int k1`

- precalculated`(k + 1) << 12`

`int y`

- map-relative y position of the block`int access`

- current position in the data[] array`bool minX, bool maxX`

- if the current run is on the X edge of the chunk`bool minY, bool maxY`

- if the current run is on the Y edge of the chunk`bool minZ, bool maxZ`

- if the current run is on the Z edge of the chunk`int iCS`

- precalculated`i * CHUNK_SIZE`

`int kCS2`

- precalculated`k * CHUNK_SIZE_SQUARED`

### Precalculating

At the start of the function, we can see the following changes:

- The
`BlockVertex`

helper class now has a`int[] indexToTextureShifted`

array, which stores the texture values shifted left 18 bits. Since the texture and health values are the same for each face of the block, they can be bit-shifted and combined here rather than in the`AppendQuad()`

function - The
`length`

variable has been replaced with`runFinish`

, which stores the chunk-relative position of the end of the run

```
int textureHealth16 = BlockVertex.indexToTextureShifted[b.index] | ((b.health / 16) << 23);
int runFinish;
int accessIncremented = access + 1;
int chunkAccess;
int j1 = j + 1;
int jS = j << 6;
int jS1 = j1 << 6;
```

### Combining Faces

The next part combines faces upwards along the negative X face, with the following changes:

`kCS2`

is passed to`VisibleFaceXN()`

so that the function doesn't have to calculate`k * CHUNK_SIZE_SQUARED`

each time`chunkAccess`

has the default value of`accessIncremented`

, which is equivalent to`j + i * CHUNK_SIZE + k * CHUNK_SIZE_SQUARED + 1`

. This value is then incremented at the end of the loop- Rather than creating an extra variable
`q`

for managing the loop, we can use the`runFinish`

variable. By initialising this variable to`j1 << 6`

and incrementing it by 64 (`1 << 6 == 64`

), we can save an extra addition and avoid bit-shifts throughout the entire`CreateRun()`

function `AppendQuad()`

has been split into three functions`AppendQuadX()`

,`AppendQuadY()`

and`AppendQuadZ()`

, which are specifically for generating X, Y and Z faces`Int3`

structs are no longer passed to`AppendQuad()`

, therefore removing unnecessary memory allocation- The
`FaceType`

enum has been replaced with`FaceTypeShifted`

, which stores the same values shifted left 27 bits

```
// Left (X-)
if (!chunkHelper.visitXN[access] && VisibleFaceXN(j, access, minX, kCS2))
{
chunkHelper.visitXN[access] = true;
chunkAccess = accessIncremented;
for (runFinish = jS1; length < Constants.ChunkSizeShifted; length += 64)
{
if (DifferentBlock(chunkAccess, ref b))
break;
chunkHelper.visitXN[chunkAccess++] = true;
}
// k1 and k are already shifted
BlockVertex.AppendQuadX(vertexBuffer, i, jS, length, k1, k, (int)FaceTypeShifted.xn, textureHealth16);
}
```

### AppendQuad

The new `AppendQuadX`

function contains no bit-shifts and fewer or operations:

```
// Stores the values of the old blockToTexture[] shifted left 18 bits
public static byte[] blockToTextureShifted;
public static void AppendQuadX(BlockVertexBuffer buffer, int x,
int jBottom, int jTop,
int kLeft, int kRight,
int normal, int textureHealth)
{
// Combine data that is common for each face
var shared = x |
textureHealth |
normal;
// Combine the shared data with the Y and Z position of the vertex
buffer.data[buffer.used] = jTop | kRight | shared;
buffer.data[buffer.used + 1] = buffer.data[buffer.used + 4] = jBottom | kRight | shared;
buffer.data[buffer.used + 2] = buffer.data[buffer.used + 3] = jTop | kLeft | shared;
buffer.data[buffer.used + 5] = jBottom | kKeft | shared;
buffer.used += 6;
}
```

## Visible Face Checking

The new `VisibleFaceXN`

function uses the precalculated values `min`

and `kCS2`

and prevents excess faces being produced on the edge of the map:

```
protected bool VisibleFaceXN(int j, int k, int access, bool min, int kCS2)
{
// Access directly from a neighbouring chunk
if (min)
{
// If on the edge of the map, don't produce faces
if (chunkPosX == 0)
return false;
if (cXN == null)
return true;
return cXN.data[31 * CHUNK_SIZE + j + kCS2].kind == EMPTY;
}
// The block to the left can be accessed by subtracting CHUNK_SIZE
return data[access - CHUNK_SIZE].kind == EMPTY;
}```
```

## Full Source Code

The full source code including OpenGL data buffering and shaders is available here.

See the code in action in our latest video: