Library Optimization
Sometimes you cannot find a way to optimize your own code, so you start looking at someone else's. In this case, I took a closer look at the libraries I was using. In some cases there are alternative libraries; in others you can simply optimize how you use the library you already have.
Looking at my performance measurements for each stage, I found that while NBT parsing took the longest, this was mostly due to allocations, so I will probably revisit it in the future, but not right now. The second and third slowest stages were image writing (PNG) and decompression (inflating). Starting with reading and decompressing a stream of compressed data: the library currently in use is zlib. It is slow but gives good compression ratios, and it is very easy to integrate into basically any project, which is why Minecraft uses it. The first official rewrite of zlib is pigz, which was designed with multi-threading in mind. However, since PixelMap already runs each decompression on its own thread, pigz is completely redundant here.
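To make the comparison below concrete, here is a minimal sketch of the zlib baseline, assuming each worker thread inflates one chunk's compressed payload (Minecraft region chunks use the zlib/DEFLATE format). The function name and buffer size are illustrative, not PixelMap's actual code.

```cpp
#include <zlib.h>
#include <cstdint>
#include <vector>

// Inflate one chunk's zlib-compressed payload with the streaming zlib API.
std::vector<uint8_t> inflate_chunk(const uint8_t* in, size_t in_size) {
    std::vector<uint8_t> out;
    z_stream strm{};                          // zero-initialised stream state
    if (inflateInit(&strm) != Z_OK) return out;

    strm.next_in  = const_cast<uint8_t*>(in);
    strm.avail_in = static_cast<uInt>(in_size);

    uint8_t buf[64 * 1024];                   // temporary output window
    int ret = Z_OK;
    while (ret != Z_STREAM_END) {
        strm.next_out  = buf;
        strm.avail_out = sizeof(buf);
        ret = inflate(&strm, Z_NO_FLUSH);
        if (ret != Z_OK && ret != Z_STREAM_END) break;   // corrupt or truncated data
        out.insert(out.end(), buf, buf + (sizeof(buf) - strm.avail_out));
    }
    inflateEnd(&strm);
    return out;
}
```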
After looking at some benchmarks [1], I found libdeflate. I imported it and found it to be about 6 times faster than zlib in isolation, which improves total processing by roughly 15%. Even better, it is also available as a package on some Linux distributions. I still wanted to check the alternatives, so I also tried zlib-ng. Sadly, while it was much faster than zlib, it was not as fast as libdeflate.
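For comparison, here is a hedged sketch of the same operation going through libdeflate. The calls shown (libdeflate_alloc_decompressor, libdeflate_zlib_decompress) are libdeflate's actual single-shot API; the grow-on-overflow loop and the initial size guess are my own assumptions, since libdeflate is not a streaming decompressor and needs the whole output buffer up front.

```cpp
#include <libdeflate.h>
#include <cstdint>
#include <vector>

// Inflate one chunk's zlib-compressed payload with libdeflate's one-shot API.
std::vector<uint8_t> inflate_chunk_libdeflate(const uint8_t* in, size_t in_size) {
    // One decompressor per worker thread can be allocated once and reused.
    libdeflate_decompressor* d = libdeflate_alloc_decompressor();

    // Guess an output size and grow it if the guess turns out to be too small.
    std::vector<uint8_t> out(in_size * 8 + 1024);
    for (;;) {
        size_t actual = 0;
        libdeflate_result r = libdeflate_zlib_decompress(
            d, in, in_size, out.data(), out.size(), &actual);
        if (r == LIBDEFLATE_SUCCESS)            { out.resize(actual); break; }
        if (r == LIBDEFLATE_INSUFFICIENT_SPACE) { out.resize(out.size() * 2); continue; }
        out.clear();                             // bad data
        break;
    }
    libdeflate_free_decompressor(d);
    return out;
}
```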
For PNG encoding, I found that most libraries focus on reading rather than writing [2]. Others claim to be faster than libpng, but I have yet to find that claim to hold [3][4]. And then there are the special cases that do not support encoding row by row [5], as illustrated in the sketch after this paragraph. Row-by-row encoding matters when writing really large images: one row of 32k pixels takes 128 kB, but with just as many rows that adds up to over 4 GB that would have to sit in memory before being encoded as a PNG. Do note that this worst case only occurs with a perfect square of chunks; on normal maps it is usually not a problem. Much like pigz on the compression side, there are also PNG encoders that run in parallel [6], which would certainly be interesting for writing a single large file, but it does not improve the overall processing. It is also a proof of concept, whose bugs I do not want to be responsible for, and it pulls in several extra dependencies, which makes it hard to support.
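To illustrate the row-by-row property discussed above, here is a rough libpng sketch in which only a single 32-bit RGBA row has to live in memory at a time. The file name, dimensions, and the omitted pixel-filling step are placeholders rather than PixelMap's real writer.

```cpp
#include <png.h>
#include <csetjmp>
#include <cstdint>
#include <cstdio>
#include <vector>

// Write a width x height RGBA image one row at a time with libpng.
bool write_png_by_rows(const char* path, uint32_t width, uint32_t height) {
    FILE* fp = std::fopen(path, "wb");
    if (!fp) return false;

    png_structp png = png_create_write_struct(PNG_LIBPNG_VER_STRING, nullptr, nullptr, nullptr);
    if (!png) { std::fclose(fp); return false; }
    png_infop info = png_create_info_struct(png);
    if (!info || setjmp(png_jmpbuf(png))) {   // libpng reports errors via longjmp
        png_destroy_write_struct(&png, &info);
        std::fclose(fp);
        return false;
    }

    png_init_io(png, fp);
    png_set_IHDR(png, info, width, height, 8, PNG_COLOR_TYPE_RGBA,
                 PNG_INTERLACE_NONE, PNG_COMPRESSION_TYPE_DEFAULT,
                 PNG_FILTER_TYPE_DEFAULT);
    png_write_info(png, info);

    std::vector<uint8_t> row(width * 4);      // one RGBA row: 32k pixels -> 128 kB
    for (uint32_t y = 0; y < height; ++y) {
        // ... render or copy the pixels for row y into `row` here ...
        png_write_row(png, row.data());
    }

    png_write_end(png, info);
    png_destroy_write_struct(&png, &info);
    std::fclose(fp);
    return true;
}
```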
Since I did not get anywhere with PNG encoding itself, one option would be to integrate libdeflate into libpng; there has been an issue open for that since 2018, but it has not gone very far [7]. Another possibility would be to implement the encoder myself. As I only need 32-bit color and the encoding routine, I could do that fairly easily, ignoring any more advanced features in exchange for as much speed as possible, losing only some compression ratio. However, I doubt it would end up faster than libpng, and I would most definitely introduce bugs or other issues I would rather not maintain myself.
What I end up with is zlib decoding that is 6 times faster, putting it roughly on par with rendering and PNG encoding, which seems fair. Almost 15% better overall processing speed is huge, and if I find further optimizations for my NBT parsing, things could improve even more. Adding a dependency without removing the old one is a bit unfortunate, but it should not noticeably affect the size of the executable. At best, the linker might even strip unused symbols and make the executable smaller in the process, though I do not currently tell the compiler to do this.