First step
I reached out to some developers as this is quite a heavy task to do alone, and got a response and we started a team. It was named Encoaster. We geared toward Rust with TypeScript on the frontend. The idea was to contain each service in docker so it could be deployed anywhere, and scaled accordingly.
Problem was that the dedication faltered, priority on the project was lowered significantly until put to a halt. As we had not come far enough everyone left the project.
I did not give up on it, and instead a year later picked it up again, but this time I figured to do a prototype in Python, which I am more comfortable in. As I now could poke around in ffmpeg more I figured out more and more how its pipeline worked, and a couple of months later I got my first conversion working exactly as I wanted. It also had a good batching system where you send it a folder and it output it to a folder. You also could specify what subtitles and audio to burn-in.