Machine learning workloads lend themselves architecturally to a batch processing model rather than to microservices (though microservices can be useful for providing visibility and observability).
Arguably the most powerful batch processing system in the world is the Portable Batch System (PBS), originally developed at the NASA Advanced Supercomputing (NAS) division. Its commercial version, Altair PBS Professional, is proprietary. Every large-scale enterprise should at least be considering where PBS fits in. But, my god, is it expensive.
There is a FOSS alternative, however: OpenPBS. https://openpbs.org
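To make the batch model concrete, here is a minimal sketch of what handing a training run to PBS could look like. It renders a job script with `#PBS` directives and, optionally, submits it with `qsub`. The helper names (`render_pbs_script`, `submit`), the `train.py` command, and the resource numbers are all hypothetical; the directives follow the PBS Professional / OpenPBS `select` syntax, and your site's queues and resource names may differ.

```python
import shutil
import subprocess
import tempfile


def render_pbs_script(job_name, command, ncpus=4, mem_gb=16,
                      walltime="01:00:00", queue=None):
    """Return the text of a PBS job script that runs `command` as one batch job.

    All arguments are illustrative; adjust resources to your own cluster.
    """
    lines = [
        "#!/bin/bash",
        f"#PBS -N {job_name}",                             # job name
        f"#PBS -l select=1:ncpus={ncpus}:mem={mem_gb}gb",  # one chunk of resources
        f"#PBS -l walltime={walltime}",                    # wall-clock limit
        "#PBS -j oe",                                      # merge stderr into stdout
    ]
    if queue is not None:
        lines.append(f"#PBS -q {queue}")                   # site-specific queue name
    # PBS sets PBS_O_WORKDIR to the directory qsub was run from.
    lines += ["", 'cd "$PBS_O_WORKDIR"', command]
    return "\n".join(lines) + "\n"


def submit(script_text):
    """Write the script to a temp file and submit it; returns qsub's output (the job ID)."""
    if shutil.which("qsub") is None:
        raise RuntimeError("qsub not found on PATH; is PBS installed on this host?")
    with tempfile.NamedTemporaryFile("w", suffix=".pbs", delete=False) as f:
        f.write(script_text)
        path = f.name
    result = subprocess.run(["qsub", path], capture_output=True, text=True, check=True)
    return result.stdout.strip()


if __name__ == "__main__":
    # Hypothetical training job: print the script instead of submitting it.
    print(render_pbs_script("train-model", "python train.py --epochs 10"))
```

The scheduler, not the application, decides when and where the job runs, which is why a long training run fits this model more naturally than a request/response service does.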
Portable Batch System (PBS): Overview - HECC Knowledge Base
https://www.nas.nasa.gov/hecc/support/kb/Portable-Batch-System-(PBS)-Overview_126.html
The Portable Batch System (PBS) - Embry-Riddle Aeronautical
http://hpc.erau.edu/pbs.html