I learned IBM 360 assembly language while at Durham University. For commercial work I used Intel 8080/Z80 assembly language in the early 1980s when optimising the STRESS3 structural analysis program for use on microcomputers running under CP/M and MS-DOS. STRESS was written in Fortran, and I discovered that using any of the built-in Fortran input/output statements involved loading an unwieldy memory-hog of a library.
So I wrote my own input/output library in assembly language. It was so compact and fast that, with the help of my brother Brian Shearing, we were able to introduce a virtual memory system that took advantage of available memory to cache data in RAM, writing it to disk only when strictly necessary. It's one of the things that made our version of STRESS so fast. The competition simply couldn't believe how quick our version was; they thought it must be some kind of voodoo.
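For readers curious about the general idea, here is a minimal sketch of a write-back block cache in Python. It is not the original code, which was written in assembly language and whose internals are not described here. The class name `WriteBackCache`, the dict standing in for the disk file, the least-recently-used eviction rule and the `disk_writes` counter are all assumptions made for illustration.

```python
from collections import OrderedDict


class WriteBackCache:
    """Hypothetical sketch: hold blocks in RAM, write to 'disk' only when forced."""

    def __init__(self, store, capacity):
        # store: dict-like stand-in for the disk file (an assumption)
        # capacity: maximum number of blocks held in RAM, must be >= 1
        self.store = store
        self.capacity = capacity
        self.blocks = OrderedDict()  # block number -> (data, dirty), oldest first
        self.disk_writes = 0         # counts how often the "disk" is written

    def _make_room(self):
        # Evict least-recently-used blocks; only dirty ones cost a disk write.
        while len(self.blocks) >= self.capacity:
            number, (data, dirty) = self.blocks.popitem(last=False)
            if dirty:
                self.store[number] = data
                self.disk_writes += 1

    def read(self, number):
        if number in self.blocks:
            self.blocks.move_to_end(number)  # mark as recently used
            return self.blocks[number][0]
        self._make_room()
        data = self.store[number]            # cache miss: fetch from "disk"
        self.blocks[number] = (data, False)
        return data

    def write(self, number, data):
        if number not in self.blocks:
            self._make_room()
        self.blocks[number] = (data, True)   # changed in RAM only, marked dirty
        self.blocks.move_to_end(number)

    def flush(self):
        # Write every dirty block back, e.g. at the end of a run.
        for number, (data, dirty) in list(self.blocks.items()):
            if dirty:
                self.store[number] = data
                self.disk_writes += 1
                self.blocks[number] = (data, False)
```

In this sketch, repeated writes to the same block touch only RAM. The backing store is updated when a dirty block is evicted to make room or when `flush()` is called, which is the "only when strictly necessary" behaviour described above.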
Other optimisations for STRESS included writing a library of assembly-language matrix-handling routines. These interfaced with the 8087 math chip, if present. Back then, the 8087 "math chip" was a cost option and most PCs didn't have hardware-assisted transcendental mathematical functions. This too helped make STRESS blindingly fast.