Graduate Student Descent Method
Buildup
The Fridge-Elephant Problem
Traditionally, the way to put an elephant into a fridge has been to hand the job to graduate students. How hard it is, or what the best approach might be, I can't say, but the graduate students will figure that out, so there is nothing to worry about. So what is the first thing to try when solving nonlinear optimization problems, deep learning included?
Terminology
It involves trying a variety of hyperparameters. However, the graduate students…
The simplest way to find good hyperparameters is to try different hyperparameter values by hand and see what happens. This strategy can be surprisingly effective and educational. A deep learning practitioner needs to build intuition about how deep networks are structured. Where theory is weak, empirical experimentation is the best way to learn how to build deep learning models. I recommend that you try building a variety of fully connected models yourself. Record your hyperparameter choices and the results in a spreadsheet, and explore systematically. Try to understand the effect of each hyperparameter. What makes the network learn faster or slower? In what range of settings does learning stop completely? (Unfortunately, this is easy to find out.)
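The advice to record hyperparameter choices and results in a spreadsheet can be partly automated. The following is a minimal sketch, not taken from the book: instead of a fully connected TensorFlow network, it assumes a hypothetical toy problem (fitting y = 3x + 1 with plain full-batch gradient descent in pure Python) so that it runs anywhere. It sweeps a small grid of learning rates and step counts and writes one CSV row per setting, which opens directly in a spreadsheet. The names `make_data`, `train`, and `sweep` are illustrative only.

```python
import csv
import itertools
import math
import random
import sys


def make_data(n=200, seed=0):
    """Hypothetical toy dataset: y = 3x + 1 plus small Gaussian noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [3.0 * x + 1.0 + rng.gauss(0.0, 0.1) for x in xs]
    return xs, ys


def train(xs, ys, lr, steps):
    """Fit y ~ w*x + b by full-batch gradient descent; return the final MSE."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of the mean squared error with respect to w and b.
        gw = sum(2.0 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        gb = sum(2.0 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * gw
        b -= lr * gb
        if not (math.isfinite(w) and math.isfinite(b)):
            return float("inf")  # parameters blew up: training diverged
    # Use err * err rather than err ** 2: float ** raises OverflowError on huge values.
    return sum((w * x + b - y) * (w * x + b - y) for x, y in zip(xs, ys)) / n


def sweep(lrs, step_counts, out):
    """Try every (learning rate, steps) pair and log each result as a CSV row."""
    xs, ys = make_data()
    writer = csv.writer(out)
    writer.writerow(["learning_rate", "steps", "final_mse"])
    results = []
    for lr, steps in itertools.product(lrs, step_counts):
        mse = train(xs, ys, lr, steps)
        writer.writerow([lr, steps, mse])
        results.append((lr, steps, mse))
    return results


if __name__ == "__main__":
    # Print the table to stdout; redirect it to a .csv file to open in a spreadsheet.
    sweep([0.001, 0.01, 0.1, 0.5, 1.5], [10, 100], sys.stdout)
```

For this particular toy problem, learning rates above roughly 1 make the loss grow instead of shrink, which is one concrete instance of a range where learning stops working, while a very small learning rate with few steps barely moves the loss from its starting value. A real network has many more knobs, but the same loop-and-log pattern applies.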
…(omission)…
Why do we call choosing hyperparameter values the Graduate Descent Method? Until recently, machine learning was mainly an academic field. A reliable way to design a new machine learning algorithm was to describe the desired method to a new graduate student and let them work out the details. This process is a kind of rite of passage, and students had to try numerous design alternatives. Overall, it is an educational experience, because the only way to develop a sense of design is to build up memories of which settings work and which don't.
Explanation
The Graduate Descent Method is, of course, a play on the Gradient Descent Method, but for some people it is a reality that works exactly as described above. When the manual labor drags on for more than a day or two, being stuck repeating the same inefficient tasks almost inevitably pushes you into office automation and programming: you start writing code, running jobs on large servers, and wanting the computer to report on the progress of the work remotely.
Source
1. Bharath Ramsundar, Reza Zadeh; translated by Jang Jung-ho and Jung Hana. (2018). Deep Learning with TensorFlow Completed in One Book, pp. 137–138.