2018 AAPM Annual Meeting

Session Title: Machine Learning for Radiomics (Session 2 of the Certificate Series)
Question 1: Machine learning is a field of study that gives computers the ability to learn without being explicitly programmed:
Reference:Samuel AL. Some studies in machine learning using the game of checkers. IBM J Res Dev. 1959;3:210–29.
Choice A:True.
Choice B:False.
Question 2: With increased model complexity one can expect:
Reference:Faber, Nicolaas M. "A closer look at the bias–variance trade‐off in multivariate calibration." Journal of Chemometrics 13.2 (1999): 185-192.
Choice A:Increase in model variance.
Choice B:Decrease in model bias.
Choice C:Both A and B.
Choice D:None of the above.
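
Illustrative sketch for Question 2: the toy experiment below repeatedly fits polynomials of increasing degree to noisy samples of a known function and estimates the squared bias and variance of the resulting predictions. The target function, noise level, and degrees are assumptions chosen for illustration, not taken from the cited reference.

import numpy as np

rng = np.random.default_rng(0)
true_f = lambda x: np.sin(2 * np.pi * x)          # known underlying signal
x_test = np.linspace(0, 1, 50)

for degree in (1, 3, 9):                          # increasing model complexity
    preds = []
    for _ in range(200):                          # 200 independent training sets
        x = rng.uniform(0, 1, 20)
        y = true_f(x) + rng.normal(0, 0.3, x.size)
        coefs = np.polyfit(x, y, degree)          # fit a polynomial of this degree
        preds.append(np.polyval(coefs, x_test))
    preds = np.array(preds)
    bias2 = np.mean((preds.mean(axis=0) - true_f(x_test)) ** 2)
    variance = np.mean(preds.var(axis=0))
    print(f"degree={degree}: bias^2={bias2:.3f}, variance={variance:.3f}")

As the degree grows, the squared bias shrinks while the variance across training sets grows.
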
Question 3: In the case where we have a large number of features and a small number of subjects (“large p, small n” problem) the LASSO is limited because:
Reference:Zou, Hui, and Trevor Hastie. "Regularization and variable selection via the elastic net." Journal of the Royal Statistical Society: Series B (Statistical Methodology) 67.2 (2005): 301-320.
Choice A:The algorithm cannot handle these types of cases.
Choice B:It selects at most n variables before it saturates.
Choice C:It yields a sparse solution.
Choice D:None of the above.
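
Illustrative sketch for Question 3, using scikit-learn: with p = 200 features and only n = 20 subjects, a LASSO fit has at most n nonzero coefficients (in general position), whereas an elastic net can retain more (Zou and Hastie 2005). The synthetic data and penalty strengths are assumptions chosen for illustration.

import numpy as np
from sklearn.linear_model import Lasso, ElasticNet

rng = np.random.default_rng(0)
n, p = 20, 200                                   # far more features than subjects
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:40] = 1.0                                  # 40 truly relevant features (more than n)
y = X @ beta + rng.normal(0, 0.1, n)

lasso = Lasso(alpha=0.05, max_iter=50_000).fit(X, y)
enet = ElasticNet(alpha=0.05, l1_ratio=0.5, max_iter=50_000).fit(X, y)
print("LASSO nonzero coefficients:", np.sum(lasso.coef_ != 0))        # at most n
print("Elastic net nonzero coefficients:", np.sum(enet.coef_ != 0))   # can exceed n
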
Question 4: The depth of a learned decision tree can be larger than the number of training examples used to create the tree.
Reference:Breiman, L. (1984). Classification and Regression Trees. New York: Routledge.
Choice A:True.
Choice B:False.
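
Illustrative sketch for Question 4: every split along a root-to-leaf path must send at least one training example down each branch, so an unpruned tree grown on n examples cannot be deeper than n - 1. The toy data below are assumptions chosen for illustration, not taken from the cited reference.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(15, 5))                     # 15 training examples
y = rng.integers(0, 2, size=15)                  # random binary labels
tree = DecisionTreeClassifier(random_state=0).fit(X, y)
print("training examples:", len(X))
print("learned tree depth:", tree.get_depth())   # always below the example count
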
Question 5: You are working on a particular learning task and cross-validation experiments indicate that your SVM is overfitting. Name one action that can help decrease overfitting in an SVM.
Reference:James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An introduction to statistical learning (Vol. 112). New York: Springer.
Choice A:Decrease C.
Choice B:Use a less expressive kernel (e.g., a lower-degree polynomial).
Choice C:Get more training data.
Choice D:All of the above.
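
Illustrative sketch for Question 5: in scikit-learn's SVC, decreasing C strengthens regularization (a wider margin), which typically narrows the gap between training and test accuracy. The dataset and C values below are assumptions chosen for illustration.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=20, n_informative=5,
                           flip_y=0.2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

for C in (100.0, 1.0, 0.01):                     # decreasing C = more regularization
    clf = SVC(kernel="rbf", C=C).fit(X_tr, y_tr)
    print(f"C={C:g}: train acc={clf.score(X_tr, y_tr):.2f}, "
          f"test acc={clf.score(X_te, y_te):.2f}")
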
Question 6: In neural networks, applying the dropout technique to units with a given probability refers to:
Reference:Srivastava, Nitish, Geoffrey E. Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. 2014. “Dropout: A Simple Way to Prevent Neural Networks from Overfitting.” Journal of Machine Learning Research: JMLR 15 (1): 1929–58.
Choice A:Setting the output of those units to zero.
Choice B:Treating the units as pass-through units, where inputs and outputs are equal.
Choice C:Removing the neurons from the network completely along with their inputs and outputs.
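
Illustrative sketch for Question 6, in plain NumPy, of the common "inverted dropout" implementation: during training each unit's output is zeroed with the dropout probability and the surviving outputs are rescaled. The layer size and probability are assumptions chosen for illustration.

import numpy as np

rng = np.random.default_rng(0)
activations = rng.normal(size=(4, 8))            # outputs of a hidden layer
p_drop = 0.5                                     # dropout probability

mask = rng.random(activations.shape) >= p_drop   # keep each unit with prob 1 - p_drop
dropped = activations * mask / (1.0 - p_drop)    # dropped units output exactly zero
print("fraction of zeroed outputs:", np.mean(dropped == 0))
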
Question 7: If you suspect your neural network is overfitting the given data, what course of action would you take?
Reference:Caruana, Rich, Steve Lawrence, and C. Lee Giles. 2001. “Overfitting in Neural Nets: Backpropagation, Conjugate Gradient, and Early Stopping.” In Advances in Neural Information Processing Systems 13, edited by T. K. Leen, T. G. Dietterich, and V. Tresp, 402–8. MIT Press.
Choice A:Use more data through augmentation.
Choice B:Use shallower networks with fewer parameters.
Choice C:Introduce regularization methods.
Choice D:Introduce early stopping.
Choice E:All of the above.
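
Illustrative sketch for Question 7, combining several of the listed remedies in scikit-learn's MLPClassifier: a small (shallow) network, L2 regularization via alpha, and early stopping on a held-out validation split. The dataset and hyperparameters are assumptions chosen for illustration.

from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=500, n_features=30, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(32,),    # shallower network, fewer parameters
                    alpha=1e-3,                  # L2 regularization strength
                    early_stopping=True,         # stop when the validation score stalls
                    validation_fraction=0.2,
                    max_iter=500,
                    random_state=0).fit(X, y)
print("epochs actually run:", clf.n_iter_)
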
Question 8: When training a neural network, you notice that the training loss remains unchanged for the first several epochs (i.e., the loss curve is flat). The training loss then starts to drop, indicating that learning is taking place. What is the most likely reason for this?
Reference:Goodfellow, Ian, Yoshua Bengio, and Aaron Courville. 2016. Deep Learning. MIT Press. http://www.deeplearningbook.org.
Choice A:High learning rate.
Choice B:Low learning rate.
Choice C:Bad parameter initialization.
Choice D:Small batch size.
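
Illustrative sketch for Question 8: a single sigmoid unit initialized deep in saturation receives a nearly vanishing gradient, so its loss stays almost flat for many epochs before dropping once the unit escapes saturation. All values below are assumptions chosen for illustration.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w, lr = 5.0, 5.0                                 # badly scaled (saturated) start, fixed learning rate
x, target = 1.0, 0.0
for epoch in range(25):
    y = sigmoid(w * x)
    loss = (y - target) ** 2                     # squared error of this single unit
    if epoch % 4 == 0:
        print(f"epoch {epoch:2d}: loss = {loss:.3f}")
    grad = 2 * (y - target) * y * (1 - y) * x    # d(loss)/dw
    w -= lr * grad

The printed loss barely moves for roughly the first dozen epochs and then falls quickly.
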
Question 9: The main difference between gradient descent (GD) and stochastic gradient descent (SGD) is:
Reference:Ruder, Sebastian. 2016. “An Overview of Gradient Descent Optimization Algorithms.” arXiv [cs.LG]. arXiv. http://arxiv.org/abs/1609.04747.
Choice A:GD is used in lieu of SGD when entire datasets do not fit in memory.
Choice B:GD is performed on the entire dataset while SGD is performed iteratively on batches of the dataset.
Choice C:SGD can be momentum-based while GD cannot.
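
Illustrative sketch for Question 9, in plain NumPy, on least-squares regression: gradient descent makes one update per pass using the full dataset, while (mini-batch) stochastic gradient descent makes many updates per pass on small batches. Data sizes, learning rates, and the batch size are assumptions chosen for illustration.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
true_w = np.array([1.0, -2.0, 0.5, 3.0, 0.0])
y = X @ true_w + rng.normal(0, 0.1, 1000)

def grad(w, Xb, yb):                              # gradient of the mean squared error
    return 2 * Xb.T @ (Xb @ w - yb) / len(yb)

# Gradient descent: one update per pass, computed on the entire dataset.
w_gd = np.zeros(5)
for _ in range(100):
    w_gd -= 0.1 * grad(w_gd, X, y)

# Stochastic (mini-batch) gradient descent: many updates per pass, on batches of 32.
w_sgd = np.zeros(5)
for _ in range(10):                               # 10 passes (epochs)
    perm = rng.permutation(len(X))
    for start in range(0, len(X), 32):
        idx = perm[start:start + 32]
        w_sgd -= 0.01 * grad(w_sgd, X[idx], y[idx])

print("GD estimate :", np.round(w_gd, 2))
print("SGD estimate:", np.round(w_sgd, 2))
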
Question 10: The rectified linear unit (ReLU) activation function can lead to dead units that never activate; this can be mitigated by using a leaky ReLU instead.
Reference:Maas, Andrew L., Awni Y. Hannun, and Andrew Y. Ng. 2013. “Rectifier Nonlinearities Improve Neural Network Acoustic Models.” In Proc. ICML. Vol. 30. https://pdfs.semanticscholar.org/367f/2c63a6f6a10b3b64b8729d601e69337ee3cc.pdf.
Choice A:True.
Choice B:False.
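
Illustrative sketch for Question 10: for negative pre-activations the ReLU gradient is exactly zero, so a unit stuck in that regime receives no learning signal, while a leaky ReLU keeps a small nonzero slope. The leak slope of 0.01 follows common practice (Maas et al. 2013); the example inputs are assumptions chosen for illustration.

import numpy as np

z = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])        # example pre-activations

relu = np.maximum(0.0, z)
relu_grad = (z > 0).astype(float)                # zero gradient for z <= 0 ("dead" units)

alpha = 0.01                                     # leak slope
leaky = np.where(z > 0, z, alpha * z)
leaky_grad = np.where(z > 0, 1.0, alpha)         # small but nonzero gradient for z <= 0

print("ReLU gradients      :", relu_grad)
print("Leaky ReLU gradients:", leaky_grad)
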