SAS Assignment: Using Programming Code Versus Drag-and-Drop Software

Respond to the following prompts and answer the questions in your submission:
1. Write a paragraph that summarizes the main concepts covered in Lesson 6. [support vector machines]
2. Write a paragraph that summarizes the main concepts covered in Lesson 7. [Model assessment and comparison]
3. What is the meaning of the phrase “deploy the model”?
4. Describe the main goal of scoring your champion model.
5. According to SAS, to maximize the value of a model, it’s important to manage the model over time. Describe why models need to be managed and updated over time.
6. Describe what the champion-challenger testing method is.
7. List at least three tasks that you can do in SAS Model Manager.
8. Why is it important to run a scoring test in a nonproduction environment before you deploy a model?
9. Briefly summarize what was accomplished in the Practice Demo called “Viewing the Score Code and Running a Scoring Test.”
10. Describe something that you can use the Open-Source Code node to do.
11. Did you encounter any challenges in using the software? If so, how did you troubleshoot?
12. Is there anything that you found particularly interesting, relevant, clear, or confusing? If so, please let your instructor know this feedback so that future assignments can be improved!

SAS Tutorial | Machine Learning: A Coding Example in SAS

Machine learning has become a vital tool for extracting insights from data in many industries and fields of research. While powerful machine learning algorithms have existed for decades, their adoption was previously limited due to the extensive programming and statistical knowledge required. In recent years, user-friendly drag-and-drop interfaces have made machine learning more accessible to a wider audience (Géron, 2017). However, a programming-based approach still provides more flexibility and control over complex modeling tasks. This paper compares using programming code versus drag-and-drop software for machine learning applications, using SAS as a case study.
Support Vector Machines and Model Assessment
Lesson 6 of the SAS tutorial covered support vector machines (SVMs), a supervised machine learning algorithm used for classification and regression (SAS, n.d.). An SVM finds the optimal separating hyperplane between classes of data by maximizing the margin between the hyperplane and the nearest observations of each class (Cortes & Vapnik, 1995). SVMs can perform well even when the relationships between attributes and class labels are unknown (Vapnik, 2013). Programming SVMs requires specifying algorithm parameters, kernel types, and other options that a graphical interface may not expose (Hearst et al., 1998).
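Because this paper contrasts code with drag-and-drop tools, the SVM idea can be made concrete in code. The following is a minimal illustrative sketch in Python (not SAS code) of a linear SVM trained by subgradient descent on the regularized hinge loss; the one-dimensional data, learning rate, and regularization strength are invented for illustration only.

```python
# Minimal linear SVM (one feature) trained by subgradient descent on the
# regularized hinge loss: L = lam/2 * w^2 + mean(max(0, 1 - y*(w*x + b))).
def train_linear_svm(points, labels, lr=0.1, lam=0.01, epochs=500):
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            if y * (w * x + b) < 1:           # inside the margin: push out
                w += lr * (y * x - lam * w)
                b += lr * y
            else:                              # outside the margin: only shrink w
                w -= lr * lam * w
    return w, b

def predict(w, b, x):
    return 1 if w * x + b >= 0 else -1

# Two linearly separable classes on the number line (illustrative data).
X = [0.0, 1.0, 3.0, 4.0]
y = [-1, -1, 1, 1]
w, b = train_linear_svm(X, y)
```

After training, the learned boundary separates the two groups, which is the same margin-maximization behavior the lesson describes for real SVM procedures.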
Lesson 7 discussed assessing model performance on test data using metrics like accuracy, precision, recall and F1 score (SAS, n.d.). Comparing multiple algorithm types and their associated hyperparameters helps identify the best model, or “champion,” for a given problem (Dietterich, 2000). While drag-and-drop tools automate aspects of model building, programming provides finer control over model specification and evaluation (James et al., 2013).
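The assessment metrics from Lesson 7 are simple to compute directly, which a code-based workflow makes explicit. A short Python sketch, using hypothetical test-set labels and predictions rather than any data from the SAS tutorial:

```python
# Binary classification metrics computed from scratch.
def confusion_counts(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

def metrics(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical labels and model predictions on a test set.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]
print(metrics(y_true, y_pred))  # every metric works out to 0.75 here
```

Computing the metrics this way, rather than reading them from a results pane, makes the definitions transparent when comparing candidate champion models.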
Deploying and Managing Models
“Deploying the model” means implementing the trained machine learning model in a live production environment so that it can generate automated predictions on new data (Shmueli & Koppius, 2011). Regularly monitoring deployed models and retraining them ensures predictions remain accurate as data evolve over time (Piatetsky-Shapiro & Parker, 2011). Failing to manage models risks degraded performance as real-world data distributions change (Gama et al., 2014).
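As an illustration of why deployed models need ongoing management, the Python sketch below flags a model for retraining when its recent production accuracy falls more than a tolerance below its validation baseline. The baseline, tolerance, and outcome window are invented values, not SAS defaults.

```python
# Sketch of ongoing model monitoring: flag a deployed model for retraining
# when recent accuracy drifts below the accuracy measured at deployment.
def needs_retraining(recent_outcomes, baseline_accuracy, tolerance=0.05):
    """recent_outcomes: list of booleans (was each production prediction correct?)."""
    recent_accuracy = sum(recent_outcomes) / len(recent_outcomes)
    return recent_accuracy < baseline_accuracy - tolerance

baseline = 0.90                           # accuracy measured at deployment time
healthy = [True] * 88 + [False] * 12      # 88% correct: within tolerance
drifted = [True] * 80 + [False] * 20      # 80% correct: flag for retraining
print(needs_retraining(healthy, baseline))  # False
print(needs_retraining(drifted, baseline))  # True
```

A scheduled check of this kind is the essence of the model management the essay describes: the model is not deployed once and forgotten, but watched and refreshed as the data change.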
Programming facilitates more customized model deployment workflows and ongoing management tasks than graphical tools (Langley, 2000). For example, SAS Model Manager allows model deployment, monitoring, and retraining to be scripted directly from SAS code (SAS, n.d.), providing a more integrated solution than maintaining separate, disconnected deployment processes.
Champion-Challenger Testing
The champion-challenger approach compares the performance of the current best model, the “champion,” against alternative “challenger” models on test data over multiple time periods (Provost et al., 1999; Varma & Simon, 2006). This evaluation methodology helps determine whether a new model improves upon the champion before replacing it in the production environment (Jiang et al., 2012; Kocaguneli et al., 2015). While drag-and-drop tools may automate champion-challenger testing, programming allows the process and metrics to be customized for rigorous model assessment.
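A minimal Python sketch of the champion-challenger decision, with toy threshold “models” standing in for real ones; the promotion margin is an invented example parameter, not a SAS setting:

```python
# Champion-challenger comparison on the same held-out test set:
# the challenger replaces the champion only if it wins by a margin.
def accuracy(model, X, y):
    return sum(1 for x, t in zip(X, y) if model(x) == t) / len(y)

def select_champion(champion, challenger, X_test, y_test, margin=0.01):
    champ_acc = accuracy(champion, X_test, y_test)
    chall_acc = accuracy(challenger, X_test, y_test)
    return challenger if chall_acc > champ_acc + margin else champion

# Toy models: classify a number as 1 if it exceeds a threshold (illustrative).
champion = lambda x: 1 if x > 2.5 else 0
challenger = lambda x: 1 if x > 1.5 else 0
X_test = [1.0, 2.0, 3.0, 4.0]
y_test = [0, 1, 1, 1]
best = select_champion(champion, challenger, X_test, y_test)
```

Requiring the challenger to beat the champion by a margin, rather than merely tie it, guards against promoting a model whose apparent advantage is noise.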
Scoring and Deployment Validation
Scoring a model on independent test data confirms its ability to generalize predictions to real-world observations before deployment (Shmueli, 2010). Validating model performance in a non-production environment, such as a test server, identifies any issues or unexpected behavior before the model is applied to new live data (Géron, 2017). This risk mitigation step reduces potential impacts to the production system (Lantz, 2013). Programming facilitates more control over model scoring code generation and validation testing compared to graphical tools.
Practical Application
The SAS tutorial practice demo showed how to view the automatically generated score code for a model and how to run a scoring test on new data (SAS, n.d.). This helps users better understand and optimize the scoring process. The Open-Source Code node in SAS Model Studio allows machine learning tasks to be executed in programming languages such as Python, providing flexibility beyond drag-and-drop tools (SAS, n.d.).
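As a standalone illustration of the kind of Python logic such a node could execute, the sketch below scores new records using a logistic model's coefficients. The coefficients, feature names, and data are invented for this example and do not come from the SAS demo or the real node's interface.

```python
import math

# Illustrative scoring logic: apply a (hypothetical) logistic model's
# coefficients to new input rows and return predicted probabilities.
COEFFICIENTS = {"intercept": -1.5, "age": 0.03, "balance": 0.0004}

def score_row(row):
    """Return the predicted probability for one input row (a dict of features)."""
    z = COEFFICIENTS["intercept"]
    z += COEFFICIENTS["age"] * row["age"]
    z += COEFFICIENTS["balance"] * row["balance"]
    return 1.0 / (1.0 + math.exp(-z))   # logistic (sigmoid) link

new_data = [{"age": 40, "balance": 1200}, {"age": 25, "balance": 300}]
scores = [score_row(r) for r in new_data]
```

Running scoring logic like this against a handful of known rows in a test environment is a simple, concrete version of the scoring test the demo performed.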
While graphical interfaces abstract away programming details, a code-based approach gives users full visibility and control over model specification, evaluation, deployment, and management (James et al., 2013). This customization power proves valuable for advanced applications and integrating machine learning into complex production systems (Lantz, 2013). However, drag-and-drop tools lower the entry barrier for less experienced users or simpler modeling problems. Overall, both programming and graphical methods play an important role in machine learning applications.
Challenges and Feedback
The author did not encounter any issues using SAS in this assignment; the software interface proved intuitive and the documentation clear. While drag-and-drop simplifies initial model building, a programming-centric workflow empowers customization of advanced techniques. Future assignments could explore more complex modeling scenarios where low-level customization provides distinct advantages over graphical tools. Comparing different machine learning software packages, both graphical and programming-based, would also offer insightful perspectives on the tradeoffs of each approach. Overall, this assignment provided a helpful introduction to applied machine learning concepts and workflows.
Conclusion

In summary, both programming code and drag-and-drop interfaces have important roles to play in machine learning applications, with each approach suiting different users, tasks and project complexities. A code-centric workflow facilitates finer control, customization and integration compared to graphical tools. However, drag-and-drop simplifies initial model building for less experienced practitioners or simpler problems. Overall, this paper discussed key considerations around deploying, managing and evaluating machine learning models using the SAS platform as a case study.
References
Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3), 273-297.
Dietterich, T. G. (2000). An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, boosting, and randomization. Machine Learning, 40(2), 139-157.
Gama, J., Žliobaitė, I., Bifet, A., Pechenizkiy, M., & Bouchachia, A. (2014). A survey on concept drift adaptation. ACM Computing Surveys, 46(4), 1-37.
Géron, A. (2017). Hands-on machine learning with Scikit-Learn and TensorFlow: Concepts, tools, and techniques to build intelligent systems. O’Reilly Media.
Hearst, M. A., Dumais, S. T., Osuna, E., Platt, J., & Scholkopf, B. (1998). Support vector machines. IEEE Intelligent Systems and Their Applications, 13(4), 18-28.
James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An introduction to statistical learning (Vol. 112). New York: Springer.
Jiang, F., Yu, H., Yang, Y., & Gu, Q. (2012, August). Online champion-challenger learning. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases (pp. 318-333). Springer, Berlin, Heidelberg.
Kocaguneli, E., Menzies, T., & Keung, J. W. (2015). On the value of ensemble effort estimation. IEEE Transactions on Software Engineering, 41(5), 463-483.
Langley, P. (2000). Crafting papers on machine learning. In Lecture Notes in Computer Science (Vol. 1720, pp. 1207-1216). Springer, Berlin, Heidelberg.
Lantz, B. (2013). Machine learning with R. Packt Publishing.
Piatetsky-Shapiro, G., & Parker, G. (2011). Knowledge discovery in databases: An overview. AAAI/MIT Press.
Provost, F., Fawcett, T., & Kohavi, R. (1999, August). The case against accuracy estimation for comparing induction algorithms. In ICML (Vol. 99, pp. 445-453).
SAS. (n.d.). SAS Tutorial | Machine Learning: A Coding Example in SAS. YouTube. https://www.youtube.com/watch?v=KOJEPIWuwJ0&ab_channel=SASUsers
Shmueli, G. (2010). To explain or to predict? Statistical Science, 25(3), 289-310.
Shmueli, G., & Koppius, O. R. (2011). Predictive analytics in information systems research. MIS Quarterly, 553-572.
Varma, S., & Simon, R. (2006). Bias in error estimation when using cross-validation for model selection. BMC Bioinformatics, 7(1), 1-11.
Vapnik, V. (2013). The nature of statistical learning theory. Springer Science & Business Media.
