Scalable Machine Learning Workflows in Data Warehousing: Automating Model Training and Deployment with AI

Authors

  • Sareen Kumar Rachakatla, Lead Developer, Intercontinental Exchange Holdings, Inc., Atlanta, USA
  • Prabu Ravichandran, Sr. Data Architect, Amazon Web Services, Inc., Raleigh, USA
  • Jeshwanth Reddy Machireddy, Sr. Software Developer, Kforce Inc., Wisconsin, USA

Keywords:

scalable machine learning workflows, data warehousing

Abstract

Modern data warehousing involves analyzing enormous datasets and running machine learning at scale. Using AI to automate model training and deployment on large datasets improves scalability and efficiency. Machine learning (ML) models are challenging to integrate with data warehousing systems, which process enormous volumes of data from various sources. Maintaining performance within the data warehouse context, moving data smoothly between systems, and training complex models are all difficult.

Many factors affect the scalability of ML in a data warehouse. This study automates model training using AutoML and CI/CD pipelines. Data-driven strategies are used to train and improve models. AutoML automates the choice of algorithm and hyperparameters, which speeds up model training and the path to insight. The study also covers managing ML models within the data warehouse. Deployment is organized into layers: versioning, batch processing, and real-time inference. Large data warehouses with high data volumes and velocities require fast model deployment.
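
As an illustration of the ideas named above (automated algorithm and hyperparameter choice, followed by versioned deployment with batch and real-time inference), the following is a minimal sketch in plain Python. It is not the authors' implementation. The names `SEARCH_SPACE`, `auto_select`, and `ModelRegistry` are hypothetical, and two toy one-feature models stand in for the much larger search space a real AutoML system would explore.

```python
from itertools import product
from statistics import mean

# Each row is (feature, target). Toy models below are assumptions for illustration.

def fit_mean(rows):
    """Baseline: always predict the mean target seen in training."""
    m = mean(y for _, y in rows)
    return lambda x: m

def fit_ridge(rows, l2=0.0):
    """One-feature linear regression with an L2 penalty on the slope."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in rows)
    slope = sxy / (sxx + l2)
    intercept = my - slope * mx
    return lambda x: slope * x + intercept

def mse(model, rows):
    """Mean squared error of a fitted model on held-out rows."""
    return mean((model(x) - y) ** 2 for x, y in rows)

# Hypothetical search space: algorithm name -> (fit function, hyperparameter grid).
SEARCH_SPACE = {
    "mean": (fit_mean, {}),
    "ridge": (fit_ridge, {"l2": [0.0, 1.0, 10.0]}),
}

def auto_select(train, valid):
    """Try every algorithm/hyperparameter combination; keep the lowest validation MSE."""
    best = None
    for name, (fit, grid) in SEARCH_SPACE.items():
        keys = list(grid)
        for values in product(*(grid[k] for k in keys)):
            params = dict(zip(keys, values))
            model = fit(train, **params)
            score = mse(model, valid)
            if best is None or score < best[0]:
                best = (score, name, params, model)
    return best

class ModelRegistry:
    """In-memory stand-in for a versioned model store."""

    def __init__(self):
        self._versions = []

    def register(self, model):
        """Store a model and return its 1-based version number."""
        self._versions.append(model)
        return len(self._versions)

    def predict_one(self, x, version=None):
        """Real-time path: score a single value with the given (or latest) version."""
        index = (version or len(self._versions)) - 1
        return self._versions[index](x)

    def predict_batch(self, xs, version=None):
        """Batch path: score many values with the given (or latest) version."""
        return [self.predict_one(x, version) for x in xs]

if __name__ == "__main__":
    rows = [(float(x), 2.0 * x + 1.0) for x in range(10)]  # synthetic data
    score, name, params, model = auto_select(rows[:8], rows[8:])
    registry = ModelRegistry()
    version = registry.register(model)
    print(name, params, version, registry.predict_batch([10.0, 11.0]))
```

In a production warehouse the registry would be backed by persistent storage and the batch path would read from warehouse tables; both are left out here to keep the sketch self-contained.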

Published

07-07-2022

How to Cite

[1]
Sareen Kumar Rachakatla, Prabu Ravichandran, and Jeshwanth Reddy Machireddy, “Scalable Machine Learning Workflows in Data Warehousing: Automating Model Training and Deployment with AI”, Aus. J. of Machine Learning Res. & App., vol. 2, no. 2, pp. 262–286, Jul. 2022, Accessed: Mar. 14, 2025. [Online]. Available: https://ajmlra.org/index.php/publication/article/view/27