Scaling Rule-Based Anomaly and Fraud Detection and Business Process Monitoring with Apache Flink

Sarbaree Mishra

Authors

Sarbaree Mishra, Program Manager at Molina Healthcare Inc., USA

Keywords:

Anomaly Detection, Fraud Detection, Business Process Monitoring, Scalability

Abstract

Rule-based anomaly and fraud detection systems are central to identifying irregularities in domains such as banking, e-commerce, and healthcare. As data volumes grow and become more complex, however, traditional methods struggle to process and interpret this data in real time. Apache Flink has emerged as a powerful stream processing engine that addresses these challenges by making rule-based systems scalable. This paper highlights Apache Flink's effectiveness in handling continuous data streams, which improves anomaly detection and business process monitoring at scale. Despite this promise, deploying such systems raises challenges in managing system complexity, ensuring data quality, and guaranteeing low-latency processing. The paper addresses the operational issues of deploying these systems at scale and keeping them effective over time. It also offers insight into the evolution of anomaly detection systems and the transformative effect of stream processing architectures such as Flink. Organizations can further strengthen their detection capabilities with advanced techniques such as machine learning, reducing false positives and increasing the accuracy of their fraud detection systems.

