Dynamic Data Processing Pipeline: Integration of Apache Kafka and Apache Spark

Authors

  • Sofyan Mufti Prasetiyo, Universitas Pamulang
  • Marji, Universitas Pamulang
  • Aditya Rahman Purwanto, Universitas Pamulang
  • Fransiskus Frengky Farnebun, Universitas Pamulang

Abstract

In an era when the need for real-time data processing is increasingly urgent, integrating Apache Kafka with Apache Spark is key to meeting that demand. This article examines how combining Kafka, as a reliable messaging system, with Spark, as an in-memory processing platform, can optimize data management for large, heterogeneous data streams. We discuss the design, implementation, and benefits of this integration for building efficient, responsive data pipelines that let organizations obtain valuable insights with minimal latency.
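To make the integration concrete, the following is a minimal sketch of how a Spark job might read from a Kafka topic, assuming Spark Structured Streaming and its built-in Kafka source. The abstract does not say which Spark streaming API the authors used, so this choice is an assumption. The helper names, the one-minute window, the two-minute watermark, and the example broker address and topic are likewise hypothetical, chosen only for illustration.

```python
def kafka_source_options(bootstrap_servers: str, topic: str) -> dict:
    """Build options for Spark's Kafka source (hypothetical helper)."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,  # Kafka broker list
        "subscribe": topic,                            # topic to consume
        "startingOffsets": "latest",                   # only read new records
    }


def build_event_counts(spark, options: dict):
    """Return a streaming DataFrame that counts records per key per minute.

    `spark` is an existing SparkSession. The window and watermark sizes
    are illustrative assumptions, not values taken from the article.
    """
    # Imported here so the option helper above works without pyspark installed.
    from pyspark.sql.functions import col, window

    # The Kafka source exposes binary `key`/`value` columns plus metadata
    # columns such as `topic`, `partition`, `offset`, and `timestamp`.
    raw = spark.readStream.format("kafka").options(**options).load()

    events = raw.select(
        col("key").cast("string").alias("key"),
        col("timestamp"),
    )

    return (
        events
        .withWatermark("timestamp", "2 minutes")  # bound state for late data
        .groupBy(window(col("timestamp"), "1 minute"), col("key"))
        .count()
    )
```

Running this against a real cluster would require a SparkSession started with the Kafka connector package (spark-sql-kafka-0-10) and a reachable broker. The result of `build_event_counts` could then be written out with, for example, `.writeStream.outputMode("update").format("console").start()`.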

Published

2024-05-30

How to Cite

Sofyan Mufti Prasetiyo, Marji, Aditya Rahman Purwanto, & Fransiskus Frengky Farnebun. (2024). Dynamic Data Processing Pipeline: Integrasi Apache Kafka dan Apache Spark. JRIIN: Jurnal Riset Informatika Dan Inovasi, 1(12), 1240–1243. Retrieved from https://jurnalmahasiswa.com/index.php/jriin/article/view/1058
