Abstract

Flink has officially supported Apache Mesos since the 1.2 release, and many users were running the two together even before that. The latest releases, 1.4 and 1.5 (the latter not yet released at the time of writing), deepen the integration with resource schedulers such as Mesos, which has also brought many new features around this integration. But what does that mean in practice for operating large clusters? In this talk, we will discuss operational best practices, along with some pitfalls, for operating large Flink clusters on top of Apache Mesos, covering topics such as deployments, monitoring, scaling, upgrades, and debugging.

Operating Flink on Mesos at Scale

Slides