“Big data” is collected from many sources in real time, but it is processed in batches after collection, so it provides information about the past. Modern apps, however, need to respond to events happening now, not to yesterday's news. To do this they use a “fast data” pipeline, which processes data as it is collected to provide real-time insights.
Implementing a fast data pipeline can enable your modern app to do amazing things, but it also poses challenges that aren’t a problem when you’re dealing with big data alone. Building your fast data pipeline, and the app that it supports, on top of the Datacenter Operating System (DC/OS) can help alleviate some of these challenges.
Fast data requires many different components, each with its own requirements and prerequisites. DC/OS allows you to provision these components with a single click or command.
When you rely on fast data, every second counts. You need your fast data pipeline to be more resilient than the infrastructure it runs on, and DC/OS provides that resiliency.
Maintaining enough infrastructure to handle data-processing peaks can be expensive. DC/OS allows you to scale up critical microservices during peaks, and it puts spare resources to efficient use during troughs.
Provisioning the different data services that make up a fast data pipeline (message queues to track data streams, processing programs, and databases to store results) can be a huge challenge, because each service has its own specific requirements. DC/OS abstracts away the details and allows you to provision and run all of these services on a common set of resources alongside your containerized or legacy app. You can install most data services in the DC/OS Universe with a single click in the GUI or a single command from the CLI. When provisioning data services is easy, you can experiment with multiple options, allowing you to optimize your pipeline for speed and performance rather than for provisioning time.
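To make the "single command from the CLI" idea concrete, here is a minimal Python sketch that builds one `dcos package install` command per service in a pipeline. The package names (`kafka`, `spark`, `cassandra`) and the `install_commands` helper are illustrative assumptions, not services or tooling named in this article. The sketch prints the commands instead of running them, so it works without a cluster.

```python
import shlex

# Hypothetical pipeline: a message queue, a processor, and a database.
# These package names are assumptions chosen for illustration.
PIPELINE = ["kafka", "spark", "cassandra"]


def install_commands(packages):
    """Return one `dcos package install` command (as an argv list) per service."""
    return [["dcos", "package", "install", name, "--yes"] for name in packages]


if __name__ == "__main__":
    for cmd in install_commands(PIPELINE):
        # Printed rather than executed, so the sketch runs without a cluster.
        print(shlex.join(cmd))
```

Swapping one entry in the list for another package is all it takes to try a different queue or database, which is the kind of experimentation that easy provisioning makes practical.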
If you are processing big data when a server in your data center fails, you have time to reschedule your process. But when you depend on fast data, every second counts. DC/OS provides a resilient architecture that automatically reschedules tasks that were running on failed nodes. Even under normal conditions, logging, debugging, and metrics gathering are essential for smooth operations. With the release of 1.9, DC/OS will provide easily queryable APIs that can send operational data to your choice of visualization tools.
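The idea of rescheduling tasks from failed nodes can be illustrated with a toy model. The sketch below is not the DC/OS scheduler. It is a simplified assumption of the general technique, with a hypothetical `reschedule` function that moves each orphaned task to the least-loaded healthy node.

```python
def reschedule(placements, healthy_nodes):
    """Move tasks off failed nodes onto the least-loaded healthy node.

    placements: dict mapping task name -> node name.
    healthy_nodes: list of node names that are still alive.
    Returns a new dict; the input is left unchanged.
    """
    if not healthy_nodes:
        raise RuntimeError("no healthy nodes available")

    # Count the tasks already running on each healthy node.
    load = {node: 0 for node in healthy_nodes}
    for node in placements.values():
        if node in load:
            load[node] += 1

    new_placements = {}
    for task, node in sorted(placements.items()):
        if node not in load:
            # The task's node failed: pick the least-loaded survivor,
            # breaking ties by node name so the result is deterministic.
            node = min(healthy_nodes, key=lambda n: (load[n], n))
            load[node] += 1
        new_placements[task] = node
    return new_placements


if __name__ == "__main__":
    before = {"ingest": "node-a", "process": "node-b", "store": "node-c"}
    # node-b has failed, so "process" must move to a surviving node.
    print(reschedule(before, ["node-a", "node-c"]))
```

A real scheduler also weighs CPU, memory, and placement constraints. The point here is only that recovery is automatic and needs no operator in the loop.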
Big data is processed in batches, which gives you control over the amount of processing power that you need at any given time. But fast data can be unpredictable, and the amount of processing power you need can vary from moment to moment. DC/OS allows you to scale apps up, down, and out safely. It maximizes server efficiency during peak times, which saves you money and guards against downtime.
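A scaling decision of this kind can be reduced to a small calculation. The following sketch assumes a hypothetical `desired_instances` helper and made-up throughput numbers. It shows one simple way to turn an observed event rate into an instance count, and it does not call any DC/OS API.

```python
import math


def desired_instances(events_per_sec, capacity_per_instance,
                      min_instances=1, max_instances=20):
    """Return how many app instances are needed for the current event rate.

    capacity_per_instance is the number of events per second that one
    instance can handle (an assumed, measured figure). The result is
    clamped to the range [min_instances, max_instances].
    """
    if capacity_per_instance <= 0:
        raise ValueError("capacity_per_instance must be positive")
    needed = math.ceil(events_per_sec / capacity_per_instance)
    return max(min_instances, min(max_instances, needed))


if __name__ == "__main__":
    # Illustrative load levels: a trough, a normal period, and a spike.
    for rate in (0, 4500, 50000):
        print(rate, "events/s ->", desired_instances(rate, 1000), "instances")
```

The lower bound keeps the app available during troughs, and the upper bound caps cost during spikes. Both limits are assumptions to tune for your own workload.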