Fast Data on DC/OS

“Big data” is collected from many sources in real time, but it is processed in batches after collection, so it only provides information about the past. Modern apps, however, need to respond to events happening now, not to yesterday's news. To do this, they use a “fast data” pipeline, which processes data as it is collected to provide real-time insights.

DC/OS enables fast data

Implementing a fast data pipeline can enable your modern app to do amazing things, but it also poses challenges that don’t arise when you’re dealing with big data alone. Building your fast data pipeline, and the app that it supports, on top of the Datacenter Operating System (DC/OS) can help alleviate some of these challenges.

On-demand provisioning

Fast data requires many different components, each of which has its own requirements and prerequisites. DC/OS allows you to provision these components with a single click or command.

Simplified operations

When you rely on fast data, every second counts. You need your fast data pipeline to be more resilient than the infrastructure it runs on, and DC/OS provides that resiliency.

Elastic infrastructure

Maintaining enough infrastructure to handle data-processing peaks can be expensive. DC/OS allows you to scale up critical microservices during peaks and to use spare resources efficiently during troughs.

How does DC/OS provide these advantages?

On-demand provisioning

Provisioning the different data services that make up a fast data pipeline (message queues to track data streams, processing programs, and databases to store results) can be a huge challenge, because each service has its own specific requirements. DC/OS abstracts away the details and allows you to provision and run all of these services on a common set of resources, alongside your containerized or legacy app. You can install most data services in the DC/OS Universe with a single click in the GUI or a single command from the CLI. When provisioning data services is easy, you can experiment with multiple options and optimize your pipeline for speed and performance rather than for provisioning time.
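To make the "single command" idea concrete, here is a minimal sketch that builds one `dcos package install` command per pipeline service. The package names (`kafka`, `spark`, `cassandra`) and the helper names are illustrative assumptions, not a recommended stack; check the Universe catalog on your own cluster for what is actually available. The sketch defaults to a dry run, so nothing is installed unless you ask for it.

```python
import shutil
import subprocess

# Illustrative package names (an assumption); your Universe catalog may differ.
PIPELINE_PACKAGES = ["kafka", "spark", "cassandra"]


def install_command(package):
    """Build the argv that installs one Universe package without prompting."""
    return ["dcos", "package", "install", "--yes", package]


def provision(packages, dry_run=True):
    """Return the install command for each package.

    The commands are executed only when dry_run is False, and only if the
    dcos CLI is found on PATH.
    """
    commands = [install_command(p) for p in packages]
    if not dry_run:
        if shutil.which("dcos") is None:
            raise RuntimeError("dcos CLI not found on PATH")
        for argv in commands:
            subprocess.run(argv, check=True)
    return commands


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for argv in provision(PIPELINE_PACKAGES):
        print(" ".join(argv))
```

Swapping one service for another while you experiment is then a one-word change to the package list.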

Simplified operations

If you are processing big data when a server in your data center fails, you have time to reschedule your process. But when you depend on fast data, every second counts. DC/OS provides a resilient architecture that automatically reschedules tasks that were running on failed nodes. Even under normal conditions, logging, debugging, and metrics gathering are essential for smooth operations. With the release of 1.9, DC/OS will provide easily queryable APIs that can send operational data to your choice of visualization tools.

Elastic infrastructure

Big data is processed in batches, which gives you control over the amount of processing power that you need at any given time. But fast data can be unpredictable, and the amount of processing power you need can vary from moment to moment. DC/OS allows you to scale apps up, down, and out safely. It maximizes server efficiency during peak times, which saves you money and guards against downtime.
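As a rough illustration of scaling with demand, the sketch below picks an instance count from the current event rate and builds the matching `dcos marathon app update` command. The sizing rule, the per-instance capacity, the instance limits, and the app ID `/stream-processor` are all hypothetical; this is not how DC/OS itself decides when to scale, only one way you might drive it from your own metrics.

```python
import math


def instances_needed(events_per_sec, capacity_per_instance,
                     min_instances=1, max_instances=20):
    """Toy sizing rule: enough instances to absorb the current event rate,
    clamped to the range [min_instances, max_instances]."""
    if capacity_per_instance <= 0:
        raise ValueError("capacity_per_instance must be positive")
    wanted = math.ceil(events_per_sec / capacity_per_instance)
    return max(min_instances, min(max_instances, wanted))


def scale_command(app_id, instances):
    """Build the argv that sets a Marathon app's instance count via the CLI."""
    return ["dcos", "marathon", "app", "update", app_id,
            f"instances={instances}"]


if __name__ == "__main__":
    # Hypothetical numbers: 4,500 events/s, 1,000 events/s per instance.
    count = instances_needed(4500, 1000)
    print(" ".join(scale_command("/stream-processor", count)))
```

The upper bound keeps a sudden spike from claiming the whole cluster, and the lower bound keeps the app running when traffic drops to zero.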

Fast data resources