- The Apache Mesos kernel runs on every machine in a cluster and provides applications with APIs for resource management and scheduling
- It works with a range of applications such as Hadoop, Spark, Kafka, and Elasticsearch
- Chronos was built to be a fault-tolerant and distributed job scheduler on top of Mesos
- Automated job scheduling is a critical process in datacenter and cloud environments
The increase in data volume has led to an expansion in the IT infrastructure used by enterprises. Where a single mainframe once sufficed, businesses today need an array of servers, databases, and operating systems, sometimes a separate set for each department, running software written in languages such as Java or Python. As a result, companies can no longer get by with one job scheduler running on one machine, and they end up taking a fragmented approach with different schedulers and custom scripts for different systems.
While this may seem like a natural progression, the more complex the workflow, the more difficult scheduling becomes.
Often, IT teams try to build their own scheduling logic to handle this complexity, but this is challenging and is usually better handled with purpose-built automation. Distributed schedulers enable reliable scheduling and automation of workloads across silos and multiple specialized servers.
The Benefits of a Distributed System
A distributed scheduling system is more fault-tolerant than a traditional job scheduler. A traditional scheduler is usually installed on the execution machine or communicates with one execution machine.
As a result, if that machine goes down, critical processes or jobs are disrupted. In a distributed system, by contrast, when a machine goes down the scheduler simply routes the affected jobs to other available machines.
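The rerouting behaviour described above can be illustrated with a minimal sketch. It is not how any particular scheduler implements failover. The `reroute` function, the machine names, and the round-robin policy are all assumptions chosen for illustration.

```python
from itertools import cycle


def reroute(assignments, failed):
    """Return a new machine -> jobs mapping with the failed machine removed.

    Jobs that were assigned to the failed machine are handed to the
    remaining machines in round-robin order (an illustrative policy only).
    """
    healthy = [machine for machine in assignments if machine != failed]
    if not healthy:
        raise RuntimeError("no healthy machines left to take over the jobs")

    # Copy the job lists so the caller's mapping is left untouched.
    result = {machine: list(assignments[machine]) for machine in healthy}
    targets = cycle(healthy)
    for job in assignments.get(failed, []):
        result[next(targets)].append(job)
    return result


# Hypothetical cluster state: machine "b" goes down.
cluster = {"a": ["job1"], "b": ["job2", "job3"], "c": []}
after_failure = reroute(cluster, "b")
```

A single-machine scheduler has no equivalent of `healthy` to fall back on, which is why the same failure interrupts its jobs.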
There can be three types of distributed environments:
- Centralized, where a central node distributes and orchestrates jobs to worker or execution nodes
- Decentralized, where multiple central nodes work with their own subset of the system
- Tiered, where there is a node for the scheduling software, one where the workload is executed and one for database access
Distributed scheduling systems are often decentralized, and open-source tools such as Apache Mesos are used to manage them. On a single Linux or UNIX machine, the cron daemon (crond) has traditionally handled the time-based execution of commands or scripts.
Chronos replaces cron in the Mesos distributed systems kernel.
Chronos is a popular, open-source Mesos framework, developed by Airbnb, to manage its complex data analysis pipelines.
Its key design goals were to:
- Be highly available
- Automatically retry when there is a job failure
- Improve flexibility
- Schedule commands or scripts
- Use built-in containerizers in Mesos
Chronos is a distributed and fault-tolerant scheduler that runs on top of Apache Mesos and helps with the following:
- Enables orchestration of jobs
- Enables the use of Mesos as a job executor
- Facilitates interactions with Hadoop
- Defines triggers when a job execution is complete
- Supports dependency chains of any length
Chronos makes it easy to create standalone schedule-based jobs reliably. You can also specify the schedule and the resources, offered by the Mesos slaves, that are needed to complete complex dependency-based jobs and pipelines. Out of the box, it can run commands in Linux control groups (cgroups) and Docker containers, and it helps ensure that time-based jobs run on schedule and use datacenter resources efficiently.
Chronos supports both custom Mesos executors and the default command executor.
Jobs can be scheduled using ISO 8601 repeating interval notation for greater flexibility.
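To make the notation concrete, an ISO 8601 repeating interval has three slash-separated parts: a repetition count (`R` for unbounded, `R5` for five), a start time, and a period such as `PT2H`. The sketch below parses a simplified subset of that notation. It is an illustration, not the parser Chronos uses. The function names are hypothetical, it accepts only whole-second UTC timestamps and day/hour/minute/second periods, and it assumes `Rn` means n runs in total.

```python
import re
from datetime import datetime, timedelta, timezone

# Simplified period pattern: days, hours, minutes, seconds only
# (no years, months, or weeks).
_PERIOD = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)


def parse_schedule(text):
    """Split 'R[n]/<start>/<period>' into (repetitions, start, period)."""
    reps, start, period = text.split("/")
    if not reps.startswith("R"):
        raise ValueError("schedule must begin with 'R'")
    repetitions = int(reps[1:]) if len(reps) > 1 else None  # None = forever

    start_time = datetime.strptime(start, "%Y-%m-%dT%H:%M:%SZ")
    start_time = start_time.replace(tzinfo=timezone.utc)

    match = _PERIOD.match(period)
    parts = {}
    if match:
        parts = {k: int(v) for k, v in match.groupdict().items() if v}
    if not parts:
        raise ValueError(f"unsupported or empty period: {period!r}")
    return repetitions, start_time, timedelta(**parts)


def first_runs(text, count):
    """List up to `count` run times, assuming Rn means n runs in total."""
    repetitions, start, period = parse_schedule(text)
    limit = count if repetitions is None else min(count, repetitions)
    return [start + i * period for i in range(limit)]


# Run every two hours, three times, starting 8 March 2014 at 20:00 UTC.
runs = first_runs("R3/2014-03-08T20:00:00Z/PT2H", 5)
```

Here `runs` holds three timestamps two hours apart, because the repetition count caps the result even though five were requested.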
Chronos can interact with systems such as Hadoop whether or not they are installed on the Mesos slaves where execution happens. Included wrapper scripts allow files to be transferred to and executed on a remote machine in the background, and asynchronous callbacks notify Chronos when a job completes or fails.
How Chronos Works
The Chronos workflow looks something like this:
- All job states are read from the state store, such as ZooKeeper
- The jobs are registered in the scheduler and loaded into the job graph, where they are tracked for dependencies
- Jobs are separated into a list of those which should run at the current time (based on the clock of the host machine) and those which should not
- The jobs in the list are queued and launched when a sufficient resource offer becomes available
- The scheduler sleeps until the next job is due, at which point the process begins again from the first step
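The middle steps of this workflow can be sketched in a few lines of Python. This is a simplified model rather than Chronos source code. The `Job` and `Offer` classes, the function names, and the first-fit matching of jobs to offers are assumptions made for illustration, and the state store and job graph are left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Job:
    name: str
    next_run: datetime
    cpus: float
    mem_mb: float


@dataclass
class Offer:
    cpus: float
    mem_mb: float


def split_due(jobs, now):
    """Separate jobs that should run now from those that should not."""
    due = [job for job in jobs if job.next_run <= now]
    later = [job for job in jobs if job.next_run > now]
    return due, later


def launch_due(queue, offers):
    """Launch each queued job on the first offer large enough to hold it."""
    remaining = [Offer(o.cpus, o.mem_mb) for o in offers]  # work on copies
    launched, waiting = [], []
    for job in queue:
        for offer in remaining:
            if offer.cpus >= job.cpus and offer.mem_mb >= job.mem_mb:
                offer.cpus -= job.cpus
                offer.mem_mb -= job.mem_mb
                launched.append(job.name)
                break
        else:
            waiting.append(job)  # no sufficient offer yet, keep it queued
    return launched, waiting


def seconds_until_next(later, now):
    """How long the scheduler can sleep before the next job is due."""
    if not later:
        return None
    return (min(job.next_run for job in later) - now).total_seconds()


now = datetime(2024, 1, 1, 12, 0)
jobs = [
    Job("cleanup", now - timedelta(minutes=1), cpus=1.0, mem_mb=512),
    Job("big-etl", now, cpus=4.0, mem_mb=512),
    Job("report", now + timedelta(minutes=30), cpus=1.0, mem_mb=256),
]
due, later = split_due(jobs, now)
launched, waiting = launch_due(due, [Offer(cpus=2.0, mem_mb=1024)])
```

In this example `cleanup` is launched, `big-etl` stays queued until a larger offer arrives, and the scheduler could sleep for 30 minutes before `report` is due.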
The Benefits and Limitations of Chronos
One of the key advantages of Chronos is that it can write job metrics to Cassandra for further analysis and validation. Notifications can be sent to endpoints such as email and Slack, and metrics can be exported to Graphite.
But keep in mind that Chronos cannot magically solve all distributed computing problems. It cannot guarantee precise scheduling or clock synchronization, nor can it guarantee that jobs run as planned.
Addressing some of these challenges requires Chronos 2.3.0 and Mesos 0.21.0, which make it easy to schedule Docker containers to run ETL, batch, and analytics applications on top of Apache Mesos. Jobs can be created through a graphical user interface or an expressive REST API.
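As a rough illustration of creating a Docker job through the REST API, the sketch below builds (but does not send) an HTTP request carrying a job definition. The endpoint path `/scheduler/iso8601` and the JSON field names are assumptions based on the Chronos documentation and may differ between versions, so check them against the release in use. The job name, command, image, owner, and host are hypothetical.

```python
import json
from urllib.request import Request


def build_docker_job_request(chronos_url):
    """Build a POST request describing a scheduled Docker job."""
    job = {
        "name": "nightly-etl",                     # hypothetical job name
        "schedule": "R/2015-01-01T02:00:00Z/P1D",  # ISO 8601 repeating interval
        "command": "python /app/etl.py",           # hypothetical command
        "container": {                             # assumed field names
            "type": "DOCKER",
            "image": "example/etl:latest",         # hypothetical image
        },
        "cpus": 0.5,
        "mem": 512,
        "owner": "data-team@example.com",          # hypothetical owner
    }
    return Request(
        url=chronos_url.rstrip("/") + "/scheduler/iso8601",  # assumed path
        data=json.dumps(job).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_docker_job_request("http://chronos.example.com:4400/")
```

Passing `request` to `urllib.request.urlopen` would submit the job to a reachable Chronos instance. The sketch stops short of that so it can be read and run without a cluster.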
Containerizing job launches using Chronos simplifies the distribution of job processing without the need for manual setup on the cluster nodes. Chronos also maintains a dependency graph between scheduled jobs, so a job that depends on the output of earlier jobs is triggered only after those jobs have executed successfully.
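The dependency behaviour can be modelled with a topological ordering of the job graph. The sketch below uses Python's standard `graphlib` module (Python 3.9 or later) and is a simplified model, not Chronos internals. The `run_pipeline` function, the status labels, and the job names are hypothetical.

```python
from graphlib import TopologicalSorter


def run_pipeline(parents, run):
    """Run jobs in dependency order and report a status for each one.

    `parents` maps a job name to the jobs it depends on. `run` is a
    callable that executes one job and returns True on success. A job is
    triggered only if every one of its parents succeeded.
    """
    status = {}
    for name in TopologicalSorter(parents).static_order():
        if all(status.get(p) == "succeeded" for p in parents.get(name, ())):
            status[name] = "succeeded" if run(name) else "failed"
        else:
            status[name] = "skipped"  # a parent failed or was skipped
    return status


# Hypothetical pipeline in which the transform step fails.
pipeline = {
    "transform": ["extract"],
    "load": ["transform"],
    "report": ["load"],
}
result = run_pipeline(pipeline, run=lambda name: name != "transform")
```

Here `extract` succeeds, `transform` fails, and both `load` and `report` are skipped because their upstream output never became available.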
About Merit Group
Merit Data and Technology Ltd. has over 15 years of experience in business intelligence and software development, developing applications for some of the world’s leading brands.
Our offshore data engineers, software developers and testing engineers work closely with product and program managers at client companies to build applications that meet all functional and performance requirements.
Merit Group is also a trusted technology partner for some of the world’s leading B2B intelligence companies like Wilmington, Dow Jones, Glenigan, and Haymarket. Our data and engineering teams work closely with our clients to build data products and business intelligence tools. Our work directly impacts business growth by helping our clients to identify high-growth opportunities.
Merit has deep expertise and experience in various technologies including Microsoft .NET, Perl, Python, Oracle, PHP, Hadoop, Salesforce.com, Tableau, ASP.NET, Microsoft SQL Server, and many more.
As businesses race towards transformation by adopting digital solutions, we help increase their efficiency and implement solutions such as a Chronos distributed scheduling system to automate the job distribution process.