Owein Reese on

Owein Reese, Senior Engineer at MediaMath, gives an overview of Autolifts, an open source dependently typed library for auto lifting and auto mapping of functions, built in Scala.

Autolifts takes advantage of Scala’s advanced type system to yield a set of abstractions for working with complex objects. We’ll introduce the concept of lifting and why you might want to incorporate this pattern in your code. Then we’ll show how the library takes that concept, mixes it with dependent types and implicit extensions to automatically lift in a type safe manner. Finally, we’ll show how using these extensions simplifies code, reducing boilerplate while making code more easily understood and maintained.
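To make the idea concrete, here is a minimal, hand-rolled sketch of lifting in plain Scala. The Functor machinery below is illustrative, not Autolifts' actual API; the library's point is that it derives this plumbing automatically, even for nested structures.

```scala
trait Functor[F[_]] {
  def map[A, B](fa: F[A])(f: A => B): F[B]
}

object Functor {
  implicit val listFunctor: Functor[List] = new Functor[List] {
    def map[A, B](fa: List[A])(f: A => B): List[B] = fa.map(f)
  }
}

// Lifting turns f: A => B into F[A] => F[B] for any F with a Functor.
def lift[F[_], A, B](f: A => B)(implicit F: Functor[F]): F[A] => F[B] =
  fa => F.map(fa)(f)

val inc: List[Int] => List[Int] = lift((x: Int) => x + 1)
// inc(List(1, 2, 3)) == List(2, 3, 4)
```

For a nested structure such as List[Option[Int]] you would have to lift twice by hand; eliminating that boilerplate in a type-safe way is exactly what auto-lifting extensions are for.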

Owein Reese has been a full-time Scala developer for over five years and has spoken at several Scala meetups and conferences. He maintains several open source projects and leads several engineering teams at MediaMath.


Continue
Unknown author on

All of us are familiar with overflow bugs. Sometimes, however, you write code that counts on overflow. This is a story where overflow was supposed to happen but didn't, hence the name: underflow bug.

Round-robin


In our Java implementation of the round-robin algorithm, we store the number of connections in the variable size and then call index() % size to get the index of the chosen connection. The value of index() follows the sequence 0, 1, 2, 3, … For example, with size = 3, we get index() % size equal to 0, 1, 2, 0, 1, 2, 0, …, which is exactly how round-robin works.
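A sketch of that logic (translated to Scala here rather than the original Java, with illustrative names) makes the hidden assumption visible:

```scala
import java.util.concurrent.atomic.AtomicInteger

final class RoundRobin[A](connections: IndexedSeq[A]) {
  private val size = connections.size
  private val counter = new AtomicInteger(0)

  private def index(): Int = counter.getAndIncrement() // 0, 1, 2, 3, ...

  // For size = 3 this picks connections 0, 1, 2, 0, 1, 2, ...
  // Caveat: once the counter wraps past Int.MaxValue, index() goes
  // negative, and a negative left operand makes % return a negative
  // result on the JVM. Math.floorMod sidesteps that edge case.
  def next(): A = connections(Math.floorMod(index(), size))
}
```

The story that follows hinges on exactly this kind of wrap-around arithmetic.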

Continue
Kailuo Wang on

Recently at iHeartRadio we decided to migrate our monolithic Java backend service to multiple microservices; more specifically, we decided to implement these microservices as Akka apps and use Akka cluster to weave them into our microservice cluster. This post is about how we reached this decision. There are three main sections: first, a brief discussion of the goals we wanted to achieve with microservices; second, the specific reasons why we think Akka apps without an HTTP interface make the most sense for our goals; and third, a brief look at the initial architecture design.

Goals we want to achieve

Goal 1. Easier and faster release cycle.

The monolithic code base of the Java backend is one of the major factors preventing us from releasing services at a more granular pace. Code changes to multiple services have to be tested and QA'd as a whole, which means small changes have to wait for all other changes (relevant or not) before they can be released. We want to address this problem with microservices that can be released on a microservice-by-microservice basis - each microservice can have its own release schedule. Thus we can deliver new features and fixes to clients at a faster pace (with smaller steps).

Goal 2. Improve development productivity with looser and clearer inter-module dependencies

In our monolithic Java backend code, different functional areas depend on each other tightly. The dependencies are also hard to track without carefully inspecting the code, which makes them harder to manage. These over-tight dependencies make the whole code base cumbersome to change - to change code in one place you may have to change code in multiple places accordingly, and the implications of such changes are hard to understand. By dividing the code into clearly separated microservices, the dependencies become much looser (carried by messages) and easier to inspect in simple configuration files.

Goal 3. Better reusability/composability

Classes in a monolithic backend tend to grow larger and larger, accumulating logic from different functional areas, which makes reuse more and more difficult. We want to take this opportunity to redesign the modules so that each microservice has a smaller interface and clearly defined responsibilities. This will make it easier to reuse them as modules and to compose higher-level microservices out of lower-level ones.

Goal 4. Easier team integration

The monolithic backend codebase is huge and complex to understand. It creates a high barrier for developers outside the dedicated backend team to contribute. Code size within each microservice, on the other hand, is much more modest and easier to learn. This opens the door to different development organizations, such as having client developers contribute directly to the code base, or a more vertically oriented team structure.


Why we picked Akka cluster as the core architecture for our microservices

Now let me go through a few reasons why we made this pick:


  • Out-of-the-box clustering infrastructure

  • Loose coupling without the cost of JSON parsing

  • Transparent programming model within and across microservices

  • High resiliency, performance and scalability

  • Strong community and commercial support


Out-of-the-box clustering infrastructure

One of the costs of microservices is the clustering infrastructure you need to build - that includes, but is not limited to, discovery, load balancing, monitoring, failover and partitioning of the microservices. There are third-party tools that can help with these clustering functions, but they require a strenuous integration effort and introduce significant complexity to the stack. Akka cluster provides these clustering infrastructure components out of the box. We had the cluster up and running with only a couple of lines of configuration changes. In fact, these clustering capabilities are mature enough that Typesafe implemented ConductR, their general-purpose distributed system management tool, using Akka cluster.
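For a sense of how little configuration that means, turning a plain actor system into a cluster member looks roughly like this (addresses are placeholders; in practice the settings live in application.conf rather than being inlined):

```scala
import akka.actor.ActorSystem
import com.typesafe.config.ConfigFactory

// Akka 2.3-era settings, inlined here to keep the sketch self-contained.
val config = ConfigFactory.parseString("""
  akka.actor.provider = "akka.cluster.ClusterActorRefProvider"
  akka.remote.netty.tcp.hostname = "127.0.0.1"
  akka.remote.netty.tcp.port = 2551
  akka.cluster.seed-nodes = ["akka.tcp://Backend@127.0.0.1:2551"]
""")

// The system contacts the seed nodes and joins the cluster on startup.
val system = ActorSystem("Backend", config)
```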

Relatively loose coupling without the cost of JSON parsing

One of the common protocols used between microservices is HTTP with a JSON body. This setup carries a performance cost of text (JSON) parsing and a development cost of writing JSON parsers. Akka's message passing over a binary protocol is better optimized for performance, and there is no extra parsing code to write. We value both benefits over the looser coupling provided by HTTP with JSON. We also separate our message API from the service implementation to provide enough decoupling between clients and services.

Transparent programming model within and across microservices

Akka's programming model, the actor model, is transparent within and across microservices. All calls are made through asynchronous message passing, regardless of whether the caller and responder are on the same microservice or on different ones. This single programming model has two major benefits: 1) it makes the development experience consistent - when writing clients, you don't need to remember where the service actor is, local or remote; 2) it makes it easier to move logic around - you can merge or split microservices with little-to-no code change.
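As a sketch of what that consistency buys (the message and service names below are made up for illustration), the client code is identical whether the service actor runs in the same JVM or on another node of the cluster:

```scala
import akka.actor.{Actor, ActorRef}
import akka.pattern.{ask, pipe}
import akka.util.Timeout
import scala.concurrent.duration._

final case class GetTrack(id: Long)              // hypothetical request
final case class Track(id: Long, title: String)  // hypothetical reply

class TrackClient(trackService: ActorRef) extends Actor {
  implicit val timeout: Timeout = Timeout(2.seconds)
  import context.dispatcher

  def receive = {
    case id: Long =>
      // Asynchronous message passing either way: the ActorRef hides
      // whether trackService is local or remote.
      (trackService ? GetTrack(id)).mapTo[Track].pipeTo(sender())
  }
}
```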

High resiliency, performance and scalability

Apache Spark uses Akka under the hood for driver-worker communication, and the Spark team is building a masterless standalone mode using Akka cluster. Not much more was needed to convince us that Akka would comfortably satisfy our performance requirements.

Strong community and commercial support

The Akka community is unquestionably substantial and vigorous, but what sets Akka apart from the other candidates on our list is the option of commercial support from Typesafe, provided by the core developers on the Akka team. It is invaluable to have the confidence that we can always get the “most correct” answer to a problem and will never be blocked by a technical issue for more than 24 hours.

With these features of Akka cluster tallying so well with our goals, it was an easy decision to make.


Microservices with Akka Cluster

As a final note, here is a brief review of our initial architecture design for the microservice platform.

All microservices are implemented as Akka apps running in a single Akka cluster. Above this Akka cluster layer sits a REST layer composed of one or more Play! web applications serving as the HTTP (mostly JSON) public interfaces for the microservices. They communicate with the microservices through router actors deployed inside the Akka cluster - we call these actors agents. The web apps also handle some cross-cutting functionality such as security and caching. Below the Akka cluster is the data storage layer, which represents the root sources of data and information for our backend.

Instances of microservices can join and leave the Akka cluster according to demand. When a redundant instance of a microservice joins the cluster, all members get notified and automatically start to include that instance when load balancing requests to the service. The same is true when an instance leaves: members are notified that it is leaving the cluster. This way we can scale up and down at a service-by-service level.
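Those notifications arrive as ordinary messages, so a member that wants to react to instances joining and leaving just subscribes to cluster events; a minimal sketch:

```scala
import akka.actor.Actor
import akka.cluster.Cluster
import akka.cluster.ClusterEvent.{MemberRemoved, MemberUp}

class MembershipWatcher extends Actor {
  private val cluster = Cluster(context.system)

  override def preStart(): Unit =
    cluster.subscribe(self, classOf[MemberUp], classOf[MemberRemoved])
  override def postStop(): Unit = cluster.unsubscribe(self)

  def receive = {
    case MemberUp(member) =>
      () // start including the new member when load balancing
    case MemberRemoved(member, _) =>
      () // stop routing requests to the departed member
  }
}
```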

There you go. As of now we have several microservices live in production and the cluster has served us very well. I will post further updates, hopefully good ones, later.

Continue
aothman on

Scores are a way for domain experts to communicate the quality of a complex, multi-faceted object to a broader audience. Scores are ubiquitous; everything from NFL quarterbacks to the security threat risk of software has a score. Scoring also has commercial potential: beyond obvious applications (e.g., credit scoring), in the past twelve months both Klout (social media reputation scoring) and Walkscore (neighborhood walkability assessment) have been acquired.

HiScore is a Python package that provides a new way for domain experts to quickly create and improve scoring functions: by using reference sets, collections of representative objects that are assigned scores. HiScore is currently used by a major environmental non-profit as well as IES, a startup that assesses the safety and sustainability of fracking wells.

HiScore relies on being able to interpolate through the reference set in an understandable and justifiable way. In technical terms, HiScore needs a good solution to the multivariate monotone scattered data interpolation problem. Monotone scattered data interpolation turns out to be trivial in one dimension and devilishly hard in higher dimensions. We discuss several failed approaches and false starts before finally arriving at the quasi-Kriging algorithmic foundation of HiScore. We conclude with applications, including the intuitive creation of complex scores with dozens of attributes.
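To see why the one-dimensional case is easy (this is just the textbook observation, not HiScore's algorithm): piecewise-linear interpolation through a sorted reference set is automatically monotone whenever the reference scores themselves are.

```scala
// Interpolate a score at x from reference (value, score) pairs,
// clamping outside the reference range. If scores rise with values,
// the interpolant is monotone in x. (Scala sketch; HiScore is Python.)
def score1d(refs: Vector[(Double, Double)])(x: Double): Double = {
  val sorted = refs.sortBy(_._1)
  if (x <= sorted.head._1) sorted.head._2
  else if (x >= sorted.last._1) sorted.last._2
  else
    sorted.sliding(2).collectFirst {
      case Vector((x0, y0), (x1, y1)) if x0 <= x && x <= x1 =>
        y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    }.get
}
```

No comparably simple construction survives with scattered reference points in several dimensions, which is what pushes the library toward its quasi-Kriging foundation.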

The theoretical basis of HiScore is joint work with Ken Judd (Stanford).

31:36

GitHub repo here.

This video was recorded at the SF Bay Area Machine Learning meetup at Thumbtack in SF.

Continue
Unknown author on

Raghavendra Prabhu, engineering manager for the infrastructure team at Pinterest, walks through their new storage product, Zen. Built at Pinterest, Zen was originally conceived in summer 2013 and has since grown to be one of their most widely used storage solutions, powering the home feed, interest graph, messages, news and other key features.

In this talk, RVP goes over the design motivation for Zen and describes its internals, including the API, type system and HBase backend. He also discusses lessons learned from running the system in production over the last year and a half, along with the features added and performance improvements made to accommodate the fast adoption it has seen since launch.

46:50

Slides here.

This talk was given at SF Data Engineering meetup at Galvanize in San Francisco.

Continue
Unknown author on

Stefan Kutko, VP of Engineering at Electronifie, presents what his team has learned while building what is very likely the world's first electronic bond trading platform written in Node.js. He covers how Electronifie uses messaging and microservices to build their distributed system, allowing problem domains to be separated by service and each service to be custom-tailored to the problem it solves. Along the way, Stefan shows how a CQRS (Command Query Responsibility Segregation) architecture allows their system to scale and how patterns like Event Sourcing offer interesting benefits for financial applications. Mixed in are glimpses of how Electronifie is breathing fresh air into FinTech by using and contributing to Open Source, plus sprinklings of Meteor, binary addons, and desktop-enabled Node.js web apps!
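Event sourcing itself is language-agnostic; a minimal sketch (written in Scala here for brevity rather than Electronifie's Node.js, with invented order events) shows the core appeal for finance: current state is just a fold over an immutable event log, and that log doubles as an audit trail.

```scala
sealed trait OrderEvent
final case class OrderPlaced(id: String, qty: Int, price: BigDecimal) extends OrderEvent
final case class OrderFilled(id: String, qty: Int) extends OrderEvent

final case class OrderState(placed: Int = 0, filled: Int = 0)

// State transitions live in one pure function.
def applyEvent(state: OrderState, event: OrderEvent): OrderState = event match {
  case OrderPlaced(_, qty, _) => state.copy(placed = state.placed + qty)
  case OrderFilled(_, qty)    => state.copy(filled = state.filled + qty)
}

// Current state is a left fold of the log; replaying a prefix of the
// log reconstructs the state at any past point in time.
def replay(events: Seq[OrderEvent]): OrderState =
  events.foldLeft(OrderState())(applyEvent)
```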

01:16:11

Slides are available here.

This talk was presented at the NodeJS meetup at Spotify in New York.

Continue
Piyush Narang on

A large reason why microservices appeal to programmers is that they resonate with the UNIX philosophy - make each program do one thing well and ensure a separation of concerns. While microservices are quite the rage at most companies, there are a few dissenting opinions from fans of the opposite end of the spectrum - monolithic services. This often results in both sides indulging in fairly engaging debates, occasionally 140 characters at a time.

There are patterns to getting monolithic services to work well in production systems. Etsy is one of the places that seems to have it working reasonably well (continuous integration, multiple deployments a day, per-developer VMs, monitoring, dashboards & alarms). Personally, though, I'm wary about a few aspects of the monolithic setup. I'm not sure how well it can work once development teams grow past a few hundred engineers. Netflix has spoken publicly about many of their struggles with monolithic architectures as they grew in the early years. I'd heard similar stories about the early days while I was at Amazon (though that might not be entirely applicable, as it was quite a few years back). Scaling various components might also prove to be a challenge in such architectures.

There is an aspect of microservice design that I've had my own share of lively debates over. It centers on deciding what bits of functionality need to be part of a service. Very strict separation of concerns can lead to very minimal services (or the opposite - bloated services). I've listed some aspects that I've learnt to pay attention to while thinking about how to structure my services.

1) Scaling characteristics:

When application components have very different scaling characteristics, it tends to make sense to break them out. That allows you to scale the different components independently. If necessary, you could also use different machine types for each component to help it perform better. For example, services that are compute intensive have different characteristics from those that are network or IO intensive.

2) Services with different availability tiers:

While I was at Amazon, it was common to characterize services as Tier-1 (essential to the functioning of the site) and Tier-2 (could take a bit of downtime without affecting users). Given the scale at which they operate, there were tons of war stories related to this. Typically a service handled both Tier-1 and Tier-2 traffic together, and an issue on the Tier-2 side caused the Tier-1 service to go down, leading to unhappy users. Breaking services up based on their availability SLAs was strongly encouraged.

To make things more interesting, this also extends to understanding your service's dependencies well. It makes no sense to depend on a service with lower SLAs than those you need to provide to your clients. If your service has strict availability requirements, you should only take on dependencies that can provide those guarantees.

3) Latency concerns:

Pretty often, when companies start transitioning from large monolithic architectures to microservices, they tend to go all out in making their services as fine-grained as possible (no point in half measures :-)). Sometimes this ends up being the natural progression over a couple of years as existing components are migrated to services and every new feature is wrapped as a shiny little service. Being overzealous about microservices can result in each service having to make a flood of calls to its dependencies to serve a request. Each call over the network can add a few milliseconds of latency to your application (see: Numbers everyone should know). If you're making a decent number of calls, this adds up. I've seen services making dozens of calls to process every request, and the latency ended up being terrible. There are options for tackling high service latencies in these scenarios: co-locating services on the same machines, batching calls, adding a cache layer, or pre-computing results, depending on your setup. In some cases it helps to merge services (or not break them up in the first place) if latency is a concern and other aspects don't strongly dictate splitting the services.

4) Service maintenance:

Each service you maintain involves a certain amount of overhead. You need to worry about deploying it regularly with minimal downtime for users. Monitoring and alerts need to be set up as well. You have an additional service added to your on-call responsibilities (in other words, another service to wake you up at 3am). Depending on the number of services you are spinning up and the tools at your disposal, this can amount to a good deal of work. I've been on teams where someone needed to spend a day or two every week just to ensure that all our services were deployed worldwide in a sane fashion - even though we had a fairly mature deployment, monitoring and alerting ecosystem at the company.

In service design, as with most interesting things in life, there is no magic bullet. Requirements that matter while solving a problem in one context might not be important in another. I find it useful to consider the aspects listed above while designing new services. It is equally important, though, to keep in mind what you're trying to optimize for (speed of development, performance, scalability, robustness), as that helps you make the right choices during the process.

Continue
Justyna Ilczuk on


A few months ago, we started using Docker with Syncano and pretty much fell in love. The first blog posts we wrote (Reasons Why We Use Docker, Getting Started with Docker and Make Your Docker Workflow Awesome With Fig.sh) gave you some reasons and tools to get started with Docker - now we're going to share how we use Docker ourselves (and why we can't live without it).


In development


Docker greatly simplified our development process. Syncano is mostly written in Python, and setting up a working development environment used to take a few hours. Now, onboarding a new developer takes just a few minutes on Linux and half an hour on OS X. All they need to do is:

  • Install Docker

  • Install fig

  • Clone our syncano-platform repo

  • Type fig up into the shell


Our stack is quite big and consists of five main components. Each component runs in a separate container, and the containers are connected over the network using Docker links, which wire up hostnames and environment variables.

Our development setup consisting of Docker containers is very similar to our production setup. This gives us confidence that the application will run the same way in both development and production environments. You can read more about why your development setup should be similar to your production setup here.

Testing and continuous integration


To make sure that bugs are caught early and code is maintainable, we test everything we write, and every branch in our git repository is tested on our Continuous Integration server, CircleCI. You can read what Continuous Integration is and why people use it here.

Our Continuous Integration workflow works as follows:


  1. Build the container

  2. Run tests in the container

  3. Collect metrics such as coverage

  4. If the tests pass, tag the image and push it to the Docker registry

  5. If the tested git branch is on the list of branches to deploy in the CircleCI configuration, deploy the new version of Syncano using the Docker image pushed in step 4


The first two steps are executed with the help of fig.

You can read more about using Fig here, or about setting up Docker with CircleCI on their documentation page. It's very easy!

Deployment


We deploy exactly what we tested on our CI using the same image. For deployment, we use AWS Elastic Beanstalk.

Elastic Beanstalk can run different kinds of containers, including Docker. It currently supports Docker 1.0 and 1.2.

The best features of Elastic Beanstalk are:


  • Easy autoscaling

  • Application version control

  • Good, predefined settings - you can utilize the power of AWS with very simple configurations


You can read a tutorial on deploying Docker containers with Elastic Beanstalk here.

However, we aren't 100 percent happy with our deployment setup. Elastic Beanstalk underutilizes Docker - it usually spawns only one Docker container per machine (which is a complete waste)! It also doesn't really support inter-container communication or failure detection - it only supports health checks. And it sometimes has stability issues; for example, I had problems with already-allocated ports and with Docker not running on the host machine.

CodeBox feature


We're currently working on our CodeBox feature, which offers execution of custom code on our platform. CodeBox, combined with our API, can completely eliminate the need to implement a custom backend. It's in alpha now, but some of our customers use it every day.

Each time a client "runs" a CodeBox, a Docker container is created, the code is executed inside the container, and the container is destroyed right after execution. We have some predefined code platforms - currently we support Node.js and Python. Docker containers are great for this use case because they offer isolation and control over the resources used by the executed code (memory, cores), and they're pretty lightweight.
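A rough sketch of that run-execute-destroy cycle as driven from another process (the image name and resource limits are invented; the docker flags are standard CLI):

```scala
import scala.sys.process._

// --rm destroys the container as soon as the code exits; the memory and
// CPU flags cap what untrusted code can consume. "codebox-python" is a
// hypothetical prebuilt image with the Python runtime baked in.
def runCodeBox(code: String): String = {
  val cmd = Seq(
    "docker", "run", "--rm",
    "--memory", "128m",
    "--cpu-shares", "256",
    "codebox-python:latest",
    "python", "-c", code
  )
  cmd.!! // blocks until the container exits and returns its stdout
}
```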

Summary


Using Docker in deployment gives us great confidence that we'll deploy the same app that we test. Thanks to ready-to-run Docker images, auto-scaling is fast and reliable.

Now that we've started using Docker, it's hard to imagine the Syncano platform without it!

Continue
Kinshuk Mishra on

Spotify Tech Lead Kinshuk Mishra and Engineer Noel Cody share their experience building personalized ad experiences for users through iterative engineering and product development. They explain their process of continuous problem discovery, hypothesis generation, product development and experimentation. Later they dive deep into the specific ad personalization problems Spotify is solving and explain their data infrastructure technology stack in detail. They also discuss how they've tested various product hypotheses and iteratively evolved their infrastructure to keep up with product requirements.

Continue
Ben Sigelman on

Building large-scale distributed systems is a challenge in any language: what about Go makes distributed system-building easier, what makes it harder, and what won't work at all? Former Google software engineer Ben Sigelman answers these questions in his talk about creating distributed systems in Go. He also addresses the fundamentals of healthy distributed systems and the joys and pitfalls of building them in Go.

40:09

View Ben's slides here.

This talk was given at the GoSF meetup at Pivotal.

Continue
Patrick Reilly on

Patrick Reilly (Technology Strategist at Mesosphere, Inc.) gives an in-depth talk on Kubernetes and discusses how it can serve as the foundation for high-level tools, automation systems, and API layers.

Kubernetes is an open source implementation of container cluster management across multiple hosts. It uses Docker to package, instantiate, and run containerized applications and provides basic mechanisms for deployment, maintenance, and scaling of applications.


This talk was given at the GoSF meetup at Pivotal.

Continue
Jay Kreps on

Apache Kafka committer Jay Kreps, from LinkedIn, walks through a brief production timeline for Kafka. Jay goes over what's new in 0.8.2 and how to get the most out of new features like log compaction and the new Java producer. Jay also gives an overview of what to expect from 0.9(?): a new consumer, better security, and operational improvements.

Continue
Jeff Ward on

Bitcoin futurist Jeff Ward talks about the evolving platform for digital currency and what the push toward decentralization means for web platforms as a whole. Jeff also gives a handy primer on what Bitcoin is, why people use it, and what the blockchain means for our future.

Digital currencies have been around for a while, and each is an attempt to use money natively on the internet, either for general use or for a specific domain. Without one, a payment processor is needed for each transaction, acting as a trusted third party in case of disputes. The more centralized the currency, the more vulnerable it is. With the fast development of Bitcoin services, we need to think about decentralizing currencies and building the infrastructure to support them.


26:33


This talk was recorded at the NYCHTML5 meetup at Conde Nast in New York.

Continue
Adam Warski on

Spray, once a stand-alone project and now part of Akka, is a toolkit for building and consuming REST services. SoftwareMill CTO and co-founder Adam Warski demos how to build a simple REST service with Spray, and then consume it with a Spray-based client. He shows how new routes can be added very quickly, how to use type-safe query and path parameters, and how to create custom directives that reuse existing code.
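For flavor, a route with a type-safe path segment and query parameter in Spray's DSL looks roughly like this (spray-routing 1.x style; the endpoint itself is made up):

```scala
import akka.actor.ActorSystem
import spray.routing.SimpleRoutingApp

object HelloService extends App with SimpleRoutingApp {
  implicit val system = ActorSystem("hello")

  startServer(interface = "localhost", port = 8080) {
    path("hello" / Segment) { name =>              // typed path parameter
      get {
        parameters('times.as[Int] ? 1) { times =>  // typed query parameter with a default
          complete(s"Hello, $name! " * times)
        }
      }
    }
  }
}
```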

01:02:48

This talk was given at the Scala Bay meetup hosted at SumoLogic in SF.

Continue
Wes Chow on

Problem: Chartbeat generates random unique user IDs in the browser when a new reader visits a customer's site. The original two-line random user ID function could generate over 4.8 trillion trillion (yes, that's 1 trillion squared) different unique IDs, but in practice we were seeing laughably high collision rates. To add to the challenge of fixing this issue, our solution had to run in all browsers, take up minimal code, and work with zero calls to a server.
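Some quick birthday-problem arithmetic shows how wrong those collision rates were. With N equally likely IDs, the expected number of draws before the first collision is on the order of the square root of N:

```latex
\mathbb{E}[\text{draws before first collision}] \approx \sqrt{\frac{\pi N}{2}},
\qquad
N = 4.8\times10^{24}
\;\Rightarrow\;
\sqrt{\frac{\pi}{2}\cdot 4.8\times10^{24}} \approx 2.7\times10^{12}.
```

Collisions at ordinary web-traffic volumes therefore imply the effective ID space was vastly smaller than the nominal one, pointing at the quality of the randomness source rather than the length of the IDs.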

Wes Chow (CTO, Chartbeat) describes the experience of solving this problem, as well as the mathematical basics of hash functions and pseudo-randomness.

Continue
Victor Vieux on

Docker Software Engineer Victor Vieux gives an overview of the new features in the Docker Engine and Docker Hub. New features for the Engine include the ability to pause and unpause a container, various networking strategies, .dockerignore, and much more. For the Hub, there are many new features, including organizations, groups, and official repositories. Victor also goes over what's coming in the future for the Engine.

Continue
Victor Levy on

Victor Levy (Senior Consultant, Princeton Consultants) talks about COSMA (CSX Onboard Systems Management Agent). COSMA is a Python application hosted on locomotives, designed to monitor, check out, and upgrade safety-related locomotive systems. In this illuminating talk, Victor discusses the challenges of Python programming for train systems and how COSMA works.

Continue
Eliot Brenner on

For most large-scale image retrieval systems, performance depends upon accurate metadata. While content-based image retrieval has progressed in recent years, image contributors must typically provide appropriate keywords or tags that describe the image. Tagging, however, is a difficult and time-consuming task, especially for non-native English-speaking contributors.

Continue
Todd Palino on

LinkedIn runs one of the largest installations of Kafka in the world. In this talk, Todd Palino and Clark Haskins (Site Reliability, LinkedIn) discuss Kafka from an operations point of view. You'll learn the use cases for Kafka and the tools LinkedIn has been developing to improve the management of deployed clusters. They also talk about some of the challenges of managing a multi-tenant data service and how to avoid getting woken up at 3 AM.

Continue
Michael Kjellman on

Making and implementing a C* Migration Plan
Migrating to a new database is hard: really hard. It's almost impossible to do it perfectly. I'd like to break the migration issue into two parts: 1) maintaining the integrity of your data during import and migration, and 2) how to operationally plan and code a migration from MySQL to C* that is downtime free (fingers crossed!).

Continue
Chris Becker on

Engineers love working at Shutterstock because they get to build cool things. We aim to solve problems that matter to customers, and we're constantly trying out new ideas through rapid prototyping. One of the great things about our culture at Shutterstock is that an idea can come from anywhere, from the newest engineer to the CEO — we'll try them out equally and see what resonates with users. This is how one of those ideas, our Spectrum color search, came to life.

Continue
Niklas Nielsen on

In this talk, Niklas Nielsen from Mesosphere talks about Apache Mesos, a cluster manager that provides efficient resource isolation and sharing across distributed applications or frameworks. Mesos can run Apache Hadoop, MPI, Hypertable, Apache Spark, Storm, Chronos, Marathon, and other applications on a dynamically shared pool of nodes. The biggest user of Mesos is Twitter, where it runs on thousands of servers; Airbnb runs all of their data infrastructure on it, processing petabytes of data. Niklas also goes over how to write frameworks for Apache Mesos in Go. This talk was recorded at the GoSF meetup at Heroku.

Continue
Wiktor Macura on

In this talk, Square engineering lead Wiktor Macura talks about Square's distributed payment infrastructure, detailing methods for building distributed, secure systems that are optimized for various permutations of performance and reliability. By the end of this talk you will have developed a better appreciation for the CAP theorem, breaking the rules, and generally making amazing customer-focused systems. This talk was recorded at the NYC Data Engineering meetup at eBay NYC.

Continue
Joe Stein on

In this talk, Joe Stein - Apache Kafka committer, member of the PMC, and Founder and Principal Architect at Big Data Open Source Security - talks about Apache Kafka, an open source distributed publish-subscribe messaging system. Joe focuses on how to get started with Apache Kafka, how replication works, and more. Storm is a great system for real-time analytics and stream processing, but to get data into Storm you need to collect your data streams with consistency and availability at high loads and large volumes. Apache Kafka is publish-subscribe messaging rethought as a distributed commit log. This talk was recorded at the NYC Storm User Group meetup at WebMD Health.
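As a taste of getting started with the 0.8-era producer API the talk targets, publishing a message from Scala looks roughly like this (broker address, topic and payload are placeholders):

```scala
import java.util.Properties
import kafka.producer.{KeyedMessage, Producer, ProducerConfig}

val props = new Properties()
props.put("metadata.broker.list", "localhost:9092") // placeholder broker
props.put("serializer.class", "kafka.serializer.StringEncoder")

val producer = new Producer[String, String](new ProducerConfig(props))
// Messages with the same key land in the same partition of the commit log.
producer.send(new KeyedMessage[String, String]("page-views", "user-42", "home"))
producer.close()
```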

Continue
Alexis Lê-Quôc on

Imagine you are tasked with building a platform to monitor the performance of 500,000 servers in real-time. How would you design it? What tools would you choose? (Cassandra? Storm? Spark? HBase? ...) What technical challenges would you expect? As a monitoring company, Datadog receives tens of billions of telemetry data points every day and is working to change the way operations teams understand and troubleshoot their infrastructure and applications. In this talk, Alexis Lê-Quôc from Datadog talks about how they built their (Python-based) low-latency, real-time analytics pipeline. This talk was recorded at the NYC Data Engineering meetup at The Huffington Post.

Continue
Mike Curtis on

Data Driven Growth at Airbnb by Mike Curtis -- As Airbnb's VP of Engineering, Mike Curtis is tasked with using big data infrastructure to provide a better UX and drive massive growth. He's also responsible for delivering simple, elegant ways to find and stay at the most interesting places in the world. He is currently working to build a team of engineers that will have a big impact as Airbnb continues to construct a bridge between the online and offline worlds. Mike's particular focus is on search and matching, systems infrastructure, payments, trust and safety, and mobile.

Continue
Camille Fournier on

In this talk, Camille Fournier from Rent the Runway gives an introduction to Apache ZooKeeper: why it's useful and how you should use it once you have it running. Camille goes over the high-level purpose of ZooKeeper and covers some of the basic use cases and operational concerns. One of the requirements for running Storm or a Hadoop cluster is a reliable ZooKeeper setup. When you're running a service distributed across a large cluster of machines, even tasks like reading configuration information, which are simple on single-machine systems, can be hard to implement reliably. This talk was recorded at the NYC Storm User Group meetup at WebMD Health.

Continue
Unknown author on

In this panel discussion, Randy Bias from Cloudscaling, Nati Shalom from GigaSpaces, and Alex Freedland from Mirantis each share their perspectives on embracing the Amazon Web Services (AWS) APIs and architecture as part of the OpenStack project. Dave McCrory, SVP of Platform Engineering at Warner Music Group, moderates the discussion. This talk was recorded at the OpenStack New York meetup at MongoDB, formerly 10gen.

Continue