Scalable and Reliable Logging at Pinterest

At Pinterest, hundreds of services and third-party tools that are implemented in various programming languages generate billions of events every day.

Achieving scalable, reliable, low-latency logging poses several challenges: (1) uploading logs that are generated in various formats from tens of thousands of hosts to Kafka in a timely manner; (2) running Kafka reliably on Amazon Web Services, where virtual instances are less reliable than on-premises hardware; (3) moving tens of terabytes of data per day from Kafka to cloud storage reliably and efficiently, while guaranteeing exactly-once persistence for each message.

In this talk, Krishna Gade (Head of Data Engineering) and Yu Yang (Data Engineer) present Pinterest’s logging pipeline and share their experience addressing these challenges. They dive deep into three components they developed: data uploading from service hosts to Kafka, data transportation from Kafka to S3, and data sanitization. They also share their experience operating Kafka at scale in the cloud.

This talk was recorded at our DataEngConf event in San Francisco.