In this talk, Daniel Krasner covers rapid development of high-performance, scalable text processing solutions for tasks such as classification, semantic analysis, topic modeling, and general machine learning. He demonstrates how Python modules, in particular the Rosetta Python library, can be used to process, clean, tokenize, extract features from, and build statistical models with large volumes of text data. The Rosetta library focuses on small, simple modules (each with a command line interface) that use very little memory and are parallelized with the multiprocessing package. Daniel also touches on LDA topic modeling and different implementations thereof (Vowpal Wabbit and Gensim). The talk is part presentation and part “real life” example tutorial. This talk was recorded at the NYC Machine Learning meetup at Pivotal Labs.
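The parallel-tokenization pattern described above can be sketched in a few lines of plain Python. This is an illustrative example only, not Rosetta's actual API; the `tokenize` helper and the sample documents are invented for the demo:

```python
import re
from multiprocessing import Pool

def tokenize(doc):
    """Lowercase a document and split it into word tokens."""
    return re.findall(r"[a-z']+", doc.lower())

if __name__ == "__main__":
    docs = [
        "The quick brown fox jumps over the lazy dog.",
        "Text processing pipelines should stay small and simple.",
    ]
    # Fan the documents out across worker processes, the same way a
    # corpus too large for one process would be tokenized in parallel.
    with Pool(processes=2) as pool:
        token_lists = pool.map(tokenize, docs)
    for tokens in token_lists:
        print(tokens)
```

Each worker handles a slice of the corpus independently, which is what keeps the per-process memory footprint small.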
Front-end engineers have gone through a bit of a renaissance in recent years. A wild and wonderful burst of innovation has completely changed what it means to be a front-end developer. We now have a vast array of tools available to us, allowing us to develop faster and smarter, and we can push the limits of what a modern web browser can do, accomplishing things that seemed impossible just two years ago. Bottom line: right now is an amazing time to be a front-end developer.
In early December, we held our first ever hack-day. Each product manager teamed up with one or two engineers for the day to think up and develop any idea they wanted. At the end of hack-day, each team presented their concept, demoed the results, and explained how they thought their hack improved the application.
On the morning of hack-day, my partner Jeff Rand and I holed up in a corner of the office with a hearty breakfast of pancakes, bacon and eggs and quickly came up with a list of 10-15 ideas, ranging from features we knew Members or our Sales team had asked for to parts of the codebase that needed attention. We had two basic requirements:
In this talk, Niklas Nielsen, from Mesosphere, talks about Apache Mesos, a cluster manager that provides efficient resource isolation and sharing across distributed applications or frameworks. Mesos can run Apache Hadoop, MPI, Hypertable, Apache Spark, Storm, Chronos, Marathon, and other applications on a dynamically shared pool of nodes. Niklas goes over how to write frameworks for Apache Mesos in Go. The biggest user of Mesos is Twitter, where it runs on thousands of servers. Airbnb runs all of their data infrastructure on it, processing petabytes of data. This talk was recorded at the GoSF meetup at Heroku.
In this talk, Keith Rarick, formerly of Heroku, gives a brief overview of the state of dependency management tools for Go applications. He then provides a detailed run-through of godep, including initial setup, collaboration, updating dependency packages, and working with third party tools. Keith also runs through deploying to Heroku, as well as godep's general philosophy of operation: specifically, why it works the way it does. This talk was recorded at the GoSF meetup at Heroku.
In this talk, Radu Gheorghe, from SemaText, talks about using Elasticsearch or Solr to index your logs so you can search and analyze them in real time. The term “logs” can range from server logs and application events to metrics or even social media information. This talk was recorded at the NYC Search, Discovery and Analytics meetup at Pivotal Labs.
Data scientists love to create exciting data visualizations and insightful models. However, before they get to that point, usually much effort goes into obtaining, scrubbing, and exploring the required data. In this talk, Jeroen Janssens, from YPlan, talks about the *nix command line. Although it was invented decades ago, it remains a powerful environment for many data science tasks. It provides a read-eval-print loop (REPL) that is often much more convenient for exploratory data analysis than the edit-compile-run-debug cycle associated with scripts or even programs. Even if you're already comfortable processing data with, for example, R or Python, being able to also leverage the power of the command line can make any data scientist more efficient. This talk was recorded at the NY Open Statistical Programming meetup at Knewton.
His talk goes through building high-throughput pipelines in Python with the Python/NumPy/Cython stack they use at Datadog.
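To give a flavor of that stack: the usual first step in such a pipeline is moving inner loops from pure Python into NumPy's vectorized operations, reaching for Cython only when that is not enough. The example below is a generic illustration of the idea, not code from the talk:

```python
import numpy as np

def rates_python(counts, intervals):
    """Per-interval event rates with a pure-Python loop."""
    return [c / t for c, t in zip(counts, intervals)]

def rates_numpy(counts, intervals):
    """The same computation as one vectorized NumPy expression:
    the loop runs in C, which is where most of the throughput
    gain in a Python/NumPy pipeline comes from."""
    return np.asarray(counts, dtype=float) / np.asarray(intervals, dtype=float)

counts = [10, 40, 90]
intervals = [2.0, 4.0, 3.0]
print(rates_python(counts, intervals))
print(rates_numpy(counts, intervals))
```

On small lists the two are indistinguishable; on arrays of millions of elements the vectorized version is typically orders of magnitude faster.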
A recording of this awesome talk is below:
In this Lyceum, Ben Holzman will take us on a brief tour of the basics of 3D graphics programming and dip a toe or two into some deeper waters, like what a quaternion is and what it has to do with computer graphics. Then he will introduce the three.js library, which makes it much, much easier to create an animated 3D scene than using raw WebGL.
Ben will finish by demonstrating how to use three.js to make a 3D animated version of the Axial logo.
Click here to register for the event
In this talk, Jeremy Carroll, an operations engineer at Pinterest, talks about how Pinterest uses HBase at massive scale. The talk also focuses on how they run HBase on the Amazon EC2 cloud and how they monitor, troubleshoot, and scale the system. This talk was recorded at the Big Data Gurus meetup at Samsung R&D.