Meet with other users of open-source programming languages. Previously this meetup focused only on the R language, but it now covers all open-source data analysis tools, including but not limited to Python, WEKA, and Sage.

Learn and share tricks and techniques from and with other users. Beginners welcome.

Shane Conway on

Introduction to Reinforcement Learning

Machine learning is often divided into three categories: supervised, unsupervised, and reinforcement learning. Reinforcement learning concerns problems that involve sequences of decisions (where each decision affects subsequent opportunities), uncertain effects, and potentially long-term goals. It has been applied successfully in many fields, especially AI/robotics and operations research, by providing a framework for learning from interactions with an environment and from feedback in the form of rewards and penalties.

Shane Conway, researcher at Kepos Capital, gives a general overview of reinforcement learning, covering how to solve cases where there is uncertainty both in actions and states, as well as where the state space is very large.  
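
As a rough illustration of learning from rewards when actions have uncertain effects, here is a minimal sketch of tabular Q-learning in Python. It is not taken from the talk. The corridor environment, the function names, and every parameter value (learning rate, discount, exploration rate, slip probability) are assumptions chosen only to keep the example small.

```python
import random

def q_learning_corridor(n_states=5, episodes=500, alpha=0.1, gamma=0.9,
                        epsilon=0.1, slip=0.1, seed=0):
    """Tabular Q-learning on a hypothetical corridor of n_states cells.

    The agent starts in cell 0 and receives a reward of 1 on reaching the
    last cell. With probability `slip` the chosen move is reversed, which
    stands in for uncertainty in the effect of an action.
    """
    rng = random.Random(seed)
    moves = (-1, +1)                      # action 0 = left, action 1 = right
    goal = n_states - 1
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        state = 0
        while state != goal:
            # Epsilon-greedy choice: explore occasionally, otherwise exploit.
            if rng.random() < epsilon:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            # Apply the move, possibly reversed, and stay inside the corridor.
            step = moves[action] if rng.random() >= slip else -moves[action]
            nxt = min(max(state + step, 0), goal)
            reward = 1.0 if nxt == goal else 0.0
            # Q-learning update; the goal row is never updated and stays at 0.
            target = reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
    return q

def greedy_policy(q):
    """Return the preferred move in each cell according to the Q-table."""
    return ["left" if left > right else "right" for left, right in q]

q_table = q_learning_corridor()
print(greedy_policy(q_table))
```

A table like this only works when the number of states is small. For very large state spaces the table is usually replaced by a function approximator.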


This talk was given at the New York Open Statistical Programming Meetup.

Chris Wiggins on

Computational Biology and Data Science at the New York Times

Nearly all fields have been or are being transformed by the availability of copious data and the tools to learn from them. Dr. Chris Wiggins (Chief Data Scientist, New York Times) talks about using machine learning and large data sets in both academia and business. He shares some ways in which re-framing domain questions as machine learning tasks has opened up new avenues for understanding, both in academic research and in real-world applications.



This talk was given at iHeartRadio and hosted by the New York Open Statistical Programming Meetup.

Harlan Harris on

Use Modern Spatial Analysis Tools to Put Your Meetup on the Map

Meetup organizers and business owners have the same question: "Where should I put my event, store, or factory to maximize attendance?" You could pay consultants thousands of dollars or figure it out with a free weekend and R!

In this presentation, Harlan Harris  replicates a spatial analysis he performed for DC Meetups. Techniques that will be discussed include: working with latitude/longitude data, constructing geometric cost functions, mapping, continuous optimization, global optimization, and dynamic report generation. R packages used include: knitr, plyr, reshape2, ggplot2, ggmap, and DEoptim.
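
To give a feel for what a geometric cost function over latitude/longitude data can look like, here is a small Python sketch. It is not the analysis from the talk, which used R and DEoptim. The member coordinates are made up, the function names are hypothetical, and a simple compass (pattern) search stands in for the global optimizer.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))

def mean_distance_km(venue, members):
    """Cost function: average travel distance from all members to a venue."""
    lat, lon = venue
    return sum(haversine_km(lat, lon, mlat, mlon)
               for mlat, mlon in members) / len(members)

def best_venue(members, step=0.1, min_step=1e-4):
    """Compass search starting from the centroid of the member locations."""
    best = (sum(m[0] for m in members) / len(members),
            sum(m[1] for m in members) / len(members))
    best_cost = mean_distance_km(best, members)
    while step > min_step:
        improved = False
        for dlat, dlon in ((step, 0), (-step, 0), (0, step), (0, -step)):
            candidate = (best[0] + dlat, best[1] + dlon)
            cost = mean_distance_km(candidate, members)
            if cost < best_cost:
                best, best_cost, improved = candidate, cost, True
        if not improved:
            step /= 2  # no neighbour is better, so refine the step size
    return best, best_cost

# Made-up member locations (latitude, longitude), for illustration only.
members = [(38.90, -77.04), (38.99, -77.03), (38.88, -77.10), (38.80, -77.05)]
venue, cost = best_venue(members)
print(venue, cost)
```

A local search like this is enough for a smooth cost such as mean distance. A global optimizer becomes useful once the cost function has several local minima, for example when travel-time penalties or attendance thresholds are added.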


This talk was presented at the New York Open Statistical Programming Meetup hosted by Pivotal Labs in NYC.

Jeroen Janssens on

Obtaining, Scrubbing, and Exploring Data at the Command Line

Data scientists love to create exciting data visualizations and insightful models. However, before they get to that point, usually much effort goes into obtaining, scrubbing, and exploring the required data.

In this talk, Jeroen Janssens, from YPlan, talks about the *nix command line. Although it was invented decades ago, it remains a powerful environment for many data science tasks. It provides a read-eval-print loop (REPL) that is often much more convenient for exploratory data analysis than the edit-compile-run-debug cycle associated with scripts or even programs. Even if you're already comfortable processing data with, for example, R or Python, being able to also leverage the power of the command line can make any data scientist more efficient.

This talk was recorded at the NY Open Statistical Programming meetup at Knewton.


Paul Dix on

Introduction to InfluxDB

In this presentation, Paul Dix from Errplane gives an introduction to InfluxDB, an open source distributed time series database that he created. Paul talks about why one would want a database that's specifically for time series and also covers its API as well as some of the key features of InfluxDB, including:

• Stores metrics (like Graphite) and events (like page views, exceptions, deploys)
• No external dependencies (self-contained binary)
• Fast: handles many thousands of writes per second on a single node
• HTTP API for reading and writing data
• SQL-like query language
• Distributed to scale out to many machines
• Built-in aggregate and statistics functions
• Built-in downsampling
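
For readers new to time series storage, downsampling means replacing raw points with one aggregate per fixed time window. The Python sketch below illustrates the idea only. It does not use or imitate the InfluxDB API, and the function name and sample data are hypothetical.

```python
from collections import defaultdict

def downsample_mean(points, window_seconds):
    """Group (timestamp, value) points into fixed windows and average them.

    Timestamps are integer seconds; each output row is labelled with the
    start of its window.
    """
    buckets = defaultdict(list)
    for timestamp, value in points:
        window_start = timestamp - timestamp % window_seconds
        buckets[window_start].append(value)
    return [(start, sum(values) / len(values))
            for start, values in sorted(buckets.items())]

# Made-up readings: (seconds since start, measured value).
raw = [(0, 1.0), (30, 3.0), (60, 5.0), (90, 7.0), (125, 9.0)]
print(downsample_mean(raw, 60))  # → [(0, 2.0), (60, 6.0), (120, 9.0)]
```

A time series database performs this kind of aggregation on the server, so that long-term history can be kept at a coarser resolution than recent data.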


This talk was recorded at the New York Open Statistical Programming meetup at Knewton.

John Myles White on

Streaming Data Analysis and Online Learning by John Myles White

In this talk, "Streaming Data Analysis and Online Learning," John Myles White of Facebook surveys some basic methods for analyzing data in a streaming manner. He focuses on using stochastic gradient descent (SGD) to fit models to data sets that arrive in small chunks, discussing some basic implementation issues and demonstrating the effectiveness of SGD for problems like linear and logistic regression as well as matrix factorization. He also describes how these methods allow ML systems to adapt to user data in real-time. This talk was recorded at the New York Open Statistical Programming meetup at Knewton.
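
The core idea of fitting a model to data that arrives in small chunks can be sketched in a few lines of Python. This is not code from the talk. The class name, the simulated stream, and the learning rate are assumptions made for illustration.

```python
import random

class OnlineLinearRegression:
    """Linear regression fitted one observation at a time with SGD."""

    def __init__(self, n_features, learning_rate=0.01):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, x):
        return self.bias + sum(w * xi for w, xi in zip(self.weights, x))

    def update(self, x, y):
        """Take one gradient step on the squared error of a single point."""
        error = self.predict(x) - y
        self.bias -= self.learning_rate * error
        self.weights = [w - self.learning_rate * error * xi
                        for w, xi in zip(self.weights, x)]
        return error

def simulate_stream(n_chunks=200, chunk_size=10, seed=0):
    """Yield small chunks of noisy data drawn from y = 2x + 1 (made up)."""
    rng = random.Random(seed)
    for _ in range(n_chunks):
        chunk = []
        for _ in range(chunk_size):
            x = rng.uniform(-1, 1)
            chunk.append(([x], 2.0 * x + 1.0 + rng.gauss(0, 0.1)))
        yield chunk

model = OnlineLinearRegression(n_features=1, learning_rate=0.05)
for chunk in simulate_stream():
    for x, y in chunk:       # each point is seen once and then discarded
        model.update(x, y)
print(model.weights, model.bias)
```

Because the model keeps only its current weights, memory use does not grow with the amount of data seen, and the estimates keep adapting as new observations arrive.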


Tal Galili on

Tal Galili - Creating Beautiful Trees of Clusterings with R (+a bonus)

In this talk, Tal Galili, the founder of R-bloggers, presents his recent work "dendextend," a package for visualizing and comparing trees of hierarchical clusterings (a.k.a. dendrograms) with R. This talk was recorded at the New York Open Statistical Programming meetup at Knewton.


Tal begins his presentation with a short overview of the "dendrogram" object in R and its manipulation with the "dendextend" package. He then discusses how to create, change, visualize, and statistically compare two trees of hierarchical clusterings (with some sprinkles of Rcpp).

Tal ends with a 5-minute lightning talk on how to quickly update R on Windows/Mac using the 'installr' package.


Aditya Mukerjee on

How to Make Your Statistical Programs More Scalable With Go

In this talk, "Using Go for Statistical Programming," Aditya Mukerjee, student at Cornell Tech, discusses how to use Google's Go programming language for statistics. This talk was recorded at the New York Open Statistical Programming meetup at Knewton.

While R is the language of choice for academic statisticians, data scientists sometimes use other languages and frameworks for advantages such as distributed computing, speed, and portability.  Go, a new language developed by Google, provides a convenient alternative for statistical work, with built-in concurrency features and a focus on both speed and stability. In this talk, Aditya will provide an overview of the current state of statistical programming in Go, and some basic tips for getting started.

