Meet with other users of open-source programming languages. Previously, this meetup focused only on the R language, but it now covers all open-source data analysis tools, including but not limited to Python, WEKA, and Sage.
Learn and share tricks and techniques from and with other users. Beginners welcome.
Machine learning is often divided into three categories: supervised, unsupervised, and reinforcement learning. Reinforcement learning concerns problems that involve sequences of decisions (where each decision affects subsequent opportunities), uncertain effects, and potentially long-term goals. It has been highly successful in several fields, especially AI/Robotics and Operations Research, by providing a framework for learning from interactions with an environment and from feedback in the form of rewards and penalties.
Shane Conway, researcher at Kepos Capital, gives a general overview of reinforcement learning, covering how to solve cases where there is uncertainty both in actions and states, as well as where the state space is very large.
This talk was given at the New York Open Statistical Programming Meetup.
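To make the idea of learning from rewards concrete, here is a minimal sketch of tabular Q-learning, one standard reinforcement learning method. It is not taken from the talk: the five-state corridor, the slip probability, and every parameter value are assumptions chosen for illustration. The agent starts at the left end, only receives a reward on reaching the right end, and its chosen action is occasionally reversed to model uncertain effects.

```python
import random

N_STATES = 5      # hypothetical corridor: states 0..4, state 4 is the goal
ACTIONS = (0, 1)  # 0 = step left, 1 = step right
SLIP = 0.1        # assumed probability that the chosen action is reversed


def step(state, action, rng):
    """Apply an action with an uncertain effect; return (next_state, reward, done)."""
    if rng.random() < SLIP:
        action = 1 - action
    if action == 1:
        nxt = min(state + 1, N_STATES - 1)
    else:
        nxt = max(state - 1, 0)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def q_learning(episodes=2000, alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    """Learn a table of action values from interaction alone."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(200):  # cap on episode length
            # Epsilon-greedy choice; ties are broken at random.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.choice(ACTIONS)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            nxt, reward, done = step(state, action, rng)
            # Q-learning target: reward plus discounted best value of the next state.
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
            if done:
                break
    return q


def greedy_policy(q):
    """Best known action in each non-terminal state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]


if __name__ == "__main__":
    table = q_learning()
    print(greedy_policy(table))
```

The reward arrives only at the end of the corridor, so the value of earlier states has to be propagated backwards through the update rule. This is the "long-term goals" aspect described above.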
Nearly all fields have been or are being transformed by the availability of copious data and the tools to learn from them. Dr. Chris Wiggins (Chief Data Scientist, New York Times) talks about using machine learning and large data in both academia and business. He shares some ways in which re-framing domain questions as machine learning tasks has opened up new avenues for understanding, both in academic research and in real-world applications.
This talk was given at iHeartRadio and hosted by the New York Open Statistical Programming Meetup.
Meetup organizers and business owners have the same question: "Where should I put my event, store, or factory to maximize attendance?" You could pay consultants thousands of dollars or figure it out with a free weekend and R!
In this presentation, Harlan Harris replicates a spatial analysis he performed for DC Meetups. Techniques that will be discussed include: working with latitude/longitude data, constructing geometric cost functions, mapping, continuous optimization, global optimization, and dynamic report generation. R packages used include: knitr, plyr, reshape2, ggplot2, ggmap, and DEoptim.
This talk was presented at the New York Open Statistical Programming Meetup hosted by Pivotal Labs in NYC.
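The talk itself uses R, with DEoptim for the global optimization step. As a rough illustration of the same idea in Python, the sketch below builds a geometric cost function from latitude/longitude pairs (total great-circle distance to a venue) and minimizes it with a simple compass search. The compass search is a stand-in chosen to keep the example dependency-free, not the method from the presentation, and the member coordinates are made up.

```python
import math

EARTH_RADIUS_KM = 6371.0

# Hypothetical member locations as (latitude, longitude) in degrees.
MEMBERS = [
    (38.90, -77.04),
    (38.99, -77.03),
    (38.88, -77.10),
    (38.80, -77.05),
    (38.92, -76.99),
]


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def total_cost(venue, members=MEMBERS):
    """Cost of a venue: summed travel distance over all members."""
    lat, lon = venue
    return sum(haversine_km(lat, lon, m_lat, m_lon) for m_lat, m_lon in members)


def compass_search(cost, start, step=0.05, tol=1e-5):
    """Try moves north/south/east/west; halve the step when none improves."""
    best, best_cost = start, cost(start)
    while step > tol:
        improved = False
        for dlat, dlon in ((step, 0), (-step, 0), (0, step), (0, -step)):
            cand = (best[0] + dlat, best[1] + dlon)
            c = cost(cand)
            if c < best_cost:
                best, best_cost, improved = cand, c, True
        if not improved:
            step /= 2
    return best, best_cost


def centroid(members=MEMBERS):
    """Plain average of the coordinates, used as the starting guess."""
    n = len(members)
    return (sum(m[0] for m in members) / n, sum(m[1] for m in members) / n)


if __name__ == "__main__":
    venue, cost = compass_search(total_cost, centroid())
    print(venue, round(cost, 2))
```

A local search like this can stall if the cost function has several minima, for example once travel-time penalties or attendance weights are added. That is the kind of situation where a global optimizer such as DEoptim becomes useful.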
There are exciting applications of network science and graphical modeling in recent brain imaging studies. Watch as Ivor Cribben examines the challenges of estimating group-level dynamic connectivity structure across subjects and outlines novel data-driven statistical methods to estimate connectivity. Techniques discussed include:
Data scientists love to create exciting data visualizations and insightful models. However, before they get to that point, usually much effort goes into obtaining, scrubbing, and exploring the required data.
In this talk, Jeroen Janssens, from YPlan, talks about the *nix command line. Although it was invented decades ago, it remains a powerful environment for many data science tasks. It provides a read-eval-print loop (REPL) that is often much more convenient for exploratory data analysis than the edit-compile-run-debug cycle associated with scripts or even programs. Even if you're already comfortable processing data with, for example, R or Python, being able to also leverage the power of the command line can make any data scientist more efficient.
This talk was recorded at the NY Open Statistical Programming meetup at Knewton.
In this talk, Jay Emerson of Yale University discusses his new package, ShinyHelper, designed to help people get started with Shiny from RStudio. This talk was recorded at the New York Open Statistical Programming meetup at Knewton.
In this presentation, Paul Dix from Errplane gives an introduction to InfluxDB, an open-source distributed time series database that he created. Paul talks about why one would want a database built specifically for time series, and also covers its API and some of the key features of InfluxDB, including:
• Stores metrics (like Graphite) and events (like page views, exceptions, deploys)
• No external dependencies (self-contained binary)
• Fast: handles many thousands of writes per second on a single node
• HTTP API for reading and writing data
• SQL-like query language
• Distributed to scale out to many machines
• Built-in aggregate and statistics functions
• Built-in downsampling
This talk was recorded at the New York Open Statistical Programming meetup at Knewton.
In this talk, "Streaming Data Analysis and Online Learning," John Myles White of Facebook surveys some basic methods for analyzing data in a streaming manner. He focuses on using stochastic gradient descent (SGD) to fit models to data sets that arrive in small chunks, discussing some basic implementation issues and demonstrating the effectiveness of SGD for problems like linear and logistic regression as well as matrix factorization. He also describes how these methods allow ML systems to adapt to user data in real-time. This talk was recorded at the New York Open Statistical Programming meetup at Knewton.
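As a small illustration of fitting a model to data that arrives in chunks, the sketch below runs SGD on a linear regression, taking one gradient step per chunk. It is not code from the talk. The synthetic stream (true slope 3, intercept -1), the chunk size, and the learning rate are all assumptions, and the class and function names are hypothetical.

```python
import random


def stream_chunks(n_chunks, chunk_size, rng, slope=3.0, intercept=-1.0, noise=0.1):
    """Yield small chunks of (x, y) pairs from a synthetic noisy line."""
    for _ in range(n_chunks):
        chunk = []
        for _ in range(chunk_size):
            x = rng.uniform(-1.0, 1.0)
            y = slope * x + intercept + rng.gauss(0.0, noise)
            chunk.append((x, y))
        yield chunk


class OnlineLinearRegression:
    """Linear model y = w*x + b updated one chunk at a time."""

    def __init__(self, learning_rate=0.1):
        self.w = 0.0
        self.b = 0.0
        self.lr = learning_rate

    def predict(self, x):
        return self.w * x + self.b

    def partial_fit(self, chunk):
        """One SGD step on half the mean squared error of this chunk."""
        grad_w = 0.0
        grad_b = 0.0
        for x, y in chunk:
            err = self.predict(x) - y
            grad_w += err * x
            grad_b += err
        n = len(chunk)
        self.w -= self.lr * grad_w / n
        self.b -= self.lr * grad_b / n


def fit_stream(seed=0):
    """Consume the stream once; no chunk is stored after it has been used."""
    rng = random.Random(seed)
    model = OnlineLinearRegression()
    for chunk in stream_chunks(500, 10, rng):
        model.partial_fit(chunk)
    return model


if __name__ == "__main__":
    m = fit_stream()
    print(round(m.w, 2), round(m.b, 2))
```

Because each update only needs the current chunk, memory use stays constant however long the stream runs, and the model keeps adjusting if the underlying relationship drifts.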
In this talk, Tal Galili, the founder of R-bloggers, presents his recent work "dendextend," a package for visualizing and comparing trees of hierarchical clusterings (a.k.a. dendrograms) with R. This talk was recorded at the New York Open Statistical Programming meetup at Knewton.
Tal begins his presentation with a short overview of the "dendrogram" object in R and its manipulation with the "dendextend" package. He then discusses how to create, change, visualize, and statistically compare two trees of hierarchical clusterings (with some sprinkles of Rcpp).
Tal ends with a 5-minute lightning talk on how to quickly update R on Windows/Mac using the 'installr' package.
In this talk, "Using Go for Statistical Programming," Aditya Mukerjee, student at Cornell Tech, discusses how to use Google's Go programming language for statistics. This talk was recorded at the New York Open Statistical Programming meetup at Knewton.
While R is the language of choice for academic statisticians, data scientists sometimes use other languages and frameworks for advantages such as distributed computing, speed, and portability. Go, a new language developed by Google, provides a convenient alternative for statistical work, with built-in concurrency features and a focus on both speed and stability. In this talk, Aditya will provide an overview of the current state of statistical programming in Go, and some basic tips for getting started.