Single-cell sequencing generates a new kind of genomic data, and with it new storage and compute challenges. I'll talk about recent work parallelizing analysis of this data using a variety of distributed backends (Apache Spark, Dask, PyWren, Apache Beam). I'll also discuss the Zarr format for storing and working with N-dimensional arrays, which several scientific domains have recently gravitated toward in response to the challenges of using HDF5 in parallel and in the cloud.
Ryan is a software developer at Mount Sinai School of Medicine, focused on open-source tools for distributed genomic and single-cell analyses in the cloud.


