Like many people, I come across interesting code, projects, and datasets every day. Here is a partial list of the more interesting ones that I use in my research or learning, or that I might check out later. Each entry has a short description plus a link to its source code, if available.




This is a collection of more than 150,000 fully open datasets managed and hosted by the U.S. General Services Administration. The website covers many topics, from agriculture and local government to science and finance.

Website | Github

United Nations Global Pulse

This is a UN initiative that uses big data to tackle problems for the public good. They have completed many important projects, which you can find on their website. Data is also available for some projects, although not in one central place.




ANN is a library written in C++ that supports data structures and algorithms for both exact and approximate nearest neighbor searching in arbitrarily high dimensions. In the nearest neighbor problem, a set P of data points in d-dimensional space is given. These points are preprocessed into a data structure so that, given any query point q, the nearest point of P to q (or, more generally, the k nearest points) can be reported efficiently. The distance between two points can be defined in many ways. ANN assumes that distances are measured using a class of distance functions called Minkowski metrics. These include the well-known Euclidean distance, Manhattan distance, and max distance.
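To make the problem statement concrete, here is a minimal pure-Python sketch of exact k-nearest-neighbor search under a Minkowski metric. It is a brute-force scan, not ANN's tree-based data structures or its C++ API, and the names `minkowski_distance` and `k_nearest` are made up for this illustration.

```python
import heapq
import math


def minkowski_distance(a, b, p=2.0):
    """Minkowski (L_p) distance between two equal-length points.

    p=1 is Manhattan, p=2 is Euclidean, p=math.inf is the max distance.
    """
    diffs = [abs(x - y) for x, y in zip(a, b)]
    if math.isinf(p):
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1.0 / p)


def k_nearest(points, query, k=1, p=2.0):
    """Return the k points closest to `query` by exhaustive search.

    Hypothetical helper: it scans every point, so each query costs O(n).
    """
    return heapq.nsmallest(
        k, points, key=lambda pt: minkowski_distance(pt, query, p)
    )


points = [(0.0, 0.0), (1.0, 1.0), (3.0, 0.5), (-2.0, 2.0)]
query = (0.9, 0.2)

nearest_euclid = k_nearest(points, query, k=2, p=2)       # two nearest, Euclidean
nearest_max = k_nearest(points, query, k=1, p=math.inf)   # nearest, max distance
```

A library like ANN earns its keep by answering the same question without visiting every point, which is what the preprocessing step is for.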



Autograd can automatically differentiate native Python and Numpy code. It can handle a large subset of Python’s features, including loops, ifs, recursion and closures, and it can even take derivatives of derivatives of derivatives.
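To show what differentiating through ordinary Python control flow looks like, here is a small sketch using forward-mode differentiation with dual numbers. This is a different and much simpler technique than whatever Autograd does internally, and the `Dual` class and `derivative` helper are hypothetical names for this illustration, not part of Autograd.

```python
class Dual:
    """A number value + deriv*eps with eps**2 == 0; `deriv` carries d/dx."""

    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    @staticmethod
    def _lift(other):
        # Treat plain numbers as constants (zero derivative).
        return other if isinstance(other, Dual) else Dual(other)

    def __add__(self, other):
        other = Dual._lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = Dual._lift(other)
        # Product rule.
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__


def derivative(f):
    """Return a function that evaluates df/dx at a point."""
    def df(x):
        return f(Dual(x, 1.0)).deriv
    return df


def cubic_with_loop(x):
    # Plain Python control flow: a loop and a branch.
    result = 1.0
    for _ in range(3):
        result = result * x          # result becomes x**3
    if result.value < 0:
        result = -1.0 * result       # take |x**3|
    return result + 2.0 * x


# For x > 0 the function is x**3 + 2x, so the slope is 3x**2 + 2.
slope = derivative(cubic_with_loop)(2.0)
```

The sketch only supports addition and multiplication; a real tool covers far more of the language and of Numpy.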


autoScale is a program that performs an automatic finite-size scaling analysis for given sets of simulated data. It implements a fairly general scaling assumption and optimizes an initial set of scaling parameters so that they enforce a data collapse of the different data sets. The accompanying guide describes how the program works, presents a detailed example, and gives some hints on how to improve the results of a scaling analysis.

Paper | Github
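The idea of a data collapse can be sketched in a few lines. The example below assumes a scaling form y_L(x) = L^(-b) f((x - x_c) L^a), generates synthetic curves from a made-up scaling function, and scores a parameter guess with a crude roughness measure. Both the synthetic data and the `roughness` score are assumptions for illustration; they are not autoScale's code or its quality measure.

```python
import math


def rescale(L, x, y, x_c, a, b):
    """Map a raw point (x, y) measured at system size L onto scaled axes."""
    return (x - x_c) * L ** a, y * L ** b


def roughness(datasets, x_c, a, b):
    """Crude collapse score: mean squared jump in y between x-neighbours.

    If all curves fall on one smooth master curve, neighbouring points
    have similar y values and the score is small.
    """
    pts = sorted(rescale(L, x, y, x_c, a, b)
                 for L, curve in datasets.items() for x, y in curve)
    jumps = [(y2 - y1) ** 2 for (_, y1), (_, y2) in zip(pts, pts[1:])]
    return sum(jumps) / len(jumps)


def master_curve(u):
    """Hypothetical scaling function f used to generate test data."""
    return 1.0 / (1.0 + math.exp(u))


# Synthetic "simulation" data for three system sizes.
TRUE_XC, TRUE_A, TRUE_B = 0.5, 1.0, 0.25
datasets = {}
for L in (8, 16, 32):
    xs = [0.3 + 0.4 * i / 40 for i in range(41)]
    datasets[L] = [
        (x, L ** -TRUE_B * master_curve((x - TRUE_XC) * L ** TRUE_A))
        for x in xs
    ]

good = roughness(datasets, TRUE_XC, TRUE_A, TRUE_B)   # correct parameters
bad = roughness(datasets, TRUE_XC, TRUE_A, 0.0)       # wrong exponent b
```

An optimizer would start from an initial guess such as the second call and adjust `x_c`, `a`, and `b` to drive the score down.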


CGAL is a software project that provides easy access to efficient and reliable geometric algorithms in the form of a C++ library. CGAL is used in various areas needing geometric computation, such as geographic information systems, computer aided design, molecular biology, medical imaging, computer graphics, and robotics. The library offers data structures and algorithms like triangulations, Voronoi diagrams, Boolean operations on polygons and polyhedra, point set processing, arrangements of curves, surface and volume mesh generation, geometry processing, alpha shapes, convex hull algorithms, shape analysis, AABB and KD trees.

Website | Github
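As a taste of the kind of algorithm such a library packages, here is a short Python sketch of a 2D convex hull using Andrew's monotone chain method. It is only an illustration of the problem; it does not use or imitate CGAL's C++ interface, and it uses plain floating-point arithmetic without the robustness guarantees a geometry library would add.

```python
def cross(o, a, b):
    """z-component of (a - o) x (b - o); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Convex hull in counter-clockwise order, collinear points dropped."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def half_hull(sequence):
        chain = []
        for p in sequence:
            # Pop points that would make a right turn or a straight line.
            while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
        return chain

    lower = half_hull(pts)
    upper = half_hull(reversed(pts))
    # The last point of each half is the first point of the other half.
    return lower[:-1] + upper[:-1]


hull = convex_hull([(0, 0), (2, 0), (2, 2), (0, 2), (1, 1), (1, 0)])
```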


DynamO is a free and open-source event-driven particle simulator. Event-driven simulation is a fast and analytical technique for particle simulation and is an alternative to the more traditional time-stepping approaches (such as those found in GROMACS, LIGGGHTS, and NAMD). DynamO is a reference implementation of many established event-driven models and a research platform for the latest advances in event-driven algorithms.

Website | Github
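The "analytical" part of event-driven simulation can be illustrated with the simplest event: two hard spheres moving in straight lines. Instead of stepping time forward in small increments, one solves for the exact moment of contact. The function below is a minimal sketch under the assumptions of equal-sized spheres, no external forces, and no initial overlap; `collision_time` is a name invented for this example and has nothing to do with DynamO's own code.

```python
import math


def collision_time(r1, v1, r2, v2, diameter):
    """Time until two equal hard spheres touch, or None if they never do.

    Solves |(r2 - r1) + (v2 - v1) * t| = diameter for the smallest t >= 0.
    """
    dr = [b - a for a, b in zip(r1, r2)]         # relative position
    dv = [b - a for a, b in zip(v1, v2)]         # relative velocity
    b = sum(x * v for x, v in zip(dr, dv))       # dr . dv
    if b >= 0:
        return None                              # not approaching
    dv2 = sum(v * v for v in dv)
    dr2 = sum(x * x for x in dr)
    disc = b * b - dv2 * (dr2 - diameter ** 2)
    if disc < 0:
        return None                              # trajectories miss
    return (-b - math.sqrt(disc)) / dv2


# Head-on approach along the x-axis, unit diameter.
t_hit = collision_time((0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (-1.0, 0.0), 1.0)

# Same velocities, but offset so the spheres pass each other.
t_miss = collision_time((0.0, 0.0), (1.0, 0.0), (5.0, 3.0), (-1.0, 0.0), 1.0)
```

A full simulator would compute such times for all candidate pairs, keep them in a priority queue, and jump straight from one event to the next.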


Eigen is an excellent C++ template library for linear algebra: matrices, vectors, numerical solvers, and related algorithms.

Website | Github

Geometry algorithms

This is not a library in the technical sense; the purpose of the site is to provide practical geometric algorithms for software developers. It has explanations of the algorithms plus C++ code.



HOOMD-blue is a general-purpose particle simulation toolkit. It scales from a single CPU core to thousands of GPUs. You define particle initial conditions and interactions in a high-level Python script, then tell HOOMD-blue how you want to execute the job, and it takes care of the rest. Python job scripts give you a great deal of flexibility to create custom initialization routines, control simulation parameters, and perform in situ analysis.

Website | Bitbucket


OpenMM includes everything one needs to run modern molecular simulations. It is extremely flexible with its custom functions, is open-source, and has high performance, especially on recent GPUs.

Website | Github


pyfssa is a scientific Python package for algorithmic finite-size scaling analysis at phase transitions.

Website | Github


pypercolate is a scientific Python package that implements the Newman-Ziff algorithm for Monte Carlo simulation of percolation on graphs.

Website | Github
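The core loop of the Newman-Ziff approach is short enough to sketch: occupy the bonds of a graph one at a time in random order, merge clusters with a union-find structure, and record an observable after each step. The version below tracks the largest cluster on a small square lattice. It is an illustrative sketch for bond percolation only, with invented function names, and it omits the later step of converting the per-bond-count results into averages at a fixed occupation probability; it is not pypercolate's API.

```python
import random


def find(parent, i):
    """Root of i's cluster, with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def newman_ziff_sweep(n_nodes, edges, rng):
    """Occupy edges one by one in random order; track the largest cluster.

    Returns a list whose n-th entry is the largest cluster size with
    n occupied edges (n = 0 .. len(edges)).
    """
    parent = list(range(n_nodes))
    size = [1] * n_nodes
    order = list(edges)
    rng.shuffle(order)
    largest = 1
    history = [largest]
    for u, v in order:
        ru, rv = find(parent, u), find(parent, v)
        if ru != rv:
            if size[ru] < size[rv]:
                ru, rv = rv, ru          # attach the smaller tree to the larger
            parent[rv] = ru
            size[ru] += size[rv]
            largest = max(largest, size[ru])
        history.append(largest)
    return history


def square_lattice_edges(side):
    """Edges of a side x side square lattice with open boundaries."""
    edges = []
    for row in range(side):
        for col in range(side):
            node = row * side + col
            if col + 1 < side:
                edges.append((node, node + 1))
            if row + 1 < side:
                edges.append((node, node + side))
    return edges


side = 8
edges = square_lattice_edges(side)
history = newman_ziff_sweep(side * side, edges, random.Random(1))
```

One sweep yields the observable for every possible number of occupied bonds, which is what makes the method efficient compared with rebuilding the clusters from scratch at each occupation level.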


SLEPc, the “Scalable Library for Eigenvalue Problem Computations”, is a software library for the solution of large-scale sparse eigenvalue problems on parallel computers. It is an extension of PETSc and can be used for linear eigenvalue problems in either standard or generalized form, with real or complex arithmetic. It can also be used to compute a partial SVD of a large, sparse, rectangular matrix and to solve nonlinear eigenvalue problems (polynomial or general). Additionally, SLEPc provides solvers for computing the action of a matrix function on a vector.

Website | Bitbucket

Stony Brook algorithms

“A comprehensive collection of algorithm implementations for over seventy of the most fundamental problems in combinatorial algorithms. The problem taxonomy, implementations, and supporting material are all drawn from my book The Algorithm Design Manual.”

Website
