Y Combinator Alum Citus Data Wants To Make Scalable Data Analytics Accessible To Anyone

As companies from small startups to large enterprises continue to generate an ever-increasing amount of data, the demand for affordable and scalable databases also increases. Typically, this market has been the domain of large vendors like Oracle, but besides them and the usual open-source players, we’ve also seen a growing number of closed-source startups enter this space. Citus Data, which is launching version 1.0 of its CitusDB today, is the latest startup to challenge these incumbents.

The Y Combinator graduate (the company was part of the summer 2011 class) develops a scalable analytics database that’s built on top of PostgreSQL. Unlike its direct competitors, the company makes its product available for free to users who need no more than eight nodes. It’s also, as the company notes, “the first such database that’s available for download.”

The company, which has received a total of $1.6 million in seed funding, is based in Silicon Valley, with an additional office in Istanbul, Turkey. Among the seed investors are Trinity Ventures, Digital Garage, Matt Ocko, Ben Ling and Paul Buchheit. The founders are industry veterans who previously worked at Amazon and other companies that have had to deal with large amounts of data since the early days of the web. The team has been working on the product for about two years now and already has a number of customers who are using CitusDB in a production environment.

[vimeo http://www.vimeo.com/44715903 w=600&h=450]

As the founders told me earlier this week, building a database that can both execute SQL queries and still scale well across machines (including in the cloud) is a hard technical problem. The founders note that CitusDB often outperforms legacy databases as well as Hadoop in combination with the data warehouse system Hive. The product, Citus Data says, offers scalability (the company promises that “you can linearly scale out to hundreds of nodes”), replication and high availability.

You can find a few more examples of typical use cases (including a few datasets to test on your own servers) for CitusDB here.