Monday, April 21, 2008

Summer of Code 2008

This year I am participating in the Google Summer of Code as a student for the haskell.org organisation, working on the "Haskell API Search as an interface to Hackage" project, also known as Hoogle 4. I will be mentored by Niklas Broberg, author of various tools including Haskell Source Extensions, which is already used by Hoogle. My project link gives the summary I gave for the project, but below I've posted the interesting bits from my full application. I will post my progress at least once a week once the project phase starts (in about six weeks' time). I welcome any comments!




What is the goal of the project you propose to do?

There are two main goals:

1) Make Hoogle more useful to the community, along the same path as it is currently used.

2) Make Hoogle suitable to use as the standard interface to Hackage.

Can you give some more detailed design of what precisely you intend to achieve?

# Removal of all bugs

Hoogle 3 has a number of embarrassing bugs, some of which are not easily fixed. The nastiest of these concern monads, which are horribly mistreated. Since I now understand the underlying issues that caused problems in Hoogle 3, things like higher-kinded type classes can be handled in a more principled manner.

# Support for some Haskell type extensions

Hoogle 3 does not deal with multi-parameter type classes. I would like to support a variety of type system extensions, primarily by mapping them onto equivalent Haskell 98 types.

# Faster searching

The current implementation is O(n) in the number of functions in the library, with a very large constant factor. I wish to make text searching O(s), where s is the length of the search string, with a very low constant overhead, by using a lazy file-based trie.
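To illustrate why a trie makes the lookup cost depend on the query length rather than on the number of functions, here is a minimal sketch. It is an assumption-laden illustration, not Hoogle's implementation: the trie lives in memory rather than in a file, it only matches prefixes of names, and all names (`Trie`, `descend`, `searchPrefix`, and so on) are hypothetical. Each step down the trie is a `Data.Map` lookup, so the walk costs one lookup per query character, plus a small factor for the size of the alphabet.

```haskell
import qualified Data.Map.Strict as Map
import Data.Char (toLower)

-- | A character trie (hypothetical type). Each node stores the items
-- whose name ends exactly at that node.
data Trie a = Trie
  { values   :: [a]
  , children :: Map.Map Char (Trie a)
  }

empty :: Trie a
empty = Trie [] Map.empty

-- | Insert an item under the given key, creating nodes as needed.
insert :: String -> a -> Trie a -> Trie a
insert []     v (Trie vs cs) = Trie (v : vs) cs
insert (k:ks) v (Trie vs cs) =
  Trie vs (Map.insert k (insert ks v child) cs)
  where child = Map.findWithDefault empty k cs

-- | Follow the query down the trie: one map lookup per character,
-- independent of how many items the trie holds.
descend :: String -> Trie a -> Maybe (Trie a)
descend []     t = Just t
descend (k:ks) t = Map.lookup k (children t) >>= descend ks

-- | All items at or below a node. The list is produced lazily, so a
-- caller that only wants the first few results does not pay for the rest.
collect :: Trie a -> [a]
collect (Trie vs cs) = vs ++ concatMap collect (Map.elems cs)

-- | Case-insensitive prefix search.
searchPrefix :: String -> Trie a -> [a]
searchPrefix q = maybe [] collect . descend (map toLower q)

-- | Build a trie from (name, item) pairs, lower-casing the names.
fromList :: [(String, a)] -> Trie a
fromList = foldr (\(k, v) -> insert (map toLower k) v) empty
```

For example, `take 10 (searchPrefix "map" db)` walks three nodes and then lazily enumerates matches. A file-based variant would presumably replace the child map with offsets into the file, so that only the nodes on the query path are read.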

The type searching also needs a massive speed up. I have some ideas on how to proceed, but it is a difficult problem! I will spend a small amount of time investigating this problem, but may have to use a simpler algorithm, rather than delay the rest of the project.

# Better deployment

Currently there is Hoogle for the base libraries, and a special (very hacked) version that supports Gtk2hs only. I have received several requests for custom Hoogle instances for tools such as XMonad, Ycr2js, wxHaskell etc. The new Hoogle will make deployment of individual versions for specific packages easy.

# Support for multiple packages

I wish to support searching through every package on Hackage at once. This requires a massive speed up in the searching algorithms.

# Generalised text searching

By searching both function names and Cabal package descriptions, Hoogle can be much more useful for finding packages, as opposed to individual functions.

# Better design

Hoogle 3 is a web application with a hacked-on command line program. Hoogle 4 will be a central API which can be reused from IDE tools, and also used to build the web interface and the command line application.
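The shape of such a design might look roughly like the sketch below: one search function in the middle, with thin front ends that only differ in how they render results. Everything here is hypothetical and chosen for illustration (the `Item` and `Database` types, the substring match in `search`, and both renderers); it is not Hoogle's actual API.

```haskell
import Data.Char (toLower)
import Data.List (isInfixOf)

-- Hypothetical types, for illustration only.
data Item = Item { itemName :: String, itemType :: String }

newtype Database = Database [Item]

-- | The single search entry point shared by every front end.
-- A naive case-insensitive substring match stands in for real searching.
search :: Database -> String -> [Item]
search (Database items) q =
  filter (\i -> map toLower q `isInfixOf` map toLower (itemName i)) items

-- | Command line front end: plain text, one result per line.
renderText :: [Item] -> String
renderText = unlines . map (\i -> itemName i ++ " :: " ++ itemType i)

-- | Web front end: the same results as an HTML list.
renderHtml :: [Item] -> String
renderHtml items = "<ul>" ++ concatMap li items ++ "</ul>"
  where
    li i = "<li>" ++ escape (itemName i ++ " :: " ++ itemType i) ++ "</li>"

-- | Minimal HTML escaping, needed because types contain '>' in "->".
escape :: String -> String
escape = concatMap esc
  where
    esc '<' = "&lt;"
    esc '>' = "&gt;"
    esc '&' = "&amp;"
    esc c   = [c]
```

An IDE plugin would call `search` directly and present the `Item` values however it likes, without going through either renderer.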

# Generalised interface to all of Cabal

Hopefully all the above goals will result in a tool that is suitable to be an interface to Cabal.

What deliverables do you think are reasonable targets? Can you outline an approximate schedule of milestones?

I would plan to release a beta of Hoogle 4 approximately half way through the project, as a web application. Much of the initial design has been done, so this is primarily hacking time.

I would then hope to complete the final Hackage integration in the second half. This stage will require discussion with the Cabal people, and will be a combination of design, implementation and server administration/setup.

In what ways will this project benefit the wider Haskell community?

Hoogle is already of use to the community, but has never seen a final release, and has a reasonable number of known bugs. This project would produce a polished version of a tool for which we already know there is huge demand.

Hackage is working well, and gaining new packages every day. As the number of packages increases, the interface to Hackage must be updated to handle this volume. Discussions with some of the Hackage/Cabal team suggest that a search interface is the way forward. By making Hackage easier to use, everyone benefits.
