Optimization Functions in Julia

Update 10/30/2013: Since this post was written, Julia has acquired a large body of optimization tools, which have been grouped under the heading of JuliaOpt.

Over the last few weeks, I’ve made a concerted effort to develop a basic suite of optimization algorithms for Julia, so that Matlab programmers used to fminunc() and R programmers used to optim() can start transitioning code that depends on simple optimization algorithms like L-BFGS and the Nelder-Mead method over to Julia.

As of today, I believe that enough progress has been made to justify publicly announcing the first draft of an optimization suite for Julia tentatively called optim.jl. I intend to keep working on optim.jl for the next few months and expect that it will eventually become one of the first Julia packages.

For an end-user, interactions with optim.jl should take place almost entirely through the `optimize()` function, which I demonstrate how to use below in the context of minimizing the Rosenbrock function:

load("src/init.jl")

# The Rosenbrock function along with its analytic gradient and Hessian.
function rosenbrock(x::Vector)
  (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2
end

function rosenbrock_gradient(x::Vector)
  [-2.0 * (1.0 - x[1]) - 400.0 * (x[2] - x[1]^2) * x[1],
   200.0 * (x[2] - x[1]^2)]
end

function rosenbrock_hessian(x::Vector)
  h = zeros(2, 2)
  h[1, 1] = 2.0 - 400.0 * x[2] + 1200.0 * x[1]^2
  h[1, 2] = -400.0 * x[1]
  h[2, 1] = -400.0 * x[1]
  h[2, 2] = 200.0
  h
end

# Bundle the objective, gradient, Hessian, starting point and known
# solution into a single problem specification.
problem = Dict()
problem[:f] = rosenbrock
problem[:g] = rosenbrock_gradient
problem[:h] = rosenbrock_hessian
problem[:initial_x] = [0.0, 0.0]
problem[:solution] = [1.0, 1.0]

# The algorithms currently provided by optim.jl.
algorithms = ["naive_gradient_descent",
              "gradient_descent",
              "newton",
              "bfgs",
              "l-bfgs",
              "nelder-mead",
              "sa"]

# Run every algorithm on the problem from the same starting point
# and print the results.
for algorithm = algorithms
  results = optimize(problem[:f],
                     problem[:g],
                     problem[:h],
                     problem[:initial_x],
                     algorithm,
                     10e-8,
                     true)
  print(results)
end

I think the code in my current draft of optim.jl is basically usable, but I’m still a little unsure of its correctness as I’ve never developed optimization algorithms before. I’ve set up a basic suite of benchmark functions to test these functions, but I could really use some additional use cases and benchmarks. If you’re interested, please submit pull requests through GitHub that provide new functions that you want optim.jl to be able to minimize automatically. If you submit a function, please provide the function itself, its gradient, its Hessian, a starting point and the global minimum of the function.
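To make that format concrete, here is a hypothetical example of what a submission might look like for a simple one-dimensional shifted parabola, mirroring the Dict layout from the Rosenbrock example above; the function, its name and its minimum at 3.0 are invented purely for illustration.

# A made-up test problem: f(x) = (x - 3)^2, minimized at x = 3.
function parabola(x::Vector)
  (x[1] - 3.0)^2
end

function parabola_gradient(x::Vector)
  [2.0 * (x[1] - 3.0)]
end

function parabola_hessian(x::Vector)
  2.0 * ones(1, 1)
end

parabola_problem = Dict()
parabola_problem[:f] = parabola
parabola_problem[:g] = parabola_gradient
parabola_problem[:h] = parabola_hessian
parabola_problem[:initial_x] = [0.0]
parabola_problem[:solution] = [3.0]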

I’ve already set up five test functions as benchmarks, which are:

  1. A simple exponential function.
  2. A simple parabolic function.
  3. A simple 4th-degree polynomial function.
  4. The Rosenbrock function.
  5. The Powell function.

The code for these five functions can be seen on GitHub along with drafts of a few other functions.

Below you can see plots that depict the performance of the seven algorithms that are currently available on each of these five problems. In some of these plots the bar for Naive Gradient Descent drops out: that’s because the constant step-size gradient descent algorithm behaves pathologically and produces NaNs at some point. I’d stay away from that function unless you’re sure it’s what you want.

Stepping away from the perils of hard-coding a step-size for gradient descent, I think the takeaway message from these plots is that you should use BFGS, L-BFGS or Newton’s Method if you can compute or safely approximate the Hessian of a function. Only fall back on the other four methods if you find that you really need to avoid using even approximate Hessians or can’t compute the gradient of your to-be-minimized function.

The next few steps are to add a few more optimization methods, including Brent’s method and conjugate gradients; then start working on constrained optimization; and finally get to work on incorporating automatic differentiation and finite differencing. With those pieces in place, I think we’ll have a basic optimization function for Julia that will let us keep up with R’s optim. I think we’ve got a long way to go to catch up with fminunc, but I’m sure we’ll get there by the time Julia hits a 1.0 release.
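To give a flavor of the finite differencing piece, here is a minimal sketch of a central-difference gradient approximation; it is not the implementation planned for optim.jl, and the function name and fixed step size are arbitrary choices for illustration.

# Approximate the gradient of f at x using central differences with step h.
# A real implementation would choose h more carefully than this sketch does.
function finite_difference_gradient(f::Function, x::Vector, h::Float64)
  g = zeros(length(x))
  for i = 1:length(x)
    e = zeros(length(x))
    e[i] = h
    g[i] = (f(x + e) - f(x - e)) / (2.0 * h)
  end
  g
end

# For example, compare against the analytic Rosenbrock gradient at [0.0, 0.0]:
finite_difference_gradient(rosenbrock, [0.0, 0.0], 1.0e-6)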

[Plots: run times and solution error for each algorithm on the five benchmark problems]

15 responses to “Optimization Functions in Julia”

  1. Harlan

    Oh, very nice! Have you tried any back-of-the-envelope speed benchmarks versus Matlab or R yet? Just curious whether the pure-Julia is within an order of magnitude of the highly-optimized Fortran…

  2. John Myles White

    I haven’t done any comparisons with Matlab or R for speed so far. Right now I haven’t even begun to consider optimized code, since correctness is still a little up in the air. The Fortran code is very optimized for sure, but also so opaque as to be essentially uneditable. We may be slower, but we should be much more interpretable in the near future.

  3. gappy3000

    John, I don’t want to rain on your parade, but… I would not recommend implementing optimization algorithms from scratch in Julia. The rationale for this is identical to the one according to which it’s not wise to implement basic linear algebra operators from scratch in any language other than FORTRAN, and in all cases it’s better to link to good BLAS and LAPACK libraries. Much more than in the linear algebra case, the performance ratio between a library developed by an expert and a vanilla implementation of a vanilla algorithm can be very high. To get an idea of the performance variability among “expert” implementations, check http://plato.asu.edu/bench.html . It’s not uncommon to see ratios of 100x. And I can guarantee that the ratio between the slowest expert implementation and a vanilla one is much higher than 100x. Put together the two things, and you easily get slowdowns of 10^4 (when you get to a solution — not guaranteed!). Just to give you some context, IPOPT (which is probably the fastest free library, even if it’s slower than most commercial libraries) has been developed almost full time by Andreas Wachter for about 10 years with contributions from others, first at IBM Research and then at Northwestern. Besides speed considerations, there are always considerations of DRY and comparative advantage.

    My recommendation would be to: i. check out http://www.coin-or.org/ (the repository for open-source optimization code); ii. target a few problem classes, perhaps Convex and Quadratic first and Linear and Mixed Integer much later; iii. identify the best-of-breed libraries; iv. link up with the core developers and get guidance in developing a Julia API for these packages. A part of this could be a great summer of code project.

    By the way, both R and MATLAB have poor optimization support. This could be a real differentiator for Julia.

  4. John Myles White

    Hi Giuseppe,

    Thanks so much for the input. I didn’t realize that the expert libraries were so substantially more performant and that the situation would be analogous to something like a BLAS.

    The trouble for me is that the code for the open versions of Fortran libraries, like Nocedal’s L-BFGS code, is essentially unreadable, which is not desirable from my standpoint as a scientist.

    What do you think of the merits of allowing pure Julia implementations to co-exist with wrappers for long-standing open libraries? Pure folly?

    I’m also surprised to hear that Matlab’s optimization support is considered poor; while I can easily imagine that being the case for an expert in optimization, it’s also the case that many of the best major machine learning labs in the world are using Matlab’s optimization functions to engineer their software.

  5. gappy3000

    Answering from the end because it’s fun: Yes, MATLAB’s optimization is utter crap. I may be mistaken, but all the ML libraries I have seen in MATLAB either implement the core numerical routines in C/C++, or code the algorithms from scratch, the point being that one can obtain speedups by specializing the approach. I don’t know of libraries that have dependencies on the optimization toolbox. Happy to be proven wrong though.

    Hmm, pure Julia implementations… in a way, this avenue was attempted both with LISP and with Java, where you can find plenty of native numerical linear algebra libraries. For these two languages, the approach didn’t work in the long run because the performance penalty was so high, and because eventually no one maintained them. Optimization is used less than matrix multiplication of course, but the penalty is even higher. My qualified answer is: for hacking and very small-scale problems, I see no issues. But I think the approach will run out of breath quickly. Optimization is a core numerical capability of a scientific environment, and it’s underserved by scripting languages, leaving us to spend anywhere between 5K and 100K/yr on optimization software. It deserves to be taken seriously, and Julia can do it. Another (experimental) approach would be to interest a PhD student or an optimization expert in writing a convex optimization solver in Julia. Stanford still has groups in this line of research (Boyd at EE-LIDS, Murray at MS&E).

    Agreed: Nocedal’s code is unreadable, slow, old and unmaintained. But, as mentioned above, there’s better code that is faster and maintained.

  6. gappy3000

    Yes, searching for ICML fminunc, I can see a few instances where these functions are used. My take would be: wherever optim() worked for you, probably some home-made code can work as well. Although MATLAB’s functions may fail (typically indicated by a timeout), I don’t know that they give the wrong solution on convex problems (not to be ruled out — this happens in commercial implementations too). If you go along the route of implementing an optimization routine in Julia, my recommendation would be to focus on one or two algorithms, and put together a decent test suite for them (the benchmarking site I linked to before has some), to make them somewhat trustworthy.

    I guess that it boils down to what one’s personal goals are, and more broadly the goals for the language. As a research tool, everything is fair game, even implementing an SVD from scratch. For publication, I would be a bit wary of home-made numerical routines, more on trustworthiness/correctness concerns than performance. For anything that is large-scale or vaguely production-related, I am afraid there is no choice.

    I think Julia may benefit in the short run from this, though. At least, it’s a proof of concept, and if you can show that the performance/results are on par with R, it’s quite a good one. Still, I also believe that good libraries drive bad ones out of circulation (a sort of software Gresham’s law in reverse…): as soon as a solid and accepted library is available, it becomes a focal point and the ad hoc implementations wither away. To see this in action, see what CVXOPT did to optimization and Pandas did to data manipulation for Python.

    In light of this… wouldn’t you allocate your coding/thinking time on projects/ideas with the highest probability of survival?

  7. Blake Johnson

    I would tend to agree with John, that having a library with readable code is often more important to me than having the highest performance. Julia is currently lacking even very simple things that I need on a day-to-day basis, for example: non-linear least-squares fitting (unconstrained and constrained), like MATLAB’s nlinfit and lsqnonlin (statistics toolbox) or SciPy’s curve_fit. In both cases, the code is written in the native language of the tool.

    As an aside, there are modern alternatives to Fortran BLAS libraries. For example, Eigen (http://eigen.tuxfamily.org/) is essentially as fast as or faster than any Fortran-based implementation, while also providing an interface that rejects the ridiculous dgemm(…) paradigm.

  8. John Myles White

    At this point, my view of this topic is simple: defining a clean API for simple optimization is what really matters. As time goes on, we can switch to the best backend possible for that API. For now, done is better than perfect.

  9. Michele

    Hello John,
    Do you have any plan to implement constrained optimization? Or is there a quick way to do it using optim.jl?
    Thank you,
    michele

  10. John Myles White

    Right now, there’s one function in Optim.jl called fminbox that provides a mechanism for doing constrained optimization. We’ll add others after I graduate. If fminbox doesn’t work, you might try creating a function that maps the real line into the constrained space in which you need to work, then doing optimization in that space. That approach, of course, assumes that the solution isn’t on the boundary of the input domain.
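    For concreteness, here is a sketch of that transformation trick for a positivity constraint; the objective and names here are invented for illustration, and I’ve left out the exact optimize() call since the interface may still change.

    # Suppose we need x > 0: optimize over an unconstrained y and map it
    # through exp() before evaluating the objective. The minimizer in x is
    # exp() of the minimizer in y, provided it isn't on the boundary x = 0.
    constrained_objective(x::Vector) = (x[1] - 2.0)^2
    unconstrained_objective(y::Vector) = constrained_objective([exp(y[1])])
    # Minimize unconstrained_objective over y (e.g. with Nelder-Mead), then
    # recover the constrained solution as x = exp(y) at the optimum.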

  11. honfui

    Hi John,

    I’m looking for SQP based solver, like Rsolnp or alabama in R language.
    Do you have any idea where I can find it in Julia? Thanks
    hon_fui

  12. John Myles White

    Right now there are no QP solvers in Julia. Hopefully we’ll have one soon.

  13. Guilherme Freitas

    Some thoughts (sorry for not elaborating, it’s a tough night):

    1. AMPL [1] is awesome and maybe a “default” Julia optimization package can take many lessons from them. If we could use their solver interface (which I *think* is open source), we would have immediate access to a large number of industry-grade solvers already interfaced to AMPL.

    2. Something like CVXOPT [2] would be beyond awesome.

    3. Other Python packages to look at: Pyomo [3] (a project by William Hart at Sandia National Labs) and NLpy [4] (by Dominique Orban at Ecole Polytechnique de Montreal) are also worth a look for ideas. Pyomo uses the AMPL solver interface trick I mentioned above. I think NLpy also reuses some part of AMPL. There are some interesting design decisions there. I also like the syntax of PuLP [5] quite a bit.

    4. Important solvers to have: IPOPT [6] (interior point) and ALGENCAN [7] (active set). They are certainly among the best (if not *the* best) open source, smooth nonlinear solvers out there.

    5. Automatic differentiation [8, 9] would be extremely useful. There are many libraries out there, and a very good book by Andreas Griewank published by SIAM.

    6. A solver for systems of smooth nonlinear equations would be great. For something readable, maybe the solvers in Kelley’s little book [10] published by SIAM would be ideal… Well documented, high-quality and very readable. The book has MATLAB implementations that could be easily ported.

    7. Good linear programming support. I am guessing clp [11] by the COIN-OR foundation would be among the best. A good simplex solver that could use fractions or arbitrary precision numbers would be great too! I think GLPK [12] does that already, need to check.

    8. Talking about GLPK, a lot of people would be well served by a good interface to GLPK and the GSL [13]. There is a GSL-Shell project [14] in Lua that might be worth looking at.

    I am not an expert in optimization, but I am an enthusiast, a user and I know a fair amount of math and programming. I would love to contribute. Send me an email if you want to coordinate something! I found out about Julia recently, and it looks neat!

    [1] http://www.ampl.com
    [2] http://abel.ee.ucla.edu/cvxopt/
    [3] https://software.sandia.gov/trac/coopr/wiki/Pyomo
    [4] https://github.com/dpo/nlpy
    [5] http://pythonhosted.org/PuLP/index.html
    [6] https://projects.coin-or.org/Ipopt
    [7] http://www.ime.usp.br/~egbirgin/tango/codes.php
    [8] http://www.autodiff.org/
    [9] https://en.wikipedia.org/wiki/Automatic_differentiation
    [10] http://www.ec-securehost.com/SIAM/FA01.html
    [11] http://www.coin-or.org/Clp/
    [12] https://www.gnu.org/software/glpk/
    [13] https://www.gnu.org/software/gsl/
    [14] http://www.nongnu.org/gsl-shell/

  14. PhilM

    I got here while searching for an fminunc implementation in Javascript! The comments about not implementing optimization algorithms natively are quite amusing. I have seen my share of fast and optimized code that is totally unreadable. Having written some optimized code myself, in C, the techniques I followed then, which resulted in somewhat unreadable code, are not so important today. I haven’t programmed in Fortran for over thirty years and I am really glad that I don’t.

    The advice to not do things yourself but to wrap someone else’s work is dished out so frequently that my concern is just about the opposite. This commonsense and wise approach of reuse is probably killing the birth of better-structured and modern implementations of many esoteric algorithms. I am happy to see some people lack this commonsense and wisdom and commit follies!

    I just think it is fine to reinvent the wheel. People do it constantly and come up with something better. If nothing else, you end up learning something worthwhile.

    P