I Love Julia (stitchfix.com)
106 points by tokai on Dec 5, 2014 | hide | past | favorite | 57 comments


I love Julia too, but it currently has a bug which kills my desired workflow: the plotting library (Gadfly) takes 30 seconds to import.

Yes, there are a number of workarounds. You can use python or R for plotting, you can keep a single interactive session going and reload your source file in it, you can precompile Gadfly into Julia's global precompiled blob, etc, but the solutions take time and effort that significantly offset Julia's value proposition (for me, at least). If you haven't looked at Julia yet you might want to hold off for a while until they get it sorted out. It looks like it might happen in the next version, 0.4, which hopefully will support pre-compiled libraries.

I feel like a bit of a dick for complaining about hiccups in a beta version, but I really like Julia and I really don't want to see it go the Clojure route and simply accept several-second delays in oft-repeated tasks. Drake, I'm looking at you -- Make can build my entire C++ codebase in less time than it takes your "make-replacement" to launch! I love your features, but the stuck-in-molasses feeling I get when using your program was enough to scare me away.


We're definitely not settling for several-second delays. This will get better soon.


Great to hear! Thanks for all the hard work, btw. Julia seems to be steaming ahead at breakneck pace, it's exciting to be along for the ride!


Have you tried the userimg.jl hack? It's _very_ hacky, but it will convince you that precompiling packages is viable in the short-term.


IIRC I put ~30 minutes into trying to figure it out, hit a roadblock, and went back to using R here-documents (err, multi-line strings). Just now I put ~5 minutes into trying to figure out what the roadblock was but I can't find much info on the compilation process. I'd be very grateful for any pointers you have :)


If you're compiling Julia from source, you should be able to do something like write a base/userimg.jl file that contains this kind of text:

  Base.require("Distributions.jl")
  Base.require("Optim.jl")
  Base.require("DataFrames.jl")
  Base.require("Gadfly.jl")

You'll need to create a new build after adding this userimg.jl file. Also be aware that changes to these packages will stop showing up when you type "using Gadfly" since you'll always access your precompiled version.


You know, one of the things that has annoyed me about Python isn't that it's slow; it's that the BDFL seemed unconcerned about addressing performance. It is what it is, you can't fault them for having their priorities, but it bothered me. PyPy has done an excellent job of addressing it to an extent, but I'm really happy that there's a language like this that seems to care about making things fast without making them incredibly verbose -- I am impressed.


Yeah, I feel the same way. If I needed to do scientific computing work, Julia would be one of my choices.


Not completely unconcerned: the PSF did donate $10k to PyPy last month, so they do recognize that speed matters, even if it isn't CPython's priority.

(and PyPy does break some python APIs and expectations, speed does come at a small cost, after all)


PyPy breaks CPython APIs. Python language stuff is very well supported.


I was specifically thinking of `gc` at the time of writing, which is a part of python, not CPython. (though it only has reason to exist as a part of CPython)

but yeah, for... probably 90%+ of use-cases, it is the PyPy<=>CPython differences that are more notable


nim (formerly nimrod) is another such language with a vaguely pythonic feel: http://nim-lang.org/


I guess LANGUAGE_NAME should have been a variable imported by their site templates. The change was back in August? b^)


Multiple dispatch alone makes Julia lovable. I'm not a lisper, so I didn't really get the point until trying Julia. It's a beautiful way to solve the expression problem [1], and a nice alternative to pattern matching in functional languages.

1: http://c2.com/cgi/wiki?ExpressionProblem
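For contrast, Python's standard library offers only single dispatch via `functools.singledispatch`, which selects an implementation based on the type of the first argument alone; Julia's multiple dispatch considers the types of all arguments at once. A minimal Python sketch of the single-dispatch half of the idea (function names here are illustrative):

```python
from functools import singledispatch

# singledispatch picks an implementation from the type of the FIRST
# argument only; Julia dispatches on every argument's type.

@singledispatch
def describe(x):
    return "something"

@describe.register(int)
def _(x):
    return "an integer"

@describe.register(str)
def _(x):
    return "a string"

print(describe(3))     # an integer
print(describe("hi"))  # a string
print(describe(3.5))   # something
```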


R's S4 classes have multiple dispatch too.


I want to love Julia. The core language is excellently designed and better than the alternatives, although the performance in practice is not nearly as good as the benchmarks would have you believe. String manipulation and building recursive data structures composed of small types (trees) are especially slow, considerably slower than Python.

The libraries, on the other hand... I know it's new, but the DataFrames.jl package in particular gave me fits. Data frames are essential tools for statistics, and there are many problems. When I last used it, it took several minutes to load modest 10MB TSV matrices, and segfaulted entirely on slightly larger ones. It doesn't support indexes on both axes, and the developers made the extremely questionable decision to require that index names be valid symbols. I could go on.

I think the core developers should exercise more control over the library ecosystem, at least for the packages that are crucial to the type of workflow they're building the language for.


FWIW, I think the DataFrames package and its dependencies have consistently operated at the boundaries of what we know how to do efficiently in Julia. The package has had lackluster performance in many contexts primarily because it adopted many idioms from R and Python that were sharply at odds with Julia's type inference system. We're starting to clear those problems up, but there are still lots of unsolved challenges we need to resolve.

If you have any ideas about how we should modify the basic data types and functions defined in DataFrames, those ideas would go a long way to making Julia a better language.


Sorry, really late reply.

I fully appreciate that the type system imposes constraints that don't exist in Python or R. For my purposes in particular, and I think many people, I don't actually need a full-fledged data frame with heterogeneous types. What I actually want is a numeric matrix with labels on both axes and good methods for querying, group-by operations, etc. (And an equivalent numeric Series type). Big bonus for memory mapping and/or fast I/O.

I think this is an easier problem to solve, especially since factors and ordinals can be considered as a special type of numeric.

It has been too long since I've looked at the internal code structure of DataFrames.jl, but I think the biggest design flaws at the time were the requirements of index names to be symbols (probably should either be a flat String, or a choice between String and Int64), and axes on columns only. I can only assume the symbol decision was made for performance but you surely have worked with datasets given by investigators that use all kinds of random conventions for index names that don't fit the constraints of a symbol. Not to mention the very common case of numeric index names. I find it very annoying to read such a file in R and get "X1000" or whatever as my index names.

I actually tried briefly to dive in and fix the I/O problems, but the code style was daunting -- a few, very huge functions. If it hasn't been done, I would suggest breaking it up a little.

Anyway, I didn't mean to be overly critical -- I think you're doing a very important task -- but as an honest assessment of why I, as a busy scientist, found Julia to be more trouble than it was worth.


The author of the article says that the relative paucity of libraries (compared to Python and R) shouldn't stop one from using Julia. I completely second this for another reason: the awesome PyCall package, with which you can make use of any Python library.


I learned Julia in a week and implemented an EM test particle integrator in a couple of hours. Julia is fantastic, easy, and fast.


Instead of Julia Studio, I would recommend Juno[0], the IDE based off of LightTable. While it still has some rough edges, it works very well.

[0]: http://junolab.org


> If you want to install a branch instead of master in a repo, you can do Pkg.checkout("name_of_package", branch="a_branch"). This kind of package management is much better than what is currently available for Python packaging.

Is it different from pip install git+...? See https://pip.pypa.io/en/latest/reference/pip_install.html#vcs...
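For reference, the pip VCS syntax being compared looks like this (the repository URL and branch name are placeholders):

```shell
# install a Python package from a specific git branch
# using pip's documented VCS support
pip install "git+https://github.com/user/project.git@a_branch"
```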


The most interesting thing from this, for those of us who know Julia, is that JuliaBox is open again without an invite.


https://juliabox.org/ for those interested in trying it. If enough people try it, there may be a queue, but there are no invites required anymore.


We recently added support for IJulia Notebooks in Domino [0]. Same idea: one-click, fully hosted, scalable hardware, with version control and collaboration. More generally, we're excited to do more for the Julia community. If anyone has feature requests to improve our Julia support, please let us know.

[0] http://www.dominodatalab.com


I feel bad about sharing JuliaBox here some months ago and clogging the service up with traffic. I hope it didn't cause too much of a headache for you Julia guys.


No worries, it was a good kick-in-the-pants to work on scalability of the service :-)


Is there a syntactical reason why Julia can be fast but Python can't? Would it be possible to use Python syntax and get the same LLVM performance? (aside from the difficulty of writing a new interpreter)

Put another way, does Julia sacrifice something relative to Python? Are the objects less flexible?

Could I write Python that compiles to Julia, without losing features?


If you're interested in this topic, you should watch this talk by Steven G. Johnson: https://www.youtube.com/watch?v=jhlVHoeB05A

tl;dr of the talk: Syntax has little to no effect on how a language performs. What distinguishes Julia from Python is that Julia's semantics were designed to be amenable to type inference. The results of type inference allow Julia's compiler to generate very efficient machine code.
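One concrete facet of "amenable to type inference" is type stability: a function whose result type follows from its argument types alone. A hedged sketch of the idea, written in Python for readability (function names are illustrative, and in real Julia this is a property the compiler exploits, not something Python's runtime does):

```python
# A "type-unstable" function: the result type depends on the runtime
# VALUE, not just the argument types, so a type-inferring compiler
# cannot assign one concrete return type and must fall back to boxed
# values and dynamic checks.
def unstable(x):
    if x > 0:
        return x         # int stays int for positive input...
    return float(x)      # ...but becomes float otherwise

# The type-stable version always maps int -> float, which is what lets
# a compiler like Julia's emit tight, specialized machine code.
def stable(x):
    return float(x)

print(type(unstable(1)).__name__, type(unstable(-1)).__name__)  # int float
print(type(stable(1)).__name__)                                  # float
```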


There are a few concerns I have about Julia packaging, from this article and from reading the docs: http://julia.readthedocs.org/en/latest/manual/packages/

* It looks like packages are installed in a global namespace that is shared by all projects. This seems like it will get messy when you try to run older and newer projects.

* The default way to add packages is just Pkg.add("package-name"), which makes reproducible builds difficult. This is especially an issue for a language used in scientific contexts, where reproducibility is extremely important.

Are there solutions to these issues that I can't see? I'm aware that Julia is a young language so I don't expect them to solve everything at once.


Does the Julia community have typesetting tools like knitr + LaTeX? I.e., can I embed Julia code in a LaTeX document the same way I can with R code? That is one feature that could convert me very quickly.


Not quite native, but you can use Knitr with Julia. Carson Sievert (IA State stats grad student) has written up some slides:

http://heike.github.io/stat590f/gadfly/carson-knitr

IJulia[1] is probably the closest community-supported equivalent, but it's based on IPython.

[1]: https://github.com/JuliaLang/IJulia.jl


There is also Weave.jl, which is young but looks like it is coming along very nicely: https://github.com/mpastell/Weave.jl


I do wonder why Julia caught on where lush died. Any thoughts?


I don't know why lush died, but Julia's Matlab-like syntax is a big deal for me. Not because I especially like that syntax (although it's grown on me as I've used more Julia), but because it's going to be a lot easier to get coauthors, students, etc., to use the language, and because there's a lot of code relevant to what I do that's written in Matlab, and porting it (hopefully) should be relatively straightforward.


There are probably a few reasons.

Julia has roots with MIT which immediately lends it some cachet, and probably made it easier to grow a vibrant community.

Lush (which is an unfortunate name btw) used Lisp syntax, which has never been favored by math, science and engineering types. It seems obvious that equations as expressed in the language should closely resemble those used in actual math - thus the syntax of languages like Fortran, Matlab and Julia.

Writing (or reading):

  (setq vx (+ vx (* ax deltat))) ; update velocity

is much more awkward than:

  vx += ax * deltat  # update velocity
This issue gets worse with more complex examples.

Lush also allowed inline C, which was probably a bad design choice. Julia allows painless C library calls, which is much cleaner.

Those are just a few thoughts off the top of my head...


Which, btw., is written in Lisp as:

    (incf vx (* ax deltat))
The equivalent of += is INCF.


I'm not sure that syntax was available in Lush. The line I cited was from a Lush example program.

http://lush.sourceforge.net/lander04.txt


In Lush a similar construct is called INCR. See its documentation. INCF in Common Lisp might be more general, since it supports a concept called place.


i've always wondered why lush never caught on. my speculation is that the intersection of people who like lisps and people who don't mind the lack of lexical scoping is very low.


It's still early days for Julia, and performance is uneven. I wouldn't use it for serious work unless 1) an expert in your field is already using it (e.g. Udell and Convex.jl) or 2) you carefully benchmark your key computations. In my case, I wrote C++ and Python benchmarks and stumbled on a performance problem that the Julia team knew about and plans to address.


Another interesting article about Julia by Evan Miller (was on HN a while ago): http://www.evanmiller.org/why-im-betting-on-julia.html

HN thread about it: https://news.ycombinator.com/item?id=7109982


Is Julia being used in production at StitchFix?


How can Julia be "on par" with C when it is itself implemented in C?


Both C and Julia are compiled to machine code before they are executed. C is compiled Ahead Of Time, and Julia compiles new code as it runs, Just In Time, whenever a function is called with argument types it hasn't seen before.

Parts of Julia's library and compiler are implemented in C, but this actually isn't very relevant to the speed of the generated machine code that actually runs.

Statements about Julia being "on par" with C mean that if you write code in a straightforward way to solve some problem, e.g. "find the three largest even integers in a collection," then Julia is capable of generating machine code that executes with efficiency "on par" with the machine code that C generates.

The "straightforward" part in the last paragraph is actually important. You could in principle solve this problem in any language by writing your own machine code generator in that language, and then the distinction between efficiency of different languages breaks down. But usually you won't do that, and so usually the distinction does have some meaning.
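To make the example concrete, here is the "find the three largest even integers in a collection" problem written in the plain, straightforward style the parent comment means (shown in Python; in Julia, code this plain is what the compiler can turn into C-like machine code):

```python
# Straightforward solution: filter the evens, sort descending, take three.
def three_largest_even(xs):
    evens = sorted((x for x in xs if x % 2 == 0), reverse=True)
    return evens[:3]

print(three_largest_even([7, 2, 8, 3, 10, 4, 5]))  # [10, 8, 4]
```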


GNU's assembler is implemented in C.


Because Julia has been carefully designed to allow functions to compile down to very fast machine code. There are a few important design choices that are necessary to make it possible to do this (type stability, etc) - there are a few talks about the design principles that went into making Julia.

However, numerical Python can be nearly as fast as C as well with very, very little additional work (using Numba means adding @jit on top of a function). The downside is that Numba only works on the 'numpy' subset of Python, basically.
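The Numba pattern the comment describes is roughly this: decorate a numeric function with `@jit` and Numba compiles it to machine code on first call. A minimal sketch, with a no-op fallback decorator so it still runs where Numba isn't installed (the fallback and function names are illustrative, not part of Numba's API):

```python
# Sketch of the @jit pattern; assumes the function sticks to the
# numeric subset Numba supports.
try:
    from numba import jit
except ImportError:
    def jit(func=None, **kwargs):  # no-op stand-in when numba is absent
        if func is None:
            return lambda f: f
        return func

@jit
def dot(a, b):
    # simple loop: exactly the kind of numeric code JIT compilers speed up
    total = 0.0
    for i in range(len(a)):
        total += a[i] * b[i]
    return total

print(dot([1.0, 2.0], [3.0, 4.0]))  # 11.0
```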


Sounds like snake oil to me.


You could, of course, just download a Julia distribution, fire up the interpreter and see for yourself. But I'm sure unfounded snark is much, much better.


If you want to know the catch, it is that there is high latency on first run, and a full compiler is required at runtime. Julia is optimized for numerical work, where long runtimes make slow startup irrelevant and the compilation latency is easily amortized.

But if you are suspicious, there are introspection utilities that let you see the generated native code. Give it a try.


As I understand it there's a plan to produce stand-alone binaries downstream, with no compiler needed at runtime.

It's using LLVM on the backend, so it should be quite possible.


Oh wow, another person evangelizing about X obscure language from inside their pseudoacademic ivory tower yet providing no examples of anything useful they've done with it, or anyone has done with it for that matter.

Every time I see this BS I think of this:

https://www.youtube.com/watch?v=lrp57IAlh84

It's fine to say something more reasonable like "I have high hopes for this very early stage language in the future", but this kind of fanfare is the reason why stuff like Java got so big.


I'm not really sure what the point of this post is. Julia is a fun language to play with, it has a great and very helpful community and, in its niche, it's very good. Its creators have been very good OSS citizens and joined forces with IPython.

If you don't like it, don't use it, but don't piss and moan about the fact that some people enjoy it.


I hate Julia. It is not about the language but rather about the community surrounding it. Take for example this post. I have nothing against the author, but everyone using Julia pretends they have leveled the playing field. That they need more than C but don't want to waste their time with Assembly because they are too superior. It is a bit like the Arch community, where superiority is claimed and all alternatives are dumbed down because they are not at the cool table.

On principle I don't engage with products like these, because you either become part of them or will never get truly involved.


I've had the opposite experience. The mailing lists are friendly and, to the extent that I've seen people involved with the project claim superiority, it's in the very narrow context of scientific computing. (I mean, it's a super-early language project. Most of the people using it are people who _like_ new programming languages.)


If you generally "engage with products" then I'm not sure Julia is for you.


Julia is a niche language... your comment is quite vitriolic just to point out that it's not in your niche.



