From peterhull90 at gmail.com Sun May 1 13:18:03 2011 From: peterhull90 at gmail.com (Peter Hull) Date: Sun, 1 May 2011 21:18:03 +0100 Subject: [rust-dev] LLVM patch for OS X Message-ID: In the wiki (https://github.com/graydon/rust/wiki/Getting-started) it says "On the Mac (Darwin), you need to apply a small patch to LLVM in order to use the standard library under rustc." However it seems that the patch is now applied to the LLVM upstream (since r129974) Is this correct? Pete From graydon at mozilla.com Sun May 1 13:21:05 2011 From: graydon at mozilla.com (Graydon Hoare) Date: Sun, 01 May 2011 13:21:05 -0700 Subject: [rust-dev] LLVM patch for OS X In-Reply-To: References: Message-ID: <4DBDC0B1.6090803@mozilla.com> On 01/05/2011 1:18 PM, Peter Hull wrote: > In the wiki (https://github.com/graydon/rust/wiki/Getting-started) it > says "On the Mac (Darwin), you need to apply a small patch to LLVM in > order to use the standard library under rustc." However it seems that > the patch is now applied to the LLVM upstream (since r129974) Is this > correct? Yes. We've been feeding patches upstream. I think with today's trunk LLVM one should be fine on all 3 platforms again. -Graydon From graydon at mozilla.com Thu May 5 07:39:35 2011 From: graydon at mozilla.com (Graydon Hoare) Date: Thu, 05 May 2011 07:39:35 -0700 Subject: [rust-dev] status, triage Message-ID: <4DC2B6A7.50700@mozilla.com> Hi, Yesterday we removed rustboot from the build cycle; you can now do a "normal" day-to-day rustc build without ocaml or rustboot at all (configure still sniffs for ocaml, we'll be removing that later today). It works entirely off snapshots. In the near term (next few weeks) I'd like to ask everyone on the team -- at least the group working for mozilla corp! -- to focus strictly on performance enhancement and error checks. That is: make the current build cycle fast and stable. Rustc still takes ~5 minutes to build; we want down to a few seconds or tens of seconds. And it's *essential* that we establish enough error-checking machinery to avoid snapshotting compilers that are too buggy to compile our way out of. Memory safety, for example, is not currently very well guaranteed in rustc. A bunch of rules aren't being enforced yet, and you can pretty easily corrupt memory still. Not cool. In the longer term, we ought to move as quickly as possible towards something that is no longer "not yet suitable for users". That is, a general (if preliminary and "pre alpha" as they say) release that people can "just download" to tinker with, sketch library code for, run experiments on. Obviously we're not there yet, but I suspect we can be by (say) the end of summer or sometime in the fall. I'd like to ask everyone with an opinion, moz corp and otherwise, to do a little mental triage exercise on features they think such an early-release compiler *must* have, at minimum. I'll make a list of possibilities here and note my own choices from it. Followup to list here, I'll tabulate, weigh preferences and try to set some reasonable milestones in the bug tracker. Possible features: - Major syntax changes - Unique pointers and move semantics - Kind system for types, including kind-polymorphism - Cheap "block"-like lambda syntax/semantics, for down-fnargs - Unsafe blocks and raw pointer sublanguage - Resource items (dtors moved out of objs) - Unit testing harness - Tasking system - Thread domains (or task:thread migration) - Typestate system working on general preds - Static overloading or pseudomethods - Object extension and method overriding - Proper native calling convention - Stack growth - Unwinding - Note system - Port to x64 - Port to arm - GC for cyclic kinds - Debug info - Reflection interface - Syntax extension interface - Static check extension interface - Fuzz tester - Library / package manager - Crate APIs / export versioning Personally I think we can ship "something" credible and interesting with only the first 9 on this list (up to and including thread domains, which "work" but may need quite a bit of redesign). But I can imagine wanting some of the remainder, and suspect that the interns (and the gsoc student on llvm!) will have something meaningful to show for the next 4 by end-of-summer anyways. Opinions? Things I missed that you think of as essential? -Graydon From peterhull90 at gmail.com Thu May 5 07:47:48 2011 From: peterhull90 at gmail.com (Peter Hull) Date: Thu, 5 May 2011 15:47:48 +0100 Subject: [rust-dev] status, triage In-Reply-To: <4DC2B6A7.50700@mozilla.com> References: <4DC2B6A7.50700@mozilla.com> Message-ID: On Thu, May 5, 2011 at 3:39 PM, Graydon Hoare wrote: Very exciting news, things are moving really quickly now! > Opinions? Things I missed that you think of as essential? I suppose you're focussing on the coding side but I think there needs to be more 'user-level' documentation/ getting started guide. The time taken to write this should also be taken into account (especially as the language is a bit of a moving target at the moment) Pete From giles at thaumas.net Thu May 5 10:15:27 2011 From: giles at thaumas.net (Ralph Giles) Date: Thu, 5 May 2011 10:15:27 -0700 Subject: [rust-dev] status, triage In-Reply-To: <4DC2B6A7.50700@mozilla.com> References: <4DC2B6A7.50700@mozilla.com> Message-ID: On 5 May 2011 07:39, Graydon Hoare wrote: > - Library / package manager > - Crate APIs / export versioning In addition to documentation, moving these up the list would help early adoption. "Download and install deps for me" engines are ubiquitous features in newer languages; the thing Go had where you could say 'use github:foo/bar' really helped accelerate library experimentation. FWIW, -r From pwalton at mozilla.com Thu May 5 10:17:38 2011 From: pwalton at mozilla.com (Patrick Walton) Date: Thu, 05 May 2011 10:17:38 -0700 Subject: [rust-dev] status, triage In-Reply-To: <4DC2B6A7.50700@mozilla.com> References: <4DC2B6A7.50700@mozilla.com> Message-ID: <4DC2DBB2.4060901@mozilla.com> On 5/5/11 7:39 AM, Graydon Hoare wrote: > Opinions? Things I missed that you think of as essential? The ability for rustc to link binaries is essential IMO. Having to run gcc is a major chore. This shouldn't be that hard in theory, because clang knows how to run the linker on each system. (I started some work on this at jsconf.) Great post though. Patrick From marijnh at gmail.com Thu May 5 11:15:13 2011 From: marijnh at gmail.com (Marijn Haverbeke) Date: Thu, 5 May 2011 20:15:13 +0200 Subject: [rust-dev] status, triage In-Reply-To: References: <4DC2B6A7.50700@mozilla.com> Message-ID: > I suppose you're focussing on the coding side but I think there needs > to be more 'user-level' documentation/ getting started guide. Agreed. Once we feel we're starting to approach the point where we want to ship something, I'm probably a good candidate for writing the introductory guidebook. From respindola at mozilla.com Thu May 5 11:37:59 2011 From: respindola at mozilla.com (Rafael Avila de Espindola) Date: Thu, 05 May 2011 14:37:59 -0400 Subject: [rust-dev] a special pull request Message-ID: <4DC2EE87.5000203@mozilla.com> I have a change that modifies the interaction of rustc generated code and rt. The change has to be merge in two steps now that we are bootstraping: First merge from https://github.com/espindola/rust/tree/exit_glue_rustc the build will fail to produce a stage3/std.o. Lets build a snapshot: $ rm stage2/*.bc $ rm stage2/rustc.o $ rm stage2/std.o $ mv stage2/ rust-stage0 $ rm dl/*.tar.bz2 $ tar -cjf temp.tar.bz2 rust-stage0/ $ h=$(sha1sum temp.tar.bz2 | cut -f1 -d ' ') $ mv temp.tar.bz2 dl/rust-stage0-2011-05-05-fffffff-linux-i386-$h.tar.bz2 (update snapshots.txt) (delete all objects but the new snapshot) Now merge from https://github.com/espindola/rust/tree/exit_glue_rt and build normally. Cheers, Rafael From lkuper at mozilla.com Thu May 5 12:03:23 2011 From: lkuper at mozilla.com (Lindsey Kuper) Date: Thu, 5 May 2011 12:03:23 -0700 (PDT) Subject: [rust-dev] status, triage In-Reply-To: Message-ID: <1869199890.19344.1304622203931.JavaMail.root@zimbra1.shared.sjc1.mozilla.com> > From: "Marijn Haverbeke" > To: "Peter Hull" > Cc: rust-dev at mozilla.org > Sent: Thursday, May 5, 2011 11:15:13 AM > Subject: Re: [rust-dev] status, triage > > I suppose you're focussing on the coding side but I think there > > needs > > to be more 'user-level' documentation/ getting started guide. > > Agreed. Once we feel we're starting to approach the point where we > want to ship something, I'm probably a good candidate for writing the > introductory guidebook. Also agreed. Right now, there's a big ol' 'TODO' under 'Tutorial' in the documentation. Since the tutorial doesn't exist yet, maybe we should consider writing it in wiki form instead. That way, folks can contribute to it without having to use Texinfo. In any case, it would also be great to rig it so Texinfo's HTML output goes onto the web somewhere. Lindsey From arelius at gmail.com Thu May 5 12:10:33 2011 From: arelius at gmail.com (Nicholas "Indy" Ray) Date: Thu, 05 May 2011 12:10:33 -0700 Subject: [rust-dev] status, triage In-Reply-To: <1869199890.19344.1304622203931.JavaMail.root@zimbra1.shared.sjc1.mozilla.com> References: <1869199890.19344.1304622203931.JavaMail.root@zimbra1.shared.sjc1.mozilla.com> Message-ID: <4DC2F629.9000803@gmail.com> On 5/5/2011 12:03 PM, Lindsey Kuper wrote: > Also agreed. Right now, there's a big ol' 'TODO' under 'Tutorial' in the documentation. Since the tutorial doesn't exist yet, maybe we should consider writing it in wiki form instead. That way, folks can contribute to it without having to use Texinfo. In any case, it would also be great to rig it so Texinfo's HTML output goes onto the web somewhere. As someone who's just begun to learn my way around rust, (and taking notes) and likely to try to contribute to the tutorial, I much prefer having the tutorial in git with the code. It seems that overall trying to keep any documentation out of the git repo is a recipe to get it out-of-sync with the current revision. Just see LLVM and much of their wiki-only documentation, it's almost uniformly out of date. Indy From graydon at mozilla.com Thu May 5 12:11:08 2011 From: graydon at mozilla.com (Graydon Hoare) Date: Thu, 05 May 2011 12:11:08 -0700 Subject: [rust-dev] status, triage In-Reply-To: <1869199890.19344.1304622203931.JavaMail.root@zimbra1.shared.sjc1.mozilla.com> References: <1869199890.19344.1304622203931.JavaMail.root@zimbra1.shared.sjc1.mozilla.com> Message-ID: <4DC2F64C.6010208@mozilla.com> On 11-05-05 12:03 PM, Lindsey Kuper wrote: > Also agreed. Right now, there's a big ol' 'TODO' under 'Tutorial' in the documentation. Since the tutorial doesn't exist yet, maybe we should consider writing it in wiki form instead. That way, folks can contribute to it without having to use Texinfo. In any case, it would also be great to rig it so Texinfo's HTML output goes onto the web somewhere. Yeah. More generally, we should probably put together at least *some* kind of not-just-a-github-project-dir website between now and whenever we do a public release. Doesn't have to be fancy; can even use the "github pages" hosting system. It knows how to do a fair bit of stuff. -Graydon From graydon at mozilla.com Thu May 5 12:11:53 2011 From: graydon at mozilla.com (Graydon Hoare) Date: Thu, 05 May 2011 12:11:53 -0700 Subject: [rust-dev] status, triage In-Reply-To: <4DC2F629.9000803@gmail.com> References: <1869199890.19344.1304622203931.JavaMail.root@zimbra1.shared.sjc1.mozilla.com> <4DC2F629.9000803@gmail.com> Message-ID: <4DC2F679.6000305@mozilla.com> On 11-05-05 12:10 PM, Nicholas "Indy" Ray wrote: > As someone who's just begun to learn my way around rust, (and taking > notes) and likely to try to contribute to the tutorial, I much prefer > having the tutorial in git with the code. It seems that overall trying > to keep any documentation out of the git repo is a recipe to get it > out-of-sync with the current revision. Just see LLVM and much of their > wiki-only documentation, it's almost uniformly out of date. Oh, if we put the HTML online we'd be hooking it to auto-update on any push, don't worry :) -Graydon From lkuper at mozilla.com Thu May 5 12:34:10 2011 From: lkuper at mozilla.com (Lindsey Kuper) Date: Thu, 5 May 2011 12:34:10 -0700 (PDT) Subject: [rust-dev] status, triage In-Reply-To: <4DC2F629.9000803@gmail.com> Message-ID: <2073015829.19715.1304624050475.JavaMail.root@zimbra1.shared.sjc1.mozilla.com> > From: "Nicholas \"Indy\" Ray" > To: rust-dev at mozilla.org > Sent: Thursday, May 5, 2011 12:10:33 PM > Subject: Re: [rust-dev] status, triage > As someone who's just begun to learn my way around rust, (and taking > notes) and likely to try to contribute to the tutorial, I much prefer > having the tutorial in git with the code. It seems that overall trying > to keep any documentation out of the git repo is a recipe to get it > out-of-sync with the current revision. Just see LLVM and much of their > wiki-only documentation, it's almost uniformly out of date. It's a good point. However, right now the Rust documentation doesn't agree with the code anyway -- sometimes it's behind and sometimes it's ahead. Now, if all the code examples in the (not-yet-existent) tutorial automatically built and ran when we built the doc, that would be ideal! That probably rules out the wiki idea, though. Going even further, we could make breaking the doc build in that way count as Breaking The Build. Lindsey From arelius at gmail.com Thu May 5 12:42:22 2011 From: arelius at gmail.com (Nicholas "Indy" Ray) Date: Thu, 05 May 2011 12:42:22 -0700 Subject: [rust-dev] status, triage In-Reply-To: <2073015829.19715.1304624050475.JavaMail.root@zimbra1.shared.sjc1.mozilla.com> References: <2073015829.19715.1304624050475.JavaMail.root@zimbra1.shared.sjc1.mozilla.com> Message-ID: <4DC2FD9E.30403@gmail.com> On 5/5/2011 12:34 PM, Lindsey Kuper wrote: > Going even further, we could make breaking the doc build in that way count as Breaking The Build. > Are you suggesting that the docs should reference examples, that are built with the test suite? I figure that'd be an interesting approach, however can the current texinfo system support including source in that way? Indy From lkuper at mozilla.com Thu May 5 12:57:48 2011 From: lkuper at mozilla.com (Lindsey Kuper) Date: Thu, 5 May 2011 12:57:48 -0700 (PDT) Subject: [rust-dev] status, triage In-Reply-To: <4DC2FD9E.30403@gmail.com> Message-ID: <2072606810.20116.1304625468764.JavaMail.root@zimbra1.shared.sjc1.mozilla.com> > From: "Nicholas \"Indy\" Ray" > To: rust-dev at mozilla.org > Sent: Thursday, May 5, 2011 12:42:22 PM > Subject: Re: [rust-dev] status, triage > On 5/5/2011 12:34 PM, Lindsey Kuper wrote: > > Going even further, we could make breaking the doc build in that way > > count as Breaking The Build. > > > Are you suggesting that the docs should reference examples, that are > built with the test suite? I figure that'd be an interesting approach, > however can the current texinfo system support including source in > that way? Something like that, yeah. I don't know what support, if any, texinfo has for such things. Another possibility is to rig it so that anything appearing inside, say, the @example environment in the .texi file was scraped out into a .rs file and built. That might be even hairier, though. Lindsey From peterhull90 at gmail.com Fri May 6 00:33:59 2011 From: peterhull90 at gmail.com (Peter Hull) Date: Fri, 6 May 2011 08:33:59 +0100 Subject: [rust-dev] a special pull request In-Reply-To: <4DC2EE87.5000203@mozilla.com> References: <4DC2EE87.5000203@mozilla.com> Message-ID: On Thu, May 5, 2011 at 7:37 PM, Rafael Avila de Espindola wrote: > I have a change that modifies the interaction of rustc generated code and > rt. Does everyone have to do this, or will it all work OK once it's merged into graydon/rust? Peter From marijnh at gmail.com Fri May 6 03:50:27 2011 From: marijnh at gmail.com (Marijn Haverbeke) Date: Fri, 6 May 2011 12:50:27 +0200 Subject: [rust-dev] a special pull request In-Reply-To: References: <4DC2EE87.5000203@mozilla.com> Message-ID: > Does everyone have to do this, or will it all work OK once it's merged > into graydon/rust? Once it's merged and the snapshot server has been updated (which should be the same point), your makefile will download a proper snapshot compiler and things should 'just work'. From marijnh at gmail.com Fri May 6 03:53:45 2011 From: marijnh at gmail.com (Marijn Haverbeke) Date: Fri, 6 May 2011 12:53:45 +0200 Subject: [rust-dev] status, triage In-Reply-To: <2072606810.20116.1304625468764.JavaMail.root@zimbra1.shared.sjc1.mozilla.com> References: <4DC2FD9E.30403@gmail.com> <2072606810.20116.1304625468764.JavaMail.root@zimbra1.shared.sjc1.mozilla.com> Message-ID: >?That might be even hairier, though. Not necessarily. Eloquent JavaScript chapters are actually literate programs that can be automatically extracted and run, without doing any violence to the flow of the text. It'll be a bit harder to do this in a static language, but certainly possible. From graydon at mozilla.com Fri May 6 08:48:24 2011 From: graydon at mozilla.com (Graydon Hoare) Date: Fri, 06 May 2011 08:48:24 -0700 Subject: [rust-dev] docs + tests (was Re: status, triage) In-Reply-To: References: <4DC2FD9E.30403@gmail.com> <2072606810.20116.1304625468764.JavaMail.root@zimbra1.shared.sjc1.mozilla.com> Message-ID: <4DC41848.3020300@mozilla.com> On 06/05/2011 3:53 AM, Marijn Haverbeke wrote: >> That might be even hairier, though. > > Not necessarily. Eloquent JavaScript chapters are actually literate > programs that can be automatically extracted and run, without doing > any violence to the flow of the text. It'll be a bit harder to do this > in a static language, but certainly possible. Yeah. I hope there is some sweet spot we can get to by exploring blends of integrated unit-test harness and inline documentation, a la doctest[1]. I'd be happy to see the compiler (or a next-to-the-compiler tool shipping standard with the compiler) grow this capability, perhaps as a natural extension of a scribble[2]-like "documentation-oriented grammar" for expressions? Not sure. Worth trying a few things though. As for "straight" literate programming a la web/noweb, I feel it's a bit too documentation-orthodox for my programmer mentality; I explore a problem by writing the code, primarily, and have a hard time coming at a problem doing "documentation first". It's easier for me to do a post-coding "narration" through the code, a la codenar[3]. -Graydon [1] http://docs.python.org/library/doctest.html [2] http://docs.racket-lang.org/scribble/index.html [3] http://orenbenkiki.github.com/codnar/ From graydon at mozilla.com Fri May 6 08:52:10 2011 From: graydon at mozilla.com (Graydon Hoare) Date: Fri, 06 May 2011 08:52:10 -0700 Subject: [rust-dev] a special pull request In-Reply-To: References: <4DC2EE87.5000203@mozilla.com> Message-ID: <4DC4192A.30407@mozilla.com> On 06/05/2011 3:50 AM, Marijn Haverbeke wrote: >> Does everyone have to do this, or will it all work OK once it's merged >> into graydon/rust? > > Once it's merged and the snapshot server has been updated (which > should be the same point), your makefile will download a proper > snapshot compiler and things should 'just work'. Yeah. This is all done now. Rafael was just mentioning it because I had to do it manually this time around; I'm in the process of trying to get automation set up to support both "normal" stage2->stage0 snapshots for compatible changes, and "migration" stage1->stage0 snapshots for incompatible changes. Currently all snapshots are being done by me, by hand. Not ideal. -Graydon From pwalton at mozilla.com Sun May 8 17:09:54 2011 From: pwalton at mozilla.com (Patrick Walton) Date: Sun, 08 May 2011 17:09:54 -0700 Subject: [rust-dev] -O0 on Mac Message-ID: <4DC730D2.1090400@mozilla.com> Currently rustrt is built with -O0 on the Mac due to a GCC bug. This is starting to cause performance problems; in particular, very suboptimal assembler is generated for next_power_of_two, which is the third-highest function in the profiles. Apple's GCC is way outdated anyhow. Will anyone object if I make clang a dependency on that platform? Patrick From pwalton at mozilla.com Sun May 8 20:35:43 2011 From: pwalton at mozilla.com (Patrick Walton) Date: Sun, 08 May 2011 20:35:43 -0700 Subject: [rust-dev] [PATCH] compiler-rt: Sanity check architectures Message-ID: <4DC7610F.9010107@mozilla.com> (rust-dev: This is an LLVM patch you might want to apply if you're trying to build with clang on the Mac.) Hi everyone, I've got a quick patch to compiler-rt that makes it do a simple sanity check on the toolchain before trying to compile for each architecture. This makes clang able to be built again on Darwin without having to install the iOS SDK. Thanks! Patrick Index: compiler-rt/make/platform/clang_darwin.mk =================================================================== --- compiler-rt.orig/make/platform/clang_darwin.mk 2011-05-08 19:44:40.000000000 -0700 +++ compiler-rt/make/platform/clang_darwin.mk 2011-05-08 20:25:06.000000000 -0700 @@ -6,6 +6,19 @@ Description := Static runtime libraries for clang/Darwin. +# A function that ensures we don't try to build for architectures that we +# don't have working toolchains for. +CheckArches = \ + $(shell \ + result=""; \ + for arch in $(1); do \ + gcc -arch $$arch; \ + if test $$? == 1; then result="$$result$$arch "; fi; \ + done; \ + echo $$result) + +### + Configs := UniversalArchs := @@ -13,23 +26,23 @@ # still be referenced from Darwin system headers. This symbol is only ever # needed on i386. Configs += eprintf -UniversalArchs.eprintf := i386 +UniversalArchs.eprintf := $(call CheckArches,i386) # Configuration for targetting 10.4. We need a few functions missing from # libgcc_s.10.4.dylib. We only build x86 slices since clang doesn't really # support targetting PowerPC. Configs += 10.4 -UniversalArchs.10.4 := i386 x86_64 +UniversalArchs.10.4 := $(call CheckArches,i386 x86_64) # Configuration for targetting iOS, for some ARMv6 functions, which must be # in the same linkage unit, and for a couple of other functions that didn't # make it into libSystem. Configs += ios -UniversalArchs.ios := i386 x86_64 armv6 armv7 +UniversalArchs.ios := $(call CheckArches,i386 x86_64 armv6 armv7) # Configuration for use with kernel/kexts. Configs += cc_kext -UniversalArchs.cc_kext := armv6 armv7 i386 x86_64 +UniversalArchs.cc_kext := $(call CheckArches,armv6 armv7 i386 x86_64) ### From respindola at mozilla.com Mon May 9 06:36:05 2011 From: respindola at mozilla.com (=?ISO-8859-1?Q?Rafael_=C1vila_de_Esp=EDndola?=) Date: Mon, 09 May 2011 09:36:05 -0400 Subject: [rust-dev] -O0 on Mac In-Reply-To: <4DC730D2.1090400@mozilla.com> References: <4DC730D2.1090400@mozilla.com> Message-ID: <4DC7EDC5.9090209@mozilla.com> On 11-05-08 8:09 PM, Patrick Walton wrote: > Currently rustrt is built with -O0 on the Mac due to a GCC bug. This is > starting to cause performance problems; in particular, very suboptimal > assembler is generated for next_power_of_two, which is the third-highest > function in the profiles. Apple's GCC is way outdated anyhow. > > Will anyone object if I make clang a dependency on that platform? I am fine with it. If you want to keep the dependencies low, you can try using the pre-built binaries: http://llvm.org/releases/2.9/clang+llvm-2.9-x86_64-apple-darwin10.tar.gz but since we are building llvm already, adding clang to the build is not hard. > Patrick Cheers, Rafael From graydon at mozilla.com Mon May 9 07:34:36 2011 From: graydon at mozilla.com (Graydon Hoare) Date: Mon, 09 May 2011 07:34:36 -0700 Subject: [rust-dev] -O0 on Mac In-Reply-To: <4DC730D2.1090400@mozilla.com> References: <4DC730D2.1090400@mozilla.com> Message-ID: <4DC7FB7C.2020105@mozilla.com> On 08/05/2011 5:09 PM, Patrick Walton wrote: > Currently rustrt is built with -O0 on the Mac due to a GCC bug. This is > starting to cause performance problems; in particular, very suboptimal > assembler is generated for next_power_of_two, which is the third-highest > function in the profiles. Apple's GCC is way outdated anyhow. > > Will anyone object if I make clang a dependency on that platform? I don't mind if "we can build with clang" (support, auto-preference). I think I do mind if we *must* build with clang (dependency), but could probably be convinced otherwise. I'm also curious whether there's anything simpler we can do to work around the gcc bug short of switching compilers. What's the nature of the bug? -Graydon From echristo at apple.com Mon May 9 11:04:16 2011 From: echristo at apple.com (Eric Christopher) Date: Mon, 09 May 2011 11:04:16 -0700 Subject: [rust-dev] [llvm-commits] [PATCH] compiler-rt: Sanity check architectures In-Reply-To: <4DC7610F.9010107@mozilla.com> References: <4DC7610F.9010107@mozilla.com> Message-ID: On May 8, 2011, at 8:35 PM, Patrick Walton wrote: > (rust-dev: This is an LLVM patch you might want to apply if you're > trying to build with clang on the Mac.) > > Hi everyone, > > I've got a quick patch to compiler-rt that makes it do a simple sanity > check on the toolchain before trying to compile for each architecture. > This makes clang able to be built again on Darwin without having to > install the iOS SDK. Seems reasonable to me. Unless I hear an objection I'll apply it a bit later. -eric From echristo at apple.com Mon May 9 13:34:05 2011 From: echristo at apple.com (Eric Christopher) Date: Mon, 09 May 2011 13:34:05 -0700 Subject: [rust-dev] [llvm-commits] [PATCH] compiler-rt: Sanity check architectures In-Reply-To: References: <4DC7610F.9010107@mozilla.com> Message-ID: On May 9, 2011, at 11:04 AM, Eric Christopher wrote: > > On May 8, 2011, at 8:35 PM, Patrick Walton wrote: > >> (rust-dev: This is an LLVM patch you might want to apply if you're >> trying to build with clang on the Mac.) >> >> Hi everyone, >> >> I've got a quick patch to compiler-rt that makes it do a simple sanity >> check on the toolchain before trying to compile for each architecture. >> This makes clang able to be built again on Darwin without having to >> install the iOS SDK. > > Seems reasonable to me. Unless I hear an objection I'll apply it a bit > later. Committed in: [yendi:llvm/projects/compiler-rt] echristo% svn ci Sending make/platform/clang_darwin.mk Transmitting file data . Committed revision 131098. -eric From respindola at mozilla.com Thu May 12 04:24:10 2011 From: respindola at mozilla.com (=?ISO-8859-1?Q?Rafael_=C1vila_de_Esp=EDndola?=) Date: Thu, 12 May 2011 07:24:10 -0400 Subject: [rust-dev] undefined behavior Message-ID: <4DCBC35A.4090403@mozilla.com> A very interesting read: http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html Cheers, Rafael From marijnh at gmail.com Thu May 12 07:13:58 2011 From: marijnh at gmail.com (Marijn Haverbeke) Date: Thu, 12 May 2011 16:13:58 +0200 Subject: [rust-dev] The module naming situation Message-ID: Several people have mentioned that they are not happy with the camelcase convention for module names. (I am actually among those people.) A suggestion was made to go back to lowercase module names, and to use different syntax for module dereferencing and record/object/tuple dereferencing. If we follow Perl/C++ you'd get std::vec::len(x). That way, module names an other names can be, mostly, statically distinguished, and modules can live in a different namespace from other names. (I say mostly, because 'import foo::bar' is still ambiguous?is 'bar' a module or a name? We could require 'import foo::bar::' in the former case, or something like that.) Working on the resolver has made me more sympathetic to this proposal. I rather like the simplicity of std.vec.len, but it requires the resolver to mangle the AST, since only at that stage is it clear what's an expr_field. It'd be nice if the parser could handle this without full knowledge about all used crates (also important for external tools implementing 'jump to definition' and such). So, I could 1) Implement this 2) Using our advanced transitional snapshot technology, follow up said implementation with another monster commit that renames std modules back to lowercase, and replaces all the right dots with the new module dereferencing operator. (I think this can be done with some creative regexping, since our set of module names is still rather small, so I won't have to put the pretty printer into play yet) One more thing I'd like to put out there: If we do something scary and make a colon followed by an alphabetic character a different token than one followed by something else, we can make the module dereference operator a single ':'. I think our users will thank us, on the whole, for the extra succinctness, but there is the issue of it being slightly obscure -- you'll never even notice this rule when writing well-formatted code, but then when, in a fit of naughtiness, you omit the space in front of your typestate constraint, you'll get an error. I think this is not a big problem, since your mistake will be easy enough to figure out, but it's something to consider. Let me know what you think. Best, Marijn From marijnh at gmail.com Thu May 12 09:44:48 2011 From: marijnh at gmail.com (Marijn Haverbeke) Date: Thu, 12 May 2011 18:44:48 +0200 Subject: [rust-dev] The module naming situation In-Reply-To: References: Message-ID: I went ahead an implemented a large part of this--using a single colon as a module separator, and downcasing the module names again. A separate module namespace isn't done yet. Look around https://github.com/marijnh/rust/tree/modulesep if you're curious what it looks like. The thing is that this is going to bitrot faster than you can say 'please don't do this'. Graydon, I'm not sure how far the support for transitional commits is, but even if it isn't there, you could, if you agree with this, build the new compiler manually, register it as a snapshot, and mark down the transitional snapshot in the file for later. Please consider this an urgent pull request. (I'm of course also okay with rejecting this, but do make up your mind before lots of conflicting patches are merged.) Cheers, Marijn From pwalton at mozilla.com Thu May 12 10:15:30 2011 From: pwalton at mozilla.com (Patrick Walton) Date: Thu, 12 May 2011 10:15:30 -0700 Subject: [rust-dev] The module naming situation In-Reply-To: References: Message-ID: <4DCC15B2.3090302@mozilla.com> On 5/12/11 9:44 AM, Marijn Haverbeke wrote: > I went ahead an implemented a large part of this--using a single colon > as a module separator, and downcasing the module names again. A > separate module namespace isn't done yet. Look around > https://github.com/marijnh/rust/tree/modulesep if you're curious what > it looks like. I'd prefer ::, if for no other reason than that it's consistent with C++, Ruby, PHP, and Perl. Also fewer special cases in the grammar are always nice. Graydon can cast the deciding vote here :) I always figured that "import foo::bar" or "from foo import bar" would import both the "bar" item and the "bar" module if both exist. I don't see much harm in that off the top of my head. Patrick From marijnh at gmail.com Thu May 12 10:18:48 2011 From: marijnh at gmail.com (Marijn Haverbeke) Date: Thu, 12 May 2011 19:18:48 +0200 Subject: [rust-dev] The module naming situation In-Reply-To: <4DCC15B2.3090302@mozilla.com> References: <4DCC15B2.3090302@mozilla.com> Message-ID: > I always figured that "import foo::bar" or "from foo import bar" would > import both the "bar" item and the "bar" module if both exist. I don't see > much harm in that off the top of my head. The problem is that this breaks the proposed separate namespace for modules. If we don't know what bar is, we can't look it up. From respindola at mozilla.com Thu May 12 10:39:59 2011 From: respindola at mozilla.com (Rafael Avila de Espindola) Date: Thu, 12 May 2011 13:39:59 -0400 Subject: [rust-dev] The module naming situation In-Reply-To: References: <4DCC15B2.3090302@mozilla.com> Message-ID: <4DCC1B6F.6090106@mozilla.com> > The problem is that this breaks the proposed separate namespace for > modules. If we don't know what bar is, we can't look it up. I like :: too. I am fine with any disambiguation option, or just declaring that import foo::bar is invalid if we find two bars. Cheers, Rafael From graydon at mozilla.com Thu May 12 11:25:24 2011 From: graydon at mozilla.com (Graydon Hoare) Date: Thu, 12 May 2011 11:25:24 -0700 Subject: [rust-dev] The module naming situation In-Reply-To: <4DCC1B6F.6090106@mozilla.com> References: <4DCC15B2.3090302@mozilla.com> <4DCC1B6F.6090106@mozilla.com> Message-ID: <4DCC2614.9090208@mozilla.com> On 11-05-12 10:39 AM, Rafael Avila de Espindola wrote: >> The problem is that this breaks the proposed separate namespace for >> modules. If we don't know what bar is, we can't look it up. > > I like :: too. > > I am fine with any disambiguation option, or just declaring that import > foo::bar is invalid if we find two bars. Given the options I prefer a::b over a:b as well. Surveying IRC we also had a vote from brson for ::, one from robarnold for :, and one from tjc for initial-caps-as-module-namespace-indicator as in haskell. I think the weight-o-votes favours ::. It sounds like we're semi-converged on this design: - We get 3 namespaces: types, modules and values. - Module paths are separated by :: for similarity with C++ (and Perl, PHP, Ruby) - a.b in value context means a is a value - a.b in type context is illegal - a::b in type context means a is a module, b is a type - a::b in value context means a is a module, b is a value - import a::b imports all "b" bindings found in type, value or module namespaces. we may modify this rule to permit selecting which of the three you wish to import, but for now it seems harmless to overload since all 3 referencing contexts are unambiguous. -Graydon From pwalton at mozilla.com Thu May 12 17:49:10 2011 From: pwalton at mozilla.com (Patrick Walton) Date: Thu, 12 May 2011 17:49:10 -0700 Subject: [rust-dev] Reserved word/module name collisions Message-ID: <4DCC8006.6070908@mozilla.com> Hi everyone, So we still have a problem with reserved word/module name collisions, even with separate namespaces. We have modules named "_str" and "_vec" to avoid conflicting with the reserved words "str" and "vec". Here are some possible solutions: (1) Live with it. Use leading "_" for modules like _str and _vec. ("_vec" may end up ceasing to be a reserved word, but "str" is likely to still a problem.) (2) Rename the module "_str" to "string", similar to the way we named what would probably have been "middle::type" "middle::ty". (3) Make built-in types no longer reserved words, but instead special identifiers that are always in scope and can't be qualified with module names. (4) Have the lexer lex "std::vec" together as one identifier. Have "::foo" be equivalent to "foo", so that the user can say "::vec" in the rare cases that that isn't enough. (5) Add productions to the grammar for reserved words in the right positions; e.g. (BNF): import ::== "import" module-part ("::" module-part)* ";" module-part ::== identifier | module-keyword module-keyword ::== "str" | "vec" | "int" | "uint" | ... There are probably other ways to finesse the problem. I lean toward (3), and, failing that, (5), but I'm not particularly in favor of one solution or the other (although I'd prefer not to do (1); I think _str is universally disliked). Mostly I just wanted to throw some ideas out there. Patrick From marijnh at gmail.com Thu May 12 22:45:39 2011 From: marijnh at gmail.com (Marijn Haverbeke) Date: Fri, 13 May 2011 07:45:39 +0200 Subject: [rust-dev] Reserved word/module name collisions In-Reply-To: <4DCC8006.6070908@mozilla.com> References: <4DCC8006.6070908@mozilla.com> Message-ID: Hi Patrick, My intention was to do 5 -- do keyword recognition *after* determining in which context the identifier should be parsed. In module context, nothing will be a keyword, so vec and str will be perfectly valid module names. In import statements, the parser will assume that there are never any keywords. Thus 'import std::str' works, even though str is not unambiguously a module. It is impossible to export str in a type context (since it'd be recognized as a type keyword if you try to), so this is safe -- either a suitable str module or value will be found during resolution, or you'll get a resolve error. Best, Marijn From peterhull90 at gmail.com Fri May 13 03:41:09 2011 From: peterhull90 at gmail.com (Peter Hull) Date: Fri, 13 May 2011 11:41:09 +0100 Subject: [rust-dev] -O0 on Mac In-Reply-To: <4DC7FB7C.2020105@mozilla.com> References: <4DC730D2.1090400@mozilla.com> <4DC7FB7C.2020105@mozilla.com> Message-ID: On Mon, May 9, 2011 at 3:34 PM, Graydon Hoare wrote: > On 08/05/2011 5:09 PM, Patrick Walton wrote: >> Will anyone object if I make clang a dependency on that platform? Just a quick note to say - if anyone is trying to build on OS X and sees an error like link: rustllvm/librustllvm.dylib Undefined symbols: "___eprintf", referenced from: ___eprintf$non_lazy_ptr in libLLVMSupport.a(SearchForAddressOfSpecialSymbol.o) (maybe you meant: ___eprintf$non_lazy_ptr) ld: symbol(s) not found clang: error: linker command failed with exit code 1 (use -v to see invocation) The answer is here (http://permalink.gmane.org/gmane.comp.compilers.llvm.devel/40133) "If you check out compiler-rt into /projects/compiler-rt, clang will pick it up and build it automatically." This works for me (OSX 10.6.7) I know Rafael has contributed to that thread already but I thought I'd post here to save anyone else the effort of googling it! Peter From peterhull90 at gmail.com Fri May 13 03:57:45 2011 From: peterhull90 at gmail.com (Peter Hull) Date: Fri, 13 May 2011 11:57:45 +0100 Subject: [rust-dev] Linker warning Message-ID: I've started getting a lot of these warnings (on OS X 10.6.7) when doing 'make all' WARNING: Linking two modules of different target triples: stage2/intrinsics.bc: 'i386-apple-darwin10.7.0' and 'i686-apple-darwin' Where are they coming from, and is it a problem or not? Thanks, Pete From marijnh at gmail.com Fri May 13 04:01:53 2011 From: marijnh at gmail.com (Marijn Haverbeke) Date: Fri, 13 May 2011 13:01:53 +0200 Subject: [rust-dev] Linker warning In-Reply-To: References: Message-ID: > Where are they coming from, and is it a problem or not? I'm seeing the same on Linux. They are not a problem. (Still, someone should probably look into them.) From graydon at mozilla.com Fri May 13 08:41:04 2011 From: graydon at mozilla.com (Graydon Hoare) Date: Fri, 13 May 2011 08:41:04 -0700 Subject: [rust-dev] Reserved word/module name collisions In-Reply-To: <4DCC8006.6070908@mozilla.com> References: <4DCC8006.6070908@mozilla.com> Message-ID: <4DCD5110.4020609@mozilla.com> On 12/05/2011 5:49 PM, Patrick Walton wrote: > I lean toward (3), and, failing that, (5), but I'm not particularly in > favor of one solution or the other (although I'd prefer not to do (1); I > think _str is universally disliked). Mostly I just wanted to throw some > ideas out there. I'd go with an even looser version of (3): make the builtin names reserved words only in the type namespace, not the module *or* value namespaces (or the field-label / method-name "namespaces", which we haven't mentioned yet since there are never scope-resolution issues to worry about) At the moment things like "vec" can't be used as a value name because they're ambiguous as constructor prefixes ("vec(...)"), but that's temporary: vec(...) will soon be [...] and rec(...) will soon be {...}, and so on. Once that sort of change goes in, there's no reason we can't permit "auto vec = [];" when that's the most natural thing to write. IOW the only reason they're not shadowable now is that I wanted maintenance programmers to "always be able to explicitly denote the types of the literals (that they are able to denote as values)". Since types are in their own namespace now -- or are going to be there soon -- there's no reason to restrict the other namespaces. That was coincidental with "only having one namespace". -Graydon From graydon at mozilla.com Fri May 13 08:51:23 2011 From: graydon at mozilla.com (Graydon Hoare) Date: Fri, 13 May 2011 08:51:23 -0700 Subject: [rust-dev] Reserved word/module name collisions In-Reply-To: <4DCD5110.4020609@mozilla.com> References: <4DCC8006.6070908@mozilla.com> <4DCD5110.4020609@mozilla.com> Message-ID: <4DCD537B.2070201@mozilla.com> On 13/05/2011 8:41 AM, Graydon Hoare wrote: > On 12/05/2011 5:49 PM, Patrick Walton wrote: > >> I lean toward (3), and, failing that, (5), but I'm not particularly in >> favor of one solution or the other (although I'd prefer not to do (1); I >> think _str is universally disliked). Mostly I just wanted to throw some >> ideas out there. > On IRC ytterbium shoved me towards another option I'd been mulling over too but had decided against, but I'll put it up here for completeness. (6) reserve only one name, the path-root module name 'rust' in each crate, and bind it to a module provided by the compiler, and import all identifiers in it by default. Then bind int, float, etc. as rust::int, rust::float. I'd be ok with this too, but I think I slightly prefer "reserve one type name for each literal we can denote" (the proposal I made). The utility of the "reserve the rust:: namespace" approach is, I guess, that it provides a place to put intrinsics too. Hmm. Warming to it. -Graydon From graydon at mozilla.com Fri May 13 18:30:25 2011 From: graydon at mozilla.com (Graydon Hoare) Date: Fri, 13 May 2011 18:30:25 -0700 Subject: [rust-dev] self-serve snapshots Message-ID: <4DCDDB31.50703@mozilla.com> I think I have a (primitive, barely limping along, manual-labour-heavy) self-serve system running on the tinderboxes for making compiler snapshots. It's documented in the wiki: https://github.com/graydon/rust/wiki/Compiler-snapshots Let me know if it fails to work for you, or is confusing, or you can think of nice ways to simplify it. -Graydon From skaller at users.sourceforge.net Sat May 14 08:00:41 2011 From: skaller at users.sourceforge.net (john skaller) Date: Sun, 15 May 2011 01:00:41 +1000 Subject: [rust-dev] Hello Message-ID: Hello rust devs. I have been looking at rust and I'm sure I don't fully understand either the language or design goals yet, but I am disturbed by multi-argument functions. Please correct any misconceptions in the following argument! I doubt this will change anything but I feel I should present the case anyhow. Functions in Rust have a type like fn (int,int)->int instead of tup(int,int)->int which is what they should have. The reason appears to be to support aliases, which I understand are a special pointer kind which which are guaranteed to point point down stack. This achieves efficiency when passing large objects, whilst at the same time assuring the pointers are weak, and so need no memory management other than the normal stack operations, again an efficiency issue. Forgive me is the description is inaccurate! Ok, so the problem with multi-argument functions without a domain type is it wreaks utter havoc with the type system. Instead of a simple combinatorial system which can be made polymorphic by "simply chucking in type variables" we have a complex spaghetti which ends up being entirely untenable. To see this you only have to look at C++! In particular look at the weak attempt to get around this in C++ 201x by allowing templates to processes variable length argument lists recursively. To see the extreme impact of this choice consider how you'd implement a composition operator. In Felix: fun compose[A,B,C] (f:A->B, g:B->C) => fun (x:A) => g ( f (x)); Similar in ML, Ocaml, Haskell, and indeed in C++ and Rust for one argument functions.. and therein lies the problem. Certainly in Rust if you have two argument functions returning a tuple type you can ALSO define composition by unpacking the tuple returned by the first and applying the second function to it. And three argument functions. So you need a whole family of functions, just to define composition. This problem is pervasive: it doesn't just apply to composition, it applies to all higher order functions. You can in fact see the effect in the C++ Standard library where there are combinators for one and two argument functions and then the library just gives up. Note this problem isn't merely a linear expansion in the size of a polymorphic library: the effects combine, the explosion is exponential. Roughly it makes polymorphism completely untenable. Polymorphism is the central vehicle to provide safe reusable components. No language should be released with coherent polymorphism. Go doesn't have it, so its useless. Java is a rubbish, because it was released without it. C++ at least was standardised with polymorphism (even if it isn't wonderful it is the basis of the standard library). In my view, polymorphism is no longer a nice toy for academics. At least first order polymorphism is mandatory. Multi-argument functions have to be thrown out. So the problem is: how to achieve efficiency of a similar to kind to what rust aliases provide. Note that these are similar to C++ references with a prohibition on address taking. Its quite clear the parameter passing performance is easily obtained by pointers, and this fits into the model of functions with tuple arguments. But it isn't safe to just pass pointers to the stack into a function for fear they'll be captured and stored somewhere that outlives the argument stack frame. First, let me say that it is possible to detect with a good conservative approximation. I know this because my Felix compiler actually does it: if the compiler can't prove it is safe it will use the heap instead of the stack. However in Rust, the philosophy is not to automatically optimise things this way, because the performance is then unclear. Instead in Rust the philosophy is to annotate the code so the performance can be seen, and then the compiler ensures the annotations hold (and errors out otherwise). In my discussion on IRC Grayson appeared to believe this had to be done with a type annotation, and therefore one would have to introduce aliases into the general type system, which is not desired. I have thought about this a bit and I tend to think now this is not entirely the case. Let us assume instead: (a) we have tuple typed function domain types (b) we allow pointers in tuples and therefore to be passed to functions (c) a function can use the existing alias annotation Now what we do is: the alias annotation works exactly as it does now. It can only be used in function declarations and has the same meaning. However the TYPE is "pointer". So the caller can and must pass a pointer, according to the type of the function. The callee guarantees that the pointer isn't captured but this cannot be deduced from the type. So when you use an unknown function you have no assurance. But if the function is known, then you can look at the definition and you get the assurance. This is weaker than the current Rust rules, because it forces "heapification" of arguments to unknown functions accepting a pointer. But it has the major advantage of making general parametric polymorphism possible. Without some kind of change like this it simply isn't, and in my view that makes Rust a dead horse even before the race starts. I should note that this problem only applies on ABI boundaries, in Rust I think thats what you export/import from crates. Internal (within crate) calls can be optimised heavily by inlining away many calls. Now, people should expect overheads using intensional polymorphism. Clearly this excludes some type based optimisations. Clearly separate compilation and linkage by dumb linkers excludes optimisations. But then there are other choices: C++ makes templates visible to all callers and uses extensional polymorphism and so it generates fast code although it tends to bloat a bit as a consequence. Felix does this too but it doesn't bloat (it unifies code which is identical except for pointer types). Ocaml forces the order of compilation in order to pass information from compilation units forward to allow inlining and other optimisations in at least one direction: an extra restriction and a smarter (more complex) compilation/linkage process. In Rust, you have expanded values but you can box them explicitly. So it seems to make sense to me to allow both intensionally polymorphic functions and extensional ones (explicit choice), The means passing more information between crates to support higher performance if the programmer chooses. Roughly you can either compile a polymorphic function that fits into the ABI or just pass the text around if you want to trade compile time performance for run time performance. [Of course I don't mean passing the unicode text around literally!] In particular passing the text around allows the safety aliases give without sacrificing performance, otherwise you lose the performance. So I recommend considering this new tradeoff. You throw away some performance in some circumstances (safety is not lost). In return you get proper polymorphism you can't otherwise get and that is essential to reusability and therefore the quality of the software written in the language. When I started extending Felix (well over 10 years ago) to support polymorphism I made the very difficult decision that the only way to do it was to assert that *everything* was polymorphic .. its just that some functions happen to have 0 type parameters :] Doing this forced the whole language and type system to work polymorphically as an "axiom" not something added on later. I think this is essential. If you haven't got polymorphism working, don't release the language at all. You may actually get users and then you get stuck if you didn't design the type system right. IMHO the type system at the moment isn't. I mean it isn't a type system at all. All modern type systems are combinatorial. Rusts isn't, so it doesn't even count as having a type system. The ability to make arbitrary combinations IS parametric polymorphism. In an "ideal" language you can prototype by separately compiling the pieces and playing around. This is fast compilation but not so fast generated code. Then when you're happy you expand the size of the units being compiled, finally up to the whole program. That gives you the best performance at run time, at the cost of a very long compile. In the case of the design I presented above, this idea shows how you can have your cake and eat it too. So if you want to get rid of "heapified" values introduced into prototypes because information is lost .. just compile larger units or the whole program. This won't get rid of the "unknown function" in complex situations where you'd need data flow analysis to even make an educated guess, but it does allow significant improvement. I don't have any final answer to this problem, but throwing away combinatorial typing and thus parametric polymorphism for a minor optimisation and safety assurance doesn't make sense to me. There must be a better compromise! -- john skaller skaller at users.sourceforge.net From pwalton at mozilla.com Sat May 14 09:51:00 2011 From: pwalton at mozilla.com (Patrick Walton) Date: Sat, 14 May 2011 09:51:00 -0700 Subject: [rust-dev] Hello In-Reply-To: References: Message-ID: <4DCEB2F4.1050804@mozilla.com> Hi John, There is nothing preventing you from writing functions that take one tuple parameter instead of multiple parameters. If you're faced with library routines with signatures that aren't to your liking, it's simple to write macros that tuple, untuple, curry, or uncurry parameters. For example, take the function map, which has the type fn[T,U](fn(&T) -> U, vec[mutable? T]) -> vec[mutable? U]. If you would prefer that it had the type fn[T,U](tup(fn(&T) -> U), vec[mutable? T])) -> vec[mutable? U] instead, then you could define a macro #tuple so that you could say #tuple(map) to get the version of "map" with the signature you prefer. Incidentally, I would be careful with the OCaml comparison; OCaml has functions that take multiple, untupled parameters via the named and optional parameters feature. Finally, there's no need for the hyperbolic "[Rust's type system] isn't a type system at all" kind of language. Of course Rust has a type system. Unless you have found a way in which Rust's type system is unsound (which we would definitely like to hear about if so!), that is beyond debate. Patrick From graydon at mozilla.com Sat May 14 11:52:07 2011 From: graydon at mozilla.com (Graydon Hoare) Date: Sat, 14 May 2011 11:52:07 -0700 Subject: [rust-dev] Conduct (was "Hello") In-Reply-To: References: Message-ID: <4DCECF57.5060808@mozilla.com> On 14/05/2011 8:00 AM, john skaller wrote: > which is what they should have. > ... > a complex spaghetti which ends up being entirely untenable. > ... > makes polymorphism completely untenable. > ... > Go doesn't have it, so its useless. > Java is a rubbish, because it was released without it. > ... > At least first order polymorphism is mandatory. > Multi-argument functions have to be thrown out. > ... > Rust a dead horse even before the race starts. > ... > I mean it isn't a type system at all. We have a code of conduct and you are not following it. Not even close. https://github.com/graydon/rust/wiki/Development-policy -> Conduct The behavior in this email is not tolerated here. I answered your question on IRC and Patrick has answered slightly differently here -- we don't think the tradeoffs involved favour the balance you're proposing -- but in any case your tone is entirely unacceptable. Change tone or you'll be excluded from further conversation. -Graydon (not Grayson) From skaller at users.sourceforge.net Sun May 15 03:34:26 2011 From: skaller at users.sourceforge.net (john skaller) Date: Sun, 15 May 2011 20:34:26 +1000 Subject: [rust-dev] Hello In-Reply-To: <4DCEB2F4.1050804@mozilla.com> References: <4DCEB2F4.1050804@mozilla.com> Message-ID: <82BD3FCA-2182-4AA3-B608-7A089D9604D7@users.sourceforge.net> On 15/05/2011, at 2:51 AM, Patrick Walton wrote: > Hi John, > > There is nothing preventing you from writing functions that take one tuple parameter instead of multiple parameters. Yes, that's true, but I can't work with your functions that don't do that. > If you're faced with library routines with signatures that aren't to your liking, it's simple to write macros that tuple, untuple, curry, or uncurry parameters. Is that really the case? In Ocaml etc you can do that, but in Rust can you? Sorry I don't know Rust well enough but clearly something is lost doing this, eg if the function has alias arguments? > Incidentally, I would be careful with the OCaml comparison; OCaml has functions that take multiple, untupled parameters via the named and optional parameters feature. True. > > Finally, there's no need for the hyperbolic "[Rust's type system] isn't a type system at all" kind of language. Of course Rust has a type system. Unless you have found a way in which Rust's type system is unsound (which we would definitely like to hear about if so!), that is beyond debate. The intent was clear I think: if you take "type system", with emphasis on the word "system" in the modern sense, then it doesn't. System actually means its an initial algebra. Functions with multiple arguments can be typed: I mean the domain can be typed, but in Rust it isn't. I think you need a kinding system to do it. Basically if you have type category T, and lets consider a 2 argument function, then with a tuple the kind is T -> T but with Rust two argument function the kind is T * T -> T [ignoring weird stuff like aliases, mutables, and other such things] In fact the tuple constructor (usually written "," ) is functor T * T -> T which is the whole point of it: you have constructor which gets factors away the extremely difficult to handle multi-argument functions. A multi-argument function is a combination tuple constructor and function. It's obviously better to factor these two distinct kinds of things (in a basic language). In Rust there's a reason not to do this but it comes at a high price and may not be necessary. -- john skaller skaller at users.sourceforge.net From skaller at users.sourceforge.net Sun May 15 03:55:55 2011 From: skaller at users.sourceforge.net (john skaller) Date: Sun, 15 May 2011 20:55:55 +1000 Subject: [rust-dev] Conduct (was "Hello") In-Reply-To: <4DCECF57.5060808@mozilla.com> References: <4DCECF57.5060808@mozilla.com> Message-ID: <7357A147-EA1C-470C-9065-6F73936EA9D5@users.sourceforge.net> On 15/05/2011, at 4:52 AM, Graydon Hoare wrote: > On 14/05/2011 8:00 AM, john skaller wrote: > >> which is what they should have. > > ... > > a complex spaghetti which ends up being entirely untenable. > > ... > > makes polymorphism completely untenable. > > ... > > Go doesn't have it, so its useless. > > Java is a rubbish, because it was released without it. > > ... > > At least first order polymorphism is mandatory. > > Multi-argument functions have to be thrown out. > > ... > > Rust a dead horse even before the race starts. > > ... > > I mean it isn't a type system at all. > > We have a code of conduct and you are not following it. Not even close. > > https://github.com/graydon/rust/wiki/Development-policy -> Conduct I read it. > The behavior in this email is not tolerated here. I answered your question on IRC and Patrick has answered slightly differently here -- we don't think the tradeoffs involved favour the balance you're proposing -- but in any case your tone is entirely unacceptable. > > Change tone or you'll be excluded from further conversation. > > -Graydon > (not Grayson) Sorry for getting your name wrong! The word "untenable" is technical and I will not withdraw it. I could be wrong on the issue and I would be happy if that were the case. Spaghetti is metaphorical and intended to conjur up an image because that is what I see in my minds eye. It is not perjorative and I see no reason to withdraw it. In the sense I used it to describe Java "Rubbish" is perjorative. I withdraw it with the comment that I am unhappy billions of dollars were spend by Sun to promote a language which was a major backward step in computing - in my opinion. Sorry for being emotional on the issue, I not only care about programming languages I have devoted my life to it (and I think, largely wasted my time). "Mandatory" means necessary and is therefore not perjorative. The context should have been clear: I believe polymorphism is paramount and the argument -- which could be false of course -- that multi-argument functions need to be removed. Of course if you don't agree polymorphism is important, you're not accepting the premise, so that's fine. And if my argument is incorrect I'd love to have that demonstrated. Please understand: I wrote the article in attempt to prevent yet another language making a serious mistake, and thereby forfeiting all its good features. Rust has many good features and some interesting ones (like the 3 level memory management system). I understand you explained the reasoning on IRC but the point is that after thinking about it I believe the loss of performance doesn't affect as many cases as you might think. As I tried to explain I think you can keep aliases, and it will have the desired effect *most* of the time even with tuples. If you're willing to throw out a few cases, you can go with tuples. I may be wrong. Maybe it isn't just a few cases. But I think it is worth investigating because you're paying huge price. unfortunately it isn't the kind of balance than can be quantified. -- john skaller skaller at users.sourceforge.net From skaller at users.sourceforge.net Sun May 15 04:59:43 2011 From: skaller at users.sourceforge.net (john skaller) Date: Sun, 15 May 2011 21:59:43 +1000 Subject: [rust-dev] Hello In-Reply-To: <82BD3FCA-2182-4AA3-B608-7A089D9604D7@users.sourceforge.net> References: <4DCEB2F4.1050804@mozilla.com> <82BD3FCA-2182-4AA3-B608-7A089D9604D7@users.sourceforge.net> Message-ID: <4D2BC6F6-6965-419C-853B-9817324F84C1@users.sourceforge.net> On 15/05/2011, at 8:34 PM, john skaller wrote: Oopps .. > Basically if you have type category T, and lets consider a 2 argument function, then > with a tuple the kind is > > T -> T This is wrong, the kind is just T > > but with Rust two argument function the kind is > > T * T -> T Sorry! So more correctly the kind of the domain, codomain, and function with tuples is T,T,T respectively and with multiple arguments is T * T, T, T respectively The handling of n-ary combinators like tuple formation appears to require high level polyadic programming (functorial polymorphism). By comparison, currified functions like: U -> U -> U -> U -> ... can be decomposed into a list which can be handled relatively easily by higher order polymorphism and recursion. This doesn't work for tuples because tuple formation isn't associative. I believe this could be why Haskell uses currified functions as standard instead of tuples ( the -> combinator is right associative so can be handled by a single binary function and recursion, in particular just a fold). [BTW: the usual "hack" to solve this is to break the tuple term into a list, process it with recursion, then rebuild the tuple, but this does NOT work properly. Basically this needs a "flatten" operator but flatten is ill defined because you can't tell how "deep" to flatten.] The impact on a programming language is that it is hard to compactly generalise tuple handling. To handle tuples up to N components you need N combinators, or, N^m, if you're combining them. I have actually worked on a language which could do this, designed by a top flight category theorist. So actually it CAN be done and it isn't hard but I for one don't understand enough of the theory to design a language in which it can be implemented. -- john skaller skaller at users.sourceforge.net From dherman at mozilla.com Sun May 15 05:15:32 2011 From: dherman at mozilla.com (David Herman) Date: Sun, 15 May 2011 09:15:32 -0300 Subject: [rust-dev] Hello In-Reply-To: <82BD3FCA-2182-4AA3-B608-7A089D9604D7@users.sourceforge.net> References: <4DCEB2F4.1050804@mozilla.com> <82BD3FCA-2182-4AA3-B608-7A089D9604D7@users.sourceforge.net> Message-ID: <02DAEA8E-5A83-406E-8E63-0D8376A74086@mozilla.com> FWIW, I have plenty of friends in the PL research world who design type systems that are not based on your restrictive definitions (type system = initial algebra, type system design = category theory). But I have no interest in a theoretical shouting match. We'd probably lose -- no one on our team is a category theorist -- so there's no point in fighting. Regardless, I believe our policies are clear. And I believe that it's pretty common etiquette in many cultures throughout the world that you don't walk into a community and tell them "do things my way or give up everything you're doing because otherwise it's all doomed." That's rude, it's presumptuous, and I would suggest it's an awfully self-destructive way to think. Dave From skaller at users.sourceforge.net Sun May 15 06:09:23 2011 From: skaller at users.sourceforge.net (john skaller) Date: Sun, 15 May 2011 23:09:23 +1000 Subject: [rust-dev] Hello In-Reply-To: <02DAEA8E-5A83-406E-8E63-0D8376A74086@mozilla.com> References: <4DCEB2F4.1050804@mozilla.com> <82BD3FCA-2182-4AA3-B608-7A089D9604D7@users.sourceforge.net> <02DAEA8E-5A83-406E-8E63-0D8376A74086@mozilla.com> Message-ID: On 15/05/2011, at 10:15 PM, David Herman wrote: > FWIW, I have plenty of friends in the PL research world who design type systems that are not based on your restrictive definitions (type system = initial algebra, type system design = category theory). But I have no interest in a theoretical shouting match. We'd probably lose -- no one on our team is a category theorist -- so there's no point in fighting. I'm not either, I'm just making a comment. Note I said "modern type system". Clearly I am talking about what theoreticians think a type system is these days. None of the ones I know would accept that C++ has a type system in that sense, for example. Anyhow, please ignore my comment, it was just off the cuff and really it was intended to be a bit humorous! I should remember email isn't so good for that;( > > Regardless, I believe our policies are clear. And I believe that it's pretty common etiquette in many cultures throughout the world that you don't walk into a community and tell them "do things my way or give up everything you're doing because otherwise it's all doomed." That's rude, it's presumptuous, and I would suggest it's an awfully self-destructive way to think. Dave, I'm not trying to be presumptuous or rude. I'm actually trying to help by imparting a little of my experience. And also I'm trying to learn enough about Rust and the team working on it to decide whether to give up my own project in favour of it. I spent 10 years on the C++ committee, a lot of it working on templates. [As an NB representative] Then I designed my own language to overcome many of the shortcomings of C++. So basically I've been into programming for 35 years and 20 or so of that seriously into languages, and around 15 I think actually writing a compiler. Much of that was full time and entirely unpaid. I am trying to present a case that not using tuples for function arguments has a high price, and that the justification for multiple arguments, to support aliases, to support statically assured down stack pointers for efficiency .. may not be as strong as is believed. At this point in time I believe it is true that on an ABI boundary, it is necessary in some cases for maximum optimisation. But in some cases it isn't necessary, and it is never necessary calling a known function (I mean, the function definition is known) which I think you would say is "in the same crate" if I understand crates properly. This basically means that whilst you're "all in Rust" you probably have the option to widen the "definition" of a crate dynamically from small during prototyping to "whole program" later, and get major optimisations benefits .. including the removal of overheads from tuples vs multiple arguments. I'm not saying you should do this, but I am warning that, well, when someone gets a new language the first thing they will probably do after "hello world" is try to implement some basic data structures, lists, trees etc, because without them they can't write applications, and naturally they would want those data structures to be polymorphic. Anyhow there are lots of other things that could be talked about. I have some interesting algorithms that might be useful, for example parallel assignment, which is used to optimise tail recursive function calls to minimise the number of temporaries. [That problem is NP complete so the algorithm is only a heuristic]. I also know how to do inlining (including polymorphic functions), in the presence of nested functions (does Rust have nested functions?), and although the algorithm isn't nearly as good as it could be I think it is the best one available (it does a much better job than MLton for example). Most of this stuff is about performance, and I understand that matters to Rust people? And most of it achieves a lot more than minor tweaks at interface boundaries because it tries to eliminate boundaries (which of course isn't possible in the interface of a function of a binary library). -- john skaller skaller at users.sourceforge.net From skaller at users.sourceforge.net Sun May 15 07:34:06 2011 From: skaller at users.sourceforge.net (john skaller) Date: Mon, 16 May 2011 00:34:06 +1000 Subject: [rust-dev] Three level memory system In-Reply-To: References: <4DCEB2F4.1050804@mozilla.com> <82BD3FCA-2182-4AA3-B608-7A089D9604D7@users.sourceforge.net> <02DAEA8E-5A83-406E-8E63-0D8376A74086@mozilla.com> Message-ID: <58AAA640-71F5-40BF-8A44-8BE004F88AE6@users.sourceforge.net> I'm intrigued by the 3 level memory system (stack, ref count, gc) and wonder how that works out in practice. Does it ever get in the way? I'm also a bit curious about ref counting. As a memory management technique it is known to be slower than a good GC, partly because of the overhead of managing the count, and partly because doing so can destroy cache coherence by touching memory that otherwise doesn't need to be touched. Also, at least in C++ the primary use of destructors is to release memory which is not necessary with a GC, and ordered and timely release of memory is not really useful except perhaps in application with hard real time constraints. On the other hand, ordered synchronous release of some resources (file handles, locks, etc) is essential, but many languages without destructors (such as Ocaml) don't seem to have a huge problem with this. Also, destructors have a very serious design fault: faults in destructors are notoriously hard to handle. So I'm curious about the decision to use ref counting and deterministic destructors. [I'm just curious, I think the design is very interesting!] FWIW, I also had this 3 level system implemented in my language Felix, but without the explicit static control Rust provides. What I found was that the performance overheads, especially in the presence of shared memory concurrency that Felix supported, were very high. Stuff ran a lot faster when I threw it out. I should say that in Felix, ref counting didn't exclude cycles so the GC had to scan the whole heap, ref counted or not: the load was only reduced when the ref counts removed memory prior to a GC phase. Does Rust ensure ref counted objects can't contain cycles? Anyhow, the Rust system seems very promising to me! -- john skaller skaller at users.sourceforge.net From graydon at mozilla.com Sun May 15 08:25:18 2011 From: graydon at mozilla.com (Graydon Hoare) Date: Sun, 15 May 2011 08:25:18 -0700 Subject: [rust-dev] Conduct (was "Hello") In-Reply-To: <7357A147-EA1C-470C-9065-6F73936EA9D5@users.sourceforge.net> References: <4DCECF57.5060808@mozilla.com> <7357A147-EA1C-470C-9065-6F73936EA9D5@users.sourceforge.net> Message-ID: <4DCFF05E.3050903@mozilla.com> On 15/05/2011 3:55 AM, john skaller wrote: > In the sense I used it to describe Java "Rubbish" is perjorative. I withdraw it with the > comment that I am unhappy billions of dollars were spend by Sun to promote a language > which was a major backward step in computing - in my opinion. Sorry for being > emotional on the issue, I not only care about programming languages I have devoted > my life to it (and I think, largely wasted my time). I care about them too, and have likewise devoted much time to them, but I guess I see them as a field of great (and often very pretty) "ecological diversity", not a process to discover the One True Way. I don't even mind Java that much, though I think they shipped before they were ready and greatly overdid the design pattern thing in their standard library. Java demonstrated a few really important points to language designers and users alike: - We really were wasting a whole lot of time chasing memory errors! - Tooling matters a lot. The IDEs are incredible. - Fast compilation is mostly a matter of compilation model. - Well defined primitive types are handy. - There's a large class of business programmers who value predictability over many other features. - The line between dynamic and static language is blurrier than everyone was saying. Maybe some of these were lessons well known to the lisp camp, but some weren't, and in any case "lisp people know X" is seldom the final word that needs to be written. I am sometimes sad that java got as much attention as it did, when other pretty languages did not, but maybe that's just the price of "anything going mainstream". > "Mandatory" means necessary and is therefore not perjorative. > The context should have been clear: I believe polymorphism is paramount > and the argument -- which could be false of course -- that multi-argument > functions need to be removed. I don't know if there's a single concern I hold to be "paramount" in rust, but if I had to pick one it'd be "safety", which really isn't at play here. Polymorphism is by no means a dominant concern for me. It's a nice thing to support where and when we can, in whatever senses seem reasonable, and when we can't everyone knows how to work around its absences. (I don't even think in general it's a well-defined term. There are several perfectly valid senses to the word, and some of them are at odds in implementation technique. How do multimethods, overloading, subtyping &c fit into your worldview?) > Of course if you don't agree polymorphism is important, you're not accepting > the premise, so that's fine. And if my argument is incorrect I'd love to have > that demonstrated. Ok. Demonstration A: every language with less-than-Ideal (in whatever sense you mean) polymorphism but perfectly usable pragmatics, currently having millions of lines of code written in it every day. Code that ships and works and is not totally worthless. See language list below... > Please understand: I wrote the article in attempt to prevent yet another language > making a serious mistake, and thereby forfeiting all its good features. > Rust has many good features and some interesting ones (like the 3 level > memory management system). I'm afraid I don't see how some loss of compositionality (or polymorphism) equates to forfeiting all good features. You've made it sound like you don't think any language that fails on this point is usable at all. Yet I've read and written code in dozens of languages that don't do what you advise. Indeed, of the influences on rust I can name off hand (nil/hermes, newsqueak/alef/limbo, C, C++, python, lisp, ada, C#, clu, mesa, eiffel/sather, ml, D, erlang) I don't think more than a couple did what you're demanding, yet they all seem quite usable to their respective communities. How do you get from "lack of compositionality" to "forfeit all other features"? > As I tried to explain I think you can keep aliases, and it will have the > desired effect *most* of the time even with tuples. If you're willing to throw > out a few cases, you can go with tuples. Really? It sounded like something quite different to me: that the pragmatics of making a function call would change to entail implicit heapification on every library boundary. That would further entail: - Making calls to precompiled libraries using pointers less efficient than just passing by value (!), unless we.. - Overhauled the whole compilation model to stick a copy of the text (say, pickled AST) in each library and rebuilt it every time we built final binaries. Compile times shoot up and we've got the C++ "recompile libstdc++ each time" system. - Also we could never do varargs. - Also we could never do keyword args. - Also mutable values would become significantly more mysterious because the compiler sometimes inserts copy-to-heap operations that change where the mutable thing is located. - Also the semantics of destructors and/or gc would change because we'd have phantom copies of things being synthesized by the compiler that we'd have to decide when and how to dispose of. IOW it would change a lot of quite major pragmatics of the language. I recognize that some compositionality of function types is being lost here, but I simply don't think the mathematical abstraction of a function with a single, purely compositional domain-and-range is an adequate model of what real programmers want to be doing (and are accustomed to doing elsewhere). I do think it's a model of a subset of what programmers want to be doing, and we support that model adequately. We have tuple types and inferred parametric polymorphism of a sort, and you can write your functions as pure and composable in FP style. We'll certainly ship FP bits to support this style in the standard library, as C++ increasingly does. I don't think C++ failed here at all; I don't think it's essential to convert *everything* to FP style to reap some of the benefits of it in some cases, and in cases where it's not a benefit it can be set aside in favour of a more appropriate style. I'm a pluralist and intend the language to support writing in multiple styles. That includes FP, but it doesn't mean that the considerations of FP get to throw out the considerations of other styles to make sure everything composes perfectly in FP style. > I may be wrong. Maybe it isn't just a few cases. But I think it is worth investigating > because you're paying huge price. unfortunately it isn't the kind of balance > than can be quantified. I agree it's very hard to quantify these things. It's a trade between compositionality and several other factors (predictability, performance, FFI, compiler complexity, compilation model, various non-tuple-like features) few of which are obvious or even visible in the FP worldview. I realize there's some compositionality being lost here, but like orthogonality, do not think it the sole concern. -Graydon From pwalton at mozilla.com Sun May 15 14:37:26 2011 From: pwalton at mozilla.com (Patrick Walton) Date: Sun, 15 May 2011 14:37:26 -0700 Subject: [rust-dev] Statically linked library crates Message-ID: <4DD04796.4080708@mozilla.com> Hi everyone, It occurred to me that our insistence on dynamic linking for the standard library is likely to cause performance problems. This has been lingering at the back of my mind for a while, although my fears have been allayed to some degree so far by the fact that trivial standard library routines rarely show up in profiles. But it occurred to me that there's one small class of functions that we really can't get away with not inlining: simple iterators. Consider uint::range(). This is the preferred way to do a C-style for loop in Rust. Up to this point we've been preferring to write C-style for loops as while loops, as near as I can tell, due to some early flakiness in rustboot, and for performance reasons. While loops work fine. But while loops aren't something we want programmers to have to write to get good performance out of their loops. They're a quite low-level construct. I've been burned many, many times while writing Rust code by forgetting to increment the loop counter. At the moment, *each iteration* of uint::range() requires not one, but two indirect function calls, both across boundaries that are opaque to LLVM (indeed, each boundary is a DLL boundary). This is likely not going to work; we're inhibiting all loop optimizations and forcing new stack frames to be created for every trip around the loop. I'm pretty sure that LLVM can inline away the overhead of range(), by first inlining the iterator itself, then inlining the for-each body. But, in order to do that, it needs to statically know the iterator definition. So range() can't be dynamically linked anymore. The simplest solution I can think of is to allow both static and dynamic library crates. There's been a movement lately in some circles to forbid dynamic linking entirely; Go for example explicitly does not support it (in keeping with the Plan 9 tradition). I tend to disagree; I think that supporting dynamic linking is useful to enable memory savings and to make security problems in the wild easy to patch. (I'm told there was a zlib security bug that was much easier to fix on Linux than Windows because Windows apps tend to statically link against everything but the system libraries.) But exclusively relying on it is not going to be tenable; some routines are so simple, yet so critical to performance, that static linking is probably going to be the only viable option. Thoughts? Patrick From respindola at mozilla.com Sun May 15 17:09:44 2011 From: respindola at mozilla.com (=?ISO-8859-1?Q?Rafael_=C1vila_de_Esp=EDndola?=) Date: Sun, 15 May 2011 20:09:44 -0400 Subject: [rust-dev] Statically linked library crates In-Reply-To: <4DD04796.4080708@mozilla.com> References: <4DD04796.4080708@mozilla.com> Message-ID: <4DD06B48.6030805@mozilla.com> > There's been a movement lately in some circles to forbid dynamic linking > entirely; Go for example explicitly does not support it (in keeping with > the Plan 9 tradition). I tend to disagree; I think that supporting > dynamic linking is useful to enable memory savings and to make security > problems in the wild easy to patch. (I'm told there was a zlib security > bug that was much easier to fix on Linux than Windows because Windows > apps tend to statically link against everything but the system > libraries.) But exclusively relying on it is not going to be tenable; > some routines are so simple, yet so critical to performance, that static > linking is probably going to be the only viable option. > > Thoughts? I fully agree that dynamic and static linking are the right tools for different problems. The design of creating regular mangled symbols is in part so that we can static link, so just that shouldn't be too hard. A more interesting task is how to get LLVM to inline across crates. When static linking, we could probably give a -emit-llvm option to rustc that would create a crate with LLVM IL instead of regular binary code. This is very similar to how LTO is done for C++. When using dynamic libraries there are also some basic functions we would probably like to inline (those you would code in a header in c++). Maybe we can output the IL in addition to the binary code but mark the IL as available_externally? That way rustc can read it in. > Patrick Cheers, Rafael From graydon at mozilla.com Sun May 15 23:42:00 2011 From: graydon at mozilla.com (Graydon Hoare) Date: Sun, 15 May 2011 23:42:00 -0700 Subject: [rust-dev] Three level memory system In-Reply-To: <58AAA640-71F5-40BF-8A44-8BE004F88AE6@users.sourceforge.net> References: <4DCEB2F4.1050804@mozilla.com> <82BD3FCA-2182-4AA3-B608-7A089D9604D7@users.sourceforge.net> <02DAEA8E-5A83-406E-8E63-0D8376A74086@mozilla.com> <58AAA640-71F5-40BF-8A44-8BE004F88AE6@users.sourceforge.net> Message-ID: <4DD0C738.4050509@mozilla.com> On 15/05/2011 7:34 AM, john skaller wrote: > I'm intrigued by the 3 level memory system (stack, ref count, gc) and wonder how > that works out in practice. Does it ever get in the way? Not sure. We don't have enough experience with it yet. We'll be better able to answer this in a few months. > I'm also a bit curious about ref counting. As a memory management technique it > is known to be slower than a good GC, partly because of the overhead of managing > the count, and partly because doing so can destroy cache coherence by touching > memory that otherwise doesn't need to be touched. This statement is highly debatable. There are several kinds of GC and several kinds of RC, and they both have different cache behavior (GC trashes the whole cache regularly and doubles heap pressure; RC hurts cache coherence *if* you're doing shared memory multiprocessing but otherwise tends to cost in code overhead..) Lots of GCs hybridize and RC is nowhere near as dead as its detractors make it out to be. > Also, at least in C++ the primary use of destructors is to release memory > which is not necessary with a GC, and ordered and timely release of memory > is not really useful except perhaps in application with hard real time constraints. > On the other hand, ordered synchronous release of some resources (file handles, > locks, etc) is essential, but many languages without destructors (such as Ocaml) > don't seem to have a huge problem with this. This is pretty subjective. Destructors mean your cleanup routines can do predictable work, in a way that's much more difficult and ambiguous in finalizers. I do think timely release of resources -- including memory -- is more important than you say here. But it's hard to quantify. > Also, destructors have a very serious design > fault: faults in destructors are notoriously hard to handle. We have a well-defined (not flawless, but I think reasonable) strategy for double-faulting (failing inside a dtor running due to failure). Failure being idempotent and unrecoverable helps. > So I'm curious about the decision to use ref counting and deterministic destructors. > [I'm just curious, I think the design is very interesting!] Ref counting recycles memory sooner, more deterministically, touches less of the heap (blows out the cache less), has lower heap overhead, lower average latency, etc. It's not without merit. Determinstic destructors are ... a unique feature that there's no way to emulate without them. Finalizers and finally blocks do different things, and each only does a part of dtors. From a parsimony perspective, dtors cover both and do both better, so why wouldn't you use them? > FWIW, I also had this 3 level system implemented in my language Felix, > but without the explicit static control Rust provides. What I found was that > the performance overheads, especially in the presence of shared memory > concurrency that Felix supported, were very high. We do not support shared memory concurrency. RC is non-atomic. And the GC doesn't have to touch the RC subgraphs. > Does Rust ensure ref counted objects can't contain cycles? Yes. They are (or will be, when this part is actually finished) a separate type kind. -Graydon From graydon at mozilla.com Mon May 16 00:21:27 2011 From: graydon at mozilla.com (Graydon Hoare) Date: Mon, 16 May 2011 00:21:27 -0700 Subject: [rust-dev] Statically linked library crates In-Reply-To: <4DD04796.4080708@mozilla.com> References: <4DD04796.4080708@mozilla.com> Message-ID: <4DD0D077.6000003@mozilla.com> On 15/05/2011 2:37 PM, Patrick Walton wrote: > Hi everyone, > > It occurred to me that our insistence on dynamic linking for the > standard library is likely to cause performance problems. It might. Measure twice, cut once. > Consider uint::range(). This is the preferred way to do a C-style for > loop in Rust. Up to this point we've been preferring to write C-style > for loops as while loops, as near as I can tell, due to some early > flakiness in rustboot, and for performance reasons. I'm not sure there's a single preferred way. We have some special-casing in the language for built-in loop types (elements of vectors for example, avoiding bounds-checking on each iteration) and we could do more; for example we could add a 3-clause C style for loop, or a "from .. to .. by" range loop over scalar types. Or other options, see below. > At the moment, *each iteration* of uint::range() requires not one, but > two indirect function calls Two? It should only be one. Range calls block, block runs, range does += 1, range calls block again. It's also not exactly a surprising indirect function; it keeps calling the same one, from the same place in the code + stack. IOW it'll *probably* be well predicted. > LLVM (indeed, each boundary is a DLL boundary). This is likely not going > to work; we're inhibiting all loop optimizations and forcing new stack > frames to be created for every trip around the loop. It is definitely going to inhibit some optimizations, it's a question of asking which ones and how high the costs are. Measure? If you find it is really bad, which is surely possible, then I'd note that for scalar-range loops I think it's legit to ask for some language help (possibly including pointer-range? maybe we can unify the fast vec-element for loop, since it's just a pointer scalar-range). For iters with substantially more structure, I don't think "having a funciton call in the middle" is a guaranteed major performance issue. It's worth measuring a few different "kinds" of iter before jumping to the conclusion that it can't involve any calling. > So range() can't be dynamically linked anymore. The simplest > solution I can think of is to allow both static and dynamic library crates. Certainly possible. You're jumping through a few inferences to a limited set of conclusions but it's a possible approach. > There's been a movement lately in some circles to forbid dynamic linking > entirely; Go for example explicitly does not support it (in keeping with > the Plan 9 tradition). I tend to disagree; I think that supporting > dynamic linking is useful to enable memory savings and to make security > problems in the wild easy to patch. Yeah. I'm not really a "fan" of dynamic linking. There are also studies showing the putative memory savings of dynamic linking are .. pretty hypothetical. But then it really depends on the system load, system design, what you're measuring, and and who you ask. I mostly wanted dynamic linking so that people who *are* convinced they want it can have it, for whatever reason. It's a common sentiment and as I said before, I'm a pluralist. I'm usually fine statically linking the programs I write. But I want dynamic to work. > But exclusively relying on it is not going to be tenable; > some routines are so simple, yet so critical to performance, that static > linking is probably going to be the only viable option. I think you mean more than "static linking", since LLVM is not a linker; you mean static linking where the .a file actually contains LLVM bitcode, a la the LLVM-LTO approach. That's plausible. It's *different again* from actual static linking. And it's different from pickled ASTs, and different from macros. So there are like ... 4 points on the spectrum here in addition to our existing compilation strategy: - Pickling ASTs so that we can load in STL-like "declared externally but mixed in to your compilation unit" functions. So that the inlining or intertwingling can happen at the front-and-middle ends, not just the LLVM layer. - Powerful macros that cover the "needs to be efficient" cases as well as the various other macro scenarios. Definitely front-end! - Static linking, just to avoid having runtime dependencies and PIC code and such. But still "calls a function on each iter". - Static linking LLVM bitcode, to get LTO across crates and cross-iter inlining. On this list, I'd point out a couple things: - We want a macro system anyways. So let's build that soon. - You only have an inlining problem *when* you're dealing with separate crates. Granted, uint::range is "standard library fodder", but if you have some custom data structure X with iters on X implemented in the same crate as the iter-callers, LLVM may well inline the iter already. And uint::range is one of those cases that will fall into "supported by a scalar-range special case loop", so let's not over-generalize the problem too quickly. - We wanted some kind of "cache mode" for saving and reloading .bc files transparently inside the compiler, to avoid re-trans'ing subsets of a crate that the compiler determines as not having changed run-to-run. So that will probably help set up a framework for loading and static linking external crates as LLVM bitcode. I'd probably say go with the macros and the scalar-range loop *first* while measuring the cost of DLL vs. crate-internal iter vs. scalar-range loops. If you have a serious performance problem that definitely can't be solved with scalar-range optimizations, then start digging into the LLVM-LTO stuff. The reason is simply this: if "standard wisdom" is to use LLVM-LTO against the standard library to get "adequate performance", you get two big follow-on problems: - The idea that "compilation may be fast" is shot; the standard approach will involve rebuilding libstd every time. Back in C++. - You're leaving DLL-based users out in the cold *by design*, rather than trying to find an approach that gets them decent performance too. So I'd like to approach cross crate LLVM-LTO as a last resort, not a first one. -Graydon From skaller at users.sourceforge.net Mon May 16 06:35:58 2011 From: skaller at users.sourceforge.net (john skaller) Date: Mon, 16 May 2011 23:35:58 +1000 Subject: [rust-dev] Conduct (was "Hello") In-Reply-To: <4DCFF05E.3050903@mozilla.com> References: <4DCECF57.5060808@mozilla.com> <7357A147-EA1C-470C-9065-6F73936EA9D5@users.sourceforge.net> <4DCFF05E.3050903@mozilla.com> Message-ID: On 16/05/2011, at 1:25 AM, Graydon Hoare wrote: > > - We really were wasting a whole lot of time chasing memory errors! Those people using C, yes. They still do :) > - Tooling matters a lot. The IDEs are incredible. I'll take your word on that, I never use IDE's. In my view Java's really big selling point was the standard GUI. Even if it wasn't that good, at least it was standard. > - Fast compilation is mostly a matter of compilation model. Yes, but surely that's clear? > - Well defined primitive types are handy. > - There's a large class of business programmers who value > predictability over many other features. Yes, but it is worth noting: in Java, semantic predictability. In C++, the semantic predictability is much better than C though not quite as strong as in Java. However in C++ the performance predictability of the library is entrenched in the ISO Standard. > - The line between dynamic and static language is blurrier than > everyone was saying. We might well agree that all (imperative) programs are dynamic. Actually the way I see this issue is: we want strong expressive type systems to help get programs right. However if the type system is strong but not expressive enough the program must go through contortions to work around it in some cases, which is not good. The boundary here seems to be: basic typing is fully understood, up to first order polymorphism at least, however high order polymorphism, kinding systems, and polyadic programming (functorial polymorphism) are not well understood and the work around for this lack of knowledge which means a expressivity in a type system is insufficient is to use dynamics, which is why fully dynamic languages can often do things easily that you can't do at all in static languages. A simple example: in C++ you can't write a function that returns itself because there's no way to specify the type. In Python you can just do it. Another example: it is HARD to get the typing strong enough to type arrays correctly. Many languages (including C++, Ocaml and Rust) just give up and don't bother to include the length in the type. Felix provides arrays which include the length in the type and actually allows *some* algorithms to be written with these but other, clearly sensible, useful, and correct algorithms, cannot be written. For example in Felix you can't concatenate two arrays because the lengths can't be added by the type system. > I don't know if there's a single concern I hold to be "paramount" in rust, but if I had to pick one it'd be "safety", which really isn't at play here. Polymorphism is by no means a dominant concern for me. It's a nice thing to support where and when we can, in whatever senses seem reasonable, and when we can't everyone knows how to work around its absences. I think we differ in our viewpoints here. I agree safety is vital, and I think you believe performance is vital too. Where perhaps we differ is that I believe polymorphism is essential for safety. Roughly speaking it is "safer" to write a list or tree handling algorithm once that handles all data types than it is to write it once for each data type. I think you would probably agree with this if I put the word "better" rather than "safer", but I believe that reducing duplicate efforts reduces bugs even if it isn't "safety" in the sense of language enforcing things. [Actuallly with polymorphism the language does "enforce" correct copying for different data types by doing the copying itself, this certainly prevents the programmer making mistakes!] > > (I don't even think in general it's a well-defined term. There are several perfectly valid senses to the word, and some of them are at odds in implementation technique. How do multimethods, overloading, subtyping &c fit into your worldview?) Is that question rhetorical? if not: Multimethods don't work, by which I mean they do not fix the problem that OO fails to deliver abstraction and therefore is not a general paradigm. Overloading is a convenience. Felix has it, and it introduces very significant extra complexity in the compiler. It is an almost essential convenience for handling things like operations on a large set of integer approximations that C has, for example (although you could also do this with some other technique, for example Haskell style type classes (which Felix also has) or Ocaml style functors). I do not really like overloading: I can live with a language that doesn't have it, although I find it is a bit easier sometimes to use one that does. Subtyping in the sense of type theorists is interesting and probably essential for more complex coding. Note I'm talking about subtyping defined by "a subtype of a type is a type with a subset of the values", not the kind of subtyping purported to exist in OO systems. I do not really understand enough to appreciate how useful it is or isn't. I used Ocaml polymorphic variants which support subtyping variants particularly well and found it was still extremely hard to use in practice (but that may have been the particular implementation or just my own inability). > >> Of course if you don't agree polymorphism is important, you're not accepting >> the premise, so that's fine. And if my argument is incorrect I'd love to have >> that demonstrated. > > Ok. Demonstration A: every language with less-than-Ideal (in whatever sense you mean) polymorphism but perfectly usable pragmatics, currently having millions of lines of code written in it every day. Code that ships and works and is not totally worthless. Please excuse my language, I often exaggerate and get too emotional .. I don't actually mean to offend people. Of course, for example "C" is a useful language. But that doesn't mean a *new* language should be developed that doesn't support polymorphism. Of course I see that Rust does (or is intended to) support polymorphism so we are actually arguing about a detail. Roughly: with multi-argument functions you can certainly have "true and proper" polymorphism, but not first order or even second order: you need a kinding system as well, because your functions are a combination of a N-functor (constructor) and an ordinary function. Languages with tuples have a family of N-functors for constructing tuples, but most do not have the power to generalise them. Its not perfect but the rest of the typing generalises (can be easily made polymorphic). The impact on Rust is more severe because ALL functions will need the advanced kinding support that most language lack only for tuples. Just to be clear what I mean: in ML and in Felix you simply cannot write a single function that creates a parallel composition of N functions. Here is a binary parallel composition in ML: let pcomp2 f g = fun x y -> f x, g y It is of course trivial to do this for a list of any length with recursion (in fact a map), but to do this for tuples of any number of components is impossible in ML. It is however quite clearly trivial to implement pcomp for all N inside a compiler, but providing the support for the end user to put this in a library function is very hard. > See language list below... > >> Please understand: I wrote the article in attempt to prevent yet another language >> making a serious mistake, and thereby forfeiting all its good features. >> Rust has many good features and some interesting ones (like the 3 level >> memory management system). > > I'm afraid I don't see how some loss of compositionality (or polymorphism) equates to forfeiting all good features. Please note "composition" was just an example of a polymorphic function. > How do you get from "lack of compositionality" to "forfeit all other features"? Lack of polymorphism, not composition. The way I get from that is that theory and practice for non-polymorphic sequential languages is well enough known it is of little interest and there are no serious variations that really affect software quality. All the research is about type systems and supporting polymorphism better. [Note: "sequential languages" excludes concurrency which is clearly not well understood] Polymorphism is the way to get type safety for components that are reusable for many types. To be more precise: everyone can easily do first order polymorphism. Second order is a bit harder. The holy grail is polyadic programming, also called functorial polymorphism. I want to give an example of what that is. Consider "fold", it is an important higher order function I'm sure you'd agree. Now, in most languages fold is polymorphic. You have List.fold in Ocaml and it works for list of integers and lists of floats and even lists of lists of doubles. Then you have Array.fold. It does the same job for arrays. Then there is a Set.fold which does it for sets. Etc etc.. So many folds! Its boring to write out all these folds! There must be a better way. To write fold ONCE and have it work on ALL data structures. A language that can do that is polymorphic over data functors (data structures), not just the type of values contained in them. This is the holy grail. I have worked on a language which was array polyadic, meaning all algorithms could be written to work independently of the number of dimensions (and be statically type checked to ensure no mismatches). Note: *number* of dimensions. Note just any length, but any number of dimensions. Subsequently, this was extended to ANY polynomial data type: thats a data type made of of sums and products (variants and tuples of something similar). That's a very wide class of data structures: lists, trees, etc: anything inductively defined (i.e. constructive data type). Ok, so sorry for the long rave and digression but there's a purpose in it. Since I see the aim as providing this kind of very high level of reusability .. a language without basic polymorphism is back in the dark ages. A language missing out on higher order polymorphism or polyadic programming isn't at the research frontier but at least it implements something reasonable well established as a precursor to the "real thing". Felix is such a language: it really only has first order polymorphism (and a little tiny bit of higher order stuff, just enough get Monads working .. but don't rely on it). > >> As I tried to explain I think you can keep aliases, and it will have the >> desired effect *most* of the time even with tuples. If you're willing to throw >> out a few cases, you can go with tuples. > > Really? It sounded like something quite different to me: that the pragmatics of making a function call would change to entail implicit heapification on every library boundary. That would further entail: > > - Making calls to precompiled libraries using pointers less > efficient than just passing by value (!), unless we.. > > - Overhauled the whole compilation model to stick a copy of the > text (say, pickled AST) in each library and rebuilt it every > time we built final binaries. Compile times shoot up and we've > got the C++ "recompile libstdc++ each time" system. Note my suggestion on this: you should consider doing this by only *optionally* recompiling. The idea is that you use the precompiled functions when prototyping and when you're happy you rebuild the lot as a single piece of text. Yes you're "recompiling libstdc++" in that case but you're not doing it during development, only when you make a release. > - Also we could never do varargs. Varargs are evil :) > > - Also we could never do keyword args. Felix does, and it uses the tuple model! And since it also has overloading it even allows the keywords to help with overload selection. > > - Also mutable values would become significantly more mysterious > because the compiler sometimes inserts copy-to-heap operations > that change where the mutable thing is located. > > - Also the semantics of destructors and/or gc would change because > we'd have phantom copies of things being synthesized by the compiler > that we'd have to decide when and how to dispose of. These things may happen, but this kind of thing is always going to happen in any programming system. What I mean is: I might implement a graph data type in C and write cleanup routine which is of course precisely a standard garbage collection algorithm (because there isn't any other way to manage memory in a graph). So even though you don't have GC in C, if you have a graph data structure you have GC in C: its just in the user code instead of the language system. > > IOW it would change a lot of quite major pragmatics of the language. I am not so sure, but of course you know Rust better. Removing alias clearly doesn't "impact" the language at all. It does impact programmers trying to use the language (since the feature is missing). Once aliases are removed, changing from multiple arguments to tuples has no impact (given you don't get have polymorphism or typing to support it), but it opens the path for easy introduction of polymorphism. Introducing that also has no impact on the language (in the sense of breaking any invariants). But again it affects programmers by allowing much safer implementations data structures by making them more reusable. I do NOT dispute the logic of your analysis, by the way. At this time I believe you are right: something you want to achieve (iron clad safety and zero overhead memory management when using down stack pointers) is definitely lost. There may be another way to achieve this though, and it may not matter quite so much if you lose a bit of performance in some cases (we would agree losing safety in Rust isn't acceptable: in Felix I would throw out the safety to get the performance but I'm not arguing you should do that). > I recognize that some compositionality of function types is being lost here, but I simply don't think the mathematical abstraction of a function with a single, purely compositional domain-and-range is an adequate model of what real programmers want to be doing (and are accustomed to doing elsewhere). I'm sure it isn't but without meaning to be rude real programmers have no idea what they're talking about most of the time so that isn't much of an argument. Real programmers think OO is wonderful, that its the holy grail, when it can be proven to be fraudulent in a few lines.. but real programmers would then try to invent the Perpetual Motion Machine despite the fact physics says it can't be done (Multimethods, double dispatch .. all sorts of ridiculous attempts to support unsustainable religious beliefs). Real programmers think about "what they want" in terms of "what they already have". Ask a C programmer what they wan't and they'll ask for some minor tweak of C. They won't ask for ML because they have no idea what a variant type is, they "have got by without higher order functions just fine" (etc). Of course they keep producing code that doesn't work. > > I do think it's a model of a subset of what programmers want to be doing, and we support that model adequately. We have tuple types and inferred parametric polymorphism of a sort, and you can write your functions as pure and composable in FP style. We'll certainly ship FP bits to support this style in the standard library, as C++ increasingly does. I don't think C++ failed here at all; I don't think it's essential to convert *everything* to FP style to reap some of the benefits of it in some cases, and in cases where it's not a benefit it can be set aside in favour of a more appropriate style. I should add in passing I am NOT an FP zealot. Felix is actually a classical Algol like language, and Ocaml is a fairly standard imperative language too, it just has strong support for FP. > > I'm a pluralist and intend the language to support writing in multiple styles. I am with you there. > That includes FP, but it doesn't mean that the considerations of FP get to throw out the considerations of other styles to make sure everything composes perfectly in FP style. There's no need, since we already HAVE the perfect FP language: Haskell. > >> I may be wrong. Maybe it isn't just a few cases. But I think it is worth investigating >> because you're paying huge price. unfortunately it isn't the kind of balance >> than can be quantified. > > I agree it's very hard to quantify these things. It's a trade between compositionality and several other factors (predictability, performance, FFI, compiler complexity, compilation model, various non-tuple-like features) few of which are obvious or even visible in the FP worldview. I realize there's some compositionality being lost here, but like orthogonality, do not think it the sole concern. I actually agree with all your arguments and much of your priorities, but not your view of polymorphism. I think you're playing down THE most important feature of any programming language because static typing is THE most important way to improve correctness and polymorphism is way of upgrading the expressivity of a type system by a huge order of magnitude which leads to very significant reduction in the amount of code that has to be written to implement libraries. You are using static typing (and typestate stuff etc) to make memory management guarantees beyond what most languages make, and these in turn provide performance guarantees beyond which most languages make, and that is very significant contribution to Computer Science (IMHO of course but I'm guessing you aren't going to disagree on this one! :) I just find it hard to believe that doing this requires sacrificing the usual way of doing polymorphism. "Aliases" are a hack. We both know that. You're using them because you can't see a better way to achieve your goals. I feel there just HAS to be a better way. i don't claim to know what it is. Its just intuition. Whatever it is should be part of the type system "in general" an not be restricted to function arguments. These down stack pointers are basically a special kind of weak pointer. Anyhow thanks for your patience and time discussing this! -- john skaller skaller at users.sourceforge.net From respindola at mozilla.com Mon May 16 06:48:44 2011 From: respindola at mozilla.com (=?ISO-8859-1?Q?Rafael_=C1vila_de_Esp=EDndola?=) Date: Mon, 16 May 2011 09:48:44 -0400 Subject: [rust-dev] Statically linked library crates In-Reply-To: <4DD04796.4080708@mozilla.com> References: <4DD04796.4080708@mozilla.com> Message-ID: <4DD12B3C.9040803@mozilla.com> > There's been a movement lately in some circles to forbid dynamic linking > entirely; One use case that I just remembered that pretty much mandates dynamic linking: plugins on Windows. A plugin on windows cannot read symbols from the main binary, so if we don't have a XUL equivalent, the plugin would have to have its own copy of the JIT for example. > Patrick Cheers, Rafael From skaller at users.sourceforge.net Mon May 16 06:55:03 2011 From: skaller at users.sourceforge.net (john skaller) Date: Mon, 16 May 2011 23:55:03 +1000 Subject: [rust-dev] Statically linked library crates In-Reply-To: <4DD04796.4080708@mozilla.com> References: <4DD04796.4080708@mozilla.com> Message-ID: On 16/05/2011, at 7:37 AM, Patrick Walton wrote: > > I'm pretty sure that LLVM can inline away the overhead of range(), by first inlining the iterator itself, then inlining the for-each body. But, in order to do that, it needs to statically know the iterator definition. So range() can't be dynamically linked anymore. The simplest solution I can think of is to allow both static and dynamic library crates. My view as expressed in some messages already is that you can have both opaque function calls AND text for inlining. When you're "rapid prototyping" you just use the slow calls to avoid the compilation costs of doing large scale analysis. When you're doing a release, you compile a lot more text and do more inlining, and don't call the prebuilt functions so often. This isn't so much "static and dynamic" because I do not believe you can leave optimisations like inlining to LLVM. No matter how good LLVM is at doing this it cannot compete with a compiler that understands the high level semantics of the users code. The way I picture this is you have a tree of library "things" ending up with leaves which are individual functions. To compile fast and run slow you compile all the functions separately. Change the program, recompile the changed functions (only). To get better performance you just expand the "compilation unit" by moving up the tree. Now you compile sets of functions so that calls within these sets are inlined and optimised, but call across from one set to another aren't. In the extreme case you can move the trade-off slider all the way to "whole program" and that get rid of ALL ABI overheads (other than calls to non-Rust functions). Whole program analysis can achieve blinding speed but comes at a heavy price. Felix can ONLY do whole program analysis, although it uses caching to reduce the cost. This is the WRONG WAY (TM). Rusts approach is correct: get the external interfacing working first. Then if you can make the definition of what's external flexible you can open up the possibility of heaving inlining at any level of granularity. IMHO it is not necessary to do heavy optimisations early in the development of the language: you can always add that later. It is important to get the "flexible granularity" of compilation unit going early though (if you choose to do this kind of thing). -- john skaller skaller at users.sourceforge.net From chris.double at double.co.nz Mon May 16 07:08:46 2011 From: chris.double at double.co.nz (Chris Double) Date: Tue, 17 May 2011 02:08:46 +1200 Subject: [rust-dev] Conduct (was "Hello") In-Reply-To: References: <4DCECF57.5060808@mozilla.com> <7357A147-EA1C-470C-9065-6F73936EA9D5@users.sourceforge.net> <4DCFF05E.3050903@mozilla.com> Message-ID: On Tue, May 17, 2011 at 1:35 AM, john skaller wrote: > > Felix provides arrays which include the length in the type and actually > allows *some* algorithms to be written with these but other, clearly > sensible, useful, and correct algorithms, cannot be written. > For example in Felix you can't concatenate two arrays because the lengths > can't be added by the type system. Have you considered adding some form of dependent types to Felix to allow expressing this sort of thing? In ATS for example you can do something like (using lists with the length encoded because I have the example handy): fun{a:t at ype} list_append {n1,n2:nat} (xs: list (a, n1), ys: list (a, n2)): list (a, n1+n2) = case+ xs of | nil () => ys | cons (x, xs) => cons(x, list_append(xs, ys)) (see http://www.bluishcoder.co.nz/2010/09/01/dependent-types-in-ats.html for examples and explanations of the syntax) Can you give an example of how making functions take tuples improves certain types of code or makes things possible that weren't before? I remember way back when you used to post in the 'genstl' mailing list about templates and C++ you experimenting with a style of functions taking one argument which was a struct type with members for each argument, and composing these in interesting ways via templates, but the details are lost to my memory. Chris. -- http://www.bluishcoder.co.nz From dherman at mozilla.com Mon May 16 07:38:26 2011 From: dherman at mozilla.com (David Herman) Date: Mon, 16 May 2011 07:38:26 -0700 Subject: [rust-dev] Conduct (was "Hello") In-Reply-To: References: <4DCECF57.5060808@mozilla.com> <7357A147-EA1C-470C-9065-6F73936EA9D5@users.sourceforge.net> <4DCFF05E.3050903@mozilla.com> Message-ID: <40D8549A-31EF-4CF7-A18B-6876163688FA@mozilla.com> >> How do you get from "lack of compositionality" to "forfeit all other features"? > > Lack of polymorphism, not composition. I'm sorry, I'm just still confused about why you claim we have no polymorphism. I mean, we simply do have polymorphism. I think I understand why you are saying that polymorphic functions are less general in a language with multiary functions, but it doesn't mean we have *no* polymorphism. There has been work on making polymorphism more expressive in languages with multiple arity functions: http://www.ccs.neu.edu/racket/pubs/esop09-sthf.pdf And I think C++0x has some similar functionality. Maybe we'll end up exploring some of that. But at the end of the day, the worst case scenario here is that we lose some expressiveness and some things require more code duplication. That's a trade-off we will continue exploring, but we are willing to take some hits in expressiveness. I think it's worth cutting to the chase and recognizing that we are not aiming for perfection. We're looking for a good trade-off between many competing constraints. That requires taste, and it requires compromise. Reasonable people can disagree about some of the individual decisions we make, but I don't agree that this one constitutes an insurmountable failure. Dave From skaller at users.sourceforge.net Mon May 16 07:39:09 2011 From: skaller at users.sourceforge.net (john skaller) Date: Tue, 17 May 2011 00:39:09 +1000 Subject: [rust-dev] Statically linked library crates In-Reply-To: <4DD0D077.6000003@mozilla.com> References: <4DD04796.4080708@mozilla.com> <4DD0D077.6000003@mozilla.com> Message-ID: <3AABB042-0CFA-472D-A7F9-C978E995DFA8@users.sourceforge.net> On 16/05/2011, at 5:21 PM, Graydon Hoare wrote: > > Yeah. I'm not really a "fan" of dynamic linking. There are also studies showing the putative memory savings of dynamic linking are .. pretty hypothetical. But then it really depends on the system load, system design, what you're measuring, and and who you ask. Its not about memory savings. Its about delayed binding. This is especially important for long running service programs where you can add new features or even new versions of modules to the program without stopping it. > I mostly wanted dynamic linking so that people who *are* convinced they want it can have it, for whatever reason. It's a common sentiment and as I said before, I'm a pluralist. I'm usually fine statically linking the programs I write. But I want dynamic to work. In Windows a lot of libraries are ONLY available as DLLs. > > - Pickling ASTs so that we can load in STL-like "declared externally > but mixed in to your compilation unit" functions. So that the > inlining or intertwingling can happen at the front-and-middle ends, > not just the LLVM layer. Felix does this. It save enormous amounts of parsing time. > > - Powerful macros that cover the "needs to be efficient" cases as well > as the various other macro scenarios. Definitely front-end! I do not understand the need for macros. Could someone explain? If you have polymorphic functions you basically don't need macros. But perhaps I misunderstand what a macro is in Rust. > > - We want a macro system anyways. So let's build that soon. Why? Felix had macros at many levels, very powerful. I threw them out. The languages is better without them. Conditional compilation is supported by ordinary control flow and compile time constants (the compiler optimises away unused branches even before types are determined). > > I'd probably say go with the macros and the scalar-range loop *first* while measuring the cost of DLL vs. crate-internal iter vs. scalar-range loops. If you have a serious performance problem that definitely can't be solved with scalar-range optimizations, then start digging into the LLVM-LTO stuff. Just in general: the cost of function calls is enormous. The cost is not just in the call, it is not just in the parameter passing. The cost is that when you inline the function, the resulting substitutions lead to code that can be further optimised because everything just got more specific. -- john skaller skaller at users.sourceforge.net From skaller at users.sourceforge.net Mon May 16 08:13:35 2011 From: skaller at users.sourceforge.net (john skaller) Date: Tue, 17 May 2011 01:13:35 +1000 Subject: [rust-dev] Conduct (was "Hello") In-Reply-To: References: <4DCECF57.5060808@mozilla.com> <7357A147-EA1C-470C-9065-6F73936EA9D5@users.sourceforge.net> <4DCFF05E.3050903@mozilla.com> Message-ID: <3798AFCE-52D4-49F1-868A-C68010FD7D01@users.sourceforge.net> On 17/05/2011, at 12:08 AM, Chris Double wrote: > On Tue, May 17, 2011 at 1:35 AM, john skaller > wrote: >> >> Felix provides arrays which include the length in the type and actually >> allows *some* algorithms to be written with these but other, clearly >> sensible, useful, and correct algorithms, cannot be written. >> For example in Felix you can't concatenate two arrays because the lengths >> can't be added by the type system. > > Have you considered adding some form of dependent types to Felix to > allow expressing this sort of thing? Sure, but I don't know how :) I would need a type theorist or three on the team to help design this properly. As it is Felix is pretty much dead because of lack of developers and users (which is why I'm here at the moment evaluating Rust). This is a bit off topic but: in Felix, an array has the type T ^ N where ^ is an exponential and N is a unit sum: 3 = 1 + 1 + 1 for example. In particular this kind of array is "free" in the sense that it is precisely a tuple: int ^ 3 = int * int * int var x = 1,2,3; is a tuple of three ints and also an array of 3 ints .. there's no difference, they're not just isomorphic they're *equal*. The problem is that 3 + 2 != 5 because, expanded: (1+1+1) + (1 + 1) != 1 + 1 + 1 + 1 + 1 because summation is not associative. In fact, the typing here has no need of dependent (run time) types. Its entirely static. Its just that I do not see a general isomorphism to do the calculation. One should note from this scheme that array indicies SHOULD be sum types and NOT integers, and such values cannot possibly be "out of bounds". > > Can you give an example of how making functions take tuples improves > certain types of code or makes things possible that weren't before? Sure just look at "binder1" and "binder2" in the C++ standard library and wonder where "binder3" and "binder4" have gone. They haven't gone, they're not defined. You need an infinite number of them. If you have only one argument functions .. you only need one of them for completeness. -- john skaller skaller at users.sourceforge.net From catamorphism at gmail.com Mon May 16 08:30:12 2011 From: catamorphism at gmail.com (Tim Chevalier) Date: Mon, 16 May 2011 08:30:12 -0700 Subject: [rust-dev] Three level memory system In-Reply-To: <4DD0C738.4050509@mozilla.com> References: <4DCEB2F4.1050804@mozilla.com> <82BD3FCA-2182-4AA3-B608-7A089D9604D7@users.sourceforge.net> <02DAEA8E-5A83-406E-8E63-0D8376A74086@mozilla.com> <58AAA640-71F5-40BF-8A44-8BE004F88AE6@users.sourceforge.net> <4DD0C738.4050509@mozilla.com> Message-ID: On Sun, May 15, 2011 at 11:42 PM, Graydon Hoare wrote: > This statement is highly debatable. There are several kinds of GC and > several kinds of RC, and they both have different cache behavior (GC trashes > the whole cache regularly and doubles heap pressure; RC hurts cache > coherence *if* you're doing shared memory multiprocessing but otherwise > tends to cost in code overhead..) I assume you're just referring to copying GC, given the assertion about doubling heap pressure? There are other types of GC, and even assuming copying, generational collectors have been used to *improve* cache behavior: for example, Chilimbi and Larus, "Using Generational Garbage Collection To Implement Cache-Conscious Data Placement" (ISMM '98). Cheers, Tim -- Tim Chevalier * http://cs.pdx.edu/~tjc/ * Often in error, never in doubt "an intelligent person fights for lost causes,realizing that others are merely effects" -- E.E. Cummings From skaller at users.sourceforge.net Mon May 16 08:36:31 2011 From: skaller at users.sourceforge.net (john skaller) Date: Tue, 17 May 2011 01:36:31 +1000 Subject: [rust-dev] Conduct (was "Hello") In-Reply-To: <40D8549A-31EF-4CF7-A18B-6876163688FA@mozilla.com> References: <4DCECF57.5060808@mozilla.com> <7357A147-EA1C-470C-9065-6F73936EA9D5@users.sourceforge.net> <4DCFF05E.3050903@mozilla.com> <40D8549A-31EF-4CF7-A18B-6876163688FA@mozilla.com> Message-ID: On 17/05/2011, at 12:38 AM, David Herman wrote: >>> How do you get from "lack of compositionality" to "forfeit all other features"? >> >> Lack of polymorphism, not composition. > > I'm sorry, I'm just still confused about why you claim we have no polymorphism. I mean, we simply do have polymorphism. I think I understand why you are saying that polymorphic functions are less general in a language with multiary functions, but it doesn't mean we have *no* polymorphism. I know, but the polymorphism isn't fully parametric, because that implies it is combinatorial, i.e. you can just combine anything with anything. I picked composition as an example, I'm sure it isn't the only time the lack of a type for a function domain is a problem. Note that multiple arguments aren't a problem for functions like "fold" since that works just fine with the folding argument having type fn(A,B)->B If you want to get theoretical this function has domain of kind TYPE * TYPE where TYPE is the category of types. A value of that category is a type tuple (A,B) so the function above DOES have a domain: (A,B) but that isn't a type, its a value of a different Kind that TYPE (in particular, the Kind TYPE * TYPE). Note there are no values of style (A,B) in the language. Tuple values are values of type A * B (a single type) not a pair of types. This means you can get full polymorphism in Rust by introducing a proper kinding system .. but that is quite hard I think. And users will not only have to write polymorphic functions in the library .. they'll have to write functors as well. > > There has been work on making polymorphism more expressive in languages with multiple arity functions: > > http://www.ccs.neu.edu/racket/pubs/esop09-sthf.pdf > > And I think C++0x has some similar functionality. Yes it does. But it is still incomplete and a hack: look at all that complexity to do what is simple in ML. Rather it isn't simple .. its non-existent. There's no machinery required to make it work. The exception is tuples because there are N-ary functors. The point is there's just one of these nasty things in the language. > I think it's worth cutting to the chase and recognizing that we are not aiming for perfection. We're looking for a good trade-off between many competing constraints. Of course. > That requires taste, and it requires compromise. Reasonable people can disagree about some of the individual decisions we make, but I don't agree that this one constitutes an insurmountable failure. What becomes insurmountable is when you want to change something and you CANT because you have too many users. C++ is a good example of this. They screwed up a few things quite badly in ISO C++ and 201x can't fix them now. Its too late. That's basically why I quit the committee: beyond recovery. Sure C++ is useable .. but it could have been a lot better. -- john skaller skaller at users.sourceforge.net From pwalton at mozilla.com Mon May 16 08:44:40 2011 From: pwalton at mozilla.com (Patrick Walton) Date: Mon, 16 May 2011 08:44:40 -0700 Subject: [rust-dev] Statically linked library crates In-Reply-To: <4DD0D077.6000003@mozilla.com> References: <4DD04796.4080708@mozilla.com> <4DD0D077.6000003@mozilla.com> Message-ID: <4DD14668.40907@mozilla.com> On 5/16/11 12:21 AM, Graydon Hoare wrote: > Two? It should only be one. Range calls block, block runs, range does += > 1, range calls block again. It's also not exactly a surprising indirect > function; it keeps calling the same one, from the same place in the code > + stack. IOW it'll *probably* be well predicted. That's true, sorry. > It is definitely going to inhibit some optimizations, it's a question of > asking which ones and how high the costs are. Measure? Well, I'm not sure what kind of measurements we can usefully do to demonstrate this. I could certainly write a loop microbenchmark, but that won't tell us much. This doesn't hurt us much in rustc because (a) rustc is mostly tree-traversal, not hot inner loops; (b) we're using while loops in rustc anyway. I suspect this will be the kind of problem we will see in things like video decoders... but that's a ways off :) > I think you mean more than "static linking", since LLVM is not a linker; > you mean static linking where the .a file actually contains LLVM > bitcode, a la the LLVM-LTO approach. That's plausible. It's *different > again* from actual static linking. And it's different from pickled ASTs, > and different from macros. So there are like ... 4 points on the > spectrum here in addition to our existing compilation strategy: Yes, I should have been more clear when I referred to static linking. I think traditional static linking is mostly obsolete with the advent of LLVM bitcode; the only reason one would want it is if the object files aren't in LLVM bitcode format (not a problem we have in this case), or for compilation speed (which is a fair point). > - We want a macro system anyways. So let's build that soon. Agreed :) > - You only have an inlining problem *when* you're dealing with > separate crates. Granted, uint::range is "standard library fodder", > but if you have some custom data structure X with iters on X > implemented in the same crate as the iter-callers, LLVM may well > inline the iter already. And uint::range is one of those cases > that will fall into "supported by a scalar-range special case loop", > so let's not over-generalize the problem too quickly. Right, that's one of the approaches that I was considering (but left out for the sake of email brevity). The issue is that I suspect a fair number of folks will end up writing Rust in a functional style and use functions like map, filter, and possibly reduce/fold liberally. I worry about them getting burned. We could of course add language support for those constructs too a la Python's list comprehensions, but that increases the complexity budget. > The reason is simply this: if "standard wisdom" is to use LLVM-LTO > against the standard library to get "adequate performance", you get two > big follow-on problems: Sorry, I should have been more clear. What I'm proposing is to segment the standard library into two parts: a small, performance-critical part that benefits from LTO and a larger, dynamically-linked part. This is what I was getting at by defending dynamic linking; there are parts of the standard library that we would definitely like to be able to share among processes, to be able to patch for security issues, and to avoid recompiling. But there would be a small core that would be shipped as a .bc file and linked at LLVM optimization time into Rust binaries. This allows us to alter the dividing line between static and dynamic over time. Patrick From dherman at mozilla.com Mon May 16 09:02:54 2011 From: dherman at mozilla.com (David Herman) Date: Mon, 16 May 2011 09:02:54 -0700 Subject: [rust-dev] Conduct (was "Hello") In-Reply-To: References: <4DCECF57.5060808@mozilla.com> <7357A147-EA1C-470C-9065-6F73936EA9D5@users.sourceforge.net> <4DCFF05E.3050903@mozilla.com> <40D8549A-31EF-4CF7-A18B-6876163688FA@mozilla.com> Message-ID: <241DA11A-F132-4E77-AA47-F3CEC75DA126@mozilla.com> > Yes it does. But it is still incomplete and a hack: look at all that complexity to do what is > simple in ML. Rather it isn't simple .. its non-existent. There's no machinery required > to make it work. That doesn't really bother me. ML has a different set of trade-offs. It's a given that we will do some things worse than ML, and some things better. > What becomes insurmountable is when you want to change something and you CANT > because you have too many users. C++ is a good example of this. They screwed up > a few things quite badly in ISO C++ and 201x can't fix them now. Its too late. > That's basically why I quit the committee: beyond recovery. Sure C++ is useable .. > but it could have been a lot better. Well, this is an inevitability. Every successful project ends up with decisions the designers wish they could take back but can't. Which is not to say we shouldn't try to avoid making mistakes, so I should say I appreciate hearing your concern about multiple arity functions. Anyway, I think I probably took this conversation too meta, so I think I'll just step back. I don't think I buy that the expressivity of unary-only is vital, but I'm not an experienced enough implementor to know what the performance cost would be. And I think that's the key trade-off. Dave From graydon at mozilla.com Mon May 16 09:10:49 2011 From: graydon at mozilla.com (Graydon Hoare) Date: Mon, 16 May 2011 09:10:49 -0700 Subject: [rust-dev] Statically linked library crates In-Reply-To: <4DD14668.40907@mozilla.com> References: <4DD04796.4080708@mozilla.com> <4DD0D077.6000003@mozilla.com> <4DD14668.40907@mozilla.com> Message-ID: <4DD14C89.5030605@mozilla.com> On 16/05/2011 8:44 AM, Patrick Walton wrote: > Sorry, I should have been more clear. What I'm proposing is to segment > the standard library into two parts: a small, performance-critical part > that benefits from LTO and a larger, dynamically-linked part. This is > what I was getting at by defending dynamic linking; there are parts of > the standard library that we would definitely like to be able to share > among processes, to be able to patch for security issues, and to avoid > recompiling. But there would be a small core that would be shipped as a > .bc file and linked at LLVM optimization time into Rust binaries. This > allows us to alter the dividing line between static and dynamic over time. Ok. Concrete proposal: we put an LLVM bitcode section in the resulting crate when compiling in shared mode, and use keyword 'inline'. Stick it on a fn-or-iter, that fn-or-iter gets emitted in the .bc section of the resulting crate and copied into the crates that use you rather than externally referenced. Good enough? (also realize this ties out compilation artifacts to particular llvm bitcode format changes; it is not actually a "stable" format) I don't really want to get into multiple compilation artifacts if we can avoid it. -Graydon From respindola at mozilla.com Mon May 16 09:36:04 2011 From: respindola at mozilla.com (Rafael Avila de Espindola) Date: Mon, 16 May 2011 12:36:04 -0400 Subject: [rust-dev] Statically linked library crates In-Reply-To: <4DD14C89.5030605@mozilla.com> References: <4DD04796.4080708@mozilla.com> <4DD0D077.6000003@mozilla.com> <4DD14668.40907@mozilla.com> <4DD14C89.5030605@mozilla.com> Message-ID: <4DD15274.8040207@mozilla.com> > Ok. Concrete proposal: we put an LLVM bitcode section in the resulting > crate when compiling in shared mode, and use keyword 'inline'. Stick it > on a fn-or-iter, that fn-or-iter gets emitted in the .bc section of the > resulting crate and copied into the crates that use you rather than > externally referenced. Good enough? I like it. It might also be a good idea to still emit an external copy so that llvm has the option (and not the obligation) to inline it. At some point we might want to have a static crate format so that we can put it in there too, but that in an independent issue. > (also realize this ties out compilation artifacts to particular llvm > bitcode format changes; it is not actually a "stable" format) > > I don't really want to get into multiple compilation artifacts if we can > avoid it. > > -Graydon Cheers, Rafael From graydon at mozilla.com Mon May 16 09:54:52 2011 From: graydon at mozilla.com (Graydon Hoare) Date: Mon, 16 May 2011 09:54:52 -0700 Subject: [rust-dev] Conduct (was "Hello") In-Reply-To: <3798AFCE-52D4-49F1-868A-C68010FD7D01@users.sourceforge.net> References: <4DCECF57.5060808@mozilla.com> <7357A147-EA1C-470C-9065-6F73936EA9D5@users.sourceforge.net> <4DCFF05E.3050903@mozilla.com> <3798AFCE-52D4-49F1-868A-C68010FD7D01@users.sourceforge.net> Message-ID: <4DD156DC.10300@mozilla.com> On 16/05/2011 8:13 AM, john skaller wrote: > Sure just look at "binder1" and "binder2" in the C++ standard library and > wonder where "binder3" and "binder4" have gone. They haven't gone, > they're not defined. You need an infinite number of them. If you have only > one argument functions .. you only need one of them for completeness. Here we're getting into (almost) a concrete argument. But I think you're overdoing it. binder1st and binder2nd are the only ones defined for a simple reason: they cover all (or most) cases. Seeing why takes a momentary digression. In any conversation about polymorphism, I think there are two very different social cases you want to address: (A) you wrote datatype X and function F and want to use some generic like tree[X] or hashmap[K,X] and want to map/fold/process F over the generic using higher order programming. Fine. Since you wrote both X and F, you make sure F has a single domain/range, be it a scalar, a record, or a tuple -- rust has those too -- and you're done. You wrote both so you have control over args. F is always 1-arg, and plays nice with your higher order fns. (B) Serendipity. You are already working with a container of type (say) tree[(a,b,c,d,e)] and you wish to munge the tree through some independently written function of 5 args -- a function you did not write and have no control over -- and then you realize you can save yourself time by munging the 5-tuple through the 5-arg function using tree::fold. But oh! Only if the 5-arg function is written to take a 5-tuple, not 5 separate args. I *think* I am hearing you argue that this sub-case of B is what you're worried about. That we have to always encode 5-arg functions as taking 5-tuples so that serendipity can have its day and you can reuse tree::fold on 5-tuples and independently-written 5-arg functions. My counterargument is simple and it's what I meant above by binder2nd covering all cases: the odds against an *independently written* 5-arg function taking the exact same 5 args (not 4, not 6, and not 5 in some other order) are ... low. That as N increases from 1 the odds against a serendipitous match between any two N-ary functions and argument tuples matching up decreases geometrically. So much so that beyond "binary function" is just never even *happens*, serendipitously, in the field. This is why, I think, most FP toolkits max out at map2 and don't bother writing a map3, map4, map5 and such. You'd only have a match between a 5-arg function and a randomly chosen 5-tuple if you wrote both the function and the typedef. And then, yes, you'd just write your functions as taking 1 arg (a tuple, a record, whatever) as you say. But that's not a serendipity case, and it's easy to make work manually (or at worst with the occasional adaptor function). Am I misunderstanding? -Graydon From graydon at mozilla.com Mon May 16 09:59:11 2011 From: graydon at mozilla.com (Graydon Hoare) Date: Mon, 16 May 2011 09:59:11 -0700 Subject: [rust-dev] Conduct (was "Hello") In-Reply-To: <4DD156DC.10300@mozilla.com> References: <4DCECF57.5060808@mozilla.com> <7357A147-EA1C-470C-9065-6F73936EA9D5@users.sourceforge.net> <4DCFF05E.3050903@mozilla.com> <3798AFCE-52D4-49F1-868A-C68010FD7D01@users.sourceforge.net> <4DD156DC.10300@mozilla.com> Message-ID: <4DD157DF.7040008@mozilla.com> On 16/05/2011 9:54 AM, Graydon Hoare wrote: > My counterargument is simple and it's what I meant above by binder2nd > covering all cases: the odds against an *independently written* 5-arg > function taking the exact same 5 args (not 4, not 6, and not 5 in some > other order) are ... low. That as N increases from 1 the odds against a > serendipitous match between any two N-ary functions and argument tuples > matching up decreases geometrically. So much so that beyond "binary > function" is just never even *happens*, serendipitously, in the field. As a further counterexample: N-arg functions in *ML* don't even always wind up encoded as N-tuples. They very commonly write in curried form so that you have (X -> Y -> Z) rather than (X*Y) -> Z. And even this little wrinkle tends to defeat parsimonious combinations. It gets worse as soon as you add mutability. And we have 4 or more kinds of pointer... -Graydon From paul.stansifer at gmail.com Mon May 16 10:25:37 2011 From: paul.stansifer at gmail.com (Paul Stansifer) Date: Mon, 16 May 2011 10:25:37 -0700 Subject: [rust-dev] Statically linked library crates In-Reply-To: <3AABB042-0CFA-472D-A7F9-C978E995DFA8@users.sourceforge.net> References: <4DD04796.4080708@mozilla.com> <4DD0D077.6000003@mozilla.com> <3AABB042-0CFA-472D-A7F9-C978E995DFA8@users.sourceforge.net> Message-ID: > > - Powerful macros that cover the "needs to be efficient" cases as well > > as the various other macro scenarios. Definitely front-end! > > I do not understand the need for macros. Could someone explain? > If you have polymorphic functions you basically don't need macros. > But perhaps I misunderstand what a macro is in Rust. Scheme has a very powerful macro system and it doesn't even have static types. Macros abstract over whatever second-class features a language has. In a non-polymorphic language, this includes types, but even the lambda calculus has a second-class feature: binding. If you want to create a new kind of 'let' or 'case' or 'lambda' construct (and Scheme people love doing that kind of thing), you need macros. The surface syntax of the language is also second-class, so macros can provide nicer surface syntaxes (for things like printf and regexpes, for example). Macros are ideal for embedding domain-specific languages. Macros could be used to create what are effectively inline functions, but if that's all the macro does, I'm really uncomfortable with the thought, because we don't want to encourage that kind of behavior. I have this terrible vision of programmers saying "Oh, use macros instead of functions, they're faster.", and then writing code that is (a) terrible, since it uses macros as if they were functions, and (b) slow, since the huge hunks of code produced don't fit in the instruction cache. Paul -------------- next part -------------- An HTML attachment was scrubbed... URL: From graydon at mozilla.com Mon May 16 10:31:13 2011 From: graydon at mozilla.com (Graydon Hoare) Date: Mon, 16 May 2011 10:31:13 -0700 Subject: [rust-dev] Conduct (was "Hello") In-Reply-To: References: <4DCECF57.5060808@mozilla.com> <7357A147-EA1C-470C-9065-6F73936EA9D5@users.sourceforge.net> <4DCFF05E.3050903@mozilla.com> Message-ID: <4DD15F61.7020404@mozilla.com> On 16/05/2011 6:35 AM, john skaller wrote: > I'm sure it isn't but without meaning to be rude real programmers have no idea what they're > talking about most of the time so that isn't much of an argument. > ... > real programmers would then try to invent the Perpetual Motion Machine > ... > ridiculous attempts to support unsustainable religious beliefs > ... > Of course they keep producing code that doesn't work. If this is how you "don't mean to be rude", you need to work on social graces. This is intensely rude. If you aren't willing to treat people with respect, to try to understand their concerns and adapt, you are not welcome. I'll let you follow up to the messages in flight, if you care to, but after that I'm going to have to ask you to leave. Sorry. I gave you a fair warning about our conduct guidelines and your behavior suggests you either don't want to, or can't, behave in a respectful manner. -Graydon From gal at mozilla.com Mon May 16 10:37:29 2011 From: gal at mozilla.com (Andreas Gal) Date: Mon, 16 May 2011 10:37:29 -0700 Subject: [rust-dev] Three level memory system In-Reply-To: References: <4DCEB2F4.1050804@mozilla.com> <82BD3FCA-2182-4AA3-B608-7A089D9604D7@users.sourceforge.net> <02DAEA8E-5A83-406E-8E63-0D8376A74086@mozilla.com> <58AAA640-71F5-40BF-8A44-8BE004F88AE6@users.sourceforge.net> <4DD0C738.4050509@mozilla.com> Message-ID: On May 16, 2011, at 8:30 AM, Tim Chevalier wrote: > On Sun, May 15, 2011 at 11:42 PM, Graydon Hoare wrote: >> This statement is highly debatable. There are several kinds of GC and >> several kinds of RC, and they both have different cache behavior (GC trashes >> the whole cache regularly and doubles heap pressure; RC hurts cache >> coherence *if* you're doing shared memory multiprocessing but otherwise >> tends to cost in code overhead..) > > I assume you're just referring to copying GC, given the assertion > about doubling heap pressure? There are other types of GC, and even > assuming copying, generational collectors have been used to *improve* > cache behavior: for example, Chilimbi and Larus, "Using Generational > Garbage Collection To Implement Cache-Conscious Data Placement" (ISMM > '98). The paper does not support your claim. It shows that some copying GCs have really terrible cache characteristics (Cheney), and then proposes a new algorithm that improves that for some workloads over the known terrible ones. The paper doesn't analyze whether the old GC algorithm or the new GC algorithms are more cache efficient than reference counting. Andreas > > Cheers, > Tim > > -- > Tim Chevalier * http://cs.pdx.edu/~tjc/ * Often in error, never in doubt > "an intelligent person fights for lost causes,realizing that others > are merely effects" -- E.E. Cummings > _______________________________________________ > Rust-dev mailing list > Rust-dev at mozilla.org > https://mail.mozilla.org/listinfo/rust-dev From catamorphism at gmail.com Mon May 16 10:40:56 2011 From: catamorphism at gmail.com (Tim Chevalier) Date: Mon, 16 May 2011 10:40:56 -0700 Subject: [rust-dev] Three level memory system In-Reply-To: References: <4DCEB2F4.1050804@mozilla.com> <82BD3FCA-2182-4AA3-B608-7A089D9604D7@users.sourceforge.net> <02DAEA8E-5A83-406E-8E63-0D8376A74086@mozilla.com> <58AAA640-71F5-40BF-8A44-8BE004F88AE6@users.sourceforge.net> <4DD0C738.4050509@mozilla.com> Message-ID: On Mon, May 16, 2011 at 10:37 AM, Andreas Gal wrote: > > On May 16, 2011, at 8:30 AM, Tim Chevalier wrote: > >> On Sun, May 15, 2011 at 11:42 PM, Graydon Hoare wrote: >>> This statement is highly debatable. There are several kinds of GC and >>> several kinds of RC, and they both have different cache behavior (GC trashes >>> the whole cache regularly and doubles heap pressure; RC hurts cache >>> coherence *if* you're doing shared memory multiprocessing but otherwise >>> tends to cost in code overhead..) >> >> I assume you're just referring to copying GC, given the assertion >> about doubling heap pressure? There are other types of GC, and even >> assuming copying, generational collectors have been used to *improve* >> cache behavior: for example, Chilimbi and Larus, "Using Generational >> Garbage Collection To Implement Cache-Conscious Data Placement" (ISMM >> '98). > > The paper does not support your claim. It shows that some copying GCs have really terrible cache characteristics (Cheney), and then proposes a new algorithm that improves that for some workloads over the known terrible ones. The paper doesn't analyze whether the old GC algorithm or the new GC algorithms are more cache efficient than reference counting. > Ah. I wasn't claiming anything about the relationship about GC and ref-counting. But I found Graydon's assertion that "GC trashes the whole cache regularly" to be rather strong, as there are many kinds of GC, not just copying. Cheers, Tim -- Tim Chevalier * http://cs.pdx.edu/~tjc/ * Often in error, never in doubt "an intelligent person fights for lost causes,realizing that others are merely effects" -- E.E. Cummings From graydon at mozilla.com Mon May 16 11:49:09 2011 From: graydon at mozilla.com (Graydon Hoare) Date: Mon, 16 May 2011 11:49:09 -0700 Subject: [rust-dev] Three level memory system In-Reply-To: References: <4DCEB2F4.1050804@mozilla.com> <82BD3FCA-2182-4AA3-B608-7A089D9604D7@users.sourceforge.net> <02DAEA8E-5A83-406E-8E63-0D8376A74086@mozilla.com> <58AAA640-71F5-40BF-8A44-8BE004F88AE6@users.sourceforge.net> <4DD0C738.4050509@mozilla.com> Message-ID: <4DD171A5.5010703@mozilla.com> On 16/05/2011 10:40 AM, Tim Chevalier wrote: > Ah. I wasn't claiming anything about the relationship about GC and > ref-counting. But I found Graydon's assertion that "GC trashes the > whole cache regularly" to be rather strong, as there are many kinds of > GC, not just copying. True. I was too simplistic, apologies. In truth, automatic memory management has dozens of dimensions and I was only commenting on, er, three of them them, and somewhat generally. - When the scheme is based on an algorithm that is effectively based on some O(fraction-of(live data)) step, average-case latency grows as heap size grows. - Not only latency, but cache pressure: visiting O(live data) tends to flush everything hot in favour of live-but-cold stuff, as heaps grow. - When the scheme is based on an algorithm that defers a lot of reclamation, there will be heap overhead. Copying definitely doubles, but it's not uncommon for mark-sweep collectors to carry similar orders of "dead but not yet reclaimed" heap overheads. It's true that generational collectors, concurrent collectors, compacting collectors, hybrid-RC collectors, arena collectors, and all manner of other systems can mitigate the constant factors in these considerations, and I certainly don't mean to imply they never work. Just serve a bit of an antidote to the "GC always wins" position I sometimes hear. It sometimes does, sometimes doesn't, and often requires careful analysis of domain requirements. I've really not seen any single answer for the "best" storage management system; automatic or manual. Apologies if it sounds like I'm proposing RC as that champion. It's surely got flaws (code footprint, cache write traffic and acyclicality; nontrivial issues!) My experience is merely that it's predictable, has low latency and heap overhead, scales well with large heaps, and hybridizes acceptably with other schemes such as the mark/sweep we'll probably be using for cyclic data. And it's simple! There's not much to tune so you don't spend much time "tuning the GC". There's really only one knob: how many rc operations the compiler can elide by static analysis. Makes it comparatively easy to reason about. Anyway, if this all falls apart in practice of course we'll revisit. I'm amenable to data much more than hypotheticals, even my own :) We have complete type information and are going to be building a cycle-scanning GC anyways. It's just a matter of trying to do better than it, when we think we can. If we never can, I'll give up. -Graydon From catamorphism at gmail.com Mon May 16 11:59:25 2011 From: catamorphism at gmail.com (Tim Chevalier) Date: Mon, 16 May 2011 11:59:25 -0700 Subject: [rust-dev] Three level memory system In-Reply-To: <4DD171A5.5010703@mozilla.com> References: <4DCEB2F4.1050804@mozilla.com> <82BD3FCA-2182-4AA3-B608-7A089D9604D7@users.sourceforge.net> <02DAEA8E-5A83-406E-8E63-0D8376A74086@mozilla.com> <58AAA640-71F5-40BF-8A44-8BE004F88AE6@users.sourceforge.net> <4DD0C738.4050509@mozilla.com> <4DD171A5.5010703@mozilla.com> Message-ID: On Mon, May 16, 2011 at 11:49 AM, Graydon Hoare wrote: > I've really not seen any single answer for the "best" storage management > system; automatic or manual. Apologies if it sounds like I'm proposing RC as > that champion. It's surely got flaws (code footprint, cache write traffic > and acyclicality; nontrivial issues!) My experience is merely that it's > predictable, has low latency and heap overhead, scales well with large > heaps, and hybridizes acceptably with other schemes such as the mark/sweep > we'll probably be using for cyclic data. And it's simple! There's not much > to tune so you don't spend much time "tuning the GC". There's really only > one knob: how many rc operations the compiler can elide by static analysis. > Makes it comparatively easy to reason about. > There's only one thing I've noticed about RC (never having worked with a system that used it before): it seems to be a source of compiler bugs. Just in writing code that seems to be a bit more functional and deeply-nested than some of the existing code in rustc, I've found two bugs involving the compiler forgetting to emit drop code in the right place. With GC, the compiler typically has to do some work too -- that is, specifying data layout -- but we'll have to do that anyway. This would be a great exercise for a compiler verification framework, but since we're not proving the compiler correct, the potential for refcounting bugs makes me a bit nervous. Cheers, Tim -- Tim Chevalier * http://cs.pdx.edu/~tjc/ * Often in error, never in doubt "an intelligent person fights for lost causes,realizing that others are merely effects" -- E.E. Cummings From graydon at mozilla.com Mon May 16 12:02:25 2011 From: graydon at mozilla.com (Graydon Hoare) Date: Mon, 16 May 2011 12:02:25 -0700 Subject: [rust-dev] Three level memory system In-Reply-To: References: <4DCEB2F4.1050804@mozilla.com> <82BD3FCA-2182-4AA3-B608-7A089D9604D7@users.sourceforge.net> <02DAEA8E-5A83-406E-8E63-0D8376A74086@mozilla.com> <58AAA640-71F5-40BF-8A44-8BE004F88AE6@users.sourceforge.net> <4DD0C738.4050509@mozilla.com> <4DD171A5.5010703@mozilla.com> Message-ID: <4DD174C1.8030500@mozilla.com> On 16/05/2011 11:59 AM, Tim Chevalier wrote: > There's only one thing I've noticed about RC (never having worked with > a system that used it before): it seems to be a source of compiler > bugs. Just in writing code that seems to be a bit more functional and > deeply-nested than some of the existing code in rustc, I've found two > bugs involving the compiler forgetting to emit drop code in the right > place. With GC, the compiler typically has to do some work too -- that > is, specifying data layout -- but we'll have to do that anyway. True enough. Though the interaction is there in compiler-supported GC systems too: you have to get all the transient pointers identified, and make sure they're all flushed back to findable places at "safe points" when the GC might kick in. At least if it's precise rather than conservative. But yeah. It's delicate. Whole thing's delicate :( -Graydon From skaller at users.sourceforge.net Mon May 16 17:56:18 2011 From: skaller at users.sourceforge.net (john skaller) Date: Tue, 17 May 2011 10:56:18 +1000 Subject: [rust-dev] polymorphism In-Reply-To: <4DD156DC.10300@mozilla.com> References: <4DCECF57.5060808@mozilla.com> <7357A147-EA1C-470C-9065-6F73936EA9D5@users.sourceforge.net> <4DCFF05E.3050903@mozilla.com> <3798AFCE-52D4-49F1-868A-C68010FD7D01@users.sourceforge.net> <4DD156DC.10300@mozilla.com> Message-ID: <3CF6ABF4-4A78-447F-A07C-FC4CA172332B@users.sourceforge.net> On 17/05/2011, at 2:54 AM, Graydon Hoare wrote: BTW: I'm continuing on with this analysis because I think Rust is really good. If it were just "crap" like Java I wouldn't bother. This is a pretty long post because there are real examples in it which show how hard it is to handle just one canonical family of multi-argument functions, namely the tuple constructor. The executive summary is: if it is this hard to handle just one family the handling of a language where *all* functions are multi-argument is probably untenable with very high level kinding. > On 16/05/2011 8:13 AM, john skaller wrote: > >> Sure just look at "binder1" and "binder2" in the C++ standard library and >> wonder where "binder3" and "binder4" have gone. They haven't gone, >> they're not defined. You need an infinite number of them. If you have only >> one argument functions .. you only need one of them for completeness. > > Here we're getting into (almost) a concrete argument. But I think you're overdoing it. > > Am I misunderstanding? I think you understand it, but you mightn't quite appreciate just how bad it can get in a combinatorial situation. It isn't just a matter of 1, 2, 3 ... oh, we need 5, well we can always write out the 5 case. That's true, its rare, and the 5 case for one function can be coded in a few (boring) minutes for a simple function. A bit longer for a more complex function like say STL sort (which is polymophic). Before continuing a caveat: usually you would do the interfacing by using an adaptor like binder2 etc. but in Rust that may not be possible without losing the alias property .. if it were possible you wouldn't have the constraint that you *need* multi-args to support aliases to support down-stack pointers to support stack protocol memory management. I'm not sure about this. But this is not the real problem; that comes when you have higher order stuff and you're *combining* these N-ary things in more complex ways, and then the 1.. N argument problem becomes a case not of N functions or even N * M function but more like 2^N functions, which is, of course untenable. It's hard to illustrate because any illustration requires a massively large code example, and these examples don't exist because .. well .. doing them is untenable :) So I will actually give an illustration which is only lower order to give a feel for it. As I mentioned I have this problem in Felix with tuples because they're the canonical multi-arg function in ML and Felix. In Felix, I have Haskell style type classes (which, by the way, seem useful and could be put into Rust). Here are some of them: // equality typeclass Eq[t] { virtual fun eq: t * t -> bool; virtual fun ne (x:t,y:t):bool => not (eq (x,y)); axiom reflex(x:t): eq(x,x); axiom sym(x:t, y:t): eq(x,y) == eq(y,x); axiom trans(x:t, y:t, z:t): implies(eq(x,y) and eq(y,z), eq(x,z)); } // total order typeclass Tord[t]{ inherit Eq[t]; virtual fun lt: t * t -> bool; virtual fun gt(x:t,y:t):bool =>lt(y,x); virtual fun le(x:t,y:t):bool => not (gt(x,y)); virtual fun ge(x:t,y:t):bool => not (lt(x,y)); virtual fun max(x:t,y:t):t=> if lt(x,y) then y else x endif; virtual fun min(x:t,y:t):t => if lt(x,y) then x else y endif; axiom trans(x:t, y:t, z:t): implies(lt(x,y) and lt(y,z), lt(x,z)); axiom antisym(x:t, y:t): lt(x,y) or lt(y,x) or eq(x,y); axiom reflex(x:t, y:t): implies(le(x,y) and le(y,x), eq(x,y)); } // printable string rep of data type typeclass Str [T] { virtual fun str: T -> string; } and now I'm going to bore you with the implementation for tuples. Note that in Felix an array T ^ N is just a tuple with N components where N is a type (NOT an integer but instead a sum of units) and there's a conflict when you do things for arrays and tuples, you get a duplicate function error if you define both in general, so you have to resolve the conflict between T * U and T ^ 2 with U set to T by coding the overridding specialisation for T * T explicitly. So here are the specialisations of just three functions for 2 to 5 component tuples, that provide: (a) equality of tuples (b) lexicographic ordering of tuples (c) stringisation of tuples to support things like: println (1,2,3,4,5); which is done by calling str function and implementing print for just strings. It took many hours to write out and check these things, because there were some silly typos when I did it and the compiler error messages were .. well .. less than helpful :) /////////////////////////////////////////////// // two components // string instance[T,U] Str[T*U] { fun str (t:T, u:U) => "("+str t + ", " + str u+")"; } instance[T] Str[T*T] { fun str (t1:T, t2:T) => "("+str t1 + ", " + str t2+")"; } // equality instance[t,u with Eq[t],Eq[u]] Eq[t*u] { fun eq: (t * u) * (t * u) -> bool = | (?x1,?y1),(?x2,?y2) => x1==x2 and y1 == y2 ; } instance[t with Eq[t]] Eq[t*t] { fun eq: (t * t) * (t * t) -> bool = | (?x1,?y1),(?x2,?y2) => x1==x2 and y1 == y2 ; } // total order instance[t,u with Tord[t],Tord[u]] Tord[t*u] { fun lt: (t * u) * (t * u) -> bool = | (?x1,?y1),(?x2,?y2) => x1 <= x2 and y1 < y2 ; } instance[t with Tord[t]] Tord[t*t] { fun lt: (t * t) * (t * t) -> bool = | (?x1,?y1),(?x2,?y2) => x1 <= x2 and y1 < y2 ; } open[t,u with Tord[t], Tord[u]] Tord[t*u]; //--------------------------------------------------------------------------------- // three components // string instance[T,U,V] Str[T*U*V] { fun str (t:T, u:U, v:V) => "("+str t + ", " + str u + ", " + str v+")"; } instance[T] Str[T*T*T] { fun str (t1:T, t2:T, t3:T) => "("+str t1 + ", " + str t2 + ", " + str t3+")"; } // equality instance[t,u,v with Eq[t], Eq[u], Eq[v]] Eq[t*u*v] { fun eq: (t * u * v) * (t * u * v) -> bool = | (?x1,?y1,?z1),(?x2,?y2,?z2) => x1==x2 and y1 == y2 and z1 == z2 ; } instance[t with Eq[t]] Eq[t*t*t] { fun eq: (t * t * t) * (t * t * t) -> bool = | (?x1,?y1,?z1),(?x2,?y2,?z2) => x1==x2 and y1 == y2 and z1 == z2 ; } // total order instance[t,u,v with Tord[t],Tord[u],Tord[v]] Tord[t*u*v] { fun lt: (t * u * v) * (t * u * v) -> bool = | (?x1,?y1,?z1),(?x2,?y2,?z2) => x1 <= x2 and y1 <= y2 and z1 < z2 ; } instance[t with Tord[t]] Tord[t*t*t] { fun lt: (t * t * t) * (t * t * t) -> bool = | (?x1,?y1,?z1),(?x2,?y2,?z2) => x1 <= x2 and y1 <= y2 and z1 < z2 ; } open[t,u,v with Tord[t], Tord[u], Tord[v]] Tord[t*u*v]; //--------------------------------------------------------------------------------- // four components // string instance[T,U,V,W] Str[T*U*V*W] { fun str (t:T, u:U, v:V, w:W) => "("+str t + ", " + str u + ", " + str v + ", " + str w+")"; } instance[T] Str[T*T*T*T] { fun str (t1:T, t2:T, t3:T, t4:T) => "("+str t1 + ", " + str t2 + ", " + str t3 + ", " + str t4+")"; } // equality instance[t,u,v,w with Eq[t], Eq[u], Eq[v], Eq[w]] Eq[t*u*v*w] { fun eq: (t * u * v * w) * (t * u * v * w) -> bool = | (?x1,?y1,?z1,?a1),(?x2,?y2,?z2,?a2) => x1==x2 and y1 == y2 and z1 == z2 and a1 == a2 ; } instance[t with Eq[t]] Eq[t*t*t*t] { fun eq: (t * t * t * t) * (t * t * t * t) -> bool = | (?x1,?y1,?z1,?a1),(?x2,?y2,?z2,?a2) => x1==x2 and y1 == y2 and z1 == z2 and a1 == a2 ; } // total order instance[t,u,v,w with Tord[t],Tord[u],Tord[v],Tord[w]] Tord[t*u*v*w] { fun lt: (t * u * v *w) * (t * u * v * w) -> bool = | (?x1,?y1,?z1,?a1),(?x2,?y2,?z2,?a2) => x1 <= x2 and y1 <= y2 and z1 <= z2 and a1 < a2 ; } instance[t with Tord[t]] Tord[t*t*t*t] { fun lt: (t * t * t * t) * (t * t * t * t) -> bool = | (?x1,?y1,?z1,?a1),(?x2,?y2,?z2,?a2) => x1 <= x2 and y1 <= y2 and z1 <= z2 and a1 < a2 ; } open[t,u,v,w with Tord[t], Tord[u], Tord[v], Tord[w]] Tord[t*u*v*w]; //--------------------------------------------------------------------------------- // five components // string instance[T,U,V,W,X] Str[T*U*V*W*X] { fun str (t:T, u:U, v:V, w:W, x:X) => "("+str t + ", " + str u + ", " + str v + ", " + str w+", "+ str x+")" ; } instance[T] Str[T*T*T*T*T] { fun str (t1:T, t2:T, t3:T, t4:T, t5:T) => "("+str t1 + ", " + str t2 + ", " + str t3 + ", " + str t4+", "+str t5+")" ; } // equality instance[t,u,v,w,x with Eq[t], Eq[u], Eq[v], Eq[w], Eq[x]] Eq[t*u*v*w*x] { fun eq: (t * u * v * w *x) * (t * u * v * w * x) -> bool = | (?x1,?y1,?z1,?a1,?b1),(?x2,?y2,?z2,?a2,?b2) => x1==x2 and y1 == y2 and z1 == z2 and a1 == a2 and b1 == b2; ; } instance[t with Eq[t]] Eq[t*t*t*t*t] { fun eq: (t * t * t * t *t) * (t * t * t * t * t) -> bool = | (?x1,?y1,?z1,?a1,?b1),(?x2,?y2,?z2,?a2,?b2) => x1==x2 and y1 == y2 and z1 == z2 and a1 == a2 and b1 == b2; ; } // total order instance[t,u,v,w,x with Tord[t],Tord[u],Tord[v],Tord[w],Tord[x]] Tord[t*u*v*w*x] { fun lt: (t * u * v * w * x) * (t * u * v * w * x) -> bool = | (?x1,?y1,?z1,?a1,?b1),(?x2,?y2,?z2,?a2,?b2) => x1 <= x2 and y1 <= y2 and z1 <= z2 and a1 <= a2 and b1 < b2 ; } instance[t with Tord[t]] Tord[t*t*t*t*t] { fun lt: (t * t * t * t *t) * (t * t * t * t * t) -> bool = | (?x1,?y1,?z1,?a1,?b1),(?x2,?y2,?z2,?a2,?b2) => x1 <= x2 and y1 <= y2 and z1 <= z2 and a1 <= a2 and b1 < b2 ; } open[t,u,v,w,x with Tord[t], Tord[u], Tord[v], Tord[w], Tord[x]] Tord[t*u*v*w*x]; //////////////////////////// Now here is another module which also illustrates some problems: ////////////// open module Categ { // note: in Felix, products are uniquely decomposable, but arrows // are not. So we cannot overload based on arrow factorisation. // for example, the curry functions can be overloaded but // the uncurry functions cannot be // Note: Felix is not powerful enough to generalise these // operation in user code, i.e. polyadic programming // change star into arrow fun curry[u,v,r] (f:u*v->r) => fun (x:u) (y:v) => f (x,y); fun curry[u,v,w,r] (f:u*v*w->r) => fun (x:u) (y:v) (z:w) => f (x,y,z); // change arrow into star fun uncurry2[u,v,r] (f:u->v->r) => fun (x:u,y:v) => f x y; fun uncurry3[u,v,w,r] (f:u->v->w->r) => fun (x:u,y:v,z:w) => f x y z; // argument order permutation fun twist[u,v,r] (f:u*v->r) => fun (x:v,y:u) => f (y,x); // projection fun proj1[u1,u2,r1,r2] (f:u1*u2->r1*r2) => fun (x:u1*u2) => match f x with | ?a,_ => a endmatch; fun proj2[u1,u2,r1,r2] (f:u1*u2->r1*r2) => fun (x:u1*u2) => match f x with | _,?b => b endmatch; // parallel composition fun ravel[u1,u2,r1,r2] (f1:u1->r1,f2:u2->r2) => fun (x1:u1,x2:u2) => f1 x1, f2 x2; // series composition fun compose[u,v,w] (f:v->w, g:u->v) => fun (x:u) => f (g x); fun rev_compose[u,v,w] (f:u->v, g:v->w) => fun (x:u) => g (f x); } ////////////////////// You can see in this module products causing problems again, needing to write uncurry2 and uncurry3 out, same for many of these functions. The pain here is that the Ocaml representation (in the compiler) of a product is just a list, so it is trivial to implement these kinds of functions *in the compiler* with recursion, maps, or folds. The operations are categorical, meaning they're the building blocks of higher order operations: they're fundamental to allow easy construction of powerful things, meaning, to support code reuse by avoiding duplication of code. So you see my pain: in Felix, I cannot handle tuples properly. I cannot provide a way the USER can put things into the library I can do easily in the compiler, but clearly I don't want to change the compiler for every new function that should have gone into the library. The point here is: this is "bleeding edge", this problem HAS to be solved to get anywhere. The problem arises because tuples (and anonymous sums which are their duals) are not associative so can't be trivially decomposed by using a fold and a binary function. This may look esoteric, but it is impacts the "average programmer" and a large fraction of the basic tutorial examples: println$ "Hello World on ", gethostname(); Woops, printing a tuple. people *expect* that to work so I HAVE to support it. The point of all this: I have a problem with ONE multi-argument function, namely the tuple constructor, which by itself is so bad it threatens the whole Felix project unless I can provide enough higher order polymorphism to fix the problem of having to write out all the cases ... in rust ALL the functions are multi-arg. So the problem occurs not just for one function (after which the type system factors it away) but for all of them. You can see why this looks like a major backward step in code reuse to me! Just look at the pain for supporting just tuples! For interest, in Ocaml functors don't combine properly either. Just to to make an Ocaml functor for a Ring (a set with addition and multiplication) by composing it out of two functors on the same data type which provide the addition group and the multiplication semi-group. It can be done but it is really contorted, I had to get an expert to show me how to do it. Safety is all about reusability. As Graydon mentioned elsewhere the line between statically assured correctness and run time checks for dynamics is fuzzy, meaning you will *always* have to have some dynamics, and in those cases where you have to actually test code to gain faith in the correctness of the code it is even more important to have reusability because of the very high price of testing: you don't want to have to test multiple implementations when you could have had a single one. In Rust the problem might go away if "aliases" were put into the type system properly. Eg if you have a proper "weak pointer" type which could be used anywhere. If that could be done you could get rid of multi-argument functions by using tuples without losing the down stack pointer assurance. However when I think about this I don't see how it can work, but I don't understand Rust well enough. A simple "weak pointer" type probably doesn't work: if you have a tuple containing weak pointer and copy it, the copy may persist if it is returned from a function and this causes the stack frame containing the data pointed to by the weak pointer to be popped off the stack. What's actually needed is to take the weak pointer as a type and replace the current artificial, non-type constraint (aliases can only be used in function args) with a *type based* constraint, i.e. make the assurance directly part of the type system. I do NOT know how to do this. However .. if you provide curried functions clearly you can have multiple arguments in the currified style and apply the "alias" property to those arguments you want to (because those arguments are separate). This is the same non-type constraint but now we have a standard type system again. Again I don't know if this would even work (from a type theoretic viewpoint) and I have no idea what the impact would be on the memory model (so this is just an observation that there could be many possible solutions). Anyhow, we have probably talked enough about this: pragmatically Rust is going to go ahead as it is, otherwise it will never get released and that would be a great pity! -- john skaller skaller at users.sourceforge.net From skaller at users.sourceforge.net Mon May 16 18:18:00 2011 From: skaller at users.sourceforge.net (john skaller) Date: Tue, 17 May 2011 11:18:00 +1000 Subject: [rust-dev] Statically linked library crates In-Reply-To: References: <4DD04796.4080708@mozilla.com> <4DD0D077.6000003@mozilla.com> <3AABB042-0CFA-472D-A7F9-C978E995DFA8@users.sourceforge.net> Message-ID: On 17/05/2011, at 3:25 AM, Paul Stansifer wrote: > > > - Powerful macros that cover the "needs to be efficient" cases as well > > as the various other macro scenarios. Definitely front-end! > > I do not understand the need for macros. Could someone explain? > If you have polymorphic functions you basically don't need macros. > But perhaps I misunderstand what a macro is in Rust. > > Scheme has a very powerful macro system and it doesn't even have static types. Macros abstract over whatever second-class features a language has. In a non-polymorphic language, this includes types, but even the lambda calculus has a second-class feature: binding. If you want to create a new kind of 'let' or 'case' or 'lambda' construct (and Scheme people love doing that kind of thing), you need macros. The surface syntax of the language is also second-class, so macros can provide nicer surface syntaxes (for things like printf and regexpes, for example). Macros are ideal for embedding domain-specific languages. No they're not. They're the hack you use when you don't have higher order polymorphism. They definitely not "ideal". Just to be clear, I think of macros as "without proper typing and kinding" and in addition "operate by replacement, not programmatically". The first feature is used because the system doesn't have a good enough type/kinding system as a band aid; it is better than bleeding to death, but not as good a proper dressing :) The second feature is the main problem. Code generators, that is, actual code which generates code, is MUCH better than macros because the replacement rules for macros so that everything remains fairly nice and predictable just can't be formed. The obvious problem is quotation, and when you're nesting stuff the need to quote quotes to multiple levels and escape them. This problem doesn't exist at all in a programatic code generator. In Felix I had macros (very powerful) and also the build system was written in Python so I could easily write Python script to "print out" code. I tried to get rid of the Python script and put what it did into the language directly with macros, but this proved to be extremely hard to do because of the quoting problem. Macros lack transparency in the sense you have to change the arguments to macro calls to match the implementation, which you never need to do when you're generating code with a real programming language (like Python). [I actually got rid of most of the code generation and macro stuff with a "proper feature": type classes. The syntax hackery was got rid of with an extensible GLR parser that with user defined grammar and parser actions (AST generating code) written in Scheme] Basically, macros in any form are the poor mans code generator which don't work so well but seem easy to use because they fit into the existing syntax. The right way is to have proper high order "code generating" constructions, but this is hard. > > Macros could be used to create what are effectively inline functions, but if that's all the macro does, I'm really uncomfortable with the thought, because we don't want to encourage that kind of behavior. I have this terrible vision of programmers saying "Oh, use macros instead of functions, they're faster.", and then writing code that is (a) terrible, since it uses macros as if they were functions, and (b) slow, since the huge hunks of code produced don't fit in the instruction cache. Agreed, and a good example. Directly from experience with C no doubt :) Note in C++ need for macros for this purpose was removed by inline functions. And for the hacked up polymorphism macros provide in C: # define addmul(a,b,c) a + b * c C++ provided templates which re-enabled proper type checking. My point above is that these macros uses always represent a hack to do something that should be properly supported. My view is: if you have an example where you think you need macros, try to fix the language so you don't. Its actually "useful" to have macros for this reason: provide them because you can't solve all problems in the first release of the language, and then use client's actual use cases as a way to drive getting rid of that usage with proper constructions. BTW: I'm not against putting macros in for this reason. -- john skaller skaller at users.sourceforge.net From skaller at users.sourceforge.net Mon May 16 18:23:20 2011 From: skaller at users.sourceforge.net (john skaller) Date: Tue, 17 May 2011 11:23:20 +1000 Subject: [rust-dev] Conduct (was "Hello") In-Reply-To: <4DD15F61.7020404@mozilla.com> References: <4DCECF57.5060808@mozilla.com> <7357A147-EA1C-470C-9065-6F73936EA9D5@users.sourceforge.net> <4DCFF05E.3050903@mozilla.com> <4DD15F61.7020404@mozilla.com> Message-ID: On 17/05/2011, at 3:31 AM, Graydon Hoare wrote: > On 16/05/2011 6:35 AM, john skaller wrote: > >> I'm sure it isn't but without meaning to be rude real programmers have no idea what they're >> talking about most of the time so that isn't much of an argument. > > ... > > real programmers would then try to invent the Perpetual Motion Machine > > ... > > ridiculous attempts to support unsustainable religious beliefs > > ... > > Of course they keep producing code that doesn't work. > > If this is how you "don't mean to be rude", you need to work on social graces. This is intensely rude. If you aren't willing to treat people with respect, to try to understand their concerns and adapt, you are not welcome. Please, I deliberately wrote that stuff in a way that was *intended* to indicate I was well aware of the "conduct" issue. Clearly the perpetual motion machine was intended to make fun of the whole thing. The claim about producing code that doesn't work isn't rude. Its a fact. C programs continue to crash from null pointer dereferences. > > I'll let you follow up to the messages in flight, if you care to, but after that I'm going to have to ask you to leave. Sorry. I gave you a fair warning about our conduct guidelines and your behavior suggests you either don't want to, or can't, behave in a respectful manner. Ok, I'll unsubscribe shortly. -- john skaller skaller at users.sourceforge.net From peterhull90 at gmail.com Tue May 17 04:34:08 2011 From: peterhull90 at gmail.com (Peter Hull) Date: Tue, 17 May 2011 12:34:08 +0100 Subject: [rust-dev] Linker warning In-Reply-To: References: Message-ID: On Fri, May 13, 2011 at 12:01 PM, Marijn Haverbeke wrote: >> Where are they coming from, and is it a problem or not? > > I'm seeing the same on Linux. They are not a problem. (Still, someone > should probably look into them.) 'i386-apple-darwin10.7.0' is the target triple that llvm-config reports on my system, so I am guessing that 'i686-apple-darwin' comes from the downloaded snapshot, ie. the version of OSX (or maybe LLVM itself) on the snapshot-compiling machine is not exactly the same as mine. Does that sound plausible? Pete From peterhull90 at gmail.com Tue May 17 04:53:11 2011 From: peterhull90 at gmail.com (Peter Hull) Date: Tue, 17 May 2011 12:53:11 +0100 Subject: [rust-dev] self-serve snapshots In-Reply-To: <4DCDDB31.50703@mozilla.com> References: <4DCDDB31.50703@mozilla.com> Message-ID: On Sat, May 14, 2011 at 2:30 AM, Graydon Hoare wrote: > Let me know if it fails to work for you, or is confusing, or you can think > of nice ways to simplify it. Just a couple of questions: Is it now impossible to compile rustc without downloading a snapshot? If the snapshots were no longer available (for whatever reason) would it be clear enough for someone to track back through git's history and find the last working rustboot in order to bootstrap again? Is there a way to port rustc to a new platform (assuming it has LLVM support) - i.e can it be configured as a cross compiler? How do you see the final state when the rust language is perfectly stable, will there still be stage0, stage1 etc? I seem to remember that when gcc builds, it starts with simplified compiler (gcc1 ?) which can be build with the system C compiler, then gcc1 compiles gcc and all the tools. Would there need to be a 'rustc1' written in C? Pete From graydon at mozilla.com Tue May 17 08:23:39 2011 From: graydon at mozilla.com (Graydon Hoare) Date: Tue, 17 May 2011 08:23:39 -0700 Subject: [rust-dev] self-serve snapshots In-Reply-To: References: <4DCDDB31.50703@mozilla.com> Message-ID: <4DD292FB.2010205@mozilla.com> On 17/05/2011 4:53 AM, Peter Hull wrote: > Is it now impossible to compile rustc without downloading a snapshot? Correct. > If the snapshots were no longer available (for whatever reason) would > it be clear enough for someone to track back through git's history and > find the last working rustboot in order to bootstrap again? It would not be *easy*, but yes, there's a trail you can follow. The src/snapshots.txt maps current git revisions (and datestamps) to snapshot binary files, and it grows as we add more snapshots. So you'd need to build as many of the snapshots as you were missing (including "starting again from the beginning") to fill in the gaps. But, in case you're concerned about data loss, every developer's build directory contains a download cache of all the snapshots you've picked up from the snapshot server already. So it's very likely that, were there a loss of the server (unlikely: it's s3 currently) the developers could restore the most-recent-few snapshots just by looking in $builddir/dl. For completeness, I'll also add another python script to the etc/ directory to download *all* snapshots, in case you wish to mirror the snapshot set in full. And I'll make copies in a few other places regularly. It's not that big -- the full set is currently about 40mb -- and space is cheap. > Is there a way to port rustc to a new platform (assuming it has LLVM > support) - i.e can it be configured as a cross compiler? Not yet, but it's on the list. The limitations that exist have to do with x86-isms and platform-specific issues wired into rustc already (i.e. even starting from rustboot, we couldn't yet run on any other platforms). Code-generation-wise, LLVM is perfectly happy to cross compile, and we intend rustc to be a cross compiler. > How do you see the final state when the rust language is perfectly > stable, will there still be stage0, stage1 etc? I seem to remember > that when gcc builds, it starts with simplified compiler (gcc1 ?) > which can be build with the system C compiler, then gcc1 compiles gcc > and all the tools. Would there need to be a 'rustc1' written in C? If it made anyone more comfortable, such a rustc1 could be built. Either manually (we've already written 2 rust compilers, why not a 3rd?!) or else using LLVM's "C backend". But the latter would hardly be any more portable or hackable than a binary image. For the most part, the "final state" I foresee is continuing to work off a structure more or less like the one we have now. With better automation and whatnot, but structurally the same: start from a snapshot and work forward from there. I don't *really* want to write a 3rd rust compiler; there are other things in life worth doing, y'know? If this is a policy problem with some free software distributor (say) we can cross that bridge when we come to it (provide audit trails, provide our own packages, provide stub "image downloader" packages, whatever). But I doubt it'll be a profound problem, just a quirk. It's not *that* abnormal. I believe gnat, cmucl and ocamlc, for example, are developed this way. -Graydon From noel at peralex.com Tue May 17 08:54:54 2011 From: noel at peralex.com (Noel Grandin) Date: Tue, 17 May 2011 17:54:54 +0200 Subject: [rust-dev] newbie q: choice of keywords Message-ID: <4DD29A4E.1000604@peralex.com> Hi I notice that Rust seems to favour extreme brevity in it's choice of keywords e.g. "ret" "vec" "cont" "fn" I ask because Rust aims to be a system programming language, and the current dominant system programming language is C/C++, so it would make sense to me to pick keywords that align more closely with C to make transition easier. Or am I missing something in the design criteria of the language? Regards, Noel Grandin Disclaimer: http://www.peralex.com/disclaimer.html -------------- next part -------------- An HTML attachment was scrubbed... URL: From erick.tryzelaar at gmail.com Tue May 17 09:42:09 2011 From: erick.tryzelaar at gmail.com (Erick Tryzelaar) Date: Tue, 17 May 2011 09:42:09 -0700 Subject: [rust-dev] self-serve snapshots In-Reply-To: <4DD292FB.2010205@mozilla.com> References: <4DCDDB31.50703@mozilla.com> <4DD292FB.2010205@mozilla.com> Message-ID: On Tue, May 17, 2011 at 8:23 AM, Graydon Hoare wrote: > > If it made anyone more comfortable, such a rustc1 could be built. Either > manually (we've already written 2 rust compilers, why not a 3rd?!) or else > using LLVM's "C backend". But the latter would hardly be any more portable > or hackable than a binary image. Just a quick warning about LLVM's C backend. From an email thread back in November 2010, Chris Lattner said that he's unaware of anyone working on it, so it's fairly out of sync with the rest of LLVM. It probably will need a full rewrite in order to bring it up to production quality. http://lists.cs.uiuc.edu/pipermail/llvmdev/2010-November/036029.html From icefoxen at gmail.com Tue May 17 10:18:37 2011 From: icefoxen at gmail.com (Simon Heath) Date: Tue, 17 May 2011 10:18:37 -0700 Subject: [rust-dev] self-serve snapshots In-Reply-To: <4DD292FB.2010205@mozilla.com> References: <4DCDDB31.50703@mozilla.com> <4DD292FB.2010205@mozilla.com> Message-ID: On Tue, May 17, 2011 at 8:23 AM, Graydon Hoare wrote: > But I doubt it'll be a profound problem, just a quirk. It's not *that* > abnormal. I believe gnat, cmucl and ocamlc, for example, are developed this > way. Delurking a moment for a nitpick; last I checked ocamlc distributes a bytecode image of ocamlc and a C source for a bytecode interpreter capable of running it. cmucl and gnat I find pretty inconvenient to build from source precisely because of this behavior, though cmucl a little less so since there are now other good lisp compilers available, so one doesn't get in a chicken-and-egg problem of "well their snapshot mysteriously breaks on my particular architecture" quite as badly. However, since you are targeting LLVM, it seems like you should be able to cross-compile to any host that LLVM supports. This adds some convenient portability, which seems to be one of the main objection to compiler snapshots. I don't know how portable and stable the LLVM intermediate language is, but if it is both of these it might make a decent archival format. >If it made anyone more comfortable, such a rustc1 could be built. Either manually >(we've already written 2 rust compilers, why not a 3rd?!) or else using LLVM's "C >backend". But the latter would hardly be any more portable or hackable than a binary image. It seems like this alone is a decent counter-argument; if you somehow lose snapshots or uncover a deep-seated code generator bug or something you can't compile your way out of, you can always build a new rust compiler in C to re-build rustc to a known-good state. Having such a thing may be useful eventually anyway... but not while the rust language itself is so heavily in flux. -- Simon Heath ? ?http://alopex.li/ ? Science, games, computers, life. From graydon at mozilla.com Tue May 17 10:26:02 2011 From: graydon at mozilla.com (Graydon Hoare) Date: Tue, 17 May 2011 10:26:02 -0700 Subject: [rust-dev] newbie q: choice of keywords In-Reply-To: <4DD29A4E.1000604@peralex.com> References: <4DD29A4E.1000604@peralex.com> Message-ID: <4DD2AFAA.9010308@mozilla.com> On 11-05-17 08:54 AM, Noel Grandin wrote: > Hi > > I notice that Rust seems to favour extreme brevity in it's choice of keywords e.g. "ret" "vec" "cont" "fn" > > I ask because Rust aims to be a system programming language, and the current dominant system programming language is C/C++, > so it would make sense to me to pick keywords that align more closely with C to make transition easier. > > Or am I missing something in the design criteria of the language? Nothing terribly rational, it's just an aesthetic preference on my part. -Graydon From stejohns at adobe.com Tue May 17 10:26:18 2011 From: stejohns at adobe.com (Steven Johnson) Date: Tue, 17 May 2011 10:26:18 -0700 Subject: [rust-dev] newbie q: choice of keywords In-Reply-To: <4DD29A4E.1000604@peralex.com> References: <4DD29A4E.1000604@peralex.com> Message-ID: <842FEDAC-F763-4CF1-B9ED-791C92A34779@adobe.com> On May 17, 2011, at 8:54 AM, Noel Grandin wrote: I notice that Rust seems to favour extreme brevity in it's choice of keywords e.g. "ret" "vec" "cont" "fn" I ask because Rust aims to be a system programming language, and the current dominant system programming language is C/C++, so it would make sense to me to pick keywords that align more closely with C to make transition easier. +1 -- terseness is a virtue in a language, but this seems to be taking it to a bit of an extreme. I refuse to believe that the extra milliseconds saved by typing "fn" instead of "function" makes a difference in productivity, so I presume this choice is deliberate in that it is believed to *improve* readability? If so, may I gently offer a contrary opinion; words are more readable than abbreviations, and it's not clear (to me, anyway) that the shortened keywords meaningfully improve things. (OTOH I haven't read large swaths or Rust, so perhaps it does move the needle in real-world code...) -------------- next part -------------- An HTML attachment was scrubbed... URL: From graydon at mozilla.com Tue May 17 12:42:57 2011 From: graydon at mozilla.com (Graydon Hoare) Date: Tue, 17 May 2011 12:42:57 -0700 Subject: [rust-dev] Syntax changes Message-ID: <4DD2CFC1.9030908@mozilla.com> There's a bunch of syntax churn going on lately; I figured we should get this out of the way asap after bootstrapping, to minimize further pain on users. Having floated a few trial balloons on the mailing list about syntax over the winter, and seen that on almost every issue we have a broad level of "everyone disagrees about everything", I've decided to approach syntax in a more meritocratic sense of weighting opinions by level of code contribution, and ultimately since my name is all over the thing, to reserve a veto on any change I really wouldn't to defend in front of an audience. So we're going to move faster and a bit more unilaterally on syntax than I'd earlier implied. Sorry, but it just doesn't look like "consensus" is something that emerges very often in syntax conversations. If some part of what we're doing *really* grinds your gears after a few weeks or months of staring at it, file a followup bug and we can possibly revisit, and formally (re-)poll major developers to see what they think. But right now the stuff being done is roughly agreeable among the full timers working on rust, and is as follows: - path.to.module changed to path::to::module. - Module, type and value namespaces have been made disjoint. - vec(1,2,3) changed to [1,2,3] as a value constructor. Further ones queued up: - vec[T] will turn into T[] in the type grammar. - foo[T] will change to foo. for type parameter specification. - rec(a=b, c=d) will change to {a: b, c: d} as a value constructor. - rec(int a, int b) will change to {a: int, b: int} as a type constructor. - tup(a,b) will change to (a,b) in both type and value grammars; one-element tups will become inexpressible (unless someone really cares, then I guess we can copy the python hack of (a,). Tups will require parentheses. - foo:constraint() will change to foo|constraint() in the type grammar. - alt and case arms will probably lose a bit of indentation and parentheses. They're over-chatty right now. - 'auto' will probably turn to 'let' and 'let' to something else ('var' or such) Hopefully this will settle down in another few weeks. A lot of this stuff was decided months and months ago and we've just been waiting for rustboot to die (I removed it from the repository a few days ago). -Graydon From peterhull90 at gmail.com Tue May 17 13:21:20 2011 From: peterhull90 at gmail.com (Peter Hull) Date: Tue, 17 May 2011 21:21:20 +0100 Subject: [rust-dev] Syntax changes In-Reply-To: <4DD2CFC1.9030908@mozilla.com> References: <4DD2CFC1.9030908@mozilla.com> Message-ID: On Tue, May 17, 2011 at 8:42 PM, Graydon Hoare wrote: > Sorry, but it just doesn't look like "consensus" is something that emerges > very often in syntax conversations. Maybe posting a little code snippet with old and new syntax might make the case for changing a bit clearer? > ?- rec(int a, int b) will change to {a: int, b: int} as a type > ? ?constructor. > ?- 'auto' will probably turn to 'let' and 'let' to something else > ? ?('var' or such) There seems to be inconsistency now between having the type before the name or after it with a colon. i.e '{a: int, b: int}' vs. 'var int a = 0' Pete From pwalton at mozilla.com Tue May 17 13:28:24 2011 From: pwalton at mozilla.com (Patrick Walton) Date: Tue, 17 May 2011 13:28:24 -0700 Subject: [rust-dev] Syntax changes In-Reply-To: References: <4DD2CFC1.9030908@mozilla.com> Message-ID: <4DD2DA68.4010909@mozilla.com> On 5/17/11 1:21 PM, Peter Hull wrote: > There seems to be inconsistency now between having the type before the > name or after it with a colon. i.e '{a: int, b: int}' vs. 'var int a = > 0' I agree; I kinda figured we'd use "{ int a, int b }" syntax for record types. I had also assumed that "var" would be the type-inferred keyword for consistency with C# (it reads a little better too IMHO) but I don't feel strongly about it either way. Patrick From sebastian.sylvan at gmail.com Tue May 17 15:23:17 2011 From: sebastian.sylvan at gmail.com (Sebastian Sylvan) Date: Tue, 17 May 2011 15:23:17 -0700 Subject: [rust-dev] Syntax changes In-Reply-To: <4DD2CFC1.9030908@mozilla.com> References: <4DD2CFC1.9030908@mozilla.com> Message-ID: On Tue, May 17, 2011 at 12:42 PM, Graydon Hoare wrote: > > ?- 'auto' will probably turn to 'let' and 'let' to something else > ? ?('var' or such) Why the need for two keywords? Couldn't the inferred version simply omit the type? -- Sebastian Sylvan From marijnh at gmail.com Tue May 17 22:08:53 2011 From: marijnh at gmail.com (Marijn Haverbeke) Date: Wed, 18 May 2011 07:08:53 +0200 Subject: [rust-dev] Syntax changes In-Reply-To: References: <4DD2CFC1.9030908@mozilla.com> Message-ID: > Why the need for two keywords? Couldn't the inferred version simply > omit the type? It could, but that'd require backtracking in the parser, and we'd like to keep Rust easy to parse (for speed, for our own convenience, and for the convenience of external tools). From sebastian.sylvan at gmail.com Tue May 17 22:37:04 2011 From: sebastian.sylvan at gmail.com (Sebastian Sylvan) Date: Tue, 17 May 2011 22:37:04 -0700 Subject: [rust-dev] Syntax changes In-Reply-To: References: <4DD2CFC1.9030908@mozilla.com> Message-ID: On Tue, May 17, 2011 at 10:08 PM, Marijn Haverbeke wrote: >> Why the need for two keywords? Couldn't the inferred version simply >> omit the type? > > It could, but that'd require backtracking in the parser, and we'd like > to keep Rust easy to parse (for speed, for our own convenience, and > for the convenience of external tools). Depends on the exact syntax you want, I suppose, but this could be done with at most single token lookahead right? Simplest option (no lookahead needed): let x : int = 5; let y = x; let z : int; Here the second token is always the identifier, and then there's two optional bits that can be uniquely identified by the first token (the next token must either be ":" or "="). Even this option is easy: let int x = 5; let y = x; let int z; You only need to look one token ahead to know what the second token was (if the third is not "" or "=", then the second token must've been the type). -- Sebastian Sylvan From peterhull90 at gmail.com Wed May 18 00:54:15 2011 From: peterhull90 at gmail.com (Peter Hull) Date: Wed, 18 May 2011 08:54:15 +0100 Subject: [rust-dev] Syntax changes In-Reply-To: <4DD2CFC1.9030908@mozilla.com> References: <4DD2CFC1.9030908@mozilla.com> Message-ID: On Tue, May 17, 2011 at 8:42 PM, Graydon Hoare wrote: > ?- rec(a=b, c=d) will change to {a: b, c: d} as a value constructor. > > ?- rec(int a, int b) will change to {a: int, b: int} as a type > ? ?constructor. > > ?- tup(a,b) will change to (a,b) in both type and value grammars; Might as well get all bike-shedding out of the way now... is there any merit in using {} for both rec and tup? I mean, express a tuple like a record with anonymous members. rec(int a,int b) becomes {a:int, b:int} and tup(a,b) becomes {a,b}. I've seen this before, maybe in Lua? I'm just thinking if you have auto x = (1+2+3,4+5+6) vs. auto x = (1+2+3+4+5+6) you have to read quite a way in to tell if it's a tuple or an expression in parentheses. Pete From noel at peralex.com Wed May 18 03:44:36 2011 From: noel at peralex.com (Noel Grandin) Date: Wed, 18 May 2011 12:44:36 +0200 Subject: [rust-dev] newbie q: choice of keywords In-Reply-To: <4DD2AFAA.9010308@mozilla.com> References: <4DD29A4E.1000604@peralex.com> <4DD2AFAA.9010308@mozilla.com> Message-ID: <4DD3A314.2070500@peralex.com> OK. I don't presume to dictate the design of your language, but I have noticed that languages that adhere more closely to an established language tend to have an easier time attracting converts (Java, JavaScript, Go). Just my 2c :-) In general, this is a very interesting time - lots of interesting new languages around, and I'm very interested to see how the typestate stuff in Rust works out. -- Noel Grandin Graydon Hoare wrote: > On 11-05-17 08:54 AM, Noel Grandin wrote: >> Hi >> >> I notice that Rust seems to favour extreme brevity in it's choice of keywords e.g. "ret" "vec" "cont" "fn" >> >> I ask because Rust aims to be a system programming language, and the current dominant system programming language is >> C/C++, >> so it would make sense to me to pick keywords that align more closely with C to make transition easier. >> >> Or am I missing something in the design criteria of the language? > > Nothing terribly rational, it's just an aesthetic preference on my part. > > -Graydon > _______________________________________________ > Rust-dev mailing list > Rust-dev at mozilla.org > https://mail.mozilla.org/listinfo/rust-dev > Disclaimer: http://www.peralex.com/disclaimer.html -------------- next part -------------- An HTML attachment was scrubbed... URL: From marijnh at gmail.com Wed May 18 10:11:01 2011 From: marijnh at gmail.com (Marijn Haverbeke) Date: Wed, 18 May 2011 19:11:01 +0200 Subject: [rust-dev] Syntax changes In-Reply-To: References: <4DD2CFC1.9030908@mozilla.com> Message-ID: > Depends on the exact syntax you want, I suppose, but this could be > done with at most single token lookahead right? Simplest option (no > lookahead needed): We'll probably keep the type, which can be arbitrarily complex, in front of the variable name. We also plan, at least in the type-deducing variant, to allow destructuring, making the two even more dissimilar. From brendan at mozilla.org Wed May 18 10:18:29 2011 From: brendan at mozilla.org (Brendan Eich) Date: Wed, 18 May 2011 10:18:29 -0700 Subject: [rust-dev] Syntax changes In-Reply-To: References: <4DD2CFC1.9030908@mozilla.com> Message-ID: On May 18, 2011, at 10:11 AM, Marijn Haverbeke wrote: >> Depends on the exact syntax you want, I suppose, but this could be >> done with at most single token lookahead right? Simplest option (no >> lookahead needed): > > We'll probably keep the type, which can be arbitrarily complex, in > front of the variable name. We also plan, at least in the > type-deducing variant, to allow destructuring, making the two even > more dissimilar. I like type in front from C and C++, but it has caused huge mischief in the lexers and parsers for those languages. Is it worth it? People in this thread mentioned {x: uint, y: uint} vs. {uint x, uint y}, and it does seem worth getting types-in-front in all contexts. But the type-after-colon approach is strictly simpler to parse and make optional. Can someone say why types-in-front wins? It's ok if it is an aesthetic judgment by BDFL or cohort, I am mainly asking if I'm missing a deeper reason. I'm not sure how destructuring with TI affects this, so that could be exactly what I'm missing. Thanks for any info. /be From marijnh at gmail.com Wed May 18 10:30:49 2011 From: marijnh at gmail.com (Marijn Haverbeke) Date: Wed, 18 May 2011 19:30:49 +0200 Subject: [rust-dev] Syntax changes In-Reply-To: References: <4DD2CFC1.9030908@mozilla.com> Message-ID: (I have no strong opinion about types before names. But I do want to point out that...) > I'm not sure how destructuring with TI affects this, so that could be exactly what I'm missing. Thanks for any info. Having name:type syntax would allow type annotations in destructuring patterns. The current syntax makes that very hard to do. Same for optional type annotations in inner functions, and destructuring in function arguments. I *think* that this use of the ':' character is compatible with our intention of using it for labels and record fields. (ML, I expect, did think this through further than C did.) From graydon at mozilla.com Wed May 18 11:09:17 2011 From: graydon at mozilla.com (Graydon Hoare) Date: Wed, 18 May 2011 11:09:17 -0700 Subject: [rust-dev] Syntax changes In-Reply-To: References: <4DD2CFC1.9030908@mozilla.com> Message-ID: <4DD40B4D.7030309@mozilla.com> On 11-05-18 10:18 AM, Brendan Eich wrote: > I like type in front from C and C++, but it has caused huge mischief in the lexers and parsers for those languages. Is it worth it? > > People in this thread mentioned {x: uint, y: uint} vs. {uint x, uint y}, and it does seem worth getting types-in-front in all contexts. > > But the type-after-colon approach is strictly simpler to parse and make optional. Can someone say why types-in-front wins? It's ok if it is an aesthetic judgment by BDFL or cohort, I am mainly asking if I'm missing a deeper reason. > > I'm not sure how destructuring with TI affects this, so that could be exactly what I'm missing. Thanks for any info. As far as I know type-after-optional-colon wins on every stated-or-explored technical merit (optional inference, destructuring, symmetry with records) except two: - You have to put in an extra character : when you want types, maybe not a big deal when people use inference locally all the time, but we always annotate fn signatures, so they get chattier. - It's not what C, Java, C++, and the rest of that family does. So .. I'm a bit torn. I'd go either way if there was really strong consensus; as it stands I've hugged the security blanket of "because C does it" here, for fear of scaring off all transitioning programmers. On this point, at least, I'd welcome a straw poll. Just to figure exactly how much everyone thinks "most programmers will balk" at "v:T" rather than "T v". Look at these examples and decide: fn push(&T[] v, T e) -> T[] { let T[] u = v + [e]; ret u; } fn push(v: &T[], e: T) : T[] { let u: T[] = v + [e]; ret u; } -Graydon From paul.stansifer at gmail.com Wed May 18 10:58:43 2011 From: paul.stansifer at gmail.com (Paul Stansifer) Date: Wed, 18 May 2011 10:58:43 -0700 Subject: [rust-dev] Syntax changes In-Reply-To: References: <4DD2CFC1.9030908@mozilla.com> Message-ID: I'm in favor of having an explicit symbol like colon (though I don't care how it's spelled) to mark type annotations. I suspect that not having an explicit symbol may constrain future syntactic choices. In particular, macros may benefit if type annotations on their arguments are syntactically optional. (I also feel like I mess up the C-style type annotations a lot in the presence of complex types, but that might just be me.) Paul From dherman at mozilla.com Wed May 18 11:24:18 2011 From: dherman at mozilla.com (David Herman) Date: Wed, 18 May 2011 11:24:18 -0700 Subject: [rust-dev] Syntax changes In-Reply-To: <4DD40B4D.7030309@mozilla.com> References: <4DD2CFC1.9030908@mozilla.com> <4DD40B4D.7030309@mozilla.com> Message-ID: <66FA29DD-414D-4A12-A76E-0BE8DFFC7582@mozilla.com> > As far as I know type-after-optional-colon wins on every stated-or-explored technical merit (optional inference, destructuring, symmetry with records) except two: > > - You have to put in an extra character : when you want types, > maybe not a big deal when people use inference locally all the > time, but we always annotate fn signatures, so they get chattier. I don't find the chattiness too much; in fact, I find that let x : int reads more instantaneously than let int x because the : more quickly divides the roles to my eyes. That's purely personal, of course. > - It's not what C, Java, C++, and the rest of that family does. That's definitely true. It's never clear how far you can stray from C-like syntax before the tide turns against you. My gut feeling says this won't bother enough people to be a make-or-break issue, but I have no crystal ball. > On this point, at least, I'd welcome a straw poll. Just to figure exactly how much everyone thinks "most programmers will balk" at "v:T" rather than "T v". Look at these examples and decide: > > fn push(&T[] v, T e) -> T[] { > let T[] u = v + [e]; > ret u; > } > > fn push(v: &T[], e: T) : T[] { > let u: T[] = v + [e]; > ret u; > } So yeah, FWIW I prefer the latter. Dave From brendan at mozilla.org Wed May 18 10:55:36 2011 From: brendan at mozilla.org (Brendan Eich) Date: Wed, 18 May 2011 10:55:36 -0700 Subject: [rust-dev] Syntax changes In-Reply-To: References: <4DD2CFC1.9030908@mozilla.com> Message-ID: <91E62990-D6D2-4BA0-A8A0-3D8A7A7F80E0@mozilla.org> On May 18, 2011, at 10:30 AM, Marijn Haverbeke wrote: > (I have no strong opinion about types before names. But I do want to > point out that...) > >> I'm not sure how destructuring with TI affects this, so that could be exactly what I'm missing. Thanks for any info. > > Having name:type syntax would allow type annotations in destructuring > patterns. The current syntax makes that very hard to do. Same for > optional type annotations in inner functions, and destructuring in > function arguments. I *think* that this use of the ':' character is > compatible with our intention of using it for labels and record > fields. (ML, I expect, did think this through further than C did.) Ok, this makes sense. Some foolish consistency when it comes to syntax can help (greater uniformity for people learning the language, learning inductively/constructively, developing intuition). ML seems less consistent in this light. : T as optional annotation after name seems more consistent. But consistency is overrated ;-). /be From niko at alum.mit.edu Wed May 18 14:05:43 2011 From: niko at alum.mit.edu (Niko Matsakis) Date: Wed, 18 May 2011 23:05:43 +0200 Subject: [rust-dev] Syntax changes In-Reply-To: <4DD40B4D.7030309@mozilla.com> References: <4DD2CFC1.9030908@mozilla.com> <4DD40B4D.7030309@mozilla.com> Message-ID: <4DD434A7.3090503@alum.mit.edu> > On this point, at least, I'd welcome a straw poll. Just to figure > exactly how much everyone thinks "most programmers will balk" at "v:T" > rather than "T v". Look at these examples and decide: My two cents: I prefer"name: type". I find it easier to parse, particularly when types get more complex. Niko From arelius at gmail.com Wed May 18 14:21:11 2011 From: arelius at gmail.com (Nicholas "Indy" Ray) Date: Wed, 18 May 2011 14:21:11 -0700 Subject: [rust-dev] Syntax changes In-Reply-To: <4DD40B4D.7030309@mozilla.com> References: <4DD2CFC1.9030908@mozilla.com> <4DD40B4D.7030309@mozilla.com> Message-ID: > > fn push(&T[] v, T e) -> T[] { > let T[] u = v + [e]; > ret u; > } > > fn push(v: &T[], e: T) : T[] { > let u: T[] = v + [e]; > ret u; > } I think your example here show's a clear advantage of consistently using colon to distinguish types. Coming from a C background where I'm used to the return type before the function the type, having the type behind the function, (and after '->') which is inconsistent with the rest of the types threw me off for a while at first. But I think making the syntax to other type annotations makes that much clearer. Indy -------------- next part -------------- An HTML attachment was scrubbed... URL: From andersrb at gmail.com Wed May 18 14:26:16 2011 From: andersrb at gmail.com (Brian Anderson) Date: Wed, 18 May 2011 17:26:16 -0400 Subject: [rust-dev] Syntax changes In-Reply-To: <4DD40B4D.7030309@mozilla.com> References: <4DD2CFC1.9030908@mozilla.com> <4DD40B4D.7030309@mozilla.com> Message-ID: On Wed, May 18, 2011 at 2:09 PM, Graydon Hoare wrote: > > On this point, at least, I'd welcome a straw poll. Just to figure exactly > how much everyone thinks "most programmers will balk" at "v:T" rather than > "T v". Look at these examples and decide: > > fn push(&T[] v, T e) -> T[] { > let T[] u = v + [e]; > ret u; > } > > fn push(v: &T[], e: T) : T[] { > let u: T[] = v + [e]; > ret u; > } > I asked three C# programmers what they thought of the second example and two said 'looks like F#', which I took to be a positive sign. The other sort of wrinkled his nose. FWIW, I like the colons for reasons previously stated. -Brian -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian.sylvan at gmail.com Wed May 18 18:02:59 2011 From: sebastian.sylvan at gmail.com (Sebastian Sylvan) Date: Wed, 18 May 2011 18:02:59 -0700 Subject: [rust-dev] Syntax changes In-Reply-To: <4DD40B4D.7030309@mozilla.com> References: <4DD2CFC1.9030908@mozilla.com> <4DD40B4D.7030309@mozilla.com> Message-ID: On Wed, May 18, 2011 at 11:09 AM, Graydon Hoare wrote: > ?- You have to put in an extra character : when you want types, As far as I can tell you could even skip the ":" and not lose anyhing because the other possible token is "=" (for an initializer, without an explicit type) which isn't a valid token for a type. So all these things parse unambiguously and without any back tracking, and not even any lookahead. let x int; let y = x; let z int = y; let w int; w = z; That said, I'd probably prefer having an operator in between. Also, I find that putting the name of something in a consistent place (in the sense of "the name of a variable is always 4 characters in from the left") makes it alot easier for humans to quickly scan for a specific name. If there's a big old type there of variable length you have to jump horizontally a lot. I would conjecture that you're scanning for a name more often than for a type, so putting it first makes sense. -- Sebastian Sylvan From pwalton at mozilla.com Wed May 18 18:03:50 2011 From: pwalton at mozilla.com (Patrick Walton) Date: Wed, 18 May 2011 18:03:50 -0700 Subject: [rust-dev] Syntax changes In-Reply-To: References: <4DD2CFC1.9030908@mozilla.com> <4DD40B4D.7030309@mozilla.com> Message-ID: <4DD46C76.80207@mozilla.com> On 5/18/11 6:02 PM, Sebastian Sylvan wrote: > On Wed, May 18, 2011 at 11:09 AM, Graydon Hoare wrote: >> - You have to put in an extra character : when you want types, > > As far as I can tell you could even skip the ":" and not lose anyhing > because the other possible token is "=" (for an initializer, without > an explicit type) which isn't a valid token for a type. I find this hard to read. Go uses this syntax and I think it's a little difficult. The ':' helps, at least for me. Patrick From sebastian.sylvan at gmail.com Wed May 18 18:46:23 2011 From: sebastian.sylvan at gmail.com (Sebastian Sylvan) Date: Wed, 18 May 2011 18:46:23 -0700 Subject: [rust-dev] Syntax changes In-Reply-To: <4DD40B4D.7030309@mozilla.com> References: <4DD2CFC1.9030908@mozilla.com> <4DD40B4D.7030309@mozilla.com> Message-ID: On Wed, May 18, 2011 at 11:09 AM, Graydon Hoare wrote: > ?fn push(v: &T[], e: T) : T[] { > ? ? ?let u: T[] = v + [e]; > ? ? ?ret u; > ?} No fist fights have broken out yet so I'll push the bike shedding a bit further and suggest that you make all types constructor operator right associative. This would solve the issue of indecipherable types that C++ has (see the paper "A Modest Proposal : C++ resyntaxed" for a lot reasoning around this, with examples etc.). E.g. what does this mean "@ int []"? I wouldn't know without looking it up (which is the same issue C++ has). Is it a box with an array of ints, or an array of boxes of ints? If everything's always right associative the former would be "@ [] int", and the latter "[] @ int". How would you write the type for "a mutable box of array of boxes of mutable ints"? Well with right associativity it would transcribe trivially to: "mutable @ [] @ mutable int". Just read it left to right like you would in English and that's what it is. Of course, you could choose to forego associativity for some things if grouping is natural due to the constructor being a set of brackets (e.g. you might want [T], {x:T}, (T,U) for vectors, records and tuples respectively). This makes it less consistent but possibly without introducing too much confusion, and maybe even improving clarity a bit. In fact, in that case all you'd need to change, as far as I can tell, is the vector type constructor syntax to be "[T]" instead of T[] which would avoid any ambiguous associativity issues (that last example would then be "mutable @ [ @ mutable int ]"). -- Sebastian Sylvan From graydon at mozilla.com Thu May 19 08:28:23 2011 From: graydon at mozilla.com (Graydon Hoare) Date: Thu, 19 May 2011 08:28:23 -0700 Subject: [rust-dev] Syntax changes In-Reply-To: References: <4DD2CFC1.9030908@mozilla.com> <4DD40B4D.7030309@mozilla.com> Message-ID: <4DD53717.2050806@mozilla.com> On 18/05/2011 6:46 PM, Sebastian Sylvan wrote: > In fact, in that case all you'd need to change, as far as I can tell, > is the vector type constructor syntax to be "[T]" instead of T[] which > would avoid any ambiguous associativity issues (that last example > would then be "mutable @ [ @ mutable int ]"). Yeah. I'm sympathetic to this and have discussed exactly this point a fair bit already; the problem is that we'd like to reserve room in the syntax for a type of vecs that have a specific interior allocation reserved for them rather than pointing to the heap. I.e. int[10] or such. This could still be done by [int](10) or [10]int or even [10 int] it's just a matter of ... alienness of convention? -Graydon From brendan at mozilla.org Thu May 19 09:17:02 2011 From: brendan at mozilla.org (Brendan Eich) Date: Thu, 19 May 2011 09:17:02 -0700 Subject: [rust-dev] Syntax changes In-Reply-To: <4DD53717.2050806@mozilla.com> References: <4DD2CFC1.9030908@mozilla.com> <4DD40B4D.7030309@mozilla.com> <4DD53717.2050806@mozilla.com> Message-ID: <23AEAA7F-F4F0-46CF-A4BA-993138E99642@mozilla.org> On May 19, 2011, at 8:28 AM, Graydon Hoare wrote: > On 18/05/2011 6:46 PM, Sebastian Sylvan wrote: > >> In fact, in that case all you'd need to change, as far as I can tell, >> is the vector type constructor syntax to be "[T]" instead of T[] which >> would avoid any ambiguous associativity issues (that last example >> would then be "mutable @ [ @ mutable int ]"). > > Yeah. I'm sympathetic to this and have discussed exactly this point a fair bit already; the problem is that we'd like to reserve room in the syntax for a type of vecs that have a specific interior allocation reserved for them rather than pointing to the heap. I.e. int[10] or such. > > This could still be done by [int](10) or [10]int or even [10 int] it's just a matter of ... alienness of convention? Presumably if the natives are C/C++ hackers, int[10] would be non-alien. But type in the middle or on the right would be more consistent, ceteris paribus. I think [10 int] reads well. Can the constant expression sub-grammar compose this way? /be From noelgrandin at gmail.com Thu May 19 09:21:06 2011 From: noelgrandin at gmail.com (Noel Grandin) Date: Thu, 19 May 2011 18:21:06 +0200 Subject: [rust-dev] Syntax changes In-Reply-To: <23AEAA7F-F4F0-46CF-A4BA-993138E99642@mozilla.org> References: <4DD2CFC1.9030908@mozilla.com> <4DD40B4D.7030309@mozilla.com> <4DD53717.2050806@mozilla.com> <23AEAA7F-F4F0-46CF-A4BA-993138E99642@mozilla.org> Message-ID: <4DD54372.2060904@gmail.com> Or you could repurpose a keyword inline [ int ] to indicate interior allocation. Brendan Eich wrote: > On May 19, 2011, at 8:28 AM, Graydon Hoare wrote: > >> On 18/05/2011 6:46 PM, Sebastian Sylvan wrote: >> >>> In fact, in that case all you'd need to change, as far as I can tell, >>> is the vector type constructor syntax to be "[T]" instead of T[] which >>> would avoid any ambiguous associativity issues (that last example >>> would then be "mutable @ [ @ mutable int ]"). >> Yeah. I'm sympathetic to this and have discussed exactly this point a fair bit already; the problem is that we'd like to reserve room in the syntax for a type of vecs that have a specific interior allocation reserved for them rather than pointing to the heap. I.e. int[10] or such. >> >> This could still be done by [int](10) or [10]int or even [10 int] it's just a matter of ... alienness of convention? > Presumably if the natives are C/C++ hackers, int[10] would be non-alien. But type in the middle or on the right would be more consistent, ceteris paribus. > > I think [10 int] reads well. Can the constant expression sub-grammar compose this way? > > /be > _______________________________________________ > Rust-dev mailing list > Rust-dev at mozilla.org > https://mail.mozilla.org/listinfo/rust-dev From erick.tryzelaar at gmail.com Thu May 19 09:28:05 2011 From: erick.tryzelaar at gmail.com (Erick Tryzelaar) Date: Thu, 19 May 2011 09:28:05 -0700 Subject: [rust-dev] Syntax changes In-Reply-To: <4DD54372.2060904@gmail.com> References: <4DD2CFC1.9030908@mozilla.com> <4DD40B4D.7030309@mozilla.com> <4DD53717.2050806@mozilla.com> <23AEAA7F-F4F0-46CF-A4BA-993138E99642@mozilla.org> <4DD54372.2060904@gmail.com> Message-ID: Or use an operator. In felix, we used the exponential style syntax of "int^5". This came naturally from the way we expressed tuple types as "int * float * str". I'm not sure it fits rust's style though. On Thu, May 19, 2011 at 9:21 AM, Noel Grandin wrote: > Or you could repurpose a keyword > > ? inline [ int ] > > to indicate interior allocation. > > > Brendan Eich wrote: >> On May 19, 2011, at 8:28 AM, Graydon Hoare wrote: >> >>> On 18/05/2011 6:46 PM, Sebastian Sylvan wrote: >>> >>>> In fact, in that case all you'd need to change, as far as I can tell, >>>> is the vector type constructor syntax to be "[T]" instead of T[] which >>>> would avoid any ambiguous associativity issues (that last example >>>> would then be "mutable @ [ @ mutable int ]"). >>> Yeah. I'm sympathetic to this and have discussed exactly this point a fair bit already; the problem is that we'd like to reserve room in the syntax for a type of vecs that have a specific interior allocation reserved for them rather than pointing to the heap. I.e. int[10] or such. >>> >>> This could still be done by [int](10) or [10]int or even [10 int] it's just a matter of ... alienness of convention? >> Presumably if the natives are C/C++ hackers, int[10] would be non-alien. But type in the middle or on the right would be more consistent, ceteris paribus. >> >> I think [10 int] reads well. Can the constant expression sub-grammar compose this way? >> >> /be >> _______________________________________________ >> Rust-dev mailing list >> Rust-dev at mozilla.org >> https://mail.mozilla.org/listinfo/rust-dev > > _______________________________________________ > Rust-dev mailing list > Rust-dev at mozilla.org > https://mail.mozilla.org/listinfo/rust-dev > From graydon at mozilla.com Thu May 19 09:56:57 2011 From: graydon at mozilla.com (Graydon Hoare) Date: Thu, 19 May 2011 09:56:57 -0700 Subject: [rust-dev] Syntax changes In-Reply-To: <23AEAA7F-F4F0-46CF-A4BA-993138E99642@mozilla.org> References: <4DD2CFC1.9030908@mozilla.com> <4DD40B4D.7030309@mozilla.com> <4DD53717.2050806@mozilla.com> <23AEAA7F-F4F0-46CF-A4BA-993138E99642@mozilla.org> Message-ID: <4DD54BD9.2000102@mozilla.com> The bike shed rumbles onwards.. On 19/05/2011 9:17 AM, Brendan Eich wrote: > I think [10 int] reads well. Can the constant expression sub-grammar compose this way? Not for complex exprs, no. Literals would, but that's clearly not enough. More likely [int, 10] or [int ^ 10] as erckt suggested would work. I think I like the former a bit more, and we already have a sort of wrestling act to do with interpreting , subject to its current enclosing type-or-value constructor ( [], (), fn() or { }, it means different things). -Graydon From erick.tryzelaar at gmail.com Thu May 19 21:31:19 2011 From: erick.tryzelaar at gmail.com (Erick Tryzelaar) Date: Thu, 19 May 2011 21:31:19 -0700 Subject: [rust-dev] rust failing to compile on fresh rebuild Message-ID: Rust is erroring out for me when it tries to compile stage2/glue.o. It's dying with: % make stage2/glue.o DYLD_LIBRARY_PATH=/Users/erickt/Projects/rust/build/stage1:/Users/erickt/Projects/rust/build/rt:/Users/erickt/Projects/rust/build/rustllvm:/Users/erickt/Projects/llvm/gcc-i386/install-Release/lib stage1/rustc -O -g -L stage2 -c -o stage2/glue.o --glue warning: no debug symbols in executable (-arch i386) compile: stage2/std.o generate: stage2/glue.o rt: --- rt: cd3c:main:main: rust: error: No input files allowed with --glue. rt: cd3c:main:main: upcall fail 'explicit failure', ../rust/src/comp/driver/session.rs:90 rt: cd3c:main: domain main @0x2800000 root task failed rt: --- rt: cd3c:main:main: rust: error: Multiple input filenames provided. rt: cd3c:main:main: upcall fail 'explicit failure', ../rust/src/comp/driver/session.rs:90 rt: cd3c:main: domain main @0x2800000 root task failed I bisected the bug down to this commit: 31d65453d47f243635ea17563db6b2c1127e9836 if someone has some time to debug it. From erick.tryzelaar at gmail.com Fri May 20 07:17:52 2011 From: erick.tryzelaar at gmail.com (Erick Tryzelaar) Date: Fri, 20 May 2011 07:17:52 -0700 Subject: [rust-dev] rust failing to compile on fresh rebuild In-Reply-To: References: Message-ID: On Thu, May 19, 2011 at 9:31 PM, Erick Tryzelaar wrote: > Rust is erroring out for me when it tries to compile stage2/glue.o. The problem stemmed from the "-g" flag switching from optflag to optopt. This patch fixes the problem: diff --git a/src/comp/driver/rustc.rs b/src/comp/driver/rustc.rs index 0ce20eb..879c53c 100644 --- a/src/comp/driver/rustc.rs +++ b/src/comp/driver/rustc.rs @@ -231,7 +231,7 @@ fn main(vec[str] args) { optflag("ls"), optflag("parse-only"), optflag("O"), optopt("OptLevel"), optflag("shared"), optmulti("L"), - optflag("S"), optflag("c"), optopt("o"), optopt("g"), + optflag("S"), optflag("c"), optopt("o"), optflag("g"), optflag("save-temps"), optopt("sysroot"), optflag("stats"), optflag("time-passes"), optflag("time-llvm-passes"), From graydon at mozilla.com Fri May 20 11:59:27 2011 From: graydon at mozilla.com (Graydon Hoare) Date: Fri, 20 May 2011 11:59:27 -0700 Subject: [rust-dev] LLVM bump Message-ID: <4DD6BA0F.30008@mozilla.com> The tinderboxes upgraded their LLVM yesterday to take a change that depends on something on LLVM trunk (support for 0-length vectors). You'll need to update your LLVM builds if you have been tracking us. -Graydon From munificentbob at gmail.com Mon May 23 11:51:23 2011 From: munificentbob at gmail.com (Bob Nystrom) Date: Mon, 23 May 2011 11:51:23 -0700 Subject: [rust-dev] newbie q: choice of keywords In-Reply-To: <4DD3A314.2070500@peralex.com> References: <4DD29A4E.1000604@peralex.com> <4DD2AFAA.9010308@mozilla.com> <4DD3A314.2070500@peralex.com> Message-ID: > +1 -- terseness is a virtue in a language, but this seems to be taking it > to a bit of an extreme. I refuse to believe that the extra milliseconds > saved by typing "fn" instead of "function" makes a difference in > productivity, > The Javascript committee is spending a *huge* amount of effort right now to improve JS's function syntax moving forward in part because "function" *is*long. > so I presume this choice is deliberate in that it is believed to *improve* > readability? If so, may I gently offer a contrary opinion; words are more > readable than abbreviations, and it's not clear (to me, anyway) that the > shortened keywords meaningfully improve things. > "fn" is still a word. I would actually argue in favor *for* abbreviated keywords: 1. Any user will need to know them anyway, so it's not like they need to be explicitly read and comprehended at each site where they appear. From that angle, any token is as good as any other. 2. Using an abbreviation leaves the full word available as an identifier. Now you can have a variable named "function" in your code. I don't presume to dictate the design of your language, but I have noticed > that languages that adhere more closely to an established language tend to > have an easier time attracting converts (Java, JavaScript, Go). > Most established languages *do* use abbreviations: int, const, float, bool, extern, enum, typedef, etc. Go chose "func" for function. (Having said that, "ret" and "cont" seem a bit perverse to me. ) - bob -------------- next part -------------- An HTML attachment was scrubbed... URL: From stejohns at adobe.com Mon May 23 12:16:20 2011 From: stejohns at adobe.com (Steven Johnson) Date: Mon, 23 May 2011 12:16:20 -0700 Subject: [rust-dev] newbie q: choice of keywords In-Reply-To: References: <4DD29A4E.1000604@peralex.com> <4DD2AFAA.9010308@mozilla.com> <4DD3A314.2070500@peralex.com> Message-ID: <9BC3F306-D64C-42FE-AA9E-5EF5008732F8@adobe.com> On May 23, 2011, at 11:51 AM, Bob Nystrom wrote: > The Javascript committee is spending a huge amount of effort right now to improve JS's function syntax moving forward in part because "function" is long. Everyone's entitled to their opinion; I respectfully disagree. > > "fn" is still a word. > Err, not to be snarky, but a random checking of online dictionaries list the only English meaning of "fn" as "footnote". (But yeah, the meaning in this context is unlikely to be unclear.) > 1. Any user will need to know them anyway, so it's not like they need to be explicitly read and comprehended at each site where they appear. From that angle, any token is as good as any other. Sure, but that's not a useful angle; the point is to use tokens that express the meaning of the code to both human and compiler. So, yeah, you could also use "foo" for this keyword, and people would learn to parse it well enough, but it would be more jarring to read than actually necessary. (Sorta like requiring that Shakespeare always be typeset in Comic Sans....) > 2. Using an abbreviation leaves the full word available as an identifier. Now you can have a variable named "function" in your code. > I don't think I've ever had the desire to do this in 20+ years of programming. (And conversely, it means I *can't* have a variable named "fn" in my code.) > > Most established languages do use abbreviations: int, const, float, bool, extern, enum, typedef, etc. Go chose "func" for function. This is true, and I very much doubt I'd ask for "integer" over "int", so I freely admit this is an arbitrary argument on my part. At any rate, I'm a non-contributing lurker on this list, so my vote counts for very little; but since a little syntax bikeshedding came up, I thought I'd chime in as a dissenting opinion from the Slightly-More-Verbose-Is-Good camp... From sebastian.sylvan at gmail.com Mon May 23 12:56:21 2011 From: sebastian.sylvan at gmail.com (Sebastian Sylvan) Date: Mon, 23 May 2011 12:56:21 -0700 Subject: [rust-dev] newbie q: choice of keywords In-Reply-To: References: <4DD29A4E.1000604@peralex.com> <4DD2AFAA.9010308@mozilla.com> <4DD3A314.2070500@peralex.com> Message-ID: On Mon, May 23, 2011 at 11:51 AM, Bob Nystrom wrote: > >> +1 -- terseness is a virtue in a language, but this seems to be taking it >> to a bit of an extreme. I refuse to believe that the extra milliseconds >> saved by typing "fn" instead of "function" makes a difference in >> productivity, > > The Javascript committee is spending a huge amount of effort right now to > improve JS's function syntax moving forward in part because "function" is > long. And C# introduced special syntax for lambdas because anonymous delegates were too verbose. I think people tend to underestimate the psychological value of terseness. There's a balance here, having extra operators or keywords that are not strictly necessary can help readability (this is why people have issus with Lisp), but having too much of it can have a substantial impact on the way people write code (not much functional stuff was being written in C# 2.0, or indeed Java, despite it being technically possible - the slight added convenience of lambda statements and expressions punches way above its weight here). -- Sebastian Sylvan From lkuper at mozilla.com Mon May 23 17:48:59 2011 From: lkuper at mozilla.com (Lindsey Kuper) Date: Mon, 23 May 2011 17:48:59 -0700 Subject: [rust-dev] Heads up: LLVM include StandardPasses.h missing Message-ID: An LLVM include we're using in rustllvm/Passes2.cpp, llvm/Support/StandardPasses.h, is gone in the most recent revision of LLVM (see: http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20110516/121142.html -- it looks like it's been gone since Saturday). Presumably we could be using its replacement, llvm/Support/PassManagerBuilder.h, although it's not entirely a drop-in replacement. I believe Rafael said that he would look into this tomorrow. In the meantime, I'm going back to an older revision of LLVM. (r131752 of LLVM and r131425 of compiler-rt are what seem to be working for Eric, so that's what I'm trying.) Lindsey From respindola at mozilla.com Tue May 24 05:52:10 2011 From: respindola at mozilla.com (=?ISO-8859-1?Q?Rafael_=C1vila_de_Esp=EDndola?=) Date: Tue, 24 May 2011 08:52:10 -0400 Subject: [rust-dev] nice high level overview of llvm Message-ID: <4DDBA9FA.2020805@mozilla.com> http://www.aosabook.org/en/llvm.html Cheers, Rafael From respindola at mozilla.com Tue May 24 10:52:47 2011 From: respindola at mozilla.com (Rafael Avila de Espindola) Date: Tue, 24 May 2011 13:52:47 -0400 Subject: [rust-dev] Heads up: LLVM include StandardPasses.h missing In-Reply-To: References: Message-ID: <4DDBF06F.2030505@mozilla.com> On 11-05-23 08:48 PM, Lindsey Kuper wrote: > An LLVM include we're using in rustllvm/Passes2.cpp, > llvm/Support/StandardPasses.h, is gone in the most recent revision of > LLVM (see: http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20110516/121142.html > -- it looks like it's been gone since Saturday). Presumably we could > be using its replacement, llvm/Support/PassManagerBuilder.h, although > it's not entirely a drop-in replacement. I believe Rafael said that > he would look into this tomorrow. In the meantime, I'm going back to > an older revision of LLVM. (r131752 of LLVM and r131425 of > compiler-rt are what seem to be working for Eric, so that's what I'm > trying.) > I have created https://github.com/graydon/rust/pull/403 > Lindsey Cheers, Rafael From graydon at mozilla.com Wed May 25 10:53:37 2011 From: graydon at mozilla.com (Graydon Hoare) Date: Wed, 25 May 2011 10:53:37 -0700 Subject: [rust-dev] threads, tasks, scheduling, migrating, failing Message-ID: <4DDD4221.7030104@mozilla.com> Hi, We had a brief discussion on IRC yesterday that ended in me storming off in a very unprofessional manner. I'd like to publicly apologize for that behavior, it was not cool and had little to do with the conversation at hand. My stress level was very high coming into work yesterday, and I was letting personal life spill into work life. My fault, sorry. I'd also like to restart (or at least restate) parts of that discussion here so we can actually get this worked out to everyone's satisfaction (including Rafael, who clearly has strong feelings on the matter). This will be a long rambly email full of back-story to set the stage; if you have specific points to follow-up on, just snip those parts out for your replies. Preface ~~~~~~~ We know (or at least, anyone who's poked at it knows) that the tasking and threading model in rustboot was pretty unsatisfactory. It passed some interesting tests ("millions of tasks, real cheap!") and had a number of needs it was trying to meet -- it was not designed in *complete* ignorance -- but it also imposed strange burdens that we'd like to dispense with this time around. Hopefully without losing the good stuff. So, some background "goals" in order to make this story make sense. The three primary design pressures are: (1) Support *isolation*, in some useful sense, between tasks. That is, a task should be able to reason locally about its data and code without worrying whether other tasks are mucking with their own data and code. With the exception of message-IO points and unsafe blocks, which obviously involve the potential for non-isolated action. (2) Support *lots* of tasks. Millions. Such that a programmer has no fear about making "too many" tasks in a system, if it decomposes nicely. If for no other reason than support for isolation-based local reasoning (though concurrent, or interleaved, or truly parallel execution is also nice to exploit whenever it's surfaced this way). (3) Run at relatively high, but more importantly *predictable* performance. As few magical parts as possible in the concurrency model. Take the M:N performance-penalty if necessary to achieve the other 2 goals, so long as there are no random peformance discontinuities in the model. Concurrency model is intimately connected to memory model, unwinding, gc, and several other things; so when I say we're going to be revisiting design decisions in rustboot's concurrency model, I implicitly include parts of the memory model too, and other parts. The Past (rustboot) ~~~~~~~~~~~~~~~~~~~ The "lots of tasks" pressure breaks down into two sub-issues: making tasks small (in the sense of memory) and making them independently scheduled. We approached the "small" issue via growable stacks (doubling vectors with pointer-rewriting) and a very large dose of ugly magic for doing calls "between stacks" (from rust to C). This had lots of unfortunate fallout: debuggers and tools got upset, calling back into rust code from C was mostly impossible, and to support it safely we'd need to be flushing pointers to the stack and re-reading them *constantly*, much more than just "make sure values are pinned somewhere" necessary for GC. We approached the "scheduling" issue by even *more* magic return-address patching during suspended C calls, and a custom cooperative scheduler. The "isolation" pressure was approached by stratifying the heap memory model into private and shared (between-tasks) memory, with the shared stuff always immutable and acyclic. Sharing was not always possible -- not between threads and not between processes -- but between tasks-in-a-thread it could work, and we figured that scenario was valuable to users. Cheap sending of shared bits between tasks in a thread. Then we'd do a deep copy when we hit domain boundaries. But to support that scenario, tasks had to be pinned to threads. That is, the concurrency scheme in rustboot involved tasks running within domains (threads or processes; though the latter never materialized), where the user explicitly constructed domains and injected threads into the domain. Once spawned in a domain, a task could not leave it (be migrated to another thread), because it might have pointers into the "shared memory" of the domain. This pinning to domains has the unfortunate performance characteristic of *requiring* a user to pay attention to task:thread assignments; this could be a benefit in some cases -- explicit control can be good -- but in many cases it seemed bad, or at least over-complex. It's not just a matter of "having an M:N scheduler in userspace" (which will, says the literature, always underperform a 1:1 scheduler with the kernel involved) but also pinning each individual task in M to a thread in N, such that one task blocking (or even just monopolizing a core) could block or slow down a whole group of tasks on the same thread. This is a usability hazard. The Future (rustc) ~~~~~~~~~~~~~~~~~~ Rustboot is dead (yay!) and we're in the process of working through the leftover cruft in the runtime and removing parts which were bad ideas and/or only around to support rustboot's various limitations and design choices. Rustc doesn't really *have* a tasking system yet -- there are pieces of the communication layer, but eholk is just getting spawn working this week -- so we're mostly doing "rewrite this subsystem" work now. There's been a fair amount of disjointed conversation over this matter, I'm hoping to consolidate what's on the agenda here. The "lots of tasks" issue still has two sub-parts: size and scheduling. We're going to approach "size" via "the other, more standard technique", which is to use a linked list of stack segments and never move them. We will most likely reuse the *exact* technique Go is using here, in the sense of trying to be ABI compatible (at least insofar as this is possible or makes sense) and possibly even use the same runtime support library. This approach is easier for LLVM to cope with (there's a GSoC student implementing it currently), and more tools understand it. It also makes stack segments recyclable between tasks, which should reduce overall memory pressure (equivalent to "shrinking" in our other model). We're also going to use the same "unified" approach to growth and cross-language calling as Go uses -- just grow into a "sufficiently big" segment that may be recycled between tasks in between C calls -- and that may well permit C to call back into rust (assuming it can provide a task* and can be made to play nice with unwinding and GC; see below). We're also going to approach "scheduling" via "the other, more standard technique", which is to use the posix (and before that, system V) schedulable user contexts and (sadly) again our own scheduler. Where ucontext isn't OS-provided, we'll provide our own implementation; it's not actually much more than a "save registers to structure A and load them from structure B" routine anyways, just with a standard API. And on some OSs -- specifically those where we discover threads are sufficiently cheap, if running on small stacks -- we're going to lean much more heavily on the OS thread scheduler. See below. We're going to approach the "isolation" pressure differently. Rather than permit tasks to share pointers at all, we'll be shifting to a stratification of memory based on unique pointers. This will mean that the only possible kinds of send are "move" and "deep copy". Move will happen everywhere in-OS-process, deep-copy between processes. Move semantics -- making a copy while indivisibly de-initializing the source of the copy -- are going to be the focus of a fair bit of work over the next while, and we're betting on them for the messaging/isolation system. Along with minimizing refcounting (and a host of other thorny semantic issues associated with accidental copying, such as environment capture and double-execution of destructors) this will permit tasks to migrate between threads. Or, put more simply, it will permit us to treat threads as an undifferentiated pool of N workers, and tasks as a pool of M work units; when a thread blocks in C code it will have no affect on the other tasks (other than temporarily using up a "large segment" of stack) and M>N runnable tasks should always "saturate" the threads (thus cores) with work. Moreover, when we're on an OS that has "really cheap threads" we can spin up N == M threads and fall into the case of 1:1 scheduling: back off and let the OS kernel do all the scheduling, have our scheduler always say "oh, just keep running the same task you're running" every time it checks at a yield point. On linux, for example, I believe that this may very well be more optimal than getting in the way with our own scheduler and ucontext logic. I'm not sure about other OSs but I'd like to retain this "dial" to be able to dial scheduling back into our runtime if we're on an OS with horrendously bad or otherwise expensive kernel threads. In the process of this change, we'll eliminate the concept of a domain from the language and runtime, and just model OS processes as OS processes. We'll still need some runtime support for interacting with subprocesses, of course, just avoid trying to mix metaphors. Moving to a pool-of-threads model should also permit leaning on "the other, more standard technique" for unwinding: the C++ unwinder (or a large part of it). Since a task blocked-in-C doesn't necessarily block any other tasks (they can run on other threads) we don't need to deschedule the unwinder, which was a large part of my concern for how it might be unusable in this role. We can just let unwinding run to completion (scheduler yield-points can always opt to not-yield when unwinding). A secondary concern has to do with double-faulting (failing within a destructor) but we'll cross that bridge when we come to it; I doubt the policy to call terminate() in C++ is so hard-wired into the unwinder that there are no possible ways of overriding it. Opinions on this welcome. (Incidentally, the more we drift away from "our own per-frame metadata tables" towards relying on stock components, the more I think using a conservative stack scanner for GC root-finding may be perfectly fine, obviating the need for per-frame GC info and explicit GC-safe points. But that's not terribly related to choice of concurrency strategy; just an interesting note for anyone following along the Saga Of Rust Frame Info) Our argument yesterday hit a breaking point when we were discussing the relationship between C++ unwind semantics and pthread cancellation. I think this was actually a red herring: 'fail' could only really map to 'pthread_cancel' in the case that we were hard-wired to a 1:1 scheduling model, always using a kernel thread for a task, which as I've said I would like to retain as only as an option (when on an OS with cheap / fast kernel threads) rather than a semantic requirement. And even if we *did* swallow that requirement, I was arguing yesterday (and I believe further googling today supports) the contention that pthread_cancel is just not a good fit for 'fail' (or kill). It will: (a) Only cancel at cancellation points, not "any instruction"; so it's probing the presence of a flag in the pthread library anyways. Not terribly different, cost-wise, from using our own flag. (b) Not run the C++ unwinder in a reliable, portable fashion; on some platforms this works but on some it does not, and boost has opted to not-present the interface for precisely this reason. Overall I don't think pthread_cancel was a productive avenue for the discussion and I'm sorry we wound up in it. I think it's more useful to talk about ways to make cooperative scheduling (== kill) points cheap enough to live with -- including options for unsafe loops that omit them -- even in the degenerate case where they always say "keep running the task you're running" (1:1 mapping to threads). At least until killed. Phew! Ok. Done. Comments? Preferences? Field datapoints to contribute? -Graydon From respindola at mozilla.com Wed May 25 11:39:07 2011 From: respindola at mozilla.com (Rafael Avila de Espindola) Date: Wed, 25 May 2011 14:39:07 -0400 Subject: [rust-dev] threads, tasks, scheduling, migrating, failing In-Reply-To: <4DDD4221.7030104@mozilla.com> References: <4DDD4221.7030104@mozilla.com> Message-ID: <4DDD4CCB.1020409@mozilla.com> On 11-05-25 01:53 PM, Graydon Hoare wrote: > Hi, > > We had a brief discussion on IRC yesterday that ended in me storming off > in a very unprofessional manner. I'd like to publicly apologize for that > behavior, it was not cool and had little to do with the conversation at > hand. My stress level was very high coming into work yesterday, and I > was letting personal life spill into work life. My fault, sorry. > > I'd also like to restart (or at least restate) parts of that discussion > here so we can actually get this worked out to everyone's satisfaction > (including Rafael, who clearly has strong feelings on the matter). This > will be a long rambly email full of back-story to set the stage; if you > have specific points to follow-up on, just snip those parts out for your > replies. > > > Preface > ~~~~~~~ > > We know (or at least, anyone who's poked at it knows) that the tasking > and threading model in rustboot was pretty unsatisfactory. It passed > some interesting tests ("millions of tasks, real cheap!") and had a > number of needs it was trying to meet -- it was not designed in > *complete* ignorance -- but it also imposed strange burdens that we'd > like to dispense with this time around. Hopefully without losing the > good stuff. > > So, some background "goals" in order to make this story make sense. The > three primary design pressures are: > > (1) Support *isolation*, in some useful sense, between tasks. That is, a > task should be able to reason locally about its data and code without > worrying whether other tasks are mucking with their own data and code. > With the exception of message-IO points and unsafe blocks, which > obviously involve the potential for non-isolated action. > > (2) Support *lots* of tasks. Millions. Such that a programmer has no > fear about making "too many" tasks in a system, if it decomposes nicely. > If for no other reason than support for isolation-based local reasoning > (though concurrent, or interleaved, or truly parallel execution is also > nice to exploit whenever it's surfaced this way). > > (3) Run at relatively high, but more importantly *predictable* > performance. As few magical parts as possible in the concurrency model. > Take the M:N performance-penalty if necessary to achieve the other 2 > goals, so long as there are no random peformance discontinuities in the > model. > > Concurrency model is intimately connected to memory model, unwinding, > gc, and several other things; so when I say we're going to be revisiting > design decisions in rustboot's concurrency model, I implicitly include > parts of the memory model too, and other parts. > > > The Past (rustboot) > ~~~~~~~~~~~~~~~~~~~ > > The "lots of tasks" pressure breaks down into two sub-issues: making > tasks small (in the sense of memory) and making them independently > scheduled. We approached the "small" issue via growable stacks (doubling > vectors with pointer-rewriting) and a very large dose of ugly magic for > doing calls "between stacks" (from rust to C). This had lots of > unfortunate fallout: debuggers and tools got upset, calling back into > rust code from C was mostly impossible, and to support it safely we'd > need to be flushing pointers to the stack and re-reading them > *constantly*, much more than just "make sure values are pinned > somewhere" necessary for GC. We approached the "scheduling" issue by > even *more* magic return-address patching during suspended C calls, and > a custom cooperative scheduler. > > The "isolation" pressure was approached by stratifying the heap memory > model into private and shared (between-tasks) memory, with the shared > stuff always immutable and acyclic. Sharing was not always possible -- > not between threads and not between processes -- but between > tasks-in-a-thread it could work, and we figured that scenario was > valuable to users. Cheap sending of shared bits between tasks in a > thread. Then we'd do a deep copy when we hit domain boundaries. > > But to support that scenario, tasks had to be pinned to threads. That > is, the concurrency scheme in rustboot involved tasks running within > domains (threads or processes; though the latter never materialized), > where the user explicitly constructed domains and injected threads into > the domain. Once spawned in a domain, a task could not leave it (be > migrated to another thread), because it might have pointers into the > "shared memory" of the domain. This pinning to domains has the > unfortunate performance characteristic of *requiring* a user to pay > attention to task:thread assignments; this could be a benefit in some > cases -- explicit control can be good -- but in many cases it seemed > bad, or at least over-complex. It's not just a matter of "having an M:N > scheduler in userspace" (which will, says the literature, always > underperform a 1:1 scheduler with the kernel involved) but also pinning > each individual task in M to a thread in N, such that one task blocking > (or even just monopolizing a core) could block or slow down a whole > group of tasks on the same thread. This is a usability hazard. > > > The Future (rustc) > ~~~~~~~~~~~~~~~~~~ > > Rustboot is dead (yay!) and we're in the process of working through the > leftover cruft in the runtime and removing parts which were bad ideas > and/or only around to support rustboot's various limitations and design > choices. Rustc doesn't really *have* a tasking system yet -- there are > pieces of the communication layer, but eholk is just getting spawn > working this week -- so we're mostly doing "rewrite this subsystem" work > now. There's been a fair amount of disjointed conversation over this > matter, I'm hoping to consolidate what's on the agenda here. > > The "lots of tasks" issue still has two sub-parts: size and scheduling. > > We're going to approach "size" via "the other, more standard technique", > which is to use a linked list of stack segments and never move them. We > will most likely reuse the *exact* technique Go is using here, in the > sense of trying to be ABI compatible (at least insofar as this is > possible or makes sense) and possibly even use the same runtime support > library. This approach is easier for LLVM to cope with (there's a GSoC > student implementing it currently), and more tools understand it. It > also makes stack segments recyclable between tasks, which should reduce > overall memory pressure (equivalent to "shrinking" in our other model). > We're also going to use the same "unified" approach to growth and > cross-language calling as Go uses -- just grow into a "sufficiently big" > segment that may be recycled between tasks in between C calls -- and > that may well permit C to call back into rust (assuming it can provide a > task* and can be made to play nice with unwinding and GC; see below). > > We're also going to approach "scheduling" via "the other, more standard > technique", which is to use the posix (and before that, system V) > schedulable user contexts and (sadly) again our own > scheduler. Where ucontext isn't OS-provided, we'll provide our own > implementation; it's not actually much more than a "save registers to > structure A and load them from structure B" routine anyways, just with a > standard API. And on some OSs -- specifically those where we discover > threads are sufficiently cheap, if running on small stacks -- we're > going to lean much more heavily on the OS thread scheduler. See below. > > We're going to approach the "isolation" pressure differently. Rather > than permit tasks to share pointers at all, we'll be shifting to a > stratification of memory based on unique pointers. This will mean that > the only possible kinds of send are "move" and "deep copy". Move will > happen everywhere in-OS-process, deep-copy between processes. Move > semantics -- making a copy while indivisibly de-initializing the source > of the copy -- are going to be the focus of a fair bit of work over the > next while, and we're betting on them for the messaging/isolation system. > > Along with minimizing refcounting (and a host of other thorny semantic > issues associated with accidental copying, such as environment capture > and double-execution of destructors) this will permit tasks to migrate > between threads. Or, put more simply, it will permit us to treat threads > as an undifferentiated pool of N workers, and tasks as a pool of M work > units; when a thread blocks in C code it will have no affect on the > other tasks (other than temporarily using up a "large segment" of stack) > and M>N runnable tasks should always "saturate" the threads (thus cores) > with work. Moreover, when we're on an OS that has "really cheap threads" > we can spin up N == M threads and fall into the case of 1:1 scheduling: > back off and let the OS kernel do all the scheduling, have our scheduler > always say "oh, just keep running the same task you're running" every > time it checks at a yield point. On linux, for example, I believe that > this may very well be more optimal than getting in the way with our own > scheduler and ucontext logic. I'm not sure about other OSs but I'd like > to retain this "dial" to be able to dial scheduling back into our > runtime if we're on an OS with horrendously bad or otherwise expensive > kernel threads. > > In the process of this change, we'll eliminate the concept of a domain > from the language and runtime, and just model OS processes as OS > processes. We'll still need some runtime support for interacting with > subprocesses, of course, just avoid trying to mix metaphors. > > Moving to a pool-of-threads model should also permit leaning on "the > other, more standard technique" for unwinding: the C++ unwinder (or a > large part of it). Since a task blocked-in-C doesn't necessarily block > any other tasks (they can run on other threads) we don't need to > deschedule the unwinder, which was a large part of my concern for how it > might be unusable in this role. We can just let unwinding run to > completion (scheduler yield-points can always opt to not-yield when > unwinding). A secondary concern has to do with double-faulting (failing > within a destructor) but we'll cross that bridge when we come to it; I > doubt the policy to call terminate() in C++ is so hard-wired into the > unwinder that there are no possible ways of overriding it. Opinions on > this welcome. > > (Incidentally, the more we drift away from "our own per-frame metadata > tables" towards relying on stock components, the more I think using a > conservative stack scanner for GC root-finding may be perfectly fine, > obviating the need for per-frame GC info and explicit GC-safe points. > But that's not terribly related to choice of concurrency strategy; just > an interesting note for anyone following along the Saga Of Rust Frame Info) > > Our argument yesterday hit a breaking point when we were discussing the > relationship between C++ unwind semantics and pthread cancellation. I > think this was actually a red herring: 'fail' could only really map to > 'pthread_cancel' in the case that we were hard-wired to a 1:1 scheduling > model, always using a kernel thread for a task, which as I've said I > would like to retain as only as an option (when on an OS with cheap / > fast kernel threads) rather than a semantic requirement. And even if we > *did* swallow that requirement, I was arguing yesterday (and I believe > further googling today supports) the contention that pthread_cancel is > just not a good fit for 'fail' (or kill). It will: > > (a) Only cancel at cancellation points, not "any instruction"; so it's > probing the presence of a flag in the pthread library anyways. Not > terribly different, cost-wise, from using our own flag. > > (b) Not run the C++ unwinder in a reliable, portable fashion; > on some platforms this works but on some it does not, and boost > has opted to not-present the interface for precisely this reason. > > Overall I don't think pthread_cancel was a productive avenue for the > discussion and I'm sorry we wound up in it. I think it's more useful to > talk about ways to make cooperative scheduling (== kill) points cheap > enough to live with -- including options for unsafe loops that omit them > -- even in the degenerate case where they always say "keep running the > task you're running" (1:1 mapping to threads). At least until killed. > > Phew! Ok. Done. Comments? Preferences? Field datapoints to contribute? My main concern is that we have to keep costs and priorities in mind. Yesterday you described C/C++ as crazy/insane. It so happens that that insanity is used to implement all the kernels that we are targeting. It is used by every program I use (sometimes with an extra interpreter on top) and is used by the compiler framework we selected. If all those are crazy, I would say that that is not sufficient market for sanity. Programmers like to complain about the hard to reason about semantics of C++, but they do get the job done. Look at the cost we are proposing to get something a bit better: every loop will have a check to know if it should yield or not. Is this really worth it? What is basically doing is trying to give task a similar semantics to what threads already have. About datapoits. No, I don't have them. It is too early for that. We don't need tasks right now. C, C++ and Java have come a *long* way without them and they have a significant market share. With the changes on how channel work, tasks are not a fundamentally new concept. You even acknowledge that they could be implemented at 1:1. Given that we must also provide access to raw threads, lets start with that. My proposal is * Implement OS threads * Have a notion of a task, which is a language abstraction that provides a subset of the features that are common to all operating systems. You could not, for example, send a signal to a task. * Implement them now with a 1:1 mapping. Once the language is more mature, we can consider what is next. It might be another "insanity" that is known to work, like thread pools and continuation style code or it might be user space tasks. The important point is that once we have web servers, GUI toolkits, image decoders, audio APIs, etc available in rust, we will have data points. It might be that, like what happened to java, using 1:1 is the best. If not, we can then implement M:N as an optimization. > -Graydon Cheers, Rafael From graydon at mozilla.com Wed May 25 14:00:19 2011 From: graydon at mozilla.com (Graydon Hoare) Date: Wed, 25 May 2011 14:00:19 -0700 Subject: [rust-dev] threads, tasks, scheduling, migrating, failing In-Reply-To: <4DDD4CCB.1020409@mozilla.com> References: <4DDD4221.7030104@mozilla.com> <4DDD4CCB.1020409@mozilla.com> Message-ID: <4DDD6DE3.7050409@mozilla.com> On 11-05-25 11:39 AM, Rafael Avila de Espindola wrote: > My main concern is that we have to keep costs and priorities in mind. > > Yesterday you described C/C++ as crazy/insane. It so happens that that > insanity is used to implement all the kernels that we are targeting. It > is used by every program I use (sometimes with an extra interpreter on > top) and is used by the compiler framework we selected. If all those are > crazy, I would say that that is not sufficient market for sanity. I'm sorry to have used such language; clearly "crazy" is too vague to be useful. I'll be more specific in this email. C, C++ and Java (most things exposing "thread") don't use threads as isolation primitives. That is, threads share memory. The language doesn't support private memory terribly well; even TLS is "up to you" to use safely, there's no enforcement. This means that users never use threads for isolation (or if they do, it doesn't work). They use them for other things, and tend to use only modest numbers of them. These uses have two major sub-cases: - One thread per core, more or less, to saturate cores. Small enough N that nearly any scheduler will be fine. Definitely works for 1:1. This is "parallelism". - One thread per IO connection, using "parallelism" to handle concurrent IO problems. Weak, but widely done. This can get bad depending on IO abstraction. If we wire in 1:1, on *some* OSs this will be high performance, on some it will not. You can't wave this away by pointing at C/C++ programs; they hit this wall too and regularly have to write custom event loops with IO+epoll pumps. > Programmers like to complain about the hard to reason about semantics of > C++, but they do get the job done. No. Programmers complain to a point, then they abandon the language. Which happens all the time. Know what the stock answer is in "how do I implement this in C++" at mozilla? "Use JS". > Look at the cost we are proposing to > get something a bit better: every loop will have a check to know if it > should yield or not. Is this really worth it? What is basically doing is > trying to give task a similar semantics to what threads already have. Yes. And every function call has a check to see if it has to grow the stack. We can plausibly collapse these two costs into one in some cases. In cases where we can't, and it's a performance bottleneck, we can offer some unsafe variant; but you're assuming a particular datapoint without evidence just as you accuse me of. We don't know it'll be prohibitively slow any more than we know bounds checking vector accesses is prohibitively slow. I'm not ok with this argument. "Do what threads do" is not an answer; threads don't cancel anywhere they're not checking a flag. They support abrupt killing, but that's even worse: almost guaranteed to corrupt memory. How would we cancel a runaway thread in your proposal? The issue re-occurs. When I criticized C and C++ on this point (and the memory model) yesterday, it was because they sweep these issues under the rug. They're safety issues but the language tells users "you're on your own". I'm not interested in offering a memory model or threading model as weak as that in C++. It makes too few safety promises. > About datapoits. No, I don't have them. It is too early for that. We > don't need tasks right now. C, C++ and Java have come a *long* way > without them and they have a significant market share. They are very, very difficult to program safely. Show me a C or C++ program that isn't littered with memory-safety-corrupting failure modes. Show me a highly concurrent one that doesn't have subtle data races (not just at I/O points). Show me a java program that doesn't have the next worst thing, random take-down-the-whole-process failures due to uncaught NPEs, due to lack of isolation. These are problems I'm trying to address. You keep proposing stances along the lines of "we must sacrifice safety for speed to compete with C++". I'm not willing to do that. Keep your proposals to the space of "safe" (isolation-preserving, including resource-bounds and runaway tasks, portable, diagnostic-friendly) and we can talk; otherwise this is not fruitful. > With the changes on how channel work, tasks are not a fundamentally new > concept. You even acknowledge that they could be implemented at 1:1. With yield points, yes. On OSs that support lots of threads for cheap, that could be true. Without yield points it's unsafe, and we do not have evidence that it'll perform well on every OS. And it doesn't meet other useful cases, see below: > Given that we must also provide access to raw threads, lets start with > that. My proposal is > > * Implement OS threads > * Have a notion of a task, which is a language abstraction that provides > a subset of the features that are common to all operating systems. You > could not, for example, send a signal to a task. > * Implement them now with a 1:1 mapping. No. I am not comfortable with that. - It needs yield points anyways to be safe (see above), and if those are never-taken branches when running 1:1 it will perform the same if it's "M:N running 1:1" as if it's "only 1:1 supported"; so there is IMO no performance argument here. - It's very likely to wind up depending on some aspect of 1:1 if built as you suggest. - It's not clear that it'll be efficient everywhere. Not everything has cheap threads. - It doesn't handle running "task per connection" on IO-heavy loads with a single (or small number) of IO multiplexing threads, which is the appropriate IO model on some OSs. - It doesn't handle debugging by running M:1 and single stepping without debugger heroism (and gets worse if we are trying to tap the message queues for tracing inter-task communication). - It doesn't handle bounding the number of threads a task creates while permitting it to spawn extra tasks; IOW every task turns into a real commitment of CPU resources, scheduled with every other. I want to be able to run a rust program and say "use no more than 5 threads, I don't care what you're trying to do". Threads are a system-wide resource. For these reasons I would like to be M:N by default with 1:1 as a special case. It may be that it's *such* a special case that it's the default mode on some, maybe even all OSs. I'd be perfectly pleased if we never have to reason deeply about "scheduling algorithms" in rust's runtime; the rust scheduler is currently PRNG-driven for a reason: it's the dumbest scheduling algorithm around. I know I'll never be able to write a competitive scheduler. That's not the point. > Once the language is more mature, we can consider what is next. It might > be another "insanity" that is known to work, like thread pools and > continuation style code or it might be user space tasks. Continuation-style code is a non-starter for me. You've never explained how it could be made safe, and users complain bitterly anyways. It's strictly worse language UI than having a stack. Users want to write mostly-sequential logic. > The important point is that once we have web servers, GUI toolkits, > image decoders, audio APIs, etc available in rust, we will have data > points. It might be that, like what happened to java, using 1:1 is the > best. If not, we can then implement M:N as an optimization. They weren't approaching the same problem we are. They were just trying to make "conventional threads go fast"; I agree they probably wound up at the right place with 1:1 always (though weirdly, their IO model was part of what forced many OS vendors to *make* 1:1 go fast). If we weren't trying to make tasks a ubiquitous isolation (and IO multiplexing) concept, I'd probably agree with you here. But I think we are pursuing different goals. -Graydon From eholk at mozilla.com Wed May 25 14:27:51 2011 From: eholk at mozilla.com (Eric Holk) Date: Wed, 25 May 2011 14:27:51 -0700 (PDT) Subject: [rust-dev] threads, tasks, scheduling, migrating, failing In-Reply-To: <4DDD6DE3.7050409@mozilla.com> Message-ID: <1097091793.29387.1306358871057.JavaMail.root@zimbra1.shared.sjc1.mozilla.com> ----- Original Message ----- > From: "Graydon Hoare" > To: rust-dev at mozilla.org > Sent: Wednesday, May 25, 2011 2:00:19 PM > Subject: Re: [rust-dev] threads, tasks, scheduling, migrating, failing > > On 11-05-25 11:39 AM, Rafael Avila de Espindola wrote: > > > Look at the cost we are proposing to > > get something a bit better: every loop will have a check to know if > > it > > should yield or not. Is this really worth it? What is basically > > doing is > > trying to give task a similar semantics to what threads already > > have. > > Yes. And every function call has a check to see if it has to grow the > stack. We can plausibly collapse these two costs into one in some > cases. > In cases where we can't, and it's a performance bottleneck, we can > offer > some unsafe variant; but you're assuming a particular datapoint > without > evidence just as you accuse me of. We don't know it'll be > prohibitively > slow any more than we know bounds checking vector accesses is > prohibitively slow. I'm not ok with this argument. I'm assuming this yield flag is primarily for timeouts, correct? What other ways are there to set the yield flag? I was reading some about Erlang today and it looks like they way they handle timeouts is to let each program run for a certain number of "reductions." If we wanted to fix the timeslice at compile time, we could use a similar approach. We could simply insert yields every N instructions, and for loops we could say "this loop body is 15 instructions, so we need to make sure to yield every N/15 iterations." It seems like this would make the "yield flag" mostly unnecessary. I guess the other reason we need a yield flag is to let tasks kill each other. Is there a chance we could handle this in the scheduler? I guess it would probably amount to doing the yield flag check somewhere else. That said, look at it this way makes me think it's probably not any more expensive than vector bounds checks. -Eric From mike.capp at gmail.com Wed May 25 14:42:56 2011 From: mike.capp at gmail.com (Mike Capp) Date: Wed, 25 May 2011 22:42:56 +0100 Subject: [rust-dev] threads, tasks, scheduling, migrating, failing In-Reply-To: <4DDD6DE3.7050409@mozilla.com> References: <4DDD4221.7030104@mozilla.com> <4DDD4CCB.1020409@mozilla.com> <4DDD6DE3.7050409@mozilla.com> Message-ID: On 25 May 2011 22:00, Graydon Hoare wrote: > This means that users never use threads for isolation (or if they do, it > doesn't work). Possible counterexample: we use explicit and implicit TLS quite a bit in (ASP.NET) server-side code; IIS has a thread-per-request model with a long-running process. I imagine multiplexing FastCGI apps might do something similar. Isolation per request-being-handled is a very natural thing to want in that context, and in idiomatic Rust that would presumably apply to a family of tasks rather than a single one. I'm not sure what your criteria are for "doesn't work". I suspect it works better than everyone rolling their own TLS implementation on top of process-wide global state. Cheers, Mike From graydon at mozilla.com Wed May 25 14:45:35 2011 From: graydon at mozilla.com (Graydon Hoare) Date: Wed, 25 May 2011 14:45:35 -0700 Subject: [rust-dev] threads, tasks, scheduling, migrating, failing In-Reply-To: References: <4DDD4221.7030104@mozilla.com> <4DDD4CCB.1020409@mozilla.com> <4DDD6DE3.7050409@mozilla.com> Message-ID: <4DDD787F.8090400@mozilla.com> On 11-05-25 02:42 PM, Mike Capp wrote: > On 25 May 2011 22:00, Graydon Hoare wrote: > >> This means that users never use threads for isolation (or if they do, it >> doesn't work). > > Possible counterexample: we use explicit and implicit TLS quite a bit > in (ASP.NET) server-side code; IIS has a thread-per-request model with > a long-running process. I imagine multiplexing FastCGI apps might do > something similar. Isolation per request-being-handled is a very > natural thing to want in that context, and in idiomatic Rust that > would presumably apply to a family of tasks rather than a single one. > > I'm not sure what your criteria are for "doesn't work". I suspect it > works better than everyone rolling their own TLS implementation on top > of process-wide global state. Sorry. The criteria for "doesn't work" is "isn't actually isolated"; it's idiomatic isolation, isolation-by-convention. The language isn't providing any help. Pointers in thread A can point into data currently operated-on by thread B, thus racing with B. -Graydon From gal at mozilla.com Wed May 25 14:50:42 2011 From: gal at mozilla.com (Andreas Gal) Date: Wed, 25 May 2011 23:50:42 +0200 Subject: [rust-dev] threads, tasks, scheduling, migrating, failing In-Reply-To: <4DDD4CCB.1020409@mozilla.com> References: <4DDD4221.7030104@mozilla.com> <4DDD4CCB.1020409@mozilla.com> Message-ID: > My main concern is that we have to keep costs and priorities in mind. > > Yesterday you described C/C++ as crazy/insane. It so happens that that insanity is used to implement all the kernels that we are targeting. It is used by every program I use (sometimes with an extra interpreter on top) and is used by the compiler framework we selected. If all those are crazy, I would say that that is not sufficient market for sanity. C is a low-level programming language that was invented for a specific purpose: more quickly implement an OS on a machine with a few dozen kilobytes of RAM. C++ is an abomination that was built on this shaky foundation to increase the (almost non-existant) abstraction level of C. It adds some nice capabilities and better abstractions, but is at heard still an unsafe and unsound language. I am saying this despite the fact that I have been programming in C++ since the cfront days. I am not dissing C++ because I am some Haskell purist. I am dissing C++ because I have written exactly the kind of programs you talked about in C/C++ (Linux, VMs, SpiderMonkey) and it totally blows. C/C++ are an almost guaranteed way to get exploited. I strongly believe that for the future of the Mozilla project dependability and security are the key factors, and performance is a third factor only (and I am saying that as a beancounter/compiler guy who loves optimization and performance). > > Programmers like to complain about the hard to reason about semantics of C++, but they do get the job done. Look at the cost we are proposing to get something a bit better: every loop will have a check to know if it should yield or not. Is this really worth it? What is basically doing is trying to give task a similar semantics to what threads already have. C++ does not get the job done. Not at all. It is simply the least evil choice. Java is huge and still pretty dog slow. C# is proprietary Microsoft secret sauce. And then there is obscure stuff like Haskell and Ocaml and Fortran and Forth and Pascal, none of which have grownup tools. People don't chose C++ because they want to. They chose it because there is no alternative. Thats why we started rust. Do have an alternative. > > About datapoits. No, I don't have them. It is too early for that. We don't need tasks right now. C, C++ and Java have come a *long* way without them and they have a significant market share. Yes, they came a long way, until parallelism finally happened. Writing efficient and safe parallel code in C/C++/Java is simply not feasible. You are trying to built a fighter jet out of matchsticks. > > With the changes on how channel work, tasks are not a fundamentally new concept. You even acknowledge that they could be implemented at 1:1. Given that we must also provide access to raw threads, lets start with that. My proposal is > > * Implement OS threads > * Have a notion of a task, which is a language abstraction that provides a subset of the features that are common to all operating systems. You could not, for example, send a signal to a task. > * Implement them now with a 1:1 mapping. I think this is the completely wrong way to go. Threads don't offer any decent resource management functionality at the OS-level if we allow arbitrary termination. For that we would have to go with processes, which are very heavyweight. A simple check in a loop can be highly optimized and is well worth it. Every Java VM uses safepoints like this in loops. You can dereference a dummy page and invalidate the TLB/page table entry to make the loop yield. The cost is almost zero in a modern pipelined CPU. Lets make rust as fast as possible, without giving up an inch of safety. The world doesn't need another C/C++. It really needs a C/C++ alternative. > > Once the language is more mature, we can consider what is next. It might be another "insanity" that is known to work, like thread pools and continuation style code or it might be user space tasks. > > The important point is that once we have web servers, GUI toolkits, image decoders, audio APIs, etc available in rust, we will have data points. It might be that, like what happened to java, using 1:1 is the best. If not, we can then implement M:N as an optimization. We will not be able to build a system as complex as a web server in a parallel and dependable way if we have to use bare metal abstractions like pthreads and signals and shared memory and what not. That simply won't happen. We have proof of that (Gecko is almost exclusively single threaded, not by choice, but simply because the complexity cost is unbearable to change that). Andreas > >> -Graydon > > Cheers, > Rafael > _______________________________________________ > Rust-dev mailing list > Rust-dev at mozilla.org > https://mail.mozilla.org/listinfo/rust-dev From igor at mir2.org Wed May 25 14:54:16 2011 From: igor at mir2.org (Igor Bukanov) Date: Wed, 25 May 2011 23:54:16 +0200 Subject: [rust-dev] threads, tasks, scheduling, migrating, failing In-Reply-To: <4DDD6DE3.7050409@mozilla.com> References: <4DDD4221.7030104@mozilla.com> <4DDD4CCB.1020409@mozilla.com> <4DDD6DE3.7050409@mozilla.com> Message-ID: On 25 May 2011 23:00, Graydon Hoare wrote: > ?- One thread per IO connection, using "parallelism" to handle > ? ?concurrent IO problems. Weak, but widely done. This can get bad > ? ?depending on IO abstraction. If we wire in 1:1, on *some* OSs this > ? ?will be high performance, on some it will not. You can't wave this > ? ?away by pointing at C/C++ programs; they hit this wall too and > ? ?regularly have to write custom event loops with IO+epoll pumps. And when one forced to code the event loop it is hard to use the native threads to saturate the cores. Very few applications uses that model and the code that tries it looks messy to say the least. So if Rust can do this right, it would be extremely attractive point of the language. From mike.capp at gmail.com Wed May 25 15:12:19 2011 From: mike.capp at gmail.com (Mike Capp) Date: Wed, 25 May 2011 23:12:19 +0100 Subject: [rust-dev] threads, tasks, scheduling, migrating, failing In-Reply-To: <4DDD787F.8090400@mozilla.com> References: <4DDD4221.7030104@mozilla.com> <4DDD4CCB.1020409@mozilla.com> <4DDD6DE3.7050409@mozilla.com> <4DDD787F.8090400@mozilla.com> Message-ID: On 25 May 2011 22:45, Graydon Hoare wrote: > Sorry. The criteria for "doesn't work" is "isn't actually isolated"; it's > idiomatic isolation, isolation-by-convention. The language isn't providing > any help. Fair enough. I only raised this because with the mooted disappearance of "domain" as a language concept, I can't see an obvious place to put this kind of semi-shared state (e.g. data for one of several requests being serviced concurrently by the process) which would allow even idiomatic isolation. Mike From igor at mir2.org Wed May 25 15:17:19 2011 From: igor at mir2.org (Igor Bukanov) Date: Thu, 26 May 2011 00:17:19 +0200 Subject: [rust-dev] threads, tasks, scheduling, migrating, failing In-Reply-To: <4DDD4CCB.1020409@mozilla.com> References: <4DDD4221.7030104@mozilla.com> <4DDD4CCB.1020409@mozilla.com> Message-ID: On 25 May 2011 20:39, Rafael Avila de Espindola > every loop will have a check to know if it should yield or not. Is this really worth it? In tracemonkey every jitted loop has such checkpoint to see if the script was canceled. Initially the worry was that it would very costly and the plan was to implement runtime code-patching to inject the checks only when necessary. But testing revealed that even with very tight loops the performance impact was negligible. From noelgrandin at gmail.com Wed May 25 15:37:50 2011 From: noelgrandin at gmail.com (Noel Grandin) Date: Thu, 26 May 2011 00:37:50 +0200 Subject: [rust-dev] threads, tasks, scheduling, migrating, failing In-Reply-To: <4DDD4221.7030104@mozilla.com> References: <4DDD4221.7030104@mozilla.com> Message-ID: Interesting. My 2c as a lurker: I support this approach, mostly because it's how I thought Rust working anyhow (I managed to miss the domain stuff). Keeping memory strictly per-task also make parallel GC a lot easier, which makes pause times a non-problem. The periodic pauses that the Java VM suffers from is a direct consequence of having a shared heap. Yes, there are ways of doing parallel GC in Java (e.g. Metronome), but they are costly. Java also doesn't really support cancelling threads, for precisely the reason that it's not well-supported in the different OS libraries - implementing it properly basically boils down to checking some kind of flag periodically at safe points in the code. It is however useful to have some kind of interrupt() call for threads/tasks, because the runtime library will need to know how to unblock currently blocking calls. Regards, Noel. On Wed, May 25, 2011 at 19:53, Graydon Hoare wrote: > Hi, > > We had a brief discussion on IRC yesterday that ended in me storming off in > a very unprofessional manner. I'd like to publicly apologize for that > behavior, it was not cool and had little to do with the conversation at > hand. My stress level was very high coming into work yesterday, and I was > letting personal life spill into work life. My fault, sorry. > > I'd also like to restart (or at least restate) parts of that discussion here > so we can actually get this worked out to everyone's satisfaction (including > Rafael, who clearly has strong feelings on the matter). This will be a long > rambly email full of back-story to set the stage; if you have specific > points to follow-up on, just snip those parts out for your replies. > > > Preface > ~~~~~~~ > > We know (or at least, anyone who's poked at it knows) that the tasking and > threading model in rustboot was pretty unsatisfactory. It passed some > interesting tests ("millions of tasks, real cheap!") and had a number of > needs it was trying to meet -- it was not designed in *complete* ignorance > -- but it also imposed strange burdens that we'd like to dispense with this > time around. Hopefully without losing the good stuff. > > So, some background "goals" in order to make this story make sense. The > three primary design pressures are: > > ?(1) Support *isolation*, in some useful sense, between tasks. That is, a > task should be able to reason locally about its data and code without > worrying whether other tasks are mucking with their own data and code. With > the exception of message-IO points and unsafe blocks, which obviously > involve the potential for non-isolated action. > > ?(2) Support *lots* of tasks. Millions. Such that a programmer has no fear > about making "too many" tasks in a system, if it decomposes nicely. If for > no other reason than support for isolation-based local reasoning (though > concurrent, or interleaved, or truly parallel execution is also nice to > exploit whenever it's surfaced this way). > > ?(3) Run at relatively high, but more importantly *predictable* performance. > As few magical parts as possible in the concurrency model. Take the M:N > performance-penalty if necessary to achieve the other 2 goals, so long as > there are no random peformance discontinuities in the model. > > Concurrency model is intimately connected to memory model, unwinding, gc, > and several other things; so when I say we're going to be revisiting design > decisions in rustboot's concurrency model, I implicitly include parts of the > memory model too, and other parts. > > > The Past (rustboot) > ~~~~~~~~~~~~~~~~~~~ > > The "lots of tasks" pressure breaks down into two sub-issues: making tasks > small (in the sense of memory) and making them independently scheduled. We > approached the "small" issue via growable stacks (doubling vectors with > pointer-rewriting) and a very large dose of ugly magic for doing calls > "between stacks" (from rust to C). This had lots of unfortunate fallout: > debuggers and tools got upset, calling back into rust code from C was mostly > impossible, and to support it safely we'd need to be flushing pointers to > the stack and re-reading them *constantly*, much more than just "make sure > values are pinned somewhere" necessary for GC. We approached the > "scheduling" issue by even *more* magic return-address patching during > suspended C calls, and a custom cooperative scheduler. > > The "isolation" pressure was approached by stratifying the heap memory model > into private and shared (between-tasks) memory, with the shared stuff always > immutable and acyclic. Sharing was not always possible -- not between > threads and not between processes -- but between tasks-in-a-thread it could > work, and we figured that scenario was valuable to users. Cheap sending of > shared bits between tasks in a thread. Then we'd do a deep copy when we hit > domain boundaries. > > But to support that scenario, tasks had to be pinned to threads. That is, > the concurrency scheme in rustboot involved tasks running within domains > (threads or processes; though the latter never materialized), where the user > explicitly constructed domains and injected threads into the domain. Once > spawned in a domain, a task could not leave it (be migrated to another > thread), because it might have pointers into the "shared memory" of the > domain. This pinning to domains has the unfortunate performance > characteristic of *requiring* a user to pay attention to task:thread > assignments; this could be a benefit in some cases -- explicit control can > be good -- but in many cases it seemed bad, or at least over-complex. It's > not just a matter of "having an M:N scheduler in userspace" (which will, > says the literature, always underperform a 1:1 scheduler with the kernel > involved) but also pinning each individual task in M to a thread in N, such > that one task blocking (or even just monopolizing a core) could block or > slow down a whole group of tasks on the same thread. This is a usability > hazard. > > > The Future (rustc) > ~~~~~~~~~~~~~~~~~~ > > Rustboot is dead (yay!) and we're in the process of working through the > leftover cruft in the runtime and removing parts which were bad ideas and/or > only around to support rustboot's various limitations and design choices. > Rustc doesn't really *have* a tasking system yet -- there are pieces of the > communication layer, but eholk is just getting spawn working this week -- so > we're mostly doing "rewrite this subsystem" work now. There's been a fair > amount of disjointed conversation over this matter, I'm hoping to > consolidate what's on the agenda here. > > The "lots of tasks" issue still has two sub-parts: size and scheduling. > > We're going to approach "size" via "the other, more standard technique", > which is to use a linked list of stack segments and never move them. We will > most likely reuse the *exact* technique Go is using here, in the sense of > trying to be ABI compatible (at least insofar as this is possible or makes > sense) and possibly even use the same runtime support library. This approach > is easier for LLVM to cope with (there's a GSoC student implementing it > currently), and more tools understand it. It also makes stack segments > recyclable between tasks, which should reduce overall memory pressure > (equivalent to "shrinking" in our other model). We're also going to use the > same "unified" approach to growth and cross-language calling as Go uses -- > just grow into a "sufficiently big" segment that may be recycled between > tasks in between C calls -- and that may well permit C to call back into > rust (assuming it can provide a task* and can be made to play nice with > unwinding and GC; see below). > > We're also going to approach "scheduling" via "the other, more standard > technique", which is to use the posix (and before that, system V) > schedulable user contexts and (sadly) again our own scheduler. > Where ucontext isn't OS-provided, we'll provide our own implementation; it's > not actually much more than a "save registers to structure A and load them > from structure B" routine anyways, just with a standard API. And on some OSs > -- specifically those where we discover threads are sufficiently cheap, if > running on small stacks -- we're going to lean much more heavily on the OS > thread scheduler. See below. > > We're going to approach the "isolation" pressure differently. Rather than > permit tasks to share pointers at all, we'll be shifting to a stratification > of memory based on unique pointers. This will mean that the only possible > kinds of send are "move" and "deep copy". Move will happen everywhere > in-OS-process, deep-copy between processes. Move semantics -- making a copy > while indivisibly de-initializing the source of the copy -- are going to be > the focus of a fair bit of work over the next while, and we're betting on > them for the messaging/isolation system. > > Along with minimizing refcounting (and a host of other thorny semantic > issues associated with accidental copying, such as environment capture and > double-execution of destructors) this will permit tasks to migrate between > threads. Or, put more simply, it will permit us to treat threads as an > undifferentiated pool of N workers, and tasks as a pool of M work units; > when a thread blocks in C code it will have no affect on the other tasks > (other than temporarily using up a "large segment" of stack) and M>N > runnable tasks should always "saturate" the threads (thus cores) with work. > Moreover, when we're on an OS that has "really cheap threads" we can spin up > N == M threads and fall into the case of 1:1 scheduling: back off and let > the OS kernel do all the scheduling, have our scheduler always say "oh, just > keep running the same task you're running" every time it checks at a yield > point. On linux, for example, I believe that this may very well be more > optimal than getting in the way with our own scheduler and ucontext logic. > I'm not sure about other OSs but I'd like to retain this "dial" to be able > to dial scheduling back into our runtime if we're on an OS with horrendously > bad or otherwise expensive kernel threads. > > In the process of this change, we'll eliminate the concept of a domain from > the language and runtime, and just model OS processes as OS processes. We'll > still need some runtime support for interacting with subprocesses, of > course, just avoid trying to mix metaphors. > > Moving to a pool-of-threads model should also permit leaning on "the other, > more standard technique" for unwinding: the C++ unwinder (or a large part of > it). Since a task blocked-in-C doesn't necessarily block any other tasks > (they can run on other threads) we don't need to deschedule the unwinder, > which was a large part of my concern for how it might be unusable in this > role. We can just let unwinding run to completion (scheduler yield-points > can always opt to not-yield when unwinding). A secondary concern has to do > with double-faulting (failing within a destructor) but we'll cross that > bridge when we come to it; I doubt the policy to call terminate() in C++ is > so hard-wired into the unwinder that there are no possible ways of > overriding it. Opinions on this welcome. > > (Incidentally, the more we drift away from "our own per-frame metadata > tables" towards relying on stock components, the more I think using a > conservative stack scanner for GC root-finding may be perfectly fine, > obviating the need for per-frame GC info and explicit GC-safe points. But > that's not terribly related to choice of concurrency strategy; just an > interesting note for anyone following along the Saga Of Rust Frame Info) > > Our argument yesterday hit a breaking point when we were discussing the > relationship between C++ unwind semantics and pthread cancellation. I think > this was actually a red herring: 'fail' could only really map to > 'pthread_cancel' in the case that we were hard-wired to a 1:1 scheduling > model, always using a kernel thread for a task, which as I've said I would > like to retain as only as an option (when on an OS with cheap / fast kernel > threads) rather than a semantic requirement. And even if we *did* swallow > that requirement, I was arguing yesterday (and I believe further googling > today supports) the contention that pthread_cancel is just not a good fit > for 'fail' (or kill). It will: > > ?(a) Only cancel at cancellation points, not "any instruction"; so it's > ? ? ?probing the presence of a flag in the pthread library anyways. Not > ? ? ?terribly different, cost-wise, from using our own flag. > > ?(b) Not run the C++ unwinder in a reliable, portable fashion; > ? ? ?on some platforms this works but on some it does not, and boost > ? ? ?has opted to not-present the interface for precisely this reason. > > Overall I don't think pthread_cancel was a productive avenue for the > discussion and I'm sorry we wound up in it. I think it's more useful to talk > about ways to make cooperative scheduling (== kill) points cheap enough to > live with -- including options for unsafe loops that omit them -- even in > the degenerate case where they always say "keep running the task you're > running" (1:1 mapping to threads). At least until killed. > > Phew! Ok. Done. Comments? Preferences? Field datapoints to contribute? > > -Graydon > _______________________________________________ > Rust-dev mailing list > Rust-dev at mozilla.org > https://mail.mozilla.org/listinfo/rust-dev > From respindola at mozilla.com Wed May 25 17:14:33 2011 From: respindola at mozilla.com (=?ISO-8859-1?Q?Rafael_=C1vila_de_Esp=EDndola?=) Date: Wed, 25 May 2011 20:14:33 -0400 Subject: [rust-dev] threads, tasks, scheduling, migrating, failing In-Reply-To: <1097091793.29387.1306358871057.JavaMail.root@zimbra1.shared.sjc1.mozilla.com> References: <1097091793.29387.1306358871057.JavaMail.root@zimbra1.shared.sjc1.mozilla.com> Message-ID: <4DDD9B69.6080309@mozilla.com> > I'm assuming this yield flag is primarily for timeouts, correct? What other ways are there to set the yield flag? > > I was reading some about Erlang today and it looks like they way they handle timeouts is to let each program run for a certain number of "reductions." If we wanted to fix the timeslice at compile time, we could use a similar approach. We could simply insert yields every N instructions, and for loops we could say "this loop body is 15 instructions, so we need to make sure to yield every N/15 iterations." It seems like this would make the "yield flag" mostly unnecessary. Remember that we are producing LLVM and that LLVM has no idea what a call to yield is. > I guess the other reason we need a yield flag is to let tasks kill each other. Is there a chance we could handle this in the scheduler? I guess it would probably amount to doing the yield flag check somewhere else. That said, look at it this way makes me think it's probably not any more expensive than vector bounds checks. > > -Eric > _______________________________________________ > Rust-dev mailing list > Rust-dev at mozilla.org > https://mail.mozilla.org/listinfo/rust-dev Cheers, Rafael From graydon at mozilla.com Wed May 25 17:18:00 2011 From: graydon at mozilla.com (Graydon Hoare) Date: Wed, 25 May 2011 17:18:00 -0700 Subject: [rust-dev] threads, tasks, scheduling, migrating, failing In-Reply-To: <4DDD9B69.6080309@mozilla.com> References: <1097091793.29387.1306358871057.JavaMail.root@zimbra1.shared.sjc1.mozilla.com> <4DDD9B69.6080309@mozilla.com> Message-ID: <4DDD9C38.8030508@mozilla.com> On 11-05-25 05:14 PM, Rafael ?vila de Esp?ndola wrote: >> I'm assuming this yield flag is primarily for timeouts, correct? What >> other ways are there to set the yield flag? >> >> I was reading some about Erlang today and it looks like they way they >> handle timeouts is to let each program run for a certain number of >> "reductions." If we wanted to fix the timeslice at compile time, we >> could use a similar approach. We could simply insert yields every N >> instructions, and for loops we could say "this loop body is 15 >> instructions, so we need to make sure to yield every N/15 iterations." >> It seems like this would make the "yield flag" mostly unnecessary. > > Remember that we are producing LLVM and that LLVM has no idea what a > call to yield is. Yeah. This sort of thing doesn't tend to work great anyways; trying to calibrate instruction counts to time is mostly impossible at a distance, and the extra machinery in the loop winds up costing just as much as polling a flag. I'm willing to handle unsafe loops and/or fixed-iteration-count (guaranteed not to diverge) loops explicitly in the language, if this is a concern. Even add an optimization pass that tries to sniff out fixed iteration counts and convert loop types / eliminate checks. In practice I just don't think we have reason to believe this is a performance killer. It hasn't been where it's done elsewhere. -Graydon From respindola at mozilla.com Wed May 25 17:27:52 2011 From: respindola at mozilla.com (=?ISO-8859-1?Q?Rafael_=C1vila_de_Esp=EDndola?=) Date: Wed, 25 May 2011 20:27:52 -0400 Subject: [rust-dev] threads, tasks, scheduling, migrating, failing In-Reply-To: References: <4DDD4221.7030104@mozilla.com> <4DDD4CCB.1020409@mozilla.com> Message-ID: <4DDD9E88.9030001@mozilla.com> >> >> About datapoits. No, I don't have them. It is too early for that. >> We don't need tasks right now. C, C++ and Java have come a *long* >> way without them and they have a significant market share. > > Yes, they came a long way, until parallelism finally happened. > Writing efficient and safe parallel code in C/C++/Java is simply not > feasible. You are trying to built a fighter jet out of matchsticks. Insistently, the F35 is another good example of something built with c++. The kernel of the 3 systems we are targeting is written in C or C++. It is hard to imagine us being more scalable than the kernel. Having high parallelism in the hardware is a good argument for threads! Implementing tasks makes sense when the programmer wants to abstract more things happening at the same time than really are. If my new machine has 512 register sets (HT), I need 512 threads to use them (unless the CPU knows what a rust task is :-) ). >> The important point is that once we have web servers, GUI toolkits, >> image decoders, audio APIs, etc available in rust, we will have >> data points. It might be that, like what happened to java, using >> 1:1 is the best. If not, we can then implement M:N as an >> optimization. > > We will not be able to build a system as complex as a web server in a > parallel and dependable way if we have to use bare metal abstractions > like pthreads and signals and shared memory and what not. That simply > won't happen. We have proof of that (Gecko is almost exclusively > single threaded, not by choice, but simply because the complexity > cost is unbearable to change that). So, as Graydon mentioned, we *can* build a system with a 1:1 mapping and the same semantics. Not a single drop in safety or anything. What is the worst thing that can happen? One year from new we decide that having 10^6 tasks is really the feature that we must have we add that while keeping the same interface. > Andreas > Cheers, Rafael From respindola at mozilla.com Wed May 25 18:00:19 2011 From: respindola at mozilla.com (=?ISO-8859-1?Q?Rafael_=C1vila_de_Esp=EDndola?=) Date: Wed, 25 May 2011 21:00:19 -0400 Subject: [rust-dev] threads, tasks, scheduling, migrating, failing In-Reply-To: <4DDD6DE3.7050409@mozilla.com> References: <4DDD4221.7030104@mozilla.com> <4DDD4CCB.1020409@mozilla.com> <4DDD6DE3.7050409@mozilla.com> Message-ID: <4DDDA623.7010108@mozilla.com> > Yes. And every function call has a check to see if it has to grow the > stack. We can plausibly collapse these two costs into one in some cases. > In cases where we can't, and it's a performance bottleneck, we can offer > some unsafe variant; but you're assuming a particular datapoint without > evidence just as you accuse me of. We don't know it'll be prohibitively > slow any more than we know bounds checking vector accesses is > prohibitively slow. I'm not ok with this argument. Sorry what am I accusing you of? And no, I don't have a datapoint for a system that has not been implemented. > -Graydon Cheers, Rafael From graydon at mozilla.com Wed May 25 18:38:34 2011 From: graydon at mozilla.com (Graydon Hoare) Date: Wed, 25 May 2011 18:38:34 -0700 Subject: [rust-dev] threads, tasks, scheduling, migrating, failing In-Reply-To: <4DDDA623.7010108@mozilla.com> References: <4DDD4221.7030104@mozilla.com> <4DDD4CCB.1020409@mozilla.com> <4DDD6DE3.7050409@mozilla.com> <4DDDA623.7010108@mozilla.com> Message-ID: <4DDDAF1A.9070003@mozilla.com> On 11-05-25 06:00 PM, Rafael ?vila de Esp?ndola wrote: >> Yes. And every function call has a check to see if it has to grow the >> stack. We can plausibly collapse these two costs into one in some cases. >> In cases where we can't, and it's a performance bottleneck, we can offer >> some unsafe variant; but you're assuming a particular datapoint without >> evidence just as you accuse me of. We don't know it'll be prohibitively >> slow any more than we know bounds checking vector accesses is >> prohibitively slow. I'm not ok with this argument. > > Sorry what am I accusing you of? And no, I don't have a datapoint for > a system that has not been implemented. Of arguing that the absence of data motivates a particular assumption. In this case it has to do with yield checking (which, incidentally, we do have some data on from the work in other JITs, and it's seldom a cost center). Elsewhere you seem to be frustrated that in the absence of data to the contrary, I'm promoting the assumption that M:N (even if it runs 1:1 much of the time) is a safer default than mandatory-1:1. I can't say for certain which of those scenarios *will* need M:N and how badly it'll hurt to be without; but I feel like mandatory-1:1 will paint ourselves into a bunch of potential corners, as I tried to point out in the last email. So I'm arguing the M:N stance is safer to start. Or maybe you're not frustrated about that, or hadn't previously considered those scenarios, and I am just reading frustration into the argument. Did any of the points I raised in defense of *permitting* M:N operation sound reasonable? -Graydon From respindola at mozilla.com Wed May 25 18:57:44 2011 From: respindola at mozilla.com (=?ISO-8859-1?Q?Rafael_=C1vila_de_Esp=EDndola?=) Date: Wed, 25 May 2011 21:57:44 -0400 Subject: [rust-dev] threads, tasks, scheduling, migrating, failing In-Reply-To: <4DDDAF1A.9070003@mozilla.com> References: <4DDD4221.7030104@mozilla.com> <4DDD4CCB.1020409@mozilla.com> <4DDD6DE3.7050409@mozilla.com> <4DDDA623.7010108@mozilla.com> <4DDDAF1A.9070003@mozilla.com> Message-ID: <4DDDB398.8050200@mozilla.com> > Of arguing that the absence of data motivates a particular assumption. > In this case it has to do with yield checking (which, incidentally, we > do have some data on from the work in other JITs, and it's seldom a cost > center). > > Elsewhere you seem to be frustrated that in the absence of data to the > contrary, I'm promoting the assumption that M:N (even if it runs 1:1 > much of the time) is a safer default than mandatory-1:1. I can't say for > certain which of those scenarios *will* need M:N and how badly it'll > hurt to be without; but I feel like mandatory-1:1 will paint ourselves > into a bunch of potential corners, as I tried to point out in the last > email. So I'm arguing the M:N stance is safer to start. > > Or maybe you're not frustrated about that, or hadn't previously > considered those scenarios, and I am just reading frustration into the > argument. Did any of the points I raised in defense of *permitting* M:N > operation sound reasonable? I still don't see where I made an accusation. If I did, sorry, it was not the intention. I am a bit frustrated that in the absence of data we are selecting the more expensive option which is also the more expensive to backtrack and following the examples of the least successful languages. But never mind. I am sure there are plenty of tasks that are known to be needed to work on (running destructors on fail, debug info, a driver that works on the three platforms, etc). > -Graydon Cheers, Rafael From sebastian.sylvan at gmail.com Wed May 25 21:01:29 2011 From: sebastian.sylvan at gmail.com (Sebastian Sylvan) Date: Wed, 25 May 2011 21:01:29 -0700 Subject: [rust-dev] threads, tasks, scheduling, migrating, failing In-Reply-To: <4DDDB398.8050200@mozilla.com> References: <4DDD4221.7030104@mozilla.com> <4DDD4CCB.1020409@mozilla.com> <4DDD6DE3.7050409@mozilla.com> <4DDDA623.7010108@mozilla.com> <4DDDAF1A.9070003@mozilla.com> <4DDDB398.8050200@mozilla.com> Message-ID: 2011/5/25 Rafael ?vila de Esp?ndola : >> Of arguing that the absence of data motivates a particular assumption. >> In this case it has to do with yield checking (which, incidentally, we >> do have some data on from the work in other JITs, and it's seldom a cost >> center). >> >> Elsewhere you seem to be frustrated that in the absence of data to the >> contrary, I'm promoting the assumption that M:N (even if it runs 1:1 >> much of the time) is a safer default than mandatory-1:1. I can't say for >> certain which of those scenarios *will* need M:N and how badly it'll >> hurt to be without; but I feel like mandatory-1:1 will paint ourselves >> into a bunch of potential corners, as I tried to point out in the last >> email. So I'm arguing the M:N stance is safer to start. >> >> Or maybe you're not frustrated about that, or hadn't previously >> considered those scenarios, and I am just reading frustration into the >> argument. Did any of the points I raised in defense of *permitting* M:N >> operation sound reasonable? > > I still don't see where I made an accusation. If I did, sorry, it was not > the intention. > > I am a bit frustrated that in the absence of data we are selecting the more > expensive option which is also the more expensive to backtrack and following > the examples of the least successful languages. > I'm somewhat unconvinced that spawning an OS thread for every task should be considered the less expensive option. Easier to do, yes, but that pariticular performance claim goes directly against my performance intution, and could use some substantiation, IMO. E.g. Haskell does M:N threads and it's much more efficient than OS threads. I'd link you some microbenchmarks from the shootout game but that website seems broken at the moment - IIRC on the benchmarks measuring concurrency overhead, Haskell's M:N threading beat pretty much everything else including C without breaking much of a sweat. So, my current "gut instinct" (in the absence of data) is to favour light weight tasks that map to (few) OS tasks, for performance reasons. You seem to oppose that on performance grounds. I'm not sure why our intuitions differ, but I don't think there's enough data to, at this point, position the 1:1 option as the more higher performance option. Did I miss some elaboration on why you think it is? -- Sebastian Sylvan From arelius at gmail.com Wed May 25 21:09:01 2011 From: arelius at gmail.com (Nicholas "Indy" Ray) Date: Wed, 25 May 2011 21:09:01 -0700 Subject: [rust-dev] threads, tasks, scheduling, migrating, failing In-Reply-To: <4DDDB398.8050200@mozilla.com> References: <4DDD4221.7030104@mozilla.com> <4DDD4CCB.1020409@mozilla.com> <4DDD6DE3.7050409@mozilla.com> <4DDDA623.7010108@mozilla.com> <4DDDAF1A.9070003@mozilla.com> <4DDDB398.8050200@mozilla.com> Message-ID: > > I am a bit frustrated that in the absence of data we are selecting the more > expensive option which is also the more expensive to backtrack and following > the examples of the least successful languages. > Maybe I don't understand the subject very well, but shouldn't backtracking from the choice of implementing M:N simply involve limiting it to 1:1, and turning off the code that emits yield statements? Thus making actual benchmarking possible. While that will cost the work to implement it still; Clearly that should be worth getting the actual performance data, right? Indy -------------- next part -------------- An HTML attachment was scrubbed... URL: From graydon at mozilla.com Wed May 25 21:13:11 2011 From: graydon at mozilla.com (Graydon Hoare) Date: Wed, 25 May 2011 21:13:11 -0700 Subject: [rust-dev] threads, tasks, scheduling, migrating, failing In-Reply-To: References: <4DDD4221.7030104@mozilla.com> <4DDD4CCB.1020409@mozilla.com> <4DDD6DE3.7050409@mozilla.com> <4DDDA623.7010108@mozilla.com> <4DDDAF1A.9070003@mozilla.com> <4DDDB398.8050200@mozilla.com> Message-ID: <4DDDD357.9010409@mozilla.com> On 11-05-25 09:09 PM, Nicholas "Indy" Ray wrote: >> >> I am a bit frustrated that in the absence of data we are selecting the more >> expensive option which is also the more expensive to backtrack and following >> the examples of the least successful languages. >> > > Maybe I don't understand the subject very well, but shouldn't backtracking > from the choice of implementing M:N simply involve limiting it to 1:1, and > turning off the code that emits yield statements? Thus making actual > benchmarking possible. While that will cost the work to implement it still; > Clearly that should be worth getting the actual performance data, right? Not turning off the yields. Just setting them to "only yield when actually canceled" rather than "yield when time slice expires". Simpler case, even: pure runtime policy choice. -Graydon From respindola at mozilla.com Wed May 25 21:19:54 2011 From: respindola at mozilla.com (=?ISO-8859-1?Q?Rafael_=C1vila_de_Esp=EDndola?=) Date: Thu, 26 May 2011 00:19:54 -0400 Subject: [rust-dev] threads, tasks, scheduling, migrating, failing In-Reply-To: References: <4DDD4221.7030104@mozilla.com> <4DDD4CCB.1020409@mozilla.com> <4DDD6DE3.7050409@mozilla.com> <4DDDA623.7010108@mozilla.com> <4DDDAF1A.9070003@mozilla.com> <4DDDB398.8050200@mozilla.com> Message-ID: <4DDDD4EA.9030404@mozilla.com> > Maybe I don't understand the subject very well, but shouldn't > backtracking from the choice of implementing M:N simply involve limiting > it to 1:1, and turning off the code that emits yield statements? Thus > making actual benchmarking possible. While that will cost the work to > implement it still; Clearly that should be worth getting the actual > performance data, right? You should include the cost of getting there in the first place. A lot more has to be implemented to get M:N working. > Indy > Cheers, Rafael From arelius at gmail.com Wed May 25 21:25:04 2011 From: arelius at gmail.com (Nicholas "Indy" Ray) Date: Wed, 25 May 2011 21:25:04 -0700 Subject: [rust-dev] threads, tasks, scheduling, migrating, failing In-Reply-To: <4DDDD4EA.9030404@mozilla.com> References: <4DDD4221.7030104@mozilla.com> <4DDD4CCB.1020409@mozilla.com> <4DDD6DE3.7050409@mozilla.com> <4DDDA623.7010108@mozilla.com> <4DDDAF1A.9070003@mozilla.com> <4DDDB398.8050200@mozilla.com> <4DDDD4EA.9030404@mozilla.com> Message-ID: > > You should include the cost of getting there in the first place. A lot more > has to be implemented to get M:N working. > That's very true. But my point is that getting performance numbers (when none clearly exist) might be worth that cost. Indy. -------------- next part -------------- An HTML attachment was scrubbed... URL: From graydon at mozilla.com Wed May 25 21:28:28 2011 From: graydon at mozilla.com (Graydon Hoare) Date: Wed, 25 May 2011 21:28:28 -0700 Subject: [rust-dev] threads, tasks, scheduling, migrating, failing In-Reply-To: <4DDDD4EA.9030404@mozilla.com> References: <4DDD4221.7030104@mozilla.com> <4DDD4CCB.1020409@mozilla.com> <4DDD6DE3.7050409@mozilla.com> <4DDDA623.7010108@mozilla.com> <4DDDAF1A.9070003@mozilla.com> <4DDDB398.8050200@mozilla.com> <4DDDD4EA.9030404@mozilla.com> Message-ID: <4DDDD6EC.503@mozilla.com> On 11-05-25 09:19 PM, Rafael ?vila de Esp?ndola wrote: >> Maybe I don't understand the subject very well, but shouldn't >> backtracking from the choice of implementing M:N simply involve limiting >> it to 1:1, and turning off the code that emits yield statements? Thus >> making actual benchmarking possible. While that will cost the work to >> implement it still; Clearly that should be worth getting the actual >> performance data, right? > > You should include the cost of getting there in the first place. A lot > more has to be implemented to get M:N working. Sure. We have to implement ... ucontext.h. Which is already present on 2 of our main platforms and there are several free copies floating around the net if we're having trouble implementing it on the 3rd. Then we also have to implement a few run queues and a simple scheduler, but we already have those in the runtime to crib from. And we need shutdown semantics, which we'd need anyways. Anything else? It'll take some doing, but not an infinite amount. This is not a full threading library for C threads. Just the lifecycle points of semantics we control. The IO layer -- where much of the cost in most threading libraries is -- is completely disjoint from this issue. -Graydon From graydon at mozilla.com Wed May 25 21:41:16 2011 From: graydon at mozilla.com (Graydon Hoare) Date: Wed, 25 May 2011 21:41:16 -0700 Subject: [rust-dev] threads, tasks, scheduling, migrating, failing In-Reply-To: <4DDDD6EC.503@mozilla.com> References: <4DDD4221.7030104@mozilla.com> <4DDD4CCB.1020409@mozilla.com> <4DDD6DE3.7050409@mozilla.com> <4DDDA623.7010108@mozilla.com> <4DDDAF1A.9070003@mozilla.com> <4DDDB398.8050200@mozilla.com> <4DDDD4EA.9030404@mozilla.com> <4DDDD6EC.503@mozilla.com> Message-ID: <4DDDD9EC.5040904@mozilla.com> On 11-05-25 09:28 PM, Graydon Hoare wrote: >> You should include the cost of getting there in the first place. A lot >> more has to be implemented to get M:N working. > > Sure. We have to implement ... ucontext.h. Which is already present on 2 > of our main platforms and there are several free copies floating around > the net if we're having trouble implementing it on the 3rd. Cases in point, here are some implementations of the core context-switch function: freebsd: http://gitweb.dragonflybsd.org/dragonfly.git/blob/e2b7bcae8c26bcb8da3553133980725affac44c3:/lib/libc/x86_64/gen/mcontext.S glibc: http://repo.or.cz/w/glibc.git/blob/ea486f691d34ef2a28d06bb507ac3352e32e1f13:/sysdeps/unix/sysv/linux/i386/setcontext.S it's just not a massive undertaking, writing that function. You can even elide the signal stuff for us since we're not storing per-task signal masks. -Graydon From niko at alum.mit.edu Thu May 26 08:44:29 2011 From: niko at alum.mit.edu (Niko Matsakis) Date: Thu, 26 May 2011 17:44:29 +0200 Subject: [rust-dev] threads, tasks, scheduling, migrating, failing In-Reply-To: References: <4DDD4221.7030104@mozilla.com> <4DDD4CCB.1020409@mozilla.com> <4DDD6DE3.7050409@mozilla.com> <4DDDA623.7010108@mozilla.com> <4DDDAF1A.9070003@mozilla.com> <4DDDB398.8050200@mozilla.com> Message-ID: <4DDE755D.50401@alum.mit.edu> I concur with other posters that M:N is a better foundation for runtime performance if you want to scale to large number of tasks. Some "data points" (more like anecdotes): 1. In my own personal experience, I implemented a coroutine system that did M:N with expandable stacks precisely as Graydon outlined (linked list of chunks at fixed positions in memory). At one point we did some micro-benchmarks which suggested 1:1 threading wouldn't be so painful, so we ported the system over---and ended up regretting that decision for a long time, since it was very hard to "unport", and the overall system was definitely slower and scaled much more poorly, despite the micro-benchmarks. 2. Java may have a 1:1 model, but---in practice---there is a lot of movement towards M:N. It's just that this M:N mapping is being done in libraries, such as Doug Lea's Fork-Join scheduler work, as well as projects like Kilim and others (not to mention the ever present thread pools and executors, which are basically a primitive M:N setup). These projects have the disadvantage, however, of lacking access to low-level operations, which gives them an inherent efficiency gap. They can't implement a chunked stack, for example. 3. In addition to Haskell, I believe that Erlang uses M:N scheduling. The main downside of M:N as I see it is interaction with C code. This IS a significant downside, to be sure. It's the reason that we tried to port over to 1:1, as described above. The techniques Graydon described (make a big chunk of space, limit callbacks when possible) are basically what we were using before. They do address the problem, though they can sometimes be a pain. There is also the challenge of implementing O/S concepts like high vs low priority threads and so forth, if those prove necessary. Regarding the efficient implementation of yield points and the like, the Capriccio project [1] discussed a number of techniques in a paper some time back. The project seems to be dead at this point, unfortunately. Still I think (as many others have said) that these costs are quite minimal, especially in exchange for the scalability and control that you get. Niko [1] http://capriccio.cs.berkeley.edu/ Sebastian Sylvan wrote: > 2011/5/25 Rafael ?vila de Esp?ndola: >>> Of arguing that the absence of data motivates a particular assumption. >>> In this case it has to do with yield checking (which, incidentally, we >>> do have some data on from the work in other JITs, and it's seldom a cost >>> center). >>> >>> Elsewhere you seem to be frustrated that in the absence of data to the >>> contrary, I'm promoting the assumption that M:N (even if it runs 1:1 >>> much of the time) is a safer default than mandatory-1:1. I can't say for >>> certain which of those scenarios *will* need M:N and how badly it'll >>> hurt to be without; but I feel like mandatory-1:1 will paint ourselves >>> into a bunch of potential corners, as I tried to point out in the last >>> email. So I'm arguing the M:N stance is safer to start. >>> >>> Or maybe you're not frustrated about that, or hadn't previously >>> considered those scenarios, and I am just reading frustration into the >>> argument. Did any of the points I raised in defense of *permitting* M:N >>> operation sound reasonable? >> I still don't see where I made an accusation. If I did, sorry, it was not >> the intention. >> >> I am a bit frustrated that in the absence of data we are selecting the more >> expensive option which is also the more expensive to backtrack and following >> the examples of the least successful languages. >> > > I'm somewhat unconvinced that spawning an OS thread for every task > should be considered the less expensive option. Easier to do, yes, but > that pariticular performance claim goes directly against my > performance intution, and could use some substantiation, IMO. > > E.g. Haskell does M:N threads and it's much more efficient than OS > threads. I'd link you some microbenchmarks from the shootout game but > that website seems broken at the moment - IIRC on the benchmarks > measuring concurrency overhead, Haskell's M:N threading beat pretty > much everything else including C without breaking much of a sweat. > > So, my current "gut instinct" (in the absence of data) is to favour > light weight tasks that map to (few) OS tasks, for performance > reasons. You seem to oppose that on performance grounds. I'm not sure > why our intuitions differ, but I don't think there's enough data to, > at this point, position the 1:1 option as the more higher performance > option. Did I miss some elaboration on why you think it is? > -------------- next part -------------- An HTML attachment was scrubbed... URL: From fw at deneb.enyo.de Thu May 26 12:19:00 2011 From: fw at deneb.enyo.de (Florian Weimer) Date: Thu, 26 May 2011 21:19:00 +0200 Subject: [rust-dev] threads, tasks, scheduling, migrating, failing In-Reply-To: <4DDD4221.7030104@mozilla.com> (Graydon Hoare's message of "Wed, 25 May 2011 10:53:37 -0700") References: <4DDD4221.7030104@mozilla.com> Message-ID: <87mxi98e3f.fsf@mid.deneb.enyo.de> * Graydon Hoare: > (2) Support *lots* of tasks. Millions. Such that a programmer has no > fear about making "too many" tasks in a system, if it decomposes > nicely. If for no other reason than support for isolation-based local > reasoning (though concurrent, or interleaved, or truly parallel > execution is also nice to exploit whenever it's surfaced this way). This simply does not work. You cannot make tasks much cheaper than GHC thunks (except by eliminating them through compiler transformations), yet creating too much of them is often a source of disappointing performance. If you don't buy this argument, you might want to look at the Erlang pmap example. The textbook implementation looks nice on paper, but it fares poorly on larger workloads because overscheduling is a problem for Erlang, too. Or do you mean short-lived tasks, aproaching coroutines and generators? From fw at deneb.enyo.de Thu May 26 12:24:20 2011 From: fw at deneb.enyo.de (Florian Weimer) Date: Thu, 26 May 2011 21:24:20 +0200 Subject: [rust-dev] threads, tasks, scheduling, migrating, failing In-Reply-To: <4DDD6DE3.7050409@mozilla.com> (Graydon Hoare's message of "Wed, 25 May 2011 14:00:19 -0700") References: <4DDD4221.7030104@mozilla.com> <4DDD4CCB.1020409@mozilla.com> <4DDD6DE3.7050409@mozilla.com> Message-ID: <87ipsx8duj.fsf@mid.deneb.enyo.de> * Graydon Hoare: > They are very, very difficult to program safely. Show me a C or C++ > program that isn't littered with memory-safety-corrupting failure > modes. Show me a highly concurrent one that doesn't have subtle data > races (not just at I/O points). Show me a java program that doesn't > have the next worst thing, random take-down-the-whole-process failures > due to uncaught NPEs, due to lack of isolation. To be honest, lack of isolation in Java land often manifests itself in the form of out-of-memory conditions which (by specification) impact the whole VM. NPEs do happen, of course, but the VM can recover from it without any problems at all (much like a web server can recover from a 404 error). From respindola at mozilla.com Thu May 26 13:06:05 2011 From: respindola at mozilla.com (Rafael Avila de Espindola) Date: Thu, 26 May 2011 16:06:05 -0400 Subject: [rust-dev] Heads up: LLVM include StandardPasses.h missing In-Reply-To: <4DDBF06F.2030505@mozilla.com> References: <4DDBF06F.2030505@mozilla.com> Message-ID: <4DDEB2AD.4010706@mozilla.com> > I have created > > https://github.com/graydon/rust/pull/403 > I have updated it to cover the changes to unwind tables too. Cheers, Rafael From graydon at mozilla.com Thu May 26 13:36:01 2011 From: graydon at mozilla.com (Graydon Hoare) Date: Thu, 26 May 2011 13:36:01 -0700 Subject: [rust-dev] Heads up: LLVM include StandardPasses.h missing In-Reply-To: <4DDEB2AD.4010706@mozilla.com> References: <4DDBF06F.2030505@mozilla.com> <4DDEB2AD.4010706@mozilla.com> Message-ID: <4DDEB9B1.2010006@mozilla.com> On 11-05-26 01:06 PM, Rafael Avila de Espindola wrote: >> I have created >> >> https://github.com/graydon/rust/pull/403 >> > > I have updated it to cover the changes to unwind tables too. Do the tinderboxes need to be updated to a new LLVM? -Graydon From respindola at mozilla.com Thu May 26 16:04:50 2011 From: respindola at mozilla.com (Rafael Avila de Espindola) Date: Thu, 26 May 2011 19:04:50 -0400 Subject: [rust-dev] Heads up: LLVM include StandardPasses.h missing In-Reply-To: <4DDEB9B1.2010006@mozilla.com> References: <4DDBF06F.2030505@mozilla.com> <4DDEB2AD.4010706@mozilla.com> <4DDEB9B1.2010006@mozilla.com> Message-ID: <4DDEDC92.5010104@mozilla.com> > Do the tinderboxes need to be updated to a new LLVM? In order for this patch to build, yes. As far as I know there is no particular feature in there that we need, just the convenience of being able to work on trunk. > -Graydon Cheers, Rafael From respindola at mozilla.com Fri May 27 05:07:15 2011 From: respindola at mozilla.com (=?ISO-8859-1?Q?Rafael_=C1vila_de_Esp=EDndola?=) Date: Fri, 27 May 2011 08:07:15 -0400 Subject: [rust-dev] threads, tasks, scheduling, migrating, failing In-Reply-To: <4DDD6DE3.7050409@mozilla.com> References: <4DDD4221.7030104@mozilla.com> <4DDD4CCB.1020409@mozilla.com> <4DDD6DE3.7050409@mozilla.com> Message-ID: <4DDF93F3.9080205@mozilla.com> > They weren't approaching the same problem we are. They were just trying > to make "conventional threads go fast"; I agree they probably wound up > at the right place with 1:1 always (though weirdly, their IO model was > part of what forced many OS vendors to *make* 1:1 go fast). If we > weren't trying to make tasks a ubiquitous isolation (and IO > multiplexing) concept, I'd probably agree with you here. But I think we > are pursuing different goals. I know I have lost this, but think I should at least make an historical correction. NPTL walked over linuxthreads in the day it was out. And that was in software that had been coded to use linuxthreads. In the same way that NPTL != linuxthreads with m=n, so will be implementing a 1:1 mapping not be that trivial for us. In fact, it will probably be more work, as gcc didn't had to change any for the NPTL switch. > -Graydon Cheers, Rafael From graydon at mozilla.com Fri May 27 06:29:46 2011 From: graydon at mozilla.com (Graydon Hoare) Date: Fri, 27 May 2011 06:29:46 -0700 Subject: [rust-dev] threads, tasks, scheduling, migrating, failing In-Reply-To: <4DDF93F3.9080205@mozilla.com> References: <4DDD4221.7030104@mozilla.com> <4DDD4CCB.1020409@mozilla.com> <4DDD6DE3.7050409@mozilla.com> <4DDF93F3.9080205@mozilla.com> Message-ID: <4DDFA74A.3050406@mozilla.com> On 27/05/2011 5:07 AM, Rafael ?vila de Esp?ndola wrote: >> They weren't approaching the same problem we are. They were just trying >> to make "conventional threads go fast"; I agree they probably wound up >> at the right place with 1:1 always (though weirdly, their IO model was >> part of what forced many OS vendors to *make* 1:1 go fast). If we >> weren't trying to make tasks a ubiquitous isolation (and IO >> multiplexing) concept, I'd probably agree with you here. But I think we >> are pursuing different goals. > > I know I have lost this, but think I should at least make an historical > correction. NPTL walked over linuxthreads in the day it was out. And > that was in software that had been coded to use linuxthreads. I never meant to suggest otherwise; I was there for it too (at RHAT when it was designed). I only meant to say that the Java "teach everyone to use thread-per-IO-request" model was part of the pressure (from customers) to make NPTL in the first place, and make it go so fast. (That and, of course, the fact that linuxthreads had all manner of semantic bugs and limitations, didn't actually implement posix threads correctly) Read the original NPTL proposal if you don't believe me: --snip-- Software Scalability Another use of threads is to solve sub-problems of the user application in separate execution contexts. In Java environments threads are used to implement the programming environment due to missing asynchronous operations. The result is the same: enormous amounts of threads can be created. The new implementation should ideally have no ?xed limits on the number of threads or any other object. --snip-- > In the same way that NPTL != linuxthreads with m=n, so will be > implementing a 1:1 mapping not be that trivial for us. In fact, it will > probably be more work, as gcc didn't had to change any for the NPTL switch. The situations are not analogous. I realize that the words "M:N" strike fear and loathing into the hearts of anyone who was around for that painful fight (decades long, every OS in the world involved). I respect that reaction. I'm not suggesting the conclusion reached during that struggle was at all wrong. When you are talking about two ways of implementing the same semantics (posix threads as used by C) then 1:1 is faster than M:N. This is demonstrable. But we are not implementing posix threads. Our tasks have different semantics. I'm not suggesting M:N instead of 1:1, more like I'm suggesting 1:1:N where there's 1 kernel thread to 1 user-space thread but N things that are semantically different entities than C-style threads, running on that 1 thread. Tasks respect our language rules, not posix thread semantics. Specifically with respect to inter-task IO and cancellation. Which are central aspects of their operation. The situation is more analogous to that other fine library, GCD. In GCD, a block is not a thread; the fact that blocks run on threads is not an argument to assign 1 block to 1 thread. Your argument is like saying "M:N is slow for posix threads, so we must use 1 thread per block". You could -- and might well do so in environments where threads are cheap enough -- but the 'block' abstraction is different from the concept of a thread -- runs on different rules -- so you'll have to have a certain amount of abstraction-management code that the thread jumps into at defined lifecycle points anyways. If you made it "just a thread" you'd be erasing the block as a separate concept. Kernel threads don't run that logic for you. It's not part of the thread abstraction. Is that clearer? -Graydon From peterhull90 at gmail.com Sat May 28 15:25:56 2011 From: peterhull90 at gmail.com (Peter Hull) Date: Sat, 28 May 2011 23:25:56 +0100 Subject: [rust-dev] [llvm-commits] [PATCH] compiler-rt: Sanity check architectures In-Reply-To: References: <4DC7610F.9010107@mozilla.com> Message-ID: On Mon, May 9, 2011 at 9:34 PM, Eric Christopher wrote: > Committed in: > > [yendi:llvm/projects/compiler-rt] echristo% svn ci For me it seems to have started doing it again (osx 10.6.7, llvm r132269) ASSEMBLE: clang_darwin/ios/armv6: /Users/peterhull/Projects/llvm/projects/compiler-rt/lib/arm/divmodsi4.S /usr/llvm-gcc-4.2/libexec/gcc/i686-apple-darwin10/4.2.1/as: assembler (/usr/bin/../libexec/gcc/darwin/arm/as or /usr/bin/../local/libexec/gcc/darwin/arm/as) for architecture arm not installed Can anyone confirm? Pete From peterhull90 at gmail.com Mon May 30 10:06:25 2011 From: peterhull90 at gmail.com (Peter Hull) Date: Mon, 30 May 2011 18:06:25 +0100 Subject: [rust-dev] [llvm-commits] [PATCH] compiler-rt: Sanity check architectures In-Reply-To: References: <4DC7610F.9010107@mozilla.com> Message-ID: On Sat, May 28, 2011 at 11:25 PM, Peter Hull wrote: > Can anyone confirm? It seems like Patrick Walton's original patch was updated (specifically, replacing 'gcc' with '$(CC)' ) and now it doesn't do what it should. I can file a bug for LLVM - unless anyone's got a more direct line to the developers? Pete From as at hacks.yi.org Mon May 30 11:24:23 2011 From: as at hacks.yi.org (austin seipp) Date: Mon, 30 May 2011 13:24:23 -0500 Subject: [rust-dev] [llvm-commits] [PATCH] compiler-rt: Sanity check architectures In-Reply-To: References: <4DC7610F.9010107@mozilla.com> Message-ID: It's broke for me too. In the mean time, I got it working by removing the armv6/armv7 references in ./llvm/projects/compiler-rt/make/platform/clang_darwin.mk: Index: make/platform/clang_darwin.mk =================================================================== --- make/platform/clang_darwin.mk (revision 132313) +++ make/platform/clang_darwin.mk (working copy) @@ -39,11 +39,11 @@ # in the same linkage unit, and for a couple of other functions that didn't # make it into libSystem. Configs += ios -UniversalArchs.ios := $(call CheckArches,i386 x86_64 armv6 armv7) +UniversalArchs.ios := $(call CheckArches,i386 x86_64) # Configuration for use with kernel/kexts. Configs += cc_kext -UniversalArchs.cc_kext := $(call CheckArches,armv6 armv7 i386 x86_64) +UniversalArchs.cc_kext := $(call CheckArches,i386 x86_64) Then you can build LLVM as the 'getting started' page says. However, the current rust HEAD does not build with the latest revisions of LLVM. You'll get errors about StandardPasses.h not existing. That is because this pull request has not been merged: https://github.com/graydon/rust/pull/403 You can pull it into a new branch yourself and merge with master to get the latest version of rust. I did: $ git remote add espindola https://github.com/espindola/rust.git $ git pull --all $ git checkout -b newllvm remotes/espindola/newllvm $ git merge master $ ... build rust normally With this I was able to build the latest version of rust + latest LLVM. I don't exactly know what the correct fix is for compiler-rt though, sorry. On Mon, May 30, 2011 at 12:06 PM, Peter Hull wrote: > On Sat, May 28, 2011 at 11:25 PM, Peter Hull wrote: >> Can anyone confirm? > It seems like Patrick Walton's original patch was updated > (specifically, replacing 'gcc' with '$(CC)' ) and now it doesn't do > what it should. I can file a bug for LLVM - unless anyone's got a more > direct line to the developers? > > Pete > _______________________________________________ > Rust-dev mailing list > Rust-dev at mozilla.org > https://mail.mozilla.org/listinfo/rust-dev > -- Regards, Austin From peterhull90 at gmail.com Tue May 31 04:46:58 2011 From: peterhull90 at gmail.com (Peter Hull) Date: Tue, 31 May 2011 12:46:58 +0100 Subject: [rust-dev] [llvm-commits] [PATCH] compiler-rt: Sanity check architectures In-Reply-To: References: <4DC7610F.9010107@mozilla.com> Message-ID: On Mon, May 30, 2011 at 7:24 PM, austin seipp wrote: > It's broke for me too. In the mean time, I got it working by removing > the armv6/armv7 references in > ./llvm/projects/compiler-rt/make/platform/clang_darwin.mk: I did it slightly differently - I updated* my LLVM with svn up -r {2011-05-20} (i.e. back to when Graydon sent his 'LLVM bump' message) and then, in clang_darwin.mk , I manually replaced '$(CC)' with 'gcc' in CheckArches as it was in Patrick's original patch. I don't know the correct solution either, it seems that CC is set to clang in the makefile, and clang thinks it can compile ARM code, when it can't - it still needs gnu as to do so? Graydon: would you mind quoting the revision number next time the tinderboxes are updated, then we can follow? Thanks a lot, Pete * 'back-dated'? Not sure the word for this!