I think you nailed this.
Comment on Does this compiler exist?
jeffhykin@lemm.ee 11 months ago
I agree with the general idea, but I think there are lots of misconceptions. GCC does allow doing things before the preprocessing step, after the preprocessing step, before the linking step, etc. It’s possible, but not easy, to run your own programs in between those kinds of steps. As for why there’s no config file, it’s probably because GCC is really old, but I’ll have to let someone else comment on that.
However, syntax support is effectively a completely different feature request. For example, the “adding brackets to indentation” idea couldn’t really (or at least not correctly) come before the preprocessing step. I mean, a really hacky solution like my indent experiment from a long time ago can, but it will never be even slightly reliable because of the preprocessor, multi-line strings, comments, and other edge cases. Let me explain.
- The syntax cannot be parsed without running the preprocessor. Things like unmatched brackets are completely allowed before the preprocessing step, because a macro can supply the missing bracket later. It would be literally impossible for the parser to run before preprocessing.
- So let’s talk preprocessing. The preprocessor is so stupid it won’t even notice the difference between C, Haskell, or Ada. It’s just looking for strings, comments, ints, and preprocessor directives. That’s it. It has no idea about scopes or brackets or anything like that.
- So for the “adding brackets to indentation” to work, it would need to run its own preprocessor step, then do some parsing of its own, and then run the indent-to-bracket conversion.
But note, preprocessor strings just coincidentally parse the same as C strings. There’s already a limitation here: the preprocessor fails on, let’s say, Python, because Python has triple-quote strings.
That said, preprocessing is actually highly unusual in the sense that it can be done as a separate step. Usually parsing needs to be done as a unified operation. That’s not to say it can’t be modular, but rather that the module must be given to a central controller that knows about everything, rather than just being a standalone code-transformation step.
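To make the unmatched-brackets point concrete, here is a rough Python sketch. It is a toy stand-in, not how GCC's preprocessor works: it only handles simple `#define NAME body` macros, and the names `split_directives`, `expand`, and `brace_counts` are made up for illustration. It just counts the braces a parser would see before and after macro expansion.

```python
import re

# Small C program where one opening brace only exists inside a macro body.
C_SOURCE = """\
#include <stdio.h>
#define OPEN_LOOP for (int i = 0; i < 3; i++) {
int main(void) {
    OPEN_LOOP
        puts("hi");
    }
    return 0;
}
"""

def split_directives(source):
    """Separate `#define NAME body` macros from ordinary code lines."""
    macros, code_lines = {}, []
    for line in source.splitlines():
        define = re.match(r"\s*#\s*define\s+(\w+)\s+(.*)", line)
        if define:
            macros[define.group(1)] = define.group(2)
        elif not line.lstrip().startswith("#"):
            code_lines.append(line)  # other directives are simply dropped
    return macros, "\n".join(code_lines)

def expand(macros, code):
    """Toy expansion: replace each macro name with its body, whole words only."""
    for name, body in macros.items():
        code = re.sub(rf"\b{re.escape(name)}\b", lambda _m, b=body: b, code)
    return code

def brace_counts(code):
    """Return (number of '{', number of '}')."""
    return code.count("{"), code.count("}")

macros, raw = split_directives(C_SOURCE)
print(brace_counts(raw))                  # what a pre-preprocessing parser sees
print(brace_counts(expand(macros, raw)))  # what the real parser sees
```

Before expansion the code has one `{` and two `}`, so anything trying to match brackets at that stage would reject a program that is actually fine once the macro is expanded.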
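A standalone code-transformation step is exactly what a naive indent-to-brace pass would be, so here is a deliberately simple Python sketch of one to show where it falls over. This is not my actual indent experiment, just an assumed minimal version: it looks only at leading spaces (4 per level) and knows nothing about comments, strings, or macros.

```python
def indent_to_braces(source, indent="    "):
    """Naive pass: add '{' when indentation grows and '}' when it shrinks."""
    out, depth = [], 0
    for line in source.splitlines():
        if not line.strip():
            continue  # blank lines carry no indentation information
        level = (len(line) - len(line.lstrip(" "))) // len(indent)
        while level > depth and out:
            out[-1] += " {"  # open a block on the previous line
            depth += 1
        while level < depth:
            depth -= 1
            out.append(indent * depth + "}")
        out.append(line)
    while depth > 0:  # close whatever is still open at end of input
        depth -= 1
        out.append(indent * depth + "}")
    return "\n".join(out)

SIMPLE = """\
int main(void)
    return 0;
"""

WITH_COMMENT = """\
int main(void)
    /* note:
        still inside the comment */
    return 0;
"""

fixed = indent_to_braces(SIMPLE)
broken = indent_to_braces(WITH_COMMENT)
print(fixed)
print(broken)
```

The simple input comes out fine. In the second one, the pass sees the indented comment text as a new block, so it writes a `{` inside the comment (where C ignores it) and a matching `}` outside the comment (where C does not), leaving one closing brace too many. Handling that properly means understanding comments and strings, which is already a chunk of real parsing.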
With those misconceptions out of the way, now I want to talk about the parts I agree with.
IMO the perfect language is the one that has an “engine” that is completely separate from the syntax. And then the language/compiler should allow for different syntax.
The closest current thing to what you’re talking about is almost certainly Rust macros. Unlike the preprocessor, Rust macros fully understand Rust and are a part of the parsing process. They are decently close to what you’re saying; instead of compiler flags, it’s just imports within Rust. You can write HTML, SQL, and other code right in the middle of a Rust program (and it’s not a string either, it’s actual syntax support). Not only is it possible, but I have been eagerly waiting for someone to create a garbage-collected syntax within a Rust macro. People have already created garbage collectors, it’s just a matter of making a nice wrapper and inter-op.
That said, and even though Rust macros are head and shoulders above what basically every other language offers, I personally still think Rust macros don’t go far enough. Indent-based code isn’t really possible within Rust macros, Rust macros can’t have imbalanced braces, and there can be escaping issues that prevent things like YAML syntax from ever being possible. They also can’t allow for extensions like units, e.g. 10gallons, without wrapping it with some kind of delimiter (which defeats the point).
AFAIK currently there is no compiler that supports a composable syntax like that. I’ve worked on designing such a system, and while I don’t think it’s impossible, it is extremely hard. There are a lot of complications, like parsing precedence, lookaheads, and operator precedence. Two syntax modules that don’t know about each other can easily break each other. Like I said, I don’t think it’s impossible, but it is difficult.
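As a rough illustration of two syntax modules stepping on each other, here is a small Python sketch. Everything in it is hypothetical: the "modules" are just dicts of regexes for literal suffixes, which is far simpler than any real composable-syntax design, but it shows the kind of ambiguity that appears as soon as both are loaded.

```python
import re

# Two hypothetical syntax modules, written independently of each other.
# Each maps a token kind to the regex for the literals it adds.
UNITS_MODULE = {"length_literal": r"\d+(?:m|km)"}
DURATION_MODULE = {"duration_literal": r"\d+(?:s|m|h)"}

def matching_rules(text, *modules):
    """Return every token kind whose pattern matches all of `text`."""
    hits = []
    for module in modules:
        for kind, pattern in module.items():
            if re.fullmatch(pattern, text):
                hits.append(kind)
    return hits

unambiguous = matching_rules("10km", UNITS_MODULE, DURATION_MODULE)
ambiguous = matching_rules("10m", UNITS_MODULE, DURATION_MODULE)
print(unambiguous)  # only the length module claims this token
print(ambiguous)    # both modules claim it: meters or minutes?
```

Each module is fine on its own, but together `10m` has two valid readings, and neither module has any way of knowing. A central controller would have to detect this and pick a precedence, and that is just for literals, before operators and lookahead get involved.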
spykyvenator@programming.dev 11 months ago
jeffhykin@lemm.ee 11 months ago
also if you’re interested in languages join c/programminglanguages!
spykyvenator@programming.dev 11 months ago
Seems like the better place to post this indeed
porgamrer@programming.dev 11 months ago
I mentioned it elsewhere here, but I think the Terra research language has explored this area more thoroughly than Rust, just because that is its only purpose. The website and academic papers are definitely worth a skim: terralang.org
It’s basically a powerful LLVM-based compilation library where everything is exposed through Lua bindings. The default Terra compiler is just a Lua script that you can pull apart, extend, rearrange, etc. It’s all designed for ease of experimentation, whereas Rust has to worry about being a rock-solid production compiler.