Executables are always ET_DYN on Haiku, so like shared libraries, they should
not have an image base set. Elf2 already got this right, but Elf and Lld didn't.
closes https://codeberg.org/ziglang/zig/issues/32100
Followup to #30769
I grepped for `try .*toOwnedSlice` and checked all of them by hand.
Fixes a bunch of memory leaks and removes usages of `errdefer` and `var` in some places. I also switched array_list.Managed to ArrayList where it was convenient.
Reviewed-on: https://codeberg.org/ziglang/zig/pulls/32001
Reviewed-by: Andrew Kelley <andrew@ziglang.org>
This PR enables all incremental tests under the `test/incremental` directory, except one: `change_exports`, which is currently ignored as it requires a non-trivial amount of work on the linker, since we do not currently support exporting data symbols.
To enable the other tests, the following fixes were needed:
1. `src/link/Wasm.zig`: instead of chasing function type through Nav, get it directly.
2. `src/target.zig`: `.panic_fn` appears to work fine with the wasm backend.
3. `src/codegen/wasm/CodeGen.zig`: there was a liveness-related bug that caused some `ArenaAllocator` code to crash the backend.
More info on (3): the liveness and local reuse code in the backend has been in an unfinished state for years. For example, there is currently no branch merging, and reuse happens only when instructions die at the same block level. I initially considered doing a large refactor to implement everything correctly, but abandoned that due to its sheer size and because I currently have no clear idea of how to do this efficiently.
Instead, I fixed the bug with minimal changes and removed useless code, keeping the old solution otherwise intact.
MachO has a mechanism where symbols can introduce "subsections", which
(as I understand it) allows a linker to garbage-collect parts of
sections without pulling in the heavy machinery of `-fdata-sections` and
`-ffunction-sections`. Essentially, symbols can be considered to
partition a section, and these boundaries are not allowed to be crossed
by memory accesses, so the linker can detect symbols which are unused
and drop the corresponding input section regions.
However, the symbol flag `N_ALT_ENTRY` indicates that a symbol should
not participate in this "splitting", and is instead an "alternate entry
point" to the previous subsection, which should continue through this
symbol.
The Mach-O linker was failing to ignore `N_ALT_ENTRY` symbols when
creating subsections, which meant that for certain link inputs, it would
create additional subsection splits, and then garbage collect the extra
sections (due to the `N_ALT_ENTRY` symbol being unused). Naturally, this
silent dropping of parts of input sections led to miscompilations.
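
The splitting rule described above can be sketched as a small model. This is not the linker's code: `Symbol`, `alt_entry`, and `split_subsections` are hypothetical names, and the sketch assumes that the start of the section always opens a subsection.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Symbol:
    name: str
    offset: int              # offset of the symbol within its section
    alt_entry: bool = False  # models the N_ALT_ENTRY flag


def split_subsections(section_size, symbols):
    """Return (start, end, names) for each subsection of one section."""
    # Only symbols without N_ALT_ENTRY introduce a boundary. Offset 0 is
    # always a boundary (an assumption of this sketch).
    starts = sorted({s.offset for s in symbols if not s.alt_entry} | {0})
    ends = starts[1:] + [section_size]
    result = []
    for start, end in zip(starts, ends):
        names = [s.name for s in symbols if start <= s.offset < end]
        result.append((start, end, names))
    return result


symbols = [
    Symbol("_foo", 0),
    Symbol("_foo_fast", 8, alt_entry=True),  # alternate entry into _foo
    Symbol("_bar", 16),
]
subsections = split_subsections(32, symbols)
```

Here `_foo_fast` stays inside the subsection opened by `_foo`, so garbage-collecting an unused `_foo_fast` can no longer drop bytes 8..16 of the section.
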
The changes to the LLVM backend here changed the compiler_rt object
which LLVM emits, and exposed some buggy behavior in the self-hosted
WASM linker when parsing that object.
If the unwind record address has not been added to
`superposition` (maybe it is not in the current symbol table mapping?)
then there's a panic on null dereference. Ensure the entry exists in
`superposition`.
I've realised that the cause of at least some of our weird CI flakiness
was a bug in how `Nav` values were resolved. Consider this scenario: the
frontend resolves the type of a `Nav`, and then sends a function to the
backend, which requires the backend to lower a pointer to that `Nav`.
The backend calls `InternPool.getNav` to determine the `Nav`'s type.
However, this races with the frontend resolving the *value* of that
`Nav`. This involves writing separately to two fields, `bits` and
`type_or_value`. If only one of these changes is observed, then the
backend will incorrectly interpret the type as the value or vice versa,
leading to a crash or even a miscompilation. (Of course, there's also
the straightforward issue that the racing loads were non-atomic, making
them illegal).
The only good solution to this was to make `Nav` 4 bytes bigger, giving
it separate `type` and `value` fields. In theory that's quite a small
change, but it ended up having a bunch of nice consequences which led to
this diff being a bit bulkier than expected:
* `Nav.Repr.Bits` was simplified, because it no longer has to track
"resolution status": we can use `.none` for that. This frees up some
bits to make things more consistent between the "type resolved" and
"fully resolved" states.
* This consistency allowed the `Nav.status` union to be replaced with a
simpler field `Nav.resolved`, which is a bit nicer to work with.
* Most of the "getter" functions were able to be removed from `Nav`
because the state they were fetching had been moved to simple fields
on `Nav.resolved`.
* There were still a handful of free bits in `Nav.Repr.Bits`, which
could be used to represent the "const" and "threadlocal" flags rather
than these being stored on `Key.Extern` and `Key.Variable`. This is a
bit more convenient for linkers.
* With those bits gone, `Key.Variable` is a trivial wrapper around a
type and an initial value, and the fact that a declaration is mutable
can be represented solely through the "const" flag. Therefore,
`Key.Variable` no longer served a purpose, and could be eliminated
entirely in favour of storing the variable's initial value directly in
the "value" field of the `Nav`.
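
As a rough illustration of the race and of why separate fields avoid it, here is a deterministic model with the interleaving written out by hand. The class names, the string payloads, and the order of the two stores are assumptions made for this sketch. The real loads and stores must also be atomic, which this model does not show.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class PackedNav:
    """Simplified old layout: one slot whose meaning depends on `bits`."""
    bits: str = "type_resolved"
    type_or_value: str = "type:u32"


@dataclass
class SplitNav:
    """Simplified new layout: separate slots, None standing in for `.none`."""
    type: Optional[str] = "type:u32"
    value: Optional[str] = None


def interpret_packed(nav: PackedNav) -> Tuple[str, str]:
    """Decide what the shared slot holds by looking at `bits`."""
    kind = "type" if nav.bits == "type_resolved" else "value"
    return kind, nav.type_or_value


# Old layout: the frontend publishes the value with two separate stores.
packed = PackedNav()
packed.bits = "fully_resolved"      # store 1 (order is an assumption)
torn = interpret_packed(packed)     # backend load lands between the stores
packed.type_or_value = "value:42"   # store 2

# New layout: resolving the value never touches the `type` slot.
split = SplitNav()
type_before = split.type
split.value = "value:42"
type_after = split.type
```

In the packed model the backend sees `("value", "type:u32")`, that is, a type interpreted as a value. In the split model the `type` slot reads the same before and after the value is resolved.
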
So, I'm quite pleased with this refactor! But anyway, regarding the bug
fix which actually motivated this: if I've done my job correctly, this
should solve some crashes, such as these (which were what tipped me off
to this bug in the first place):
https://codeberg.org/ziglang/zig/actions/runs/2306/jobs/7/attempt/1
https://codeberg.org/ziglang/zig/actions/runs/2173/jobs/6/attempt/1
...and, who knows, perhaps even the random SIGSEGVs we've seen on some
targets! Probably not, but one can hope.
BE32 is deprecated and only supported by older cores and some v6 cores. All
cores v6 or newer support BE8, so default to that for v6+.
closes https://codeberg.org/ziglang/zig/issues/31404
```
* thread #1, name = 'zig', stop reason = breakpoint 1.1
frame #0: 0x00000000028387c3 zig`link.MachO.eh_frame.Cie.parse(cie=0x000000003cd62060, macho_file=0x000000003ca857c0) at eh_frame.zig:56:21
53 else => @panic("unexpected lsda encoding"), // TODO error
54 }
55 },
-> 56 else => @panic("unexpected augmentation string"), // TODO error
^
57 };
58 }
59
(lldb) frame variable
(link.MachO.eh_frame.Cie *) cie = 0x000000003cd62060
(link.MachO *) macho_file = 0x000000003ca857c0
([]u8) data = "\x14\x00\x00\x00\x00\x00\x00\x00\x01zRS\x00\x01x\x1e\x01\x10\x0c\x1f\x00\x00\x00\x00"
([]u8) aug = "zRS"
(Io.Reader) reader = {
vtable = 0x0000000005407ec0
buffer = "\x01x\x1e\x01\x10\x0c\x1f\x00\x00\x00\x00"
seek = 5
end = 11
}
(unsigned char) ch = 'S'
```
Zig supports the `R`, `P`, and `L` augmentation characters, and here it finds
`zRS` (though it only scans `RS`). It panics on `S`. Presumably the parser should
just ignore this character at this stage. (I found [this description][1], and
a bit of code [here][2] and [there][3].)
[1]: https://www.airs.com/blog/archives/460
[2]: https://github.com/eliben/pyelftools/blob/main/elftools/dwarf/callframe.py#L287
[3]: https://sourceforge.net/p/elftoolchain/tickets/557/
> The character ‘S’ in the augmentation string means that this CIE
> represents a stack frame for the invocation of a signal
> handler. When unwinding the stack, signal stack frames are handled
> slightly differently: the instruction pointer is assumed to be
> before the next instruction to execute rather than after it.
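
A minimal sketch of a scanner that tolerates `S`, run against the CIE bytes from the lldb dump above. It only classifies the characters and does not decode the augmentation data. `scan_augmentation` and the keys of the returned dictionary are hypothetical names, not the linker's API.

```python
def scan_augmentation(aug):
    """Classify each character of a CIE augmentation string."""
    info = {"has_data": False, "fde_encoding": False, "personality": False,
            "lsda": False, "signal_frame": False}
    for i, ch in enumerate(aug):
        if ch == "z" and i == 0:
            info["has_data"] = True       # augmentation data length follows
        elif ch == "R":
            info["fde_encoding"] = True   # one byte: FDE pointer encoding
        elif ch == "P":
            info["personality"] = True    # encoding byte + personality pointer
        elif ch == "L":
            info["lsda"] = True           # one byte: LSDA pointer encoding
        elif ch == "S":
            info["signal_frame"] = True   # marker only, no augmentation data
        else:
            raise ValueError(f"unexpected augmentation character {ch!r}")
    return info


# Layout: 4-byte length, 4-byte CIE id, 1-byte version, then the
# NUL-terminated augmentation string starting at offset 9.
cie = (b"\x14\x00\x00\x00\x00\x00\x00\x00\x01zRS\x00"
       b"\x01x\x1e\x01\x10\x0c\x1f\x00\x00\x00\x00")
aug = cie[9:cie.index(0, 9)].decode("ascii")
info = scan_augmentation(aug)
```

Because `S` carries no augmentation data, accepting it does not change how many bytes the parser consumes.
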
Signed-off-by: Antonin Décimo <antonin.decimo@gmail.com>
I was trying out combining struct layout resolution with resolution of
default field values, but it broke a few cases which it's not clear we
want to break. The simplest such case was a struct with a field which
was a slice of itself, with a default value of `&.{}`.
So, at least for now, I'm accepting defeat and splitting this back out.
This allows a couple of behavior tests which were removed to be
re-introduced---I will do that in the commit following this one.
I have *not* made this separate phase of resolution "lazy": instead, it
is tied to layout resolution, in the sense that if a struct's layout is
referenced, then its default field values are also referenced. I chose
this approach for simplicity---not of the implementation (it's actually
slightly *more* code to do it this way!), but in terms of the language
specification. I think this behavior is easier to understand and keep in
your head. It can be easily changed in future if we decide we want to.
This partially reverts the commit titled "compiler: merge struct default
value resolution into layout resolution".
The goal of these changes is to allow the C backend to support the new
lazier type resolution system implemented by the frontend. This required
a full rewrite of the `CType` abstraction, and major changes to the C
backend "linker".
The `DebugConstPool` abstraction introduced in a previous commit turns
out to be useful for the C backend to codegen types. Because this use
case is not debug information but rather general linking (albeit when
targeting an unusual object format), I have renamed the abstraction to
`ConstPool`. With it, the C linker is told when a type's layout becomes
known, and can at that point generate the corresponding C definitions,
rather than deferring this work until `flush`.
The work done in `flush` is now more-or-less *solely* focused on
collecting all of the buffers into a big array for a vectored write.
This does unfortunately involve a non-trivial graph traversal to emit
type definitions in an appropriate order, but it's still quite fast in
practice, and it operates on fairly compact dependency data. We don't
generate the actual type *definitions* in `flush`; that happens during
compilation using `ConstPool` as discussed above. (We do generate the
typedefs for underaligned types in `flush`, but that's a trivial amount
of work in most cases.)
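
The ordering problem solved in `flush` is essentially a topological sort over type dependencies. The sketch below uses a generic depth-first traversal. It is not the backend's actual algorithm, and `emit_order` and the `deps` mapping are hypothetical. It also treats every dependency as requiring a full definition, whereas real C only needs a forward declaration for pointer members.

```python
def emit_order(deps):
    """Order type names so each appears after the types it depends on."""
    order = []
    state = {}  # name -> "visiting" or "done"

    def visit(name):
        if state.get(name) == "done":
            return
        if state.get(name) == "visiting":
            raise ValueError(f"dependency cycle through {name}")
        state[name] = "visiting"
        for dep in deps.get(name, []):
            visit(dep)
        state[name] = "done"
        order.append(name)  # emitted only after all of its dependencies

    for name in deps:
        visit(name)
    return order


deps = {
    "struct Outer": ["struct Inner", "union Payload"],
    "union Payload": ["struct Inner"],
    "struct Inner": [],
}
order = emit_order(deps)
```

Each type is visited once, so the traversal is linear in the number of types plus dependency edges.
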
`CType` is now an ephemeral type: it is created only when we render a
type (the logic for which has been pushed into just 2 or 3 functions in
`codegen.c`---most of the backend now operates on unmolested Zig `Type`s
instead). C types are no longer stored in a "pool", although the type
"dependencies" of generated C code (that is, the structs, unions, and
typedefs which the generated code references) are tracked (in some
simple hash sets) and given to the linker so it can codegen the types.
The LLVM backend can now run the behavior tests and standard library
tests, like the x86_64 backend can. This commit required me to make a
lot of changes to how the LLVM backend lowers debug information, and
while I was doing that, I improved a few things:
* `anyerror` is now an enum type (and other error sets just wrap it), so
error values appear by name in debuggers
* Fixed broken lowering for tagged unions with zero-width payloads
* Associate container types with source locations in all cases
* Avoid depending on the order of type resolution (using the new
`DebugConstPool` abstraction), so debug information will contain all
available type information rather than just the subset which happens
to be resolved when the backend lowers that debug type
Introduces a small abstraction, `link.DebugConstPool`, to deal with
lowering type/value information into debug info when it may not be known
until type resolution (which in some cases will *never* happen). It is
currently only used by self-hosted DWARF logic, but it will also be of
use to the LLVM backend (which is my next focus).
This actually doesn't cause any dependency loops in std, which is pretty
much my benchmark for it being acceptable. This can be reverted if it
turns out to be problematic, but for now, let's err on the side of
language simplicity.
To be clear, this *does* regress some cases which previously worked: I
will have to remove some behavior tests as a result of this commit. To
be honest, the tests which look to be failing as a result of this are
things which I think are generally inadvisable; I actually reckon a bit
more friction to use default field values in non-trivial ways might be a
good thing to stop people from misusing them as much. Struct fields
should very rarely have default values; about the only common situation
where they make sense is "options" structs.
Now that https://github.com/ziglang/zig/issues/24657 has been
implemented, the compiler can simplify its internal representation of
comptime-known `packed struct` and `packed union` values. Instead of
storing them field-wise, we can simply store their backing integer
value. This simplifies many operations and improves efficiency in some
cases.
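
To illustrate the difference between the two representations, here is a small model of a packed struct value stored as a single backing integer. It assumes unsigned fields laid out from the least significant bit upwards. `pack` and `unpack_field` are hypothetical helpers, not compiler functions.

```python
def pack(fields, values):
    """Combine field values into one integer, first field in the low bits."""
    backing, shift = 0, 0
    for name, width in fields:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        backing |= value << shift
        shift += width
    return backing


def unpack_field(fields, backing, wanted):
    """Extract one field from the backing integer by shifting and masking."""
    shift = 0
    for name, width in fields:
        if name == wanted:
            return (backing >> shift) & ((1 << width) - 1)
        shift += width
    raise KeyError(wanted)


# Models a packed struct with fields a: u3, b: u5, c: u8 (16 bits total).
fields = [("a", 3), ("b", 5), ("c", 8)]
backing = pack(fields, {"a": 5, "b": 17, "c": 200})
b_value = unpack_field(fields, backing, "b")
```

With this representation, comparing or bitcasting two values is a single integer operation, while a field access becomes a shift and a mask.
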
The end of the archive needs to also be aligned to a two-byte boundary,
not just the start of records. This was causing lld to reject archives.
Notably, this was happening with compiler_rt when rebuilding in fuzz
mode, which is why this commit is included in this patchset.
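
The padding rule can be sketched with a tiny archive writer. This follows the common `ar` layout (8-byte magic, 60-byte member headers, GNU-style `/` name terminator) and is not the code changed in this commit. `member_header` and `build_archive` are hypothetical, and only short member names are handled.

```python
AR_MAGIC = b"!<arch>\n"


def member_header(name, size):
    """Build a 60-byte ar member header (timestamp and ids zeroed)."""
    header = (
        f"{name + '/':<16}"   # file name
        f"{0:<12}"            # modification time
        f"{0:<6}{0:<6}"       # owner id, group id
        f"{'644':<8}"         # file mode (octal)
        f"{size:<10}"         # data size in bytes
        "`\n"                 # end-of-header marker
    ).encode("ascii")
    assert len(header) == 60
    return header


def build_archive(members):
    out = bytearray(AR_MAGIC)
    for name, data in members:
        out += member_header(name, len(data))
        out += data
        # Pad after every member, including the last one, so that both the
        # next header and the end of the archive land on an even offset.
        if len(out) % 2 != 0:
            out += b"\n"
    return bytes(out)


archive = build_archive([("a.o", b"abc"), ("b.o", b"hello")])
```

Padding only *before* each header would leave this archive one byte short, since the final member has odd length.
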