Pointer tagging for x86 systems [LWN.net]

By Jonathan Corbet
March 28, 2022

Pointers are a fact of life for developers working in numerous languages. It is often convenient to be able to associate a small amount — a few bits at most — of ancillary information with a pointer. This can often be done within the pointer value itself with some careful masking and shifting. CPU manufacturers have been adding ways to support the addition of this sort of "tag" to pointers; the most recent may be AMD's "upper address ignore" (UAI) feature, support for which was recently posted by Bharata B Rao. This feature has an uncertain future in Linux, though, as the result of a fundamental design decision.

On a 64-bit system, a pointer is, naturally, 64 bits wide. But the CPU does not actually need all of those bits to dereference an address stored in a pointer. There are no systems (yet) that require — or can provide — all of the memory that can be addressed by 64 bits, meaning that there are ranges of address space that do not map to physical memory. Normally, user-space addresses start at (or near) zero and increase from there; that means that the highest-order bits will be zero even with the largest possible addresses. As a result, it can be possible to use those high-order bits to store other types of information.

There are numerous use cases for stashing metadata into those unused bits. Memory allocators could use that space to track different memory pools, for example, or for garbage collection. Database management systems have their own uses for that space. Applications can implement this sort of tagging now, but it must be done with care; an address with extra bits set is no longer a valid pointer, so that metadata must be masked out before dereferencing that pointer or passing it into code that does not understand the tagging scheme. That is error-prone and may slow down the application.
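
As a minimal sketch of what such software-only tagging looks like, the helpers below pack a small tag into the high bits of a pointer and strip it again before use. The names (`tag_pointer()`, `pointer_tag()`, `untag_pointer()`) and the choice of seven tag bits are assumptions made for illustration; they do not come from any particular allocator or library, and the code assumes 64-bit pointers whose upper bits are zero in user space.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for this sketch: 7 tag bits stored in bits 63..57.
 * Without hardware support, any width that avoids the real address
 * bits would work just as well. */
#define TAG_BITS  7
#define TAG_SHIFT (64 - TAG_BITS)
#define TAG_MASK  (((uint64_t)1 << TAG_BITS) - 1)
#define ADDR_MASK (~(TAG_MASK << TAG_SHIFT))

/* Store a small tag in the otherwise unused high bits of a pointer. */
static inline uint64_t tag_pointer(const void *p, unsigned tag)
{
    assert(tag <= TAG_MASK);
    return ((uint64_t)(uintptr_t)p & ADDR_MASK) | ((uint64_t)tag << TAG_SHIFT);
}

/* Recover the tag from a tagged pointer value. */
static inline unsigned pointer_tag(uint64_t tagged)
{
    return (unsigned)((tagged >> TAG_SHIFT) & TAG_MASK);
}

/* Without hardware help, the tag must be stripped before every dereference. */
static inline void *untag_pointer(uint64_t tagged)
{
    return (void *)(uintptr_t)(tagged & ADDR_MASK);
}
```

Every dereference has to go through something like `untag_pointer()`, which is exactly the extra work (and the opportunity for mistakes) that hardware support is meant to remove.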

To make life easier for the developers of this sort of application, CPU manufacturers have been adding the ability for the processor to simply ignore the non-address bits in an address value. Naturally, every manufacturer has invented its own way of supporting this feature. The AMD version, UAI, specifically allows the uppermost seven bits of an address to be used for ancillary data.

If accepted, AMD's implementation of this feature would not be the first; support for the Arm "top-byte ignore" feature was merged for the 5.4 kernel in 2019. At that time, a set of prctl() commands was added to control the use of this feature. Top-byte ignore can be enabled with:

    int prctl(PR_SET_TAGGED_ADDR_CTRL, PR_TAGGED_ADDR_ENABLE, 0, 0, 0);

This interface was designed around Arm's implementation, which makes eight bits available for tag data. The AMD implementation only allows for seven bits, meaning that applications wanting to use tagged addresses will need a way to discover how many bits are available. So Rao's patch set starts with a patch from Kirill Shutemov (intended to add support for a similar Intel feature, more about that below) adding two new parameters to the above prctl() call, both of which are integer pointers. The first of those is for the caller to specify how many bits they would like to use for pointer metadata; the kernel will update that value to reflect the number of bits that are actually available. The second pointer tells the kernel where to store the number of bits to right-shift a pointer value to obtain the tag data.
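
A rough sketch of how an application might use the extended call follows. The two pointer arguments reflect the proposal as described here, not any released kernel ABI, so `enable_tagging()` is hypothetical and should be expected to fail on current kernels; `tag_of()` is likewise an illustrative helper. The fallback constant values are those of the existing Arm interface.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/prctl.h>

/* Fallback definitions for older headers; these values match the
 * Arm interface that was merged for Linux 5.4. */
#ifndef PR_SET_TAGGED_ADDR_CTRL
#define PR_SET_TAGGED_ADDR_CTRL 55
#endif
#ifndef PR_TAGGED_ADDR_ENABLE
#define PR_TAGGED_ADDR_ENABLE (1UL << 0)
#endif

/*
 * HYPOTHETICAL: the third and fourth arguments follow the proposed
 * extension and are not part of any released kernel ABI. On entry,
 * *nr_bits holds the number of tag bits requested; on success the
 * kernel would update it and store the right-shift amount in *shift.
 * Returns 0 on success, -1 on failure.
 */
static inline int enable_tagging(int *nr_bits, int *shift)
{
    return prctl(PR_SET_TAGGED_ADDR_CTRL, PR_TAGGED_ADDR_ENABLE,
                 nr_bits, shift, 0) == 0 ? 0 : -1;
}

/* Extract the tag using the width and shift reported by the kernel. */
static inline uint64_t tag_of(uint64_t ptr, int nr_bits, int shift)
{
    assert(nr_bits > 0 && nr_bits < 64 && shift >= 0 && shift < 64);
    return (ptr >> shift) & (((uint64_t)1 << nr_bits) - 1);
}
```

With seven bits at the top of the pointer, the kernel would presumably report a width of 7 and a shift of 57; for Arm's top-byte ignore the corresponding values would be 8 and 56.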

The subsequent patches then implement support for UAI in the Linux kernel.

The idea is simple enough, and this feature already exists for the Arm architecture, but the UAI patches have still run into pushback, for a number of reasons. Perhaps the most fundamental of those is that UAI allows the most-significant bit of the address to be used by user space. In current systems, only kernel-space addresses have that bit set. Turning on UAI would allow user space to create pointer values that look like kernel addresses, but which would actually be valid user-space pointers. Those pointers can, of course, be passed into the kernel via system calls where, in the absence of due care, they might be interpreted as kernel-space addresses. The consequences of such confusion would not be good, and the possibility of it happening is relatively high.
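
The confusion can be illustrated with a small sketch. The first helper is the kind of sign-bit test that distinguishes kernel from user addresses today; the second is a hypothetical model of what a UAI-enabled CPU would dereference, under the assumption that the seven ignored bits are simply treated as zero for user addresses. Neither function is taken from the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The traditional test: kernel addresses occupy the upper half of the
 * address space, so bit 63 (the sign bit) is set. */
static inline bool looks_like_kernel_address(uint64_t addr)
{
    return (int64_t)addr < 0;
}

/* Hypothetical model of UAI: bits 63..57 are ignored on access.
 * Assumption: the ignored bits behave as zero for user addresses. */
static inline uint64_t uai_effective_address(uint64_t addr)
{
    return addr & ~((uint64_t)0x7f << 57);
}
```

A user-space pointer such as `0x00007f0000001000` with bit 63 used as a tag passes the first test as a "kernel" address, while the hardware would quietly access the original user-space location.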

This mechanism could probably be made to work safely, but, as Andy Lutomirski said: "A lot of auditing of existing code would be needed to make it safe". Even more auditing would be required, of course, to keep it safe in a rapidly evolving kernel. It sounds like a recipe for ongoing security problems, which is why Thomas Gleixner said that "there is no justification for the bit 63 abuse". He suggested that AMD should rework the feature in its processors to disallow that bit in address tags; he did not say that this problem would block the merging of UAI, but the meaning was reasonably clear.

Another problem that Lutomirski pointed out is that UAI is not specific to any running context; once it is enabled, it is turned on for the entire CPU. That, too, could lead to unpleasant surprises, so he suggested that the kernel would need to make the UAI settings process-local, even if it slows down context switches considerably.

Finally, there is the issue of Intel's similar feature, called "Linear Address Masking" (LAM). It does not have the most-significant-bit issue that UAI has, and it is managed as part of the process context. It supports two modes, with either six or 15 bits being made available for ancillary data; the 15-bit mode only works if five-level page tables are not in use. LAM has been around for a while, and support patches were posted (by Shutemov) in early 2021. That work seems to have stalled after that posting, but can be expected to come back at some point.
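
To make the difference in tag layout concrete, the sketch below computes the bit masks involved. The bit counts come from the descriptions above; the exact positions (tag bits sitting immediately below bit 63 for LAM, and at the very top for UAI) are this sketch's reading of the vendors' published descriptions and should be treated as an assumption. The function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Mask covering nr_bits tag bits whose highest bit is top_bit. */
static inline uint64_t tag_mask(unsigned top_bit, unsigned nr_bits)
{
    assert(top_bit < 64 && nr_bits >= 1 && nr_bits <= top_bit + 1 && nr_bits < 64);
    return (((uint64_t)1 << nr_bits) - 1) << (top_bit + 1 - nr_bits);
}

/* LAM (assumed layout): 6 or 15 tag bits ending at bit 62; bit 63 stays
 * available to tell kernel and user addresses apart. */
static inline uint64_t lam_tag_mask(unsigned nr_bits)
{
    assert(nr_bits == 6 || nr_bits == 15);
    return tag_mask(62, nr_bits);
}

/* UAI (assumed layout): 7 tag bits ending at bit 63, sign bit included. */
static inline uint64_t uai_tag_mask(void)
{
    return tag_mask(63, 7);
}
```

Under these assumptions, the six-bit LAM mask is `0x7E00000000000000`, the 15-bit mask is `0x7FFF000000000000`, and the UAI mask is `0xFE00000000000000`; only the last one overlaps the most-significant bit.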

Rao's UAI patch set deliberately keeps the AMD implementation entirely separate from the proposed LAM implementation, even though the two are doing essentially the same thing. That led recently appointed x86 co-maintainer Dave Hansen to object: "We'll have one x86 implementation of address bit masking. Both the Intel and AMD implementations will feed into a shared implementation". So this is something that would certainly need to be fixed before this work could be considered for mainline merging.

The other issues are tied to the design of the hardware, though, and will be rather harder to fix in kernel code. For these reasons, the sentiment among kernel developers seems to be that LAM is a better-designed implementation of pointer tagging and should perhaps be what all x86 systems use. In the above-linked message, Lutomirski concluded:

> I believe it's possible for a high-quality kernel UAI implementation to exist, but, as above, I think it would be slow, and it might be quite complex and fragile. Are we sure that it's worth supporting it?

A better solution, he suggested, would be for AMD to go back to the drawing board and create its own implementation of LAM instead.

In the early days of Linux, kernel developers had to adapt to whatever the hardware manufacturers put out; the alternative was to not have hardware to run on at all. In 2022, though, those developers feel more confident in their ability to reject support for hardware features that, for whatever reason, they feel do not fit in well with the design of the system. If AMD is unable to get support for UAI into the kernel (it's worth noting that Rao hasn't given up yet), UAI is likely to go mostly unused and developers needing pointer tagging may gravitate toward competing CPUs. According to Gleixner (linked above), AMD was told about the problems with its implementation some time ago; the company may yet have reason to wish it had listened.

Pointer tagging for x86 systems

Posted Mar 28, 2022 17:49 UTC (Mon) by butlerm (subscriber, #13312) [Link] (3 responses)

The wisdom of using addresses with bit 63 set aside, wouldn't it be trivial for an attacker to construct such addresses and trigger any latent bugs in the kernel related to this already? Surely that work is required either way.

Pointer tagging for x86 systems

Posted Mar 28, 2022 18:50 UTC (Mon) by farnz (subscriber, #17727) [Link] (1 responses)

While you're right that constructing kernel addresses is trivial, the mitigation today is also trivial - if an address is passed to the kernel with its top bit set, then the called code should simply fail noisily because Something is Bad.

In the UAI world, a pointer with the top bit set could be a kernel address, but it could also be the case that the user is using bit 63 as a tag bit, and the CPU will ignore it on access - the kernel can't tell.

Pointer tagging for x86 systems

Posted Mar 28, 2022 20:12 UTC (Mon) by bartoc (guest, #124262) [Link]

yeah, but since you need to ask the kernel to turn the feature on in the first place the kernel could presumably just say "yeah I know technically you could stash stuff in bit 63, but I'm not gunna let you do that". It could then tell userspace that only 6 bits were available. Sure, userspace could just ignore the kernel and set the 63rd bit, but it could do that already; unfortunately, after UAI is enabled there won't be a noisy fault if the kernel dereferences such a pointer.

Pointer tagging for x86 systems

Posted Mar 30, 2022 15:12 UTC (Wed) by BenHutchings (subscriber, #37955) [Link]

With UAI enabled the critical bit becomes bit 56, not bit 63. All the existing checks - which could be written as (long)addr & BIT(63), or (long)addr < 0, or even implemented in assembly - would need to be updated.

Pointer tagging for x86 systems

Posted Mar 28, 2022 18:15 UTC (Mon) by jhoblitt (subscriber, #77733) [Link] (3 responses)

Does Windows still set the top 16 bits? If so, AMD would seem to have designed this feature ignoring the behavior of the operating systems most likely to run on their hardware.

Pointer tagging for x86 systems

Posted Mar 28, 2022 18:33 UTC (Mon) by JoeBuck (subscriber, #2330) [Link] (2 responses)

Servers mostly run Linux, so that's not a market that can be ignored.

Pointer tagging for x86 systems

Posted Mar 28, 2022 19:43 UTC (Mon) by jhoblitt (subscriber, #77733) [Link]

The kernel-space address layout of Linux, and of most other Unix-like systems, seems not to have been considered. Windows is the only other player with significant market share in the x86 space.

Pointer tagging for x86 systems

Posted Mar 28, 2022 22:35 UTC (Mon) by Paf (subscriber, #91811) [Link]

I think his point is basically “doesn’t this also mess up Windows? Who on earth *was* it designed for?”

Pointer tagging for x86 systems

Posted Mar 28, 2022 20:29 UTC (Mon) by willy (subscriber, #9762) [Link] (7 responses)

I know that Our Benevolent Editor knows this is misleading (and this is presumably just clumsy wording):

> There are no systems (yet) that require — or can provide — all of the memory that can be addressed by 64 bits, meaning that there are ranges of address space that do not map to physical memory.

This conflates userspace addressing and kernel addressing. Userspace would love to have more address space available. Even with 64 bit pointers, address space fragmentation is a real thing, and prevents doing things like mmap() of an entire multi-petabyte file.

The real problem is with the CPU. More virtual address bits available to userspace means more levels of page table or larger tables at each level or some other undesirable expansion that affects performance. It also affects how the L3 (perhaps also L2 and L1?) caches are implemented as more address bits must be checked, and thus also stored. Five level page tables come with some real costs beyond the obvious extra level of lookup!

It is my opinion (based on precisely zero inside information) that Intel and AMD have decided that their current architectures will stop at five levels (57 bits of virtual address). When they need to go beyond this, we're talking about a 128 bit architecture with a rather different approach to page tables. So they're using the last few top bits to provide useful functionality like pointer tagging that people actually want.

Pointer tagging for x86 systems

Posted Mar 28, 2022 21:29 UTC (Mon) by jhoblitt (subscriber, #77733) [Link] (5 responses)

128b would be wasteful in terms of I/D cache space. 80b/95b would probably be a reasonable next step. There is also an argument to be made for having the size of memory addresses != size of integers.

Pointer tagging for x86 systems

Posted Mar 29, 2022 3:10 UTC (Tue) by willy (subscriber, #9762) [Link]

Do you have experience in CPU design?

Pointer tagging for x86 systems

Posted Mar 29, 2022 6:07 UTC (Tue) by wtarreau (subscriber, #51152) [Link] (3 responses)

> There is also an argument to be made for having the size of memory addresses != size of integers.

This argument can only come from those nostalgic of the 8086/8088! What a disaster it was!

Pointer tagging for x86 systems

Posted Mar 29, 2022 12:33 UTC (Tue) by jem (subscriber, #24231) [Link] (1 responses)

Or the PDP-10, which had a word size of 36 bits, and an address size of half a word (18 bits). The PDP-10 was word addressable, i.e. the address pointed to a 36-bit word, but there were also "byte instructions", which used special 36-bit byte pointers. Instructions using these byte pointers could load or store any number of bits (1-36) starting from any bit position. (I don't know what happened if the offset+size exceeded the number of bits in a word.)

sizeof (int) == 5 according to a C compiler I got to try out on a DEC-20.

http://pdp10.nocrew.org/docs/instruction-set/Byte.html

Pointer tagging for x86 systems

Posted Apr 1, 2022 1:02 UTC (Fri) by azz (subscriber, #371) [Link]

It's also idiomatic on the PDP-10 to use 36-bit "AOBJN" pointers where the low 18 bits contains the address, and the high 18 bits contains a negated count of words following the address - so each pointer also contains the length of the thing it points at, and you can easily do bounds checks and iteration.

C is really not a good fit for the PDP-10. There's at least one PDP-10 C compiler (Alan Snyder's) where all the primitive types, including char, are 36 bits...
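
The AOBJN convention described above can be simulated on a modern machine. This is only a sketch: it models a 36-bit word inside a `uint64_t`, adds one to each 18-bit half separately, and uses illustrative names (`aobjn_make()`, `aobjn_step()`, `aobjn_sum()`) that are not from any PDP-10 toolchain.

```c
#include <assert.h>
#include <stdint.h>

#define HALF_MASK 0777777u              /* one 18-bit half-word */
#define SIGN_BIT  ((uint64_t)1 << 35)   /* sign bit of a 36-bit word */

/* Build a pointer with the negated count in the high half and the
 * address in the low half. */
static inline uint64_t aobjn_make(unsigned count, unsigned addr)
{
    assert(count >= 1 && count <= 0400000u && addr <= HALF_MASK);
    uint64_t left = (0u - count) & HALF_MASK;
    return (left << 18) | addr;
}

/* Add one to both halves; return nonzero while the count is negative. */
static inline int aobjn_step(uint64_t *word)
{
    uint64_t left  = ((*word >> 18) + 1) & HALF_MASK;
    uint64_t right = (*word + 1) & HALF_MASK;
    *word = (left << 18) | right;
    return (*word & SIGN_BIT) != 0;
}

/* Sum `count` words starting at `addr`, looping the way an AOBJN loop would. */
static inline long aobjn_sum(const long *memory, unsigned count, unsigned addr)
{
    uint64_t p = aobjn_make(count, addr);
    long total = 0;
    do {
        total += memory[p & HALF_MASK];
    } while (aobjn_step(&p));
    return total;
}
```

The loop terminates when the count half reaches zero, so the single word carries both the cursor and the bound.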

Pointer tagging for x86 systems

Posted Apr 7, 2022 14:35 UTC (Thu) by mrugiero (guest, #153040) [Link]

Not quite: https://nw0.github.io/cheri-rust.pdf

Pointer tagging for x86 systems

Posted Mar 29, 2022 9:31 UTC (Tue) by farnz (subscriber, #17727) [Link]

One thing that adds weight to your notion that they'll not go beyond 57 bits of VA space is that thus far, each page level in x86-64 adds 9 bits of VA space (because that fills an entire 4K page), and apart from the PTEs having their PAT bit in a different place to PDE and PDPTE (the PTE place for PAT is used in all higher level tables for "stop paging here, this is a large page"), the levels are all identical layouts.

And a second thing is the existence of Arm Morello as an implementation that supports the CHERI capabilities model (effectively 129 bit pointers consisting of a hidden "is a valid pointer" bit, then 64 bits used for compressed bounds, permissions and an object type field, 64 bits for an address). If Morello demonstrates that CHERI capabilities can be made to work well and have significant benefits while not hurting existing code, then Intel or AMD may well want to expand chapter 6 of the CHERI paper into a full design.
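
The nine-bits-per-level arithmetic mentioned above is easy to check. In this small sketch the helper names are made up for illustration; the constants are the 4K page size and the 512-entry tables of x86-64.

```c
#include <assert.h>

#define PAGE_SHIFT     12  /* 4K pages: 12 bits of page offset */
#define BITS_PER_LEVEL 9   /* 512 eight-byte entries fill one 4K page */

/* Virtual-address bits translated with the given number of levels. */
static inline unsigned va_bits(unsigned levels)
{
    return PAGE_SHIFT + BITS_PER_LEVEL * levels;
}

/* Bits at the top of a 64-bit pointer that translation does not use. */
static inline unsigned untranslated_bits(unsigned levels)
{
    assert(va_bits(levels) <= 64);
    return 64 - va_bits(levels);
}
```

Four levels give 48 bits of virtual address with 16 left over, and five levels give 57 bits with 7 left over; a sixth level would need 66 bits, which no longer fits in a 64-bit pointer.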

Pointer tagging for x86 systems

Posted Mar 28, 2022 21:23 UTC (Mon) by abufrejoval (guest, #100159) [Link] (12 responses)

x86-CFI, ARM memory tagging extensions and CHERI are just stopgap measures; ultimately, capability-based addressing or something similar needs to break these PDP/VAX-based notions of a fully qualified global shared address space.

Better to start fixing this properly now than to keep pushing it along with these half-hearted approaches that just pile on more legacy.

Pointer tagging for x86 systems

Posted Mar 28, 2022 21:42 UTC (Mon) by dullfire (guest, #111432) [Link] (5 responses)

> ultimately capability based addressing or something similar needs to break these PDP/VAX based notions of a fully qualified global shared address space.

you mean you want x86 segmentation fully reinstated for AMD64?

Pointer tagging for x86 systems

Posted Mar 28, 2022 21:58 UTC (Mon) by abufrejoval (guest, #100159) [Link] (4 responses)

Please leave the Intel box: the PDP-11 (and many others) was segmented, long before the 8086 did it with a static offset.

But generally speaking the old guys in the 1960s were much less restrained by today's VAX legacy and had some pretty cool ideas (and implementations like the IBM i-Series). There is a lot of inspiration to be found by looking back at what they had theorized on and implemented back then.

Please note that capability based security is finding its way back in projects like Google's Fuchsia and many others really concerned about the long term viability of the von Neumann/Princeton memory model.

Pointer tagging for x86 systems

Posted Mar 28, 2022 23:33 UTC (Mon) by dullfire (guest, #111432) [Link]

Maybe you should take another look at protected mode segmentation then.

It's very very much like a capabilities-based permissions system (and very very much unlike the 8086 segmentation that was only for getting around the 16-bit limit).

Pointer tagging for x86 systems

Posted Mar 29, 2022 3:01 UTC (Tue) by mtaht (guest, #11087) [Link] (2 responses)

Everywhere I turn I see portions of the ideas that were in the mill computer's far more unified versions, gradually being retrofitted into existing architectures.

I wish they'd build it, even just as a virtual machine. It would help people to think better about where we should have started going in the 90s, especially securitywise, when it came to cpu architectures.

Take out some popcorn and watch their talk about security... https://millcomputing.com/docs/#security

Pointer tagging for x86 systems

Posted Mar 29, 2022 5:03 UTC (Tue) by pabs (subscriber, #43278) [Link]

Is there any prospect of Mill resulting in some gateware or hardware that people can use? Or will it remain something that doesn't exist outside the designers' minds, talks, documentation and patents?

Pointer tagging for x86 systems

Posted Mar 29, 2022 15:11 UTC (Tue) by willy (subscriber, #9762) [Link]

But the Mill is legit crazy. I forget whether they track dirtiness on a per byte or per bit level, but that level of detail in tracking dirtiness can only hurt. If anything, we should track dirtiness at a super-cache-line level and move things around in a group of two or four cache lines. Of course, that comes with a false sharing problem, which is why it hasn't happened yet.

There's always trade-offs and people can have a real conversation about whether 32, 64 or 128 bytes is the correct size of a cache line, but there's a knee to this curve and 1-4 bytes is outside the scope of sane conversation.

Pointer tagging for x86 systems

Posted Mar 28, 2022 21:48 UTC (Mon) by calumapplepie (guest, #143655) [Link] (4 responses)

I'm not worried about the legacy added by these new measures: we'll need a whole new architecture for (insert new memory addressing scheme here), which means we can ignore most portability problems. Since apps already have to account for systems that don't support this newfangled technology anyways, and porting to an entirely new addressing scheme will probably require work for any application doing low-level shenanigans like memory tagging, it's not like this hurts anything.

Besides, inventing a new magic CPU architecture that fixes all our problems will take years. In the meantime, let's try and use the stopgap measures to get as much performance and power efficiency as we can.

Pointer tagging for x86 systems

Posted Mar 28, 2022 22:05 UTC (Mon) by abufrejoval (guest, #100159) [Link] (3 responses)

While I do agree with you in a way, this topic keeps reminding me of the climate change debate: we all know something needs to be done, but nobody seems to work on solutions with the proper level of forward looking research.

And we should at least invest into making sure that new code can be written without fully qualified pointers (somewhat like Rust vs. Cx).

Finding ways to defuse pointers seems pretty easy compared to saving the planet, especially when potential architectures have already been proposed decades ago...

Pointer tagging for x86 systems

Posted Mar 28, 2022 22:39 UTC (Mon) by Paf (subscriber, #91811) [Link] (2 responses)

You know, “next” architectures have been tried before - several times, including with huge amounts of $$$ behind them (Alpha, Itanic, others). For the last few decades at least, they’ve been losing out to extensions/cleanups of the existing models.

It’s not obvious to me these thoughts about addressing will be different.

Pointer tagging for x86 systems

Posted Mar 29, 2022 10:38 UTC (Tue) by james (guest, #1325) [Link] (1 responses)

The big problem with "next" architectures is that they don't have access to all the decades of experience conventional architectures do, but the first generation is expected to be at least competitive. If the general conclusion is that "this would be good if they changed x, y and z", it's probably too late (especially if changing those features can't be done while keeping compatibility). I'm hoping that ARM has learnt that lesson and aren't trying to productise CHERI too soon.

Examples
Take MIPS and SPARC as successful "next" architectures -- tremendously influential, but with a number of features which later RISC designs dropped.

i432 -- Bob Colwell (one of the lead designers of the Pentium Pro and Pentium 4) is fascinating on the subject: he says that both hardware and software engineers dropped the ball on performance. If they hadn't -- if they'd got to within 50% of competitive on performance, with security and reliability improvements, there would have been a market for the chip.

Itanium -- the big lesson they should have learnt before they ploughed billions into hardware and software is "can we actually provide compilers that do what we say they can", but it's at least arguable there were other mistakes. I seem to remember that when compilers scheduled Itanium programs, they used cycle timings from the current processors -- in particular, fast level 1 cache. That meant that a (theoretical) high-clocking Itanium with caches that were slower in terms of cycles (but not in nanoseconds) would spend a disproportionate amount of time waiting for data from cache when running existing binaries, so Intel didn't produce a system like that.

I really don't think that Alpha counts as "next" for anything other than performance -- and dominating the CPU performance tables throughout the 1990s, then getting dropped for business reasons, doesn't exactly count as an engineering failure.

Pointer tagging for x86 systems

Posted Mar 29, 2022 13:24 UTC (Tue) by farnz (subscriber, #17727) [Link]

CHERI has a decent chance, because it's designed to let you have a "legacy" capability that gives code all the permissions it had before CHERI came into existence; the theory is that you'll start your porting with all of the kernel and userspace pointers living in the legacy capability, and then gradually narrow things down over time, rather than having to do a big bang port. The idea is that you can put capabilities in place at the edges, and move inwards over time, to have a tiny trustworthy core that has "full" capabilities, and that simply restricts the capabilities on offer as you head further from the core.

One more thing I'd add to your Itanium example; the simulations that justified Itanium's design compared hand-written "perfect" Itanium code to current compiler output for x86. With hindsight, the thing that they could easily have seen up-front and didn't spot is that the compiler improvements needed for EPIC also improved performance of compiled code for OoOE, and thus their predicted performance advantage wasn't nearly as good as it could be because OoOE also benefited from their compiler improvements.

Pointer tagging for x86 systems

Posted Mar 28, 2022 23:21 UTC (Mon) by jrtc27 (subscriber, #107748) [Link]

> x86-CFI, ARM memory tagging extensions and CHERI

CHERI is not like the other two, it *is* full capability-based addressing; the C stands for Capability

Pointer tagging for x86 systems

Posted Mar 28, 2022 22:11 UTC (Mon) by JoeBuck (subscriber, #2330) [Link] (9 responses)

> Perhaps the most fundamental of those is that UAI allows the most-significant bit of the address to be used by user space. In current systems, only kernel-space addresses have that bit set. Turning on UAI would allow user space to create pointer values that look like kernel addresses, but which would actually be valid user-space pointers.

Couldn't this be changed so that the most significant non-UAI bit, instead of the most significant bit, would be used to distinguish kernel addresses?

Pointer tagging for x86 systems

Posted Mar 28, 2022 23:36 UTC (Mon) by dullfire (guest, #111432) [Link] (8 responses)

That would add significant overhead to basically all code paths that need to check a pointer's validity

(to be clear: any overhead to those code paths will be significant)

Pointer tagging for x86 systems

Posted Mar 29, 2022 16:36 UTC (Tue) by imMute (guest, #96323) [Link] (7 responses)

How would checking if bit 47 was set be any more expensive than checking bit 63?

Pointer tagging for x86 systems

Posted Mar 29, 2022 16:44 UTC (Tue) by farnz (subscriber, #17727) [Link] (5 responses)

Bit 63 of a 64 bit register is the sign bit if you're interpreting the register contents as a signed integer; it thus has special handling to make it easier to check. test rax, rax ; jl error will jump to the label error if the pointer in rax has bit 63 set. test doesn't support a 64 bit immediate, so you need to free up a register, or accept the slowdown from accessing memory, if you want to check any bits other than 63.

Pointer tagging for x86 systems

Posted Mar 29, 2022 22:00 UTC (Tue) by khim (subscriber, #9252) [Link] (3 responses)

Why would you need test for that? Just use bt rax, 47; jc error and that's it.

Pointer tagging for x86 systems

Posted Mar 30, 2022 8:33 UTC (Wed) by farnz (subscriber, #17727) [Link] (2 responses)

Because test is friendlier to the OoOE machinery on modern CPUs than bt, and hence faster to execute. As this is pure overhead in the absence of bugs and/or malicious code attacking the kernel, we want it to be as lightweight as possible so as to spend more CPU resource doing useful work, and less CPU resource validating that userspace hasn't gone insane.

Using Agner's instruction table PDF, test reg, reg has had a lower reciprocal throughput than bt reg, imm since Ivy Bridge on Intel's high performance side, and, from Haswell up to the latest Xeons, it can be executed on more execution ports than bt reg, imm. On the low power Intel side (Atom from Silvermont onwards), test reg, reg takes one execution unit instead of both on Silvermont, and has 3x the throughput on Goldmont.

On the AMD side, test reg, reg becomes higher throughput than bt reg, imm in Excavator cores, returns to being equally cheap for Zen 1 and Zen 2, and then test reg, reg becomes cheaper than bt reg, imm in Zen 3. In the low power cores (Bobcat, Jaguar), the cost is the same for either instruction.

Hence the preference for test reg, reg over bt reg, imm - there are no CPUs on Agner's list where bt is faster than test, but there are several cores, including the current high performance µarches from Intel and AMD, where test is cheaper to execute than bt.

Pointer tagging for x86 systems

Posted Mar 30, 2022 10:21 UTC (Wed) by khim (subscriber, #9252) [Link] (1 responses)

I'm not asking why you would use the 63rd bit instead of the 47th if you have a choice.

But we are discussing AMD-only features here, and you say that one would need to free up a register, or accept the slowdown from accessing memory. That's not true. If these future CPUs have a fast enough bt then there would be no slowdown (at least for a CPU-specific kernel build, which is often acceptable for servers or things like ChromeOS).

Yes, you probably couldn't build a universal kernel that supports both UAI and Intel CPUs, but that's another, separate, issue.

Pointer tagging for x86 systems

Posted Mar 30, 2022 13:59 UTC (Wed) by farnz (subscriber, #17727) [Link]

If we change future CPUs to have a fast bt, then yes, we can use it instead of the test instruction that's faster on AMD EPYC processors.

But if we change UAI to not use bit 63 as part of the tag, then we could avoid the whole problem, too. And given that both Intel and AMD have changed from having bt reg, imm be as fast as test reg, reg to having test reg, reg be faster than bt reg, imm, I think a fix to UAI is almost certainly the simpler route.

This is especially true because UAI is the new thing - if making bt fast was worthwhile for things other than bt, then we'd have done it already. Making test reg, reg fast is worthwhile because it's a common idiom used by compilers for testing the relationship between a register and 0, so having it be fast speeds up other code.

Plus, your example code bt reg, 47 is buggy in its own right if AMD ever implement 5-level paging, and is buggy on Intel chips that exist today with 5 level paging. And, on top of that, if AMD do implement 5 level paging, UAI would leave no bit that can be uniquely used to distinguish kernel and user addresses, so there's no way to make the bt solution work reliably (with 5 level paging, bits 56 to 0 are VA bits, leaving 7 bits at bit 57 to 63 not translated, but UAI permits userspace processes to convert bits 63 to 57 into tag bits). Intel, at least, left bit 63 spare, and said that you get 6 tag bits with 5 level paging, or 15 with 4 level.

Pointer tagging for x86 systems

Posted Apr 1, 2022 7:57 UTC (Fri) by ecm (subscriber, #129897) [Link]

While jl may work to branch depending on the most significant bit, the idiomatic choice is js/jns, "jump if (not) sign bit set".

Pointer tagging for x86 systems

Posted Mar 30, 2022 13:51 UTC (Wed) by adobriyan (subscriber, #30858) [Link]

"test rax, rax; js" is smaller than "bt rax, 47; jc"

    0000000000000000 <f>:
       0:   48 85 c0          test   rax,rax
       3:   48 0f ba e0 3f    bt     rax,0x3f

why not use low bit instead ?

Posted Mar 30, 2022 16:00 UTC (Wed) by ballombe (subscriber, #9523) [Link] (7 responses)

Why not use the low bits instead?
After all, unaligned accesses are not supported anymore, so all pointers start with 3 zero bits.

why not use low bit instead ?

Posted Mar 30, 2022 17:17 UTC (Wed) by mathstuf (subscriber, #69389) [Link] (4 responses)

That means you need to manually mask off the bits before any actual dereference instead of having the hardware provide "I ignore the upper bits" support.

why not use low bit instead ?

Posted Apr 3, 2022 9:57 UTC (Sun) by dcoutts (subscriber, #5387) [Link] (2 responses)

This is exactly how GHC's pointer tagging works.

https://gitlab.haskell.org/ghc/ghc/-/wikis/commentary/rts...

why not use low bit instead ?

Posted Apr 3, 2022 12:08 UTC (Sun) by mathstuf (subscriber, #69389) [Link] (1 responses)

That's fine in a language which hides raw pointers from you in the first place. I think I'd prefer using the upper bits in C if the CPU had support for ignoring them for me rather than asking "umm, did I remember to mask this out properly?" before any pointer dereference.

why not use low bit instead ?

Posted Apr 4, 2022 12:01 UTC (Mon) by dcoutts (subscriber, #5387) [Link]

Indeed, it would be a nightmare in C. I guess it'd be doable in C++.

This hardware feature is almost certainly for performance though, not convenience. My guess is that it's primarily aimed at JVMs and similar.

I don't know for sure, but I'd guess that doing pointer tagging in software (and thus having to untag before dereferencing) is cheaper to do for the low bits than the high bits. That is, cheaper in terms of the extra instructions and their sizes. But then when doing it in hardware, a hardware impl can do it cheaply either way, and given that there's more bits available at the high end, it makes sense to use the high bits.

why not use low bit instead ?

Posted Apr 6, 2022 6:51 UTC (Wed) by anton (subscriber, #25547) [Link]

In many cases you know when using the address what the tag is, and then you can just use an offset at no or very low extra cost. E.g., if tag 3 means that we have a pointer to a cons cell, then car (aka head) accesses the machine word at offset -3, while cdr (tail) accesses the word at offset 5.

Low-bit tagging is used when 3, maybe 4 bits of tags are enough. If you need more, it becomes impractical, and you use high-bit tagging.
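
The offset trick described above can be sketched as follows. The tag value 3 and the car/cdr offsets of -3 and +5 come from the example in this comment; the `struct cons` layout, the function names, and the use of 8-byte words are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define CONS_TAG 3u   /* assumed tag value meaning "pointer to a cons cell" */

/* Two 8-byte machine words. On typical 64-bit ABIs this structure is
 * 8-byte aligned, so the low three bits of its address are zero. */
struct cons {
    int64_t car;
    int64_t cdr;
};

/* Fold the tag into the low bits of the cell's address. */
static inline uintptr_t tag_cons(const struct cons *c)
{
    assert(((uintptr_t)c & 7u) == 0);
    return (uintptr_t)c | CONS_TAG;
}

/* The tag is known statically, so it is folded into the access offset
 * rather than masked off: car lives at -3, cdr at +5. */
static inline int64_t cons_car(uintptr_t v)
{
    return *(const int64_t *)(v - 3);
}

static inline int64_t cons_cdr(uintptr_t v)
{
    return *(const int64_t *)(v + 5);
}
```

Since most instruction sets can encode a small constant displacement in a load, the untagging here costs no extra instruction.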

why not use low bit instead ?

Posted Mar 30, 2022 23:25 UTC (Wed) by neilbrown (subscriber, #359) [Link]

When accessing a single-byte (e.g. ASCII character) the low bit is a meaningful bit.

why not use low bit instead ?

Posted Apr 1, 2022 8:13 UTC (Fri) by marcH (subscriber, #57642) [Link]

> After all unaligned accesses are not supported anymore

Says who?

> so all pointers start with 3 zero bits.

Yes, as long as you use only 64-bit values.

Pointer tagging for x86 systems

Posted Apr 1, 2022 8:07 UTC (Fri) by marcH (subscriber, #57642) [Link]

> It sounds like a recipe for ongoing security problems, ...
> ....
> In the early days of Linux, kernel developers had to adapt to whatever the hardware manufacturers put out; the alternative was to not have hardware to run on at all. In 2022, though, those developers feel more confident in their ability to reject support for hardware features that, for whatever reason, they feel do not fit in well with the design of the system.

In the early days, software developers trusted CPU designs. Sure, there were a couple of bugs now and then, but nothing huge. Then came Spectre and friends...

Pointer tagging for x86 systems

Posted Apr 27, 2022 8:35 UTC (Wed) by cavok (subscriber, #33216) [Link]

Wouldn't all this pointer tagging affect address randomization as well?