--
JaredSmolens - 24 Jul 2007
Introduction
This section explains how to add a special ASI (address space
identifier) to the OpenSPARC T1 instruction set. ASIs are useful for
exposing internal registers to user-level and privileged software.
Because ASI accesses take the form of 64-bit load and store
instructions, they offer a convenient and flexible way to read and
write both small and large internal structures.
Background
ASI accesses look very much like normal load and store instructions to
the assembly programmer. For example, the following program writes the
value in architectural register %l0 to virtual address (VA) 0x8 in
ASI 0x1a (implemented later in this document) and then reads from the
same location into architectural register %l1:
#define ASI_EXAMPLE 0x1a
setx 0x08, %g2, %g1
setx 0xdeadbeef0, %g2, %l0
stxa %l0, [%g1] ASI_EXAMPLE
ldxa [%g1] ASI_EXAMPLE, %l1
For simplicity, this section uses a "non-translating" ASI, where the
hardware uses the supplied virtual address directly. This is generally
the best choice for internal registers. Other types of ASIs, which map
the address to a real or physical address, are also available but
require more work (see chapter 10.2 in UltraSPARC Architecture 2005
for more information).
In this section, we piggyback on the existing interface to the
scratchpad registers (ASIs 0x20 and 0x4f for the privileged and
hyperprivileged registers, respectively) to provide a read-write
interface to ASI 0x1a (an unallocated ASI that can be accessed by both
privileged and hyperprivileged programs). The Load-Store Unit (LSU)
is responsible for decoding the ASI and routing load and store data
between internal registers and the general-purpose registers.
Therefore, we concentrate our changes within the LSU. The Trap
Logic Unit (TLU) contains the SRAM for the actual scratchpad registers,
but we do not need to modify the TLU in this example.
To process an ASI access, we use the following existing signals (all
available at the "sparc" level of the RTL hierarchy). For simplicity,
we have chosen control signals from the E pipeline stage, although
signals at later stages are sometimes available as well. Write data is
only available in the W/G stage. Note: the pipeline stages generally follow
this convention: F - S - D - E - M - W/G - W2.
| Name | Description |
| ifu_lsu_alt_space_e | Decode signal indicating an ASI load or store |
| ifu_lsu_ld_inst_e | Decode signal indicating a load |
| ifu_lsu_st_inst_e | Decode signal indicating a store |
| ifu_tlu_thrid_e | Decode signal indicating the current Thread ID |
| lsu_spu_asi_state_e[7:0] | LSU signal specifying the ASI number |
| exu_lsu_ldst_va_e[47:0] | Virtual address for the ASI access |
| lsu_tlu_rs3_data_g[63:0] | Write data for ASI store instructions (to your internal registers) |
We also create a set of new signals, which ASI load instructions use
to return data to the LSU. These signals are asserted in the W2
stage. Whenever a load to our ASI executes, the valid signal must be
asserted; otherwise the load instruction never completes (the RTL
model eventually halts with a timeout).
| Name | Description |
| archfp_lsu_ldxa_vld_w2 | ASI load data valid signal |
| archfp_lsu_ldxa_tid_w2[1:0] | ASI load's Thread ID |
| archfp_lsu_ldxa_data_w2[63:0] | ASI load data (from your internal registers) |
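Because a missing valid signal hangs the load, it can help to catch the problem in simulation before the timeout does. The following is a minimal simulation-only sketch, not part of the OpenSPARC RTL. The module name archfp_ldxa_check and its internal names are hypothetical, and it assumes the pipeline advances one stage per clock from E to W2 with no stalls or flushes, which the real core does not guarantee.

```verilog
// Hypothetical simulation-only checker (not part of the OpenSPARC RTL).
// Assumes one stage per clock from E to W2 and no stalls or flushes.
module archfp_ldxa_check (
  input wire       clk,
  input wire       ifu_lsu_alt_space_e,
  input wire       ifu_lsu_ld_inst_e,
  input wire [7:0] lsu_spu_asi_state_e,
  input wire       archfp_lsu_ldxa_vld_w2
);
  // Decode of a load to the new ASI (0x1A) in the E stage.
  wire asi_ld_e = ifu_lsu_alt_space_e & ifu_lsu_ld_inst_e &
                  ( lsu_spu_asi_state_e == 8'h1A );

  // ld_pipe[0] = M, ld_pipe[1] = W/G, ld_pipe[2] = W2
  reg [2:0] ld_pipe;
  initial ld_pipe = 3'b000;

  always @(posedge clk) begin
    ld_pipe <= { ld_pipe[1:0], asi_ld_e };
    // Sampled at the end of the W2 cycle of each tracked load.
    if ( ld_pipe[2] & ~archfp_lsu_ldxa_vld_w2 )
      $display("%t: ASI 0x1A load reached W2 without archfp_lsu_ldxa_vld_w2",
               $time);
  end
endmodule
```

A checker like this would be instantiated alongside the design in the testbench only. If the core can kill an instruction between E and W2, the check would need the same qualification or it will report false failures.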
Implementation
We must now create a module ("archfp" in this example) that pipes the
control signals to the appropriate stages and acts upon them (here,
writing to a register for the ASI store and reading from the same
register for the ASI load). We must also tell the LSU that ASI
0x1a is now a valid ASI and route the loaded value through the LSU's
bypass network.
We use the signals above to interface with an example internal
register in the following simplified Verilog module, which we
instantiate in the top-level "sparc" Verilog module.
module archfp ( // Inputs
clk,
ifu_lsu_alt_space_e,
ifu_lsu_ld_inst_e,
ifu_lsu_st_inst_e,
ifu_tlu_thrid_e,
lsu_spu_asi_state_e,
exu_lsu_ldst_va_e,
lsu_tlu_rs3_data_g,
// Outputs
archfp_lsu_ldxa_vld_w2,
archfp_lsu_ldxa_tid_w2,
archfp_lsu_ldxa_data_w2 );
// ... [input/output/wire declarations] ...
// Enable signals for loading/storing our ASI in the E stage
assign asi_ld_e = ifu_lsu_alt_space_e & ifu_lsu_ld_inst_e &
( lsu_spu_asi_state_e == 8'h1A );
assign asi_st_e = ifu_lsu_alt_space_e & ifu_lsu_st_inst_e &
( lsu_spu_asi_state_e == 8'h1A );
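// ... [Pipe asi_ld_e to stage W2 and asi_st_e to stage W/G] ...
//     [through stages E -> M -> W/G -> W2, giving asi_ld_w2 and asi_st_g]
// ... [Pipe ifu_tlu_thrid_e to stage W2 as ifu_tlu_thrid_w2] ...
//
// The VA can also be piped for special operations based upon the VA.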
// Our internal flop which is written by the store ASI and
// read by the load ASI instructions
dffe #(64)
internal_ff ( .clk ( clk ), .en ( asi_st_g ),
.din ( lsu_tlu_rs3_data_g ),
.q ( archfp_lsu_ldxa_data_w2 ) );
// Valid signal and TID asserted at the appropriate stage
assign archfp_lsu_ldxa_vld_w2 = asi_ld_w2;
assign archfp_lsu_ldxa_tid_w2 = ifu_tlu_thrid_w2;
endmodule
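The staging elided in the module above could look roughly like the following sketch. It is an illustration under assumptions, not the original author's code: the module name archfp_pipe and the intermediate register names are hypothetical, plain behavioral flops stand in for the OpenSPARC flop library, and any flush or kill qualification that a real design would likely need for trapping instructions is not shown.

```verilog
// Hypothetical staging logic for the archfp example.
// Moves the E-stage decodes and thread ID down the pipeline:
//   E -> M -> W/G -> W2, one stage per clock.
module archfp_pipe (
  input  wire       clk,
  input  wire       asi_ld_e,          // ASI load decode, E stage
  input  wire       asi_st_e,          // ASI store decode, E stage
  input  wire [1:0] ifu_tlu_thrid_e,   // thread ID, E stage
  output reg        asi_st_g,          // store enable, W/G stage
  output reg        asi_ld_w2,         // load valid, W2 stage
  output reg  [1:0] ifu_tlu_thrid_w2   // thread ID, W2 stage
);
  reg       asi_ld_m, asi_ld_g, asi_st_m;
  reg [1:0] thrid_m, thrid_g;

  always @(posedge clk) begin
    // E -> M
    asi_ld_m <= asi_ld_e;
    asi_st_m <= asi_st_e;
    thrid_m  <= ifu_tlu_thrid_e;
    // M -> W/G (the store is consumed here, where write data is valid)
    asi_ld_g <= asi_ld_m;
    asi_st_g <= asi_st_m;
    thrid_g  <= thrid_m;
    // W/G -> W2 (the load reply is driven here)
    asi_ld_w2        <= asi_ld_g;
    ifu_tlu_thrid_w2 <= thrid_g;
  end
endmodule
```

The store decode stops at W/G because that is the only stage where lsu_tlu_rs3_data_g is available, while the load decode and thread ID continue to W2, where the reply signals are asserted.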
Next, in order to tell the LSU that ASI 0x1a is now valid, edit the file
sparc/lsu/rtl/lsu_asi_decode.v and locate the assign statement for
asi_internal_d. Add a condition for the new ASI value:
assign asi_internal_d =
(asi_d[7:0] == 8'h1A) |
[ ... remainder of original assign ... ];
Route the new signals archfp_lsu_ldxa_vld_w2, archfp_lsu_ldxa_tid_w2 and
archfp_lsu_ldxa_data_w2 through the lsu module into lsu_qdp1 (data and vld
signals) and lsu_dctl (vld and tid signals).
In lsu_dctl, first locate the assign statements for
lmq_byp_data_fmx_sel[3:0]. This signal is normally asserted when the
TLU processes an ASI load instruction. We will also assert it when
our module replies to an ASI load (there is no conflict here because
only one ASI load instruction can be in each pipeline stage at a
time). The final code looks like this:
assign lmq_byp_data_fmx_sel[0] = ( int_ldxa_vld | archfp_lsu_ldxa_vld_w2 ) & thread0_w2 ;
assign lmq_byp_data_fmx_sel[1] = ( int_ldxa_vld | archfp_lsu_ldxa_vld_w2 ) & thread1_w2 ;
assign lmq_byp_data_fmx_sel[2] = ( int_ldxa_vld | archfp_lsu_ldxa_vld_w2 ) & thread2_w2 ;
assign lmq_byp_data_fmx_sel[3] = ( int_ldxa_vld | archfp_lsu_ldxa_vld_w2 ) & thread3_w2 ;
Also in lsu_dctl, locate the assign for ldxa_thrid_w2[1:0] and mux the existing TID from the TLU
with the new TID from our unit:
// ldxa thread id
//assign ldxa_thrid_w2[1:0] = tlu_lsu_ldxa_tid_w2[1:0] ; // Removed: original TID assignment
mux2ds #(2) // Added mux
mux_ldxa_thrid ( .dout ( ldxa_thrid_w2 ),
.in0 ( tlu_lsu_ldxa_tid_w2[1:0] ),
.in1 ( archfp_lsu_ldxa_tid_w2[1:0] ),
.sel0 ( ~archfp_lsu_ldxa_vld_w2 ),
.sel1 ( archfp_lsu_ldxa_vld_w2 ) );
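Finally, in lsu_qdp1, we mux our load value with the TLU's value, using our
valid signal to distinguish the two requests, and route the result into
the existing ldbyp0_fmx mux. Only the thread 0 mux is shown here; the
bypass muxes for the other threads (selected by lmq_byp_data_fmx_sel[3:1])
presumably need the same input change.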
wire [63:0] ldxa_data_w2;
mux2ds #(64) ldbyp0_archfp ( .in0 ( tlu_lsu_int_ldxa_data_w2[63:0] ),
.in1 ( archfp_lsu_ldxa_data_w2[63:0] ),
.sel0 ( ~archfp_lsu_ldxa_vld_w2 ),
.sel1 ( archfp_lsu_ldxa_vld_w2 ),
.dout ( ldxa_data_w2 ) );
// Existing mux
mux2ds #(64) ldbyp0_fmx (
.in0 (lmq0_bypass_misc_data[63:0]),
.in1 (ldxa_data_w2[63:0]), // We changed this input from tlu_lsu_int_ldxa_data_w2[63:0]
.sel0 (~lmq_byp_data_fmx_sel[0]),
.sel1 (lmq_byp_data_fmx_sel[0]),
.dout (lmq0_bypass_data_in[63:0]) );
Conclusion
This implements a new ASI load/store in the OpenSPARC T1. In this
example, we have avoided processing the virtual address (it could,
for example, address entries in an SRAM or select among other internal
registers). We have also ignored validity checking on the VA
(an invalid address could raise an exception). This implementation also
requires strict timing on responses to ASI loads. ASIs with
variable-latency responses can also be implemented.
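As one illustration of using the virtual address, the single flop could be replaced by a small register file indexed by VA bits. This is a sketch only: the module name archfp_regs, the choice of eight registers, and the inputs va_g and va_w2 (the VA from exu_lsu_ldst_va_e, assumed to be piped to the W/G and W2 stages by logic not shown) are assumptions for illustration, and no address validity checking is done.

```verilog
// Hypothetical VA-indexed register file for the archfp example.
// Eight 64-bit registers, selected by VA bits [5:3] (8-byte aligned),
// so VA 0x8 from the assembly example selects entry 1.
module archfp_regs (
  input  wire        clk,
  input  wire        asi_st_g,            // store enable, W/G stage
  input  wire [47:0] va_g,                // VA piped to W/G (assumed)
  input  wire [47:0] va_w2,               // VA piped to W2 (assumed)
  input  wire [63:0] lsu_tlu_rs3_data_g,  // store data, W/G stage
  output wire [63:0] archfp_lsu_ldxa_data_w2
);
  reg [63:0] regs [0:7];

  // ASI store: write the addressed entry in the W/G stage.
  always @(posedge clk)
    if ( asi_st_g )
      regs[ va_g[5:3] ] <= lsu_tlu_rs3_data_g;

  // ASI load: read the addressed entry combinationally in W2.
  assign archfp_lsu_ldxa_data_w2 = regs[ va_w2[5:3] ];
endmodule
```

VA bits above bit 5 are ignored here, so addresses alias every 64 bytes. A fuller version would check them and report an invalid address.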