GEM5 Garnet Standalone packet injection pattern: the generation path of a Garnet packet (packet, message, flit)

The complete flow

// Modeling different coherence message types over different message classes.
//
// GarnetSyntheticTraffic uses the Garnet_standalone coherence protocol,
// which models three message classes / virtual networks.
// They are: request, forward, response.
// Requests and forwards are "control" packets (typically 8 bytes),
// while responses are "data" packets (typically 72 bytes).
//
// Life of a packet from the tester into the network:
// (1) generatePkt() generates a packet of one of the following
//     3 types (randomly): ReadReq, INST_FETCH, WriteReq
// (2) mem/ruby/system/RubyPort.cc converts these into RubyRequestType_LD,
//     RubyRequestType_IFETCH and RubyRequestType_ST respectively
// (3) mem/ruby/system/Sequencer.cc sends these to the cache controllers
//     of the coherence protocol.
// (4) Network_test-cache.sm tags RubyRequestType:LD,
//     RubyRequestType:IFETCH and RubyRequestType:ST as
//     request, forward and response events respectively,
//     and injects them into virtual networks 0, 1 and 2 respectively.
//     It immediately calls back the sequencer.
// (5) The packet traverses the network (simple/garnet) and reaches its
//     destination (directory), and the network stats are updated.
// (6) Network_test-dir.sm simply drops the packet.

First start Docker, then cd into the gem5 directory.

sudo docker run -u $UID:$GID --volume  /home/yz/myprojects/2024GEM5/parsec-tests/yzmodifiedgem5/:/gem5  --rm -it gcr.io/gem5-test/ubuntu-22.04_all-dependencies:v22-1 #docker run -u $UID:$GID --volume :/gem5 --rm -it 

Following the official docs, build Garnet standalone:

scons build/NULL/gem5.debug PROTOCOL=Garnet_standalone

Example command line:

./build/NULL/gem5.debug configs/example/garnet_synth_traffic.py  \
        --num-cpus=16 \
        --num-dirs=16 \
        --network=garnet2.0 \
        --topology=Mesh_XY \
        --mesh-rows=4  \
        --sim-cycles=1000 \
        --synthetic=uniform_random \
        --injectionrate=0.01

Reading garnet_synth_traffic.py

In this Python config file, the CPU type is GarnetSyntheticTraffic:

cpus = [
    GarnetSyntheticTraffic(
        num_packets_max=args.num_packets_max,
        single_sender=args.single_sender_id,
        single_dest=args.single_dest_id,
        sim_cycles=args.sim_cycles,
        traffic_type=args.synthetic,
        inj_rate=args.injectionrate,
        inj_vnet=args.inj_vnet,
        precision=args.precision,
        num_dest=args.num_dirs,
    )
    for i in range(args.num_cpus)
]

In src/cpu/testers/garnet_synthetic_traffic/GarnetSyntheticTraffic.py, the C++ class is exposed via pybind:

class GarnetSyntheticTraffic(ClockedObject):
    type = "GarnetSyntheticTraffic"
    cxx_header = (
        "cpu/testers/garnet_synthetic_traffic/GarnetSyntheticTraffic.hh"
    )
    cxx_class = "gem5::GarnetSyntheticTraffic"

The long list of keyword arguments in garnet_synth_traffic.py (e.g. num_packets_max=args.num_packets_max) is passed to C++ as const Params &p, and in src/cpu/testers/garnet_synthetic_traffic/GarnetSyntheticTraffic.cc the member variables are initialized at construction time:

GarnetSyntheticTraffic::GarnetSyntheticTraffic(const Params &p)
    : ClockedObject(p),
      tickEvent([this]{ tick(); }, "GarnetSyntheticTraffic tick",
                false, Event::CPU_Tick_Pri),
      cachePort("GarnetSyntheticTraffic", this),
      retryPkt(NULL),
      size(p.memory_size),
      blockSizeBits(p.block_offset),
      numDestinations(p.num_dest),
      simCycles(p.sim_cycles),
      numPacketsMax(p.num_packets_max),
      numPacketsSent(0),
      singleSender(p.single_sender),
      singleDest(p.single_dest),
      trafficType(p.traffic_type),
      injRate(p.inj_rate),
      injVnet(p.inj_vnet),
      precision(p.precision),
      responseLimit(p.response_limit),
      requestorId(p.system->getRequestorId(this))
{
    // ... (constructor body omitted)
}

In C++, the colon (:) after a constructor introduces the member initializer list, which initializes member variables and base classes. The initializer list follows the constructor declaration and runs before the constructor body executes. In the code above, the GarnetSyntheticTraffic constructor uses an initializer list to initialize its members and its base class.

Explanation of the code:

ClockedObject(p): this calls the constructor of the base class ClockedObject, passing the parameter p (a Params struct) to initialize the base-class part of the GarnetSyntheticTraffic object.

Each subsequent entry (e.g., tickEvent([this]{ tick(); }, "GarnetSyntheticTraffic tick", false, Event::CPU_Tick_Pri)) initializes one member variable with a specific value or expression. For example:

- tickEvent is initialized with a lambda, a description string, a bool, and an event priority.
- cachePort is initialized with a string and the this pointer (pointing to the current object).
- numPacketsMax is an int, initialized with the value p.num_packets_max.
- The last member, requestorId, is initialized with the return value of p.system->getRequestorId(this).

A minimal standalone example of an initializer list follows.
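To make the mechanism concrete, here is a minimal, self-contained C++ sketch of a member initializer list. Widget and its members are invented for illustration and have nothing to do with gem5.

```cpp
#include <string>

class Widget
{
  public:
    Widget(int count, const std::string &label)
        : count_(count),       // members initialized before the body runs
          label_(label),
          doubled_(count * 2)  // initializers may be arbitrary expressions
    {
        // the constructor body executes only after all initializers
    }

  private:
    int count_;
    std::string label_;
    int doubled_;
};

int main()
{
    Widget w(3, "demo");
    (void)w;
    return 0;
}
```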

Command-line args and running

Generating the pkt

The command line has some args, e.g. --injectionrate=0.01 and --synthetic=uniform_random, which are passed on to Ruby's create_system:
Ruby.create_system(args, False, system)

For example, uniform_random ends up in src/cpu/testers/garnet_synthetic_traffic/GarnetSyntheticTraffic.cc (presumably via the traffic_type=args.synthetic parameter shown in the config above, which arrives as p.traffic_type).
Code in GarnetSyntheticTraffic.cc:
else if (traffic == UNIFORM_RANDOM_) {
    destination = random_mt.random(0, num_destinations - 1);
}
GarnetSyntheticTraffic::generatePkt() then creates a different kind of req depending on the injected vnet.

The req is wrapped into a packet: PacketPtr pkt = new Packet(req, requestType);

sendPkt(pkt); sends it out.
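Putting those pieces together, here is an abbreviated paraphrase of generatePkt(), not the verbatim gem5 code: the member names (numDestinations, blockSizeBits, injVnet, requestorId, random_mt) are the ones seen in the constructor above, it assumes --inj-vnet selects a specific vnet (the real code also handles the random case), the vnet-1 ifetch branch is simplified, and DPRINTF/bookkeeping is omitted.

```cpp
// Abbreviated paraphrase of GarnetSyntheticTraffic::generatePkt();
// see the real code in GarnetSyntheticTraffic.cc for the exact details.
void
GarnetSyntheticTraffic::generatePkt()
{
    // pick a destination, e.g. for uniform_random traffic
    unsigned destination = random_mt.random(0, numDestinations - 1);

    // the destination id is embedded in the address above the block offset
    Addr paddr = destination;
    paddr <<= blockSizeBits;
    unsigned access_size = 1;   // size does not matter for the network test

    MemCmd::Command requestType;
    RequestPtr req = nullptr;
    Request::Flags flags;

    if (injVnet == 0) {         // vnet 0: request class (control, ReadReq)
        requestType = MemCmd::ReadReq;
        req = std::make_shared<Request>(paddr, access_size, flags, requestorId);
    } else if (injVnet == 1) {  // vnet 1: forward class (control, ifetch)
        requestType = MemCmd::ReadReq;
        flags.set(Request::INST_FETCH);
        // the real code builds the ifetch request slightly differently
        req = std::make_shared<Request>(paddr, access_size, flags, requestorId);
    } else {                    // vnet 2: response class (data, WriteReq)
        requestType = MemCmd::WriteReq;
        req = std::make_shared<Request>(paddr, access_size, flags, requestorId);
    }

    // wrap the request into a packet and hand it to sendPkt()
    PacketPtr pkt = new Packet(req, requestType);
    pkt->dataDynamic(new uint8_t[req->getSize()]);
    sendPkt(pkt);
}
```

The key point is that the destination ID is carried in the packet's address bits above the block offset, which is how the protocol side later knows which directory the message should travel to.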

How sendPkt(pkt) sends the packet

void
GarnetSyntheticTraffic::sendPkt(PacketPtr pkt)
{
    if (!cachePort.sendTimingReq(pkt)) {
        retryPkt = pkt; // RubyPort will retry sending
    }
    numPacketsSent++;
}
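A note on the retry path: if sendTimingReq returns false, the packet is parked in retryPkt, and the responder later calls recvReqRetry() on the port. Paraphrasing from memory of the same file (the member names tester, retryPkt and cachePort are assumed from context, so double-check against GarnetSyntheticTraffic.cc/.hh):

```cpp
// Paraphrased retry path: the response side calls recvReqRetry() on the
// port once it can accept a request again, and the tester re-sends the
// parked packet.
void
GarnetSyntheticTraffic::CpuPort::recvReqRetry()
{
    tester->doRetry();
}

void
GarnetSyntheticTraffic::doRetry()
{
    if (cachePort.sendTimingReq(retryPkt)) {
        retryPkt = NULL;   // the parked packet finally went through
    }
}
```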

simplePort: how cachePort.sendTimingReq(pkt) sends

The cachePort.sendTimingReq(pkt) call (inside the if condition above) uses sendTimingReq.

The details of this function are in src/mem/port.hh:

bool
RequestPort::sendTimingReq(PacketPtr pkt)
{
    try {
        addTrace(pkt);
        bool succ = TimingRequestProtocol::sendReq(_responsePort, pkt);
        if (!succ)
            removeTrace(pkt);
        return succ;
    } catch (UnboundPortException) {
        reportUnbound();
    }
}

Here, RequestPort::sendTimingReq in src/mem/port.hh passes _responsePort into TimingRequestProtocol::sendReq, where it becomes *peer.
So from sendTimingReq in port.hh, the next thing to look at is sendReq in src/mem/protocol/timing.cc.

How TimingRequestProtocol::sendReq in src/mem/protocol/timing.cc sends

In src/mem/protocol/timing.cc, sendReq is defined, and inside sendReq it calls peer->recvTimingReq(pkt).

/* The request protocol. */

bool
TimingRequestProtocol::sendReq(TimingResponseProtocol *peer, PacketPtr pkt)
{
    assert(pkt->isRequest());
    return peer->recvTimingReq(pkt);
}

Here the peer of TimingRequestProtocol is the TimingResponseProtocol class from src/mem/protocol/timing.hh. The official comment on these parameters reads:

 * @param peer Peer to send the packet to.
 * @param pkt Packet to send.

peer->recvTimingReq(pkt)

src/mem/ruby/system/RubyPort.cc then calls makeRequest:

bool
RubyPort::MemResponsePort::recvTimingReq(PacketPtr pkt)
{
    // ...
    // Submit the ruby request
    RequestStatus requestStatus = owner.makeRequest(pkt);
    // ...
}

makeRequest

In src/mem/ruby/system/Sequencer.cc, the Sequencer then performs an issueRequest:

RequestStatus
Sequencer::makeRequest(PacketPtr pkt)
{
    // ...
    // non-aliased with any existing request in the request table, just issue
    // to the cache
    if (status != RequestStatus_Aliased)
        issueRequest(pkt, secondary_type);

    // TODO: issue hardware prefetches here
    return RequestStatus_Issued;
}

The Sequencer's issueRequest

```cpp
// build a RubyRequest from the pkt
std::shared_ptr<RubyRequest> msg;
msg = /* construct the RubyRequest from pkt (address, size, type, ...) */;
// enqueue it into the mandatory queue
m_mandatory_q_ptr->enqueue(msg, clockEdge(), latency);
```

Actually, I found this spot by working backwards: a fellow CSDN reader pointed me to material saying the mandatoryQueue is the key place where a request enters the network, so I traced back from the mandatoryQueue to where the pkt is sent, and then wrote it up here in forward order. Otherwise the many recvTimingReq implementations in the appendix are easy to mix up.

## m_mandatory_q_ptr = m_controller->getMandatoryQueue(); in src/mem/ruby/system/RubyPort.cc

src/mem/ruby/system/RubyPort.hh declares its type as a message buffer: MessageBuffer* m_mandatory_q_ptr;
m_mandatory_q_ptr only ever gets enqueue calls here, never dequeue, which suggests that the dequeue happens elsewhere, where this mandatoryQueue is accessed under another name.
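The producer/consumer pattern around such a MessageBuffer is the same on both ends: the producer enqueues with a delay, and the consumer polls isReady(), peeks at the message, and dequeues once it has handled it (exactly the calls that appear in NetworkInterface::wakeup() below). Here is a toy, standalone sketch of that handshake; ToyMessageBuffer is invented for illustration, while gem5's real MessageBuffer (src/mem/ruby/network/MessageBuffer.hh) is far richer (ordering, stalls, capacity, randomization, etc.).

```cpp
#include <cstdint>
#include <cstdio>
#include <deque>
#include <memory>

using Tick = uint64_t;

struct Msg { int payload; };
using MsgPtr = std::shared_ptr<Msg>;

class ToyMessageBuffer
{
  public:
    // Producer side (cf. Sequencer::issueRequest): enqueue with a delay,
    // so the message only becomes visible after `delta` ticks.
    void enqueue(MsgPtr msg, Tick now, Tick delta)
    {
        queue_.push_back({now + delta, msg});
    }

    // Consumer side (cf. NetworkInterface::wakeup): is a message ready?
    bool isReady(Tick now) const
    {
        return !queue_.empty() && queue_.front().readyAt <= now;
    }

    MsgPtr peekMsgPtr() const { return queue_.front().msg; }

    void dequeue(Tick /*now*/) { queue_.pop_front(); }

  private:
    struct Entry { Tick readyAt; MsgPtr msg; };
    std::deque<Entry> queue_;
};

int main()
{
    ToyMessageBuffer mandatoryQueue;
    mandatoryQueue.enqueue(std::make_shared<Msg>(Msg{42}),
                           /*now=*/100, /*delta=*/1);

    Tick now = 101;
    if (mandatoryQueue.isReady(now)) {          // mirrors b->isReady(curTime)
        MsgPtr m = mandatoryQueue.peekMsgPtr(); // mirrors b->peekMsgPtr()
        std::printf("consumed msg %d\n", m->payload);
        mandatoryQueue.dequeue(now);            // mirrors b->dequeue(curTime)
    }
    return 0;
}
```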
 
## The NI checks the message buffer and calls flitisizeMessage



```cpp

void
NetworkInterface::wakeup()
{
    std::ostringstream oss;
    for (auto &oPort: outPorts) {
        oss << oPort->routerID() << "[" << oPort->printVnets() << "] ";
    }
    DPRINTF(RubyNetwork, "Network Interface %d connected to router:%s "
            "woke up. Period: %ld\n", m_id, oss.str(), clockPeriod());

    assert(curTick() == clockEdge());
    MsgPtr msg_ptr;
    Tick curTime = clockEdge();

    // Checking for messages coming from the protocol
    // can pick up a message/cycle for each virtual net
    for (int vnet = 0; vnet < inNode_ptr.size(); ++vnet) {
        MessageBuffer *b = inNode_ptr[vnet];
        if (b == nullptr) {
            continue;
        }

        if (b->isReady(curTime)) { // Is there a message waiting
            msg_ptr = b->peekMsgPtr();
            if (flitisizeMessage(msg_ptr, vnet)) {
                b->dequeue(curTime);
            }
        }
    }

    // ... rest of wakeup() omitted: flits arriving from the network,
    // credit handling, and scheduling the output link
}
```

At this point, a pkt has become a msg, been stored into a message buffer, and then been turned into flits that enter the NoC.
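How many flits a message becomes is essentially a ceiling division of the message size by the flit/link bit width; NetworkInterface::flitisizeMessage() does this conversion (and also has to grab a free virtual channel first, which is why it can return false). Below is a standalone sketch of just the arithmetic, assuming a flit width of 128 bits, which is a common default; check your own config for the actual widths.

```cpp
#include <cstdio>

// number of flits = ceil(message size in bits / link width in bits)
static int numFlits(int messageBytes, int linkWidthBits)
{
    int messageBits = messageBytes * 8;
    return (messageBits + linkWidthBits - 1) / linkWidthBits; // divCeil
}

int main()
{
    const int linkWidthBits = 128;   // assumed flit/link width (16 bytes)

    // Control messages (requests/forwards) are ~8 bytes.
    std::printf("control: %d flit(s)\n", numFlits(8, linkWidthBits));

    // Data messages (responses) are ~72 bytes.
    std::printf("data:    %d flit(s)\n", numFlits(72, linkWidthBits));

    return 0;
}
```

With these assumed widths, the ~8-byte control messages become single-flit packets and the ~72-byte data messages become 5-flit packets.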

What follows is an appendix, kept only for backup reference; you do not need to read it.

The related code below was not deleted and is kept only for reference.

The recvTimingReq(pkt) inside peer->recvTimingReq(pkt)

Using Go to Definition in VS Code lands in src/systemc/tlm_bridge/gem5_to_tlm.hh

bool
recvTimingReq(gem5::PacketPtr pkt) override
{
    return bridge.recvTimingReq(pkt);
}

or
util/tlm/src/sc_slave_port.cc


/**
 *  Similar to TLM's non-blocking transport (AT)
 */
bool
SCSlavePort::recvTimingReq(gem5::PacketPtr packet)
{
    CAUGHT_UP;

    panic_if(packet->cacheResponding(), "Should not see packets where cache "
             "is responding");

    panic_if(!(packet->isRead() || packet->isWrite()),
             "Should only see read and writes at TLM memory\n");


    /* We should never get a second request after noting that a retry is
     * required */
    sc_assert(!needToSendRequestRetry);

    /* Remember if a request comes in while we're blocked so that a retry
     * can be sent to gem5 */
    if (blockingRequest) {
        needToSendRequestRetry = true;
        return false;
    }

    /*  NOTE: normal tlm is blocking here. But in our case we return false
     *  and tell gem5 when a retry can be done. This is the main difference
     *  in the protocol:
     *  if (requestInProgress)
     *  {
     *      wait(endRequestEvent);
     *  }
     *  requestInProgress = trans;
    */

    /* Prepare the transaction */
    tlm::tlm_generic_payload * trans = mm.allocate();
    trans->acquire();
    packet2payload(packet, *trans);

    /* Attach the packet pointer to the TLM transaction to keep track */
    Gem5Extension* extension = new Gem5Extension(packet);
    trans->set_auto_extension(extension);

    /*
     * Pay for annotated transport delays.
     *
     * The header delay marks the point in time, when the packet first is seen
     * by the transactor. This is the point int time, when the transactor needs
     * to send the BEGIN_REQ to the SystemC world.
     *
     * NOTE: We drop the payload delay here. Normally, the receiver would be
     *       responsible for handling the payload delay. In this case, however,
     *       the receiver is a SystemC module and has no notion of the gem5
     *       transport protocol and we cannot simply forward the
     *       payload delay to the receiving module. Instead, we expect the
     *       receiving SystemC module to model the payload delay by deferring
     *       the END_REQ. This could lead to incorrect delays, if the XBar
     *       payload delay is longer than the time the receiver needs to accept
     *       the request (time between BEGIN_REQ and END_REQ).
     *
     * TODO: We could detect the case described above by remembering the
     *       payload delay and comparing it to the time between BEGIN_REQ and
     *       END_REQ. Then, a warning should be printed.
     */
    auto delay = sc_core::sc_time::from_value(packet->payloadDelay);
    // reset the delays
    packet->payloadDelay = 0;
    packet->headerDelay = 0;

    /* Starting TLM non-blocking sequence (AT) Refer to IEEE1666-2011 SystemC
     * Standard Page 507 for a visualisation of the procedure */
    tlm::tlm_phase phase = tlm::BEGIN_REQ;
    tlm::tlm_sync_enum status;
    status = transactor->socket->nb_transport_fw(*trans, phase, delay);
    /* Check returned value: */
    if (status == tlm::TLM_ACCEPTED) {
        sc_assert(phase == tlm::BEGIN_REQ);
        /* Accepted but is now blocking until END_REQ (exclusion rule)*/
        blockingRequest = trans;
    } else if (status == tlm::TLM_UPDATED) {
        /* The Timing annotation must be honored: */
        sc_assert(phase == tlm::END_REQ || phase == tlm::BEGIN_RESP);

        PayloadEvent<SCSlavePort> * pe;
        pe = new PayloadEvent<SCSlavePort>(*this,
            &SCSlavePort::pec, "PEQ");
        pe->notify(*trans, phase, delay);
    } else if (status == tlm::TLM_COMPLETED) {
        /* Transaction is over nothing has do be done. */
        sc_assert(phase == tlm::END_RESP);
        trans->release();
    }

    return true;
}

Who acts as the peer in peer->recvTimingReq

src/mem/port.cc has the bind function.

void
RequestPort::bind(Port &peer)
{
    auto *response_port = dynamic_cast<ResponsePort *>(&peer);
    fatal_if(!response_port, "Can't bind port %s to non-response port %s.",
             name(), peer.name());
    // request port keeps track of the response port
    _responsePort = response_port;
    Port::bind(peer);
    // response port also keeps track of request port
    _responsePort->responderBind(*this);
}

How peer->recvTimingReq receives

src/mem/tport.cc has a chunk of code whose essence is schedTimingResp.

bool
SimpleTimingPort::recvTimingReq(PacketPtr pkt)
{
    // the SimpleTimingPort should not be used anywhere where there is
    // a need to deal with snoop responses and their flow control
    // requirements
    if (pkt->cacheResponding())
        panic("SimpleTimingPort should never see packets with the "
              "cacheResponding flag set\n");

    bool needsResponse = pkt->needsResponse();
    Tick latency = recvAtomic(pkt);
    // turn packet around to go back to requestor if response expected
    if (needsResponse) {
        // recvAtomic() should already have turned packet into
        // atomic response
        assert(pkt->isResponse());
        schedTimingResp(pkt, curTick() + latency);
    } else {
        // queue the packet for deletion
        pendingDelete.reset(pkt);
    }

    return true;
}

Where schedTimingResp is declared

src/mem/bridge.hh

/**
 * Queue a response packet to be sent out later and also schedule
 * a send if necessary.
 *
 * @param pkt a response to send out after a delay
 * @param when tick when response packet should be sent
 */
void schedTimingResp(PacketPtr pkt, Tick when);

Summary

GarnetSyntheticTraffic packs up the packet. Once a request packet is ready, GarnetSyntheticTraffic::sendPkt calls cachePort.sendTimingReq(pkt). The port's sendTimingReq in turn calls the port-internal function TimingRequestProtocol::sendReq.
TimingRequestProtocol::sendReq takes the incoming pkt together with _responsePort and calls the _responsePort's recvTimingReq; the peer->recvTimingReq(pkt) executed here is really _responsePort->recvTimingReq(pkt).
This _responsePort differs from setup to setup, depending on how the ports were bound.

Here, the packet issued by the request side goes directly, "through thin air" as it were, to the response side; it does not go through the network. The functions used are all from port.hh or tport.hh.

How RubyPort passes the packet on

cachePort.sendTimingReq(pkt) when the cachePort is a RubyPort

We again start from the cachePort.sendTimingReq(pkt) used in GarnetSyntheticTraffic::sendPkt(PacketPtr pkt).
The difference is that this time the cachePort is connected to a RubyPort.

Sending the req still goes through TimingRequestProtocol::sendReq in the protocol layer.

Everything proceeds as before, until recvTimingReq inside RubyPort.

Earlier we looked at SimpleTimingPort::recvTimingReq(PacketPtr pkt) in src/mem/tport.cc.

RubyPort has two variants:


bool
RubyPort::MemResponsePort::recvTimingReq(PacketPtr pkt)
{
    DPRINTF(RubyPort, "Timing request for address %#x on port %d\n",
            pkt->getAddr(), id);

    if (pkt->cacheResponding())
        panic("RubyPort should never see request with the "
              "cacheResponding flag set\n");

    // ruby doesn't support cache maintenance operations at the
    // moment, as a workaround, we respond right away
    if (pkt->req->isCacheMaintenance()) {
        warn_once("Cache maintenance operations are not supported in Ruby.\n");
        pkt->makeResponse();
        schedTimingResp(pkt, curTick());
        return true;
    }
    // Check for pio requests and directly send them to the dedicated
    // pio port.
    if (pkt->cmd != MemCmd::MemSyncReq) {
        if (!pkt->req->isMemMgmt() && !isPhysMemAddress(pkt)) {
            assert(owner.memRequestPort.isConnected());
            DPRINTF(RubyPort, "Request address %#x assumed to be a "
                    "pio address\n", pkt->getAddr());

            // Save the port in the sender state object to be used later to
            // route the response
            pkt->pushSenderState(new SenderState(this));

            // send next cycle
            RubySystem *rs = owner.m_ruby_system;
            owner.memRequestPort.schedTimingReq(pkt,
                curTick() + rs->clockPeriod());
            return true;
        }
    }

    // Save the port in the sender state object to be used later to
    // route the response
    pkt->pushSenderState(new SenderState(this));

    // Submit the ruby request
    RequestStatus requestStatus = owner.makeRequest(pkt);

    // If the request successfully issued then we should return true.
    // Otherwise, we need to tell the port to retry at a later point
    // and return false.
    if (requestStatus == RequestStatus_Issued) {
        DPRINTF(RubyPort, "Request %s 0x%x issued\n", pkt->cmdString(),
                pkt->getAddr());
        return true;
    }

    // pop off sender state as this request failed to issue
    SenderState *ss = safe_cast<SenderState *>(pkt->popSenderState());
    delete ss;

    if (pkt->cmd != MemCmd::MemSyncReq) {
        DPRINTF(RubyPort,
                "Request %s for address %#x did not issue because %s\n",
                pkt->cmdString(), pkt->getAddr(),
                RequestStatus_to_string(requestStatus));
    }

    addToRetryList();

    return false;
}


bool
RubyPort::PioResponsePort::recvTimingReq(PacketPtr pkt)
{

    for (size_t i = 0; i < owner.request_ports.size(); ++i) {
        AddrRangeList l = owner.request_ports[i]->getAddrRanges();
        for (auto it = l.begin(); it != l.end(); ++it) {
            if (it->contains(pkt->getAddr())) {
                // generally it is not safe to assume success here as
                // the port could be blocked
                [[maybe_unused]] bool success =
                    owner.request_ports[i]->sendTimingReq(pkt);
                assert(success);
                return true;
            }
        }
    }
    panic("Should never reach here!\n");
}
