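Lab 1 asks for a StreamReassembler class that turns an unreliable byte stream into a reliable one. In the unreliable stream, substrings may cover one another, partially overlap, or arrive out of order. The unreliable stream arrives as substrings of varying length; a substring may be empty, but its eof flag still counts. The reassembled stream must be written into the ByteStream implemented in Lab 0, and the test program reads it back through Lab 0's read function.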
The constructor parameter capacity is the maximum number of bytes the StreamReassembler may hold. Any byte beyond that limit must be discarded.
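One way to read this requirement is that a byte is acceptable only if its index lies in the window [next index to assemble, bytes_read + capacity). This is an assumption. It is stricter than the code in this post, which applies capacity only when writing into the ByteStream. The sketch below clips an incoming substring to that window. `Clipped` and `clip_to_window` are hypothetical names, and `first_unacceptable` is assumed to be computed by the caller as `bytes_read + capacity`.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical result type: the kept part of a segment and where it starts.
struct Clipped {
    uint64_t index;    // absolute index of the first kept byte
    std::string data;  // kept bytes (may be empty)
};

// Hypothetical helper (not part of the lab skeleton): keep only the bytes of
// `data` whose absolute index lies in [next_index, first_unacceptable).
// Assumption: first_unacceptable = bytes_read + capacity.
Clipped clip_to_window(const std::string &data, uint64_t index, uint64_t next_index, uint64_t first_unacceptable) {
    uint64_t begin = index > next_index ? index : next_index;  // drop the already-assembled prefix
    uint64_t end = index + data.size();
    if (end > first_unacceptable) {
        end = first_unacceptable;  // drop bytes past the capacity window
    }
    if (begin >= end) {
        return {begin, ""};  // nothing of this segment is acceptable
    }
    return {begin, data.substr(begin - index, end - begin)};
}
```

For example, `clip_to_window("abcdef", 2, 4, 7)` keeps the bytes at indices 4 to 6, which are `"cde"`.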
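The unreliable cases break down as follows:
- Covering
Substring a starts at index 10 with length 5; substring b starts at index 5 with length 25. b covers a completely.
- Partial overlap
Substring a starts at index 10 with length 10; substring b starts at index 5 with length 10. a and b partially overlap, and bytes 10 to 14 belong to both.
- Out-of-order arrival
Substring a starts at index 10 and substring b starts at index 5, yet a may arrive before b.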
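The first two cases can be told apart by treating each substring as a half-open range [index, index + length). The sketch below is illustrative only. `Range`, `range_of`, `relation`, and `shared` are hypothetical names, not part of the lab code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>

// Half-open byte range [begin, end) covered by a substring.
struct Range {
    uint64_t begin;
    uint64_t end;
};

Range range_of(uint64_t index, uint64_t length) { return {index, index + length}; }

// Classify how substring a relates to substring b.
std::string relation(Range a, Range b) {
    if (a.end <= b.begin || b.end <= a.begin) return "disjoint";
    if (b.begin <= a.begin && a.end <= b.end) return "a covered by b";
    if (a.begin <= b.begin && b.end <= a.end) return "b covered by a";
    return "partial overlap";
}

// Bytes present in both substrings (an empty range if they are disjoint).
Range shared(Range a, Range b) {
    uint64_t begin = std::max(a.begin, b.begin);
    uint64_t end = std::min(a.end, b.end);
    return begin < end ? Range{begin, end} : Range{begin, begin};
}
```

With the numbers from the list, `relation(range_of(10, 5), range_of(5, 25))` gives `"a covered by b"`. `relation(range_of(10, 10), range_of(5, 10))` gives `"partial overlap"`, with shared bytes [10, 15). Arrival order plays no part in the classification, which is why out-of-order arrival has to be handled by buffering instead.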
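My first idea was to store each arriving substring in an unordered_map keyed by its starting index. The map's fast lookup saves some work. It also directly handles covering when two substrings share a starting index: keep the longer one. Extra logic then handles the remaining covering, partial-overlap, and out-of-order cases.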
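My second idea was to split substrings into individual bytes. Every byte is stored in an unordered_map keyed by its absolute index, so the map's unique keys remove duplicates automatically. A map could replace the unordered_map. A map keeps its keys sorted, which could make finding the bytes to write out cheaper. However, it pays O(log n) per insertion to maintain that order, against O(1) on average for unordered_map. Since neither is clearly better, I kept unordered_map.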
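For comparison, here is a minimal sketch of the std::map variant that was considered and rejected. The keys stay sorted, and only indices at or after the next expected index are stored. As a result, any bytes ready to assemble always sit at the front of the map. `OrderedByteBuffer` is a hypothetical class: it appends to a std::string instead of the Lab 0 ByteStream and ignores capacity.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical per-byte reassembler backed by an ordered std::map.
// Output goes to a std::string; capacity is not modelled.
class OrderedByteBuffer {
  public:
    void push(const std::string &data, uint64_t index) {
        for (size_t i = 0; i < data.size(); ++i) {
            if (index + i >= next_) {
                bytes_.emplace(index + i, data[i]);  // emplace keeps an existing entry, so duplicates are ignored
            }
        }
        // Keys are sorted and all are >= next_, so assemblable bytes are at the front.
        auto it = bytes_.begin();
        while (it != bytes_.end() && it->first == next_) {
            output_ += it->second;
            it = bytes_.erase(it);  // erase returns the iterator to the following element
            ++next_;
        }
    }
    const std::string &output() const { return output_; }
    size_t unassembled_bytes() const { return bytes_.size(); }

  private:
    std::map<uint64_t, char> bytes_{};
    uint64_t next_ = 0;
    std::string output_{};
};
```

Pushing `"cd"` at index 2 and then `"ab"` at index 0 yields `"abcd"` with nothing left unassembled. Re-sending `"abcd"` at index 0 afterwards stores nothing, because every index is below the next expected one.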
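The first approach needs more complex logic to handle covering, partial overlap, and out-of-order arrival, but performs better. The second approach keeps the code simple, but its running time grows linearly with the stream length, with one hash-map operation per byte. Its memory use grows the same way in the worst case, because every byte becomes its own map entry. This makes it impractical for long streams. The longest test stream in this lab is only a little over 100,000 bytes, so the method still works here.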
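To make the memory claim concrete, the sketch below buffers the same bytes under both schemes and counts hash-map entries. It models the worst case, where nothing can be assembled yet and everything sits in the buffer. `count_entries` and `EntryCounts` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

// Number of hash-map entries each storage scheme needs.
struct EntryCounts {
    size_t per_substring;  // approach 1: one entry per substring
    size_t per_byte;       // approach 2: one entry per byte
};

// Buffer `total` bytes arriving as segments of `seg_len` bytes (seg_len must be > 0).
EntryCounts count_entries(uint64_t total, uint64_t seg_len) {
    std::unordered_map<uint64_t, std::string> by_substring;
    std::unordered_map<uint64_t, char> by_byte;
    for (uint64_t index = 0; index < total; index += seg_len) {
        std::string seg(static_cast<size_t>(std::min(seg_len, total - index)), 'x');
        by_substring[index] = seg;
        for (size_t i = 0; i < seg.size(); ++i) {
            by_byte[index + i] = seg[i];
        }
    }
    return {by_substring.size(), by_byte.size()};
}
```

With 100,000 bytes sent in 1,000-byte segments, approach 1 holds 100 entries while approach 2 holds 100,000.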
Approach 1 (storing whole substrings), stream_reassembler.cc:
#include "stream_reassembler.hh"

#include <unordered_set>

using namespace std;

StreamReassembler::StreamReassembler(const size_t capacity) : _output(capacity), _capacity(capacity) {}

// Write all of s: the remaining capacity is large enough.
void StreamReassembler::write_less_capacity(const string &s) {
    _output.write(s);
    num_ += s.size();
    next_ += s.size();
}

// Write only the prefix of s that fits in the remaining capacity; the rest is discarded.
void StreamReassembler::write_over_capacity(const string &s) {
    _output.write(string(s.cbegin(), s.cbegin() + (_capacity - num_)));
    next_ += _capacity - num_;
    num_ = _capacity;
}

// Write s into the output byte stream.
void StreamReassembler::string_to_write(const string &s) {
    size_t len = s.size();
    num_ = _output.bytes_written() - _output.bytes_read();  // bytes still held in _output
    if (len <= _capacity - num_) {  // can the remaining capacity hold all of s?
        write_less_capacity(s);     // yes: write all of s
    } else {
        write_over_capacity(s);     // no: write as much as fits
    }
}

// Find an element of umap whose substring contains index; that element is ready to be reassembled.
// The half-open test (index < start + size) avoids the unsigned wrap-around that
// `start + size - 1` hits when an empty substring is stored at index 0.
unordered_map<size_t, string>::const_iterator StreamReassembler::FindMap(const unordered_map<size_t, string> &umap,
                                                                         size_t index) {
    for (auto it = umap.cbegin(); it != umap.cend(); ++it) {
        if (it->first <= index && index < it->first + it->second.size())
            return it;
    }
    return umap.cend();
}

// Repeatedly take reassemblable substrings from buffer_, then write them to the output stream in one call.
void StreamReassembler::BufToWrite() {
    string s;
    // Walk with a temporary instead of next_: string_to_write() advances next_ by s.size() itself,
    // so also advancing next_ by len in this loop would count those bytes twice.
    size_t tempNext = next_;
    auto it = FindMap(buffer_, tempNext);
    while (it != buffer_.cend()) {  // a match can be written out, partly or entirely
        size_t len = it->second.size() - (tempNext - it->first);  // bytes at or after tempNext
        s.append(it->second.cend() - len, it->second.cend());
        tempNext += len;
        buffer_.erase(it);
        it = FindMap(buffer_, tempNext);
    }
    if (!s.empty()) {
        string_to_write(s);
    }
    // Keep this outside the !s.empty() check: the input stream itself may be empty.
    if (eof_flag_ && next_ == eof_index_) {
        _output.end_input();
    }
}

//! \details This function accepts a substring (aka a segment) of bytes,
//! possibly out-of-order, from the logical stream, and assembles any newly
//! contiguous substrings and writes them into the output stream in order.
void StreamReassembler::push_substring(const string &data, const uint64_t index, const bool eof) {
    if (eof) {  // the eof substring may still have to wait in buffer_ for the gaps before it to be filled
        eof_index_ = index + data.size();
        eof_flag_ = true;
    }
    auto found = buffer_.find(index);
    if (found == buffer_.end()) {  // no substring starting at index has been stored yet
        buffer_[index] = data;
    } else if (found->second.size() < data.size()) {  // one has: keep the longer, drop the shorter
        found->second = data;
    }
    BufToWrite();
}

size_t StreamReassembler::unassembled_bytes() const {
    unordered_set<size_t> uset;  // distinct indices, so a byte pushed more than once is counted once
    for (const auto &p : buffer_) {
        size_t index = p.first;
        for (size_t i = 0; i < p.second.size(); ++i) {
            // Skip bytes that were already assembled: they can arrive again after assembly,
            // and then the erase in BufToWrite()'s loop never removes them.
            if (index + i >= next_)
                uset.insert(index + i);
        }
    }
    return uset.size();
}

bool StreamReassembler::empty() const { return unassembled_bytes() == 0; }
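Empty substrings expose a subtle point in the containment test. The original FindMap condition `it->first + it->second.size() - 1 >= index` wraps around when an empty substring is stored at index 0, for example an empty segment re-sent after the stream has advanced. That entry then matches any `index`, and the following `len` computation underflows as well. A half-open test, `index < it->first + it->second.size()`, avoids this. The standalone comparison below isolates the two conditions; `contains_original` and `contains_half_open` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Original-style test: start <= index && start + size - 1 >= index.
// For an empty s stored at start 0, 0 + 0 - 1 wraps to the largest uint64_t value.
bool contains_original(uint64_t start, const std::string &s, uint64_t index) {
    return start <= index && start + s.size() - 1 >= index;
}

// Half-open test: index lies in [start, start + size), so an empty s never matches.
bool contains_half_open(uint64_t start, const std::string &s, uint64_t index) {
    return start <= index && index < start + s.size();
}
```

Both tests agree on non-empty substrings, for example `"abc"` at index 3 contains index 5 under either test. They differ only on the empty case, where `contains_original(0, "", 5)` wrongly returns true.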
stream_reassembler.hh:
#ifndef SPONGE_LIBSPONGE_STREAM_REASSEMBLER_HH
#define SPONGE_LIBSPONGE_STREAM_REASSEMBLER_HH

#include "byte_stream.hh"

#include <cstdint>
#include <string>
#include <unordered_map>

//! \brief A class that assembles a series of excerpts from a byte stream (possibly out of order,
//! possibly overlapping) into an in-order byte stream.
class StreamReassembler {
  private:
    ByteStream _output;  //!< The reassembled in-order byte stream
    size_t _capacity;    //!< The maximum number of bytes
    std::unordered_map<size_t, std::string> buffer_{};  // substrings not yet reassembled, keyed by start index
    size_t next_ = 0;  // one past the reassembled bytes, i.e. the index of the next byte to reassemble
    size_t num_ = 0;   // reassembled bytes currently held in _output (written but not yet read)
    bool eof_flag_ = false;  // whether the last substring has arrived (it may still be buffered, not assembled)
    size_t eof_index_ = static_cast<size_t>(-1);  // one past the last byte of the stream, once known

    void write_less_capacity(const std::string &s);
    void write_over_capacity(const std::string &s);
    void string_to_write(const std::string &s);
    std::unordered_map<size_t, std::string>::const_iterator FindMap(
        const std::unordered_map<size_t, std::string> &umap, size_t index);
    void BufToWrite();

  public:
    //! \brief Construct a `StreamReassembler` that will store up to `capacity` bytes.
    //! \note This capacity limits both the bytes that have been reassembled,
    //! and those that have not yet been reassembled.
    StreamReassembler(const size_t capacity);

    //! \brief Receive a substring and write any newly contiguous bytes into the stream.
    //!
    //! The StreamReassembler will stay within the memory limits of the `capacity`.
    //! Bytes that would exceed the capacity are silently discarded.
    //!
    //! \param data the substring
    //! \param index indicates the index (place in sequence) of the first byte in `data`
    //! \param eof the last byte of `data` will be the last byte in the entire stream
    void push_substring(const std::string &data, const uint64_t index, const bool eof);

    //! \name Access the reassembled byte stream
    //!@{
    const ByteStream &stream_out() const { return _output; }
    ByteStream &stream_out() { return _output; }
    //!@}

    //! The number of bytes in the substrings stored but not yet reassembled
    //!
    //! \note If the byte at a particular index has been pushed more than once, it
    //! should only be counted once for the purpose of this function.
    size_t unassembled_bytes() const;

    //! \brief Is the internal state empty (other than the output stream)?
    //! \returns `true` if no substrings are waiting to be assembled
    bool empty() const;
};

#endif  // SPONGE_LIBSPONGE_STREAM_REASSEMBLER_HH
Approach 2 (storing individual bytes), stream_reassembler.cc:
#include "stream_reassembler.hh"

using namespace std;

StreamReassembler::StreamReassembler(const size_t capacity) : _output(capacity), _capacity(capacity) {}

// Write all of s: the remaining capacity is large enough.
void StreamReassembler::write_less_capacity(const string &s) {
    _output.write(s);
    num_ += s.size();
    next_ += s.size();
}

// Write only the prefix of s that fits in the remaining capacity; the rest is discarded.
void StreamReassembler::write_over_capacity(const string &s) {
    _output.write(string(s.cbegin(), s.cbegin() + (_capacity - num_)));
    next_ += _capacity - num_;
    num_ = _capacity;
}

// Write s into the output byte stream.
void StreamReassembler::string_to_write(const string &s) {
    size_t len = s.size();
    num_ = _output.bytes_written() - _output.bytes_read();  // bytes still held in _output
    if (len <= _capacity - num_) {
        write_less_capacity(s);
    } else {
        write_over_capacity(s);
    }
}

// Store each byte of s in buffer_ under its absolute index; the unordered_map's unique keys deduplicate.
// Bytes before next_ were already assembled and are skipped; otherwise a re-sent segment would
// leave them in buffer_ forever and unassembled_bytes() would count them.
// index is uint64_t rather than int so that large stream offsets are not truncated.
void StreamReassembler::SToC(const string &s, uint64_t index) {
    for (size_t i = 0; i < s.size(); i++) {
        if (index + i >= next_) {
            buffer_[index + i] = s[i];
        }
    }
}

// Collect consecutive bytes from buffer_ into a string, then write it to the output stream in one call.
void StreamReassembler::BufToWrite() {
    string s;
    size_t pos = next_;  // string_to_write() advances next_ itself, so walk with a local index
    auto it = buffer_.find(pos);
    while (it != buffer_.end()) {
        s += it->second;
        buffer_.erase(it);
        it = buffer_.find(++pos);
    }
    if (!s.empty()) {
        string_to_write(s);
    }
    // Keep this outside the !s.empty() check: the input stream itself may be empty.
    if (eof_flag_ && next_ == eof_index_) {
        _output.end_input();
    }
}

//! \details This function accepts a substring (aka a segment) of bytes,
//! possibly out-of-order, from the logical stream, and assembles any newly
//! contiguous substrings and writes them into the output stream in order.
void StreamReassembler::push_substring(const string &data, const uint64_t index, const bool eof) {
    if (eof) {  // the eof substring may still have to wait in buffer_ for the gaps before it to be filled
        eof_index_ = index + data.size();
        eof_flag_ = true;
    }
    SToC(data, index);
    BufToWrite();
}

// Every key in buffer_ is a distinct index at or after next_, so the map size is the answer.
size_t StreamReassembler::unassembled_bytes() const { return buffer_.size(); }

bool StreamReassembler::empty() const { return unassembled_bytes() == 0; }
stream_reassembler.hh:
#ifndef SPONGE_LIBSPONGE_STREAM_REASSEMBLER_HH
#define SPONGE_LIBSPONGE_STREAM_REASSEMBLER_HH

#include "byte_stream.hh"

#include <cstdint>
#include <string>
#include <unordered_map>

//! \brief A class that assembles a series of excerpts from a byte stream (possibly out of order,
//! possibly overlapping) into an in-order byte stream.
class StreamReassembler {
  private:
    ByteStream _output;  //!< The reassembled in-order byte stream
    size_t _capacity;    //!< The maximum number of bytes
    std::unordered_map<size_t, char> buffer_{};  // bytes not yet reassembled, keyed by absolute index
    size_t next_ = 0;  // one past the reassembled bytes, i.e. the index of the next byte to reassemble
    size_t num_ = 0;   // reassembled bytes currently held in _output (written but not yet read)
    bool eof_flag_ = false;  // whether the last substring has arrived (it may still be buffered, not assembled)
    size_t eof_index_ = static_cast<size_t>(-1);  // one past the last byte of the stream, once known

    void write_less_capacity(const std::string &s);
    void write_over_capacity(const std::string &s);
    void string_to_write(const std::string &s);
    void SToC(const std::string &s, uint64_t index);
    void BufToWrite();

  public:
    //! \brief Construct a `StreamReassembler` that will store up to `capacity` bytes.
    //! \note This capacity limits both the bytes that have been reassembled,
    //! and those that have not yet been reassembled.
    StreamReassembler(const size_t capacity);

    //! \brief Receive a substring and write any newly contiguous bytes into the stream.
    //!
    //! The StreamReassembler will stay within the memory limits of the `capacity`.
    //! Bytes that would exceed the capacity are silently discarded.
    //!
    //! \param data the substring
    //! \param index indicates the index (place in sequence) of the first byte in `data`
    //! \param eof the last byte of `data` will be the last byte in the entire stream
    void push_substring(const std::string &data, const uint64_t index, const bool eof);

    //! \name Access the reassembled byte stream
    //!@{
    const ByteStream &stream_out() const { return _output; }
    ByteStream &stream_out() {
        BufToWrite();
        return _output;
    }
    //!@}

    //! The number of bytes in the substrings stored but not yet reassembled
    //!
    //! \note If the byte at a particular index has been pushed more than once, it
    //! should only be counted once for the purpose of this function.
    size_t unassembled_bytes() const;

    //! \brief Is the internal state empty (other than the output stream)?
    //! \returns `true` if no substrings are waiting to be assembled
    bool empty() const;
};

#endif  // SPONGE_LIBSPONGE_STREAM_REASSEMBLER_HH