暴走的橙子～

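BoostSearch Search Engine

Today's project is a site search engine for the documentation of the C++ Boost library. The Boost online documentation has no keyword search, so we build one by hand. When a user types a keyword into the search box, the engine quickly returns the relevant Boost documentation pages, which makes up for the missing search feature in the online docs.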

Project introduction

Development environment

Project workflow

High-level principles of the search engine

Project code

parser.cc

index.hpp

searcher.hpp

log.hpp

util.hpp

Makefile

debug.cc

http_server.cc

wwwroot/index.html

Project demo

Interface

Search test

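Project introduction

Development environment

CentOS 7, vim, g++, Makefile, VS Code

Project workflow

1. Write the tag-stripping and data-cleaning module, which parses the original HTML documents into a single line-oriented text file.

2. Read the processed lines, tokenize them, compute weights, and build the forward index and the inverted index in memory.

3. Tokenize the query, look up each token in the index, sort the results by weight, and serialize them to a JSON string that is returned to the caller.

4. Serve the search page through an HTTP server so that external users can reach the service.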
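
As a rough sketch of the hand-off between steps 1 and 2: judging by parser.cc and index.hpp later in this post, each document becomes one line of the form title \3 content \3 url, terminated by \n. The helper names `JoinRecord` and `SplitRecord` are made up for illustration, and plain `std::string::find` stands in for the `boost::split` call the project uses.

```cpp
#include <string>
#include <vector>

const char kSep = '\3';  // field separator, as in parser.cc

// Serialize one document as a single line: title \3 content \3 url \n.
std::string JoinRecord(const std::string& title, const std::string& content,
                       const std::string& url) {
    std::string line = title;
    line += kSep;
    line += content;
    line += kSep;
    line += url;
    line += '\n';
    return line;
}

// Split one line (without the trailing '\n') back into its fields.
std::vector<std::string> SplitRecord(const std::string& line) {
    std::vector<std::string> fields;
    std::string::size_type begin = 0;
    while (true) {
        std::string::size_type pos = line.find(kSep, begin);
        if (pos == std::string::npos) {
            fields.push_back(line.substr(begin));  // last field
            break;
        }
        fields.push_back(line.substr(begin, pos - begin));
        begin = pos + 1;
    }
    return fields;
}
```

Because the content never contains '\n' after cleaning, one `std::getline` call on the output file yields exactly one document.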

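High-level principles of the search engine

The client sends the search keyword to the server's searcher module as a GET parameter. Using the index built in advance, searcher looks up the content that matches the keyword, and the result is assembled into a web page that is returned to the user.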
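
To make "the index built in advance" a little more concrete, here is a minimal sketch of a forward index plus an inverted index. It assumes whitespace tokenization instead of cppjieba and ignores weights; `TinyIndex`, `Add` and `Lookup` are hypothetical names, not part of the project.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical miniature index: whitespace tokenization stands in for cppjieba.
struct TinyIndex {
    std::vector<std::string> forward;                                    // doc_id -> document text
    std::unordered_map<std::string, std::vector<std::size_t>> inverted;  // word -> doc_ids

    // Store the document and register each distinct word under its doc_id.
    std::size_t Add(const std::string& text) {
        std::size_t doc_id = forward.size();  // the vector index doubles as the id
        forward.push_back(text);
        std::istringstream in(text);
        std::string word;
        while (in >> word) {
            std::vector<std::size_t>& list = inverted[word];
            if (list.empty() || list.back() != doc_id) list.push_back(doc_id);
        }
        return doc_id;
    }

    // Return the ids of the documents that contain `word`, or an empty list.
    std::vector<std::size_t> Lookup(const std::string& word) const {
        auto it = inverted.find(word);
        return it == inverted.end() ? std::vector<std::size_t>() : it->second;
    }
};
```

A search then amounts to a `Lookup` in the inverted index followed by reading `forward[doc_id]` for each hit.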

Project code

parser.cc

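#include <iostream>
#include <string>
#include <vector>
#include <fstream>
#include <boost/filesystem.hpp>
#include "util.hpp"

// Directory that holds all the raw, unprocessed HTML pages
const std::string src_path = "data/input";
const std::string output = "data/raw_html/raw.txt"; // Holds the processed page content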

typedef struct DocInfo
{
    std::string title;
    std::string content;
    std::string url;
}DocInfo_t;

// const &: input parameter
// *: output parameter
// &: input/output parameter
bool EnumFile(const std::string& src_path, std::vector<std::string>* files_list);
bool ParselfHtml(const std::vector<std::string>& files_list, std::vector<DocInfo_t>* results);
bool SaveHtml(const std::vector<DocInfo_t>& results, const std::string& output);

int main()
{
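    // Step 1: recursively collect every HTML file name, with its path, into files_list so the files can be read one by one later
    std::vector<std::string> files_list;
    if(!EnumFile(src_path, &files_list))
    {
        std::cerr << "enum file error!" << std::endl;
        return 1;
    }

    // Step 2: read and parse the content of every file listed in files_list
    std::vector<DocInfo_t> results;
    if(!ParselfHtml(files_list, &results))
    {
        std::cerr << "parse html error!" << std::endl;
        return 2;
    }

    // Step 3: write the parsed documents to output; fields are separated by \3 and each document ends with \n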
    if(!SaveHtml(results, output))
    {
        std::cerr << "save html error!" << std::endl;
        return 3;
    }

    return 0;
}


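// Walk every file under the given path. Plain file-reading calls do not suit this kind of bulk traversal, so we borrow boost::filesystem
bool EnumFile(const std::string& src_path, std::vector<std::string>* files_list)
{
    // bool exists(const path& p); // path is a type in the boost::filesystem namespace
    namespace fs = boost::filesystem;
    fs::path root_path(src_path);

    // Check that the path exists; if it does not, there is no point in continuing
    if(!fs::exists(root_path)) // Not found
    {
        std::cerr << src_path << " not exists" << std::endl;
        return false;
    }
    // Recursively traverse the directory
    fs::recursive_directory_iterator end; // A default-constructed iterator marks the end of the traversal
    for(fs::recursive_directory_iterator iter(root_path); iter != end; iter++)
    {
        // Skip anything that is not a regular file; .html pages are regular files
        if(!fs::is_regular_file(*iter))
        {
            continue;
        }
        if(iter->path().extension() != ".html") // Skip files that do not end in .html
        {
            continue;
        }

        // At this point the path is a valid regular file ending in .html
        // Save it, with its path, in files_list for later parsing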
        files_list->push_back(iter->path().string());
    }
    return true;
}
static bool ParseTitle(const std::string& file, std::string* title)
{
    size_t begin = file.find("<title>");
    if(begin == std::string::npos)
    {
        return false;
    }

    size_t end = file.find("</title>");
    if(end == std::string::npos)
    {
        return false;
    }

    begin += std::string("<title>").size(); // Move past the opening tag to the start of the title text
    if(begin > end)
    {
        return false;
    }
    *title = file.substr(begin, end - begin); // Half-open range [begin, end)
    return true;
}
static bool ParseContent(const std::string file, std::string* content)
{
    // Strip tags with a simple state machine
    enum status
    {
        LABLE, // Inside a tag: skip characters
        CONTENT // Outside a tag: characters may be kept
    };

    enum status s = LABLE;
    for(char c : file)
    {
        switch(s)
        {
        case LABLE:
            if(c == '>')
            {
                s = CONTENT;
            }
            break;
        case CONTENT:
            if(c == '<')
            {
                s = LABLE;
            }
            else
            {
                // Do not keep '\n' from the original file, because '\n' separates documents in the parsed output
                if(c == '\n')
                {
                    c = ' ';
                }
                content->push_back(c);
            }
            break;
        default:
            break;
        }
    }
    return true;
}
static bool ParseUrl(const std::string file_path, std::string* url)
{
    std::string url_head = "https://www.boost.org/doc/libs/1_82_0/doc/html";
    std::string url_tail = file_path.substr(src_path.size());
    *url = url_head + url_tail; // Build the link to the official site
    return true;
}
static void ShowDoc(const DocInfo_t& doc)
{
    std::cout << "title: " << doc.title << std::endl;
    std::cout << "content: " << doc.content << std::endl;
    std::cout << "url: " << doc.url << std::endl;
}
bool ParselfHtml(const std::vector<std::string>& files_list, std::vector<DocInfo_t>* results)
{
    for(const std::string& file : files_list)
    {
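        // 1. Read the file
        std::string result;
        if(!ns_util::FileUtil::ReadFile(file, &result))
        {
            continue;
        }

        DocInfo_t doc;
        // 2. Parse the file and extract the title
        if(!ParseTitle(result, &doc.title))
        {
            continue;
        }

        // 3. Parse the file content, i.e. strip the tags
        if(!ParseContent(result, &doc.content))
        {
            continue;
        }

        // 4. Build the url from the file path
        // The pages link to one another, but a local file carries no url of its own, so we assemble it here
        if(!ParseUrl(file, &doc.url)) // file is the path of the current document
        {
            continue;
        }

        // Parsing is done
        results->push_back(std::move(doc)); // std::move casts doc to an rvalue, so it is move-constructed instead of copied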

        // ShowDoc(doc);
        // break;
    }
    return true;
}
bool SaveHtml(const std::vector<DocInfo_t>& results, const std::string& output)
{
#define SEP '\3'
    // Write in binary mode
    std::ofstream out(output, std::ios::out | std::ios::binary);
    if(!out.is_open())
    {
        std::cerr << "open " << output << " failed!" << std::endl;
        return false;
    }

    // Write the file content
    for(auto& item : results)
    {
        std::string out_string;
        out_string = item.title;
        out_string += SEP;
        out_string += item.content;
        out_string += SEP;
        out_string += item.url;
        out_string += '\n';

        out.write(out_string.c_str(), out_string.size());
    }
    out.close();
    return true;
}</code></pre> 
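
The tag-stripping state machine in ParseContent can be tried on its own. The following is a minimal standalone sketch of the same idea. The function name `StripTags` and the state names are hypothetical, and like the original it starts in the "inside a tag" state, so any text before the first `>` is dropped.

```cpp
#include <string>

// Hypothetical standalone version of the tag-stripping state machine.
std::string StripTags(const std::string& html) {
    enum class State { kTag, kText };
    State state = State::kTag;  // same starting state as ParseContent
    std::string text;
    for (char c : html) {
        switch (state) {
        case State::kTag:
            if (c == '>') state = State::kText;  // tag closed, text may follow
            break;
        case State::kText:
            if (c == '<') {
                state = State::kTag;  // a new tag begins
            } else {
                // '\n' is reserved as the record terminator, so store a space instead
                text.push_back(c == '\n' ? ' ' : c);
            }
            break;
        }
    }
    return text;
}
```

Note that this approach treats every `<` as the start of a tag, so it is only a rough cleaner and not an HTML parser.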
  <h3 id="index.hpp">index.hpp</h3> 
  <pre><code class="language-cpp">#pragma once
#include<iostream>
#include<string>
#include<vector>
#include<unordered_map>
#include<mutex>
#include<fstream>
#include"log.hpp"
#include"util.hpp"

namespace ns_index
{
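    struct DocInfo
    {
        std::string title; // Document title
        std::string content; // Document content after tag stripping
        std::string url; // URL of the document on the official site
        uint64_t doc_id; // Document ID, same type as InvertedElem::doc_id
    };
    struct InvertedElem
    {
        uint64_t doc_id; // Document ID
        std::string word; // Keyword
        int weight; // Weight of the document for this keyword
    };

    // Posting list (inverted list)
    typedef std::vector<InvertedElem> InvertedList;

    class Index
    {
    private:
        // The forward index is an array, so the array index naturally serves as the document ID
        std::vector<DocInfo> forward_index; // Forward index
        // The inverted index maps each keyword to a group of InvertedElem (keyword -> posting list)
        std::unordered_map<std::string, InvertedList> inverted_index;
    private:
        Index() // Private constructor: the class is a singleton
        {}
        Index(const Index& ) = delete;
        Index& operator=(const Index&) = delete; // Assignment is forbidden for const and non-const sources alike
        static Index* instance; // Singleton pointer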
        static std::mutex mtx;
    public:
        static Index* GetInstance() // Public accessor for the singleton object
        {
            if(instance == nullptr)
            {
                mtx.lock();
                if(instance == nullptr)
                {
                    instance = new Index(); // Create the object and keep its address
                }
                mtx.unlock();
            }
            return instance;
        }

        // Look up a document by doc_id
        DocInfo* GetForwardIndex(uint64_t doc_id)
        {
            if(doc_id >= forward_index.size())
            {
                std::cerr << "doc_id out range, error!" << std::endl;
                return nullptr;
            }
            return &forward_index[doc_id];
        }

        // Look up the posting list for a keyword
        InvertedList* GetInvertedList(const std::string& word)
        {
            //std::unordered_map<std::string, InvertedList>::iterator
            auto iter = inverted_index.find(word);
            if(iter == inverted_index.end())
            {
                std::cerr << word << " have no InvertedList" << std::endl;
                return nullptr;
            }
            return &(iter->second);
        }

        // Build the forward and inverted indexes from the tag-stripped, formatted documents
        // data/raw_html/raw.txt
        bool BuildIndex(const std::string& input) // Receives the data produced by parser
        {
            std::ifstream in(input, std::ios::in | std::ios::binary);
            if(!in.is_open())
            {
                std::cerr << "sorry, " << input << " open error" << std::endl;
                return false;
            }
            std::string line;
            int count = 0;
            while(std::getline(in, line))
            {
                DocInfo* doc = BuildForwardIndex(line);
                if(doc == nullptr)
                {
                    std::cerr << "build " << line << " error" << std::endl;
                    continue;
                }
                BuildInvertedIndex(*doc);
                count++;
                if(count % 50 == 0)
                {
                    //std::cout << "Documents indexed so far: " << count << std::endl;
                    LOG(NORMAL, "Documents indexed so far: " + std::to_string(count));
                }
            }
            in.close();
            return true;
        }
    private:
        // Build the forward index
        DocInfo* BuildForwardIndex(const std::string& line)
        {
            // 1. Parse line by splitting the string
            // line -> 3 strings: title, content, url
            std::vector<std::string> results;
            std::string sep = "\3";
            ns_util::StringUtil::Split(line, &results, sep);
            if(results.size() != 3)
            {
                return nullptr;
            }

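            // 2. Fill a DocInfo with the fields
            DocInfo doc;
            doc.title = results[0];
            doc.content = results[1];
            doc.url = results[2];
            // Set the id before inserting: it equals the index this doc will have in the vector
            doc.doc_id = forward_index.size();

            // 3. Insert into the forward index vector
            forward_index.push_back(std::move(doc)); // Document content can be large, so move instead of copy
            return &forward_index.back(); // Return the address of the stored element; &doc would point at a local that is about to be destroyed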
        }

        // Build the inverted index
        bool BuildInvertedIndex(const DocInfo& doc)
        {
            // DocInfo(title, content, url, doc_id)
            // word -> posting list
            struct word_cnt
            {
                int title_cnt;
                int content_cnt;
                word_cnt()
                    :title_cnt(0)
                    ,content_cnt(0)
                {}
            };
            std::unordered_map<std::string, word_cnt> word_map; // Temporary word -> frequency table

            // Tokenize the title
            std::vector<std::string> title_words;
            ns_util::JiebaUtil::CutString(doc.title, &title_words);
            // Count word frequency in the title
            for(auto s : title_words)
            {
                boost::to_lower(s); // Normalize every token to lowercase
                word_map[s].title_cnt++;
            }

            // Tokenize the document content
            std::vector<std::string> content_words;
            ns_util::JiebaUtil::CutString(doc.content, &content_words);
            for(auto s : content_words)
            {
                boost::to_lower(s); // Normalize every token to lowercase
                word_map[s].content_cnt++;
            }
            // Word counts are now keyed by lowercase tokens; build the posting lists next
        #define X 10
        #define Y 1
            // Hello, hello and HELLO are the same word: search is case-insensitive
            for(auto& word_pair : word_map)
            {
                InvertedElem item;
                item.doc_id = doc.doc_id; // Same as the index into forward_index
                item.word = word_pair.first;
                item.weight = X*word_pair.second.title_cnt + Y*word_pair.second.content_cnt; // Relevance
                // typedef std::vector<InvertedElem> InvertedList;
                // Each word maps to one posting list, stored as a vector
                // As each HTML document is processed, every word in it adds one entry:
                // a word seen in an earlier document appends to its existing list; a new word starts a new list
                InvertedList& inverted_list = inverted_index[word_pair.first]; 
                inverted_list.push_back(item);
            }
            return true;
        }
    };
    Index* Index::instance = nullptr; // Static members are defined outside the class
    std::mutex Index::mtx;
}

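// When counting word frequency, a title word that also appears in the body is counted once more as a content word
// Many pages repeat the title in the body (some do, some do not), and that repeat counts as content</code></pre> 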
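
The weighting rule in BuildInvertedIndex (10 points per title occurrence, 1 point per content occurrence, case-insensitive) can be checked in isolation. This is a minimal sketch under the assumption that the words are already tokenized; `ToLower` and `ComputeWeights` are hypothetical helpers, and the ASCII-only `std::tolower` stands in for `boost::to_lower`.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <unordered_map>
#include <vector>

// ASCII-only lowercase; stands in for boost::to_lower in this sketch.
std::string ToLower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// weight = 10 * (occurrences in title) + 1 * (occurrences in content)
std::unordered_map<std::string, int> ComputeWeights(
        const std::vector<std::string>& title_words,
        const std::vector<std::string>& content_words) {
    const int kTitleWeight = 10;   // X in index.hpp
    const int kContentWeight = 1;  // Y in index.hpp
    std::unordered_map<std::string, int> weights;
    for (const std::string& w : title_words) weights[ToLower(w)] += kTitleWeight;
    for (const std::string& w : content_words) weights[ToLower(w)] += kContentWeight;
    return weights;
}
```

With a title of {"Boost", "Filesystem"} and content of {"boost", "path", "BOOST"}, the word boost ends up with 1 × 10 + 2 × 1 = 12.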
  <h3 id="searcher.hpp">searcher.hpp</h3> 
  <pre><code class="language-cpp">#pragma once
#include<iostream>
#include<string>
#include<vector>
#include<unordered_map>
#include<boost/algorithm/string.hpp>
#include<algorithm>
#include<cctype>
#include<jsoncpp/json/json.h>
#include"index.hpp"
#include"util.hpp"
#include"log.hpp"

namespace ns_searcher
{
    struct InvertedElemPrint
    {
        uint64_t doc_id;
        int weight;
        std::vector<std::string> words; // Several words can map to the same doc_id
        InvertedElemPrint()
            :doc_id(0)
            ,weight(0)
        {}
    };
    class Searcher
    {
    private:
        ns_index::Index* index; // The index used for lookups
    public:
        Searcher()
        {}
        ~Searcher()
        {}
    public:
        void InitSearcher(const std::string& input)
        {
            // 1. Get or create the index object
            index = ns_index::Index::GetInstance();
            //std::cout << "Got the index singleton..." << std::endl;
            LOG(NORMAL, "Got the index singleton...");
            // 2. Build the index through the index object
            index->BuildIndex(input);
            //std::cout << "Built the forward and inverted indexes..." << std::endl;
            LOG(NORMAL, "Built the forward and inverted indexes...");
        }

        // query: the search keywords
        // json_string: the search result returned to the user's browser
        void Search(const std::string& query, std::string* json_string)
        {
            // 1. [Tokenize]: split the query the same way the index was built
            std::vector<std::string> words;
            ns_util::JiebaUtil::CutString(query, &words);
            // 2. [Look up]: search the index for each token
            // Posting list
            // typedef std::vector<InvertedElem> InvertedList;
            // ns_index::InvertedList inverted_list_all; // Holds InvertedElem

            // Deduplicating version
            std::vector<InvertedElemPrint> inverted_list_all;
            std::unordered_map<uint64_t, InvertedElemPrint> tokens_map; // Merges entries by doc_id
            for(std::string word : words)
            {
                boost::to_lower(word);

                ns_index::InvertedList* inverted_list = index->GetInvertedList(word);
                if(inverted_list == nullptr)
                {
                    continue;
                }
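                // Pitfall: document IDs can repeat
                // A query cut into 你/是/一个/好人 ("you / are / a / good person") yields one posting list per word, so the same document ID can show up several times
                // The same content would then be listed several times, so we deduplicate here
                //inverted_list_all.insert(inverted_list_all.end(), inverted_list->begin(), inverted_list->end());

                // Deduplicate by merging on doc_id
                for(const auto& elem : *inverted_list)
                {
                    auto& item = tokens_map[elem.doc_id];
                    // item is the print node for this doc_id
                    item.doc_id = elem.doc_id; // Keep the id in sync
                    item.weight += elem.weight; // Several query words matched the same document: add their weights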
                    item.words.push_back(elem.word);
                }
            }

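            // Collect the deduplicated documents
            for(auto& item : tokens_map)
            {
                inverted_list_all.push_back(std::move(item.second)); // item must be non-const, otherwise std::move silently copies
            }
            // 3. [Merge and sort]: gather the results and sort by relevance (weight) in descending order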
            // std::sort(inverted_list_all.begin(), inverted_list_all.end(), 
            //     [](const ns_index::InvertedElem& e1, const ns_index::InvertedElem& e2)
            //     {return e1.weight > e2.weight;});

            // Updated for the deduplicating version
            std::sort(inverted_list_all.begin(), inverted_list_all.end(), [](const InvertedElemPrint& e1, const InvertedElemPrint& e2){
                return e1.weight > e2.weight;
            });

            // 4. [Build]: turn the results into a JSON string with jsoncpp
            Json::Value root;
            for(auto& item : inverted_list_all)
            {
                ns_index::DocInfo* doc = index->GetForwardIndex(item.doc_id); // Fetch the document's basic information
                if(doc == nullptr)
                {
                    continue;
                }
                Json::Value elem;
                elem["title"] = doc->title;
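                // item.word is the search keyword
                // content is the full tag-stripped text, but we only want a snippet of it
                //elem["desc"] = GetDesc(doc->content, item.word); 

                // Updated for the deduplicating version
                elem["desc"] = GetDesc(doc->content, item.words[0]); // words[0] always exists; the snippet is built around the first matched word
                elem["url"] = doc->url;
                // for debug, delete later: users do not need the next two lines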
                elem["id"] = (int)item.doc_id;
                elem["weight"] = item.weight;

                root.append(elem);
            }
            Json::StyledWriter writer;
            //Json::FastWriter writer;
            *json_string = writer.write(root);
        }
        std::string GetDesc(const std::string& html_content, const std::string word)
        {
            // Find the first occurrence of word in html_content, then step back 50 bytes (or to the beginning if there are fewer)
            // and forward 100 bytes (or to the end if there are fewer), and cut out that window
            const int prev_step = 50;
            const int next_step = 100;
            // 1. Find the first occurrence
            auto iter = std::search(html_content.begin(), html_content.end(), word.begin(), word.end(), [](int x, int y){
                return std::tolower(x) == std::tolower(y); // tolower/toupper take an int
            });
            if(iter == html_content.end())
            {
                return"None1";
            }
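            // Distance from begin() to iter, i.e. the index of the match
            int pos = std::distance(html_content.begin(), iter); // Index of the match

            // word has been lowercased, so an exact search in the original page can miss uppercase text!
            // That is why find must not be used here
            // size_t pos = html_content.find(word); // find is an exact, case-sensitive match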
            // if(pos == std::string::npos)
            // {
            //     return "None1";
            // }

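            // 2. Compute start and end
            int start = 0;
            int end = html_content.size()-1;
            // If more than 50 characters precede the match, move start forward
            if(pos > start + prev_step) // Pitfall: size_t is unsigned, so pos - prev_step could wrap around; using int avoids that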
            {
                start = pos - prev_step;
            }
            if(pos < end - next_step)
            {
                end = pos + next_step;
            }

            // if(pos-prev_step > start) // Pitfall: with size_t this subtraction wraps around
            // {
            //     start = pos-prev_step;
            // }
            // if(pos+next_step < end)
            // {
            //     end = pos+next_step;
            // }


            // 3. Cut out the substring
            if(start >= end)
            {
                return "None2";
            }
            // Append an ellipsis to the snippet
            std::string desc = html_content.substr(start, end-start);
            desc += "...";
            return desc;
        }
    };
}</code></pre> 
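
The merge-and-rank step in Search (group hits by doc_id, add up the weights, sort in descending order) is easier to follow without the index and JSON code around it. A minimal sketch, assuming the posting lists have already been fetched; `Posting`, `Hit` and `MergeAndRank` are hypothetical names that mirror InvertedElem and InvertedElemPrint.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

struct Posting {  // mirrors InvertedElem
    std::uint64_t doc_id;
    std::string word;
    int weight;
};

struct Hit {  // mirrors InvertedElemPrint
    std::uint64_t doc_id = 0;
    int weight = 0;
    std::vector<std::string> words;
};

// Merge one posting list per query word into one Hit per document,
// then sort the hits by total weight, highest first.
std::vector<Hit> MergeAndRank(const std::vector<std::vector<Posting>>& lists) {
    std::unordered_map<std::uint64_t, Hit> by_doc;
    for (const std::vector<Posting>& list : lists) {
        for (const Posting& p : list) {
            Hit& hit = by_doc[p.doc_id];
            hit.doc_id = p.doc_id;
            hit.weight += p.weight;  // several words hit the same document
            hit.words.push_back(p.word);
        }
    }
    std::vector<Hit> hits;
    hits.reserve(by_doc.size());
    for (auto& entry : by_doc) hits.push_back(std::move(entry.second));
    std::sort(hits.begin(), hits.end(),
              [](const Hit& a, const Hit& b) { return a.weight > b.weight; });
    return hits;
}
```

Documents with equal total weight come out in an unspecified relative order, because neither `std::unordered_map` iteration nor `std::sort` preserves one.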
  <h3 id="log.hpp">log.hpp</h3> 
  <pre><code class="language-cpp">#pragma once
#include<iostream>
#include<string>
#include<time.h>

#define NORMAL 1
#define WARNING 2
#define DEBUG 3
#define FATAL 4

// #LEVEL turns the macro argument into a string literal
// The macro is expanded at the call site
#define LOG(LEVEL, MESSAGE) log(#LEVEL, MESSAGE, __FILE__, __LINE__)

// Level, message, source file, line number
void log(std::string level, std::string message, std::string file, int line)
{
    std::cout << "[" << level << "]" << "[" << time(nullptr) << "]"
        << "[" << message << "]" << "[" << file << " : " << line << "]" << std::endl;
}</code></pre> 
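
One detail of `LOG` is worth seeing in isolation: the `#` operator stringizes its argument without macro-expanding it first, which is why `LOG(NORMAL, ...)` prints `NORMAL` rather than `1` even though `NORMAL` is defined as 1. The helper macros `LEVEL_NAME` and `EXPAND_THEN_NAME` and the two functions below are hypothetical names for this sketch.

```cpp
#include <string>

#define NORMAL 1

// The operand of # is not macro-expanded, so the name itself is stringized.
#define LEVEL_NAME(LEVEL) #LEVEL
// One extra level of indirection expands the argument before it is stringized.
#define EXPAND_THEN_NAME(LEVEL) LEVEL_NAME(LEVEL)

std::string DirectName() { return LEVEL_NAME(NORMAL); }          // yields "NORMAL"
std::string ExpandedName() { return EXPAND_THEN_NAME(NORMAL); }  // yields "1"
```

For a logger the first behavior is the useful one, since the level name is more readable than its number.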
  <h3 id="util.hpp">util.hpp</h3> 
  <pre><code class="language-cpp">#pragma once
#include<fstream>
#include<vector>
#include<string>
#include<iostream>
#include<boost/algorithm/string.hpp>
#include"cppjieba/include/cppjieba/Jieba.hpp"
#include<mutex>
#include<unordered_map>
#include"log.hpp"

namespace ns_util
{
    class FileUtil
    {
    public:
        static bool ReadFile(const std::string& file_path, std::string* out)
        {
            std::ifstream in(file_path, std::ios::in);
            if(!in.is_open())
            {
                std::cerr << "open file " << file_path << " error" << std::endl;
                return false;
            }

            std::string line;
            while(std::getline(in, line))
            {
                *out += line;
            }
            in.close();
            return true;
        }
    };
    class StringUtil
    {
    public:
        static void Split(const std::string& target, std::vector<std::string>* out, const std::string& sep)
        {
            // Use boost::split; strtok is not recommended
            // aaa\3bbb\3\3\3ccc
            boost::split(*out, target, boost::is_any_of(sep), boost::token_compress_on);
        }
    };

    // Follows cppjieba's demo.cpp
    // Dictionary paths
    const char* const DICT_PATH = "./dict/jieba.dict.utf8";
    const char* const HMM_PATH = "./dict/hmm_model.utf8";
    const char* const USER_DICT_PATH = "./dict/user.dict.utf8";
    const char* const IDF_PATH = "./dict/idf.utf8";
    const char* const STOP_WORD_PATH = "./dict/stop_words.utf8";

    // Loading the dictionaries only needs to happen once, so a singleton is the natural design
    // class JiebaUtil
    // {
    // private:
    //     cppjieba::Jieba jieba;
    //     std::unordered_map<std::string, bool> stop_words;
    //     static JiebaUtil* instance;
    //     static std::mutex mtx;
        
    // private:
    //     JiebaUtil() // Initialize jieba in the constructor
    //         :jieba(DICT_PATH,HMM_PATH,USER_DICT_PATH,IDF_PATH,STOP_WORD_PATH)
    //     {}
    //     JiebaUtil(const JiebaUtil&) = delete;
    //     JiebaUtil operator=(const JiebaUtil&) = delete;

    // public:
    //     static JiebaUtil* get_instance()
    //     {
    //         if(instance == nullptr)
    //         {
    //             mtx.lock();
    //             if(instance == nullptr)
    //             {
    //                 instance = new JiebaUtil(); // Allocate the object on the heap
    //                 instance->InitJiebaUtil();
    //             }
    //             mtx.unlock();
    //         }
    //         return instance;
    //     }
    //     void InitJiebaUtil() // The object behind the static instance pointer is not static, so this member can be called on it
    //     {
    //         std::ifstream in(STOP_WORD_PATH);
    //         if(!in.is_open())
    //         {
    //             //std::cout << "load stop words file error" << std::endl;
    //             LOG(FATAL, "load stop words file error");
    //             return;
    //         }

    //         std::string line;
    //         while(std::getline(in, line))
    //         {
    //             stop_words.insert({line, true}); // Record the stop word
    //         }
    //         in.close();
    //     }
    //     void CutStringHelper(const std::string& src, std::vector<std::string>* out)
    //     {
    //         jieba.CutForSearch(src, *out);
    //         for(auto iter = out->begin(); iter != out->end(); ) // Do not blindly iter++ here: erase invalidates iterators
    //         {
    //             auto it = stop_words.find(*iter);
    //             if(it != stop_words.end())
    //             {
    //                 // The current string is a stop word and must be removed
    //                 // Watch out for iterator invalidation
    //                 iter = out->erase(iter);
    //             }
    //             else
    //             {
    //                 iter++;
    //             }
    //         }
    //     }
    //     static void CutString(const std::string& src, std::vector<std::string>* out)
    //     {
    //         get_instance()->CutStringHelper(src, out); // The static member function calls through the singleton
    //     }
    // };
    // JiebaUtil* JiebaUtil::instance = nullptr;
    // std::mutex JiebaUtil::mtx;


    // Static member initialization
    //cppjieba::Jieba JiebaUtil::jieba(DICT_PATH,HMM_PATH,USER_DICT_PATH,IDF_PATH,STOP_WORD_PATH);

    //
    // Version that keeps stop words
    // Loading the dictionaries only needs to happen once, so a singleton is the natural design
    class JiebaUtil
    {
    private:
        static cppjieba::Jieba jieba;
    public:
        static void CutString(const std::string& src, std::vector<std::string>* out)
        {
            jieba.CutForSearch(src, *out);
        }
    };
    // Static member initialization
    cppjieba::Jieba JiebaUtil::jieba(DICT_PATH,HMM_PATH,USER_DICT_PATH,IDF_PATH,STOP_WORD_PATH);
}</code></pre> 
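
The commented-out variant above removes stop words with an explicit erase loop and has to watch for iterator invalidation. As a hedged alternative, the erase-remove idiom does the same filtering in one statement. `RemoveStopWords` is a hypothetical helper, and a `std::unordered_set` is assumed in place of the `unordered_map<std::string, bool>` used in the post.

```cpp
#include <algorithm>
#include <string>
#include <unordered_set>
#include <vector>

// Remove every token that appears in stop_words, keeping the others in order.
void RemoveStopWords(const std::unordered_set<std::string>& stop_words,
                     std::vector<std::string>* words) {
    words->erase(
        std::remove_if(words->begin(), words->end(),
                       [&stop_words](const std::string& w) {
                           return stop_words.count(w) != 0;
                       }),
        words->end());
}
```

`std::remove_if` shifts the kept elements to the front in a single pass, so there is no iterator to keep valid by hand.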
  <h3 id="Makefile">Makefile</h3> 
  <pre><code class="language-bash">.PHONY:all
all:parser debug http_server

parser:parser.cc
	g++ -o $@ $^ -std=c++11 -lboost_system -lboost_filesystem
debug:debug.cc
	g++ -o $@ $^ -std=c++11 -ljsoncpp 
http_server:http_server.cc
	g++ -o $@ $^ -std=c++11 -lpthread -ljsoncpp

.PHONY:clean
clean:
	rm -rf parser debug http_server
</code></pre> 
  <h3 id="debug.cc">debug.cc</h3> 
  <pre><code class="language-cpp">#include<iostream>
#include<string>
#include<string.h>
#include"searcher.hpp"


const std::string input = "data/raw_html/raw.txt"; // Path of the processed text
int main()
{
    // for test
    ns_searcher::Searcher* search = new ns_searcher::Searcher();
    search->InitSearcher(input);

    char buffer[1024];
    std::string query;
    std::string json_string;
    while(true)
    {
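        std::cout << "Please Enter Your Search Query# ";
        // std::cin >> query; // Buggy: extraction stops at the first space or newline
        if(fgets(buffer, sizeof(buffer), stdin) == nullptr) // fgets reserves room for the trailing \0 by itself
        {
            break; // Stop on EOF or a read error
        }
        buffer[strcspn(buffer, "\n")] = 0; // Cut the line at the newline, if there is one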
        query = buffer;
        search->Search(query, &json_string);
        std::cout << json_string << std::endl;
    }

    return 0;
}</code></pre> 
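
As an alternative to the fixed-size fgets buffer, `std::getline` reads a whole line, spaces included, straight into a `std::string`. A minimal sketch, with `ReadQuery` as a hypothetical helper that takes any input stream so it can be exercised without a terminal:

```cpp
#include <istream>
#include <sstream>
#include <string>

// Read one full line (without the trailing '\n') into *query.
// Returns false on EOF or a read error.
bool ReadQuery(std::istream& in, std::string* query) {
    return static_cast<bool>(std::getline(in, *query));
}
```

In debug.cc this would be called as `ReadQuery(std::cin, &query)` in the loop condition, which also ends the loop cleanly at end of input.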
  <h3 id="http_server.cc">http_server.cc</h3> 
  <pre><code class="language-cpp">#include"searcher.hpp"
#include"cpp-httplib-v0.7.15/httplib.h"
#include<iostream>
#include<string>
#include"log.hpp"


const std::string root_path = "./wwwroot";
const std::string input = "data/raw_html/raw.txt";

int main()
{
    ns_searcher::Searcher search;
    search.InitSearcher(input);

    httplib::Server svr;
    svr.set_base_dir(root_path.c_str()); // Serve static files from this web root

    // Parameters follow "/s?"
    svr.Get("/s", [&search](const httplib::Request& req, httplib::Response& rep)
    {
        //rep.set_content("hello world", "text/plain; charset=utf-8");
        // Read the query from the GET parameters sent by the browser
        if(!req.has_param("word")) // The keyword follows "word="; has_param checks whether the parameter is present
        {
            rep.set_content("A search keyword is required", "text/plain; charset=utf-8");
            return;
        }
        std::string word = req.get_param_value("word"); // Extract the request parameter
        //std::cout << "User is searching for: " << word << std::endl;
        LOG(NORMAL, "User searched for: " + word);
        std::string json_string;
        search.Search(word, &json_string);
        rep.set_content(json_string, "application/json");
    });
    LOG(NORMAL, "Server started successfully...");
    svr.listen("0.0.0.0", 7781); // "0.0.0.0" accepts connections on any local address
    return 0;
}</code></pre> 
  <h3 id="wwwroot%2Findex.html">wwwroot/index.html</h3> 
  <pre><code class="language-html"><!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <script src="http://code.jquery.com/jquery-2.1.1.min.js"></script>
    
    <title>Boost Search Engine</title>
</code></pre>

Project demo

Project link: http://119.23.208.209:7781/

Interface

Search test

Click-through to a result

If you have read this far, please give the author a like~

引言面向对象编程（OOP）是现代软件开发的核心范式，C++通过封装、继承和多态三大特性提供了强大的面向对象能力。这些特性使代码更易维护、扩展和复用，是构建复杂系统的基石。本章将深入探讨C++类和对象的方方面面，从基础封装到高级多态应用，帮助您掌握面向对象编程的精髓。最后，如果大家喜欢我的创作风格，请大家多多关注up主，你们的支持就是我创作最大的动力！如果各位观众老爷觉得我哪些地方需要改进，请一定在
C++ 工厂模式与抽象工厂：创建对象的灵活设计海派程序猿 c++java jvm
C++工厂模式与抽象工厂：让对象“流水线”更优雅想象一下，你是一家玩具工厂的老板，主要生产两种玩具：小汽车和积木。最初，你的生产流程很简单，需要什么就直接用new创建什么：//生产小汽车Car*myCar=newCar();//生产积木Block*myBlock=newBlock();简单粗暴，效率很高，就像直接从仓库里抓取零件组装一样。但问题也随之而来：耦合度高：生产代码直接依赖于具体的Car和
【视频观看系统】- 技术与架构选型
✅项目技术选型方案一、整体架构风格项目层级技术选型说明架构风格微服务架构（SpringCloud）独立部署、易扩展、易维护服务通信HTTP（RestTemplate或Feign）+RocketMQ同步调用+异步事件注册中心Nacos服务注册、发现、配置中心配置中心Nacos配置管理多服务统一配置API网关SpringCloudGateway路由转发、权限验证、限流服务监控SpringBootAdm
java杨辉三角 3213213333332132 java基础
package com.algorithm; /** * @Description 杨辉三角 * @author FuJianyong * 2015-1-22上午10:10:59 */ public class YangHui { public static void main(String[] args) { //初始化二维数组长度 int[][] y
《大话重构》之大布局的辛酸历史白糖_ 重构
《大话重构》中提到“大布局你伤不起”，如果企图重构一个陈旧的大型系统是有非常大的风险，重构不是想象中那么简单。我目前所在公司正好对产品做了一次“大布局重构”，下面我就分享这个“大布局”项目经验给大家。背景公司专注于企业级管理产品软件，企业有大中小之分，在2000年初公司用JSP/Servlet开发了一套针对中
电驴链接在线视频播放源码 dubinwei 源码电驴播放器视频 ed2k
本项目是个搜索电驴（ed2k）链接的应用,借助于磁力视频播放器（官网： http://loveandroid.duapp.com/ 开放平台），可以实现在线播放视频，也可以用迅雷或者其他下载工具下载。项目源码： http://git.oschina.net/svo/Emule,动态更新。也可从附件中下载。项目源码依赖于两个库项目，库项目一链接： http://git.oschina.
Javascript中函数的toString()方法周凡杨 JavaScript js toString function object
简述 The toString() method returns a string representing the source code of the function. 简译之，Javascript的toString()方法返回一个代表函数源代码的字符串。句法 function.
struts处理自定义异常 g21121 struts
很多时候我们会用到自定义异常来表示特定的错误情况，自定义异常比较简单，只要分清是运行时异常还是非运行时异常即可，运行时异常不需要捕获，继承自RuntimeException，是由容器自己抛出，例如空指针异常。非运行时异常继承自Exception，在抛出后需要捕获，例如文件未找到异常。此处我们用的是非运行时异常，首先定义一个异常LoginException: /** * 类描述：登录相
Linux中find常见用法示例 510888780 linux
Linux中find常见用法示例 ·find path -option [ -print ] [ -exec -ok command ] {} \; find命令的参数；
SpringMVC的各种参数绑定方式 Harry642 springMVC 绑定表单
1. 基本数据类型(以int为例，其他类似)： Controller代码： @RequestMapping("saysth.do") public void test(int count) { } 表单代码： <form action="saysth.do" method="post&q
Java 获取Oracle ROWID aijuans java oracle
A ROWID is an identification tag unique for each row of an Oracle Database table. The ROWID can be thought of as a virtual column, containing the ID for each row. The oracle.sql.ROWID class i
java获取方法的参数名 antlove java jdk parameter method reflect
reflect.ClassInformationUtil.java package reflect; import javassist.ClassPool; import javassist.CtClass; import javassist.CtMethod; import javassist.Modifier; import javassist.bytecode.CodeAtt
JAVA正则表达式匹配查找替换提取操作百合不是茶 java 正则表达式替换提取查找
正则表达式的查找;主要是用到String类中的split(); String str; str.split();方法中传入按照什么规则截取,返回一个String数组常见的截取规则: str.split("\\.")按照.来截取 str.
Java中equals()与hashCode()方法详解 bijian1013 java set equals()hashCode()
一.equals()方法详解 equals()方法在object类中定义如下： public boolean equals(Object obj) { return (this == obj); } 很明显是对两个对象的地址值进行的比较（即比较引用是否相同）。但是我们知道，String 、Math、I
精通Oracle10编程SQL(4)使用SQL语句 bijian1013 oracle 数据库 plsql
--工资级别表 create table SALGRADE ( GRADE NUMBER(10), LOSAL NUMBER(10,2), HISAL NUMBER(10,2) ) insert into SALGRADE values(1,0,100); insert into SALGRADE values(2,100,200); inser
【Nginx二】Nginx作为静态文件HTTP服务器 bit1129 HTTP服务器
Nginx作为静态文件HTTP服务器在本地系统中创建/data/www目录，存放html文件(包括index.html) 创建/data/images目录，存放imags图片在主配置文件中添加http指令 http { server { listen 80; server_name
kafka获得最新partition offset blackproof kafka partition offset 最新
kafka获得partition下标，需要用到kafka的simpleconsumer import java.util.ArrayList; import java.util.Collections; import java.util.Date; import java.util.HashMap; import java.util.List; import java.
centos 7安装docker两种方式 ronin47
第一种是采用yum 方式 yum install -y docker
java-60-在O(1)时间删除链表结点 bylijinnan java
public class DeleteNode_O1_Time { /** * Q 60 在O(1)时间删除链表结点 * 给定链表的头指针和一个结点指针(!!)，在O(1)时间删除该结点 * * Assume the list is: * head->...->nodeToDelete->mNode->nNode->..
nginx利用proxy_cache来缓存文件 cfyme cache
user zhangy users; worker_processes 10; error_log /var/vlogs/nginx_error.log crit; pid /var/vlogs/nginx.pid; #Specifies the value for ma
[JWFD开源工作流]JWFD嵌入式语法分析器负号的使用问题 comsci 嵌入式
假如我们需要用JWFD的语法分析模块定义一个带负号的方程式，直接在方程式之前添加负号是不正确的，而必须这样做： string str01 = "a=3.14;b=2.71;c=0;c-((a*a)+(b*b))" 定义一个0整数c,然后用这个整数c去
如何集成支付宝官方文档 dai_lm android
官方文档下载地址 https://b.alipay.com/order/productDetail.htm?productId=2012120700377310&tabId=4#ps-tabinfo-hash 集成的必要条件 1. 需要有自己的Server接收支付宝的消息 2. 需要先制作app，然后提交支付宝审核，通过后才能集成调试的时候估计会真的扣款，请注意
应该在什么时候使用Hadoop datamachine hadoop
原帖地址：http://blog.chinaunix.net/uid-301743-id-3925358.html 存档，某些观点与我不谋而合，过度技术化不可取，且hadoop并非万能。 --------------------------------------------万能的分割线-------------------------------- 有人问我，“你在大数据和Hado
在GridView中对于有外键的字段使用关联模型进行搜索和排序 dcj3sjt126com yii
在GridView中使用关联模型进行搜索和排序首先我们有两个模型它们直接有关联: class Author extends CActiveRecord { ... } class Post extends CActiveRecord { ... function relations() { return array( '
使用NSString 的格式化大全 dcj3sjt126com Objective-C
格式定义The format specifiers supported by the NSString formatting methods and CFString formatting functions follow the IEEE printf specification; the specifiers are summarized in Table 1. Note that you c
使用activeX插件对象object滚动有重影蕃薯耀 activeX插件滚动有重影
使用activeX插件对象object滚动有重影 <object style="width:0;" id="abc" classid="CLSID:D3E3970F-2927-9680-BBB4-5D0889909DF6" codebase="activex/OAX339.CAB#
SpringMVC4零配置 hanqunfeng springmvc4
基于Servlet3.0规范和SpringMVC4注解式配置方式，实现零xml配置，弄了个小demo，供交流讨论。项目说明如下： 1.db.sql是项目中用到的表，数据库使用的是oracle11g 2.该项目使用mvn进行管理，私服为自搭建nexus,项目只用到一个第三方 jar，就是oracle的驱动； 3.默认项目为零配置启动，如果需要更改启动方式，请
《开源框架那点事儿16》：缓存相关代码的演变 j2eetop 开源框架
问题引入上次我参与某个大型项目的优化工作，由于系统要求有比较高的TPS，因此就免不了要使用缓冲。该项目中用的缓冲比较多，有MemCache，有Redis，有的还需要提供二级缓冲，也就是说应用服务器这层也可以设置一些缓冲。当然去看相关实现代代码的时候，大致是下面的样子。 [java] view plain copy print ? public vo
AngularJS浅析 kvhur JavaScript
概念 AngularJS is a structural framework for dynamic web apps. 了解更多详情请见原文链接：http://www.gbtags.com/gb/share/5726.htm Directive 扩展html，给html添加声明语句，以便实现自己的需求。对于页面中html元素以ng为前缀的属性名称，ng是angular的命名空间
架构师之jdk的bug排查(一)---------------split的点号陷阱 nannan408 split
1.前言. jdk1.6的lang包的split方法是有bug的,它不能有效识别A.b.c这种类型,导致截取长度始终是0.而对于其他字符,则无此问题.不知道官方有没有修复这个bug. 2.代码 String[] paths = "object.object2.prop11".split("'"); System.ou
如何对10亿数据量级的mongoDB作高效的全表扫描 quentinXXZ mongodb
本文链接: http://quentinXXZ.iteye.com/blog/2149440 一、正常情况下，不应该有这种需求首先，大家应该有个概念，标题中的这个问题，在大多情况下是一个伪命题，不应该被提出来。要知道，对于一般较大数据量的数据库，全表查询，这种操作一般情况下是不应该出现的，在做正常查询的时候，如果是范围查询，你至少应该要加上limit。说一下，
C语言算法之水仙花数 qiufeihu c 算法
/** * 水仙花数 */ #include <stdio.h> #define N 10 int main() { int x,y,z; for(x=1;x<=N;x++) for(y=0;y<=N;y++) for(z=0;z<=N;z++) if(x*100+y*10+z == x*x*x
JSP指令 wyzuomumu jsp
jsp指令的一般语法格式： <%@ 指令名属性 =”值 ” %> 常用的三种指令： page,include,taglib page指令语法形式： <%@ page 属性 1=”值 1” 属性 2=”值 2”%> include指令语法形式： <%@include file=”relative url”%> (jsp可以通过 include

