百度,搜狐,360等搜索引擎;
boost的官网是没有站内搜索的。
爬虫程序我就不做了,受国家的法律法规的限制,我就通过正规的下载途径来做。
目标文档进行分词(目的:方便建立倒排索引和查找):
关键字(具有唯一性) | 文档,weight(权重) |
雷军 | 文档1,文档2 |
买 | 文档1 |
四斤 | 文档1 |
小米 | 文档1,文档2 |
四斤小米 | 文档1 |
发布 | 文档2 |
小米手机 | 文档2 |
模拟一次查找的过程:
#include <iostream>
#include <string>
#include <vector>
#include <fstream>
#include <boost/filesystem.hpp>
#include "util.hpp"
const std::string src_path = "data/input/"; // directory holding all the raw downloaded html files
const std::string output = "data/raw_html/raw.txt"; // parser output: every cleaned html doc, one per line
// One parsed HTML document produced by the parser stage.
// (Was `typedef struct DocInfo54 {...} DocInfo_t;` — a C-ism with a garbled
// tag name; a plain C++ struct keeps the `DocInfo_t` name callers use.)
struct DocInfo_t
{
    std::string title;   // document title (text between <title> tags)
    std::string content; // document body with all HTML tags stripped
    std::string url;     // url of this document on the official boost site
};
bool EnumFile(const std::string &src_path,std::vector *file_list);
bool ParseHtml(std::vector& file_list,std::vector *results);
bool SaveHtml(std::vector& results,const std::string &output);
// Extract the text between <title> and </title> from one html document.
// Returns false when either tag is missing or malformed.
// (The tag literals had been stripped to "" by an html-unaware copy; restored.)
static bool ParseTitle(const std::string &result, std::string *title)
{
    const std::string open_tag = "<title>";
    const std::string close_tag = "</title>";
    size_t begin = result.find(open_tag);
    if (begin == std::string::npos)
    {
        return false;
    }
    // search from the opening tag so a stray earlier "</title>" can't match
    size_t end = result.find(close_tag, begin);
    if (end == std::string::npos)
    {
        return false;
    }
    begin += open_tag.size();
    if (begin > end)
    {
        return false;
    }
    *title = result.substr(begin, end - begin);
    return true;
}
// Strip HTML tags from one document with a tiny two-state machine:
// LABEL = inside a tag (skip chars), CONTENT = outside (collect chars).
// Newlines become spaces so the whole doc stays on one output line.
// (Fixed the "LABLE" misspelling; behavior unchanged.)
static bool ParseContent(const std::string &file, std::string *content)
{
    enum status
    {
        LABEL,
        CONTENT
    };
    // html starts with a tag, so begin in the LABEL state
    enum status s = LABEL;
    for (auto e : file)
    {
        switch (s)
        {
        case LABEL:
            if (e == '>') // '>' closes the current tag
                s = CONTENT;
            break;
        case CONTENT:
            if (e == '<') // '<' opens a new tag
                s = LABEL;
            else
            {
                if (e == '\n') e = ' '; // '\n' is reserved as the doc separator later
                *content += e;
            }
            break;
        default:
            break;
        }
    }
    return true;
}
// Map a local file path under src_path to its page on the Boost 1.79 doc site
// by swapping the local prefix for the official url prefix.
static bool ParseUrl(const std::string &file, std::string *url)
{
    static const std::string kUrlHead = "https://www.boost.org/doc/libs/1_79_0/doc/html/";
    *url = kUrlHead + file.substr(src_path.size());
    return true;
}
int main()
{
//第一步拿到所有文件名
std::vector files_list;
if(!EnumFile(src_path,&files_list))
{
std::cerr<<"enum file name error"< results;
if(!ParseHtml(files_list,&results))
{
std::cerr<<"parse is error"< *file_list) //拿到所有html文件名
{
namespace fs = boost::filesystem;
fs::path root_path(src_path); //创建一个路径名对象
if(!fs::exists(root_path)) //根据路径创建的对象不存在
{
std::cerr<path().extension() != ".html")
{
continue;
}
//测试
//std::cout<<"debug"<path().string()<push_back(it->path().string());
}
return true;
}
void ShowInfo(const DocInfo_t &doc)
{
std::cout<& file_list,std::vector *results)//拿到所有html的标题,内容,url
{
for(const auto file : file_list)
{
//1读取文件
std::string result;
if(!ns_util::FileUtil::ReadFile(file,&result))
{
//文件读取失败
continue;
}
DocInfo_t doc;
//2提取标签
if(!ParseTitle(result,&doc.title))
{
continue;;
}
//3提取内容
if(!ParseContent(result,&doc.content))
{
continue;
}
//4提取url
if(!ParseUrl(file,&doc.url))
{
continue;
}
//将结果出入到vector,这里有拷贝问题,以后在优化
results->push_back(std::move(doc)); //采用右值,资源转移
//for debug
//ShowInfo(doc);
//break;
}
return true;
}
bool SaveHtml(std::vector& results,const std::string &output)
{
#define SEP '\3'
std::ofstream of(output,std::ios::out | std::ios::binary);
if(!of.is_open())
{
std::cerr<<"open"<
在进行遍历的时候,只要碰到了 `>`,就意味着当前的标签处理完毕;只要碰到了 `<`,就意味着新的标签开始了。
构建URL
将解析内容写入文件中
#pragma once
#include
#include
#include
#include
#include
#include "util.hpp"
#include
#include"log.hpp"
namespace ns_index
{
struct DocInfo
{
std::string title; //文档标题
std::string content; //文档对应的去标签之后的内容
std::string url; //官网的url
uint64_t doc_id; //文档的id
};
struct InvertedElem
{
uint64_t doc_id;
std::string word;
int weigth;
};
//倒排拉链
typedef std::vector InvertedList;
class Index
{
private:
std::vector forward_index; //正排索引
std::unordered_map inverted_index; //倒排索引
static Index *Instance;
static std::mutex mtx;
private:
Index() = default;
Index(const Index &) = delete;
Index &operator=(const Index &) = delete;
public:
~Index() = default;
static Index *GetInstance()
{
if (nullptr == Instance)
{
mtx.lock();
if (nullptr == Instance)
{
Instance = new Index();
}
mtx.unlock();
}
return Instance;
}
//根据doc_id找到文档内容
DocInfo *GetForWardIndex(uint64_t doc_id)
{
if (doc_id >= forward_index.size())
{
std::cerr << "doc_id is error" << std::endl;
return nullptr;
}
return &forward_index[doc_id];
}
//根据关键字string,获得倒排拉链
InvertedList *GetInvertedList(const std::string &word)
{
auto it = inverted_index.find(word);
if (it == inverted_index.end())
{
std::cerr << word << "have no InvertedList" << std::endl;
return nullptr;
}
return &(it->second);
}
//根据去标签,格式化之后的文档,构建正排索引和倒排索引
// data/raw_html/raw.txt
bool BuildIndex(const std::string &input)
{
std::ifstream in(input, std::ios::in | std::ios::binary);
if (!in.is_open())
{
std::cerr << "sorry" << input << "open sorry" << std::endl;
return false;
}
std::string line;
int count = 0;
while (std::getline(in, line))
{
DocInfo *doc = BuildForwardIndex(line); //构建正排
if (doc == nullptr)
{
std::cerr << "build" << line << std::endl; // for debug
continue;
}
BuildInvertedIndex(*doc);
count++;
if(count % 50 == 0) //std::cout<<"当前已经建立的索引文档:"< results;
const std::string seq = "\3";
ns_util::StringUtil::Split(line, &results, seq);
if (results.size() != 3)
{
return nullptr;
}
// 2将字符串进行填充到DocInfo
DocInfo doc;
doc.title = results[0];
doc.content = results[1];
doc.url = results[2];
doc.doc_id = forward_index.size();
// 3插入到正排索引的vector中
forward_index.push_back(std::move(doc));
return &forward_index.back();
}
bool BuildInvertedIndex(const DocInfo &doc)
{
// word 倒排拉链
struct word_cnt
{
/* data */
int title_cnt;
int content_cnt;
word_cnt() : title_cnt(0), content_cnt(0) {}
};
std::unordered_map word_map;
//对标题进行分词
std::vector title_words;
ns_util::JiebaUtil::CurString(doc.title, &title_words);
for (auto s : title_words)
{
boost::to_lower(s); //转化成小写
word_map[s].title_cnt++;
}
//对文档内容进行分词
std::vector contnet_word;
ns_util::JiebaUtil::CurString(doc.content, &contnet_word);
for (auto s : contnet_word)
{
boost::to_lower(s); //转化成小写
word_map[s].content_cnt++;
}
#define X 10
#define Y 1
for (auto &word_pair : word_map)
{
InvertedElem item;
item.doc_id = doc.doc_id;
item.word = word_pair.first;
//相关性
item.weigth = X * word_pair.second.title_cnt + Y * word_pair.second.content_cnt;
InvertedList &inverted_list = inverted_index[word_pair.first];
inverted_list.push_back(std::move(item));
}
return true;
}
};
Index* Index::Instance = nullptr;
std::mutex Index::mtx;
}
#pragma once
#include "index.hpp"
#include
#include
// struct Com
// {
// bool operator>(const InvertedElem& e1,const InvertedElem& e2)
// {
// return e1.weigth > e2.weigth;
// }
// }
// One aggregated search hit: a document together with every query word that
// matched it and the summed relevance weight.
// (Restored the stripped template argument on `words`.)
struct InvertedElemPrint
{
    uint64_t doc_id;                // id of the matched document
    int weight;                     // sum of the per-word weights for this doc
    std::vector<std::string> words; // all query words that hit this doc
    InvertedElemPrint() : doc_id(0), weight(0) {}
};
namespace ns_searcher
{
class Searcher
{
private:
ns_index::Index *index;
public:
Searcher() = default;
~Searcher() = default;
void InitSearcher(const std::string &input)
{
// 1.获取或者创建index对象
index = ns_index::Index::GetInstance(); //获得单例
//std::cout << "获取单例成功" << std::endl;
LOG(NORMAL, "获取index单例成功...");
// 2.根据index对象建立索引
index->BuildIndex(input);
// std::cout << "建立正排和倒排索引成功...." << std::endl;
LOG(NORMAL, "建立正排和倒排索引成功...");
}
std::string GetDesc(const std::string &html_src, const std::string &word)
{
const int prev_step = 50;
const int next_step = 100;
//找到首次出现的位置
// std::size_t pos = html_src.find(word); //错误原文档没有忽略大小写
auto it = std::search(html_src.begin(), html_src.end(), word.begin(), word.end(),
[](int a, int b)
{ return std::tolower(a) == std::tolower(b); });
int pos = std::distance(html_src.begin(), it);
if (pos == std::string::npos)
{
return "None1"; //不存在这种情况
}
// 2获取start end
int start = 0;
int end = html_src.size() - 1;
if (pos > start + prev_step)
start = pos - prev_step;
if (pos < end - next_step)
end = pos + next_step;
if (start >= end)
return "None2";
return html_src.substr(start, end - start) + "...";
}
// query:搜索关键字
// josn_string:返回给用户的搜索结果
void Search(const std::string &query, std::string *json_string)
{
// 1.[分词]:对我们的query进行按照searcher的要求进行分词
std::vector words;
ns_util::JiebaUtil::CurString(query, &words);
// 2.[触发]:就是根据分词的各个“词”,进行Index查找
// ns_index::InvertedList inverted_list_all;
std::vector inverted_list_all;
std::unordered_map tokens_map;
for (auto &e : words)
{
boost::to_lower(e);
ns_index::InvertedList *inverted_list = index->GetInvertedList(e);
if (inverted_list == nullptr)
continue;
//不完美的地方,可能有重复的文档
// inverted_list_all.insert(inverted_list_all.end(),inverted_list->begin(),inverted_list->end());
for (const auto &elem : *inverted_list)
{
auto &item = tokens_map[elem.doc_id]; //[]:如果存在直接获取,如果不存在新建
// item一定是doc_id相同的print节点
item.doc_id = elem.doc_id;
item.weight += elem.weigth;
item.words.push_back(elem.word);
}
for (const auto &elem : *inverted_list)
{
auto &item = tokens_map[elem.doc_id]; //[]:如果存在直接获取,如果不存在新建
// item一定是doc_id相同的print节点
item.doc_id = elem.doc_id;
item.weight += elem.weigth;
item.words.push_back(elem.word);
}
for (const auto &item : tokens_map)
{
inverted_list_all.push_back(std::move(item.second));
}
}
// 3.[合并排序]:汇总查找结果,按照相关性(weight)降序排序
// std::sort(inve rted_list_all.begin(), inverted_list_all.end(),\
// []( const ns_index::InvertedElem e1, const ns_index::InvertedElem e2){
// return e1.weigth > e2.weigth;
// });
// std::sort(inverted_list_all.begin(),inverted_list_all.end(),Com());
std::sort(inverted_list_all.begin(), inverted_list_all.end(),
[](const InvertedElemPrint &e1, const InvertedElemPrint &e2)
{
return e1.weight > e2.weight;
});
// 4.[构建]:根据查找出来的结果,构建jsonc串 -----jsoncpp
Json::Value root;
for (auto &item : inverted_list_all)
{
ns_index::DocInfo *doc = index->GetForWardIndex(item.doc_id);
if (doc == nullptr)
continue;
Json::Value elem;
elem["title"] = doc->title;
elem["desc"] = GetDesc(doc->content, item.words[0]); // content是文档的去标签的结果,但是不是我们想要的,我们要的是一部分 TODO
elem["url"] = doc->url;
// for deubg, for delete
elem["id"] = (int)item.doc_id;
elem["weight"] = item.weight; // int->string
root.append(elem);
}
Json::StyledWriter writer;
*json_string = writer.write(root);
}
};
}
搜索:雷军小米 -> 雷军、小米->查倒排->两个倒排拉链(文档1,文档2,文档1、文档2)
安装 jsoncpp
关于调试
#include "cpp-httplib/httplib.h"
#include "searcher.hpp"
const std::string root_path = "./wwwroot";
const std::string input ="data/raw_html/raw.txt";
int main()
{
ns_searcher::Searcher searcher;
searcher.InitSearcher(input);
httplib::Server svr;
svr.set_base_dir(root_path.c_str());
svr.Get("/s", [&searcher](const httplib::Request &req, httplib::Response &rsp){
if(!req.has_param("word"))
{
rsp.set_content("必须要有搜索关键字!","text/plain: chatset=utf-8");
return;
}
std::string word = req.get_param_value("word");
// std::cout<<"用户正在搜索:"<
boost 搜索引擎
makefile
# Build the three executables of the boost search engine project.
Parser=parser
DUG=debug
HTTP_SEARCHER=http_searcher
cc=g++
.PHONY:all
all:$(Parser) $(DUG) $(HTTP_SEARCHER)
# offline parser: walks data/input with boost::filesystem
# (recipe lines below restored to begin with a literal tab, as make requires)
$(Parser):parser.cc
	$(cc) -o $@ $^ -lboost_system -lboost_filesystem -std=c++11
# command-line debug tool for the searcher (jsoncpp output)
$(DUG):debug.cc
	$(cc) -o $@ $^ -ljsoncpp -std=c++11
# http front end: cpp-httplib needs pthread
$(HTTP_SEARCHER):http_searcher.cc
	$(cc) -o $@ $^ -ljsoncpp -lpthread -std=c++11
.PHONY:clean
clean:
	rm -rf $(Parser) $(DUG) $(HTTP_SEARCHER)
#pragma once
#include
#include
#include
// Log severity levels (plain macros so LOG() can stringize the name).
#define NORMAL 1
#define WARNING 2
#define DEBUG 3
#define FATAL 4
// LOG(NORMAL, "msg") records the level name, message and call site.
#define LOG(LEVEL, MESSAGE) log(#LEVEL, MESSAGE, __FILE__, __LINE__)
// Print one log line: [level][unix-time][message][file : line].
// BUG FIX: `inline` added — a non-inline definition in a header included by
// several translation units is an ODR violation (multiple-definition link
// error). Parameters now pass by const reference to avoid three string copies.
inline void log(const std::string &level, const std::string &message, const std::string &file, int line)
{
    std::cout << "[" << level << "]" << "[" << time(nullptr) << "]"
              << "[" << message << "]" << "[" << file << " : " << line << "]" << std::endl;
}