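Qt digital newspaper reader, text-and-image edition. The earlier PDF edition of the reader only displayed each page's PDF or large JPG image. That is enough for skimming a digital newspaper, but it makes it hard to read individual articles closely or to search them, and a newspaper that publishes no PDF or large JPG could not be archived at all. This text-and-image edition archives the articles of each digital newspaper as text and images. After comparing Solr and Elasticsearch, I first chose Elasticsearch to store and search the articles. However, the Raspberry Pi 3B+ has only 1 GB of RAM, which is not enough to run Elasticsearch, so I switched to MySQL, and article search became a full-text search over the pre-title (引题), title (主题), subtitle (副题), image captions and body text.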
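The MySQL installed on the Raspberry Pi with apt-get is actually MariaDB, and that build does not include the Mroonga storage engine, which MariaDB needs for full-text search on Chinese text. I therefore removed the apt-get MariaDB and rebuilt MariaDB from source, staying with MariaDB rather than switching to MySQL. Mroonga has been available by default since version 10.0.15: a source build needs no extra build options, and the engine only has to be enabled after installation. This article uses version 10.1.41.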
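When the build is finished, point datadir at the new directory. The parent of the new directory must have permissions 755, and the owner of the new directory must be changed to mysql:mysql. After datadir was changed, MariaDB no longer started properly at boot and had to be started once more by hand with sudo; the fix is described later. One more problem worth recording: the location of mysql.sock in my.cnf could not be changed. Because the new datadir is on a mounted hard drive, I tried to put mysql.sock in the new datadir as well, but MariaDB then failed to restart. I am not sure whether mysql.sock cannot share a directory with datadir or whether the mounted drive was the cause. Afterwards, the original mount point under /media/pi (nas, for example) contained only an empty mysql folder, while the original files had ended up in nas1, which I had no permission to access. Deleting nas as root and running mv nas1 nas failed with "device busy". The fix was simply to set mysql.sock back to its default location and reboot the Raspberry Pi.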
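The two permission requirements above can be checked before starting MariaDB. Below is a minimal Python sketch, assuming the new datadir is /media/pi/nas/mysql as in the install commands below; check_datadir is a hypothetical helper, not part of MariaDB, and it only reports problems instead of fixing them.

```python
import grp
import os
import pwd
import stat


def check_datadir(datadir, owner="mysql", group="mysql"):
    """Return a list of permission problems for datadir (empty list = OK).

    Checks the two requirements: the parent directory must have mode 755,
    and datadir itself must be owned by owner:group.
    """
    problems = []
    parent = os.path.dirname(os.path.abspath(datadir))
    parent_mode = stat.S_IMODE(os.stat(parent).st_mode)
    if parent_mode != 0o755:
        problems.append("parent %s has mode %o, expected 755" % (parent, parent_mode))
    st = os.stat(datadir)
    user_name = pwd.getpwuid(st.st_uid).pw_name
    group_name = grp.getgrgid(st.st_gid).gr_name
    if (user_name, group_name) != (owner, group):
        problems.append("%s is owned by %s:%s, expected %s:%s"
                        % (datadir, user_name, group_name, owner, group))
    return problems
```

On the Pi, an empty list from check_datadir("/media/pi/nas/mysql") means both requirements are met; the fixes themselves are still chmod 755 on the parent and chown mysql:mysql on the directory.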
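Next, add swap space to avoid out-of-memory errors in the database. On the Raspberry Pi, however, the swap file can only be located at /var/swap, and putting swap on the TF (microSD) card shortens the card's life; for that approach, see the third article below. This article uses zram instead, following the zram overclocking section of the first article below. One difference: if you load the module manually and then run zram.sh, only zram0 is created successfully, because the module creates a single device by default. Load it with sudo modprobe zram num_devices=4 instead; zram.sh detects the number of CPU cores by itself. I therefore recommend loading the module and downloading zram.sh, then adding both to rc.local and rebooting the Raspberry Pi. One more observation: when I first started using the Raspberry Pi it reported 927 MB of memory, but it now reports only 874 MB, which is the same size as the swap space added through zram.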
[Figure: contents of rc.local]
[Figure: contents of my_start.sh]
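How the zram devices are sized can be sketched in Python. This assumes zram.sh follows the common pattern of creating one zram device per CPU core and giving each device an equal share of MemTotal; the actual script from the first article below may differ, and read_mem_total_kb and zram_plan are hypothetical names. Under that assumption the total zram swap equals the reported memory, which is consistent with the 874 MB observation above, and with only zram0 present the writes for zram1 to zram3 would fail, which is why num_devices=4 is needed.

```python
def read_mem_total_kb(meminfo_text):
    """Return MemTotal from the text of /proc/meminfo (the value is in kB)."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1])
    raise ValueError("MemTotal not found")


def zram_plan(mem_total_kb, cores):
    """Split RAM evenly over one zram device per CPU core.

    Returns (sysfs disksize path, size in bytes) pairs. A zram.sh-style
    script writes each size to its path as root, then runs mkswap and
    swapon on the matching /dev/zramN.
    """
    per_device = mem_total_kb // cores * 1024
    return [("/sys/block/zram%d/disksize" % n, per_device) for n in range(cores)]
```

On the Pi the inputs would come from open("/proc/meminfo").read() and os.cpu_count().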
A complete guide to overclocking the Raspberry Pi - 树莓派中文站
http://www.52pi.net/archives/1384
An introduction to zram - 半月旋空 - CSDN blog
https://blog.csdn.net/longwang155069/article/details/51900031
The right way to change the Raspberry Pi swap partition | 树莓派实验室
http://shumeipai.nxez.com/2017/12/18/how-to-modify-raspberry-pi-swap-partition.html
A brief look at the differences between MySQL and MariaDB - 咻一咻的博客 - CSDN blog
https://blog.csdn.net/qq_37187976/article/details/79117863
Downloads - MariaDB
https://downloads.mariadb.org/
Changing the MariaDB root password - KeithTt - cnblogs
https://www.cnblogs.com/keithtt/p/6922378.html
cmake . -DCMAKE_BUILD_TYPE=Release
make
sudo make install
cd /usr/local/mysql/scripts
sudo ./mysql_install_db --user=mysql --basedir=/usr/local/mysql/ --datadir=/media/pi/nas/mysql
sudo cp /usr/local/mysql/support-files/mysql.server /etc/init.d/mariadb
sudo chmod +x /etc/init.d/mariadb
sudo systemctl enable mariadb
sudo vim /etc/init.d/mariadb (set basedir, datadir and conf)
sudo cp /usr/local/mysql/support-files/my-huge.cnf /etc/my.cnf
sudo vim /etc/my.cnf (set datadir to /media/pi/nas/mysql; under [client] add default-character-set=utf8mb4; under [mysqld] add character-set-server=utf8mb4)
sudo mysql -u root
use mysql;
select host, user from user; (then delete every row except 'root'@'127.0.0.1')
update user set password=PASSWORD('YourPasswordHere') where user='root';
update user set host='%' where user='root';
flush privileges;
sudo service mariadb restart
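The my.cnf edits listed above can also be scripted. A minimal sketch using Python's configparser; patch_my_cnf is a hypothetical helper, and it assumes the file has no lines outside a section (such as a leading !includedir), which configparser cannot parse. Note that comments are dropped when the file is written back.

```python
import configparser
import io


def patch_my_cnf(text, datadir):
    """Apply the datadir and utf8mb4 settings to the text of a my.cnf file."""
    cnf = configparser.ConfigParser(allow_no_value=True, interpolation=None, strict=False)
    cnf.optionxform = str  # keep option names exactly as written
    cnf.read_string(text)
    for section in ("client", "mysqld"):
        if not cnf.has_section(section):
            cnf.add_section(section)
    cnf.set("mysqld", "datadir", datadir)
    cnf.set("mysqld", "character-set-server", "utf8mb4")
    cnf.set("client", "default-character-set", "utf8mb4")
    out = io.StringIO()
    # Bare flags such as skip-external-locking are written back without "=".
    cnf.write(out, space_around_delimiters=False)
    return out.getvalue()
```

The result would then be written back to /etc/my.cnf (as root) before restarting MariaDB.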
To enable the Mroonga storage engine, simply follow the article below.
Enabling the Mroonga storage engine for full-text indexing in MariaDB 10.2.6 - 运维人生 - 51CTO blog
https://blog.51cto.com/jinyan2049/1942333
show engines;
INSTALL SONAME 'ha_mroonga';
CREATE FUNCTION last_insert_grn_id RETURNS INTEGER SONAME 'ha_mroonga.so';
show engines;
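After INSTALL SONAME, the second show engines; should list Mroonga with Support YES. The same check can be scripted with PyMySQL, the client library the collector script below also uses. A sketch assuming PyMySQL's default tuple cursor, in which each SHOW ENGINES row is (Engine, Support, Comment, Transactions, XA, Savepoints); mroonga_enabled and check_server are hypothetical helpers.

```python
def mroonga_enabled(engine_rows):
    """Return True if the SHOW ENGINES rows list Mroonga as usable.

    Support is one of YES, DEFAULT, NO or DISABLED; only the first two
    mean the engine can be used for new tables.
    """
    for row in engine_rows:
        if str(row[0]).lower() == "mroonga":
            return str(row[1]).upper() in ("YES", "DEFAULT")
    return False


def check_server(conn):
    """Run SHOW ENGINES on an open PyMySQL connection and check for Mroonga."""
    with conn.cursor() as cursor:
        cursor.execute("SHOW ENGINES")
        return mroonga_enabled(cursor.fetchall())
```

For example, check_server(pymysql.connect(host="127.0.0.1", user="root", password="YourPasswordHere")) should return True once the engine is installed.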
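For an introduction to MariaDB/MySQL full-text search, its syntax, and search examples for various scenarios, see the following four articles.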
Full-Text Indexes - MariaDB Knowledge Base
https://mariadb.com/kb/en/library/full-text-indexes/
Mroonga - MariaDB Knowledge Base
https://mariadb.com/kb/en/library/mroonga/
Demo SQL for Chinese full-text search in MySQL - 马丁传奇 - cnblogs
https://www.cnblogs.com/martinzhang/p/3220345.html
Notes on MySQL FULLTEXT search and the ngram parser plugin for CJK text - 蛙鳜鸡鹳狸猿 - CSDN blog
https://blog.csdn.net/sweeper_freedoman/article/details/82847754
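The BOOLEAN MODE operators described in these articles are what the search code in mainwindow.cpp below relies on: 全词 (whole phrase) wraps the keywords in double quotes, 并且 (all words) prefixes each word with +, and 或者 (any word) lists the words unmarked. A Python sketch of the same mapping, with English relation names as an assumption; the returned string is meant to be passed to AGAINST as a query parameter rather than pasted into the SQL text.

```python
def boolean_query(keywords, relation):
    """Build the search string for MATCH ... AGAINST (... IN BOOLEAN MODE).

    keywords is the text typed into the search box (words separated by
    spaces); relation is "phrase" (全词), "and" (并且) or "or" (或者).
    """
    words = keywords.split()
    if relation == "phrase":
        # Double quotes would end the phrase early, so drop any the user typed.
        return '"%s"' % keywords.strip().replace('"', "")
    if relation == "and":
        return " ".join("+" + w for w in words)
    if relation == "or":
        return " ".join(words)
    raise ValueError("unknown relation: %r" % relation)
```

With PyMySQL, for example: cursor.execute("SELECT title FROM t_epaper WHERE MATCH(pre_title, title, sub_title) AGAINST (%s IN BOOLEAN MODE)", (boolean_query("深圳 科技", "and"),)).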
mainwindow.cpp
#include "mainwindow.h"
#include "ui_mainwindow.h"
#include "readepaperwidget.h"
#include <QApplication>
#include <QCheckBox>
#include <QCollator>
#include <QDate>
#include <QDir>
#include <QFileInfo>
#include <QHBoxLayout>
#include <QHeaderView>
#include <QLocale>
#include <QMessageBox>
#include <QProgressDialog>
#include <QTableWidgetItem>
#include <QTimer>
#include <QTreeWidgetItem>
#include <QVBoxLayout>
#include <algorithm>

MainWindow::MainWindow(QWidget *parent) :
    QMainWindow(parent),
    ui(new Ui::MainWindow)
{
    ui->setupUi(this);
    QFont font;
    font.setPixelSize(16);
    setFont(font);
    setWindowTitle(QStringLiteral("数字报阅读器 - 图文版"));
    ui->dateEdit_read_start->setCalendarPopup(true);
    ui->dateEdit_read_end->setCalendarPopup(true);
    ui->dateEdit_read_start->setDate(QDate::currentDate());
    ui->dateEdit_read_end->setDate(QDate::currentDate());
    ui->dateEdit_search_start->setCalendarPopup(true);
    ui->dateEdit_search_end->setCalendarPopup(true);
    ui->dateEdit_search_start->setDate(QDate::currentDate());
    ui->dateEdit_search_end->setDate(QDate::currentDate());
    QStringList headers;
    headers << QStringLiteral("名称") << QStringLiteral("日期") << QStringLiteral("版面") << QStringLiteral("主题");
    ui->treeWidget_read_result->setColumnCount(headers.size());
    ui->treeWidget_read_result->setHeaderLabels(headers);
    connect(ui->treeWidget_read_result, SIGNAL(itemClicked(QTreeWidgetItem*,int)), this, SLOT(readEpaper(QTreeWidgetItem*,int)));
    for (int i = 0; i < headers.size(); ++i)
    {
        ui->treeWidget_read_result->headerItem()->setTextAlignment(i, Qt::AlignHCenter | Qt::AlignVCenter);
    }
    ui->lineEdit_keyword->setPlaceholderText(QStringLiteral("多个关键词用空格分开"));
    ui->comboBox_page_size->setCurrentText("50");
    ui->tableWidget_search_result->setEditTriggers(QAbstractItemView::NoEditTriggers);
    ui->tableWidget_search_result->setSelectionBehavior(QAbstractItemView::SelectRows);
    ui->tableWidget_search_result->setSelectionMode(QAbstractItemView::SingleSelection);
    ui->tableWidget_search_result->verticalHeader()->setVisible(false);
    ui->tableWidget_search_result->horizontalHeader()->setHighlightSections(false);
    ui->tableWidget_search_result->horizontalHeader()->setStretchLastSection(true);
    headers.clear();
    headers << QStringLiteral("名称") << QStringLiteral("日期") << QStringLiteral("版面") << QStringLiteral("引题") << QStringLiteral("主题") << QStringLiteral("副题") << QStringLiteral("作者");
    ui->tableWidget_search_result->setColumnCount(headers.size());
    ui->tableWidget_search_result->setHorizontalHeaderLabels(headers);
    for (int i = 0; i < headers.size(); ++i)
    {
        ui->tableWidget_search_result->horizontalHeader()->setSectionResizeMode(i, QHeaderView::ResizeToContents);
    }
    connect(ui->tableWidget_search_result, SIGNAL(doubleClicked(QModelIndex)), this, SLOT(readEpaperSearch(QModelIndex)));
    QTimer::singleShot(0, this, SLOT(initCheckBox()));
}

MainWindow::~MainWindow()
{
    delete ui;
}
void MainWindow::getEpaperName()
{
    if (!mEpaperNameLst.isEmpty())
    {
        return;
    }
    QDir dir("Z:\\");
    if (!dir.exists() || !dir.isReadable())
    {
        return;
    }
    dir.setFilter(QDir::Dirs | QDir::NoSymLinks | QDir::NoDotAndDotDot);
    dir.setSorting(QDir::Name);
    QFileInfoList fileInfoLst = dir.entryInfoList();
    if (fileInfoLst.isEmpty())
    {
        return;
    }
    foreach (QFileInfo fileInfo, fileInfoLst)
    {
        QString name = fileInfo.fileName();
        mEpaperNameLst.append(name);
    }
    QLocale cn(QLocale::Chinese);
    QCollator collator(cn);
    std::sort(mEpaperNameLst.begin(), mEpaperNameLst.end(), collator);
}

void MainWindow::initReadCheckBox()
{
    getEpaperName();
    foreach (QString epaperName, mEpaperNameLst)
    {
        QCheckBox *checkBox = new QCheckBox(QStringLiteral("%1").arg(epaperName));
        connect(checkBox, SIGNAL(toggled(bool)), this, SLOT(showCheckBoxRead(bool)));
        mCheckBoxLstRead.append(checkBox);
    }
    QVBoxLayout *mainLayout = new QVBoxLayout;
    QWidget *widget = new QWidget;
    mSelectAllCheckBoxRead = new QCheckBox(QStringLiteral("全选(未选择)"));
    QHBoxLayout *layout = new QHBoxLayout;
    layout->addWidget(mSelectAllCheckBoxRead);
    mainLayout->addLayout(layout);
    connect(mSelectAllCheckBoxRead, SIGNAL(toggled(bool)), this, SLOT(selectAllRead(bool)));
    int size = mEpaperNameLst.size();
    int column = 4;
    int row = size / column + 1;
    for (int i = 1; i <= row; ++i)
    {
        QHBoxLayout *layout = new QHBoxLayout;
        int start = (i - 1) * column;
        int end = i * column - 1;
        for (int j = start; j <= end; ++j)
        {
            if (j < size)
            {
                layout->addWidget(mCheckBoxLstRead.at(j));
            }
        }
        mainLayout->addLayout(layout);
    }
    widget->setLayout(mainLayout);
    ui->scrollArea->setFrameShape(QFrame::NoFrame);
    ui->scrollArea->setWidget(widget);
}

void MainWindow::initSearchCheckBox()
{
    getEpaperName();
    foreach (QString epaperName, mEpaperNameLst)
    {
        QCheckBox *checkBox = new QCheckBox(QStringLiteral("%1").arg(epaperName));
        connect(checkBox, SIGNAL(toggled(bool)), this, SLOT(showCheckBoxSearch(bool)));
        mCheckBoxLstSearch.append(checkBox);
    }
    QVBoxLayout *mainLayout = new QVBoxLayout;
    QWidget *widget = new QWidget;
    mSelectAllCheckBoxSearch = new QCheckBox(QStringLiteral("全选(未选择)"));
    QHBoxLayout *layout = new QHBoxLayout;
    layout->addWidget(mSelectAllCheckBoxSearch);
    mainLayout->addLayout(layout);
    connect(mSelectAllCheckBoxSearch, SIGNAL(toggled(bool)), this, SLOT(selectAllSearch(bool)));
    int size = mEpaperNameLst.size();
    int column = 4;
    int row = size / column + 1;
    for (int i = 1; i <= row; ++i)
    {
        QHBoxLayout *layout = new QHBoxLayout;
        int start = (i - 1) * column;
        int end = i * column - 1;
        for (int j = start; j <= end; ++j)
        {
            if (j < size)
            {
                layout->addWidget(mCheckBoxLstSearch.at(j));
            }
        }
        mainLayout->addLayout(layout);
    }
    widget->setLayout(mainLayout);
    ui->scrollArea_2->setFrameShape(QFrame::NoFrame);
    ui->scrollArea_2->setWidget(widget);
}

bool MainWindow::informationMessageBox(const QString& title, const QString& text, bool isOnlyOk)
{
    QMessageBox msgBox(this);
    msgBox.setFont(this->font());
    msgBox.setIcon(QMessageBox::Information);
    msgBox.setWindowTitle(title);
    msgBox.setText(text);
    if (isOnlyOk)
    {
        msgBox.setStandardButtons(QMessageBox::Ok);
        msgBox.setButtonText(QMessageBox::Ok, QStringLiteral("确定"));
    }
    else
    {
        msgBox.setStandardButtons(QMessageBox::Ok | QMessageBox::Cancel);
        msgBox.setButtonText(QMessageBox::Ok, QStringLiteral("确定"));
        msgBox.setButtonText(QMessageBox::Cancel, QStringLiteral("取消"));
    }
    return (msgBox.exec() == QMessageBox::Ok);
}
void MainWindow::showSearchData()
{
    int rowCount = ui->tableWidget_search_result->rowCount();
    for (int i = rowCount; i > 0; --i)
    {
        ui->tableWidget_search_result->removeRow(0);
    }
    if (mSearchDataLst.isEmpty())
    {
        return;
    }
    mTotalCount = mSearchDataLst.size();
    // Round up so that an exact multiple does not produce an empty last page
    // (e.g. 100 results at 50 per page is 2 pages, not 3).
    mTotalPage = (mTotalCount + mPageSize - 1) / mPageSize;
    ui->label_page_tip->setText(QStringLiteral("共%1条结果,第%2页,共%3页")
                                .arg(mTotalCount).arg(mCurrentPage).arg(mTotalPage));
    QList<QStringList> tmpLst;
    if (mCurrentPage == mTotalPage)
    {
        for (int i = (mCurrentPage - 1) * mPageSize; i < mTotalCount; ++i)
        {
            tmpLst.append(mSearchDataLst.at(i));
        }
    }
    else
    {
        for (int i = (mCurrentPage - 1) * mPageSize; i <= (mCurrentPage * mPageSize - 1); ++i)
        {
            tmpLst.append(mSearchDataLst.at(i));
        }
    }
    foreach (QStringList searchDataLst, tmpLst)
    {
        int rowCount = ui->tableWidget_search_result->rowCount();
        ui->tableWidget_search_result->insertRow(rowCount);
        QTableWidgetItem *itemName = new QTableWidgetItem(QStringLiteral("%1")
                                                          .arg(searchDataLst[0]));
        QTableWidgetItem *itemDate = new QTableWidgetItem(QStringLiteral("%1")
                                                          .arg(searchDataLst[1]));
        QTableWidgetItem *itemLayout = new QTableWidgetItem(QStringLiteral("%1")
                                                            .arg(searchDataLst[2]));
        QTableWidgetItem *itemPreTitle = new QTableWidgetItem(QStringLiteral("%1")
                                                              .arg(searchDataLst[3]));
        QTableWidgetItem *itemTitle = new QTableWidgetItem(QStringLiteral("%1")
                                                           .arg(searchDataLst[4]));
        QTableWidgetItem *itemSubTitle = new QTableWidgetItem(QStringLiteral("%1")
                                                              .arg(searchDataLst[5]));
        QTableWidgetItem *itemAuthor = new QTableWidgetItem(QStringLiteral("%1")
                                                            .arg(searchDataLst[6]));
        ui->tableWidget_search_result->setItem(rowCount, 0, itemName);
        ui->tableWidget_search_result->setItem(rowCount, 1, itemDate);
        ui->tableWidget_search_result->setItem(rowCount, 2, itemLayout);
        ui->tableWidget_search_result->setItem(rowCount, 3, itemPreTitle);
        ui->tableWidget_search_result->setItem(rowCount, 4, itemTitle);
        ui->tableWidget_search_result->setItem(rowCount, 5, itemSubTitle);
        ui->tableWidget_search_result->setItem(rowCount, 6, itemAuthor);
    }
}
void MainWindow::initCheckBox()
{
    initReadCheckBox();
    initSearchCheckBox();
}

void MainWindow::selectAllRead(bool ok)
{
    if (ok)
    {
        foreach (QCheckBox *checkBox, mCheckBoxLstRead)
        {
            checkBox->setChecked(true);
        }
    }
    else
    {
        foreach (QCheckBox *checkBox, mCheckBoxLstRead)
        {
            checkBox->setChecked(false);
        }
    }
}

void MainWindow::selectAllSearch(bool ok)
{
    if (ok)
    {
        foreach (QCheckBox *checkBox, mCheckBoxLstSearch)
        {
            checkBox->setChecked(true);
        }
    }
    else
    {
        foreach (QCheckBox *checkBox, mCheckBoxLstSearch)
        {
            checkBox->setChecked(false);
        }
    }
}

void MainWindow::showCheckBoxRead(bool ok)
{
    Q_UNUSED(ok);
    int count = 0;
    foreach (QCheckBox *checkBox, mCheckBoxLstRead)
    {
        if (checkBox->isChecked())
        {
            count += 1;
        }
    }
    if (count == 0)
    {
        mSelectAllCheckBoxRead->setText(QStringLiteral("全选(未选择)"));
    }
    else
    {
        mSelectAllCheckBoxRead->setText(QStringLiteral("全选(已选择%1个)").arg(count));
    }
}

void MainWindow::showCheckBoxSearch(bool ok)
{
    Q_UNUSED(ok);
    int count = 0;
    foreach (QCheckBox *checkBox, mCheckBoxLstSearch)
    {
        if (checkBox->isChecked())
        {
            count += 1;
        }
    }
    if (count == 0)
    {
        mSelectAllCheckBoxSearch->setText(QStringLiteral("全选(未选择)"));
    }
    else
    {
        mSelectAllCheckBoxSearch->setText(QStringLiteral("全选(已选择%1个)").arg(count));
    }
}
void MainWindow::on_read_pushButton_clicked()
{
    if (ui->dateEdit_read_start->date() > ui->dateEdit_read_end->date())
    {
        return;
    }
    QStringList paperNameLst;
    foreach (QCheckBox *checkBox, mCheckBoxLstRead)
    {
        if (checkBox->isChecked())
        {
            paperNameLst.append(checkBox->text());
        }
    }
    if (paperNameLst.isEmpty())
    {
        return;
    }
    QString retStr = mDBHelper.getConnectDB();
    if (!retStr.isEmpty())
    {
        informationMessageBox(QStringLiteral("提示"), QStringLiteral("数据库连接失败:\n%1").arg(retStr));
        return;
    }
    ui->read_pushButton->setEnabled(false);
    ui->treeWidget_read_result->clear();
    QStringList paperDateLst;
    QDate startDate = ui->dateEdit_read_start->date();
    QDate endDate = ui->dateEdit_read_end->date();
    while (startDate <= endDate)
    {
        paperDateLst.append(startDate.toString("yyyy-MM-dd"));
        startDate = startDate.addDays(1);
    }
    QProgressDialog progress(this);
    progress.setFont(this->font());
    progress.setWindowTitle(QStringLiteral("数字报阅读器 - 图文版"));
    progress.setWindowFlags(windowFlags() & (~Qt::WindowContextHelpButtonHint) & (~Qt::WindowMinMaxButtonsHint) & (~Qt::WindowCloseButtonHint));
    progress.setLabelText(QStringLiteral("处理中..."));
    progress.setRange(0, paperDateLst.size() * paperNameLst.size());
    progress.setModal(true);
    progress.setCancelButtonText(QStringLiteral("取消"));
    progress.setMinimumDuration(0);
    connect(&progress, SIGNAL(canceled()), this, SLOT(progressCanceled()));
    int count = 1;
    QStringList columnLst;
    columnLst << "paper_layout" << "title";
    foreach (QString paperName, paperNameLst)
    {
        foreach (QString paperDate, paperDateLst)
        {
            qApp->processEvents(QEventLoop::ExcludeUserInputEvents);
            QString str = QStringLiteral("select columns from t_epaper where paper_date = '%1' and paper_name = '%2' order by seq_num;").arg(paperDate).arg(paperName);
            // Tree levels: name -> date -> layout -> title
            QTreeWidgetItem *name = new QTreeWidgetItem(ui->treeWidget_read_result);
            name->setText(0, paperName);
            ui->treeWidget_read_result->addTopLevelItem(name);
            QTreeWidgetItem *date = new QTreeWidgetItem(name);
            date->setText(1, paperDate);
            QStringList layoutLst;
            QList<QStringList> retLst = mDBHelper.getSqlSelect(str, columnLst);
            foreach (QStringList ret, retLst)
            {
                if (!layoutLst.contains(ret[0]))
                {
                    layoutLst.append(ret[0]);
                }
            }
            foreach (QString layout, layoutLst)
            {
                QTreeWidgetItem *paperLayout = new QTreeWidgetItem(date);
                paperLayout->setText(2, layout);
                foreach (QStringList ret, retLst)
                {
                    if (ret[0] == layout)
                    {
                        QTreeWidgetItem *paperTitle = new QTreeWidgetItem(paperLayout);
                        paperTitle->setText(3, ret[1]);
                    }
                }
            }
            progress.setValue(count++);
        }
    }
    ui->read_pushButton->setEnabled(true);
}
void MainWindow::on_search_pushButton_clicked()
{
    if (ui->lineEdit_keyword->text().isEmpty())
    {
        return;
    }
    if (ui->dateEdit_search_start->date() > ui->dateEdit_search_end->date())
    {
        return;
    }
    QStringList paperNameLst;
    foreach (QCheckBox *checkBox, mCheckBoxLstSearch)
    {
        if (checkBox->isChecked())
        {
            paperNameLst.append(checkBox->text());
        }
    }
    if (paperNameLst.isEmpty())
    {
        return;
    }
    QString retStr = mDBHelper.getConnectDB();
    if (!retStr.isEmpty())
    {
        informationMessageBox(QStringLiteral("提示"), QStringLiteral("数据库连接失败:\n%1").arg(retStr));
        return;
    }
    ui->search_pushButton->setEnabled(false);
    QStringList paperDateLst;
    QDate startDate = ui->dateEdit_search_start->date();
    QDate endDate = ui->dateEdit_search_end->date();
    while (startDate <= endDate)
    {
        paperDateLst.append(startDate.toString("yyyy-MM-dd"));
        startDate = startDate.addDays(1);
    }
    QString keyWord = ui->lineEdit_keyword->text();
    QString keyWordRelationship = ui->comboBox_keyword_relation->currentText();
    QString searchRange = ui->comboBox_search_range->currentText();
    // Build the AGAINST(... IN BOOLEAN MODE) argument. The keywords are pasted
    // into the SQL text unescaped, so a keyword containing ' breaks the query.
    QString against;
    if (keyWordRelationship == QStringLiteral("全词"))
    {
        // Whole phrase: "keyword"
        against = QStringLiteral("'\"%1\"' IN BOOLEAN MODE").arg(keyWord);
    }
    else if (keyWordRelationship == QStringLiteral("并且"))
    {
        // All words required: +word1 +word2
        QStringList wordLst = keyWord.split(" ");
        against = "'";
        foreach (QString word, wordLst)
        {
            if (!word.isEmpty())
            {
                against += QStringLiteral("+%1 ").arg(word);
            }
        }
        against.remove(against.lastIndexOf(" "), 1);
        against += "' IN BOOLEAN MODE";
    }
    else if (keyWordRelationship == QStringLiteral("或者"))
    {
        // Any word: word1 word2
        QStringList wordLst = keyWord.split(" ");
        against = "'";
        foreach (QString word, wordLst)
        {
            if (!word.isEmpty())
            {
                against += QStringLiteral("%1 ").arg(word);
            }
        }
        against.remove(against.lastIndexOf(" "), 1);
        against += "' IN BOOLEAN MODE";
    }
    QString match;
    if (searchRange == QStringLiteral("全部"))
    {
        match = "pre_title, title, sub_title, image_text, content";
    }
    else if (searchRange == QStringLiteral("仅标题"))
    {
        match = "pre_title, title, sub_title";
    }
    else if (searchRange == QStringLiteral("仅内容"))
    {
        match = "image_text, content";
    }
    QProgressDialog progress(this);
    progress.setFont(this->font());
    progress.setWindowTitle(QStringLiteral("数字报阅读器 - 图文版"));
    progress.setWindowFlags(windowFlags() & (~Qt::WindowContextHelpButtonHint) & (~Qt::WindowMinMaxButtonsHint) & (~Qt::WindowCloseButtonHint));
    progress.setLabelText(QStringLiteral("处理中..."));
    progress.setRange(0, paperDateLst.size() * paperNameLst.size());
    progress.setModal(true);
    progress.setCancelButtonText(QStringLiteral("取消"));
    progress.setMinimumDuration(0);
    connect(&progress, SIGNAL(canceled()), this, SLOT(progressCanceled()));
    int count = 1;
    QStringList columnLst;
    columnLst << "paper_layout" << "pre_title" << "title" << "sub_title" << "author" << QStringLiteral("match(%1) against(%2) as relevance").arg(match).arg(against);
    mSearchDataLst.clear();
    foreach (QString paperName, paperNameLst)
    {
        foreach (QString paperDate, paperDateLst)
        {
            qApp->processEvents(QEventLoop::ExcludeUserInputEvents);
            QString str = QStringLiteral("select columns from t_epaper where match(%1) against(%2) and paper_date = '%3' and paper_name = '%4' order by relevance desc, paper_date desc, seq_num asc;").arg(match).arg(against).arg(paperDate).arg(paperName);
            QList<QStringList> retLst = mDBHelper.getSqlSelect(str, columnLst);
            foreach (QStringList ret, retLst)
            {
                ret.insert(0, paperName);
                ret.insert(1, paperDate);
                mSearchDataLst.append(ret);
            }
            progress.setValue(count++);
        }
    }
    ui->search_pushButton->setEnabled(true);
    on_pushButton_first_page_clicked();
}
void MainWindow::readEpaper(QTreeWidgetItem *item, int column)
{
    Q_UNUSED(column);
    // Only title rows (column 3) open an article.
    if (!item->text(3).isEmpty())
    {
        QTreeWidgetItem *layout = item->parent();
        QTreeWidgetItem *date = layout->parent();
        QTreeWidgetItem *name = date->parent();
        QString paperTitle = item->text(3);
        QString paperLayout = layout->text(2);
        QString paperDate = date->text(1);
        QString paperName = name->text(0);
        ReadEpaperWidget *widget = new ReadEpaperWidget(paperName, paperDate, paperLayout, paperTitle);
        widget->showMaximized();
    }
}

void MainWindow::on_pushButton_first_page_clicked()
{
    mCurrentPage = 1;
    showSearchData();
}

void MainWindow::on_pushButton_previous_page_clicked()
{
    if (mCurrentPage == 1)
    {
        return;
    }
    mCurrentPage -= 1;
    showSearchData();
}

void MainWindow::on_pushButton_next_page_clicked()
{
    if (mCurrentPage == mTotalPage)
    {
        return;
    }
    mCurrentPage += 1;
    showSearchData();
}

void MainWindow::on_pushButton_last_page_clicked()
{
    mCurrentPage = mTotalPage;
    showSearchData();
}

void MainWindow::on_comboBox_page_size_currentTextChanged(const QString &arg1)
{
    mPageSize = arg1.toInt();
    on_pushButton_first_page_clicked();
}

void MainWindow::readEpaperSearch(QModelIndex index)
{
    Q_UNUSED(index);
    int row = ui->tableWidget_search_result->currentRow();
    QString paperName = ui->tableWidget_search_result->item(row, 0)->text();
    QString paperDate = ui->tableWidget_search_result->item(row, 1)->text();
    QString paperLayout = ui->tableWidget_search_result->item(row, 2)->text();
    QString paperTitle = ui->tableWidget_search_result->item(row, 4)->text();
    QStringList wordLst = ui->lineEdit_keyword->text().split(" ");
    QStringList highLightLst;
    foreach (QString word, wordLst)
    {
        if (!word.isEmpty())
        {
            highLightLst.append(word);
        }
    }
    if (!paperName.isEmpty() && !paperDate.isEmpty() && !paperLayout.isEmpty() && !paperTitle.isEmpty())
    {
        ReadEpaperWidget *widget = new ReadEpaperWidget(paperName, paperDate, paperLayout, paperTitle, true, highLightLst);
        widget->showMaximized();
    }
}

void MainWindow::progressCanceled()
{
    // Intentionally empty: the read and search loops do not check for cancellation.
}
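Two pieces of the search tab above, sketched in Python for illustration; search_sql, page_count, page_slice and the English range names are assumptions, not the author's code. First, on_search_pushButton_clicked() pastes the keywords and the newspaper name straight into the SQL text, so a keyword containing a single quote breaks the query, and it issues one query per newspaper per day. search_sql instead binds every user-supplied value as a PyMySQL-style %s parameter and covers all selected newspapers and dates in one query; column names cannot be bound, so they come from a fixed whitelist. Second, the page count for showSearchData() is a ceiling division, so an exact multiple does not produce an empty last page: 100 results at 50 per page is 2 pages, not 3.

```python
# Column sets for the three search ranges (全部 / 仅标题 / 仅内容).
COLUMN_SETS = {
    "all": "pre_title, title, sub_title, image_text, content",
    "titles": "pre_title, title, sub_title",
    "content": "image_text, content",
}


def search_sql(search_range, against, paper_names, start_date, end_date):
    """Return (sql, params) for one full-text search over t_epaper.

    against is the BOOLEAN MODE search string; dates are "yyyy-MM-dd".
    """
    columns = COLUMN_SETS[search_range]  # whitelisted, never user text
    names = ", ".join(["%s"] * len(paper_names))
    sql = (
        "SELECT paper_name, paper_date, paper_layout, pre_title, title, "
        "sub_title, author, MATCH(%s) AGAINST (%%s IN BOOLEAN MODE) AS relevance "
        "FROM t_epaper WHERE MATCH(%s) AGAINST (%%s IN BOOLEAN MODE) "
        "AND paper_name IN (%s) AND paper_date BETWEEN %%s AND %%s "
        "ORDER BY relevance DESC, paper_date DESC, seq_num ASC"
    ) % (columns, columns, names)
    params = (against, against, *paper_names, start_date, end_date)
    return sql, params


def page_count(total, page_size):
    """Pages needed for total rows, rounding up; 0 when there are no rows."""
    return (total + page_size - 1) // page_size


def page_slice(rows, page, page_size):
    """Rows shown on 1-based page; the last page may be shorter."""
    start = (page - 1) * page_size
    return rows[start:start + page_size]
```

cursor.execute(sql, params) would then run the query with PyMySQL. paper_names must not be empty, which the reader already ensures by returning early. A single query does give up the per-day progress updates that the QProgressDialog shows.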
readepaperwidget.cpp
#include "readepaperwidget.h"
#include "dbhelper.h"
#include <QAction>
#include <QApplication>
#include <QClipboard>
#include <QCursor>
#include <QDate>
#include <QMenu>
#include <QMessageBox>
#include <QTextBrowser>
#include <QTextCursor>
#include <QTimer>
#include <QVBoxLayout>

ReadEpaperWidget::ReadEpaperWidget(const QString &paperName, const QString &paperDate, const QString &paperLayout, const QString &paperTitle, bool highLight, QStringList highLightLst)
{
    QFont font;
    font.setPixelSize(16);
    setFont(font);
    setWindowTitle(QStringLiteral("%1 %2").arg(paperName).arg(QDate::fromString(paperDate, "yyyy-MM-dd").toString(QStringLiteral("yyyy年M月d日"))));
    setAttribute(Qt::WA_DeleteOnClose);
    mTextBrowser = new QTextBrowser(this);
    QVBoxLayout *mainLayout = new QVBoxLayout(this);
    mainLayout->addWidget(mTextBrowser);
    setLayout(mainLayout);
    mTextBrowser->setContextMenuPolicy(Qt::CustomContextMenu);
    connect(mTextBrowser, SIGNAL(customContextMenuRequested(QPoint)), this, SLOT(menuDisplayed(QPoint)));
    mPaperName = paperName;
    mPaperDate = paperDate;
    mPaperLayout = paperLayout;
    mPaperTitle = paperTitle;
    mIsHighLight = highLight;
    mHighLightLst = highLightLst;
    QTimer::singleShot(0, this, SLOT(showArticle()));
}

ReadEpaperWidget::~ReadEpaperWidget()
{
}

bool ReadEpaperWidget::informationMessageBox(const QString &title, const QString &text, bool isOnlyOk)
{
    QMessageBox msgBox(this);
    msgBox.setFont(this->font());
    msgBox.setIcon(QMessageBox::Information);
    msgBox.setWindowTitle(title);
    msgBox.setText(text);
    if (isOnlyOk)
    {
        msgBox.setStandardButtons(QMessageBox::Ok);
        msgBox.setButtonText(QMessageBox::Ok, QStringLiteral("确定"));
    }
    else
    {
        msgBox.setStandardButtons(QMessageBox::Ok | QMessageBox::Cancel);
        msgBox.setButtonText(QMessageBox::Ok, QStringLiteral("确定"));
        msgBox.setButtonText(QMessageBox::Cancel, QStringLiteral("取消"));
    }
    return (msgBox.exec() == QMessageBox::Ok);
}
void ReadEpaperWidget::showArticle()
{
    DBHelper dbHelper;
    QString retStr = dbHelper.getConnectDB();
    if (!retStr.isEmpty())
    {
        informationMessageBox(QStringLiteral("提示"), QStringLiteral("数据库连接失败:\n%1").arg(retStr));
        return;
    }
    QStringList columnLst;
    columnLst << "pre_title" << "sub_title" << "author" << "html_url" << "image_text" << "content";
    QString str = QStringLiteral("select columns from t_epaper where paper_date = '%1' and paper_name = '%2' and paper_layout = '%3' and title = '%4' order by update_time desc limit 1;").arg(mPaperDate).arg(mPaperName).arg(mPaperLayout).arg(mPaperTitle);
    QList<QStringList> retLst = dbHelper.getSqlSelect(str, columnLst);
    foreach (QStringList ret, retLst)
    {
        QString preTitle = ret[0];
        QString subTitle = ret[1];
        QString author = ret[2];
        QString htmlUrl = ret[3];
        QString imageText = ret[4];
        QString content = ret[5];
        mHtmlUrl = htmlUrl;
        // The HTML markup inside these string literals was lost when the listing
        // was published; the <br/>, <img> and <span> tags (and the <br> tags
        // stripped from captions) are assumed reconstructions: one field per
        // line, then each image with its caption, then the body text.
        QString html;
        html.append(QStringLiteral("%1 %2 %3<br/>").arg(mPaperName).arg(QDate::fromString(mPaperDate, "yyyy-MM-dd").toString(QStringLiteral("yyyy年M月d日"))).arg(mPaperLayout));
        html.append(QStringLiteral("网页链接:%1<br/>").arg(htmlUrl));
        html.append(QStringLiteral("引题:%1<br/>").arg(preTitle));
        html.append(QStringLiteral("主题:%1<br/>").arg(mPaperTitle));
        html.append(QStringLiteral("副题:%1<br/>").arg(subTitle));
        html.append(QStringLiteral("作者:%1<br/>").arg(author));
        // image_text holds "path|caption" entries separated by "; ".
        QStringList imageLst = imageText.split("; ");
        foreach (QString image, imageLst)
        {
            if (!image.contains("|"))
            {
                continue; // empty or malformed entry
            }
            QString imageFile = image.split("|")[0]
                    .replace(QStringLiteral("/home/pi/数字报/"), "Z:/");
            QString caption = image.split("|")[1];
            html.append(QStringLiteral("<img src=\"%1\"/><br/>").arg(imageFile));
            html.append(QStringLiteral("%1<br/>").arg(caption.remove(QStringLiteral("<br>")).remove(QStringLiteral("<br/>"))));
        }
        html.append(QStringLiteral("<br/>%1").arg(content));
        if (mIsHighLight)
        {
            foreach (QString highLight, mHighLightLst)
            {
                html.replace(highLight, QStringLiteral("<span style=\"background-color:yellow\">%1</span>").arg(highLight));
            }
        }
        mTextBrowser->setHtml(html);
        QTextCursor textCursor(mTextBrowser->textCursor());
        textCursor.movePosition(QTextCursor::Start);
        mTextBrowser->setTextCursor(textCursor);
    }
}
void ReadEpaperWidget::menuDisplayed(const QPoint &pos)
{
    Q_UNUSED(pos);
    QMenu *menu = new QMenu(this);
    QAction *copy = new QAction(QStringLiteral("复制"), this);
    connect(copy, SIGNAL(triggered(bool)), mTextBrowser, SLOT(copy()));
    QAction *selectAll = new QAction(QStringLiteral("选择全部"), this);
    connect(selectAll, SIGNAL(triggered(bool)), mTextBrowser, SLOT(selectAll()));
    QAction *copyPlainText = new QAction(QStringLiteral("复制纯文本"), this);
    connect(copyPlainText, SIGNAL(triggered(bool)), this, SLOT(copyPlainText()));
    QAction *copyHtmlUrl = new QAction(QStringLiteral("复制网页链接"), this);
    connect(copyHtmlUrl, SIGNAL(triggered(bool)), this, SLOT(copyHtmlUrl()));
    menu->addAction(copy);
    menu->addAction(selectAll);
    menu->addSeparator();
    menu->addAction(copyPlainText);
    menu->addAction(copyHtmlUrl);
    menu->exec(QCursor::pos());
    delete copy;
    delete selectAll;
    delete copyPlainText;
    delete copyHtmlUrl;
    delete menu;
}

void ReadEpaperWidget::copyHtmlUrl()
{
    QClipboard *clipboard = QApplication::clipboard();
    clipboard->setText(mHtmlUrl);
}

void ReadEpaperWidget::copyPlainText()
{
    QClipboard *clipboard = QApplication::clipboard();
    clipboard->setText(mTextBrowser->toPlainText());
}
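The image_text column ties the two halves of the system together: showArticle() above expects entries separated by "; ", each of the form path|caption, and maps the Raspberry Pi path /home/pi/数字报/ to the Z: drive, while the collector script below builds each entry as path|caption and writes rows through a ", "-joined string that mysql_insert_thread() splits again. The sketch below, in Python for illustration, covers both sides under stated assumptions: parse_image_text (a hypothetical helper) skips entries without a |, and build_insert (also hypothetical) assumes the producer puts a dict on the queue instead of a joined string, and that t_epaper's columns are in the same order as in the script's INSERT statement. Binding the values as %s parameters means quotes, backslashes and newlines need no manual escaping, and a stray ", " inside a field can no longer shift the values after it.

```python
import hashlib
import time


def parse_image_text(image_text, pi_root="/home/pi/数字报/", drive_root="Z:/"):
    """Split image_text into (image path, caption) pairs, mapping the Pi path to the drive."""
    images = []
    for entry in image_text.split("; "):
        if "|" not in entry:
            continue  # empty or malformed entry
        path, caption = entry.split("|", 1)
        images.append((path.replace(pi_root, drive_root), caption))
    return images


# Same column order as the script's "insert ignore into t_epaper values(...)".
INSERT_SQL = "INSERT IGNORE INTO t_epaper VALUES (" + ", ".join(["%s"] * 14) + ")"


def build_insert(record):
    """Return (sql, params) for one article; record is a dict of column values."""
    key_fields = ("paper_name", "paper_date", "paper_layout", "pre_title", "title", "sub_title")
    info = ", ".join(record[k] for k in key_fields)
    md5_str = hashlib.md5(info.encode("utf-8")).hexdigest()
    md5_url = hashlib.md5(record["html_url"].encode("utf-8")).hexdigest()
    now = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime())
    params = (md5_str, md5_url, record["paper_name"], record["paper_date"],
              record["paper_layout"], record["pre_title"], record["title"],
              record["sub_title"], record["author"], record["image_text"],
              record["content"], record["html_url"], int(record["seq_num"]), now)
    return INSERT_SQL, params
```

The pair would be run with PyMySQL's cursor.execute(sql, params). Because build_insert hashes the raw values rather than the escaped ones, md5_str for text that my_escape_string() would have changed differs from the hash of rows already stored, so INSERT IGNORE would not treat those earlier rows as duplicates.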
epaper_v2.py
#!/usr/bin/python3
# coding: utf-8
from urllib import request
from urllib import error
from urllib.parse import quote
import re
import threading
import os
import sys
import string
import time
import datetime
import random
import schedule
import fcntl
import queue
import pymysql
import hashlib
from DBUtils.PooledDB import PooledDB
import socket
socket.setdefaulttimeout(20.0)
pdf_dir = "/home/pi/数字报/"
epapers_done_dict = {}
download_image_queue = queue.Queue()
mysql_insert_queue = queue.Queue()
download_image_error_queue = queue.Queue()
mysql_insert_error_queue = queue.Queue()
download_image_error_dict = {}
mysql_insert_error_dict = {}
download_image_error_dict_lock = threading.Lock()
mysql_insert_error_dict_lock = threading.Lock()
mysql_insert_log_file_dir_lock = threading.Lock()
download_image_thread_lst = []
mysql_insert_thread_lst = []
download_image_error_thread_lst = []
mysql_insert_error_thread_lst = []
mysql_pool = PooledDB(pymysql, 10, host="host", user="user", password="password", database="database", port=3306)
def my_escape_string(src_str):
    # Escape for use inside a single-quoted SQL string literal, normalise
    # whitespace, and turn ASCII "," and ";" into full-width "\uff0c" and "\uff1b"
    # so they cannot collide with the ", " and "; " delimiters used later.
    return src_str.replace("\\", "\\\\").replace("'", "\\'").replace("\"", "\\\"")\
        .replace("\n", "\\n").replace("\r", "\\r").replace("\0", "\\0")\
        .replace(" ", " ").replace(" ", " ")\
        .replace("\u3000", " ").replace("\u0020", " ").replace("\xa0", " ")\
        .replace("\u00a0", " ").replace(",", "\uff0c").replace(";", "\uff1b")


def my_print(paper_name, paper_date, paper_info):
    log_file_dir = "/home/pi/python_svn/%s%s-logs/%s/" %(paper_date.split("-")[0], paper_date.split("-")[1], paper_date)
    # exist_ok avoids a FileExistsError when two threads create the directory at once.
    os.makedirs(log_file_dir, exist_ok=True)
    log_file_name = "%s%s.txt" %(log_file_dir, paper_name)
    with open(log_file_name, "a", encoding="utf-8") as f:
        fcntl.flock(f, fcntl.LOCK_EX)
        date_time = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime())
        print("[%s] [%s %s] %s\n" %(date_time, paper_name, paper_date, paper_info))
        f.write("[%s] [%s %s] %s\n\n" %(date_time, paper_name, paper_date, paper_info))
        fcntl.flock(f, fcntl.LOCK_UN)
def download_image(image_file_name, image_url):
    headers = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.120 Safari/537.36'}
    req = request.Request(url=quote(image_url, safe=string.printable), headers=headers)
    image = "%s %s" %(image_file_name, image_url)
    try:
        total = 0
        with request.urlopen(req) as f:
            with open("%s" %(image_file_name), "wb") as f2:
                while True:
                    buff = f.read(1024 * 100)
                    if not buff:
                        break
                    f2.write(buff)
                    total += len(buff)  # count the bytes actually received
                    sys.stdout.write("download %d KB\r" %(total / 1024))
                    sys.stdout.flush()
        print("download %s OK [%d KB]\n" %(image, total / 1024))
    except error.HTTPError as e:
        if e.code == 404:
            print("download %s ERROR [404]\n" %(image))
    except Exception:
        # Re-queue the failed download, at most three times per image.
        md5_image = hashlib.md5(image.encode(encoding="UTF-8")).hexdigest()
        download_image_error_dict_lock.acquire()
        if md5_image not in download_image_error_dict.keys():
            download_image_error_dict[md5_image] = 1
            download_image_error_queue.put(image)
        elif download_image_error_dict[md5_image] <= 2:
            download_image_error_dict[md5_image] += 1
            download_image_error_queue.put(image)
        download_image_error_dict_lock.release()
        print("download %s ERROR\n" %(image))
def download_image_thread():
    while True:
        image_lst = download_image_queue.get()
        image_file_name = image_lst[0]
        image_url = image_lst[1]
        image = "%s %s" %(image_file_name, image_url)
        print("%s: %s\n" %(threading.current_thread().name, image))
        download_image(image_file_name, image_url)
        download_image_queue.task_done()
def mysql_insert_thread():
    while True:
        sql_data = mysql_insert_queue.get()
        info_lst = sql_data.split(", ")
        paper_name = str(info_lst[0])[1:-1]
        paper_date = str(info_lst[1])[1:-1]
        paper_layout = str(info_lst[2])[1:-1]
        pre_title = str(info_lst[3])[1:-1]
        title = str(info_lst[4])[1:-1]
        sub_title = str(info_lst[5])[1:-1]
        author = str(info_lst[6])[1:-1]
        image_text = str(info_lst[7])[1:-1]
        content = str(info_lst[8])[1:-1]
        html_url = str(info_lst[9])[1:-1]
        seq_num = int(info_lst[10])
        info = "%s, %s, %s, %s, %s, %s" %(paper_name, paper_date, paper_layout, pre_title, title, sub_title)
        md5_str = hashlib.md5(info.encode(encoding="UTF-8")).hexdigest()
        md5_url = hashlib.md5(html_url.encode(encoding="UTF-8")).hexdigest()
        date_time = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime())
        sql = "insert ignore into t_epaper values('%s', '%s', '%s', '%s', '%s', '%s', '%s', '%s', '%s', '%s', '%s', '%s', %d, '%s');" %(md5_str, md5_url, paper_name, paper_date, paper_layout, pre_title, title, sub_title, author, image_text, content, html_url, seq_num, date_time)
        print("%s: %s\n" %(threading.current_thread().name, info))
        date = time.strftime("%Y-%m-%d", time.localtime())
        log_file_dir = "/home/pi/python_svn/mysql_insert-logs/%s/" %(date)
        mysql_insert_log_file_dir_lock.acquire()
        if not os.path.isdir(log_file_dir):
            os.makedirs(log_file_dir)
        mysql_insert_log_file_dir_lock.release()
        log_file_name = "%s%s.txt" %(log_file_dir, threading.current_thread().name)
        with open(log_file_name, "a", encoding="utf-8") as f:
            fcntl.flock(f, fcntl.LOCK_EX)
            date_time = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime())
            f.write("[%s] %s\n\n" %(date_time, sql))
            fcntl.flock(f, fcntl.LOCK_UN)
        # Initialise first so that except/finally do not raise NameError
        # when the pool itself cannot hand out a connection.
        mysql_conn = None
        try:
            mysql_conn = mysql_pool.connection()
            mysql_conn.ping(reconnect=True)
            with mysql_conn.cursor() as cursor:
                cursor.execute(sql)
            mysql_conn.commit()
        except Exception as e:
            if mysql_conn is not None:
                mysql_conn.rollback()
            md5_sql = hashlib.md5(sql.encode(encoding="UTF-8")).hexdigest()
            mysql_insert_error_dict_lock.acquire()
            if md5_sql not in mysql_insert_error_dict.keys():
                mysql_insert_error_dict[md5_sql] = 1
                mysql_insert_error_queue.put("%s#%s" %(sql, e))
            elif mysql_insert_error_dict[md5_sql] <= 2:
                mysql_insert_error_dict[md5_sql] += 1
                mysql_insert_error_queue.put("%s#%s" %(sql, e))
            mysql_insert_error_dict_lock.release()
            print("%s: %s %s\n" %(threading.current_thread().name, info, e))
        else:
            print("%s: %s OK\n" %(threading.current_thread().name, info))
        finally:
            if mysql_conn is not None:
                mysql_conn.close()
            mysql_insert_queue.task_done()
def download_image_error_thread():
while True:
image = download_image_error_queue.get()
image_lst = image.split(" ")
image_file_name = image_lst[0]
image_url = image_lst[1]
print("%s: %s\n" %(threading.current_thread().name, image))
if os.path.isfile(image_file_name) and os.path.getsize(image_file_name):
os.remove(image_file_name)
download_image(image_file_name, image_url)
download_image_error_queue.task_done()
def mysql_insert_error_thread():
    while True:
        # Queue items are "<sql>#<error>"; every statement ends with ");"
        sql_data = mysql_insert_error_queue.get()
        sql, _, err = sql_data.partition(");#")
        sql += ");"
        date = time.strftime("%Y-%m-%d", time.localtime())
        log_file_dir = "/home/pi/python_svn/mysql_insert-logs/%s/" % date
        with mysql_insert_log_file_dir_lock:
            os.makedirs(log_file_dir, exist_ok=True)
        log_file_name = "%s%s.txt" % (log_file_dir, threading.current_thread().name)
        # Log the statement together with the error that caused the retry
        with open(log_file_name, "a", encoding="utf-8") as f:
            fcntl.flock(f, fcntl.LOCK_EX)
            date_time = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime())
            f.write("[%s] %s\n%s\n\n" % (date_time, sql, err))
            f.flush()  # write out while the lock is still held
            fcntl.flock(f, fcntl.LOCK_UN)
        mysql_conn = None  # so except/finally are safe if connection() itself fails
        try:
            mysql_conn = mysql_pool.connection()
            mysql_conn.ping(reconnect=True)
            with mysql_conn.cursor() as cursor:
                cursor.execute(sql)
            mysql_conn.commit()
        except Exception as e:
            if mysql_conn is not None:
                mysql_conn.rollback()
            # Re-queue a failed statement at most 3 times, keyed by its MD5 digest
            md5_sql = hashlib.md5(sql.encode("utf-8")).hexdigest()
            with mysql_insert_error_dict_lock:
                count = mysql_insert_error_dict.get(md5_sql, 0)
                if count <= 2:
                    mysql_insert_error_dict[md5_sql] = count + 1
                    mysql_insert_error_queue.put("%s#%s" % (sql, e))
            print("%s: %s\n" % (threading.current_thread().name, e))
        else:
            print("%s: OK\n" % threading.current_thread().name)
        finally:
            if mysql_conn is not None:
                mysql_conn.close()
            mysql_insert_error_queue.task_done()
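Both workers use the same retry rule. A statement is keyed by the MD5 of its SQL text and re-queued at most three times. Below is a minimal sketch of that rule as a small thread-safe class. `RetryCounter` and `MAX_ATTEMPTS` are hypothetical names. As far as this excerpt shows, `mysql_insert_error_dict` is never pruned. The sketch keeps that behaviour, so its dictionary also grows with every distinct failing statement.

```python
import hashlib
import threading

MAX_ATTEMPTS = 3  # the original code re-queues a failing statement at most 3 times


class RetryCounter:
    """Thread-safe per-statement retry budget keyed by the MD5 of the SQL text."""

    def __init__(self, max_attempts=MAX_ATTEMPTS):
        self.max_attempts = max_attempts
        self._counts = {}
        self._lock = threading.Lock()

    def should_retry(self, sql):
        """Return True (and consume one attempt) if sql may be re-queued."""
        key = hashlib.md5(sql.encode("utf-8")).hexdigest()
        with self._lock:
            count = self._counts.get(key, 0)
            if count >= self.max_attempts:
                return False
            self._counts[key] = count + 1
            return True
```

In the workers, the `except` branch would then reduce to `if retries.should_retry(sql): mysql_insert_error_queue.put("%s#%s" % (sql, e))`.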
def process_image_lst(paper_name, date, detail_lst, sql_data, image_url):
    # Images are stored under <pdf_dir><paper_name>/IMG/<YYYY>/<YYYYMMDD>/
    year, month, day = date.split("-")
    image_file_dir = "%s%s/IMG/%s/%s%s%s/" % (pdf_dir, paper_name, year, year, month, day)
    os.makedirs(image_file_dir, exist_ok=True)
    image_lst = []
    # Append the image column as one quoted string: "path|caption; path|caption; ..."
    sql_data += ", '"
    for i, detail in enumerate(detail_lst):
        sql_image = ""
        # Only (image_name, caption) tuples describe images
        if isinstance(detail, tuple):
            image_name = detail[0]
            image_text = my_escape_string(detail[1])
            url = image_url % image_name
            image_file_name = "%s%s_%s" % (image_file_dir, url.split("/")[-2].replace("-", "").replace(".", ""), url.split("/")[-1])
            sql_image = "%s|%s" % (image_file_name, image_text)
            # Skip the download if a non-empty copy already exists on disk
            if os.path.isfile(image_file_name) and os.path.getsize(image_file_name):
                print("download %s %s OK, File Exists\n" % (image_file_name, url))
            else:
                image_lst.append([image_file_name, url])
        sql_data += sql_image
        if i != len(detail_lst) - 1:
            sql_data += "; "
    sql_data += "'"
    # Hand the pending downloads to the download worker threads
    for image in image_lst:
        download_image_queue.put(image)
    return sql_data
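`process_image_lst` stores every image of an article in one column as `path|caption` entries joined by `"; "`. Entries produced by non-image items are empty strings. The reader side has to split that column back into pairs. Below is a minimal sketch of that step in Python. `parse_image_column` is a hypothetical helper, and the Qt reader's real parsing code is not shown here. The format has no escaping for its separators, so the sketch assumes paths contain no `|` and captions contain no `"; "`.

```python
def parse_image_column(value):
    """Split 'path|caption; path|caption' back into (path, caption) pairs.

    Hypothetical reader-side helper. Empty entries (from non-image items in
    detail_lst) are skipped. Assumes no '|' in paths and no '; ' in captions.
    """
    pairs = []
    for entry in value.split("; "):
        if not entry:
            continue
        path, _, caption = entry.partition("|")
        pairs.append((path, caption))
    return pairs
```

For example, `parse_image_column("/img/a.jpg|Caption A; ; /img/b.jpg|Caption B")` yields two pairs and drops the empty middle entry.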
def mysql_insert(sql_data):
    mysql_insert_queue.put(sql_data)
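`mysql_insert()` only enqueues a statement. The statements are executed by worker threads whose start-up code is not part of this excerpt. Those workers name their log files after `threading.current_thread().name` and call `task_done()` on every item. Below is a minimal sketch of starting such consumers and waiting for them with `queue.Queue.join()`. It assumes daemon threads and a name prefix per pool. `start_workers`, `handler` and `name_prefix` are hypothetical names.

```python
import queue
import threading


def start_workers(work_queue, handler, count, name_prefix):
    """Start `count` daemon threads that call handler(item) for each queued item.

    Hypothetical helper; threads are named "<name_prefix>-<n>" so that
    per-thread log files get stable names.
    """
    def worker():
        while True:
            item = work_queue.get()
            try:
                handler(item)
            except Exception as e:  # keep the worker alive on unexpected errors
                print("%s: %s" % (threading.current_thread().name, e))
            finally:
                # Always mark the item done so work_queue.join() can return
                work_queue.task_done()

    threads = []
    for i in range(count):
        t = threading.Thread(target=worker, name="%s-%d" % (name_prefix, i), daemon=True)
        t.start()
        threads.append(t)
    return threads
```

Daemon threads let the process exit once the producer has called `join()` on each queue, without explicit shutdown messages.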
def get_data_start_end(html, start_tag, end_tag, reg_str):
    # Collect all reg_str matches from the lines after a line containing
    # start_tag, stopping at the first later line that contains end_tag
    start = False
    match_lst = []
    for line in html:
        if start_tag in line:
            start = True
            continue
        if start:
            if end_tag in line:
                break
            match_lst.extend(re.findall(reg_str, line))
    return match_lst
def get_data_line(html, start_tag, reg_str):
    # Return the reg_str matches found on the first line that contains start_tag
    match_lst = []
    for line in html:
        if start_tag in line:
            match_lst.extend(re.findall(reg_str, line))
            break
    return match_lst
def get_multi_data_start_end(html, start_tag, end_tag, reg_str_lst):
    # Join the stripped lines between start_tag and end_tag into one string,
    # then apply each pattern to it; a pattern that does not match
    # contributes an empty-string placeholder
    start = False
    strip_line = ""
    for line in html:
        if start_tag in line:
            start = True
            continue
        if start:
            if end_tag in line:
                break
            strip_line += line.strip()
    match_lst = []
    for reg_str in reg_str_lst:
        m = re.findall(reg_str, strip_line)
        if m:
            match_lst.extend(m)
        else:
            match_lst.append("")
    return match_lst
def get_rmrb(paper_name, date):
    index_url = "http://paper.people.com.cn/rmrb/html/%s-%s/%s/nbs.D110000renmrb_01.htm" % (date.split("-")[0], date.split("-")[1], date.split("-")[2])
    my_print(paper_name, date, "start get %s" % index_url)
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.120 Safari/537.36'}
    req = request.Request(url=index_url, headers=headers)
    layout_name_url_lst = []
    layout_article_lst = []
    flag_index = True
    flag_other = True
    try:
        with request.urlopen(req) as f:
            data = f.read().decode("utf-8", "ignore").split("\r\n")
            layout_name_lst = get_data_start_end(data, "", "")
layout_url_lst = get_data_start_end(data, "", "