culiu9261

Web Scraping Scotch: The Node Way

A lot of new web technologies and design paradigms have emerged in the last couple of years, and some programming languages are steadily gaining popularity. You have very likely heard about concepts like responsive design, hybrid mobile/desktop apps, progressive web apps (PWAs), single page applications (SPAs), server-side rendered (SSR) apps and serverless architecture, and the list goes on.

While every modern web developer aims to get up to speed with these technologies, there are a few less popular web concepts and techniques that are quite useful. One of them is web scraping. In this tutorial, we will take a look at web scraping and practical ways to harness the technique.

What is Web Scraping?

In very simple terms, web scraping is the technique of extracting data from websites. This data can further be stored in a database or any other storage system for analysis or other uses. While extracting data from websites can be done manually, web scraping usually refers to an automated and less tedious process.

Web scraping may seem trivial, but it is the technique most bots and web crawlers use for data extraction. Different techniques can be employed for web scraping; in this tutorial, we will use one that involves parsing the DOM of a webpage.

Scotch Scraping

In this tutorial, we will use web scraping to extract some data from the Scotch website. Scotch does not provide an API for fetching the profiles and tutorials/posts of its authors, so we will build a very simple API that does exactly that.

Here is a screenshot of a simple demo app built on the API we will create in this tutorial. You can see the app on Heroku and the source code on GitHub.

Prerequisites

Advanced JavaScript and ES6 Syntax

Web scraping can be done in virtually any programming language that supports HTTP and XML or DOM parsing. In this tutorial, we will focus on web scraping using JavaScript in a Node.js server environment. Hence, advanced knowledge of JavaScript is required to fully understand the code snippets.

This tutorial also makes heavy use of ES6 syntax, including arrow functions, destructuring, block-scoped variables, template literals, rest parameters, the spread operator, default parameters, enhanced object literals and promises. Hence, adequate familiarity with ES6 syntax is required to fully understand the code snippets.

We will also use a couple of ES2017 features, namely async functions and the await operator. Hence, knowledge of asynchronous functions and working with promises is required.

jQuery Familiarity

Some familiarity with the jQuery DOM library is required to fully understand some of the code snippets in this tutorial, because the Cheerio package we will use is modeled on jQuery's style of DOM traversal and manipulation. Check the jQuery API Documentation to learn more about jQuery.

Functional Programming

In this tutorial, we will employ the functional programming paradigm in building our desired API. As such, we will create many specialized functions and apply a couple of functional programming concepts such as pure functions (immutability), higher-order functions and composition. Hence, you will be better off if you already understand functional programming concepts. You can learn more about functional programming concepts here.

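For readers new to these ideas, here is a small dependency-free sketch of the three concepts named above. The names append, multiplyBy, double, addOne and doubleThenAddOne are hypothetical, chosen only for illustration; they are not part of the API we build.

```javascript
// Pure function: returns a new array instead of mutating its input (immutability).
const append = (arr, item) => [...arr, item];

// Higher-order function: returns a new function specialised by `factor`.
const multiplyBy = factor => n => n * factor;

// Composition by hand: the output of one function feeds the next, right to left.
const double = multiplyBy(2);
const addOne = n => n + 1;
const doubleThenAddOne = n => addOne(double(n));

const original = [1, 2];
const extended = append(original, 3);

console.log(original);            // [ 1, 2 ]  (unchanged)
console.log(extended);            // [ 1, 2, 3 ]
console.log(doubleThenAddOne(5)); // 11
```

The compose() helper we write later automates the last step, so a call like doubleThenAddOne can be built from a list of functions instead of nested calls.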

Core Dependencies

Before you begin, ensure that you have Node and npm or yarn installed on your machine. Since we will use a lot of ES6/7 syntax in this tutorial, I recommend the following versions for complete ES6/7 support: Node (8.9.0 or higher) and npm (5.2.0 or higher).

Here are the core packages we will be using:

Cheerio - Cheerio is a fast, flexible, and lean implementation of core jQuery designed specifically for the server. It makes DOM parsing very easy.

Axios - Axios is a promise-based HTTP client for the browser and Node.js. It will enable us to fetch page contents through HTTP requests.

Express - Express is a minimal and flexible Node.js web application framework that provides a robust set of features for web and mobile applications.

Lodash - Lodash is a modern JavaScript utility library delivering modularity, performance & extras. It makes JavaScript easier by taking the hassle out of working with arrays, numbers, objects, strings, etc.

Getting Started

Installing Dependencies

Create a new directory for the application and run the following command to install the required dependencies for the app.

# Create a new directory
mkdir scotch-scraping

# cd into the new directory
cd scotch-scraping

# Initiate a new package and install app dependencies
npm init -y
npm install express morgan axios cheerio lodash

Setting up the Express server application

We will go ahead and set up a simple HTTP server application using Express. Create a server.js file in the root directory of your application and add the following code snippet to set up the server:

/* server.js */

// Require dependencies
const logger = require('morgan');
const express = require('express');

// Create an Express application
const app = express();

// Configure the app port
const port = process.env.PORT || 3000;
app.set('port', port);

// Load middlewares
app.use(logger('dev'));

// Start the server and listen on the preconfigured port
app.listen(port, () => console.log(`App started on port ${port}.`));

Modify npm `scripts`

Finally, we will modify the "scripts" section of the package.json file to look like the following snippet:

"scripts": {
  "start": "node server.js"
}

We now have everything we need to start building our application. If you run the command npm start in your terminal now, it will start the application server on port 3000 (or on the port given by the PORT environment variable), provided that port is available. However, we cannot access any route yet, since we have not added any routes to our application. Let's start building some helper functions we will need for web scraping.

Helper Functions

As stated earlier, we will create a couple of helper functions that will be used in several parts of our application. Create a new app directory in your project root. Then create a new file named helpers.js in that directory and add the following content to it:

/* app/helpers.js */

const _ = require('lodash');
const axios = require("axios");
const cheerio = require("cheerio");

As you can see, we are simply requiring the dependencies our helper functions need. Let's go ahead and add the helper functions.

Utility Helper Functions

We will start by creating some utility helper functions. Add the following snippet to the app/helpers.js file.

/* app/helpers.js */

///
// UTILITY FUNCTIONS
///

/**
 * Compose function arguments starting from right to left
 * to an overall function and returns the overall function
 */
const compose = (...fns) => arg => {
  return _.flattenDeep(fns).reduceRight((current, fn) => {
    if (_.isFunction(fn)) return fn(current);
    throw new TypeError("compose() expects only functions as parameters.");
  }, arg);
};

/**
 * Compose async function arguments starting from right to left
 * to an overall async function and returns the overall async function
 */
const composeAsync = (...fns) => arg => {
  return _.flattenDeep(fns).reduceRight(async (current, fn) => {
    if (_.isFunction(fn)) return fn(await current);
    throw new TypeError("composeAsync() expects only functions as parameters.");
  }, arg);
};

/**
 * Enforces the scheme of the URL is https
 * and returns the new URL
 */
const enforceHttpsUrl = url =>
  _.isString(url) ? url.replace(/^(https?:)?\/\//, "https://") : null;

/**
 * Strips number of all non-numeric characters
 * and returns the sanitized number
 */
const sanitizeNumber = number =>
  _.isString(number)
    ? number.replace(/[^0-9-.]/g, "")
    : _.isNumber(number) ? number : null;

/**
 * Filters null values from array
 * and returns an array without nulls
 */
const withoutNulls = arr =>
  _.isArray(arr) ? arr.filter(val => !_.isNull(val)) : [];

/**
 * Transforms an array of ({ key: value }) pairs to an object
 * and returns the transformed object
 */
const arrayPairsToObject = arr =>
  arr.reduce((obj, pair) => ({ ...obj, ...pair }), {});

/**
 * A composed function that removes null values from array of ({ key: value }) pairs
 * and returns the transformed object of the array
 */
const fromPairsToObject = compose(arrayPairsToObject, withoutNulls);

Let's go through the functions one at a time to understand what they do.

compose() - This is a higher-order function that takes one or more functions as its arguments and returns a composed function. The composed function has the same effect as invoking the functions passed in as arguments from right to left, passing the result of a function invocation as argument to the next function each time.

If any of the arguments passed to compose() is not a function, the composed function will throw an error whenever it is invoked. Here is a code snippet that describes how compose() works.

/**
 * -------------------------------------------------
 * Method 1: Functions in sequence
 * -------------------------------------------------
 */
function1( function2( function3(arg) ) );

/**
 * -------------------------------------------------
 * Method 2: Using compose()
 * -------------------------------------------------
 * Invoking the composed function has the same effect as (Method 1)
 */
const composedFunction = compose(function1, function2, function3);

composedFunction(arg);
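To make the abstract snippet above concrete, here is a sketch of the same compose() behaviour with real functions. It swaps lodash's flattenDeep and isFunction for plain JavaScript (an assumption made only so the snippet runs without lodash); the composition logic otherwise mirrors the helper in app/helpers.js. The functions trim, toUpper and exclaim are hypothetical examples.

```javascript
// Recursively flatten nested arrays (plain-JS stand-in for _.flattenDeep).
const flattenDeep = arr =>
  arr.reduce((acc, x) => acc.concat(Array.isArray(x) ? flattenDeep(x) : x), []);

// Same right-to-left composition as compose() in app/helpers.js.
const compose = (...fns) => arg =>
  flattenDeep(fns).reduceRight((current, fn) => {
    if (typeof fn === "function") return fn(current);
    throw new TypeError("compose() expects only functions as parameters.");
  }, arg);

const trim = s => s.trim();
const toUpper = s => s.toUpperCase();
const exclaim = s => `${s}!`;

// Equivalent to exclaim(toUpper(trim(arg)))
const shout = compose(exclaim, toUpper, trim);
console.log(shout("  scotch  ")); // SCOTCH!

// Nested arrays of functions are flattened first.
console.log(compose([exclaim, [toUpper]], trim)(" io ")); // IO!

// A non-function argument only fails when the composed function is invoked.
const broken = compose(exclaim, "not a function");
try {
  broken("x");
} catch (err) {
  console.log(err instanceof TypeError); // true
}
```

Note that creating broken does not throw; the TypeError surfaces only on the first call, exactly as described for compose().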

composeAsync() - This function works in the same way as the compose() function. The only difference is that it is asynchronous: each function receives the awaited result of the previous one, and the composed function returns a promise. Hence, it is ideal for composing functions that have asynchronous behaviour, for example, functions that return promises.

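A short sketch of composeAsync() with promise-returning functions. As before, lodash is replaced with plain JavaScript (flattening is omitted for brevity), and fetchUser(), countPosts() and formatCount() are hypothetical stand-ins for real asynchronous work such as HTTP requests.

```javascript
// Right-to-left composition where each step may return a promise.
// Mirrors composeAsync() from app/helpers.js, minus lodash and flattening.
const composeAsync = (...fns) => arg =>
  fns.reduceRight(async (current, fn) => {
    if (typeof fn === "function") return fn(await current);
    throw new TypeError("composeAsync() expects only functions as parameters.");
  }, arg);

// Hypothetical async steps, each resolving after a short delay.
const delay = (ms, value) => new Promise(resolve => setTimeout(resolve, ms, value));
const fetchUser = async username => delay(10, { username, posts: [1, 2, 3] });
const countPosts = async user => delay(10, user.posts.length);

// Synchronous functions can be mixed in freely.
const formatCount = count => `${count} posts`;

const getPostSummary = composeAsync(formatCount, countPosts, fetchUser);

// The composed function returns a promise.
getPostSummary("gladchinda").then(summary => console.log(summary)); // 3 posts
```

Because every reducer call is an async function, each step awaits the previous result before running, so the functions execute strictly one after another.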
enforceHttpsUrl() - This function takes a url string as argument and returns the url with the https scheme, provided the url begins with https://, http:// or //; any other string is returned unchanged. If the url is not a string, null is returned. Here is an example.

enforceHttpsUrl('scotch.io'); // returns => 'scotch.io'
enforceHttpsUrl('//scotch.io'); // returns => 'https://scotch.io'
enforceHttpsUrl('http://scotch.io'); // returns => 'https://scotch.io'

sanitizeNumber() - This function expects a number or string as argument. If a number is passed to it, it returns the number. However, if a string is passed to it, it removes every character except digits, hyphens (minus signs) and dots, and returns the sanitized string. For other value types, it simply returns null. Here is an example.

sanitizeNumber(53.56); // returns => 53.56
sanitizeNumber('-2oo,40'); // returns => '-240'
sanitizeNumber('badnumber.decimal'); // returns => '.'

withoutNulls() - This function expects an array as argument and returns a new array that only contains the non-null items of the original array. Here is an example.

withoutNulls([ 'String', [], null, {}, null, 54 ]); // returns => ['String', [], {}, 54]

arrayPairsToObject() - This function expects an array of ({ key: value }) objects, and returns a transformed object with the keys and values. Here is an example.

const pairs = [ { key1: 'value1' }, { key2: 'value2' }, { key3: 'value3' } ];

arrayPairsToObject(pairs); // returns => { key1: 'value1', key2: 'value2', key3: 'value3' }

fromPairsToObject() - This is a composed function created using compose(). It has the same effect as executing:

arrayPairsToObject( withoutNulls(array) );
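Here is a lodash-free sketch of the two composed steps applied to hypothetical extractor output in which one element yielded null, which is what an extractor might return for an element it cannot parse.

```javascript
// Plain-JS versions of withoutNulls() and arrayPairsToObject() from app/helpers.js.
const withoutNulls = arr =>
  Array.isArray(arr) ? arr.filter(val => val !== null) : [];

const arrayPairsToObject = arr =>
  arr.reduce((obj, pair) => ({ ...obj, ...pair }), {});

// Equivalent to compose(arrayPairsToObject, withoutNulls).
const fromPairsToObject = arr => arrayPairsToObject(withoutNulls(arr));

// Hypothetical extractor output: the second element could not be parsed.
const pairs = [
  { github: "https://github.com/gladchinda" },
  null,
  { twitter: "https://twitter.com/gladchinda" }
];

// Logs an object with only the github and twitter keys.
console.log(fromPairsToObject(pairs));
```

If two pairs share a key, the later one wins, because each spread overwrites earlier properties.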

Request and Response Helper Functions

Add the following to the app/helpers.js file.

/* app/helpers.js */

/**
 * Handles the request (Promise) when it settles
 * and sends a JSON response to the HTTP response stream (res).
 */
const sendResponse = res => async request => {
  return await request
    .then(data => res.json({ status: "success", data }))
    .catch(({ status: code = 500 }) =>
      res.status(code).json({ status: "failure", code, message: code == 404 ? 'Not found.' : 'Request failed.' })
    );
};

/**
 * Loads the html string returned for the given URL
 * and returns a Cheerio parser instance of the loaded HTML
 */
const fetchHtmlFromUrl = async url => {
  return await axios
    .get(enforceHttpsUrl(url))
    .then(response => cheerio.load(response.data))
    .catch(error => {
      error.status = (error.response && error.response.status) || 500;
      throw error;
    });
};

Here, we have added two new functions: sendResponse() and fetchHtmlFromUrl(). Let's try to understand what they do.

sendResponse() - This is a higher-order function that expects an Express HTTP response stream (res) as its argument and returns an async function. The returned async function expects a promise (request) as its argument.

If the request promise resolves, then a successful JSON response is sent using res.json(), containing the resolved data. If the promise rejects, then an error JSON response with an appropriate HTTP status code is sent. Here is how it can be used in an Express route:

app.get('/path', (req, res, next) => {
  const request = Promise.resolve([1, 2, 3, 4, 5]);
  sendResponse(res)(request);
});

Making a GET request to the /path endpoint will return this JSON response:

{
  "status": "success",
  "data": [1, 2, 3, 4, 5]
}
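The example above covers only the success path. The sketch below exercises the failure path without starting Express: stubRes is a hypothetical minimal stand-in for Express's res object that only records the calls made through status() and json(). sendResponse() itself is copied unchanged from app/helpers.js.

```javascript
// sendResponse() as defined in app/helpers.js.
const sendResponse = res => async request => {
  return await request
    .then(data => res.json({ status: "success", data }))
    .catch(({ status: code = 500 }) =>
      res.status(code).json({ status: "failure", code, message: code == 404 ? 'Not found.' : 'Request failed.' })
    );
};

// Hypothetical stand-in for Express's res: records the status code and JSON body.
const stubRes = () => {
  const res = { statusCode: 200, body: null };
  res.status = code => { res.statusCode = code; return res; };
  res.json = body => { res.body = body; return res; };
  return res;
};

// A rejection carrying status 404, such as fetchHtmlFromUrl() produces.
const notFound = stubRes();
sendResponse(notFound)(Promise.reject({ status: 404 })).then(() => {
  console.log(notFound.statusCode, notFound.body);
  // 404 { status: 'failure', code: 404, message: 'Not found.' }
});

// A rejection without a status property falls back to 500.
const failed = stubRes();
sendResponse(failed)(Promise.reject(new Error("boom"))).then(() => {
  console.log(failed.statusCode, failed.body.message); // 500 Request failed.
});
```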

fetchHtmlFromUrl() - This is an async function that expects a url string as its argument. First, it enforces the https scheme with enforceHttpsUrl() and uses axios.get() to fetch the content of the URL (which returns a promise). If the promise resolves, it passes the returned content to cheerio.load() to create a Cheerio parser instance, and then returns the instance. However, if the promise rejects, it rethrows the error after setting its status property to the HTTP status of the failed response, or 500 if there was none.

The Cheerio parser instance returned by this function will enable us to extract the data we require. We can use it in much the same way as jQuery's $() function: calling it with a selector returns a collection of matching elements that we can traverse and query.
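The error-handling branch of fetchHtmlFromUrl() is what gives sendResponse() a meaningful status code. The sketch below isolates that branch so it runs offline. fakeGet404 and fakeGetNetworkError are hypothetical stand-ins for axios.get(): for an HTTP error response axios puts the status on error.response.status, while for a network failure error.response is undefined. toStatusError is a hypothetical name for the .catch() handler.

```javascript
// The .catch() handler from fetchHtmlFromUrl(), pulled out as a named function.
const toStatusError = error => {
  error.status = (error.response && error.response.status) || 500;
  throw error;
};

// Hypothetical stand-ins for axios.get(): an HTTP 404 and a network failure.
const fakeGet404 = () =>
  Promise.reject(Object.assign(new Error("Request failed"), { response: { status: 404 } }));
const fakeGetNetworkError = () => Promise.reject(new Error("ECONNREFUSED"));

fakeGet404().catch(toStatusError).catch(err => console.log(err.status));          // 404
fakeGetNetworkError().catch(toStatusError).catch(err => console.log(err.status)); // 500
```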

DOM Parsing Helper Functions

Let's go ahead and add some additional functions to help us with DOM parsing. Add the following content to the app/helpers.js file.

/* app/helpers.js */

///
// HTML PARSING HELPER FUNCTIONS
///

/**
 * Fetches the inner text of the element
 * and returns the trimmed text
 */
const fetchElemInnerText = elem => (elem.text && elem.text().trim()) || null;

/**
 * Fetches the specified attribute from the element
 * and returns the attribute value
 */
const fetchElemAttribute = attribute => elem =>
  (elem.attr && elem.attr(attribute)) || null;

/**
 * Extract an array of values from a collection of elements
 * using the extractor function and returns the array
 * or the return value from calling transform() on array
 */
const extractFromElems = extractor => transform => elems => $ => {
  const results = elems.map((i, element) => extractor($(element))).get();
  return _.isFunction(transform) ? transform(results) : results;
};

/**
 * A composed function that extracts number text from an element,
 * sanitizes the number text and returns the parsed integer
 */
const extractNumber = compose(parseInt, sanitizeNumber, fetchElemInnerText);

/**
 * A composed function that extracts url string from the element's attribute (attr)
 * and returns the url with https scheme
 */
const extractUrlAttribute = attr =>
  compose(enforceHttpsUrl, fetchElemAttribute(attr));


module.exports = {
  compose,
  composeAsync,
  enforceHttpsUrl,
  sanitizeNumber,
  withoutNulls,
  arrayPairsToObject,
  fromPairsToObject,
  sendResponse,
  fetchHtmlFromUrl,
  fetchElemInnerText,
  fetchElemAttribute,
  extractFromElems,
  extractNumber,
  extractUrlAttribute
};

We've added a few more functions. Here are the functions and what they do:

fetchElemInnerText() - This function expects an element as argument. It extracts the inner text of the element by calling elem.text(), trims the surrounding whitespace and returns the trimmed text. If the element has no text() method or the trimmed text is empty, it returns null. Here is an example.

const $ = cheerio.load('<div class="fullname">  Glad Chinda </div>');
const elem = $('div.fullname');

fetchElemInnerText(elem); // returns => 'Glad Chinda'
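Because of the `|| null` at the end, fetchElemInnerText() never returns an empty string. The sketch below shows the fallback using a hypothetical stub, withText(), that exposes only a text() method in place of a Cheerio selection, so it runs without Cheerio. An empty Cheerio selection behaves like the whitespace case, since its text() is an empty string.

```javascript
// fetchElemInnerText() as defined in app/helpers.js.
const fetchElemInnerText = elem => (elem.text && elem.text().trim()) || null;

// Hypothetical stub mimicking the text() method of a Cheerio selection.
const withText = text => ({ text: () => text });

console.log(fetchElemInnerText(withText("  Glad Chinda "))); // Glad Chinda
console.log(fetchElemInnerText(withText("   ")));            // null (whitespace only)
console.log(fetchElemInnerText({}));                         // null (no text() method)
```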

fetchElemAttribute() - This is a higher-order function that expects an attribute as argument and returns another function that expects an element as argument. The returned function extracts the value of the given attribute of the element by calling elem.attr(attribute). Here is an example.

const $ = cheerio.load('<div class="username" title="Glad Chinda">@gladchinda</div>');
const elem = $('div.username');

// fetchTitle is a function that expects an element as argument
const fetchTitle = fetchElemAttribute('title');

fetchTitle(elem); // returns => 'Glad Chinda'
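Since fetchElemAttribute() is curried, the attribute name can be fixed once and the resulting function reused on many elements. The sketch below uses a hypothetical stub, withAttrs(), whose attr() returns undefined for a missing attribute, in place of a Cheerio selection.

```javascript
// fetchElemAttribute() as defined in app/helpers.js.
const fetchElemAttribute = attribute => elem =>
  (elem.attr && elem.attr(attribute)) || null;

// Hypothetical stub mimicking attr() on a Cheerio selection:
// it returns undefined when the attribute is absent.
const withAttrs = attrs => ({ attr: name => attrs[name] });

// Configure the attribute once, then reuse the function on many elements.
const fetchHref = fetchElemAttribute("href");

console.log(fetchHref(withAttrs({ href: "/@gladchinda" }))); // /@gladchinda
console.log(fetchHref(withAttrs({ title: "Glad Chinda" }))); // null
console.log(fetchHref("not an element"));                    // null
```

One subtlety of `|| null`: an attribute that is present but empty (for example alt="") is also reported as null.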

extractFromElems() - This looks like a monster function, although it does a very simple job. It is a higher-order function that returns another monster higher-order function. Here, we have used a functional programming technique known as currying to create a sequence of functions, each requiring just one argument. Here is the sequence of arguments:

extractorFunction -> transformFunction -> elementsCollection -> cheerioInstance

extractFromElems() makes it possible to extract data from a collection of similar elements using an extractor function, and also transform the extracted data using a transform function. The extractor function receives an element as argument, while the transform function receives an array of values as argument.

Let's say we have a collection of elements, each containing the name of a person as innerText. We want to extract all these names and return them in an array, all in uppercase. Here is how we can do this using extractFromElems():

const $ = cheerio.load('<div class="people"><span>Glad Chinda</span><span>John Doe</span><span>Brendan Eich</span></div>');

// Get the collection of span elements containing names
const elems = $('div.people span');

// The transform function
const transformUpperCase = values => values.map(val => String(val).toUpperCase());

// The arguments sequence: extractorFn => transformFn => elemsCollection => cheerioInstance($)
// fetchElemInnerText is used as extractor function
const extractNames = extractFromElems(fetchElemInnerText)(transformUpperCase)(elems);

// Finally pass in the cheerioInstance($)
extractNames($); // returns => ['GLAD CHINDA', 'JOHN DOE', 'BRENDAN EICH']
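Because extractFromElems() is curried, the extractor and transform can be fixed in advance and the result reused. The sketch below runs without Cheerio by faking only the two things extractFromElems() touches: a collection whose map() passes (index, element) and returns an object with get(), and a $ that wraps a raw element. Both fakes (fakeCollection, fake$) are hypothetical stand-ins. It also shows that passing null as the transform returns the extracted array unchanged.

```javascript
// extractFromElems() as in app/helpers.js, with _.isFunction swapped for typeof.
const extractFromElems = extractor => transform => elems => $ => {
  const results = elems.map((i, element) => extractor($(element))).get();
  return typeof transform === "function" ? transform(results) : results;
};

// Hypothetical stand-ins for a Cheerio collection and the $ wrapper.
const fakeCollection = items => ({
  map: fn => ({ get: () => items.map((item, i) => fn(i, item)) })
});
const fake$ = raw => ({ text: () => raw });

const innerText = elem => elem.text().trim();
const names = fakeCollection(["  Glad Chinda ", "John Doe", " Brendan Eich"]);

// Fix extractor and transform once; the result still waits for elems and $.
const extractRaw = extractFromElems(innerText)(null);
const extractCount = extractFromElems(innerText)(values => values.length);

console.log(extractRaw(names)(fake$));   // [ 'Glad Chinda', 'John Doe', 'Brendan Eich' ]
console.log(extractCount(names)(fake$)); // 3
```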

extractNumber() - This is a composed function that expects an element as argument and tries to extract a number from the innerText of the element. It does this by composing parseInt(), sanitizeNumber() and fetchElemInnerText(). It has the same effect as executing:

parseInt( sanitizeNumber( fetchElemInnerText(elem) ) );
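Tracing extractNumber() by hand on count text such as "24,280" shows both the happy path and a caveat: when the text contains no digits, or the element has no text at all, the result is NaN rather than null. The snippet below re-implements the three composed steps without lodash; withText() is a hypothetical stub for a Cheerio selection.

```javascript
// Plain-JS equivalents of the helpers composed by extractNumber().
const fetchElemInnerText = elem => (elem.text && elem.text().trim()) || null;
const sanitizeNumber = number =>
  typeof number === "string"
    ? number.replace(/[^0-9-.]/g, "")
    : typeof number === "number" ? number : null;

// Same as compose(parseInt, sanitizeNumber, fetchElemInnerText).
const extractNumber = elem => parseInt(sanitizeNumber(fetchElemInnerText(elem)));

// Hypothetical stub for a Cheerio selection.
const withText = text => ({ text: () => text });

console.log(extractNumber(withText(" 24,280 "))); // 24280
console.log(extractNumber(withText("n/a")));      // NaN
console.log(extractNumber({}));                   // NaN
```

Callers that need a null for missing counts can check the result with Number.isNaN().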

extractUrlAttribute() - This is a composed higher-order function that expects an attribute as argument and returns another function that expects an element as argument. The returned function tries to extract the URL value of an attribute in the element and returns it with the https scheme. Here is a snippet that shows how it works:

// METHOD 1
const fetchAttribute = fetchElemAttribute(attr);
enforceHttpsUrl( fetchAttribute(elem) );

// METHOD 2: Using extractUrlAttribute()
const fetchUrlAttribute = extractUrlAttribute(attr);
fetchUrlAttribute(elem);
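This is handy for attributes such as data-src that may hold a protocol-relative or plain http URL. A self-contained sketch using plain-JS equivalents of the two composed helpers; withAttrs() and the cdn.example.com URLs are hypothetical.

```javascript
// Plain-JS equivalents of enforceHttpsUrl() and fetchElemAttribute().
const enforceHttpsUrl = url =>
  typeof url === "string" ? url.replace(/^(https?:)?\/\//, "https://") : null;
const fetchElemAttribute = attribute => elem =>
  (elem.attr && elem.attr(attribute)) || null;

// Same as compose(enforceHttpsUrl, fetchElemAttribute(attr)).
const extractUrlAttribute = attr => elem =>
  enforceHttpsUrl(fetchElemAttribute(attr)(elem));

// Hypothetical stub for a Cheerio selection.
const withAttrs = attrs => ({ attr: name => attrs[name] });

const extractImage = extractUrlAttribute("data-src");
console.log(extractImage(withAttrs({ "data-src": "//cdn.example.com/cover.jpg" })));
// https://cdn.example.com/cover.jpg
console.log(extractImage(withAttrs({}))); // null
```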

Finally, we export all the helper functions we have created using module.exports. Now that we have our helper functions, we can proceed to the web scraping part of this tutorial.

Getting Ready to go Scotchy

Create a new file named scotch.js in the app directory of your project and add the following content to it:

/* app/scotch.js */

const _ = require('lodash');

// Import helper functions
const {
  compose,
  composeAsync,
  extractNumber,
  enforceHttpsUrl,
  fetchHtmlFromUrl,
  extractFromElems,
  fromPairsToObject,
  fetchElemInnerText,
  fetchElemAttribute,
  extractUrlAttribute
} = require("./helpers");

// scotch.io (Base URL)
const SCOTCH_BASE = "https://scotch.io";

///
// HELPER FUNCTIONS
///

/**
 * Resolves the url as relative to the base scotch url
 * and returns the full URL
 */
const scotchRelativeUrl = url =>
  _.isString(url) ? `${SCOTCH_BASE}${url.replace(/^\/*?/, "/")}` : null;

/**
 * A composed function that extracts a url from element attribute,
 * resolves it to the Scotch base url and returns the url with https
 */
const extractScotchUrlAttribute = attr =>
  compose(enforceHttpsUrl, scotchRelativeUrl, fetchElemAttribute(attr));

As you can see, we imported lodash as well as some of the helper functions we created earlier. We also defined a constant named SCOTCH_BASE that contains the base URL of the Scotch website. Finally, we added two helper functions:

scotchRelativeUrl() - This function takes a relative url string as argument and returns the URL with the pre-configured SCOTCH_BASE prepended to it. If the url is not a string then null is returned. Here is an example.

scotchRelativeUrl('tutorials'); // returns => 'https://scotch.io/tutorials'
scotchRelativeUrl('//tutorials'); // returns => 'https://scotch.io///tutorials'
scotchRelativeUrl('http://domain.com'); // returns => 'https://scotch.io/http://domain.com'
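Worth noting: `*?` is a lazy quantifier, so `^\/*?` always matches the empty string and replace() prepends a slash even when the path already starts with one. An href such as "/tutorials/..." (the form used in the post markup later in this tutorial) therefore becomes "https://scotch.io//tutorials/...". The sketch below compares the function as written with a hypothetical variant, scotchRelativeUrlSingleSlash, that uses a greedy `^\/*` to collapse any leading slashes into one; whether you want that normalization is a design choice, not something the original code does.

```javascript
const SCOTCH_BASE = "https://scotch.io";

// As written in app/scotch.js (with _.isString swapped for typeof).
const scotchRelativeUrl = url =>
  typeof url === "string" ? `${SCOTCH_BASE}${url.replace(/^\/*?/, "/")}` : null;

// Hypothetical variant: a greedy match replaces all leading slashes with one.
const scotchRelativeUrlSingleSlash = url =>
  typeof url === "string" ? `${SCOTCH_BASE}${url.replace(/^\/*/, "/")}` : null;

const href = "/tutorials/password-strength-meter-in-angularjs";
console.log(scotchRelativeUrl(href));
// https://scotch.io//tutorials/password-strength-meter-in-angularjs
console.log(scotchRelativeUrlSingleSlash(href));
// https://scotch.io/tutorials/password-strength-meter-in-angularjs
console.log(scotchRelativeUrlSingleSlash("tutorials")); // https://scotch.io/tutorials
```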

extractScotchUrlAttribute() - This is a composed higher-order function that expects an attribute as argument and returns another function that expects an element as argument. The returned function tries to extract the URL value of an attribute in the element, prepends the pre-configured SCOTCH_BASE to it and returns it with the https scheme. Here is a snippet that shows how it works:

// METHOD 1
const fetchAttribute = fetchElemAttribute(attr);
enforceHttpsUrl( scotchRelativeUrl( fetchAttribute(elem) ) );

// METHOD 2: Using extractScotchUrlAttribute()
const fetchUrlAttribute = extractScotchUrlAttribute(attr);
fetchUrlAttribute(elem);

Scotch Extraction Functions

We want to be able to extract the following data for any Scotch author:

profile (name, role, avatar, etc)
social links (facebook, twitter, github, etc)
stats (total views, total posts, etc)
posts

If you recall, the extractFromElems() helper function we created earlier requires an extractor function for extracting content from a collection of similar elements. We are going to define some extractor functions in this section.

First, we will create an extractSocialUrl() function for extracting the social network name and URL from a social link element. Here is the DOM structure of the social link element expected by extractSocialUrl().

<a href="https://github.com/gladchinda" target="_blank" title="GitHub">
  <span class="icon icon-github">
    <svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" version="1.1" id="Capa_1" x="0px" y="0px" width="50" height="50" viewBox="0 0 512 512" style="enable-background:new 0 0 512 512;" xml:space="preserve">
      ...
    </svg>
  </span>
</a>

Calling the extractSocialUrl() function should return an object that looks like the following:

{ github: 'https://github.com/gladchinda' }

Let's go on to create the function. Add the following content to the app/scotch.js file.

/* app/scotch.js */

///
// EXTRACTION FUNCTIONS
///

/**
 * Extract a single social URL pair from container element
 */
const extractSocialUrl = elem => {

  // Find the social-icon <span> element(s)
  const icon = elem.find('span.icon');

  // Regex for social classes
  const regex = /^(?:icon|color)-(.+)$/;

  // Extracts only social classes from the class attribute
  // (falls back to '' when the attribute is missing and null is passed in)
  const onlySocialClasses = regex => classes => (classes || '')
      .replace(/\s+/g, ' ')
      .split(' ')
      .filter(classname => regex.test(classname));

  // Gets the social network name from a class name
  const getSocialFromClasses = regex => classes => {
    let social = null;
    const [classname = null] = classes;

    if (_.isString(classname)) {
      const [, name = null] = classname.match(regex);
      social = name ? _.snakeCase(name) : null;
    }

    return social;
  };

  // Extract the href URL from the element
  const href = extractUrlAttribute('href')(elem);

  // Get the social-network name using a composed function
  const social = compose(
    getSocialFromClasses(regex),
    onlySocialClasses(regex),
    fetchElemAttribute('class')
  )(icon);

  // Return an object of social-network-name(key) and social-link(value)
  // Else return null if no social-network-name was found
  return social && { [social]: href };

};

Let's try to understand how the extractSocialUrl() function works:

First, we fetch the child element with an icon class. We also define a regular expression that matches social-icon class names.

We define onlySocialClasses() higher-order function that takes a regular expression as its argument and returns a function. The returned function takes a string of class names separated by spaces. It then uses the regular expression to extract only the social class names from the list and returns them in an array. Here is an example:

const regex = /^(?:icon|color)-(.+)$/;
const extractSocial = onlySocialClasses(regex);
const classNames = 'first-class another-class color-twitter icon-github';

extractSocial(classNames); // returns [ 'color-twitter', 'icon-github' ]

Next, we define getSocialFromClasses() higher-order function that takes a regular expression as its argument and returns a function. The returned function takes an array of single class strings. It then uses the regular expression to extract the social network name from the first class in the list and returns it. Here is an example:

const regex = /^(?:icon|color)-(.+)$/;
const extractSocialName = getSocialFromClasses(regex);
const classNames = [ 'color-twitter', 'icon-github' ];

extractSocialName(classNames); // returns 'twitter'

Afterwards, we extract the href attribute URL from the element. We also extract the social network name from the icon element using a composed function created by composing getSocialFromClasses(regex), onlySocialClasses(regex) and fetchElemAttribute('class').

Finally, we return an object with the social network name as key and the href URL as value. However, if no social network was fetched, then null is returned. Here is an example of the returned object:

{ twitter: 'https://twitter.com/gladchinda' }
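The class-name pipeline inside extractSocialUrl() is pure string logic, so it can be checked in isolation. The sketch below adapts onlySocialClasses() and getSocialFromClasses() with two assumptions: a `classes || ''` guard so that a missing class attribute (null) counts as no classes, and a simple hyphen-to-underscore substitution standing in for lodash's snakeCase (they agree only for lowercase, hyphenated names). socialFromClassAttr is a hypothetical helper name. The last cases show when no network is found, which is when extractSocialUrl() returns null.

```javascript
const regex = /^(?:icon|color)-(.+)$/;

// Adapted from extractSocialUrl(): keep only the icon-*/color-* class names.
const onlySocialClasses = regex => classes => (classes || '')
  .replace(/\s+/g, ' ')
  .split(' ')
  .filter(classname => regex.test(classname));

// Adapted from extractSocialUrl(), with a simplified stand-in for _.snakeCase.
const getSocialFromClasses = regex => classes => {
  const [classname = null] = classes;
  if (typeof classname !== "string") return null;
  const [, name = null] = classname.match(regex);
  return name ? name.replace(/-/g, "_") : null;
};

// class attribute -> social classes -> network name
const socialFromClassAttr = classAttr =>
  getSocialFromClasses(regex)(onlySocialClasses(regex)(classAttr));

console.log(socialFromClassAttr("icon icon-github"));         // github
console.log(socialFromClassAttr("icon icon-stack-overflow")); // stack_overflow
console.log(socialFromClassAttr("icon"));                     // null
console.log(socialFromClassAttr(null));                       // null
```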

Extracting Posts and Stats

We will go ahead and create two additional extraction functions, extractPost() and extractStat(), for extracting posts and stats respectively. Before we create the functions, let's take a look at the DOM structure of the elements they expect.

Here is the DOM structure of the element expected by extractPost().

<div class="card large-card" data-type="post" data-id="2448">
  <a href="/tutorials/password-strength-meter-in-angularjs" class="card__img lazy-background" data-src="https://cdn.scotch.io/7540/iKZoyh9WSlSzB9Bt5MNK_post-cover-photo.jpg">
    <span class="tag is-info">Post</span>
  </a>
  <h2 class="card__title">
    <a href="/tutorials/password-strength-meter-in-angularjs">Password Strength Meter in AngularJS</a>
  </h2>
  <div class="card-footer">
    <a class="name" href="/@gladchinda">Glad Chinda</a>
    <a href="/tutorials/password-strength-meter-in-angularjs" title="Views">
      <span>24,280</span>
    </a>
    <a href="/tutorials/password-strength-meter-in-angularjs#comments-section" title="Comments">
      <span class="comment-number" data-id="2448">5</span>
    </a>
  </div>
</div>

Here is the DOM structure of the element expected by extractStat().

<div class="profile__stat column is-narrow">
  <div class="stat">41,454</div>
  <div class="label">Pageviews</div>
</div>

Add the following content to the app/scotch.js file.

/* app/scotch.js */

/**
 * Extract a single post from container element
 */
const extractPost = elem => {
  const title = elem.find('.card__title a');
  const image = elem.find('a[data-src]');
  const views = elem.find("a[title='Views'] span");
  const comments = elem.find("a[title='Comments'] span.comment-number");

  return {
    title: fetchElemInnerText(title),
    image: extractUrlAttribute('data-src')(image),
    url: extractScotchUrlAttribute('href')(title),
    views: extractNumber(views),
    comments: extractNumber(comments)
  };
};

/**
 * Extract a single stat from container element
 */
const extractStat = elem => {
  const statElem = elem.find('.stat');
  const labelElem = elem.find('.label');

  const lowercase = val => _.isString(val) ? val.toLowerCase() : null;

  const stat = extractNumber(statElem);
  const label = compose(lowercase, fetchElemInnerText)(labelElem);

  return { [label]: stat };
};

The extractPost() function extracts the title, image, URL, views and comments of a post by parsing the children of the given element. It uses a couple of helper functions we created earlier to extract data from the appropriate elements.
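
The view and comment counts appear in the markup as text such as 24,280, so extractNumber() has to discard the thousands separator before converting the value. Here is a text-level sketch of that step. It assumes counts are plain integers, so an abbreviated value like 1.2k would not parse correctly. The name parseCount is hypothetical.

```javascript
// Hypothetical helper: keep only the digits, then convert; null if none remain.
const parseCount = text => {
  const digits = String(text == null ? '' : text).replace(/[^\d]/g, '');
  return digits ? Number(digits) : null;
};

console.log(parseCount('24,280')); // 24280
console.log(parseCount(' 5 '));    // 5
console.log(parseCount(''));       // null
```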

Here is an example of the object returned from calling extractPost().

{
  title: "Password Strength Meter in AngularJS",
  image: "https://cdn.scotch.io/7540/iKZoyh9WSlSzB9Bt5MNK_post-cover-photo.jpg",
  url: "https://scotch.io//tutorials/password-strength-meter-in-angularjs",
  views: 24280,
  comments: 5
}
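
The url value above contains a doubled slash (scotch.io//tutorials/...). The helper is not shown in this section, but the output suggests that extractScotchUrlAttribute() joins the base URL and a root-relative href with an extra /. Resolving the href with the WHATWG URL class, which is global in Node.js 10 and later, avoids this. The value of SCOTCH_BASE and the name toAbsoluteUrl are assumptions.

```javascript
const SCOTCH_BASE = 'https://scotch.io'; // assumed value, no trailing slash
const href = '/tutorials/password-strength-meter-in-angularjs';

// Plain concatenation keeps both slashes.
console.log(`${SCOTCH_BASE}/${href}`);
// https://scotch.io//tutorials/password-strength-meter-in-angularjs

// Hypothetical helper: resolve the href against the base with the WHATWG URL API.
const toAbsoluteUrl = relative => new URL(relative, SCOTCH_BASE).href;

console.log(toAbsoluteUrl(href));
// https://scotch.io/tutorials/password-strength-meter-in-angularjs
```

Absolute URLs, such as the CDN image link, pass through unchanged because new URL() ignores the base when its first argument is already absolute.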

The extractStat() function extracts the stat data contained in the given element. Here is an example of the object returned from calling extractStat().

{ pageviews: 41454 }
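
extractStat() lowercases the label text and uses it as a computed property key. The sketch below mirrors that logic on plain strings instead of Cheerio elements. The name statFromText is hypothetical, and trimming the label is an extra step that the original does not perform.

```javascript
// Hypothetical text-level counterpart of extractStat().
const statFromText = (statText, labelText) => {
  const value = Number(String(statText).replace(/[^\d]/g, '')); // "41,454" -> 41454
  const label = typeof labelText === 'string' ? labelText.trim().toLowerCase() : null;
  return { [label]: value }; // computed key, as in extractStat()
};

console.log(statFromText('41,454', 'Pageviews')); // { pageviews: 41454 }
```

In both this sketch and the original, a label that is not a string becomes null, and a computed key of null is converted to the string 'null'. Callers may want to filter out such entries.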

Fetching the Scotch Author Profile

Now we will define the extractAuthorProfile() function, which extracts the complete profile of a Scotch author. Add the following content to the app/scotch.js file.

/* app/scotch.js */

/**
 * Extract profile from a Scotch author's page using the Cheerio parser instance
 * and return the author profile object
 */
const extractAuthorProfile = $ => {

  const mainSite = $('#sitemain');
  const metaScotch = $("meta[property='og:url']");
  const scotchHero = mainSite.find('section.hero--scotch');
  const superGrid = mainSite.find('section.super-grid');

  const authorTitle = scotchHero.find(".profile__name h1.title");
  const profileRole = authorTitle.find(".tag");
  const profileAvatar = scotchHero.find("img.profile__avatar");
  const profileStats = scotchHero.find(".profile__stats .profile__stat");
  const authorLinks = scotchHero.find(".author-links a[target='_blank']");
  const authorPosts = superGrid.find(".super-grid__item [data-type='post']");

  const extractPosts = extractFromElems(extractPost)();
  const extractStats = extractFromElems(extractStat)(fromPairsToObject);
  const extractSocialUrls = extractFromElems(extractSocialUrl)(fromPairsToObject);

  return Promise.all([
    fetchElemInnerText(authorTitle.contents().first()),
    fetchElemInnerText(profileRole),
    extractUrlAttribute('content')(metaScotch),
    extractUrlAttribute('src')(profileAvatar),
    extractSocialUrls(authorLinks)($),
    extractStats(profileStats)($),
    extractPosts(authorPosts)($)
  ]).then(([ author, role, url, avatar, social, stats, posts ]) => ({ author, role, url, avatar, social, stats, posts }));

};

/**
 * Fetches the Scotch profile of the given author
 */
const fetchAuthorProfile = author => {
  const AUTHOR_URL = `${SCOTCH_BASE}/@${author.toLowerCase()}`;
  return composeAsync(extractAuthorProfile, fetchHtmlFromUrl)(AUTHOR_URL);
};

module.exports = { fetchAuthorProfile };

The extractAuthorProfile() function is straightforward. We first use $ (the Cheerio parser instance) to find a few elements and element collections.

Next, we use the extractFromElems() helper function together with the extractor functions we created earlier in this section (extractPost, extractStat and extractSocialUrl) to create higher-order extraction functions. Notice how we use the fromPairsToObject helper function we created earlier as a transform function.
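
To make the transform step concrete, here is a hypothetical sketch of the two helpers that uses plain arrays in place of Cheerio collections, so the trailing $ argument is dropped. It assumes fromPairsToObject merges an array of single-key objects, such as those returned by extractStat() and extractSocialUrl(), into one object. It also assumes the helper skips the null values that extractSocialUrl() can return. The tutorial's implementations may differ, and statToObject is an illustrative name.

```javascript
// Merge an array of single-key objects into one object, skipping null results.
const fromPairsToObject = items =>
  items
    .filter(item => item !== null && typeof item === 'object')
    .reduce((acc, item) => Object.assign(acc, item), {});

// extractFromElems(extractor)(transform)(elems) -> Promise of transformed results.
// The transform defaults to identity, matching the call extractFromElems(extractPost)().
const extractFromElems = extractor => (transform = items => items) => elems =>
  Promise.resolve(transform(elems.map(extractor)));

// Hypothetical extractor turning { label, value } into { label: value }.
const statToObject = ({ label, value }) => ({ [label.toLowerCase()]: value });

const extractStats = extractFromElems(statToObject)(fromPairsToObject);

extractStats([
  { label: 'Posts', value: 6 },
  { label: 'Pageviews', value: 41454 },
  { label: 'Readers', value: 31676 }
]).then(stats => console.log(stats)); // { posts: 6, pageviews: 41454, readers: 31676 }
```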

Finally, we use Promise.all() to extract all the required data, leveraging a few of the helper functions we created earlier. The resolved values are returned in an array in this order: author name, role, Scotch link, avatar link, social links, stats and posts.

Notice how we use destructuring in the .then() promise handler to construct the final object that is returned when all the promises resolve. The returned object should look like the following:

{
  author: 'Glad Chinda',
  role: 'Author',
  url: 'https://scotch.io/@gladchinda',
  avatar: 'https://cdn.scotch.io/7540/EnhoZyJOQ2ez9kVhsS9B_profile.jpg',
  social: {
    twitter: 'https://twitter.com/gladchinda',
    github: 'https://github.com/gladchinda'
  },
  stats: {
    posts: 6,
    pageviews: 41454,
    readers: 31676
  },
  posts: [
    {
      title: 'Password Strength Meter in AngularJS',
      image: 'https://cdn.scotch.io/7540/iKZoyh9WSlSzB9Bt5MNK_post-cover-photo.jpg',
      url: 'https://scotch.io//tutorials/password-strength-meter-in-angularjs',
      views: 24280,
      comments: 5
    },
    ...
  ]
}
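
Promise.all() accepts plain values as well as promises. It resolves to an array in input order, regardless of which promise settles first. This is what lets extractAuthorProfile() mix helpers that appear to return plain values (fetchElemInnerText() is used directly in extractPost()) with the promise-returning extractors. Here is a small illustration with placeholder values:

```javascript
// Placeholder values: a delayed promise, a plain value and a resolved promise.
const buildProfile = () =>
  Promise.all([
    new Promise(resolve => setTimeout(() => resolve('Glad Chinda'), 10)), // settles last
    'Author',                                                              // plain value
    Promise.resolve({ posts: 6 })                                          // already resolved
  ]).then(([author, role, stats]) => ({ author, role, stats }));          // destructure by position

buildProfile().then(profile => {
  console.log(profile.author, profile.role, profile.stats.posts); // Glad Chinda Author 6
});
```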

We also define the fetchAuthorProfile() function that accepts an author's Scotch username and returns a Promise that resolves to the profile of the author. For an author whose username is gladchinda, the Scotch URL is https://scotch.io/@gladchinda.

fetchAuthorProfile() uses the composeAsync() helper function to create a composed function. That function first fetches the DOM content of the author's Scotch page using the fetchHtmlFromUrl() helper function, then extracts the author's profile using the extractAuthorProfile() function we just created.
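
composeAsync() is defined earlier in the tutorial. The sketch below is one plausible implementation, assuming it composes right to left like compose() and waits for each step before passing the result on. Network access and Cheerio are replaced by the hypothetical stubs fakeFetchHtml and fakeExtractProfile. SCOTCH_BASE is assumed to be https://scotch.io, which matches the profile URL shown above.

```javascript
// Hypothetical composeAsync(): right-to-left composition where each step may
// return a plain value or a promise; the result is always a promise.
const composeAsync = (...fns) => arg =>
  fns.reduceRight((acc, fn) => acc.then(fn), Promise.resolve(arg));

const SCOTCH_BASE = 'https://scotch.io'; // assumed value of the tutorial's constant

// Stubs standing in for fetchHtmlFromUrl() and extractAuthorProfile().
const fakeFetchHtml = url => Promise.resolve(`<title>${url}</title>`);
const fakeExtractProfile = html => ({ html });

// Same shape as fetchAuthorProfile(): build the URL, fetch, then extract.
const fetchProfile = author =>
  composeAsync(fakeExtractProfile, fakeFetchHtml)(`${SCOTCH_BASE}/@${author.toLowerCase()}`);

fetchProfile('GladChinda').then(profile => console.log(profile.html));
// <title>https://scotch.io/@gladchinda</title>
```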

Finally, we export fetchAuthorProfile as the only identifier in the module.exports object.

Finishing up with a Route

We are almost done with our API. We need to add a route to our server that lets us fetch the profile of any Scotch author. The route will have the following structure, where the author parameter represents the Scotch author's username.

GET /scotch/:author

Let's go ahead and create this route. We will make a couple of changes to the server.js file. First, add the following to the server.js file to require some of the functions we need.

/* server.js */

// Require the needed functions
const { sendResponse } = require('./app/helpers');
const { fetchAuthorProfile } = require('./app/scotch');

Finally, add the route to the server.js file immediately after the middlewares.

/* server.js */

// Add the Scotch author profile route
app.get('/scotch/:author', (req, res, next) => {
  const author = req.params.author;
  sendResponse(res)(fetchAuthorProfile(author));
});

As you can see, we pass the author received from the route parameter to the fetchAuthorProfile() function to get the profile of the given author. We then use the sendResponse() helper method to send the returned profile as a JSON response.
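
sendResponse() lives in app/helpers and is not shown in this section. Assuming it sends the resolved value as JSON and reports a rejection as an error, it could look like the sketch below. The 500 status and the { error } body shape are assumptions. A minimal fake response object stands in for Express's res so the sketch runs without a server.

```javascript
// Hypothetical sketch of sendResponse(res)(promise): send the resolved value as
// JSON, or an error payload if the promise rejects. Status 500 is an assumption.
const sendResponse = res => async promise => {
  try {
    const data = await promise;
    res.json(data);
  } catch (err) {
    res.status(500).json({ error: err.message });
  }
};

// Minimal fake exposing the two Express response methods used above.
const createFakeRes = () => {
  const res = { statusCode: 200, body: undefined };
  res.status = code => { res.statusCode = code; return res; };
  res.json = body => { res.body = body; return res; };
  return res;
};

// Usage mirroring the route: sendResponse(res)(fetchAuthorProfile(author)).
const ok = createFakeRes();
sendResponse(ok)(Promise.resolve({ author: 'Glad Chinda' }))
  .then(() => console.log(ok.statusCode, ok.body)); // 200 { author: 'Glad Chinda' }
```

With a helper of this shape, the route handler does not need its own try/catch, because the helper handles the rejection.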

We have successfully built our API using a web scraping technique. Go ahead and test the API by running the npm start command in your terminal. Then launch your favorite HTTP testing tool, e.g. Postman, and test the API endpoint. If you followed all the steps correctly, you should get a result like the following demo:

<img src="https://farm1.staticflickr.com/960/41038838905_ab703d85fb_o.jpg" width="1280" height="784" alt="Scotch Scraping API Demo">
