Interface - phantomjs - API Reference - headless WebKit with JavaScript API - Google Project Hosting

API Reference

Updated Apr 1, 2012 by [email protected]

Applies to: PhantomJS 1.5.

Note: This page serves as a reference. For a step-by-step introduction to PhantomJS, please refer to the quick start guide.

Assuming PhantomJS is built and its executable is placed somewhere in the PATH, it can be invoked as follows:

phantomjs [options] somescript.js [argument [argument ...]]

The script code will be executed as if it were running in a web browser with an empty page. Since PhantomJS is headless, nothing will be shown on the screen.

If PhantomJS is invoked without any argument, it will enter the interactive mode (REPL).

Command-line Options

Supported command-line options are:

  • --cookies-file=/path/to/cookies.txt specifies the file name to store the persistent cookies.
  • --disk-cache=[yes|no] enables disk cache (at desktop services cache storage location, default is 'no').
  • --help or -h lists all possible command-line options.
  • --ignore-ssl-errors=[yes|no] ignores SSL errors, such as expired or self-signed certificate errors (default is no).
  • --load-images=[yes|no] loads all inlined images (default is 'yes').
  • --local-to-remote-url-access=[yes|no] allows local content to access remote URL (default is no).
  • --max-disk-cache-size=size limits the size of the disk cache (in KB).
  • --output-encoding=encoding sets the encoding used for terminal output (default is utf8).
  • --proxy=address:port specifies the proxy server to use (e.g. --proxy=192.168.1.42:8080).
  • --proxy-type=[http|socks5] specifies the type of the proxy server.
  • --script-encoding=encoding sets the encoding used for the starting script (default is utf8).
  • --version or -v prints out the version of PhantomJS.
  • --web-security=[yes|no] enables web security and forbids cross-domain XHR (default is yes).

Rather than passing all options in the command-line, it is also possible to store the options in a file using JavaScript Object Notation (JSON) and then tell PhantomJS to read it:

phantomjs --config=/path/to/config.json script.js arg1 arg2 arg3

where the contents of config.json look like:

{
    "ignoreSslErrors": true,
    "localToRemoteUrlAccessEnabled": true
}

'phantom' Object

PhantomJS functionality is exposed through a new host object named phantom, added as a child of the window object. The properties and functions of the phantom object are described in the following sections.

Properties

args (array)

This read-only property is an array of the arguments passed to the script. Deprecated: Please use system.args from the System module.

libraryPath (string)

This property stores the path which is used by injectJs function to resolve the script name. Initially it is set to the location of the script invoked by PhantomJS.

scriptName (string)

This read-only property stores the name of the invoked script file. Deprecated: Please use system.args[0] from the System module.

version (object)

This read-only property holds the PhantomJS version. Example value: { major:1, minor:0, patch:0 }.
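
A script that relies on a feature from a particular release can inspect phantom.version before continuing. The following is a minimal sketch only; the helper names versionToString and isAtLeast are hypothetical and not part of the PhantomJS API.

```javascript
// Hypothetical helpers (not part of PhantomJS) that operate on an object
// shaped like phantom.version: { major, minor, patch }.
function versionToString(v) {
    return v.major + '.' + v.minor + '.' + v.patch;
}

// Returns true if version v is greater than or equal to major.minor.
function isAtLeast(v, major, minor) {
    return v.major > major || (v.major === major && v.minor >= minor);
}

// The phantom object exists only when the script runs under PhantomJS.
if (typeof phantom !== 'undefined') {
    console.log('Running PhantomJS ' + versionToString(phantom.version));
    if (!isAtLeast(phantom.version, 1, 4)) {
        console.log('The webserver module requires PhantomJS 1.4 or later.');
    }
    phantom.exit();
}
```

The guard around the phantom object keeps the helpers usable outside PhantomJS, for instance when they are tested separately.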

Functions

exit(returnValue)

Exits the program with the specified return value. If no return value is specified, it is set to 0.

injectJs(filename)

Injects external script code from the specified file. If the file cannot be found in the current directory, libraryPath is used for an additional lookup.

This function returns true if injection is successful, otherwise it returns false.
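
Because injectJs reports failure through its return value, a script can collect the files that could not be loaded. The sketch below assumes a hypothetical helper named injectAll; the file names are placeholders.

```javascript
// Hypothetical helper: injects each file through an object that has an
// injectJs(filename) method (phantom or a WebPage instance) and returns
// the names of the files that could not be injected.
function injectAll(target, fileNames) {
    var failed = [];
    fileNames.forEach(function (name) {
        if (!target.injectJs(name)) {
            failed.push(name);
        }
    });
    return failed;
}

// The phantom object exists only when the script runs under PhantomJS.
if (typeof phantom !== 'undefined') {
    // 'helpers.js' and 'config.js' are placeholder file names.
    var missing = injectAll(phantom, ['helpers.js', 'config.js']);
    if (missing.length > 0) {
        console.log('Could not inject: ' + missing.join(', '));
    }
    // Exit with a non-zero value if any file was missing.
    phantom.exit(missing.length > 0 ? 1 : 0);
}
```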

'WebPage' Object

WebPage object encapsulates a web page. It is usually instantiated using the new keyword. The properties, functions, and callbacks of the WebPage object are described in the following sections.

Properties

clipRect (object)

This property defines the rectangular area of the web page to be rasterized when render() is invoked. If no clipping rectangle is set, render() will process the entire web page.

Example: page.clipRect = { top: 14, left: 3, width: 400, height: 300 }

content (string)

This property stores the content of the web page, enclosed in an HTML/XML element. Setting the property will effectively reload the web page with the new content.

libraryPath (string)

This property stores the path which is used by injectJs function to resolve the script name. Initially it is set to the location of the script invoked by PhantomJS.

settings (object)

This property stores various settings of the web page:

  • javascriptEnabled defines whether to execute the script in the page or not (defaults to true).
  • loadImages defines whether to load the inlined images or not.
  • localToRemoteUrlAccessEnabled defines whether local resources (e.g. from file) can access remote URLs or not (defaults to false).
  • userAgent defines the user agent sent to the server when the web page requests resources.
  • userName sets the user name used for HTTP authentication.
  • password sets the password used for HTTP authentication.
  • XSSAuditingEnabled defines whether load requests should be monitored for cross-site scripting attempts (defaults to false).
  • webSecurityEnabled defines whether web security should be enabled or not (defaults to true).

Note: The settings apply only during the call to the open() function. Subsequent modification of the settings will not have any impact.
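
Since the settings take effect when open() is called, they are best assigned right after the page is created. This is a sketch under stated assumptions: applySettings is a hypothetical helper, and the user agent string and URL are placeholders.

```javascript
// Hypothetical helper: copies the given overrides onto page.settings.
function applySettings(page, overrides) {
    var key;
    for (key in overrides) {
        if (overrides.hasOwnProperty(key)) {
            page.settings[key] = overrides[key];
        }
    }
    return page;
}

// The phantom object exists only when the script runs under PhantomJS.
if (typeof phantom !== 'undefined') {
    var page = require('webpage').create();
    // Settings must be in place before open() is called.
    applySettings(page, {
        userAgent: 'ExampleBot/1.0',   // placeholder user agent string
        loadImages: false
    });
    page.open('http://www.example.com/', function (status) {
        console.log(status);
        phantom.exit();
    });
}
```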

viewportSize (object)

This property sets the size of the viewport for the layout process. It is useful to set the preferred initial size before loading the page, e.g. to choose between landscape vs portrait.

Because PhantomJS is headless (nothing is shown), viewportSize effectively simulates the size of the window like in a traditional browser.

Example: page.viewportSize = { width: 480, height: 800 }
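
The viewportSize and clipRect properties are often set together. The sketch below computes a clipping rectangle centered in a viewport-sized area at the top of the page; centeredClip is a hypothetical helper and the dimensions are arbitrary.

```javascript
// Hypothetical helper: returns a clip rectangle of the given size,
// centered inside an area with the viewport's dimensions.
function centeredClip(viewport, width, height) {
    return {
        top: Math.round((viewport.height - height) / 2),
        left: Math.round((viewport.width - width) / 2),
        width: width,
        height: height
    };
}

// The phantom object exists only when the script runs under PhantomJS.
if (typeof phantom !== 'undefined') {
    var page = require('webpage').create();
    var viewport = { width: 480, height: 800 };
    page.viewportSize = viewport;
    page.clipRect = centeredClip(viewport, 400, 300);
    console.log(JSON.stringify(centeredClip(viewport, 400, 300)));
    phantom.exit();
}
```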

Functions

evaluate(function)

Evaluates the given function in the context of the web page. The execution is sandboxed: the web page has no access to the phantom object and it cannot probe its own settings. Any return value must be a simple object, i.e. it must not contain a function or closure.

Example:

console.log('Page title is ' + page.evaluate(function () {
    return document.title;
}));

includeJs(URL, callback)

Includes an external script from the specified URL (usually a remote location) and executes the callback upon completion.

Example:

page.includeJs("http://ajax.googleapis.com/ajax/libs/jquery/1.6.1/jquery.min.js", function() {
    // jQuery is loaded, now manipulate the DOM
});

injectJs(filename)

Injects external script code from the specified file. If the file cannot be found in the current directory, libraryPath is used for an additional lookup.

This function returns true if injection is successful, otherwise it returns false.

open(URL, optional_callback)

Opens the URL and loads it into the page. Once the page is loaded, the optional callback is invoked via onLoadFinished and receives the page status ('success' or 'fail').

Example:

page.open('http://www.google.com/', function(status) {
    console.log(status);
    // do something here
});

release()

Releases the memory heap associated with this page. Do not use the page instance after calling this.

Due to technical limitations, the web page object might not be completely garbage collected. This is often encountered when the same object is used over and over again. Calling this function may stop the increasing heap allocation.

render(fileName)

Renders the web page to an image buffer and saves it as the specified file.

Currently the output format is automatically set based on the file extension. Supported formats are PNG, JPEG, and PDF.
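
Since the format follows from the file extension, a script may want to check the output name before loading the page. This is a rough sketch: expectedFormat is a hypothetical helper, treating both .jpg and .jpeg as JPEG is an assumption, and the URL and file name are placeholders.

```javascript
// Hypothetical helper: reports which output format render() would be
// expected to pick for a file name, based on the formats listed above.
// Treating both .jpg and .jpeg as JPEG is an assumption.
function expectedFormat(fileName) {
    var match = /\.([^.\/\\]+)$/.exec(fileName);
    var ext = match ? match[1].toLowerCase() : '';
    var formats = { png: 'PNG', jpg: 'JPEG', jpeg: 'JPEG', pdf: 'PDF' };
    return formats.hasOwnProperty(ext) ? formats[ext] : null;
}

// The phantom object exists only when the script runs under PhantomJS.
if (typeof phantom !== 'undefined') {
    var page = require('webpage').create();
    var output = 'example.png';   // placeholder output file name
    if (expectedFormat(output) === null) {
        console.log('Unsupported output format: ' + output);
        phantom.exit(1);
    } else {
        page.open('http://www.example.com/', function (status) {
            if (status === 'success') {
                page.render(output);
            }
            phantom.exit();
        });
    }
}
```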

sendEvent(type, ...)

Sends an event to the web page.

The first argument is the event type. Supported types are mouseup, mousedown, mousemove, and click. The next two arguments represent the mouse position.

Currently, the left button is the only button that can be pressed for the event. For mousemove, however, no button is pressed (i.e. it is not dragging).

The events are not like synthetic DOM events. Each event is sent to the web page as if it comes as part of user interaction.
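
A common use is clicking the middle of an element whose position is obtained through evaluate(). The following is a sketch only: centerOf is a hypothetical helper, and the URL and the 'a' selector are placeholders.

```javascript
// Hypothetical helper: returns the center point of a rectangle given as
// { top, left, width, height }.
function centerOf(rect) {
    return {
        x: Math.round(rect.left + rect.width / 2),
        y: Math.round(rect.top + rect.height / 2)
    };
}

// The phantom object exists only when the script runs under PhantomJS.
if (typeof phantom !== 'undefined') {
    var page = require('webpage').create();
    page.open('http://www.example.com/', function (status) {
        if (status === 'success') {
            // Locate the first link and copy its bounds into a plain
            // object, since evaluate() can only return simple values.
            var rect = page.evaluate(function () {
                var el = document.querySelector('a');
                if (!el) {
                    return null;
                }
                var r = el.getBoundingClientRect();
                return { top: r.top, left: r.left, width: r.width, height: r.height };
            });
            if (rect) {
                var point = centerOf(rect);
                page.sendEvent('click', point.x, point.y);
            }
        }
        phantom.exit();
    });
}
```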

uploadFile(selector, fileName)

Uploads the specified file to the form element associated with the selector.

This function is used to automate the upload of a file, which is usually handled with a file dialog in a traditional browser. Since there is no dialog in this headless mode, such an upload mechanism is handled via this special function.

Example:

page.uploadFile('input[name=image]', '/path/to/some/photo.jpg');

Callbacks

onAlert

This callback is invoked when there is a JavaScript alert. The only argument passed to the callback is the string for the message.

onConsoleMessage

This callback is invoked when there is a JavaScript console message. The callback may accept up to three arguments: the string for the message, the line number, and the source identifier.

By default any console message from the web page is not displayed. Using this callback is a typical way to redirect it, such as:

page.onConsoleMessage = function (msg) { console.log(msg); };

onError

This callback is invoked when there is a JavaScript execution error. It is a good way to catch problems when evaluating a script in the web page context. The arguments passed to the callback are the error message and the stack trace (as an array).

Example:

page.onError = function (msg, trace) {
    console.log(msg);
    trace.forEach(function(item) {
        console.log('  ', item.file, ':', item.line);
    });
};

onInitialized

This callback is invoked after the web page is created and before a URL is loaded. The callback may be used to change global objects.

onLoadFinished

This callback is invoked when the page finishes loading. It may accept an argument status, which equals 'success' if there is no error and 'fail' if an error has occurred.

onLoadStarted

This callback is invoked when the page starts loading. No argument is passed to the callback.

onResourceRequested

This callback is invoked when the page requests a resource. The only argument to the callback is the request object.

onResourceReceived

This callback is invoked when a resource requested by the page is received. The only argument to the callback is the response object.

If the resource is large and sent by the server in multiple chunks, onResourceReceived will be invoked for every chunk received by PhantomJS.
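
The two resource callbacks can be combined into a simple network log. The sketch below serializes whatever object each callback receives with JSON.stringify, so it makes no assumption about the object's fields; createNetLog is a hypothetical helper and the URL is a placeholder.

```javascript
// Hypothetical helper: builds a pair of handlers that record every
// callback argument as a JSON string, in arrival order.
function createNetLog() {
    var entries = [];
    return {
        entries: entries,
        onRequested: function (info) {
            entries.push('requested: ' + JSON.stringify(info));
        },
        onReceived: function (info) {
            entries.push('received: ' + JSON.stringify(info));
        }
    };
}

// The phantom object exists only when the script runs under PhantomJS.
if (typeof phantom !== 'undefined') {
    var page = require('webpage').create();
    var log = createNetLog();
    page.onResourceRequested = log.onRequested;
    page.onResourceReceived = log.onReceived;
    page.open('http://www.example.com/', function (status) {
        console.log(log.entries.join('\n'));
        phantom.exit();
    });
}
```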

Module

A Module API modelled after CommonJS Modules is available. Currently it supports only the built-in modules: webpage, system, fs, and webserver.

For compatibility reasons, the WebPage object at the global scope is still available. It will be deprecated in a future release. The new recommended way to create a web page is as follows:

var page = require('webpage').create();

page.open(url, function (status) {
  // do something
});

System Module

A set of functions to access system-level functionality is available, modelled after the CommonJS System proposal.

To start using it, load the system module as follows:

var system = require('system');

Read-only properties:

  • platform is the name of the platform; the value is always phantomjs.

Query functions:

  • env returns the environment variables as key-value pairs.
  • args returns the list of command-line arguments. The first one is always the script name, followed by the remaining arguments.

An example printenv.js demonstrates the same functionality as in the Unix printenv utility:

var system = require('system'),
    env = system.env,
    key;

for (key in env) {
    if (env.hasOwnProperty(key)) {
        console.log(key + '=' + env[key]);
    }
}
phantom.exit();

An example arguments.js prints all the command-line arguments:

var system = require('system');
if (system.args.length === 1) {
    console.log('Try to pass some args when invoking this script!');
} else {
    system.args.forEach(function (arg, i) {
        console.log(i + ': ' + arg);
    });
}
phantom.exit();

If the script is invoked as:

phantomjs arguments.js answer 42

it gives the following result:

0: arguments.js
1: answer
2: 42

Filesystem Module

A set of API functions is available to access files and directories. They are modelled after the CommonJS Filesystem proposal.

To start using it, load the fs module as follows:

var fs = require('fs');

Read-only properties:

  • separator is the path separator (forward slash or backslash, depending on the operating system).
  • workingDirectory is the current working directory.

Query functions:

  • list(path) returns the names of all the files in the specified path.
  • absolute(path) returns the absolute path starting from the root file system, resolved from the current working directory.
  • exists(path) returns true if a file or a directory exists.
  • isDirectory(path) returns true if the specified path is a directory.
  • isFile(path) returns true if the specified path is a file.
  • isAbsolute(path) returns true if the specified path is an absolute path.
  • isExecutable(path) returns true if the specified file can be executed.
  • isReadable(path) returns true if a file or a directory is readable.
  • isWritable(path) returns true if a file or a directory is writable.
  • isLink(path) returns true if the specified path is a symbolic link.
  • readLink(path) returns the target of a symbolic link.

Directory-related functions:

  • changeWorkingDirectory(path) changes the current working directory to the specified path.
  • makeDirectory(path) creates a new directory.
  • makeTree(path) creates a directory including any missing parent directories.
  • removeDirectory(path) removes a directory if it is empty.
  • removeTree(path) removes the specified path, regardless of whether it is a file or a directory.
  • copyTree(source, destination) copies all files from a source path to the destination path.

File-related functions:

  • open(path, mode) returns a stream object representing the stream interface to the specified file (mode can be r for read, w for write, or a for append).
  • read(path) returns the entire content of a file.
  • write(path, content, mode) writes content to a file (mode can be w for write or a for append).
  • size(path) returns the size (in bytes) of the file specified by the path.
  • remove(path) removes the file specified by the path.
  • copy(source, destination) copies a file to another.
  • move(source, destination) moves a file to another location, effectively renaming it.
  • touch(path) touches a file (i.e. changes its access timestamp).

A stream object returned from the open() function has the following functions:

  • read() returns the content of the stream.
  • write(data) writes the string to the stream.
  • readLine() reads a single line from the stream and returns it.
  • writeLine(data) writes the data as a line to the stream.
  • flush() flushes all pending input output.
  • close() completes the stream operation.
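
The directory, file, and stream functions above can be combined as in the following sketch. It is illustrative only: joinPath is a hypothetical helper, and the directory and file names are placeholders created under the current working directory.

```javascript
// Hypothetical helper: joins path segments with the given separator
// (for example fs.separator).
function joinPath(separator, parts) {
    return parts.join(separator);
}

// The phantom object exists only when the script runs under PhantomJS.
if (typeof phantom !== 'undefined') {
    var fs = require('fs');
    // 'phantom-demo' and 'notes.txt' are placeholder names.
    var dir = joinPath(fs.separator, [fs.workingDirectory, 'phantom-demo']);
    var file = joinPath(fs.separator, [dir, 'notes.txt']);

    if (!fs.exists(dir)) {
        fs.makeDirectory(dir);
    }

    // Write two lines through a stream, then read the whole file back.
    var stream = fs.open(file, 'w');
    stream.writeLine('first line');
    stream.writeLine('second line');
    stream.close();

    console.log(fs.read(file));
    console.log('Size in bytes: ' + fs.size(file));

    // Clean up: remove the file, then the directory (only if it is empty).
    fs.remove(file);
    fs.removeDirectory(dir);
    phantom.exit();
}
```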

WebServer Module

Starting from version 1.4, a PhantomJS script can start a web server. The implementation uses the embedded web server Mongoose.

A very simple example is as follows. It always gives the same response for any request.

var server, service;

server = require('webserver').create();

service = server.listen(8080, function (request, response) {
    response.statusCode = 200;
    response.write('<html><body>Hello!</body></html>');
    response.close();
});

The request object passed to the callback function may contain the following properties:

  • method defines the request method (GET, POST, ...)
  • url contains the complete request URL, including the query string (if any)
  • httpVersion has the actual HTTP version
  • headers stores all HTTP headers as key-value pair
  • post contains the request body (only for the POST and PUT methods)
    • if Content-Type is set to application/x-www-form-urlencoded (default for FORM submit), the content will be URL-decoded and the undecoded content will be stored in an extra property postRaw

The response object should be used to create the response:

  • headers stores all HTTP headers as key-value pair (set these BEFORE calling write for the first time)
  • statusCode sets the returned status code
  • write(data) sends a chunk for the response body (it can be called multiple times)
  • close() closes the HTTP connection
    • To prevent the client from detecting a connection drop, remember to call write() at least once. Sending an empty string (i.e. write("")) is enough if the only aim is, for example, to return an HTTP 200 Success
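
The request and response properties listed above can be used together as in this sketch, which echoes the request method and URL as plain text. The handler name handleRequest is hypothetical, and port 8080 is reused from the earlier example.

```javascript
// Hypothetical request handler: echoes the request method and URL as
// plain text.
function handleRequest(request, response) {
    var body = request.method + ' ' + request.url + '\n';
    // Headers must be set before the first call to write().
    response.headers = { 'Content-Type': 'text/plain' };
    response.statusCode = 200;
    response.write(body);
    response.close();
}

// The phantom object exists only when the script runs under PhantomJS.
if (typeof phantom !== 'undefined') {
    var server = require('webserver').create();
    server.listen(8080, handleRequest);
    console.log('Listening on port 8080');
}
```

Keeping the handler as a named function that depends only on its two arguments makes it possible to exercise it with stand-in request and response objects.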

Note 1: This WebServer module is intended for ease of communication between PhantomJS scripts and the outside world. It is not recommended to use it as a general production server.

Note 2: The API for the module is still very experimental. Depending on the needs, the functionalities and the corresponding API will be expanded in the next versions.
