public interface WebBot
Download web pages using a simple emulated browser, without evaluating JavaScript. Images and content files on the page are downloaded concurrently, much like a real browser. Frames are also downloaded.
Caching can be turned on or off using enableCache(boolean).
The concurrent downloading behaviour can be adjusted by calling emulateBrowser(String).
Cookies are saved and recorded by HttpClient as for normal requests.
Headers can be added or overridden for every download.
Resources are found by parsing the downloaded HTML pages. Parsing can be slow and can miss items that are downloaded by JavaScript or added to the DOM by JavaScript. To enable custom parsing, or pre-baked graphs of items to download, a JavaScript callback function can be called after every item is downloaded.
var web = require('webbot');
// Uncomment the following to emulate Webmetrics Fullpage Breakdown
// web.emulateBrowser("wm");
// Grab the HttpClient to make more advanced requests
var c = test.openHttpClient();
test.beginTransaction();
test.beginStep("step1", 20000);
var get = c.newGet("http://www.bbc.co.uk/news");
var r = web.execute(get);
r.searchString("news");
test.endStep();
test.endTransaction();
Nested Class Summary | |
---|---|
static interface | WebBot.Response: The list of responses made during an execute() request. |
Method Summary | |
---|---|
void | addHeader(java.lang.String name, java.lang.String value): Add a header to each request made. |
void | autoAddHeaders(boolean autoAdd): Enable/Disable auto adding of some common headers. |
void | clearCache(): Clear any currently cached data. |
void | emulateBrowser(java.lang.String browser): Emulate the given browser's concurrent downloads, user agent and headers. |
void | enableCache(boolean enableCache): Enable/Disable the use of the cache. |
WebBot.Response | execute(HttpRequest request): Perform the given HTTP request. |
WebBot.Response | execute(HttpRequest request, NativeFunction callback): Perform the given HTTP request, calling the given callback for every item that is downloaded. |
void | loadFrames(boolean loadSubFrames): Enable/Disable the loading of sub frames. |
void | removeHeader(java.lang.String name): Remove a header. |
void | resetHeaders(): Remove all user added headers. |
void | setMaxConnections(int maxConnections): Set a limit on the maximum number of concurrent connections open at any one time. |
void | setMaxConnectionsPerHost(int maxConnectionsPerHost): Set a limit on the maximum number of concurrent connections allowed to be open towards a single host. |
Method Detail |
---|
void emulateBrowser(java.lang.String browser)
browser - the browser to emulate. Possible values are 'wm', 'chrome', 'firefox', 'ie8', 'ie9'.

void setMaxConnections(int maxConnections)
Default is 35.
maxConnections - the maximum number of concurrent connections

void setMaxConnectionsPerHost(int maxConnectionsPerHost)
The default is 6.
maxConnectionsPerHost - the maximum number of concurrent connections per host

void loadFrames(boolean loadSubFrames)
loadSubFrames - if true, frames are parsed for further downloads

void enableCache(boolean enableCache)
enableCache - if true, items can be cached.

void clearCache()
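The configuration calls described so far can be grouped into one place. The following is a minimal sketch, not part of the WebBot API: `configureBot` and the field names of its `options` object are hypothetical, and `web` is assumed to be the object returned by require('webbot'). It calls emulateBrowser first on the assumption that a browser profile may carry its own connection settings, so explicit limits are applied afterwards.

```javascript
// Hypothetical helper (not part of WebBot): applies documented settings to a
// WebBot instance. Options that are absent are skipped, so the documented
// defaults (35 connections, 6 per host) stay in effect for them.
function configureBot(web, options) {
    if (options.browser !== undefined) {
        // Documented values: 'wm', 'chrome', 'firefox', 'ie8', 'ie9'
        web.emulateBrowser(options.browser);
    }
    if (options.maxConnections !== undefined) {
        web.setMaxConnections(options.maxConnections);
    }
    if (options.maxConnectionsPerHost !== undefined) {
        web.setMaxConnectionsPerHost(options.maxConnectionsPerHost);
    }
    if (options.loadFrames !== undefined) {
        web.loadFrames(options.loadFrames);
    }
    if (options.cache !== undefined) {
        web.enableCache(options.cache);
        if (!options.cache) {
            // Assumption of this sketch: turning the cache off should also
            // drop anything cached earlier.
            web.clearCache();
        }
    }
    return web;
}
```

In a script this might be called as configureBot(web, { browser: 'chrome', maxConnectionsPerHost: 4, cache: false }) before the first execute() call.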
void addHeader(java.lang.String name, java.lang.String value)
name - the header name to add
value - the header value to add

void removeHeader(java.lang.String name)
name - the header to remove

void resetHeaders()
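As an illustration of how the three header methods combine, the sketch below replaces all user-added headers in one step. `applyHeaders` is a hypothetical helper, not part of WebBot, and `web` is assumed to be the object returned by require('webbot'); only resetHeaders() and addHeader() from this page are called.

```javascript
// Hypothetical helper (not part of WebBot): replaces all user-added headers
// with the name/value pairs of a plain object.
function applyHeaders(web, headers) {
    // Start from a clean slate so earlier addHeader() calls do not leak through.
    web.resetHeaders();
    for (var name in headers) {
        if (Object.prototype.hasOwnProperty.call(headers, name)) {
            // addHeader takes two strings, so coerce the value explicitly.
            web.addHeader(name, String(headers[name]));
        }
    }
}
```

A single header can still be dropped afterwards with removeHeader(name).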
void autoAddHeaders(boolean autoAdd)
Accept-Encoding: gzip,deflate
Pragma: no-cache
User-Agent: <useragent>
Accept-Language: en-US,en
Accept: */*
Default is on.
autoAdd - if true, the above headers are automatically added

WebBot.Response execute(HttpRequest request)
The cache is checked, and if the item is not there it is downloaded. Any HTML that is downloaded is searched for image, CSS and script links, and these are downloaded as well. No JavaScript is executed, however. Frames are also downloaded, and the same process is applied to their images, CSS and scripts.
request - This object can be obtained via a call to HttpClient.newGet(String), HttpClient.newPost(String), etc.
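To make the resource search described for execute(HttpRequest) more concrete, here is a deliberately simplified illustration of scanning HTML for image, CSS, script and frame links. It is not WebBot's actual parser, whose implementation is not documented here; `findResourceUrls` is a hypothetical name and the regular expressions only handle quoted attribute values.

```javascript
// Illustration only: a simplified regex scan, NOT WebBot's actual parser.
// Collects the src/href values of <img>, <script>, <link>, <frame> and
// <iframe> tags in document order.
function findResourceUrls(html) {
    var urls = [];
    var tagPattern = /<(img|script|frame|iframe|link)\b[^>]*>/gi;
    var match;
    while ((match = tagPattern.exec(html)) !== null) {
        var tag = match[0];
        // <link> references its target through href; the other tags use src.
        var attr = match[1].toLowerCase() === 'link' ? 'href' : 'src';
        var attrPattern = new RegExp('\\b' + attr + '\\s*=\\s*["\']([^"\']+)["\']', 'i');
        var value = attrPattern.exec(tag);
        if (value) {
            urls.push(value[1]);
        }
    }
    return urls;
}
```

A scan like this only sees links present in the markup, which is why items requested or inserted by JavaScript are missed unless a callback supplies them.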
WebBot.Response execute(HttpRequest request, NativeFunction callback)
Items downloaded will not be added to the cache.
The callback takes the response to a request and returns an array of new requests to be made.
var r = web.execute(get, function(response) {
    if (response.getUrl() === "http://somesite.biz") {
        return [
            c.newGet("http://somesite.biz/images/logo.png"),
            c.newGet("http://somesite.biz/css/default.css"),
            c.newGet("http://somesite.biz/script/common.js")
        ];
    } else {
        return [];
    }
});
request - This object can be obtained via a call to HttpClient.newGet(String), HttpClient.newPost(String), etc.
callback - a JavaScript function that takes a response and generates new requests to make given the response.
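When the items to download are known in advance, the callback can be generated from a pre-baked graph instead of being written by hand. The following is a minimal sketch under stated assumptions: `makeGraphCallback` and the shape of `graph` (an object mapping a page URL to an array of follow-up URLs) are hypothetical, `client` is assumed to be an HttpClient offering newGet(String), and response.getUrl() is used as in the example above.

```javascript
// Hypothetical helper (not part of WebBot): builds an execute() callback from
// a pre-baked graph that maps a downloaded URL to the URLs to fetch next.
function makeGraphCallback(client, graph) {
    return function (response) {
        var url = response.getUrl();
        // URLs that are not in the graph produce no further requests.
        var next = Object.prototype.hasOwnProperty.call(graph, url) ? graph[url] : [];
        var requests = [];
        for (var i = 0; i < next.length; i++) {
            requests.push(client.newGet(next[i]));
        }
        return requests;
    };
}
```

It would then be passed as the second argument, for example web.execute(get, makeGraphCallback(c, graph)).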
© 2023 Vercara, LLC. All Rights Reserved.