RapidHarvest Plugin Development

You can develop your own plugins to download files from specific sites.
Some knowledge of programming and JavaScript is required. RapidHarvest lets you control the download process using standard JavaScript together with a set of RapidHarvest functions that your scripts can call.

Engines description:

Each engine describes one server type. When RapidHarvest starts, it searches for all files with the .eng extension in the "engines" directory, located in the same directory as RapidHarvest.exe.

Each .eng file is in XML format and describes one engine.

Engine file description:

<parameters>

<parameter name="engine_name">EngineName</parameter>
<parameter name="url">server.com</parameter>
<parameter name="script">engine.js</parameter>
<parameter name="user_name">username</parameter>
<parameter name="password">password</parameter>

</parameters>

EngineName: the name given to the engine. It identifies the engine internally, so it must be unique for each engine.
server.com: replace it with the server you want to associate with this engine.
engine.js: the JavaScript file used to process the URLs associated with this engine.
username: optional. This value is accessible from the script through the "user_name" internal script parameter. See the following sections for information about internal script parameters.
password: optional. This value is accessible from the script through the "password" internal script parameter. See the following sections for information about internal script parameters.

Script File Description:

Each engine has a script file associated with it, set with the "script" field in the engine file.
The script file must define one mandatory function:
- onStart(): the main entry point. RapidHarvest calls this function each time a URL starts being processed.

Both files (the .eng file and its script) must be placed in the "engines" directory located in the same directory as RapidHarvest.exe.
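A minimal sketch of a complete engine script, assuming a hypothetical site that serves the file directly at the queued URL (the "URL" parameter and the "download" function are described below). The three stub lines at the top are not part of RapidHarvest: they only stand in for the real RH_SetParameter/RH_GetParameter so the sketch runs outside the application.

```javascript
// Test stubs only (assumption): RapidHarvest provides the real RH_SetParameter and
// RH_GetParameter. These stand-ins let the sketch run outside RapidHarvest (e.g. in
// Node.js); remove them from a real engine script. Unset names return "" here.
var rhParams = {};
function RH_SetParameter(name, value) { rhParams[name] = String(value); }
function RH_GetParameter(name) { return rhParams.hasOwnProperty(name) ? rhParams[name] : ""; }

// Mandatory entry point: called each time a URL handled by this engine starts.
function onStart()
{
  var sURL = RH_GetParameter("URL");
  // Hypothetical case: the file can be downloaded directly from the queued URL.
  RH_SetParameter("REQUEST_URL", sURL);
  RH_SetParameter("call", "download");
  RH_SetParameter("callback", "onDownloadFinished();");
}

// Callback named above; RapidHarvest calls it when "download" has finished.
function onDownloadFinished()
{
  // No new "call" is set here (assumption: nothing more to do for this URL).
}
```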

Communicating with RapidHarvest from Script Functions:

To get function parameters, set function results and call RapidHarvest internal functions, you use two main functions:
RH_SetParameter(parameterName, parameterValue): sets one parameter in RapidHarvest.

RH_GetParameter(parameterName): gets one parameter from RapidHarvest.

parameterName: the name of the parameter. You can create your own parameters by calling RH_SetParameter with a new name. All these parameters remain valid for the whole lifetime of the URL download job.
Some special parameter names are used to pass information to and from RapidHarvest. These special parameters are described in the following sections and depend on the function being called.
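Because custom parameters live for the whole download job, they can carry state from one callback to the next. A minimal sketch, assuming a hypothetical custom parameter named "my_retries" that limits how many times a page is requested again when it comes back empty (the "get" function and the "URL" parameter are described later). The stub lines at the top only make the sketch runnable outside RapidHarvest.

```javascript
// Test stubs only (assumption): RapidHarvest provides the real RH_SetParameter and
// RH_GetParameter. These stand-ins let the sketch run outside RapidHarvest (e.g. in
// Node.js); remove them from a real engine script. Unset names return "" here.
var rhParams = {};
function RH_SetParameter(name, value) { rhParams[name] = String(value); }
function RH_GetParameter(name) { return rhParams.hasOwnProperty(name) ? rhParams[name] : ""; }

function onStart()
{
  // Custom parameter (hypothetical name): kept for the whole download job.
  RH_SetParameter("my_retries", "0");
  requestPage();
}

// Plain helper, not a callback: prepares the next "get" call.
function requestPage()
{
  RH_SetParameter("REQUEST_URL", RH_GetParameter("URL"));
  RH_SetParameter("call", "get");
  RH_SetParameter("callback", "onPage();");
}

function onPage()
{
  // parseInt accepts the value whether it comes back as a string or a number.
  var nRetries = parseInt(RH_GetParameter("my_retries"), 10);
  if (RH_GetParameter("RESULT_TEXT") === "" && nRetries < 3)
  {
    RH_SetParameter("my_retries", String(nRetries + 1));
    requestPage(); // ask RapidHarvest for the same page again
    return;
  }
  // ... process the page here (omitted)
}
```

The call is still performed only after the current function returns, so setting "call" inside a helper such as requestPage() works the same as setting it directly in the callback.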

Calling RapidHarvest Functions:

RapidHarvest allows calling the following functions from a script:

post, get, wait, download, execute_exe

All these function calls are asynchronous, which is why you must define a callback function for each call. Another constraint is that you can call only one RapidHarvest function at a time, and the call is actually performed only after the current JavaScript function has finished.

To call one of these functions from a script, set the "call" parameter using the RH_SetParameter function. A typical example:

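function onStart()
{
// Input parameters for the "post" function
RH_SetParameter("REQUEST_URL","http://servertopost/action.php");
RH_SetParameter("REQUEST_POST_DATA","postvariable1=value1&postvariable2=value2");
// Ask RapidHarvest to perform a "post" once onStart() returns
RH_SetParameter("call","post");
// Function that RapidHarvest calls when the post has finished
RH_SetParameter("callback","onProcessHtmlPageLink();");
}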

 

Explanation:

The first two calls set the parameters "REQUEST_URL" and "REQUEST_POST_DATA" needed by the function to call. The call RH_SetParameter("call","post") tells RapidHarvest which function to call. Because only one RapidHarvest function can be called at a time, you must set the "callback" parameter to the name of the function that RapidHarvest will call once the RapidHarvest function has finished, and you must define this function in your script. Return values are obtained in this callback function. From it, you can call another RapidHarvest function by setting the "call" parameter again.
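The call/callback cycle can be chained over several steps. A minimal sketch, assuming a hypothetical site that must be visited first and then serves the file from the same URL after a 30-second delay; each function sets exactly one "call" before returning, and RapidHarvest invokes the named callback when that call has finished. The "get", "wait" and "download" functions are described in the function reference below, and the stub lines only make the sketch runnable outside RapidHarvest.

```javascript
// Test stubs only (assumption): RapidHarvest provides the real RH_SetParameter and
// RH_GetParameter. These stand-ins let the sketch run outside RapidHarvest (e.g. in
// Node.js); remove them from a real engine script. Unset names return "" here.
var rhParams = {};
function RH_SetParameter(name, value) { rhParams[name] = String(value); }
function RH_GetParameter(name) { return rhParams.hasOwnProperty(name) ? rhParams[name] : ""; }

function onStart()
{
  // Step 1: fetch the page behind the queued URL.
  RH_SetParameter("REQUEST_URL", RH_GetParameter("URL"));
  RH_SetParameter("call", "get");
  RH_SetParameter("callback", "onPageLoaded();");
}

function onPageLoaded()
{
  // Step 2: the "get" results are available here; now wait
  // (30 seconds is a hypothetical, site-specific delay).
  RH_SetParameter("WAIT_TIME", "30");
  RH_SetParameter("call", "wait");
  RH_SetParameter("callback", "onWaitFinished();");
}

function onWaitFinished()
{
  // Step 3: download the file (hypothetically from the same URL).
  RH_SetParameter("REQUEST_URL", RH_GetParameter("URL"));
  RH_SetParameter("call", "download");
  RH_SetParameter("callback", "onDownloadFinished();");
}

function onDownloadFinished()
{
  // Last step: no new "call" is set (assumption: the job ends here).
}
```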

Functions Parameters and Results

Each time RapidHarvest calls your callback function, the results of the called function are available. You can get these values by calling RH_GetParameter(parameterName).

parameterName: the name of the parameter to get.

Example: for the previous example, the callback function can be defined like this:

function onProcessHtmlPageLink()
{
var bErrors = false;
var sTextHTML, sHTMLHeaders;

// Results of the last "post" request
sHTMLHeaders = RH_GetParameter("RESULT_HEADERS");
sTextHTML = RH_GetParameter("RESULT_TEXT");

RH_Alert(sHTMLHeaders);

// ...

}

Explanation: The first call gets the "RESULT_HEADERS" parameter, which in this example contains the headers from the last "post" request. It also gets "RESULT_TEXT", which contains the HTML text returned by the last "post" request. The call to RH_Alert(sHTMLHeaders) shows an alert with the header information. (See below for a complete list of functions with their parameter and result descriptions.)
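RESULT_HEADERS arrives as a single string in the "Header1=value1&header2=value2" format listed in the function reference below, so a small helper can turn it into an object for lookups. This is a sketch under the assumption that names and values are not escaped: a value containing "&" would be split, while a value containing "=" is kept whole because only the first "=" of each pair is used. The helper name parseRhPairs is hypothetical, and header names are lower-cased because HTTP header names are case-insensitive.

```javascript
// Parses a "name1=value1&name2=value2" string (the documented RESULT_HEADERS
// format) into an object keyed by lower-case name.
// Assumption: values are not escaped, so a value containing "&" would be split.
function parseRhPairs(sPairs)
{
  var oResult = {};
  if (!sPairs) { return oResult; }
  var aPairs = sPairs.split("&");
  for (var i = 0; i < aPairs.length; i++)
  {
    var nEq = aPairs[i].indexOf("=");
    if (nEq < 0) { continue; } // skip malformed entries without "="
    // Split on the first "=" only, so a value that itself contains "=" stays whole.
    var sName = aPairs[i].substring(0, nEq).toLowerCase();
    oResult[sName] = aPairs[i].substring(nEq + 1);
  }
  return oResult;
}
```

Usage inside a callback: var oHeaders = parseRhPairs(RH_GetParameter("RESULT_HEADERS")); then read, for example, oHeaders["location"].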

 

onStart() function:

This function is the first function called when a download process starts.
You can get the following initial parameters:
"URL": original URL of the download job.
"user_name": user name used for this engine, as set in the engine file.
"password": password used for this engine, as set in the engine file.

Remember: all these parameters are read with the RH_GetParameter function.
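For sites that need a login, onStart() can read "user_name" and "password" from the engine file and send them in a "post" request. A sketch under stated assumptions: the login URL, the form field names "login" and "pass", and the callback onLoggedIn are hypothetical, and the values are URL-encoded with encodeURIComponent on the assumption that RapidHarvest sends REQUEST_POST_DATA as-is. The stub lines only make the sketch runnable outside RapidHarvest.

```javascript
// Test stubs only (assumption): RapidHarvest provides the real RH_SetParameter and
// RH_GetParameter. These stand-ins let the sketch run outside RapidHarvest (e.g. in
// Node.js); remove them from a real engine script. Unset names return "" here.
var rhParams = {};
function RH_SetParameter(name, value) { rhParams[name] = String(value); }
function RH_GetParameter(name) { return rhParams.hasOwnProperty(name) ? rhParams[name] : ""; }

function onStart()
{
  // Credentials come from the user_name/password fields of the .eng file.
  var sUser = RH_GetParameter("user_name");
  var sPass = RH_GetParameter("password");

  // Hypothetical login form; adapt the URL and field names to the target site.
  RH_SetParameter("REQUEST_URL", "http://server.com/login.php");
  // Encode the values so characters such as "&" or "=" cannot break the format.
  RH_SetParameter("REQUEST_POST_DATA",
    "login=" + encodeURIComponent(sUser) + "&pass=" + encodeURIComponent(sPass));
  RH_SetParameter("call", "post");
  RH_SetParameter("callback", "onLoggedIn();");
}

function onLoggedIn()
{
  // Next step (omitted): check RESULT_HEADERS/RESULT_TEXT, then request the original URL.
}
```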

Example: completing our first example function:

function onStart()
{
// Original URL queued for this download job
var sURL = RH_GetParameter("URL");

RH_SetParameter("REQUEST_URL",sURL);
RH_SetParameter("REQUEST_POST_DATA","postvariable1=value1&postvariable2=value2");
RH_SetParameter("call","post");
RH_SetParameter("callback","onProcessHtmlPageLink();");

}
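One way to continue from onProcessHtmlPageLink() is to search RESULT_TEXT for the file link and chain a "download" call. A sketch, assuming a hypothetical page that marks the link as <a id="dl_link" href="...">; real sites need their own pattern. When no link is found it sets the "ERRORS" and "ERROR_DESC" parameters described later in this document. The stub lines only make the sketch runnable outside RapidHarvest.

```javascript
// Test stubs only (assumption): RapidHarvest provides the real RH_SetParameter and
// RH_GetParameter. These stand-ins let the sketch run outside RapidHarvest (e.g. in
// Node.js); remove them from a real engine script. Unset names return "" here.
var rhParams = {};
function RH_SetParameter(name, value) { rhParams[name] = String(value); }
function RH_GetParameter(name) { return rhParams.hasOwnProperty(name) ? rhParams[name] : ""; }

function onProcessHtmlPageLink()
{
  var sTextHTML = RH_GetParameter("RESULT_TEXT");
  // Hypothetical page layout: <a ... id="dl_link" ... href="...">. Adapt the pattern.
  var aMatch = /<a[^>]*\bid="dl_link"[^>]*\bhref="([^"]+)"/i.exec(sTextHTML);
  if (aMatch === null)
  {
    RH_SetParameter("ERRORS", "1");
    RH_SetParameter("ERROR_DESC", "Download link not found");
    return;
  }
  // aMatch[1] is the captured href value.
  RH_SetParameter("REQUEST_URL", aMatch[1]);
  RH_SetParameter("call", "download");
  RH_SetParameter("callback", "onDownloadFinished();");
}

function onDownloadFinished()
{
  // Nothing more to do in this sketch.
}
```

If the page uses relative links, they would have to be resolved against the page URL before being used as REQUEST_URL; that step is omitted here.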

RapidHarvest Functions:

post: make a POST request.
  Input parameters:
  • "REQUEST_URL": URL to post to.
  • "REQUEST_HEADERS": headers to send. Format: header1=value1&header2=value2...
  • "REQUEST_POST_DATA": POST data to send. Format: data1=value1&data2=value2...
  Returned parameters:
  • "RESULT_HEADERS": headers of the response. Format: header1=value1&header2=value2...
  • "RESULT_TEXT": text of the response.

get: make a GET request.
  Input parameters:
  • "REQUEST_URL": URL to get.
  • "REQUEST_HEADERS": headers to send. Format: header1=value1&header2=value2...
  Returned parameters:
  • "RESULT_HEADERS": headers of the response. Format: header1=value1&header2=value2...
  • "RESULT_TEXT": text of the response.

wait: wait "n" seconds.
  Input parameters:
  • "WAIT_TIME": time to wait, in seconds.
  • "REQUEST_URL": optional. Only used to show the URL when viewing the download properties; it is not used for downloading.
  Returned parameters: none.

download: download one file. The file is stored under its default name (taken from the URL), or under "FILE_NAME" if that parameter is provided.
  Input parameters:
  • "REQUEST_URL": URL of the file.
  • "REQUEST_HEADERS": headers to send. Format: header1=value1&header2=value2...
  • "METHOD": set it to "post" to perform the download with the POST method.
  • "IN_TEXT_ERROR": a text; if this text is found in the downloaded content, the download is flagged as erroneous.
  • "REPLACE": if set to "1" and the target file already exists, the existing file is replaced by the new one.
  • "FILE_NAME": optional. File name used to store the file.
  Returned parameters:
  • "FILE_NAME": the final file name, i.e. the name that was actually used.

execute_exe: execute an external .exe file.
  Input parameters:
  • "EXE_FILE": absolute or relative path of the .exe file. ATTENTION: relative paths start from the engines directory, so if you copy an .exe file into the engines directory you can call it directly by its file name. Ex: exefile.exe
  • "COMMAND_LINE": the command line passed to the .exe file. ATTENTION: it must begin with a blank space. Ex: " p1 p2"
  Returned parameters: not implemented yet. To exchange data with .exe files, use temporary files.
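The reference above can be exercised with a short chain: download a file with a Referer header, then pass the stored file name to an external program. A sketch only: the target name "archive.zip", the error text, the program unpack.exe and the helper function joinRhPairs are all hypothetical, and it assumes execute_exe also reports completion through "callback" like the other functions. The stub lines only make the sketch runnable outside RapidHarvest.

```javascript
// Test stubs only (assumption): RapidHarvest provides the real RH_SetParameter and
// RH_GetParameter. These stand-ins let the sketch run outside RapidHarvest (e.g. in
// Node.js); remove them from a real engine script. Unset names return "" here.
var rhParams = {};
function RH_SetParameter(name, value) { rhParams[name] = String(value); }
function RH_GetParameter(name) { return rhParams.hasOwnProperty(name) ? rhParams[name] : ""; }

// Joins [name, value] pairs into the documented "name1=value1&name2=value2" format.
function joinRhPairs(aPairs)
{
  var aParts = [];
  for (var i = 0; i < aPairs.length; i++)
  {
    aParts.push(aPairs[i][0] + "=" + aPairs[i][1]);
  }
  return aParts.join("&");
}

function startDownload(sFileUrl, sRefererUrl)
{
  RH_SetParameter("REQUEST_URL", sFileUrl);
  RH_SetParameter("REQUEST_HEADERS", joinRhPairs([["Referer", sRefererUrl]]));
  // Flag the download as erroneous if this (hypothetical) text is received.
  RH_SetParameter("IN_TEXT_ERROR", "File not found");
  RH_SetParameter("REPLACE", "1");             // overwrite an existing file
  RH_SetParameter("FILE_NAME", "archive.zip"); // optional, hypothetical target name
  RH_SetParameter("call", "download");
  RH_SetParameter("callback", "onDownloadFinished();");
}

function onDownloadFinished()
{
  // On return, FILE_NAME holds the file name that was actually used.
  var sSaved = RH_GetParameter("FILE_NAME");
  // Hypothetical program copied into the engines directory.
  RH_SetParameter("EXE_FILE", "unpack.exe");
  // Leading blank space is required; quotes guard against spaces in the name.
  RH_SetParameter("COMMAND_LINE", " \"" + sSaved + "\"");
  RH_SetParameter("call", "execute_exe");
  RH_SetParameter("callback", "onExeFinished();");
}

function onExeFinished()
{
  // execute_exe returns no result parameters yet.
}
```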

Every function can also set the parameters "ERRORS", "ERROR_DESC" and "LIMIT_REACHED" to indicate that an error has occurred.

ERRORS: indicates that an error has occurred.
ERROR_DESC: the description of the error. It is shown in the Status column of the download list.
LIMIT_REACHED: set both "ERRORS" and "LIMIT_REACHED" to "1" to start the waiting process for all the downloads handled by this engine.
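A sketch of a callback that reports errors, assuming a hypothetical site that prints "download limit reached" when its quota is used up. In that case it sets "ERRORS" and "LIMIT_REACHED" to "1" so that RapidHarvest puts the other downloads of this engine on hold; for an empty page it only sets "ERRORS" and an "ERROR_DESC" text for the Status column. The stub lines only make the sketch runnable outside RapidHarvest.

```javascript
// Test stubs only (assumption): RapidHarvest provides the real RH_SetParameter and
// RH_GetParameter. These stand-ins let the sketch run outside RapidHarvest (e.g. in
// Node.js); remove them from a real engine script. Unset names return "" here.
var rhParams = {};
function RH_SetParameter(name, value) { rhParams[name] = String(value); }
function RH_GetParameter(name) { return rhParams.hasOwnProperty(name) ? rhParams[name] : ""; }

function onPageLoaded()
{
  var sText = RH_GetParameter("RESULT_TEXT");

  // Hypothetical message shown by the site when the download quota is used up.
  if (sText.indexOf("download limit reached") >= 0)
  {
    RH_SetParameter("ERRORS", "1");
    RH_SetParameter("LIMIT_REACHED", "1"); // hold all downloads of this engine
    RH_SetParameter("ERROR_DESC", "Download limit reached");
    return; // no new "call" is set
  }

  if (sText === "")
  {
    RH_SetParameter("ERRORS", "1");
    RH_SetParameter("ERROR_DESC", "Empty page"); // shown in the Status column
    return;
  }

  // ... otherwise continue with the next call (omitted)
}
```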

There are also some utility functions that are not asynchronous:

 

Extra Useful Parameters

You can also use the RH_GetParameter function to obtain some useful values: