Usage

To use COAST in a project:

::

    import coast

In the terminal

Currently under development

Modules

The Search module

How we conduct searching

To find the best (or at least better) online articles, we try to identify online articles that are both relevant and rigorous. For relevance, we rely on the topic model used by Google. For rigour, we further distinguish between reasoning and experience.

Relevance and rigour are inherently challenging to assess for research. Such challenges increase when using a search engine to find appropriate online content. Consider, for example, that a search engine:

  • typically uses a keyword-based search query, and such queries do not necessarily allow for finer-grained searching;
  • is likely to tailor the search results to the searcher, based on the search engine’s history of that searcher’s prior searches;
  • is likely to maintain its own topic model to determine the relevance of results (e.g. whether an online article relates to software testing).

COAST uses the Google Custom Search API [1] to automate the Google searches. Google, like other online search engines, uses keyword-based search. We therefore need to express relevance and rigour in terms of keywords.
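COAST makes these API calls for you, but it can help to see what a single request looks like. The sketch below builds the GET URL for one page of results from the Custom Search JSON API, using its documented `key`, `cx`, `q`, `num` and `start` parameters. The helper name `build_search_url` is hypothetical and is not part of COAST; it only illustrates the request, not how COAST itself issues it.

```python
from urllib.parse import urlencode

# Endpoint of the Google Custom Search JSON API
CSE_ENDPOINT = "https://www.googleapis.com/customsearch/v1"
RESULTS_PER_PAGE = 10  # the API returns at most 10 results per request


def build_search_url(api_key: str, search_engine_id: str, query: str,
                     page: int = 0) -> str:
    """Return the GET URL for one page of results (hypothetical helper).

    `page` is zero-based. The API's `start` parameter is the 1-based index
    of the first result, so page 0 -> start=1, page 1 -> start=11, and so on.
    """
    params = {
        "key": api_key,                        # API key for this engine
        "cx": search_engine_id,                # search engine ID
        "q": query,                            # the keyword query
        "num": RESULTS_PER_PAGE,               # results per request
        "start": 1 + RESULTS_PER_PAGE * page,  # index of first result
    }
    return f"{CSE_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_search_url("api-key1", "search_engine_id1",
                           '"software" "testing"', page=1))
```

Fetching such a URL (for example with `urllib.request.urlopen`) returns a JSON document whose `items` list holds the results for that page.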

For the criterion of relevance, COAST uses a simple keyword search (e.g. the presence of the words <“software” AND “testing”> in an online document) and allows the Google search engine’s internal topic model(s) to find relevant content based on that simple keyword search.

For the criterion of rigour, COAST uses one set of keywords (indicators) for reasoning and, separately, another set for experience, and uses each set to include or exclude content in a search. We provide sample reasoning and experience indicators at https://github.com/zedrem/coast/sample_data.

COAST runs nine distinct sets of searches, summarised in the table below and visualised in the figure (see also the example below for one of the actual search strings).

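Table: Logic for each set of searches and resulting datasets (T=topic; R=reasoning; E=experience; !=logical not)

==========  ===  ===  ===  ===  ===  ===
Search set  T    R    E    !T   !R   !E
==========  ===  ===  ===  ===  ===  ===
1                          x    x    x
2                x              x         x
3                x    x    x
4                     x    x    x
5           x    x                   x
6           x    x    x
7           x         x         x
8           x                   x    x
9                          o    x    x
==========  ===  ===  ===  ===  ===  ===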
Figure: Venn diagram of the search sets (_images/venn-diagram.png)

Example: A search query for one of our search segments (S1: !(T + R + E))

::

    todays_random_query + ' -"software" -"testing" -"i" -"me" -"we" -"us" -"my" -"experience" -"experiences" -"experienced" -"our" -"but" -"because" -"for example" -"due to" -"first of all" -"however" -"as a result" -"since" -"reason" -"therefore"'

To clarify, search set S1 generates dataset S1. Ideally, we want the search engine to find online content that contains reasoning and experience relating to software testing. The search query for set S6 targets that ideal content. We conduct the other eight sets of searches to allow us to evaluate the quality of content in S6, and to evaluate the frequency of URL citations to research. For example, search S3 is intended to find online content that contains reasoning and experience, but where the content is not about software testing.
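The table and the S1 example can be sketched as a small query builder. This is an illustration, not COAST's implementation: `SEARCH_SETS`, `build_query` and the helper functions are hypothetical names. Set 9 is left out because the "o" entry in its row is not defined in the table legend. The source only shows how terms are excluded (the `-"term"` form in the S1 example), so combining required markers with Google's `OR` operator is an assumption.

```python
# Search sets 1-8 from the table: (topic, reasoning, experience), where
# True means the terms are required and False means they are excluded.
# Set 9 is omitted because its "o" entry is not defined in the table.
SEARCH_SETS = {
    1: (False, False, False),
    2: (False, True, False),
    3: (False, True, True),
    4: (False, False, True),
    5: (True, True, False),
    6: (True, True, True),
    7: (True, False, True),
    8: (True, False, False),
}


def exclude(terms):
    """Exclude every term, as in the S1 example: -"term"."""
    return " ".join(f'-"{t}"' for t in terms)


def require_all(terms):
    """Require every term; Google treats space-separated terms as AND."""
    return " ".join(f'"{t}"' for t in terms)


def require_any(terms):
    """ASSUMPTION: require at least one marker using Google's OR operator."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"


def build_query(set_id, base_query, topic, reasoning, experience):
    """Build the query string for one search set (hypothetical helper)."""
    use_topic, use_reasoning, use_experience = SEARCH_SETS[set_id]
    parts = [base_query] if base_query else []
    # Category order follows the S1 example: topic, experience, reasoning.
    parts.append(require_all(topic) if use_topic else exclude(topic))
    parts.append(require_any(experience) if use_experience
                 else exclude(experience))
    parts.append(require_any(reasoning) if use_reasoning
                 else exclude(reasoning))
    return " ".join(parts)
```

With the topic words, experience markers and reasoning markers from the S1 example, `build_query(1, "todays_random_query", ...)` reproduces that example's query string exactly.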

Setting up the search engines

In order to use COAST, you will need to set up nine custom search engines with Google’s Custom Search API. Instructions on how to do this can be found at: https://developers.google.com/custom-search/json-api/v1/overview.

Once you have set up your nine search engines, record the API key and search engine ID of each one in a JSON file. This must use the following format (one entry per search engine; only the first two entries are shown):

::

    {
        "search_engines": [
            {
                "segment_id": 1,
                "name": "cse-1",
                "api_key": "api-key1",
                "search_engine_id": "search_engine_id1"
            },
            {
                "segment_id": 2,
                "name": "cse-2",
                "api_key": "api-key2",
                "search_engine_id": "search_engine_id2"
            }
        ]
    }
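Before running any searches, it may be worth checking this file. The sketch below parses it and checks that every entry has the four fields shown above and that the `segment_id` values are exactly 1 to 9. `parse_search_engines` and `load_search_engines` are hypothetical helpers, not COAST functions; how COAST itself reads the file is not described here.

```python
import json

REQUIRED_FIELDS = {"segment_id", "name", "api_key", "search_engine_id"}


def parse_search_engines(text, expected_segments=tuple(range(1, 10))):
    """Parse the API details JSON and index the engines by segment_id.

    Hypothetical helper: raises ValueError if an entry lacks a field or
    if the segment_ids are not exactly 1 to 9.
    """
    engines = json.loads(text)["search_engines"]
    for engine in engines:
        missing = REQUIRED_FIELDS - set(engine)
        if missing:
            raise ValueError(f"{engine.get('name')}: missing {sorted(missing)}")
    found = tuple(sorted(e["segment_id"] for e in engines))
    if found != tuple(expected_segments):
        raise ValueError(f"expected segment_ids {expected_segments}, got {found}")
    return {e["segment_id"]: e for e in engines}


def load_search_engines(path):
    """Read and check the API details file at `path`."""
    with open(path, encoding="utf-8") as f:
        return parse_search_engines(f.read())
```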

Setting up the database

COAST also requires MongoDB to record all results. Instructions on how to set up MongoDB can be found at: https://docs.mongodb.com/manual/installation/.

Test that MongoDB is installed correctly and working before continuing.
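One way to test the installation from Python is to send the server a `ping` command. The sketch below assumes the official `pymongo` driver is installed (this is an assumption; the section does not say which driver COAST uses) and that MongoDB listens on the default `mongodb://localhost:27017`. `mongodb_is_reachable` is a hypothetical helper name.

```python
def mongodb_is_reachable(db_url="mongodb://localhost:27017", timeout_ms=2000):
    """Return True if a MongoDB server at db_url answers a ping command."""
    try:
        # ASSUMPTION: the pymongo driver is installed (pip install pymongo).
        from pymongo import MongoClient
        from pymongo.errors import PyMongoError
    except ImportError:
        return False
    try:
        client = MongoClient(db_url, serverSelectionTimeoutMS=timeout_ms)
    except PyMongoError:  # e.g. a malformed connection string
        return False
    try:
        client.admin.command("ping")  # fails if no server is reachable
        return True
    except PyMongoError:
        return False
    finally:
        client.close()


if __name__ == "__main__":
    print("MongoDB reachable:", mongodb_is_reachable())
```

A short `serverSelectionTimeoutMS` makes the check fail quickly instead of waiting for the driver's default timeout.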

Conducting the searches

The easiest way to use the search module is through the run_coast_daily_search function. This can be wrapped in a simple Python script and run as a cron job (or as a scheduled task on Windows).

::

    from coast import search

    # Path to the search config file described below
    config_file_path = "./path/to/config/file.json"

    search.run_coast_daily_search(config_file_path)

This function takes the path to a config file as a parameter, so first we need to create the config file. The config file is a JSON file that should contain the following keys:

  • start_date - The date when the search period began (the first day you ran the search), in the format dd-mm-yyyy (e.g. 20-04-2018).
  • topic_file - The path to a text file which contains all of the topic keywords & phrases that you wish to search on.
  • reasoning_file - The path to a text file which contains all of the reasoning markers that you wish to use (We have provided a sample at: https://github.com/zedrem/coast/sample_data).
  • experience_file - The path to a text file which contains all of the experience markers that you wish to use (We have provided a sample at: https://github.com/zedrem/coast/sample_data).
  • api_details_file - The path to your API config file that you created in the ‘Setting up the search engines’ section.
  • db_url - The URL of your MongoDB instance (e.g. mongodb://localhost:27017).
  • db_client - The name of the MongoDB database to use (e.g. coast_test).
  • number_of_runs - The number of times that each query will be run.
  • number_of_results - The number of results to be returned by each search.
  • search_backup_dir - The path to a directory where you will store a copy of the output from each query. This means that the data isn’t lost if there is ever a problem writing to the database.
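A config file with a missing key or a wrongly formatted date is easy to miss until a scheduled run fails. The sketch below checks a loaded config against the keys listed above, the dd-mm-yyyy date format, and positive integer counts. `REQUIRED_KEYS`, `check_config` and `load_config` are hypothetical names; COAST's own validation, if any, is not described here.

```python
import json
from datetime import datetime

# The keys listed above
REQUIRED_KEYS = (
    "start_date", "topic_file", "reasoning_file", "experience_file",
    "api_details_file", "db_url", "db_client", "number_of_runs",
    "number_of_results", "search_backup_dir",
)


def check_config(config):
    """Return a list of problems found in a search config dict."""
    problems = [f"missing key: {k}" for k in REQUIRED_KEYS if k not in config]
    if "start_date" in config:
        try:
            # dd-mm-yyyy, e.g. 20-04-2018
            datetime.strptime(config["start_date"], "%d-%m-%Y")
        except (TypeError, ValueError):
            problems.append("start_date must use the format dd-mm-yyyy")
    for key in ("number_of_runs", "number_of_results"):
        value = config.get(key)
        if key in config and not (isinstance(value, int) and value > 0):
            problems.append(f"{key} must be a positive integer")
    return problems


def load_config(path):
    """Load a config file; raise ValueError if check_config finds problems."""
    with open(path, encoding="utf-8") as f:
        config = json.load(f)
    problems = check_config(config)
    if problems:
        raise ValueError("; ".join(problems))
    return config
```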

NOTE: The Google Custom Search API is limited to 100 free search requests per day, and each request returns at most 10 results. Keep this in mind when choosing number_of_runs and number_of_results. For example, 10 runs of 100 results (10 pages of 10 results each) uses 10 * 10 = 100 requests.
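The arithmetic in the note is requests = number_of_runs * ceil(number_of_results / 10). The sketch below computes it for a single query; the example config below (3 runs of 30 results) would need 3 * 3 = 9 requests per query. The function names are hypothetical, rounding a partial last page up to a whole request is an assumption, and how the daily limit is shared between your nine search engines depends on how their API keys are set up, so the sketch does not model that.

```python
import math

RESULTS_PER_REQUEST = 10     # each API request returns at most 10 results
FREE_REQUESTS_PER_DAY = 100  # free daily limit stated in the note above


def requests_per_query(number_of_runs, number_of_results):
    """API requests needed to run one query number_of_runs times.

    ASSUMPTION: a number_of_results that is not a multiple of 10 still
    costs a whole extra request for the last, partial page.
    """
    pages = math.ceil(number_of_results / RESULTS_PER_REQUEST)
    return number_of_runs * pages


def fits_free_quota(number_of_runs, number_of_results):
    """True if one query's daily requests stay within the free limit."""
    needed = requests_per_query(number_of_runs, number_of_results)
    return needed <= FREE_REQUESTS_PER_DAY
```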

The resulting config file should look as follows:

::

    {
        "start_date": "20-04-2018",
        "topic_file": "./path/to/topic_file.txt",
        "reasoning_file": "./path/to/reasoning_file.txt",
        "experience_file": "./path/to/experience_file.txt",
        "api_details_file": "./api_config_test.json",
        "db_url": "mongodb://localhost:27017",
        "db_client": "coast_test",
        "number_of_runs": 3,
        "number_of_results": 30,
        "search_backup_dir": "./path/to/backup/dir/"
    }

The full list of functions available is given below.

Function list

The Extraction module

The Markers module

Currently under development

The Citations module

Currently under development

The Clarity of Writing module

Currently under development