This Node.js application uses the Google Custom Search API to retrieve search results based on specified criteria such as keywords, domains (including subdomains), locations, and publication date. The script filters out results containing undesirable phrases and continues paginating until it accumulates the number of unique new results defined by your configuration. These unique results are then appended to a CSV file, and their URLs are stored in a separate file so that duplicates are skipped in future searches.
Note: The API credentials and sensitive configuration details are stored in a config.json file, which is excluded from version control via .gitignore to protect your credentials. Your search results, stored in the google_search_results.csv file, and your list of processed URLs, stored in the processed_urls.txt file, are also excluded from version control via .gitignore to protect the privacy of your searches.
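As an illustration of how these two files could be maintained, here is a minimal sketch. It is not the code in googleSearch.js: the helper names (loadProcessedUrls, csvEscape, toCsvRow, appendResults) and the title/link/snippet columns are assumptions made for the example.

```javascript
const fs = require('fs');

// Load previously processed URLs into a Set (empty if the file does not exist yet).
function loadProcessedUrls(urlsPath) {
  if (!fs.existsSync(urlsPath)) return new Set();
  return new Set(
    fs.readFileSync(urlsPath, 'utf8')
      .split(/\r?\n/)
      .map((line) => line.trim())
      .filter(Boolean)
  );
}

// Quote a value for CSV: wrap it in double quotes and double any embedded quotes.
function csvEscape(value) {
  const text = value == null ? '' : String(value);
  return '"' + text.replace(/"/g, '""') + '"';
}

// Assumed columns for this sketch: title, link, snippet.
function toCsvRow(result) {
  return [result.title, result.link, result.snippet].map(csvEscape).join(',');
}

// Append rows and URLs rather than overwriting either file.
function appendResults(results, csvPath, urlsPath) {
  if (results.length === 0) return;
  fs.appendFileSync(csvPath, results.map(toCsvRow).join('\n') + '\n');
  fs.appendFileSync(urlsPath, results.map((r) => r.link).join('\n') + '\n');
}
```

Using fs.appendFileSync means earlier rows are kept across runs, and loading the URL file into a Set makes the duplicate check a constant-time lookup.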
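- Searches for keywords in page titles (using the intitle: operator).
- Uses the site: operator to restrict results to specified domains (and all of their subdomains).
- Paginates through results until it reaches your maxResults value.
- Records processed URLs in a separate file (processed_urls.txt) to avoid reprocessing duplicates on subsequent searches.
- Appends new results to google_search_results.csv instead of overwriting them.
Clone the repository to your local machine: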
git clone https://github.com/thejessicafelts/job-seeker.git
cd job-seeker
Install the required Node.js dependencies in your project directory:
npm install node-fetch@2
Create a file named config.json in the root of your repository. This file should contain your search criteria and API credentials. A sample configuration file is provided in the repository as sampleConfig.json; you should rename it to config.json and update the values as needed. A sample file for processed URLs is also provided in the repository as sampleProcessedUrls.txt; you should rename it to processed_urls.txt.
Below is an example configuration:
{
  "intitleKeywords": ["frontend developer"],
  "keywords": ["experienced", "senior"],
  "avoidKeywords": ["government clearance"],
  "minDate": ["2025-01-01"],
  "domains": ["workday.com", "icims.com"],
  "locations": ["Remote", "USA"],
  "maxResults": 10,
  "apiKey": "YOUR_API_KEY",
  "cx": "YOUR_CSE_ID"
}
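To show how a configuration like this might translate into an API request, here is a hedged sketch. The function names (orGroup, buildQuery, buildRequestUrl) are hypothetical, and the combination rule is an assumption: terms within one field are joined with OR, and the fields are combined with spaces. The avoidKeywords and minDate fields are left out because this README does not say how the script applies them to the request.

```javascript
// Join alternatives with OR; a single term needs no parentheses.
function orGroup(terms) {
  if (terms.length === 0) return '';
  return terms.length === 1 ? terms[0] : `(${terms.join(' OR ')})`;
}

// Build the `q` search string from a config object shaped like the example.
function buildQuery(config) {
  const quote = (s) => `"${s}"`;
  const groups = [
    orGroup((config.intitleKeywords || []).map((k) => `intitle:${quote(k)}`)),
    orGroup((config.keywords || []).map(quote)),
    orGroup((config.locations || []).map(quote)),
    orGroup((config.domains || []).map((d) => `site:${d}`)),
  ];
  return groups.filter(Boolean).join(' ');
}

// Build a Custom Search JSON API request URL for the page starting at `start` (1-based).
function buildRequestUrl(config, start) {
  const url = new URL('https://www.googleapis.com/customsearch/v1');
  url.searchParams.set('key', config.apiKey);
  url.searchParams.set('cx', config.cx);
  url.searchParams.set('q', buildQuery(config));
  url.searchParams.set('start', String(start));
  url.searchParams.set('num', '10'); // the API returns at most 10 items per request
  return url.toString();
}
```

With the example configuration, buildQuery produces a single string that contains the intitle: term, the quoted keyword and location groups, and one site: clause per domain.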
IMPORTANT:
- Do not commit config.json to your repository. The .gitignore file in this repository already excludes config.json to protect your sensitive information.
- Replace "YOUR_API_KEY" and "YOUR_CSE_ID" with your actual credentials (see the next section for instructions on how to set these up).
Getting Your API Key:
Once you have an API key, add it to your config.json file under the "apiKey" field.
Creating a Custom Search Engine (CSE) and Obtaining the CSE ID:
Copy the search engine ID (this is the cx parameter) and add it to your config.json file under the "cx" field.
Once the configuration is complete, run the script with:
node googleSearch.js
The script will:
- Query the API and keep paginating until "maxResults" new unique results are accumulated.
- Append the new results to the google_search_results.csv file.
- Record their URLs in processed_urls.txt so that subsequent runs do not process duplicates.
Feel free to adjust the parameters in config.json to meet your specific search criteria. The main script (googleSearch.js) is modular and uses clearly defined functions, making it straightforward to modify the query logic or output processing as needed.
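If you do modify the query or output logic, it may help to see the accumulation loop in isolation. The following is a minimal sketch under stated assumptions, not the code in googleSearch.js: collectNewResults and the injected fetchPage(start) callback are hypothetical names, and results are assumed to carry title, link, and snippet fields.

```javascript
// Collect up to `maxResults` new results, fetching one page at a time.
// `fetchPage(start)` is a hypothetical async function that resolves to an array of
// { title, link, snippet } items for the page beginning at 1-based index `start`.
async function collectNewResults(fetchPage, options) {
  const { maxResults, avoidKeywords = [], seen = new Set() } = options;
  const PAGE_SIZE = 10; // the Custom Search JSON API returns at most 10 items per request
  const MAX_START = 91; // and serves at most the first 100 results for a query
  const avoid = avoidKeywords.map((k) => k.toLowerCase());
  const collected = [];

  for (let start = 1; start <= MAX_START && collected.length < maxResults; start += PAGE_SIZE) {
    const items = await fetchPage(start);
    if (!items || items.length === 0) break; // no more results to page through

    for (const item of items) {
      if (collected.length >= maxResults) break;
      if (seen.has(item.link)) continue; // already processed in this or an earlier run
      const text = `${item.title || ''} ${item.snippet || ''}`.toLowerCase();
      if (avoid.some((phrase) => text.includes(phrase))) continue; // undesirable phrase
      seen.add(item.link);
      collected.push(item);
    }
  }
  return collected;
}
```

Passing the page fetcher in as a parameter keeps the loop independent of the network layer, so it can be exercised with canned pages before pointing it at the real API.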