
This Node.js application uses the Google Custom Search API to retrieve search results based on specified criteria such as keywords, domains (including subdomains), locations, and publication date. The script filters out results containing undesirable phrases and keeps paginating until it has accumulated the number of new unique results set by maxResults in your configuration. These unique results are then appended to a CSV file, and their URLs are stored in a separate file so that duplicates are skipped in future searches.
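The filtering and de-duplication step described above can be sketched as a pure function. This is a minimal sketch, not the code in googleSearch.js. The function name filterNewResults is hypothetical. It assumes each result is an item from the API response with link, title, and snippet fields, and that avoided phrases are matched case-insensitively against the title and snippet only.

```javascript
// Hypothetical helper (not from googleSearch.js): keeps only results whose URL
// has not been processed yet and whose text contains no avoided phrase.
function filterNewResults(items, processedUrls, avoidKeywords) {
  // Lower-case the phrases once so matching is case-insensitive.
  const avoid = avoidKeywords.map((phrase) => phrase.toLowerCase());
  const seen = new Set(processedUrls); // copy so the caller's set is untouched
  const fresh = [];

  for (const item of items) {
    // Assumption: only the title and snippet are checked for avoided phrases.
    const text = `${item.title || ""} ${item.snippet || ""}`.toLowerCase();
    if (seen.has(item.link)) continue; // processed in this or an earlier run
    if (avoid.some((phrase) => text.includes(phrase))) continue; // undesirable phrase
    seen.add(item.link); // also catches duplicates within the same batch
    fresh.push(item);
  }
  return fresh;
}

// Usage example with made-up results.
const items = [
  { link: "https://a.example/job/1", title: "Senior Frontend Developer", snippet: "Remote" },
  { link: "https://a.example/job/2", title: "Frontend Developer", snippet: "Requires government clearance" },
  { link: "https://a.example/job/3", title: "Frontend Developer", snippet: "USA" },
  { link: "https://a.example/job/1", title: "Senior Frontend Developer", snippet: "Remote" },
];
const processed = new Set(["https://a.example/job/3"]);
const kept = filterNewResults(items, processed, ["Government Clearance"]);
console.log(kept.map((r) => r.link)); // [ 'https://a.example/job/1' ]
```

Copying the processed set keeps the function free of side effects; the caller decides when newly kept URLs become "processed".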
Note: The API credentials and sensitive configuration details are stored in a config.json file, which is excluded from version control via .gitignore to protect your credentials. Your search results, stored in the google_search_results.csv file, and your list of processed URLs, stored in the processed_urls.txt file, are also excluded from version control via .gitignore to protect the privacy of your searches.
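Because both output files persist between runs, results are appended rather than rewritten. The sketch below shows one way to do that with Node's built-in fs module. The function names, the CSV columns (title, link, snippet), and the one-URL-per-line format of processed_urls.txt are assumptions for illustration, not taken from googleSearch.js.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Quote a CSV field per RFC 4180: wrap it in double quotes and double any
// embedded quotes whenever it contains a comma, quote, or line break.
function csvField(value) {
  const s = String(value ?? "");
  return /[",\r\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

// Hypothetical: read processed_urls.txt (assumed one URL per line) into a Set.
function loadProcessedUrls(file) {
  if (!fs.existsSync(file)) return new Set();
  const lines = fs.readFileSync(file, "utf8").split(/\r?\n/);
  return new Set(lines.map((l) => l.trim()).filter(Boolean));
}

// Hypothetical: append results to the CSV (writing a header only when the file
// is new) and record their URLs so later runs skip them.
function saveResults(csvFile, urlsFile, results) {
  if (results.length === 0) return;
  if (!fs.existsSync(csvFile)) {
    fs.writeFileSync(csvFile, "title,link,snippet\n");
  }
  const rows = results
    .map((r) => [r.title, r.link, r.snippet].map(csvField).join(","))
    .join("\n");
  fs.appendFileSync(csvFile, rows + "\n");
  fs.appendFileSync(urlsFile, results.map((r) => r.link).join("\n") + "\n");
}

// Usage example in a temporary directory so real files are not touched.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "job-seeker-"));
const csvFile = path.join(dir, "google_search_results.csv");
const urlsFile = path.join(dir, "processed_urls.txt");
saveResults(csvFile, urlsFile, [
  { title: "Senior Developer, Frontend", link: "https://a.example/1", snippet: 'Says "hi"' },
]);
saveResults(csvFile, urlsFile, [
  { title: "Frontend Developer", link: "https://a.example/2", snippet: "Remote" },
]);
console.log(loadProcessedUrls(urlsFile).size); // 2
```

Quoting matters here because job titles and snippets often contain commas, which would otherwise shift columns in the CSV.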
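- Matches your title keywords in page titles (using the intitle: operator).
- Uses the site: operator to restrict results to specified domains (and all of their subdomains).
- Continues paginating until it has collected the number of new results set by the maxResults value.
- Records processed URLs in a separate file (processed_urls.txt) to avoid reprocessing duplicates on subsequent searches.
- Appends new results to google_search_results.csv instead of overwriting them.

Clone the repository to your local machine: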
git clone https://github.com/thejessicafelts/job-seeker.git
cd job-seeker
Install the required Node.js dependencies in your project directory:
npm install node-fetch@2
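The script makes its HTTP requests with node-fetch; version 2 is presumably pinned because it is the last major release that can be loaded with require() (node-fetch 3 is ESM-only). The sketch below shows how a request URL for the Custom Search JSON API can be assembled with Node's built-in URL class. The endpoint and the key, cx, q, start, and num parameters belong to the public API; the helper name buildSearchUrl is hypothetical.

```javascript
// Base endpoint of the Custom Search JSON API.
const ENDPOINT = "https://www.googleapis.com/customsearch/v1";

// Hypothetical helper: builds the URL for one page of results.
// `start` is the 1-based index of the first result; `num` may be 1 to 10.
function buildSearchUrl({ apiKey, cx, query, start = 1, num = 10 }) {
  const url = new URL(ENDPOINT);
  url.searchParams.set("key", apiKey);
  url.searchParams.set("cx", cx);
  url.searchParams.set("q", query); // URLSearchParams handles percent-encoding
  url.searchParams.set("start", String(start));
  url.searchParams.set("num", String(num));
  return url.toString();
}

const url = buildSearchUrl({
  apiKey: "YOUR_API_KEY",
  cx: "YOUR_CSE_ID",
  query: 'intitle:"frontend developer"',
  start: 11, // second page of results
});
console.log(url);

// With node-fetch@2 installed, the request itself would look like this
// (inside an async function):
// const fetch = require("node-fetch");
// const data = await (await fetch(url)).json();
// data.items holds the results; it is absent when a page has no results.
```

Letting URLSearchParams encode the query avoids hand-escaping the quotes and colons that search operators introduce.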
Create a file named config.json in the root of your repository to hold your search criteria and API credentials. A sample configuration file, sampleConfig.json, is provided in the repository: rename it to config.json and update the values as needed. A sample file for processed URLs, sampleProcessedUrls.txt, is also provided: rename it to processed_urls.txt.
Below is an example configuration:
{
  "intitleKeywords": ["frontend developer"],
  "keywords": ["experienced", "senior"],
  "avoidKeywords": ["government clearance"],
  "minDate": ["2025-01-01"],
  "domains": ["workday.com", "icims.com"],
  "locations": ["Remote", "USA"],
  "maxResults": 10,
  "apiKey": "YOUR_API_KEY",
  "cx": "YOUR_CSE_ID"
}
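One plausible way these configuration fields could map onto a single query string is sketched below. It assumes intitleKeywords are combined with the intitle: operator, keywords and locations are quoted and joined with OR, and domains are joined as site: terms with OR. This is an illustrative guess, not the query logic in googleSearch.js, and buildQuery is a hypothetical name. minDate and avoidKeywords are left out because it is not stated whether they are applied in the query or after the results come back.

```javascript
// Wrap a term in double quotes so Google treats it as an exact phrase.
function quote(term) {
  return `"${term}"`;
}

// Join terms with OR, adding parentheses when the group has more than one term.
function anyOf(terms) {
  if (terms.length === 0) return "";
  const joined = terms.join(" OR ");
  return terms.length > 1 ? `(${joined})` : joined;
}

// Hypothetical query builder; the grouping and OR choices are assumptions.
function buildQuery(config) {
  const parts = [
    anyOf((config.intitleKeywords || []).map((k) => `intitle:${quote(k)}`)),
    anyOf((config.keywords || []).map(quote)),
    anyOf((config.locations || []).map(quote)),
    // site:example.com also matches subdomains such as jobs.example.com.
    anyOf((config.domains || []).map((d) => `site:${d}`)),
  ];
  return parts.filter(Boolean).join(" "); // drop empty groups
}

// In the script, config would come from JSON.parse(fs.readFileSync("config.json", "utf8")).
const config = {
  intitleKeywords: ["frontend developer"],
  keywords: ["experienced", "senior"],
  domains: ["workday.com", "icims.com"],
  locations: ["Remote", "USA"],
};
console.log(buildQuery(config));
// intitle:"frontend developer" ("experienced" OR "senior") ("Remote" OR "USA") (site:workday.com OR site:icims.com)
```

Empty arrays simply drop their group, so any of these fields can be left blank in config.json under this sketch.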
IMPORTANT:
- Do not commit config.json to your repository. The .gitignore file in this repository already excludes config.json to protect your sensitive information.
- Replace "YOUR_API_KEY" and "YOUR_CSE_ID" with your actual credentials (see the next section for instructions on how to set these up).

Getting Your API Key:
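- Sign in to the Google Cloud Console, create or select a project, and enable the Custom Search API for it.
- Under APIs & Services > Credentials, create an API key.
- Add the key to your config.json file under the "apiKey" field.

Creating a Custom Search Engine (CSE) and Obtaining the CSE ID: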
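- Open the Programmable Search Engine control panel and create a new search engine.
- Copy its Search engine ID (this is the value the API expects as the cx parameter).
- Add the ID to your config.json file under the "cx" field.

Once the configuration is complete, run the script with: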
node googleSearch.js
The script will:
- Query the API and keep paginating, skipping already-processed URLs and results that contain an avoided phrase, until "maxResults" new unique results are accumulated.
- Append the new results to the google_search_results.csv file.
- Save their URLs to processed_urls.txt so that subsequent runs do not process duplicates.

Feel free to adjust the parameters in config.json to suit your search criteria. The main script (googleSearch.js) is organized into clearly defined functions, making it straightforward to modify the query logic or output processing as needed.
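The loop that keeps requesting pages until "maxResults" new results are collected can be sketched as follows. To keep it testable without network access, the page fetcher and the filter are passed in as functions; collectResults, fetchPage, and filterPage are hypothetical names. The sketch relies on two documented properties of the Custom Search JSON API: a request returns at most 10 results, addressed by a 1-based start index, and the API serves no results beyond the first 100 for a query. The exact cut-off used below (stopping once a page would reach past result 100) is an assumption.

```javascript
const PAGE_SIZE = 10; // the API's maximum `num` per request
const API_RESULT_LIMIT = 100; // assumed: no results are served past the 100th

// Hypothetical pagination loop. fetchPage(start) must resolve to that page's
// items (an empty array when nothing is left); filterPage(items, seenUrls)
// returns the items worth keeping.
async function collectResults(fetchPage, filterPage, maxResults) {
  const collected = [];
  const seenUrls = new Set();

  for (let start = 1; start + PAGE_SIZE - 1 <= API_RESULT_LIMIT; start += PAGE_SIZE) {
    const items = await fetchPage(start);
    if (items.length === 0) break; // no more results for this query

    for (const item of filterPage(items, seenUrls)) {
      seenUrls.add(item.link);
      collected.push(item);
      if (collected.length >= maxResults) return collected; // target reached
    }
  }
  return collected; // may hold fewer than maxResults if results ran out
}

// Usage example: a fake fetcher that serves 25 numbered results, 10 per page.
const fakeFetchPage = async (start) => {
  const items = [];
  for (let i = start; i < start + PAGE_SIZE && i <= 25; i++) {
    items.push({ link: `https://a.example/${i}`, title: `Job ${i}` });
  }
  return items;
};

// Keep only unseen, even-numbered results to simulate filtering.
const keepEven = (items, seen) =>
  items.filter((it) => !seen.has(it.link) && Number(it.link.split("/").pop()) % 2 === 0);

collectResults(fakeFetchPage, keepEven, 7).then((results) => {
  // Titles Job 2, Job 4, ..., Job 14: the first seven even-numbered results.
  console.log(results.map((r) => r.title));
});
```

In the script, fetchPage would send the API request for the given start index and return data.items || [] from the JSON response, and filterPage would also consult the URLs loaded from processed_urls.txt.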