## get links of countries from the table of sovereign states
XPath to select the country links from the table:
Country:
```
//table[contains(@class, 'sortable') and contains(@class, 'wikitable')]/tbody/tr[not(contains(@style, 'background'))]/td[1 and contains(@style, 'vertical-align:top;')]/b/a
```
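To try the XPath without hitting Wikipedia, it can be run against a small, hypothetical snippet that mimics the table's structure. The sketch below assumes `lxml` is installed (Scrapy depends on it) and appends `/@href` so the query returns link targets. `SAMPLE_HTML` and `extract_country_links` are illustrative names, not part of the project. Note that inside a predicate, `1 and contains(...)` is a boolean expression, so the `1` does not restrict the match to the first cell (`td[1][contains(...)]` would).

```python
from lxml import html

# Same XPath as above, with /@href appended to select the link targets.
COUNTRY_XPATH = (
    "//table[contains(@class, 'sortable') and contains(@class, 'wikitable')]"
    "/tbody/tr[not(contains(@style, 'background'))]"
    "/td[1 and contains(@style, 'vertical-align:top;')]/b/a/@href"
)

# Hypothetical, heavily simplified stand-in for the Wikipedia table.
SAMPLE_HTML = """
<html><body>
<table class="wikitable sortable">
<tbody>
<tr><th>Common and formal names</th></tr>
<tr><td style="vertical-align:top;"><b><a href="/wiki/Afghanistan">Afghanistan</a></b></td></tr>
<tr style="background:#eee;"><td style="vertical-align:top;"><b><a href="/wiki/Skipped">Skipped</a></b></td></tr>
<tr><td style="vertical-align:top;"><b><a href="/wiki/Albania">Albania</a></b></td></tr>
</tbody>
</table>
</body></html>
"""


def extract_country_links(page: str) -> list[str]:
    """Parse an HTML string and return the hrefs matched by COUNTRY_XPATH."""
    tree = html.fromstring(page)
    # Rows with a 'background' style are excluded by the not(...) predicate.
    return [str(href) for href in tree.xpath(COUNTRY_XPATH)]


print(extract_country_links(SAMPLE_HTML))  # ['/wiki/Afghanistan', '/wiki/Albania']
```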
## scrapy
- Pagination guide
- Response
- Using selectors
- Download files/images
- Exporting JSON
### new project
```
scrapy startproject wikipedia_country_scraper
```
### create spider
```
scrapy genspider countrydownloader https://en.wikipedia.org/wiki/List_of_sovereign_states
```
### using scrapy shell
- Install `ipython`:
```
poetry add ipython
```
- Add to `scrapy.cfg` under `[settings]`:
```
shell = ipython
```
- Run scrapy shell:
```
scrapy shell
```
- Fetch a URL:
```
fetch("https://en.wikipedia.org/wiki/List_of_sovereign_states")
```
- Print the response:
```
response
```
- Select the country links using XPath (this returns a `SelectorList`):
```
countries = response.xpath("//table[contains(@class, 'sortable') and contains(@class, 'wikitable')]/tbody/tr[not(contains(@style, 'background'))]/td[1 and contains(@style, 'vertical-align:top;')]/b/a/@href")
countries[0]
```
- Extract the first match as a string:
```
countries[0].get()
```
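`countries[0].get()` returns a site-relative path (Wikipedia article links look like `/wiki/<Article>`). In the shell, `countries.getall()` returns every match as a list of strings, and `response.urljoin(href)` resolves each one against the page URL. The joining step can be sketched offline with the standard library's `urllib.parse.urljoin`; `absolutize` is a hypothetical helper name and the sample hrefs are illustrative:

```python
from urllib.parse import urljoin

# Page the hrefs were scraped from; relative links are resolved against it.
PAGE_URL = "https://en.wikipedia.org/wiki/List_of_sovereign_states"


def absolutize(hrefs: list[str], base: str = PAGE_URL) -> list[str]:
    """Turn relative hrefs (e.g. from countries.getall()) into absolute URLs."""
    return [urljoin(base, href) for href in hrefs]


# Illustrative values; real ones come from countries.getall() in the shell.
print(absolutize(["/wiki/Afghanistan", "/wiki/Albania"]))
```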
### Spider
By default, [start_requests](https://docs.scrapy.org/en/latest/topics/spiders.html#scrapy.Spider.start_requests) generates a [Request](https://docs.scrapy.org/en/latest/_modules/scrapy/http/request.html#Request) for each URL in `start_urls`.
If a request does not specify a callback, Scrapy passes its downloaded response to the spider's `parse()` method.