lpmd.core.scrape module
- class lpmd.core.scrape.BaseScraper(data_id)
Bases: object
Base class for scrapers.
Other scraper classes should inherit from this class. It provides the base methods for scraping data; each subclass should override get_scraped_data.
- Parameters
- data_id : str
String specifying which data in data_catalogue.yml should be scraped.
- get_scraped_data(partition_id)
Get scraped data corresponding to partition_id.
- Parameters
- partition_id : str
String specifying which partition in data_catalogue.yml should be scraped.
- Returns
- df_scraped : pandas.core.frame.DataFrame
Scraped data that have not been cleansed.
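The override pattern described for this class can be sketched as follows. This is a minimal stand-in, not the lpmd implementation: `SketchBaseScraper`, `ScraperDemo`, the `"demo"` data id, and the inline rows are all hypothetical, and no actual scraping or data_catalogue.yml lookup takes place.

```python
import pandas as pd


class SketchBaseScraper:
    """Hypothetical stand-in for BaseScraper; not the lpmd implementation."""

    def __init__(self, data_id):
        # data_id names which data set the scraper targets.
        self.data_id = data_id

    def get_scraped_data(self, partition_id):
        # In this sketch, subclasses must supply the scraping logic.
        raise NotImplementedError


class ScraperDemo(SketchBaseScraper):
    """Hypothetical subclass that returns fixed rows instead of scraping."""

    def __init__(self):
        # The subclass fixes its own data_id (an assumption of this sketch).
        super().__init__(data_id="demo")

    def get_scraped_data(self, partition_id):
        # Return raw, uncleansed rows tagged with the requested partition.
        rows = [{"partition_id": partition_id, "value": "1,234"}]
        return pd.DataFrame(rows)


df_scraped = ScraperDemo().get_scraped_data("2020")
```

The value `"1,234"` is left as a string on purpose, to reflect that the returned frame has not been cleansed.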
- save_scraped_data(partition_id, path=None, **kwargs)
Save scraped data.
- Parameters
- partition_id : str
String specifying which partition in data_catalogue.yml should be scraped.
- path : str, default None
String specifying the path to save to. If None, the scraped data are stored in the current working directory.
- kwargs
Additional keyword arguments passed to pandas.DataFrame.to_csv.
- Returns
- has_saved : bool
True if the scraped data are saved successfully, False otherwise.
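The documented contract (optional path, keyword arguments forwarded to pandas.DataFrame.to_csv, boolean result) might look roughly like the sketch below. The helper `save_scraped_sketch`, the `<partition_id>.csv` file name, and the choice to treat `OSError` as a failed save are assumptions made for illustration; the source does not say how lpmd names files or detects failures.

```python
import os
import tempfile

import pandas as pd


def save_scraped_sketch(df_scraped, partition_id, path=None, **kwargs):
    """Hypothetical helper mirroring the documented save contract."""
    # Fall back to the current working directory when no path is given.
    directory = os.getcwd() if path is None else path
    # The file name scheme is an assumption made for this sketch.
    file_path = os.path.join(directory, f"{partition_id}.csv")
    try:
        # Extra keyword arguments go straight to pandas.DataFrame.to_csv.
        df_scraped.to_csv(file_path, **kwargs)
    except OSError:
        return False
    return True


df_demo = pd.DataFrame({"item": ["beef", "pork"], "amount": [10, 20]})

with tempfile.TemporaryDirectory() as tmp_dir:
    # index=False is forwarded to to_csv and drops the row index column.
    has_saved = save_scraped_sketch(df_demo, "2020", path=tmp_dir, index=False)
    with open(os.path.join(tmp_dir, "2020.csv")) as f:
        saved_lines = f.read().splitlines()
    # Saving into a directory that does not exist reports failure.
    missing_dir = os.path.join(tmp_dir, "missing")
    has_saved_missing = save_scraped_sketch(df_demo, "2020", path=missing_dir)
```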
- save_batch(path=None, **kwargs)
Save scraped data in batches, as defined in the partition section of data_catalogue.yml.
- Parameters
- path : str, default None
String specifying the path to save to. If None, the scraped data are stored in the current working directory.
- kwargs
Additional keyword arguments passed to pandas.DataFrame.to_csv.
- Returns
- dict_result : dict
Dict indicating whether each partition_id was saved successfully.
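One way to read the return value is as a mapping from each partition_id to a success flag. The sketch below assumes that shape; `save_batch_sketch`, `fake_save`, and the partition ids are hypothetical stand-ins, since the real ids come from data_catalogue.yml and the real saving is done by save_scraped_data.

```python
def save_batch_sketch(partition_ids, save_one):
    """Hypothetical helper: call save_one per partition and collect outcomes."""
    dict_result = {}
    for partition_id in partition_ids:
        # Each value records whether that partition was saved successfully.
        dict_result[partition_id] = bool(save_one(partition_id))
    return dict_result


# Stand-in for the partition ids listed in data_catalogue.yml (assumed values).
partition_ids = ["2019", "2020", "2021"]


def fake_save(partition_id):
    # Pretend that saving the 2020 partition fails.
    return partition_id != "2020"


dict_result = save_batch_sketch(partition_ids, fake_save)

# Collect the partitions that would need to be retried.
failed = [pid for pid, ok in dict_result.items() if not ok]
```

Under this reading, a caller can inspect the dict after a batch run and retry only the partitions whose value is False.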
- class lpmd.core.scrape.ScraperShipment
Bases: BaseScraper
Scraper class for data on livestock product shipments.
- get_scraped_data(partition_id)
Get the scraped livestock product shipment data corresponding to partition_id.
- Parameters
- partition_id : str
String specifying which partition in data_catalogue.yml should be scraped.
- Returns
- df_scraped : pandas.core.frame.DataFrame
Scraped data that have not been cleansed.
- class lpmd.core.scrape.ScraperSlaughter
Bases: BaseScraper
Scraper class for data on animals slaughtered and abattoirs.
- get_scraped_data(partition_id)
Get the scraped data on animals slaughtered and abattoirs corresponding to partition_id.
- Parameters
- partition_id : str
String specifying which partition in data_catalogue.yml should be scraped.
- Returns
- df_scraped : pandas.core.frame.DataFrame
Scraped data that have not been cleansed.
- class lpmd.core.scrape.ScraperCarcass
Bases: BaseScraper
Scraper class for data on carcasses.