lpmd.core.scrape module

lpmd.core.scrape.

class lpmd.core.scrape.BaseScraper(data_id)

Bases: object

Base class for scrapers.

Other scraper classes should inherit from this class. It provides the base methods for scraping data; each subclass should customize get_scraped_data.

Parameters

data_id (str): String identifying which data in data_catalogue.yml should be scraped.

__init__(data_id)

get_scraped_data(partition_id)

Get scraped data corresponding to partition_id.

Parameters

partition_id (str): String identifying which partition of the data, as defined in data_catalogue.yml, should be scraped.

Returns

df_scraped (pandas.core.frame.DataFrame): Scraped data that has not been cleansed.
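
The subclassing pattern described above might look roughly like the following sketch. It does not import lpmd: the BaseScraper here is a stand-in that only mirrors the documented signatures, and InMemoryScraper, its _RAW table, and the identifiers "example_data" and "2020" are hypothetical names chosen for illustration. The real class presumably resolves data_id and partition_id through data_catalogue.yml, which this sketch does not attempt.

```python
import pandas as pd


class BaseScraper:
    """Stand-in mirroring the documented interface (not the lpmd code)."""

    def __init__(self, data_id):
        self.data_id = data_id

    def get_scraped_data(self, partition_id):
        # Subclasses are expected to customize this method.
        raise NotImplementedError


class InMemoryScraper(BaseScraper):
    """Hypothetical subclass that 'scrapes' from a hard-coded table."""

    # Assumption: partitions are keyed by a string id such as a year.
    _RAW = {
        "2020": [{"name": " a ", "value": "1"}, {"name": "b", "value": "2"}],
        "2021": [{"name": "c", "value": "3"}],
    }

    def get_scraped_data(self, partition_id):
        # Return the rows untouched: cleansing is left to a later step.
        return pd.DataFrame(self._RAW[partition_id])


scraper = InMemoryScraper("example_data")
df_scraped = scraper.get_scraped_data("2020")
```

Note that the returned frame keeps the stray whitespace and string-typed numbers, matching the statement that scraped data is not cleansed.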

save_scraped_data(partition_id, path=None, **kwargs)

Save scraped data.

Parameters

partition_id (str): String identifying which partition of the data, as defined in data_catalogue.yml, should be scraped.
path (str, default None): Path to save to. If None, the scraped data is stored in the current working directory.
kwargs: Additional keyword arguments passed to pandas.DataFrame.to_csv.

Returns

has_saved (bool): True if the scraped data is saved successfully, False otherwise.
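
One way to read this contract (default path, keyword arguments forwarded to pandas.DataFrame.to_csv, boolean result) is sketched below as a free function. It is an illustration under assumptions, not the lpmd implementation: the function takes the DataFrame as an explicit argument instead of scraping it, the default file name "<partition_id>.csv" is invented here, and treating OSError as the failure case is a guess at what "not successfully saved" means.

```python
import os
import tempfile

import pandas as pd


def save_scraped_data(df_scraped, partition_id, path=None, **kwargs):
    """Hypothetical helper mirroring the documented save behaviour."""
    if path is None:
        # Assumption: fall back to "<partition_id>.csv" in the current directory.
        path = os.path.join(os.getcwd(), f"{partition_id}.csv")
    try:
        # Extra keyword arguments go straight to DataFrame.to_csv.
        df_scraped.to_csv(path, **kwargs)
    except OSError:
        # Assumption: an unwritable target counts as "not saved".
        return False
    return True


df = pd.DataFrame({"name": ["a", "b"], "value": [1, 2]})

with tempfile.TemporaryDirectory() as tmp_dir:
    target = os.path.join(tmp_dir, "2020.csv")
    has_saved = save_scraped_data(df, "2020", path=target, index=False)
    round_trip = pd.read_csv(target)

    # Writing into a directory that does not exist should report failure.
    missing = os.path.join(tmp_dir, "no_such_dir", "2020.csv")
    has_saved_missing = save_scraped_data(df, "2020", path=missing)
```

Passing index=False here shows the kwargs pass-through: the saved file contains only the name and value columns.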

save_batch(path=None, **kwargs)

Save scraped data in batches, as defined in the partition section of data_catalogue.yml.

Parameters

path (str, default None): Path to save to. If None, the scraped data is stored in the current working directory.
kwargs: Additional keyword arguments passed to pandas.DataFrame.to_csv.

Returns