Python Awesome » Scrape
Find articles on web scraping in Python, covering libraries, frameworks, and various scraping tools. PythonAwesome's goal is to be the go-to website for developers looking for Python libraries and open source projects to get their work done.
scrape-open-data
Scrapes every available dataset from Socrata and stores them as newline-delimited JSON in this repository, to track changes over time through Git scraping.
socrata/data.delaware.gov.jsonl contains the latest datasets for a specific domain. This is updated twice a day.
socrata/data.delaware.gov.stats.jsonl contains information on page views and download numbers. It is updated only once a week, so that the twice-daily fetches are not dominated by updated counts for many different datasets.
scrape_socrata.py
Run python scrape_socrata.py socrata/ to scrape the data from Socrata and save it in the ..read more
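As an illustration, a minimal sketch of this Git-scraping pattern might look like the following; the Discovery API endpoint and pagination scheme are assumptions for illustration, and the real scrape_socrata.py may work quite differently:

# Hypothetical sketch: fetch a domain's catalog from the Socrata Discovery API
# and write it as newline-delimited JSON, one dataset per line, so that Git
# diffs show exactly which datasets changed between runs.
import json
import requests

DOMAIN = "data.delaware.gov"
CATALOG_URL = "https://api.us.socrata.com/api/catalog/v1"  # assumed endpoint

datasets = []
offset = 0
while True:
    page = requests.get(
        CATALOG_URL,
        params={"domains": DOMAIN, "limit": 100, "offset": offset},
    ).json()
    results = page.get("results", [])
    if not results:
        break
    datasets.extend(results)
    offset += len(results)

# Stable key order keeps diff noise down between scrapes.
with open(f"socrata/{DOMAIN}.jsonl", "w") as f:
    for item in datasets:
        f.write(json.dumps(item, sort_keys=True) + "\n")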
discord-badges-screaper
Hello world!
I made a badge finder that keeps a Discord log; I think it will be useful for you, so I am sharing it right away.
To configure the script, edit these lines (a hypothetical sketch follows the list):
Line 5: your user token ✨
Line 3: the ID of the server you want to scan
Line 2: the chat channel of that server ⭐
Line 54: the webhook link for the scanned output
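A hypothetical sketch of what that configuration might look like as Python constants; the variable names are invented for illustration, only the line positions come from the list above:

# Hypothetical configuration sketch; variable names are illustrative,
# only the line positions come from the post.
CHANNEL_ID = 123456789012345678   # line 2: chat channel of that server
SERVER_ID = 234567890123456789    # line 3: ID of the server to scan
USER_TOKEN = "your-user-token"    # line 5: user token
# ...
WEBHOOK_URL = "https://discord.com/api/webhooks/..."  # line 54: webhook link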
WebScrappy
WebScrappy is an easy-to-use library for web scraping.
It is not mature yet, and I keep working on making it better.
For now, you can read the short tutorial!
Download
git clone https://github.com/zsendokame/webscrappy
# Or
pip install webscrappy
Tutorial
import WebScrappy
import requests

# Download the page HTML.
get = requests.get('http://example.com')

# Find every element with the given class in the downloaded HTML.
findClass = WebScrappy.getClass(get.text, "The class you want to get!")
print(findClass)
# {'upbutton49': {'line': '49', 'class': 'upbutton', 'html': '<button onclick="action" id="upbutton">Go Up</button>', 'text': 'Text ..read more
WebScraping
Web scraping Python program that scrapes a job website for Python developer jobs and exports the data to a CSV file.
Requests – downloads the data from the web server (over HTTP) and saves the response.
The response variable contains all the HTML data, which can then be used to extract whatever information you need.
The Beautiful Soup library is used to parse the HTML data.
The title, company name, location, salary, and job summary are extracted into a Python dictionary.
Pandas is used to load the data into a DataFrame and export it to CSV, as sketched below.
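A minimal sketch of that pipeline, assuming a hypothetical job board URL and CSS classes (the project's actual selectors will differ):

# Hypothetical pipeline sketch: requests -> BeautifulSoup -> dict -> pandas -> CSV.
import requests
from bs4 import BeautifulSoup
import pandas as pd

URL = "https://example.com/jobs?q=python+developer"  # placeholder job site
response = requests.get(URL)                         # download the HTML

soup = BeautifulSoup(response.text, "html.parser")   # parse the HTML data

jobs = []
for card in soup.find_all("div", class_="job-card"):  # assumed card markup
    jobs.append({
        "title": card.find("h2").get_text(strip=True),
        "company": card.find("span", class_="company").get_text(strip=True),
        "location": card.find("span", class_="location").get_text(strip=True),
        "salary": card.find("span", class_="salary").get_text(strip=True),
        "summary": card.find("p", class_="summary").get_text(strip=True),
    })

pd.DataFrame(jobs).to_csv("jobs.csv", index=False)   # export to CSV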
SkyScrapers – a collection of various web scraping apps
The web-scrapers involved in this project are:
StockSymbolScraper
UnsplashImagesScraper
Tech
SkyScrapers uses a number of open source projects to work properly:
BeautifulSoup – Beautiful Soup is a Python library for pulling data out of HTML and XML files.
Selenium – Selenium is for automating web applications for testing purposes.
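The two are typically combined by letting Selenium render the page and handing the resulting HTML to Beautiful Soup; a minimal sketch of that pattern (not taken from the SkyScrapers code, and the target URL is a placeholder):

# Hypothetical sketch: Selenium renders a JavaScript-heavy page,
# then Beautiful Soup parses the rendered HTML.
from selenium import webdriver
from bs4 import BeautifulSoup

driver = webdriver.Chrome()                       # needs Chrome installed locally
driver.get("https://unsplash.com/s/photos/cats")  # placeholder target page
html = driver.page_source                         # HTML after JavaScript has run
driver.quit()

soup = BeautifulSoup(html, "html.parser")
for img in soup.find_all("img"):                  # print every image source found
    print(img.get("src"))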
❗ ❗ ❗ Please check the markdown files named THEORY inside the respective project directories to get insight into each individual project.
And of course SkyScrapers itself is open source with a public reposi ..read more
Vis’Yerres SGDF – Woob Modules
Do you feel that the Scouts et Guides de France intranet does not
suit your group's needs?
Are you building an application intended solely for that group,
and want to interact with the intranet?
You have knocked on the right door!
This project defines Woob modules for interacting with the SGDF intranet,
as well as with a few other resources.
An example of its use is the following:
from visyerres_sgdf_woob import MODULES_PATH
from woob.core.ouiboube import WebNip
# Build a Woob instance that loads modules from this project's path.
woob = WebNip(modules_path=MODULES_PATH)
backend = woob.build_bac ..read more
SimpleTelegramScraper – the best scraper on GitHub
This simple Python script scrapes accounts from public groups via the Telegram API and saves them in a CSV file with their username, usersID, access hash, groupName, groupID, and last seen online.
You can choose to scrape all members, active members (users online today and yesterday), members active in the last week or past month, or inactive members.
It can scrape more than 95% of the users in a group! Bots are not included in the CSV file. The admins are also saved separately in an admins.csv file.
It can sometimes happen that a bug occurs towards the end ..read more
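The post does not name the library it uses, but a minimal sketch of this kind of member scraping with Telethon (an assumption) could look like this; the credentials and group name are placeholders:

# Hypothetical sketch using Telethon; api_id/api_hash come from my.telegram.org.
import csv
from telethon.sync import TelegramClient

api_id = 12345                  # placeholder credentials
api_hash = "0123456789abcdef"   # placeholder

group = "some_public_group"     # placeholder public group
with TelegramClient("session", api_id, api_hash) as client:
    with open("members.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["username", "usersID", "access hash", "groupName"])
        for user in client.get_participants(group):
            if user.bot:        # bots are not included, as the post notes
                continue
            writer.writerow([user.username, user.id, user.access_hash, group])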
Free HTTP Proxy List
It is a lightweight project that scrapes lots of free-proxy sites every hour, validates that each proxy works, and serves a clean proxy list.
Scraper found 1272 proxies at the latest update. Usable proxies are below.
Usage
Click the file format that you want and copy the URL.
File                        Content                                   Count
data.txt                    ip_address:port combined (one per line)  153
data.json                   ip, port                                  153
data-with-geolocation.json  ip, port, geolocation                     153
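As an illustration, a minimal sketch of consuming data.txt with requests; the raw-file URL is a placeholder to replace with the one copied from the repository:

# Hypothetical usage sketch: pick a random proxy from the list and
# route a request through it.
import random
import requests

RAW_URL = "https://raw.githubusercontent.com/<user>/<repo>/main/data.txt"  # placeholder

proxies = requests.get(RAW_URL).text.splitlines()  # one ip_address:port per line
proxy = random.choice(proxies)

resp = requests.get(
    "http://example.com",
    proxies={"http": f"http://{proxy}", "https": f"http://{proxy}"},
    timeout=10,
)
print(resp.status_code)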
Sources
free-proxy-list.net
us-proxy.org
proxydb.net
free-proxy-list.com
proxy-list.download
vpnoverview.com
proxyscan.io
proxylist.geonode.co ..read more
subscrape
A Python scraper for Substrate chains that uses Subscan.
Usage
Copy config/sample_scrape_config.json to config/scrape_config.json and configure it to your needs.
Make sure there is a data/parachains folder.
Run the scraper.
Corresponding files will be created in data/.
If a file already exists in data/, that operation will be skipped in subsequent runs.
Architecture
An overview is given in this Twitter thread: https://twitter.com/alice_und_bob/status/1493714489014956037
General
We use the following techniques in the project:
Logging: https://docs.python.org/3/howto/logging.html
Async Operations: htt ..read more
Web Scraper
This project is made in Python. It takes some info from a list of websites and then adds it to a data.json file.
The dependencies used are:
requests
bs4
Algorithm:
The csv reader module is imported and the CSV file is opened.
A header array is made and the header values are added.
Each row is accessed in a loop; the URL is opened, with the country and ASIN values inserted according to the current row.
Using BeautifulSoup, we take the HTML code and extract the required information, which is added to a JSON file (data.json), as sketched below.
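A minimal sketch of that algorithm, assuming a two-column country/ASIN CSV, an Amazon-style URL pattern, and a productTitle element id (all invented for illustration):

# Hypothetical sketch of the algorithm above; the CSV layout, URL pattern,
# and selector are assumptions, not the project's actual code.
import csv
import json
import requests
from bs4 import BeautifulSoup

results = []
with open("input.csv", newline="") as f:             # placeholder input file
    reader = csv.reader(f)
    header = next(reader)                            # skip the header row
    for country, asin in reader:                     # loop over each row
        url = f"https://www.amazon.{country}/dp/{asin}"  # assumed URL pattern
        html = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}).text
        soup = BeautifulSoup(html, "html.parser")    # parse the HTML code
        title = soup.find(id="productTitle")         # assumed element id
        results.append({
            "country": country,
            "asin": asin,
            "title": title.get_text(strip=True) if title else None,
        })

with open("data.json", "w") as f:                    # write results to data.json
    json.dump(results, f, indent=2)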