Link Grabber

Link Grabber provides a quick and easy way to grab links from a single web page. This Python package is a simple wrapper around BeautifulSoup that focuses on HTML's hyperlink tag, "a".

Dependencies:

Python 2.7, 3.3, 3.4
BeautifulSoup
Requests
Six

How-To

$ python setup.py install

OR

$ pip install linkGrabber

Quickie

import re
import linkGrabber

links = linkGrabber.Links("http://www.google.com")
links.find()
# limit the number of "a" tags to 5
links.find(limit=5)
# filter the "a" tag href attribute
links.find(href=re.compile("plus.google.com"))

Documentation

http://linkgrabber.neurosnap.net/

find

Parameters:

filters (dict): Beautiful Soup's filters as a dictionary
limit (int): Maximum number of links to return, taken in order
reverse (bool): Reverses the order in which the list of <a> tags is sorted
sort (function): A function that returns, for each link, the value to sort on
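
One of the examples in this README is titled "Reverse the sort before limiting links", which suggests the order sort, then reverse, then limit. The following is a minimal sketch of that ordering on plain dictionaries. It assumes this is how the three parameters combine; order_links is a hypothetical helper written for illustration and is not part of linkGrabber.

```python
def order_links(links, sort=None, reverse=False, limit=None):
    """Hypothetical helper: sort, then reverse, then limit a list of link dicts."""
    ordered = list(links)  # copy so the caller's list is left untouched
    if sort is not None:
        ordered.sort(key=sort)
    if reverse:
        ordered.reverse()
    if limit is not None:
        ordered = ordered[:limit]
    return ordered


# Made-up sample data standing in for the dictionaries a page would produce.
links = [
    {"text": "News", "href": "/news"},
    {"text": "About", "href": "/about"},
    {"text": "Maps", "href": "/maps"},
]

# Sort by link text, reverse the result, then keep the first two.
top = order_links(links, sort=lambda key: key["text"], reverse=True, limit=2)
print([link["text"] for link in top])
```

Because the limit is applied last in this sketch, reverse=True with limit=2 returns the last two links of the sorted list rather than the first two reversed.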

Find up to five links that have a style containing "11px":

import re
from linkGrabber import Links

links = Links("http://www.google.com")
links.find(style=re.compile("11px"), limit=5)

Reverse the sort before limiting links:

from linkGrabber import Links

links = Links("http://www.google.com")
links.find(limit=2, reverse=True)

Sort by a link's attribute:

from linkGrabber import Links

links = Links("http://www.google.com")
links.find(limit=3, sort=lambda key: key['text'])

Exclude text:

import re

from linkGrabber import Links

links = Links("http://www.google.com")
links.find(exclude=[{ "text": re.compile("Read More") }])
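
To make the shape of the exclude argument concrete, here is a minimal sketch of one way such rules could be applied to a list of link dictionaries. The matching semantics (a link is dropped when every key in a rule matches) are an assumption, and exclude_links is a hypothetical helper, not linkGrabber's own code.

```python
import re


def exclude_links(links, exclude):
    """Hypothetical helper: drop links that match any rule in `exclude`.

    Each rule maps a key of the link dictionary to a compiled regex or a
    plain string. A link matches a rule when every key in the rule matches.
    """
    def matches(link, rule):
        for key, pattern in rule.items():
            value = str(link.get(key, ""))
            if hasattr(pattern, "search"):
                # Compiled regular expression: match anywhere in the value.
                if not pattern.search(value):
                    return False
            elif value != pattern:
                # Plain string: require an exact match.
                return False
        return True

    return [link for link in links
            if not any(matches(link, rule) for rule in exclude)]


# Made-up sample data for illustration.
links = [
    {"text": "Read More", "href": "/post/1"},
    {"text": "Contact", "href": "/contact"},
]
kept = exclude_links(links, [{"text": re.compile("Read More")}])
print(kept)
```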

Remove duplicate URLs and make the output pretty:

from linkGrabber import Links

links = Links("http://www.google.com")
links.find(duplicates=False, pretty=True)
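
As a rough illustration of what removing duplicate URLs involves, the sketch below keeps the first link seen for each href and prints the result with the standard library's pprint. Both choices are assumptions: drop_duplicate_urls is a hypothetical helper, and the README does not say how pretty=True formats its output.

```python
from pprint import pprint


def drop_duplicate_urls(links):
    """Hypothetical helper: keep only the first link seen for each href."""
    seen = set()
    unique = []
    for link in links:
        href = link.get("href")
        if href in seen:
            continue  # this URL was already collected
        seen.add(href)
        unique.append(link)
    return unique


# Made-up sample data for illustration.
links = [
    {"text": "Home", "href": "/"},
    {"text": "Logo", "href": "/"},
    {"text": "About", "href": "/about"},
]
unique = drop_duplicate_urls(links)
pprint(unique)
```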

Link Dictionary

All attributes from BeautifulSoup's Tag object are available in the dictionary, along with a few extras:

text (the text between the <a> and </a> tags)
seo (the text after the last "/" in the URL, parsed into a human-readable form where possible)
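
The sketch below shows roughly what such a dictionary could look like for one link. It uses Python 3's standard-library html.parser instead of BeautifulSoup so that it runs without dependencies, and it is not linkGrabber's implementation. seo_text and LinkCollector are hypothetical names, and the rules for making the URL text readable (dropping a file extension, turning "-" and "_" into spaces) are assumptions.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


def seo_text(url):
    """Hypothetical helper: humanize the text after the last "/" in a URL path."""
    slug = urlparse(url).path.rstrip("/").rsplit("/", 1)[-1]
    slug = slug.rsplit(".", 1)[0]  # drop a file extension such as ".html"
    return slug.replace("-", " ").replace("_", " ").strip()


class LinkCollector(HTMLParser):
    """Builds one dictionary per <a> tag: its attributes plus text and seo."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._current = None  # the <a> tag currently being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._current = dict(attrs)
            self._current["text"] = ""

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            link = self._current
            link["text"] = link["text"].strip()
            link["seo"] = seo_text(link.get("href") or "")
            self.links.append(link)
            self._current = None


collector = LinkCollector()
collector.feed('<p><a href="/blog/my-first-post.html" class="more">Read More</a></p>')
print(collector.links)
```

In this sketch the href and class attributes sit next to the two extras, so a link can be filtered or sorted on any of them with ordinary dictionary access.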

MohamedHuzien/linkGrabber

Folders and files

Latest commit

History

Repository files navigation

Link Grabber

How-To

Quickie

Documentation

find

Link Dictionary

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages