17 Commits

Author SHA1 Message Date
629ea88388 Merge branch 'master' into new_docs 2020-03-13 02:47:31 +00:00
05922fc9a7 updating builder docstring 2020-03-13 02:46:30 +00:00
c7cabe8d63 adding new doctype 2020-03-12 23:58:31 +00:00
5b24952e1e Merge branch 'develop' 2020-03-12 21:30:27 +00:00
511986a131 updating cli.py 2020-03-12 21:30:12 +00:00
21472e8100 adding mkdocs 2020-03-12 01:27:47 +00:00
4e0091f80f latest 2020-03-11 20:44:46 +00:00
cd8117343b Merge branch 'develop' 2020-03-09 12:26:00 +00:00
8dc88f6361 updating poetry installation instructions 2020-03-09 12:25:45 +00:00
306eb82237 Merge branch 'develop' 2020-03-09 12:22:43 +00:00
0a77fa34fd adding poetry to installation instructions 2020-03-09 12:22:35 +00:00
5aefcc2a2d Merge branch 'develop' 2020-03-09 12:12:28 +00:00
0034340d63 updating comments document 2020-03-09 12:12:11 +00:00
02cb79c4b2 Merge branch 'master' into develop 2020-03-09 11:57:22 +00:00
26b346d359 Merge branch 'documentation' 2020-03-09 11:56:19 +00:00
78544673b4 Merge branch 'develop' 2020-03-09 11:38:49 +00:00
e8ce4b59f8 Merge branch 'documentation' into develop 2020-03-09 11:38:36 +00:00
30 changed files with 935 additions and 386 deletions

.DS_Store

Binary file not shown.

@@ -1,3 +1,8 @@
{
"python.pythonPath": "/Users/dtomlinson/.virtualenvs/musicbrainzapi/bin/python"
"python.pythonPath": "/Users/dtomlinson/.virtualenvs/musicbrainzapi/bin/python",
"restructuredtext.confPath": "${workspaceFolder}/docs/source",
"restructuredtext.linter.executablePath": "/Users/dtomlinson/.virtualenvs/utility-doc8/bin/doc8",
"files.trimTrailingWhitespace": true,
"restructuredtext.languageServer.trace.server": "messages",
"editor.fontSize": 13,
}


@@ -70,6 +70,27 @@ In the root of the repo in a virtual environment run:
python ./setup.py install
poetry
------
Clone the repo:

.. code-block:: bash

    git clone https://github.com/dtomlinson91/musicbrainzapi-cv-airelogic.git

In a virtual environment install poetry:

.. code-block:: bash

    pip install poetry

In the root of the repo in a virtual environment run:

.. code-block:: bash

    poetry install --no-dev

Docker
------


@@ -115,3 +115,12 @@ Although inelegant, and not guaranteed to capture the specific behaviour we want
Musicbrainz provides a schema for their API. If this were to be placed in a production environment, readdressing this should be a priority: we should be checking the values returned, using the schema as a guide, and replacing missing values accordingly. We should not rely on ``try``/``except`` blocks to do this, as they are unreliable and prone to raising other errors.
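A minimal sketch of that schema-guided approach, in Python. It assumes a hand-written mapping from field name to expected type and default value; the field names in `RELEASE_GROUP_SCHEMA` and the helper `normalise_record` are hypothetical and are not taken from the Musicbrainz schema or from this codebase.

```python
from typing import Any, Dict, Tuple

# Hypothetical subset of expected fields: name -> (expected type, default).
# A real version would be derived from the Musicbrainz schema.
RELEASE_GROUP_SCHEMA: Dict[str, Tuple[type, Any]] = {
    "id": (str, ""),
    "title": (str, "Unknown title"),
    "first-release-date": (str, ""),
    "type": (str, "Unknown"),
}


def normalise_record(
    record: Dict[str, Any], schema: Dict[str, Tuple[type, Any]]
) -> Dict[str, Any]:
    """Return a copy of `record` where every schema field exists with the right type.

    Missing fields, and fields holding a value of the wrong type, are replaced
    with the schema default. Fields not in the schema pass through unchanged.
    """
    cleaned = dict(record)
    for field, (expected_type, default) in schema.items():
        value = record.get(field)
        cleaned[field] = value if isinstance(value, expected_type) else default
    return cleaned


print(normalise_record({"id": "abc", "title": None}, RELEASE_GROUP_SCHEMA))
```

Because every check is explicit, an unexpected response shape produces a known default instead of an exception caught far from its cause.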
Further statistical analysis
----------------------------
Standard descriptive statistics are provided. I did consider including a deeper analysis but opted not to for several reasons:

- Without a specific problem or question to answer, exploratory work can take a lot of time and may not yield satisfactory results. Questions I did consider are:

  + `For active artists, based on their previous lyrics counts, what is the prediction for their next album?` Although this is a sensible question, I'm not sure how useful the prediction would be. I am sure some artists would follow a pattern over time, but I'm not convinced all would, and I imagine the results would be mixed.
  + `Anomaly detection: for artists with large releases, which albums stood out as larger than usual, and what feature (or track) caused this anomaly?` This would be a good question to answer, and many tools are available for it. As the data is numeric, clustering is a candidate (DBSCAN or even k-means). I opted not to because of time and because it would bloat the requirements. Feature flags are an option for handling extra packages (``pip install musicbrainzapi[analysis]``, for example); nonetheless, this would be an interesting question to answer and, I believe, one of the easier ones to implement if desired.
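To give a feel for the clustering idea without adding any dependency, here is a minimal sketch of DBSCAN restricted to one-dimensional data, such as total words per album. Values that do not belong to any dense group are labelled `-1` (noise) and can be read as anomalies. The album figures and the `eps` and `min_samples` settings are made up for illustration; a real implementation would more likely use `sklearn.cluster.DBSCAN` behind an optional extra.

```python
from typing import List, Optional


def dbscan_1d(values: List[float], eps: float, min_samples: int) -> List[int]:
    """Cluster 1-D values with DBSCAN and return one label per value (-1 = noise)."""
    n = len(values)
    labels: List[Optional[int]] = [None] * n  # None means "not visited yet"
    cluster = -1

    def neighbours(i: int) -> List[int]:
        # Indices of all values within eps of values[i], including i itself.
        return [j for j in range(n) if abs(values[j] - values[i]) <= eps]

    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_samples:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # reachable noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = neighbours(j)
            if len(j_neighbours) >= min_samples:
                queue.extend(j_neighbours)  # j is also a core point: keep expanding
    # Every entry has been set by now; the comprehension only narrows the type.
    return [-1 if label is None else label for label in labels]


words_per_album = [410, 395, 430, 402, 418, 1250]  # made-up figures
labels = dbscan_1d(words_per_album, eps=50, min_samples=3)
print([w for w, label in zip(words_per_album, labels) if label == -1])  # -> [1250]
```

The quadratic neighbour search is fine for a discography-sized list; for multi-feature data (for example per-track counts), the library implementation is the better choice.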

docs/about.md

@@ -0,0 +1 @@
# Some about page

docs/index.md

@@ -0,0 +1,17 @@
# Welcome to MkDocs
For full documentation visit [mkdocs.org](https://www.mkdocs.org).
## Commands
* `mkdocs new [dir-name]` - Create a new project.
* `mkdocs serve` - Start the live-reloading docs server.
* `mkdocs build` - Build the documentation site.
* `mkdocs -h` - Print help message and exit.
## Project layout
    mkdocs.yml    # The configuration file.
    docs/
        index.md  # The documentation homepage.
        ...       # Other markdown pages, images and other files.

docs/python.md

@@ -0,0 +1,20 @@
# Some python
## Some test python
!!! warning "Important warning"
    This isn't yet complete.

```python
def _generate_word_cloud(self) -> "LyricsWordcloud":
    """Generates a word cloud
    """
    self.wc = WordCloud(
        max_words=150,
        width=1500,
        height=1500,
        mask=self.char_mask,
        random_state=1,
    ).generate_from_frequencies(self.freq)
    return self
```

docs/test.md

@@ -0,0 +1,83 @@
# musicbrainzapi.wordcloud package
Wordcloud from lyrics.
### class musicbrainzapi.wordcloud.LyricsWordcloud(pillow_img: PIL.PngImagePlugin.PngImageFile, all_albums_lyrics_count: Lyrics.all_albums_lyrics_count)
Bases: `object`
Create a word cloud from Lyrics.
* **Variables**
* **all_albums_lyrics_count** (*list*) List of all albums + track lyrics counted by each word
* **char_mask** (*np.array*) numpy array containing data for the word cloud image
* **freq** (*collections.Counter*) Counter object containing counts for all words across all tracks
* **lyrics_list** (*list*) List of all words from all lyrics across all tracks.
* **pillow_img** (*PIL.PngImagePlugin.PngImageFile*) pillow image of the word cloud base
* **wc** (*wordcloud.WordCloud*) WordCloud object
#### \__init__(pillow_img: PIL.PngImagePlugin.PngImageFile, all_albums_lyrics_count: Lyrics.all_albums_lyrics_count)
Create a word cloud object.
* **Parameters**
* **pillow_img** (*PIL.PngImagePlugin.PngImageFile*) pillow image of the word cloud base
* **all_albums_lyrics_count** (*Lyrics.all_albums_lyrics_count*) List of all albums + track lyrics counted by each word
#### classmethod use_microphone(all_albums_lyrics_count: Lyrics.all_albums_lyrics_count)
Class method to instantiate with a microphone base image.
* **Parameters**
**all_albums_lyrics_count** (*Lyrics.all_albums_lyrics_count*) List of all albums + track lyrics counted by each word
#### static generate_grey_colours(\*args, \*\*kwargs)
Static method to generate a random grey colour.
#### _get_lyrics_list()
Gets all words from lyrics in a single list + cleans them.
#### _get_frequencies()
Get frequencies of words from a list.
#### _get_char_mask()
Gets a numpy array for the image file.
#### _generate_word_cloud()
Generates a word cloud
#### _generate_plot()
Plots the wordcloud and sets matplotlib options.
#### create_word_cloud()
Creates a word cloud

docs/testdoc.md

@@ -0,0 +1,5 @@
# Test documentation using mkdocstrings
## Reference
::: musicbrainzapi.api.lyrics.builder

mkdocs.yml

@@ -0,0 +1,39 @@
site_name: Musicbrainzapi
nav:
  - Welcome:
      - Home: index.md
      - About: about.md
  - API:
      - API: test.md
  - Code:
      - python.md
      - testdoc.md
# theme: material
theme:
  name: "material"
  palette:
    primary: "yellow"
    accent: "red"
  feature:
    tabs: true
markdown_extensions:
  - admonition
  - codehilite:
      guess_lang: true
  - toc:
      permalink: true
plugins:
  - search
  - mkdocstrings
repo_name: "dtomlinson91/musicbrainzapi"
repo_url: "https://github.com/dtomlinson91/musicbrainzapi-cv-airelogic"
extra:
  social:
    - type: "github"
      link: "https://github.com/dtomlinson91/musicbrainzapi"

poetry.lock

File diff suppressed because it is too large.

@@ -20,20 +20,18 @@ click = "^7.0"
[tool.poetry.dev-dependencies]
pytest = "^5.2"
python-language-server = "^0.31.8"
Rope = "^0.16.0"
Pyflakes = "^2.1.1"
McCabe = "^0.6.1"
pycodestyle = "^2.5.0"
pydocstyle = "^5.0.2"
autopep8 = "^1.5"
YAPF = "^0.29.0"
pudb = "^2019.2"
pyls-black = "^0.4.4"
sphinx = "^2.4.4"
sphinx_rtd_theme = "^0.4.3"
sphinx-click = "^2.3.1"
coverage = "^5.0.3"
prospector = "^1.2.0"
pylint = "^2.4.4"
pydoc-markdown = {git = "https://github.com/NiklasRosenstein/pydoc-markdown.git", rev = "develop"}
mkdocs = "^1.1"
mkdocs-material = "^4.6.3"
pymdown-extensions = "^6.3"
mkdocstrings = "^0.8.0"
beeprint = "^2.4.10"
[tool.poetry.plugins."console_scripts"]
"musicbrainzapi" = "musicbrainzapi.cli.cli:cli"
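The `console_scripts` entry maps the `musicbrainzapi` command to the object `cli` in the module `musicbrainzapi.cli.cli`, using `module:attribute` syntax. As a small illustration of how such a string is resolved, the sketch below uses the standard library's `importlib.metadata.EntryPoint` (Python 3.8+). It loads `os.path:join` rather than the real target so that it runs without the package installed.

```python
import os.path
from importlib.metadata import EntryPoint

# Same "module:attribute" form as the console_scripts entry in pyproject.toml;
# os.path:join stands in for musicbrainzapi.cli.cli:cli.
ep = EntryPoint(name="demo", value="os.path:join", group="console_scripts")

module_name, _, attr_name = ep.value.partition(":")
func = ep.load()  # imports module_name and returns its attribute attr_name

print(module_name, attr_name)
print(func is os.path.join)  # -> True
```

When the package is installed, the generated `musicbrainzapi` script performs an equivalent lookup and calls the result.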


@@ -1,53 +1,49 @@
from __future__ import annotations
from collections import Counter
import html
import json
import math
import string
from typing import Union, Dict
from collections import Counter
from typing import Dict, Union
import addict
import click
import musicbrainzngs
import numpy as np
import requests
from beeprint import pp
from musicbrainzapi.api.lyrics.concrete_builder import LyricsConcreteBuilder
from musicbrainzapi.api.lyrics import Lyrics
from musicbrainzapi.api import authenticate
from musicbrainzapi.api.lyrics import Lyrics
from musicbrainzapi.api.lyrics.concrete_builder import LyricsConcreteBuilder
class LyricsBuilder(LyricsConcreteBuilder):
"""docstring for LyricsBuilder
"""
This interface will build a Lyrics object.
Attributes
----------
album_statistics : addict.Dict
Dictionary containing album statistics
all_albums : list
List of all albums + track titles
all_albums_lyrics : list
List of all albums + track lyrics
all_albums_lyrics_count : list
List of all albums + track lyrics counted by each word
all_albums_lyrics_sum : list
List of all albums + track lyrics counted and summed up.
all_albums_lyrics_url : list
List of all albums + link to lyrics api for each track.
musicbrainz_artists : addict.Dict
Dictionary of response from Musicbrainzapi
release_group_ids : addict.Dict
Dictionary of Musicbrainz release-group ids
total_track_count : int
Total number of tracks across all albums
year_statistics : addict.Dict
Dictionary containing album statistics
!!! info "Attributes"
- musicbrainz_artists (addict.Dict): Dictionary of the response from the Musicbrainz api.
- release_group_ids (addict.Dict): Dictionary of Musicbrainz release-group ids.
- all_albums (list): List of all albums + track titles.
- total_track_count (int): Total number of tracks across all albums.
- all_albums_lyrics_url (list): List of all albums + link to the lyrics api for each track.
- all_albums_lyrics (list): List of all albums + track lyrics.
- all_albums_lyrics_count (list): List of all albums + track lyrics counted by each word.
- all_albums_lyrics_sum (list): List of all albums + track lyrics counted and summed up.
- album_statistics (addict.Dict): Dictionary containing album statistics.
- year_statistics (addict.Dict): Dictionary containing statistics for each year.
Example:
A test example.
"""
@property
def product(self) -> Lyrics:
product = self._product
return product
return self._product
@property
def artist(self) -> str:
@@ -95,17 +91,12 @@ class LyricsBuilder(LyricsConcreteBuilder):
def construct_lyrics_url(artist: str, song: str) -> str:
"""Builds the URL for the lyrics api.
Parameters
----------
artist : str
Artist
song : str
Track title
Args:
artist (str): Your chosen artist.
song (str): A song to find a lyrics url for.
Returns
-------
str
URL for lyrics from the lyrics api.
Returns:
str: The url of the lyrics api for chosen song.
"""
lyrics_api_base = 'https://api.lyrics.ovh/v1'
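Only the first line of the body of `construct_lyrics_url` appears in this hunk. A sketch of how the rest might look, assuming the `<base>/<artist>/<title>` path layout used by the lyrics.ovh API and percent-encoding each part with `urllib.parse.quote`. This is an illustration, not the repository's code.

```python
from urllib.parse import quote

LYRICS_API_BASE = "https://api.lyrics.ovh/v1"


def construct_lyrics_url(artist: str, song: str) -> str:
    """Build a lyrics.ovh URL of the (assumed) form <base>/<artist>/<song>."""
    # safe="" also encodes "/", so a name such as "AC/DC" cannot add a path segment.
    return f"{LYRICS_API_BASE}/{quote(artist, safe='')}/{quote(song, safe='')}"


print(construct_lyrics_url("Coldplay", "Viva la Vida"))
# -> https://api.lyrics.ovh/v1/Coldplay/Viva%20la%20Vida
```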
@@ -167,7 +158,7 @@ class LyricsBuilder(LyricsConcreteBuilder):
Dict[str, int]
Dictionary of statistic and value.
"""
if len(nums) == 0:
if not nums:
return
avg = math.ceil(np.mean(nums))
median = math.ceil(np.median(nums))
@@ -179,7 +170,7 @@ class LyricsBuilder(LyricsConcreteBuilder):
p_75 = math.ceil(np.percentile(nums, 75))
p_90 = math.ceil(np.percentile(nums, 90))
count = len(nums)
_d = addict.Dict(
return addict.Dict(
('avg', avg),
('median', median),
('std', std),
@@ -191,10 +182,10 @@ class LyricsBuilder(LyricsConcreteBuilder):
('p_90', p_90),
('count', count),
)
return _d
def __init__(self) -> None:
"""Create a builder instance to build a Lyrics object."""
"""Create a `LyricsBuilder`.
"""
self.reset()
def reset(self) -> None:
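The statistics helper in the previous hunk depends on NumPy. For comparison, here is a dependency-free sketch of the statistics visible there (`avg`, `median`, `std`, `p_75`, `p_90`, `count`); those elided between the hunks are not reproduced. It assumes `std` means the population standard deviation (the `np.std` default) and implements NumPy's default linear percentile interpolation by hand. Rounding `std` up and returning an empty dict for empty input are choices made here: the original's `std` line is not shown, and it returns `None` for empty input. The names `describe` and `percentile` are hypothetical.

```python
import math
import statistics
from typing import Dict, List, Sequence


def percentile(ordered: Sequence[float], q: float) -> float:
    """Linear-interpolated percentile of sorted data (NumPy's default method)."""
    pos = (len(ordered) - 1) * q / 100  # fractional index into the sorted data
    lower, upper = math.floor(pos), math.ceil(pos)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def describe(nums: List[float]) -> Dict[str, int]:
    """Return rounded-up descriptive statistics for a list of numbers."""
    if not nums:
        return {}
    ordered = sorted(nums)
    return {
        "avg": math.ceil(statistics.mean(ordered)),
        "median": math.ceil(statistics.median(ordered)),
        "std": math.ceil(statistics.pstdev(ordered)),  # population std, like np.std
        "p_75": math.ceil(percentile(ordered, 75)),
        "p_90": math.ceil(percentile(ordered, 90)),
        "count": len(ordered),
    }


print(describe([2, 4, 4, 4, 5, 5, 7, 9]))
```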
@@ -518,3 +509,6 @@ class LyricsBuilder(LyricsConcreteBuilder):
self.year_statistics = addict.Dict(
**self.year_statistics, **addict.Dict((year, _d))
)
if __name__ == "__main__":
pp(LyricsBuilder)


@@ -6,55 +6,64 @@ import click
from musicbrainzapi.__version__ import __version__
from musicbrainzapi.__header__ import __header__
CONTEXT_SETTINGS = dict(auto_envvar_prefix='COMPLEX')
# pylint:disable=invalid-name
CONTEXT_SETTINGS = dict(auto_envvar_prefix="COMPLEX")
class Environment(object):
class Environment:
"""Environment class to house shared parameters between all subcommands."""
def __init__(self):
self.verbose = False
self.home = os.getcwd()
pass_environment = click.make_pass_decorator(Environment, ensure=True)
pass_environment = click.make_pass_decorator(
Environment, ensure=True
)
cmd_folder = os.path.abspath(
os.path.join(os.path.dirname(__file__), 'commands')
os.path.join(os.path.dirname(__file__), "commands")
)
class ComplexCLI(click.MultiCommand):
"""Access and run subcommands."""
def list_commands(self, ctx):
rv = []
for filename in os.listdir(cmd_folder):
if filename.endswith('.py') and filename.startswith('cmd_'):
rv.append(filename[4:-3])
"""List all subcommands."""
rv = [
filename[4:-3]
for filename in os.listdir(cmd_folder)
if filename.endswith(".py") and filename.startswith("cmd_")
]
rv.sort()
return rv
def get_command(self, ctx, cmd_name):
mod = import_module(f'musicbrainzapi.cli.commands.cmd_{cmd_name}')
"""Get the chosen subcommand."""
mod = import_module(f"musicbrainzapi.cli.commands.cmd_{cmd_name}")
return getattr(mod, cmd_name)
@click.command(cls=ComplexCLI, context_settings=CONTEXT_SETTINGS)
@click.option(
'-p',
'--path',
type=click.Path(
exists=True, file_okay=False, resolve_path=True, writable=True
),
help='Local path to save any output files.',
default=os.getcwd()
"-p",
"--path",
type=click.Path(exists=True, file_okay=False, resolve_path=True, writable=True),
help="Local path to save any output files.",
default=os.getcwd(),
)
# @click.option('-v', '--verbose', is_flag=True, help='Enables verbose mode.')
@click.option("-v", "--verbose", is_flag=True, help="Enables verbose mode.")
@click.version_option(
version=__version__,
prog_name=__header__,
message=f'{__header__} version {__version__} 🎤',
message=f"{__header__} version {__version__} 🎤",
)
@pass_environment
def cli(ctx, path):
"""Base command for the musicbrainzapi program."""
# ctx.verbose = verbose
def cli(ctx, verbose, path):
"""Display base command for the musicbrainzapi program."""
ctx.verbose = verbose
if path is not None:
click.echo(f'Path set to {os.path.expanduser(path)}')
click.echo(f"Path set to {os.path.expanduser(path)}")
ctx.path = os.path.expanduser(path)
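`ComplexCLI.list_commands` treats every file named `cmd_<name>.py` in the `commands` folder as a subcommand: `filename[4:-3]` strips the four-character `cmd_` prefix and the three-character `.py` suffix, and `get_command` then imports `musicbrainzapi.cli.commands.cmd_<name>` and returns its `<name>` attribute. A standalone sketch of just the naming rule, taking a list of filenames instead of reading a directory so it runs without click; the filenames in the example are hypothetical.

```python
from typing import Iterable, List

PREFIX, SUFFIX = "cmd_", ".py"


def command_names(filenames: Iterable[str]) -> List[str]:
    """Return sorted subcommand names from files named cmd_<name>.py."""
    return sorted(
        name[len(PREFIX):-len(SUFFIX)]  # same slice as filename[4:-3]
        for name in filenames
        if name.startswith(PREFIX) and name.endswith(SUFFIX)
    )


print(command_names(["cmd_foo.py", "__init__.py", "cmd_bar.py", "cmd_baz.pyc"]))
# -> ['bar', 'foo']
```

Discovering commands from filenames keeps startup cheap: a subcommand module is only imported when that subcommand is actually invoked.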


@@ -20,59 +20,57 @@ if typing.TYPE_CHECKING:
import PIL.PngImagePlugin.PngImageFile
# pylint:disable=line-too-long
class LyricsWordcloud:
"""Create a word cloud from Lyrics.
"""
Create a Wordcloud from Lyrics.
The docstring continues here.
It should contain:
- something
- something else
Args:
pillow_img (PIL.PngImagePlugin.PngImageFile): pillow image of the word
cloud base
all_albums_lyrics_count (dict): A dictionary containing the lyrics from
a whole album.
!!! Attributes
- `pillow_img` (pillow): A pillow image.
Anything else can go here.
Example:
Here is how you can use it
Attributes
----------
all_albums_lyrics_count : list
List of all albums + track lyrics counted by each word
char_mask : np.array
numpy array containing data for the word cloud image
freq : collections.Counter
Counter object containing counts for all words across all tracks
lyrics_list : list
List of all words from all lyrics across all tracks.
pillow_img : PIL.PngImagePlugin.PngImageFile
pillow image of the word cloud base
wc : wordcloud.WordCloud
WordCloud object
"""
def __init__(
self,
pillow_img: 'PIL.PngImagePlugin.PngImageFile',
all_albums_lyrics_count: 'Lyrics.all_albums_lyrics_count',
pillow_img: "PIL.PngImagePlugin.PngImageFile",
all_albums_lyrics_count: "Lyrics.all_albums_lyrics_count",
):
"""
Create a worcloud object.
Parameters
----------
pillow_img : PIL.PngImagePlugin.PngImageFile
pillow image of the word cloud base
all_albums_lyrics_count : Lyrics.all_albums_lyrics_count
List of all albums + track lyrics counted by each word
"""
self.pillow_img = pillow_img
self.all_albums_lyrics_count = all_albums_lyrics_count
self.test = []
@classmethod
def use_microphone(
cls, all_albums_lyrics_count: 'Lyrics.all_albums_lyrics_count',
cls, all_albums_lyrics_count: "Lyrics.all_albums_lyrics_count",
) -> LyricsWordcloud:
"""
Class method to instantiate with a microphone base image.
"""Create a LyricsWordcloud using a microphone as a base image.
Parameters
----------
all_albums_lyrics_count : Lyrics.all_albums_lyrics_count
List of all albums + track lyrics counted by each word
Args:
all_albums_lyrics_count (dict): A dictionary containing the lyrics from a whole album.
Returns:
LyricsWordcloud: Instance of itself with a microphone image loaded in.
"""
mic_resource = resources.path(
'musicbrainzapi.wordcloud.resources', 'mic.png'
)
mic_resource = resources.path("musicbrainzapi.wordcloud.resources", "mic.png")
with mic_resource as m:
mic_img = Image.open(m)
@@ -86,9 +84,14 @@ class LyricsWordcloud:
*args,
**kwargs,
) -> str:
"""Static method to generate a random grey colour."""
colour = f'hsl(0, 0%, {random.randint(60, 100)}%)'
return colour
"""Static method to return a random grey colour.
Returns:
str: A random grey colour in `hsl` form.
Lightness is drawn uniformly from 60% to 100%, so only light greys are returned.
"""
return f"hsl(0, 0%, {random.randint(60, 100)}%)"
def _get_lyrics_list(self) -> None:
"""Gets all words from lyrics in a single list + cleans them.
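As far as I can tell from the wordcloud library, `WordCloud.recolor` passes a `random_state` keyword (a `random.Random` instance when an integer seed is given) to the colour function. `generate_grey_colours` swallows it in `**kwargs` and calls the module-level `random.randint` instead, so the `random_state=3` passed to `recolor` when plotting probably does not make the colours reproducible. A sketch of a variant that uses the supplied generator when there is one; the name `grey_colour` is hypothetical.

```python
import random
from typing import Optional


def grey_colour(*args, random_state: Optional[random.Random] = None, **kwargs) -> str:
    """Return a light grey in HSL form, with lightness between 60% and 100%."""
    # Prefer the generator supplied by the caller (e.g. WordCloud.recolor);
    # otherwise fall back to the module-level random generator.
    rng = random_state if random_state is not None else random
    return f"hsl(0, 0%, {rng.randint(60, 100)}%)"


# Two generators with the same seed produce the same colour sequence.
rng_a, rng_b = random.Random(3), random.Random(3)
print([grey_colour(random_state=rng_a) for _ in range(3)]
      == [grey_colour(random_state=rng_b) for _ in range(3)])  # -> True
```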
@@ -101,12 +104,8 @@ class LyricsWordcloud:
for word in track:
for _ in range(1, word[1]):
cleaned = word[0]
cleaned = re.sub(
r'[\(\[].*?[\)\]]', ' ', cleaned
)
cleaned = re.sub(
r'[^a-zA-Z0-9\s]', '', cleaned
)
cleaned = re.sub(r"[\(\[].*?[\)\]]", " ", cleaned)
cleaned = re.sub(r"[^a-zA-Z0-9\s]", "", cleaned)
cleaned = cleaned.lower()
if cleaned in STOPWORDS:
continue
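The cleaning steps in `_get_lyrics_list` can be sketched on their own. This version assumes each `word` is a `(word, count)` pair, as the nested loop suggests, and uses a small hypothetical stop-word set in place of `wordcloud.STOPWORDS`. It cleans each word once and then repeats it `count` times; note that the original `range(1, word[1])` runs `count - 1` times, so under that assumption a word seen only once would never be added. Skipping words that clean down to whitespace (such as `[Chorus]`) is an extra choice made here.

```python
import re
from typing import Iterable, List, Tuple

# Hypothetical stand-in for wordcloud.STOPWORDS.
STOPWORDS = {"the", "a", "and", "i"}


def clean_word(word: str) -> str:
    """Apply the same three cleaning steps used in _get_lyrics_list."""
    word = re.sub(r"[\(\[].*?[\)\]]", " ", word)  # drop (...) and [...] asides
    word = re.sub(r"[^a-zA-Z0-9\s]", "", word)  # strip punctuation
    return word.lower()


def expand_counts(word_counts: Iterable[Tuple[str, int]]) -> List[str]:
    """Turn (word, count) pairs into a flat list of cleaned, non-stop words."""
    words: List[str] = []
    for raw, count in word_counts:
        cleaned = clean_word(raw)
        if not cleaned.strip() or cleaned in STOPWORDS:
            continue
        words.extend([cleaned] * count)  # each word appears exactly `count` times
    return words


print(expand_counts([("Hello,", 2), ("The", 3), ("[Chorus]", 1), ("world!", 1)]))
# -> ['hello', 'hello', 'world']
```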
@@ -129,11 +128,7 @@ class LyricsWordcloud:
"""Generates a word cloud
"""
self.wc = WordCloud(
max_words=150,
width=1500,
height=1500,
mask=self.char_mask,
random_state=1,
max_words=150, width=1500, height=1500, mask=self.char_mask, random_state=1,
).generate_from_frequencies(self.freq)
return self
@@ -141,12 +136,10 @@ class LyricsWordcloud:
"""Plots the wordcloud and sets matplotlib options.
"""
plt.imshow(
self.wc.recolor(
color_func=self.generate_grey_colours, random_state=3
),
interpolation='bilinear',
self.wc.recolor(color_func=self.generate_grey_colours, random_state=3),
interpolation="bilinear",
)
plt.axis('off')
plt.axis("off")
return self
# def show_word_cloud(self):

Binary file not shown.