Merge branch 'master' into develop
@@ -17,6 +17,8 @@ Summary
|
||||
|
||||
Musicbrainzapi is a Python module with a CLI that allows you to search for an artist and receive summary statistics on lyrics across all albums + tracks.
|
||||
|
||||
The module can also generate and display a wordcloud from the lyrics.
|
||||
|
||||
In addition to basic statistics the module further allows you to save details of an artist. You can save album information, the lyrics themselves and track lists.
|
||||
|
||||
The module (currently) provides a simple CLI with some underlying assumptions:
|
||||
|
||||
101
docs/source/comments.rst
Normal file
@@ -0,0 +1,101 @@
|
||||
***********************
Comments + Improvements
***********************
|
||||
|
||||
Python packages
===============
|
||||
|
||||
In this project we use the following Python packages:
|
||||
|
||||
+----------------+-------------------------------------------------------------------------+
| musicbrainzngs | This is a Python wrapper around the Musicbrainz api.                    |
|                | This was used primarily to save time - the module handles all           |
|                | the endpoints and it provides checks to make sure variables             |
|                | passed are valid for the api. Behind the scenes it makes the            |
|                | HTTP requests and parses the output into a Python dict.                 |
+----------------+-------------------------------------------------------------------------+
| addict         | The addict library gives us a Dict class. This is a personal preference |
|                | but I find the syntax easier to work with than standard Python when     |
|                | dealing with many dictionaries. It is just a subclass of the default    |
|                | ``dict`` class.                                                         |
+----------------+-------------------------------------------------------------------------+
| numpy          | One of the best Python libraries - it gives us easy access to quantiles |
|                | and other basic stats.                                                  |
+----------------+-------------------------------------------------------------------------+
| beautifultable | Prints nice tables to stdout. Useful for showing data with a CLI.       |
+----------------+-------------------------------------------------------------------------+
| wordcloud      | The best library (I've found) for generating wordclouds.                |
+----------------+-------------------------------------------------------------------------+
| click          | I personally prefer click over alternatives like Cleo. This is used     |
|                | to provide the framework for the CLI.                                   |
+----------------+-------------------------------------------------------------------------+
|
||||
|
||||
Caveats
=======

The lyrics.ovh api requires the artist name to match exactly what it has on record - it will not do any parsing to try to look for similar matches. An example of this can be seen with the band "The All‐American Rejects". Musicbrainz returns the band name with the hyphen, but the lyrics.ovh api requires a space character instead.

One solution would be to replace such characters in the artist name before querying lyrics.ovh. I did not implement this without thorough testing, as it could break lookups for other artists.
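As a rough illustration of the filtering idea, the sketch below replaces every Unicode dash character in an artist name with a space. The function name ``normalise_artist_name`` is hypothetical and not part of the module, and whether lyrics.ovh would accept the result for every artist is an untested assumption.

```python
import unicodedata


def normalise_artist_name(name: str) -> str:
    """Hypothetical helper: swap dash punctuation for spaces.

    Every character in the Unicode category "Pd" (dash punctuation),
    which covers both "-" and the "‐" returned by Musicbrainz, becomes a
    space. Runs of whitespace are then collapsed into a single space.
    """
    replaced = "".join(
        " " if unicodedata.category(char) == "Pd" else char for char in name
    )
    return " ".join(replaced.split())


print(normalise_artist_name("The All\u2010American Rejects"))
print(normalise_artist_name("Jay-Z"))
```

The second call shows the risk mentioned above: a name such as "Jay-Z" is rewritten as well, which may or may not match what lyrics.ovh has on record.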
|
||||
|
||||
|
||||
Improvements
============

Although the module is fully functional (as far as I have tested), it could be improved in several ways.
|
||||
|
||||
Testing
-------

Implementing a thorough test suite using ``pytest`` and ``coverage`` would be beneficial. Changes to the way the module parses data could be made with confidence if testing were implemented. As Musicbrainz publishes a schema for the data it returns, this could be used to implement tests to make sure the code is fully covered.
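A minimal sketch of what such tests might look like is given below. The ``strip_punctuation`` function here is a hypothetical stand-in written only for this example; the module's real method may have a different signature and behaviour.

```python
import string


def strip_punctuation(text: str) -> str:
    """Hypothetical stand-in: remove ASCII punctuation from a lyric string."""
    return text.translate(str.maketrans("", "", string.punctuation))


def test_strip_punctuation_removes_ascii_punctuation():
    assert strip_punctuation("Hello, world!") == "Hello world"


def test_strip_punctuation_keeps_plain_words():
    assert strip_punctuation("swing swing") == "swing swing"
```

``pytest`` collects functions whose names start with ``test_``, so running ``coverage run -m pytest`` followed by ``coverage report`` would execute them and report which lines were covered.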
|
||||
|
||||
Code restructure
----------------

The :class:`musicbrainzapi.api.lyrics.concrete_builder.LyricsConcreteBuilder` class could be improved. Many of the methods defined here no longer need to be present. Some of the functionality (url checking for example) could be removed and implemented in other ways (a Mixin is one solution).

If other ways of filtering were to be added (as opposed to the current default of just Albums) this class would be useful in constructing our :class:`musicbrainzapi.api.lyrics.Lyrics` objects consistently.
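To make the Mixin idea concrete, here is a small sketch. Both ``UrlCheckMixin`` and ``ExampleLyricsBuilder`` are hypothetical names, and the check itself (an http/https scheme plus a host) is an assumption about what "url checking" would involve.

```python
from urllib.parse import urlparse


class UrlCheckMixin:
    """Hypothetical mixin: URL validation that any builder can inherit."""

    @staticmethod
    def is_valid_url(url: str) -> bool:
        # Accept only absolute http(s) URLs that include a host.
        parsed = urlparse(url)
        return parsed.scheme in ("http", "https") and bool(parsed.netloc)


class ExampleLyricsBuilder(UrlCheckMixin):
    """Hypothetical builder standing in for a concrete Lyrics builder."""

    def __init__(self, urls):
        self.urls = list(urls)

    def valid_urls(self):
        # The builder reuses the mixin's check instead of defining its own.
        return [url for url in self.urls if self.is_valid_url(url)]


builder = ExampleLyricsBuilder(["https://example.com/lyrics", "not a url"])
print(builder.valid_urls())
```

Keeping the check in a mixin means the abstract builder no longer has to declare it, and other classes can pick it up without inheriting the whole builder interface.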
|
||||
|
||||
Additional functionality to the lyrics command
-----------------------------------------------

The command could be improved in a few ways:
|
||||
|
||||
Different aggregations
^^^^^^^^^^^^^^^^^^^^^^

The user could be given the ability to group by something other than album or year. For artists with large libraries, it might be useful to see results aggregated by other types of releases.
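One way to support arbitrary groupings is sketched below. The track records and their field names (``release_type``, ``word_count``) are invented for illustration and do not reflect how the module stores its data.

```python
from collections import defaultdict
from statistics import mean


def average_words_by(tracks, key):
    """Average the word count of tracks grouped by an arbitrary field."""
    groups = defaultdict(list)
    for track in tracks:
        groups[track[key]].append(track["word_count"])
    return {group: mean(counts) for group, counts in groups.items()}


# Hypothetical track records, used only for this example.
tracks = [
    {"release_type": "Album", "year": 2002, "word_count": 100},
    {"release_type": "Album", "year": 2005, "word_count": 200},
    {"release_type": "Single", "year": 2005, "word_count": 50},
]
print(average_words_by(tracks, "release_type"))
print(average_words_by(tracks, "year"))
```

Because the grouping field is just a parameter, the same function serves album, year, or release type without a dedicated method for each.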
|
||||
|
||||
Multiple artists
^^^^^^^^^^^^^^^^

Searching for multiple artists and comparing them is certainly possible in the current iteration (click provides a nice way to accept multiple artists, and we could then create our ``Lyrics`` objects from these), but this wasn't implemented. API rate limits may slow the program down and increase runtime considerably.
|
||||
|
||||
Speed improvements
-------------------

The Musicbrainz api isn't too slow; the lyrics.ovh api, however, can be.

One solution would be to implement threading: the program spends most of its time waiting on HTTP requests, which makes it a good candidate for threads. An alternative to threading (if we are dealing with many requests) could be ``asyncio``.

This wasn't implemented primarily because of time constraints, but threading could be applied to each call we make to the API.
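As a sketch of the threading approach, the example below uses the standard library's ``concurrent.futures.ThreadPoolExecutor``. The ``fetch_lyrics`` function is a hypothetical stand-in that sleeps instead of making a real HTTP request, and the worker count of 8 is an arbitrary assumption.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_lyrics(track: str) -> str:
    """Hypothetical stand-in for one blocking HTTP call to the lyrics api."""
    time.sleep(0.05)  # simulate time spent waiting on the network
    return f"lyrics for {track}"


def fetch_all_lyrics(tracks, max_workers=8):
    """Fetch lyrics for every track concurrently, keeping input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the input tracks.
        return dict(zip(tracks, pool.map(fetch_lyrics, tracks)))


print(fetch_all_lyrics(["Swing, Swing", "Move Along"]))
```

With real requests the waits overlap rather than add up, so the total time approaches that of the slowest batch of calls. Any rate limit on the API still applies.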
|
||||
|
||||
An alternative, and I believe an interesting solution, would be to use AWS Lambda (serverless).

The caveat to this solution is cost: threading is free but adds development time and increases complexity, whereas AWS isn't free but allows you to scale the requests out.

One option would be to use a module like `Zappa`_. I have used this module before and it is a great tool to create lambda functions quickly.
|
||||
|
||||
If more control were needed, one solution could be:

- Generate a UUID for the current instance.
- For each request to the API, dispatch a lambda function (using ``boto3``) which will run against the api. This function should take the UUID from before.
- Once finished, either:

  + Save the results in DynamoDB with the UUID
  + Send the results to SQS/SNS (not desirable, as the lyrics could be large)

- As soon as the lambdas have been dispatched, the script could either poll a queue or read the DynamoDB table's stream of events to retrieve the results. Processing the lyrics could then begin.
|
||||
|
||||
This requires the user to have an internet connection, which is already a requirement. Requests to the api could be made simultaneously without adding the complexity that comes with threading. It would not get around any API rate limiting, however: we are required to provide an application user_agent to the api to identify the app.
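The dispatch step described above could be sketched roughly as follows. The function name ``dispatch_lyrics_requests``, the Lambda function name ``fetch-lyrics`` and the payload fields are all hypothetical. The client is passed in as a parameter, so the sketch itself does not import ``boto3``; in practice it would be the object returned by ``boto3.client("lambda")``.

```python
import json
import uuid


def dispatch_lyrics_requests(lambda_client, tracks, function_name="fetch-lyrics"):
    """Fire one asynchronous Lambda invocation per track.

    Returns the UUID that tags every invocation, so the results can be
    looked up later (for example in DynamoDB).
    """
    run_id = str(uuid.uuid4())
    for track in tracks:
        lambda_client.invoke(
            FunctionName=function_name,
            InvocationType="Event",  # asynchronous: do not wait for a result
            Payload=json.dumps({"run_id": run_id, "track": track}).encode("utf-8"),
        )
    return run_id
```

The deployed function would write its result under the same ``run_id``, and the script would then collect everything stored for that UUID before processing the lyrics.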
|
||||
|
||||
.. _Zappa: https://github.com/Miserlou/Zappa
|
||||
@@ -8,6 +8,7 @@ Table of Contents
|
||||
|
||||
introduction
|
||||
CLI
|
||||
comments
|
||||
changelog
|
||||
|
||||
.. toctree::
|
||||
|
||||
40
poetry.lock
generated
@@ -268,14 +268,6 @@ optional = false
|
||||
python-versions = ">=3.5"
|
||||
version = "8.2.0"
|
||||
|
||||
[[package]]
|
||||
category = "main"
|
||||
description = "multidict implementation"
|
||||
name = "multidict"
|
||||
optional = false
|
||||
python-versions = ">=3.5"
|
||||
version = "4.7.5"
|
||||
|
||||
[[package]]
|
||||
category = "main"
|
||||
description = "Python bindings for the MusicBrainz NGS and the Cover Art Archive webservices"
|
||||
@@ -355,14 +347,6 @@ version = ">=0.12"
|
||||
[package.extras]
|
||||
dev = ["pre-commit", "tox"]
|
||||
|
||||
[[package]]
|
||||
category = "main"
|
||||
description = "Easy to use progress bars"
|
||||
name = "progress"
|
||||
optional = false
|
||||
python-versions = "*"
|
||||
version = "1.5"
|
||||
|
||||
[[package]]
|
||||
category = "dev"
|
||||
description = "A full-screen, console-based Python debugger"
|
||||
@@ -795,7 +779,7 @@ docs = ["sphinx", "jaraco.packaging (>=3.2)", "rst.linker (>=1.9)"]
|
||||
testing = ["jaraco.itertools", "func-timeout"]
|
||||
|
||||
[metadata]
|
||||
content-hash = "6731dc2e0e02c3693160c1be0ffa007ff49972bf424aa6968a14178f9ef39e9a"
|
||||
content-hash = "78a3288551032af1e115b4920ee345cb1a4fcbfcca3c7caca6bd6f7935ac3876"
|
||||
python-versions = "^3.7"
|
||||
|
||||
[metadata.files]
|
||||
@@ -1009,25 +993,6 @@ more-itertools = [
|
||||
{file = "more-itertools-8.2.0.tar.gz", hash = "sha256:b1ddb932186d8a6ac451e1d95844b382f55e12686d51ca0c68b6f61f2ab7a507"},
|
||||
{file = "more_itertools-8.2.0-py3-none-any.whl", hash = "sha256:5dd8bcf33e5f9513ffa06d5ad33d78f31e1931ac9a18f33d37e77a180d393a7c"},
|
||||
]
|
||||
multidict = [
|
||||
{file = "multidict-4.7.5-cp35-cp35m-macosx_10_13_x86_64.whl", hash = "sha256:fc3b4adc2ee8474cb3cd2a155305d5f8eda0a9c91320f83e55748e1fcb68f8e3"},
|
||||
{file = "multidict-4.7.5-cp35-cp35m-manylinux1_x86_64.whl", hash = "sha256:42f56542166040b4474c0c608ed051732033cd821126493cf25b6c276df7dd35"},
|
||||
{file = "multidict-4.7.5-cp35-cp35m-win32.whl", hash = "sha256:7774e9f6c9af3f12f296131453f7b81dabb7ebdb948483362f5afcaac8a826f1"},
|
||||
{file = "multidict-4.7.5-cp35-cp35m-win_amd64.whl", hash = "sha256:c2c37185fb0af79d5c117b8d2764f4321eeb12ba8c141a95d0aa8c2c1d0a11dd"},
|
||||
{file = "multidict-4.7.5-cp36-cp36m-macosx_10_13_x86_64.whl", hash = "sha256:e439c9a10a95cb32abd708bb8be83b2134fa93790a4fb0535ca36db3dda94d20"},
|
||||
{file = "multidict-4.7.5-cp36-cp36m-manylinux1_x86_64.whl", hash = "sha256:85cb26c38c96f76b7ff38b86c9d560dea10cf3459bb5f4caf72fc1bb932c7136"},
|
||||
{file = "multidict-4.7.5-cp36-cp36m-win32.whl", hash = "sha256:620b37c3fea181dab09267cd5a84b0f23fa043beb8bc50d8474dd9694de1fa6e"},
|
||||
{file = "multidict-4.7.5-cp36-cp36m-win_amd64.whl", hash = "sha256:6e6fef114741c4d7ca46da8449038ec8b1e880bbe68674c01ceeb1ac8a648e78"},
|
||||
{file = "multidict-4.7.5-cp37-cp37m-macosx_10_13_x86_64.whl", hash = "sha256:a326f4240123a2ac66bb163eeba99578e9d63a8654a59f4688a79198f9aa10f8"},
|
||||
{file = "multidict-4.7.5-cp37-cp37m-manylinux1_x86_64.whl", hash = "sha256:dc561313279f9d05a3d0ffa89cd15ae477528ea37aa9795c4654588a3287a9ab"},
|
||||
{file = "multidict-4.7.5-cp37-cp37m-win32.whl", hash = "sha256:4b7df040fb5fe826d689204f9b544af469593fb3ff3a069a6ad3409f742f5928"},
|
||||
{file = "multidict-4.7.5-cp37-cp37m-win_amd64.whl", hash = "sha256:317f96bc0950d249e96d8d29ab556d01dd38888fbe68324f46fd834b430169f1"},
|
||||
{file = "multidict-4.7.5-cp38-cp38-macosx_10_13_x86_64.whl", hash = "sha256:b51249fdd2923739cd3efc95a3d6c363b67bbf779208e9f37fd5e68540d1a4d4"},
|
||||
{file = "multidict-4.7.5-cp38-cp38-manylinux1_x86_64.whl", hash = "sha256:ae402f43604e3b2bc41e8ea8b8526c7fa7139ed76b0d64fc48e28125925275b2"},
|
||||
{file = "multidict-4.7.5-cp38-cp38-win32.whl", hash = "sha256:bb519becc46275c594410c6c28a8a0adc66fe24fef154a9addea54c1adb006f5"},
|
||||
{file = "multidict-4.7.5-cp38-cp38-win_amd64.whl", hash = "sha256:544fae9261232a97102e27a926019100a9db75bec7b37feedd74b3aa82f29969"},
|
||||
{file = "multidict-4.7.5.tar.gz", hash = "sha256:aee283c49601fa4c13adc64c09c978838a7e812f85377ae130a24d7198c0331e"},
|
||||
]
|
||||
musicbrainzngs = [
|
||||
{file = "musicbrainzngs-0.7.1-py2.py3-none-any.whl", hash = "sha256:e841a8f975104c0a72290b09f59326050194081a5ae62ee512f41915090e1a10"},
|
||||
{file = "musicbrainzngs-0.7.1.tar.gz", hash = "sha256:ab1c0100fd0b305852e65f2ed4113c6de12e68afd55186987b8ed97e0f98e627"},
|
||||
@@ -1099,9 +1064,6 @@ pluggy = [
|
||||
{file = "pluggy-0.13.1-py2.py3-none-any.whl", hash = "sha256:966c145cd83c96502c3c3868f50408687b38434af77734af1e9ca461a4081d2d"},
|
||||
{file = "pluggy-0.13.1.tar.gz", hash = "sha256:15b2acde666561e1298d71b523007ed7364de07029219b604cf808bfa1c765b0"},
|
||||
]
|
||||
progress = [
|
||||
{file = "progress-1.5.tar.gz", hash = "sha256:69ecedd1d1bbe71bf6313d88d1e6c4d2957b7f1d4f71312c211257f7dae64372"},
|
||||
]
|
||||
pudb = [
|
||||
{file = "pudb-2019.2.tar.gz", hash = "sha256:e8f0ea01b134d802872184b05bffc82af29a1eb2f9374a277434b932d68f58dc"},
|
||||
]
|
||||
|
||||
@@ -13,11 +13,9 @@ python = "^3.7"
|
||||
requests = "^2.23.0"
|
||||
musicbrainzngs = "^0.7.1"
|
||||
addict = "^2.2.1"
|
||||
progress = "^1.5"
|
||||
numpy = "^1.18.1"
|
||||
beautifultable = "^0.8.0"
|
||||
wordcloud = "^1.6.0"
|
||||
multidict = "^4.7.5"
|
||||
click = "^7.0"
|
||||
|
||||
[tool.poetry.dev-dependencies]
|
||||
|
||||
@@ -2,8 +2,6 @@ from __future__ import annotations
|
||||
from abc import ABC, abstractstaticmethod, abstractmethod
|
||||
from typing import Union
|
||||
|
||||
from musicbrainzapi.api import authenticate
|
||||
|
||||
|
||||
class LyricsConcreteBuilder(ABC):
|
||||
"""Abstract concrete builder for Lyrics
|
||||
@@ -45,8 +43,24 @@ class LyricsConcreteBuilder(ABC):
|
||||
pass
|
||||
|
||||
@abstractstaticmethod
|
||||
def set_useragent():
|
||||
authenticate.set_useragent()
|
||||
def set_useragent() -> None:
|
||||
pass
|
||||
|
||||
@abstractstaticmethod
|
||||
def construct_lyrics_url() -> None:
|
||||
pass
|
||||
|
||||
@abstractstaticmethod
|
||||
def request_lyrics_from_url() -> None:
|
||||
pass
|
||||
|
||||
@abstractstaticmethod
|
||||
def strip_punctuation() -> None:
|
||||
pass
|
||||
|
||||
@abstractstaticmethod
|
||||
def get_descriptive_statistics() -> None:
|
||||
pass
|
||||
|
||||
@abstractmethod
|
||||
def __init__(self) -> None:
|
||||
@@ -79,3 +93,27 @@ class LyricsConcreteBuilder(ABC):
|
||||
@abstractmethod
|
||||
def find_all_tracks(self) -> None:
|
||||
pass
|
||||
|
||||
@abstractmethod
|
||||
def find_lyrics_urls(self) -> None:
|
||||
pass
|
||||
|
||||
@abstractmethod
|
||||
def find_all_lyrics(self) -> None:
|
||||
pass
|
||||
|
||||
@abstractmethod
|
||||
def count_words_in_lyrics(self) -> None:
|
||||
pass
|
||||
|
||||
@abstractmethod
|
||||
def calculate_track_totals(self) -> None:
|
||||
pass
|
||||
|
||||
@abstractmethod
|
||||
def calculate_final_average_by_album(self) -> None:
|
||||
pass
|
||||
|
||||
@abstractmethod
|
||||
def calculate_final_average_by_year(self) -> None:
|
||||
pass
|
||||
|
||||