adding mkdocs

2020-03-12 01:27:47 +00:00
parent 4e0091f80f
commit 21472e8100
25 changed files with 411 additions and 66 deletions

@@ -1,20 +0,0 @@
# Minimal makefile for Sphinx documentation
#
# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR = source
BUILDDIR = build
# Put it first so that "make" without argument is like "make help".
help:
@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
.PHONY: help Makefile
# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

1
docs/about.md Normal file
@@ -0,0 +1 @@
# Some about page

17
docs/index.md Normal file
@@ -0,0 +1,17 @@
# Welcome to MkDocs
For full documentation visit [mkdocs.org](https://www.mkdocs.org).
## Commands
* `mkdocs new [dir-name]` - Create a new project.
* `mkdocs serve` - Start the live-reloading docs server.
* `mkdocs build` - Build the documentation site.
* `mkdocs -h` - Print help message and exit.
## Project layout
    mkdocs.yml    # The configuration file.
    docs/
        index.md  # The documentation homepage.
        ...       # Other markdown pages, images and other files.

@@ -1,35 +0,0 @@
@ECHO OFF
pushd %~dp0
REM Command file for Sphinx documentation
if "%SPHINXBUILD%" == "" (
set SPHINXBUILD=sphinx-build
)
set SOURCEDIR=source
set BUILDDIR=build
if "%1" == "" goto help
%SPHINXBUILD% >NUL 2>NUL
if errorlevel 9009 (
echo.
echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
echo.installed, then set the SPHINXBUILD environment variable to point
echo.to the full path of the 'sphinx-build' executable. Alternatively you
echo.may add the Sphinx directory to PATH.
echo.
echo.If you don't have Sphinx installed, grab it from
echo.http://sphinx-doc.org/
exit /b 1
)
%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
goto end
:help
%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
:end
popd

20
docs/python.md Normal file
@@ -0,0 +1,20 @@
# Some python
## Some test python
!!! warning "Important warning"
    This isn't yet complete.
```python
def _generate_word_cloud(self) -> None:
    """Generates a word cloud
    """
    self.wc = WordCloud(
        max_words=150,
        width=1500,
        height=1500,
        mask=self.char_mask,
        random_state=1,
    ).generate_from_frequencies(self.freq)
    return self
```

@@ -1,48 +0,0 @@
***
CLI
***
As the CLI is provided by `Click`_ , you can pass the ``--help`` option to the base command, or any subcommands, to see information on usage and all available options.
.. _Click: https://click.palletsprojects.com/en/7.x/
Full options of the CLI are provided on this page.
.. important:: The ``--path`` option should be provided to the base command. This is so the path provided can be used in all subcommands.
Quickstart
==========
If you want to see everything the module offers run the following:
.. code-block:: bash
musicbrainzapi --path . lyrics -a "savage garden" -c gb --show-summary all --wordcloud --save-output
This will search for all tracks across all albums for the artist Savage Garden.
``--show-summary all`` will show descriptive statistics for both albums and years for this artist.
``--wordcloud`` will generate a wordcloud showing the most popular words across all lyrics.
``--save-output`` will save the module's output to disk as ``.json`` files.
Outputs
=======
The following files will be saved to disk:
- all_albums_lyrics_sum.json - Total number of words in a track for each album.
- year_statistics.json - Descriptive statistics by year.
- album_statistics.json - Descriptive statistics by album.
- all_albums_with_tracks.json - Track titles for each album.
- all_albums_with_lyrics.json - Lyrics for each track for each album.
- all_albums_lyrics_count.json - Shows a frequency count of each word in every track.
CLI Documentation
=================
.. click:: musicbrainzapi.cli.cli:cli
:prog: musicbrainzapi
:show-nested:

@@ -1,5 +0,0 @@
@import url("css/theme.css");
.highlight {
background: white !important
}

@@ -1 +0,0 @@
.. include:: ../../CHANGELOG.rst

@@ -1,126 +0,0 @@
***********************
Comments + Improvements
***********************
Python packages
===============
In this project we use the following Python packages:
+----------------+-------------------------------------------------------------------------+
| musicbrainzngs | This is a python wrapper around the Musicbrainz api.                    |
|                | This was used primarily to save time - the module handles all           |
|                | the endpoints and it provides checks to make sure variables             |
|                | passed are valid for the api. Behind the scenes it is using             |
|                | the requests library, and parsing the output into a python dict.        |
+----------------+-------------------------------------------------------------------------+
| addict         | The addict library gives us a Dict class. This is a personal preference |
|                | but I find the syntax easier to work with than standard python when     |
|                | dealing with many dictionaries. It is just a subclass of the default    |
|                | ``dict`` class.                                                         |
+----------------+-------------------------------------------------------------------------+
| numpy          | One of the best python libraries - it gives us easy access to quantiles |
|                | and other basic stats.                                                  |
+----------------+-------------------------------------------------------------------------+
| beautifultable | Prints nice tables to stdout. Useful for showing data with a CLI.       |
+----------------+-------------------------------------------------------------------------+
| wordcloud      | The best library (I've found) for generating wordclouds.                |
+----------------+-------------------------------------------------------------------------+
| click          | I personally prefer click over alternatives like Cleo. This is used     |
|                | to provide the framework for the CLI.                                   |
+----------------+-------------------------------------------------------------------------+
Caveats
=======
The lyrics.ovh api requires the artist to match exactly what it has on record - it will not do any parsing to try to look for similar matches. An example of this can be seen with the band "The All-American Rejects". Musicbrainz returns the band with the "-", but the lyrics.ovh api requires a space character instead.
A solution to this would be to filter the artist name if it contains any of these characters. But without thorough testing I did not implement this - as it could break other artists.
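A minimal sketch of such a filter is shown here. The function name and the set of characters to replace are assumptions for illustration only, and the behaviour would still need testing against real artist names before being relied on:

```python
# Hypothetical set of hyphen-like characters to replace with a space.
# Which characters actually cause mismatches is an assumption here.
_SEPARATORS = {"-", "\u2010", "\u2011", "\u2012", "\u2013"}


def normalise_artist(name: str) -> str:
    """Replace hyphen-like characters with spaces and collapse whitespace."""
    cleaned = "".join(" " if ch in _SEPARATORS else ch for ch in name)
    return " ".join(cleaned.split())


print(normalise_artist("The All-American Rejects"))
```

An artist whose name on record genuinely contains a hyphen would be broken by this, which is the risk described above.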
Improvements
============
Although fully functional (as far as I have tested), the module could be improved in several ways.
Testing
-------
Implementing a thorough test suite using ``pytest`` and ``coverage`` would be beneficial. Changes to the way the module parses data could be made with confidence if testing were implemented. As Musicbrainz publishes a schema for the data it returns, this could be used to implement tests to make sure the code is fully covered.
Code restructure
----------------
The :class:`musicbrainzapi.api.lyrics.concrete_builder.LyricsConcreteBuilder` class could be improved. Many of the methods defined in here no longer need to be present. Some of the functionality (url checking for example) could be removed and implemented in other ways (a Mixin is one solution).
If other ways of filtering were to be added (as opposed to the current default of just Albums) then this class would be useful to build our :class:`musicbrainzapi.api.lyrics.Lyrics` objects consistently.
Additional functionality to the lyrics command
-----------------------------------------------
The command could be improved in a few ways:
Different aggregations
^^^^^^^^^^^^^^^^^^^^^^
The ability for the user to specify something other than album or year to group by. For artists with large libraries, it might be useful to see results aggregated by other types of releases.
Multiple artists
^^^^^^^^^^^^^^^^
Searching for multiple artists and comparing them is certainly possible in the current iteration (click provides a nice way to accept multiple artists and then we create our ``Lyrics`` objects from these), but this wasn't implemented. There are rate limiting factors which may slow down the program and in the current implementation it could increase runtime considerably.
Speed improvements
-------------------
The musicbrainz api isn't too slow, however, the lyrics.ovh api can be.
One solution would be to implement threading - as we are waiting on HTTP requests this suggests threading could be a good candidate. An alternative to threading (if we are dealing with many requests) could be asyncio.
This wasn't implemented primarily because of time - but threading could be implemented on each call we make to the API.
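A rough sketch of the threading approach, using only the standard library, might look like the following. ``fetch_lyrics`` is a hypothetical stand-in for a single HTTP call to the lyrics API and is not a function from this module; the worker count is also an assumption:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Optional


def fetch_lyrics(track: str) -> Optional[str]:
    """Hypothetical stand-in for one HTTP call to the lyrics API."""
    # A real implementation would issue the request here and return the
    # lyrics text, or None when the API has no match.
    return f"lyrics for {track}"


def fetch_all(tracks: List[str], max_workers: int = 8) -> Dict[str, Optional[str]]:
    """Fetch lyrics for every track concurrently, keyed by track title."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the input tracks.
        results = list(pool.map(fetch_lyrics, tracks))
    return dict(zip(tracks, results))
```

Because each worker spends most of its time waiting on the network, a small thread pool is usually enough; any API rate limit still applies.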
An alternative, and I believe an interesting solution, would be to use AWS Lambda (serverless).
There is a caveat to this solution and it is cost - threading is free but adds development time and increases complexity. AWS isn't free but allows you to scale the requests out.
A solution would be to use a module like `Zappa`_. I have used this module before and it is a great tool to create lambda functions quickly.
If more control was needed one solution could be:
- Generate UUID of the current instance
- For each request to the API, dispatch a lambda function (using ``boto3``) which will run against the api. This function should take the UUID from before.
- Once finished either
+ Save results in DynamoDB with the UUID
+ Send results to SQS/SNS (not desirable, the lyrics size could be large)
- As soon as the lambdas have been dispatched, the script could either poll from a queue, or read the events queue of the DynamoDB to retrieve the results. Processing the lyrics could then begin.
This requires the user to have an internet connection - which is a current requirement. Requests to the api could be made simultaneously - without adding the complexity that comes with threading. This would not solve any API rate limiting - we are required to provide an application user_agent to the api to identify the app.
An interesting solution, and one I did consider, was to have the program run entirely in lambda, requiring no dependencies and just a simple front end that sends requests, and uses ``boto3`` to retrieve the results. The simplicity of this, and the fact that AWS provide an SDK for many languages, means the client code could run in any language.
An interface to AWS API Gateway would provide the entry point to the lambda.
Writing it in this manner (with an api backend) would mean a webapp of the program could be possible, with the frontend served with something like ``Vuejs`` or ``React``.
.. _Zappa: https://github.com/Miserlou/Zappa
Error catching
--------------
Handling missing data from both APIs is done with error catching (namely ``ValueError`` and ``TypeError``).
Although inelegant, and not guaranteed to capture the specific behaviour we want to catch (missing data etc.) it is a solution and appears to work quite well.
Musicbrainz provides a schema for their api. If this were to be placed in a production environment then readdressing this should be a priority - we should be checking the values returned, using the schema as a guide, and replacing missing values accordingly. We should not rely on ``try except`` blocks to do this as it can be unreliable and is prone to raise other errors.
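As an illustration of checking values explicitly rather than relying on ``try except``, here is a minimal sketch. The field names and defaults are hypothetical; in practice they would be taken from the published Musicbrainz schema:

```python
from typing import Any, Dict

# Hypothetical subset of fields and defaults; the real field names would
# come from the published Musicbrainz schema.
EXPECTED_FIELDS: Dict[str, Any] = {"title": "", "date": None, "track-count": 0}


def fill_missing(release: Dict[str, Any]) -> Dict[str, Any]:
    """Return only the expected fields, with defaults for bad or missing values."""
    cleaned = {}
    for field, default in EXPECTED_FIELDS.items():
        value = release.get(field, default)
        # Only type-check fields whose default carries a concrete type.
        if default is not None and not isinstance(value, type(default)):
            value = default
        cleaned[field] = value
    return cleaned
```

With this approach a missing or malformed value is replaced deliberately, and an unexpected ``ValueError`` or ``TypeError`` elsewhere is no longer silently swallowed.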
Further statistical analysis
----------------------------
Standard descriptive statistics are provided. I did consider including a deeper analysis but opted not to for several reasons:
- Without a specific problem or question to answer - explorative work can take a lot of time and may not yield satisfactory results. Questions I did consider are:
+ `For active artists, based on their previous lyrics count what is the prediction of their next album?` Although a sensible question I'm not sure how useful the prediction would be - I am sure some artists would follow a pattern over time, but I'm not convinced all artists would and I imagine the results would be mixed.
+ `Anomaly detection - for artists with large releases, what albums stood out as larger than usual and what feature (or track) caused this anomaly?` - This would be a good question to answer and we have many tools available. As we have numeric data - clustering could be a candidate (DBSCAN or even k-means). I opted not to because of time and the fact it would bloat the requirements. Feature flags are an option when handling extra packages, ``pip install musicbrainzapi[analysis]`` for example, but nonetheless this would be an interesting question to answer and I believe one of the easier ones to implement if it was desired.
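To make the anomaly detection question concrete without adding dependencies, here is a small sketch. Note that it is not DBSCAN or k-means: it swaps in a simpler interquartile range rule from the standard library, and the function name, the input shape (album title mapped to total word count) and the threshold ``k`` are all assumptions:

```python
from statistics import quantiles
from typing import Dict, List


def unusually_long_albums(word_counts: Dict[str, int], k: float = 1.5) -> List[str]:
    """Return albums whose word count exceeds Q3 + k * IQR."""
    # "inclusive" treats the albums as the whole population for this artist.
    q1, _, q3 = quantiles(list(word_counts.values()), n=4, method="inclusive")
    upper = q3 + k * (q3 - q1)
    return [album for album, count in word_counts.items() if count > upper]
```

This only flags albums that are larger than usual; finding which track caused the anomaly would need the per-track counts as well.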

@@ -1,100 +0,0 @@
# Configuration file for the Sphinx documentation builder.
#
# This file only contains a selection of the most common options. For a full
# list see the documentation:
# https://www.sphinx-doc.org/en/master/usage/configuration.html
# -- Path setup --------------------------------------------------------------
# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
#
# import os
# import sys
# sys.path.insert(0, os.path.abspath('.'))
import musicbrainzapi
from musicbrainzapi.__version__ import __version__
import sphinx_rtd_theme
import sphinx_click
# -- Project information -----------------------------------------------------
project = 'musicbrainzapi'
copyright = '2020, Daniel Tomlinson'
author = 'Daniel Tomlinson'
# The full version, including alpha/beta/rc tags
release = __version__
# -- General configuration ---------------------------------------------------
# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = [
'sphinx.ext.autodoc',
'sphinx.ext.viewcode',
'sphinx.ext.napoleon',
'sphinx.ext.todo',
'sphinx_click.ext',
'sphinx.ext.intersphinx',
'sphinx.ext.autosectionlabel'
]
# -- Napoleon Settings -----------------------------------------------------
napoleon_google_docstring = False
napoleon_numpy_docstring = True
napoleon_include_init_with_doc = True
napoleon_include_private_with_doc = True
napoleon_include_special_with_doc = False
napoleon_use_admonition_for_examples = False
napoleon_use_admonition_for_notes = False
napoleon_use_admonition_for_references = False
napoleon_use_ivar = True
napoleon_use_param = True
napoleon_use_rtype = True
napoleon_use_keyword = True
autodoc_member_order = 'bysource'
# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']
# The master toctree document.
master_doc = 'index'
# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# This pattern also affects html_static_path and html_extra_path.
exclude_patterns = []
# -- Options for HTML output -------------------------------------------------
# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
#
html_theme = "sphinx_rtd_theme"
# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_theme = "sphinx_rtd_theme"
html_theme_path = [sphinx_rtd_theme.get_html_theme_path()]
html_static_path = ['_static']
html_context = {'css_files': ['_static/custom.css']}
html_theme_options = {
'collapse_navigation': True,
'display_version': True,
'prev_next_buttons_location': 'both',
'navigation_depth': -1,
#'navigation_depth': 3,
}
# The name of the Pygments (syntax highlighting) style to use.
pygments_style = 'sphinx'
# Enable todo
todo_include_todos = True

@@ -1,5 +0,0 @@
.. role:: modname
:class: modname
.. role:: title
:class: title

@@ -1,30 +0,0 @@
*****************
Table of Contents
*****************
.. toctree::
:maxdepth: 2
:caption: Contents
introduction
CLI
comments
changelog
.. toctree::
:caption: API
:maxdepth: 2
modules/modules
.. toctree::
:caption: Table of Contents
self
Indices and tables
==================
* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`

@@ -1 +0,0 @@
.. include:: ../../README.rst

@@ -1,7 +0,0 @@
musicbrainzapi
--------------
.. toctree::
:maxdepth: 3
musicbrainzapi

@@ -1,38 +0,0 @@
musicbrainzapi.api.lyrics package
=================================
.. automodule:: musicbrainzapi.api.lyrics
:members:
:undoc-members:
:show-inheritance:
:private-members:
Submodules
----------
musicbrainzapi.api.lyrics.builder module
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. automodule:: musicbrainzapi.api.lyrics.builder
:members:
:undoc-members:
:show-inheritance:
:private-members:
musicbrainzapi.api.lyrics.concrete_builder module
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. automodule:: musicbrainzapi.api.lyrics.concrete_builder
:members:
:undoc-members:
:show-inheritance:
:private-members:
musicbrainzapi.api.lyrics.director module
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. automodule:: musicbrainzapi.api.lyrics.director
:members:
:undoc-members:
:show-inheritance:
:private-members:

@@ -1,28 +0,0 @@
musicbrainzapi.api package
===========================
.. automodule:: musicbrainzapi.api
:members:
:undoc-members:
:show-inheritance:
:private-members:
Subpackages
-----------
.. toctree::
:maxdepth: 1
musicbrainzapi.api.lyrics
Submodules
----------
musicbrainzapi.api.authenticate module
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. automodule:: musicbrainzapi.api.authenticate
:members:
:undoc-members:
:show-inheritance:
:private-members:

@@ -1,19 +0,0 @@
musicbrainzapi
===============
.. automodule:: musicbrainzapi
:members:
:undoc-members:
:show-inheritance:
:private-members:
Subpackages
-----------
.. toctree::
:maxdepth: 1
musicbrainzapi.api
musicbrainzapi.wordcloud

@@ -1,9 +0,0 @@
********************************
musicbrainzapi.wordcloud package
********************************
.. automodule:: musicbrainzapi.wordcloud
:members:
:undoc-members:
:show-inheritance:
:private-members:

83
docs/test.md Normal file
@@ -0,0 +1,83 @@
# musicbrainzapi.wordcloud package
Wordcloud from lyrics.
### class musicbrainzapi.wordcloud.LyricsWordcloud(pillow_img: PIL.PngImagePlugin.PngImageFile, all_albums_lyrics_count: Lyrics.all_albums_lyrics_count)
Bases: `object`
Create a word cloud from Lyrics.
* **Variables**
    * **all_albums_lyrics_count** (*list*) - List of all albums + track lyrics counted by each word
    * **char_mask** (*np.array*) - numpy array containing data for the word cloud image
    * **freq** (*collections.Counter*) - Counter object containing counts for all words across all tracks
    * **lyrics_list** (*list*) - List of all words from all lyrics across all tracks.
    * **pillow_img** (*PIL.PngImagePlugin.PngImageFile*) - pillow image of the word cloud base
    * **wc** (*wordcloud.WordCloud*) - WordCloud object
#### \__init__(pillow_img: PIL.PngImagePlugin.PngImageFile, all_albums_lyrics_count: Lyrics.all_albums_lyrics_count)
Create a wordcloud object.
* **Parameters**
    * **pillow_img** (*PIL.PngImagePlugin.PngImageFile*) - pillow image of the word cloud base
    * **all_albums_lyrics_count** (*Lyrics.all_albums_lyrics_count*) - List of all albums + track lyrics counted by each word
#### classmethod use_microphone(all_albums_lyrics_count: Lyrics.all_albums_lyrics_count)
Class method to instantiate with a microphone base image.
* **Parameters**
    **all_albums_lyrics_count** (*Lyrics.all_albums_lyrics_count*) - List of all albums + track lyrics counted by each word
#### static generate_grey_colours(\*args, \*\*kwargs)
Static method to generate a random grey colour.
#### _get_lyrics_list()
Gets all words from lyrics in a single list + cleans them.
#### _get_frequencies()
Get frequencies of words from a list.
#### _get_char_mask()
Gets a numpy array for the image file.
#### _generate_word_cloud()
Generates a word cloud
#### _generate_plot()
Plots the wordcloud and sets matplotlib options.
#### create_word_cloud()
Creates a word cloud

5
docs/testdoc.md Normal file
@@ -0,0 +1,5 @@
# Test documentation using mkdocstrings
## Reference
::: musicbrainzapi.api.lyrics.builder