1. Resources and references#

I’ve put together some resources and references using Python (but keep in mind, R is another popular route into data science).

1.1. Some Python packages#

First, I list some indispensable Python libraries used in data science. Beyond core Python, you should start getting familiar with a few other tools:

  • Python’s classics:

    • NumPy – numerical computing and array manipulation.

    • SciPy – scientific computing and statistics.

    • Matplotlib – basic plotting library.

  • Data Manipulation:

    • Pandas – data structures and analysis tools.

  • Statistical Analysis:

    • statsmodels – estimation of statistical models, statistical tests, and data exploration.

  • Machine Learning:

    • Scikit-learn – machine learning library built on NumPy, SciPy, and Matplotlib.

  • Natural Language Processing (NLP):

    • NLTK – platform for working with human language data.

    • spaCy – industrial-strength library for NLP tasks.

    • Gensim – topic modeling library.

  • Data Visualization:

    • Seaborn – statistical data visualization.

    • Plotly – interactive graphing library.

  • Web Scraping:

    • Beautiful Soup – parsing data out of HTML and XML documents.

    • Scrapy – framework for crawling websites and extracting structured data.

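To give a feel for how the classics fit together, here is a minimal, self-contained sketch using NumPy, SciPy, and pandas on synthetic data (the group names and parameters are illustrative, not from a real dataset):

```python
# A minimal tour of the core stack: NumPy for arrays, SciPy for statistics,
# pandas for labeled tabular data. All values are synthetic.
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(seed=0)

# NumPy: vectorized generation of two simulated groups
control = rng.normal(loc=10.0, scale=2.0, size=100)
treated = rng.normal(loc=11.0, scale=2.0, size=100)

# SciPy: a two-sample t-test comparing the group means
t_stat, p_value = stats.ttest_ind(control, treated)

# pandas: organize the data in a DataFrame and summarize it
df = pd.DataFrame({"control": control, "treated": treated})
print(df.describe().loc[["mean", "std"]])
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```

A Matplotlib or Seaborn call (e.g., a histogram of each column) would be the natural next step for inspecting these distributions visually.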
1.2. Resources and references for general data sciences#

Books#

I do not provide references on the basic mathematical foundations of data science, which usually include linear algebra, calculus (with a focus on optimization), probability theory, statistics (both elementary and inferential), discrete mathematics (graphs, combinatorics, logic), and sometimes numerical methods. Nor do I include general references on statistics, machine learning, or Python programming itself, or on database topics such as SQL, relational database design, and NoSQL systems. High-quality resources for all these areas are plentiful.

Jupyter (note)books#

Among the references above, the following come with companion Jupyter notebooks:

  • Jake VanderPlas’ Python Data Science Handbook

  • Aurélien Géron’s notebooks, a series of Jupyter notebooks that walk you through the fundamentals of Machine Learning and Deep Learning in Python using Scikit-Learn, Keras and TensorFlow 2.

  • Wes McKinney’s “Python for Data Analysis” open edition and notebooks.

1.3. Resources and references for data sciences in neurosciences#

References#

Spike Train and Electrophysiology Data Analysis#

Math books#

Python packages#

  • syncopy – Systems Neuroscience Computing in Python: a package for the large-scale analysis of electrophysiological data, described in an accompanying article.

  • MNE – open-source Python package for exploring, visualizing, and analyzing human neurophysiological data (MEG, EEG, sEEG, ECoG, NIRS, and more).

  • Elephant – Electrophysiology Analysis Toolkit: an open-source, community-centered library for the analysis of electrophysiological data in Python. Elephant focuses on generic analysis functions for spike train data and time series recordings from electrodes (GitHub repository).

  • NeuralEnsemble – a community-based initiative to promote and coordinate open-source software development in neuroscience. Inactive since 2022.
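
To give an idea of what toolkits like Elephant compute, here is a sketch in plain NumPy (deliberately not using Elephant's own API) of two standard spike-train statistics, the mean firing rate and the coefficient of variation of the inter-spike intervals, on a synthetic Poisson spike train:

```python
# Two standard spike-train statistics computed from spike times, sketched in
# plain NumPy. The spike train is synthetic: exponential inter-spike intervals
# (ISIs) yield a homogeneous Poisson process, so CV(ISI) should be close to 1.
import numpy as np

rng = np.random.default_rng(seed=42)
t_stop = 10.0   # recording duration, seconds
rate = 5.0      # target firing rate, Hz

# Simulate spike times by accumulating exponential ISIs, then clip to t_stop
isis = rng.exponential(scale=1.0 / rate, size=200)
spike_times = np.cumsum(isis)
spike_times = spike_times[spike_times < t_stop]

# Mean firing rate: spike count divided by duration
mean_rate = spike_times.size / t_stop

# Coefficient of variation of the ISIs: std / mean of successive differences
observed_isis = np.diff(spike_times)
cv_isi = np.std(observed_isis) / np.mean(observed_isis)

print(f"{spike_times.size} spikes, rate = {mean_rate:.2f} Hz, CV(ISI) = {cv_isi:.2f}")
```

Libraries such as Elephant wrap computations like these (and many more) behind tested, documented functions that operate on standardized spike-train objects rather than bare arrays.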

Jupyter (note)book(s)#

Blog(s) and blog posts#

Misc.#

Other tools#

Before analyzing data, we first need to read electrophysiology recordings and handle the different standards used.

  • The pyABF library was created by Scott Harden. We will return to that package in a future section.

1.4. Sometimes we don’t even know what we’re talking about#

Data science, statistics, math, machine learning: all great when applied to modeling and analyzing spikes and bursts. But let’s not forget that we also need to paddle upstream to the very source of those signals. Where do the spike and burst recordings come from? The experimental lab. And what do they actually represent? The wild, real dynamics of living neurons.