Pyr Science Q&A
On Thursday, July 29, 2021, the most detailed examination of mammalian brain circuitry to date was released from Princeton University’s Seung Lab, The Allen Institute, and Baylor College of Medicine. You can learn more about the dataset that will become the next citizen science project here.
Seung Lab researcher Alex Bae hosted a Q&A with Eyewire players. Below is a transcript of all the questions, with answers from Alex, Amy and M. To learn more about the science of Pyr, check out this preprint on bioRxiv.
Jwb52z: Will the data collected and all of the reconstruction of the neurons and the larger component structures be put into something like a searchable database for doctors and researchers?
The current data is public here: https://www.microns-explorer.org/cortical-mm3
Data can be queried for synapses, cell types, and nuclei, and functional data are available for 75,000 cells. If you scroll down to the Data Inventory section on the page Alex linked, you can see all the types of data that are publicly available.
Sliet: Just for comparison how many neurons is this dataset of eyewire containing?
It is estimated that e2198 has about 10-15k cells total (at this time we are unsure if this number includes glial cells and/or cells with no visible soma in the dataset). As of March 2021, we have completed about 3,694 cells in Eyewire.
sliet: Pyr exists to improve the quality of the AI reconstructions? What will the “gameplay” be like?
If you click this link, you can see that these cells have a lot of mergers: https://ngl.microns-explorer.org/#!gs://microns-static-links/mm3/astrocytes.json
And that’s just a few astrocytes. There are lots of missed extensions on axons, and mergers as well.
annkri: how much proofreading do you estimate needs to be done by players?
If you look at the reconstructed cells, they are huge! The longest axon of a single cell was found to be ~20 mm. A few tracers on our team tried proofreading a number of neurons, and it took them tens of hours to proofread a single cell.
Merger Removals: 375/cell
Missing branches: 440/cell
Total: 815/cell
~81 million edits for the whole volume
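As a rough sanity check, the per-cell estimates above can be tied back to the whole-volume figure. The short Python sketch below sums the two per-cell numbers and backs out the cell count that ~81 million edits would imply. That implied count is an inference from the arithmetic, not a figure stated in the Q&A, and it assumes the total is simply edits per cell multiplied by the number of cells.

```python
# Per-cell proofreading estimates quoted above.
merger_removals_per_cell = 375
missing_branches_per_cell = 440
edits_per_cell = merger_removals_per_cell + missing_branches_per_cell

# Whole-volume estimate quoted above (approximate).
total_edits = 81_000_000

# Assumption: total edits = edits per cell x number of cells,
# so the implied cell count is the ratio of the two.
implied_cells = total_edits / edits_per_cell

print(f"Edits per cell: {edits_per_cell}")
print(f"Implied cell count: about {implied_cells:,.0f}")
```

Under that assumption, the numbers are consistent with a volume of roughly 100,000 cells.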
To make the process more efficient, we are also planning an effort to utilize AI that can guide you to the locations where the errors are, so you don’t have to search through all parts of your neuron just to find an error to correct. Furthermore, one idea is to have the AI suggest a few possible corrections too.
jwb52z: I wonder what the proportional rate between the reconstructions and how many times they’ll need to be checked for making sure the data is correct will be compared to now.
This is still up in the air. We need to find ways to reduce repetitive coverage in order for the project to be scalable. It also must retain accuracy. A tricky problem to solve!
Sliet: was the AI trained derived from eyewire or was it a complete thing of its own?
Eyewire data is from the mouse retina and the newly released data [Pyr] is mouse primary visual cortex data. The biological structures are different, and the Eyewire images were acquired a decade ago, so the imaging quality has also improved. Neurons in different parts of a body can be different in some ways, so it will be interesting to see how tracing will change. We acquired separate training data from within this released volume for training.
jwb52z: When can we sign up at the new site?
You can sign up to be notified now at https://pyr.ai/
Eyewire Mystics, Scythes, Scouts, and Advanced players will be the first invited.
jwb52z: If you look at the neurons in different areas of a body, they’re, I think you could say, specialized?
For example, the retina is a sandwich-like structure where the connections are mostly feedforward, propagating in a single direction. You would see neurites mostly lying within similar layers. But in primary visual cortex that is not true: you see many cells sending connections to different layers and also to different regions. So you can see that their shapes look very different from retinal cells.
Sliet: what is hoped to be learned from the new project?
We have more information available for this dataset, such as synapses. In Eyewire images we couldn’t explicitly see the synapses; they sort of had to be inferred. We can still try to look for different cell types, but now we can use the connectivity information. This dataset has 500 million automatically labeled synapses!
In Eyewire, we mostly used the shape of neurons to identify cell types. Now it’s possible to use how the cells provide and receive connections. Similarly shaped neurons can have different connectivity properties, maybe leading to different subtypes. Also, cortex is where much more complicated information processing happens, as the complex connections in the dataset show. So people are attempting to find connection information that could motivate and improve current AI architectures. These are just small examples using the IARPA dataset.
In fact, we are expecting more to come from researchers around the world now that this dataset has been released publicly. It will still require tons of manual proofreading effort, which will be coming from Pyr.
KrzysztofKruk: Will Pyr be concentrated only on neurons or other types of cells too?
As a researcher, I’d prefer to have neurons proofread first. We’ll likely start with cells of interest to the researchers and work from there. Pyr’s proofreading effort will work closely side by side with the scientists.
Annkri: Will the proofreading done by players be used for AI training, making the progress faster as it goes?
Definitely. Human-proofread data is always valuable to have. The more the better. Eyewire produced very accurate data, so based on that experience we are expecting similar performance in Pyr as well.
KrzysztofKruk: Are most of the cells contained inside the dataset or do they have outside projections?
Some axons end inside the dataset, but a number of them don’t. You’d be surprised how widely axons spread out. But this is one of the largest datasets, and it suffers relatively less from cutoff.
Here’s a proofread neuron — you can see how many axonal branches leave the volume: https://microns-explorer.org/mm3/layer5_thick_tufted
To see more try checking out the microns gallery.
davidjones1105: Bit of a basic question but is it pronounced to rhyme with ‘peer’ or with ‘tire’?
Peer as in “pyramidal” neuron 🙂
sliet: why vision as the theme?
It’s mainly our choice. We designed our experiments to answer questions about the visual circuit. We have functional data that was recorded before doing the reconstructions: we had the mouse watch movies and recorded the responses of cells. We can relate this to how the cells are connected to get more intuition on how the brain processes visual information. Also, vision is relatively well established (maybe that is not quite accurate); historically, it has been studied for a long time.
annkri: do you think in weeks, months or years before the first players will be able to play?
We anticipate Pyr will be available to ranked Eyewire players in 2022.
Davidjones1105: What will happen to EW when Pyr is released? Will they run alongside each other, or are we hoping to have E2198 finished by then?
It is likely the projects will run in parallel for some time. We still have plenty of cells in the Outer Realms. We’re only just starting Sector 3 out of 8 here on Eyewire 🙂
KrzysztofKruk: how big (in petabytes, exabytes?) is the dataset?
At the highest resolution, the electron microscopy images are >1PB. KrzysztofKruk notes “that’s big. It’s about 4000x bigger than the zfish images”
Puzzlerine: Will EW be a pre-training for Pyr?
Yes! In a way Eyewire is already pre-training for Pyr. Mystics helped settle the debate on whether expert Eyewirers can transition from one dataset to another. Spoiler alert: experts in Eyewire are experts in electron microscopy neuron reconstruction.
KrzysztofKruk: What was the biggest challenge with this project?
There were a lot of challenges. Scaling up to a cubic millimeter was not trivial. We had success with a (100um)^3 dataset, so people may think that just using 1000x more time or money could do it. But it wasn’t like that. We had new problems, such as more image defects when acquiring larger images.
We once joked that meshing this dataset would be equal to buying a Porsche.
The computation cost is equal to the price of one.
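The 1000x figure in the answer above follows from volume growing with the cube of the side length. Here is a minimal Python check, assuming the new volume is treated as an idealized 1 mm cube (the real dataset’s exact dimensions are not given in this Q&A):

```python
# Side lengths in micrometers.
small_side_um = 100    # the earlier (100 um)^3 dataset
large_side_um = 1000   # assumption: an idealized 1 mm cube

# Volume grows with the cube of the side length.
volume_ratio = (large_side_um / small_side_um) ** 3

print(f"Volume ratio: {volume_ratio:.0f}x")
```

A 10x longer side gives a 1000x larger volume, which is why a naive estimate would simply multiply time or money by 1000.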